The New England Journal of Statistics in Data Science

Volume 1, Issue 1 (2023), April 2023

The New England Statistical Society launched a webinar series on selected papers published in the New England Journal of Statistics in Data Science. The inaugural webinar of this series for this issue is available here.

Inaugural Editorial. Can We Achieve Our Mission: Fast, Accessible, Cutting-edge, and Top-quality?

Colin O. Wu Ming-Hui Chen Min-ge Xie All authors (5)

https://doi.org/10.51387/23-NEJSDS11EDI

Pub. online: 12 Apr 2023 Type: Editorial

Open Access

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 1–3

Abstract

We are pleased to launch the first issue of the New England Journal of Statistics in Data Science (NEJSDS). NEJSDS is the official journal of the New England Statistical Society (NESS), published under the leadership of the Vice President for Journal and Publication and sponsored by the College of Liberal Arts and Sciences, University of Connecticut. The aims of the journal are to serve as an interface between statistics and other disciplines in data science, to encourage researchers to exchange innovative ideas, and to promote data science methods to the general scientific community. The journal publishes high-quality original research, novel applications, and timely review articles in all aspects of data science, including all areas of statistical methodology, methods of machine learning and artificial intelligence, novel algorithms, computational methods, data management and manipulation, and applications of data science methods, among others. We encourage authors to submit collaborative work driven by real-life problems posed by researchers, administrators, educators, or other stakeholders that require original and innovative solutions from data scientists.

Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstogram

Xiao-Li Meng

https://doi.org/10.51387/22-NEJSDS6

Pub. online: 5 Oct 2022 Type: Methodology Article

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 4–23

Abstract

This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of, and hence public trust in, statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with profound appreciations for the world of uncertainty and the uncertainty world. The four “radical” proposals in the title, with all their inherent defects and trade-offs, are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the scientific community’s as well as the general public’s ability to spot, and hence to deter, flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured by the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research findings, in order to combat the tendency of rushing to conclusions or overstating confidence in our findings. A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency. Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of histogram, or rather kidstogram, as early as the concept of counting.
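
As an illustration only (not taken from the article), the quick-and-dirty Bayes formula described in the abstract can be compared with the exact Bayes calculation for a rare event. The sketch below assumes a test with sensitivity close to 1; the function names and the numerical values (prevalence 0.001, false positive rate 0.05) are hypothetical choices made for this example.

```python
def exact_ppv(prevalence, sensitivity, false_positive_rate):
    """Exact Bayes: P(condition | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


def quick_ppv(prevalence, false_positive_rate):
    """Quick approximation from the abstract: prevalence divided by
    the sum of the prevalence and the false positive rate."""
    return prevalence / (prevalence + false_positive_rate)


# Hypothetical values chosen for illustration (assumptions, not from the article).
prevalence = 0.001
false_positive_rate = 0.05
sensitivity = 1.0  # assumed near-perfect sensitivity

exact = exact_ppv(prevalence, sensitivity, false_positive_rate)
quick = quick_ppv(prevalence, false_positive_rate)
print(f"exact: {exact:.4f}  quick: {quick:.4f}")
```

Under these assumed values the two results agree closely, which is what one would expect when the prevalence is small and the sensitivity is near 1; the approximation should not be relied on outside that regime.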

Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstogram” by Xiao-Li Meng

Christine Franklin

https://doi.org/10.51387/22-NEJSDS6D

Pub. online: 5 Jan 2023 Type: Commentary And/or Historical Perspective

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 24–25

Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram,” by Xiao-Li Meng

Thomas R. Junk

https://doi.org/10.51387/22-NEJSDS6B

Pub. online: 19 Oct 2022 Type: Commentary And/or Historical Perspective

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 26–28

Abstract

This contribution is a series of comments on Prof. Xiao-Li Meng’s article, “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram”. Prof. Meng’s article offers some radical proposals and not-so-radical proposals to improve the quality of statistical inference used in the sciences and also to extend distributional thinking to early education. Discussions and alternative proposals are presented.

Comment on “Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram,” by Xiao-Li Meng

Eric D. Kolaczyk

https://doi.org/10.51387/22-NEJSDS6C

Pub. online: 18 Oct 2022 Type: Commentary And/or Historical Perspective

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 29–30

Comments on Xiao-Li Meng’s Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstogram

Dennis K.J. Lin

https://doi.org/10.51387/23-NEJSDS6E

Pub. online: 20 Jan 2023 Type: Commentary And/or Historical Perspective

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 31–34

Radical and Not-So-Radical Principles and Practices: Discussion of Meng

Ronald L. Wasserstein Allen L. Schirm Nicole A. Lazar

https://doi.org/10.51387/22-NEJSDS6A

Pub. online: 25 Oct 2022 Type: Methodology Article

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 35–38

Abstract

We highlight points of agreement between Meng’s suggested principles and those proposed in our 2019 editorial in The American Statistician. We also discuss some questions that arise in the application of Meng’s principles in practice.

A Not-so-radical Rejoinder: Habituate Systems Thinking and Data (Science) Confession for Quality Enhancement

Xiao-Li Meng

https://doi.org/10.51387/22-NEJSDS6REJ

Pub. online: 6 Jan 2023 Type: Commentary And/or Historical Perspective

Open Access

Area: Statistical Methodology

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 39–45

Effects of stopping criterion on the growth of trees in regression random forests

Aryana Arsham Philip Rosenberg Mark Little

https://doi.org/10.51387/22-NEJSDS5

Pub. online: 31 Aug 2022 Type: Methodology Article

Open Access

Area: Biomedical Research

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 46–61

Abstract

Random forests are a powerful machine learning tool that captures complex relationships between independent variables and an outcome of interest. Trees built in a random forest are dependent on several hyperparameters, one of the more critical being the node size. The original algorithm of Breiman controls for node size by limiting the size of the parent node, so that a node cannot be split if it has fewer than a specified number of observations. We propose that this hyperparameter should instead be defined as the minimum number of observations in each terminal node. The two existing random forest approaches are compared in the regression context based on estimated generalization error, bias-squared, and variance of resulting predictions in a number of simulated datasets. Additionally, the two approaches are applied to type 2 diabetes data obtained from the National Health and Nutrition Examination Survey. We have developed a straightforward method for incorporating weights into the random forest analysis of survey data. Our results demonstrate that generalization error under the proposed approach is competitive with that attained from the original random forest approach when data have large random error variability. The R code created from this work is available and includes an illustration.
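
The difference between the two stopping criteria named in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' R code: the function names are invented here, and the sketch only enumerates which left/right child sizes each rule would permit for a node of a given size.

```python
def splits_parent_rule(n_node, min_parent_size):
    """Parent-size rule: a node may be split only if it holds at least
    min_parent_size observations; any non-empty partition is then allowed."""
    if n_node < min_parent_size:
        return []
    return [(left, n_node - left) for left in range(1, n_node)]


def splits_leaf_rule(n_node, min_leaf_size):
    """Terminal-node rule: a split is allowed only if both children
    hold at least min_leaf_size observations."""
    return [
        (left, n_node - left)
        for left in range(min_leaf_size, n_node - min_leaf_size + 1)
    ]


# Hypothetical node of 10 observations with a threshold of 5 under each rule.
parent_splits = splits_parent_rule(10, 5)
leaf_splits = splits_leaf_rule(10, 5)

smallest_child_parent = min(min(pair) for pair in parent_splits)
smallest_child_leaf = min(min(pair) for pair in leaf_splits)
print(len(parent_splits), smallest_child_parent)
print(len(leaf_splits), smallest_child_leaf)
```

With the same numeric threshold, the parent-size rule can still produce very small terminal nodes, while the terminal-node rule bounds their size directly; this is the distinction the abstract draws, shown here without any claim about predictive performance.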

Dynamic Continuous Flows on Networks

Justina Zou Yi Guo David Banks

https://doi.org/10.51387/22-NEJSDS3

Pub. online: 25 Apr 2022 Type: Methodology Article

Open Access

Area: Engineering Science

Journal: The New England Journal of Statistics in Data Science Volume 1, Issue 1 (2023), pp. 62–68

Abstract

There are many cases in which one has continuous flows over networks, and there is interest in predicting and monitoring such flows. This paper provides Bayesian models for two types of networks—those in which flow can be bidirectional, and those in which flow is unidirectional. The former is illustrated by an application to electrical transmission over the power grid, and the latter is examined with data on volumetric water flow in a river system. Both applications yield good predictive accuracy over short time horizons. Predictive accuracy is important in these applications—it improves the efficiency of the energy market and enables flood warnings and water management.

The New England Journal of Statistics in Data Science

ISSN: 2693-7166
Copyright © 2021 New England Statistical Society
