In addition to scientific questions, clinical trialists often pursue additional design objectives, such as increasing power while controlling the type I error rate, minimizing unnecessary exposure to inferior treatments, and comparing multiple treatments in a single trial. We propose implementing adaptive seamless design (ASD) with response-adaptive randomization (RAR) to meet these objectives. However, combining ASD with RAR poses a challenge for type I error rate control. In this paper, we investigate how to exploit the advantages of the two adaptive methods while controlling the type I error rate, and we provide the theoretical foundation for this procedure. Numerical studies demonstrate that our methods achieve efficiency and ethical objectives while controlling the type I error rate.
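To fix ideas, the following is a minimal sketch of response-adaptive randomization for a two-arm binary-endpoint trial, in which allocation probabilities are tilted toward the arm with the higher posterior mean response rate. The true rates, Beta(1, 1) priors, tempering power, and sample size are all illustrative choices; this is not the specific ASD+RAR procedure studied in the paper.

```python
import numpy as np

# Illustrative two-arm RAR sketch: Beta-Binomial posteriors drive the
# allocation probabilities as outcomes accrue.
rng = np.random.default_rng(0)
true_rates = [0.3, 0.5]           # hypothetical true response rates
succ = np.ones(2)                 # Beta(1, 1) prior successes
fail = np.ones(2)                 # Beta(1, 1) prior failures

for patient in range(200):
    post_mean = succ / (succ + fail)
    # Allocation probability proportional to a power of the posterior
    # mean; the power 0.5 tempers the adaptation (a common RAR choice).
    p = post_mean ** 0.5
    p = p / p.sum()
    arm = rng.choice(2, p=p)
    outcome = rng.random() < true_rates[arm]
    succ[arm] += outcome
    fail[arm] += 1 - outcome

print("allocations per arm:", (succ + fail - 2).astype(int))
print("posterior means:", np.round(succ / (succ + fail), 2))
```

As the abstract notes, this kind of outcome-dependent allocation is exactly what complicates type I error rate control when layered on top of a seamless design.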
The performance of a learning technique relies heavily on its hyperparameter settings, which calls for hyperparameter tuning. For sophisticated techniques such as deep learning, however, tuning can be prohibitively expensive. It is therefore desirable to explore the relationship between the hyperparameters and the performance of a learning technique expeditiously, which in turn calls for design strategies that collect informative data efficiently. Various designs can be considered for this purpose, and the question of which design to use naturally arises. In this paper, we examine different types of designs for efficiently collecting informative data to study the surface of test accuracy, a measure of the performance of a learning technique, over hyperparameters. Under the settings considered, we find that the strong orthogonal array outperforms all other comparable designs.
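The design-based data-collection idea can be sketched as follows: lay out a space-filling design over the hyperparameter region and evaluate test accuracy at each design point. A Latin hypercube is used here for simplicity (the paper's strong orthogonal arrays are a different, stronger class of space-filling designs), and a toy surrogate stands in for the expensive train-and-evaluate step; the hyperparameter names and ranges are made up.

```python
import numpy as np
from scipy.stats import qmc

# Toy stand-in for test accuracy as a function of two hyperparameters
# (log10 learning rate in [-4, -1] and dropout in [0, 0.5]). In
# practice each evaluation would train a network; this surrogate is
# purely illustrative.
def toy_accuracy(log_lr, dropout):
    return 0.9 - (log_lr + 2.5) ** 2 / 20 - (dropout - 0.2) ** 2

# Space-filling design: a Latin hypercube over the hyperparameter cube.
sampler = qmc.LatinHypercube(d=2, seed=0)
unit = sampler.random(n=16)                          # 16 runs in [0, 1)^2
design = qmc.scale(unit, [-4.0, 0.0], [-1.0, 0.5])   # map to actual ranges

acc = np.array([toy_accuracy(lr, dr) for lr, dr in design])
best = design[acc.argmax()]
print(f"best run: log10(lr)={best[0]:.2f}, dropout={best[1]:.2f}")
```

The resulting (design point, accuracy) pairs are the informative data from which the accuracy surface is studied; comparing designs amounts to comparing how well such data support modeling that surface.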
As a prominent dimension reduction method for multivariate linear regression, the envelope model has received increased attention over the past decade due to its modeling flexibility and its success in enhancing estimation and prediction efficiency. Several enveloping approaches have been proposed in the literature; among these, the partial response envelope model [57], which envelopes only the coefficients for predictors of interest, and the simultaneous envelope model [14], which combines the predictor and response envelope models within a unified modeling framework, are noteworthy. In this article, we incorporate these two approaches within a Bayesian framework and propose a novel Bayesian simultaneous partial envelope model that generalizes and addresses some limitations of the two approaches. Our method offers the flexibility of incorporating prior information when available, and it aids coherent quantification of all modeling uncertainty through the posterior distribution of the model parameters. A block Metropolis-within-Gibbs algorithm for Markov chain Monte Carlo (MCMC) sampling from the posterior is developed. The utility of our model is corroborated by theoretical results, comprehensive simulations, and a real imaging genetics data application from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
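For readers unfamiliar with the sampling scheme named above, here is a generic block Metropolis-within-Gibbs sketch on a toy two-block posterior (a correlated bivariate normal): each block is updated in turn with a Metropolis random-walk step while the other block is held fixed. The envelope model's actual blocks and full conditionals are far more involved; everything below is illustrative.

```python
import numpy as np

# Toy log-posterior: bivariate normal with correlation 0.8 and unit
# variances, split into two one-dimensional blocks.
def log_post(theta):
    x, y = theta
    return -0.5 * (x**2 - 1.6 * x * y + y**2) / (1 - 0.8**2)

rng = np.random.default_rng(0)
theta = np.zeros(2)
draws = []
for it in range(5000):
    for block in (0, 1):                      # update one block at a time
        prop = theta.copy()
        prop[block] += rng.normal(0, 1.0)     # random-walk proposal
        # Metropolis accept/reject for this block, conditioning on the
        # current value of the other block.
        if np.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop
    draws.append(theta.copy())

draws = np.array(draws[1000:])                # drop burn-in
print("posterior corr estimate:", round(float(np.corrcoef(draws.T)[0, 1]), 2))
```

Blocking the parameters this way is what makes the scheme practical for models like the envelope model, where some blocks admit convenient proposals or full conditionals and others do not.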
Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review Bayesian sequential clinical trial designs based on posterior probabilities, posterior predictive probabilities, and decision-theoretic frameworks. A pertinent question is whether Bayesian sequential designs need to be adjusted for the planning of interim analyses. We answer this question from three perspectives: a frequentist-oriented perspective, a calibrated Bayesian perspective, and a subjective Bayesian perspective. We also provide new insights into the likelihood principle, which is commonly tied to statistical inference and decision making in sequential clinical trials. Some theoretical results are derived, and numerical studies are conducted to illustrate and assess these designs.
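A posterior-probability monitoring rule of the kind reviewed above can be sketched for a single-arm binary-endpoint trial: at each interim look, stop for efficacy once the posterior probability that the response rate exceeds a reference value passes a threshold. The prior, reference rate, threshold, and interim schedule below are illustrative choices, not a recommended design.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical single-arm trial with a Beta-Binomial model, monitored
# with a posterior-probability stopping rule.
p0, threshold = 0.3, 0.95         # reference rate and efficacy threshold
a0, b0 = 1.0, 1.0                 # Beta(1, 1) prior
interims = [10, 20, 30, 40]       # sample sizes at interim looks

rng = np.random.default_rng(1)
responses = rng.random(40) < 0.5  # simulate a truly effective arm

for n in interims:
    s = int(responses[:n].sum())  # successes so far
    # Posterior is Beta(a0 + s, b0 + n - s); the survival function at
    # p0 gives P(response rate > p0 | data).
    post_prob = beta.sf(p0, a0 + s, b0 + n - s)
    print(f"n={n}: P(rate > {p0} | data) = {post_prob:.3f}")
    if post_prob > threshold:
        print("stop early for efficacy")
        break
```

Whether (and how) the threshold should be calibrated for the number of interim looks is precisely the adjustment question the abstract examines from frequentist, calibrated Bayesian, and subjective Bayesian perspectives.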
Controlled experiments are widely applied in areas such as clinical trials and user behavior studies at IT companies. Recently, experimental design problems that facilitate personalized decision making have attracted much attention. In this paper, we investigate the optimal design of multiple-treatment allocation for personalized decision making in the presence of observational covariates associated with the experimental units (often patients or users). We assume that the response of a subject assigned to a treatment follows a linear model that includes interactions between covariates and treatments to facilitate personalized decision making. The design objective is the maximum variance of the estimated personalized treatment effects over different treatments and covariate values, and the optimal design is obtained by minimizing this objective. After reformulating the original optimization problem as a semi-definite program, we compute the optimal design with a YALMIP- and MOSEK-based solver. Numerical studies are provided to assess the quality of the optimal design.
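The minimax-variance objective can be illustrated numerically. The sketch below uses a toy linear model with one covariate, two treatments, and an interaction term, and minimizes the worst-case variance of the estimated treatment effect over covariate values by choosing allocation weights on candidate design points. A generic scipy optimizer stands in for the paper's semi-definite program reformulation solved via YALMIP and MOSEK; the model, design points, and contrasts are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Candidate design points: covariate x in {-1, 0, 1} crossed with
# treatment indicator t in {0, 1}; model row f(x, t) = (1, x, t, x*t).
xs = [-1.0, 0.0, 1.0]
rows = np.array([[1.0, x, t, x * t] for x in xs for t in (0.0, 1.0)])

# Target contrasts: personalized treatment effect at each covariate
# value, c(x) = (0, 0, 1, x).
contrasts = np.array([[0.0, 0.0, 1.0, x] for x in xs])

def worst_variance(w):
    # Information matrix M(w) and the largest contrast variance
    # c' M(w)^{-1} c over the target contrasts.
    M = rows.T @ (w[:, None] * rows)
    Minv = np.linalg.inv(M + 1e-9 * np.eye(4))
    return max(c @ Minv @ c for c in contrasts)

n = len(rows)
res = minimize(
    worst_variance,
    x0=np.full(n, 1.0 / n),                  # start from uniform weights
    bounds=[(0.0, 1.0)] * n,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
print("allocation weights:", np.round(res.x, 3))
print("worst-case variance:", round(worst_variance(res.x), 3))
```

The SDP route in the paper replaces the inner maximum and the matrix inverse with linear matrix inequality constraints (via Schur complements), which is what makes solvers such as MOSEK applicable.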
Anomaly detection plays an important role in traffic operations and control. Missingness in spatial-temporal datasets prevents anomaly detection algorithms from learning characteristic rules and patterns because large amounts of data are unavailable. This paper proposes an anomaly detection scheme for the 2021 Algorithms for Threat Detection (ATD) challenge. The scheme is based on Gaussian process models that generate features for a logistic regression model, achieving high prediction accuracy on sparse traffic flow data with a large proportion of missingness. The dataset, provided by the National Science Foundation (NSF) in conjunction with the National Geospatial-Intelligence Agency (NGA), consists of thousands of labeled traffic flow records for 400 sensors from 2011 to 2020. Each sensor was purposely downsampled by NSF and NGA to simulate data that are missing completely at random, with missing rates of 99%, 98%, 95%, and 90%, which makes anomaly detection from the sparse traffic flow data challenging. The proposed scheme exploits traffic patterns at different times of day and on different days of the week to recover the complete data, and it is computationally efficient because it allows parallel computation across sensors. The proposed method is one of the two top-performing algorithms in the 2021 ATD challenge.
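The GP-features-into-logistic-regression pipeline can be sketched on simulated data for a single sensor: a Gaussian process fit to the sparsely observed flow recovers the typical daily pattern, and the residual from that pattern (and its z-score) become features for a logistic regression anomaly classifier. All data below are simulated, not the ATD data, and the kernel, missingness rate, and anomaly mechanism are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy single-sensor data: a smooth daily flow pattern over two weeks,
# hourly, with anomalies injected as large positive deviations.
hours = np.arange(24 * 14).astype(float)
pattern = 100 + 40 * np.sin(2 * np.pi * hours / 24)
flow = pattern + rng.normal(0, 5, hours.size)
anom = np.zeros(hours.size, dtype=bool)
anom[::15] = True
flow[anom] += 80                                # injected anomalies

# Heavy missingness: only ~10% of hours are observed (the challenge
# data are even sparser, up to 99% missing).
observed = rng.random(hours.size) < 0.10
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=3.0) + WhiteKernel(noise_level=25.0),
    normalize_y=True,
)
gp.fit(hours[observed, None], flow[observed])

# GP recovers the typical pattern; residual and its z-score serve as
# features for the logistic-regression anomaly classifier. (In this toy
# the classifier is trained on the fully known labels.)
mean, sd = gp.predict(hours[:, None], return_std=True)
feats = np.column_stack([flow - mean, (flow - mean) / (sd + 1e-6)])
clf = LogisticRegression(max_iter=1000).fit(feats, anom)
print("training accuracy:", round(clf.score(feats, anom), 3))
```

Because each sensor's GP and classifier are fit independently, this pipeline parallelizes across sensors, which is the source of the computational efficiency noted in the abstract.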