Multivariate testing is a popular method for improving the effectiveness of digital marketing in industry. Online campaigns are often conducted across multiple platforms, such as desktops, tablets, smartphones, and smartwatches. We propose minimum sliced aberration designs to accommodate online experiments with four platforms. This approach provides important insights into how different sets of design factors work differently across the four platforms, which can be used to optimize many forms of digital marketing. The effectiveness of the proposed approach is illustrated by an industrial email campaign with four platforms.
We introduce the anytime-valid (AV) logrank test, a version of the logrank test that provides type-I error guarantees under optional stopping and optional continuation. The test is sequential without the need to specify a maximum sample size or stopping rule, and allows for cumulative meta-analysis with type-I error control. The method can be extended to define anytime-valid confidence intervals. The logrank test is an instance of the recently developed martingale tests based on e-variables. We demonstrate type-I error guarantees for the test in a semiparametric setting of proportional hazards, show explicitly how to extend it to ties and confidence sequences, and indicate further extensions to the full Cox regression model. Using a Gaussian approximation of the logrank statistic, we show that the AV logrank test (which itself is always exact) has a rejection region similar to that of O’Brien-Fleming α-spending, but with the potential to achieve $100\%$ power by optional continuation. Although our approach to study design requires a larger sample size, the expected sample size under optional stopping is competitive.
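The e-variable machinery behind such anytime-valid tests can be illustrated with a minimal sketch: a running likelihood-ratio e-process for Gaussian data against a fixed alternative. This is not the AV logrank test itself (whose e-variables are built on the logrank statistic); the function name and the alternative mean `delta` are assumptions chosen for illustration.

```python
import math

# Hedged sketch: an e-process for H0: X_i ~ N(0, 1) versus a fixed
# alternative mean `delta`, built as a running likelihood ratio.
# Each factor has expectation 1 under H0, so the product is a
# nonnegative martingale; by Ville's inequality, rejecting whenever
# it exceeds 1/alpha keeps the type-I error below alpha even under
# optional stopping.

def gaussian_e_process(xs, delta=0.5):
    e = 1.0
    for x in xs:
        # density ratio N(delta, 1) / N(0, 1) evaluated at x
        e *= math.exp(delta * x - delta ** 2 / 2.0)
    return e
```

Rejecting the first time the running product exceeds $1/\alpha$ is valid at any data-dependent stopping time, which is the sense in which such a test is anytime-valid.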
We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem appears, for example, when conducting biological experiments to identify differentially expressed genes of a disease process. This work builds on the generalized α-investing framework, which enables control of the marginal false discovery rate in a sequential testing setting. We provide a theoretical analysis of the long-term asymptotic behavior of α-wealth, which motivates a consideration of sample size in the α-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected reward of α-wealth (ERO) and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods for $n=1$, where $n$ is the sample size. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.
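The wealth dynamics underlying α-investing can be sketched in a few lines. The sketch below uses the classic spend/earn rule of Foster and Stine with a simple "spend half the current wealth" policy; it is not the cost-aware ERO rule of the abstract, and the function name and constants are illustrative assumptions.

```python
# Hedged sketch of an alpha-investing loop: each test spends part of
# the current alpha-wealth; a rejection earns a reward, a non-rejection
# pays alpha_j / (1 - alpha_j). Testing stops when wealth runs out.

def alpha_investing(pvalues, w0=0.05, reward=0.05):
    wealth = w0
    decisions = []
    for p in pvalues:
        alpha_j = wealth / 2.0          # simple policy: spend half the wealth
        reject = p <= alpha_j
        if reject:
            wealth += reward            # earn a reward for a discovery
        else:
            wealth -= alpha_j / (1.0 - alpha_j)  # pay for a non-rejection
        decisions.append(reject)
        if wealth <= 0:
            break                       # wealth exhausted: no further tests
    return decisions
```

A cost-aware rule would additionally choose how many samples to buy for each test before setting `alpha_j`, trading wealth spent now against power on later tests.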
Analyzing health effects associated with exposure to environmental chemical mixtures is a challenging problem in epidemiology, toxicology, and exposure science. In particular, when there are a large number of chemicals under consideration, it is difficult to estimate the interactive effects without incorporating reasonable prior information. Based on substantive considerations, researchers believe that true interactions between chemicals need to incorporate their corresponding main effects. In this paper, we use this prior knowledge through a shrinkage prior that a priori assumes an interaction term can only occur when the corresponding main effects exist. Our initial development is for logistic regression with linear chemical effects. We extend this formulation to include non-linear exposure effects and to account for exposures subject to detection limits. We develop an MCMC algorithm using a shrinkage prior that shrinks the interaction terms closer to zero as the main effects get closer to zero. We examine the performance of our methodology through simulation studies and illustrate an analysis of chemical interactions in a case-control study of cancer.
We consider a formal statistical design that allows simultaneous enrollment of a main cohort and a backfill cohort of patients in a dose-finding trial. The goal is to accumulate more information at various doses to facilitate dose optimization. The proposed design, called Bi3+3, combines the simple dose-escalation algorithm in the i3+3 design and a model-based inference under the framework of probability of decisions (POD), both previously published. As a result, Bi3+3 provides a simple algorithm for backfilling patients to lower doses in a dose-finding trial once these doses exhibit an acceptable safety profile in patients. The POD framework allows dosing decisions to be made when some backfill patients are still being followed with incomplete toxicity outcomes, thereby potentially expediting the clinical trial. At the end of the trial, Bi3+3 uses both toxicity and efficacy outcomes to estimate an optimal biological dose (OBD). The proposed inference is based on a dose-response model that takes into account either a monotone or a plateau dose-efficacy relationship, both of which are frequently encountered in modern oncology drug development. Simulation studies show promising operating characteristics of the Bi3+3 design in comparison to existing designs.
The notion of an e-value has recently been proposed as a possible alternative to critical regions and p-values in statistical hypothesis testing. In this paper we consider testing the nonparametric hypothesis of symmetry, introduce e-value analogues of three popular nonparametric tests, define an e-value analogue of Pitman’s asymptotic relative efficiency, and apply it to the three nonparametric tests. We discuss limitations of our simple definition of asymptotic relative efficiency and list directions of further research.
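As a hedged illustration of the betting interpretation of e-values (a toy example, not any of the symmetry tests in the abstract), consider testing the null that a coin is fair; the fixed bet size below is an arbitrary choice.

```python
# Toy e-process for H0: P(X = 1) = 0.5. Each round multiplies the
# running e-value by the payoff of a fair-odds bet on heads; under H0
# the multiplier has expectation 0.5*(1 + bet) + 0.5*(1 - bet) = 1,
# so the running product is a nonnegative martingale (an e-process).

def coin_e_process(outcomes, bet=0.2):
    e = 1.0
    for x in outcomes:
        e *= (1.0 + bet) if x == 1 else (1.0 - bet)
    return e
```

Large values of the product are evidence against fairness, and its reciprocal at any stopping time is a valid p-value, which is the sense in which e-values compose under optional stopping.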
The goal of non-inferiority (NI) clinical trials is to demonstrate that a new treatment is not worse than a standard of care by more than a prespecified amount called the margin. The choice of the non-inferiority margin is not straightforward, as it depends on historical data and clinical experts’ opinion. Knowing the “true”, objective clinical margin would be helpful for the design and analysis of non-inferiority trials, but this is not possible in practice. We propose to treat the non-inferiority margin as missing information. In order to recover an objective margin, we believe it is essential to conduct a survey among a group of representative clinical experts. We introduce a novel framework in which data obtained from a survey are combined with NI trial data, so that both an estimated clinically acceptable margin and its uncertainty are accounted for when claiming non-inferiority. Through simulations, we compare several methods for implementing this framework. We believe the proposed framework would lead to better informed decisions regarding new potentially non-inferior treatments and could help resolve current practical issues related to the choice of the margin.
We consider the problem of developing flexible and parsimonious biomarker combinations for cancer early detection in the presence of variables missing at random. Motivated by the need to develop biomarker panels in a cross-institute pancreatic cyst biomarker validation study, we propose logic-regression-based methods for feature selection and construction of logic rules under a multiple imputation framework. We generate ensemble trees for the classification decision, and further select a single decision tree for simplicity and interpretability. We demonstrate superior performance of the proposed methods compared to alternative methods based on complete-case data or single imputation. The methods are applied to the pancreatic cyst data to estimate biomarker panels for pancreatic cyst subtype classification and malignant potential prediction.
This article proposes an alternative to the Hosmer-Lemeshow (HL) test for evaluating the calibration of probability forecasts for binary events. The approach is based on e-values, a new tool for hypothesis testing. An e-value is a random variable with expected value less than or equal to one under a null hypothesis. Large e-values give evidence against the null hypothesis, and the multiplicative inverse of an e-value is a p-value. Our test uses online isotonic regression to estimate the calibration curve as a ‘betting strategy’ against the null hypothesis. We show that the test has power against essentially all alternatives, which makes it theoretically superior to the HL test and at the same time resolves the well-known instability problem of the latter. A simulation study shows that a feasible version of the proposed eHL test can detect slight miscalibrations at practically relevant sample sizes, but trades its universal validity and power guarantees for reduced empirical power compared to the HL test in a classical simulation setup. We illustrate our test on recalibrated predictions for credit card defaults during the Taiwan credit card crisis, where the classical HL test delivers equivocal results.
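The structure of such a calibration e-value can be sketched as a product of likelihood ratios between a rival ("betting") forecast and the forecast under test. In the actual eHL test the rival forecast comes from online isotonic regression, which is omitted here; the function name and the fixed rival probabilities are assumptions for illustration.

```python
# Hedged sketch: an e-value for H0 "the forecasts p are calibrated",
# built by betting with rival probabilities q against p. Each factor
# is a likelihood ratio for the binary outcome y, so it has
# expectation 1 when p is the true conditional probability; the
# product therefore remains an e-value under the null.

def calibration_e_value(p, q, y):
    e = 1.0
    for pt, qt, yt in zip(p, q, y):
        num = qt if yt == 1 else (1.0 - qt)   # rival forecast's likelihood
        den = pt if yt == 1 else (1.0 - pt)   # tested forecast's likelihood
        e *= num / den
    return e
```

The e-value grows exactly when the rival forecast predicts the outcomes better than the forecast under test, which is how systematic miscalibration is turned into betting profit.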