The goal of a non-inferiority (NI) clinical trial is to demonstrate that a new treatment is not worse than a standard of care by more than a prespecified amount called the margin. The choice of the non-inferiority margin is not straightforward, as it depends on historical data and the opinion of clinical experts. Knowing the “true”, objective clinical margin would be helpful for the design and analysis of non-inferiority trials, but this is not possible in practice. We propose to treat the non-inferiority margin as missing information. To recover an objective margin, we believe it is essential to conduct a survey among a representative group of clinical experts. We introduce a novel framework in which data obtained from such a survey are combined with NI trial data, so that both the estimated clinically acceptable margin and its uncertainty are accounted for when claiming non-inferiority. Through simulations, we compare several methods for implementing this framework. We believe the proposed framework would lead to better-informed decisions regarding new, potentially non-inferior treatments and could help resolve current practical issues related to the choice of the margin.
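As a concrete illustration of the idea, the following minimal Python sketch combines a margin estimated from a hypothetical expert survey with hypothetical trial data, propagating the margin’s sampling uncertainty through a plug-in Wald-type statistic. The simulated data and the variance-addition rule are assumptions made for illustration; they are not the specific methods compared in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical survey: each of 30 clinical experts reports an acceptable margin.
expert_margins = rng.normal(loc=2.0, scale=0.8, size=30)
margin_hat = expert_margins.mean()
margin_se = expert_margins.std(ddof=1) / np.sqrt(len(expert_margins))

# Hypothetical NI trial: outcome difference (new minus standard), higher is better.
new_arm = rng.normal(loc=0.0, scale=5.0, size=200)
std_arm = rng.normal(loc=0.5, scale=5.0, size=200)
diff_hat = new_arm.mean() - std_arm.mean()
diff_se = np.sqrt(new_arm.var(ddof=1) / len(new_arm)
                  + std_arm.var(ddof=1) / len(std_arm))

# Plug-in Wald-type test of H0: true difference <= -margin, propagating the
# margin's sampling uncertainty by adding its variance to the trial variance
# (one of several possible combination rules, not necessarily the paper's).
z = (diff_hat + margin_hat) / np.sqrt(diff_se**2 + margin_se**2)
p_value = stats.norm.sf(z)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```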
We consider the problem of developing flexible and parsimonious biomarker combinations for cancer early detection in the presence of variables missing at random. Motivated by the need to develop biomarker panels in a cross-institute pancreatic cyst biomarker validation study, we propose logic regression-based methods for feature selection and for the construction of logic rules under a multiple imputation framework. We generate ensembles of trees for classification decisions, and further select a single decision tree for simplicity and interpretability. We demonstrate superior performance of the proposed methods compared to alternative methods based on complete-case data or single imputation. The methods are applied to the pancreatic cyst data to estimate biomarker panels for pancreatic cyst subtype classification and malignant potential prediction.
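The sketch below, assuming scikit-learn and synthetic data, is a simplified analogue of this pipeline: biomarkers are multiply imputed, binarized, and a shallow decision tree is fitted per imputation, with predictions combined by voting. It is not logic regression proper (there is no Boolean-rule search), and the binarization rule and tree settings are illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical biomarker matrix with values missing at random, and binary labels.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan

# One imputed data set and one shallow tree per imputation.
imputed_bin, trees = [], []
for m in range(10):
    X_imp = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    X_bin = (X_imp > np.median(X_imp, axis=0)).astype(int)   # binarize, as logic rules require
    imputed_bin.append(X_bin)
    trees.append(DecisionTreeClassifier(max_depth=3, random_state=m).fit(X_bin, y))

# Majority vote across imputations (shown in-sample; use held-out data in practice).
votes = np.mean([t.predict(Xb) for t, Xb in zip(trees, imputed_bin)], axis=0)
print("in-sample accuracy:", ((votes > 0.5).astype(int) == y).mean())
```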
This article proposes an alternative to the Hosmer-Lemeshow (HL) test for evaluating the calibration of probability forecasts for binary events. The approach is based on e-values, a new tool for hypothesis testing. An e-value is a random variable with expected value less than or equal to one under a null hypothesis. Large e-values give evidence against the null hypothesis, and the multiplicative inverse of an e-value is a p-value. Our test uses online isotonic regression to estimate the calibration curve as a ‘betting strategy’ against the null hypothesis. We show that the test has power against essentially all alternatives, which makes it theoretically superior to the HL test and at the same time resolves the well-known instability problem of the latter. A simulation study shows that a feasible version of the proposed eHL test can detect slight miscalibrations at practically relevant sample sizes, but trades its universal validity and power guarantees for reduced empirical power compared to the HL test in a classical simulation setup. We illustrate our test on recalibrated predictions for credit card defaults during the Taiwan credit card crisis, where the classical HL test delivers equivocal results.
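The betting interpretation can be made concrete in a few lines of Python. The sketch below substitutes a cruder betting strategy (running, Laplace-smoothed outcome frequencies within forecast bins) for the paper’s online isotonic regression, so that each round’s likelihood ratio depends only on past data and the running product remains a valid e-value under the null of calibration; the simulated miscalibration and the binning are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical forecasts that are slightly miscalibrated:
# claimed probability p, true conditional probability p**1.2.
n = 5000
p = rng.uniform(0.05, 0.95, size=n)
y = rng.binomial(1, p**1.2)

# Betting strategy: before each round, re-estimate the outcome frequency in the
# forecast's bin from past rounds only, then bet the likelihood ratio of that
# estimate against the claimed forecast.  The running product is an e-value.
bins = np.clip((p * 10).astype(int), 0, 9)
succ = np.ones(10)        # Laplace-smoothed success counts per bin
tot = 2 * np.ones(10)     # Laplace-smoothed totals per bin
log_e = 0.0
for t in range(n):
    q = succ[bins[t]] / tot[bins[t]]          # recalibrated probability from past data only
    lr = (q / p[t]) if y[t] == 1 else ((1 - q) / (1 - p[t]))
    log_e += np.log(lr)
    succ[bins[t]] += y[t]
    tot[bins[t]] += 1

print(f"log e-value against calibration: {log_e:.1f}")
```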
Two-sample testing is a fundamental problem in statistics. While many powerful nonparametric methods exist in both the univariate and the multivariate context, it is comparatively less common to see a framework for determining which data features lead to rejection of the null. In this paper, we propose a new nonparametric two-sample test named AUGUST, which incorporates a framework for interpretation while maintaining power comparable to that of existing methods. AUGUST tests for inequality in distribution up to a predetermined resolution using symmetry statistics from binary expansion. Designed for univariate and low- to moderate-dimensional multivariate data, this construction allows us to understand distributional differences as a combination of fundamental orthogonal signals. Asymptotic theory for the test statistic facilitates p-value computation and power analysis, and an efficient algorithm enables computation on large data sets. In empirical studies, we show that our test has power comparable to that of popular existing methods, as well as greater power in some circumstances. We illustrate the interpretability of our method using NBA shooting data.
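The following Python sketch conveys the flavour of symmetry statistics from binary expansion in the univariate case: pooled observations are mapped to (0, 1) by the pooled empirical CDF, the first few binary digits are extracted, and standardized contrasts of bit-interaction signs between the two samples indicate which resolution-level signals differ. This is an illustrative construction under an assumed depth and standardization, not the AUGUST statistic itself.

```python
import numpy as np
from itertools import combinations

def binary_expansion_signs(u, depth=3):
    """Return the ±1 sign variables of the first `depth` binary digits of u in (0, 1)."""
    r = np.asarray(u, dtype=float)
    bits = []
    for _ in range(depth):
        b = (r >= 0.5).astype(int)
        bits.append(2 * b - 1)
        r = 2 * r - b
    return np.column_stack(bits)

def symmetry_contrasts(x, y, depth=3):
    """Standardized differences of bit-interaction means between two samples (illustrative)."""
    pooled = np.concatenate([x, y])
    u = (np.argsort(np.argsort(pooled)) + 0.5) / len(pooled)   # pooled empirical CDF
    signs = binary_expansion_signs(u, depth)
    sx_all, sy_all = signs[:len(x)], signs[len(x):]
    out = {}
    for k in range(1, depth + 1):
        for idx in combinations(range(depth), k):
            sx = sx_all[:, list(idx)].prod(axis=1)
            sy = sy_all[:, list(idx)].prod(axis=1)
            se = np.sqrt(sx.var(ddof=1) / len(x) + sy.var(ddof=1) / len(y))
            out[idx] = (sx.mean() - sy.mean()) / se
    return out   # large |z| values point to the resolution levels driving a difference

rng = np.random.default_rng(3)
print(symmetry_contrasts(rng.normal(size=300), rng.normal(scale=1.5, size=300)))
```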
Cancer is the second leading cause of death in the world. Diagnosing cancer early can save many lives. Pathologists have to examine tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent, and subjective. Existing automatic algorithms either have not achieved the accuracy level of a pathologist or require substantial human involvement. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy and regulatory concerns in medical organizations. TMA images from different cancer types may share certain common characteristics, but combining them directly harms accuracy due to heterogeneity in their staining patterns. Transfer learning is an emerging learning paradigm that allows borrowing strength from similar problems. However, existing approaches typically require a large sample from similar learning problems, whereas TMA images of different cancer types are often available only in small samples; furthermore, existing algorithms are limited to transfer learning from a single similar problem. We propose a new transfer learning algorithm that can learn from multiple related problems, where each problem has a small sample and can have a substantially different distribution from the original one. The proposed algorithm has made it possible to break the critical accuracy barrier (the 75% accuracy level of pathologists), with a reported accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database. It is supported by recent developments in transfer learning theory and empirical evidence in clustering technology. This will allow pathologists to confidently adopt automatic algorithms for recognizing tumors consistently, with higher accuracy, in real time.
When testing a statistical hypothesis, is it legitimate to deliberate on the basis of initial data about whether and how to collect further data? Game-theoretic probability’s fundamental principle for testing by betting says yes, provided that you are testing the hypothesis’s predictions by betting and do not risk more capital than initially committed. Standard statistical theory uses Cournot’s principle, which does not allow such optional continuation. Cournot’s principle can be extended to allow optional continuation when testing is carried out by multiplying likelihood ratios, but the extension lacks the simplicity and generality of testing by betting.
Testing by betting can also help us with descriptive data analysis. To obtain a purely and honestly descriptive analysis using competing probability distributions, we have them bet against each other according to this principle. The place of confidence intervals is then taken by sets of distributions that do relatively well in the competition. In the simplest implementation, these sets coincide with R. A. Fisher’s likelihood ranges.
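A minimal worked example, assuming Bernoulli data and a grid of candidate success probabilities: each candidate distribution bets through its likelihood on the observed data, and the candidates whose capital stays within a chosen factor of the best competitor’s form a likelihood range in place of a confidence interval.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.binomial(1, 0.3, size=100)   # hypothetical binary data

# Each candidate Bernoulli(theta) distribution "bets" via its likelihood on the data;
# the candidates whose relative likelihood against the best competitor stays above a
# chosen cutoff form a likelihood range.
thetas = np.linspace(0.01, 0.99, 197)
loglik = y.sum() * np.log(thetas) + (len(y) - y.sum()) * np.log(1 - thetas)
rel = np.exp(loglik - loglik.max())           # relative likelihood
likelihood_range = thetas[rel >= 1 / 8]       # the "1/8 likelihood range"
print(f"1/8 likelihood range: [{likelihood_range.min():.2f}, {likelihood_range.max():.2f}]")
```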
In this paper, we present the U.S. Mental Health Dashboard, an R Shiny web application that facilitates exploratory data analysis of U.S. mental health data collected through national surveys. Mental health affects almost every aspect of people’s lives, including their social relationships, substance use, academic success, professional productivity, and physical wellness. Even so, mental illnesses are often perceived as less legitimate or serious than physical diseases, and as a result of this stigmatization, many people suffer in silence without access to proper treatment. To address the lack of accessible healthcare information related to mental illness, the U.S. Mental Health Dashboard presents dynamic visualizations, tables, and choropleth maps of the prevalence and geographic distribution of key mental health metrics based on data from the National Survey on Drug Use and Health (NSDUH) and the Behavioral Risk Factor Surveillance System (BRFSS). National and state-level estimates are provided for the civilian, non-institutionalized adult population of the United States as well as for relevant demographic subpopulations. By demonstrating the pervasiveness of mental illness and the stark health inequities between demographic groups, this application aims to raise mental health awareness and reduce self-blame and stigmatization, especially for individuals who may inherently be at high risk. The U.S. Mental Health Dashboard has a wide variety of potential use cases: to illustrate to individuals suffering from mental illness, and to those close to them, that they are not alone; to identify the subpopulations with the greatest need for mental health care; and to help epidemiologists planning studies identify the target population for specific mental illness symptoms.
Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an e-detector, which is composed of sums of e-processes—a fundamental generalization of nonnegative supermartingales—that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, as well as yielding the first change detection method for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
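As a minimal sketch of one of the simplest instances of this framework, the Python code below builds a Shiryaev-Roberts-style e-detector for an upward shift in the mean of $[0,1]$-valued observations from the single-round e-variable $1+\lambda(x_t-\mu_0)$, which is valid under the null $\mathbb{E}[x_t\mid \text{past}]\le \mu_0$; the betting fraction, alarm threshold, and simulated change point are illustrative assumptions.

```python
import numpy as np

def sr_e_detector(x, mu0=0.5, lam=1.0, threshold=10_000):
    """Shiryaev-Roberts-style e-detector for an upward mean shift of [0, 1]-valued data.

    Each factor 1 + lam * (x_t - mu0) is a single-round e-variable under the null
    E[x_t | past] <= mu0 (nonnegative for lam <= 1/mu0, conditional expectation
    at most one); the recursion sums the e-processes started at consecutive times,
    and alarming at `threshold` keeps the average run length of order `threshold`.
    """
    s = 0.0
    for t, xt in enumerate(x, start=1):
        s = (s + 1.0) * (1.0 + lam * (xt - mu0))
        if s >= threshold:
            return t
    return None

rng = np.random.default_rng(11)
pre = rng.uniform(0.0, 1.0, size=300)     # pre-change: mean 0.5
post = rng.beta(4, 2, size=300)           # post-change: mean 2/3
print("alarm at time:", sr_e_detector(np.concatenate([pre, post])))
```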
We address the estimation of the Integrated Squared Error (ISE) of a predictor $\eta(x)$ of an unknown function $f$ learned using data acquired on a given design $\mathbf{X}_n$. We consider ISE estimators that are weighted averages of the residuals of the predictor $\eta(x)$ on a set of selected points $\mathbf{Z}_m$. We show that, under a stochastic model for $f$, minimisation of the mean squared error of these ISE estimators is equivalent to minimisation of a Maximum Mean Discrepancy (MMD) for a non-stationary kernel that is adapted to the geometry of $\mathbf{X}_n$. Sequential Bayesian quadrature then yields sequences of nested validation designs that minimise, at each step of the construction, the relevant MMD. The optimal ISE estimate can be written in terms of the integral of a linear reconstruction, for the assumed model, of the square of the interpolator residuals over the domain of $f$. We present an extensive set of numerical experiments which demonstrate the good performance and robustness of the proposed solution. Moreover, we show that the validation designs obtained are space-filling continuations of $\mathbf{X}_n$, and that correct weighting of the observed interpolator residuals is more important than the precise configuration $\mathbf{Z}_m$ of the points at which they are observed.
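In the notation of the abstract, the quantities involved have the generic form
\[
\mathrm{ISE}(\eta)=\int \bigl(\eta(x)-f(x)\bigr)^{2}\,\mathrm{d}\mu(x),
\qquad
\widehat{\mathrm{ISE}}=\sum_{i=1}^{m}w_{i}\,\bigl(\eta(z_{i})-f(z_{i})\bigr)^{2},\quad z_{i}\in\mathbf{Z}_{m},
\]
where $\mu$ denotes the integration measure over the domain of $f$; the particular weights $w_{i}$, the non-stationary kernel, and the resulting MMD criterion are developed in the paper itself and are not reproduced here.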