Cost-Aware Generalized α-Investing for Multiple Hypothesis Testing
Volume 2, Issue 2 (2024), pp. 155–174
Pub. online: 27 March 2024
Type: Statistical Methodology
Open Access
Accepted: 23 January 2024
Published: 27 March 2024
Abstract
We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem appears, for example, when conducting biological experiments to identify differentially expressed genes of a disease process. This work builds on the generalized α-investing framework, which enables control of the marginal false discovery rate in a sequential testing setting. We provide a theoretical analysis of the long-term asymptotic behavior of α-wealth, which motivates a consideration of sample size in the α-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected α-wealth reward (ERO) and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods for $n=1$, where $n$ is the sample size. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.
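As background for the α-wealth mechanism that the abstract refers to, the following is a minimal sketch of the original α-investing update of Foster and Stine (2008), not the cost-aware ERO rule proposed in this work. The function name `alpha_investing`, the parameter `spend_fraction`, and the particular spending policy are illustrative assumptions; the sketch takes p-values as given and does not model sample size or data collection cost.

```python
def alpha_investing(p_values, alpha=0.05, payout=0.05, spend_fraction=0.5):
    """Run a simple alpha-investing procedure over a stream of p-values.

    Hypothetical spending policy (an assumption for illustration): each test
    is run at level spend_fraction * W / (1 + W), where W is the current
    alpha-wealth. This keeps the wealth non-negative after a non-rejection.
    Returns the list of rejection decisions and the wealth after each test.
    """
    wealth = alpha                # initial alpha-wealth W(0)
    rejections = []
    wealth_path = [wealth]
    for p in p_values:
        if wealth <= 0.0:
            # No wealth left: the hypothesis cannot be tested.
            rejections.append(False)
            wealth_path.append(wealth)
            continue
        level = spend_fraction * wealth / (1.0 + wealth)
        reject = p <= level
        if reject:
            wealth += payout                    # a rejection earns the payout
        else:
            wealth -= level / (1.0 - level)     # a non-rejection costs level / (1 - level)
        rejections.append(reject)
        wealth_path.append(wealth)
    return rejections, wealth_path


if __name__ == "__main__":
    decisions, path = alpha_investing([0.001, 0.5, 0.5, 0.0001])
    print(decisions)
    print([round(w, 4) for w in path])
```

In this sketch, early rejections replenish the wealth and allow later tests to run at a higher level, while a run of non-rejections steadily depletes it. That wealth dynamic is what the abstract's asymptotic analysis concerns.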