The New England Journal of Statistics in Data Science

Cost-Aware Generalized α-Investing for Multiple Hypothesis Testing
Volume 2, Issue 2 (2024), pp. 155–174
Thomas Cook, Harsh Vardhan Dubey, Ji Ah Lee, et al. (6 authors)

https://doi.org/10.51387/24-NEJSDS64
Pub. online: 27 March 2024      Type: Methodology Article      Open Access
Area: Statistical Methodology

Accepted: 23 January 2024
Published: 27 March 2024

Abstract

We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem appears, for example, when conducting biological experiments to identify differentially expressed genes of a disease process. This work builds on the generalized α-investing framework, which enables control of the marginal false discovery rate in a sequential testing setting. We present a theoretical analysis of the long-term asymptotic behavior of α-wealth, which motivates a consideration of sample size in the α-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected α-wealth reward (ERO) and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods for $n=1$, where $n$ is the sample size. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.
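
For readers unfamiliar with α-wealth, the following is a minimal sketch of the basic α-investing rule of Foster and Stine (2008), on which the generalized framework builds. It is not the cost-aware ERO rule of this article: it uses a fixed sample per test and ignores data collection costs. The function name `alpha_investing`, the parameter `spend_fraction`, and the policy of risking a constant fraction of current wealth on each test are assumptions made only for illustration.

```python
def alpha_investing(p_values, alpha=0.05, eta=1.0, spend_fraction=0.1):
    """Basic alpha-investing over a stream of p-values (illustrative sketch).

    Wealth starts at alpha * eta. Each test risks an amount phi of wealth:
    a rejection earns a payout of alpha, a non-rejection costs phi.
    The spending policy (a fixed fraction of wealth) is a hypothetical choice.
    """
    wealth = alpha * eta
    payout = alpha
    rejections = []
    wealth_path = [wealth]

    for p in p_values:
        # Amount of wealth put at risk on this test (assumed policy).
        phi = spend_fraction * wealth
        # Test level chosen so that alpha_j / (1 - alpha_j) equals phi,
        # which keeps wealth strictly positive after a non-rejection.
        alpha_j = phi / (1.0 + phi)

        reject = p <= alpha_j
        if reject:
            wealth += payout   # discovery: wealth is replenished
        else:
            wealth -= phi      # no discovery: pay alpha_j / (1 - alpha_j)

        rejections.append(reject)
        wealth_path.append(wealth)

    return rejections, wealth_path


if __name__ == "__main__":
    decisions, path = alpha_investing([0.001, 0.8, 0.0001, 0.5])
    print(decisions)
    print(path)
```

Because wealth shrinks after every non-rejection, a long run of true nulls leaves little to spend on later tests; this is the long-term behavior of α-wealth that the abstract says motivates bringing sample size into the decision rule.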



Copyright
© 2024 New England Statistical Society
Open access article under the CC BY license.

Keywords
α-investing; FDR control; Multiple comparisons; Online testing

Funding
This work was funded in part by NSF award CCF-1934846 (TRIPODS) and NIH R01GM135931 (NIGMS).


The New England Journal of Statistics in Data Science

  • ISSN: 2693-7166
  • Copyright © 2021 New England Statistical Society
