
The New England Journal of Statistics in Data Science


Contrastive Inverse Regression for Dimension Reduction
Volume 3, Issue 1 (2025), pp. 106–118
Sam Hawke, Yueen Ma, Hengrui Luo, et al. (4 authors)

https://doi.org/10.51387/24-NEJSDS72
Pub. online: 19 November 2024
Type: Methodology Article
Open Access
Area: Statistical Methodology

Accepted: 3 October 2024
Published: 19 November 2024

Abstract

Supervised dimension reduction (SDR) has been a topic of growing interest in data science, as it enables the reduction of high-dimensional covariates while preserving the functional relationship with certain response variables of interest. However, existing SDR methods are not suitable for analyzing datasets collected from case-control studies. In this setting, the goal is to learn and exploit the low-dimensional structure unique to or enriched in the case group, also known as the foreground group. While some unsupervised techniques, such as the contrastive latent variable model and its variants, have been developed for this purpose, they fail to preserve the functional relationship between the dimension-reduced covariates and the response variable. In this paper, we propose a supervised dimension reduction method called contrastive inverse regression (CIR), specifically designed for the contrastive setting. CIR introduces an optimization problem defined on the Stiefel manifold with a non-standard loss function. We prove that a gradient descent-based algorithm converges to a local optimum of the CIR objective, and our numerical studies demonstrate its improved performance over competing methods on high-dimensional data.
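
The core computation the abstract refers to, gradient descent over matrices with orthonormal columns (the Stiefel manifold), can be sketched in a few lines of Python/NumPy. The following is a minimal illustration under stated assumptions, not the authors' implementation: `loss_grad` is a hypothetical stand-in for the gradient of the CIR objective (which is defined in the full paper), and the QR-based retraction, step size, and iteration count are generic choices among several standard options.

```python
import numpy as np

def riemannian_grad(V, G):
    """Project the Euclidean gradient G onto the tangent space of the
    Stiefel manifold (matrices with orthonormal columns) at V."""
    VtG = V.T @ G
    return G - V @ ((VtG + VtG.T) / 2.0)

def qr_retract(M):
    """Map a full-rank matrix back onto the Stiefel manifold via a thin
    QR factorization, a standard retraction."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0            # guard against a zero diagonal entry
    return Q * signs

def stiefel_gradient_descent(loss_grad, p, d, lr=1e-2, steps=500, seed=0):
    """Minimize a loss over p-by-d orthonormal frames: project the Euclidean
    gradient onto the tangent space, take a step, retract to the manifold."""
    rng = np.random.default_rng(seed)
    V = qr_retract(rng.standard_normal((p, d)))   # random feasible start
    for _ in range(steps):
        V = qr_retract(V - lr * riemannian_grad(V, loss_grad(V)))
    return V

# Toy check: minimizing -tr(V^T A V) recovers the top-d eigenspace of A.
A = np.diag([5.0, 3.0, 1.0, 0.5])
V_hat = stiefel_gradient_descent(lambda V: -2.0 * A @ V, p=4, d=2)
print(np.round(V_hat.T @ V_hat, 6))   # ~ identity: columns stay orthonormal
```

The toy objective here is used only to show the mechanics; by construction the iterates remain on the Stiefel manifold throughout, which is the constraint under which the CIR loss is optimized.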

Supplementary material

Additional experimental details are included in the supplementary material.



Copyright
© 2025 New England Statistical Society
Open access article under the CC BY license.

Keywords
Case-control studies; Supervised dimension reduction; Optimization on Stiefel manifold

Funding
SH was supported by NIH grants T32ES007018 and UM1 TR004406; DL was supported by NIH grants R01 AG079291, R56 LM013784, R01 HL149683, UM1 TR004406, R01 LM014407, and P30 ES010126.
