Dietary Patterns and Cancer Risk: An Overview with Focus on Methods
Volume 2, Issue 1 (2024), pp. 30–53
Pub. online: 29 May 2023
Type: Methodology Article
Open Access
Area: Cancer Research
1 Branch of Medical Statistics, Biometry, and Epidemiology “G. A. Maccacaro”, Department of Clinical Sciences and Community Health, Università degli Studi di Milano.
2 Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico.
3 Department of Biostatistics, Brown University.
4 Data Science Initiative, Brown University.
5 Center for Computational Molecular Biology, Brown University.
6 Department of Medicine - DAME, Università degli Studi di Udine.
Accepted: 28 April 2023
Published: 29 May 2023
Notes
to Adriano Decarli, Honorary Professor of Medical Statistics, Università degli Studi di Milano
Abstract
Traditionally, research in nutritional epidemiology has focused on specific foods/food groups or single nutrients in relation to disease outcomes, including cancer. Dietary pattern analysis has been introduced to examine the potential cumulative and interactive effects of individual components of the overall diet, in which foods are consumed in combination. Dietary patterns can be identified by using evidence-based, investigator-defined approaches or by using data-driven approaches, which rely on either response-independent (also named “a posteriori” dietary patterns) or response-dependent (also named “mixed-type” dietary patterns) multivariate statistical methods. Among the open methodological challenges related to study design, dietary assessment, identification of dietary patterns, confounding phenomena, and cancer risk assessment, the current paper provides an updated landscape review of novel methodological developments in the statistical analysis of a posteriori/mixed-type dietary patterns and cancer risk. The review starts from standard a posteriori dietary patterns derived from principal component, factor, and cluster analyses, including mixture models, and then examines mixed-type dietary patterns derived from reduced rank regression, partial least squares, classification and regression tree analysis, and the least absolute shrinkage and selection operator (LASSO). Novel statistical approaches reviewed include Bayesian factor analysis, with sparsity modeled through shrinkage and sparse priors, and frequentist focused principal component analysis. Most of the novel developments concern the reproducibility of dietary patterns across studies, where the Bayesian approach to factor and cluster analysis shows its greatest potential.
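As a minimal illustration of what a response-independent ("a posteriori") dietary pattern is, the following sketch derives the first principal component from simulated intakes, without using any disease outcome. Everything in it is an assumption made for illustration and is not taken from the paper: the four food groups, the sample size, the simulation coefficients, and the function names are hypothetical. Power iteration is used only to keep the example free of external dependencies; an actual analysis would normally rely on a statistical package.

```python
import math
import random

# Hypothetical food groups; the data below are simulated, not real intakes.
FOODS = ["fruit", "vegetables", "processed_meat", "coffee"]

def simulate(n, seed=0):
    """Simulate intakes driven by one latent 'pattern' plus independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f = rng.gauss(0, 1)  # latent adherence to the pattern
        rows.append([
            2.0 + 0.8 * f + rng.gauss(0, 0.5),   # fruit: rises with f
            3.0 + 0.9 * f + rng.gauss(0, 0.5),   # vegetables: rises with f
            1.5 - 0.7 * f + rng.gauss(0, 0.5),   # processed meat: falls with f
            2.0 + rng.gauss(0, 1),               # coffee: unrelated to f
        ])
    return rows

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Sample correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[j] * r[k] for r in z) / (n - 1) for k in range(p)]
            for j in range(p)]

def first_component(corr, iters=500):
    """Leading eigenvector and eigenvalue by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(corr[j][k] * v[k] for k in range(p)) for j in range(p)]
    eigenvalue = sum(v[j] * cv[j] for j in range(p))  # Rayleigh quotient
    return v, eigenvalue

z = standardize(simulate(300))
loadings, eigenvalue = first_component(correlation_matrix(z))

# Pattern score of each subject: weighted sum of standardized intakes.
scores = [sum(w * x for w, x in zip(loadings, row)) for row in z]

for name, w in zip(FOODS, loadings):
    print(f"{name:>15s} {w:+.2f}")
print(f"proportion of variance explained: {eigenvalue / len(FOODS):.2f}")
```

The loadings describe the pattern (which foods move together), and the scores could then be entered as an exposure in a separate risk model. Response-dependent ("mixed-type") methods such as reduced rank regression differ in that the outcome or intermediate response variables enter the derivation of the pattern itself.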