The New England Journal of Statistics in Data Science


Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstrogram
Volume 1, Issue 1 (2023), pp. 4–23
Xiao-Li Meng  

https://doi.org/10.51387/22-NEJSDS6
Pub. online: 5 October 2022 · Type: Methodology Article · Open Access
Area: Statistical Methodology

Accepted: 1 September 2022
Published: 5 October 2022

Abstract

This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of—and hence public trust in—statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with profound appreciations for the world of uncertainty and the uncertainty world. The four “radical” proposals in the title—with all their inherent defects and trade-offs—are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the scientific community’s as well as the general public’s ability to spot—and hence to deter—flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured by the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research finding, in order to combat the tendency of rushing to conclusions or overstating confidence in our findings. A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency.
Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of histogram, or rather kidstogram, as early as the concept of counting.
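The abstract's first proposal, doubling the variance, can be checked numerically: doubling a variance inflates the standard error by a factor of sqrt(2), so a z-statistic shrinks by the same factor, and a result that stays significant at $p<0.05$ after the deflation must have started near the $p<0.005$ bar. A minimal sketch (the specific numbers are illustrative, not from the article):

```python
import math

def two_sided_p(z):
    """Two-sided normal p-value for a z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# A z-statistic just significant at the conventional bar:
print(two_sided_p(1.96))                   # ~0.050

# Doubling the variance shrinks z by sqrt(2). For the deflated
# statistic to remain at z = 1.96, the original must have been
# 1.96 * sqrt(2) = 2.77, whose p-value sits near 0.005:
print(two_sided_p(1.96 * math.sqrt(2)))    # ~0.0056
```

This is why the variance-doubling strategy roughly coincides with the proposal in [3] to redefine statistical significance at $p<0.005$.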
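The quick-and-dirty Bayes formula in the second proposal, prevalence divided by (prevalence + false positive rate), can be compared against the exact posterior $P(\text{condition}\mid{+}) = p\,s / [p\,s + (1-p)f]$, where $p$ is prevalence, $s$ sensitivity, and $f$ the false positive rate; the shortcut drops $s \approx 1$ and $1-p \approx 1$ for rare events. A sketch with assumed illustrative numbers:

```python
def exact_posterior(prevalence, sensitivity, fpr):
    """Exact Bayes: P(condition | positive) = p*s / (p*s + (1-p)*f)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)

def quick_and_dirty(prevalence, fpr):
    """Car Talk shortcut: prevalence / (prevalence + false positive rate)."""
    return prevalence / (prevalence + fpr)

# Assumed values: a rare condition and a reasonably accurate test.
p, s, f = 0.001, 0.99, 0.05
print(exact_posterior(p, s, f))   # ~0.0194
print(quick_and_dirty(p, f))      # ~0.0196
```

For rare events the shortcut lands within a few percent of the exact answer while requiring only one division, which is what makes it an effective tool for spotting inflated claims on the fly.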

References

[1] 
Amatya, A., Bhaumik, D. and Gibbons, R. D. (2013). Sample Size Determination for Clustered Count Data. Statistics in Medicine 32(24) 4162–4179. https://doi.org/10.1002/sim.5819. MR3118347
[2] 
Becker, G. S. (1976) The Economic Approach to Human Behavior 803. University of Chicago Press.
[3] 
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C. et al. (2018). Redefine Statistical Significance. Nature Human Behaviour 2 6–10.
[4] 
Benjamini, Y. (2020). Selective Inference: The Silent Killer of Replicability. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.fc62b261.
[5] 
Berthold, M. R. (2019). What Does It Take to be a Successful Data Scientist? Harvard Data Science Review 1(2). https://doi.org/10.1162/99608f92.e0eaabfc.
[6] 
Bhaumik, D. K., Roy, A., Aryal, S., Hur, K., Duan, N., Normand, S.-L. T., Brown, C. H. and Gibbons, R. D. (2008). Sample Size Determination for Studies With Repeated Continuous Outcomes. Psychiatric Annals 38(12).
[7] 
Bhaumik, D. K., Roy, A., Lazar, N. A., Kapur, K., Aryal, S., Sweeney, J. A., Patterson, D. and Gibbons, R. D. (2009). Hypothesis Testing, Power and Sample Size Determination for Between Group Comparisons in fMRI Experiments. Statistical Methodology 6(2) 133–146. https://doi.org/10.1016/j.stamet.2008.05.003. MR2649612
[8] 
Blitzstein, J. K. and Hwang, J. (2019) Introduction to Probability. Chapman and Hall/CRC. https://doi.org/10.1201/9780429428357. MR3929729
[9] 
Borgman, C. L. (2019). The Lives and After Lives of Data. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.9a36bdb6.
[10] 
Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X.-L. and Flaxman, S. (2021). Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake. Nature 600(7890) 695–700.
[11] 
Burt, J. (2014) Orsch...Cutting the Edge in Education: Lessons Learned from an Innovative Lab School. Stone Press.
[12] 
Bush, R., Dutton, A., Evans, M., Loft, R. and Schmidt, G. A. (2020). Perspectives on Data Reproducibility and Replicability in Paleoclimate and Climate Science. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.00cd8f85.
[13] 
Came, D. (2009). Disinterestedness and Objectivity. European Journal of Philosophy 17(1) 91.
[14] 
Camerer, C. F. and Loewenstein, G. (2004). Behavioral Economics: Past, Present, Future. In Advances in Behavioral Economics. (Chapter One). https://doi.org/10.1515/9781400829118.
[15] 
Chaudhuri, S., Lo, A. W., Xiao, D. and Xu, Q. (2020). Bayesian Adaptive Clinical Trials for Anti-Infective Therapeutics During Epidemic Outbreaks. Harvard Data Science Review Special Issue 1. https://doi.org/10.1162/99608f92.7656c213.
[16] 
Copas, J. and Eguchi, S. (2005). Local Model Uncertainty and Incomplete-Data Bias (with Discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(4) 459–513. https://doi.org/10.1111/j.1467-9868.2005.00512.x. MR2168201
[17] 
Cox, D. R. (2006) Principles of Statistical Inference. Cambridge University Press. https://doi.org/10.1017/CBO9780511813559. MR2278763
[18] 
Cox, D. R. and Donnelly, C. A. (2011) Principles of Applied Statistics. Cambridge University Press. https://doi.org/10.1017/CBO9781139005036. MR2817147
[19] 
Cox, D. R. and Snell, E. J. (2018) Applied Statistics: Principles and Examples. Routledge.
[20] 
Fay, R. E. (1992). When are Inferences from Multiple Imputation Valid? In Proceedings of the Survey Research Methods Section, American Statistical Association 227–232.
[21] 
Fayyad, U. and Hamutcu, H. (2020). Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.1a99e67a.
[22] 
Fineberg, H., Stodden, V. and Meng, X.-L. (2020). Highlights of the US National Academies Report on “Reproducibility and Replicability in Science”. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.cb310198.
[23] 
Franklin, C. and Bargagliotti, A. (2020). Introducing GAISE II: A Guideline for Precollege Statistics and Data Science Education. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.246107bb.
[24] 
Goeva, A., Stoudt, S. and Trisovic, A. (2020). Toward Reproducible and Extensible Research: From Values to Action. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.1cc3d72a.
[25] 
Gong, R. and Meng, X.-L. (2021). Judicious Judgment Meets Unsettling Updating: Dilation, Sure Loss and Simpson’s Paradox (with Discussions). Statistical Science 36(2) 169–214. https://doi.org/10.1214/19-sts765. MR4255191
[26] 
Gould, S. J. (1985). The Median Isn’t the Message. Discover 6(6) 40–42.
[27] 
Gould, S. J. (2013). The Median Isn’t the Message. AMA Journal of Ethics 15(1) 77–81.
[28] 
Haas, L., Hero, A. and Lue, R. A. (2019). Highlights of the National Academies Report on “Undergraduate Data Science: Opportunities and Options”. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.38f16b68.
[29] 
Hacking, I. (2016) Logic of Statistical Inference. Cambridge University Press. MR0391307
[30] 
Hawes, M. B. (2020). Implementing Differential Privacy: Seven Lessons From the 2020 United States Census. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.353c6f99.
[31] 
Heitjan, D. F. and Rubin, D. B. (1991). Ignorability and Coarse Data. The Annals of Statistics 19(4) 2244–2253. https://doi.org/10.1214/aos/1176348396. MR1135174
[32] 
Howell, E. L. (2020). Science Communication in the Context of Reproducibility and Replicability: How Nonscientists Navigate Scientific Uncertainty. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.f2823096.
[33] 
Ioannidis, J. P. (2005). Why Most Published Research Findings Are False. PLoS Medicine 2(8) 124. https://doi.org/10.1080/09332480.2005.10722754. MR2216666
[34] 
Isakov, M. and Kuriwaki, S. (2020). Towards Principled Unskewing: Viewing 2020 Election Polls Through a Corrective Lens from 2016. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.86a46f38.
[35] 
Junk, T. R. and Lyons, L. (2020). Reproducibility and Replication of Experimental Particle Physics Results. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.250f995b.
[36] 
Katsevich, E. and Ramdas, A. (2018). Towards “Simultaneous Selective Inference”: Post-Hoc Bounds on the False Discovery Proportion. arXiv preprint arXiv:1803.06790. https://doi.org/10.1214/19-AOS1938. MR4185816
[37] 
Koch, L. (2021). Robust Test Statistics for Data Sets With Missing Correlation Information. Physical Review D 103(11) 113008. https://doi.org/10.1103/physrevd.103.113008. MR4284964
[38] 
Kolaczyk, E., Wright, H. and Yajima, M. (2021). Statistics Practicum: Placing ‘Practice’ at the Center of Data Science Education (with Discussions). Harvard Data Science Review 3(1). https://doi.org/10.1162/99608f92.2d65fc70.
[39] 
Kott, P. S. (1995). A Paradox of Multiple Imputation. In Proceedings of the Survey Research Methods Section, American Statistical Association 380–383.
[40] 
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E. et al. (2018). Justify Your Alpha. Nature Human Behaviour 2(3) 168–171.
[41] 
Leonelli, S. (2019). Data Governance is Key to Interpretation: Reconceptualizing Data in Data Science. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.17405bb6.
[42] 
Lin, X. (2020). Learning Lessons on Reproducibility and Replicability in Large Scale Genome-Wide Association Studies. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.33703976.
[43] 
Little, R. J. and Rubin, D. B. (2019) Statistical Analysis with Missing Data 793. John Wiley & Sons. https://doi.org/10.1002/9781119013563. MR1925014
[44] 
Lo, A. W. (2017). Adaptive Markets. In Adaptive Markets. Princeton University Press.
[45] 
Martinez, W. and LaLonde, D. (2020). Data Science for Everyone Starts in Kindergarten: Strategies and Initiatives from the American Statistical Association. Harvard Data Science Review. https://doi.org/10.1162/99608f92.7a9f2f4d.
[46] 
McNutt, M. (2020). Self-Correction by Design. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.32432837.
[47] 
Meng, X.-L. (1994). Multiple-Imputation Inferences with Uncongenial Sources of Input (with Discussion). Statistical Science 9(4) 538–558.
[48] 
Meng, X.-L. (1994). Posterior Predictive p-Values. The Annals of Statistics 22(3) 1142–1160. https://doi.org/10.1214/aos/1176325622. MR1311969
[49] 
Meng, X.-L. (2009). AP Statistics: Passion, Paradox, and Pressure (Part I). Amstat News (December) 7–10.
[50] 
Meng, X.-L. (2009). Desired and Feared — What do We do Now and Over the Next 50 Years? The American Statistician 63(3) 202–210. https://doi.org/10.1198/tast.2009.09045. MR2750343
[51] 
Meng, X.-L. (2010). AP Statistics: Passion, Paradox, and Pressure (Part II). Amstat News (January) 5–9.
[52] 
Meng, X.-L. (2012). You Want Me to Analyze Data I Don’t Have? Are You Insane? Shanghai Archives of Psychiatry 24(5) 297.
[53] 
Meng, X.-L. (2018). Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. Annals of Applied Statistics 12(2) 685–726. https://doi.org/10.1214/18-AOAS1161SF. MR3834282
[54] 
Meng, X.-L. (2019). Data Science: An Artificial Ecosystem. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.ba20f892.
[55] 
Meng, X.-L. (2020). Information and Uncertainty: Two Sides of the Same Coin. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.c108a25b.
[56] 
Meng, X.-L. (2020). Reproducibility, Replicability, and Reliability. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.dbfce7f9.
[57] 
Mullainathan, S. and Thaler, R. H. (2000). Behavioral Economics. Technical Report, National Bureau of Economic Research.
[58] 
Oberski, D. L. and Kreuter, F. (2020). Differential Privacy and Social Science: An Urgent Puzzle. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.63a22079.
[59] 
Parashar, M. (2020). Leveraging the National Academies’ Reproducibility and Replication in Science Report to Advance Reproducibility in Publishing. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.b69d3134.
[60] 
Particle Data Group (2020). Review of Particle Physics. Progress of Theoretical and Experimental Physics 2020(8) 1–2093.
[61] 
Perez, L. R., Spangler, D. A. and Franklin, C. (2021). Engaging Young Learners With Data: Highlights From GAISE II, Level A. Harvard Data Science Review. https://doi.org/10.1162/99608f92.be3c2ec8.
[62] 
Plant, A. L. and Hanisch, R. J. (2020). Reproducibility in Science: A Metrology Perspective. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.eb6ddee4.
[63] 
Reid, N. and Cox, D. R. (2015). On Some Principles of Statistical Inference. International Statistical Review 83(2) 293–308. https://doi.org/10.1111/insr.12067. MR3377082
[64] 
Romano, Y., Barber, R. F., Sabatti, C. and Candès, E. (2020). With Malice Toward None: Assessing Uncertainty via Equalized Coverage. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.03f00592.
[65] 
Roy, A., Bhaumik, D. K., Aryal, S. and Gibbons, R. D. (2007). Sample Size Determination for Hierarchical Longitudinal Designs with Differential Attrition Rates. Biometrics 63(3) 699–707. https://doi.org/10.1111/j.1541-0420.2007.00769.x. MR2395706
[66] 
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63(3) 581–592. https://doi.org/10.1093/biomet/63.3.581. MR0455196
[67] 
Rubin, D. B. (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons. https://doi.org/10.1002/9780470316696. MR0899519
[68] 
Rudin, C. and Radin, J. (2019). Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From An Explainable AI Competition. Harvard Data Science Review 1(2). https://doi.org/10.1162/99608f92.5a8a3a3d.
[69] 
Rudin, C., Wang, C. and Coker, B. (2020). The Age of Secrecy and Unfairness in Recidivism Prediction. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.6ed64b30.
[70] 
Samson, A. (2016). The Behavioral Economics Guide 2016 (with an Introduction by Gerd Gigerenzer). Behavioral Science Solutions Ltd. http://eprints.lse.ac.uk/66934/7/Samson_Behavioural%20economics%20guide_%202016_author.pdf.
[71] 
Sanders, N. (2020). Can the Coronavirus Prompt a Global Outbreak of “Distributional Thinking” in Organizations? Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.a577296b.
[72] 
Schuemie, M. J., Cepeda, M. S., Suchard, M. A., Yang, J., Tian, Y., Schuler, A., Ryan, P. B., Madigan, D. and Hripcsak, G. (2020). How Confident Are We About Observational Findings in Healthcare: A Benchmark Study. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.147cc28e.
[73] 
Simmons, J. P., Nelson, L. D. and Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science 22(11) 1359–1366.
[74] 
Spiegelhalter, D. (2020). Should We Trust Algorithms? Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.cb91a35a.
[75] 
Stodden, V. (2020). Theme Editor’s Introduction to Reproducibility and Replicability in Science. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.c46a02d4.
[76] 
Tukey, J. W. (1962). The Future of Data Analysis. The Annals of Mathematical Statistics 33(1) 1–67. https://doi.org/10.1214/aoms/1177704711. MR0133937
[77] 
Van Belle, G., Fisher, L. D., Heagerty, P. J. and Lumley, T. (2004) Biostatistics: A Methodology for the Health Sciences 519. John Wiley & Sons.
[78] 
Vilhuber, L. (2020). Reproducibility and Replicability in Economics. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.4f6b9e67.
[79] 
Waller, L. and Levi, T. (2021). Building Intuition Regarding the Statistical Behavior of Mass Medical Testing Programs. Harvard Data Science Review Special Issue 1.
[80] 
Wasserstein, R. L. and Lazar, N. A. (2016). The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician 70(2) 129–133. https://doi.org/10.1080/00031305.2016.1154108. MR3511040
[81] 
Wilkinson, N. and Klaes, M. (2017) An Introduction to Behavioral Economics. Macmillan International Higher Education.
[82] 
Willis, C. and Stodden, V. (2020). Trust but Verify: How to Leverage Policies, Workflows, and Infrastructure to Ensure Computational Reproducibility in Publication. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.25982dcf.
[83] 
Wing, J. M. (2019). The Data Life Cycle. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.e26845b4.
[84] 
Xie, M.-G. and Singh, K. (2013). Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review. International Statistical Review 81(1) 3–39. https://doi.org/10.1111/insr.12000. MR3047496
[85] 
Xie, X. and Meng, X.-L. (2017). Dissecting Multiple Imputation from a Multi-phase Inference Perspective: What Happens When God’s, Imputer’s and Analyst’s Models Are Uncongenial? (With Discussion). Statistica Sinica 27 1485–1594. MR3701490


Copyright
© 2023 New England Statistical Society
Open access article under the CC BY license.

Keywords
Behavioral Statistics, K-12 Mathematical Education, Outerval, p-value, Quick-and-dirty Bayes Theorem, Research Replicability and Reliability, Principled Corner Cutting (PC2), Quality-guaranteed Statistics, Selfish Test, Soft Elimination


ISSN: 2693-7166