Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw Your Kidstrogram
Volume 1, Issue 1 (2023), pp. 4–23
Pub. online: 5 October 2022
Type: Statistical Methodology
Open Access
Accepted: 1 September 2022
Published: 5 October 2022
Abstract
This article expands upon my presentation to the panel on “The Radical Prescription for Change” at the 2017 ASA (American Statistical Association) symposium on A World Beyond $p<0.05$. It emphasizes that, to greatly enhance the reliability of—and hence public trust in—statistical and data scientific findings, we need to take a holistic approach. We need to lead by example, incentivize study quality, and inoculate future generations with a profound appreciation for the world of uncertainty and the uncertainty world. The four “radical” proposals in the title—with all their inherent defects and trade-offs—are designed to provoke reactions and actions. First, research methodologies are trustworthy only if they deliver what they promise, even if this means that they have to be overly protective, a necessary trade-off for practicing quality-guaranteed statistics. This guiding principle may compel us to double the variance in some situations, a strategy that also coincides with the call to raise the bar from $p<0.05$ to $p<0.005$ [3]. Second, teaching principled practicality or corner-cutting is a promising strategy to enhance the scientific community’s as well as the general public’s ability to spot—and hence to deter—flawed arguments or findings. A remarkable quick-and-dirty Bayes formula for rare events, which simply divides the prevalence by the sum of the prevalence and the false positive rate (or the total error rate), as featured on the popular radio show Car Talk, illustrates the effectiveness of this strategy. Third, it should be a routine mental exercise to put ourselves in the shoes of those who would be affected by our research findings, in order to combat the tendency to rush to conclusions or overstate confidence in our findings. A pufferfish/selfish test can serve as an effective reminder, and can help to institute the mantra “Thou shalt not sell what thou refuseth to buy” as the most basic professional decency. Considering personal stakes in our statistical endeavors also points to the concept of behavioral statistics, in the spirit of behavioral economics. Fourth, the current mathematical education paradigm that puts “deterministic first, stochastic second” is likely responsible for the general difficulties with reasoning under uncertainty, a situation that can be improved by introducing the concept of the histogram, or rather the kidstogram, as early as the concept of counting.
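To make the first proposal concrete, here is a minimal numerical sketch (my illustration, not part of the article) of why doubling the variance roughly coincides with moving the threshold from $p<0.05$ to $p<0.005$: doubling the variance inflates the standard error by a factor of $\sqrt{2}$, so a z-statistic that clears the usual two-sided 5% bar only after that inflation must clear roughly the 0.005 bar on the original scale. The snippet assumes scipy is available.

```python
# Sketch: check that "significant at 0.05 after doubling the variance"
# corresponds to a two-sided p-value of roughly 0.005 on the original scale.
from math import sqrt
from scipy.stats import norm

z_05 = norm.ppf(1 - 0.05 / 2)      # two-sided 5% critical value, about 1.96
z_needed = z_05 * sqrt(2)          # |z| required once the SE grows by sqrt(2)
p_equiv = 2 * norm.sf(z_needed)    # equivalent threshold on the original scale

print(f"required |z| after doubling the variance: {z_needed:.3f}")   # ~2.772
print(f"equivalent original p-value threshold:    {p_equiv:.4f}")    # ~0.0055
```

Similarly, the quick-and-dirty Bayes formula for rare events described in the abstract can be checked against the exact Bayes calculation; the numbers below (0.1% prevalence, 99% sensitivity, 1% false positive rate) are hypothetical and only for illustration.

```python
# Sketch: exact Bayes versus the quick-and-dirty rare-event approximation
# prevalence / (prevalence + false positive rate), with made-up numbers.
def exact_posterior(prevalence, sensitivity, false_positive_rate):
    """Exact P(condition | positive test) by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def quick_and_dirty(prevalence, false_positive_rate):
    """Rare-event approximation featured on Car Talk."""
    return prevalence / (prevalence + false_positive_rate)

prev, sens, fpr = 0.001, 0.99, 0.01   # hypothetical rare-event numbers
print(f"exact posterior:        {exact_posterior(prev, sens, fpr):.4f}")  # ~0.0902
print(f"quick-and-dirty answer: {quick_and_dirty(prev, fpr):.4f}")        # ~0.0909
```

The two answers agree closely precisely because the event is rare, which is the regime the approximation is designed for.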
References
Amatya, A., Bhaumik, D. and Gibbons, R. D. (2013). Sample Size Determination for Clustered Count Data. Statistics in Medicine 32(24) 4162–4179. https://doi.org/10.1002/sim.5819. MR3118347
Benjamini, Y. (2020). Selective Inference: The Silent Killer of Replicability. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.fc62b261.
Berthold, M. R. (2019). What Does It Take to be a Successful Data Scientist? Harvard Data Science Review 1(2). https://doi.org/10.1162/99608f92.e0eaabfc.
Bhaumik, D. K., Roy, A., Lazar, N. A., Kapur, K., Aryal, S., Sweeney, J. A., Patterson, D. and Gibbons, R. D. (2009). Hypothesis Testing, Power and Sample Size Determination for Between Group Comparisons in fMRI Experiments. Statistical Methodology 6(2) 133–146. https://doi.org/10.1016/j.stamet.2008.05.003. MR2649612
Blitzstein, J. K. and Hwang, J. (2019) Introduction to Probability. Chapman and Hall/CRC. https://doi.org/10.1201/9780429428357. MR3929729
Borgman, C. L. (2019). The Lives and After Lives of Data. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.9a36bdb6.
Bush, R., Dutton, A., Evans, M., Loft, R. and Schmidt, G. A. (2020). Perspectives on Data Reproducibility and Replicability in Paleoclimate and Climate Science. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.00cd8f85.
Camerer, C. F. and Loewenstein, G. (2004). Behavioral Economics: Past, Present, Future. In Advances in Behavioral Economics. (Chapter One). https://doi.org/10.1515/9781400829118.
Chaudhuri, S., Lo, A. W., Xiao, D. and Xu, Q. (2020). Bayesian Adaptive Clinical Trials for Anti-Infective Therapeutics During Epidemic Outbreaks. Harvard Data Science Review Special Issue 1. https://doi.org/10.1162/99608f92.7656c213.
Copas, J. and Eguchi, S. (2005). Local Model Uncertainty and Incomplete-Data Bias (with Discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(4) 459–513. https://doi.org/10.1111/j.1467-9868.2005.00512.x. MR2168201
Cox, D. R. (2006) Principles of Statistical Inference. Cambridge University Press. https://doi.org/10.1017/CBO9780511813559. MR2278763
Cox, D. R. and Donnelly, C. A. (2011) Principles of Applied Statistics. Cambridge University Press. https://doi.org/10.1017/CBO9781139005036. MR2817147
Fayyad, U. and Hamutcu, H. (2020). Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.1a99e67a.
Fineberg, H., Stodden, V. and Meng, X.-L. (2020). Highlights of the US National Academies Report on “Reproducibility and Replicability in Science”. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.cb310198.
Franklin, C. and Bargagliotti, A. (2020). Introducing GAISE II: A Guideline for Precollege Statistics and Data Science Education. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.246107bb.
Goeva, A., Stoudt, S. and Trisovic, A. (2020). Toward Reproducible and Extensible Research: From Values to Action. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.1cc3d72a.
Gong, R. and Meng, X.-L. (2021). Judicious Judgment Meets Unsettling Updating: Dilation, Sure Loss and Simpson’s Paradox (with Discussions). Statistical Science 36(2) 169–214. https://doi.org/10.1214/19-sts765. MR4255191
Haas, L., Hero, A. and Lue, R. A. (2019). Highlights of the National Academies Report on “Undergraduate Data Science: Opportunities and Options”. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.38f16b68.
Hacking, I. (2016) Logic of Statistical Inference. Cambridge University Press. MR0391307
Hawes, M. B. (2020). Implementing Differential Privacy: Seven Lessons From the 2020 United States Census. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.353c6f99.
Heitjan, D. F. and Rubin, D. B. (1991). Ignorability and Coarse Data. The Annals of Statistics 19(4) 2244–2253. https://doi.org/10.1214/aos/1176348396. MR1135174
Howell, E. L. (2020). Science Communication in the Context of Reproducibility and Replicability: How Nonscientists Navigate Scientific Uncertainty. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.f2823096.
Ioannidis, J. P. (2005). Why Most Published Research Findings Are False. PLoS Medicine 2(8) e124. https://doi.org/10.1080/09332480.2005.10722754. MR2216666
Isakov, M. and Kuriwaki, S. (2020). Towards Principled Unskewing: Viewing 2020 Election Polls Through a Corrective Lens from 2016. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.86a46f38.
Junk, T. R. and Lyons, L. (2020). Reproducibility and Replication of Experimental Particle Physics Results. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.250f995b.
Katsevich, E. and Ramdas, A. (2018). Towards “Simultaneous Selective Inference”: Post-Hoc Bounds on the False Discovery Proportion. arXiv preprint arXiv:1803.06790. https://doi.org/10.1214/19-AOS1938. MR4185816
Koch, L. (2021). Robust Test Statistics for Data Sets With Missing Correlation Information. Physical Review D 103(11) 113008. https://doi.org/10.1103/physrevd.103.113008. MR4284964
Kolaczyk, E., Wright, H. and Yajima, M. (2021). Statistics Practicum: Placing ‘Practice’ at the Center of Data Science Education (with Discussions). Harvard Data Science Review 3(1). https://doi.org/10.1162/99608f92.2d65fc70.
Leonelli, S. (2019). Data Governance is Key to Interpretation: Reconceptualizing Data in Data Science. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.17405bb6.
Lin, X. (2020). Learning Lessons on Reproducibility and Replicability in Large Scale Genome-Wide Association Studies. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.33703976.
Little, R. J. and Rubin, D. B. (2019) Statistical Analysis with Missing Data 793. John Wiley & Sons. https://doi.org/10.1002/9781119013563. MR1925014
Martinez, W. and LaLonde, D. (2020). Data Science for Everyone Starts in Kindergarten: Strategies and Initiatives from the American Statistical Association. Harvard Data Science Review. https://doi.org/10.1162/99608f92.7a9f2f4d.
McNutt, M. (2020). Self-Correction by Design. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.32432837.
Meng, X.-L. (1994). Posterior Predictive p-Values. The Annals of Statistics 22(3) 1142–1160. https://doi.org/10.1214/aos/1176325622. MR1311969
Meng, X.-L. (2009). Desired and Feared — What Do We Do Now and Over the Next 50 Years? The American Statistician 63(3) 202–210. https://doi.org/10.1198/tast.2009.09045. MR2750343
Meng, X.-L. (2018). Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. The Annals of Applied Statistics 12(2) 685–726. https://doi.org/10.1214/18-AOAS1161SF. MR3834282
Meng, X.-L. (2019). Data Science: An Artificial Ecosystem. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.ba20f892.
Meng, X.-L. (2020). Information and Uncertainty: Two Sides of the Same Coin. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.c108a25b.
Meng, X.-L. (2020). Reproducibility, Replicability, and Reliability. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.dbfce7f9.
Oberski, D. L. and Kreuter, F. (2020). Differential Privacy and Social Science: An Urgent Puzzle. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.63a22079.
Parashar, M. (2020). Leveraging the National Academies’ Reproducibility and Replication in Science Report to Advance Reproducibility in Publishing. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.b69d3134.
Perez, L. R., Spangler, D. A. and Franklin, C. (2021). Engaging Young Learners With Data: Highlights From GAISE II, Level A. Harvard Data Science Review. https://doi.org/10.1162/99608f92.be3c2ec8.
Plant, A. L. and Hanisch, R. J. (2020). Reproducibility in Science: A Metrology Perspective. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.eb6ddee4.
Reid, N. and Cox, D. R. (2015). On Some Principles of Statistical Inference. International Statistical Review 83(2) 293–308. https://doi.org/10.1111/insr.12067. MR3377082
Romano, Y., Barber, R. F., Sabatti, C. and Candès, E. (2020). With Malice Toward None: Assessing Uncertainty via Equalized Coverage. Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.03f00592.
Roy, A., Bhaumik, D. K., Aryal, S. and Gibbons, R. D. (2007). Sample Size Determination for Hierarchical Longitudinal Designs with Differential Attrition Rates. Biometrics 63(3) 699–707. https://doi.org/10.1111/j.1541-0420.2007.00769.x. MR2395706
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63(3) 581–592. https://doi.org/10.1093/biomet/63.3.581. MR0455196
Rubin, D. B. (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons. https://doi.org/10.1002/9780470316696. MR0899519
Rudin, C. and Radin, J. (2019). Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From An Explainable AI Competition. Harvard Data Science Review 1(2). https://doi.org/10.1162/99608f92.5a8a3a3d.
Rudin, C., Wang, C. and Coker, B. (2020). The Age of Secrecy and Unfairness in Recidivism Prediction. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.6ed64b30.
Samson, A. (2016). The Behavioral Economics Guide 2016 (with an Introduction by Gerd Gigerenzer). Behavioral Science Solutions Ltd. http://eprints.lse.ac.uk/66934/7/Samson_Behavioural%20economics%20guide_%202016_author.pdf.
Sanders, N. (2020). Can the Coronavirus Prompt a Global Outbreak of “Distributional Thinking” in Organizations? Harvard Data Science Review 2(2). https://doi.org/10.1162/99608f92.a577296b.
Schuemie, M. J., Cepeda, M. S., Suchard, M. A., Yang, J., Tian, Y., Schuler, A., Ryan, P. B., Madigan, D. and Hripcsak, G. (2020). How Confident Are We About Observational Findings in Healthcare: A Benchmark Study. Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.147cc28e.
Spiegelhalter, D. (2020). Should We Trust Algorithms? Harvard Data Science Review 2(1). https://doi.org/10.1162/99608f92.cb91a35a.
Stodden, V. (2020). Theme Editor’s Introduction to Reproducibility and Replicability in Science. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.c46a02d4.
Tukey, J. W. (1962). The Future of Data Analysis. The Annals of Mathematical Statistics 33(1) 1–67. https://doi.org/10.1214/aoms/1177704711. MR0133937
Vilhuber, L. (2020). Reproducibility and Replicability in Economics. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.4f6b9e67.
Wasserstein, R. L. and Lazar, N. A. (2016). The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician 70(2) 129–133. https://doi.org/10.1080/00031305.2016.1154108. MR3511040
Willis, C. and Stodden, V. (2020). Trust but Verify: How to Leverage Policies, Workflows, and Infrastructure to Ensure Computational Reproducibility in Publication. Harvard Data Science Review 2(4). https://doi.org/10.1162/99608f92.25982dcf.
Wing, J. M. (2019). The Data Life Cycle. Harvard Data Science Review 1(1). https://doi.org/10.1162/99608f92.e26845b4.
Xie, M.-G. and Singh, K. (2013). Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review. International Statistical Review 81(1) 3–39. https://doi.org/10.1111/insr.12000. MR3047496
Xie, X. and Meng, X.-L. (2017). Dissecting Multiple Imputation from a Multi-phase Inference Perspective: What Happens When God’s, Imputer’s and Analyst’s Models Are Uncongenial? (With Discussion). Statistica Sinica 27 1485–1594. MR3701490