Extreme Value Modeling with Generalized Pareto Distributions for Rounded Data

Ma, Sai; Yan, Jun; Zhang, Xuebin

doi:10.51387/26-NEJSDS101

The New England Journal of Statistics in Data Science


Pub. online: 5 March 2026
Type: Methodology Article

Open Access

Area: Spatial and Environmental Statistics

Accepted: 3 February 2026
Published: 5 March 2026

Abstract

In extreme value analysis, the impact of rounding, a form of quantization, on statistical inference beyond point estimation has not been comprehensively studied. This paper addresses this gap by treating rounded data as interval-censored. The maximum likelihood estimators of the model parameters, tailored to account for interval censoring, are asymptotically unbiased and efficient. Further, we adapt classic goodness-of-fit tests, such as the Anderson-Darling test, to rounded data based on the maximum likelihood estimator. The resulting tests have appropriate sizes and considerable power. One application of such tests is threshold selection for the peaks-over-threshold approach in extreme value analysis. The efficacy of our estimation approach and goodness-of-fit tests is demonstrated through a simulation study involving data rounded from generalized Pareto distributions. Applying this method to precipitation data from 18 stations in eastern Washington, an area with typically low precipitation where a significant rounding effect is expected, we observe narrower interval estimates of return levels.
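To make the interval-censoring idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that exceedances are rounded to the nearest multiple of a known precision `delta`, so that a recorded value r stands for the interval [r - delta/2, r + delta/2) truncated at zero, and that each observation contributes F(upper) - F(lower) to the likelihood, where F is the generalized Pareto CDF with scale `sigma` and shape `xi`. The function names (`gpd_cdf`, `interval_censored_loglik`, `fit_grid`) and the crude grid search are illustrative choices; a real analysis would use a numerical optimizer.

```python
import math
import random


def gpd_cdf(y, sigma, xi):
    """CDF of the generalized Pareto distribution for an exceedance y."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-9:  # exponential limit as xi -> 0
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0.0:  # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def interval_censored_loglik(rounded, sigma, xi, delta):
    """Log-likelihood where each rounded value r is treated as the
    interval [max(r - delta/2, 0), r + delta/2) (assumed rounding rule)."""
    if sigma <= 0.0:
        return -math.inf
    total = 0.0
    for r in rounded:
        lower = max(r - delta / 2.0, 0.0)
        upper = r + delta / 2.0
        p = gpd_cdf(upper, sigma, xi) - gpd_cdf(lower, sigma, xi)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def fit_grid(rounded, delta, sigmas, xis):
    """Illustrative grid search for the maximizer of the log-likelihood."""
    best = max(
        (interval_censored_loglik(rounded, s, x, delta), s, x)
        for s in sigmas
        for x in xis
    )
    return best[1], best[2]


def simulate_rounded_gpd(n, sigma, xi, delta, rng):
    """Draw GPD exceedances by inversion (xi != 0), then round to delta."""
    sample = []
    for _ in range(n):
        u = rng.random()
        y = sigma / xi * ((1.0 - u) ** (-xi) - 1.0)
        sample.append(round(y / delta) * delta)
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    data = simulate_rounded_gpd(500, sigma=1.0, xi=0.2, delta=0.5, rng=rng)
    sigmas = [0.5 + 0.05 * i for i in range(31)]   # 0.5 to 2.0
    xis = [-0.2 + 0.02 * i for i in range(36)]     # -0.2 to 0.5
    print(fit_grid(data, 0.5, sigmas, xis))
```

Values rounded to zero are handled naturally in this sketch: they contribute F(delta/2), the probability of falling in the lowest rounding interval, rather than a density evaluated at zero.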

Supplementary Material

The Supplementary Material presents additional simulation results confirming the robustness of MLE-IC under extreme rounding conditions, along with extended power comparisons. It also provides supplementary precipitation data analysis across the 18 eastern Washington stations, showing that MLE-IC consistently selects lower thresholds and yields better model fit and return-level estimates than MLE-N.


Open access article under the CC BY license.

Keywords

Discretized continuous distribution; Interval-censored; Quantized data; Threshold selection

Funding

J. Yan’s research was partially supported by NSF grant DMS 1521730.
