Extreme Value Modeling with Generalized Pareto Distributions for Rounded Data
Pub. online: 5 March 2026
Type: Methodology Article
Open Access
Area: Spatial and Environmental Statistics
Accepted
3 February 2026
Published
5 March 2026
Abstract
In extreme value analysis, the impact of rounding, a form of quantization, on statistical inference beyond point estimation has not been comprehensively studied. This paper addresses this gap by treating rounded data as interval-censored. The maximum likelihood estimators of the model parameters, tailored to account for interval censoring, are asymptotically unbiased and efficient. Further, we adapt classic goodness-of-fit tests, such as the Anderson-Darling test, to rounded data based on the maximum likelihood estimator. The resulting tests have appropriate sizes and considerable power. One application of such tests is threshold selection for the peaks-over-threshold approach in extreme value analysis. The efficacy of our estimation approach and goodness-of-fit tests is demonstrated through a simulation study involving data rounded from generalized Pareto distributions. Applying this method to precipitation data from 18 stations in eastern Washington, an area with typically low precipitation where a significant rounding effect is expected, we observe narrower interval estimates of return levels.
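To make the interval-censoring idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes excesses over a threshold follow a generalized Pareto distribution with scale `sigma` and shape `xi`, and that each recorded value `r` was rounded to the nearest multiple of a known precision `delta`, so it contributes the probability of the interval from `r - delta/2` to `r + delta/2` to the likelihood. All function names, the simulation settings, and the crude grid search are illustrative assumptions; a real analysis would use a numerical optimizer.

```python
import math
import random


def gpd_cdf(x, sigma, xi):
    """CDF of the generalized Pareto distribution for an excess x."""
    if x <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        # Exponential limit as the shape parameter tends to zero.
        return 1.0 - math.exp(-x / sigma)
    z = 1.0 + xi * x / sigma
    if z <= 0.0:
        # Beyond the finite upper endpoint that exists when xi < 0.
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def interval_censored_loglik(rounded, sigma, xi, delta):
    """Log-likelihood treating each rounded value r as the interval
    from r - delta/2 to r + delta/2 (an assumption of this sketch)."""
    if sigma <= 0.0:
        return -math.inf
    total = 0.0
    for r in rounded:
        upper = gpd_cdf(r + delta / 2.0, sigma, xi)
        lower = gpd_cdf(r - delta / 2.0, sigma, xi)
        p = upper - lower
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def simulate_rounded_gpd(n, sigma, xi, delta, seed=0):
    """Draw n GPD excesses by inverse-CDF sampling (xi != 0) and round
    each to the nearest multiple of delta."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        u = rng.random()
        x = sigma * ((1.0 - u) ** (-xi) - 1.0) / xi
        values.append(round(x / delta) * delta)
    return values


# Hypothetical settings: true sigma = 1.0, xi = 0.2, rounding precision 0.5.
delta = 0.5
data = simulate_rounded_gpd(1000, sigma=1.0, xi=0.2, delta=delta, seed=1)

# Crude grid search standing in for a proper optimizer.
grid = [(s / 10, k / 10) for s in range(5, 16) for k in range(-2, 7)]
sigma_hat, xi_hat = max(
    grid, key=lambda p: interval_censored_loglik(data, p[0], p[1], delta)
)
print("grid estimate:", sigma_hat, xi_hat)
```

Because a value rounded to zero still receives positive probability (the mass of the interval from 0 to `delta/2`), this likelihood stays well defined even when many observations are recorded as zero excess, which a density-based likelihood applied directly to the rounded values does not handle naturally.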
Supplementary material
The Supplementary Material presents additional simulation results confirming the robustness of MLE-IC under extreme rounding conditions, along with extended power comparisons. It also provides supplementary analysis of the precipitation data across the 18 eastern Washington stations, showing that MLE-IC consistently selected lower thresholds and yielded better model fit and return-level estimates than MLE-N.