In extreme value analysis, the impact of rounding, a form of quantization, on statistical inference beyond point estimation has not been comprehensively studied. This paper addresses this gap by treating rounded data as interval-censored. The maximum likelihood estimators of the model parameters, tailored to account for interval censoring, are asymptotically unbiased and efficient. Further, we adapt classic goodness-of-fit tests, such as the Anderson-Darling test, to rounded data based on the maximum likelihood estimator. The resulting tests have appropriate sizes and considerable power. One application of such tests is threshold selection for the peaks-over-threshold approach in extreme value analysis. The efficacy of our estimation approach and of the goodness-of-fit tests is demonstrated through a simulation study involving data rounded from generalized Pareto distributions. Applying this method to precipitation data from 18 stations in eastern Washington, an area with typically low precipitation where a significant rounding effect is expected, we observe narrower interval estimates of return levels.
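To make the interval-censoring idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that a value recorded as x with rounding width delta stands for the interval [x - delta/2, x + delta/2], so its likelihood contribution is F(x + delta/2) - F(x - delta/2) under a generalized Pareto distribution with location 0. The function names, the rounding width, the starting values, and the Nelder-Mead optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genpareto


def neg_loglik_rounded(params, x, delta):
    """Negative log-likelihood with each rounded value treated as interval-censored."""
    shape, log_scale = params
    scale = np.exp(log_scale)  # optimizing on the log scale keeps the scale positive
    # Assumed censoring interval around each recorded value; the GPD support starts at 0.
    lower = np.maximum(x - delta / 2.0, 0.0)
    upper = x + delta / 2.0
    prob = (genpareto.cdf(upper, shape, loc=0.0, scale=scale)
            - genpareto.cdf(lower, shape, loc=0.0, scale=scale))
    if np.any(prob <= 0.0) or not np.all(np.isfinite(prob)):
        return np.inf  # parameter values that give an observed interval zero mass
    return -np.sum(np.log(prob))


def fit_gpd_rounded(x, delta):
    """Maximize the interval-censored likelihood; returns (shape, scale)."""
    x = np.asarray(x, dtype=float)
    start = np.array([0.1, np.log(np.mean(x) + delta)])  # ad hoc starting values
    result = minimize(neg_loglik_rounded, start, args=(x, delta), method="Nelder-Mead")
    shape_hat, log_scale_hat = result.x
    return shape_hat, np.exp(log_scale_hat)


# Illustrative simulation: draw GPD data, round to the nearest multiple of delta, refit.
rng = np.random.default_rng(0)
raw = genpareto.rvs(0.2, loc=0.0, scale=1.0, size=2000, random_state=rng)
delta = 0.5
rounded = np.round(raw / delta) * delta
shape_hat, scale_hat = fit_gpd_rounded(rounded, delta)
print(shape_hat, scale_hat)
```

Working with interval probabilities instead of densities means that values rounded to 0 still contribute a positive likelihood term, which a density-based fit applied to the rounded values would not handle gracefully.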
The evolving focus in statistics and data science education highlights the growing importance of computing. This paper presents the Data Jamboree, a live event that combines computational methods with traditional statistical techniques to address real-world data science problems. Participants, ranging from novices to experienced users, followed workshop leaders in using open-source tools like Julia, Python, and R to perform tasks such as data cleaning, manipulation, and predictive modeling. The Jamboree showcased the educational benefits of working with open data, providing participants with practical, hands-on experience. We compared the tools in terms of efficiency, flexibility, and statistical power, with Julia excelling in performance, Python in versatility, and R in statistical analysis and visualization. The paper concludes with recommendations for designing similar events to encourage collaborative learning and critical thinking in data science.
Growth curve analysis (GCA) has a wide range of applications in various fields where growth trajectories need to be modeled. Heteroscedasticity is often present in the error term and cannot be handled with sufficient flexibility by standard linear fixed- or mixed-effects models. One situation that has been addressed is where the error variance is characterized by a linear predictor with certain covariates. A frequently encountered scenario in GCA, however, is one in which the variance is a smooth function of the mean with known shape restrictions. A naive application of standard linear mixed-effects models would underestimate the variance of the fixed-effects estimators and, consequently, the uncertainty of the estimated growth curve. We propose to model the variance of the response variable as a shape-restricted (increasing/decreasing; convex/concave) function of the marginal or conditional mean using shape-restricted splines. A simple iteratively reweighted fitting algorithm that takes advantage of existing software for linear mixed-effects models is developed. For inference, a parametric bootstrap procedure is recommended. Our simulation study shows that the proposed method gives satisfactory inference with moderate sample sizes. The utility of the method is demonstrated using two real-world applications.
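The iteratively reweighted idea can be illustrated with a deliberately simplified sketch that is not the proposed method. It assumes a fixed-effects linear model with no random effects, and it swaps the shape-restricted splines for a plain isotonic (nondecreasing) step fit of the squared residuals against the fitted mean, computed by pool-adjacent-violators. All function names, the variance floor, and the simulated data are illustrative assumptions.

```python
import numpy as np


def pava_increasing(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to the sequence y."""
    blocks = []  # each block stores [mean, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])


def fit_reweighted(X, y, n_iter=10, rel_floor=0.05):
    """Alternate weighted least squares with a monotone variance-versus-mean fit."""
    weights = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(weights)
        # Weighted least squares via rescaled rows.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        mu = X @ beta
        sq_resid = (y - mu) ** 2
        order = np.argsort(mu)
        var_sorted = pava_increasing(sq_resid[order])
        var_hat = np.empty_like(var_sorted)
        var_hat[order] = var_sorted  # map back to the original observation order
        # Ad hoc safeguard: a relative floor so no observation gets an extreme weight.
        var_hat = np.maximum(var_hat, rel_floor * sq_resid.mean())
        weights = 1.0 / var_hat
    return beta, var_hat


# Illustrative data: linear growth with a standard deviation proportional to the mean.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 10.0, size=500)
X = np.column_stack([np.ones_like(t), t])
mean_true = 1.0 + 0.5 * t
y = mean_true + 0.2 * mean_true * rng.standard_normal(t.size)
beta_hat, var_hat = fit_reweighted(X, y)
print(beta_hat)
```

A step-function variance estimate is cruder than a smooth shape-restricted spline, but it shows the alternation between estimating the mean with the current weights and re-estimating the variance as a monotone function of that mean.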