Validation of Machine Learning Prediction Models
Volume 1, Issue 3 (2023), pp. 394–414
Pub. online: 7 November 2023
Type: Machine Learning And Data Mining
Open Access
Accepted: 9 June 2023
Published: 7 November 2023
Abstract
We address the estimation of the Integrated Squared Error (ISE) of a predictor $\eta (x)$ of an unknown function f learned using data acquired on a given design ${\mathbf{X}_{n}}$. We consider ISE estimators that are weighted averages of the residuals of the predictor $\eta (x)$ on a set of selected points ${\mathbf{Z}_{m}}$. We show that, under a stochastic model for f, minimisation of the mean squared error of these ISE estimators is equivalent to minimisation of a Maximum Mean Discrepancy (MMD) for a non-stationary kernel that is adapted to the geometry of ${\mathbf{X}_{n}}$. Sequential Bayesian quadrature then yields sequences of nested validation designs that minimise, at each step of the construction, the relevant MMD. The optimal ISE estimate can be written in terms of the integral of a linear reconstruction, for the assumed model, of the square of the interpolator residuals over the domain of f. We present an extensive set of numerical experiments which demonstrate the good performance and robustness of the proposed solution. Moreover, we show that the validation designs obtained are space-filling continuations of ${\mathbf{X}_{n}}$, and that correct weighting of the observed interpolator residuals is more important than the precise configuration ${\mathbf{Z}_{m}}$ of the points at which they are observed.
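The construction described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: a one-dimensional domain $[0,1]$, a uniform target measure approximated on a grid, and a plain stationary Gaussian kernel standing in for the non-stationary kernel adapted to ${\mathbf{X}_{n}}$. The toy function, the piecewise-linear predictor, and the names `gauss_kernel`, `bq_weights_and_mmd2` and `greedy_sbq` are all hypothetical, and the estimator here weights the squared residuals. Because the kernel ignores ${\mathbf{X}_{n}}$, the sketch does not reproduce the space-filling continuation property; it only shows how greedy Bayesian quadrature yields nested points and weights.

```python
import numpy as np

def gauss_kernel(a, b, length=0.15):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def bq_weights_and_mmd2(z, grid, length=0.15, jitter=1e-10):
    """Bayesian quadrature weights for points z and the squared MMD they reach.

    The target measure is uniform on `grid`, a discrete stand-in for the
    uniform measure on [0, 1] (an assumption of this sketch).
    """
    k_zz = gauss_kernel(z, z, length) + jitter * np.eye(len(z))
    p_z = gauss_kernel(z, grid, length).mean(axis=1)  # kernel potential at z
    e_mu = gauss_kernel(grid, grid, length).mean()    # energy of the target
    w = np.linalg.solve(k_zz, p_z)                    # optimal weights
    return w, e_mu - p_z @ w

def greedy_sbq(candidates, grid, m, length=0.15):
    """Pick m candidates one at a time, each minimising the squared MMD."""
    chosen, history = [], []
    for _ in range(m):
        best_j, best_val = -1, np.inf
        for j in range(len(candidates)):
            if j in chosen:
                continue
            _, val = bq_weights_and_mmd2(candidates[chosen + [j]], grid, length)
            if val < best_val:
                best_j, best_val = j, val
        chosen.append(best_j)
        history.append(best_val)
    return candidates[chosen], history

def f(x):
    """Toy 'unknown' function (hypothetical)."""
    return np.sin(2 * np.pi * x)

x_n = np.linspace(0.0, 1.0, 5)            # training design X_n

def eta(x):
    """Toy predictor: piecewise-linear interpolation of f on X_n."""
    return np.interp(x, x_n, f(x_n))

grid = np.linspace(0.0, 1.0, 201)         # discretised domain
candidates = np.linspace(0.0, 1.0, 41)    # candidate validation points

z_m, mmd2_history = greedy_sbq(candidates, grid, m=5)
w, _ = bq_weights_and_mmd2(z_m, grid)

# Weighted sum of squared residuals on Z_m versus a dense-grid reference.
ise_estimate = w @ (f(z_m) - eta(z_m)) ** 2
ise_reference = np.mean((f(grid) - eta(grid)) ** 2)

print("validation points:", z_m)
print("squared MMD per step:", mmd2_history)
print("ISE estimate:", ise_estimate, "reference:", ise_reference)
```

Since each step only appends one point, the design for $m=3$ is the first three points of the design for $m=5$, and the squared MMD cannot increase from one step to the next because the weights are re-optimised over a larger set.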