Validation of Machine Learning Prediction Models

Pronzato, Luc; Rendas, Maria-João

doi:10.51387/23-NEJSDS50

The New England Journal of Statistics in Data Science

Volume 1, Issue 3 (2023), pp. 394–414

Pub. online: 7 November 2023

Type: Methodology Article

Open Access

Area: Machine Learning and Data Mining

Accepted: 9 June 2023

Published: 7 November 2023

Abstract

We address the estimation of the Integrated Squared Error (ISE) of a predictor $\eta(x)$ of an unknown function $f$ learned using data acquired on a given design $\mathbf{X}_n$. We consider ISE estimators that are weighted averages of the residuals of the predictor $\eta(x)$ on a set of selected points $\mathbf{Z}_m$. We show that, under a stochastic model for $f$, minimisation of the mean squared error of these ISE estimators is equivalent to minimisation of a Maximum Mean Discrepancy (MMD) for a non-stationary kernel that is adapted to the geometry of $\mathbf{X}_n$. Sequential Bayesian quadrature then yields sequences of nested validation designs that minimise, at each step of the construction, the relevant MMD. The optimal ISE estimate can be written in terms of the integral of a linear reconstruction, for the assumed model, of the square of the interpolator residuals over the domain of $f$. We present an extensive set of numerical experiments which demonstrate the good performance and robustness of the proposed solution. Moreover, we show that the validation designs obtained are space-filling continuations of $\mathbf{X}_n$, and that correct weighting of the observed interpolator residuals is more important than the precise configuration $\mathbf{Z}_m$ of the points at which they are observed.
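
As a rough illustration of the construction summarised in the abstract, the following Python sketch builds a nested design by greedy sequential Bayesian quadrature on a finite candidate set and then forms a weighted average of squared residuals. It is not the authors' implementation. The stationary Gaussian kernel, the lengthscale, the uniform measure on the candidate points, the jitter term, and the function names `gaussian_kernel`, `sbq_design` and `ise_estimate` are all assumptions made for the example, whereas the paper works with a non-stationary kernel adapted to the geometry of $\mathbf{X}_n$.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=0.2):
    """Stationary Gaussian kernel matrix between a (p, d) and b (q, d).

    Assumption: the paper uses a non-stationary kernel instead.
    """
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def sbq_design(candidates, m, lengthscale=0.2, jitter=1e-8):
    """Greedy sequential Bayesian quadrature over a finite candidate set.

    The target measure mu is taken to be uniform on `candidates`
    (an assumption). Returns the selected indices, the Bayesian
    quadrature weights of the final design, and the squared MMD
    reached after each step.
    """
    K = gaussian_kernel(candidates, candidates, lengthscale)
    p = K.mean(axis=1)   # kernel mean embedding of mu at each candidate
    energy = K.mean()    # double integral of the kernel under mu
    selected, mmd2_path = [], []
    for _ in range(m):
        best_idx, best_val = None, np.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            idx = selected + [i]
            K_zz = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
            w = np.linalg.solve(K_zz, p[idx])  # optimal weights for idx
            mmd2 = energy - p[idx] @ w         # squared MMD at those weights
            if mmd2 < best_val:
                best_idx, best_val = i, mmd2
        selected.append(best_idx)
        mmd2_path.append(best_val)
    K_zz = K[np.ix_(selected, selected)] + jitter * np.eye(m)
    weights = np.linalg.solve(K_zz, p[selected])
    return np.array(selected), weights, np.array(mmd2_path)


def ise_estimate(residuals, weights):
    """Weighted average of squared residuals observed at the design points."""
    return float(weights @ np.asarray(residuals) ** 2)
```

Because each step only appends a point to the previous selection, the designs are nested, and with optimal weights the squared MMD cannot increase from one step to the next. The residuals passed to `ise_estimate` would be the differences $f(z_i) - \eta(z_i)$ at the selected points.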

Open access article under the CC BY license.

Keywords

Model Validation; Bayesian Quadrature; Maximum Mean Discrepancy; Experimental Design

Funding

This work was partially funded by project ANR INDEX (ANR-18-CE91-0007) https://sdb3.i3s.unice.fr/anrindex/.
