Evaluating Designs for Hyperparameter Tuning in Deep Neural Networks

Shi, Chenlu; Chiu, Ashley Kathleen; Xu, Hongquan

doi:10.51387/23-NEJSDS26

The New England Journal of Statistics in Data Science

Volume 1, Issue 3 (2023), pp. 334–341

Pub. online: 24 February 2023

Type: Methodology Article

Open Access

Area: Machine Learning and Data Mining

Accepted: 14 February 2023

Published: 24 February 2023

Abstract

The performance of a learning technique relies heavily on its hyperparameter settings, so a deep learning technique calls for hyperparameter tuning, which may be too computationally expensive for sophisticated learning techniques. It is therefore desirable to explore quickly the relationship between the hyperparameters and the performance of the learning technique they control, and this requires design strategies that collect informative data efficiently. Various designs can be considered for this purpose, which naturally raises the question of which design to use. In this paper, we examine how different types of designs perform in efficiently collecting informative data to study the surface of test accuracy, a measure of the performance of a learning technique, over the hyperparameters. Under the settings we considered, we find that the strong orthogonal array outperforms all other comparable designs.
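As a rough illustration of what collecting data under a space-filling design can look like, the sketch below builds a random Latin hypercube design and maps it to natural units. It is not the strong orthogonal array studied in the paper, and the three hyperparameters, their ranges, and the function names `latin_hypercube` and `to_natural_units` are hypothetical choices made only for this example.

```python
import random


def latin_hypercube(n_runs, n_factors, seed=0):
    """Return an n_runs x n_factors Latin hypercube on levels 0..n_runs-1."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        levels = list(range(n_runs))
        rng.shuffle(levels)  # each column is a random permutation of the levels
        columns.append(levels)
    # Transpose the columns so that each row is one experimental run.
    return [list(row) for row in zip(*columns)]


def to_natural_units(design, ranges):
    """Map integer levels to midpoints of equal-width bins over each range."""
    n_runs = len(design)
    return [
        [lo + (level + 0.5) / n_runs * (hi - lo)
         for level, (lo, hi) in zip(row, ranges)]
        for row in design
    ]


# Hypothetical hyperparameters and ranges (assumptions, not from the paper).
ranges = [
    (0.0, 0.5),     # dropout rate
    (-4.0, -1.0),   # log10 of the learning rate
    (16.0, 256.0),  # number of hidden units (round before use)
]

design = latin_hypercube(n_runs=8, n_factors=3, seed=1)
settings = to_natural_units(design, ranges)

for row in settings:
    print([round(value, 3) for value in row])
```

Each row of `settings` would correspond to one training run whose test accuracy is recorded. A surrogate such as a Kriging model could then be fitted to those responses to study the accuracy surface.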

Supplementary material

The supplementary material includes all design matrices, expressed in the natural units we used.

Open access article under the CC BY license.

Keywords

Big data analysis; Factorial design; Kriging model; Machine learning; MNIST dataset; Space-filling design
