Evaluating Designs for Hyperparameter Tuning in Deep Neural Networks
Volume 1, Issue 3 (2023), pp. 334–341
Pub. online: 24 February 2023
Type: Machine Learning And Data Mining
Open Access
Accepted
14 February 2023
Published
24 February 2023
Abstract
The performance of a learning technique relies heavily on its hyperparameter settings. Deep learning techniques therefore call for hyperparameter tuning, which can be too computationally expensive for sophisticated models. It is thus desirable to quickly explore how the performance of a learning technique depends on the hyperparameters that control it, which requires design strategies that collect informative data efficiently. Various designs can serve this purpose, which naturally raises the question of which design to use. In this paper, we examine how efficiently different types of designs collect informative data for studying the surface of test accuracy, a measure of the performance of a learning technique, over the hyperparameters. Under the settings we considered, we find that the strong orthogonal array outperforms all other comparable designs.
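A strong orthogonal array (SOA) of strength t with s^t levels is commonly defined by collapsing: for any g <= t columns and any positive integers u_1 + ... + u_g = t, mapping column j to s^{u_j} levels via floor(x / s^{t - u_j}) must produce every level combination equally often. The Python sketch below checks that property for a small design. The helper `soa_from_oa_strength2` is a hypothetical illustration that pairs columns of a two-level orthogonal array. It is not the construction used in this paper, which the abstract does not describe.

```python
import itertools
from collections import Counter


def is_strong_oa(design, s, t):
    """Return True if `design` (a list of integer rows with entries in
    0..s**t - 1) is a strong orthogonal array of strength t.

    For every set of g columns (1 <= g <= t) and every split
    u_1 + ... + u_g = t with each u_j >= 1, collapsing column j to
    s**u_j levels via x // s**(t - u_j) must give every level
    combination equally often.
    """
    n_levels = s ** t
    if any(not 0 <= x < n_levels for row in design for x in row):
        return False
    m = len(design[0])
    for g in range(1, min(t, m) + 1):
        for cols in itertools.combinations(range(m), g):
            for us in itertools.product(range(1, t + 1), repeat=g):
                if sum(us) != t:
                    continue  # keep only compositions of t into g positive parts
                counts = Counter(
                    tuple(row[c] // s ** (t - u) for c, u in zip(cols, us))
                    for row in design
                )
                # The collapsed columns have s**u_1 * ... * s**u_g = s**t
                # possible combinations; all must appear, equally often.
                if len(counts) != n_levels or len(set(counts.values())) != 1:
                    return False
    return True


def soa_from_oa_strength2(oa, s):
    """Hypothetical illustrative construction: from an OA(n, k, s, 2),
    build c_j = s * a_j + a_k for j < k, giving an SOA(n, k - 1, s**2, 2)."""
    return [[s * row[j] + row[-1] for j in range(len(row) - 1)] for row in oa]


# OA(4, 3, 2, 2): every pair of columns shows 00, 01, 10, 11 exactly once.
oa = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
soa = soa_from_oa_strength2(oa, s=2)
print(soa)                          # [[0, 0], [1, 3], [3, 1], [2, 2]]
print(is_strong_oa(soa, s=2, t=2))  # True
```

The pairing works because each c_j is uniform over s^2 levels whenever the pair (a_j, a_k) is, and collapsing c_j to s levels recovers a_j, so any two collapsed columns inherit the strength-two balance of the original array. Higher-strength SOAs need more elaborate constructions; the checker itself applies to any t.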
Supplementary material
The supplementary material includes all design matrices in terms of the natural units we used.
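How coded design levels are converted to natural units is not spelled out here. The following is a minimal Python sketch, assuming each integer level is mapped to the centre of its stratum in [0, 1] and then scaled to a hyperparameter range, either linearly or on a log10 scale. The helper `to_natural_units`, the example design, the hyperparameters, and their ranges are all hypothetical.

```python
import math


def to_natural_units(design, n_levels, ranges):
    """Map integer levels 0..n_levels-1 in each column to natural units.

    `ranges` holds one (low, high, scale) tuple per column, where scale is
    "linear" or "log" (log10 spacing, valid only for positive ranges).
    """
    natural = []
    for row in design:
        new_row = []
        for x, (low, high, scale) in zip(row, ranges):
            u = (x + 0.5) / n_levels  # centre of stratum x in (0, 1)
            if scale == "log":
                lo, hi = math.log10(low), math.log10(high)
                new_row.append(10 ** (lo + u * (hi - lo)))
            else:
                new_row.append(low + u * (high - low))
        natural.append(new_row)
    return natural


# Hypothetical 4-level, 2-column design and hyperparameter ranges.
design = [[0, 0], [1, 3], [3, 1], [2, 2]]
ranges = [
    (1e-4, 1e-1, "log"),   # e.g. a learning rate, spread on a log scale
    (0.0, 0.5, "linear"),  # e.g. a dropout rate
]
natural = to_natural_units(design, n_levels=4, ranges=ranges)
for row in natural:
    print([f"{v:.4g}" for v in row])
```

A log scale is a common choice for rate-like hyperparameters whose plausible values span several orders of magnitude; whether the paper used such a scale is not stated in this abstract.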