<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">NEJSDS</journal-id>
<journal-title-group><journal-title>The New England Journal of Statistics in Data Science</journal-title></journal-title-group>
<issn pub-type="ppub">2693-7166</issn><issn-l>2693-7166</issn-l>
<publisher>
<publisher-name>New England Statistical Society</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">NEJSDS26</article-id>
<article-id pub-id-type="doi">10.51387/23-NEJSDS26</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Methodology Article</subject></subj-group>
<subj-group subj-group-type="area"><subject>Machine Learning and Data Mining</subject></subj-group>
</article-categories>
<title-group>
<article-title>Evaluating Designs for Hyperparameter Tuning in Deep Neural Networks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Shi</surname><given-names>Chenlu</given-names></name><email xlink:href="mailto:chenlu.shi@colostate.edu">chenlu.shi@colostate.edu</email><xref ref-type="aff" rid="j_nejsds26_aff_001"/><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Chiu</surname><given-names>Ashley Kathleen</given-names></name><email xlink:href="mailto:ashleychiu@ucla.edu">ashleychiu@ucla.edu</email><xref ref-type="aff" rid="j_nejsds26_aff_002"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xu</surname><given-names>Hongquan</given-names></name><email xlink:href="mailto:hqxu@stat.ucla.edu">hqxu@stat.ucla.edu</email><xref ref-type="aff" rid="j_nejsds26_aff_003"/>
</contrib>
<aff id="j_nejsds26_aff_001">Department of Statistics, <institution>Colorado State University</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:chenlu.shi@colostate.edu">chenlu.shi@colostate.edu</email></aff>
<aff id="j_nejsds26_aff_002">Department of Statistics, <institution>University of California</institution>, Los Angeles, <country>USA</country>. E-mail address: <email xlink:href="mailto:ashleychiu@ucla.edu">ashleychiu@ucla.edu</email></aff>
<aff id="j_nejsds26_aff_003">Department of Statistics, <institution>University of California</institution>, Los Angeles, <country>USA</country>. E-mail address: <email xlink:href="mailto:hqxu@stat.ucla.edu">hqxu@stat.ucla.edu</email></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>24</day><month>2</month><year>2023</year></pub-date><volume>1</volume><issue>3</issue><fpage>334</fpage><lpage>341</lpage><supplementary-material id="S1" content-type="archive" xlink:href="nejsds26_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The supplementary material includes all design matrices, in natural units, that we used.</p>
</caption>
</supplementary-material><history><date date-type="accepted"><day>14</day><month>2</month><year>2023</year></date></history>
<permissions><copyright-statement>© 2023 New England Statistical Society</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>The performance of a learning technique relies heavily on hyperparameter settings, which calls for hyperparameter tuning. For sophisticated learning techniques, however, tuning may be too computationally expensive. It is therefore desirable to explore the relationship between hyperparameters and the performance of a learning technique expeditiously, which in turn requires design strategies for collecting informative data efficiently. Various designs can be considered for this purpose, and the question of which design to use naturally arises. In this paper, we examine the use of different types of designs for efficiently collecting informative data to study the surface of test accuracy, a measure of the performance of a learning technique, over hyperparameters. Under the settings we considered, we find that the strong orthogonal array outperforms all other comparable designs.</p>
</abstract>
<kwd-group>
<label>Keywords and phrases</label>
<kwd>Big data analysis</kwd>
<kwd>Factorial design</kwd>
<kwd>Kriging model</kwd>
<kwd>Machine learning</kwd>
<kwd>MNIST dataset</kwd>
<kwd>Space-filling design</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="j_nejsds26_s_001">
<label>1</label>
<title>Introduction</title>
<p>In the modern computer age, deep learning has become a powerful tool for big data analysis, using various sophisticated computer models to learn hidden patterns or intricate relationships among a large number of variables from monumental amounts of complex data. The performance of a deep learning technique, however, depends heavily on hyperparameters – parameters that control the learning process and whose values are set by the user before the learning model is trained. An inappropriate hyperparameter setting can result in poor performance of the learning process [<xref ref-type="bibr" rid="j_nejsds26_ref_013">13</xref>]. Not only do the possible hyperparameters vary from algorithm to algorithm, architecture to architecture, and dataset to dataset, but the most important hyperparameters can also be unique across domains [<xref ref-type="bibr" rid="j_nejsds26_ref_003">3</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_043">43</xref>]. These observations call for investigations into hyperparameter optimization.</p>
<p>Generally, work in the literature on hyperparameter optimization has been discussed under both model-free and model-based frameworks. Model-based hyperparameter optimization targets tuning hyperparameters by finding the best possible approximation to the true learning algorithm, while model-free methods consider this optimization problem without making any parametric assumptions. Relevant work along the line of model-based approaches can be found in [<xref ref-type="bibr" rid="j_nejsds26_ref_004">4</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_009">9</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_019">19</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_026">26</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_030">30</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_034">34</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_039">39</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_046">46</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_051">51</xref>]. There has been extensive work under the model-free framework – see, for example, manual search, grid search, random search [<xref ref-type="bibr" rid="j_nejsds26_ref_003">3</xref>] and orthogonal array tuning method [<xref ref-type="bibr" rid="j_nejsds26_ref_050">50</xref>].</p>
<p>For more sophisticated learning algorithms, however, the above hyperparameter optimization approaches, whether model-free or model-based, may be too computationally expensive. Our attention thus turns to exploring expeditiously the relationship between the hyperparameters and the performance of models learned by an algorithm under different hyperparameter combinations, using data collected by testing only a minimal number of combinations. This points to a primary goal of design and analysis of experiments: executing efficient experiments to collect informative data for studying the relationship between multiple input variables and an output variable. It is therefore natural to consider design strategies for efficiently collecting data to explore the performance surface of a hyperparameter-controlled learning algorithm, which raises the question of which design to use for this purpose, as diverse designs are available in the literature on experimental design.</p>
<p>In this paper, we investigate the above question by comparing different types of designs in terms of their ability to efficiently collect informative data for studying how hyperparameters influence the performance of the learning algorithm they control. To keep the computation manageable, we use a deep neural network model with three hidden layers and conduct the comparison on a popular dataset, the MNIST dataset. Five hyperparameters and various factorial and space-filling designs for selecting hyperparameter combinations are considered. We choose the Kriging model to describe the complex relationship between the hyperparameters of the learning algorithm and the test accuracy that measures its performance. The results show that the 32-run strong orthogonal array outperforms all other comparable designs.</p>
<p>This paper is organized as follows. In Section <xref rid="j_nejsds26_s_002">2</xref>, we provide preparation for the comparison, reviewing (deep) neural networks, hyperparameters, experimental designs and the Kriging model. Section <xref rid="j_nejsds26_s_007">3</xref> compares various designs by applying them to the MNIST dataset. We end this paper with some discussion in Section <xref rid="j_nejsds26_s_010">4</xref>.</p>
</sec>
<sec id="j_nejsds26_s_002">
<label>2</label>
<title>Preliminaries</title>
<p>Suppose that <inline-formula id="j_nejsds26_ineq_001"><alternatives><mml:math>
<mml:mi mathvariant="script">A</mml:mi></mml:math><tex-math><![CDATA[$\mathcal{A}$]]></tex-math></alternatives></inline-formula> is a machine learning algorithm with <italic>k</italic> hyperparameters. Let <inline-formula id="j_nejsds26_ineq_002"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">Λ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\Lambda _{i}}$]]></tex-math></alternatives></inline-formula> be the domain of the <italic>i</italic>-th hyperparameter and <inline-formula id="j_nejsds26_ineq_003"><alternatives><mml:math>
<mml:mi mathvariant="bold">Λ</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">Λ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>×</mml:mo>
<mml:mo stretchy="false">⋯</mml:mo>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">Λ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$\boldsymbol{\Lambda }={\Lambda _{1}}\times \cdots \times {\Lambda _{k}}$]]></tex-math></alternatives></inline-formula>. We write values of a combination of <italic>k</italic> hyperparameters into a vector <italic>x</italic>. For any <inline-formula id="j_nejsds26_ineq_004"><alternatives><mml:math>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="bold">Λ</mml:mi></mml:math><tex-math><![CDATA[$x\in \boldsymbol{\Lambda }$]]></tex-math></alternatives></inline-formula>, a learning algorithm based on the hyperparameter setting <italic>x</italic> is denoted by <inline-formula id="j_nejsds26_ineq_005"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mathcal{A}_{x}}$]]></tex-math></alternatives></inline-formula>. Given a set of data <italic>D</italic>, we denote a measure of the performance of a model learned by algorithm <inline-formula id="j_nejsds26_ineq_006"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mathcal{A}_{x}}$]]></tex-math></alternatives></inline-formula> on data <italic>D</italic> by <italic>y</italic>. One commonly used measure of the performance of a learned model is test accuracy, which assesses how accurately the model built on the training dataset predicts the responses in the test dataset. The higher the test accuracy, the better the performance of the algorithm.</p>
<p>In order to find an appropriate hyperparameter combination, it is desirable to explore the relationship between hyperparameter combination <italic>x</italic> and test accuracy <italic>y</italic>. The complexity of the algorithm <inline-formula id="j_nejsds26_ineq_007"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="script">A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mathcal{A}_{x}}$]]></tex-math></alternatives></inline-formula> makes obtaining the accuracy <italic>y</italic> computationally expensive. Therefore, a design strategy is needed to efficiently collect informative data for building a statistical model that specifies the relationship between hyperparameter combination <italic>x</italic> and test accuracy <italic>y</italic>. Our goal here is to compare different types of experimental designs for this task.</p>
<sec id="j_nejsds26_s_003">
<title>Neural Networks and Deep Neural Networks</title>
<p>Neural networks [<xref ref-type="bibr" rid="j_nejsds26_ref_031">31</xref>], also known as artificial neural networks, are one of the most well-known algorithms used to recognize patterns and solve common problems in statistics, data science and machine learning. A neural network comprises an input layer, one or more hidden layers and an output layer, each containing several units called neurons. Neurons in the input layer bring the initial data into the network for further processing, while neurons in the output layer produce the result of the network. Each neuron in a hidden layer receives input from neurons in the preceding layer and passes its output to neurons in the next layer. Neural network models are, in general, very flexible, which in turn makes constructing the best network structure challenging. See [<xref ref-type="bibr" rid="j_nejsds26_ref_029">29</xref>] for more details on neural networks.</p>
<p>In this day and age, problems such as image classification, object detection and natural language processing have become increasingly complex. This calls for deep neural networks. A deep neural network refers to an artificial neural network with more than one hidden layer [<xref ref-type="bibr" rid="j_nejsds26_ref_002">2</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_037">37</xref>]. As an example, consider a deep neural network with three hidden layers, shown in Figure <xref rid="j_nejsds26_fig_001">1</xref>. The number of neurons in the hidden layers is one of the common hyperparameters of interest. The number of neurons in the input layer equals the number of features in the data, while the number of neurons in the output layer depends on the type of problem being solved. For example, a regression problem often has one neuron in the output layer, whereas for a multi-class classification problem, one common choice is one neuron per class.</p>
<fig id="j_nejsds26_fig_001">
<label>Figure 1</label>
<caption>
<p>The neural network model with three hidden layers.</p>
</caption>
<graphic xlink:href="nejsds26_g001.jpg"/>
</fig>
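A forward pass through a network like the one in Figure 1 can be sketched as follows. ReLU activations and a softmax output are common choices assumed here for illustration; the paper's exact architecture and activations may differ.

```python
import numpy as np

# Sketch: input layer, three fully connected hidden layers, and an output
# layer with one neuron per class, as in a multi-class classification task.

rng = np.random.default_rng(0)

def forward(x, hidden_sizes, n_classes):
    """Propagate input x through randomly initialized dense layers."""
    a, n_prev = x, x.size
    for n in hidden_sizes:
        W = rng.normal(scale=1.0 / np.sqrt(n_prev), size=(n_prev, n))
        a = np.maximum(a @ W, 0.0)           # ReLU in each hidden layer
        n_prev = n
    W_out = rng.normal(scale=1.0 / np.sqrt(n_prev), size=(n_prev, n_classes))
    z = a @ W_out
    z = z - z.max()                          # numerically stable softmax
    return np.exp(z) / np.exp(z).sum()

# e.g. 784 input features (28x28 images), three hidden layers, 10 classes
probs = forward(rng.normal(size=784), [128, 64, 32], 10)
```

The output is a vector of class probabilities; the hidden-layer widths 128, 64 and 32 are placeholder values, themselves hyperparameters to be tuned.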
</sec>
<sec id="j_nejsds26_s_004">
<title>Hyperparameters</title>
<p>Hyperparameters for neural networks are those variables whose values determine the network structure and/or the way the network is trained. The number of neurons in the hidden layers of Figure <xref rid="j_nejsds26_fig_001">1</xref> is an example of a hyperparameter. Typically, a neural network requires us to determine values for a set of different hyperparameters including, but not limited to, learning rate, batch size, number of layers, dropout rate, number of epochs, number of training iterations, normalization, pooling, momentum and weight initialization. These hyperparameter settings drastically affect the success and accuracy of the network. As a consequence, finding an appropriate hyperparameter combination is increasingly important.</p>
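A hyperparameter domain of the form Λ = Λ₁ × ⋯ × Λ_k can be illustrated concretely; the names and candidate values below are hypothetical examples, not the settings used in the paper.

```python
from itertools import product

# A hypothetical five-hyperparameter domain; each key is one Lambda_i.
domains = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size":    [32, 64, 128],
    "dropout_rate":  [0.0, 0.25, 0.5],
    "n_neurons":     [64, 128, 256],
    "n_epochs":      [10, 20, 30],
}
grid = list(product(*domains.values()))   # every hyperparameter combination x
# even three levels per hyperparameter already give 3^5 = 243 combinations,
# which is why a small designed experiment is preferable to an exhaustive grid
```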
</sec>
<sec id="j_nejsds26_s_005">
<title>Design of Experiments</title>
<p>Design of experiments is a systematic and efficient method that allows scientists and engineers to explore the relationship between multiple input variables and output variables. Traditional physical experiments favor various factorial designs for studying systems or processes [<xref ref-type="bibr" rid="j_nejsds26_ref_045">45</xref>]. These designs, however, are no longer appropriate when the systems or processes become complex, which calls for computer experiments and space-filling designs, a family of designs suited to computer experiments [<xref ref-type="bibr" rid="j_nejsds26_ref_010">10</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_036">36</xref>].</p>
<p><bold>Factorial Designs</bold> At the early stage of an investigation, when little prior knowledge is available, the experimenter conducts a screening experiment involving as many variables as possible, with the primary goal of identifying the important factors using a first-order model. Two-level factorial designs [<xref ref-type="bibr" rid="j_nejsds26_ref_007">7</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_033">33</xref>], thanks to their simple structure and nice statistical properties, are commonly used for this purpose. The experimenter often proceeds to the next stage to capture the curvature in the response surface, which requires three-level factorial designs or composite designs such as orthogonal array composite designs (OACDs) [<xref ref-type="bibr" rid="j_nejsds26_ref_048">48</xref>].</p>
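A two-level full factorial design as used in such a screening experiment can be generated directly; this is a minimal sketch for three factors in the usual ±1 coding.

```python
from itertools import product

# A two-level full factorial screening design: k = 3 factors, each coded
# -1/+1, giving 2^3 = 8 runs; responses collected on these runs would be
# fit with a first-order model y = b0 + b1*x1 + b2*x2 + b3*x3.
k = 3
design = list(product([-1, 1], repeat=k))   # each tuple is one run
```

Each factor column is balanced, with the low and high levels appearing equally often.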
<p><bold>Space-Filling Designs</bold> A space-filling design refers to a design that scatters its design points uniformly over the whole design region. Finding a space-filling design that provides good coverage of the entire input space is tremendously difficult, especially when the input space is high-dimensional. Instead, it is natural and reasonable to consider a design that enjoys the space-filling property in low-dimensional projections. This idea dates back to Latin hypercube designs (LHDs) proposed by [<xref ref-type="bibr" rid="j_nejsds26_ref_032">32</xref>]. Such designs are orthogonal arrays of strength one and enjoy the maximum space-filling property in all univariate projections: there is exactly one observation in each interval formed by dividing the range of an input variable into as many equally spaced intervals as the run size. The space-filling property of an LHD may be further evaluated by other optimality criteria such as orthogonality [<xref ref-type="bibr" rid="j_nejsds26_ref_005">5</xref>], a distance criterion [<xref ref-type="bibr" rid="j_nejsds26_ref_020">20</xref>] and a discrepancy criterion [<xref ref-type="bibr" rid="j_nejsds26_ref_011">11</xref>]. There has been extensive work along this line – see, for example, [<xref ref-type="bibr" rid="j_nejsds26_ref_012">12</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_021">21</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_027">27</xref>].</p>
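The one-observation-per-interval property of an LHD can be sketched with a simple permutation-based construction; this is a generic illustration, not a design from the paper.

```python
import numpy as np

# A minimal Latin hypercube design: n runs in k factors, with exactly one
# point in each of the n equally spaced intervals of every factor.

def latin_hypercube(n, k, seed=None):
    rng = np.random.default_rng(seed)
    # one random permutation of {0, ..., n-1} per column picks the cells;
    # a uniform jitter places one point inside each chosen cell
    cells = np.column_stack([rng.permutation(n) for _ in range(k)])
    return (cells + rng.uniform(size=(n, k))) / n   # points in [0, 1)^k

D = latin_hypercube(8, 3, seed=0)
```

In every column of `D`, each of the eight equally spaced intervals of [0, 1) contains exactly one point, which is the maximum univariate space-filling property described above.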
<p>Strong orthogonal arrays (SOAs), introduced and studied by [<xref ref-type="bibr" rid="j_nejsds26_ref_017">17</xref>], are another class of space-filling designs in the literature with a focus on low-dimensional projections. An SOA of strength <italic>t</italic> achieves space-filling properties in all <inline-formula id="j_nejsds26_ineq_008"><alternatives><mml:math>
<mml:mi mathvariant="italic">g</mml:mi>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi></mml:math><tex-math><![CDATA[$g\le t$]]></tex-math></alternatives></inline-formula> dimensions. Such an array performs as well as a comparable orthogonal array of strength <italic>t</italic> in all <italic>t</italic> dimensions but is more space-filling in all <inline-formula id="j_nejsds26_ineq_009"><alternatives><mml:math>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">g</mml:mi>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$1\le g\le t-1$]]></tex-math></alternatives></inline-formula> dimensions than the latter. It can also be converted into an LHD through level expansion to achieve the maximum space-filling property in all one-dimensional projections. As a result, this SOA-based LHD enjoys better space-filling properties than the comparable ordinary LHD in all <inline-formula id="j_nejsds26_ineq_010"><alternatives><mml:math>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">g</mml:mi>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi></mml:math><tex-math><![CDATA[$2\le g\le t$]]></tex-math></alternatives></inline-formula> dimensions. More developments on SOAs can be found in [<xref ref-type="bibr" rid="j_nejsds26_ref_015">15</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_018">18</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_028">28</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_038">38</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_042">42</xref>].</p>
</sec>
<sec id="j_nejsds26_s_006">
<title>Kriging Model</title>
<p>The Kriging model originated in the geosciences [<xref ref-type="bibr" rid="j_nejsds26_ref_023">23</xref>] and is now popular in computer experiments for building surrogate models for complicated computer models. In a Kriging model, the response is modeled as a realization of a Gaussian process, and the predicted response at a target point can be represented as a weighted average of the responses at observed points. For more details on Kriging models, we refer to [<xref ref-type="bibr" rid="j_nejsds26_ref_008">8</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_014">14</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_022">22</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_035">35</xref>, <xref ref-type="bibr" rid="j_nejsds26_ref_044">44</xref>].</p>
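The weighted-average prediction can be sketched for the Ordinary Kriging case with a Matérn ν = 5/2 correlation; the range parameter theta and the nugget below are placeholder values, whereas in practice they (together with the process variance) are estimated from the data, e.g. by maximum likelihood.

```python
import numpy as np

def matern52(h, theta):
    """Matern correlation with nu = 5/2 for distances h >= 0."""
    a = np.sqrt(5.0) * h / theta
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def ordinary_kriging_predict(X, y, x0, theta=0.5, nugget=1e-6):
    """Predict y(x0) from runs X (n x k) and responses y (n,):
    a constant mean plus a weighted average of the residuals."""
    diff = np.abs(X[:, None, :] - X[None, :, :])        # pairwise |u_i - v_i|
    K = np.prod(matern52(diff, theta), axis=2) + nugget * np.eye(len(X))
    r0 = np.prod(matern52(np.abs(X - x0), theta), axis=1)
    Kinv = np.linalg.inv(K)
    one = np.ones(len(X))
    mu = one @ Kinv @ y / (one @ Kinv @ one)            # GLS constant mean
    return mu + r0 @ Kinv @ (y - mu * one)
```

With a near-zero nugget the predictor interpolates the observed runs; a larger nugget, matching the noise term for stochastic learning algorithms, smooths the observations instead.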
<p>As most learning algorithms are stochastic, instead of using the Universal Kriging model for the deterministic case in computer experiments, we consider the Universal Kriging model with a noise term 
<disp-formula id="j_nejsds26_eq_001">
<label>(2.1)</label><alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="italic">y</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">Z</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">ϵ</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ y(x)={\sum \limits_{i=1}^{m}}{\beta _{i}}{f_{i}}(x)+Z(x)+\epsilon ,\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds26_ineq_011"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[${\textstyle\sum _{i=1}^{m}}{\beta _{i}}{f_{i}}(x)$]]></tex-math></alternatives></inline-formula> is the mean function with <inline-formula id="j_nejsds26_ineq_012"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${f_{i}}$]]></tex-math></alternatives></inline-formula> being the <italic>i</italic>th basis function and <inline-formula id="j_nejsds26_ineq_013"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\beta _{i}}$]]></tex-math></alternatives></inline-formula> being the <italic>i</italic>th coefficient, <inline-formula id="j_nejsds26_ineq_014"><alternatives><mml:math>
<mml:mi mathvariant="italic">Z</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$Z(x)$]]></tex-math></alternatives></inline-formula> is a second-order stationary random process with constant mean 0 and the covariance matrix given by <inline-formula id="j_nejsds26_ineq_015"><alternatives><mml:math>
<mml:mo movablelimits="false">Cov</mml:mo>
<mml:mo fence="true" stretchy="false">[</mml:mo>
<mml:mi mathvariant="italic">Z</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">u</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">Z</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">v</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo fence="true" stretchy="false">]</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="italic">σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∏</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi mathvariant="italic">R</mml:mi>
<mml:mfenced separators="" open="(" close=")">
<mml:mrow>
<mml:mfenced separators="" open="|" close="|">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math><tex-math><![CDATA[$\operatorname{Cov}[Z(u),Z(v)]={\sigma ^{2}}{\textstyle\prod _{i=1}^{k}}R\left(\left|{u_{i}}-{v_{i}}\right|\right)$]]></tex-math></alternatives></inline-formula>, where <inline-formula id="j_nejsds26_ineq_016"><alternatives><mml:math>
<mml:mi mathvariant="italic">u</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$u=({u_{1}},\dots ,{u_{k}})$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds26_ineq_017"><alternatives><mml:math>
<mml:mi mathvariant="italic">v</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$v=({v_{1}},\dots ,{v_{k}})$]]></tex-math></alternatives></inline-formula> are two points (runs), <inline-formula id="j_nejsds26_ineq_018"><alternatives><mml:math>
<mml:mi mathvariant="italic">R</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mo>·</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$R(\cdot )$]]></tex-math></alternatives></inline-formula> is a spatial correlation function and <inline-formula id="j_nejsds26_ineq_019"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="italic">σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\sigma ^{2}}$]]></tex-math></alternatives></inline-formula> is the variance of the random process, and <inline-formula id="j_nejsds26_ineq_020"><alternatives><mml:math>
<mml:mi mathvariant="italic">ϵ</mml:mi>
<mml:mo stretchy="false">∼</mml:mo>
<mml:mi mathvariant="italic">N</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="italic">τ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$\epsilon \sim N(0,{\tau ^{2}})$]]></tex-math></alternatives></inline-formula> is independent of <inline-formula id="j_nejsds26_ineq_021"><alternatives><mml:math>
<mml:mi mathvariant="italic">Z</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$Z(x)$]]></tex-math></alternatives></inline-formula>. In this study, we adopt the Ordinary Kriging model with a noise term, where the mean function in equation (<xref rid="j_nejsds26_eq_001">2.1</xref>) is a constant. We choose the Matern correlation function with parameter <inline-formula id="j_nejsds26_ineq_022"><alternatives><mml:math>
<mml:mi mathvariant="italic">ν</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mn>2</mml:mn></mml:math><tex-math><![CDATA[$\nu =5/2$]]></tex-math></alternatives></inline-formula>, following the conclusion drawn by [<xref ref-type="bibr" rid="j_nejsds26_ref_047">47</xref>] that the Matern correlation function provides a good balance between differentiability and smoothness. The explicit form is given by 
<disp-formula id="j_nejsds26_eq_002">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="italic">R</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">θ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msqrt>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">θ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>+</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">θ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
</mml:mfenced>
<mml:mo movablelimits="false">exp</mml:mo>
<mml:mfenced separators="" open="(" close=")">
<mml:mrow>
<mml:mo>−</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msqrt>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">θ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
</mml:mfenced>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ R({h_{i}};{\theta _{i}})=\left(1+\frac{\sqrt{5}{h_{i}}}{{\theta _{i}}}+\frac{5{h_{i}^{2}}}{3{\theta _{i}^{2}}}\right)\exp \left(-\frac{\sqrt{5}{h_{i}}}{{\theta _{i}}}\right),\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds26_ineq_023"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo></mml:math><tex-math><![CDATA[${h_{i}}=|{u_{i}}-{v_{i}}|$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds26_ineq_024"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">θ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\theta _{i}}$]]></tex-math></alternatives></inline-formula> is an unknown correlation parameter (not to be confused with the neural network hyperparameters being tuned), which can be estimated by maximum likelihood.</p>
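As a quick illustration, the Matern correlation with ν = 5/2 can be coded directly from the formula above (a minimal Python sketch, independent of any particular Kriging package; the function name is ours):

```python
import numpy as np

def matern52(h, theta):
    """Matern correlation with nu = 5/2 for distance h >= 0 and range parameter theta > 0.

    Writing r = sqrt(5) * h / theta, the formula reduces to (1 + r + r^2/3) * exp(-r),
    since r^2/3 = 5 h^2 / (3 theta^2).
    """
    r = np.sqrt(5.0) * np.abs(h) / theta
    return (1.0 + r + r ** 2 / 3.0) * np.exp(-r)
```

The correlation equals 1 at distance zero and decays monotonically, with the rate controlled by the range parameter θ.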
</sec>
</sec>
<sec id="j_nejsds26_s_007">
<label>3</label>
<title>Comparison Study Based on MNIST Dataset</title>
<p>Due to computational limitations, we compare the performance of various designs by exploring the test accuracy surface over the hyperparameters of a deep neural network model with three hidden layers, as shown in Figure <xref rid="j_nejsds26_fig_001">1</xref>, trained on a manageable dataset, MNIST.</p>
<sec id="j_nejsds26_s_008">
<label>3.1</label>
<title>Setups</title>
<p><bold>MNIST Dataset</bold> The MNIST dataset (Modified National Institute of Standards and Technology database) is a subset of a larger dataset available from NIST and has served as a canonical benchmark for many learning techniques and pattern recognition methods. It is a dataset of handwritten digits, first developed and released by [<xref ref-type="bibr" rid="j_nejsds26_ref_025">25</xref>], containing 70,000 grayscale images of the 10 digits that are <inline-formula id="j_nejsds26_ineq_025"><alternatives><mml:math>
<mml:mn>28</mml:mn>
<mml:mo>×</mml:mo>
<mml:mn>28</mml:mn></mml:math><tex-math><![CDATA[$28\times 28$]]></tex-math></alternatives></inline-formula> pixels in width and height. Some examples are shown in Figure <xref rid="j_nejsds26_fig_002">2</xref>. The images come with labels and are thus suitable for supervised learning tasks. MNIST is convenient to use, as the images have already been split into a training set of 60,000 images and a test set of 10,000 images. In this paper, we further hold out part of the training set for validation: each time, we randomly set aside 15% of the full training set to serve as a validation set.</p>
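The 15% validation hold-out described above can be sketched as follows (index bookkeeping only; in practice the images themselves would be loaded, e.g., with TensorFlow, and the function name and seed are ours):

```python
import numpy as np

def split_validation(n_train=60000, val_frac=0.15, seed=0):
    """Randomly hold out a fraction of the training indices as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_train)          # shuffle all training indices
    n_val = int(round(n_train * val_frac))  # 15% of 60,000 = 9,000 images
    return idx[n_val:], idx[:n_val]         # (training indices, validation indices)

train_idx, val_idx = split_validation()
```

Repeating this with a fresh seed reproduces the paper's "each time" re-sampling of the validation set.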
<p><bold>Hyperparameters</bold> We consider five common hyperparameters: learning rate, number of epochs, batch size, dropout rate and number of neurons in a hidden layer. All five are numerical factors: learning rate and dropout rate are continuous, while the number of epochs, batch size and number of neurons are discrete. The domain of each hyperparameter is given in Table <xref rid="j_nejsds26_tab_001">1</xref>.</p>
<fig id="j_nejsds26_fig_002">
<label>Figure 2</label>
<caption>
<p>Examples from MNIST Handwritten Digit Database.</p>
</caption>
<graphic xlink:href="nejsds26_g002.jpg"/>
</fig>
<table-wrap id="j_nejsds26_tab_001">
<label>Table 1</label>
<caption>
<p>Domains of hyperparameters.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Learning Rate</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Number of Epochs</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Batch Size</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Dropout Rate</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Number of units</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">[0.0001, 0.01]</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">[1, 32]</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">[16, 128]</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">[0.5, 0.8]</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">[32, 256]</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><bold>Designs</bold> Various factorial designs and space-filling designs with five columns, one for each hyperparameter, are examined. More specifically, we use three factorials: the <inline-formula id="j_nejsds26_ineq_026"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${2^{5}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds26_ineq_027"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> full factorials and an OACD of 34 runs combining the <inline-formula id="j_nejsds26_ineq_028"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${2_{V}^{5-1}}$]]></tex-math></alternatives></inline-formula> factorial with a three-level 18-run orthogonal array. Two full factorials are generated using the R package DoE.base [<xref ref-type="bibr" rid="j_nejsds26_ref_016">16</xref>]. The 34-run OACD can be constructed using Table 2 in [<xref ref-type="bibr" rid="j_nejsds26_ref_048">48</xref>] and is given in the supplementary material. In addition to three factorials, this study includes a 243-run random LHD generated using the R package lhs [<xref ref-type="bibr" rid="j_nejsds26_ref_006">6</xref>] and several 32-run space-filling designs – an SOA of strength three, its corresponding LHD and four other types of LHDs: a random LHD, a maximin LHD, a maximum projection LHD and a uniform LHD, which are generated using R packages lhs, SLHD [<xref ref-type="bibr" rid="j_nejsds26_ref_001">1</xref>], MaxPro [<xref ref-type="bibr" rid="j_nejsds26_ref_021">21</xref>], and UniDOE [<xref ref-type="bibr" rid="j_nejsds26_ref_049">49</xref>], respectively. There are 32 levels in each column of these LHDs while the SOA of strength three has 8 levels and is listed in the supplementary material. An LHD based on this SOA can be obtained by expanding 8 levels to 32 levels following [<xref ref-type="bibr" rid="j_nejsds26_ref_017">17</xref>]. The two largest designs, <inline-formula id="j_nejsds26_ineq_029"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial and the 243-run random LHD, serve as benchmarks for comparison with the small-run designs.</p>
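The paper generates its designs with R packages (DoE.base, lhs, SLHD, MaxPro, UniDOE). As an illustrative stand-in, a comparable 32-run random Latin hypercube with five columns can be generated in Python with SciPy's quasi-Monte Carlo module (assuming SciPy ≥ 1.7):

```python
from scipy.stats import qmc

# One column per hyperparameter; a Latin hypercube places exactly one point
# in each of the 32 equal-width strata of [0, 1) in every column.
sampler = qmc.LatinHypercube(d=5, seed=1)
design = sampler.random(n=32)  # shape (32, 5), entries in [0, 1)
```

The maximin, maximum projection and uniform LHDs used in the paper further optimize such a design under their respective criteria.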
<p>The design matrices of the above designs in natural units are provided in the supplementary material for reference; the lowest and highest levels of each factor are given in Table <xref rid="j_nejsds26_tab_001">1</xref>. We linearly interpolate the other levels within each range. For discrete variables, we further round the levels to the nearest integer.</p>
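The interpolation-and-rounding step can be sketched as follows (the dictionary keys and function name are ours; the domains are taken from Table 1):

```python
# Domains from Table 1: (low, high, is_discrete)
DOMAINS = {
    "learning_rate": (0.0001, 0.01, False),
    "num_epochs": (1, 32, True),
    "batch_size": (16, 128, True),
    "dropout_rate": (0.5, 0.8, False),
    "num_neurons": (32, 256, True),
}

def to_natural(coded):
    """Map one coded run (entries in [-1, 1]) to natural units, rounding discrete factors."""
    out = {}
    for (name, (lo, hi, discrete)), c in zip(DOMAINS.items(), coded):
        v = lo + (c + 1.0) / 2.0 * (hi - lo)  # linear interpolation between lo and hi
        out[name] = int(round(v)) if discrete else v
    return out
```

For example, the coded center point (0, 0, 0, 0, 0) maps to the midpoint of each domain, with discrete factors rounded to integers.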
<p><bold>Data Collection</bold> All neural network implementations are built in TensorFlow: <uri>https://www.tensorflow.org</uri>. There is a stochastic component to the achieved accuracy, as the images are shuffled for each model trained. The network fitted on the training dataset is then assessed on the test dataset of 10,000 images using test accuracy. The test accuracy is appended to the tables of design matrices with natural units in the supplementary material. For interested readers, we also record the cross-entropy loss of each hyperparameter combination in the column immediately before the test accuracy column in these tables.</p>
</sec>
<sec id="j_nejsds26_s_009">
<label>3.2</label>
<title>Analysis and Results</title>
<p>We evaluate designs by considering their ability to collect informative data for building a statistical model that specifies the relationship between hyperparameters and test accuracy. In our comparison, the Kriging model is an appropriate choice, as it can compensate for the effects of data clustering and give a better estimate of prediction error. More specifically, we use the Ordinary Kriging model with a noise term. As test accuracy takes values in <inline-formula id="j_nejsds26_ineq_030"><alternatives><mml:math>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$(0,1)$]]></tex-math></alternatives></inline-formula>, we consider the Arcsine Transformation for test accuracy when building the model.</p>
<p>Essentially, the model is built on training data consisting of the hyperparameter combinations selected by a given design and their corresponding accuracies. Ideally, it would then be evaluated on test data covering as many diverse hyperparameter combinations as possible over the whole hyperparameter region. This, however, is infeasible, especially for the continuous hyperparameters. In this paper, we consider two extreme cases – generating test data from the <inline-formula id="j_nejsds26_ineq_031"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial and 243-run random LHD. The random LHD enjoys the maximum space-filling property in each one-dimensional projection, while the <inline-formula id="j_nejsds26_ineq_032"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial covers the entire 5-dimensional input space in a uniform fashion. The root mean square error (RMSE) values for predicting the mean test accuracy on these two designs are listed in Table <xref rid="j_nejsds26_tab_002">2</xref>. Table <xref rid="j_nejsds26_tab_002">2</xref> also includes results on the hybrid design combining the <inline-formula id="j_nejsds26_ineq_033"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial with the 243-run random LHD.</p>
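A minimal sketch of how such test RMSE values arise. Here the correlation parameters are held fixed for illustration (the paper estimates them by maximum likelihood), and we assume the usual product form of the one-dimensional Matern correlations; the constant mean is estimated crudely by the sample mean rather than generalized least squares:

```python
import numpy as np

def matern52_corr(A, B, theta):
    """Product over dimensions of one-dimensional Matern 5/2 correlations."""
    h = np.abs(A[:, None, :] - B[None, :, :])  # pairwise |u_i - v_i| per dimension
    r = np.sqrt(5.0) * h / theta
    return np.prod((1.0 + r + r ** 2 / 3.0) * np.exp(-r), axis=-1)

def krige_predict(X, y, Xnew, theta, sigma2=1.0, tau2=1e-8):
    """Constant-mean Kriging prediction with nugget tau2 (stand-in for Ordinary Kriging)."""
    K = sigma2 * matern52_corr(X, X, theta) + tau2 * np.eye(len(X))
    mu = y.mean()  # crude constant-mean estimate
    w = np.linalg.solve(K, y - mu)
    return mu + sigma2 * matern52_corr(Xnew, X, theta) @ w

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))
```

In the paper's setup, X would be a training design (e.g., the 34-run OACD), Xnew a test design (e.g., the 3^5 factorial), and y the transformed test accuracies.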
<table-wrap id="j_nejsds26_tab_002">
<label>Table 2</label>
<caption>
<p>Test RMSE values on <inline-formula id="j_nejsds26_ineq_034"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, 243-run random LHD and their hybrid, respectively, where Maxpro LHD represents Maximum Projection LHD.</p>
</caption>
<table>
<thead>
<tr>
<td rowspan="2" style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin"><bold>Training Designs</bold></td>
<td colspan="3" style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin"><bold>Testing Designs</bold></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin"><inline-formula id="j_nejsds26_ineq_035"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> Factorial</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">243-run Random LHD</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin"><inline-formula id="j_nejsds26_ineq_036"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> Factorial + 243-run Random LHD</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: right"><inline-formula id="j_nejsds26_ineq_037"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${2^{5}}$]]></tex-math></alternatives></inline-formula> Factorial</td>
<td style="vertical-align: top; text-align: center">0.131</td>
<td style="vertical-align: top; text-align: center">0.223</td>
<td style="vertical-align: top; text-align: center">0.183</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">34-run OACD</td>
<td style="vertical-align: top; text-align: center">0.092</td>
<td style="vertical-align: top; text-align: center">0.112</td>
<td style="vertical-align: top; text-align: center">0.102</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin"><inline-formula id="j_nejsds26_ineq_038"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> Factorial</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.035</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.090</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.068</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: right">32-run SOA</td>
<td style="vertical-align: top; text-align: center">0.118</td>
<td style="vertical-align: top; text-align: center">0.068</td>
<td style="vertical-align: top; text-align: center">0.097</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">32-run SOA LHD</td>
<td style="vertical-align: top; text-align: center">0.164</td>
<td style="vertical-align: top; text-align: center">0.083</td>
<td style="vertical-align: top; text-align: center">0.130</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">32-run Maximin LHD</td>
<td style="vertical-align: top; text-align: center">0.175</td>
<td style="vertical-align: top; text-align: center">0.067</td>
<td style="vertical-align: top; text-align: center">0.133</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">32-run Maxpro LHD</td>
<td style="vertical-align: top; text-align: center">0.148</td>
<td style="vertical-align: top; text-align: center">0.072</td>
<td style="vertical-align: top; text-align: center">0.116</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">32-run Uniform LHD</td>
<td style="vertical-align: top; text-align: center">0.200</td>
<td style="vertical-align: top; text-align: center">0.076</td>
<td style="vertical-align: top; text-align: center">0.151</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right">32-run Random LHD</td>
<td style="vertical-align: top; text-align: center">0.160</td>
<td style="vertical-align: top; text-align: center">0.097</td>
<td style="vertical-align: top; text-align: center">0.132</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin">243-run Random LHD</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.144</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.027</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.104</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>When tested on the <inline-formula id="j_nejsds26_ineq_039"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, Table <xref rid="j_nejsds26_tab_002">2</xref> shows that a three-level design, the 34-run OACD, outperforms all other small designs. This performance is expected, because the 34 runs of the OACD are taken from the <inline-formula id="j_nejsds26_ineq_040"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial. In other words, a subset of the test data is used to train the Kriging model in this case. Surprisingly, although the SOA is tested on a different type of design, the <inline-formula id="j_nejsds26_ineq_041"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, it clearly outperforms all other small designs except the OACD. Notably, both the OACD and SOA have smaller RMSE than the 243-run LHD when tested on the <inline-formula id="j_nejsds26_ineq_042"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial. Moreover, the model on the <inline-formula id="j_nejsds26_ineq_043"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial produces the smallest test RMSE, because the test RMSE is indeed the training RMSE.</p>
<p>Considering the test data generated from the 243-run random LHD, the results in Table <xref rid="j_nejsds26_tab_002">2</xref> clearly show that all factorials are worse than the space-filling designs and that, when the Kriging model is built on the 243-run random LHD, the test RMSE is the training RMSE and thus the smallest. Though the maximin LHD produces a test RMSE as small as that of the SOA in this case, its performance when tested on the <inline-formula id="j_nejsds26_ineq_044"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial is not as good as that of the SOA – the SOA performs exceptionally well compared with the other small space-filling designs.</p>
<p>Both the OACD and SOA perform exceptionally well when tested on the same type of design, the <inline-formula id="j_nejsds26_ineq_045"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial and 243-run random LHD, respectively. Their performance on the other type of design, however, differs considerably. The SOA remains very competitive on the <inline-formula id="j_nejsds26_ineq_046"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, as it produces the smallest test RMSE among all comparable designs except the OACD. On the 243-run random LHD, however, the OACD performs worse than every design except the <inline-formula id="j_nejsds26_ineq_047"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${2^{5}}$]]></tex-math></alternatives></inline-formula> factorial. The SOA is a clear winner among all the small designs we considered. The numerical results in the last column of Table <xref rid="j_nejsds26_tab_002">2</xref> further support this conclusion. In other words, for the setting we considered, this SOA allows us to efficiently collect the most informative data for building a Kriging model that specifies the relationship between hyperparameters and test accuracy. The SOA of strength three we use is indeed optimal in a certain sense – Theorem 4 of [<xref ref-type="bibr" rid="j_nejsds26_ref_040">40</xref>] implies that it is optimal under the uniform projection criterion. Moreover, it enjoys better space-filling properties in all two-dimensional projections than an ordinary SOA of strength three. Notably, the model built on the <inline-formula id="j_nejsds26_ineq_048"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${2^{5}}$]]></tex-math></alternatives></inline-formula> design produces an extremely large test RMSE value when tested on the hybrid design, which implies that this two-level design is inadequate for building a Kriging model to capture the true surface of test accuracy.</p>
<fig id="j_nejsds26_fig_003">
<label>Figure 3</label>
<caption>
<p>Density plots of test accuracy values.</p>
</caption>
<graphic xlink:href="nejsds26_g003.jpg"/>
</fig>
<p>Figure <xref rid="j_nejsds26_fig_003">3</xref> and Figure <xref rid="j_nejsds26_fig_004">4</xref> further support the superior performance of this SOA. Figure <xref rid="j_nejsds26_fig_003">3</xref> displays the density plots of the observed test accuracy values for all designs. Each density plot in the first two rows of Figure <xref rid="j_nejsds26_fig_003">3</xref> corresponds to a type of design that is used to generate the training data. The density curve of observed test accuracy values for the hybrid design combining the <inline-formula id="j_nejsds26_ineq_049"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial and 243-run random LHD is presented in the last plot of Figure <xref rid="j_nejsds26_fig_003">3</xref>. It is bimodal, with a higher peak within <inline-formula id="j_nejsds26_ineq_050"><alternatives><mml:math>
<mml:mo fence="true" stretchy="false">[</mml:mo>
<mml:mn>0.8</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo fence="true" stretchy="false">]</mml:mo></mml:math><tex-math><![CDATA[$[0.8,1]$]]></tex-math></alternatives></inline-formula> and a lower peak of no more than 0.4. Clearly, only the SOA correctly captures this feature while all others fail to do so. We also provide the density curves of observed test accuracy values for the <inline-formula id="j_nejsds26_ineq_051"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial and 243-run random LHD, respectively, in the last row for reference. Figure <xref rid="j_nejsds26_fig_004">4</xref> gives the histograms of the test errors, the differences between the observed and predicted test accuracy on the hybrid design. The histograms for the LHDs are skewed to the left, which implies that the models based on these designs tend to overpredict the test accuracy on the <inline-formula id="j_nejsds26_ineq_052"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, while models from the <inline-formula id="j_nejsds26_ineq_053"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${2^{5}}$]]></tex-math></alternatives></inline-formula> factorial and OACD tend to underpredict the test accuracy on the 243-run LHD, as their histograms are skewed to the right. The SOA stands out because its histogram is approximately normal and symmetric around 0.</p>
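The visual skewness reading of Figure 4 can be quantified with a simple moment-based statistic (our naming; with errors taken as observed minus predicted, a negative value indicates left skew, i.e., overprediction):

```python
import numpy as np

def sample_skewness(errors):
    """Moment-based sample skewness: negative => left-skewed, positive => right-skewed."""
    e = np.asarray(errors, dtype=float)
    d = e - e.mean()
    return float((d ** 3).mean() / ((d ** 2).mean() ** 1.5))
```

A value near zero, as for the SOA's error histogram, is consistent with symmetric errors around 0.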
<fig id="j_nejsds26_fig_004">
<label>Figure 4</label>
<caption>
<p>Histograms of test errors, differences between the observed test accuracy and the predicted test accuracy.</p>
</caption>
<graphic xlink:href="nejsds26_g004.jpg"/>
</fig>
<fig id="j_nejsds26_fig_005">
<label>Figure 5</label>
<caption>
<p>Histograms of distances from design points to design centers.</p>
</caption>
<graphic xlink:href="nejsds26_g005.jpg"/>
</fig>
<p>The significant advantage of the SOA in building a Kriging model prompts us to take a closer look at these small designs themselves. For all designs, each column is rescaled to <inline-formula id="j_nejsds26_ineq_054"><alternatives><mml:math>
<mml:mo fence="true" stretchy="false">[</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo fence="true" stretchy="false">]</mml:mo></mml:math><tex-math><![CDATA[$[-1,1]$]]></tex-math></alternatives></inline-formula>, and we then calculate the Euclidean distance from each design point to the center of the design and make a histogram with a density curve for the distances of each design. These plots are given in Figure <xref rid="j_nejsds26_fig_005">5</xref>. Histograms of the distances for the <inline-formula id="j_nejsds26_ineq_055"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${3^{5}}$]]></tex-math></alternatives></inline-formula> factorial, 243-run random LHD and their hybrid design, provided in the last row of Figure <xref rid="j_nejsds26_fig_005">5</xref>, serve as benchmarks for comparison. Only the SOA captures the feature of the hybrid design that the density has a light lower tail but a heavy upper tail, while all the other designs fail to do so. Remarkably, although a space-filling design is expected to scatter its points uniformly over the design region, the space-filling designs we considered appear less space-filling than expected: Figure <xref rid="j_nejsds26_fig_005">5</xref> shows that they provide no points close to the center or at the corners of the region.</p>
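The rescaling and distance computation above can be sketched as follows (assuming the design is stored as a runs-by-factors array; the function name is ours):

```python
import numpy as np

def center_distances(design):
    """Rescale each column of a design to [-1, 1], then return each point's
    Euclidean distance to the center of the design region (the origin)."""
    d = np.asarray(design, dtype=float)
    lo, hi = d.min(axis=0), d.max(axis=0)
    scaled = 2.0 * (d - lo) / (hi - lo) - 1.0
    return np.linalg.norm(scaled, axis=1)
```

Histogramming these distances for each design reproduces the comparison in Figure 5: corner points attain the maximum distance √k in k dimensions, while points near the center have distance close to 0.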
<p>In addition to using the Kriging model to determine the relationship between hyperparameters and test accuracy, we also consider a second-order (quadratic) model. Although the findings are similar to the above, the second-order model cannot adequately capture the test accuracy surface, as its test RMSE values are consistently larger than those of the Kriging model for all cases listed in Table <xref rid="j_nejsds26_tab_002">2</xref>.</p>
</sec>
</sec>
<sec id="j_nejsds26_s_010">
<label>4</label>
<title>Conclusion and Discussion</title>
<p>This paper compared small designs for exploring the relationship between hyperparameters and test accuracy. We considered five numerical hyperparameters: learning rate, number of epochs, batch size, dropout rate and number of neurons in a hidden layer. Various factorials and space-filling designs for selecting combinations of these hyperparameters were examined. We evaluated the performance of each design by building a Kriging model that describes the relationship between hyperparameters and test accuracy. The comparison was based on the MNIST dataset, with a deep neural network with three hidden layers as the learning algorithm. Under the settings we considered, the comparison shows that the 32-run SOA is the best choice for exploring the test accuracy surface over the hyperparameters we used.</p>
<p>To further explore the usefulness of SOAs in hyperparameter optimization, more investigation is needed. For example, one could set up simulations to evaluate the performance of various designs, similar to the simulation in [<xref ref-type="bibr" rid="j_nejsds26_ref_042">42</xref>]; apply the methodology to other datasets, such as the CIFAR-10 dataset [<xref ref-type="bibr" rid="j_nejsds26_ref_024">24</xref>]; or consider broader comparison settings, including more designs (e.g., uniform projection designs [<xref ref-type="bibr" rid="j_nejsds26_ref_041">41</xref>]) and more hyperparameters (e.g., the number of training iterations, normalization and weight initialization). The present paper centers on the effect of the design on the performance of hyperparameter tuning, whereas the primary goal of hyperparameter optimization is to find a hyperparameter combination that maximizes the overall performance of a learning algorithm. In follow-up work, one may therefore wish to use a carefully selected design in the initial step of a model-based hyperparameter tuning procedure. Moreover, in this study the SOA that outperforms all other comparable designs has 8 levels, while the other space-filling designs have 32 levels and the factorials have no more than 3 levels. It would be interesting to investigate how many levels a design should have for studying the test accuracy surface over hyperparameters. We leave all these questions to future research.</p>
</sec>
</body>
<back>
<ref-list id="j_nejsds26_reflist_001">
<title>References</title>
<ref id="j_nejsds26_ref_001">
<label>[1]</label><mixed-citation publication-type="journal"> <string-name><surname>Ba</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Myers</surname>, <given-names>W. R.</given-names></string-name> and <string-name><surname>Brenneman</surname>, <given-names>W. A.</given-names></string-name> (<year>2015</year>). <article-title>Optimal sliced Latin hypercube designs</article-title>. <source>Technometrics</source> <volume>57</volume> <fpage>479</fpage>–<lpage>487</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/00401706.2014.957867" xlink:type="simple">https://doi.org/10.1080/00401706.2014.957867</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3425485">MR3425485</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_002">
<label>[2]</label><mixed-citation publication-type="journal"> <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name> (<year>2009</year>). <article-title>Learning deep architectures for AI</article-title>. <source>Foundations and Trends in Machine Learning</source> <volume>2</volume> <fpage>1</fpage>–<lpage>127</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_003">
<label>[3]</label><mixed-citation publication-type="journal"> <string-name><surname>Bergstra</surname>, <given-names>J.</given-names></string-name> and <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name> (<year>2012</year>). <article-title>Random search for hyper-parameter optimization</article-title>. <source>Journal of Machine Learning Research</source> <volume>13</volume> <fpage>281</fpage>–<lpage>305</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2913701">MR2913701</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_004">
<label>[4]</label><mixed-citation publication-type="journal"> <string-name><surname>Bergstra</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bardenet</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Kégl</surname>, <given-names>B.</given-names></string-name> (<year>2011</year>). <article-title>Algorithms for hyper-parameter optimization</article-title>. <source>Advances in Neural Information Processing Systems</source> <volume>24</volume> <fpage>2546</fpage>–<lpage>2554</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_005">
<label>[5]</label><mixed-citation publication-type="journal"> <string-name><surname>Bingham</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Sitter</surname>, <given-names>R. R.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2009</year>). <article-title>Orthogonal and nearly orthogonal designs for computer experiments</article-title>. <source>Biometrika</source> <volume>96</volume> <fpage>51</fpage>–<lpage>65</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asn057" xlink:type="simple">https://doi.org/10.1093/biomet/asn057</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2482134">MR2482134</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_006">
<label>[6]</label><mixed-citation publication-type="other"> <string-name><surname>Carnell</surname>, <given-names>R.</given-names></string-name> (<year>2022</year>). lhs: Latin hypercube samples. R package version 1.1.5. <uri>https://cran.r-project.org/web/packages/lhs/index.html</uri>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_007">
<label>[7]</label><mixed-citation publication-type="book"> <string-name><surname>Cheng</surname>, <given-names>C. S.</given-names></string-name> (<year>2014</year>) <source>Theory of factorial design: single- and multi-stratum experiments</source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_008">
<label>[8]</label><mixed-citation publication-type="book"> <string-name><surname>Cressie</surname>, <given-names>N.</given-names></string-name> (<year>2015</year>) <source>Statistics for spatial data</source>. <publisher-name>John Wiley &amp; Sons</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3559472">MR3559472</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_009">
<label>[9]</label><mixed-citation publication-type="chapter"> <string-name><surname>Falkner</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Klein</surname>, <given-names>A.</given-names></string-name> and <string-name><surname>Hutter</surname>, <given-names>F.</given-names></string-name> (<year>2018</year>). <chapter-title>BOHB: robust and efficient hyperparameter optimization at scale</chapter-title>. In <source>International Conference on Machine Learning</source> <volume>80</volume> <fpage>1437</fpage>–<lpage>1446</lpage>. <publisher-name>PMLR</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_010">
<label>[10]</label><mixed-citation publication-type="book"> <string-name><surname>Fang</surname>, <given-names>K. T.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>R.</given-names></string-name> and <string-name><surname>Sudjianto</surname>, <given-names>A.</given-names></string-name> (<year>2006</year>) <source>Design and modeling for computer experiments</source>. <publisher-name>CRC Press</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2510302">MR2510302</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_011">
<label>[11]</label><mixed-citation publication-type="journal"> <string-name><surname>Fang</surname>, <given-names>K. T.</given-names></string-name>, <string-name><surname>Lin</surname>, <given-names>D. K.</given-names></string-name>, <string-name><surname>Winker</surname>, <given-names>P.</given-names></string-name> and <string-name><surname>Zhang</surname>, <given-names>Y.</given-names></string-name> (<year>2000</year>). <article-title>Uniform design: theory and application</article-title>. <source>Technometrics</source> <volume>42</volume> <fpage>237</fpage>–<lpage>248</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/1271079" xlink:type="simple">https://doi.org/10.2307/1271079</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=1801031">MR1801031</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_012">
<label>[12]</label><mixed-citation publication-type="book"> <string-name><surname>Fang</surname>, <given-names>K. T.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>M. Q.</given-names></string-name>, <string-name><surname>Qin</surname>, <given-names>H.</given-names></string-name> and <string-name><surname>Zhou</surname>, <given-names>Y.</given-names></string-name> (<year>2018</year>) <source>Theory and application of uniform experimental designs</source>. <publisher-name>Springer</publisher-name>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-981-13-2041-5" xlink:type="simple">https://doi.org/10.1007/978-981-13-2041-5</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3837569">MR3837569</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_013">
<label>[13]</label><mixed-citation publication-type="chapter"> <string-name><surname>Feurer</surname>, <given-names>M.</given-names></string-name> and <string-name><surname>Hutter</surname>, <given-names>F.</given-names></string-name> (<year>2019</year>). <chapter-title>Hyperparameter optimization</chapter-title>. In <source>Automated Machine Learning</source> <fpage>3</fpage>–<lpage>33</lpage> <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_014">
<label>[14]</label><mixed-citation publication-type="journal"> <string-name><surname>Ginsbourger</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Dupuy</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Badea</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Carraro</surname>, <given-names>L.</given-names></string-name> and <string-name><surname>Roustant</surname>, <given-names>O.</given-names></string-name> (<year>2009</year>). <article-title>A note on the choice and the estimation of kriging models for the analysis of deterministic computer experiments</article-title>. <source>Applied Stochastic Models in Business and Industry</source> <volume>25</volume> <fpage>115</fpage>–<lpage>131</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/asmb.741" xlink:type="simple">https://doi.org/10.1002/asmb.741</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2510851">MR2510851</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_015">
<label>[15]</label><mixed-citation publication-type="other"> <string-name><surname>Groemping</surname>, <given-names>U.</given-names></string-name> and <string-name><surname>Carnell</surname>, <given-names>R.</given-names></string-name> (<year>2022</year>). SOAs: creation of stratum orthogonal arrays. R package version 1.3. <uri>https://cran.r-project.org/web/packages/SOAs/index.html</uri>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_016">
<label>[16]</label><mixed-citation publication-type="other"> <string-name><surname>Groemping</surname>, <given-names>U.</given-names></string-name>, <string-name><surname>Amarov</surname>, <given-names>B.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name> (<year>2022</year>). DoE.base: full factorials, orthogonal arrays and base utilities for DoE packages. R package version 1.2-1. <uri>https://cran.r-project.org/web/packages/DoE.base/index.html</uri>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_017">
<label>[17]</label><mixed-citation publication-type="journal"> <string-name><surname>He</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2013</year>). <article-title>Strong orthogonal arrays and associated Latin hypercubes for computer experiments</article-title>. <source>Biometrika</source> <volume>100</volume> <fpage>254</fpage>–<lpage>260</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/ass065" xlink:type="simple">https://doi.org/10.1093/biomet/ass065</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3034340">MR3034340</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_018">
<label>[18]</label><mixed-citation publication-type="journal"> <string-name><surname>He</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>C.-S.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2018</year>). <article-title>Strong orthogonal arrays of strength two plus</article-title>. <source>The Annals of Statistics</source> <volume>46</volume> <fpage>457</fpage>–<lpage>468</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/17-AOS1555" xlink:type="simple">https://doi.org/10.1214/17-AOS1555</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3782373">MR3782373</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_019">
<label>[19]</label><mixed-citation publication-type="chapter"> <string-name><surname>Hutter</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Hoos</surname>, <given-names>H. H.</given-names></string-name> and <string-name><surname>Leyton-Brown</surname>, <given-names>K.</given-names></string-name> (<year>2011</year>). <chapter-title>Sequential model-based optimization for general algorithm configuration</chapter-title>. In <source>International Conference on Learning and Intelligent Optimization</source> <fpage>507</fpage>–<lpage>523</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_020">
<label>[20]</label><mixed-citation publication-type="journal"> <string-name><surname>Johnson</surname>, <given-names>M. E.</given-names></string-name>, <string-name><surname>Moore</surname>, <given-names>L. M.</given-names></string-name> and <string-name><surname>Ylvisaker</surname>, <given-names>D.</given-names></string-name> (<year>1990</year>). <article-title>Minimax and maximin distance designs</article-title>. <source>Journal of Statistical Planning and Inference</source> <volume>26</volume> <fpage>131</fpage>–<lpage>148</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/0378-3758(90)90122-B" xlink:type="simple">https://doi.org/10.1016/0378-3758(90)90122-B</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=1079258">MR1079258</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_021">
<label>[21]</label><mixed-citation publication-type="journal"> <string-name><surname>Joseph</surname>, <given-names>V. R.</given-names></string-name>, <string-name><surname>Gul</surname>, <given-names>E.</given-names></string-name> and <string-name><surname>Ba</surname>, <given-names>S.</given-names></string-name> (<year>2015</year>). <article-title>Maximum projection designs for computer experiments</article-title>. <source>Biometrika</source> <volume>102</volume> <fpage>371</fpage>–<lpage>380</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asv002" xlink:type="simple">https://doi.org/10.1093/biomet/asv002</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3371010">MR3371010</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_022">
<label>[22]</label><mixed-citation publication-type="journal"> <string-name><surname>Kleijnen</surname>, <given-names>J. P.</given-names></string-name> (<year>2009</year>). <article-title>Kriging metamodeling in simulation: a review</article-title>. <source>European Journal of Operational Research</source> <volume>192</volume> <fpage>707</fpage>–<lpage>716</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.ejor.2007.10.013" xlink:type="simple">https://doi.org/10.1016/j.ejor.2007.10.013</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2457613">MR2457613</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_023">
<label>[23]</label><mixed-citation publication-type="journal"> <string-name><surname>Krige</surname>, <given-names>D. G.</given-names></string-name> (<year>1951</year>). <article-title>A statistical approach to some basic mine valuation problems on the Witwatersrand</article-title>. <source>Journal of the Southern African Institute of Mining and Metallurgy</source> <volume>52</volume> <fpage>119</fpage>–<lpage>139</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_024">
<label>[24]</label><mixed-citation publication-type="other"> <string-name><surname>Krizhevsky</surname>, <given-names>A.</given-names></string-name> (2009). Learning multiple layers of features from tiny images. Technical Report, University of Toronto. <uri>http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf</uri>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_025">
<label>[25]</label><mixed-citation publication-type="journal"> <string-name><surname>LeCun</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bottou</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Haffner</surname>, <given-names>P.</given-names></string-name> (<year>1998</year>). <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proceedings of the IEEE</source> <volume>86</volume> <fpage>2278</fpage>–<lpage>2324</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_026">
<label>[26]</label><mixed-citation publication-type="journal"> <string-name><surname>Li</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Jamieson</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>DeSalvo</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Rostamizadeh</surname>, <given-names>A.</given-names></string-name> and <string-name><surname>Talwalkar</surname>, <given-names>A.</given-names></string-name> (<year>2017</year>). <article-title>Hyperband: a novel bandit-based approach to hyperparameter optimization</article-title>. <source>The Journal of Machine Learning Research</source> <volume>18</volume> <fpage>6765</fpage>–<lpage>6816</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3827073">MR3827073</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_027">
<label>[27]</label><mixed-citation publication-type="journal"> <string-name><surname>Lin</surname>, <given-names>C. D.</given-names></string-name>, <string-name><surname>Mukerjee</surname>, <given-names>R.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2009</year>). <article-title>Construction of orthogonal and nearly orthogonal Latin hypercubes</article-title>. <source>Biometrika</source> <volume>96</volume> <fpage>243</fpage>–<lpage>247</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asn064" xlink:type="simple">https://doi.org/10.1093/biomet/asn064</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2482150">MR2482150</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_028">
<label>[28]</label><mixed-citation publication-type="journal"> <string-name><surname>Liu</surname>, <given-names>H.</given-names></string-name> and <string-name><surname>Liu</surname>, <given-names>M. Q.</given-names></string-name> (<year>2015</year>). <article-title>Column-orthogonal strong orthogonal arrays and sliced strong orthogonal arrays</article-title>. <source>Statistica Sinica</source> <fpage>1713</fpage>–<lpage>1734</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3409089">MR3409089</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_029">
<label>[29]</label><mixed-citation publication-type="book"> <string-name><surname>Livingstone</surname>, <given-names>D. J.</given-names></string-name> (<year>2008</year>) <source>Artificial neural networks: methods and applications</source>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_030">
<label>[30]</label><mixed-citation publication-type="journal"> <string-name><surname>Lujan-Moreno</surname>, <given-names>G. A.</given-names></string-name>, <string-name><surname>Howard</surname>, <given-names>P. R.</given-names></string-name>, <string-name><surname>Rojas</surname>, <given-names>O. G.</given-names></string-name> and <string-name><surname>Montgomery</surname>, <given-names>D. C.</given-names></string-name> (<year>2018</year>). <article-title>Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study</article-title>. <source>Expert Systems with Applications</source> <volume>109</volume> <fpage>195</fpage>–<lpage>205</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_031">
<label>[31]</label><mixed-citation publication-type="journal"> <string-name><surname>McCulloch</surname>, <given-names>W. S.</given-names></string-name> and <string-name><surname>Pitts</surname>, <given-names>W.</given-names></string-name> (<year>1943</year>). <article-title>A logical calculus of the ideas immanent in nervous activity</article-title>. <source>The Bulletin of Mathematical Biophysics</source> <volume>5</volume> <fpage>115</fpage>–<lpage>133</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/bf02478259" xlink:type="simple">https://doi.org/10.1007/bf02478259</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=0010388">MR0010388</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_032">
<label>[32]</label><mixed-citation publication-type="journal"> <string-name><surname>McKay</surname>, <given-names>M. D.</given-names></string-name>, <string-name><surname>Beckman</surname>, <given-names>R. J.</given-names></string-name> and <string-name><surname>Conover</surname>, <given-names>W. J.</given-names></string-name> (<year>1979</year>). <article-title>A comparison of three methods for selecting values of input variables in the analysis of output from a computer code</article-title>. <source>Technometrics</source> <volume>21</volume> <fpage>239</fpage>–<lpage>245</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2307/1268522" xlink:type="simple">https://doi.org/10.2307/1268522</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=0533252">MR0533252</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_033">
<label>[33]</label><mixed-citation publication-type="book"> <string-name><surname>Mee</surname>, <given-names>R.</given-names></string-name> (<year>2009</year>) <source>A comprehensive guide to factorial two-level experimentation</source>. <publisher-name>Springer Science &amp; Business Media</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_034">
<label>[34]</label><mixed-citation publication-type="journal"> <string-name><surname>Mockus</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Tiesis</surname>, <given-names>V.</given-names></string-name> and <string-name><surname>Zilinskas</surname>, <given-names>A.</given-names></string-name> (<year>1978</year>). <article-title>The application of Bayesian methods for seeking the extremum</article-title>. <source>Towards Global Optimization</source> <volume>2</volume> <fpage>117</fpage>–<lpage>129</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=0471305">MR0471305</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_035">
<label>[35]</label><mixed-citation publication-type="journal"> <string-name><surname>Sacks</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Welch</surname>, <given-names>W. J.</given-names></string-name>, <string-name><surname>Mitchell</surname>, <given-names>T. J.</given-names></string-name> and <string-name><surname>Wynn</surname>, <given-names>H. P.</given-names></string-name> (<year>1989</year>). <article-title>Design and analysis of computer experiments</article-title>. <source>Statistical Science</source> <volume>4</volume> <fpage>409</fpage>–<lpage>423</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=1041765">MR1041765</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_036">
<label>[36]</label><mixed-citation publication-type="book"> <string-name><surname>Santner</surname>, <given-names>T. J.</given-names></string-name>, <string-name><surname>Williams</surname>, <given-names>B. J.</given-names></string-name> and <string-name><surname>Notz</surname>, <given-names>W. I.</given-names></string-name> (<year>2003</year>) <source>The design and analysis of computer experiments</source>. <publisher-name>Springer</publisher-name>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-1-4757-3799-8" xlink:type="simple">https://doi.org/10.1007/978-1-4757-3799-8</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2160708">MR2160708</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_037">
<label>[37]</label><mixed-citation publication-type="journal"> <string-name><surname>Schmidhuber</surname>, <given-names>J.</given-names></string-name> (<year>2015</year>). <article-title>Deep learning in neural networks: an overview</article-title>. <source>Neural Networks</source> <volume>61</volume> <fpage>85</fpage>–<lpage>117</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_038">
<label>[38]</label><mixed-citation publication-type="journal"> <string-name><surname>Shi</surname>, <given-names>C.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2020</year>). <article-title>Construction results for strong orthogonal arrays of strength three</article-title>. <source>Bernoulli</source> <volume>26</volume> <fpage>418</fpage>–<lpage>431</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3150/19-BEJ1130" xlink:type="simple">https://doi.org/10.3150/19-BEJ1130</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=4036039">MR4036039</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_039">
<label>[39]</label><mixed-citation publication-type="journal"> <string-name><surname>Snoek</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Larochelle</surname>, <given-names>H.</given-names></string-name> and <string-name><surname>Adams</surname>, <given-names>R. P.</given-names></string-name> (<year>2012</year>). <article-title>Practical Bayesian optimization of machine learning algorithms</article-title>. <source>Advances in Neural Information Processing Systems</source> <volume>25</volume>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_040">
<label>[40]</label><mixed-citation publication-type="journal"> <string-name><surname>Sun</surname>, <given-names>C.</given-names></string-name> and <string-name><surname>Tang</surname>, <given-names>B.</given-names></string-name> (<year>2021</year>). <article-title>Uniform projection designs and strong orthogonal arrays</article-title>. <source>Journal of the American Statistical Association</source> <volume>0</volume> <fpage>1</fpage>–<lpage>15</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/01621459.2021.1935268" xlink:type="simple">https://doi.org/10.1080/01621459.2021.1935268</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_041">
<label>[41]</label><mixed-citation publication-type="journal"> <string-name><surname>Sun</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name> (<year>2019</year>). <article-title>Uniform projection designs</article-title>. <source>The Annals of Statistics</source> <volume>47</volume> <fpage>641</fpage>–<lpage>661</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1214/18-AOS1705" xlink:type="simple">https://doi.org/10.1214/18-AOS1705</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3909945">MR3909945</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_042">
<label>[42]</label><mixed-citation publication-type="journal"> <string-name><surname>Tian</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name> (<year>2022</year>). <article-title>A minimum aberration-type criterion for selecting space-filling designs</article-title>. <source>Biometrika</source> <volume>109</volume> <fpage>489</fpage>–<lpage>501</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/asab021" xlink:type="simple">https://doi.org/10.1093/biomet/asab021</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=4430970">MR4430970</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_043">
<label>[43]</label><mixed-citation publication-type="chapter"> <string-name><surname>Van Rijn</surname>, <given-names>J. N.</given-names></string-name> and <string-name><surname>Hutter</surname>, <given-names>F.</given-names></string-name> (<year>2018</year>). <chapter-title>Hyperparameter importance across datasets</chapter-title>. In <source>Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source> <fpage>2367</fpage>–<lpage>2376</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_044">
<label>[44]</label><mixed-citation publication-type="book"> <string-name><surname>Wackernagel</surname>, <given-names>H.</given-names></string-name> (<year>2003</year>) <source>Multivariate geostatistics: an introduction with applications</source>. <publisher-name>Springer Science &amp; Business Media</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_045">
<label>[45]</label><mixed-citation publication-type="book"> <string-name><surname>Wu</surname>, <given-names>C. F. J.</given-names></string-name> and <string-name><surname>Hamada</surname>, <given-names>M. S.</given-names></string-name> (<year>2009</year>) <source>Experiments: planning, analysis, and optimization</source>. <publisher-name>John Wiley &amp; Sons</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2583259">MR2583259</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_046">
<label>[46]</label><mixed-citation publication-type="journal"> <string-name><surname>Wu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>S.</given-names></string-name> and <string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name> (<year>2020</year>). <article-title>Efficient hyperparameter optimization through model-based reinforcement learning</article-title>. <source>Neurocomputing</source> <volume>409</volume> <fpage>381</fpage>–<lpage>393</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_047">
<label>[47]</label><mixed-citation publication-type="journal"> <string-name><surname>Xiao</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name> (<year>2019</year>). <article-title>Application of Kriging models for a drug combination experiment on lung cancer</article-title>. <source>Statistics in Medicine</source> <volume>38</volume> <fpage>236</fpage>–<lpage>246</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/sim.7971" xlink:type="simple">https://doi.org/10.1002/sim.7971</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3892817">MR3892817</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_048">
<label>[48]</label><mixed-citation publication-type="journal"> <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Jaynes</surname>, <given-names>J.</given-names></string-name> and <string-name><surname>Ding</surname>, <given-names>X.</given-names></string-name> (<year>2014</year>). <article-title>Combining two-level and three-level orthogonal arrays for factor screening and response surface exploration</article-title>. <source>Statistica Sinica</source> <volume>24</volume> <fpage>269</fpage>–<lpage>289</lpage>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=3183684">MR3183684</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds26_ref_049">
<label>[49]</label><mixed-citation publication-type="other"> <string-name><surname>Zhang</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Quan</surname>, <given-names>S.</given-names></string-name> and <string-name><surname>Yang</surname>, <given-names>Z.</given-names></string-name> (<year>2018</year>). UniDOE: uniform design of experiments. R package version 1.0.2. <uri>http://rmirror.lau.edu.lb/web/packages/UniDOE/index.html</uri>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_050">
<label>[50]</label><mixed-citation publication-type="chapter"> <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Yao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Ge</surname>, <given-names>C.</given-names></string-name> and <string-name><surname>Dong</surname>, <given-names>M.</given-names></string-name> (<year>2019</year>). <chapter-title>Deep neural network hyperparameter optimization with orthogonal array tuning</chapter-title>. In <source>International Conference on Neural Information Processing</source> <fpage>287</fpage>–<lpage>295</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds26_ref_051">
<label>[51]</label><mixed-citation publication-type="other"> <string-name><surname>Zoph</surname>, <given-names>B.</given-names></string-name> and <string-name><surname>Le</surname>, <given-names>Q. V.</given-names></string-name> (<year>2016</year>). Neural architecture search with reinforcement learning. arXiv preprint <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1611.01578">arXiv:1611.01578</ext-link>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
