<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">NEJSDS</journal-id>
<journal-title-group><journal-title>The New England Journal of Statistics in Data Science</journal-title></journal-title-group>
<issn pub-type="ppub">2693-7166</issn><issn-l>2693-7166</issn-l>
<publisher>
<publisher-name>New England Statistical Society</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">NEJSDS13EDI</article-id>
<article-id pub-id-type="doi">10.51387/23-NEJSDS13EDI</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Editorial</subject></subj-group>
</article-categories>
<title-group>
<article-title>Editorial. Design and Analysis of Experiments for Data Science</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>HaiYing</given-names></name><email xlink:href="mailto:haiying.wang@uconn.edu">haiying.wang@uconn.edu</email><xref ref-type="aff" rid="j_nejsds13edi_aff_004"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Deng</surname><given-names>Xinwei</given-names></name><email xlink:href="mailto:xdeng@vt.edu">xdeng@vt.edu</email><xref ref-type="aff" rid="j_nejsds13edi_aff_002"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Lin</surname><given-names>Devon</given-names></name><email xlink:href="mailto:devon.lin@queensu.ca">devon.lin@queensu.ca</email><xref ref-type="aff" rid="j_nejsds13edi_aff_003"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Ming-Hui</given-names></name><email xlink:href="mailto:ming-hui.chen@uconn.edu">ming-hui.chen@uconn.edu</email><xref ref-type="aff" rid="j_nejsds13edi_aff_001"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xie</surname><given-names>Min-ge</given-names></name><email xlink:href="mailto:mxie@stat.rutgers.edu">mxie@stat.rutgers.edu</email><xref ref-type="aff" rid="j_nejsds13edi_aff_005"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wu</surname><given-names>Jing</given-names></name><email xlink:href="mailto:jing_wu@uri.edu">jing_wu@uri.edu</email><xref ref-type="aff" rid="j_nejsds13edi_aff_006"/>
</contrib>
<aff id="j_nejsds13edi_aff_001">Department of Statistics, <institution>University of Connecticut</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:ming-hui.chen@uconn.edu">ming-hui.chen@uconn.edu</email></aff>
<aff id="j_nejsds13edi_aff_002">Department of Statistics, <institution>Virginia Tech</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:xdeng@vt.edu">xdeng@vt.edu</email></aff>
<aff id="j_nejsds13edi_aff_003">Department of Mathematics and Statistics, <institution>Queen’s University</institution>, <country>Canada</country>. E-mail address: <email xlink:href="mailto:devon.lin@queensu.ca">devon.lin@queensu.ca</email></aff>
<aff id="j_nejsds13edi_aff_004">Department of Statistics, <institution>University of Connecticut</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:haiying.wang@uconn.edu">haiying.wang@uconn.edu</email></aff>
<aff id="j_nejsds13edi_aff_005">Department of Statistics, <institution>Rutgers University</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:mxie@stat.rutgers.edu">mxie@stat.rutgers.edu</email></aff>
<aff id="j_nejsds13edi_aff_006">Department of Computer Science and Statistics, <institution>University of Rhode Island</institution>, <country>USA</country>. E-mail address: <email xlink:href="mailto:jing_wu@uri.edu">jing_wu@uri.edu</email></aff>
</contrib-group>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>27</day><month>11</month><year>2023</year></pub-date><volume>1</volume><issue>3</issue><fpage>297</fpage><lpage>298</lpage>
<permissions><copyright-statement>© 2023 New England Statistical Society</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
</article-meta>
</front>
<body>
<sec id="j_nejsds13edi_s_001">
<label>1</label>
<title>Introduction</title>
<p>We are pleased to introduce this special issue dedicated to “Design and Analysis of Experiments for Data Science”. The statistical methodology for the design and analysis of experiments has played a pivotal role in many scientific fields for over a century, guiding researchers to draw valid inferences from experiments while optimizing resource allocation. This enduring discipline remains a vibrant and dynamic field of modern research. The papers in this collection present cutting-edge research and innovative methodologies in experimental design. They provide valuable insights into the challenges of modern experimental design and demonstrate its pivotal role across domains, from clinical trials to online experimentation and beyond, in an increasingly data-driven world.</p>
</sec>
<sec id="j_nejsds13edi_s_002">
<label>2</label>
<title>Research Articles</title>
<p>Chen et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_002">2</xref>] showcase how Particle Swarm Optimization (PSO) can efficiently find optimal designs for longitudinal studies with diverse correlation structures and different models. Longitudinal studies present a unique set of challenges for experimental design, and the application of PSO in this context is a game-changer. The potential applications are far-reaching, from the Michaelis-Menten model to growth curve studies and HIV dynamic modeling. The optimization power of PSO opens up new avenues for the scientific community.</p>
<p>Clinical trialists often grapple with the need to balance statistical power, ethical considerations, and control over the type I error rate. Zhu et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_011">11</xref>] propose an adaptive seamless design (ASD) in conjunction with response adaptive randomization (RAR) to address these challenges. This innovative approach demonstrates how it is possible to achieve efficient and ethical objectives while maintaining control over the type I error rate, a crucial aspect in clinical trials.</p>
<p>Discovering significant factors in experiments with a large number of variables and limited observations is a common problem in data science. Qi and Chien [<xref ref-type="bibr" rid="j_nejsds13edi_ref_007">7</xref>] introduce a new approach to constructing supersaturated designs with low coherence, enhancing their usability for variable selection methods such as the Lasso. Real-world examples illustrate the practical value of this approach.</p>
<p>Hyperparameter tuning is a key factor in the success of deep learning models, but the computational expense involved can be prohibitive. Shi et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_009">9</xref>] delve into the exploration of hyperparameters and the collection of informative data for deep learning techniques. Their findings demonstrate that strong orthogonal arrays efficiently collect data that improves test accuracy, a critical measure of learning performance.</p>
<p>Innovative experiments are essential for businesses, and social network companies are at the forefront of testing new ideas and product changes. The unique challenges these experiments pose require specialized approaches. Bui et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_001">1</xref>] propose a general additive network effect (GANE) model, which provides a comprehensive framework for understanding treatment effects and network influence. The proposed power-degree specification showcases how specialized experiments can yield precise results, even in the face of model misspecification.</p>
<p>Crossover and interference models have diverse applications, but their complex nature has limited their practical utility. Hao et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_003">3</xref>] present an algorithm designed to efficiently generate crossover designs under various conditions. The inclusion of a user-friendly interface and an R package broadens its accessibility and usability in a wide range of applications.</p>
<p>Systems with both quantitative and qualitative responses require specialized experimental designs. Kang et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_004">4</xref>] introduce a Bayesian D-optimal design method that caters to such systems, providing an efficient approach to constructing both local and global D-optimal designs. The inclusion of prior distributions and a point-exchange search algorithm allows for meaningful interpretations and practical application in various contexts.</p>
<p>Personalized decision-making in controlled experiments is of growing interest, particularly in clinical trials and user behavior studies. Li et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_005">5</xref>] tackle the challenge of optimizing treatment allocation in the presence of observational covariates. The proposed method minimizes the variance of the estimated personalized treatment effects, enhancing precision in decision-making processes. Numerical studies validate the method’s quality and applicability.</p>
<p>Pronzato and Rendas [<xref ref-type="bibr" rid="j_nejsds13edi_ref_006">6</xref>] address integrated squared error (ISE) estimation for predicting unknown functions. They compute ISE estimators as weighted averages of predictor residuals at selected points, and show that minimizing the mean squared error of an ISE estimator is equivalent to minimizing a maximum mean discrepancy (MMD). Sequential Bayesian quadrature is utilized to create nested validation designs that minimize the MMD at each step. The optimal ISE estimate is expressed as the integral, over the function’s domain, of a linear reconstruction of the squared interpolator residuals. The validation designs retain a space-filling property, and numerical experiments validate the method’s strong performance and robustness.</p>
<p>Online experimentation frequently encounters incomplete metric data, making imputation an essential step for analysis. Shen et al. [<xref ref-type="bibr" rid="j_nejsds13edi_ref_008">8</xref>] introduce a clustering-based imputation method, considering both experiment-specific features and user activities, to improve the analysis of online experiments. This research lays the foundation for more efficient imputation of large-scale data in online experimentation, benefiting both simulations and real-world experiments.</p>
<p>The explosion of big data calls for efficient subdata selection methods that support valid inference while managing computational costs. Singh and Stufken [<xref ref-type="bibr" rid="j_nejsds13edi_ref_010">10</xref>] present the information-based optimal subdata selection method for selecting subdata with strong statistical properties in linear regression models. Combining the lasso with subdata selection extends the capabilities of the approach, offering an efficient solution for handling large datasets with many variables.</p>
</sec>
<sec id="j_nejsds13edi_s_003">
<label>3</label>
<title>Remark</title>
<p>We hope that this special issue, featuring eleven diverse and innovative research papers, will inspire further exploration and innovation in the field of experimental designs for data science. The methodologies and insights presented in these papers have the potential to transform how experiments are conducted and analyzed in our data-driven world. We invite you to delve into these papers to discover the latest advancements and ideas on experimental designs in data science.</p>
</sec>
</body>
<back>
<ack id="j_nejsds13edi_ack_001">
<title>Acknowledgements</title>
<p>We extend our gratitude to all the authors, reviewers, and contributors who have made this issue possible. Their dedication and expertise have ensured the quality and relevance of the papers presented.</p></ack>
<ref-list id="j_nejsds13edi_reflist_001">
<title>References</title>
<ref id="j_nejsds13edi_ref_001">
<label>[1]</label><mixed-citation publication-type="other"> <string-name><surname>Bui</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Steiner</surname>, <given-names>S.</given-names></string-name> and <string-name><surname>Stevens</surname>, <given-names>N.</given-names></string-name> (2023). General Additive Network Effect Models. <italic>The New England Journal of Statistics in Data Science</italic> 1–19. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS29" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS29</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_002">
<label>[2]</label><mixed-citation publication-type="other"> <string-name><surname>Chen</surname>, <given-names>P.-Y.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>R.-B.</given-names></string-name> and <string-name><surname>Wong</surname>, <given-names>W. K.</given-names></string-name> (2023). Particle Swarm Optimization for Finding Efficient Longitudinal Exact Designs for Nonlinear Models. <italic>The New England Journal of Statistics in Data Science</italic> 1–15. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS45" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS45</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_003">
<label>[3]</label><mixed-citation publication-type="other"> <string-name><surname>Hao</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>M.</given-names></string-name> and <string-name><surname>Zheng</surname>, <given-names>W.</given-names></string-name> (2023). Algorithm-Based Optimal and Efficient Exact Experimental Designs for Crossover and Interference Models. <italic>The New England Journal of Statistics in Data Science</italic> 1–10. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS41" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS41</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_004">
<label>[4]</label><mixed-citation publication-type="other"> <string-name><surname>Kang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>X.</given-names></string-name> and <string-name><surname>Jin</surname>, <given-names>R.</given-names></string-name> (2023). Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses. <italic>The New England Journal of Statistics in Data Science</italic> 1–15. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS30" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS30</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_005">
<label>[5]</label><mixed-citation publication-type="other"> <string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Khademi</surname>, <given-names>A.</given-names></string-name> and <string-name><surname>Yang</surname>, <given-names>B.</given-names></string-name> (2023). Optimal Design of Controlled Experiments for Personalized Decision Making in the Presence of Observational Covariates. <italic>The New England Journal of Statistics in Data Science</italic> 1–8. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS22" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS22</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_006">
<label>[6]</label><mixed-citation publication-type="other"> <string-name><surname>Pronzato</surname>, <given-names>L.</given-names></string-name> and <string-name><surname>Rendas</surname>, <given-names>M.-J.</given-names></string-name> (2023). Validation of Machine Learning Prediction Models. <italic>The New England Journal of Statistics in Data Science</italic> 1–21. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS50" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS50</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_007">
<label>[7]</label><mixed-citation publication-type="other"> <string-name><surname>Qi</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Chien</surname>, <given-names>P.</given-names></string-name> (2023). Construction of Supersaturated Designs with Small Coherence for Variable Selection. <italic>The New England Journal of Statistics in Data Science</italic> 1–11. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS34" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS34</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_008">
<label>[8]</label><mixed-citation publication-type="other"> <string-name><surname>Shen</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mao</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Nie</surname>, <given-names>K.</given-names></string-name> and <string-name><surname>Deng</surname>, <given-names>X.</given-names></string-name> (2023). Clustering-Based Imputation for Dropout Buyers in Large-Scale Online Experimentation. <italic>The New England Journal of Statistics in Data Science</italic> 1–11. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS33" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS33</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_009">
<label>[9]</label><mixed-citation publication-type="other"> <string-name><surname>Shi</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Chiu</surname>, <given-names>A. K.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>H.</given-names></string-name> (2023). Evaluating Designs for Hyperparameter Tuning in Deep Neural Networks. <italic>The New England Journal of Statistics in Data Science</italic> 1–8. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS26" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS26</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_010">
<label>[10]</label><mixed-citation publication-type="other"> <string-name><surname>Singh</surname>, <given-names>R.</given-names></string-name> and <string-name><surname>Stufken</surname>, <given-names>J.</given-names></string-name> (2023). Subdata Selection With a Large Number of Variables. <italic>The New England Journal of Statistics in Data Science</italic> 1–13. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS36" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS36</ext-link>.</mixed-citation>
</ref>
<ref id="j_nejsds13edi_ref_011">
<label>[11]</label><mixed-citation publication-type="other"> <string-name><surname>Zhu</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Lai</surname>, <given-names>D.</given-names></string-name> and <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name> (2023). Seamless Clinical Trials with Doubly Adaptive Biased Coin Designs. <italic>The New England Journal of Statistics in Data Science</italic> 1–9. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.51387/23-NEJSDS25" xlink:type="simple">https://doi.org/10.51387/23-NEJSDS25</ext-link>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
