<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">NEJSDS</journal-id>
<journal-title-group><journal-title>The New England Journal of Statistics in Data Science</journal-title></journal-title-group>
<issn pub-type="ppub">2693-7166</issn><issn-l>2693-7166</issn-l>
<publisher>
<publisher-name>New England Statistical Society</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">NEJSDS33</article-id>
<article-id pub-id-type="doi">10.51387/23-NEJSDS33</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Methodology Article</subject></subj-group>
<subj-group subj-group-type="area"><subject>Statistical Methodology</subject></subj-group>
</article-categories>
<title-group>
<article-title>Clustering-Based Imputation for Dropout Buyers in Large-Scale Online Experimentation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Shen</surname><given-names>Sumin</given-names></name><email xlink:href="mailto:sumshen@ebay.com">sumshen@ebay.com</email><xref ref-type="aff" rid="j_nejsds33_aff_001"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Mao</surname><given-names>Huiying</given-names></name><email xlink:href="mailto:humao@ebay.com">humao@ebay.com</email><xref ref-type="aff" rid="j_nejsds33_aff_002"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname><given-names>Zezhong</given-names></name><email xlink:href="mailto:zezzhang@ebay.com">zezzhang@ebay.com</email><xref ref-type="aff" rid="j_nejsds33_aff_003"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Zili</given-names></name><email xlink:href="mailto:zilchen@ebay.com">zilchen@ebay.com</email><xref ref-type="aff" rid="j_nejsds33_aff_004"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Nie</surname><given-names>Keyu</given-names></name><email xlink:href="mailto:keyunie@gmail.com">keyunie@gmail.com</email><xref ref-type="aff" rid="j_nejsds33_aff_005"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Deng</surname><given-names>Xinwei</given-names></name><email xlink:href="mailto:xdeng@vt.edu">xdeng@vt.edu</email><xref ref-type="aff" rid="j_nejsds33_aff_006"/><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<aff id="j_nejsds33_aff_001"><institution>eBay China Center of Excellence</institution>, Shanghai 201203, <country>China</country>. E-mail address: <email xlink:href="mailto:sumshen@ebay.com">sumshen@ebay.com</email></aff>
<aff id="j_nejsds33_aff_002"><institution>eBay Inc.</institution>, San Jose, CA 95125, <country>United States</country>. E-mail address: <email xlink:href="mailto:humao@ebay.com">humao@ebay.com</email></aff>
<aff id="j_nejsds33_aff_003"><institution>eBay Inc.</institution>, San Jose, CA 95125, <country>United States</country>. E-mail address: <email xlink:href="mailto:zezzhang@ebay.com">zezzhang@ebay.com</email></aff>
<aff id="j_nejsds33_aff_004"><institution>eBay China Center of Excellence</institution>, Shanghai 201203, <country>China</country>. E-mail address: <email xlink:href="mailto:zilchen@ebay.com">zilchen@ebay.com</email></aff>
<aff id="j_nejsds33_aff_005"><institution>eBay Inc.</institution>, San Jose, CA 95125, <country>United States</country>. E-mail address: <email xlink:href="mailto:keyunie@gmail.com">keyunie@gmail.com</email></aff>
<aff id="j_nejsds33_aff_006">Professor of Statistics, Co-Director of Statistics and Artificial Intelligence Laboratory, <institution>Virginia Tech</institution>, Blacksburg, VA 24061, <country>United States</country>. E-mail address: <email xlink:href="mailto:xdeng@vt.edu">xdeng@vt.edu</email></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2023</year></pub-date><pub-date pub-type="epub"><day>24</day><month>5</month><year>2023</year></pub-date><volume>1</volume><issue>3</issue><fpage>415</fpage><lpage>425</lpage><history><date date-type="accepted"><day>24</day><month>2</month><year>2023</year></date></history>
<permissions><copyright-statement>© 2023 New England Statistical Society</copyright-statement><copyright-year>2023</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics occur frequently in online experimentation, leaving much less usable data than planned in online experiments (e.g., A/B tests). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using <italic>k</italic>-nearest neighbors. The proposed imputation method considers both experiment-specific features and users’ activities along their shopping paths, allowing different imputed values for different users. To enable efficient imputation of the large-scale data sets typical of online experimentation, the proposed method combines stratification and clustering. Its performance is compared with that of several conventional methods in both simulation studies and a real online experiment at eBay.</p>
</abstract>
<kwd-group>
<label>Keywords and phrases</label>
<kwd>Experimentation</kwd>
<kwd>Metrics</kwd>
<kwd>Imputation</kwd>
<kwd>Clustering</kwd>
<kwd>A/B testing</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="j_nejsds33_s_001">
<label>1</label>
<title>Introduction</title>
<p>Online experimentation has played a key role in data-driven decision making in the IT industry, including at Microsoft [<xref ref-type="bibr" rid="j_nejsds33_ref_016">16</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_017">17</xref>], Google [<xref ref-type="bibr" rid="j_nejsds33_ref_029">29</xref>], LinkedIn [<xref ref-type="bibr" rid="j_nejsds33_ref_033">33</xref>], Netflix [<xref ref-type="bibr" rid="j_nejsds33_ref_032">32</xref>], Uber, eBay [<xref ref-type="bibr" rid="j_nejsds33_ref_023">23</xref>], and many other companies [<xref ref-type="bibr" rid="j_nejsds33_ref_009">9</xref>]. Generally, an online controlled experiment, also known as an A/B test, is run for a predetermined amount of time to compare metrics between a treatment group and a control group to which users are randomly assigned. Prior to experimentation, a set of high-quality metrics is chosen to assess the effects of new features in the treatment group. The collected metric results can provide strong evidence to support hypotheses and hence accelerate the decision-making process [<xref ref-type="bibr" rid="j_nejsds33_ref_002">2</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_004">4</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_019">19</xref>]. However, incomplete metrics occur frequently in online experimentation, leaving much less usable data than planned in an A/B test. In this work, we focus on the analysis of metrics whose measurements are incomplete at the end of data collection.</p>
<p>According to their positions in the shopping funnel, metrics can be categorized as top-, middle-, and bottom-funnel metrics. For instance, a successful purchase typically requires a user to take multiple steps, from the homepage at the top of the shopping funnel to the purchase page at the bottom. In online experimentation, it is common for millions of users to arrive at the top of the funnel (e.g., the homepage), while only a small percentage reach the bottom (e.g., the purchase page). Between the top and the bottom of the funnel, users navigate through multiple pages, on any of which they can exit the shopping process. There are numerous scenarios in which users exit the funnel, resulting in incomplete records of their purchases or other metrics. One common scenario is simply that each experiment runs for a limited duration. Keeping experiments running for a long period of time is expensive because of the high operational effort and business opportunity costs. When an experiment is closed, we stop tracking all users, even though some of them may not yet have completed their purchases. Such incompleteness, caused by the delay in collecting measurements for bottom-funnel metrics, is inevitable in experimentation. Users may also be lost to follow-up because of technical issues or user unavailability; for instance, users become unavailable for tracking when they switch from the desktop app to the mobile app. It is therefore essential to fill in incomplete metrics to improve metric quality, which leads to trustworthy results and better decisions.</p>
<p>With incomplete metric measurements, inference on the difference in metrics between the treatment and the control is at risk of being inaccurate [<xref ref-type="bibr" rid="j_nejsds33_ref_008">8</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_013">13</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_014">14</xref>]. A naive approach to analyzing experiments with missing metric values is to discard users with incomplete outcomes. This approach assumes that the outcomes are missing completely at random (MCAR), so that the fully observed users are representative of the entire population. It also reduces the total number of users in the study and therefore decreases the power of the experiment; the loss of power is substantial, especially when the proportion of missingness is high.</p>
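<p>To make the loss of power concrete, the following Python sketch computes the approximate power of a two-sided <italic>z</italic>-test for a difference in purchase rates when users with missing outcomes are discarded. It is a minimal illustration under stated assumptions, not part of the proposed method: outcomes are assumed MCAR, both arms are assumed to share a common baseline rate <monospace>p</monospace>, and the function names and numerical inputs are hypothetical.</p>
<preformat preformat-type="code">
```python
from statistics import NormalDist


def two_sample_power(p, delta, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes a common baseline rate p in both arms and uses the normal
    approximation; the negligible chance of rejecting in the wrong
    direction is ignored.
    """
    se = (2 * p * (1 - p) / n_per_arm) ** 0.5  # SE of the difference in rates
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(delta) / se - z_crit)


def complete_case_power(p, delta, n_per_arm, missing_rate, alpha=0.05):
    """Power after dropping users with missing outcomes (MCAR assumed)."""
    n_kept = n_per_arm * (1 - missing_rate)
    return two_sample_power(p, delta, n_kept, alpha)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline purchase rate, 0.2 percentage-point lift,
    # 100,000 users per arm before any outcomes go missing.
    for rate in (0.0, 0.2, 0.4, 0.6):
        power = complete_case_power(0.05, 0.002, 100_000, rate)
        print(f"missing rate {rate:.0%}: power {power:.3f}")
```
</preformat>
<p>Because the standard error scales with the inverse square root of the retained sample size, discarding a fraction <italic>r</italic> of users inflates it by a factor of 1/√(1 − <italic>r</italic>), so the power falls steadily as the missing rate grows.</p>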
<p>Various imputation methods have been developed to address missing data. A widely used one is single imputation, which fills in missing values with a single value, such as the mean of the observed outcomes, in both the treatment group and the control group. Single imputation preserves the full sample size, but it can distort the outcome distribution and underestimate uncertainty [<xref ref-type="bibr" rid="j_nejsds33_ref_028">28</xref>]. In addition, it disregards information from other variables observed along users’ journeys within the funnel. Other imputation methods have been developed for the missing at random (MAR) and missing not at random (MNAR) scenarios. MAR assumes that the missingness mechanism depends only on the observed variables [<xref ref-type="bibr" rid="j_nejsds33_ref_001">1</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_012">12</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_026">26</xref>]. Likelihood-based methods, such as generalized linear mixed models, have been developed for clinical trials with incomplete outcomes [<xref ref-type="bibr" rid="j_nejsds33_ref_022">22</xref>]; their performance depends on the degree to which the MAR assumption holds. Under MNAR, where the missingness is non-ignorable, the observed difference would be a biased estimate of the average treatment effect [<xref ref-type="bibr" rid="j_nejsds33_ref_022">22</xref>]. Regression-based methods, such as logistic regression, are used to model the missingness indicator [<xref ref-type="bibr" rid="j_nejsds33_ref_021">21</xref>]. Other prevalent methods, such as matching imputation, identify similar users based on a set of variables. In general, these imputation methods require that users with missing outcomes can be distinguished from users whose outcomes are zero. 
In other words, general imputation methods are often inappropriate for online experimentation scenarios in which an unrecorded outcome may represent either a genuinely missing value or a true zero.</p>
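<p>The variance-shrinking effect of single imputation can be seen in a small Python sketch. It fills missing values with the mean of the observed values and compares the sample standard deviation before and after. The purchase amounts are synthetic exponential draws, and the 40% missing rate (completely at random) is an arbitrary assumption chosen for illustration.</p>
<preformat preformat-type="code">
```python
import random
import statistics


def mean_impute(values):
    """Single imputation: replace each None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def summarize(values):
    """Return (sample size, mean, sample standard deviation)."""
    return len(values), statistics.fmean(values), statistics.stdev(values)


if __name__ == "__main__":
    rng = random.Random(2023)
    # Synthetic purchase amounts with mean 50; about 40% set to missing (MCAR).
    amounts = [rng.expovariate(1 / 50) for _ in range(10_000)]
    with_missing = [None if rng.random() >= 0.6 else a for a in amounts]

    observed = [a for a in with_missing if a is not None]
    imputed = mean_impute(with_missing)
    print("observed only:", summarize(observed))
    print("mean-imputed: ", summarize(imputed))
```
</preformat>
<p>The imputed sample keeps the full size and (up to rounding) the same mean, but every filled-in value sits exactly at the center. The sum of squared deviations is unchanged while the denominator grows, so the sample standard deviation, and any standard error computed from it, is smaller than the observed-data estimate.</p>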
<p>To address these challenges, we propose a clustering-based imputation method using <italic>k</italic>-nearest neighbors (kNN) for the analysis of online controlled experiments with incomplete metrics. The key idea is to identify users with incomplete metrics and impute their values from their neighbors, incorporating the structural information of online experimentation data. Specifically, the proposed method consists of two steps. The first step partitions the data set into clusters after stratifying on experiment-specific features, including the treatment assignment and the buyers’ characteristics. The second step performs kNN-based imputation. Moreover, we divide users with missing outcomes into two categories, visitors and dropout buyers, so that the information on dropout buyers can be better utilized. Note that our framework assumes that the treatment assignment and user covariates are fully observed, whereas only the outcome at the bottom of the funnel has missing values. The proposed method has three key advantages. First, it uses informative covariates collected along users’ journeys in the shopping funnel to impute incomplete metrics; in particular, it takes into account the heterogeneous impact of different user segments on metric missing rates. Second, the imputed values are intuitive to understand. Lastly, the combination of stratification and clustering alleviates the computational burden of large-scale data in online experimentation.</p>
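<p>The two-step structure described above (stratify, cluster within each stratum, then impute with nearest neighbors inside each cluster) can be sketched in Python with NumPy. This is a generic illustration under stated assumptions, not the estimation procedure of Section <xref rid="j_nejsds33_s_003">3</xref>: the <monospace>strata</monospace> labels are a hypothetical encoding of treatment assignment and buyer segment, clustering uses plain Lloyd’s <italic>k</italic>-means with a deterministic farthest-first start, the donor pool is whichever users the caller marks as observed, and each missing value is replaced by the mean outcome of its <italic>k</italic> nearest donors in the same cluster.</p>
<preformat preformat-type="code">
```python
import numpy as np


def kmeans(X, n_clusters, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def knn_impute(X, z, missing, k):
    """Fill z[missing] with the mean outcome of the k nearest observed users."""
    z = np.asarray(z, dtype=float).copy()
    donors = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(X[donors] - X[i], axis=1)
        nearest = donors[np.argsort(dist)[:k]]
        z[i] = z[nearest].mean()
    return z


def stratified_cluster_knn_impute(X, z, missing, strata, n_clusters=2, k=3):
    """Stratify, cluster within each stratum, then kNN-impute within clusters."""
    X = np.asarray(X, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    strata = np.asarray(strata)
    z_out = np.asarray(z, dtype=float).copy()
    for s in np.unique(strata):
        rows = np.flatnonzero(strata == s)
        labels = kmeans(X[rows], min(n_clusters, len(rows)))
        for c in np.unique(labels):
            idx = rows[labels == c]
            if missing[idx].all() or not missing[idx].any():
                continue  # no donors, or nothing to impute, in this cluster
            z_out[idx] = knn_impute(X[idx], z_out[idx], missing[idx], k)
    return z_out


if __name__ == "__main__":
    # Hypothetical toy data: two strata, each with two well-separated groups.
    group = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.3], [0.1, 0.1]])
    X = np.vstack([group, group + 10.0, group, group + 10.0])
    strata = np.repeat([0, 1], 8)
    missing = np.tile([False, False, False, True], 4)
    z = np.array([1, 2, 3, np.nan, 10, 20, 30, np.nan,
                  4, 5, 6, np.nan, 40, 50, 60, np.nan])
    print(stratified_cluster_knn_impute(X, z, missing, strata))
```
</preformat>
<p>Restricting the neighbor search to one cluster within one stratum keeps each distance computation small, which matches the computational motivation given above; and when treatment assignment is part of the stratum label, it prevents a treated user from being imputed from control users.</p>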
<p>Throughout the paper, we use the metric <italic>Purchase</italic> as an illustrative example of an incomplete metric at the bottom of the funnel, and we assume that <italic>Purchase</italic> is the only metric (i.e., outcome) of interest in the experiment. The rest of the paper is organized as follows. In Sections <xref rid="j_nejsds33_s_002">2</xref> and <xref rid="j_nejsds33_s_003">3</xref>, we detail the problem formulation, the proposed method, and the estimation procedures. Section <xref rid="j_nejsds33_s_007">4</xref> presents simulation studies, and Section <xref rid="j_nejsds33_s_008">5</xref> presents a real case study. We conclude with some discussion in Section <xref rid="j_nejsds33_s_009">6</xref>.</p>
</sec>
<sec id="j_nejsds33_s_002">
<label>2</label>
<title>Problem Formulation</title>
<p>In online controlled experiments, users can be classified into three types based on their purchase behavior: visitors, real buyers, and dropout buyers. Visitors participate in experiments but make no contributions (e.g., purchases). Real buyers not only participate in experiments but also make contributions (e.g., purchases). Dropout buyers would have made contributions (e.g., completed transactions) within the experimentation period but failed to do so for various reasons. For example, users could drop out of the experiment because of unexpected external payment issues, or the experiment could lose track of users because of technical issues.</p>
<p>Suppose there are <italic>n</italic> users in an experiment, and let <inline-formula id="j_nejsds33_ineq_001"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math><tex-math><![CDATA[${y_{i}}\in \{0,1\}$]]></tex-math></alternatives></inline-formula> denote whether the <italic>i</italic>-th user is a buyer or not. That is, 
<disp-formula id="j_nejsds33_eq_001">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable columnspacing="10.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left">
<mml:mtr>
<mml:mtd class="array">
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is either a real buyer or a dropout buyer</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is a visitor</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {y_{i}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,\hspace{1em}& \text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is either a real buyer or a dropout buyer},\\ {} 0,\hspace{1em}& \text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is a visitor},\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
and <inline-formula id="j_nejsds33_ineq_002"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn mathvariant="double-struck">1</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">&gt;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[${y_{i}}=\mathbb{1}({z_{i}}\gt 0)$]]></tex-math></alternatives></inline-formula>, where <inline-formula id="j_nejsds33_ineq_003"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">≥</mml:mo>
<mml:mn>0</mml:mn></mml:math><tex-math><![CDATA[${z_{i}}\ge 0$]]></tex-math></alternatives></inline-formula> denotes the purchase metric value of the <italic>i</italic>-th user, and <inline-formula id="j_nejsds33_ineq_004"><alternatives><mml:math>
<mml:mn mathvariant="double-struck">1</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mo>·</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$\mathbb{1}(\cdot )$]]></tex-math></alternatives></inline-formula> is the indicator function. If user <italic>i</italic> has completed one or more transactions during the experimentation period, we know for certain that the user is a real buyer, and we observe the corresponding purchase amount. Otherwise, it is ambiguous whether the user is a dropout buyer or merely a visitor. Therefore, we use <inline-formula id="j_nejsds33_ineq_005"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${y_{i}^{obs}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_006"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{i}^{obs}}$]]></tex-math></alternatives></inline-formula> if the <italic>i</italic>-th user is a real buyer, and <inline-formula id="j_nejsds33_ineq_007"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${y_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_008"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> to represent the ambiguous case (i.e., the user could be either a dropout buyer or a visitor). In summary, 
<disp-formula id="j_nejsds33_eq_002">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" columnalign="left">
<mml:mtr>
<mml:mtd class="array">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is a real buyer</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" columnalign="left">
<mml:mtr>
<mml:mtd class="array">
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is a dropout buyer</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is a visitor</mml:mtext>
<mml:mo>.</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em"/>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {y_{i}}=\left\{\begin{array}{l}{y_{i}^{obs}}=1,\text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is a real buyer},\hspace{1em}\\ {} {y_{i}^{mis}}=\left\{\begin{array}{l}1,\text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is a dropout buyer},\hspace{1em}\\ {} 0,\text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is a visitor}.\hspace{1em}\end{array}\right.\hspace{1em}\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula>
</p>
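<p>The notation above can be mirrored in a short Python sketch that labels each user and derives <italic>y</italic><sub><italic>i</italic></sub> = 𝟙(<italic>z</italic><sub><italic>i</italic></sub> &gt; 0) together with the split into observed and missing indicators. It assumes, as in a simulation, that the latent purchase amount of every user is known; in a real experiment the amounts of dropout buyers are never recorded, which is exactly why <italic>y</italic><sub><italic>i</italic></sub><sup><italic>mis</italic></sup> is unknown. The <monospace>UserType</monospace> enum and the function names are illustrative.</p>
<preformat preformat-type="code">
```python
from enum import Enum


class UserType(Enum):
    REAL_BUYER = "real buyer"        # purchase recorded within the window
    DROPOUT_BUYER = "dropout buyer"  # would have purchased; record missing
    VISITOR = "visitor"              # participated without purchasing


def buyer_indicator(z):
    """y_i = 1(z_i > 0) for a purchase amount z_i >= 0."""
    return int(z > 0)


def split_indicators(types, z):
    """Split buyer indicators into observed (real buyers) and missing parts.

    `z` holds latent purchase amounts, known here only because the data are
    simulated; dropout buyers have z_i > 0 and visitors have z_i = 0.
    """
    y_obs, y_mis = {}, {}
    for i, (user_type, z_i) in enumerate(zip(types, z)):
        target = y_obs if user_type is UserType.REAL_BUYER else y_mis
        target[i] = buyer_indicator(z_i)
    return y_obs, y_mis


if __name__ == "__main__":
    types = [UserType.REAL_BUYER, UserType.DROPOUT_BUYER,
             UserType.VISITOR, UserType.REAL_BUYER]
    z = [25.0, 40.0, 0.0, 12.5]  # hypothetical amounts
    y_obs, y_mis = split_indicators(types, z)
    print(y_obs)  # {0: 1, 3: 1}
    print(y_mis)  # {1: 1, 2: 0}
```
</preformat>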
<p>However, some practitioners arbitrarily treat all <inline-formula id="j_nejsds33_ineq_009"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${y_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_010"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> as 0 without attempting to distinguish dropout buyers from visitors. We denote this simplified but arbitrary buyer indicator by 
<disp-formula id="j_nejsds33_eq_003">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable columnspacing="10.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left">
<mml:mtr>
<mml:mtd class="array">
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>user</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mspace width="2.5pt"/>
<mml:mtext>is a real buyer</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>otherwise.</mml:mtext>
<mml:mspace width="2.5pt"/>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\tilde{y}_{i}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,\hspace{1em}& \text{user}\hspace{2.5pt}i\hspace{2.5pt}\text{is a real buyer},\\ {} 0,\hspace{1em}& \text{otherwise.}\hspace{2.5pt}\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
The corresponding vectors are denoted by <inline-formula id="j_nejsds33_ineq_011"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="bold-italic">z</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$\boldsymbol{y}=({y_{1}},\dots ,{y_{n}}),\boldsymbol{z}=({z_{1}},\dots ,{z_{n}}),\tilde{\boldsymbol{y}}=({\tilde{y}_{1}},\dots ,{\tilde{y}_{n}})$]]></tex-math></alternatives></inline-formula>. Additionally, let <inline-formula id="j_nejsds33_ineq_012"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\boldsymbol{x}_{i}}$]]></tex-math></alternatives></inline-formula> denote the relevant features for user <italic>i</italic>, <inline-formula id="j_nejsds33_ineq_013"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">∈</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">p</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{x}_{i}}=({x_{i1}},\dots ,{x_{ip}})\in {\mathbb{R}^{p}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_nejsds33_ineq_014"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo stretchy="false">≥</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$p\ge 1$]]></tex-math></alternatives></inline-formula>, and let <inline-formula id="j_nejsds33_ineq_015"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">T</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[$\boldsymbol{X}={({\boldsymbol{x}_{1}},\dots ,{\boldsymbol{x}_{n}})^{T}}$]]></tex-math></alternatives></inline-formula>. Without loss of generality, we assume that all <italic>p</italic> features are continuous.</p>
<p>Suppose that <italic>m</italic> of the <italic>n</italic> users are real buyers and, without loss of generality, that they are the first <italic>m</italic> users. Denote the <italic>n</italic> users’ purchase indicators and transaction amounts during the experimentation period by the vectors 
<disp-formula id="j_nejsds33_eq_004">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \boldsymbol{y}=({\boldsymbol{y}^{obs}},{\boldsymbol{y}^{mis}})=({y_{1}^{obs}},\dots ,{y_{m}^{obs}},{y_{m+1}^{mis}},\dots ,{y_{n}^{mis}}),\]]]></tex-math></alternatives>
</disp-formula> 
<disp-formula id="j_nejsds33_eq_005">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="bold-italic">z</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \boldsymbol{z}=({\boldsymbol{z}^{obs}},{\boldsymbol{z}^{mis}})=({z_{1}^{obs}},\dots ,{z_{m}^{obs}},{z_{m+1}^{mis}},\dots ,{z_{n}^{mis}}).\]]]></tex-math></alternatives>
</disp-formula>
</p>
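<p>To make the notation concrete, the following Python sketch (an illustration only, not part of the proposed method) stores the stacked vectors <italic>y</italic> and <italic>z</italic> as lists whose first <italic>m</italic> entries are observed and whose remaining <italic>n</italic> − <italic>m</italic> entries are unknown. Encoding the missing entries as <monospace>None</monospace> and the function name <monospace>stack_outcomes</monospace> are assumptions of this sketch; indices are 0-based, so user <italic>i</italic> in the text corresponds to position <italic>i</italic> − 1.</p>

```python
from typing import List, Optional, Tuple


def stack_outcomes(
    y_obs: List[int], z_obs: List[float], n: int
) -> Tuple[List[Optional[int]], List[Optional[float]], List[int]]:
    """Build y = (y_obs, y_mis) and z = (z_obs, z_mis) for n users.

    The first m = len(y_obs) users have recorded values; the remaining
    n - m entries are unknown and stored as None (hypothetical encoding).
    Also returns the 0-based index set I of users whose y is missing.
    """
    m = len(y_obs)
    if len(z_obs) != m:
        raise ValueError("y_obs and z_obs must have the same length m")
    if m > n:
        raise ValueError("m cannot exceed the total number of users n")
    y: List[Optional[int]] = list(y_obs) + [None] * (n - m)
    z: List[Optional[float]] = list(z_obs) + [None] * (n - m)
    missing = [i for i, yi in enumerate(y) if yi is None]
    return y, z, missing


# Example: n = 5 users, the first m = 2 are recorded buyers.
y, z, I = stack_outcomes([1, 1], [20.0, 35.5], n=5)
print(y)  # [1, 1, None, None, None]
print(I)  # [2, 3, 4]
```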
<p>The problem of interest is to impute missing values <inline-formula id="j_nejsds33_ineq_016"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{y}^{mis}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_017"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{mis}}$]]></tex-math></alternatives></inline-formula> in the context of online experimentation. Users with missing values include both visitors and dropout buyers. Our proposed method therefore first identifies candidate dropout buyers (i.e., candidate 1s in <inline-formula id="j_nejsds33_ineq_018"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{y}^{mis}}$]]></tex-math></alternatives></inline-formula>) using a classification model and then imputes <inline-formula id="j_nejsds33_ineq_019"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{y}^{mis}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_020"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{mis}}$]]></tex-math></alternatives></inline-formula> using an efficient cluster-based nearest-neighbor approach.</p>
</sec>
<sec id="j_nejsds33_s_003">
<label>3</label>
<title>The Proposed Method</title>
<p>The goal of imputation is to fill in the missing values so that they are close to the underlying true values. The imputation problem can be formulated as 
<disp-formula id="j_nejsds33_eq_006">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:munder>
<mml:mrow>
<mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:munder>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">l</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \underset{{\hat{\boldsymbol{y}}^{mis}}}{\text{min}}\hspace{2.5pt}l({\hat{\boldsymbol{y}}^{mis}},{\boldsymbol{y}^{mis}}),\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_021"><alternatives><mml:math>
<mml:mi mathvariant="italic">l</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$l({\hat{\boldsymbol{y}}^{mis}},{\boldsymbol{y}^{mis}})$]]></tex-math></alternatives></inline-formula> is a loss function to quantify the difference between the imputed missing values <inline-formula id="j_nejsds33_ineq_022"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\hat{\boldsymbol{y}}^{mis}}$]]></tex-math></alternatives></inline-formula> and the underlying true values <inline-formula id="j_nejsds33_ineq_023"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{y}^{mis}}$]]></tex-math></alternatives></inline-formula>.</p>
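<p>The loss <italic>l</italic> is left generic here. As a hedged illustration only, the Python sketch below assumes two common choices that the text does not prescribe: the 0–1 loss (number of mismatches) for the binary buyer indicator and the squared-error loss for the transaction amount. The function names are hypothetical. Because the true missing values are never observed in practice, such a loss can be evaluated only when ground truth is available, for example in simulation studies.</p>

```python
from typing import Sequence


def zero_one_loss(y_hat: Sequence[int], y_true: Sequence[int]) -> int:
    """Number of users whose imputed buyer indicator differs from the truth."""
    if len(y_hat) != len(y_true):
        raise ValueError("inputs must have the same length")
    return sum(1 for a, b in zip(y_hat, y_true) if a != b)


def squared_error_loss(z_hat: Sequence[float], z_true: Sequence[float]) -> float:
    """Sum of squared differences between imputed and true amounts."""
    if len(z_hat) != len(z_true):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(z_hat, z_true))


print(zero_one_loss([1, 0, 1], [1, 1, 1]))           # 1
print(squared_error_loss([10.0, 0.0], [12.0, 0.0]))  # 4.0
```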
<p>Imputing missing values in large-scale data sets with non-parametric methods such as the nearest-neighbor algorithm is challenging because computing distances between all pairs of users is computationally expensive. To address this, we incorporate the clustering structure of the data into the imputation: we partition users into <italic>c</italic> clusters and then perform the imputation within each cluster. The cluster-based imputation problem is thus formulated as 
<disp-formula id="j_nejsds33_eq_007">
<label>(3.1)</label><alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt">
<mml:mtr>
<mml:mtd class="align-odd"/>
<mml:mtd class="align-even">
<mml:munder>
<mml:mrow>
<mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:munder>
<mml:mspace width="2.5pt"/>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:munderover>
<mml:mi mathvariant="italic">l</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="align-odd">
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd class="align-even">
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mo>.</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>.</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:munderover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mi mathvariant="italic">g</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">I</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}& \underset{{\hat{\boldsymbol{y}}^{mis}}}{\text{min}}\hspace{2.5pt}{\sum \limits_{h=1}^{c}}{\sum \limits_{C(i)=h}^{}}l({\hat{y}_{i}^{mis}},{y_{i}^{mis}}),\\ {} \hspace{2.5pt}& s.t.{\sum \limits_{h=1}^{c}}{\sum \limits_{C(i)=h}^{}}||{\boldsymbol{x}_{i}}-{\boldsymbol{\mu }_{h}}|{|_{2}^{2}}\le g,\hspace{2.5pt}i\in I,\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_024"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\boldsymbol{x}_{i}}$]]></tex-math></alternatives></inline-formula> denotes the features for user <italic>i</italic>, and <inline-formula id="j_nejsds33_ineq_025"><alternatives><mml:math>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi></mml:math><tex-math><![CDATA[$C(i)=h$]]></tex-math></alternatives></inline-formula> indicates that user <italic>i</italic> with missing value <inline-formula id="j_nejsds33_ineq_026"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${y_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> belongs to cluster <italic>h</italic> with the centroid <inline-formula id="j_nejsds33_ineq_027"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\boldsymbol{\mu }_{h}}$]]></tex-math></alternatives></inline-formula>, the constant <italic>g</italic> bounds the total within-cluster squared distance, and <inline-formula id="j_nejsds33_ineq_028"><alternatives><mml:math>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo>·</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$||\cdot |{|_{2}}$]]></tex-math></alternatives></inline-formula> is the L<inline-formula id="j_nejsds33_ineq_029"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{2}}$]]></tex-math></alternatives></inline-formula>-norm. The set of indices <italic>I</italic> is defined as <inline-formula id="j_nejsds33_ineq_030"><alternatives><mml:math>
<mml:mi mathvariant="italic">I</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="2.5pt"/>
<mml:mtext>is missing</mml:mtext>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math><tex-math><![CDATA[$I=\{i:{y_{i}}\hspace{2.5pt}\text{is missing}\}$]]></tex-math></alternatives></inline-formula>. The features are selected based on the experiment owners’ domain knowledge. After imputing <inline-formula id="j_nejsds33_ineq_031"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${y_{i}^{mis}}$]]></tex-math></alternatives></inline-formula>, we can estimate the corresponding <inline-formula id="j_nejsds33_ineq_032"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{i}^{mis}}$]]></tex-math></alternatives></inline-formula> as well.</p>
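<p>The left-hand side of the constraint in (3.1) is a within-cluster sum of squared distances to the centroids, the same quantity that <italic>k</italic>-means clustering minimizes. The following Python sketch illustrates cluster-based imputation under stated assumptions: it partitions users with Lloyd’s <italic>k</italic>-means algorithm and then fills each missing value with the mean of the <italic>k</italic> nearest users with observed values in the same cluster. Lloyd’s algorithm, the <italic>k</italic>-nearest-neighbor mean, and all function and parameter names are assumptions for illustration; this is not the exact procedure of Section <xref rid="j_nejsds33_s_005">3.2</xref>, and the stratification-based clustering of Section <xref rid="j_nejsds33_s_006">3.3</xref> is not reproduced.</p>

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared L2 distance ||a - b||_2^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X: List[Vector], c: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: returns (labels, centroids) for c clusters."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, c)]  # random initial centroids
    labels: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every user.
        new_labels = [min(range(c), key=lambda h: sq_dist(x, centroids[h])) for x in X]
        if new_labels == labels:
            break  # assignments unchanged, so centroids are unchanged too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for h in range(c):
            members = [x for x, lab in zip(X, labels) if lab == h]
            if members:  # keep the previous centroid if a cluster empties
                centroids[h] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def impute_within_clusters(
    X: List[Vector], values: List[Optional[float]], labels: List[int], k: int = 3
) -> List[Optional[float]]:
    """Fill None entries with the mean of the k nearest observed users in the
    same cluster; distances are computed only within that cluster."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        donors = [j for j, w in enumerate(values) if w is not None and labels[j] == labels[i]]
        if not donors:
            continue  # no observed user in this cluster: leave the value missing
        nearest = sorted(donors, key=lambda j: sq_dist(X[i], X[j]))[:k]
        filled[i] = sum(values[j] for j in nearest) / len(nearest)
    return filled


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
z = [5.0, 7.0, None, 100.0, None, 120.0]  # None marks a missing amount
labels, _ = kmeans(X, c=2)
filled = impute_within_clusters(X, z, labels, k=2)
print(filled)  # [5.0, 7.0, 6.0, 100.0, 110.0, 120.0]
```

<p>Restricting the neighbor search to one cluster replaces distance computations over all users with computations over the members of that cluster only, which is the computational motivation for the cluster-based formulation.</p>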
<p>Note that it is unknown whether a user with an incomplete metric is a visitor or a dropout buyer: dropout buyers are mixed with visitors because neither has purchase information recorded. To address this challenge, Section <xref rid="j_nejsds33_s_004">3.1</xref> applies a logistic regression model to identify a portion of the visitors and thereby narrow down the candidate dropout buyers. Section <xref rid="j_nejsds33_s_005">3.2</xref> details the proposed cluster-based imputation. Because data sets from online controlled experiments are often so large that conventional clustering methods cannot be run efficiently, Section <xref rid="j_nejsds33_s_006">3.3</xref> considers a stratification-based clustering and describes how to choose the number of clusters.</p>
<sec id="j_nejsds33_s_004">
<label>3.1</label>
<title>Identifying Dropout Buyer Candidates</title>
<p>The practitioners’ simplified buyer indicator <inline-formula id="j_nejsds33_ineq_033"><alternatives><mml:math><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$\tilde{\boldsymbol{y}}$]]></tex-math></alternatives></inline-formula> carries partial information about the true buyer indicator <inline-formula id="j_nejsds33_ineq_034"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">y</mml:mi></mml:math><tex-math><![CDATA[$\boldsymbol{y}$]]></tex-math></alternatives></inline-formula>. Therefore, a classification model based on <inline-formula id="j_nejsds33_ineq_035"><alternatives><mml:math>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$(\boldsymbol{X},\tilde{\boldsymbol{y}})$]]></tex-math></alternatives></inline-formula> provides an estimate of each user’s probability of purchase. Users with a high estimated probability but no recorded purchase can serve as candidate dropout buyers. Since <inline-formula id="j_nejsds33_ineq_036"><alternatives><mml:math><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$\tilde{\boldsymbol{y}}$]]></tex-math></alternatives></inline-formula> is used as a substitute for <inline-formula id="j_nejsds33_ineq_037"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">y</mml:mi></mml:math><tex-math><![CDATA[$\boldsymbol{y}$]]></tex-math></alternatives></inline-formula>, we call <inline-formula id="j_nejsds33_ineq_038"><alternatives><mml:math><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$\tilde{\boldsymbol{y}}$]]></tex-math></alternatives></inline-formula> the pseudo-response.</p>
<p>Specifically, we apply a logistic regression model to identify buyers. Denote the conditional probability for user <italic>i</italic> by <inline-formula id="j_nejsds33_ineq_039"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$p({\boldsymbol{x}_{i}})=Pr({\tilde{y}_{i}}=1|{\boldsymbol{x}_{i}})$]]></tex-math></alternatives></inline-formula>, that is, 
<disp-formula id="j_nejsds33_eq_008">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnalign="right">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable columnspacing="10.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left">
<mml:mtr>
<mml:mtd class="array">
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>w.p.</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>w.p.</mml:mtext>
<mml:mspace width="2.5pt"/>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\tilde{y}_{i}}|{\boldsymbol{x}_{i}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,& \text{w.p.}\hspace{2.5pt}p({\boldsymbol{x}_{i}}),\\ {} 0,& \text{w.p.}\hspace{2.5pt}1-p({\boldsymbol{x}_{i}}).\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
We model the conditional probability <inline-formula id="j_nejsds33_ineq_040"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$p({\boldsymbol{x}_{i}})$]]></tex-math></alternatives></inline-formula> with the logistic model <inline-formula id="j_nejsds33_ineq_041"><alternatives><mml:math>
<mml:mo movablelimits="false">log</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi mathvariant="bold-italic">β</mml:mi></mml:math><tex-math><![CDATA[$\log (p({\boldsymbol{x}_{i}})/(1-p({\boldsymbol{x}_{i}})))={\boldsymbol{x}_{i}^{T}}\boldsymbol{\beta }$]]></tex-math></alternatives></inline-formula> with <inline-formula id="j_nejsds33_ineq_042"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">β</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">T</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[$\boldsymbol{\beta }={({\beta _{1}},\dots ,{\beta _{p}})^{T}}$]]></tex-math></alternatives></inline-formula>. The features used in the logistic regression model are believed to be closely related to users’ purchase behaviors. Classifying users with the fitted model requires a probability threshold, and a widely used default is 0.5. Alternatively, practitioners can use their domain knowledge to set the threshold to the expected percentage of TN cases in the whole sample.</p>
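<p>To illustrate the classification step, the following minimal Python sketch evaluates the fitted probability from the logistic model and applies a threshold. It assumes that the coefficient vector β has already been estimated; the function names <monospace>logistic_prob</monospace> and <monospace>classify</monospace> and the example data are hypothetical and are not the authors’ implementation.</p>
```python
import math


def logistic_prob(x, beta):
    """Return p(x) = 1 / (1 + exp(-x^T beta)), the inverse logit of x^T beta."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # Numerically stable evaluation: never exponentiate a large positive number.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def classify(X, beta, threshold=0.5):
    """Predict 1 when p(x_i) reaches the threshold, otherwise 0."""
    return [1 if logistic_prob(x, beta) >= threshold else 0 for x in X]


# Hypothetical example: two users, first column is an intercept.
X = [[1.0, 2.0], [1.0, -3.0]]
beta = [0.5, 1.0]
print(classify(X, beta))                  # [1, 0]
print(classify(X, beta, threshold=0.95))  # [0, 0]
```
<p>Raising the threshold, as in the second call, makes the model more conservative about predicting purchase information, which in turn shrinks the set of FP cases.</p>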
<p>Comparing the model prediction with the pseudo-response, Table <xref rid="j_nejsds33_tab_001">1</xref> summarizes the four types of classification results: false positive (FP), true negative (TN), false negative (FN), and true positive (TP). An FP case is a user whose pseudo-response is 0 but whom the model predicts to have purchase information. We use this inconsistency to identify candidate dropout buyers; that is, an FP case can be either a visitor or a dropout buyer. A TN case reflects agreement that the user has no recorded purchase, so we treat all TN cases as visitors. FN and TP cases are users with recorded purchase behaviors; hence they are real buyers, neither dropout buyers nor visitors.</p>
<table-wrap id="j_nejsds33_tab_001">
<label>Table 1</label>
<caption>
<p>Summary of the four categories of classification results from the logistic regression model.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: double; border-bottom: solid thin; border-right: solid thin"/>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">Pseudo-response (<inline-formula id="j_nejsds33_ineq_043"><alternatives><mml:math><mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">˜</mml:mo></mml:mover></mml:math><tex-math><![CDATA[$\tilde{y}$]]></tex-math></alternatives></inline-formula>)</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">Prediction</td>
<td style="vertical-align: top; text-align: left; border-top: double; border-bottom: solid thin">Description</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">True Negative (TN)</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0</td>
<td style="vertical-align: top; text-align: left">Visitors</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">False Positive (FP)</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">1</td>
<td style="vertical-align: top; text-align: left">Candidates of dropout buyers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">False Negative (FN)</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">1</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0</td>
<td style="vertical-align: top; text-align: left">Real buyers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin; border-right: solid thin">True Positive (TP)</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">1</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Real buyers</td>
</tr>
</tbody>
</table>
</table-wrap>
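<p>The mapping in Table 1 amounts to a simple lookup on the pair (pseudo-response, prediction). The hypothetical helpers below sketch that lookup and the resulting split of the missing set into visitors (TN) and dropout buyer candidates (FP); the names and data are illustrative assumptions only.</p>
```python
# Table 1 encoded as a lookup: (pseudo-response, prediction) -> (case, user group).
TABLE_1 = {
    (0, 0): ("TN", "visitor"),
    (0, 1): ("FP", "dropout buyer candidate"),
    (1, 0): ("FN", "real buyer"),
    (1, 1): ("TP", "real buyer"),
}


def categorize(pseudo_response, prediction):
    """Return the (case, user group) pair from Table 1 for each user."""
    return [TABLE_1[(y, p)] for y, p in zip(pseudo_response, prediction)]


def split_missing_set(pseudo_response, prediction):
    """Split users with pseudo-response 0 into visitors (TN) and candidates (FP)."""
    visitors, candidates = [], []
    for i, (y, p) in enumerate(zip(pseudo_response, prediction)):
        if y == 0:
            (candidates if p == 1 else visitors).append(i)
    return visitors, candidates


y_tilde = [0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1]
print(categorize(y_tilde, y_pred))
print(split_missing_set(y_tilde, y_pred))  # ([0], [1, 4])
```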
<p>Suppose that <italic>r</italic> visitors and <inline-formula id="j_nejsds33_ineq_044"><alternatives><mml:math>
<mml:mi mathvariant="italic">n</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi></mml:math><tex-math><![CDATA[$n-m-r$]]></tex-math></alternatives></inline-formula> dropout buyer candidates have been identified. Without loss of generality, we assume that the first <italic>r</italic> users in the missing set are the visitors. Then we write <inline-formula id="j_nejsds33_ineq_045"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{y}^{mis}}$]]></tex-math></alternatives></inline-formula> as 
<disp-formula id="j_nejsds33_eq_009">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\boldsymbol{y}^{mis}}=({\boldsymbol{y}^{est}},{\boldsymbol{y}^{\ast }})=({y_{m+1}^{est}},\dots ,{y_{m+r}^{est}},{y_{m+r+1}^{\ast }},\dots ,{y_{n}^{\ast }}),\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_046"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi></mml:math><tex-math><![CDATA[${y_{i}^{est}}=0,i=m+1,\dots ,m+r$]]></tex-math></alternatives></inline-formula>, and <inline-formula id="j_nejsds33_ineq_047"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo fence="true" stretchy="false">}</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">n</mml:mi></mml:math><tex-math><![CDATA[${y_{i}^{\ast }}\in \{0,1\},i=m+r+1,\dots ,n$]]></tex-math></alternatives></inline-formula> with 0 representing visitors and 1 representing dropout buyers. Similarly, we denote the corresponding continuous response for the purchase amount as 
<disp-formula id="j_nejsds33_eq_010">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\boldsymbol{z}^{mis}}=({\boldsymbol{z}^{est}},{\boldsymbol{z}^{\ast }})=({z_{m+1}^{est}},\dots ,{z_{m+r}^{est}},{z_{m+r+1}^{\ast }},\dots ,{z_{n}^{\ast }}),\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_048"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi></mml:math><tex-math><![CDATA[${z_{i}^{est}}=0,i=m+1,\dots ,m+r$]]></tex-math></alternatives></inline-formula>, are the purchase amounts of the estimated visitors, and <inline-formula id="j_nejsds33_ineq_049"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{i}^{\ast }}$]]></tex-math></alternatives></inline-formula> denotes the missing non-negative response of each of the remaining <inline-formula id="j_nejsds33_ineq_050"><alternatives><mml:math>
<mml:mi mathvariant="italic">n</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi></mml:math><tex-math><![CDATA[$n-m-r$]]></tex-math></alternatives></inline-formula> users. In the imputation method of Section <xref rid="j_nejsds33_s_005">3.2</xref>, we set <inline-formula id="j_nejsds33_ineq_051"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{est}}$]]></tex-math></alternatives></inline-formula> to zero, and our main focus is to impute <inline-formula id="j_nejsds33_ineq_052"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{\ast }}$]]></tex-math></alternatives></inline-formula>.</p>
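<p>The ordering convention above, with visitors first and dropout buyer candidates last, can be mirrored directly in code. In this hypothetical sketch, user indices are assumed to come from the classification step, and <monospace>None</monospace> marks the entries of the binary and continuous responses that still have to be imputed.</p>
```python
def assemble_missing_responses(visitor_ids, candidate_ids):
    """Order the missing set as (visitors, candidates) and fill in the known parts.

    Visitors receive y_est = 0 and z_est = 0.0; candidates receive None as a
    placeholder for y* and z*, which the clustering-based kNN step imputes.
    """
    order = list(visitor_ids) + list(candidate_ids)
    r = len(visitor_ids)
    n_candidates = len(candidate_ids)
    y_mis = [0] * r + [None] * n_candidates
    z_mis = [0.0] * r + [None] * n_candidates
    return order, y_mis, z_mis


order, y_mis, z_mis = assemble_missing_responses(visitor_ids=[0], candidate_ids=[1, 4])
print(order)  # [0, 1, 4]
print(y_mis)  # [0, None, None]
print(z_mis)  # [0.0, None, None]
```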
</sec>
<sec id="j_nejsds33_s_005">
<label>3.2</label>
<title>Clustering-Based Imputation for Dropout Buyers</title>
<p>To impute the missing purchase value <inline-formula id="j_nejsds33_ineq_053"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{\ast }}$]]></tex-math></alternatives></inline-formula> of the dropout buyers, we adopt a clustering-based method combined with kNN imputation. Clustering improves the efficiency of data analysis by identifying inherent structural patterns and partitioning a large-scale data set into smaller subsets. In each stratum <inline-formula id="j_nejsds33_ineq_054"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\boldsymbol{X}_{tu}}$]]></tex-math></alternatives></inline-formula> (defined later in the stratification step), we apply the <italic>K</italic>-means clustering method [<xref ref-type="bibr" rid="j_nejsds33_ref_020">20</xref>] to form <italic>c</italic> clusters by solving 
<disp-formula id="j_nejsds33_eq_011">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:munder>
<mml:mrow>
<mml:mtext>minimize</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">C</mml:mi>
</mml:mrow>
</mml:munder>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:munderover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \underset{C}{\text{minimize}}{\sum \limits_{h=1}^{c}}{\sum \limits_{C(i)=h}^{}}||{\boldsymbol{x}_{i}}-{\boldsymbol{\mu }_{h}}|{|_{2}^{2}}.\]]]></tex-math></alternatives>
</disp-formula>
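where <italic>C</italic>(<italic>i</italic>) ∈ {1, …, <italic>c</italic>} denotes the cluster to which user <italic>i</italic> is assigned, <bold><italic>μ</italic></bold><sub><italic>h</italic></sub> is the centroid (mean feature vector) of cluster <italic>h</italic>, and <italic>c</italic> is the number of clusters.</p>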
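<p>The <italic>K</italic>-means objective above is commonly minimized with Lloyd’s algorithm, which alternates between assigning each user to the nearest centroid and recomputing each centroid as the mean of its members, reaching a local minimum. The following pure-Python sketch is for illustration only: the random initialization, the stopping rule, and the function names are assumptions, and on large data sets a library implementation would normally be used instead.</p>
```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, c, n_iter=100, seed=0):
    """Lloyd's algorithm for the K-means objective with c clusters.

    Returns the assignments C(i) and the centroids mu_h at a local minimum.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, c)]  # random initial centroids
    assign = None
    for _ in range(n_iter):
        # Assignment step: C(i) = argmin_h ||x_i - mu_h||_2^2.
        new_assign = [min(range(c), key=lambda h: sq_dist(x, centroids[h])) for x in points]
        if new_assign == assign:
            break  # assignments unchanged, so the algorithm has converged
        assign = new_assign
        # Update step: mu_h = mean of the points currently assigned to cluster h.
        for h in range(c):
            members = [x for x, a in zip(points, assign) if a == h]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[h] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def within_cluster_ss(points, assign, centroids):
    """Value of the objective: sum over h and C(i) = h of ||x_i - mu_h||_2^2."""
    return sum(sq_dist(x, centroids[a]) for x, a in zip(points, assign))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, mus = kmeans(pts, c=2)
print(labels, within_cluster_ss(pts, labels, mus))
```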
<p>On top of clustering, we use the triangle inequality rule (described later) to ensure that nearest neighbors are identified consistently in the <italic>k</italic>-nearest neighbors (kNN) approach for imputation. The kNN method rests on the idea that nearby data points are similar to each other. The kNN algorithm is straightforward and does not require parametric model estimation, but it is computationally expensive and becomes slow as the data set grows, because a naive search computes the distance from each target point to every other point. Clustering greatly mitigates this burden, since nearest neighbors are searched only within the target user’s cluster. Given a specific cluster <italic>h</italic> (i.e., the fixed constraint in (1)), the imputation problem (1) with the kNN method can be written as 
<disp-formula id="j_nejsds33_eq_012">
<label>(3.2)</label><alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnalign="right">
<mml:mtr>
<mml:mtd class="align-odd">
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mtext>argmax</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">L</mml:mi>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:munder>
<mml:mn mathvariant="double-struck">1</mml:mn>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">L</mml:mi>
<mml:mo fence="true" stretchy="false">}</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">I</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {y_{i}^{\ast }}=\underset{L}{\text{argmax}}\sum \limits_{{x_{j}}\in {N_{k}}({x_{i}})}\mathbb{1}\{{y_{j}}=L\},\hspace{2.5pt}i\in I,\hspace{2.5pt}C(i)=h,\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_055"><alternatives><mml:math>
<mml:mi mathvariant="italic">L</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math><tex-math><![CDATA[$L\in \{0,1\}$]]></tex-math></alternatives></inline-formula> is the binary label, <italic>k</italic> is a positive integer giving the size of the target user’s nearest-neighbor set <inline-formula id="j_nejsds33_ineq_056"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[${N_{k}}({x_{i}})$]]></tex-math></alternatives></inline-formula>, and <italic>j</italic> indexes the nearest neighbors. The performance of the kNN method may be affected by the choice of <italic>k</italic>, and the optimal value depends on the underlying structure of the data set. In this work, we fix <italic>k</italic> = 15. The solution to this objective function is a majority vote among the <italic>k</italic> neighbors: 
<disp-formula id="j_nejsds33_eq_013">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable columnspacing="10.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left">
<mml:mtr>
<mml:mtd class="array">
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">k</mml:mi>
<mml:mo stretchy="false">≥</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>otherwise</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\hat{y}_{i}^{\ast }}=\left\{\begin{array}{l@{\hskip10.0pt}l}1,\hspace{1em}& {\textstyle\sum _{j=1}^{k}}{y_{j}}/k\ge 0.5,\\ {} 0,\hspace{1em}& \text{otherwise},\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_057"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">k</mml:mi></mml:math><tex-math><![CDATA[${\textstyle\sum _{j=1}^{k}}{y_{j}}/k$]]></tex-math></alternatives></inline-formula> is the average of the responses <inline-formula id="j_nejsds33_ineq_058"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${y_{j}}$]]></tex-math></alternatives></inline-formula> over the <italic>k</italic> nearest neighbors.</p>
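<p>The closed-form rule above is a majority vote restricted to the target user’s cluster. The following minimal sketch assumes Euclidean distance and assumes that the neighbor pool consists of users in the same cluster whose binary label is known; both choices, as well as the function names, are illustrative assumptions. It uses a brute-force neighbor search and does not implement the triangle inequality rule mentioned above.</p>
```python
import math


def knn_impute_label(x_i, pool_x, pool_y, k=15):
    """Majority-vote kNN: return 1 if the mean label of the k nearest neighbors is >= 0.5.

    pool_x and pool_y hold features and known binary labels of the users
    allowed to act as neighbors (here: users in the same cluster as x_i).
    """
    if not pool_x:
        raise ValueError("no labeled neighbors available in this cluster")
    # Brute-force search; clustering keeps the pool small.
    order = sorted(range(len(pool_x)), key=lambda j: math.dist(x_i, pool_x[j]))
    nearest = order[:k]  # uses the whole pool if it has fewer than k members
    share = sum(pool_y[j] for j in nearest) / len(nearest)
    return 1 if share >= 0.5 else 0


def impute_labels_by_cluster(X, y, cluster, k=15):
    """Impute every y[i] that is None using labeled users from the same cluster."""
    y_hat = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            pool = [j for j in range(len(X)) if cluster[j] == cluster[i] and y[j] is not None]
            y_hat[i] = knn_impute_label(X[i], [X[j] for j in pool], [y[j] for j in pool], k)
    return y_hat


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (0.2, 0.2)]
y = [1, 1, 0, 0, None]  # the last user is a dropout buyer candidate
cluster = [0, 0, 0, 1, 0]
print(impute_labels_by_cluster(X, y, cluster, k=3))  # [1, 1, 0, 0, 1]
```
<p>Because <italic>k</italic> = 15 is odd, the average can never equal exactly 0.5 when a full neighborhood is available, so the tie-breaking direction of the ≥ rule only matters for clusters with fewer labeled users than <italic>k</italic>.</p>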
<p>With the imputed <inline-formula id="j_nejsds33_ineq_059"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${\hat{y}_{i}^{\ast }}$]]></tex-math></alternatives></inline-formula>, we obtain the corresponding imputed missing value <inline-formula id="j_nejsds33_ineq_060"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${\hat{z}_{i}^{\ast }}$]]></tex-math></alternatives></inline-formula> by minimizing the cost function 
<disp-formula id="j_nejsds33_eq_014">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnspacing="0pt" columnalign="right left">
<mml:mtr>
<mml:mtd>
<mml:munder>
<mml:mrow>
<mml:mtext>minimize</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:munder>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">|</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:mspace width="2.5pt"/>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo stretchy="false">≥</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">I</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">h</mml:mi>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}\underset{{z_{i}^{\ast }}}{\text{minimize}}\hspace{2.5pt}& {\hat{y}_{i}^{\ast }}{\sum \limits_{j=1}^{k}}||{z_{i}^{\ast }}-{z_{j}}|{|_{2}^{2}}+(1-{\hat{y}_{i}^{\ast }})||{z_{i}^{\ast }}|{|_{2}^{2}},\\ {} \hspace{2.5pt}& \hspace{2.5pt}{z_{i}^{\ast }}\ge 0,\hspace{2.5pt}i\in I,\hspace{2.5pt}C(i)=h.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
The resulting estimate <inline-formula id="j_nejsds33_ineq_061"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${\hat{z}_{i}^{\ast }}$]]></tex-math></alternatives></inline-formula> is given by 
<disp-formula id="j_nejsds33_eq_015">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced separators="" open="{" close="">
<mml:mrow>
<mml:mtable columnspacing="10.0pt" equalrows="false" columnlines="none" equalcolumns="false" columnalign="left left">
<mml:mtr>
<mml:mtd class="array">
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">k</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">k</mml:mi>
<mml:mo>≥</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array">
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="1em"/>
</mml:mtd>
<mml:mtd class="array">
<mml:mtext>otherwise</mml:mtext>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\hat{z}_{i}^{\ast }}=\left\{\begin{array}{l@{\hskip10.0pt}l}{\textstyle\sum _{j=1}^{k}}{z_{j}}/k,\hspace{1em}& {\textstyle\sum _{j=1}^{k}}{y_{j}}/k\ge 0.5,\\ {} 0,\hspace{1em}& \text{otherwise},\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_062"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mi mathvariant="italic">k</mml:mi></mml:math><tex-math><![CDATA[${\textstyle\sum _{j=1}^{k}}{z_{j}}/k$]]></tex-math></alternatives></inline-formula> is the average of the responses <inline-formula id="j_nejsds33_ineq_063"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${z_{j}}$]]></tex-math></alternatives></inline-formula> of the <italic>k</italic> nearest neighbors; the imputed value is zero unless at least half of these neighbors have a non-zero response (<italic>y</italic><sub><italic>j</italic></sub> = 1).</p>
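<p>The closed-form rule follows from the objective once the indicator is binary: when the estimated indicator equals 1, the sum of squared distances is minimized by the neighbor mean of the <italic>z</italic><sub><italic>j</italic></sub> (non-negative when the responses are, so the constraint is inactive), and when it equals 0, the squared-norm penalty is minimized at zero. The indicator itself is estimated by whether at least half of the neighbors have a non-zero response. A minimal Python sketch of this rule for a single dropout user follows. It assumes the responses and binary indicators of the <italic>k</italic> nearest neighbors have already been retrieved; the function name <monospace>impute_dropout</monospace> and its <monospace>threshold</monospace> argument are illustrative, not taken from the original implementation.</p>
<preformat preformat-type="code">
```python
import numpy as np


def impute_dropout(z_neighbors, y_neighbors, threshold=0.5):
    """Impute a dropout user's response from its k nearest neighbors.

    z_neighbors: responses z_j of the k neighbors (zero for non-buyers).
    y_neighbors: binary indicators y_j (1 if neighbor j has a non-zero response).
    Returns the neighbor mean of z_j when the fraction of neighbors with
    y_j = 1 is at least `threshold`, and 0 otherwise (hypothetical helper).
    """
    z = np.asarray(z_neighbors, dtype=float)
    y = np.asarray(y_neighbors, dtype=float)
    if y.mean() >= threshold:
        # Minimizer of sum_j (z* - z_j)^2 is the neighbor mean.
        return float(z.mean())
    # Minimizer of ||z*||^2 subject to z* >= 0 is zero.
    return 0.0


# Three of four neighbors have a non-zero response, so the mean is used.
print(impute_dropout([2.0, 4.0, 6.0, 0.0], [1, 1, 1, 0]))  # 3.0
# Only one of four neighbors has a non-zero response, so 0 is imputed.
print(impute_dropout([5.0, 0.0, 0.0, 0.0], [1, 0, 0, 0]))  # 0.0
```
</preformat>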
<p>The nearest neighbors are determined by their distances to the target user; that is, the <italic>k</italic> closest neighbors are found by 
<disp-formula id="j_nejsds33_eq_016">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mo movablelimits="true" form="prefix">min</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mi mathvariant="italic">d</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \min \hspace{2.5pt}{\sum \limits_{j=1}^{k}}d({\boldsymbol{x}_{i}},{\boldsymbol{x}_{j}}),\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_064"><alternatives><mml:math>
<mml:mi mathvariant="italic">d</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$d({\boldsymbol{x}_{i}},{\boldsymbol{x}_{j}})$]]></tex-math></alternatives></inline-formula> is the distance between users <italic>i</italic> and <italic>j</italic>.</p>
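<p>Because the objective is a sum of individual distances, it is minimized by taking the <italic>k</italic> users with the smallest distances to the target user. The brute-force Python sketch below does exactly that with the Euclidean (L<sub>2</sub>) distance used in this study. The helper name <monospace>k_nearest</monospace> is hypothetical, and a production system would restrict and prune the search rather than scan all users.</p>
<preformat preformat-type="code">
```python
import numpy as np


def k_nearest(x_target, X, k):
    """Return indices and L2 distances of the k rows of X closest to x_target.

    Taking the k smallest individual distances also minimizes the sum of
    distances to the target over all subsets of size k.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X - np.asarray(x_target, dtype=float), axis=1)
    idx = np.argsort(d, kind="stable")[:k]  # k smallest distances
    return idx, d[idx]


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
idx, dist = k_nearest([0.2, 0.1], X, k=2)
print(idx)  # [0 1]
```
</preformat>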
<p>To further accelerate the computation, we exploit the triangle inequality [<xref ref-type="bibr" rid="j_nejsds33_ref_030">30</xref>] to avoid unnecessary distance calculations. We first obtain the <italic>k</italic> nearest neighbors within the closest cluster, that is, the cluster whose centroid is nearest to the target user, and denote the largest of their distances to the target user as <inline-formula id="j_nejsds33_ineq_065"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{max}}$]]></tex-math></alternatives></inline-formula>. For any other cluster, we denote the distance between the target user and that cluster’s centroid as <inline-formula id="j_nejsds33_ineq_066"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{1}}$]]></tex-math></alternatives></inline-formula>, the distance between any user in that cluster and the cluster centroid as <inline-formula id="j_nejsds33_ineq_067"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{2}}$]]></tex-math></alternatives></inline-formula>, and the distance between the target user and that user as <inline-formula id="j_nejsds33_ineq_068"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{3}}$]]></tex-math></alternatives></inline-formula>. Since the triangle inequality implies <italic>d</italic><sub>3</sub> ≥ |<italic>d</italic><sub>1</sub> − <italic>d</italic><sub>2</sub>|, if <inline-formula id="j_nejsds33_ineq_069"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">≤</mml:mo>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">|</mml:mo></mml:math><tex-math><![CDATA[${d_{max}}\le |{d_{1}}-{d_{2}}|$]]></tex-math></alternatives></inline-formula>, then <inline-formula id="j_nejsds33_ineq_070"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">a</mml:mi>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">≤</mml:mo></mml:math><tex-math><![CDATA[${d_{max}}\le $]]></tex-math></alternatives></inline-formula> <inline-formula id="j_nejsds33_ineq_071"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{3}}$]]></tex-math></alternatives></inline-formula>; that is, such a user can be no closer to the target user than the current <italic>k</italic>-th nearest neighbor. As a result, we do not need to calculate <inline-formula id="j_nejsds33_ineq_072"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${d_{3}}$]]></tex-math></alternatives></inline-formula> explicitly for such users, which greatly speeds up the distance computation. Because this pruning never discards a true nearest neighbor, the identified nearest neighbors are also unaffected by the clustering. In this study, we use the L<inline-formula id="j_nejsds33_ineq_073"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{2}}$]]></tex-math></alternatives></inline-formula>-norm to measure distances.</p>
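<p>The pruning rule can be illustrated with the following Python sketch. It assumes cluster labels and centroids are already available (for example, from k-means), computes <italic>d</italic><sub>2</sub> for every user as k-means would record it, visits the cluster with the nearest centroid first, and skips any user whose bound |<italic>d</italic><sub>1</sub> − <italic>d</italic><sub>2</sub>| is at least <italic>d</italic><sub>max</sub>. As a design choice not stated in the text, <italic>d</italic><sub>max</sub> is tightened whenever a closer neighbor is found, which can only increase pruning. The function <monospace>knn_with_pruning</monospace> and its arguments are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def knn_with_pruning(x, X, labels, centroids, k):
    """Exact k-nearest-neighbor search over clustered users (L2 distance).

    For a user j in cluster c, the triangle inequality gives
    d3 = ||x - X[j]|| >= |d1 - d2|, with d1 = ||x - centroid_c|| and
    d2 = ||X[j] - centroid_c||. If |d1 - d2| >= d_max, user j cannot beat
    the current k-th neighbor, so d3 is never computed.
    Assumes the cluster with the nearest centroid holds at least k users.
    """
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    d2_all = np.linalg.norm(X - centroids[labels], axis=1)  # recorded by k-means in practice
    d1_all = np.linalg.norm(centroids - x, axis=1)
    order = np.argsort(d1_all)  # visit the cluster with the nearest centroid first

    # Exact k nearest neighbors within the closest cluster.
    members = np.flatnonzero(labels == order[0])
    d = np.linalg.norm(X[members] - x, axis=1)
    best = sorted(zip(d, members))[:k]  # (distance, index) pairs, ascending
    d_max = best[-1][0]
    n_computed = len(members)

    for c in order[1:]:
        for j in np.flatnonzero(labels == c):
            if abs(d1_all[c] - d2_all[j]) >= d_max:
                continue  # pruned: d3 >= d_max without computing it
            d3 = np.linalg.norm(X[j] - x)
            n_computed += 1
            if d_max > d3:  # j replaces the current k-th neighbor
                best[-1] = (d3, j)
                best.sort()
                d_max = best[-1][0]  # tighten the bound
    return np.array([j for _, j in best]), n_computed


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
labels = np.repeat(np.arange(3), 200)
X = centers[labels] + rng.normal(size=(600, 2))
centroids = np.array([X[labels == c].mean(axis=0) for c in range(3)])
x = np.array([0.5, 0.5])

idx, n_computed = knn_with_pruning(x, X, labels, centroids, k=5)
brute = np.argsort(np.linalg.norm(X - x, axis=1))[:5]
print(set(idx.tolist()) == set(brute.tolist()), n_computed)
```
</preformat>
<p>Comparing against the brute-force result checks that pruning does not change the neighbors found; the printed count shows how many distances were actually computed, which should fall well below the number of users when clusters are well separated.</p>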
</sec>
<sec id="j_nejsds33_s_006">
<label>3.3</label>
<title>Efficient Clustering Strategy</title>
<p>Data sets from online controlled experiments are often too large to cluster directly in the imputation step. To reduce the computational burden of clustering, we propose a stratification-based clustering approach: we first stratify the user pool and then perform clustering within each stratum.</p>
<p>In the stratification step, we stratify users at two hierarchical levels: treatment assignment and users’ buying characteristics. The treatment assignment (e.g., treatment group or control group) is determined by the experiment configuration. Online controlled experiments typically have two treatment assignments, control and treatment, although more are possible, for example in multivariant experiments. Users’ buying characteristics, such as new, infrequent, frequent, and idle buyers, are categorized based on users’ purchase activities on eBay; there are 12 buyer categories in total. Both the experiment configuration and the users’ buying segments are determined before the experiment starts. The hierarchical stratification is formulated as 
<disp-formula id="j_nejsds33_eq_017">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo>=</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">⋃</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">T</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">⋃</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">u</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">U</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \boldsymbol{X}={\bigcup \limits_{t=1}^{T}}{\bigcup \limits_{u=1}^{U}}{\boldsymbol{X}_{tu}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_074"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mi mathvariant="italic">u</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\boldsymbol{X}_{tu}}$]]></tex-math></alternatives></inline-formula> is the stratum of users with the <italic>t</italic>-th treatment assignment and the <italic>u</italic>-th buying-characteristic category in the feature space <inline-formula id="j_nejsds33_ineq_075"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">X</mml:mi></mml:math><tex-math><![CDATA[$\boldsymbol{X}$]]></tex-math></alternatives></inline-formula>, and <italic>T</italic> and <italic>U</italic> are the numbers of treatment assignments and buying-characteristic categories, respectively.</p>
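<p>A minimal Python sketch of the hierarchical stratification follows. Users are grouped by the pair (treatment assignment, buyer segment), giving at most <italic>T</italic> × <italic>U</italic> disjoint strata, within each of which clustering is then run separately. The record layout (dictionary keys <monospace>treatment</monospace>, <monospace>segment</monospace>, <monospace>features</monospace>) and the function name <monospace>stratify</monospace> are illustrative assumptions.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict


def stratify(users):
    """Partition users into strata X_tu keyed by (treatment, buyer segment).

    `users` is an iterable of dicts with hypothetical keys "treatment",
    "segment", and "features"; clustering is then run per stratum.
    """
    strata = defaultdict(list)
    for user in users:
        strata[(user["treatment"], user["segment"])].append(user["features"])
    return dict(strata)


users = [
    {"treatment": "control", "segment": "new", "features": [0.1, 2.0]},
    {"treatment": "control", "segment": "frequent", "features": [3.2, 0.4]},
    {"treatment": "treatment", "segment": "new", "features": [0.3, 1.8]},
    {"treatment": "control", "segment": "new", "features": [0.2, 2.1]},
]
strata = stratify(users)
print(len(strata))  # 3
print(strata[("control", "new")])  # [[0.1, 2.0], [0.2, 2.1]]
```
</preformat>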
<p>Combining stratification with clustering within each stratum greatly improves computational efficiency in the imputation step, in which the neighbors of the target user are searched within all clusters.</p>
<p>The number of clusters in each stratum is chosen by maximizing a simplified version of the Silhouette score, known as the simplified Silhouette. The Silhouette score is an effective measure of clustering quality [<xref ref-type="bibr" rid="j_nejsds33_ref_025">25</xref>], but it is computationally intensive because it requires the distance between each data point and every other data point. The simplified Silhouette improves efficiency by instead using the distances between each data point and the cluster centroids [<xref ref-type="bibr" rid="j_nejsds33_ref_011">11</xref>]. The simplified Silhouette of data point <italic>i</italic>, denoted by <inline-formula id="j_nejsds33_ineq_076"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mtext>SS</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\text{SS}_{i}}$]]></tex-math></alternatives></inline-formula>, is defined as 
<disp-formula id="j_nejsds33_eq_018">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>SS</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mo movablelimits="true" form="prefix">max</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\text{SS}_{i}}=\frac{{b_{i}}-{a_{i}}}{\max ({a_{i}},{b_{i}})},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_077"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${a_{i}}$]]></tex-math></alternatives></inline-formula> is the distance between data point <italic>i</italic> and the centroid of its own cluster, and <inline-formula id="j_nejsds33_ineq_078"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${b_{i}}$]]></tex-math></alternatives></inline-formula> is the minimum distance between data point <italic>i</italic> and the centroids of the other clusters. The overall simplified Silhouette is the average of the simplified Silhouette values of all data points. Because the distance of each data point to its cluster centroid is already calculated and recorded when fitting k-means, computing the simplified Silhouette adds little computational burden.</p>
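<p>The following Python sketch computes the average simplified Silhouette and selects the number of clusters that maximizes it. For self-containment it includes a plain Lloyd’s k-means with farthest-first initialization, a stand-in for whatever k-means implementation is used in practice; unlike a production implementation, it recomputes the point-to-centroid distances rather than reusing those recorded during fitting. All function names and the candidate grid are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def _centroid_distances(X, centroids):
    # n x K matrix of Euclidean distances from each point to each centroid.
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)


def kmeans(X, n_clusters, n_iter=50):
    """Plain Lloyd's k-means with farthest-first initialization (illustrative)."""
    centroids = [X[0]]
    for _ in range(1, n_clusters):
        d_min = _centroid_distances(X, np.array(centroids)).min(axis=1)
        centroids.append(X[d_min.argmax()])  # farthest point from chosen centroids
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _centroid_distances(X, centroids).argmin(axis=1)
        centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(n_clusters)
        ])
    labels = _centroid_distances(X, centroids).argmin(axis=1)
    return labels, centroids


def simplified_silhouette(X, labels, centroids):
    """Mean over points of SS_i = (b_i - a_i) / max(a_i, b_i); needs 2+ clusters."""
    D = _centroid_distances(X, centroids)
    rows = np.arange(len(X))
    a = D[rows, labels]          # distance to own centroid
    D[rows, labels] = np.inf     # mask own centroid
    b = D.min(axis=1)            # distance to nearest other centroid
    denom = np.maximum(a, b)
    ss = np.divide(b - a, denom, out=np.zeros_like(a), where=denom > 0)
    return float(ss.mean())


def choose_n_clusters(X, candidates):
    """Return the candidate cluster count with the largest simplified Silhouette."""
    scores = {}
    for n in candidates:
        labels, centroids = kmeans(X, n)
        scores[n] = simplified_silhouette(X, labels, centroids)
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
best, scores = choose_n_clusters(X, candidates=[2, 3, 4, 5])
print(best)  # 3
```
</preformat>
<p>On three well-separated synthetic blobs, splitting or merging any blob lowers the average score, so three clusters are selected.</p>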
<p>Pseudo-code for the proposed method is given in Algorithm <xref rid="j_nejsds33_fig_001">1</xref>.</p>
<fig id="j_nejsds33_fig_001">
<label>Algorithm 1</label>
<caption>
<p>Pseudo-code for the proposed method.</p>
</caption>
<graphic xlink:href="nejsds33_g001.jpg"/>
</fig>
</sec>
</sec>
<sec id="j_nejsds33_s_007">
<label>4</label>
<title>Simulation</title>
<p>In this section, we conduct simulation studies to evaluate the performance of the proposed cluster-based KNN imputation method. The complete response has two parts: a non-zero part and a zero part. The non-zero part of the response follows the Gaussian model <inline-formula id="j_nejsds33_ineq_079"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.5</mml:mn>
<mml:mo>+</mml:mo>
<mml:mn>1.1</mml:mn>
<mml:mi mathvariant="italic">w</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1.1</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>0.2</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">ϵ</mml:mi></mml:math><tex-math><![CDATA[${z_{s}}=1.5+1.1w+1.1{x_{s1}}+0.2{x_{s2}}+\epsilon $]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_nejsds33_ineq_080"><alternatives><mml:math>
<mml:mi mathvariant="italic">ϵ</mml:mi>
<mml:mo stretchy="false">∼</mml:mo>
<mml:mi mathvariant="italic">N</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>0.25</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$\epsilon \sim N(0,0.25)$]]></tex-math></alternatives></inline-formula>, where <italic>w</italic> is the binary indicator of assignment to the control or the treatment group, and <inline-formula id="j_nejsds33_ineq_081"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${x_{s1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_082"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${x_{s2}}$]]></tex-math></alternatives></inline-formula> are normally distributed variables, <italic>N</italic>(0.1, 1) and <italic>N</italic>(0.2, 2.25), respectively. The binary response indicator follows a Bernoulli distribution with conditional probability <inline-formula id="j_nejsds33_ineq_083"><alternatives><mml:math>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$Pr({y_{s}}=1|{x_{s3}})$]]></tex-math></alternatives></inline-formula> given by the logistic regression model <inline-formula id="j_nejsds33_ineq_084"><alternatives><mml:math>
<mml:mi mathvariant="normal">logit</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">{</mml:mo>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">}</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mn>5.8</mml:mn>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$\mathrm{logit}\{Pr({y_{s}}=1|{x_{s3}})\}=-1+5.8{x_{s3}}$]]></tex-math></alternatives></inline-formula>, where <inline-formula id="j_nejsds33_ineq_085"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${x_{s3}}$]]></tex-math></alternatives></inline-formula> is a Gaussian variable distributed as <italic>N</italic>(0.2, 0.04). We consider three scenarios for generating missing responses in both the control and the treatment group. In scenario 1 (S1), responses are missing completely at random. In scenario 2 (S2), the missingness probability follows a logistic regression model that depends on an unobserved Gaussian variable. In scenario 3 (S3), missingness depends on the value of the response itself: a response is set to missing if its value exceeds a pre-defined threshold within its group (control or treatment). In all three scenarios, we additionally treat zero responses as missing, to represent real cases in which users have incomplete records. The sample size is fixed at 5000.</p>
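<p>The data-generating process above can be sketched in Python as follows. Several details are not specified in the text and are assumptions here: the second argument of <italic>N</italic>(·, ·) is read as a variance, <italic>w</italic> is drawn as Bernoulli(0.5), the complete response is taken as <italic>y</italic><sub><italic>s</italic></sub><italic>z</italic><sub><italic>s</italic></sub> (zero when <italic>y</italic><sub><italic>s</italic></sub> = 0), and the S1 missing rate and the S3 threshold (a within-group quantile) are hypothetical values. Scenario S2 is omitted because its coefficients are not given; the function name <monospace>simulate</monospace> is illustrative.</p>
<preformat preformat-type="code">
```python
import numpy as np


def simulate(n=5000, scenario="S1", miss_rate=0.2, quantile=0.8, seed=0):
    """Generate one simulated experiment and return (w, z, observed).

    Assumptions not fixed by the text: N(mean, variance) parameterization,
    w ~ Bernoulli(0.5), complete response z = y_s * z_s, and the
    hypothetical miss_rate (S1) and within-group quantile threshold (S3).
    """
    rng = np.random.default_rng(seed)
    w = rng.binomial(1, 0.5, n)
    x1 = rng.normal(0.1, 1.0, n)
    x2 = rng.normal(0.2, 1.5, n)   # variance 2.25
    x3 = rng.normal(0.2, 0.2, n)   # variance 0.04
    eps = rng.normal(0.0, 0.5, n)  # variance 0.25
    z_s = 1.5 + 1.1 * w + 1.1 * x1 + 0.2 * x2 + eps   # non-zero part
    p = 1.0 / (1.0 + np.exp(1.0 - 5.8 * x3))          # inverse logit of -1 + 5.8 x3
    y_s = rng.binomial(1, p)                           # binary response indicator
    z = y_s * z_s                                      # complete response

    if scenario == "S1":      # missing completely at random
        missing = miss_rate > rng.random(n)
    elif scenario == "S3":    # missing if the response exceeds a group threshold
        missing = np.zeros(n, dtype=bool)
        for g in (0, 1):
            in_g = w == g
            threshold = np.quantile(z[in_g], quantile)
            missing[in_g] = z[in_g] > threshold
    else:
        raise ValueError("only S1 and S3 are sketched; S2 coefficients are unspecified")
    missing |= z == 0         # zero responses are also treated as missing
    return w, z, ~missing


w, z, observed = simulate(scenario="S1")
print(len(z), round(float(observed.mean()), 3))
```
</preformat>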
<p>We compare the proposed method with six benchmark methods: (i) Complete-case analysis (BM<inline-formula id="j_nejsds33_ineq_086"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{1}}$]]></tex-math></alternatives></inline-formula>), (ii) Unconditional control-mean imputation (BM<inline-formula id="j_nejsds33_ineq_087"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{2}}$]]></tex-math></alternatives></inline-formula>), (iii) Unconditional treatment-mean imputation (BM<inline-formula id="j_nejsds33_ineq_088"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{3}}$]]></tex-math></alternatives></inline-formula>), (iv) Unconditional zero imputation (BM<inline-formula id="j_nejsds33_ineq_089"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{4}}$]]></tex-math></alternatives></inline-formula>), (v) Best-case analysis (BM<inline-formula id="j_nejsds33_ineq_090"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{5}}$]]></tex-math></alternatives></inline-formula>), and (vi) Worst-case analysis (BM<inline-formula id="j_nejsds33_ineq_091"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>6</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{6}}$]]></tex-math></alternatives></inline-formula>).</p>
<p>Complete-case analysis removes cases with missing values and retains only those with observed outcomes. Specifically, we discard <inline-formula id="j_nejsds33_ineq_092"><alternatives><mml:math>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup></mml:math><tex-math><![CDATA[${\boldsymbol{z}^{mis}}$]]></tex-math></alternatives></inline-formula> and the sample size is reduced to <italic>m</italic>, that is, 
<disp-formula id="j_nejsds33_eq_019">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="bold-italic">z</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold-italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ {\text{BM}_{1}}:\hspace{2.5pt}\boldsymbol{z}={\boldsymbol{z}^{obs}}.\]]]></tex-math></alternatives>
</disp-formula> 
Complete-case analysis is easy to implement but wastes information, especially when the number of incomplete cases is substantial.</p>
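<p>As a sketch of how BM<sub>1</sub> might feed into an effect estimate, the Python function below drops users with missing outcomes and compares the treatment and control means among the remaining <italic>m</italic> users. Using the difference in means as the estimator is an assumption for illustration, and the function name <monospace>complete_case_effect</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def complete_case_effect(z, w, observed):
    """BM1 sketch: discard users with missing outcomes (z^mis) and return the
    treatment-minus-control difference in means over the remaining users."""
    z, w, observed = map(np.asarray, (z, w, observed))
    z_obs, w_obs = z[observed], w[observed]
    return float(z_obs[w_obs == 1].mean() - z_obs[w_obs == 0].mean())


z = np.array([3.0, 0.0, 5.0, 2.0, 4.0, 0.0])
w = np.array([1, 1, 1, 0, 0, 0])
observed = np.array([True, False, True, True, True, False])
print(complete_case_effect(z, w, observed))  # 1.0
```
</preformat>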
<p>Unconditional control-mean imputation replaces missing values with the mean of the observed users in the control group, whereas unconditional treatment-mean imputation uses the mean of the observed users in the treatment group. That is, 
<disp-formula id="j_nejsds33_eq_020">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnspacing="0pt" columnalign="right left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">T</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}{\text{BM}_{2}}:\hspace{2.5pt}& {\hat{z}_{i}^{\ast }}=\frac{{\textstyle\textstyle\sum _{c=1}^{{n_{c}}}}{z_{c}^{obs}}}{{n_{c}}},\hspace{2.5pt}c\in C,\\ {} {\text{BM}_{3}}:\hspace{2.5pt}& {\hat{z}_{i}^{\ast }}=\frac{{\textstyle\textstyle\sum _{t=1}^{{n_{t}}}}{z_{t}^{obs}}}{{n_{t}}},\hspace{2.5pt}t\in T,\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
where the set of indices <italic>C</italic> is defined as <inline-formula id="j_nejsds33_ineq_093"><alternatives><mml:math>
<mml:mi mathvariant="italic">C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="2.5pt"/>
<mml:mtext>is in the control group</mml:mtext>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math><tex-math><![CDATA[$C=\{c:{z_{c}}\hspace{2.5pt}\text{is in the control group}\}$]]></tex-math></alternatives></inline-formula> and the set of indices <italic>T</italic> is defined as <inline-formula id="j_nejsds33_ineq_094"><alternatives><mml:math>
<mml:mi mathvariant="italic">T</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="2.5pt"/>
<mml:mtext>is in the treatment group</mml:mtext>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math><tex-math><![CDATA[$T=\{t:{z_{t}}\hspace{2.5pt}\text{is in the treatment group}\}$]]></tex-math></alternatives></inline-formula>. <inline-formula id="j_nejsds33_ineq_095"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${n_{c}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_096"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${n_{t}}$]]></tex-math></alternatives></inline-formula> denote the sample sizes of the control group and the treatment group, respectively. Unconditional zero imputation imputes every missing value with zero, that is, 
<disp-formula id="j_nejsds33_eq_021">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnspacing="0pt" columnalign="right left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">n</mml:mi>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}{\text{BM}_{4}}:\hspace{2.5pt}& {\hat{z}_{i}^{\ast }}=0,\hspace{2.5pt}i=m+r+1,\dots ,n.\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
These three methods are forms of single-value imputation and therefore preserve the full sample size. However, they treat the imputed values as if they were known, which distorts the distribution of the response and ignores the uncertainty in the missing values.</p>
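<p>To make the three single-value rules concrete, the following is a minimal Python sketch of BM<sub>2</sub>, BM<sub>3</sub>, and BM<sub>4</sub>. It assumes, for illustration only, that each group's responses are stored in a list in which <monospace>None</monospace> marks a dropout buyer; the function names (<monospace>impute_bm2</monospace>, <monospace>impute_bm3</monospace>, <monospace>impute_bm4</monospace>) and the helpers are hypothetical and are not taken from the authors' code.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical data layout: one list per group, None marks a dropout buyer.
Response = List[Optional[float]]
Imputed = Tuple[List[float], List[float]]  # (control, treatment) after imputation


def observed_mean(values: Response) -> float:
    """Mean of the observed (non-missing) responses in one group."""
    return mean(v for v in values if v is not None)


def fill(values: Response, constant: float) -> List[float]:
    """Replace every missing value with the same constant."""
    return [constant if v is None else v for v in values]


def impute_bm2(control: Response, treatment: Response) -> Imputed:
    """BM2: all missing values (both groups) receive the observed control mean."""
    m = observed_mean(control)
    return fill(control, m), fill(treatment, m)


def impute_bm3(control: Response, treatment: Response) -> Imputed:
    """BM3: all missing values (both groups) receive the observed treatment mean."""
    m = observed_mean(treatment)
    return fill(control, m), fill(treatment, m)


def impute_bm4(control: Response, treatment: Response) -> Imputed:
    """BM4: all missing values are set to zero."""
    return fill(control, 0.0), fill(treatment, 0.0)
```
]]></preformat>
<p>For example, with control responses (1.0, missing, 3.0) and treatment responses (missing, 4.0), <monospace>impute_bm2</monospace> fills both gaps with the observed control mean 2.0, <monospace>impute_bm3</monospace> fills both with the observed treatment mean 4.0, and <monospace>impute_bm4</monospace> fills both with 0.</p>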
<p>The best-case analysis imputes missing values in the treatment (control) group with the mean of the observed users in the treatment (control) group. In contrast, the worst-case analysis imputes missing values in the treatment (control) group with the mean of the observed users in the control (treatment) group. Here, we assume that the feature under test has a positive impact, so the mean in the treatment group is expected to exceed the mean in the control group. As a result, the difference between the imputed values in the treatment and control groups agrees in direction with the feature impact in the best-case analysis but opposes it in the worst-case analysis. The best-case and worst-case analyses are expressed as 
<disp-formula id="j_nejsds33_eq_022">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnspacing="0pt" columnalign="right left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mtext>BM</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>6</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mspace width="2.5pt"/>
</mml:mtd>
<mml:mtd>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mspace width="2.5pt"/>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}{\text{BM}_{5}}:\hspace{2.5pt}& {\hat{z}_{t}^{\ast }}=\frac{{\textstyle\textstyle\sum _{t=1}^{{n_{t}}}}{z_{t}^{obs}}}{{n_{t}}},\hspace{2.5pt}{\hat{z}_{c}^{\ast }}=\frac{{\textstyle\textstyle\sum _{c=1}^{{n_{c}}}}{z_{c}^{obs}}}{{n_{c}}},\\ {} {\text{BM}_{6}}:\hspace{2.5pt}& {\hat{z}_{t}^{\ast }}=\frac{{\textstyle\textstyle\sum _{c=1}^{{n_{c}}}}{z_{c}^{obs}}}{{n_{c}}},\hspace{2.5pt}{\hat{z}_{c}^{\ast }}=\frac{{\textstyle\textstyle\sum _{t=1}^{{n_{t}}}}{z_{t}^{obs}}}{{n_{t}}},\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_097"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${\hat{z}_{t}^{\ast }}$]]></tex-math></alternatives></inline-formula> (<inline-formula id="j_nejsds33_ineq_098"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">ˆ</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>∗</mml:mo>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${\hat{z}_{c}^{\ast }}$]]></tex-math></alternatives></inline-formula>) is the value imputed for a missing response in the treatment (control) group, and <inline-formula id="j_nejsds33_ineq_099"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{t}^{obs}}$]]></tex-math></alternatives></inline-formula> (<inline-formula id="j_nejsds33_ineq_100"><alternatives><mml:math>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">o</mml:mi>
<mml:mi mathvariant="italic">b</mml:mi>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
</mml:msubsup></mml:math><tex-math><![CDATA[${z_{c}^{obs}}$]]></tex-math></alternatives></inline-formula>) is the observed value in the treatment (control) group.</p>
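<p>The best-case and worst-case rules differ only in which observed group mean is assigned to each group's missing values. The minimal Python sketch below illustrates this under a hypothetical data layout (one list per group, <monospace>None</monospace> for a dropout buyer); <monospace>impute_best_case</monospace> and <monospace>impute_worst_case</monospace> are illustrative names, not the authors' implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical data layout: one list per group, None marks a dropout buyer.
Response = List[Optional[float]]
Imputed = Tuple[List[float], List[float]]  # (control, treatment) after imputation


def _observed_mean(values: Response) -> float:
    """Mean of the observed (non-missing) responses in one group."""
    return mean(v for v in values if v is not None)


def _fill(values: Response, constant: float) -> List[float]:
    """Replace every missing value with the same constant."""
    return [constant if v is None else v for v in values]


def impute_best_case(control: Response, treatment: Response) -> Imputed:
    """BM5: each group's missing values receive that group's own observed mean."""
    mu_c, mu_t = _observed_mean(control), _observed_mean(treatment)
    return _fill(control, mu_c), _fill(treatment, mu_t)


def impute_worst_case(control: Response, treatment: Response) -> Imputed:
    """BM6: each group's missing values receive the other group's observed mean."""
    mu_c, mu_t = _observed_mean(control), _observed_mean(treatment)
    return _fill(control, mu_t), _fill(treatment, mu_c)
```
]]></preformat>
<p>Because BM<sub>5</sub> fills each group with its own observed mean, the imputed group means equal the complete-case means, whereas BM<sub>6</sub> pulls the two group means toward each other and, when the dropout rate is high, can reverse the sign of the lift.</p>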
<p>To evaluate the performance of the proposed method, we estimate the mean and variance of the control group and compute the lift in the mean of the treatment group over the control group, the standard error (SE) of the difference between the treatment and control means, the coefficient of variation (CV) of the control group, the zero rate (ZR), and the p-value. The lift in the mean of the treatment group relative to the control group is defined as 
<disp-formula id="j_nejsds33_eq_023">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true" columnalign="right left" columnspacing="0pt">
<mml:mtr>
<mml:mtd class="align-odd">
<mml:mtext>Lift</mml:mtext>
</mml:mtd>
<mml:mtd class="align-even">
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>×</mml:mo>
<mml:mn>100</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="align-odd"/>
<mml:mtd class="align-even">
<mml:mo>=</mml:mo>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">(</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>−</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">)</mml:mo>
<mml:mo maxsize="2.03em" minsize="2.03em" stretchy="true" mathvariant="normal">/</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mo largeop="false" movablelimits="false">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow/>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>×</mml:mo>
<mml:mn>100</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[\begin{aligned}{}\text{Lift}& =\frac{{\mu _{t}}-{\mu _{c}}}{{\mu _{c}}}\times 100\% \\ {} & =\bigg(\frac{{\textstyle\textstyle\sum _{t=1}^{{n_{t}}}}{z_{t}^{}}}{{n_{t}}}-\frac{{\textstyle\textstyle\sum _{c=1}^{{n_{c}}}}{z_{c}^{}}}{{n_{c}}}\bigg)\bigg/\frac{{\textstyle\textstyle\sum _{c=1}^{{n_{c}}}}{z_{c}^{}}}{{n_{c}}}\times 100\% ,\end{aligned}\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_101"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{t}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_102"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{c}}$]]></tex-math></alternatives></inline-formula> are the means of the treatment group and the control group, respectively.</p>
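<p>As a quick numerical check of the lift formula, the hypothetical helper below computes it from two lists of responses. It is a sketch that assumes the inputs are the completed (observed or imputed) responses of each group, so that the sums run over all users in each group; the name <monospace>lift_percent</monospace> is illustrative only.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean
from typing import Sequence


def lift_percent(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Relative lift (mu_t - mu_c) / mu_c * 100 of the treatment mean over the control mean."""
    mu_t = mean(treatment)
    mu_c = mean(control)
    if mu_c == 0:
        # The lift divides by mu_c, so it is undefined for a zero control mean.
        raise ValueError("lift is undefined when the control mean is zero")
    return (mu_t - mu_c) / mu_c * 100.0
```
]]></preformat>
<p>For instance, treatment responses (2, 4) and control responses (2, 2) give means 3 and 2, hence a lift of 50%.</p>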
<p>The SE of the difference in means, based on the pooled sample variance, is 
<disp-formula id="j_nejsds33_eq_024">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mtext>SE</mml:mtext>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>·</mml:mo>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">(</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>+</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">)</mml:mo>
</mml:mrow>
</mml:msqrt>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \text{SE}=\sqrt{\frac{({n_{t}}-1){s_{t}^{2}}+({n_{c}}-1){s_{c}^{2}}}{{n_{t}}+{n_{c}}-2}\cdot \bigg(\frac{1}{{n_{c}}}+\frac{1}{{n_{t}}}\bigg)},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_nejsds33_ineq_103"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${s_{t}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_104"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${s_{c}}$]]></tex-math></alternatives></inline-formula> are the sample standard deviations of the treatment group and the control group, respectively.</p>
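<p>The pooled-variance SE can be sketched in Python as follows. The helper name <monospace>pooled_se</monospace> is hypothetical; it relies on <monospace>statistics.variance</monospace>, which returns the sample variance with divisor <italic>n</italic> − 1, matching the <italic>s</italic><sub><italic>t</italic></sub><sup>2</sup> and <italic>s</italic><sub><italic>c</italic></sub><sup>2</sup> terms in the formula.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import variance
from typing import Sequence


def pooled_se(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Pooled-variance standard error of mean(treatment) - mean(control)."""
    n_t, n_c = len(treatment), len(control)
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two observations")
    s2_t = variance(treatment)  # sample variance, divisor n_t - 1
    s2_c = variance(control)    # sample variance, divisor n_c - 1
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_t - 1) * s2_t + (n_c - 1) * s2_c) / (n_t + n_c - 2)
    return math.sqrt(pooled * (1 / n_c + 1 / n_t))
```
]]></preformat>
<p>With treatment responses (1, 2, 3) and control responses (2, 4, 6), the sample variances are 1 and 4, the pooled variance is 2.5, and the SE is the square root of 2.5 × (1/3 + 1/3) = 5/3, about 1.29.</p>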
<table-wrap id="j_nejsds33_tab_002">
<label>Table 2</label>
<caption>
<p>Performance comparison of the benchmark methods and the proposed method. Entries are means over 50 simulation replications, with standard errors in parentheses. The NoMissing method uses the original complete responses before missingness is introduced.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Scenario</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Method</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">Lift (%)</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin"><inline-formula id="j_nejsds33_ineq_105"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{c}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin"><inline-formula id="j_nejsds33_ineq_106"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{t}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin"><inline-formula id="j_nejsds33_ineq_107"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${s_{c}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">CV</td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin"><inline-formula id="j_nejsds33_ineq_108"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${n_{c}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: center; border-top: double; border-bottom: solid thin">ZR</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: center">S1</td>
<td style="vertical-align: top; text-align: center">BM<sub>1</sub></td>
<td style="vertical-align: top; text-align: center">65.6 (4.96)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.05)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center"><bold>1.2</bold> (0.03)</td>
<td style="vertical-align: top; text-align: center">0.7 (0.03)</td>
<td style="vertical-align: top; text-align: center">953.8 (30.33)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>2</sub></td>
<td style="vertical-align: top; text-align: center">24.9 (2.24)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.05)</td>
<td style="vertical-align: top; text-align: center">2.1 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>3</sub></td>
<td style="vertical-align: top; text-align: center">17.8 (1.14)</td>
<td style="vertical-align: top; text-align: center">2.4 (0.03)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center">0.9 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>4</sub></td>
<td style="vertical-align: top; text-align: center">65.4 (9.85)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.1 (0.04)</td>
<td style="vertical-align: top; text-align: center">1.1 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.8 (0.04)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>5</sub></td>
<td style="vertical-align: top; text-align: center">65.6 (4.96)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.05)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>6</sub></td>
<td style="vertical-align: top; text-align: center">-11.1 (0.76)</td>
<td style="vertical-align: top; text-align: center">2.4 (0.03)</td>
<td style="vertical-align: top; text-align: center">2.1 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.9 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">Proposed</td>
<td style="vertical-align: top; text-align: center">40.3 (11.30)</td>
<td style="vertical-align: top; text-align: center"><bold>1.1</bold> (0.25)</td>
<td style="vertical-align: top; text-align: center"><bold>1.5</bold> (0.24)</td>
<td style="vertical-align: top; text-align: center">1.3 (0.20)</td>
<td style="vertical-align: top; text-align: center">1.2 (0.09)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">NoMissing</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">64.8 (7.41)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.9 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.5 (0.04)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.2 (0.02)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.4 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.5 (0.01)</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: center">S2</td>
<td style="vertical-align: top; text-align: center">BM<sub>1</sub></td>
<td style="vertical-align: top; text-align: center">65.0 (4.41)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.04)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center"><bold>1.2</bold> (0.03)</td>
<td style="vertical-align: top; text-align: center">0.7 (0.03)</td>
<td style="vertical-align: top; text-align: center">958.6 (29.9)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>2</sub></td>
<td style="vertical-align: top; text-align: center">24.8 (2.02)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.04)</td>
<td style="vertical-align: top; text-align: center">2.1 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>3</sub></td>
<td style="vertical-align: top; text-align: center">17.8 (1.06)</td>
<td style="vertical-align: top; text-align: center">2.4 (0.03)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center">0.9 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>4</sub></td>
<td style="vertical-align: top; text-align: center">64.3 (9.47)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.1 (0.04)</td>
<td style="vertical-align: top; text-align: center">1.1 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.04)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>5</sub></td>
<td style="vertical-align: top; text-align: center">65.0 (4.41)</td>
<td style="vertical-align: top; text-align: center">1.7 (0.04)</td>
<td style="vertical-align: top; text-align: center">2.8 (0.04)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>6</sub></td>
<td style="vertical-align: top; text-align: center">-11.0 (0.59)</td>
<td style="vertical-align: top; text-align: center">2.4 (0.03)</td>
<td style="vertical-align: top; text-align: center">2.1 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.9 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">Proposed</td>
<td style="vertical-align: top; text-align: center">39.4 (10.79)</td>
<td style="vertical-align: top; text-align: center"><bold>1.1</bold> (0.25)</td>
<td style="vertical-align: top; text-align: center"><bold>1.5</bold> (0.24)</td>
<td style="vertical-align: top; text-align: center">1.3 (0.20)</td>
<td style="vertical-align: top; text-align: center">1.2 (0.09)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">NoMissing</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">64.8 (7.41)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.9 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.5 (0.04)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.2 (0.02)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.4 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.5 (0.01)</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: center">S3</td>
<td style="vertical-align: top; text-align: center">BM<sub>1</sub></td>
<td style="vertical-align: top; text-align: center">100.9 (8.84)</td>
<td style="vertical-align: top; text-align: center"><bold>1.1</bold> (0.05)</td>
<td style="vertical-align: top; text-align: center">2.2 (0.03)</td>
<td style="vertical-align: top; text-align: center"><bold>0.9</bold> (0.02)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.05)</td>
<td style="vertical-align: top; text-align: center">958.6 (29.9)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>2</sub></td>
<td style="vertical-align: top; text-align: center">38.4 (4.02)</td>
<td style="vertical-align: top; text-align: center"><bold>1.1</bold> (0.05)</td>
<td style="vertical-align: top; text-align: center"><bold>1.5</bold> (0.03)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.03)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>3</sub></td>
<td style="vertical-align: top; text-align: center">23.7 (1.33)</td>
<td style="vertical-align: top; text-align: center">1.8 (0.03)</td>
<td style="vertical-align: top; text-align: center">2.2 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>4</sub></td>
<td style="vertical-align: top; text-align: center">100.1 (15.36)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.8 (0.06)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>5</sub></td>
<td style="vertical-align: top; text-align: center">100.9 (8.84)</td>
<td style="vertical-align: top; text-align: center"><bold>1.1</bold> (0.05)</td>
<td style="vertical-align: top; text-align: center">2.2 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.03)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">BM<sub>6</sub></td>
<td style="vertical-align: top; text-align: center">-14.7 (0.89)</td>
<td style="vertical-align: top; text-align: center">1.8 (0.03)</td>
<td style="vertical-align: top; text-align: center"><bold>1.5</bold> (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">0.4 (0.02)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0 (0)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center"/>
<td style="vertical-align: top; text-align: center">Proposed</td>
<td style="vertical-align: top; text-align: center">71.6 (9.67)</td>
<td style="vertical-align: top; text-align: center">0.6 (0.03)</td>
<td style="vertical-align: top; text-align: center">1.0 (0.03)</td>
<td style="vertical-align: top; text-align: center">0.8 (0.02)</td>
<td style="vertical-align: top; text-align: center">1.4 (0.05)</td>
<td style="vertical-align: top; text-align: center">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center">0.5 (0.01)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">NoMissing</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">64.8 (7.41)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.9 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.5 (0.04)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.2 (0.02)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">1.4 (0.03)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">2504.1 (28.23)</td>
<td style="vertical-align: top; text-align: center; border-bottom: solid thin">0.5 (0.01)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In online experimentation, the faster experiments run, the greater the economic benefits and the lower the operational costs. Given constant user traffic, running experiments faster means that fewer users are required [<xref ref-type="bibr" rid="j_nejsds33_ref_003">3</xref>, <xref ref-type="bibr" rid="j_nejsds33_ref_031">31</xref>]. The number of users required to achieve a pre-determined statistical power increases with the CV; for a given relative effect size, it is proportional to the square of the CV. The CV is expressed as 
<disp-formula id="j_nejsds33_eq_025">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mtext>CV</mml:mtext>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \text{CV}=\frac{{s_{c}}}{{\mu _{c}}}.\]]]></tex-math></alternatives>
</disp-formula> 
The smaller the CV, the fewer users are required to detect a given difference at the specified statistical power, and thus the higher the sensitivity of the experiment.</p>
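<p>As a rough illustration of how the CV drives the required sample size, the sketch below computes the CV from control-group outcomes and applies the textbook normal-approximation formula for a two-sided, two-sample test of a relative lift. The equal group sizes, the common standard deviation, the made-up outcomes, and the function names are assumptions for illustration; this is not the sample-size procedure of the cited works.</p>

```python
import math
from statistics import NormalDist, mean, stdev


def coefficient_of_variation(control_outcomes):
    """CV = s_c / mu_c, using the sample standard deviation."""
    return stdev(control_outcomes) / mean(control_outcomes)


def users_per_group(cv, relative_lift, alpha=0.05, power=0.8):
    """Approximate users per group needed to detect a relative lift
    (e.g. 0.01 for 1%) with a two-sided two-sample z-test.

    Assumes equal group sizes and a common standard deviation, so that
    n = 2 * (z_{1-alpha/2} + z_{power})**2 * CV**2 / lift**2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * cv ** 2 / relative_lift ** 2)


# Made-up control-group outcomes with many zeros.
cv = coefficient_of_variation([0.0, 0.0, 35.0, 12.0, 0.0, 80.0])
print(f"CV = {cv:.3f}")
print("users per group, 1% lift:", users_per_group(cv, 0.01))
# Halving the CV cuts the required users per group by roughly a factor of four.
print("ratio:", users_per_group(1.0, 0.01) / users_per_group(0.5, 0.01))
```

<p>Because the required sample size scales with the square of the CV, even a modest reduction in the CV translates into a noticeably shorter experiment at constant traffic; the ratio printed above is slightly below 4 only because of rounding up to whole users.</p>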
<p>The ZR is the ratio of the number of zeros (<inline-formula id="j_nejsds33_ineq_109"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mi mathvariant="italic">o</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${n_{zero}}$]]></tex-math></alternatives></inline-formula>) in the imputed outcome vector <inline-formula id="j_nejsds33_ineq_110"><alternatives><mml:math>
<mml:mi mathvariant="bold-italic">z</mml:mi></mml:math><tex-math><![CDATA[$\boldsymbol{z}$]]></tex-math></alternatives></inline-formula> to the total sample size <italic>n</italic>: 
<disp-formula id="j_nejsds33_eq_026">
<alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mtext>ZR</mml:mtext>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">z</mml:mi>
<mml:mi mathvariant="italic">e</mml:mi>
<mml:mi mathvariant="italic">r</mml:mi>
<mml:mi mathvariant="italic">o</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ \text{ZR}=\frac{{n_{zero}}}{n}.\]]]></tex-math></alternatives>
</disp-formula> 
The ZR measures the proportion of users with a zero outcome after imputation.</p>
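<p>The ZR can be computed directly from an imputed outcome vector. The minimal sketch below, which uses made-up outcomes and hypothetical helper names, also shows why imputing every missing value with the observed mean gives a ZR of 0 when all observed outcomes are positive, whereas zero imputation counts every missing outcome as a zero.</p>

```python
def zero_rate(imputed_outcomes):
    """ZR = n_zero / n for an imputed outcome vector z."""
    n = len(imputed_outcomes)
    if n == 0:
        raise ValueError("ZR is undefined for an empty outcome vector")
    n_zero = sum(1 for value in imputed_outcomes if value == 0)
    return n_zero / n


def impute_constant(outcomes, fill_value):
    """Replace every missing outcome (None) with the same fill value."""
    return [fill_value if y is None else y for y in outcomes]


# Made-up outcomes: three recorded purchases and two missing values.
outcomes = [12.0, 3.0, 5.0, None, None]
observed = [y for y in outcomes if y is not None]

zero_imputed = impute_constant(outcomes, 0.0)
mean_imputed = impute_constant(outcomes, sum(observed) / len(observed))

print(zero_rate(zero_imputed))  # 2 of 5 values are zero
print(zero_rate(mean_imputed))  # no zeros remain
```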
<p>Table <xref rid="j_nejsds33_tab_002">2</xref> compares the performance of the proposed method and the benchmark methods in all scenarios. In S1 and S2, the proposed cluster-based KNN imputation method has the closest <inline-formula id="j_nejsds33_ineq_111"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{c}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_nejsds33_ineq_112"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{t}}$]]></tex-math></alternatives></inline-formula>, and ZR to those of the NoMissing method. The BM<sub>2</sub> and BM<sub>3</sub> methods have larger <inline-formula id="j_nejsds33_ineq_113"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{c}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_114"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{t}}$]]></tex-math></alternatives></inline-formula> because these methods impute every missing value with a nonzero value, as indicated by their ZR of 0. The proposed method has a comparable <inline-formula id="j_nejsds33_ineq_115"><alternatives><mml:math>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$s{_{\text{c}}}$]]></tex-math></alternatives></inline-formula> value to that of the NoMissing method, whereas the BM<sub>2</sub>, BM<sub>3</sub>, BM<sub>4</sub>, and BM<sub>5</sub> methods have smaller values. A plausible explanation is that the proposed method does not impute a fixed value, as the BM<sub>2</sub>, BM<sub>3</sub>, BM<sub>4</sub>, and BM<sub>5</sub> methods do. Although the BM<sub>1</sub> method has a similar <inline-formula id="j_nejsds33_ineq_116"><alternatives><mml:math>
<mml:mi mathvariant="italic">s</mml:mi>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$s{_{\text{c}}}$]]></tex-math></alternatives></inline-formula> to that of the proposed method, its sample size is smaller because samples with missing responses are removed. In S3, the proposed method does not outperform the BM<sub>2</sub> method. This is probably because, in S3, the missing responses are concentrated in one particular group. When an entire group is missing, the KNN-based imputation approach has difficulty finding good neighbors for the missing responses. As a result, the estimated <inline-formula id="j_nejsds33_ineq_117"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">c</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{c}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_nejsds33_ineq_118"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${\mu _{t}}$]]></tex-math></alternatives></inline-formula> are not close to the truth.</p>
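<p>The explanation above, that imputing one fixed value shrinks the spread of the imputed outcomes, can be checked with a small simulation. The sketch below uses simulated outcomes (about half zeros, the rest exponential purchase amounts) with roughly 30% of values missing completely at random, and compares mean imputation with hot-deck imputation, which draws each imputed value from the observed outcomes. Hot-deck imputation is only a simple stand-in for an imputation whose values vary; it is not the proposed cluster-based KNN method, and all data and parameters are assumptions.</p>

```python
import random
from statistics import mean, stdev

rng = random.Random(7)

# Simulated control-group outcomes: about half zeros (visitors) and the
# rest positive purchase amounts with mean 50 (buyers).
full = [0.0 if rng.random() > 0.5 else rng.expovariate(1 / 50) for _ in range(5000)]

# Make roughly 30% of the outcomes missing completely at random.
with_missing = [None if rng.random() > 0.7 else y for y in full]
observed = [y for y in with_missing if y is not None]

# Single fixed value: every missing outcome gets the observed mean.
fill = mean(observed)
mean_imputed = [fill if y is None else y for y in with_missing]

# Varying values: each missing outcome gets a randomly drawn observed outcome.
hot_deck = [rng.choice(observed) if y is None else y for y in with_missing]

for name, data in [("full data", full), ("mean-imputed", mean_imputed), ("hot-deck", hot_deck)]:
    print(f"{name:13s} mean={mean(data):7.2f} sd={stdev(data):7.2f}")
```

<p>In this setup, mean imputation leaves the overall mean equal to the observed mean but reduces the standard deviation by a factor of about the square root of the observed fraction, whereas the hot-deck draws keep the standard deviation close to that of the full data.</p>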
</sec>
<sec id="j_nejsds33_s_008">
<label>5</label>
<title>Case Study: Search Ranking Experiment</title>
<p>To illustrate the proposed method, this section uses a real online experiment whose objective was to improve the ranking of items in eBay’s search results generated by one ranking algorithm. The hypothesis of the experiment was that integrating information about negative buyer experiences into the ranking algorithm would reduce the visibility of items with a high probability of negative buyer experiences in search results, resulting in lower product return rates and increased revenues. The experiment lasted three weeks. A portion of eligible eBay users were selected and randomized into three variants: two treatment groups and one control group. Each variant included more than 10 million users. One of the most important outcomes is a purchase-related metric, denoted here as PR.</p>
<table-wrap id="j_nejsds33_tab_003">
<label>Table 3</label>
<caption>
<p>Performance comparison of the proposed and benchmark methods in the search ranking experiment. Note that the values of <inline-formula id="j_nejsds33_ineq_119"><alternatives><mml:math>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
<mml:none/>
</mml:mmultiscripts>
</mml:math><tex-math><![CDATA[$s^{2}{}_{\text{c}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_nejsds33_ineq_120"><alternatives><mml:math>
<mml:mi mathvariant="italic">μ</mml:mi>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$\mu {_{\text{c}}}$]]></tex-math></alternatives></inline-formula>, CV, and SE are not the actual values; they have been masked with a specific linear transformation to meet disclosure requirements.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: double; border-bottom: solid thin; border-right: solid thin">Method</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin"><inline-formula id="j_nejsds33_ineq_121"><alternatives><mml:math>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
<mml:none/>
</mml:mmultiscripts>
</mml:math><tex-math><![CDATA[$s^{2}{}_{\text{c}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin"><inline-formula id="j_nejsds33_ineq_122"><alternatives><mml:math>
<mml:mi mathvariant="italic">μ</mml:mi>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mtext>c</mml:mtext>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$\mu {_{\text{c}}}$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">CV</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">ZR</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">Lift (%)</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin; border-right: solid thin">SE</td>
<td style="vertical-align: top; text-align: right; border-top: double; border-bottom: solid thin">p-value</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>1</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">107035.21</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">1235.8</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.265</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.00</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.37</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.33</td>
<td style="vertical-align: top; text-align: right">0.17</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>2</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">20003.17</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">390.5</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.362</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.00</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.16</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right">0.28</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>3</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">20004.96</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">389.9</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.363</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.00</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.17</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right">0.28</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>4</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">20693.30</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">213.7</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.673</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.83</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.29</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right">0.31</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>5</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">20003.17</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">390.5</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.362</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.00</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.29</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right">0.05</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-right: solid thin">BM<sub>6</sub></td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">20004.96</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">389.9</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.363</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.00</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">-0.03</td>
<td style="vertical-align: top; text-align: right; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right">0.82</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin; border-right: solid thin">Proposed</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">21194.12</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">246.6</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">0.590</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">0.80</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">-0.50</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin; border-right: solid thin">0.06</td>
<td style="vertical-align: top; text-align: right; border-bottom: solid thin">0.05</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The outcome PR is incomplete, with a high missing rate. PR is recorded when a user makes a purchase during the experiment’s data collection period; it is missing when either the user makes no purchase or the platform is unable to record the purchase before the end of the data collection period. To impute PR, and thus distinguish visitors from dropout buyers, we use the following informative covariates: the treatment assignment, the number of sessions, the number of sessions with searches, the number of sessions with qualified events highly related to purchases at eBay, and the user’s buying characteristics. The treatment assignment, which allocates users to the treatment and control groups, is determined before the experiment starts. The number of sessions is the number of sessions a user has during the experiment. The number of sessions with searches is the number of sessions that contain at least one search activity. The number of sessions with qualified events is the number of sessions that include at least one qualified event. The buying characteristics of users are their historical purchasing patterns at eBay. All of these covariates are fully observed. We impute the outcome PR using the proposed cluster-based imputation method. In the stratification step, we divide the large-scale data set into smaller subsets based on two variables: the treatment assignment and the user’s buying characteristics. When clustering within each stratum, we use the number of sessions, the number of sessions with searches, and the number of sessions with qualified events.</p>
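<p>The stratification and clustering steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ implementation: the record field names, the number of clusters k, the use of plain k-means on unscaled session counts, and the helper names are all hypothetical, and the subsequent imputation step, which separates visitors from dropout buyers, is not reproduced here.</p>

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.
    Returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))

    def nearest(pt):
        # Index of the closest center by squared Euclidean distance.
        return min(range(len(centers)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))

    labels = [nearest(pt) for pt in points]
    for _ in range(iters):
        new_centers = []
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            # Move the center to its members' mean (keep it if the cluster is empty).
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else centers[c])
        centers = new_centers
        new_labels = [nearest(pt) for pt in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def stratify_and_cluster(users, k=2):
    """Map each user id to (stratum, cluster label).

    Hypothetical strata: (treatment assignment, buying segment). Within each
    stratum, users are clustered on three session counts.
    """
    strata = defaultdict(list)
    for user in users:
        strata[(user["treatment"], user["buying_segment"])].append(user)

    assignment = {}
    for stratum, members in strata.items():
        features = [(u["sessions"], u["search_sessions"], u["qualified_sessions"])
                    for u in members]
        for user, label in zip(members, kmeans(features, k)):
            assignment[user["id"]] = (stratum, label)
    return assignment


# Hypothetical users: one stratum with two clearly separated activity levels.
users = [
    {"id": i, "treatment": "control", "buying_segment": "frequent",
     "sessions": s, "search_sessions": q, "qualified_sessions": e}
    for i, (s, q, e) in enumerate([(1, 1, 0), (2, 1, 0), (30, 25, 10), (31, 24, 11)])
]
users.append({"id": 4, "treatment": "treatment", "buying_segment": "idle",
              "sessions": 3, "search_sessions": 0, "qualified_sessions": 0})

for user_id, (stratum, label) in sorted(stratify_and_cluster(users).items()):
    print(user_id, stratum, label)
```

<p>Stratifying first keeps each clustering problem small and guarantees that users from different treatment groups or buying segments never share a cluster; in practice the session counts would usually be standardized before clustering.</p>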
<p>In Table <xref rid="j_nejsds33_tab_003">3</xref>, we compare the performance of the proposed cluster-based imputation method with that of the benchmark methods. The proposed method has a smaller control-group mean and a larger ZR than all other methods except BM<inline-formula id="j_nejsds33_ineq_123"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{4}}$]]></tex-math></alternatives></inline-formula>. The proposed imputation method distinguishes visitors from dropout buyers among the users with missing values: it imputes zeros for visitors, who make up a portion of the users with missing outcomes, and positive values for dropout buyers. Compared to BM<inline-formula id="j_nejsds33_ineq_124"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{4}}$]]></tex-math></alternatives></inline-formula>, the proposed imputation method imputes fewer zeros and thus yields a larger control-group mean. Compared to the mean-imputation methods, which impute all missing values with a single value, the proposed imputation method produces more zeros and a smaller control-group mean. The proposed method has a larger control-group CV than all other methods, with the exception of BM<inline-formula id="j_nejsds33_ineq_125"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{4}}$]]></tex-math></alternatives></inline-formula>. This is largely attributable to differences in the control-group mean, because the control-group variances and the pooled standard errors of all methods, with the exception of BM<inline-formula id="j_nejsds33_ineq_126"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{1}}$]]></tex-math></alternatives></inline-formula>, are quite close. The proposed method has the most negative lift, and all methods agree on the direction of the lift. At a significance level (Type I error rate) of 10%, the results of the proposed method and BM<inline-formula id="j_nejsds33_ineq_127"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{5}}$]]></tex-math></alternatives></inline-formula> are statistically significant, indicating sufficient evidence to reject the null hypothesis, whereas the other methods are not statistically significant. This is expected, because single imputation methods are well known to dilute mean differences, biasing the comparison toward finding no difference between the control and treatment groups. The proposed method has a larger control-group variance and SE than the other methods, except BM<inline-formula id="j_nejsds33_ineq_128"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{1}}$]]></tex-math></alternatives></inline-formula>. BM<inline-formula id="j_nejsds33_ineq_129"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{1}}$]]></tex-math></alternatives></inline-formula> has a reduced sample size, which results in the largest control-group variance and SE. Unlike the other methods, with the exception of BM<inline-formula id="j_nejsds33_ineq_130"><alternatives><mml:math>
<mml:msub>
<mml:mrow/>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${_{1}}$]]></tex-math></alternatives></inline-formula>, the proposed method does not ignore the variability among the missing values, which results in a larger variance.</p>
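<p>The quantities compared above can be made concrete with a small sketch. The Python code below is illustrative only and is not the paper's implementation: it assumes that CV is the control-group standard deviation divided by its mean, that lift is the relative difference in means, that the SE is the pooled-variance standard error of the difference in means, and that the p-value comes from a two-sided normal approximation. The helper names <monospace>compare_groups</monospace> and <monospace>impute_single_value</monospace> and the toy data are hypothetical.</p>

```python
import math
import statistics


def compare_groups(control, treatment):
    """Summarize a control/treatment comparison of a per-user outcome.

    Assumed definitions (illustrative only):
      CV   = sample SD / sample mean of the control group
      lift = (mean_t - mean_c) / mean_c
      SE   = pooled-variance standard error of mean_t - mean_c
    """
    n_c, n_t = len(control), len(treatment)
    mean_c, mean_t = statistics.fmean(control), statistics.fmean(treatment)
    var_c, var_t = statistics.variance(control), statistics.variance(treatment)
    pooled_var = ((n_c - 1) * var_c + (n_t - 1) * var_t) / (n_c + n_t - 2)
    se = math.sqrt(pooled_var * (1 / n_c + 1 / n_t))
    z = (mean_t - mean_c) / se
    return {
        "zero_rate_c": sum(1 for y in control if y == 0) / n_c,
        "mean_c": mean_c,
        "cv_c": math.sqrt(var_c) / mean_c,
        "lift": (mean_t - mean_c) / mean_c,
        "se": se,
        # Two-sided p-value, normal approximation: 2 * (1 - Phi(|z|))
        "p_value": math.erfc(abs(z) / math.sqrt(2)),
    }


def impute_single_value(outcomes, value):
    """Replace every missing outcome (None) with one constant."""
    return [value if y is None else y for y in outcomes]


# Hypothetical toy data: None marks an unrecorded outcome.
control = [0, 0, 1, 2, 3, None, None, None, None, None]
treatment = [1, 2, 3, 4, 5, None, None, None, None, None]

# Overall mean of the observed outcomes, shared by both groups
fill = statistics.fmean([y for y in control + treatment if y is not None])

zero_imp = compare_groups(impute_single_value(control, 0),
                          impute_single_value(treatment, 0))
mean_imp = compare_groups(impute_single_value(control, fill),
                          impute_single_value(treatment, fill))
```

<p>In this toy example both single-value imputations give the same absolute difference in means (0.9), half of the complete-case difference (3.0 minus 1.2, i.e. 1.8), while the choice of fill value changes the control-group zero rate, mean, CV, and lift.</p>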
<fig id="j_nejsds33_fig_002">
<label>Figure 1</label>
<caption>
<p>Comparison of the mean (μ) across user segments between the proposed imputation method and the zero-imputation method for the treatment group. Tick values on the vertical axis are omitted because of disclosure restrictions.</p>
</caption>
<graphic xlink:href="nejsds33_g002.jpg"/>
</fig>
<fig id="j_nejsds33_fig_003">
<label>Figure 2</label>
<caption>
<p>Comparison of the zero rate across user segments between the proposed imputation method and the zero-imputation method.</p>
</caption>
<graphic xlink:href="nejsds33_g003.jpg"/>
</fig>
<p>Figure <xref rid="j_nejsds33_fig_002">1</xref> illustrates the increase in the mean of the control group across users’ buying segments for the proposed cluster-based imputation method and the zero-imputation method. Different user segments have different mean values, with the top two being the frequent buyer levels II and III. The proposed imputation method has larger mean values than the zero imputation method in nearly all user segments. The segments the frequent buyer levels II and III have considerably larger mean increases than the idle buyer levels. This suggests that the dropout buyers are more likely to occur in the frequent buyer levels II and III, while in the segments such as idle buyer levels, users with unrecorded outcomes are more likely to be visitors. This is consistent with the findings in Figure <xref rid="j_nejsds33_fig_003">2</xref> regarding the allocation of the zero rate across user segments. Different user segments have varying degrees of zero rate. The zero rates for frequent buyer levels II and III are approximately 45%, whereas the zero rates for idle buyer levels II and III are above 90%. This is reasonable given that frequent buyer levels II and III are more likely to make purchases, resulting in low zero values for outcome PR. The high zero rate corresponds to the low mean value in Figure <xref rid="j_nejsds33_fig_002">1</xref>.</p>
<p>Figure <xref rid="j_nejsds33_fig_004">3</xref> shows the distribution of CV across user segments for the proposed imputation method and the zero imputation method. For both methods, the CV values for the frequent buyer levels are less than half of those for the idle buyer levels. However, the CV of the proposed method is consistently lower than that of the zero imputation method across all user segments. The decrease in the CV indicates an improvement in sensitivity for the outcome PR. This improvement in sensitivity is largely attributable to the change in mean values.</p>
<fig id="j_nejsds33_fig_004">
<label>Figure 3</label>
<caption>
<p>Comparison of the CV across user segments between the proposed imputation method and the zero-imputation method for the treatment group. Tick values on the vertical axis are omitted because of disclosure restrictions.</p>
</caption>
<graphic xlink:href="nejsds33_g004.jpg"/>
</fig>
</sec>
<sec id="j_nejsds33_s_009">
<label>6</label>
<title>Discussion</title>
<p>Metrics provide strong evidence to support hypotheses in online experimentation and hence reduce debate in the decision-making process. This paper introduces the concept of dropout buyers and classifies users with incomplete metric values into two categories: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a cluster-based k-nearest neighbors (kNN) imputation method. The proposed method considers both experiment-specific features and users’ activities along their shopping paths, and it uses kNN to capture the variability among the missing values of the outcome metrics. To enable efficient imputation on the large-scale data sets of online experimentation, the proposed method combines stratification and clustering. Stratification divides the large-scale data set into smaller subsets, which improves computational efficiency in the clustering step. Clustering then identifies inherent structural patterns in the data, which improves the performance of the kNN method within each cluster.</p>
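<p>To make the stratify, cluster, and impute sequence concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the stratum labels, the feature matrix, the use of plain Lloyd's k-means with farthest-first initialization, Euclidean distance, and all function names are assumptions, and it presumes that the users to be imputed have already been identified (the visitor versus dropout-buyer classification is not shown). Each missing outcome is replaced by the average response of its k nearest observed users within the same stratum and cluster.</p>

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=50):
    """Lloyd's k-means with farthest-first initialization; returns labels."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # Next center: the point farthest from every center chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the previous center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def knn_impute(X, y, observed, k):
    """Replace missing y with the mean response of the k nearest observed rows."""
    y = y.copy()
    donors = np.flatnonzero(observed)
    for i in np.flatnonzero(~observed):
        dist = np.linalg.norm(X[donors] - X[i], axis=1)
        y[i] = y[donors[np.argsort(dist)[:k]]].mean()
    return y


def stratified_cluster_knn_impute(X, y, observed, strata, n_clusters=2, k=5):
    """Hypothetical pipeline: stratify, cluster within each stratum, then kNN."""
    y_imp = y.copy()
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        labels = kmeans(X[idx], min(n_clusters, len(idx)))
        for c in np.unique(labels):
            cell = idx[labels == c]
            # Impute only where the cell has both donors and missing outcomes;
            # a cell with no observed users stays missing (NaN).
            if observed[cell].any() and not observed[cell].all():
                y_imp[cell] = knn_impute(X[cell], y[cell], observed[cell], k)
    return y_imp


# Hypothetical data: two strata, each with two tight groups of users.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1],
              [0, 5], [0.1, 5], [0, 5.1], [5, 0], [5.1, 0], [5, 0.1]], dtype=float)
y = np.array([1, 1, np.nan, 9, 9, np.nan, 3, 3, np.nan, 7, 7, np.nan])
strata = np.repeat([0, 1], 6)
y_hat = stratified_cluster_knn_impute(X, y, ~np.isnan(y), strata, n_clusters=2, k=3)
```

<p>Restricting the kNN search to one stratum and one cluster keeps each distance computation small, which is the efficiency argument made above. Because the imputed value is an average over each user's own neighbors, different missing users receive different values rather than one shared constant.</p>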
<p>It is worth remarking that the kNN method used in this work takes the simple average of the responses of the nearest neighbors. A weighted average has also been proposed, so that neighbors contribute differently to the estimate according to their distances from the target point [<xref ref-type="bibr" rid="j_nejsds33_ref_010">10</xref>]; that is, neighbors closer to the target have more influence than distant ones. Moreover, network structure information could be incorporated into kNN for networked A/B testing [<xref ref-type="bibr" rid="j_nejsds33_ref_034">34</xref>]. Another direction for future research is to study purchase-related ratio metrics [<xref ref-type="bibr" rid="j_nejsds33_ref_015">15</xref>] within the proposed imputation framework. Finally, the proposed method imputes the missing outcome of each user individually. It would be interesting to group users with missing outcomes into hubs and to investigate an imputation strategy for each hub as a whole.</p>
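<p>The difference between a simple average and a distance-weighted average of neighbors can be seen in a few lines. The sketch below uses inverse-distance weights, 1/(d + ε), as one simple choice; the kernel weights studied in [<xref ref-type="bibr" rid="j_nejsds33_ref_010">10</xref>] are more general, and the function name and toy values are hypothetical.</p>

```python
import numpy as np


def knn_estimate(x_target, X_donors, y_donors, k, weighted=False, eps=1e-12):
    """Estimate a missing response from the k nearest donors.

    weighted=False: simple average of the k nearest responses.
    weighted=True:  inverse-distance weights 1 / (d + eps), an assumed simple
                    choice; eps avoids division by zero for an exact match.
    """
    dist = np.linalg.norm(X_donors - x_target, axis=1)
    nearest = np.argsort(dist)[:k]
    if not weighted:
        return float(y_donors[nearest].mean())
    w = 1.0 / (dist[nearest] + eps)
    return float(np.sum(w * y_donors[nearest]) / np.sum(w))


# Hypothetical donors at distances 1, 2 and 4 from the target
X_donors = np.array([[1.0], [2.0], [4.0]])
y_donors = np.array([10.0, 20.0, 40.0])
x_target = np.array([0.0])

simple = knn_estimate(x_target, X_donors, y_donors, k=3)
weighted = knn_estimate(x_target, X_donors, y_donors, k=3, weighted=True)
```

<p>With donors at distances 1, 2, and 4 holding responses 10, 20, and 40, the simple average is about 23.3, whereas the weighted average is about 17.1 because the nearest donor receives the largest weight.</p>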
</sec>
</body>
<back>
<ref-list id="j_nejsds33_reflist_001">
<title>References</title>
<ref id="j_nejsds33_ref_001">
<label>[1]</label><mixed-citation publication-type="journal"> <string-name><surname>Bhaskaran</surname>, <given-names>K.</given-names></string-name> and <string-name><surname>Smeeth</surname>, <given-names>L.</given-names></string-name> (<year>2014</year>). <article-title>What is the difference between missing completely at random and missing at random?</article-title> <source>International Journal of Epidemiology</source> <volume>43</volume>(<issue>4</issue>) <fpage>1336</fpage>–<lpage>1339</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_002">
<label>[2]</label><mixed-citation publication-type="chapter"> <string-name><surname>Deng</surname>, <given-names>A.</given-names></string-name> and <string-name><surname>Shi</surname>, <given-names>X.</given-names></string-name> (<year>2016</year>). <chapter-title>Data-driven metric development for online controlled experiments: Seven lessons learned</chapter-title>. In <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>77</fpage>–<lpage>86</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_003">
<label>[3]</label><mixed-citation publication-type="chapter"> <string-name><surname>Deng</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Kohavi</surname>, <given-names>R.</given-names></string-name> and <string-name><surname>Walker</surname>, <given-names>T.</given-names></string-name> (<year>2013</year>). <chapter-title>Improving the sensitivity of online controlled experiments by utilizing pre-experiment data</chapter-title>. In <source>Proceedings of the Sixth ACM International Conference on Web Search and Data Mining</source> <fpage>123</fpage>–<lpage>132</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_004">
<label>[4]</label><mixed-citation publication-type="chapter"> <string-name><surname>Dmitriev</surname>, <given-names>P.</given-names></string-name> and <string-name><surname>Wu</surname>, <given-names>X.</given-names></string-name> (<year>2016</year>). <chapter-title>Measuring metrics</chapter-title>. In <source>Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source> <fpage>429</fpage>–<lpage>437</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_005">
<label>[5]</label><mixed-citation publication-type="chapter"> <string-name><surname>Dmitriev</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Gupta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kim</surname>, <given-names>D. W.</given-names></string-name> and <string-name><surname>Vaz</surname>, <given-names>G.</given-names></string-name> (<year>2017</year>). <chapter-title>A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments</chapter-title>. In <source>Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>1427</fpage>–<lpage>1436</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_006">
<label>[6]</label><mixed-citation publication-type="journal"> <string-name><surname>Dolton</surname>, <given-names>P.</given-names></string-name> and <string-name><surname>O’Neill</surname>, <given-names>D.</given-names></string-name> (<year>1996</year>). <article-title>The restart effect and the return to full-time stable employment</article-title>. <source>Journal of the Royal Statistical Society: Series A (Statistics in Society)</source> <volume>159</volume>(<issue>2</issue>) <fpage>275</fpage>–<lpage>288</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_007">
<label>[7]</label><mixed-citation publication-type="journal"> <string-name><surname>Dolton</surname>, <given-names>P.</given-names></string-name> and <string-name><surname>O’Neill</surname>, <given-names>D.</given-names></string-name> (<year>1996</year>). <article-title>Unemployment duration and the restart effect: some experimental evidence</article-title>. <source>The Economic Journal</source> <volume>106</volume>(<issue>435</issue>) <fpage>387</fpage>–<lpage>400</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_008">
<label>[8]</label><mixed-citation publication-type="other"> <string-name><surname>Goldstein</surname>, <given-names>D. G.</given-names></string-name>, <string-name><surname>Imai</surname>, <given-names>K.</given-names></string-name> and <string-name><surname>Göritz</surname>, <given-names>A. S.</given-names></string-name> (<year>2007</year>). The subtle psychology of voter turnout. Technical Report.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_009">
<label>[9]</label><mixed-citation publication-type="journal"> <string-name><surname>Gupta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kohavi</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Tang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Andersen</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Bakshy</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Cardin</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Chandran</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Coey</surname>, <given-names>D.</given-names></string-name> <etal>et al.</etal> (<year>2019</year>). <article-title>Top challenges from the first practical online controlled experiments summit</article-title>. <source>ACM SIGKDD Explorations Newsletter</source> <volume>21</volume>(<issue>1</issue>) <fpage>20</fpage>–<lpage>35</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_010">
<label>[10]</label><mixed-citation publication-type="other"> <string-name><surname>Hechenbichler</surname>, <given-names>K.</given-names></string-name> and <string-name><surname>Schliep</surname>, <given-names>K.</given-names></string-name> (<year>2004</year>). Weighted k-nearest-neighbor techniques and ordinal classification.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_011">
<label>[11]</label><mixed-citation publication-type="chapter"> <string-name><surname>Hruschka</surname>, <given-names>E. R.</given-names></string-name>, <string-name><surname>de Castro</surname>, <given-names>L. N.</given-names></string-name> and <string-name><surname>Campello</surname>, <given-names>R. J.</given-names></string-name> (<year>2004</year>). <chapter-title>Evolutionary algorithms for clustering gene-expression data</chapter-title>. In <source>Fourth IEEE International Conference on Data Mining (ICDM’04)</source> <fpage>403</fpage>–<lpage>406</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_012">
<label>[12]</label><mixed-citation publication-type="journal"> <string-name><surname>Imai</surname>, <given-names>K.</given-names></string-name> (<year>2009</year>). <article-title>Statistical analysis of randomized experiments with non-ignorable missing binary outcomes: an application to a voting experiment</article-title>. <source>Journal of the Royal Statistical Society: Series C (Applied Statistics)</source> <volume>58</volume>(<issue>1</issue>) <fpage>83</fpage>–<lpage>104</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1111/j.1467-9876.2008.00637.x" xlink:type="simple">https://doi.org/10.1111/j.1467-9876.2008.00637.x</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2662235">MR2662235</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_013">
<label>[13]</label><mixed-citation publication-type="other"> <string-name><surname>Imbens</surname>, <given-names>G. W.</given-names></string-name> and <string-name><surname>Pizer</surname>, <given-names>W. A.</given-names></string-name> (<year>2000</year>). The analysis of randomized experiments with missing data. Technical Report.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_014">
<label>[14]</label><mixed-citation publication-type="journal"> <string-name><surname>Imbens</surname>, <given-names>G. W.</given-names></string-name>, <string-name><surname>Rubin</surname>, <given-names>D. B.</given-names></string-name> and <string-name><surname>Sacerdote</surname>, <given-names>B. I.</given-names></string-name> (<year>2001</year>). <article-title>Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players</article-title>. <source>American Economic Review</source> <volume>91</volume>(<issue>4</issue>) <fpage>778</fpage>–<lpage>794</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_015">
<label>[15]</label><mixed-citation publication-type="journal"> <string-name><surname>Jin</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Ba</surname>, <given-names>S.</given-names></string-name> (<year>2022</year>). <article-title>Toward Optimal Variance Reduction in Online Controlled Experiments</article-title>. <source>Technometrics</source> <fpage>1</fpage>–<lpage>12</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_016">
<label>[16]</label><mixed-citation publication-type="chapter"> <string-name><surname>Kohavi</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Longbotham</surname>, <given-names>R.</given-names></string-name> and <string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name> (<year>2014</year>). <chapter-title>Seven rules of thumb for web site experimenters</chapter-title>. In <source>Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>1857</fpage>–<lpage>1866</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_017">
<label>[17]</label><mixed-citation publication-type="journal"> <string-name><surname>Kohavi</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Crook</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Longbotham</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Frasca</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Henne</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Ferres</surname>, <given-names>J. L.</given-names></string-name> and <string-name><surname>Melamed</surname>, <given-names>T.</given-names></string-name> (<year>2009</year>). <article-title>Online experimentation at Microsoft</article-title>. <source>Data Mining Case Studies</source> <volume>11</volume>(<issue>2009</issue>) <fpage>39</fpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_018">
<label>[18]</label><mixed-citation publication-type="book"> <string-name><surname>Little</surname>, <given-names>R. J.</given-names></string-name> and <string-name><surname>Rubin</surname>, <given-names>D. B.</given-names></string-name> (<year>2019</year>). <source>Statistical analysis with missing data</source> <volume>793</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/9781119013563" xlink:type="simple">https://doi.org/10.1002/9781119013563</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=1925014">MR1925014</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_019">
<label>[19]</label><mixed-citation publication-type="chapter"> <string-name><surname>Machmouchi</surname>, <given-names>W.</given-names></string-name> and <string-name><surname>Buscher</surname>, <given-names>G.</given-names></string-name> (<year>2016</year>). <chapter-title>Principles for the design of online A/B metrics</chapter-title>. In <source>Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval</source> <fpage>589</fpage>–<lpage>590</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_020">
<label>[20]</label><mixed-citation publication-type="chapter"> <string-name><surname>MacQueen</surname>, <given-names>J.</given-names></string-name> (<year>1967</year>). <chapter-title>Some methods for classification and analysis of multivariate observations</chapter-title>. In <source>Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source> <volume>1</volume> <fpage>281</fpage>–<lpage>297</lpage>. <publisher-name>Oakland, CA, USA</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=0214227">MR0214227</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_021">
<label>[21]</label><mixed-citation publication-type="journal"> <string-name><surname>Mao</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Shi</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Tuo</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Shi</surname>, <given-names>D.</given-names></string-name> and <string-name><surname>Guo</surname>, <given-names>F.</given-names></string-name> (<year>2021</year>). <article-title>Driving safety assessment for ride-hailing drivers</article-title>. <source>Accident Analysis &amp; Prevention</source> <volume>149</volume> <fpage>105574</fpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_022">
<label>[22]</label><mixed-citation publication-type="journal"> <string-name><surname>Molenberghs</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Thijs</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Jansen</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Beunckens</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Kenward</surname>, <given-names>M. G.</given-names></string-name>, <string-name><surname>Mallinckrodt</surname>, <given-names>C.</given-names></string-name> and <string-name><surname>Carroll</surname>, <given-names>R. J.</given-names></string-name> (<year>2004</year>). <article-title>Analyzing incomplete longitudinal clinical trial data</article-title>. <source>Biostatistics</source> <volume>5</volume>(<issue>3</issue>) <fpage>445</fpage>–<lpage>464</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_023">
<label>[23]</label><mixed-citation publication-type="chapter"> <string-name><surname>Nie</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Kong</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yuan</surname>, <given-names>T. T.</given-names></string-name> and <string-name><surname>Burke</surname>, <given-names>P. B.</given-names></string-name> (<year>2020</year>). <chapter-title>Dealing With Ratio Metrics in A/B Testing at the Presence of Intra-User Correlation and Segments</chapter-title>. In <source>International Conference on Web Information Systems Engineering</source> <fpage>563</fpage>–<lpage>577</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_024">
<label>[24]</label><mixed-citation publication-type="chapter"> <string-name><surname>Ougiaroglou</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Nanopoulos</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Papadopoulos</surname>, <given-names>A. N.</given-names></string-name>, <string-name><surname>Manolopoulos</surname>, <given-names>Y.</given-names></string-name> and <string-name><surname>Welzer-Druzovec</surname>, <given-names>T.</given-names></string-name> (<year>2007</year>). <chapter-title>Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors</chapter-title>. In <source>East European Conference on Advances in Databases and Information Systems</source> <fpage>66</fpage>–<lpage>82</lpage>. <publisher-name>Springer</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_025">
<label>[25]</label><mixed-citation publication-type="journal"> <string-name><surname>Rousseeuw</surname>, <given-names>P. J.</given-names></string-name> (<year>1987</year>). <article-title>Silhouettes: a graphical aid to the interpretation and validation of cluster analysis</article-title>. <source>Journal of Computational and Applied Mathematics</source> <volume>20</volume> <fpage>53</fpage>–<lpage>65</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_026">
<label>[26]</label><mixed-citation publication-type="journal"> <string-name><surname>Rubin</surname>, <given-names>D. B.</given-names></string-name> (<year>1976</year>). <article-title>Inference and missing data</article-title>. <source>Biometrika</source> <volume>63</volume>(<issue>3</issue>) <fpage>581</fpage>–<lpage>592</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1093/biomet/63.3.581" xlink:type="simple">https://doi.org/10.1093/biomet/63.3.581</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=0455196">MR0455196</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_027">
<label>[27]</label><mixed-citation publication-type="book"> <string-name><surname>Rubin</surname>, <given-names>D. B.</given-names></string-name> (<year>2004</year>). <source>Multiple imputation for nonresponse in surveys</source> <volume>81</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2117498">MR2117498</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_028">
<label>[28]</label><mixed-citation publication-type="journal"> <string-name><surname>Spineli</surname>, <given-names>L. M.</given-names></string-name> and <string-name><surname>Kalyvas</surname>, <given-names>C.</given-names></string-name> (<year>2020</year>). <article-title>Comparison of exclusion, imputation and modelling of missing binary outcome data in frequentist network meta-analysis</article-title>. <source>BMC Medical Research Methodology</source> <volume>20</volume>(<issue>1</issue>) <fpage>1</fpage>–<lpage>15</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_029">
<label>[29]</label><mixed-citation publication-type="chapter"> <string-name><surname>Tang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Agarwal</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>O’Brien</surname>, <given-names>D.</given-names></string-name> and <string-name><surname>Meyer</surname>, <given-names>M.</given-names></string-name> (<year>2010</year>). <chapter-title>Overlapping experiment infrastructure: More, better, faster experimentation</chapter-title>. In <source>Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>17</fpage>–<lpage>26</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_030">
<label>[30]</label><mixed-citation publication-type="chapter"> <string-name><surname>Wang</surname>, <given-names>X.</given-names></string-name> (<year>2011</year>). <chapter-title>A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality</chapter-title>. In <source>The 2011 International Joint Conference on Neural Networks</source> <fpage>1293</fpage>–<lpage>1299</lpage>. <publisher-name>IEEE</publisher-name>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_031">
<label>[31]</label><mixed-citation publication-type="book"> <string-name><surname>Wu</surname>, <given-names>C. J.</given-names></string-name> and <string-name><surname>Hamada</surname>, <given-names>M. S.</given-names></string-name> (<year>2011</year>). <source>Experiments: planning, analysis, and optimization</source> <volume>552</volume>. <publisher-name>John Wiley &amp; Sons</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=2583259">MR2583259</ext-link></mixed-citation>
</ref>
<ref id="j_nejsds33_ref_032">
<label>[32]</label><mixed-citation publication-type="chapter"> <string-name><surname>Xie</surname>, <given-names>H.</given-names></string-name> and <string-name><surname>Aurisset</surname>, <given-names>J.</given-names></string-name> (<year>2016</year>). <chapter-title>Improving the sensitivity of online controlled experiments: Case studies at Netflix</chapter-title>. In <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source> <fpage>645</fpage>–<lpage>654</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_033">
<label>[33]</label><mixed-citation publication-type="chapter"> <string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Duan</surname>, <given-names>W.</given-names></string-name> and <string-name><surname>Huang</surname>, <given-names>S.</given-names></string-name> (<year>2018</year>). <chapter-title>SQR: balancing speed, quality and risk in online experiments</chapter-title>. In <source>Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source> <fpage>895</fpage>–<lpage>904</lpage>.</mixed-citation>
</ref>
<ref id="j_nejsds33_ref_034">
<label>[34]</label><mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname>, <given-names>Q.</given-names></string-name> and <string-name><surname>Kang</surname>, <given-names>L.</given-names></string-name> (<year>2022</year>). <article-title>Locally Optimal Design for A/B Tests in the Presence of Covariates and Network Dependence</article-title>. <source>Technometrics</source> <volume>64</volume>(<issue>3</issue>) <fpage>358</fpage>–<lpage>369</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/00401706.2022.2046169" xlink:type="simple">https://doi.org/10.1080/00401706.2022.2046169</ext-link>. <ext-link ext-link-type="uri" xlink:href="https://mathscinet.ams.org/mathscinet-getitem?mr=4457329">MR4457329</ext-link></mixed-citation>
</ref>
</ref-list>
</back>
</article>
