1 Instruction
Optimal designs of experiments are developed to reduce the variance of estimated model parameters [24] by optimizing a function of the information matrix, usually under a (generalized) linear model assumption. Examples include the determinant, the trace, and the minimum eigenvalue of the information matrix, which result in D-, A-, and E-optimal designs, respectively. In many areas, such as clinical trials or user behavior studies in IT companies, controlled experiments are conducted to investigate treatment effects across different users, i.e., the users are divided into groups and each group is assigned a treatment or control. For example, optimal design of experiments is applied to clinical trials in the era of precision medicine, where the patient response may depend on biomarkers (covariates). In fact, there is significant and growing evidence that patient responses depend on covariates in a wide range of applications. For instance, [26] showed that the top ten highest-grossing drugs in the US are effective in only 4% to 25% of the patient population, a shocking result for current clinical practice. [6] applied a modified D-optimal design to early-stage clinical trials when the covariates are uncontrollable. [19] applied a weighted L-optimal design to late-stage Phase III clinical trials with observed covariates. As another emerging application, tech giants such as Google, Facebook, and LinkedIn frequently use A/B testing for marketing, web design, and data-driven decision making [29]. Indeed, applying controlled experiments such as A/B tests in business has improved the performance of companies [18]. There are also methodological contributions on optimal design of experiments for A/B testing. For example, [5] studied offline and online standard A/B testing with the goal of maximizing the precision of least squares estimation.
The literature on optimal design of experiments for linear models, which we also build on in this study, is vast. From the optimal design perspective, [2] investigated the problem with two treatments and developed the D-optimal design for both covariates and treatment allocation under the assumption of a linear model with interactions between covariates and treatments. [30] generalized the work of [2] to experiments with multiple treatments. Under the same model setting, [25] developed the A- and E-optimal designs for treatments and covariates.
In practice, the user covariates may not be controllable due to restrictions in the user recruiting process. An alternative problem is thus to allocate the treatments to a given set of users with observed covariates. From the optimal design perspective, the aim is to obtain an exact design by optimizing a design objective with the treatment allocation as the design variable and the covariates fixed, which often results in an optimization problem with binary decision variables. For two-treatment allocation under the linear model assumption without interaction between covariates and treatments, [16] proposed computationally tractable algorithms to search for D- and A-optimal designs. With the particular goal of minimizing the variance of the estimated treatment effect (i.e., a ${D_{s}}$-optimal design criterion), [5] demonstrated that the problem can be solved efficiently based on a generalization of the MAX-CUT semi-definite programming (SDP) relaxation of [14]. Given the linear model with interaction between treatments and covariates, [31] proposed an optimal design objective for personalized decision making with two treatments, together with an approximation approach that obtains near-optimal allocations efficiently.
In this paper, we generalize the work in [31] to multiple treatment allocation for personalized decision making with observed covariates. Specifically, we consider the situation in which there are a finite number of treatment options and subjects with given covariates. We assume that the response of a subject assigned to a treatment follows a linear model that includes the interaction between covariates and treatments. Accordingly, we can define the best treatment option for each subject, which can be estimated from data. This setup facilitates precision decision making. However, with more than two treatments, the resulting design objective cannot be simplified by the approximation approach developed for the two-treatment case in [31]. Since the design objective matches the E-optimal design criterion, we reformulate the optimization problem as an SDP with binary decision variables. We use a YALMIP and MOSEK based optimization solver for SDP in a branch-and-bound scheme to provide the optimal design. In our numerical study, we assess the quality of the optimal designs provided by the solver.
2 Problem Description
A decision-making problem in practice can often be reduced to choosing a treatment from a finite candidate set $\{1,\dots ,K\}$ that has the best performance measure ${\mu _{k}}$, i.e.,
(2.1)
\[ {k^{\ast }}={\mathrm{argmax}_{k\in \{1,\dots ,K\}}}{\mu _{k}}.\]
The values of ${\mu _{k}}$’s are unknown and can be estimated from collected user responses to their treatment allocations. Specifically, given n users, the experimenter allocates one of the K treatments to each user. The response of the i-th user is denoted by ${y_{i}}$. We assume that ${y_{i}}$ is a continuous scalar that follows the model
(2.2)
\[ {y_{i}}={\sum \limits_{k=1}^{K}}{x_{ik}}{\mu _{k}}+{\varepsilon _{i}},\hspace{2.5pt}\hspace{2.5pt}\mathrm{for}\hspace{2.5pt}\hspace{2.5pt}i=1,\dots ,n,\]
where ${\varepsilon _{i}}$ is an additive error term, and ${x_{ik}}\in \{0,1\}$ with ${x_{ik}}=1$ indicating that the k-th treatment is allocated to the i-th user. We require that ${\textstyle\sum _{k=1}^{K}}{x_{ik}}=1$ to guarantee that each user is exposed to exactly one treatment. Under the model assumption in (2.2), the performance measure ${\mu _{k}}$ represents the average treatment performance over the target user population, which can be estimated by ${\hat{\mu }_{k}}$ given the responses ${y_{i}}$’s and the treatment allocations ${x_{ik}}$’s. A data-driven solution to (2.1) is then given by
(2.3)
\[ {\hat{k}^{\ast }}={\mathrm{argmax}_{k\in \{1,\dots ,K\}}}{\hat{\mu }_{k}}.\]
The accuracy of this decision is tied to the accuracy of the estimates ${\hat{\mu }_{k}}$’s. To reduce the uncertainty of the decision, we can reduce the variances of the estimates ${\hat{\mu }_{k}}$’s.
In the context of personalized decision making, a user is associated with observed covariates $\mathbf{z}={({z_{1}},\dots ,{z_{p}})^{\top }}\in \mathcal{Z}\subset {\mathbb{R}^{p}}$, where $\mathcal{Z}$ is the covariate space of the target population. Examples of covariates include demographic information, social behavior, and network connections. The treatment performance can often be related to the covariate values. Let ${\mu _{k}}(\mathbf{z})$ be a function of the user covariates $\boldsymbol{z}$ representing the personalized treatment performance measure. In this paper, we further assume that ${\mu _{k}}(\mathbf{z})$ is a linear function of z. The linear model in (2.2) therefore becomes
(2.4)
\[\begin{aligned}{}{y_{i}}& ={\sum \limits_{k=1}^{K}}{x_{ik}}{\mu _{k}}({\mathbf{z}_{i}})+{\varepsilon _{i}}\\ {} & ={\sum \limits_{k=1}^{K}}{x_{ik}}{\mathbf{z}_{i}^{\top }}{\boldsymbol{\beta }_{k}}+{\varepsilon _{i}},\hspace{2.5pt}\hspace{2.5pt}\mathrm{for}\hspace{2.5pt}\hspace{2.5pt}i=1,\dots ,n,\end{aligned}\]
where ${\boldsymbol{\beta }_{k}}={({\beta _{1k}},\dots ,{\beta _{pk}})^{\top }}$ is a vector of linear coefficients. The first feature in each ${\mathbf{z}_{i}}$ is fixed to be one as the linear intercept. Similar to [30], this model incorporates the interaction between treatment and covariates, which enables the estimation of heterogeneity of treatment performance over the covariates $\boldsymbol{z}$.
Under this model, the problem of personalized decision making becomes
(2.5)
\[ \begin{aligned}{}{k^{\ast }}(\mathbf{z})=& {\mathrm{argmax}_{k\in \{1,\dots ,K\}}}{\mu _{k}}(\mathbf{z})\\ {} =& {\mathrm{argmax}_{k\in \{1,\dots ,K\}}}\left\{{\mathbf{z}^{\top }}{\boldsymbol{\beta }_{k}}\right\}\\ {} & \text{for any}\hspace{2.5pt}\mathbf{z}\in \mathcal{Z}.\end{aligned}\]
With estimated parameters, a data-driven solution to (2.5) can be obtained for any $\mathbf{z}\in \mathcal{Z}$, i.e.,
(2.6)
\[ {\hat{k}^{\ast }}(\mathbf{z})={\mathrm{argmax}_{k=1,2,\dots ,K}}\left\{{\mathbf{z}^{\top }}{\hat{\boldsymbol{\beta }}_{k}}\right\}.\]
Similar to (2.3), to improve the accuracy of personalized decisions, it is desirable to reduce the variance of the estimated ${\mu _{k}}(\mathbf{z})={\mathbf{z}^{\top }}{\boldsymbol{\beta }_{k}}$ for each $k\in \{1,\dots ,K\}$ and each $\mathbf{z}\in \mathcal{Z}$. Following the optimal design perspective in [31], we can reduce these variances by searching for an optimal allocation of the treatments ${x_{ik}}$’s in the presence of fixed covariates of recruited users, as illustrated in Figure 1. Next, we investigate the design criterion that characterizes the variance of the estimated ${\mu _{k}}(\mathbf{z})={\mathbf{z}^{\top }}{\boldsymbol{\beta }_{k}}$ for each $k\in \{1,\dots ,K\}$ and any $\mathbf{z}\in \mathcal{Z}$.
Figure 1
An illustration of the motivation of the proposed approach for two treatment allocations (i.e., + and − in the figure): We aim to obtain an optimal design of the treatment allocations to reduce the variances of estimated personalized treatment effects for users with any covariates $\boldsymbol{z}$. By reducing these variances, it becomes clearer how to make the decision accurately.
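To make the setup concrete, the following sketch simulates responses from the model in (2.4), estimates the coefficients by least squares, and applies the data-driven rule (2.6). All numerical values (coefficients, noise level, the new covariate vector) are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 300, 3, 2          # users, covariate dimension (incl. intercept), treatments

# Covariates: first column fixed to one as the intercept.
Z = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

# Hypothetical true coefficient vectors beta_k, shape (K, p).
beta = np.array([[1.0, 0.5, -0.2],
                 [0.8, -0.4, 0.6]])

# A balanced treatment allocation X with one-hot rows.
arms = np.repeat(np.arange(K), n // K)
X = np.eye(K)[arms]

# Responses from model (2.4): y_i = sum_k x_ik z_i^T beta_k + eps_i.
y = np.einsum('ik,ij,kj->i', X, Z, beta) + 0.1 * rng.standard_normal(n)

# Least squares estimate of the stacked vector b = (beta_1^T, ..., beta_K^T)^T,
# using the model matrix whose i-th row is x_i ⊗ z_i.
M = np.einsum('ik,ij->ikj', X, Z).reshape(n, K * p)
b_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
beta_hat = b_hat.reshape(K, p)

# Personalized decision rule (2.6): pick argmax_k z^T beta_hat_k.
z_new = np.array([1.0, 0.3, -1.2])
k_star = int(np.argmax(beta_hat @ z_new))
```

With a balanced allocation and moderate noise, `beta_hat` recovers `beta` closely, and `k_star` matches the true best treatment for `z_new`.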
3 Design Criterion of Controlled Experiments for Personalized Decision Making
Let $\boldsymbol{y}={({y_{1}},\dots ,{y_{n}})^{\top }}$ be the responses of the n users. Correspondingly, let Z be an $n\times p$ matrix whose rows are the users’ covariates ${\boldsymbol{z}_{1}^{\top }},\dots ,{\boldsymbol{z}_{n}^{\top }}$ (the first entry of each is set to one for the intercept). The treatment allocation is recorded in an $n\times K$ matrix X, whose i-th row is ${\boldsymbol{x}_{i}^{\top }}$ with ${\boldsymbol{x}_{i}}={({x_{i1}},\dots ,{x_{iK}})^{\top }}$ representing the treatment allocation of the i-th user. We assume that the responses are generated under the model in (2.4). The model matrix is then given by
\[ \left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]\]
which we assume has a full column rank. The notation ⊗ denotes the Kronecker product. By stacking ${\boldsymbol{\beta }_{k}}$’s in a vector $\boldsymbol{b}={({\boldsymbol{\beta }_{1}^{\top }},\dots ,{\boldsymbol{\beta }_{K}^{\top }})^{\top }}$ of size $Kp$, we can express the least squares estimator of $\boldsymbol{b}$ by
(3.1)
\[\begin{aligned}{}\hat{\boldsymbol{b}}& ={\left[{\hat{\boldsymbol{\beta }}_{1}^{\top }}\cdots {\hat{\boldsymbol{\beta }}_{K}^{\top }}\right]^{\top }}\\ {} & ={\left({\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]^{\top }}\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]\right)^{-1}}\\ {} & \cdot {\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]^{\top }}\boldsymbol{y}.\end{aligned}\]
We assume that the additive error terms ${\varepsilon _{i}}$’s in (2.4) are independent and identically distributed random variables with mean zero and variance ${\sigma ^{2}}$. The variance-covariance matrix of $\hat{\boldsymbol{b}}$ in (3.1) can be expressed as
\[ \mathrm{var}(\hat{\boldsymbol{b}})={\sigma ^{2}}{\left({\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]^{\top }}\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]\right)^{-1}}.\]
Notice that
\[\begin{aligned}{}& {\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]^{\top }}\left[\begin{array}{c}{\boldsymbol{x}_{1}^{\top }}\otimes {\boldsymbol{z}_{1}^{\top }}\\ {} \vdots \\ {} {\boldsymbol{x}_{n}^{\top }}\otimes {\boldsymbol{z}_{n}^{\top }}\end{array}\right]\\ {} & ={\sum \limits_{i=1}^{n}}({\boldsymbol{x}_{i}}\otimes {\boldsymbol{z}_{i}})({\boldsymbol{x}_{i}^{\top }}\otimes {\boldsymbol{z}_{i}^{\top }})={\sum \limits_{i=1}^{n}}({\boldsymbol{x}_{i}}{\boldsymbol{x}_{i}^{\top }})\otimes ({\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }})\\ {} & =\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\textstyle\textstyle\sum _{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{i1}}& 0& 0\\ {} & \ddots \\ {} 0& 0& {\textstyle\textstyle\sum _{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{iK}}\end{array}\right],\end{aligned}\]
which follows from the fact that ${\boldsymbol{x}_{i}}{\boldsymbol{x}_{i}^{\top }}$ is a $K\times K$ diagonal matrix with diagonal entries ${x_{i1}},\dots ,{x_{iK}}$ (since exactly one entry of ${\boldsymbol{x}_{i}}$ equals one). The variance-covariance matrix of $\hat{\boldsymbol{b}}$ then simplifies to a block diagonal matrix:
(3.2)
\[ \mathrm{var}(\hat{\boldsymbol{b}})\hspace{-0.1667em}=\hspace{-0.1667em}{\sigma ^{2}}\mathrm{diag}\left\{{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{i1}}\right)^{-1}}\hspace{-0.1667em}\dots {\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{iK}}\right)^{-1}}\hspace{-0.1667em}\right\}.\]
For any $k\in \{1,\dots ,K\}$ and $\boldsymbol{z}\in \mathcal{Z}$, we have that
\[ \mathrm{var}\left({\boldsymbol{z}^{\top }}{\hat{\boldsymbol{\beta }}_{k}}\right)={\boldsymbol{z}^{\top }}\mathrm{var}\left({\hat{\boldsymbol{\beta }}_{k}}\right)\boldsymbol{z}={\sigma ^{2}}{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}.\]
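The block diagonal structure above is easy to verify numerically. The following check, with hypothetical dimensions and a random allocation, confirms that the information matrix of the stacked model matrix equals the block diagonal of the per-treatment matrices ${\textstyle\sum _{i}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}$, and evaluates the variance quadratic form for one treatment.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 60, 3, 2
Z = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
X = np.eye(K)[rng.integers(0, K, size=n)]     # random one-hot allocation

# Stacked model matrix with rows x_i ⊗ z_i.
M = np.einsum('ik,ij->ikj', X, Z).reshape(n, K * p)

# Per-treatment information matrices sum_i z_i z_i^T x_ik.
info = [(Z * X[:, [k]]).T @ Z for k in range(K)]

# M^T M should equal the block diagonal of the per-treatment matrices.
blockdiag = np.zeros((K * p, K * p))
for k in range(K):
    blockdiag[k * p:(k + 1) * p, k * p:(k + 1) * p] = info[k]

# Variance quadratic form z^T (sum_i z_i z_i^T x_i1)^{-1} z (taking sigma^2 = 1).
z = np.array([1.0, 0.5, -0.5])
var_quad = z @ np.linalg.solve(info[0], z)
```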
As noted earlier, we assume that the experiments are conducted with n existing users. Thus, the covariate matrix Z is observed, and the experimental design problem is to determine the treatment allocation matrix X. A necessary condition for the positive definiteness of the variance-covariance matrix is that ${\textstyle\sum _{i=1}^{n}}{x_{ik}}\ge p$ for every $k\in \{1,\dots ,K\}$, which requires $n\ge Kp$. Our goal is to minimize the variance of ${\boldsymbol{z}^{\top }}{\hat{\boldsymbol{\beta }}_{k}}$ for any k and $\boldsymbol{z}$ to improve the accuracy of the data-driven decision in (2.6). Instead of minimizing all the variances simultaneously, we formulate the optimization problem by minimizing the worst case, i.e., the maximum variance over all $k\in \{1,\dots ,K\}$ and $\boldsymbol{z}\in \mathcal{Z}$. Therefore, the design criterion is given by
(3.3)
\[ \begin{aligned}{}& {\mathrm{max}_{\mathbf{z}\in \mathcal{Z}}}\underset{k=1,2,\dots ,K}{\max }\mathrm{var}\left({\mathbf{z}^{\top }}{\hat{\boldsymbol{\beta }}_{k}}\right)\\ {} & ={\mathrm{max}_{\mathbf{z}\in \mathcal{Z}}}\underset{k=1,2,\dots ,K}{\max }{\sigma ^{2}}{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}.\end{aligned}\]
The optimal design is then given by solving the problem
(3.4a)
\[\begin{aligned}{}\underset{X}{\min }\underset{\boldsymbol{z}\in \mathcal{Z}}{\max }\underset{k=1,2,\dots ,K}{\max }\hspace{1em}& {\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}\end{aligned}\]
(3.4b)
\[\begin{aligned}{}\text{s.t.}\hspace{2.5pt}\hspace{1em}& {\sum \limits_{i=1}^{n}}{x_{ik}}\ge p,\hspace{1em}k=1,\dots ,K,\end{aligned}\]
(3.4c)
\[\begin{aligned}{}& {\sum \limits_{k=1}^{K}}{x_{ik}}=1,\hspace{1em}{x_{ik}}\in \{0,1\},\hspace{1em}i=1,\dots ,n,\end{aligned}\]
where ${\textstyle\sum _{i=1}^{n}}{x_{ik}}\ge p$ is required for the positive definiteness of the matrix ${\textstyle\sum _{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}$, and the constraint ${\textstyle\sum _{k=1}^{K}}{x_{ik}}=1$ ensures that each user is assigned exactly one treatment.
Although the optimal design criterion has a closed-form expression, the corresponding optimization problem remains challenging to solve directly as a mixed-integer minimax problem. We assume that the covariate space $\mathcal{Z}$ is the unit ball, i.e., $\mathcal{Z}=\{\boldsymbol{z}\in {\mathbb{R}^{p}}\mid \| \boldsymbol{z}\| \le 1\}$. For any X that gives a positive definite variance-covariance matrix, we have that
\[\begin{aligned}{}& \underset{\| \boldsymbol{z}\| \le 1}{\max }\underset{k=1,2,\dots ,K}{\max }\left[{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}\right]\\ {} & =\underset{k=1,2,\dots ,K}{\max }\underset{\| \boldsymbol{z}\| \le 1}{\max }\left[{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}\right]\\ {} & =\underset{k=1,2,\dots ,K}{\max }\underset{\| \boldsymbol{z}\| =1}{\max }\left[{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}\right]\\ {} & =\underset{k=1,2,\dots ,K}{\max }{\lambda _{\max }}\left[{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\right]\\ {} & =\underset{k=1,2,\dots ,K}{\max }\left\{1/{\lambda _{\min }}\left[\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)\right]\right\}\\ {} & =\underset{k=1,2,\dots ,K}{\max }\left\{1/\underset{\| \boldsymbol{z}\| =1}{\min }\left[{\boldsymbol{z}^{\top }}\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)\boldsymbol{z}\right]\right\},\end{aligned}\]
where ${\lambda _{\max }}$ and ${\lambda _{\min }}$ produce the maximum and minimum eigenvalues of a matrix, respectively. This design objective is known as the E-optimal design criterion in the literature (e.g., [24] and [9]), i.e., the optimal design maximizes the minimum eigenvalue of the information matrix. Accordingly, the optimization problem in (3.4) is equivalent to
(3.5a)
\[\begin{aligned}{}\underset{X}{\max }\underset{k=1,2,\dots ,K}{\min }\hspace{1em}& {\lambda _{\min }}\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)\end{aligned}\]
(3.5b)
\[\begin{aligned}{}\text{s.t.}\hspace{2.5pt}\hspace{1em}& {\sum \limits_{i=1}^{n}}{x_{ik}}\ge p,\hspace{1em}k=1,\dots ,K,\end{aligned}\]
(3.5c)
\[\begin{aligned}{}& {\sum \limits_{k=1}^{K}}{x_{ik}}=1,\hspace{1em}{x_{ik}}\in \{0,1\},\hspace{1em}i=1,\dots ,n.\end{aligned}\]
Note that the objective in (3.5a) equals the minimum eigenvalue of the block diagonal matrix in (3.2), since
\[\begin{aligned}{}& \underset{k=1,2,\dots ,K}{\min }{\lambda _{\min }}\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)\\ {} & ={\lambda _{\min }}\left(\left[\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{\textstyle\textstyle\sum _{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{i1}}& 0& 0\\ {} & \ddots \\ {} 0& 0& {\textstyle\textstyle\sum _{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{iK}}\end{array}\right]\right).\end{aligned}\]
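As a toy illustration of the E-optimal objective in (3.5), the following sketch enumerates all balanced two-treatment splits of a small hypothetical instance and keeps the split that maximizes the smaller of the two minimum eigenvalues. The instance sizes are chosen only so that brute force is feasible; for realistic sizes the SDP approach of the next section is needed.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, p, K = 6, 2, 2
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])

def objective(arm1):
    """min_k lambda_min(sum_i z_i z_i^T x_ik) for a two-treatment split."""
    mask = np.zeros(n, dtype=bool)
    mask[list(arm1)] = True
    vals = []
    for m in (mask, ~mask):
        info = Z[m].T @ Z[m]                      # sum of z_i z_i^T over the arm
        vals.append(np.linalg.eigvalsh(info)[0])  # smallest eigenvalue
    return min(vals)

# Enumerate all balanced splits (n/K users per arm) and keep the best, as in (3.5).
best = max(itertools.combinations(range(n), n // K), key=objective)
best_val = objective(best)
```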
Next, we provide an SDP-based solution approach to solve (3.5) with observed covariates Z and binary decision variables X.
4 An SDP-based Solution Approach
Following the reformulation technique for the E-optimal design in [8], we cast problem (3.5) as an SDP with binary decision variables. The reformulation is based on the following observation. Let A be a real $p\times p$ symmetric matrix. It is well known in matrix analysis that for any $\delta \in \mathbb{R}$, ${\lambda _{\min }}(A-\delta I)={\lambda _{\min }}(A)-\delta $, where I is the $p\times p$ identity matrix. Therefore, ${\lambda _{\min }}(A)$ can be expressed as the largest value of δ such that ${\lambda _{\min }}(A-\delta I)\ge 0$, i.e.,
\[ {\lambda _{\min }}(A)=\max \left\{\delta \in \mathbb{R}\mid A-\delta I\succeq 0\right\},\]
where $A-\delta I\succeq 0$ means that $A-\delta I$ is a positive semi-definite matrix.
We apply the above observation to problem (3.5). For $k=1,\dots ,K$,
\[ {\lambda _{\min }}\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)=\max \left\{{\delta _{k}}\Bigg|{\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}-{\delta _{k}}I\succeq 0\right\}.\]
As a result,
\[\begin{aligned}{}& \underset{k=1,\dots ,K}{\min }{\lambda _{\min }}\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)\\ {} & =\underset{k=1,\dots ,K}{\min }\max \left\{{\delta _{k}}\Bigg|{\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}-{\delta _{k}}I\succeq 0\right\}\\ {} & \hspace{-0.1667em}=\hspace{-0.1667em}\max \left\{\lambda \Bigg|\lambda \hspace{-0.1667em}\le \hspace{-0.1667em}{\delta _{k}},\hspace{0.1667em}{\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\hspace{-0.1667em}-\hspace{-0.1667em}{\delta _{k}}I\succeq 0,\hspace{0.1667em}k=1,\dots ,K\right\}\\ {} & =\max \left\{\lambda \Bigg|{\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}-\lambda I\succeq 0,\hspace{0.1667em}k=1,\dots ,K\right\}.\end{aligned}\]
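The characterization above can be checked numerically. For a few random positive definite matrices (standing in for the information matrices, purely illustrative), the largest λ satisfying all the semidefinite constraints, found here by bisection on the PSD condition, matches ${\min _{k}}{\lambda _{\min }}$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, K = 3, 3

# Random symmetric positive definite matrices standing in for sum_i z_i z_i^T x_ik.
mats = []
for _ in range(K):
    A = rng.standard_normal((p, p))
    mats.append(A @ A.T + 0.1 * np.eye(p))

# Left-hand side: min_k lambda_min(M_k).
lhs = min(np.linalg.eigvalsh(M)[0] for M in mats)

# Right-hand side: max{lambda | M_k - lambda*I psd for all k}, by bisection.
def all_psd(lam):
    return all(np.linalg.eigvalsh(M - lam * np.eye(p))[0] >= -1e-12 for M in mats)

lo, hi = 0.0, max(np.linalg.eigvalsh(M)[-1] for M in mats)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if all_psd(mid) else (lo, mid)
rhs = lo
```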
Consequently, problem (3.5) can be equivalently written as
(4.1a)
\[\begin{aligned}{}\underset{X,\hspace{0.1667em}\lambda }{\max }\hspace{1em}& \lambda \end{aligned}\]
(4.1b)
\[\begin{aligned}{}\text{s.t.}\hspace{2.5pt}\hspace{1em}& {\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}-\lambda I\succeq 0,\hspace{1em}k=1,2,\dots ,K,\end{aligned}\]
(4.1c)
\[\begin{aligned}{}& {\sum \limits_{i=1}^{n}}{x_{ik}}\ge p,\hspace{1em}k=1,\dots ,K,\end{aligned}\]
(4.1d)
\[\begin{aligned}{}& {\sum \limits_{k=1}^{K}}{x_{ik}}=1,\hspace{1em}{x_{ik}}\in \{0,1\},\hspace{1em}i=1,\dots ,n.\end{aligned}\]
Note that the reformulation technique is the same as that used in the SDP reformulation proposed for the approximate E-optimal design in Section 7.5 of [8]. However, the discrete nature of problem (3.5) cannot be avoided by simply considering a continuous relaxation as in [8], even when n is large; thus, an implementation for SDPs with continuous decisions cannot solve our problem. As a mixed binary SDP, the reformulation (4.1) can be solved by a branch-and-bound (BnB) algorithm, where the relaxation problem at each BnB node is solved by an SDP solver [13]. A built-in solver in YALMIP [20] provides such a BnB implementation for MATLAB [21], and MOSEK [1] can be used as the embedded SDP solver.
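A full YALMIP/MOSEK run is outside the scope of a short snippet, but the BnB idea can be sketched in pure Python on a tiny hypothetical instance. Instead of an SDP relaxation, each node here uses a simple eigenvalue bound: since adding a positive semi-definite matrix cannot decrease the minimum eigenvalue, assigning all undecided users to every arm upper-bounds the achievable objective. The balance and $\ge p$ constraints are omitted for brevity, so this is an illustration of the branching-and-pruning mechanism, not the paper's solver.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, K = 8, 2, 2
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
outer = np.einsum('ij,il->ijl', Z, Z)            # z_i z_i^T for each user
suffix = np.cumsum(outer[::-1], axis=0)[::-1]    # sum of z_j z_j^T over j >= i

best_val, best_x = -np.inf, None

def lam_min(M):
    return np.linalg.eigvalsh(M)[0]

def bnb(i, infos, arms):
    """Assign users i..n-1; infos[k] = sum of z_j z_j^T over users already in arm k."""
    global best_val, best_x
    if i == n:
        val = min(lam_min(M) for M in infos)
        if val > best_val:
            best_val, best_x = val, arms.copy()
        return
    # Optimistic bound: give every arm all undecided users (adding a PSD matrix
    # can only increase the smallest eigenvalue), which bounds min_k lambda_min.
    bound = min(lam_min(M + suffix[i]) for M in infos)
    if bound <= best_val:
        return                                    # prune this subtree
    for k in range(K):
        infos[k] += outer[i]
        arms.append(k)
        bnb(i + 1, infos, arms)
        arms.pop()
        infos[k] -= outer[i]

bnb(0, [np.zeros((p, p)) for _ in range(K)], [])
```

In the paper's approach, the eigenvalue bound is replaced by the continuous SDP relaxation of (4.1) at each node, which gives tighter bounds and hence far more pruning.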
In practice, the experimenter often requires a (roughly) balanced allocation across the K treatments. If n is divisible by K, we can include the balanced constraints
(4.2)
\[ {\sum \limits_{i=1}^{n}}{x_{ik}}=\frac{n}{K},\hspace{1em}k=1,\dots ,K,\]
which also potentially reduce the computation time by shrinking the set of feasible solutions. If n is not divisible by K, a set of roughly balanced constraints can be added similarly. For convenience, we set n to be a multiple of K and include the balanced constraints in our numerical study. We lastly remark that the original problem in (3.4) is equivalent to this reformulation under the assumption that $\mathcal{Z}=\{\boldsymbol{z}\in {\mathbb{R}^{p}}\mid \| \boldsymbol{z}\| \le 1\}$. For a given set of covariates, we can rescale the covariates by the largest possible norm of the covariates in $\mathcal{Z}$ to make sure that this assumption is satisfied.
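For reference, a random balanced design satisfying the balanced constraints in (4.2), together with the rescaling of the covariates into the unit ball mentioned above, can be generated as follows; the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, K = 12, 3, 3

# Rescale the observed covariates so that every z_i lies in the unit ball,
# matching the assumption Z = {z : ||z|| <= 1}.
Z = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Z = Z / np.max(np.linalg.norm(Z, axis=1))

# A uniformly random balanced design: each treatment gets exactly n/K users.
arms = rng.permutation(np.repeat(np.arange(K), n // K))
X = np.eye(K)[arms]
```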
5 Numerical Study
In this section, we illustrate the numerical performance of the proposed solution approach on synthetic datasets. Our aim is to demonstrate the quality of the resulting optimal designs in terms of both the optimality objective and the prediction performance.
Implementation of the Proposed Solution Approach
We solve the optimization problem (4.1) with the YALMIP-MOSEK implementation in MATLAB. In our implementation, we include the balanced constraints in (4.2). For simplicity, we set the number of users n to be a multiple of the number of treatments K. To keep the computation tractable, we cap the time for solving the optimization problem at 300 seconds; hence, the resulting design may not be exactly optimal. The MATLAB code of this optimization model is available at https://github.com/yezhuoli/opd-vs-rd.git.
Quality of the Solution Approach
We assess the quality of the optimal design given by the proposed solution approach using two quality measures that compare the optimal design with random designs. We generate 1000 random realizations of X that satisfy the constraints in the optimization model (4.1) as well as the balanced constraints in (4.2), and compute their objective values as in (3.5). Given the objective values ${\lambda ^{1}},\dots ,{\lambda ^{1000}}$ of the random designs, we consider two quality measures: the percentile among the objectives of the random designs and the relative improvement with respect to the median objective of the random designs. Let ${\lambda _{opt}}$ be the objective value of the optimal design and ${\lambda _{med}}$ be the median of ${\lambda ^{1}},\dots ,{\lambda ^{1000}}$. The “Percentile” within random designs is defined by
(5.1)
\[ \mathrm{P}=\frac{1}{1000}{\sum \limits_{i=1}^{1000}}I\{{\lambda _{opt}}\ge {\lambda ^{i}}\}\cdot 100\% ,\]
where $I\{\cdot \}$ denotes the indicator function, and the “Relative Improvement (RI)” is defined by
(5.2)
\[ \mathrm{RI}=\frac{{\lambda _{opt}}-{\lambda _{med}}}{{\lambda _{med}}}\cdot 100\% .\]
The two quality measures compare the objective given by the proposed solution approach with the population and the median, respectively, of the objectives given by random designs. Higher values of Percentile and RI indicate better quality of the optimal design, i.e., greater value of adopting the proposed optimization approach. Next, we investigate the performance of the optimal design under two common examples of fixed covariates Z.
Figure 2
The quality measure Percentile (given by (5.1)) of different cases in Example I, i.e., Percentile of the optimal objective ${\lambda _{opt}}$ in the population of objectives given by 1000 random designs.
Figure 3
The quality measure RI (given by (5.2)) of different cases in Example I, i.e., Relative Improvement of the optimal objective ${\lambda _{opt}}$ with respect to the median of objectives given by 1000 random designs.
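Given the objective value of a candidate design and the 1000 random-design objectives, the two quality measures can be computed directly. The values below are synthetic placeholders, and RI is taken as the percentage gain over the median random objective, consistent with its description in the text.

```python
import numpy as np

rng = np.random.default_rng(6)

# Placeholder objective values for illustration: 1000 random-design objectives
# and an "optimal" objective slightly above all of them.
lam_rand = rng.uniform(0.5, 1.0, size=1000)
lam_opt = 1.05

# Percentile (5.1): share of random designs that the optimal design beats or ties.
percentile = 100.0 * np.mean(lam_opt >= lam_rand)

# Relative Improvement (5.2): gain over the median random-design objective.
lam_med = np.median(lam_rand)
ri = 100.0 * (lam_opt - lam_med) / lam_med
```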
Example I
In this example, we generate the entries of the covariates Z independently from the standard normal distribution. We consider all combinations of $n\in \{120,240,360\}$, $p\in \{3,5\}$ and $K\in \{2,3,4\}$. For each combination of n, p and K, we generate a set of covariates ${\boldsymbol{z}_{1}},\dots ,{\boldsymbol{z}_{n}}$, compute the quality measures Percentile and RI, and replicate this process 100 times. The resulting 100 copies of Percentile and RI are depicted as boxplots in Figures 2 and 3, respectively. The results give the following observations:
1. The values of Percentile and RI increase with K. For $K=2$, the proposed solution approach does not always give a better design than the random designs.
2. The values of Percentile increase with n, whereas the values of RI decrease with n. For $n=120$, the objective values given by the proposed approach may not outperform the random designs at the median or higher quantiles of objectives.
3. The values of Percentile and RI increase slightly as we increase p from 3 to 5.
These observations indicate that the cases with $K=3$ or 4, or $n=240$ or 360, provide solutions of higher quality than the case with $K=2$ and $n=120$, possibly because the difficulty introduced by the binary constraints is more pronounced when K and n are small.
Example II
In this example, we generate the entries of the covariates Z independently from a 0–1 Bernoulli distribution with the proportion of ones equal to r. The aim of this example is to demonstrate the impact of imbalance in the covariates on the quality of the optimal design. We fix $n=200$, $K=4$ and $p=3$, and consider $r\in \{0.1,0.5\}$, where $r=0.5$ gives balanced covariates and $r=0.1$ gives imbalanced covariates. The values of Percentile and RI are depicted as boxplots in Figure 4. The results show that, for balanced covariates, the optimal design given by the proposed solution approach often attains the best objective among the sampled designs, but the values of RI are often around 20%. For imbalanced covariates, the relative improvement can often reach 50%–200%, but the Percentile may be as low as 75% for some instances. This example demonstrates that the optimization problem under balanced covariates yields higher-quality solutions than that under imbalanced covariates, whereas the benefit of the optimal design over random designs is more pronounced under imbalanced covariates.
After 300 seconds, the solver outputs the optimality gap (0%–100%), where a smaller optimality gap indicates that the output solution is closer to the true optimal solution. For both examples, the average optimality gap is reduced to 10% or lower within 300 seconds. This shows that, for problems of the sizes above, the proposed approach returns a solution close to the exact optimal design. We also point out that, for cases with K over 10 and/or n over 400, the proposed solution approach did not return a feasible solution within 300 seconds. For those cases, a heuristic and/or randomized solution approach can be a reasonable choice.
Quality of the Optimal Design for Prediction
Our aim (as noted in (3.3)) is to reduce the predictive variance for each individual, i.e., to reduce the value of
(5.3)
\[ g(\boldsymbol{z},X;Z)=\underset{k=1,2,\dots ,K}{\max }{\boldsymbol{z}^{\top }}{\left({\sum \limits_{i=1}^{n}}{\boldsymbol{z}_{i}}{\boldsymbol{z}_{i}^{\top }}{x_{ik}}\right)^{-1}}\boldsymbol{z}\]
for a new individual with covariates $\boldsymbol{z}$, given a design X and covariates $Z={({\boldsymbol{z}_{1}^{\top }},\dots ,{\boldsymbol{z}_{n}^{\top }})^{\top }}$. Following the settings in Example I, we obtain the optimal design ${X_{opt}}$ based on the covariates Z. Then we generate 1000 new copies of covariates $\boldsymbol{z}$, each with entries independently drawn from the standard normal distribution. Given a random design ${X_{rand}}$ under the same set of constraints, we compute the difference
\[ g(\boldsymbol{z},{X_{rand}};Z)-g(\boldsymbol{z},{X_{opt}};Z)\]
for the 1000 new copies of $\boldsymbol{z}$. If the optimal design has better prediction accuracy, the location of this difference over the 1000 new copies of $\boldsymbol{z}$ should be significantly above zero. We perform the Wilcoxon signed-rank test to validate this hypothesis, i.e., ${H_{0}}$: the location of the differences is zero versus ${H_{1}}$: the location of the differences is above zero. For each replication and each setting in Figure 2, we report the p-values of the hypothesis tests in Figure 5 and the percentage of rejections of ${H_{0}}$ under significance level $\alpha =0.0001$ in Figure 6. The results show that the optimal design gives significantly better prediction accuracy by reducing the predictive variance for each possible individual, especially in cases with $K\gt 2$.
6 Conclusion
To improve the accuracy of personalized decisions, this paper investigates the optimal design that minimizes the maximum variance of the estimated personalized treatment effects over different treatments and covariate values. The resulting design objective matches the E-optimal design criterion. To provide the optimal design of multiple treatment allocation in the presence of observed covariates, an SDP-based solution approach is applied to solve the optimization problem. As shown in our numerical results, the proposed solution approach provides optimal designs efficiently for n in the low hundreds. We point out potential future directions. First, it is desirable to develop more efficient reformulations or optimization algorithms to handle the proposed design objective at larger problem sizes. Second, it is useful in practice to investigate online optimal allocation, assuming that users are recruited in an online, dynamic fashion, under the proposed design objective.