1 Introduction
Two-period multiarm platform trials are defined as trials that start with two or more arms during the first period and allow one or more new experimental arms to be added during the second period. This paper was motivated by a recent pediatric osteosarcoma study at St. Jude Children’s Research Hospital (St. Jude). This trial includes two planned periods, before and after the addition of new experimental arms. During the first period, the study has two experimental arms and a common control arm. During the second period, two more experimental arms will be added. One reason for adding two additional arms is that not all potential treatments are available at the same time. Details about the drugs are not disclosed because the study is still in the design stage.
The primary endpoint of this osteosarcoma study is progression-free survival; however, for the sake of simplicity, here we use a continuous endpoint to present the proposed methods. Specifically, we use Dunnett’s multiple comparison procedure [7] to control the family-wise error rate (FWER) in the multiarm setting. We also adopt an optimal control-to-experimental arm allocation rule, the root-K rule [22], to achieve a targeted marginal power while minimizing the overall sample size of the first period. Despite adding two new experimental arms during the second period, the goal of the design is to retain the same targeted marginal power and FWER as the trial with two experimental arms and one control without added arms. How to achieve this is described in this paper.
This type of two-period multiarm trial has been discussed by Ren et al. [16] and Roig et al. [17]. In the former, the authors discussed a simplified version, with one experimental arm in the first period and a second experimental arm added later. Under this framework, Ren et al. presented statistical considerations, including type-I error control and power, as well as an optimal allocation ratio. In Ren et al., the total sample size is fixed and determined by a conventional three-arm design with equal randomization: each experimental arm has a marginal power of $1-\beta $ to detect a standardized treatment effect Δ and a marginal type-I error controlled at α (one-sided). Ren et al. discussed the optimal allocation and the optimal timing of adding the new arm to maximize the disjunctive power of the study. However, marginal (pair-wise) power is often the metric of interest for possible registrational purposes. In Ren et al.’s method, the marginal power for each experimental arm cannot be maintained at its original level $1-\beta $, mainly because the total sample size is fixed.
In Roig et al. [17], the authors assessed the robustness of model-based approaches that adjust for time trends when utilizing non-concurrent controls. The focus of that research is the consequences of incorporating nonconcurrent controls under various time-trend models and assumptions for the different arms. In our method, we use only the concurrent control data (i.e., patients recruited and allocated to the control group after a new arm is added) and do not discuss how to use nonconcurrent control data (i.e., patients allocated to the control arm before the new arm is opened). Specifically, data from patients allocated to a new experimental arm are compared only with data from patients randomized to the control arm contemporaneously. For a discussion of non-concurrent controls, a good resource is the EU-PEARL webinar, “Non-concurrent controls in platform trials,” in which different multi-stakeholder perspectives on the challenges and opportunities of using nonconcurrent control data are discussed. For a more general treatment of how to leverage information from external or non-concurrent sources to potentially gain power and precision or reduce the sample size, especially based on Bayesian models, readers can refer to Normington et al. [14]. It is worth noting that a recent paper [4] describes how to apply the estimands concept when adding new arms.
The two-period multiarm trial is a special case of the platform trial in which new arms are added only once after the start of the trial. The general platform design is defined as a multiarm multistage (MAMS) trial that adds and removes experimental arms during the course of the trial. In the general platform design, arms may be added an unlimited number of times; therefore, a platform trial is also called a perpetual or non-ending trial. For the general platform design, there is a rich body of literature [20, 9, 5, 15, 3].
The rest of the paper is organized as follows. In Section 2, after defining the notations, we introduce the two-period K+M-experimental arm platform design methods. Specifically, we use a 2+2-experimental arm trial to illustrate the design’s components and the developed method in detail. A design to control the pairwise type I error rate (PWER) is also introduced. In Section 3, we briefly showcase how to use the R package PlatformDesign to design our motivating pediatric osteosarcoma study and other examples. Comprehensive numerical evaluations are presented in Section 4. Section 5 concludes the paper with a discussion.
2 Methods
We discuss a general format of the two-period K+M-experimental arm platform design. The first period includes K experimental arms, and during the second period, M experimental arms will be added. K and M can be equal to 1. The second period includes two parts: an overlapping part and a non-overlapping part (see below for details). One common control arm is shared throughout the two periods.
We first introduce a K-experimental arm trial design upon which the K+M-experimental arm trial is based. The K-experimental arm trial we refer to in this paper is equivalent to a traditional K+1-arm trial, which has K experimental arms and one control arm. The K+M-experimental arm trial is based on the K-experimental arm trial, in the sense that we will retain the same FWER (or PWER) and marginal power for the K+M-experimental arm trial as in the K-experimental arm trial, despite adding M new experimental arms during the second period of the K+M-experimental arm trial. Then we describe the proposed methods for the K+M-experimental arm trial, detailing how to add a new experimental arm(s) during the second period of the trial and determine the critical value and allocation ratios.
2.1 Design Components for the K-Experimental Arm Trial
In the K-experimental arm trial, we test K experimental arms against a control arm. We define ${X_{ki}}$ as the treatment response of the i-th patient on arm k ($k=0,1,\dots ,K$, where $k=0$ represents the control arm). We then assume that ${X_{ki}}\sim \text{N}({\mu _{k}},{\sigma _{k}^{2}})$ and the family of K null hypotheses to be tested is
\[ {H_{01}}:{\delta _{1}}={\mu _{1}}-{\mu _{0}}\le 0,\dots ,{H_{0K}}:{\delta _{K}}={\mu _{K}}-{\mu _{0}}\le 0.\]
We use ${\delta _{k}}$ to denote the effect size for each experimental arm k, $k\in \{1,\dots ,K\}$. For simplification, we assume ${\sigma _{0}}={\sigma _{1}}=\cdots ={\sigma _{K}}=\sigma $, where σ is the common standard deviation. We also denote the standardized effect size by ${\Delta _{k}}={\delta _{k}}/\sigma $. To test the hypothesis ${H_{0k}},k\in \{1,2,\dots ,K\}$ for experimental arm k versus control, we assume that a standardized test statistic, ${Z_{k}}$, is computed as
\[ {Z_{k}}=\frac{{\bar{X}_{k}}-{\bar{X}_{0}}}{\sigma \sqrt{1/n+1/{n_{0}}}},\hspace{1em}k\in \{1,\dots ,K\}.\]
Under ${H_{0k}}$, the Z-test statistic follows the standard normal distribution, $N(0,1)$. The marginal (or pair-wise) type-I error rate is then $1-\Phi ({z_{1-\alpha }})=\alpha $, where $\Phi (\cdot )$ is the standard normal cumulative distribution function and ${z_{1-\alpha }}$ is its $(1-\alpha )$-quantile.
If we assume that σ is unknown, then we would use the T-test statistic, ${T_{k}}=\frac{{\bar{X}_{k}}-{\bar{X}_{0}}}{s\sqrt{1/n+1/{n_{0}}}}$, $k\in \{1,\dots ,K\}$. Here, $\bar{X}$ is the sample mean, and s is the pooled sample standard deviation. The design parameters n and ${n_{0}}$ are the numbers of patients enrolled in each experimental arm and in the control arm, respectively (assuming an equal number of patients is recruited for each experimental arm). Under the null, ${T_{k}}$ follows a central t distribution with $v=n+{n_{0}}-2$ degrees of freedom, and the marginal type-I error rate can be computed from the t distribution. In this paper, we use the Z-test statistic to introduce the methods.
2.1.1 Error Rate
For a set (or family) of hypotheses, a type-I error is defined as rejecting any true null hypothesis. In this paper, we use Dunnett’s correction [7] to control the FWER in the strong sense, which means that the probability of rejecting any true null hypothesis is controlled at a pre-specified level for all possible values of $({\delta _{1}},\dots ,{\delta _{K}})$. The situation in which the PWER (instead of the FWER) is controlled is discussed in Section 2.4. The regulatory guidance on multiplicity issues in clinical trials (FDA 2017 and EMA 2012) states that controlling the family-wise type-I error rate in the strong sense is required for confirmatory trials.
To be explicit, we define the global null hypothesis ${H_{0}^{G}}$ as
\[ {H_{0}^{G}}:{\delta _{1}}=\cdots ={\delta _{K}}=0.\]
Magirr et al. [13] showed that the FWER is maximized under ${H_{0}^{G}}$. The FWER is then defined as
\[ \text{FWER}=\Pr \Big({\bigcup \limits_{k=1}^{K}}\{{Z_{k}}\gt {z_{1-{\alpha _{1}}}}\}\mid {H_{0}^{G}}\Big).\]
Dunnett [7] provided an analytical formula to compute the FWER when all the comparisons start and conclude at the same time:
(2.2)
\[ {\text{FWER}_{D}}=1-{\Phi _{K}}({z_{1-{\alpha _{1}}}},\dots ,{z_{1-{\alpha _{1}}}};{\Sigma _{1}}),\]
where ${\Phi _{K}}(\cdot ;{\Sigma _{1}})$ is the standard K-variate normal cumulative distribution function and ${\Sigma _{1}}=[{\rho _{k{k^{\prime }}}}]$ is a K-by-K correlation matrix, with ${\rho _{k{k^{\prime }}}}$ denoting the correlation between ${Z_{k}}$ and ${Z_{{k^{\prime }}}}$ at the final analysis. The critical value ${z_{1-{\alpha _{1}}}}$ controls ${\text{FWER}_{D}}$ in the K-experimental arm trial; the subscript D refers to Dunnett’s method.
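As a quick numerical illustration of equation (2.2), the following R sketch uses the mvtnorm package (a general multivariate-normal library, not part of our PlatformDesign package) to solve for the Dunnett critical value ${z_{1-{\alpha _{1}}}}$; the equicorrelation used below anticipates the root-K allocation introduced in Section 2.1.3 and is an assumption of this illustration.

# R sketch: Dunnett critical value z_{1-alpha_1} for K experimental arms (equation (2.2)).
# Assumes the root-K allocation of Section 2.1.3, so corr(Z_k, Z_k') = n1/(n1 + n0_1) = 1/(A1 + 1).
library(mvtnorm)

K    <- 2                              # number of experimental arms
fwer <- 0.025                          # one-sided family-wise error rate
A1   <- sqrt(K)                        # control-to-experimental allocation ratio (root-K rule)
rho  <- 1 / (A1 + 1)                   # correlation induced by the shared control arm
Sigma1 <- matrix(rho, K, K)
diag(Sigma1) <- 1

# Solve 1 - Phi_K(z, ..., z; Sigma1) = FWER for the common critical value z
z_alpha1 <- qmvnorm(1 - fwer, tail = "lower.tail", corr = Sigma1)$quantile
alpha1   <- 1 - pnorm(z_alpha1)        # implied marginal (pair-wise) type-I error rate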
2.1.2 Power
Sample sizes can be computed to control several types of power at specified levels. There are multiple definitions of power, depending on the objective of the trial in multiarm settings.
We use $\omega =1-\beta $ to denote the marginal power (pair-wise) for a given experimental arm against the control.
The alternative hypothesis for each comparison is ${H_{1k}}:{\delta _{k}}\gt 0$, $k\in \{1,\dots ,K\}$. In this paper, we focus on the global alternative hypothesis, ${H_{1}^{G}}$, which is given by
(2.3)
\[ {H_{1}^{G}}:{\delta _{1}}=\cdots ={\delta _{k}}=\cdots ={\delta _{K}}={\delta ^{\ast }}(\gt 0),\]
where ${\delta ^{\ast }}$ is the common effect size. Because we assume an equal standard deviation (denoted σ) for each arm, this is equivalent to ${H_{1}^{G}}:{\Delta _{1}}=\cdots ={\Delta _{k}}=\cdots ={\Delta _{K}}=\Delta (\gt 0)$, where Δ is the common standardized effect size. We can then define the following power metrics based on ${H_{1}^{G}}$.
Disjunctive (any-pair) power (${\Omega _{dis}}$) is the probability of showing a statistically significant effect for at least one comparison under the targeted effects:
(2.4)
\[ {\Omega _{dis}}=1-\Pr ({Z_{1}}\le {z_{1-{\alpha _{1}}}},\dots ,{Z_{K}}\le {z_{1-{\alpha _{1}}}}\mid {H_{1}^{G}}).\]
Of note, another popular alternative hypothesis is the least favorable configuration for experimental arm $k\in \{1,\dots ,K\}$, which is given by ${H_{1}^{{\text{LFC}_{k}}}}:{\delta _{k}}={\delta ^{\ast }}$ and ${\delta _{{k^{\prime }}}}=0$ for all ${k^{\prime }}\ne k$. We will not explore this hypothesis in this paper.
Conjunctive (all-pairs) power (${\Omega _{c}}$) is the probability of showing a statistically significant effect for all comparisons under the targeted effects. The conjunctive power is computed as
\[ {\Omega _{c}}=\Pr ({Z_{1}}\gt {z_{1-{\alpha _{1}}}},\dots ,{Z_{K}}\gt {z_{1-{\alpha _{1}}}}\mid {H_{1}^{G}}).\]
This power is optimistic, and we will not use it in the paper.
2.1.3 Optimal Allocation Ratio
In a traditional two-arm randomized clinical trial in which the endpoint has the same variance in the control and experimental treatments, the optimal allocation ratio between the two arms is 1:1, which maximizes the power. However, when multiple experimental arms are compared to a common control arm, the optimal allocation is no longer 1:1. If no early stopping is implemented for any experimental arm, the optimal allocation is approximately $\sqrt{K}$ patients allocated to the control group for every patient allocated to a given experimental treatment (the root-K rule) [21]. Thus, as the number of experimental arms increases, the optimal allocation ratio also increases. This result applies to the one-stage K-experimental arm design.
Based on the root-K rule, we use the same allocation ratio (${A_{1}}=\sqrt{K}$) for all experimental arms in the K-experimental arm trial; thus, ${n_{{0_{1}}}}={A_{1}}\times {n_{1}}$, ${A_{1}}\in (0,\infty )$. Here, ${A_{1}}$ is the allocation ratio of the control arm relative to each experimental arm. The design parameters ${n_{1}}$ and ${n_{{0_{1}}}}$ are the sample sizes of each of the K experimental arms and of the control arm, respectively, in the K-experimental arm trial. In the first period of the K+M-experimental arm trial, the same allocation ratio ${A_{1}}$ is kept: ${A_{1}}={n_{{0_{t}}}}/{n_{t}}$, where ${n_{t}}$ and ${n_{{0_{t}}}}$ are the sample sizes of each of the K experimental arms and of the control arm, respectively, during the first period of the K+M-experimental arm trial, before the M new arms are added (Figure 1). Additionally, the correlation ${\rho _{k{k^{\prime }}}}$ between ${Z_{k}}$ and ${Z_{{k^{\prime }}}}$ is ${n_{1}}/({n_{1}}+{n_{{0_{1}}}})=1/({A_{1}}+1)$ (for details, see Step 2 of Appendix A.2). If there is an equal allocation to the control and experimental arms, then ${n_{{0_{1}}}}={n_{1}}$ and ${\rho _{k{k^{\prime }}}}=0.5$.
2.1.4 Design Summary for the K-Experimental Arm Trial
In the K-experimental arm trial, there are K experimental arms and one common control arm. To control the FWER (e.g., at 0.025), equation (2.2) is used to derive the critical value ${z_{1-{\alpha _{1}}}}$. Given the global alternative hypothesis ${H_{1}^{G}}$ defined in formula (2.3), the required sample sizes for the control and each experimental arm are derived with the allocation ratio based on the root-K rule to obtain a desirable marginal power ${\omega _{1}}$ (See Step 4 in Appendix A.2). The corresponding disjunctive power, ${\Omega _{1}}$, defined in equation (2.4), is calculated based on ${\omega _{1}}$, as described in Step 5 in Appendix A.2.
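To make this summary concrete, the R sketch below (again relying on the mvtnorm package rather than the PlatformDesign package itself) combines the Dunnett critical value, the root-K allocation, and the marginal power requirement; with K = 2, FWER = 0.025 (one-sided), marginal power 0.80, and Δ = 0.4 it should approximately reproduce the 2-experimental arm design quoted in Section 3 (${n_{1}}=101$, ${n_{{0_{1}}}}=143$, ${N_{1}}=345$), although rounding conventions may differ slightly.

# R sketch of the K-experimental arm design summary (Section 2.1.4).
library(mvtnorm)

design_K_arm <- function(K, fwer, marginal.power, Delta) {
  A1   <- sqrt(K)                                   # root-K allocation (control : experimental)
  rho  <- 1 / (A1 + 1)                              # corr(Z_k, Z_k') with a shared control
  Sigma1 <- matrix(rho, K, K)
  diag(Sigma1) <- 1
  z_a1 <- qmvnorm(1 - fwer, tail = "lower.tail", corr = Sigma1)$quantile   # equation (2.2)
  z_b1 <- qnorm(marginal.power)
  # marginal power requirement: Delta / sqrt(1/n1 + 1/n0_1) = z_a1 + z_b1, with n0_1 = A1 * n1
  n1   <- ceiling((1 + 1 / A1) * (z_a1 + z_b1)^2 / Delta^2)
  n0_1 <- ceiling(A1 * n1)
  list(z_alpha1 = z_a1, n1 = n1, n0_1 = n0_1, N1 = K * n1 + n0_1)
}

design_K_arm(K = 2, fwer = 0.025, marginal.power = 0.8, Delta = 0.4)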
2.2 Design Components for the K+M-Experimental Arm Trial
At the end of the first period with K experimental arms, M experimental arms are allowed to be added, and the study enters the second period. The second period of the K+M-experimental arm trial has two parts. The first part is an overlapping duration in which K initial experimental arms and M new experimental arms overlap, and the second part is a non-overlapping duration in which only the M new experimental arms are open. Both parts share a common control arm. We use a 2+2-experimental arm trial (depicted in Figure 1) to introduce the notations used in the K+M-experimental arm trial. As its name suggests, the 2+2-experimental arm trial includes a first period in which there are two experimental arms, and a second period in which two new experimental arms are added. This is the setting of the St. Jude pediatric osteosarcoma trial.
Figure 1
Schema of a two-period 2+2-experimental arm platform trial. The left part of the figure shows a traditional three-arm trial. In the context of this paper, we refer to it as a 2-experimental arm trial, as it has two experimental arms and one common control. The right part of the figure depicts the 2+2-experimental arm trial. During the first period, this trial has two experimental arms, Trt 1 and Trt 2 (light blue segments), and a control arm (dark blue segment). The vertical solid line separates the first and second periods of the trial and indicates the opening of two new experimental arms, Trt 3 and Trt 4. The dashed vertical line separates the two parts of the second period and indicates the closing of Trt 1 and Trt 2. During the first part of the second period, the control arm (dark purple segment) is shared among the four experimental arms (light purple segments). During the second part of the second period, the control (dark green), Trt 3, and Trt 4 (light green) continue to accrue patients until reaching the planned sample sizes. The ${n_{t}}$ and ${n_{{0_{t}}}}$ (blue brackets) indicate the numbers of patients enrolled in Trt 1 (or Trt 2) and in the control, respectively, when Trt 3 and Trt 4 are added. The ${n_{1}}$ and ${n_{{0_{1}}}}$ (orange brackets) indicate the sample sizes for each of the two experimental arms and the control, respectively, in the 2-experimental arm trial. The ${n_{2}}$ and ${n_{{0_{2}}}}$ (green brackets) indicate the sample sizes for each of the four experimental arms and the concurrent control. ${A_{1}}$ denotes the allocation ratio (control to experimental arm) during the first period. ${A_{2}}$ denotes the allocation ratio during the first part of the second period, when all four experimental arms are open. ${A_{3}}$ denotes the allocation ratio during the second part of the second period.
In Figure 1, ‘Control’ denotes the common control arm, ‘Trt 1’ and ‘Trt 2’ denote the two initial experimental arms opened during the first period, and ‘Trt 3’ and ‘Trt 4’ denote the two experimental arms opened during the second period of the 2+2-experimental arm trial (the right side of Figure 1). ${A_{1}}$ is the randomization ratio of the control to Trt 1 or Trt 2 (determined by the root-K rule), and ${n_{t}}$ is the “information time” at which Trt 3 and Trt 4 are added. Specifically, the two arms are added when ${n_{t}}$ patients have been enrolled in each of Trt 1 and Trt 2. Equivalently, the “information time” can be expressed as ${n_{{0_{t}}}}=[{A_{1}}{n_{t}}]$, the number of patients who have been enrolled in the control arm when the new arms are added, where $[\cdot ]$ denotes rounding up to the nearest integer. The information time (${n_{t}}$ and ${n_{{0_{t}}}}$) must satisfy two constraints: ${n_{t}}\le {n_{1}}$ and ${n_{{0_{t}}}}\le {n_{{0_{1}}}}$.
The allocation ratio changes to ${A_{2}}$ once Trt 3 and Trt 4 are added. During the overlapping part, ${n_{2}}-{n_{t}}$ patients are enrolled in each of the experimental arms and ${n_{{0_{2}}}}-{n_{{0_{t}}}}$ patients are enrolled in the control. Therefore, ${A_{2}}=({n_{{0_{2}}}}-{n_{{0_{t}}}})/({n_{2}}-{n_{t}})$. Determination of ${n_{2}}$ and ${n_{{0_{2}}}}$ is introduced in the next section. After the overlapping part (i.e., after Trt 1 and Trt 2 are stopped), Trt 3 and Trt 4 continue to enroll until each has reached the required sample size of ${n_{2}}$. Therefore, both of these arms need to enroll an additional ${n_{t}}$ patients to “catch up” with Trt 1 and Trt 2 during the second part of the second period. In the same vein, the control arm enrolls an additional ${n_{{0_{t}}}}$ patients to ensure the same number of concurrent controls across experimental arms. Therefore, the allocation ratio ${A_{3}}$ after the completion of Trt 1 and Trt 2 is equal to ${A_{1}}$ (up to rounding). Additionally, we denote the overall allocation ratio as $A={n_{{0_{2}}}}/{n_{2}}$.
Because the 2+2-experimental arm trial has four overlapping experimental arms and therefore four test statistics, we cannot use the critical value ${z_{1-{\alpha _{1}}}}$ from the 2-experimental arm trial for the 2+2-experimental arm trial. For example, if we want to control the FWER, the critical value of the K+M-experimental arm trial, ${z_{1-{\alpha _{2}}}}$, should be computed based on the correlation matrix of the $K+M$ test statistics, using formula (2.2) with K replaced by $K+M$.
2.3 Determination of the Optimal Allocation Ratio ${A_{2}}$ for the Overlapping Duration in a Two-Period K+M-Arm Trial
We need to first determine the critical value ${z_{1-{\alpha _{2}}}}$ before we determine the optimal allocation ratio ${A_{2}}$. To calculate the ${z_{1-{\alpha _{2}}}}$, we need to determine the correlation matrix ${\Sigma _{2}}=[{\rho _{kk\prime }}]$ of the K+M-experimental arm trial. This can be derived as follows (see Appendix A.1 for the derivation):
\[ {\rho _{k{k^{\prime }}}}=\frac{{n_{{0_{k{k^{\prime }}}}}}}{\frac{{n_{{0_{2}}}^{2}}}{{n_{2}}}+{n_{{0_{2}}}}}.\]
Here, ${n_{{0_{k{k^{\prime }}}}}}$ is the number of shared controls between experimental arms k and ${k^{\prime }}$. As shown in Figure 1, if arms k and ${k^{\prime }}$ start at the same time, then ${n_{{0_{k{k^{\prime }}}}}}={n_{{0_{2}}}}$; otherwise, ${n_{{0_{k{k^{\prime }}}}}}={n_{{0_{2}}}}-{n_{{0_{t}}}}$. Once we have ${\Sigma _{2}}$, we can use the following equation to find the updated critical value ${z_{1-{\alpha _{2}}}}$.
(2.6)
\[ \text{FWER}=1-{\int _{-\infty }^{{z_{1-{\alpha _{2}}}}}}{\int _{-\infty }^{{z_{1-{\alpha _{2}}}}}}\cdots {\int _{-\infty }^{{z_{1-{\alpha _{2}}}}}}{\pi _{Z}}({z_{1}},\dots ,{z_{K}},{z_{K+1}},\dots ,{z_{K+M}};0,{\Sigma _{2}})\hspace{0.1667em}d{z_{1}}\hspace{0.1667em}d{z_{2}}\cdots d{z_{K+M}},\]
where ${\pi _{Z}}(\cdot ;0,{\Sigma _{2}})$ denotes the density of the $(K+M)$-variate standard normal distribution with mean zero and correlation matrix ${\Sigma _{2}}$. Then we can use ${z_{1-{\alpha _{2}}}}$ to calculate the marginal power, ${\omega _{2}}$, and the disjunctive power, ${\Omega _{2}}$, of the K+M-experimental arm trial (see more details at the end of Section 2.3.1).
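The R sketch below illustrates this calculation for the “2+2” setting, using the mvtnorm package; the candidate values of $({n_{2}},{n_{{0_{2}}}})$ and the information time $({n_{t}},{n_{{0_{t}}}})$ are taken from the worked example in Sections 2.3.1 and 3 purely for illustration.

# R sketch: correlation matrix Sigma_2 and critical value z_{1-alpha_2} for a 2+2 trial.
library(mvtnorm)

K <- 2; M <- 2
n2 <- 104; n0_2 <- 210      # candidate per-arm and concurrent-control sample sizes
nt <- 30;  n0_t <- 43       # information time when Trt 3 and Trt 4 are added

# Shared controls: n0_2 for arms that start together, n0_2 - n0_t otherwise
shared <- matrix(n0_2 - n0_t, K + M, K + M)
shared[1:K, 1:K] <- n0_2
shared[(K + 1):(K + M), (K + 1):(K + M)] <- n0_2

Sigma2 <- shared / (n0_2^2 / n2 + n0_2)
diag(Sigma2) <- 1

# Solve 1 - Phi_{K+M}(z, ..., z; Sigma2) = FWER for z = z_{1-alpha_2} (equation (2.6))
fwer     <- 0.025
z_alpha2 <- qmvnorm(1 - fwer, tail = "lower.tail", corr = Sigma2)$quantile
alpha2   <- 1 - pnorm(z_alpha2)   # marginal type-I error rate after adding the new arms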
The goal of a two-period K+M-experimental arm platform design is to determine the minimum total sample size (denoted as ${N_{2}}$) that can have the marginal power ${\omega _{2}}$ and disjunctive power ${\Omega _{2}}$ that are no less than their counterparts, ${\omega _{1}}$ and ${\Omega _{1}}$, in the K-experimental arm, while controlling for FWER.
2.3.1 Admissible Set for Finding the Optimal Design(s)
It is easy to see that ${z_{1-{\alpha _{2}}}}$ cannot be derived without ${n_{2}}$ and ${n_{{0_{2}}}}$, as they are needed to compute the correlation matrix ${\Sigma _{2}}$. We define an admissible set of pairs $({n_{2}},{n_{{0_{2}}}})$ based on the following three constraints. The first two constraints are related to ${A_{2}}$, the allocation ratio after adding the new arms.
Here we have
\[ {A_{2}}=\frac{{n_{{0_{2}}}}-{n_{{0_{t}}}}}{{n_{2}}-{n_{t}}},\]
where ${n_{t}}$ and ${n_{{0_{t}}}}$ are the numbers of enrolled patients in each of the experimental arms and in the control arm, respectively, at the time of adding the two new arms. The value of ${A_{2}}$ needs to be a finite positive number. In our 2+2 example, if ${n_{t}}=30$, then ${n_{{0_{t}}}}=[{A_{1}}\times {n_{t}}]=43$ (see Step 8 in Appendix A.2 for details). Therefore, the first two constraints are
\[ {n_{2}}\gt {n_{t}}\]
and
\[ {n_{{0_{2}}}}\gt {n_{{0_{t}}}}.\]
We also need to set an upper limit for the total sample size of the K+M-experimental arm trial, ${N_{2}}$. A reasonable upper limit implies that ${N_{2}}$ should not exceed the required sample sizes (denoted as S) for conducting two separate multiarm trials, i.e., a K-experimental arm trial and an M-experimental arm trial.
Based on formulae (A.2) and (A.3),
\[ S=\frac{{({z_{1-{\alpha _{1}}}}+{z_{1-{\beta _{1}}}})^{2}}}{{\Delta ^{2}}}(1+2\sqrt{K}+K)+\frac{{({z_{1-{\alpha _{1}^{\ast }}}}+{z_{1-{\beta _{1}}}})^{2}}}{{\Delta ^{2}}}(1+2\sqrt{M}+M),\]
where ${z_{1-{\alpha _{1}}}}$ and ${z_{1-{\alpha _{1}^{\ast }}}}$ are the critical values for the K- and M-experimental arm trials, respectively. Therefore, the third constraint for $({n_{2}},{n_{{0_{2}}}})$ is
\[ {N_{2}}=(K+M){n_{2}}+{n_{{0_{2}}}}+{n_{{0_{t}}}}\le S.\]
In our “2+2” example, ${z_{1-{\alpha _{1}}}}={z_{1-{\alpha _{1}^{\ast }}}}$ as $K=M=2$, and $S=2{N_{1}}=690$. (See Step 4 in Appendix A.2 for derivation of ${N_{1}}$.)
Therefore, the third constraint for the “2+2” example is
\[ 4{n_{2}}+{n_{{0_{2}}}}+43\le 690.\]
Under the above three constraints, the admissible set of $({n_{2}},{n_{{0_{2}}}})$ can be identified. Given ${n_{t}}=30$ and ${n_{{0_{t}}}}=43$, we can obtain the feasible region (shaded area in Figure 2). Specifically, all integer pairs $({n_{2}},{n_{{0_{2}}}})$ in this region are potential design candidates.
Figure 2
Admissible set of $({n_{2}},{n_{{0_{2}}}})$ (shaded triangular region), when ${n_{t}}=30$ and ${n_{{0_{t}}}}=43$ in a two-period 2+2-experimental arm platform trial.
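For readers who want to reproduce Figure 2, the short R sketch below enumerates the admissible set for the “2+2” example under the three constraints above; it is a plain re-implementation under the stated assumptions rather than the PlatformDesign source code.

# R sketch: enumerate the admissible pairs (n2, n0_2) for the "2+2" example (Section 2.3.1).
K <- 2; M <- 2
nt <- 30; n0_t <- 43          # information time when the new arms are added
S  <- 690                     # upper limit: two separate 2-experimental arm trials (2 * N1)

grid <- expand.grid(n2 = (nt + 1):S, n0_2 = (n0_t + 1):S)
admissible <- subset(grid,
                     n2 > nt &                                # constraint 1
                       n0_2 > n0_t &                          # constraint 2
                       (K + M) * n2 + n0_2 + n0_t <= S)       # constraint 3: N2 <= S
nrow(admissible)              # number of candidate pairs; 29,040 is reported later in this section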
Once we have the feasible region for the pairs of $({n_{2}},{n_{{0_{2}}}})$, then we can compute the correlation matrix ${\Sigma _{2}}$ of the $K+M$ test statistics $({Z_{1}},\dots ,{Z_{K}},{Z_{K+1}},\dots ,{Z_{K+M}})$. Derivation of ${\Sigma _{2}}$ in the “2+2” setting is presented in Step 9 of Appendix A.2. With the correlation matrix ${\Sigma _{2}}$, we can use equation (2.6) to find the marginal type I error ${\alpha _{2}}$ for a specific pair $({n_{2}},{n_{{0_{2}}}})$.
With ${n_{1}}$, ${n_{{0_{1}}}}$, ${\alpha _{1}}$, ${\beta _{1}}$, and ${\alpha _{2}}$, we can use the following equation (2.7) to calculate the marginal power ${\omega _{2}}=1-{\beta _{2}}$ for each pair $({n_{2}},{n_{{0_{2}}}})$ from ${z_{1-{\beta _{2}}}}$:
(2.7)
\[ {z_{1-{\beta _{2}}}}=({z_{1-{\alpha _{1}}}}+{z_{1-{\beta _{1}}}})\sqrt{\frac{1/{n_{1}}+1/{n_{{0_{1}}}}}{1/{n_{2}}+1/{n_{{0_{2}}}}}}-{z_{1-{\alpha _{2}}}},\]
which follows from equating the standardized effect size implied by the K-experimental arm design, $\Delta =({z_{1-{\alpha _{1}}}}+{z_{1-{\beta _{1}}}})\sqrt{1/{n_{1}}+1/{n_{{0_{1}}}}}$, with that of the K+M-experimental arm design.
Next, we can derive the disjunctive power ${\Omega _{2}}$ for each pair of $({n_{2}},{n_{{0_{2}}}})$ by plugging ${\beta _{2}}$ and ${\Sigma _{2}}$ into the following equation:
\[ {\Omega _{2}}=1-{\Phi _{K+M}}(-{z_{1-{\beta _{2}}}},\dots ,-{z_{1-{\beta _{2}}}};{\Sigma _{2}}),\]
where ${\Phi _{K+M}}(\cdot ;{\Sigma _{2}})$ is the standard $(K+M)$-variate normal cumulative distribution function.
Based on the above procedure, we can compute the associated ${\omega _{2}}$ and ${\Omega _{2}}$ for all admissible pairs in the feasible region. In the 2+2-experimental arm example, the total number of $({n_{2}},{n_{{0_{2}}}})$ pairs in the admissible set is 29,040, and we can compute ${\omega _{2}}$ and ${\Omega _{2}}$ for each of the pairs. Then, we can perform a two-step selection procedure to determine the “optimal” design(s):
1. We keep only the designs in which ${\omega _{2}}\ge {\omega _{1}}$ and ${\Omega _{2}}\ge {\Omega _{1}}$ (these lower limits are set in Step 7 of Appendix A.2).
2. Among the retained designs, we recommend the one(s) with the smallest total sample size (${N_{2}}$) as the “optimal” design(s).
We demonstrate how to design a two-period 2+2-experimental arm platform design by using the R package PlatformDesign in Appendix A.2.
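The sketch below ties these steps together for one candidate pair; looping it over the admissible set and applying the two-step selection yields the optimal design(s). It is a simplified re-implementation under the assumptions stated in this section (equations (2.6) and (2.7) and the ${\Sigma _{2}}$ formula), not the PlatformDesign source code, so results may differ slightly from the package because of rounding and numerical integration.

# R sketch: evaluate one candidate (n2, n0_2) and apply the two-step selection (Section 2.3.1).
library(mvtnorm)

evaluate_design <- function(n2, n0_2, nt, n0_t, K, M, fwer, n1, n0_1, z_alpha1, z_beta1) {
  # correlation matrix Sigma_2 of the K + M test statistics
  shared <- matrix(n0_2 - n0_t, K + M, K + M)
  shared[1:K, 1:K] <- n0_2
  shared[(K + 1):(K + M), (K + 1):(K + M)] <- n0_2
  Sigma2 <- shared / (n0_2^2 / n2 + n0_2)
  diag(Sigma2) <- 1

  # critical value z_{1-alpha_2} from equation (2.6)
  z_alpha2 <- qmvnorm(1 - fwer, tail = "lower.tail", corr = Sigma2)$quantile

  # marginal power omega_2 from equation (2.7)
  drift   <- (z_alpha1 + z_beta1) * sqrt((1 / n1 + 1 / n0_1) / (1 / n2 + 1 / n0_2))
  z_beta2 <- drift - z_alpha2
  omega2  <- pnorm(z_beta2)

  # disjunctive power Omega_2 = P(at least one rejection) under the global alternative
  Omega2 <- 1 - pmvnorm(upper = rep(-z_beta2, K + M), corr = Sigma2)[1]

  c(N2 = (K + M) * n2 + n0_2 + n0_t, omega2 = omega2, Omega2 = Omega2)
}

# Inputs from the 2-experimental arm design (Section 2.1.4) and the information time nt = 30
K <- 2; M <- 2; fwer <- 0.025
n1 <- 101; n0_1 <- 143
A1 <- sqrt(K); Sigma1 <- matrix(1 / (A1 + 1), K, K); diag(Sigma1) <- 1
z_alpha1 <- qmvnorm(1 - fwer, tail = "lower.tail", corr = Sigma1)$quantile
z_beta1  <- qnorm(0.8)

# Example: the candidate reported in Section 3 (n2 = 104, n0_2 = 210)
evaluate_design(n2 = 104, n0_2 = 210, nt = 30, n0_t = 43, K = K, M = M, fwer = fwer,
                n1 = n1, n0_1 = n0_1, z_alpha1 = z_alpha1, z_beta1 = z_beta1)

# Two-step selection over all admissible pairs: keep omega2 >= omega1 and Omega2 >= Omega1,
# then choose the pair(s) with the smallest N2.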
2.4 An Optimal K+M-Experimental Arm Design that Controls the PWER
We have described the method for designing a K+M-experimental arm trial that controls the FWER, i.e., controls the multiplicity when several interventions are evaluated simultaneously against a common control. However, depending on the reason different treatments are included in the same platform trial, they may not be considered “a family” simply because they are included in the same trial. For master protocols such as platform trials, if different experimental arms are included solely for operational efficiency (e.g., reducing the sample size of the control arm by using a shared control to save resources expended during recruitment), we would not necessarily need to perform a multiplicity adjustment to control the FWER. Therefore, in this section we introduce an alternative version of the optimal K+M-experimental arm design that controls the pair-wise type-I error rate (PWER). The PWER is the probability of incorrectly rejecting the null hypothesis for the primary outcome in a particular experimental arm, regardless of the outcomes in the other experimental arms. In this case, the critical value ${z_{1-\alpha }}$ can be derived directly from the equation below:
(2.8)
\[ \text{PWER}=1-\Phi ({z_{1-\alpha }})=\alpha ,\]
where α is a prespecified pair-wise type-I error rate for each comparison in the trial, which does not change when new arms are added; that is, ${\alpha _{1}}={\alpha _{2}}=\alpha $. Therefore, the main difference between the K+M-experimental arm trial designs controlling the FWER and the PWER is that the latter does not use Dunnett’s method to derive the critical value; instead, it is obtained directly from equation (2.8). Notably, when controlling the PWER, the upper limit S for the total sample size ${N_{2}}$ is constructed using the total sample sizes of two separate multiarm trials that also control the PWER. The other procedures are similar between the two versions of the design. See Appendix A.3 for an example of designing a “2+2” trial that controls the PWER.
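In other words, the PWER-controlling critical value is simply a standard normal quantile and does not depend on K, M, or ${\Sigma _{2}}$; for example, in R:

# PWER-controlling critical value (equation (2.8)); no multiplicity adjustment is applied
alpha   <- 0.025            # one-sided pair-wise type-I error rate
z_alpha <- qnorm(1 - alpha) # approximately 1.96, used for every comparison in both periods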
3 Software Example
We developed an R package, PlatformDesign, to implement the proposed two-period multiarm platform methods. In this section, we demonstrate the package with three examples: 1) a “2+2” trial with ${n_{t}}=30$; 2) a “2+2” trial with ${n_{t}}=50$; and 3) a “1+3” trial with ${n_{t}}=30$.
Example 1 If no arms are to be added during the course of a study, we can use the function one_stage_multiarm(·) to compute the sample sizes for the experimental and control arms. For instance, for a study planned to have only two experimental arms and one common control, given an FWER of 0.025 and a marginal power of 80%, and assuming an expected standardized effect size of 0.4, the following code shows that the sample size for the control is 143 and that for each experimental arm is 101. Thus, the planned total sample size for this 2-experimental arm trial is 345.
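A call of the following form reproduces these numbers; note that the argument names shown here (K, fwer, marginal.power, delta) are our assumption based on the design inputs described in the text, so please consult the PlatformDesign documentation for the exact interface.

# Sketch of the 2-experimental arm (no-added-arm) design; argument names are assumed
# from the design inputs above -- see help(one_stage_multiarm) for the exact usage.
library(PlatformDesign)
one_stage_multiarm(K = 2, fwer = 0.025, marginal.power = 0.8, delta = 0.4)
# Expected, per the text: n1 = 101 per experimental arm, n0_1 = 143 controls, N1 = 345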
However, for our pediatric osteosarcoma study, the plan is to add two new experimental arms during the trial. Assuming that the new arms will be added when 30 patients have been enrolled in each initial experimental arm, and that the study controls the FWER at 0.025 and targets a marginal power of 80%, we can use the function platform_design(·) to find the optimal design(s), as shown below.
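Again as a sketch (argument names are assumed; see the package documentation), the call might look as follows; the output components $design_Karm and $designs referred to below are those returned by the package.

# Sketch of the optimal 2+2 design search with the new arms added at nt = 30;
# argument names are assumed -- see help(platform_design) for the exact interface.
library(PlatformDesign)
out <- platform_design(nt = 30, K = 2, M = 2, fwer = 0.025,
                       marginal.power = 0.8, delta = 0.4)
out$design_Karm   # parameters of the 2-experimental arm trial used as the reference
out$designs       # recommended 2+2-experimental arm design(s), e.g., n2 = 104, n0_2 = 210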
Figure 3
An example of adding two experimental arms to a two-experimental arm trial comparing Trt 1 and Trt 2 to the control. Key design parameters are shown. The vertical solid line represents when the new experimental arms (Trt 3 and Trt 4) are added to the trial. The dashed vertical line represents when the Trt 1 and Trt 2 arms close to accrual. The blue brackets represent the “information time” when the two new experimental arms are added; they indicate the number of patients enrolled in each of the two initial experimental arms and in the control at that time. The green brackets represent the sample sizes required per experimental arm and the corresponding control in the 2+2-experimental arm trial. The orange brackets indicate the sample sizes for each of the two experimental arms and the control for the 2-experimental arm trial without adding a new arm(s). The optimal allocation ratios (${A_{1}}$, ${A_{2}}$, and ${A_{3}}$) for each period are shown at the bottom of the figure.
The first part of the output ($design_Karm) contains the parameters for the K-experimental arm trial. The second part ($designs) contains the parameters for the K+M-experimental arm trial designed based on the former. In this example, four designs are recommended in $designs, all of which meet the requirements in terms of controlling the FWER and attaining power levels equal to or greater than those of the K-experimental arm trial. If we choose design #16632 (the last row), then the sample sizes for each experimental arm and its corresponding concurrent control in the 2+2-experimental arm trial are 104 and 210, respectively. The sample size for the entire control arm (including non-concurrent controls) is 253. With this design, ${A_{2}}$, the allocation ratio (control to experimental arm) in the first part of the second period, is 2.26. Other parameters of this design are shown in Figure 3.
Once we decide ${n_{2}}=104$ and ${n_{{0_{2}}}}=210$, the sample sizes for each of the experimental arms (Trt 1 to Trt 4) and for the control arm in the first part of the second period are $104-30=74$ and $210-43=167$, respectively. Accordingly, the sample sizes of Trt 3, Trt 4, and the control arm for the second part of the second period are $104-74=30$ and $210-167=43$, respectively. The allocation ratio for the second part of the second period is therefore ${A_{3}}=43/30\approx 1.43$.
Example 2 With the constraints on sample sizes described in Section 2.3.1, an optimal design may not exist when ${n_{t}}$ (i.e., the timing of adding the new arm(s)) is relatively late. For instance, in the above “2+2” example with ${n_{t}}=50$, no optimal design is identified if a marginal power of $80\% $ must be maintained, as shown in the following code.
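The corresponding call (with the same caveat about assumed argument names) would be:

# Same 2+2 setting, but the new arms are added later (nt = 50); with the required marginal
# power of 0.8, no admissible design satisfies all constraints, and the criteria indicators
# and warning message described below are returned instead.
out50 <- platform_design(nt = 50, K = 2, M = 2, fwer = 0.025,
                         marginal.power = 0.8, delta = 0.4)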
The platform_design(·) function returns criteria indicators (i.e., flag.dp, flag.mp, and flag.dpmp) to show whether any optimal design exists, given the FWER, the marginal power, the timing of adding the new arm(s), and the number of experimental arms in each period of a K+M-experimental arm trial. If $flag.dpmp=0$, the optimal design maintains both the marginal and the disjunctive power at levels no less than those in the K-experimental arm trial. Otherwise, the algorithm checks whether a design(s) can be found that maintains either the marginal or the disjunctive power. When ${n_{t}}=50$, $flag.dpmp=1$ and $flag.dp=0$, indicating that we can only find designs that keep the disjunctive power no less than its counterpart in the K-experimental arm trial; the marginal power of the designs found is less than $80\% $. The accompanying warning message conveys the same information.
Example 3 The PlatformDesign package can be used to design any K+M-experimental arm platform trial with K and M as positive integers. Here, we show a hypothetical example for designing a 1+3-experimental arm platform trial using this R package.
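As before, the call below is a sketch with assumed argument names; only K, M, and the information time change relative to Example 1.

# Hypothetical 1+3 trial: one experimental arm initially, three arms added at nt = 30.
out13 <- platform_design(nt = 30, K = 1, M = 3, fwer = 0.025,
                         marginal.power = 0.8, delta = 0.4)
out13$designs     # per the text, one design with N2 = 654 is recommended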
Based on the output above, one design with ${N_{2}}=654$ is recommended. In addition, we can see that all the criteria indicators are equal to zero, implying that the criteria for both marginal and disjunctive power levels have been met.
More explanations of the above results and step-by-step instructions for using this package can be found in Appendix A.2. Details of how to use platform_design(·) and other functions can also be found in the documentation and vignettes of the R package PlatformDesign.
4 Numerical Evaluations
Unless otherwise specified, all numerical evaluations are conducted in the setting of a two-period 2+2-experimental arm trial (except for Figure 14 in Section 4.4), which controls the FWER at 0.025 and achieves a marginal power of 0.8, given a standardized effect size Δ of 0.4; the corresponding disjunctive power exceeds 0.922. In this section, we examine the relations among various design parameters in the 2+2-experimental arm trial.
4.1 Correlations
We explored the relations between the correlations of Z-test statistics and the disjunctive power ${\Omega _{2}}$ in the 2+2-experimental arm trial. Specifically, during the second period, two types of correlations occur. We denote ${\rho _{1}}$ as the correlation between any pair of experimental arms that start at the same time, and ${\rho _{2}}$ as the correlation between any pair of experimental arms that start at different times.
In the “2+2” example, the change in disjunctive power ${\Omega _{2}}$ is driven by ${\rho _{2}}$ (Figure 5) rather than ${\rho _{1}}$ (Figure 4): the disjunctive power ${\Omega _{2}}$ decreases as ${\rho _{2}}$ increases (Figure 5). Given a specified marginal power, an optimal design(s) may not exist if the timing of adding the new arms is relatively late (i.e., the value of ${n_{t}}$ is large). Therefore, in Figures 4 and 5, for ${n_{t}}=50,60,70$, or 80, the lower limit of the marginal power ${\omega _{2}}$ is 75%; we must lower the required marginal power ${\omega _{2}}$ when ${n_{t}}\gt 40$ to ensure that ${N_{2}}$ does not exceed S. More detailed reasoning is provided in Appendix A.2.
Figure 4
Relations between the correlation ${\rho _{1}}$ and the disjunctive power ${\Omega _{2}}$ in a two-period 2+2 platform trial setting, given the FWER of 0.025, a disjunctive power ${\Omega _{2}}\ge $ 0.922, a marginal power ${\omega _{2}}\ge $ 0.8 (for ${n_{t}}$ = 10, 20, 30, or 40; blue dots) or ≥ 0.75 (for ${n_{t}}$ = 50, 60, 70, or 80; red dots), and the standardized effect size Δ = 0.4. The value associated with each dot is the corresponding ${n_{t}}$ value.
Figure 5
Relations between the correlation ${\rho _{2}}$ and disjunctive power ${\Omega _{2}}$ in a two-period 2+2 platform trial setting, given the FWER of 0.025, disjunctive power ${\Omega _{2}}\ge $ 0.922, the marginal power ${\omega _{2}}\ge $ 0.8 (for ${n_{t}}$ = 10, 20, 30, or 40; blue dots) or 0.75 (for ${n_{t}}$ = 50, 60, 70, or 80; red dots), and the standardized effect size Δ = 0.4. The value associated with each dot is the corresponding ${n_{t}}$ value.
4.2 Influence of the Timing of Adding New Arms
To design a platform trial, we must know how the timing of adding a new arm(s) affects the design’s properties. Here we examine the relations between the timing of adding new arms (i.e., the “information time” ${n_{t}}$) and various design parameters in the K+M-experimental arm trial (e.g., the total required sample size ${N_{2}}$, the disjunctive power ${\Omega _{2}}$, and the marginal type-I error rate ${\alpha _{2}}$) using the “2+2” example.
As shown in Figure 6, the total sample size ${N_{2}}$ increases with the information time ${n_{t}}$. Thus, the earlier the new arms are added, the more patients are saved by conducting a 2+2-experimental arm trial rather than two separate 2-experimental arm trials (shown as a red line in Figure 6). For instance, if the two experimental arms are added when ${n_{t}}=30$, the total required sample size is 669, i.e., 21 fewer patients than the 690 required for two separate trials, while keeping the FWER at 0.025 and the marginal power at 80% and assuming a standardized effect size of 0.4.
Figure 6
The timing of adding new arms (${n_{t}}$) affects the total sample size ${N_{2}}$, given the FWER of 0.025, disjunctive power ${\Omega _{2}}\ge $ 0.922, the marginal power ${\omega _{2}}\ge $ 0.8 (for ${n_{t}}$ = 10, 20, 30, or 40; blue dots) or ≥ 0.75 (for ${n_{t}}$ = 50, 60, 70, or 80; red dots), and the standardized effect size Δ = 0.4. The red line indicates the total sample size needed for conducting two separate 2-experimental arm trials. The value associated with each dot is the corresponding ${N_{2}}$ value.
Figure 7 suggests that the disjunctive power ${\Omega _{2}}$ also increases as the addition of new arms is delayed in the “2+2” scenario. This is expected, as the delay shortens the overlapping period and thereby decreases the correlation between any pair of arms starting at different times (${\rho _{2}}$); the experimental arms become more independent, which increases ${\Omega _{2}}$.
Figure 7
The timing of adding new arms (${n_{t}}$) affects the disjunctive power (${\Omega _{2}}$), given the FWER of 0.025, the marginal power ${\omega _{2}}\ge $ 0.8 (for ${n_{t}}$ = 10, 20, 30, or 40; blue dots) or ≥ 0.75 (for ${n_{t}}$ = 50, 60, 70, or 80; red dots), and the standardized effect size Δ = 0.4.
We also examined the relations between the marginal type-I error rate (${\alpha _{2}}$) and the timing of adding new arms (${n_{t}}$) in the “2+2” example. The marginal type-I error rate ${\alpha _{2}}$ decreases when ${n_{t}}$ increases (Figure 8), though this change is negligible (range of ${\alpha _{2}}$, 0.00650 to 0.00665). This finding indicates that the timing of adding new experimental arms to an existing platform protocol has a minimal impact on the marginal type-I error rate.
4.3 Overlapping Parameter
We define an overlapping parameter as $\frac{{n_{2}}-{n_{t}}}{{n_{2}}}$, which represents the percentage of patients in an experimental arm who are enrolled during the overlapping stage. We explored the relations between the overlapping parameter and various design parameters, including the disjunctive power ${\Omega _{2}}$ and the marginal type-I error rate ${\alpha _{2}}$ in the “2+2” scenario.
As illustrated in Figure 9, the disjunctive power ${\Omega _{2}}$ decreases as the overlapping parameter increases. This is the opposite of what we observed for the relation between ${n_{t}}$ and ${\Omega _{2}}$.
Unlike the relation with ${\Omega _{2}}$, the marginal type I error ${\alpha _{2}}$ increases with the increase in the overlapping parameter (Figure 10).
We also explored the relation between the overlapping parameter and the two types of test-statistic correlations. In Figure 11, there is no obvious trend between the overlapping parameter and ${\rho _{1}}$, but in Figure 12, there is a positive trend between the overlapping parameter and ${\rho _{2}}$.
4.4 Optimal Overall Allocation Ratio A
The overall allocation ratio (defined as $A={n_{{0_{2}}}}/{n_{2}}$) stays very close to the value of $\sqrt{4}=2$ with various ${n_{t}}$ (Figure 13). The value of A ranges from 1.95 to slightly less than 2.10.
Figure 13
The relation between the overall allocation ratio $A={n_{{0_{2}}}}/{n_{2}}$ and the timing of adding new arms (${n_{t}}$) in a two-period 2+2-experimental arm platform trial. The red dashed line represents the optimal allocation ratio used in the first period, based on the root-K method.
Given the timing of adding new arms at ${n_{t}}=30$ and choosing only the optimal design with the largest disjunctive power, we explored how the overall allocation ratio A changes with varied $K=1,2,3,4$, or 5 and $M=1,2,3,4$, or 5. From Figure 14, given the same M, the overall allocation ratio A increases if K increases. Given the same K, A increases as M increases.
Figure 14
Relations between the overall allocation ratio $A={n_{{0_{2}}}}/{n_{2}}$ and the numbers of experimental arms initially opened ($K=1,2,3,4$, or 5) and added later ($M=1,2,3,4$, or 5), given ${n_{t}}=30$, $FWER=0.025$, disjunctive power ${\Omega _{2}}\ge $ 0.922, the marginal power ${\omega _{2}}\ge $ 0.8, and the standardized effect size Δ = 0.4. The results are shown for each combination of K and M, where the optimal design with the greatest ${\Omega _{2}}$ is presented. In the “5+1” scenario, the optimal design does not exist due to sample size constraints and the prespecified goal for the marginal power to be at least 80%.
5 Conclusion
The popularity of platform trials has increased in recent years. However, because of the complexity of such designs, many design-related questions remain, and the use of platform trials is still limited, especially in the confirmatory late-phase setting. To facilitate the use of platform trials, we propose an optimal two-period multiarm platform design that minimizes the total sample size while controlling the FWER or the PWER. Instead of adding new arms without end, this type of trial considers two periods, before and after new experimental arms are added. Each period can have one or more experimental arms, and a common control arm is shared by both periods. A two-period multiarm platform trial is particularly useful in the single-institution setting and is a special type of MAMS platform design.
In this paper, to meet registrational purposes, we systematically described how to control the FWER or PWER when adding new arms, how to re-estimate the sample size to achieve the desired power, and how to determine the optimal allocation ratios. Numerical evaluations were conducted to comprehensively examine the properties of the proposed design. The advantage of this design over conducting separate multiarm trials is that it reduces the sample size and uses a shared infrastructure. We also provide a step-by-step tutorial in Appendix A.2 that demonstrates how to use the R package PlatformDesign.
In this paper, we considered conducting the main analyses using only the concurrent controls. Because osteosarcoma is a relatively rare disease, patient accrual can take a long time. Therefore, we need to be careful about the potential changes in treatment effect over time. For clinical studies with relatively faster accrual rates, the difference between the two periods may not be substantial. In those cases, a nonconcurrent control may still be used. Nevertheless, in the design of our pediatric osteosarcoma study, we plan to include all control arm data for sensitivity analyses (i.e., to increase the estimation precision and power). There are three rationales for using a pooled control arm: (1) Pediatric osteosarcoma is a rare disease, so patients are scarce. (2) If the timing of adding new arms is early and the medical landscape is stable, there is little concern about any potential shift in the treatment effect over time. (3) The nonconcurrent control is essentially part of the control arm. Those patients are enrolled in the same study, screened with the same inclusion/exclusion criteria, and participate at the same institution, just like the concurrent controls. How to use nonconcurrent controls has been described in many papers [15, 6, 19, 18].
We also examined the timing of adding new arms in platform trials because practical guidance on deciding when to add and close arms will help increase the uptake of this approach. However, we have focused primarily on the statistical aspects of adding arms. The optimal timing of adding (or closing) arms in platform trials depends on the clinical context, the nature of the interventions, and the capability of stakeholders to deliver amendments [12]. It should also be noted that the decision of whether to add a new treatment arm to a multiarm study in the two-period setting (called the two-stage setting in [11]) has been discussed within a decision-theoretic framework.
Future work will involve extending the current method to a multiperiod, multiarm, multistage setting. Such a design will include more than two periods and will be not only multiarm but also multistage, allowing arms to be closed early or to graduate at interim analyses. In this paper, we did not consider time trends, that is, effects of a treatment (either an experimental or the control treatment) that may vary with time; this is a particular concern because the study period of a platform trial is often longer than that of a fixed trial. Time trends can arise, for example, when there is a learning curve among the study personnel or when the standard of care changes over time. In the future, we may incorporate models for time trends into the proposed framework and study how to use nonconcurrent control data when time trends are present.