The New England Journal of Statistics in Data Science


Three-Outcome Dual-Criterion Randomized Phase II Clinical Trial Design
Yujia Wang   Xiaohan Chi   Ruitao Lin  

https://doi.org/10.51387/25-NEJSDS83
Pub. online: 7 May 2025      Type: Methodology Article      Open Access
Area: Statistical Methodology

Accepted
3 March 2025
Published
7 May 2025

Abstract

The high cost of drug development and the relatively low success rates of phase III clinical trials highlight the need for improved and reasonably sized phase II trial designs, especially when the responses observed in the treatment and control arms do not lead to a clear-cut decision on whether further studies are warranted. To this end, we propose a three-outcome dual-criterion randomized (TDR) trial design, which sculpts an inconclusive region using boundaries defined by both the statistical significance of the difference between treatment and control and the clinical relevance of the treatment response. We provide statistical justifications for the TDR design in both one-stage and two-stage trial settings. Additionally, we evaluate its operating characteristics through a comparison with existing designs. The proposed design is shown to achieve sample size savings and type II error reduction while controlling the type I error, at a marginal cost in power. Lastly, robustness under various deviations from the assumed control response rate is also demonstrated.

1 Introduction

With the recent development in cancer treatment and regulations, a greater focus has been placed on randomized designs in phase II cancer clinical trials. In general, phase II trials are a vital step in oncology drug development. A phase II clinical trial should screen out inefficacious agents while warranting subsequent large-scale phase III clinical trials when a treatment demonstrates safety and efficacy [9, 13]. However, the rate of success in phase III trials is generally low [7], highlighting the need for more careful decision-making in phase II trials [17]. A plethora of trial designs have been proposed for phase II trials, including the commonly used Simon’s two-stage design [19], historical controls [1], the reference control arm design [5], the pick-the-winner design [18], and the screening design [14]. However, these methods, especially those that utilize a single-armed design, are subject to potential biases due to shifts in patient selection and evolving standards of care [13]. Conventional hypothesis testing results in two possible outcomes: either rejecting or accepting the null hypothesis, which often poses a dilemma for investigators, especially when the observed responses fall near the decision boundaries. In such scenarios, the dichotomous framework requires a definitive acceptance or rejection of one hypothesis over the other, despite the inherent uncertainty in observed data. Considering the high cost of drug development and long development phases [21], the impact of incorrect decisions can be substantial. In view of the issue, several three-outcome designs have been proposed. In a phase II trial setting, Storer [20] proposed a single-armed three-outcome design that allows for rejecting neither ${H_{0}}:p\le {p_{1}}$ nor ${H_{a}}:p\ge {p_{2}}$ when observed response rates fall between probabilities ${p_{1}}$ and ${p_{2}}$. This design optimizes the sample size to meet constraints on the probability of rejecting neither hypothesis. 
Sargent et al. [15] proposed an alternative three-outcome design with an inconclusive region defined by two cutoff points for observed responses. Building on this, Hong and Wang [6] extended Sargent’s design to a two-armed randomized controlled trial that controls design error rates and inconclusiveness probabilities, resulting in considerable sample size savings compared to traditional two-outcome designs.
Concurrently, researchers seek to enhance the validity and practicality of phase II trials by incorporating a second criterion of clinical relevance in decision rules. Fisch et al. [4] raised the question of whether statistical significance between treatment and control arms alone is sufficient to justify advancing to phase III trials, noting that a statistically significant but minor improvement may not warrant further investment. Thus, they proposed a proof-of-concept design where dual criteria of significance and relevance were evaluated. Subsequently, Litwin et al. [10] extended this approach to a two-stage randomized controlled trial method. They proposed early termination probabilities under the null and alternative hypotheses to derive the stage 1 sample size and employed an incremental search for stage 2 sample size determination. This method showed substantial sample size savings, yet we see merits in addressing the borderline response rates to further reduce false positives.
In this paper, we propose a randomized controlled phase II clinical trial design that considers both the uncertainty in trial outcomes and clinical relevance. By incorporating the inconclusive regions in the hypothesis testing framework, the proposed design allows practical considerations such as clinical, regulatory, and commercial decision-making. The adoption of the dual criteria further ensures the predicted power to warrant a phase III trial and reduces the type I error. The remainder of the paper is organized as follows. In Section 2, we propose three-outcome dual-criterion randomized designs with controls on the inconclusive region, presented in both one-stage and two-stage manners. We also describe the sample size determination procedure and introduce a loss function for optimizing the design parameters. In Section 3, we evaluate the proposed method numerically and compare it with existing methods. In Section 4, we apply the proposed design to data from the VIT-0910 trial. The paper is concluded with discussions in Section 5. The TDR sample size calculation program in the form of R code is available online at https://github.com/ywangaz/TDRdesign.

2 Methods

In this section, we describe a three-outcome dual-criterion randomized (TDR) design for phase II trials with binary efficacy endpoints. The design aims to attain sample size savings while controlling type I and type II errors as well as maintaining adequate statistical power. This is achieved by sculpting the hypothesis rejection region, taking both statistical significance and clinical relevance into account. The probability of incorrectly rejecting either the null hypothesis or the alternative hypothesis is reduced by introducing an inconclusive region, which allows comprehensive considerations of other aspects in addition to statistical significance and clinical relevance in drug development when observed results are borderline. We first focus on the TDR design in a one-stage trial setting, and the design in a two-stage trial setting will also be discussed. For simplicity, the 1:1 randomization is illustrated in this paper, and the design can be applied to other randomization schemes where appropriate.

2.1 TDR One-Stage Design

In phase II trials with binary efficacy endpoints, let ${p_{E}}$ and ${p_{C}}$ denote the true response rates for the experimental arm and the control arm, and ${\hat{p}_{E}}$ and ${\hat{p}_{C}}$ denote the observed response rates. We aim to address the two-sample test with the following hypotheses:
\[\begin{aligned}{}& {H_{0}}:{p_{E}}={p_{C}}={p_{0}},\\ {} & {H_{a}}:{p_{E}}={p_{1}},\hspace{2.5pt}{p_{C}}={p_{0}},\hspace{2.5pt}{p_{1}}\gt {p_{0}}.\end{aligned}\]
Traditional approaches for the two-sample test rely solely on between-arm comparisons and yield only binary trial outcomes: either rejecting or accepting ${H_{0}}$. To reduce the required sample size and draw more meaningful conclusions from two-sample tests, we combine one-sample rejection decision rules with traditional two-sample rules and introduce a statistically inconclusive region.
In a one-stage setting, assume ${n_{E}}$ patients are recruited to the experimental arm and ${n_{C}}$ patients are recruited to the control arm, leading to a total of $N={n_{E}}+{n_{C}}$ patients. Let ${y_{E}}$ and ${y_{C}}$ denote the number of patients demonstrating successful responses in the experimental and control arm, respectively. The decision rules for reaching one of the three outcomes of rejecting ${H_{0}}$, rejecting ${H_{a}}$, or rejecting neither are defined as follows:
(2.1)
\[\begin{aligned}{}& \text{If}\hspace{2.5pt}{\hat{p}_{E}}-{\hat{p}_{C}}\ge {p_{s}}\cap {\hat{p}_{E}}\ge {p_{m}}\text{, reject}\hspace{2.5pt}{H_{0}};\\ {} & \text{If}\hspace{2.5pt}{\hat{p}_{E}}-{\hat{p}_{C}}\lt {p_{s}}\text{, reject}\hspace{2.5pt}{H_{a}};\\ {} & \text{If}\hspace{2.5pt}{\hat{p}_{E}}-{\hat{p}_{C}}\hspace{-0.1667em}\ge \hspace{-0.1667em}{p_{s}}\cap {\hat{p}_{E}}\hspace{-0.1667em}\lt \hspace{-0.1667em}{p_{m}}\text{, declare statistically inconclusive},\end{aligned}\]
where ${p_{s}}$ denotes the statistical significance boundary (${p_{s}}\gt 0$), and ${p_{m}}$ denotes the clinical relevance boundary (${p_{m}}\gt 0$). Given a 1:1 randomization, the decision rules in Equation (2.1) can be simplified by assuming ${n_{E}}={n_{C}}=N/2$, that is
\[\begin{aligned}{}& \text{If}\hspace{2.5pt}{y_{E}}-{y_{C}}\ge s\cap {y_{E}}\ge m\text{, reject}\hspace{2.5pt}{H_{0}};\\ {} & \text{If}\hspace{2.5pt}{y_{E}}-{y_{C}}\lt s\text{, reject}\hspace{2.5pt}{H_{a}};\\ {} & \text{If}\hspace{2.5pt}{y_{E}}-{y_{C}}\ge s\cap {y_{E}}\lt m\text{, declare statistically inconclusive},\end{aligned}\]
where $s=\frac{N}{2}{p_{s}}$ and $m=\frac{N}{2}{p_{m}}$. We refer to this design as a 2-by-2 TDR design, because the decision rules for statistical significance and clinical relevance each contain two regions: ${y_{E}}-{y_{C}}\ge s$ or ${y_{E}}-{y_{C}}\lt s$ for statistical significance, and ${y_{E}}\ge m$ or ${y_{E}}\lt m$ for clinical relevance. Under the independence and normality assumptions, the conditions of the decision rules are equivalent to constructing two z-test statistics.
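The count-scale decision rules above can be sketched as a small function. This is a minimal illustration under 1:1 randomization; the function name and the example boundaries ($s=1$, $m=4$) are chosen here for illustration and are not taken from the authors' R program.

```python
def tdr_one_stage_decision(y_e: int, y_c: int, s: float, m: float) -> str:
    """Classify a one-stage 2-by-2 TDR result from response counts.

    y_e, y_c: responders in the experimental and control arms.
    s: statistical significance boundary on y_e - y_c.
    m: clinical relevance boundary on y_e.
    """
    if y_e - y_c < s:
        # Difference not large enough: reject the alternative hypothesis.
        return "reject Ha"
    if y_e >= m:
        # Significant difference and clinically relevant response.
        return "reject H0"
    # Significant difference, but the experimental response is below m.
    return "inconclusive"


# Illustrative boundaries only (hypothetical example values).
for y_e, y_c in [(6, 2), (3, 1), (4, 4)]:
    print(y_e, y_c, tdr_one_stage_decision(y_e, y_c, s=1, m=4))
```

The three branches are mutually exclusive and exhaustive, so every pair $({y_{E}},{y_{C}})$ maps to exactly one of the three outcomes.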
The statistically inconclusive region is reserved for the situation where there are substantial differences between the experimental arm and the control, while the observed responses in the experimental arm are suboptimal in terms of the historical control rate. This may occur when the trial population differs from the population used to derive the historical control rate. In this case, the clinical decision regarding whether to proceed to a phase III trial or terminate the current trial requires more deliberation. Primary investigators and statisticians should comprehensively review factors such as regulatory requirements, commercial potential, and practicality of administering the treatment, in order to decide on warranting further investigations.
In our one-stage TDR design, we use exact binomial probabilities to calculate the type I error α, type II error β, and inconclusive region probabilities η under ${H_{0}}$ and γ under ${H_{a}}$, as follows:
\[\begin{aligned}{}\alpha & =\sum \limits_{{D_{1}}\cap {D_{2}}}B({y_{E}},{n_{E}},{p_{E}}\mid {H_{0}})B({y_{C}},{n_{C}},{p_{C}}\mid {H_{0}}),\\ {} \beta & =\sum \limits_{{D^{\prime }_{1}}}B({y_{E}},{n_{E}},{p_{E}}\mid {H_{a}})B({y_{C}},{n_{C}},{p_{C}}\mid {H_{a}}),\\ {} \eta & =\sum \limits_{{D_{1}}\cap {D^{\prime }_{2}}}B({y_{E}},{n_{E}},{p_{E}}\mid {H_{0}})B({y_{C}},{n_{C}},{p_{C}}\mid {H_{0}}),\\ {} \gamma & =\sum \limits_{{D_{1}}\cap {D^{\prime }_{2}}}B({y_{E}},{n_{E}},{p_{E}}\mid {H_{a}})B({y_{C}},{n_{C}},{p_{C}}\mid {H_{a}}),\end{aligned}\]
where ${D_{1}}=\{({y_{E}},{y_{C}}):{y_{E}}-{y_{C}}\ge s\}$, ${D_{2}}=\{{y_{E}}:{y_{E}}\ge m\}$, and ${D^{\prime }_{1}}$ and ${D^{\prime }_{2}}$ denote the complementary sets of ${D_{1}}$ and ${D_{2}}$. In addition, we introduce $\lambda =(\eta +\gamma )/2$ as the expected inconclusive probability. This definition is based on a common assumption that the unknown true response rates of both arms vary uniformly between the null and alternative hypotheses. The statistically inconclusive regions under ${H_{0}}$ and ${H_{a}}$ can be controlled simultaneously by constraining γ and λ instead of γ and η to prevent highly imbalanced inconclusive regions under ${H_{0}}$ and ${H_{a}}$. To provide finer control over inconclusive regions, it is possible to introduce both upper and lower boundaries for determining the statistical significance. This extension is referred to as a 3-by-2 TDR design, with three regions for statistical significance (i.e. ${y_{E}}-{y_{C}}\ge s$, $r\lt {y_{E}}-{y_{C}}\lt s$, or ${y_{E}}-{y_{C}}\le r$) and two regions for clinical relevance (i.e. ${y_{E}}\ge m$ or ${y_{E}}\lt m$) in the decision rules. A detailed description is provided in Section 1.1 of the Supplementary Materials.
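The exact binomial sums for the 2-by-2 design can be evaluated by enumerating all pairs $({y_{E}},{y_{C}})$. The sketch below is one straightforward way to do this, assuming independent binomial responses in the two arms; the function names are hypothetical and the code is not the authors' implementation.

```python
from math import comb


def binom_pmf(y, n, p):
    """Binomial probability B(y, n, p) of y responses among n patients."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def region_probs(n_e, n_c, p_e, p_c, s, m):
    """Exact probabilities of (reject H0, reject Ha, inconclusive)."""
    reject_h0 = reject_ha = inconclusive = 0.0
    for y_e in range(n_e + 1):
        for y_c in range(n_c + 1):
            prob = binom_pmf(y_e, n_e, p_e) * binom_pmf(y_c, n_c, p_c)
            if y_e - y_c < s:      # complement of D1
                reject_ha += prob
            elif y_e >= m:         # D1 and D2
                reject_h0 += prob
            else:                  # D1 and complement of D2
                inconclusive += prob
    return reject_h0, reject_ha, inconclusive


def operating_characteristics(n_e, n_c, p0, p1, s, m):
    """Return alpha, beta, eta, gamma, lambda and power for a 2-by-2 design."""
    alpha, _, eta = region_probs(n_e, n_c, p0, p0, s, m)       # under H0
    power, beta, gamma = region_probs(n_e, n_c, p1, p0, s, m)  # under Ha
    return {"alpha": alpha, "beta": beta, "eta": eta, "gamma": gamma,
            "lambda": (eta + gamma) / 2, "power": power}


# Settings of the first row of Table 1: N = 44 (22 per arm), s = 1, m = 4.
oc = operating_characteristics(22, 22, p0=0.10, p1=0.25, s=1, m=4)
print({name: round(value, 2) for name, value in oc.items()})
```

With these inputs the printed values should be close to the first row of Table 1, and the power returned equals $1-\beta -\gamma $ by construction.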

2.2 TDR Two-Stage Design

In this section, we extend the proposed design to a two-stage trial setting, so that a trial can be stopped early on ethical grounds when there is insufficient evidence of efficacy. In stage 1, we enroll and randomize ${n_{C1}}$ patients to the control arm and ${n_{E1}}$ patients to the experimental arm. If the trial is not stopped early, we proceed to stage 2, where ${n_{C2}}$ and ${n_{E2}}$ patients are randomized to the control and experimental arms, respectively. We denote ${N_{1}}={n_{C1}}+{n_{E1}}$ and ${N_{2}}={n_{C2}}+{n_{E2}}$ as the total sample sizes in stages 1 and 2, respectively. The numbers of responses observed in stage 1 (or stage 2) are denoted as ${y_{E1}}$ and ${y_{C1}}$ (or ${y_{E2}}$ and ${y_{C2}}$) for the experimental and control arms, respectively. At the conclusion of stage 1, an interim analysis is performed, and the trial proceeds to stage 2 if
(2.2)
\[ {y_{E1}}-{y_{C1}}\gt {s_{1}}\hspace{0.2778em}\hspace{0.2778em}\mathrm{and}\hspace{0.2778em}\hspace{0.2778em}{y_{E1}}\ge {m_{1}},\]
where ${s_{1}}$ is a statistical difference threshold for early stopping and ${m_{1}}$ is a clinical relevance threshold for early stopping. Note that the inconclusive regions are excluded in the interim analysis for simplicity. We denote the probabilities of proceeding to stage 2 under ${H_{0}}$ and ${H_{a}}$ as $\Pr ({S_{1}}\mid {H_{0}})$ and $\Pr ({S_{1}}\mid {H_{a}})$, respectively, where ${S_{1}}=\{{n_{E1}},{n_{C1}},{s_{1}},{m_{1}}:{y_{E1}}-{y_{C1}}\gt {s_{1}}\cap {y_{E1}}\ge {m_{1}}\}$ represents the condition (2.2). In our design, we propose to control
\[ \Pr ({S_{1}}\mid {H_{0}})\le 1-e{s_{0}}\hspace{0.2778em}\hspace{0.2778em}\mathrm{and}\hspace{0.2778em}\hspace{0.2778em}\Pr ({S_{1}}\mid {H_{a}})\ge 1-e{s_{1}},\]
where $e{s_{0}}$ and $e{s_{1}}$ are early stopping probability levels under the null and alternative hypotheses, respectively. In this paper, we set $e{s_{0}}=0.50$ and $e{s_{1}}=0.05$ as reasonable constraints.
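The continuation probabilities $\Pr ({S_{1}}\mid {H_{0}})$ and $\Pr ({S_{1}}\mid {H_{a}})$ can be computed exactly by summing binomial probabilities over the stage 1 outcomes that satisfy condition (2.2). The following sketch assumes independent binomial responses; the function names and any stage 1 sizes or thresholds passed to them are hypothetical.

```python
from math import comb


def binom_pmf(y, n, p):
    """Binomial probability of y responses among n patients."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def prob_proceed(n_e1, n_c1, p_e, p_c, s1, m1):
    """Pr(S1): probability that y_E1 - y_C1 > s1 and y_E1 >= m1."""
    total = 0.0
    for y_e in range(n_e1 + 1):
        if y_e < m1:
            continue  # fails the clinical relevance threshold
        for y_c in range(n_c1 + 1):
            if y_e - y_c > s1:
                total += binom_pmf(y_e, n_e1, p_e) * binom_pmf(y_c, n_c1, p_c)
    return total


def meets_early_stopping_constraints(n_e1, n_c1, p0, p1, s1, m1,
                                     es0=0.50, es1=0.05):
    """Check Pr(S1 | H0) <= 1 - es0 and Pr(S1 | Ha) >= 1 - es1."""
    under_h0 = prob_proceed(n_e1, n_c1, p0, p0, s1, m1)
    under_ha = prob_proceed(n_e1, n_c1, p1, p0, s1, m1)
    return under_h0 <= 1 - es0 and under_ha >= 1 - es1
```

A candidate stage 1 configuration $({n_{E1}},{n_{C1}},{s_{1}},{m_{1}})$ would be retained only if `meets_early_stopping_constraints` returns `True` for the chosen $e{s_{0}}$ and $e{s_{1}}$.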
In stage 2, the proposed design will proceed as described in Section 2.1, with type I error α, type II error β, and inconclusive region probabilities η under ${H_{0}}$ and γ under ${H_{a}}$ defined as below:
\[\begin{aligned}{}\alpha & =\sum \limits_{{D_{1}}\cap {D_{2}}}\hspace{-0.1667em}B({y_{E2}},{n_{E2}},{p_{E}}\mid {H_{0}})B({y_{C2}},{n_{C2}},{p_{C}}\mid {H_{0}})\Pr ({S_{1}}\mid {H_{0}}),\\ {} \beta & =\sum \limits_{{D^{\prime }_{1}}}B({y_{E2}},{n_{E2}},{p_{E}}\mid {H_{a}})B({y_{C2}},{n_{C2}},{p_{C}}\mid {H_{a}})\Pr ({S_{1}}\mid {H_{a}}),\\ {} \eta & =\sum \limits_{{D_{1}}\cap {D^{\prime }_{2}}}\hspace{-0.1667em}B({y_{E2}},{n_{E2}},{p_{E}}\mid {H_{0}})B({y_{C2}},{n_{C2}},{p_{C}}\mid {H_{0}})\Pr ({S_{1}}\mid {H_{0}}),\\ {} \gamma & =\sum \limits_{{D_{1}}\cap {D^{\prime }_{2}}}\hspace{-0.1667em}B({y_{E2}},{n_{E2}},{p_{E}}\mid {H_{a}})B({y_{C2}},{n_{C2}},{p_{C}}\mid {H_{a}})\Pr ({S_{1}}\mid {H_{a}}),\end{aligned}\]
where ${D_{1}}=\{({y_{E2}},{y_{C2}}):{y_{E2}}-{y_{C2}}\ge {s_{2}}-({y_{E1}}-{y_{C1}})\}$, ${D_{2}}=\{{y_{E2}}:{y_{E2}}\ge {m_{2}}-{y_{E1}}\}$, ${s_{2}}$ is the statistical significance boundary (${s_{2}}\gt {y_{E1}}-{y_{C1}}$), and ${m_{2}}$ is the clinical relevance boundary (${m_{2}}\gt {y_{E1}}$).

2.3 Sample Size Determination

With the introduction of the inconclusive region, the power of the TDR design is defined as $\pi =1-\beta -\gamma $, which is the probability of rejecting ${H_{0}}$ when ${H_{a}}$ is true. Given a specific minimum target power ${\pi _{\mathrm{min}}}$ and maximum type II error level ${\beta _{\mathrm{max}}}$, we control the inconclusive region probabilities γ and λ under their maximum allowable constraints ${\gamma _{\mathrm{max}}}$ and ${\lambda _{\mathrm{max}}}$. As a result, there might be multiple sets of $(\alpha ,\beta ,\gamma ,\lambda )$ satisfying these requirements. For example, given ${\pi _{\mathrm{min}}}$ and ${\beta _{\mathrm{max}}}$, the maximum allowable inconclusive probability under ${H_{a}}$ is given by ${\gamma _{\mathrm{max}}}=1-({\beta _{\mathrm{max}}}+{\pi _{\mathrm{min}}})$. To ensure proper control of the trial’s inconclusive probabilities, a feasible γ should not exceed this threshold (i.e., $\gamma \le {\gamma _{\mathrm{max}}}$). Similarly, ${\lambda _{\mathrm{max}}}$ is given by ${\lambda _{\mathrm{max}}}=({\eta _{\mathrm{max}}}+{\gamma _{\mathrm{max}}})/2$, where ${\eta _{\mathrm{max}}}$ represents the maximum target level for the inconclusive probability η. To facilitate parameter search under these constraints, we propose a loss function for systematic evaluation, which is discussed in Section 2.4.
The sample size for the proposed one-stage TDR design is determined using a two-step approach. Firstly, for each candidate sample size, sets of parameters $\{s,m,{n_{C}},{n_{E}}\}$ are obtained through an incremental search over a grid of design parameters $\{(\alpha ,\beta ,\gamma ,\lambda ,\pi ):\alpha \le {\alpha _{\mathrm{max}}},\beta \le {\beta _{\mathrm{max}}},\gamma \le {\gamma _{\mathrm{max}}},\lambda \le {\lambda _{\mathrm{max}}},\pi \ge {\pi _{\mathrm{min}}}\}$, where ${\alpha _{\mathrm{max}}}$ denotes the maximum target type I error level. For each given set of $(\alpha ,\beta ,\gamma ,\lambda ,\pi )$ satisfying these constraints, we search for all possible pairs of s and m within pre-defined searching regions. Reasonable regions for s and m can be
\[\begin{aligned}{}& \big\{s:s\in \big[{p_{C}}({n_{E}}-k{n_{C}}),{p_{E}}{n_{E}}-k{p_{C}}{n_{C}}+4{\sigma _{\mathrm{pool}}}\big],\\ {} & \hspace{1em}-{n_{C}}\le s\le {n_{E}}\big\},\\ {} & \big\{m:m\in [{p_{C}}{n_{E}},{p_{E}}{n_{E}}+4{\sigma _{E}}],m\le {n_{E}}\big\},\end{aligned}\]
where k is the randomization ratio, ${\sigma _{\mathrm{pool}}}$ is the pooled standard error, and ${\sigma _{E}}$ is the standard error under ${H_{a}}$. The optimal $(s,m)$ is then obtained by selecting the pair with the smallest type I error α. Secondly, among all candidate sample sizes, the optimal design parameters $(\alpha ,\beta ,\gamma ,\lambda ,\pi )$ and the corresponding sample size are selected through a loss function that balances the trade-off between trial power and sample size. We provide a systematic evaluation of sample size and power using the loss function described in Section 2.4. The parameters yielding the smallest loss score are then selected as the optimal parameters.
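The first step of the search, for one candidate sample size, can be sketched as follows. For simplicity this illustration enumerates every integer pair $(s,m)$ under 1:1 randomization instead of restricting to the bounded search regions given above, and it applies the power floor ${\pi _{\mathrm{min}}}-c$ directly; the function names are hypothetical and the code is not the authors' program.

```python
from math import comb


def pmf_vector(n, p):
    """Binomial probabilities for 0..n responses."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]


def region_probs(pmf_e, pmf_c, s, m):
    """Exact probabilities of (reject H0, reject Ha, inconclusive)."""
    reject_h0 = reject_ha = inconclusive = 0.0
    for y_e, prob_e in enumerate(pmf_e):
        for y_c, prob_c in enumerate(pmf_c):
            if y_e - y_c < s:
                reject_ha += prob_e * prob_c
            elif y_e >= m:
                reject_h0 += prob_e * prob_c
            else:
                inconclusive += prob_e * prob_c
    return reject_h0, reject_ha, inconclusive


def best_boundaries(n, p0, p1, alpha_max, beta_max, gamma_max,
                    lambda_max, pi_floor):
    """Among feasible integer (s, m), return the pair with the smallest alpha."""
    null_pmf, alt_pmf = pmf_vector(n, p0), pmf_vector(n, p1)
    best = None
    for s in range(-n, n + 1):
        for m in range(n + 1):
            alpha, _, eta = region_probs(null_pmf, null_pmf, s, m)
            power, beta, gamma = region_probs(alt_pmf, null_pmf, s, m)
            lam = (eta + gamma) / 2
            feasible = (alpha <= alpha_max and beta <= beta_max
                        and gamma <= gamma_max and lam <= lambda_max
                        and power >= pi_floor)
            if feasible and (best is None or alpha < best["alpha"]):
                best = {"s": s, "m": m, "alpha": alpha, "beta": beta,
                        "gamma": gamma, "eta": eta, "lambda": lam,
                        "power": power}
    return best


# 22 patients per arm, constraints as in the first row of Table 1.
print(best_boundaries(22, 0.10, 0.25, alpha_max=0.20, beta_max=0.20,
                      gamma_max=0.08, lambda_max=0.20, pi_floor=0.75))
```

Repeating this search over a range of candidate sample sizes gives one optimal $(s,m)$ per sample size, which is the input to the second, loss-based step.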
The proposed one-stage design is expected to reduce type I error by introducing the inconclusive region, compared to the design by [10], at the same sample size. This occurs when the difference in the number of responses between the two arms is larger than s, but the number of observed responses in the experimental arm is less than m. Theoretically, given a fixed sample size N, the relationships between $(s,m)$ and the associated error rates (or inconclusive probabilities) can be summarized as follows: when m is kept constant, increasing s reduces α, increases β, and reduces γ and η; when s is kept constant, increasing m reduces α, increases γ and η, and has no effect on β. Under ${H_{0}}$, the criterion of clinical relevance has less effect compared to its effect under ${H_{a}}$. This is because, when ${p_{E}}={p_{C}}$, it is more likely that ${y_{E}}\lt m$. For the same reason, η is generally larger than γ.
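The stated relationships between $(s,m)$ and the error rates can be checked numerically for any fixed sample size. The sketch below does so for one illustrative configuration (22 patients per arm, ${p_{C}}=0.10$, ${p_{E}}=0.25$); the helper names are hypothetical.

```python
from math import comb


def pmf_vector(n, p):
    """Binomial probabilities for 0..n responses."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]


def tdr_rates(n, p0, p1, s, m):
    """Return (alpha, beta, gamma, eta) for a 1:1 one-stage 2-by-2 design."""
    null_pmf, alt_pmf = pmf_vector(n, p0), pmf_vector(n, p1)
    alpha = beta = gamma = eta = 0.0
    for y_e in range(n + 1):
        for y_c in range(n + 1):
            prob_h0 = null_pmf[y_e] * null_pmf[y_c]
            prob_ha = alt_pmf[y_e] * null_pmf[y_c]
            if y_e - y_c < s:
                beta += prob_ha       # reject Ha when Ha is true
            elif y_e >= m:
                alpha += prob_h0      # reject H0 when H0 is true
            else:
                eta += prob_h0        # inconclusive under H0
                gamma += prob_ha      # inconclusive under Ha
    return alpha, beta, gamma, eta


base = tdr_rates(22, 0.10, 0.25, s=1, m=4)
larger_s = tdr_rates(22, 0.10, 0.25, s=2, m=4)
larger_m = tdr_rates(22, 0.10, 0.25, s=1, m=5)
for label, rates in [("s=1, m=4", base), ("s=2, m=4", larger_s),
                     ("s=1, m=5", larger_m)]:
    print(label, [round(r, 3) for r in rates])
```

Comparing the first and second lines of output shows the effect of increasing $s$ at fixed $m$, and comparing the first and third shows the effect of increasing $m$ at fixed $s$.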

2.4 Loss Function

Previous studies have established the practice of optimizing trial design using loss functions, such as [8, 12, 16], among others. In this design, we propose using a loss function to systematically evaluate the effects of inconclusive region constraints to optimize both sample size and power. For each sample size and its corresponding optimal design parameters $(\alpha ,\beta ,\gamma ,\lambda ,\pi )$, we calculate a loss score at the sample size n and power π with respect to a reference sample size ${n_{0}}$ and a reference power ${\pi _{0}}$ by a loss function $L(n,\pi ,{n_{0}},{\pi _{0}})$. The reference sample size ${n_{0}}$ is the sample size per arm calculated using a standard two-group sample size calculation under the same hypothesis as the TDR design. The reference power ${\pi _{0}}$ is the corresponding power calculated in the reference sample size method, i.e. the probability of correctly accepting the alternative hypothesis of the reference sample size method. The primary principle of the loss function is to penalize an increase in sample size and a reduction in power. According to [11], a loss function should meet the following criteria:
  1. Monotonicity: at a fixed sample size n or a fixed power π, the loss score should increase monotonically as power decreases or as sample size increases.
    \[\begin{aligned}{}L(n,{\pi _{1}},{n_{0}},{\pi _{0}})& \gt L(n,{\pi _{2}},{n_{0}},{\pi _{0}})\Leftrightarrow {\pi _{1}}\lt {\pi _{2}};\\ {} L({n_{1}},\pi ,{n_{0}},{\pi _{0}})& \gt L({n_{2}},\pi ,{n_{0}},{\pi _{0}})\Leftrightarrow {n_{1}}\gt {n_{2}}.\end{aligned}\]
  2. Scale invariance: proportional scaling in sample size n and reference sample size ${n_{0}}$ at the same power, or proportional scaling in power π and reference power ${\pi _{0}}$ at the same sample size, should produce the same loss score.
    \[\begin{aligned}{}L(n,\pi ,{n_{0}},{\pi _{0}})& =L(c\cdot n,\pi ,c\cdot {n_{0}},{\pi _{0}});\\ {} L(n,\pi ,{n_{0}},{\pi _{0}})& =L(n,d\cdot \pi ,{n_{0}},d\cdot {\pi _{0}}),\end{aligned}\]
for all $c\gt 0$ and $d\gt 0$ such that $d\cdot \pi \le 1$ and $d\cdot {\pi _{0}}\le 1$. Additional design considerations include interpretability and boundedness within $(0,1)$.
Based on the criteria discussed above, we propose the following loss function:
\[ L(n,\pi ,{n_{0}},{\pi _{0}})=\sigma \bigg(w\frac{n}{{n_{0}}}+(1-w)\frac{{\pi _{0}}}{\pi }-1\bigg),\]
where w is a weighting parameter, and $\sigma (\cdot )$ is a link function defined as $\sigma (x)=\frac{1}{1+{e^{-x}}}$. The link function $\sigma (\cdot )$ scales the loss score to the range of $(0,1)$. The parameter w determines the trade-off between reducing sample size and increasing power. We performed a sensitivity analysis on w and found that the sample size and power were invariant to w when $w\gt 0.4$ (Supplementary Figure S1). Within the range of $w\le 0.4$, a larger w gives greater priority to reducing the sample size, while a smaller w prioritizes increasing power. Therefore, we recommend using $w=0.5$, which assigns equal importance to sample size reduction and power improvement and provides an optimized balance between sample size and power. We recommend calculating ${n_{0}}$ using the standard sample size formula for testing ${H_{0}}:{p_{E}}-{p_{C}}=0$ and ${H_{a}}:{p_{E}}-{p_{C}}\gt 0$ [2]. When the sample size is smaller than that of the standard two-sample test and the power is greater, which is the most desirable scenario, the sum of the first two components is smaller than 1, resulting in a loss score smaller than 0.5; when the sample size and power match those of the standard two-sample test, the loss score equals 0.5; when the sample size is larger and the power is lower, the loss score is greater than 0.5, which is considered undesirable.
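The loss function is simple to evaluate directly. The sketch below implements the formula as written and checks the three reference cases described above; the function name and the numerical inputs are illustrative only.

```python
from math import exp


def tdr_loss(n, power, n_ref, power_ref, w=0.5):
    """Loss L(n, pi, n0, pi0) = sigmoid(w*n/n0 + (1-w)*pi0/pi - 1)."""
    x = w * n / n_ref + (1 - w) * power_ref / power - 1
    return 1 / (1 + exp(-x))


# Same sample size and power as the reference: the loss is 0.5.
print(tdr_loss(50, 0.80, 50, 0.80))
# Smaller sample size and higher power: the loss falls below 0.5.
print(tdr_loss(40, 0.85, 50, 0.80))
# Larger sample size and lower power: the loss rises above 0.5.
print(tdr_loss(60, 0.75, 50, 0.80))
```

Because only the ratios $n/{n_{0}}$ and ${\pi _{0}}/\pi $ enter the formula, the scale invariance criterion holds automatically, provided $n$ and ${n_{0}}$ are expressed on the same scale (both per arm or both total).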
The optimal parameters for the inconclusive regions are selected as the smallest solution set that minimizes the loss function. Firstly, given minimum target power ${\pi _{\mathrm{min}}}$ and sample size n, we specify a pair of $({\gamma _{\mathrm{max}}},{\lambda _{\mathrm{max}}})$. Then the optimal design sample size and the corresponding power $({N^{\ast }},{\pi ^{\ast }})$ are determined by minimizing the loss score. Formally,
(2.3)
\[ \big({N^{\ast }},{\pi ^{\ast }}\big)=\arg \underset{N\in Q}{\min }L(N,\pi ,{n_{0}},{\pi _{0}}|{\gamma _{\mathrm{max}}},{\lambda _{\mathrm{max}}}),\]
where Q represents the search set for the sample size. An example illustrating the determination of ${\gamma _{\mathrm{max}}}$ and ${\lambda _{\mathrm{max}}}$ is provided in Table S1 in the Supplementary Materials. In practice, we impose an additional constraint on power π to ensure that it remains at a relatively high level, requiring $\pi \ge {\pi _{\mathrm{min}}}-c$, where c is a constant (e.g., $c=0.05$). When comparing different trial configurations, the loss score provides a systematic evaluation of both sample size and power.

3 Numerical Studies

3.1 TDR One-Stage Design

Table 1
Optimal design parameters of the TDR one-stage 2-by-2 design with ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$.
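Column groups (left to right): Setting; Design Parameter.

| δ | ${p_{C}}$ | ${p_{E}}$ | ${\gamma _{\mathrm{max}}}$ | ${\lambda _{\mathrm{max}}}$ | s | m | N | π | β | α | γ | η | λ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.15 | 0.10 | 0.25 | 0.08 | 0.20 | 1 | 4 | 44 | 0.79 | 0.13 | 0.15 | 0.08 | 0.25 | 0.16 |
| 0.20 | 0.10 | 0.30 | 0.08 | 0.20 | 1 | 3 | 28 | 0.80 | 0.13 | 0.14 | 0.08 | 0.23 | 0.15 |
| 0.25 | 0.10 | 0.35 | 0.12 | 0.20 | 1 | 3 | 22 | 0.77 | 0.11 | 0.08 | 0.11 | 0.27 | 0.19 |
| 0.15 | 0.20 | 0.35 | 0.16 | 0.30 | 0 | 8 | 54 | 0.76 | 0.08 | 0.15 | 0.16 | 0.42 | 0.29 |
| 0.20 | 0.20 | 0.40 | 0.16 | 0.30 | 0 | 5 | 30 | 0.77 | 0.08 | 0.16 | 0.16 | 0.43 | 0.30 |
| 0.25 | 0.20 | 0.45 | 0.10 | 0.15 | 1 | 4 | 24 | 0.81 | 0.13 | 0.17 | 0.06 | 0.22 | 0.14 |
| 0.15 | 0.30 | 0.45 | 0.11 | 0.20 | 1 | 12 | 62 | 0.76 | 0.14 | 0.17 | 0.10 | 0.28 | 0.19 |
| 0.20 | 0.30 | 0.50 | 0.07 | 0.20 | 1 | 8 | 40 | 0.81 | 0.12 | 0.19 | 0.06 | 0.24 | 0.15 |
| 0.15 | 0.35 | 0.50 | 0.10 | 0.20 | 1 | 13 | 60 | 0.76 | 0.15 | 0.19 | 0.09 | 0.26 | 0.17 |
| 0.20 | 0.35 | 0.55 | 0.10 | 0.15 | 1 | 9 | 40 | 0.81 | 0.13 | 0.20 | 0.06 | 0.23 | 0.15 |
| 0.25 | 0.35 | 0.60 | 0.10 | 0.20 | 1 | 6 | 24 | 0.78 | 0.15 | 0.18 | 0.08 | 0.24 | 0.16 |
| 0.20 | 0.40 | 0.60 | 0.10 | 0.20 | 1 | 10 | 38 | 0.76 | 0.14 | 0.16 | 0.10 | 0.27 | 0.19 |
| 0.25 | 0.40 | 0.65 | 0.16 | 0.30 | 0 | 7 | 24 | 0.77 | 0.07 | 0.15 | 0.16 | 0.43 | 0.29 |
| 0.15 | 0.50 | 0.65 | 0.10 | 0.15 | 2 | 20 | 70 | 0.77 | 0.18 | 0.19 | 0.05 | 0.17 | 0.11 |
| 0.20 | 0.50 | 0.70 | 0.11 | 0.20 | 1 | 12 | 38 | 0.77 | 0.13 | 0.16 | 0.10 | 0.28 | 0.19 |
| 0.15 | 0.55 | 0.70 | 0.13 | 0.25 | 0 | 18 | 56 | 0.78 | 0.09 | 0.20 | 0.13 | 0.36 | 0.24 |
| 0.20 | 0.55 | 0.75 | 0.14 | 0.30 | 0 | 11 | 32 | 0.79 | 0.08 | 0.19 | 0.13 | 0.38 | 0.26 |
| 0.25 | 0.60 | 0.85 | 0.10 | 0.20 | 1 | 7 | 18 | 0.77 | 0.17 | 0.19 | 0.06 | 0.22 | 0.14 |
| 0.20 | 0.65 | 0.85 | 0.10 | 0.20 | 1 | 11 | 28 | 0.78 | 0.15 | 0.18 | 0.07 | 0.24 | 0.15 |
| 0.15 | 0.70 | 0.85 | 0.11 | 0.25 | 1 | 19 | 48 | 0.79 | 0.14 | 0.19 | 0.07 | 0.24 | 0.16 |
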
${\gamma _{\mathrm{max}}}$: design constraint for γ; ${\lambda _{\mathrm{max}}}$: design constraint for λ; s: statistical difference boundary; m: clinical relevance boundary; N: total sample size; π: power; α: type I error; β: type II error; γ: inconclusive probability under ${H_{a}}$; η: inconclusive probability under ${H_{0}}$; λ: average inconclusive probability under ${H_{0}}$ and ${H_{a}}$.
Figure 1
Comparison of TDR one-stage 2-by-2 with the HW method [6] and LBR method [10] under ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$. (A) Sample size reduction with respect to the conventional sample size calculation for two-sample proportions; (B) Comparison of operating characteristics power π, type I error α, and type II error β.
Figure 2
Comparison of TDR one-stage 2-by-2 with the HW method [6] and LBR method [10] under ${\alpha _{\mathrm{max}}}=0.10$, ${\beta _{\mathrm{max}}}=0.10$, ${\pi _{\mathrm{min}}}=0.90$, and $c=0.05$. (A) Sample size reduction with respect to the conventional sample size calculation for two-sample proportions; (B) Comparison of operating characteristics power π, type I error α, and type II error β.
Table 1 lists the optimal TDR one-stage 2-by-2 design parameters with varying differences in response rate, $\delta ={p_{E}}-{p_{C}}\in \{0.15,0.20,0.25\}$, under target levels ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$. In this table, ${\gamma _{\mathrm{max}}}$ and ${\lambda _{\mathrm{max}}}$ are design parameters corresponding to the optimal sample size, determined using the loss function with a weight parameter $w=0.50$, which equally weighs the importance of sample size and power. For each case of response rates, we employ a 1:1 randomization. The total sample size required for the trial is denoted by N with corresponding decision boundaries s and m. To evaluate the proposed design, we compare the operating characteristics of the TDR design with the method proposed by [6] (HW) and the method proposed by [10] (LBR). The comparison evaluates the percentage reduction in sample size relative to the conventional calculation for two-sample proportions under the same settings of type I error ${\alpha _{\mathrm{max}}}$, and type II error ${\beta _{\mathrm{max}}}$. The results are shown in Figure 1. Overall, the TDR design provides 26.7–51.7% sample size savings as compared to the conventional approach, while the HW method provides up to 22.7% sample size savings and the LBR method provides 20.0–42.6% sample size savings. In most of the scenarios, the proposed method outperforms the HW method and the LBR method with minimal loss of power. The TDR method generally yields higher power than the HW method and provides more sample size savings in all cases. Compared to the LBR method, the TDR method provides superior or comparable sample size reduction up to 13.8% except in one case. TDR achieved 0.3–12.3% type II error reduction at the cost of up to 6.3% power loss and gained 1.0% power in one case. In terms of the type I error, all three methods are comparable and are constrained below 0.20. 
In the one case where the TDR method does not show sample size saving compared to the LBR method, the type II error is 7.5% lower and the power is 1.1% higher at response rates of (0.30, 0.50). In four of the twenty cases, the HW method does not exhibit substantial sample size saving and is therefore not displayed.
Under more stringent requirements on type I and type II errors, i.e., ${\alpha _{\mathrm{max}}}=0.10$, ${\beta _{\mathrm{max}}}=0.10$, ${\pi _{\mathrm{min}}}=0.90$, and $c=0.05$, superior performance of the proposed method is observed in most cases. Table 2 provides details of the optimal trial design parameters. Comparisons of sample size reduction, as well as operating characteristics, are shown in Figure 2. Overall, the TDR design achieves 37.2–57.0% sample size reduction, the HW design achieves up to 34.0% sample size reduction, and the LBR design achieves 37.2–49.1% sample size reduction. The TDR method provides an additional 2.9–18.0% sample size savings as compared to the LBR method in 18 of the 20 cases. In the two cases where the calculated sample sizes are equal, TDR provides 3.1% and 1.2% reductions in type II error at response rates of (0.20, 0.45) and (0.65, 0.85), respectively.
Table 2
Optimal design parameters of the TDR one-stage 2-by-2 design with ${\alpha _{\mathrm{max}}}=0.10$, ${\beta _{\mathrm{max}}}=0.10$, ${\pi _{\mathrm{min}}}=0.90$, and $c=0.05$.
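Column groups (left to right): Setting; Design Parameter.

| δ | ${p_{C}}$ | ${p_{E}}$ | ${\gamma _{\mathrm{max}}}$ | ${\lambda _{\mathrm{max}}}$ | s | m | N | π | β | α | γ | η | λ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.15 | 0.10 | 0.25 | 0.10 | 0.25 | 1 | 7 | 76 | 0.86 | 0.05 | 0.08 | 0.09 | 0.35 | 0.22 |
| 0.20 | 0.10 | 0.30 | 0.10 | 0.20 | 1 | 5 | 48 | 0.87 | 0.05 | 0.08 | 0.07 | 0.32 | 0.20 |
| 0.25 | 0.10 | 0.35 | 0.10 | 0.20 | 1 | 4 | 34 | 0.88 | 0.05 | 0.08 | 0.07 | 0.31 | 0.19 |
| 0.15 | 0.20 | 0.35 | 0.10 | 0.25 | 1 | 15 | 106 | 0.86 | 0.05 | 0.09 | 0.09 | 0.36 | 0.23 |
| 0.20 | 0.20 | 0.40 | 0.11 | 0.25 | 1 | 10 | 64 | 0.87 | 0.05 | 0.08 | 0.08 | 0.35 | 0.22 |
| 0.25 | 0.20 | 0.45 | 0.10 | 0.15 | 2 | 9 | 54 | 0.90 | 0.06 | 0.07 | 0.05 | 0.24 | 0.14 |
| 0.15 | 0.30 | 0.45 | 0.10 | 0.25 | 1 | 23 | 120 | 0.86 | 0.05 | 0.10 | 0.09 | 0.36 | 0.22 |
| 0.20 | 0.30 | 0.50 | 0.10 | 0.25 | 1 | 15 | 72 | 0.86 | 0.05 | 0.09 | 0.09 | 0.36 | 0.23 |
| 0.15 | 0.35 | 0.50 | 0.10 | 0.25 | 1 | 27 | 124 | 0.86 | 0.05 | 0.10 | 0.09 | 0.37 | 0.23 |
| 0.20 | 0.35 | 0.55 | 0.11 | 0.30 | 0 | 16 | 68 | 0.86 | 0.04 | 0.10 | 0.11 | 0.45 | 0.28 |
| 0.25 | 0.35 | 0.60 | 0.10 | 0.25 | 1 | 11 | 44 | 0.86 | 0.06 | 0.10 | 0.08 | 0.34 | 0.21 |
| 0.20 | 0.40 | 0.60 | 0.10 | 0.20 | 2 | 20 | 78 | 0.86 | 0.07 | 0.09 | 0.06 | 0.27 | 0.17 |
| 0.25 | 0.40 | 0.65 | 0.10 | 0.15 | 2 | 13 | 48 | 0.86 | 0.09 | 0.10 | 0.05 | 0.23 | 0.14 |
| 0.15 | 0.50 | 0.65 | 0.10 | 0.25 | 1 | 37 | 126 | 0.86 | 0.05 | 0.10 | 0.09 | 0.37 | 0.23 |
| 0.20 | 0.50 | 0.70 | 0.12 | 0.30 | 0 | 22 | 70 | 0.86 | 0.03 | 0.09 | 0.11 | 0.46 | 0.29 |
| 0.15 | 0.55 | 0.70 | 0.12 | 0.30 | 1 | 39 | 122 | 0.86 | 0.05 | 0.09 | 0.09 | 0.37 | 0.23 |
| 0.20 | 0.55 | 0.75 | 0.10 | 0.25 | 1 | 23 | 68 | 0.86 | 0.05 | 0.09 | 0.09 | 0.36 | 0.23 |
| 0.25 | 0.60 | 0.85 | 0.10 | 0.20 | 2 | 16 | 42 | 0.86 | 0.09 | 0.08 | 0.05 | 0.24 | 0.14 |
| 0.20 | 0.65 | 0.85 | 0.10 | 0.20 | 2 | 24 | 62 | 0.87 | 0.08 | 0.09 | 0.05 | 0.26 | 0.15 |
| 0.15 | 0.70 | 0.85 | 0.10 | 0.25 | 2 | 38 | 96 | 0.86 | 0.08 | 0.09 | 0.06 | 0.28 | 0.17 |
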
${\gamma _{\mathrm{max}}}$: design constraint for γ; ${\lambda _{\mathrm{max}}}$: design constraint for λ; s: statistical difference boundary; m: clinical relevance boundary; N: total sample size; π: power; α: type I error; β: type II error; γ: inconclusive probability under ${H_{a}}$; η: inconclusive probability under ${H_{0}}$; λ: average inconclusive probability under ${H_{0}}$ and ${H_{a}}$.
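The operating characteristics defined in the footnote are linked by simple identities that can serve as a check when reading the one-stage table. The sketch below assumes that, under ${H_{a}}$, the three outcomes are exhaustive so that $\pi +\beta +\gamma =1$, and that λ is the unweighted mean of γ and η; both are readings of the footnote definitions rather than formulas quoted from the paper, and the tolerance only absorbs two-decimal rounding. The rows are copied from Table 2; the function name is hypothetical.

```python
# Selected rows of Table 2: (p_C, p_E, pi, beta, gamma, eta, lambda)
rows = [
    (0.10, 0.25, 0.86, 0.05, 0.09, 0.35, 0.22),
    (0.20, 0.45, 0.90, 0.06, 0.05, 0.24, 0.14),
    (0.35, 0.55, 0.86, 0.04, 0.11, 0.45, 0.28),
    (0.70, 0.85, 0.86, 0.08, 0.06, 0.28, 0.17),
]

TOL = 0.015  # allowance for values rounded to two decimals


def check_row(pi, beta, gamma, eta, lam, tol=TOL):
    """Return (pi + beta + gamma ~ 1, lambda ~ mean of gamma and eta)."""
    sums_to_one = abs(pi + beta + gamma - 1.0) <= tol
    lam_is_mean = abs((gamma + eta) / 2 - lam) <= tol
    return sums_to_one, lam_is_mean


for p_c, p_e, pi, beta, gamma, eta, lam in rows:
    print((p_c, p_e), check_row(pi, beta, gamma, eta, lam))
```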

3.2 TDR Two-Stage Design

Extending the method to a two-stage design, we provide details of the TDR two-stage 2-by-2 design in Table 3. In addition, given the established sample size saving performance of the LBR method proposed by [10], we use it as a reference to benchmark the proposed method in terms of expected sample size (EN) and maximum sample size as shown in Figure 3. The total sample sizes required for stage 1 and stage 2 of the trial are denoted by ${N_{1}}$ and ${N_{2}}$, respectively, with corresponding decision boundaries ${s_{1}}$ and ${m_{1}}$ for stage 1, and ${s_{2}}$ and ${m_{2}}$ for stage 2.
Table 3
Optimal design parameters of the TDR two-stage 2-by-2 design with ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$.
Setting Design Parameter
${p_{C}}$ ${p_{E}}$ ${\gamma _{\mathrm{max}}}$ ${\lambda _{\mathrm{max}}}$ ${s_{1}}$ ${m_{1}}$ ${s_{2}}$ ${m_{2}}$ $EN$ ${N_{1}}$ ${N_{2}}$ π β α γ η λ
0.10 0.25 0.10 0.07 −4 3 1 4 47.63 46 50 0.85 0.07 0.19 0.03 0.09 0.06
0.10 0.30 0.10 0.07 −3 2 1 3 29.65 28 32 0.85 0.07 0.17 0.03 0.11 0.07
0.10 0.35 0.10 0.07 −2 2 1 3 25.34 24 28 0.88 0.05 0.14 0.03 0.11 0.07
0.20 0.35 0.06 0.15 −4 6 1 9 60.90 56 66 0.82 0.13 0.17 0.06 0.16 0.11
0.20 0.40 0.10 0.10 −4 4 1 6 36.69 34 40 0.83 0.15 0.17 0.05 0.14 0.09
0.20 0.45 0.10 0.14 −3 3 1 5 25.75 24 28 0.80 0.08 0.12 0.08 0.18 0.13
0.30 0.45 0.06 0.15 −4 10 1 13 66.93 64 70 0.81 0.14 0.19 0.06 0.15 0.10
0.30 0.50 0.10 0.09 −5 6 1 8 37.86 36 40 0.81 0.09 0.19 0.05 0.13 0.09
0.35 0.50 0.10 0.13 −7 11 1 14 61.97 60 64 0.76 0.10 0.17 0.09 0.17 0.13
0.35 0.55 0.10 0.16 −3 7 1 10 39.98 38 42 0.77 0.09 0.14 0.09 0.20 0.15
0.35 0.60 0.10 0.15 −2 5 1 8 28.78 26 32 0.82 0.14 0.14 0.07 0.18 0.13
0.40 0.60 0.10 0.14 −3 8 1 11 39.96 38 42 0.78 0.09 0.15 0.09 0.19 0.14
0.40 0.65 0.10 0.09 −4 6 1 8 27.70 26 30 0.83 0.08 0.18 0.04 0.11 0.08
0.50 0.65 0.10 0.15 −6 15 1 21 64.97 58 72 0.79 0.14 0.18 0.07 0.16 0.11
0.50 0.70 0.16 0.25 −3 9 1 12 35.93 34 38 0.77 0.18 0.16 0.09 0.18 0.13
0.55 0.70 0.10 0.11 −6 16 1 19 57.94 56 60 0.78 0.11 0.20 0.06 0.14 0.10
0.55 0.75 0.15 0.20 −3 10 1 13 35.84 34 38 0.78 0.17 0.15 0.08 0.17 0.13
0.60 0.85 0.10 0.08 −2 6 1 9 20.78 18 24 0.84 0.09 0.19 0.03 0.12 0.08
0.65 0.85 0.15 0.25 −3 9 0 12 27.97 26 30 0.80 0.11 0.17 0.11 0.24 0.18
0.70 0.85 0.10 0.15 −3 16 1 19 45.91 44 48 0.79 0.19 0.19 0.06 0.14 0.10
${\gamma _{\mathrm{max}}}$: design constraint for γ; ${\lambda _{\mathrm{max}}}$: design constraint for λ; ${s_{1}}$: statistical difference boundary in stage 1; ${m_{1}}$: clinical relevance boundary in stage 1; ${s_{2}}$: statistical difference boundary in stage 2; ${m_{2}}$: clinical relevance boundary in stage 2; $EN$: expected total sample size; ${N_{1}}$: total sample size in stage 1; ${N_{2}}$: total sample size in stage 2; π: power; α: type I error; β: type II error; γ: inconclusive probability under ${H_{a}}$; η: inconclusive probability under ${H_{0}}$; λ: average inconclusive probability under ${H_{0}}$ and ${H_{a}}$.
nejsds83_g003.jpg
Figure 3
Comparison of TDR two-stage 2-by-2 with the LBR method [10] under ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$. (A) Expected sample size reduction and maximum sample size reduction with respect to the LBR method; (B) Comparison of operating characteristics power π, type I error α, and type II error β.
In a two-stage TDR design, the inconclusive region is considered only in stage 2. As a result, if the sample size in stage 1 is much larger than in stage 2, the potential for sample size savings will be limited. In 11 of the 20 cases, our proposed method provides a 2.8–8.7% reduction in expected sample size and a 5.4–15.8% reduction in maximum sample size compared with the LBR method. In all of these cases, the proposed method also shows reductions in type II error compared with the LBR method. In the nine cases with no sample size reduction, we observe reductions in type II error except for minimal inflation in two cases (0.02 and 0.003 at the response rates (0.20, 0.40) and (0.35, 0.60), respectively). This could be due to over-constrained design parameters, which could be remedied by a more granular search of ${\gamma _{\mathrm{max}}}$ and ${\lambda _{\mathrm{max}}}$ in regions around the current constraints, adjusting the loss function weight parameter w, or inspecting the sculpting boundaries. For example, at ${p_{C}}=0.35$, ${p_{E}}=0.60$, decreasing the statistical difference boundary ${s_{2}}$ reduces the total sample size in the second stage by two and the type I error by 1.8%, while power remains above 0.75.
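The expected sample sizes in Table 3 can be related to the stage sizes with a short calculation. The sketch assumes that ${N_{2}}$ is the cumulative total at the end of stage 2 (every $EN$ in Table 3 lies between ${N_{1}}$ and ${N_{2}}$) and that $EN={N_{1}}+(1-P)({N_{2}}-{N_{1}})$, where P is the probability of stopping after stage 1. The symbol P and the function name are introduced here for illustration only, and the hypothesis under which $EN$ is evaluated is not restated in this section.

```python
def implied_early_stop_prob(n1, n2, expected_n):
    """Solve EN = N1 + (1 - P) * (N2 - N1) for P (assumed relationship)."""
    if not n1 < n2:
        raise ValueError("expects N1 < N2 (cumulative totals)")
    return (n2 - expected_n) / (n2 - n1)


# First row of Table 3: N1 = 46, N2 = 50, EN = 47.63
p_stop = implied_early_stop_prob(46, 50, 47.63)
print(round(p_stop, 4))
```

For the first row this gives a value of roughly 0.59. The same relationship shows why a small gap between ${N_{1}}$ and ${N_{2}}$ leaves little room for early stopping to lower the expected sample size.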

3.3 Sensitivity Analysis

As previously shown, the TDR design can be applied under more stringent requirements on type I and type II errors. To account for variation in the control response rate, we apply the type I error constraint to the maximum type I error over a confidence interval of ${p_{C}}$. A confidence interval of 30% is chosen for demonstration purposes, since historical data can provide a fairly well-informed estimate of the control response rate. In general, the proposed design shows superior performance to the HW design, the LBR design, and the conventional design. Details can be found in Table S2 and Figure S2 of the Supplementary Materials.
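The idea of constraining the worst-case type I error over a range of control response rates can be sketched as follows. For illustration only, the TDR decision rule is replaced by a simple stand-in (declare efficacy when the difference in responder counts ${X_{E}}-{X_{C}}$ is at least d), and the interval endpoints, grid size, and function names are hypothetical. The sketch shows the maximization step, not the operating characteristics of the TDR design.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def type1_error_difference_rule(p, n_per_arm, d):
    """P(X_E - X_C >= d) when both arms share the response rate p.
    Stand-in rule for illustration; not the TDR decision rule."""
    total = 0.0
    for x_c in range(n_per_arm + 1):
        prob_c = binom_pmf(x_c, n_per_arm, p)
        for x_e in range(max(0, x_c + d), n_per_arm + 1):
            total += prob_c * binom_pmf(x_e, n_per_arm, p)
    return total


def max_type1_over_interval(lo, hi, n_per_arm, d, steps=21):
    """Largest type I error over an evenly spaced grid on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(type1_error_difference_rule(p, n_per_arm, d) for p in grid)


# Hypothetical inputs: 20 patients per arm, d = 4, p_C ranging over [0.25, 0.45]
worst = max_type1_over_interval(0.25, 0.45, n_per_arm=20, d=4)
print(round(worst, 3))
```

A design search would then compare this worst-case value, rather than the type I error at the nominal ${p_{C}}$, against ${\alpha _{\mathrm{max}}}$.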
We also conducted a comparison between the proposed 2-by-2 design and the 3-by-2 design. Further details can be found in Table S3 and Figure S3 in the Supplementary Materials. Overall, with more granular control of the inconclusive region, the 3-by-2 TDR design provides a reduction in type II error, which is an advantage of using the 3-by-2 design. Depending on true response rates, the 3-by-2 design may provide sample size savings in some cases. However, one should also take note of the increased design complexity when choosing between 2-by-2 and 3-by-2 designs.

4 Trial Application

Defachelles et al. [3] conducted a randomized two-parallel group phase II trial to evaluate the efficacy and safety of the vincristine-irinotecan combination with and without temozolomide (VIT and VI, respectively) among patients with relapsed or refractory rhabdomyosarcoma. In this study, a total of 120 patients were randomized 1:1 to receive 21-day cycles of VI or VIT, with 60 patients per arm. The primary endpoint was the objective response rate (ORR) after two cycles. Originally designed as a non-comparative randomized phase II trial, the trial used Simon’s two-stage design [19] in each arm to determine the sample size. The design parameters were set as ${p_{0}}=0.35$ under the null hypothesis and ${p_{1}}=0.55$ under the alternative hypothesis for each arm. A dropout rate of 8% was considered in this trial. The ORR after two cycles in the whole population was $44\% $ in the VIT arm and $31\% $ in the VI arm (i.e., ${\hat{p}_{E}}=0.44$ and ${\hat{p}_{C}}=0.31$).
The TDR two-stage design, as detailed in Section 2.2, is applied to recalculate the sample size for the VIT-0910 trial. In adherence to the above trial configurations, we set ${p_{C}}=0.35$ and ${p_{E}}=0.55$ in our design. We search for the sample size under a specified type I error of ${\alpha _{\mathrm{max}}}=0.10$ and seek to achieve a power of 0.90. The optimal sample size is selected by minimizing the loss score across all potential candidates. As a comparison, we also compute the sample size using the LBR design. The resulting sample sizes and design parameters are shown in Table 4. Both our TDR design and the LBR design substantially reduce the required sample size for the same trial (102 versus 128 patients, a reduction of about 20%), and the TDR design yields the lower type II error (0.02 versus 0.05).
Table 4
Application of the TDR two-stage design and the LBR method to the VIT-0910 trial.
N α β π
VIT-0910 128 0.10 0.10 0.90
TDR 102 0.09 0.02 0.90
LBR 102 0.09 0.05 0.90
N: total sample size; π: power; α: type I error; β: type II error. An 8% dropout rate is considered in the total sample size.

5 Discussions

In this paper, we propose a three-outcome dual-criterion randomized phase II design that utilizes inconclusive region sculpting to reduce sample size and type II error. The proposed TDR trial design shows sample size savings and a reduction in type II error compared to existing methods. When the requirements for type I and type II error control become more stringent, such as controlling α and β within 10% instead of 20%, the proposed method demonstrates even greater sample size savings. While the benefit of sample size savings and type II error reduction is evident in most cases, a limitation of the proposed design is a slight reduction in power. However, this can be controlled by specifying an acceptable power threshold and adjusting design parameters accordingly. It should also be noted that, because the trials of interest for this design consider binary outcomes, the discreteness of responses may lead to fluctuations in type I and type II errors, as well as power, when automatically searching design parameters over a range of values for (${\gamma _{\mathrm{max}}}$, ${\lambda _{\mathrm{max}}}$). This can be mitigated by manually adjusting the sample size or design parameters, and the loss function can assist in such an adjustment process by systematically evaluating the trade-off between power and sample size.
The TDR design provides flexibility for an extension to a two-stage setting, particularly when early stopping due to lack of efficacy is an ethical consideration. Additionally, the inconclusive region can be more finely sculpted using a 3-by-2 TDR design, further reducing type II error. To align with specific study objectives, parameters and loss function settings can be adjusted to control type I and type II errors, inconclusive probabilities, randomization ratio, confidence interval of ${p_{C}}$ for robustness, and early stopping probabilities. Moreover, given the flexibility of the design, the dual criteria on clinical relevance could potentially be extended to a three-region decision framework to further control the inconclusive probabilities.

Acknowledgements

We would like to express our gratitude to the Editor, the Associate Editor, and two reviewers for their valuable comments and suggestions, which significantly contributed to improving the quality of the article.

References

[1] 
Brookmeyer, R. and Crowley, J. (1982). A confidence interval for the median survival time. Biometrics 38(1) 29–41. https://doi.org/10.2307/2530286
[2] 
Chow, S.-C., Wang, H. and Shao, J. (2007). Sample Size Calculations in Clinical Research, 2nd edn. Chapman and Hall/CRC. https://doi.org/10.1201/9781584889830. MR2356591
[3] 
Defachelles, A.-S., Bogart, E., Casanova, M., Merks, J. H. M., Bisogno, G., Calareso, G., Gallego Melcon, S., Gatz, S. A., Le Deley, M.-C., McHugh, K., Probst, A., Rocourt, N., van Rijn, R. R., Wheatley, K., Minard-Colin, V. and Chisholm, J. C. (2021). Randomized phase II trial of vincristine-irinotecan with or without temozolomide, in children and adults with relapsed or refractory rhabdomyosarcoma: a European paediatric soft tissue sarcoma study group and innovative therapies for children with cancer trial. Journal of Clinical Oncology 39(27) 2979–2990. https://doi.org/10.1200/JCO.21.00124.
[4] 
Fisch, R., Jones, I., Jones, J., Kerman, J., Rosenkranz, G. K. and Schmidli, H. (2015). Bayesian design of proof-of-concept trials. Therapeutic Innovation & Regulatory Science 49(1) 155–162. https://doi.org/10.1177/2168479014533970.
[5] 
Herson, J. and Carter, S. K. (1986). Calibrated phase II clinical trials in oncology. Statistics in Medicine 5(5) 441–447. https://doi.org/10.1002/sim.4780050508.
[6] 
Hong, S. and Wang, Y. (2007). A three-outcome design for randomized comparative phase II clinical trials. Statistics in Medicine 26(19) 3525–3534. https://doi.org/10.1002/sim.2824. MR2393733
[7] 
Kola, I. and Landis, J. (2004). Can the pharmaceutical industry reduce attrition rates? Nature Reviews. Drug Discovery 3(8) 711–715. https://doi.org/10.1038/nrd1470.
[8] 
Law, M., Grayling, M. J. and Mander, A. P. (2021). A stochastically curtailed two-arm randomised phase II trial design for binary outcomes. Pharmaceutical Statistics 20(2) 212–228. https://doi.org/10.1002/PST.2067.
[9] 
Lee, J. J. and Feng, L. (2005). Randomized phase II designs in cancer clinical trials: current status and future directions. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 23(19) 4450–4457. https://doi.org/10.1200/JCO.2005.03.197.
[10] 
Litwin, S., Basickes, S. and Ross, E. A. (2017). Two-sample binary phase 2 trials with low type I error and low sample size. Statistics in Medicine 36(9) 1383–1394. https://doi.org/10.1002/sim.7226. MR3631967
[11] 
Mozgunov, P., Jaki, T. and Gasparini, M. (2019). Loss functions in restricted parameter spaces and their Bayesian applications. Journal of Applied Statistics 46(13) 2314. https://doi.org/10.1080/02664763.2019.1586848. MR3987561
[12] 
Mozgunov, P. and Jaki, T. (2019). An information theoretic phase I–II design for molecularly targeted agents that does not require an assumption of monotonicity. Journal of the Royal Statistical Society. Series C, Applied Statistics 68(2) 347. https://doi.org/10.1111/rssc.12293. MR3902998
[13] 
Rubinstein, L., Crowley, J., Ivy, P., LeBlanc, M. and Sargent, D. (2009). Randomized phase II designs. Clinical Cancer Research 15(6) 1883–1890. https://doi.org/10.1158/1078-0432.CCR-08-2031.
[14] 
Rubinstein, L. V., Korn, E. L., Freidlin, B., Hunsberger, S., Ivy, S. P. and Smith, M. A. (2005). Design issues of randomized phase II trials and a proposal for phase II screening trials. Journal of Clinical Oncology 23(28) 7199–7206. https://doi.org/10.1200/JCO.2005.01.149.
[15] 
Sargent, D. J., Chan, V. and Goldberg, R. M. (2001). A three-outcome design for phase II clinical trials. Controlled Clinical Trials 22(2) 117–125. https://doi.org/10.1016/S0197-2456(00)00115-X.
[16] 
Shan, M. (2021). A confidence function-based posterior probability design for phase II cancer trials. Pharmaceutical Statistics 20(3) 485–498. https://doi.org/10.1002/pst.2089.
[17] 
Sharma, M. R., Stadler, W. M. and Ratain, M. J. (2011). Randomized phase II trials: a long-term investment with promising returns. JNCI Journal of the National Cancer Institute 103(14) 1093–1100. https://doi.org/10.1093/jnci/djr218.
[18] 
Simon, R., Wittes, R. E. and Ellenberg, S. S. (1985). Randomized phase II clinical trials. Cancer Treatment Reports 69(12) 1375–1381.
[19] 
Simon, R. (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10(1) 1–10. https://doi.org/10.1016/0197-2456(89)90015-9. MR4366283
[20] 
Storer, B. E. (1992). A class of phase II designs with three possible outcomes. Biometrics 48(1) 55–60. https://doi.org/10.2307/2532738.
[21] 
Wouters, O. J., McKee, M. and Luyten, J. (2020). Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323(9) 844–853. https://doi.org/10.1001/jama.2020.1166.
Copyright
© 2025 New England Statistical Society
Open access article under the CC BY license.

Keywords
Phase II trial design Inconclusive region Clinical relevance Binary outcome Sample size

Funding
Lin’s research is partially supported by NIH/NCI grants R01CA261978 and 1R21LM014699.

nejsds83_g001.jpg
Figure 1
Comparison of TDR one-stage 2-by-2 with the HW method [6] and LBR method [10] under ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$. (A) Sample size reduction with respect to the conventional sample size calculation for two-sample proportions; (B) Comparison of operating characteristics power π, type I error α, and type II error β.
nejsds83_g002.jpg
Figure 2
Comparison of TDR one-stage 2-by-2 with the HW method [6] and LBR method [10] under ${\alpha _{\mathrm{max}}}=0.10$, ${\beta _{\mathrm{max}}}=0.10$, ${\pi _{\mathrm{min}}}=0.90$, and $c=0.05$. (A) Sample size reduction with respect to the conventional sample size calculation for two-sample proportions; (B) Comparison of operating characteristics power π, type I error α, and type II error β.
Supplementary Material
nejsds83_s001.pdf
Supplementary Material for Three-Outcome Dual-Criterion Randomized Phase II Clinical Trial Design.
Table 1
Optimal design parameters of the TDR one-stage 2-by-2 design with ${\alpha _{\mathrm{max}}}=0.20$, ${\beta _{\mathrm{max}}}=0.20$, ${\pi _{\mathrm{min}}}=0.80$, and $c=0.05$.
Setting Design Parameter
δ ${p_{C}}$ ${p_{E}}$ ${\gamma _{\mathrm{max}}}$ ${\lambda _{\mathrm{max}}}$ s m N π β α γ η λ
0.15 0.10 0.25 0.08 0.20 1 4 44 0.79 0.13 0.15 0.08 0.25 0.16
0.20 0.10 0.30 0.08 0.20 1 3 28 0.80 0.13 0.14 0.08 0.23 0.15
0.25 0.10 0.35 0.12 0.20 1 3 22 0.77 0.11 0.08 0.11 0.27 0.19
0.15 0.20 0.35 0.16 0.30 0 8 54 0.76 0.08 0.15 0.16 0.42 0.29
0.20 0.20 0.40 0.16 0.30 0 5 30 0.77 0.08 0.16 0.16 0.43 0.30
0.25 0.20 0.45 0.10 0.15 1 4 24 0.81 0.13 0.17 0.06 0.22 0.14
0.15 0.30 0.45 0.11 0.20 1 12 62 0.76 0.14 0.17 0.10 0.28 0.19
0.20 0.30 0.50 0.07 0.20 1 8 40 0.81 0.12 0.19 0.06 0.24 0.15
0.15 0.35 0.50 0.10 0.20 1 13 60 0.76 0.15 0.19 0.09 0.26 0.17
0.20 0.35 0.55 0.10 0.15 1 9 40 0.81 0.13 0.20 0.06 0.23 0.15
0.25 0.35 0.60 0.10 0.20 1 6 24 0.78 0.15 0.18 0.08 0.24 0.16
0.20 0.40 0.60 0.10 0.20 1 10 38 0.76 0.14 0.16 0.10 0.27 0.19
0.25 0.40 0.65 0.16 0.30 0 7 24 0.77 0.07 0.15 0.16 0.43 0.29
0.15 0.50 0.65 0.10 0.15 2 20 70 0.77 0.18 0.19 0.05 0.17 0.11
0.20 0.50 0.70 0.11 0.20 1 12 38 0.77 0.13 0.16 0.10 0.28 0.19
0.15 0.55 0.70 0.13 0.25 0 18 56 0.78 0.09 0.20 0.13 0.36 0.24
0.20 0.55 0.75 0.14 0.30 0 11 32 0.79 0.08 0.19 0.13 0.38 0.26
0.25 0.60 0.85 0.10 0.20 1 7 18 0.77 0.17 0.19 0.06 0.22 0.14
0.20 0.65 0.85 0.10 0.20 1 11 28 0.78 0.15 0.18 0.07 0.24 0.15
0.15 0.70 0.85 0.11 0.25 1 19 48 0.79 0.14 0.19 0.07 0.24 0.16
${\gamma _{\mathrm{max}}}$: design constraint for γ; ${\lambda _{\mathrm{max}}}$: design constraint for λ; s: statistical difference boundary; m: clinical relevance boundary; N: total sample size; π: power; α: type I error; β: type II error; γ: inconclusive probability under ${H_{a}}$; η: inconclusive probability under ${H_{0}}$; λ: average inconclusive probability under ${H_{0}}$ and ${H_{a}}$.

The New England Journal of Statistics in Data Science

  • ISSN: 2693-7166
  • Copyright © 2021 New England Statistical Society
