1 Background
In oncology phase I/II clinical trials, the primary objective is to identify the maximum tolerable or assessed dose (MTD/MAD) for new therapies and to evaluate their preliminary efficacy. These trials often employ dose escalation/de-escalation algorithms, using dose-limiting toxicity events and available safety and pharmacokinetic information to identify tolerable and therapeutically effective doses. Various methods are available for the optimal dose search, including the Bayesian optimal interval (BOIN) design for Phase I clinical trials [8], and the modified toxicity probability interval (mTPI-2) method [5]. However, these methods primarily focus on safety and may not be able to adequately evaluate the therapeutic efficacy due to the constrained sample sizes at each dose level. Typically, trials employing these designs proceed to the dose-expansion phase once the MTD/MAD dose is identified.
In trials where only one dose is carried forward for expansion and further efficacy assessment in phase II, Simon’s two-stage design [10] is frequently used. In this design, a dose may be dropped at the first stage if its anti-tumor activity is insufficient, or it may proceed to the second stage for further evaluation in additional subjects. However, although the identified dose level, often the MTD/MAD, may exhibit anti-tumor activity, it does not necessarily optimize the benefit-risk ratio, which may hinder successful development in later phases. Hence, identifying a dose level that optimally balances therapeutic benefit against safety risk in the early phases is paramount to the overall success of therapy development. In recognition of this, the FDA issued guidance on optimizing the dosage of oncology therapies [12]. In the guidance “Expansion Cohorts: Use in First-in-Human Clinical Trials to Expedite Development of Oncology Drugs and Biologics Guidance for Industry” [11], the FDA advises developing statistical analysis plans that include not only a justification of the maximum sample size but also stopping rules for lack of activity, to minimize subject exposure to ineffective treatments. The FDA guidance also encourages exploring randomized designs for multiple doses/regimens, including justification of the sample size, in the early phases of therapy development.
In response to the growing need for statistical methods that explore dose-response relationships in early-phase oncology therapy development, we propose a design that extends Simon’s single-arm two-stage design. Many modifications of Simon’s design exist in the literature. For example, Chen expanded the two-stage framework into three stages, reducing the expected sample size when the treatment is ineffective [3], while Lin and Shih introduced adaptive extensions that allow modifications based on interim results, improving flexibility in decision-making [7]. Whitehead adapted Simon’s methodology to trials with survival endpoints, broadening its applicability to time-to-event outcomes [13]. Other work has explored balancing groups to improve trial efficiency and minimize bias in two-stage designs. For instance, Ye and Shyr proposed balanced two-stage designs that maintain equal allocation across groups, improving trial efficiency in oncology Phase II studies [14]. Parashar introduced stratified designs that extend Simon’s two-stage methodology to stratified patient groups, optimizing the assessment of heterogeneity across subgroups [9]. Methods have also been developed to enhance Simon’s design for specific trial contexts, such as incorporating sequential monitoring and decision-making processes. Lee proposed sequential dose-finding designs that optimize patient safety by continuously evaluating efficacy and toxicity [6]. Bartroff introduced group-sequential methodologies that integrate dose-escalation strategies with efficacy evaluations, enhancing the applicability of two-stage designs in dose-response contexts [1]. Further advances include Bayesian and meta-analytic methods. For example, Zohar and Chevret presented a Bayesian two-stage design for dose-ranging trials, refining dose selection through adaptive decision rules [15], and Crippa introduced meta-analytic methods for synthesizing dose-response data, which can be integrated into sequential trial designs for more comprehensive analysis [4].
Despite these advances, existing methods are limited to evaluating a single dose and do not explore the dose-response relationship across multiple doses. In contrast, we propose to extend Simon’s design to evaluate two doses and derive decision rules, with optimal sample sizes, for identifying the better of the two doses. The decision rules allow early termination of the study at Stage I if neither dose shows sufficient anti-tumor potential or if at least one dose exhibits sufficiently high anti-tumor activity. They also allow one dose to be selected for further evaluation in the second stage if its anti-tumor potential is promising. The paper is organized as follows: Section 2 provides an overview of Simon’s two-stage design; Section 3 introduces the new method together with the calculation of the decision probabilities; Section 4 presents an enumeration algorithm for sample size calculation that yields the optimal and minimax designs under constraints on the overall type I error and power; and Section 5 illustrates applications of the new method.
2 Overview of Simon’s Two-Stage Design
In Simon’s two-stage design, the null and alternative hypotheses are specified as ${H_{0}}:\theta \le {\theta _{0}}$, ${H_{1}}:\theta \ge {\theta _{A}}$, where θ represents the response rate for an anti-tumor activity endpoint such as the objective response rate (ORR). Under the null hypothesis, response rates of ${\theta _{0}}$ or lower are considered ineffective, whereas the alternative corresponds to the region of the parameter space with a treatment effect. The study is powered for desired response rates $\theta \ge {\theta _{A}}$. Stage I of Simon’s design uses a small sample size to explore whether the desired anti-tumor activity is present. The study may be terminated at Stage I if the activity is lower than expected; otherwise, additional subjects are enrolled in Stage II to further evaluate the anti-tumor activity. Denote the observed numbers of responses at Stage I and Stage II as ${S_{1}}$ and ${S_{2}}$, respectively, and let $S={S_{1}}+{S_{2}}$. Let ${n_{1}}$ and ${n_{2}}$ be the numbers of subjects enrolled at Stages I and II, respectively. The response thresholds are a, out of the ${n_{1}}$ Stage I subjects, and r, out of all ${n_{1}}+{n_{2}}$ subjects. At the end of the first stage, if ${S_{1}}\le a$, the therapy is considered ineffective and the study is terminated. Otherwise, the study proceeds to Stage II, and the therapy is considered effective if $S\gt r$ at the end of the second stage.
Simon’s design determines the sample sizes ${n_{1}}$ and ${n_{2}}$ and the decision thresholds a and r that control the type I error, α, and the type II error, β, at the desired levels. Let $n={n_{1}}+{n_{2}}$ be the total number of subjects. The expected sample size is $\text{EN}={n_{1}}+(1-\text{PET}){n_{2}}$, where $\text{PET}$ is the probability of early termination. Candidate designs are found by enumerating n, ${n_{1}}$, a, and r and computing exact binomial probabilities under the given ${\theta _{0}}$, ${\theta _{A}}$, α, and β. Among the designs that control the type I error and achieve the desired power, Simon proposed two: the optimal design, which minimizes the expected sample size EN under the null, and the minimax design, which minimizes the maximum sample size n.
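Simon's operating characteristics follow directly from exact binomial probabilities. The Python sketch below evaluates PET, EN, and the probability of declaring the therapy effective for a given design. The helper names (`binom_pmf`, `binom_cdf`, `simon_oc`) are our own, and the example design ($a=1$, $n_1=10$, $r=5$, $n=29$ for $\theta_0=0.1$, $\theta_A=0.3$) is, to our understanding, Simon's optimal design for $\alpha=0.05$ and $\beta=0.2$; it is used here only for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_oc(a: int, r: int, n1: int, n: int, theta: float) -> dict:
    """Exact operating characteristics of a Simon two-stage design at rate theta.

    Stage I: stop for futility if S1 <= a.
    End of Stage II: declare the therapy effective if S = S1 + S2 > r.
    """
    n2 = n - n1
    pet = binom_cdf(a, n1, theta)  # probability of early termination
    # Sum over Stage I outcomes that continue: P(S1 = s1) * P(S2 > r - s1)
    p_effective = sum(
        binom_pmf(s1, n1, theta) * (1 - binom_cdf(r - s1, n2, theta))
        for s1 in range(a + 1, n1 + 1)
    )
    return {"PET": pet, "EN": n1 + (1 - pet) * n2, "P_effective": p_effective}


# Type I error at theta0 = 0.1 and power at thetaA = 0.3
null = simon_oc(a=1, r=5, n1=10, n=29, theta=0.1)
alt = simon_oc(a=1, r=5, n1=10, n=29, theta=0.3)
print(f"alpha={null['P_effective']:.3f} power={alt['P_effective']:.3f} "
      f"PET0={null['PET']:.2f} EN0={null['EN']:.1f}")
```

Evaluating `P_effective` at $\theta_0$ gives the type I error and at $\theta_A$ the power; a full design search would wrap this function in the enumeration over $n$, $n_1$, $a$, and $r$.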
3 Proposed Method
3.1 Study Design Considerations
In Simon’s two-stage design, the trial of the selected MTD/MAD dose is terminated at the first stage if the dose shows low anti-tumor activity and moves to the second stage for further efficacy evaluation if adequate responses are observed in the first stage. However, with recent advances in anti-cancer treatments, such as cell and immuno-oncology therapies, ORRs as high as $80\% $ to $90\% $ have been reported [2]. With such high response rates, it is reasonable to incorporate early termination for efficacy in the study design. Therefore, in addition to early termination for futility, we propose termination for efficacy at the end of Stage I if a dose demonstrates a high response rate in Stage I. By allowing the trial to stop for either futility or efficacy after the first stage, the proposed method can further reduce the sample size relative to Simon’s approach. Furthermore, we extend Simon’s two-stage design to include two doses. Subjects are randomized 1:1 to the two doses in the first stage, assuming no prior preference between them. Dose-response decisions are made at the end of the first stage: if the trial is not terminated for futility or efficacy, only one dose is selected to move to the next stage. When the two doses show similar benefit and risk profiles, the lower dose is recommended for the next stage. The second stage of the study is single-arm.
Let ${\theta _{1}}$ and ${\theta _{2}}$ denote the anti-tumor response rates, such as ORR, for Doses 1 and 2, respectively. The null hypothesis in the parameter space of ${\theta _{1}}$ and ${\theta _{2}}$ states that neither dose is efficacious and can be written as ${H_{0}}:{\theta _{1}}\le {\theta _{0}}$ and ${\theta _{2}}\le {\theta _{0}}$, where ${\theta _{0}}$ is the upper boundary of the sub-therapeutic anti-tumor response region. The corresponding alternative is ${H_{A}}:{\theta _{1}}\ge {\theta _{A}}$ or ${\theta _{2}}\ge {\theta _{A}}$, where ${\theta _{A}}$ is the expected or desired anti-tumor activity. Within this alternative space, the region ${\theta _{1}}\ge {\theta _{A}}$ and ${\theta _{2}}\ge {\theta _{A}}$ represents the scenario in which both doses are efficacious. The alternative space also covers scenarios in which only one dose is efficacious, corresponding to the individual hypotheses ${H_{01}}:{\theta _{1}}\le {\theta _{0}}$ vs ${H_{A1}}:{\theta _{1}}\ge {\theta _{A}}$ for Dose 1 and ${H_{02}}:{\theta _{2}}\le {\theta _{0}}$ vs ${H_{A2}}:{\theta _{2}}\ge {\theta _{A}}$ for Dose 2. In the proposed method, the study may be powered in different alternative regions. For example, a design may power only the region in which ${\theta _{1}}\ge {\theta _{A}}$ and ${\theta _{2}}\ge {\theta _{A}}$. A more comprehensive design powers the region ${\theta _{1}}\ge {\theta _{A}}$ or ${\theta _{2}}\ge {\theta _{A}}$, which contains the region ${\theta _{1}}\ge {\theta _{A}}$ and ${\theta _{2}}\ge {\theta _{A}}$. We demonstrate in later sections that the optimal designs powering these different regions require different sample sizes.
3.2 Decision Rules
In the proposed method, the decision rules and sample sizes are determined from the desired power and type I error rate. The decision rules include response thresholds that determine whether the trial should be terminated at the first stage for futility or efficacy, which dose should move to the next stage, and whether a dose can be declared efficacious at the end of Stage II. The optimal decision rule leads to designs that use the minimum sample size to achieve the desired power for efficacious doses while controlling the type I error rate for doses that lack anti-tumor activity. Consistent with Simon’s two-stage design, we refer to the design with the minimum expected sample size as the optimal design and the design with the smallest maximum total sample size as the minimax design.
Denote the number of patients enrolled per dose at the first stage as ${n_{1}}$ and the number enrolled at the second stage as ${n_{2}}$. Because only one dose moves to the second stage, the maximum total sample size is $n=2{n_{1}}+{n_{2}}$, and the dose moved to the second stage has sample size $m={n_{1}}+{n_{2}}$. Let ${S_{1k}}$ denote the number of responses at Stage I for dose k, $k=1$ or 2, and ${S_{2}}$ the number of responses at Stage II for the dose selected for Stage II. Let ${a_{1}}$ and ${r_{1}}$ be the lower and upper thresholds for terminating the study at Stage I. Doses with responses less than or equal to ${a_{1}}$ are dropped for futility, and doses with responses greater than or equal to ${r_{1}}$ are considered promising; they skip Stage II and move directly to the next phase of development. Let $S={S_{1k}}+{S_{2}}$ denote the cumulative number of responses for the dose selected for Stage II, and let r be the threshold, out of ${n_{1}}+{n_{2}}$ subjects, for claiming efficacy for that dose at the end of Stage II. We assume that Dose 1 is the lower dose.
The decision rule is represented by the set of values $\{{a_{1}},{r_{1}},r\}$. During the conduct of the trial, the following decisions may be made based on the observed ${S_{1k}}$ and ${S_{2}}$ (see Figure 1 and Figure 2):
• $D1.1$: claim either dose to be efficacious and terminate the trial at Stage I, that is, ${S_{11}}\ge {r_{1}}$ or ${S_{12}}\ge {r_{1}}$.
• $D1.2$: drop both doses for futility and terminate the trial at Stage I, that is, ${S_{11}}\le {a_{1}}$ and ${S_{12}}\le {a_{1}}$.
• $D2.1$: claim Dose 1 to be efficacious at Stage II, that is, ${a_{1}}\lt {S_{11}}\lt {r_{1}}$, ${S_{12}}\le {S_{11}}$, and $S={S_{11}}+{S_{2}}\ge r$.
• $D2.2$: claim Dose 2 to be efficacious at Stage II, that is, ${a_{1}}\lt {S_{12}}\lt {r_{1}}$, ${S_{11}}\lt {S_{12}}$, and $S={S_{12}}+{S_{2}}\ge r$.
• $D2.3$: claim no dose to be efficacious at Stage II, that is, the selected dose k satisfies ${a_{1}}\lt {S_{1k}}\lt {r_{1}}$ (with ${S_{12}}\le {S_{11}}$ for $k=1$ and ${S_{11}}\lt {S_{12}}$ for $k=2$) but $S={S_{1k}}+{S_{2}}\lt r$.
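The Stage I branch of these rules can be written as a small function. This is a sketch under our own naming (`stage1_decision` and its return labels are hypothetical); Dose 1 is assumed to be the lower dose, so ties go to Dose 1.

```python
def stage1_decision(s11: int, s12: int, a1: int, r1: int) -> str:
    """Map Stage I response counts to a decision under thresholds {a1, r1}.

    Returns "D1.1" (stop, claim every dose with >= r1 responses),
    "D1.2" (stop, drop both doses for futility),
    "dose1" or "dose2" (move that dose to Stage II).
    """
    if s11 >= r1 or s12 >= r1:
        return "D1.1"
    if s11 <= a1 and s12 <= a1:
        return "D1.2"
    # Grey zone: the dose with more responses moves on; a tie favors the lower dose (Dose 1)
    return "dose1" if s11 >= s12 else "dose2"


# Thresholds of the Table 1 design for theta0 = 0.2, thetaA = 0.5: a1 = 1, r1 = 4
for counts in [(4, 0), (1, 1), (3, 3), (2, 3)]:
    print(counts, stage1_decision(*counts, a1=1, r1=4))
```

Checking the efficacy stop first means a single dose reaching ${r_{1}}$ ends the trial even when the other dose sits in the grey zone, which matches $D1.1$.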
Figure 2
Two-dimensional decision rules at Stage I. The x-axis represents the number of responses for Dose 2 (${S_{12}}$) and the y-axis represents the number of responses for Dose 1 (${S_{11}}$). According to the decision rules, the two-dimensional space is divided into six regions: (1) drop both doses for futility and terminate the trial at Stage I, for ${S_{11}}\le {a_{1}}$ and ${S_{12}}\le {a_{1}}$; (2) claim both doses efficacious and terminate the trial at Stage I, for ${S_{11}}\ge {r_{1}}$ and ${S_{12}}\ge {r_{1}}$; (3) claim Dose 1 efficacious and terminate the trial at Stage I, for ${S_{11}}\ge {r_{1}}$ and ${S_{12}}\lt {r_{1}}$; (4) claim Dose 2 efficacious and terminate the trial at Stage I, for ${S_{12}}\ge {r_{1}}$ and ${S_{11}}\lt {r_{1}}$; (5) move Dose 1 to Stage II, for ${a_{1}}\lt {S_{11}}\lt {r_{1}}$ and ${S_{11}}\ge {S_{12}}$; (6) move Dose 2 to Stage II, for ${a_{1}}\lt {S_{12}}\lt {r_{1}}$ and ${S_{11}}\lt {S_{12}}$. If Dose 1 is moved to Stage II, claim no dose efficacious for $S={S_{11}}+{S_{2}}\lt r$ and claim Dose 1 efficacious for $S={S_{11}}+{S_{2}}\ge r$. Similarly, if Dose 2 is moved to Stage II, claim no dose efficacious for $S={S_{12}}+{S_{2}}\lt r$ and claim Dose 2 efficacious for $S={S_{12}}+{S_{2}}\ge r$.
Assume that ${S_{1k}}\sim Binomial({\theta _{k}},{n_{1}})$, $k=1,2$, and that ${S_{11}}$ and ${S_{12}}$ are independent. We denote by $Bin({s_{1k}}|{\theta _{k}},{n_{1}})=P({S_{1k}}\le {s_{1k}}|{\theta _{k}},{n_{1}})$ the cumulative distribution function and by $bin({s_{1k}}|{\theta _{k}},{n_{1}})=P({S_{1k}}={s_{1k}}|{\theta _{k}},{n_{1}})$ the probability mass function of the binomial distribution $Binomial({\theta _{k}},{n_{1}})$. The probability of claiming either dose at Stage I under decision $D1.1$ can be written as
\[\begin{aligned}{}& {R_{1}}({\theta _{1}},{\theta _{2}})\\ {} & =P({S_{11}}\ge {r_{1}}\cup {S_{12}}\ge {r_{1}}|{\theta _{1}},{\theta _{2}})\\ {} & =1-P({S_{11}}\lt {r_{1}}\cap {S_{12}}\lt {r_{1}}|{\theta _{1}},{\theta _{2}})\\ {} & =1-Bin({r_{1}}-1|{\theta _{1}},{n_{1}})Bin({r_{1}}-1|{\theta _{2}},{n_{1}}).\end{aligned}\]
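The probability of terminating the trial for futility at Stage I under decision $D1.2$ can be written as
\[\begin{aligned}{}& {A_{1}}({\theta _{1}},{\theta _{2}})\\ {} & =P({S_{11}}\le {a_{1}}\cap {S_{12}}\le {a_{1}}|{\theta _{1}},{\theta _{2}})\\ {} & =Bin({a_{1}}|{\theta _{1}},{n_{1}})Bin({a_{1}}|{\theta _{2}},{n_{1}}).\end{aligned}\]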
The probability of claiming Dose 1 at Stage II for decision $D2.1$ is
\[\begin{aligned}{}& {R_{21}}({\theta _{1}},{\theta _{2}})\\ {} & =P({a_{1}}\lt {S_{11}}\lt {r_{1}}\cap {S_{11}}\ge {S_{12}}\cap S\ge r|{\theta _{1}},{\theta _{2}})\\ {} & ={\sum \limits_{{S_{11}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{11}}|{\theta _{1}},{n_{1}})Bin({S_{11}}|{\theta _{2}},{n_{1}})\\ {} & [1-Bin(r-{S_{11}}-1|{\theta _{1}},{n_{2}})].\end{aligned}\]
The probability of claiming Dose 2 at Stage II for decision $D2.2$ is
\[\begin{aligned}{}& {R_{22}}({\theta _{1}},{\theta _{2}})\\ {} & =P({a_{1}}\lt {S_{12}}\lt {r_{1}}\cap {S_{11}}\lt {S_{12}}\cap S\ge r|{\theta _{1}},{\theta _{2}})\\ {} & ={\sum \limits_{{S_{12}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{12}}|{\theta _{2}},{n_{1}})Bin({S_{12}}-1|{\theta _{1}},{n_{1}})\\ {} & [1-Bin(r-{S_{12}}-1|{\theta _{2}},{n_{2}})].\end{aligned}\]
Note that when ${S_{11}}={S_{12}}$, the lower dose, which is Dose 1, will be selected to move forward.
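The expressions for $R_1$, $R_{21}$, and $R_{22}$ can be evaluated with exact binomial sums. The sketch below also computes the futility probability $A_1=Bin(a_1|\theta_1,n_1)Bin(a_1|\theta_2,n_1)$ implied by $D1.2$ and independence, and the probabilities of moving each dose to Stage II, so that the four Stage I outcomes can be checked to sum to one. The function names are hypothetical, and the Stage II success event is taken as $S\ge r$, as in the expressions for $R_{21}$ and $R_{22}$.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def decision_probs(t1, t2, n1, n2, a1, r1, r):
    """Probabilities of the Stage I/II decisions at response rates (t1, t2)."""
    grey = range(a1 + 1, r1)  # Stage I counts that neither stop nor fail
    R1 = 1 - cdf(r1 - 1, n1, t1) * cdf(r1 - 1, n1, t2)  # D1.1
    A1 = cdf(a1, n1, t1) * cdf(a1, n1, t2)              # D1.2
    # Dose 1 moves on when a1 < S11 < r1 and S12 <= S11 (ties go to Dose 1)
    move1 = [pmf(s, n1, t1) * cdf(s, n1, t2) for s in grey]
    # Dose 2 moves on when a1 < S12 < r1 and S11 < S12
    move2 = [pmf(s, n1, t2) * cdf(s - 1, n1, t1) for s in grey]
    # Stage II success: S = S1k + S2 >= r, i.e. S2 >= r - S1k
    R21 = sum(m * (1 - cdf(r - s - 1, n2, t1)) for s, m in zip(grey, move1))
    R22 = sum(m * (1 - cdf(r - s - 1, n2, t2)) for s, m in zip(grey, move2))
    return {"R1": R1, "A1": A1, "M1": sum(move1), "M2": sum(move2),
            "R21": R21, "R22": R22, "reject": R1 + R21 + R22}


# Table 1 design for theta0 = 0.2, thetaA = 0.5
design = dict(n1=6, n2=8, a1=1, r1=4, r=7)
p = decision_probs(0.2, 0.2, **design)
print(round(p["reject"], 4), round(p["R1"] + p["A1"] + p["M1"] + p["M2"], 6))
```

The sum `R1 + A1 + M1 + M2` equals one because the four Stage I outcomes partition the $(S_{11},S_{12})$ space, which is a convenient check on any implementation of these formulas.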
3.3 Type I Error
The type I error rate is evaluated under the global null hypothesis ${H_{0}}:{\theta _{1}}\le {\theta _{0}}\cap {\theta _{2}}\le {\theta _{0}}$, which states that neither dose is efficacious. The corresponding global alternative that we wish to power is ${H_{A}}:{\theta _{1}}\ge {\theta _{A}}\cup {\theta _{2}}\ge {\theta _{A}}$. Although response rates between ${\theta _{0}}$ and ${\theta _{A}}$ formally belong to the alternative, we do not aim to power the study in that region. In addition to the global hypotheses, the hypotheses for the individual doses are ${H_{01}}:{\theta _{1}}\le {\theta _{0}}$ vs. ${H_{A1}}:{\theta _{1}}\ge {\theta _{A}}$ and ${H_{02}}:{\theta _{2}}\le {\theta _{0}}$ vs. ${H_{A2}}:{\theta _{2}}\ge {\theta _{A}}$ for Doses 1 and 2, respectively. All hypotheses are one-sided. Under the null hypothesis ${H_{0}}:{\theta _{1}}\le {\theta _{0}}$ and ${\theta _{2}}\le {\theta _{0}}$, the overall type I error is the maximum error rate over the null space $0\le {\theta _{1}}\le {\theta _{0}}$ and $0\le {\theta _{2}}\le {\theta _{0}}$, i.e., the maximum probability of claiming either or both doses to have anti-tumor activity at Stage I or Stage II under decision rules $D1.1$, $D2.1$, and $D2.2$. The maximum must be located explicitly because the type I error function is not monotone over the null space. Controlling the overall type I error at the one-sided level ${\alpha _{0}}$ gives the constraint in formula 3.1; its derivation can be found in the appendix.
(3.1)
\[\begin{aligned}{}& \text{max}({R_{1}}+{R_{21}}+{R_{22}}|{\theta _{1}}\le {\theta _{0}}\hspace{2.5pt}\text{and}\hspace{2.5pt}{\theta _{2}}\le {\theta _{0}})\\ {} =& \text{max}\{1-Bin({r_{1}}-1|{\theta _{1}},{n_{1}})Bin({r_{1}}-1|{\theta _{2}},{n_{1}})\\ {} +& {\sum \limits_{{S_{11}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{11}}|{\theta _{1}},{n_{1}})Bin({S_{11}}|{\theta _{2}},{n_{1}})\\ {} & [1-Bin(r-{S_{11}}-1|{\theta _{1}},{n_{2}})]\\ {} +& {\sum \limits_{{S_{12}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{12}}|{\theta _{2}},{n_{1}})Bin({S_{12}}-1|{\theta _{1}},{n_{1}})\\ {} & [1-Bin(r-{S_{12}}-1|{\theta _{2}},{n_{2}})]\}\\ {} & \le {\alpha _{0}}.\end{aligned}\]
The probability of claiming Dose 1 to be efficacious under the individual null hypothesis ${H_{01}}$ can be written as
\[\begin{aligned}{}& {P_{{H_{01}}}}({S_{11}}\ge {r_{1}})\\ {} +& {P_{{H_{01}}}}({a_{1}}\lt {S_{11}}\lt {r_{1}}\cap {S_{11}}\ge {S_{12}}\cap S\ge r).\end{aligned}\]
Because only one dose can be selected to move forward, the chance that Dose 1 is the selected dose is maximized by letting ${\theta _{2}}=0$, so that $Bin({S_{11}}|{\theta _{2}},{n_{1}})=1$. Thus, the maximum individual type I error for Dose 1 can be written as
\[\begin{aligned}{}& \text{max}[{P_{{H_{01}}}}({S_{11}}\ge {r_{1}})\\ {} & +{P_{{H_{01}}}}({a_{1}}\lt {S_{11}}\lt {r_{1}}\cap {S_{11}}\ge {S_{12}}\cap S\ge r)|\\ {} & {\theta _{1}}\le {\theta _{0}}\hspace{2.5pt}\text{and}\hspace{2.5pt}{\theta _{2}}=0]\\ {} & =\underset{{\theta _{1}}\le {\theta _{0}}}{\text{max}}\{1-Bin({r_{1}}-1|{\theta _{1}},{n_{1}})\\ {} & +{\sum \limits_{{S_{11}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{11}}|{\theta _{1}},{n_{1}})\\ {} & [1-Bin(r-{S_{11}}-1|{\theta _{1}},{n_{2}})]\}\\ {} & \le {\alpha _{0}}.\end{aligned}\]
Similarly, the probability of claiming Dose 2 to be efficacious under the individual null hypothesis ${H_{02}}$ can be written as
\[\begin{aligned}{}& {P_{{H_{02}}}}({S_{12}}\ge {r_{1}})\\ {} & +{P_{{H_{02}}}}({a_{1}}\lt {S_{12}}\lt {r_{1}}\cap {S_{12}}\gt {S_{11}}\cap S\ge r).\end{aligned}\]
Dose 2 has the maximum chance of moving to Stage II when ${\theta _{1}}=0$, in which case $Bin({S_{12}}-1|{\theta _{1}},{n_{1}})=1$ for every ${S_{12}}\gt {a_{1}}$. The maximum individual type I error for Dose 2 is
\[\begin{aligned}{}& \text{max}[{P_{{H_{02}}}}({S_{12}}\ge {r_{1}})\\ {} & +{P_{{H_{02}}}}({a_{1}}\lt {S_{12}}\lt {r_{1}}\cap {S_{12}}\gt {S_{11}}\cap S\ge r)|\\ {} & {\theta _{1}}=0\hspace{2.5pt}\text{and}\hspace{2.5pt}{\theta _{2}}\le {\theta _{0}}]\\ {} & =\underset{{\theta _{2}}\le {\theta _{0}}}{\text{max}}\{1-Bin({r_{1}}-1|{\theta _{2}},{n_{1}})\\ {} & +{\sum \limits_{{S_{12}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{12}}|{\theta _{2}},{n_{1}})\\ {} & [1-Bin(r-{S_{12}}-1|{\theta _{2}},{n_{2}})]\}\\ {} & \le {\alpha _{0}}.\end{aligned}\]
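The maximum individual type I errors can be computed by setting the other dose's response rate to zero and scanning the null range of the dose of interest. The sketch below uses a grid step of 0.01, mirroring the search in Section 4; the function names are hypothetical, and the example design is the Table 1 design for ${\theta _{0}}=0.2$, ${\theta _{A}}=0.5$.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def claim_prob(theta, theta_other, n1, n2, a1, r1, r, is_dose1):
    """P(claim the dose of interest) = P(S1k >= r1) + P(it moves on and S >= r)."""
    total = 1 - cdf(r1 - 1, n1, theta)
    for s in range(a1 + 1, r1):
        # Dose 1 wins ties (other <= s); Dose 2 needs a strict win (other <= s - 1)
        p_selected = cdf(s if is_dose1 else s - 1, n1, theta_other)
        total += pmf(s, n1, theta) * p_selected * (1 - cdf(r - s - 1, n2, theta))
    return total


def max_individual_type1(theta0, n1, n2, a1, r1, r, step=0.01):
    """Maximum individual type I error for Dose 1 and Dose 2 over a grid of [0, theta0]."""
    grid = [i * step for i in range(int(round(theta0 / step)) + 1)]
    # Setting the other dose's rate to 0 makes the selection probability equal to 1
    d1 = max(claim_prob(t, 0.0, n1, n2, a1, r1, r, True) for t in grid)
    d2 = max(claim_prob(t, 0.0, n1, n2, a1, r1, r, False) for t in grid)
    return d1, d2


d1, d2 = max_individual_type1(0.2, n1=6, n2=8, a1=1, r1=4, r=7)
print(round(d1, 4), round(d2, 4))
```

With the other rate fixed at zero, the two doses are treated symmetrically, so the two maxima coincide for this design.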
Note that when the maximum type I error is controlled for each dose, the type I error is strongly controlled.
3.4 Power
The overall power for Dose 1 and Dose 2 is calculated as follows:
\[\begin{aligned}{}& 1-\beta ={R_{1}}+{R_{21}}+{R_{22}}|{\theta _{1}}={\theta _{A}}\hspace{2.5pt}\text{or}\hspace{2.5pt}{\theta _{2}}={\theta _{A}}.\end{aligned}\]
In the alternative space ${\theta _{1}}={\theta _{A}}\hspace{2.5pt}\cup \hspace{2.5pt}{\theta _{2}}={\theta _{A}}$, the following three points are chosen as the power criteria: $({\theta _{1}}={\theta _{A}},{\theta _{2}}={\theta _{0}})$ for the worst power of Dose 1, $({\theta _{1}}={\theta _{0}},{\theta _{2}}={\theta _{A}})$ for the worst power of Dose 2, and $({\theta _{1}}={\theta _{A}},{\theta _{2}}={\theta _{A}})$. When $({\theta _{1}}={\theta _{A}},{\theta _{2}}={\theta _{0}})$, the power for Dose 1 reflects the probability of selecting Dose 1 while rejecting the null hypothesis for ${\theta _{1}}$:
\[\begin{aligned}{}& P(\text{claim Dose 1}|{\theta _{1}}={\theta _{A}},{\theta _{2}}={\theta _{0}})\\ {} & =1-Bin({r_{1}}-1|{\theta _{A}},{n_{1}})\\ {} & +{\sum \limits_{{S_{11}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{11}}|{\theta _{A}},{n_{1}})Bin({S_{11}}|{\theta _{0}},{n_{1}})\\ {} & [1-Bin(r-{S_{11}}-1|{\theta _{A}},{n_{2}})].\end{aligned}\]
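When $({\theta _{1}}={\theta _{0}},{\theta _{2}}={\theta _{A}})$, the power for Dose 2 reflects the probability of selecting Dose 2 while rejecting the null hypothesis for ${\theta _{2}}$:
\[\begin{aligned}{}& P(\text{claim Dose 2}|{\theta _{1}}={\theta _{0}},{\theta _{2}}={\theta _{A}})\\ {} & =1-Bin({r_{1}}-1|{\theta _{A}},{n_{1}})\\ {} & +{\sum \limits_{{S_{12}}={a_{1}}+1}^{{r_{1}}-1}}bin({S_{12}}|{\theta _{A}},{n_{1}})Bin({S_{12}}-1|{\theta _{0}},{n_{1}})\\ {} & [1-Bin(r-{S_{12}}-1|{\theta _{A}},{n_{2}})].\end{aligned}\]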
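When $({\theta _{1}}={\theta _{A}},{\theta _{2}}={\theta _{A}})$, the power reflects the combined probability of rejecting the null hypothesis for either or both doses:
\[\begin{aligned}{}& 1-\beta ={R_{1}}({\theta _{A}},{\theta _{A}})+{R_{21}}({\theta _{A}},{\theta _{A}})+{R_{22}}({\theta _{A}},{\theta _{A}}).\end{aligned}\]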
Detailed derivation of power can be found in the appendix.
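The three power criteria can be evaluated with the overall rejection probability ${R_{1}}+{R_{21}}+{R_{22}}$ from the overall power expression. The sketch below reports that probability at each of the three points; it does not separate the Dose 1 and Dose 2 components, and the function names and the Table 1 design used in the example are for illustration only.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def reject_prob(t1, t2, n1, n2, a1, r1, r):
    """R1 + R21 + R22: probability that at least one dose is claimed efficacious."""
    R1 = 1 - cdf(r1 - 1, n1, t1) * cdf(r1 - 1, n1, t2)
    R21 = sum(pmf(s, n1, t1) * cdf(s, n1, t2) * (1 - cdf(r - s - 1, n2, t1))
              for s in range(a1 + 1, r1))
    R22 = sum(pmf(s, n1, t2) * cdf(s - 1, n1, t1) * (1 - cdf(r - s - 1, n2, t2))
              for s in range(a1 + 1, r1))
    return R1 + R21 + R22


def power_at_criteria(theta0, thetaA, **design):
    """Overall rejection probability at the three power criteria points."""
    points = {"(thetaA, theta0)": (thetaA, theta0),
              "(theta0, thetaA)": (theta0, thetaA),
              "(thetaA, thetaA)": (thetaA, thetaA)}
    return {name: reject_prob(t1, t2, **design) for name, (t1, t2) in points.items()}


for name, pw in power_at_criteria(0.2, 0.5, n1=6, n2=8, a1=1, r1=4, r=7).items():
    print(name, round(pw, 3))
```

For this design, which powers only $({\theta _{A}},{\theta _{A}})$, the rejection probability at the two single-dose points falls below 80%, consistent with the larger sample sizes needed to power the region ${\theta _{1}}\ge {\theta _{A}}$ or ${\theta _{2}}\ge {\theta _{A}}$.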
3.5 Probability of Early Termination (PET)
The probability of early termination is the probability that the trial terminates at the first stage, either by claiming at least one dose to be efficacious or by dropping both doses for futility. PET can be calculated as
\[\begin{aligned}{}\text{PET}& ={R_{1}}+{A_{1}}\\ {} & =1-Bin({r_{1}}-1|{\theta _{1}},{n_{1}})Bin({r_{1}}-1|{\theta _{2}},{n_{1}})\\ {} & +Bin({a_{1}}|{\theta _{1}},{n_{1}})Bin({a_{1}}|{\theta _{2}},{n_{1}}),\end{aligned}\]
where ${A_{1}}=P({S_{11}}\le {a_{1}}\cap {S_{12}}\le {a_{1}})$ is the probability of decision $D1.2$.
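PET and the expected sample size for a given design can be computed directly. The sketch below, with hypothetical function names, evaluates $PET_1$ at $\theta_1=\theta_2=\theta_0$, $PET_2$ at $\theta_1=\theta_2=\theta_A$, and the corresponding $\text{EN}=2n_1+(1-\text{PET})n_2$; the futility term uses $Bin(a_1|\cdot)$, matching the decision rule $S_{1k}\le a_1$ in $D1.2$.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def pet(t1, t2, n1, a1, r1):
    """P(stop at Stage I) = P(D1.1) + P(D1.2)."""
    claim = 1 - cdf(r1 - 1, n1, t1) * cdf(r1 - 1, n1, t2)  # at least one dose >= r1
    futile = cdf(a1, n1, t1) * cdf(a1, n1, t2)             # both doses <= a1
    return claim + futile


def expected_n(pet_value, n1, n2):
    """EN = 2*n1 + (1 - PET)*n2: both doses in Stage I, one dose in Stage II."""
    return 2 * n1 + (1 - pet_value) * n2


# Table 1 design for theta0 = 0.2, thetaA = 0.5
pet1 = pet(0.2, 0.2, n1=6, a1=1, r1=4)
pet2 = pet(0.5, 0.5, n1=6, a1=1, r1=4)
en_avg = (expected_n(pet1, 6, 8) + expected_n(pet2, 6, 8)) / 2
print(round(pet1, 2), round(pet2, 2), round(en_avg, 1))
```

The rounded values can be compared with the $PE{T_{1}}$, $PE{T_{2}}$, and ${\text{EN}_{Avg}}$ columns of Table 1.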
The derivation of the formula can be found in Appendix C. PET can be calculated under the null, ${\theta _{1}}={\theta _{2}}={\theta _{0}}$, denoted as $PE{T_{1}}$, and under the alternative ${\theta _{1}}={\theta _{A}}$ and ${\theta _{2}}={\theta _{A}}$, denoted as $PE{T_{2}}$.
4 Derivation of the Optimal Design
To control the maximum type I error over the null space $0\le {\theta _{1}}\le {\theta _{0}}$ and $0\le {\theta _{2}}\le {\theta _{0}}$, the entire null space must be searched, because the type I error function in 3.1 is not monotone. To speed up the search for the maximum type I error, we divide the search algorithm into two steps. The first step enumerates the Stage I sample size and efficacy threshold, ${n_{1}}$ and ${r_{1}}$, and evaluates the partial type I error, $1-Bin({r_{1}}-1|{\theta _{1}},{n_{1}})Bin({r_{1}}-1|{\theta _{2}},{n_{1}})$, for each pair over a grid of the null space $0\le {\theta _{1}}\le {\theta _{0}}$ and $0\le {\theta _{2}}\le {\theta _{0}}$ with spacing 0.02. Because ${n_{1}}$ is the per-dose sample size in the first stage, which is usually small, ${n_{1}}$ is limited to fewer than 50 subjects, and ${r_{1}}$ is constrained to be at most ${n_{1}}$. Only the combinations of ${n_{1}}$ and ${r_{1}}$ whose maximum partial type I error over the null space is controlled are kept for the second step, which completes the derivation of the full decision rules and sample sizes. The first step eliminates a large portion of the ${n_{1}}$ and ${r_{1}}$ combinations and therefore makes the enumeration of decision rules in the second step more efficient.
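The first screening step can be sketched as follows. Because $R_1=1-F(\theta_1)F(\theta_2)$ with $F(\theta)=Bin(r_1-1|\theta,n_1)$ decreasing in $\theta$, the worst case over the grid is attained at $(\theta_0,\theta_0)$; the grid scan is kept only to mirror the description above. The function name `screen_stage1` and the default cap of 49 subjects are our reading of the text.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def screen_stage1(theta0, alpha, n1_max=49, step=0.02):
    """Return (n1, r1) pairs whose maximum partial type I error R1 is <= alpha."""
    grid = [i * step for i in range(int(round(theta0 / step)) + 1)]
    kept = []
    for n1 in range(1, n1_max + 1):
        for r1 in range(1, n1 + 1):  # r1 <= n1
            # R1 = 1 - F(t1) F(t2) with F >= 0, so the smallest product is (min F)^2
            f_min = min(cdf(r1 - 1, n1, t) for t in grid)
            if 1 - f_min * f_min <= alpha:
                kept.append((n1, r1))
    return kept


kept = screen_stage1(theta0=0.2, alpha=0.05, n1_max=10)
print((6, 4) in kept, (6, 3) in kept)
```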
The second step of the algorithm iterates over the set of $({n_{1}},{r_{1}})$ pairs retained from the first step to identify the combinations of $({n_{2}},{a_{1}},r)$ that satisfy the constraints: maximum type I error less than α and power greater than $1-\beta $. To ensure that the maximum type I error is less than α for each set of decision rules and sample sizes $({n_{1}},{r_{1}},{n_{2}},{a_{1}},r)$, type I errors are calculated over the null space ${\theta _{1}}\in [0,{\theta _{0}}]$ and ${\theta _{2}}\in [0,{\theta _{0}}]$ with a step of 0.01. Decision rules with an overall type I error greater than α are eliminated. The selected decision rules must also keep the maximum individual type I error below α for each dose. As a result, the type I error is strongly controlled.
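For a candidate $(n_1,r_1,n_2,a_1,r)$, the second step can be sketched as a single feasibility check: the maximum overall type I error over a 0.01 grid of the null space, the two maximum individual type I errors, and the rejection probability at each required power point. The names `claim_probs`, `check_design`, and `power_points` are hypothetical; which points must reach $1-\beta$ depends on the region being powered.

```python
from math import comb


def pmf(k, n, p):
    """Binomial(n, p) probability mass at k (zero outside 0..n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def cdf(k, n, p):
    """Binomial(n, p) cumulative probability P(X <= k); zero for k < 0."""
    return sum(pmf(i, n, p) for i in range(0, min(k, n) + 1))


def claim_probs(t1, t2, n1, n2, a1, r1, r):
    """Return P(claim Dose 1), P(claim Dose 2), and P(claim at least one dose)."""
    f1, f2 = cdf(r1 - 1, n1, t1), cdf(r1 - 1, n1, t2)
    stage2_1 = sum(pmf(s, n1, t1) * cdf(s, n1, t2) * (1 - cdf(r - s - 1, n2, t1))
                   for s in range(a1 + 1, r1))  # R21
    stage2_2 = sum(pmf(s, n1, t2) * cdf(s - 1, n1, t1) * (1 - cdf(r - s - 1, n2, t2))
                   for s in range(a1 + 1, r1))  # R22
    return 1 - f1 + stage2_1, 1 - f2 + stage2_2, 1 - f1 * f2 + stage2_1 + stage2_2


def check_design(design, theta0, alpha, beta, power_points, step=0.01):
    """Check type I error (overall and individual) and power for one candidate design."""
    grid = [i * step for i in range(int(round(theta0 / step)) + 1)]
    overall = max(claim_probs(t1, t2, **design)[2] for t1 in grid for t2 in grid)
    ind1 = max(claim_probs(t, 0.0, **design)[0] for t in grid)  # other dose at 0
    ind2 = max(claim_probs(0.0, t, **design)[1] for t in grid)
    powers = [claim_probs(t1, t2, **design)[2] for t1, t2 in power_points]
    ok = max(overall, ind1, ind2) <= alpha and min(powers) >= 1 - beta
    return ok, {"max_type1": overall, "ind1": ind1, "ind2": ind2, "powers": powers}


design = dict(n1=6, n2=8, a1=1, r1=4, r=7)
ok_and, info = check_design(design, 0.2, 0.05, 0.2, [(0.5, 0.5)])
ok_or, _ = check_design(design, 0.2, 0.05, 0.2, [(0.5, 0.2), (0.2, 0.5), (0.5, 0.5)])
print(ok_and, ok_or, round(info["max_type1"], 4))
```

The same candidate passes when only $({\theta _{A}},{\theta _{A}})$ must be powered but fails when the single-dose points are also required, which is why the two powering strategies lead to different designs.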
Designs are not considered adequate if ${a_{1}}$ and ${r_{1}}$ are too close to each other, for example, only one response apart. Such designs terminate the study at Stage I for either futility or efficacy without leaving much room for the grey zone (responses between ${a_{1}}$ and ${r_{1}}$), which warrants further evaluation in Stage II. To ensure that designs do not force a black-or-white decision but allow a dose to move on to Stage II when responses fall between the two termination criteria, additional rules, such as ${r_{1}}\gt {a_{1}}+2$, are imposed in design selection. In addition, when the first-stage sample size ${n_{1}}$ is too small relative to the total sample size n, the design is not adequate for dose-response evaluation in the first stage. Therefore, further restrictions, such as $2{n_{1}}\ge {n_{2}}\ge {n_{1}}/2$, are also placed in the derivation algorithm. For the Stage II decision threshold r, it is natural to require $r\gt {r_{1}}$.
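The structural restrictions above can be applied while enumerating the Stage II parameters for each retained $(n_1,r_1)$. A minimal generator, assuming exactly the example restrictions $r_1>a_1+2$, $2n_1\ge n_2\ge n_1/2$, and $r_1<r\le n_1+n_2$ (the upper bound is our addition, since at most $n_1+n_2$ responses can be observed for the selected dose); the name `candidate_rules` is hypothetical.

```python
import math


def candidate_rules(n1, r1, min_gap=2):
    """Yield (n2, a1, r) combinations that satisfy the design-adequacy restrictions."""
    for n2 in range(math.ceil(n1 / 2), 2 * n1 + 1):  # 2*n1 >= n2 >= n1/2
        for a1 in range(0, r1 - min_gap):            # r1 > a1 + min_gap
            for r in range(r1 + 1, n1 + n2 + 1):      # r1 < r <= n1 + n2
                yield n2, a1, r


cands = list(candidate_rules(6, 4))
print(len(cands), (8, 1, 7) in cands)
```

Each yielded triple would then be passed, together with $(n_1,r_1)$, to the type I error and power checks of the second step.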
Each study design specifies the sample sizes ${n_{1}}$ and n and the decision rules ${a_{1}},{r_{1}},r$, and satisfies both the type I error and power constraints. The expected sample size is $\text{EN}=2{n_{1}}+(1-\text{PET}){n_{2}}$. Among all decision rules that satisfy the constraints, the one with the minimum total sample size is selected as the minimax design, and the one with the minimum EN is the optimal design. Figure 3 illustrates the search algorithm for the decision rules. EN can be calculated under the null or the alternative, denoted ${\text{EN}_{1}}$ and ${\text{EN}_{2}}$, respectively, and their average is denoted ${\text{EN}_{Avg}}$.
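Once the feasible designs are known, selecting the minimax and optimal designs is a pair of minimizations. In the sketch below the `Design` record and its PET values are hypothetical inputs (in practice they would come from the PET formula), and breaking ties in n by the smaller ${\text{EN}_{Avg}}$, and vice versa, is our assumption rather than a rule stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    n1: int
    n2: int
    a1: int
    r1: int
    r: int
    pet_null: float  # PET_1
    pet_alt: float   # PET_2

    @property
    def n(self) -> int:
        """Maximum total sample size 2*n1 + n2."""
        return 2 * self.n1 + self.n2

    @property
    def en_avg(self) -> float:
        # Average of EN_1 and EN_2, each equal to 2*n1 + (1 - PET)*n2
        return 2 * self.n1 + (1 - (self.pet_null + self.pet_alt) / 2) * self.n2


def select_designs(feasible):
    """Return (minimax, optimal) among designs that already meet all constraints."""
    minimax = min(feasible, key=lambda d: (d.n, d.en_avg))
    optimal = min(feasible, key=lambda d: (d.en_avg, d.n))
    return minimax, optimal


# Two hypothetical feasible designs
feasible = [Design(10, 10, 2, 6, 12, 0.70, 0.50), Design(8, 16, 2, 5, 12, 0.75, 0.55)]
minimax, optimal = select_designs(feasible)
print(minimax.n, round(minimax.en_avg, 1), optimal.n, round(optimal.en_avg, 1))
```

Here the first design has the smaller maximum sample size (30 vs 32) while the second has the smaller average expected sample size, so the minimax and optimal designs differ.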
5 Results
This section illustrates results from the proposed algorithm and the selection of efficient designs. Several combinations of the design parameters ${\theta _{0}}$ and ${\theta _{A}}$ are used as examples. The type I error is controlled at one-sided 0.05 and the required power is 80%. For the optimal design, the average expected sample size ${\text{EN}_{Avg}}$ is used as the selection criterion. Optimal designs are selected from all designs that satisfy the type I error and power constraints. The designs selected by the algorithm for each set of design parameters are listed in Appendix D.
Table 1
Power the Region of ${\theta _{1}}\ge {\theta _{A}}$ and ${\theta _{2}}\ge {\theta _{A}}$.
Difference | Parameters | Method | n | ${n_{1}}$ | ${n_{2}}$ | ${a_{1}}$ | ${r_{1}}$ | r | $PE{T_{1}}$ | $PE{T_{2}}$ | $PE{T_{Avg}}$ | $E{N_{Avg}}$ |
0.3 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.5$ | Minimax | 20 | 6 | 8 | 1 | 4 | 7 | 0.46 | 0.58 | 0.52 | 16 |
 |  | Optimal | 20 | 6 | 8 | 1 | 4 | 7 | 0.46 | 0.58 | 0.52 | 16 |
 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.6$ | Minimax | 21 | 7 | 7 | 3 | 6 | 8 | 0.77 | 0.38 | 0.57 | 17 |
 |  | Optimal | 21 | 7 | 7 | 3 | 6 | 8 | 0.77 | 0.38 | 0.57 | 17 |
 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.7$ | Minimax | 24 | 7 | 10 | 3 | 6 | 12 | 0.54 | 0.57 | 0.55 | 19 |
 |  | Optimal | 24 | 7 | 10 | 3 | 6 | 12 | 0.54 | 0.57 | 0.55 | 19 |
 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.8$ | Minimax | 22 | 7 | 8 | 4 | 7 | 12 | 0.61 | 0.40 | 0.51 | 18 |
 |  | Optimal | 22 | 7 | 8 | 4 | 7 | 12 | 0.61 | 0.40 | 0.51 | 18 |
0.2 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.4$ | Minimax | 41 | 11 | 19 | 3 | 6 | 11 | 0.73 | 0.52 | 0.62 | 30 |
 |  | Optimal | 41 | 11 | 19 | 3 | 6 | 11 | 0.73 | 0.52 | 0.62 | 30 |
 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.5$ | Minimax | 52 | 20 | 12 | 8 | 11 | 16 | 0.82 | 0.72 | 0.77 | 43 |
 |  | Optimal | 54 | 14 | 26 | 5 | 9 | 19 | 0.63 | 0.42 | 0.52 | 29 |
 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.6$ | Minimax | 55 | 21 | 13 | 11 | 14 | 20 | 0.86 | 0.67 | 0.77 | 46 |
 |  | Optimal | 59 | 15 | 29 | 8 | 11 | 24 | 0.84 | 0.54 | 0.69 | 40 |
 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.7$ | Minimax | 53 | 19 | 15 | 12 | 15 | 23 | 0.84 | 0.11 | 0.48 | 46 |
 |  | Optimal | 56 | 15 | 26 | 9 | 12 | 28 | 0.76 | 0.58 | 0.67 | 39 |
Table 1 shows the optimal and minimax designs when powering at ${\theta _{1}}={\theta _{A}}$ and ${\theta _{2}}={\theta _{A}}$, i.e., the same target anti-tumor activity for both doses, with treatment differences ${\theta _{A}}-{\theta _{0}}$ of either 0.3 or 0.2. Table 1 includes the total sample size n, the per-dose first-stage sample size ${n_{1}}$, the second-stage sample size ${n_{2}}$, the Stage I termination thresholds ${a_{1}}$ and ${r_{1}}$, the total number of responses r required to claim efficacy at Stage II, the probabilities of early termination at Stage I under the null ($PE{T_{1}}$), under the alternative ($PE{T_{2}}$), and on average (${\text{PET}_{Avg}}$), and the average expected sample size ${\text{EN}_{Avg}}$. The study designs in Table 1 achieve at least 80% power at the alternative points and control the type I error at one-sided 0.05.
Notice that in Table 1, when ${\theta _{A}}-{\theta _{0}}=0.3$, the design with the minimal total sample size (the minimax design) and the design with the minimal expected sample size (the optimal design) coincide in all four scenarios. For the scenario with ${\theta _{A}}=0.5$ and ${\theta _{0}}=0.2$, the first stage enrolls ${n_{1}}=6$ subjects per dose in both designs. If there is no more than 1 response (${a_{1}}=1$) for each dose, indicating that neither dose exhibits sufficient anti-tumor activity, the trial is terminated for futility. On the other hand, if there are at least 4 (${r_{1}}=4$) responses for one or both doses, the corresponding doses are considered efficacious and the trial is terminated for efficacy after the first stage. Otherwise, when responses fall between the two thresholds ${a_{1}}$ and ${r_{1}}$, the dose with more responses, or the lower dose when both doses have the same number of responses, moves to the second stage, and the trial accrues ${n_{2}}=8$ more subjects in Stage II. The dose is considered efficacious at the end of Stage II if there are at least $r=7$ responses in total from both stages out of ${n_{1}}+{n_{2}}=14$ subjects. With this design, the maximum total sample size is $2{n_{1}}+{n_{2}}=20$ and the average expected sample size is ${\text{EN}_{Avg}}=16$ with ${\text{PET}_{Avg}}=0.52$.
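The design just described can be checked by Monte Carlo simulation. The sketch below simulates the two-dose trial for the Table 1 design (${n_{1}}=6$, ${n_{2}}=8$, ${a_{1}}=1$, ${r_{1}}=4$, $r=7$), counting a Stage II success when $S\ge r$ as in the expressions for ${R_{21}}$ and ${R_{22}}$. The function names, seed, and number of replicates are arbitrary choices, so the estimates should agree with the exact calculations only up to Monte Carlo error.

```python
import random


def simulate_trial(rng, t1, t2, n1=6, n2=8, a1=1, r1=4, r=7):
    """Simulate one trial; return True if at least one dose is claimed efficacious."""
    s11 = sum(rng.random() < t1 for _ in range(n1))
    s12 = sum(rng.random() < t2 for _ in range(n1))
    if s11 >= r1 or s12 >= r1:   # D1.1: stop for efficacy
        return True
    if s11 <= a1 and s12 <= a1:  # D1.2: stop for futility
        return False
    # Move the dose with more responses; a tie goes to the lower dose (Dose 1)
    s1, theta = (s11, t1) if s11 >= s12 else (s12, t2)
    s2 = sum(rng.random() < theta for _ in range(n2))
    return s1 + s2 >= r


def estimate_reject(t1, t2, reps=50_000, seed=2024):
    """Monte Carlo estimate of P(claim at least one dose) at rates (t1, t2)."""
    rng = random.Random(seed)
    return sum(simulate_trial(rng, t1, t2) for _ in range(reps)) / reps


type1_hat = estimate_reject(0.2, 0.2)  # both doses at theta0
power_hat = estimate_reject(0.5, 0.5)  # both doses at thetaA
print(round(type1_hat, 3), round(power_hat, 3))
```

The simulated rejection rates should be close to the exact values of roughly 0.046 at $({\theta _{0}},{\theta _{0}})$ and 0.807 at $({\theta _{A}},{\theta _{A}})$ obtained from ${R_{1}}+{R_{21}}+{R_{22}}$.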
As expected, Table 1 also shows that smaller response differences ${\theta _{A}}-{\theta _{0}}$ require larger total and expected sample sizes to meet the desired power and type I error. In scenarios with a low ${\theta _{0}}$, for example ${\theta _{0}}=0.2$, the first-stage thresholds ${a_{1}}$ (futility) and ${r_{1}}$ (efficacy) are lower than in scenarios with a larger ${\theta _{0}}$. This makes sense: a low ${\theta _{0}}$ may indicate a disease with no good treatments, so a treatment with fewer responses (a lower threshold ${r_{1}}$) is acceptable, and the study is terminated for futility only if the treatment response is very low (a lower threshold ${a_{1}}$).
Table 2 shows the results of the minimax and optimal designs when powering the region ${\theta _{1}}={\theta _{A}}$ or ${\theta _{2}}={\theta _{A}}$. In these designs, the study is powered when either dose, or both doses, has response rate ${\theta _{A}}$. As a result of this powering strategy, the overall sample size increases by close to 50% compared with the designs that power the alternative ${\theta _{1}}={\theta _{A}}$ and ${\theta _{2}}={\theta _{A}}$. As in Table 1, when ${\theta _{A}}-{\theta _{0}}=0.3$ the optimal and minimax designs coincide.
Tables 1 and 2 also show that even when the maximum total sample sizes of the optimal and minimax designs are similar, their expected sample sizes and decision rules can differ considerably. In the optimal design, the first-stage sample size is smaller than in the minimax design, and ${a_{1}}$, the threshold for terminating the study to accept the null, is generally lower. Because of the smaller first-stage sample size, the threshold for terminating the study for efficacy, ${r_{1}}$, is also lower than in the minimax design. Overall, when the two designs differ, the minimax design invests more subjects in the first stage and has a higher probability of early termination under the null and, in some cases, under the alternative, relative to the optimal design.
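The difference between the two criteria can be written as two selection rules over a set of candidate designs. In the illustrative sketch below, which is not the paper's search code, the candidate set contains only the two Table 2 designs for ${\theta _{0}}=0.4$ and ${\theta _{A}}=0.6$; in an actual search it would contain every combination of $({n_{1}},{n_{2}},{a_{1}},{r_{1}},r)$ that meets the type I error and power constraints. The `Design` class, the tie-breaking rules, and the use of the reported ${\text{EN}_{Avg}}$ as the optimality criterion are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Two-dose two-stage design with its reported expected sample size."""
    n1: int        # Stage I subjects per dose
    n2: int        # Stage II subjects for the selected dose
    a1: int        # Stage I futility threshold
    r1: int        # Stage I efficacy threshold
    r: int         # final efficacy threshold
    en_avg: float  # EN_Avg as reported in the table

    @property
    def n(self):
        """Maximum total sample size: two doses in Stage I, one in Stage II."""
        return 2 * self.n1 + self.n2


def minimax(designs):
    """Smallest maximum sample size; ties broken by the smaller EN_Avg."""
    return min(designs, key=lambda d: (d.n, d.en_avg))


def optimal(designs):
    """Smallest expected sample size; ties broken by the smaller n."""
    return min(designs, key=lambda d: (d.en_avg, d.n))


# The two designs reported in Table 2 for theta0 = 0.4, thetaA = 0.6.
candidates = [
    Design(n1=38, n2=21, a1=18, r1=22, r=33, en_avg=80),
    Design(n1=32, n2=34, a1=15, r1=20, r=36, en_avg=74),
]
print("minimax:", minimax(candidates), "n =", minimax(candidates).n)
print("optimal:", optimal(candidates), "n =", optimal(candidates).n)
```

The minimax rule picks the design with n = 97, while the optimal rule picks the design with n = 98 but ${\text{EN}_{Avg}}=74$, illustrating how a one-subject difference in maximum sample size can accompany a six-subject difference in expected sample size.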
Table 2
Designs Powering the Region ${\theta _{1}}\ge {\theta _{A}}$ or ${\theta _{2}}\ge {\theta _{A}}$.
${\theta _{A}}-{\theta _{0}}$ | Parameters | Method | n | ${n_{1}}$ | ${n_{2}}$ | ${a_{1}}$ | ${r_{1}}$ | r | ${\text{PET}_{1}}$ | ${\text{PET}_{2}}$ | ${\text{PET}_{Avg}}$ | ${\text{EN}_{Avg}}$ |
0.3 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.5$ | Minimax | 37 | 10 | 17 | 2 | 6 | 11 | 0.47 | 0.61 | 0.54 | 28 |
0.3 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.5$ | Optimal | 37 | 10 | 17 | 2 | 6 | 11 | 0.47 | 0.61 | 0.54 | 28 |
0.3 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.6$ | Minimax | 39 | 12 | 15 | 4 | 8 | 14 | 0.54 | 0.69 | 0.62 | 30 |
0.3 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.6$ | Optimal | 39 | 12 | 15 | 4 | 8 | 14 | 0.54 | 0.69 | 0.62 | 30 |
0.3 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.7$ | Minimax | 42 | 11 | 20 | 5 | 9 | 19 | 0.58 | 0.53 | 0.56 | 31 |
0.3 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.7$ | Optimal | 42 | 11 | 20 | 5 | 9 | 19 | 0.58 | 0.53 | 0.56 | 31 |
0.3 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.8$ | Minimax | 37 | 11 | 15 | 6 | 10 | 19 | 0.54 | 0.54 | 0.54 | 29 |
0.3 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.8$ | Optimal | 37 | 11 | 15 | 6 | 10 | 19 | 0.54 | 0.54 | 0.54 | 29 |
0.2 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.4$ | Minimax | 77 | 25 | 27 | 6 | 10 | 18 | 0.64 | 0.83 | 0.73 | 57 |
0.2 | ${\theta _{0}}=0.2$, ${\theta _{A}}=0.4$ | Optimal | 77 | 25 | 27 | 6 | 10 | 18 | 0.64 | 0.83 | 0.73 | 57 |
0.2 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.5$ | Minimax | 92 | 27 | 38 | 9 | 14 | 29 | 0.56 | 0.75 | 0.66 | 68 |
0.2 | ${\theta _{0}}=0.3$, ${\theta _{A}}=0.5$ | Optimal | 92 | 27 | 38 | 9 | 14 | 29 | 0.56 | 0.75 | 0.66 | 68 |
0.2 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.6$ | Minimax | 97 | 38 | 21 | 18 | 22 | 33 | 0.78 | 0.90 | 0.84 | 80 |
0.2 | ${\theta _{0}}=0.4$, ${\theta _{A}}=0.6$ | Optimal | 98 | 32 | 34 | 15 | 20 | 36 | 0.71 | 0.72 | 0.74 | 74 |
0.2 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.7$ | Minimax | 94 | 33 | 28 | 19 | 23 | 40 | 0.76 | 0.85 | 0.80 | 72 |
0.2 | ${\theta _{0}}=0.5$, ${\theta _{A}}=0.7$ | Optimal | 95 | 27 | 41 | 15 | 20 | 44 | 0.63 | 0.66 | 0.64 | 69 |
6 Discussion
In the evolving landscape of anti-cancer treatments, with the emergence of high-response-rate treatments such as cell and immuno-oncology therapies, the need for adaptive trial designs that can dynamically explore dose-response relationships is growing. In this paper, we therefore propose a two-dose, two-stage design that extends the widely used two-stage design proposed by Simon in 1989, is better suited to modern anti-cancer treatments, and is more flexible in exploring the dose-response relationship in early-phase oncology development.
In the proposed method, the flexible decision rules allow the design to accommodate the evaluation of the risk-benefit profiles of two doses. This adaptability is a key strength of the proposed method, providing a framework that aligns with the comprehensive nature of early-phase oncology therapy development.
In the proposed design, we account for the potentially high response rates of modern oncology therapies by allowing early termination for efficacy in the first stage. Compared with Simon's original design, which allows early termination only for futility, this further reduces the sample size and better reflects the capabilities of recent therapies. Moreover, the decision rules of the proposed method make the design flexible under different therapeutic outcomes when identifying the superior dose that optimizes the balance between therapeutic benefits and safety risks. By rigorously controlling the type I error, the design also provides high confidence in the anti-tumor activity of the superior dose it identifies for later phases of development. The numerical examples also illustrate the flexibility of the design across a range of scenarios.
While our method centers on a single efficacy endpoint, ORR, we acknowledge that the evaluation of anti-tumor activity often encompasses a broader spectrum of metrics, such as overall and partial response, response durability, overall survival, and relevant biomarkers. At the design stage, however, it is important to focus on a single parameter, in contrast to the totality of evidence considered when evaluating a treatment.
One limitation of the proposed method is that it selects only one of the two doses evaluated in the first stage. When all available data show no clear difference between the two doses in their risk-benefit profiles, the lower dose is recommended to move forward to the next stage. In such situations, however, there may be reasons to expect higher efficacy from the higher dose, which would justify moving both doses to the next stage. Further research is needed to derive the optimal sample size for such designs while controlling the type I error and achieving adequate power.
Moreover, although quantitative safety targets are rarely set at the design stage, except for certain well-characterized endpoints such as dose-limiting toxicities, other safety data such as serious adverse events, the pharmacokinetic parameters, and a comprehensive evaluation of the frequency, severity, duration, and manageability of adverse events are also important considerations for dose selection. The benefit-risk ratio should also be assessed to ensure that any gain in efficacy from an increased dose does not come with a substantial loss in safety.