1 Introduction
Phase I clinical trials are first-in-human studies evaluating the toxicity profile of a new treatment. Given a series of candidate dose levels, the goal of a phase I trial in oncology is to determine the maximum tolerated dose (MTD), defined as the highest dose whose dose-limiting toxicity (DLT) probability is close to, or not higher than, a target toxicity probability, say ${p_{T}}=30\% $. In most dose-finding studies, a DLT is defined as the occurrence of a grade 3 or higher toxicity event according to the Common Terminology Criteria for Adverse Events (CTCAE) of the National Cancer Institute [1]. Toxicity events of lower grades are called moderate toxicities and are often not modeled in oncology dose-finding designs. Instead, most dose-finding designs consider DLT as the only endpoint, including the 3+3 design [2], the continual reassessment method [3], the interval-based designs such as mTPI and mTPI-2 [4, 5], the BOIN and keyboard designs [6, 7], and the i3+3 design [8].
Recently, the FDA Oncology Center of Excellence launched an initiative named “Project Optimus” (FDA, 2022) [9] to improve the dose optimization and dose selection paradigm in oncology drug development, because the conventional goal of identifying the MTD is no longer applicable to modern molecularly targeted agents and immunotherapies. Specifically, these modern agents do not necessarily exhibit monotone dose-response relationships, which may render the MTD a suboptimal dose for patient care. Moreover, emerging evidence shows that these new treatments often induce moderate adverse events rather than DLTs [10, 11]. As noted by Shah et al. (2021), it is important to evaluate the negative health impact of different adverse toxicity events, including both DLTs and lower-grade toxicity events [12]. For example, patients who experience a large number of moderate toxicity events may suffer a toxicity burden comparable to that of patients with a DLT but minimal moderate toxicity.
To address this challenge, several statistical approaches have been proposed that use a more comprehensive measure of a patient’s toxicity burden as the endpoint. Bekele and Thall (2004) first introduced the concept of total toxicity burden, the sum of the severity weights of all toxicities experienced by a patient [13]. Yuan et al. (2007) developed a quasi-likelihood CRM approach based on equivalent toxicity scores, which convert the toxicity grades into a single outcome [14]. Lee et al. (2009) proposed an alternative measure, the toxicity burden score (TBS), which is estimated by fitting linear mixed-effects models to historical data [15]. Later, Lee et al. (2011) developed the continual reassessment method with multiple constraints (CRM-MC), which allows the specification of various toxicity thresholds for a continuous or ordinal toxicity measure such as the TBS [16]. Van Meter et al. (2012) extended the CRM to incorporate toxicity severity using proportional odds models [17]. Ezzalfani et al. (2013) introduced the total toxicity profile, defined as the Euclidean norm of the weights of the toxicities experienced by a patient, to summarize the overall severity of multiple types and grades of toxicities [18]. More recently, Mu et al. (2019) proposed a generalized Bayesian optimal interval design (gBOIN) that extends the BOIN design to account for toxicity grades and binary or continuous toxicity endpoints under a unified framework [19]. All of the aforementioned methods rely on model-based or model-assisted inference for dose recommendation.
The concept of toxicity burden has also been applied beyond phase I dose-finding trials. Hobbs et al. (2015) [20] proposed a Bayesian group sequential design using the total toxicity burden and progression-free survival as co-primary endpoints. A randomized phase II trial (NCT01512589) using this design has been completed [21, 22].
Motivated by these early developments and by interval designs such as i3+3 [8], we propose a model-free design, called Ti3+3, that also uses toxicity burden to summarize patients’ toxicity profiles but adopts a simple dose-finding algorithm based on toxicity burden intervals (TBI). The use of TBI greatly simplifies the application of Ti3+3 in practice.
The remainder of the paper is organized as follows. In Section 2, we define the toxicity burden and describe the proposed Ti3+3 design in detail, including the dose-finding algorithm and the MTD selection criteria. In Section 3, we perform simulations to compare the performance of the proposed method with existing designs, and we present the simulation results as well as a sensitivity analysis in Section 4. We end the paper with a discussion in Section 5.
2 Methods
2.1 Toxicity Burden
To comprehensively evaluate the effects of toxicities on patients, Bekele and Thall (2004) [13] first proposed the toxicity burden as a weighted sum over individual toxicity types and grades, where the severity weights reflect the relative health impact of each grade and type of toxicity. The proposed Ti3+3 design uses type-specific and overall toxicity burdens similar to those of Bekele and Thall (2004), which are described next.
Let $d\in \{1,\dots ,T\}$ index a set of ascending doses explored in a phase I trial. Assume that multiple types of treatment-related toxicity, indexed by $j\in \{1,\dots ,J\}$, are observed in patients, and that each type of toxicity is classified into grades $k\in \{0,\dots ,K\}$ using a standard reference such as the CTCAE [1]. For example, neurotoxicity and GI toxicity are two different types of adverse events, each graded from 0 to 5, with grade 0 denoting no toxicity, grades 1-2 moderate toxicity, grades 3-4 severe toxicity, and grade 5 death. Grade 5 toxicities, corresponding to treatment-related death, require trial suspension and direct intervention from the safety committee and are therefore not considered in the proposed design.
Specifically, let ${Y_{ij}}\in \{0,\dots ,K\}$ denote the observed grade of type j toxicity for patient i, and let $\{{X_{i}}=d\}$ denote the event that patient i is treated at dose d. Denote by ${p_{djk}}=Pr({Y_{ij}}=k|{X_{i}}=d)$ the probability of grade k toxicity of type j at dose d; naturally, ${\textstyle\sum _{k=0}^{K}}{p_{djk}}=1$ and $0\le {p_{djk}}\le 1$. The proposed Ti3+3 design relies on a weight matrix $\boldsymbol{W}$ that is elicited through consultation with clinicians. Let $\boldsymbol{W}=\{{w_{jk}}\}$ be a standardized matrix of weights,
\[ \boldsymbol{W}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}{w_{10}}& \dots & {w_{1K}}\\ {} \vdots & \ddots & \vdots \\ {} {w_{J0}}& \dots & {w_{JK}}\end{array}\right),\]
where ${\textstyle\sum _{j,k}}{w_{jk}}=1$. Here ${w_{jk}}\ge 0$ quantifies the relative health impact that physicians assign to a toxicity event of type j and grade k. Denote the row sum by ${c_{j}}={\textstyle\sum _{k=0}^{K}}{w_{jk}}$. The magnitudes of the $\{{c_{j}}\}$ reflect the relative severity of different types of toxicity events; within each type j, the magnitudes of the $\{{w_{jk}}\}$ reflect the relative average severity of different grades of toxicity events. These interpretations are exploited in the definitions of the toxicity burdens. In Appendix A, we describe an algorithm to guide the elicitation of the $\boldsymbol{W}$ matrix by statisticians and clinicians. To reflect the belief that a higher grade of a given toxicity type has a greater impact on a patient’s health, we assume the monotonicity $0={w_{j0}}\lt \cdots \lt {w_{jK}}$ for every type j, which implies that the toxicity burden increases with toxicity grade.
Next, we define a type-specific toxicity burden for toxicity type j at dose d as
(2.1)
\[ T{B_{d}^{j}}={\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}{p_{djk}},\]
where $T{B_{d}^{j}}\in (0,1)$ is analogous to the toxicity probability in DLT-based dose-finding trials. To see this, note that $T{B_{d}^{j}}$ in (2.1) is a weighted sum of the type-specific probabilities of the different grades, with weight 1 for the highest grade K and weight $\frac{{w_{jk}}}{{w_{jK}}}$ for grade $k\lt K$. Since ${w_{jk}}\lt {w_{jK}}$ for $k\lt K$, we have $\frac{{w_{jk}}}{{w_{jK}}}\lt 1$. Therefore, (2.1) implies that $T{B_{d}^{j}}$ is a re-scaled probability of the highest-grade (grade K) toxicity of type j at dose d, in which each lower-grade toxicity is converted to the highest grade by the factor $\frac{{w_{jk}}}{{w_{jK}}}$. Note that $T{B_{d}^{j}}$ is a parameter. To estimate it, we consider the statistic ${\widehat{TB}_{id}^{j}}$, the observed type-specific toxicity burden for patient i treated at dose d, given by
(2.2)
\[ {\widehat{TB}_{id}^{j}}={\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}\boldsymbol{I}({Y_{ij}}=k)\boldsymbol{I}({X_{i}}=d),\]
where $\boldsymbol{I}(\cdot )$ is an indicator function. Here ${\widehat{TB}_{id}^{j}}$ is based on the observed data $\{{Y_{ij}},{X_{i}}\}$. Since a patient may experience multiple adverse events associated with multiple types and grades of toxicity, we use only the most severe grade of each type in defining the toxicity burdens. In other words, grade k in (2.2) is the most severe grade among all toxicity events of type j for patient i. In addition, for patient i,
\[ {\widehat{TB}_{id}}={\sum \limits_{j=1}^{J}}{c_{j}}{\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}\boldsymbol{I}({Y_{ij}}=k)\boldsymbol{I}({X_{i}}=d)\]
is the observed overall toxicity burden. Assuming that a total of ${n_{d}}$ patients have been treated at dose d, $T{B_{d}^{j}}$ can be estimated by
(2.3)
\[ {\widehat{TB}_{d}^{j}}=\frac{1}{{n_{d}}}{\sum \limits_{i=1}^{{n_{d}}}}{\widehat{TB}_{id}^{j}}.\]
It is straightforward to show that ${\widehat{TB}_{d}^{j}}$ is unbiased, i.e., $E({\widehat{TB}_{d}^{j}})=T{B_{d}^{j}}$, assuming the ${w_{jk}}$ are given and fixed.
Finally, given the type-specific toxicity burdens $T{B_{d}^{j}}$, the overall toxicity burden at dose d is defined as
(2.4)
\[ T{B_{d}}={\sum \limits_{j=1}^{J}}{c_{j}}T{B_{d}^{j}}={\sum \limits_{j=1}^{J}}{c_{j}}{\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}{p_{djk}},\]
where the constants ${c_{j}}={\textstyle\sum _{k=0}^{K}}{w_{jk}}$ reflect the relative severity of the toxicity types. Clearly, ${\textstyle\sum _{j}}{c_{j}}={\textstyle\sum _{j,k}}{w_{jk}}=1$. Similarly, the observed overall toxicity burden at dose d is calculated as
(2.5)
\[ {\widehat{TB}_{d}}=\frac{1}{{n_{d}}}{\sum \limits_{i=1}^{{n_{d}}}}{\widehat{TB}_{id}}={\sum \limits_{j=1}^{J}}{c_{j}}{\widehat{TB}_{d}^{j}}.\]
An interval dose-finding design like i3+3 needs to specify the target toxicity probability (of DLTs), ${p_{T}}$, and an equivalence interval ($EI$) to guide dose-finding decisions. Similarly, the proposed Ti3+3 design needs a target toxicity burden ($TTB$) and an associated $EI$ for the $TTB$. To start, we define a type-specific $TT{B^{j}}$ for toxicity type j. $TT{B^{j}}$ can be viewed as the toxicity target for the MTD if only toxicity events of type j are considered as the outcome; similarly, $E{I^{j}}$ is the equivalence interval for the MTD when only type j toxicity events are modeled. For example, with two types of toxicities ($J=2$), the type-specific targets could be set to $TT{B^{1}}=0.3$ and $TT{B^{2}}=0.25$. In addition, denote by $E{I^{j}}=(TT{B^{j}}-{\epsilon _{1}^{j}},TT{B^{j}}+{\epsilon _{2}^{j}})$, with ${\epsilon _{1}^{j}},{\epsilon _{2}^{j}}\gt 0$, the equivalence interval of $TT{B^{j}}$ for toxicity type j. The lower and upper bounds of $E{I^{j}}$ reflect the lowest and highest values of toxicity burden that clinicians would consider acceptable for the MTD if only type j events are considered. Just as the definition of a DLT is tailored case by case in conventional dose-finding trials, these values should be elicited with the clinical team based on the specific context of a trial. Given the specified $TT{B^{j}}$ and by analogy with (2.4), the overall target toxicity burden, denoted $TTB$, is defined as the weighted sum
(2.6)
\[ TTB={\sum \limits_{j=1}^{J}}{c_{j}}TT{B^{j}}.\]
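Similarly, the equivalence interval for the overall toxicity burden, $EI$, is the weighted average of the $E{I^{j}}$ with weights ${c_{j}}$, taken endpoint by endpoint. That is, $EI={\textstyle\sum _{j=1}^{J}}{c_{j}}E{I^{j}}=\big({\textstyle\sum _{j=1}^{J}}{c_{j}}(TT{B^{j}}-{\epsilon _{1}^{j}}),{\textstyle\sum _{j=1}^{J}}{c_{j}}(TT{B^{j}}+{\epsilon _{2}^{j}})\big)$.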
In summary, the main effort in applying the proposed Ti3+3 design lies in the initial setup, which requires specifying the weight matrix $\boldsymbol{W}$ and, for each toxicity type j, the target toxicity burden $TT{B^{j}}$ and the equivalence interval $E{I^{j}}$. Once these are determined, dose finding proceeds according to the simple algorithm described next.
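To make the definitions concrete, the following minimal Python sketch evaluates the type-specific burdens (2.1), the overall burden (2.4), and the $c_j$-weighted overall target and equivalence interval from a weight matrix. It assumes NumPy and reuses the two-type weight matrix of the Table 2 example; the grade probabilities and the $\epsilon$ margins are purely hypothetical, and the function names are illustrative rather than part of any published Ti3+3 software.

```python
import numpy as np

# Standardized weight matrix W: rows are toxicity types j, columns are grades k = 0..K.
# Values are the two-type weights used in the Table 2 example and Section 3.1.
W = np.array([
    [0.0, 0.03, 0.11, 0.17, 0.42],
    [0.0, 0.03, 0.03, 0.07, 0.14],
])

def type_weights(W):
    """c_j: row sums of W (they sum to 1 for a standardized W)."""
    return W.sum(axis=1)

def type_specific_tb(W, P):
    """Eq. (2.1): TB_d^j = sum_k (w_jk / w_jK) * p_djk, one value per type j.

    P has the same shape as W; row j holds (p_dj0, ..., p_djK) and sums to 1.
    """
    return (W / W[:, -1:] * P).sum(axis=1)

def overall_tb(W, P):
    """Eq. (2.4): TB_d = sum_j c_j * TB_d^j."""
    return float(type_weights(W) @ type_specific_tb(W, P))

def overall_target(W, ttb_j, eps1_j, eps2_j):
    """Overall TTB and EI as c_j-weighted averages of TTB^j and of the EI^j endpoints."""
    c = type_weights(W)
    ttb_j, eps1_j, eps2_j = map(np.asarray, (ttb_j, eps1_j, eps2_j))
    return float(c @ ttb_j), (float(c @ (ttb_j - eps1_j)), float(c @ (ttb_j + eps2_j)))

# Hypothetical grade probabilities at one dose (illustration only, not from the paper).
P = np.array([
    [0.50, 0.20, 0.15, 0.10, 0.05],
    [0.60, 0.20, 0.10, 0.07, 0.03],
])
print(type_specific_tb(W, P), overall_tb(W, P))

# Targets from the text's example (TTB^1 = 0.30, TTB^2 = 0.25); the 0.05 margins are hypothetical.
ttb, ei = overall_target(W, [0.30, 0.25], [0.05, 0.05], [0.05, 0.05])
print(ttb, ei)
```

With $c_1 = 0.73$ and $c_2 = 0.27$, the overall target is pulled toward the neuropathy target because that type carries most of the weight.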
2.2 Dose-Finding Algorithm
Assume that patients are enrolled sequentially in cohorts, starting with the lowest dose, and that the next cohort is not enrolled until toxicity outcomes have been observed for the current cohort. Suppose d is the current dose and ${n_{d}}$ patients have been treated at it. Ti3+3 extends the dose-finding algorithm of the i3+3 design to accommodate the toxicity burden endpoints. The dose-finding algorithm of Ti3+3 is first applied to each toxicity type j to generate a type-specific decision ${\mathcal{A}^{j}}\in \{\text{E},\text{S},\text{D}\}$, where “E”, “S”, and “D” denote “Escalation”, “Stay”, and “De-escalation”, respectively, and then to the overall toxicity burden to generate ${\mathcal{A}^{0}}\in \{\text{E},\text{S},\text{D}\}$. The next cohort of patients is assigned to the lowest of the dose levels indicated by the decisions ${\mathcal{A}^{j}}$ and ${\mathcal{A}^{0}}$.
Below we introduce the dose-finding algorithm.
Algorithm 1 requires computing a new quantity, ${\widehat{TB}_{d,-1}^{j}}$, defined as the hypothetical observed toxicity burden obtained by assuming that the patient at dose d with the lowest toxicity burden had experienced no toxicity at all. In other words, if we remove all toxicity events of that patient from the data at dose d, while keeping ${n_{d}}$ unchanged, ${\widehat{TB}_{d,-1}^{j}}$ is the resulting observed toxicity burden for type j. This idea mirrors the i3+3 design; the difference is that i3+3 considers only DLTs, whereas Ti3+3 considers different grades and types of toxicity.
Table 1 summarizes the decisions of Ti3+3. The type-specific decision ${\mathcal{B}^{j}}$ (${\mathcal{A}^{j}}$) is listed for each toxicity type j (or for the overall burden if $j=0$), and the final decision is to assign the next cohort to dose $(d+{\min _{j}}\{{\mathcal{B}^{j}}\})$. As in i3+3, Ti3+3 has a special rule. When ${\widehat{TB}_{d}^{j}}\gt E{I^{j}}$ and ${\widehat{TB}_{d,-1}^{j}}\lt E{I^{j}}$, removing the toxicity events of a single patient moves the observed toxicity burden from above the equivalence interval to below it. In other words, changing one patient’s worth of data would reverse the decision from de-escalation (since ${\widehat{TB}_{d}^{j}}\gt E{I^{j}}$) to escalation (since ${\widehat{TB}_{d,-1}^{j}}\lt E{I^{j}}$). This implies that the observed data are sparse, because a small change in the data reverses the decision. Therefore, in this case Ti3+3 does not de-escalate, owing to a lack of confidence in the data, and instead continues to treat patients at the current dose, i.e., “S” (stay). The other rules in Table 1 are straightforward: de-escalate if the observed toxicity burden is above the $EI$, stay if it is inside the $EI$, and escalate if it is below the $EI$.
Table 1
The dose-finding algorithm of the Ti3+3 design.
Condition | Decision ${\mathcal{B}^{j}}$ (${\mathcal{A}^{j}}$) |
${\widehat{TB}_{d}^{j}}\lt E{I^{j}}$ | $1\hspace{2.5pt}({E^{\ast }})$ |
${\widehat{TB}_{d}^{j}}\in E{I^{j}}$ | $0\hspace{2.5pt}(S)$ |
${\widehat{TB}_{d}^{j}}\gt E{I^{j}}$ & ${\widehat{TB}_{d,-1}^{j}}\lt E{I^{j}}$ | $0\hspace{2.5pt}(S)$ |
${\widehat{TB}_{d}^{j}}\gt E{I^{j}}$ & ${\widehat{TB}_{d,-1}^{j}}\in E{I^{j}}$ | $-1\hspace{2.5pt}({D^{\ast }})$ |
${\widehat{TB}_{d}^{j}}\gt E{I^{j}}$ & ${\widehat{TB}_{d,-1}^{j}}\gt E{I^{j}}$ | $-1\hspace{2.5pt}({D^{\ast }})$ |
Final Decision: | $d+{\min _{j}}\{{\mathcal{B}^{j}}\}$ |
∗: when d is the highest dose ($d=T$) or the lowest dose ($d=1$), the decisions D and E should be replaced by S accordingly.
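The rules of Table 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and treating a burden that lands exactly on an $EI$ bound as inside the interval is an assumption, since Table 1 does not specify boundary handling.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]

def tb_minus_one(patient_tb: Sequence[float]) -> float:
    """TB_{d,-1}: treat the lowest-burden patient as toxicity-free, keeping n_d in the denominator."""
    return (sum(patient_tb) - min(patient_tb)) / len(patient_tb)

def type_decision(tb_hat: float, tb_minus1: float, ei: Interval) -> int:
    """Decision B^j from Table 1: +1 = E, 0 = S, -1 = D.

    "< EI" means below the lower bound and "> EI" above the upper bound;
    values on a boundary are treated as inside EI (an assumption).
    """
    lo, hi = ei
    if tb_hat < lo:
        return 1        # escalate
    if tb_hat <= hi:
        return 0        # stay: estimate inside EI
    if tb_minus1 < lo:
        return 0        # special rule: one patient's data would flip D into E, so stay
    return -1           # de-escalate (TB_{d,-1} inside or above EI)

def next_dose(d: int, n_doses: int, decisions: Sequence[int]) -> int:
    """Final decision d + min_j B^j over j = 0..J, kept within 1..n_doses (footnote * of Table 1)."""
    return min(max(d + min(decisions), 1), n_doses)

# Hypothetical decisions at dose 3 of 5: overall burden, type 1, type 2.
b = [type_decision(0.34, 0.23, (0.25, 0.33)),   # special rule -> S
     type_decision(0.20, 0.10, (0.25, 0.35)),   # below EI -> E
     type_decision(0.40, 0.30, (0.25, 0.35))]   # above EI, TB_{d,-1} inside -> D
print(b, next_dose(3, 5, b))  # [0, 1, -1] 2
```

Taking the minimum over the type-specific and overall decisions lets the most conservative signal win, and clipping the result to the available dose range reproduces the footnote of Table 1.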
In Table 2, we provide three examples to illustrate the proposed algorithm. Suppose there are two types of toxicities, and a pre-determined standardized weight matrix is given below
\[ \boldsymbol{W}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}0& 0.03& 0.11& 0.17& 0.42\\ {} 0& 0.03& 0.03& 0.07& 0.14\end{array}\right).\]
Moreover, assume $TT{B^{1}}=TT{B^{2}}=TTB=0.30$ and $E{I^{1}}=E{I^{2}}=EI=(0.25,0.33)$. Based on the observed toxicity types and grades, ${\widehat{TB}_{id}}$ and ${\widehat{TB}_{d}}$ can be calculated from (2.5). In case 1, ${n_{d}}=3$ patients have been treated at the current dose d, and ${\widehat{TB}_{d}}=0.34$, which is greater than the upper bound 0.33. However, since ${\widehat{TB}_{d,-1}}=0.24$ falls below the $EI$, the Ti3+3 decision is ${\mathcal{A}^{0}}=\text{S}$, i.e., stay at the current dose. Therefore, even though ${\widehat{TB}_{d}}=0.34$ is above the $EI$, more patients are enrolled at the same dose d because ${\widehat{TB}_{d,-1}}$ is below the $EI$; only three patients have been treated at dose d, so the data are sparse. In case 2, ${n_{d}}=5$ and ${\widehat{TB}_{d}}=0.34$, again greater than 0.33, but ${\widehat{TB}_{d,-1}}=0.28$ falls inside the $EI$; therefore, ${\mathcal{A}^{0}}=\text{D}$, i.e., de-escalate to the next lower dose $(d-1)$. In case 3, ${\widehat{TB}_{d}}$ and ${\widehat{TB}_{d,-1}}$ are equal and below the $EI$, so the decision is ${\mathcal{A}^{0}}=\text{E}$, i.e., escalate to dose $(d+1)$. These examples demonstrate the simplicity of the decision rules based on the overall toxicity burden; Algorithm 1 applies the same rules to each toxicity type j.
Table 2
Three hypothetical cases illustrating the dose-finding decisions of the Ti3+3 design for a trial with two toxicity types and five grades. Notation: 1) Tox data: $({k_{1}},{k_{2}})$ if the patient experiences grade ${k_{1}}$ of the first type of toxicity and grade ${k_{2}}$ of the second type of toxicity; 2) ${\widehat{TB}_{id}}$: observed toxicity burden for patient i treated at dose d; 3) ${\widehat{TB}_{d}}$: observed toxicity burden for the current dose d; 4) ${\widehat{TB}_{d,-1}}$: ${\widehat{TB}_{d}}$ calculated assuming the patient with the lowest ${\widehat{TB}_{id}}$ experienced no toxicity. Suppose $TTB=0.30$ and the $EI$ is $(0.25,0.33)$. The patient with the lowest ${\widehat{TB}_{id}}$ in each case is bolded.
Case # | Patient # | Tox data | ${\widehat{TB}_{id}}$ | ${\widehat{TB}_{d}}$ | ${\widehat{TB}_{d,-1}}$ | ${\mathcal{A}^{0}}$ |
1 | $\mathbf{1}$ | $\mathbf{(2,3)}$ | $\mathbf{0.33}$ | $0.34\gt EI$ | $0.23\lt EI$ | S |
1 | 2 | (3, 2) | 0.35 | | | |
1 | 3 | (3, 2) | 0.35 | | | |
2 | 1 | (2, 2) | 0.25 | $0.34\gt EI$ | $0.28\in EI$ | D |
2 | $\mathbf{2}$ | $\mathbf{(2,1)}$ | $\mathbf{0.25}$ | | | |
2 | 3 | (2, 3) | 0.33 | | | |
2 | 4 | (3, 0) | 0.30 | | | |
2 | 5 | (3, 4) | 0.56 | | | |
3 | $\mathbf{1}$ | $\mathbf{(0,0)}$ | $\mathbf{0.00}$ | $0.07\lt EI$ | $0.07\lt EI$ | E |
3 | 2 | (1, 1) | 0.11 | | | |
3 | 3 | (1, 2) | 0.11 | | | |
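The three cases in Table 2 can be reproduced with the following self-contained Python sketch (assuming NumPy), which computes ${\widehat{TB}_{id}}$, ${\widehat{TB}_{d}}$, and ${\widehat{TB}_{d,-1}}$ from the tox data and applies the overall-burden rules of Table 1. The helper names are hypothetical. Because the sketch carries full precision rather than two-decimal values, its burdens agree with Table 2 only to within 0.01, but the decisions are the same.

```python
import numpy as np

# Standardized weights from the Table 2 example (rows: toxicity types, columns: grades 0-4).
W = np.array([[0.0, 0.03, 0.11, 0.17, 0.42],
              [0.0, 0.03, 0.03, 0.07, 0.14]])
C = W.sum(axis=1)          # c_j = row sums (0.73 and 0.27)
SCALED = W / W[:, -1:]     # w_jk / w_jK
EI = (0.25, 0.33)

def patient_tb(grades):
    """Observed overall burden TB_id from the worst grade of each toxicity type."""
    return float(sum(C[j] * SCALED[j, k] for j, k in enumerate(grades)))

def summarize(cohort):
    """Return (TB_d, TB_{d,-1}) for the patients treated at the current dose."""
    tbs = [patient_tb(g) for g in cohort]
    n = len(tbs)
    return sum(tbs) / n, (sum(tbs) - min(tbs)) / n

def decide(tb_d, tb_d_minus1, ei=EI):
    """Overall-burden decision A^0 following Table 1."""
    lo, hi = ei
    if tb_d < lo:
        return "E"
    if tb_d <= hi:
        return "S"
    return "S" if tb_d_minus1 < lo else "D"   # special rule, else de-escalate

CASES = {
    1: [(2, 3), (3, 2), (3, 2)],
    2: [(2, 2), (2, 1), (2, 3), (3, 0), (3, 4)],
    3: [(0, 0), (1, 1), (1, 2)],
}

for case, cohort in CASES.items():
    tb_d, tb_m1 = summarize(cohort)
    print(f"case {case}: TB_d={tb_d:.3f}  TB_d,-1={tb_m1:.3f}  decision={decide(tb_d, tb_m1)}")
```

Running it prints the decisions S, D, and E for cases 1, 2, and 3, matching the last column of Table 2.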
In addition, the Ti3+3 design includes two safety rules to address practical and ethical concerns. Like the dose-finding rules, they are applied to each type-specific toxicity burden and to the overall toxicity burden.
• Safety rule 1 (early termination): At any point during the trial, if ${n_{1}}\ge 3$ and either $Pr(T{B_{1}^{j}}\gt TT{B^{j}}|\text{data})\gt 0.95$ for some j or $Pr(T{B_{1}}\gt TTB|\text{data})\gt 0.95$, terminate the trial due to excessive toxicity. This rule stops the trial whenever the lowest dose $(d=1)$ is deemed overly toxic.
• Safety rule 2 (dose exclusion): At any point during the trial, suppose the current dose is d. If ${n_{d}}\ge 3$ and either $Pr(T{B_{d}^{j}}\gt TT{B^{j}}|\text{data})\gt 0.95$ for some j or $Pr(T{B_{d}}\gt TTB|\text{data})\gt 0.95$, remove dose d and all higher doses from the trial. In other words, if a sufficient number (${n_{d}}\ge 3$) of patients have been treated at dose d and their outcomes suggest that dose d is overly toxic, dose d and higher doses are excluded from the trial. Any future escalation to dose d will be changed to “S”, stay.
The posterior probabilities $Pr(T{B_{d}^{j}}\gt TT{B^{j}}|\text{data})$ and $Pr(T{B_{d}}\gt TTB|\text{data})$ are computed under the working model described in Section 2.3.
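Under the Dirichlet-multinomial working model of Section 2.3, the probabilities in the safety rules can be approximated by Monte Carlo as in (2.10) and (2.11). The Python sketch below assumes NumPy's `Generator.dirichlet`, the prior $\alpha_k = 0.1$ from Section 2.3, and an arbitrary number of draws and random seed; the function names and the two example cohorts are hypothetical.

```python
import numpy as np

def exceedance_probs(W, counts, ttb_j, ttb, alpha=0.1, n_draws=20_000, seed=1):
    """Monte Carlo versions of (2.10) and (2.11) at a single dose d.

    counts[j][k] = y_djk, the number of patients at dose d whose worst type-j grade is k.
    Each type has an independent Dirichlet(y_dj + alpha) posterior, as in (2.9).
    Returns (array of Pr(TB_d^j > TTB^j | data), Pr(TB_d > TTB | data)).
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    counts = np.asarray(counts, dtype=float)
    scaled = W / W[:, -1:]                   # w_jk / w_jK
    c = W.sum(axis=1)                        # c_j
    tb_draws = np.column_stack([
        rng.dirichlet(counts[j] + alpha, size=n_draws) @ scaled[j]   # draws of TB_d^j
        for j in range(W.shape[0])
    ])                                       # shape (n_draws, J)
    overall = tb_draws @ c                   # draws of TB_d
    return (tb_draws > np.asarray(ttb_j)).mean(axis=0), float((overall > ttb).mean())

def dose_too_toxic(W, counts, ttb_j, ttb, cutoff=0.95):
    """Condition shared by safety rules 1 and 2: n_d >= 3 and some probability > cutoff."""
    n_d = int(np.asarray(counts)[0].sum())
    if n_d < 3:
        return False
    p_j, p_all = exceedance_probs(W, counts, ttb_j, ttb)
    return bool((p_j > cutoff).any() or p_all > cutoff)

W = [[0.0, 0.03, 0.11, 0.17, 0.42],
     [0.0, 0.03, 0.03, 0.07, 0.14]]
heavy = [[0, 0, 0, 0, 3], [0, 0, 0, 0, 3]]   # 3 patients, grade 4 of both types
mild = [[3, 0, 0, 0, 0], [3, 0, 0, 0, 0]]    # 3 patients, no toxicity
print(dose_too_toxic(W, heavy, [0.30, 0.30], 0.30),
      dose_too_toxic(W, mild, [0.30, 0.30], 0.30))
```

Applied at $d=1$ this check triggers safety rule 1; applied at the current dose it triggers the exclusion in safety rule 2.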
2.3 A Working Model and MTD Selection
Once all patients have completed follow-up at the end of a trial, the Ti3+3 design first selects the type-specific MTDs, denoted ${d_{j}^{\ast }}$, and the MTD based on the overall toxicity burden, denoted ${d_{0}^{\ast }}$.
Next, we propose a working statistical model to calculate the posterior probabilities $Pr(T{B_{d}^{j}}\gt TT{B^{j}}|\text{data})$ and $Pr(T{B_{d}}\gt TTB|\text{data})$ and to select the MTD based on the observed data. Recall that ${p_{djk}}$ denotes the probability of toxicity grade k for type j at dose d. For a given dose d, assume that different types of toxicities are independent. Let ${\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}=\{{p_{dj0}},\dots ,{p_{djK}}\}$ be the vector of probabilities of the different toxicity grades for type j at dose d, and let ${\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}}=\{{y_{dj0}},\dots ,{y_{djK}}\}$ be the vector of patient counts across the grades, i.e., ${y_{djk}}={\textstyle\sum _{i}}\boldsymbol{I}({Y_{ij}}=k)\boldsymbol{I}({X_{i}}=d)$. Then ${\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}}$ is assumed to follow the multinomial sampling distribution
(2.7)
\[ {\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}}|{\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}\sim \text{Multinomial}({\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}},{n_{d}}),\]
where for any j, ${\textstyle\sum _{k=0}^{K}}{p_{djk}}=1$, ${p_{djk}}\in (0,1)$, and ${\textstyle\sum _{k=0}^{K}}{y_{djk}}={n_{d}}$. Assume a conjugate Dirichlet prior distribution for ${\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}$, i.e.,
(2.8)
\[ {\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}\sim \text{Dirichlet}(\boldsymbol{\alpha }),\]
where $\boldsymbol{\alpha }=({\alpha _{0}},\dots ,{\alpha _{K}})$ are positive values. We set ${\alpha _{0}}=\cdots ={\alpha _{K}}=0.1$ for all toxicity types and doses. Following Morita et al. (2011) [23], since ${\textstyle\sum _{k=0}^{4}}{\alpha _{k}}=0.5$ (with $K=4$) is small, the Dirichlet prior in (2.8) is deemed vague and has little impact on the posterior distribution
(2.9)
\[ {\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}|{\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}}\sim \text{Dirichlet}({y_{dj0}}+{\alpha _{0}},\dots ,{y_{djK}}+{\alpha _{K}}).\]
Therefore, the posterior distributions of $T{B_{d}^{j}}$ and $T{B_{d}}$ can be computed numerically by sampling from the posterior in (2.9), denoted ${f_{j}}({\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}\mathbf{|}{\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}})$. The posterior probabilities used in the safety rules are then approximated by
(2.10)
\[ Pr(T{B_{d}^{j}}\gt TT{B^{j}}|\text{data})\approx \frac{1}{S}{\sum \limits_{s=1}^{S}}\boldsymbol{I}\Big({\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}{p_{djk}^{(s)}}\gt TT{B^{j}}\Big)\]
and
(2.11)
\[ Pr(T{B_{d}}\gt TTB|\text{data})\approx \frac{1}{S}{\sum \limits_{s=1}^{S}}\boldsymbol{I}\Big({\sum \limits_{j=1}^{J}}{c_{j}}{\sum \limits_{k=0}^{K}}\frac{{w_{jk}}}{{w_{jK}}}{p_{djk}^{(s)}}\gt TTB\Big),\]
where ${p_{djk}^{(s)}}$ is the s-th of S random draws from ${f_{j}}({\boldsymbol{p}_{\boldsymbol{d}\boldsymbol{j}}}\mathbf{|}{\boldsymbol{y}_{\boldsymbol{d}\boldsymbol{j}}})$.
To impose the monotonicity assumption on the dose-toxicity relationship, we apply isotonic regression to the posterior means of $T{B_{d}^{j}}$ and $T{B_{d}}$ via the pool adjacent violators algorithm [24]. Let ${\widetilde{TB}_{d}^{j}}$ and ${\widetilde{TB}_{d}}$ be the isotonically transformed posterior means for all dose levels. Among all tried doses (${n_{d}}\gt 0$) that satisfy the safety rules, the estimated MTDs based on the type-specific and overall toxicity burdens are defined as
If more than one dose qualifies as ${d_{j}^{\ast }}$, a single dose is selected according to the following rules:
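The isotonic step can be illustrated with a weighted pool-adjacent-violators (PAVA) sketch in Python. Two parts are assumptions rather than the paper's specification: PAVA is applied only to tried doses with weights $n_d$, and the function `select_dose` picks the dose whose isotonic estimate is closest to the target, breaking ties toward the higher dose below the target and toward the lower dose otherwise (a convention borrowed from other interval designs). Treat `select_dose` as a hypothetical stand-in for the Ti3+3 selection rules.

```python
from typing import List, Optional, Sequence

def pava(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted pool-adjacent-violators algorithm: the nondecreasing sequence
    closest to `values` in weighted least squares."""
    blocks = []  # each block: [pooled mean, pooled weight, number of doses pooled]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool backwards while an earlier block has a larger mean than the last one.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, k1 + k2])
    fitted: List[float] = []
    for mean, _, k in blocks:
        fitted.extend([mean] * k)
    return fitted

def select_dose(post_means: Sequence[float], n: Sequence[int], target: float,
                admissible: Sequence[bool]) -> Optional[int]:
    """Hypothetical rule: among tried (n_d > 0), admissible doses, return the 1-based
    dose whose isotonic posterior mean is closest to `target`."""
    tried = [i for i, (k, ok) in enumerate(zip(n, admissible)) if k > 0 and ok]
    if not tried:
        return None
    fitted = pava([post_means[i] for i in tried], [n[i] for i in tried])

    def key(t: int):
        # Ties: prefer the higher dose below target, the lower dose otherwise.
        tie = -tried[t] if fitted[t] < target else tried[t]
        return (abs(fitted[t] - target), tie)

    return tried[min(range(len(tried)), key=key)] + 1

means = [0.12, 0.25, 0.22, 0.41, 0.05]   # hypothetical posterior means of TB_d
n = [3, 6, 9, 6, 0]                      # dose 5 untried
print(pava(means[:4], n[:4]), select_dose(means, n, 0.30, [True] * 5))
```

In this example PAVA pools doses 2 and 3 into a common estimate of about 0.232, and the tie rule then favors dose 3.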
3 Simulations
3.1 Comparison with CRM-MC
We simulate clinical trials to assess the operating characteristics of the Ti3+3 design, first comparing it with the CRM-MC method of Lee et al. (2011) [16]. The Ti3+3 design requires close collaboration between statisticians and clinicians to define the numerical weights $\boldsymbol{W}$ and the $EI$s at the study design stage. For simulation purposes, we adopt the general setting of the bortezomib trial described in Lee et al. (2011) [16]. In the bortezomib trial, two main types $(J=2)$ of toxicities were identified as related to the treatment: neuropathy $(j=1)$ and low platelet count $(j=2)$. The standardized weight matrix $\boldsymbol{W}$, provided in Lee et al. (2011), is
\[ \boldsymbol{W}=\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c@{\hskip10.0pt}c}0& 0.03& 0.11& 0.17& 0.42\\ {} 0& 0.03& 0.03& 0.07& 0.14\end{array}\right).\]
The first and second rows correspond to the weights for grades 0 to 4 of neuropathy and low platelet count, respectively. A DLT is defined as the occurrence of a grade 3 or 4 neuropathy or a grade 4 low platelet count. Given this matrix and the definition of $TB$, a patient has ${\widehat{TB}_{id}}\approx 0.30$ for a grade 3 neuropathy alone and ${\widehat{TB}_{id}}=0.27$ for a grade 4 low platelet count alone. A patient experiencing a grade 3 neuropathy (${\widehat{TB}_{id}}\approx 0.30$) thus has a burden similar to that of a patient experiencing a grade 2 neuropathy plus a grade 3 low platelet count (${\widehat{TB}_{id}}\approx 0.33$). These severity weights reflect the relative clinical importance of the different toxicity types and grades, with lower weights given to less severe toxicities. In addition, ${c_{1}}={\textstyle\sum _{k=0}^{4}}{w_{1k}}=0.73$ and ${c_{2}}={\textstyle\sum _{k=0}^{4}}{w_{2k}}=0.27$, which implies that, overall, neuropathy is a much more severe type of toxicity than low platelet count.
To implement the Ti3+3 design, $TT{B^{1}}$, $TT{B^{2}}$, and $TTB$ are set at 0.30, which suggests (by comparison with $\boldsymbol{W}$) that a grade 3 or higher neuropathy or a grade 4 low platelet count is considered clinically unacceptable. Moreover, the equivalence intervals are $E{I^{1}}=E{I^{2}}=EI=(0.25,0.35)$. For the CRM-MC method, the toxicity burden of patient i is calculated as ${T_{i}}=0.03\boldsymbol{I}({Y_{i1}}=1)+0.11\boldsymbol{I}({Y_{i1}}=2)+0.17\boldsymbol{I}({Y_{i1}}=3)+0.42\boldsymbol{I}({Y_{i1}}=4)+0.03\boldsymbol{I}({Y_{i2}}\in \{1,2\})+0.07\boldsymbol{I}({Y_{i2}}=3)+0.14\boldsymbol{I}({Y_{i2}}=4)$, where ${Y_{i1}}$ and ${Y_{i2}}$ are the patient’s grades of toxicity types 1 and 2, respectively. CRM-MC applies the primary and secondary constraints $Pr({T_{i}}\ge 0.25|d)\le 0.10$ and $Pr({T_{i}}\ge 0.17|d)\le 0.25$, respectively, to choose the dose for the next patient.
The vector of scaled doses is obtained via backward substitution with dose 3 as the prior guess of MTD, 0.08 as the indifference interval parameter, 0.69 as the prior median of the slope parameter, and a probit model with an intercept equal to 3. The prior distributions of the probit model slope parameter β and thresholds $({\gamma _{l}}-{\gamma _{l-1}})$ follow independent exponential distributions with mean 1, as suggested by the authors. Refer to Lee et al. (2011) for more details.
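The per-patient numbers quoted for the bortezomib weights can be checked with a short Python sketch that contrasts the Ti3+3 burden ${\widehat{TB}_{id}}$ with the CRM-MC score $T_i$ for the same grades. It uses only the weight matrix above; the function names are hypothetical.

```python
# Bortezomib weights (rows: neuropathy, low platelet count; columns: grades 0-4).
W = [[0.0, 0.03, 0.11, 0.17, 0.42],
     [0.0, 0.03, 0.03, 0.07, 0.14]]
C = [sum(row) for row in W]          # c_1 = 0.73, c_2 = 0.27

def ti33_burden(grades):
    """Ti3+3 observed burden: sum_j c_j * w_{j,k_j} / w_{j,K}."""
    return sum(C[j] * W[j][k] / W[j][-1] for j, k in enumerate(grades))

def crm_mc_score(grades):
    """CRM-MC score T_i: the raw weights w_{j,k_j} summed over toxicity types."""
    return sum(W[j][k] for j, k in enumerate(grades))

# (neuropathy grade, platelet grade) for three example patients.
for grades in [(3, 0), (0, 4), (2, 3)]:
    t = crm_mc_score(grades)
    print(grades, round(ti33_burden(grades), 3), round(t, 2),
          "T_i>=0.25:", t >= 0.25, "T_i>=0.17:", t >= 0.17)
```

The comparison shows the difference in scale: CRM-MC sums the raw weights, whereas Ti3+3 rescales each type by $w_{jK}$ and then weights it by $c_j$, so a grade 3 neuropathy scores 0.17 under CRM-MC but about 0.30 under Ti3+3.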
Following the original bortezomib trial, Lee et al. (2011) implemented a simulation with a sample size of 18, a cohort size of 1, and a starting dose at level 3. We instead consider a more common simulation setup and apply the following parameters to both designs. The sample size is fixed at 30, enrolled in 10 cohorts. Five dose levels are investigated, and both designs, Ti3+3 and CRM-MC, start at the lowest dose level. We construct eight scenarios, consisting of the first four scenarios from Lee et al. (2011) and four additional scenarios. Under Ti3+3, the true MTD is defined as the highest dose whose toxicity burdens fall below or inside the corresponding equivalence intervals for all type-specific and overall toxicity burdens, that is, the highest dose with $T{B_{d}^{j}}\le 0.35$ for $j=1,2$ and $T{B_{d}}\le 0.35$. In contrast, the CRM-MC design defines the MTD as the highest dose that satisfies the primary and secondary toxicity constraints. Therefore, CRM-MC and Ti3+3 sometimes regard different doses as the true MTD in the same scenario; for example, in scenario 1, Ti3+3 considers dose level 3 the true MTD, whereas under CRM-MC dose level 2 is the true MTD. A full description of all dosing scenarios is provided in Appendix B. For each scenario, we simulate 1,000 trials.
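The Ti3+3 definition of the true MTD is simple to encode. The Python sketch below (with a hypothetical function name) applies it to the true burdens of scenarios 1 and 6 in Table 3, using the upper bound 0.35 of the equivalence interval.

```python
from typing import Optional, Sequence

def true_mtd(tb_by_type: Sequence[Sequence[float]], tb_overall: Sequence[float],
             upper: float = 0.35) -> Optional[int]:
    """Ti3+3 true MTD: highest dose level (1-based) whose TB_d^j (every j) and TB_d
    are all at or below the EI upper bound; None if no dose qualifies."""
    mtd = None
    for d, tb in enumerate(tb_overall):
        if tb <= upper and all(tbj[d] <= upper for tbj in tb_by_type):
            mtd = d + 1
    return mtd

# True toxicity burdens (TB^1, TB^2, TB) for scenarios 1 and 6 of Table 3.
scenario1 = ([[0.10, 0.21, 0.28, 0.34, 0.42], [0.11, 0.23, 0.34, 0.46, 0.59]],
             [0.11, 0.22, 0.30, 0.37, 0.46])
scenario6 = ([[0.13, 0.23, 0.30, 0.34, 0.45], [0.11, 0.21, 0.43, 0.44, 0.79]],
             [0.13, 0.22, 0.33, 0.37, 0.54])
print(true_mtd(*scenario1), true_mtd(*scenario6))  # 3 2
```

In scenario 6, dose 3 is excluded even though its overall burden (0.33) is inside the interval, because its low platelet count burden (0.43) is not.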
Table 3
Performance of the Ti3+3 design compared with CRM-MC. The $TT{B^{1}}=TT{B^{2}}=TTB$ is 0.30, and the $E{I^{1}}=E{I^{2}}=EI$ is $(0.25,0.35)$. The MTD in each scenario is in bold.
Selection % | Allocation % | ||||||||||
Dose level | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
Scenario 1 | |||||||||||
$T{B^{1}}$ | 0.10 | 0.21 | 0.28 | 0.34 | 0.42 | ||||||
$T{B^{2}}$ | 0.11 | 0.23 | 0.34 | 0.46 | 0.59 | ||||||
TB | 0.11 | 0.22 | 0.30 | 0.37 | 0.46 | ||||||
CRM-MC | 0.13 | 0.71 | 0.15 | 0.01 | 0.00 | 0.25 | 0.53 | 0.18 | 0.03 | 0.00 | |
Ti3+3 | 0.05 | 0.58 | 0.35 | 0.02 | 0.00 | 0.23 | 0.49 | 0.24 | 0.04 | 0.00 | |
Scenario 2 | |||||||||||
$T{B^{1}}$ | 0.10 | 0.11 | 0.21 | 0.29 | 0.39 | ||||||
$T{B^{2}}$ | 0.08 | 0.11 | 0.23 | 0.46 | 0.57 | ||||||
TB | 0.10 | 0.11 | 0.22 | 0.34 | 0.44 | ||||||
CRM-MC | 0.00 | 0.15 | 0.74 | 0.10 | 0.00 | 0.14 | 0.24 | 0.45 | 0.15 | 0.02 | |
Ti3+3 | 0.00 | 0.02 | 0.80 | 0.18 | 0.00 | 0.11 | 0.20 | 0.48 | 0.19 | 0.01 | |
Scenario 3 | |||||||||||
$T{B^{1}}$ | 0.11 | 0.10 | 0.12 | 0.21 | 0.29 | ||||||
$T{B^{2}}$ | 0.08 | 0.11 | 0.16 | 0.23 | 0.46 | ||||||
TB | 0.10 | 0.11 | 0.13 | 0.22 | 0.34 | ||||||
CRM-MC | 0.00 | 0.04 | 0.29 | 0.58 | 0.09 | 0.14 | 0.16 | 0.24 | 0.34 | 0.12 | |
Ti3+3 | 0.00 | 0.00 | 0.07 | 0.77 | 0.16 | 0.12 | 0.14 | 0.23 | 0.38 | 0.14 | |
Scenario 4 | |||||||||||
$T{B^{1}}$ | 0.10 | 0.11 | 0.12 | 0.14 | 0.21 | ||||||
$T{B^{2}}$ | 0.05 | 0.08 | 0.16 | 0.21 | 0.23 | ||||||
TB | 0.09 | 0.10 | 0.13 | 0.16 | 0.22 | ||||||
CRM-MC | 0.00 | 0.03 | 0.15 | 0.32 | 0.50 | 0.13 | 0.16 | 0.20 | 0.23 | 0.29 | |
Ti3+3 | 0.00 | 0.00 | 0.05 | 0.21 | 0.74 | 0.11 | 0.13 | 0.19 | 0.24 | 0.33 | |
Scenario 5 | |||||||||||
$T{B^{1}}$ | 0.22 | 0.32 | 0.36 | 0.48 | 0.52 | ||||||
$T{B^{2}}$ | 0.17 | 0.25 | 0.45 | 0.52 | 0.60 | ||||||
TB | 0.21 | 0.30 | 0.39 | 0.49 | 0.54 | ||||||
CRM-MC | 0.73 | 0.26 | 0.01 | 0.00 | 0.00 | 0.64 | 0.28 | 0.06 | 0.01 | 0.00 | |
Ti3+3 | 0.38 | 0.57 | 0.05 | 0.00 | 0.00 | 0.52 | 0.41 | 0.07 | 0.01 | 0.00 | |
Scenario 6 | |||||||||||
$T{B^{1}}$ | 0.13 | 0.23 | 0.30 | 0.34 | 0.45 | ||||||
$T{B^{2}}$ | 0.11 | 0.21 | 0.43 | 0.44 | 0.79 | ||||||
TB | 0.13 | 0.22 | 0.33 | 0.37 | 0.54 | ||||||
CRM-MC | 0.24 | 0.66 | 0.10 | 0.00 | 0.00 | 0.33 | 0.47 | 0.15 | 0.04 | 0.01 | |
Ti3+3 | 0.02 | 0.73 | 0.25 | 0.01 | 0.00 | 0.25 | 0.52 | 0.21 | 0.01 | 0.00 | |
Scenario 7 | |||||||||||
$T{B^{1}}$ | 0.10 | 0.12 | 0.18 | 0.31 | 0.48 | ||||||
$T{B^{2}}$ | 0.18 | 0.28 | 0.29 | 0.58 | 0.73 | ||||||
TB | 0.12 | 0.16 | 0.21 | 0.38 | 0.55 | ||||||
CRM-MC | 0.04 | 0.32 | 0.60 | 0.04 | 0.00 | 0.14 | 0.31 | 0.46 | 0.08 | 0.01 | |
Ti3+3 | 0.07 | 0.45 | 0.46 | 0.02 | 0.00 | 0.28 | 0.39 | 0.27 | 0.06 | 0.00 | |
Scenario 8 | |||||||||||
$T{B^{1}}$ | 0.04 | 0.06 | 0.14 | 0.25 | 0.29 | ||||||
$T{B^{2}}$ | 0.10 | 0.12 | 0.16 | 0.17 | 0.44 | ||||||
TB | 0.06 | 0.08 | 0.15 | 0.23 | 0.33 | ||||||
CRM-MC | 0.01 | 0.06 | 0.26 | 0.56 | 0.11 | 0.08 | 0.11 | 0.24 | 0.40 | 0.18 | |
Ti3+3 | 0.00 | 0.00 | 0.06 | 0.71 | 0.23 | 0.12 | 0.15 | 0.21 | 0.36 | 0.16 |
The simulation results are presented in Table 3, which shows the percentage of trials recommending each dose (Selection %) and the percentage of patients assigned to each dose level (Allocation %). A desirable design should balance the ability to correctly identify the MTD against patient safety. Compared with the CRM-MC design, Ti3+3 has a higher percentage of correct selection (PCS) in the scenarios where both designs consider the same dose as the true MTD (scenarios 2, 3, 4, 6, and 8). In terms of patient allocation, Ti3+3 appears better at assigning patients to the target dose (PCA) in scenarios 2 to 6, and it is less likely to assign patients to a dose above the MTD (POA) in the majority of the eight scenarios. Overall, the operating characteristics of the proposed Ti3+3 design are comparable to those of the CRM-MC design, which relies on model-based inference for dose assignment decisions.
3.2 Comparison with gBOIN
Additionally, we evaluate the performance of Ti3+3 by comparing it with the gBOIN design of Mu et al. (2019) [19]. The gBOIN design generalizes the BOIN design and provides a unified framework for incorporating non-binary toxicity outcomes. We follow the simulation settings described in Section 3.2 of Mu et al. (2019). Only one type of toxicity is considered, and the standardized weight matrix is defined as
\[ \boldsymbol{W}=\begin{pmatrix} 0 & 0 & 0.16 & 0.33 & 0.50 \end{pmatrix}. \]
In words, grades 0 and 1 are of no concern, a grade 2 toxicity is considered equivalent to half of a grade 3 toxicity, and a grade 4 toxicity is equivalent to one and a half grade 3 toxicities. Following Mu et al. (2019), the $TTB$ is 0.31, and the recommended default escalation and de-escalation boundaries are ${\lambda _{e}^{\ast }}=0.249$ and ${\lambda _{d}^{\ast }}=0.377$. Ti3+3 adopts the same target burden and uses these boundaries as its $EI$, i.e., $EI=(0.249,0.377)$. A total sample size of 30 patients and a cohort size of 3 are used in the simulation, starting at dose level 1. Details of the ten dose-toxicity scenarios are provided in Appendix B. Table 4 shows the results based on 4,000 simulated trials. In general, the two designs demonstrate comparable operating characteristics, and Ti3+3 performs better in terms of trial safety. Although gBOIN yields higher PCS in scenarios 1 through 7, the differences in PCS are small (at most 0.07, in scenario 6), and Ti3+3 yields equal or higher PCS in scenarios 8, 9, and 10. It is worth noting that Ti3+3 is less likely to recommend an overly toxic dose as the MTD across all scenarios. The patient allocations of the two designs are very close, as shown in Table 4: Ti3+3 yields higher or similar PCA in 7 of the 10 scenarios and consistently lower POA across all scenarios. POS and POA are two important safety metrics in early-phase trials, and Ti3+3 performs relatively well on both.
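To make the weighting and the role of the $EI$ concrete, the following Python sketch computes the average observed burden at a dose from the weights $\boldsymbol{W}$ above and compares it with $EI=(0.249,0.377)$. It assumes, for illustration only, that a patient's burden for this single toxicity type is the weight of the worst grade observed, and it applies a simplified three-zone comparison (escalate below the $EI$, stay within it, de-escalate above it). The full Ti3+3 decision rules defined in Section 2 include refinements that are not reproduced here, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

# Standardized weights for grades 0-4 (Section 3.2, one toxicity type).
W = [0.0, 0.0, 0.16, 0.33, 0.50]

# EI used for Ti3+3 in the gBOIN comparison: (lambda_e*, lambda_d*).
EI: Tuple[float, float] = (0.249, 0.377)


def patient_burden(worst_grade: int) -> float:
    """Hypothetical helper: burden = weight of the patient's worst grade (assumption)."""
    return W[worst_grade]


def mean_burden(worst_grades: Sequence[int]) -> float:
    """Average burden over the patients treated at one dose."""
    return sum(patient_burden(g) for g in worst_grades) / len(worst_grades)


def zone(burden: float, ei: Tuple[float, float] = EI) -> str:
    """Simplified three-zone comparison with the EI (not the full Ti3+3 rule set)."""
    lower, upper = ei
    if burden < lower:
        return "escalate"
    if burden > upper:
        return "de-escalate"
    return "stay"


# Three illustrative cohorts of three patients (worst grade per patient).
for cohort in ([0, 2, 3], [2, 3, 4], [3, 3, 4]):
    b = mean_burden(cohort)
    print(f"grades {cohort}: mean burden {b:.3f} -> {zone(b)}")
```

For example, a cohort with worst grades 2, 3, and 4 has mean burden $(0.16+0.33+0.50)/3=0.33$, which lies inside the $EI$, so the sketch returns "stay"; replacing the grade 2 patient with a grade 3 patient raises the mean to about 0.387, above the upper bound.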
Table 4
Performance of the Ti3+3 design compared with gBOIN. The $TTB$ is 0.31, and the $EI$ is $(0.249,0.377)$. The MTD in each scenario is in bold.
Selection % | Allocation % | ||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 1 | 2 | 3 | 4 | 5 | 6 | ||
Scenario 1 | |||||||||||||
TB | 0.08 | 0.13 | 0.22 | 0.32 | 0.50 | 0.70 | |||||||
gBOIN | 0.00 | 0.02 | 0.29 | 0.56 | 0.12 | 0.01 | 0.12 | 0.17 | 0.28 | 0.30 | 0.12 | 0.01 | |
Ti3+3 | 0.00 | 0.07 | 0.34 | 0.50 | 0.08 | 0.00 | 0.12 | 0.20 | 0.31 | 0.27 | 0.09 | 0.01 | |
Scenario 2 | |||||||||||||
TB | 0.05 | 0.09 | 0.19 | 0.28 | 0.47 | 0.66 | |||||||
gBOIN | 0.00 | 0.00 | 0.16 | 0.66 | 0.18 | 0.00 | 0.12 | 0.15 | 0.25 | 0.33 | 0.14 | 0.01 | |
Ti3+3 | 0.00 | 0.04 | 0.24 | 0.62 | 0.11 | 0.00 | 0.12 | 0.16 | 0.27 | 0.33 | 0.11 | 0.01 | |
Scenario 3 | |||||||||||||
TB | 0.11 | 0.26 | 0.33 | 0.44 | 0.55 | 0.75 | |||||||
gBOIN | 0.02 | 0.47 | 0.36 | 0.13 | 0.00 | 0.00 | 0.21 | 0.41 | 0.26 | 0.11 | 0.02 | 0.00 | |
Ti3+3 | 0.12 | 0.43 | 0.33 | 0.10 | 0.01 | 0.00 | 0.21 | 0.41 | 0.26 | 0.10 | 0.02 | 0.00 | |
Scenario 4 | |||||||||||||
TB | 0.07 | 0.22 | 0.30 | 0.40 | 0.52 | 0.71 | |||||||
gBOIN | 0.01 | 0.28 | 0.47 | 0.23 | 0.01 | 0.00 | 0.16 | 0.30 | 0.36 | 0.15 | 0.03 | 0.00 | |
Ti3+3 | 0.05 | 0.32 | 0.46 | 0.16 | 0.01 | 0.00 | 0.18 | 0.34 | 0.31 | 0.13 | 0.03 | 0.00 | |
Scenario 5 | |||||||||||||
TB | 0.00 | 0.04 | 0.06 | 0.07 | 0.11 | 0.22 | |||||||
gBOIN | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.96 | 0.10 | 0.10 | 0.11 | 0.11 | 0.15 | 0.42 | |
Ti3+3 | 0.00 | 0.00 | 0.00 | 0.01 | 0.08 | 0.91 | 0.10 | 0.10 | 0.11 | 0.12 | 0.16 | 0.41 | |
Scenario 6 | |||||||||||||
TB | 0.30 | 0.42 | 0.53 | 0.67 | 0.77 | 0.86 | |||||||
gBOIN | 0.69 | 0.29 | 0.01 | 0.00 | 0.00 | 0.00 | 0.66 | 0.27 | 0.06 | 0.01 | 0.00 | 0.00 | |
Ti3+3 | 0.62 | 0.19 | 0.01 | 0.00 | 0.00 | 0.00 | 0.69 | 0.26 | 0.05 | 0.00 | 0.00 | 0.00 | |
Scenario 7 | |||||||||||||
TB | 0.13 | 0.30 | 0.38 | 0.49 | 0.60 | 0.78 | |||||||
gBOIN | 0.11 | 0.55 | 0.28 | 0.06 | 0.00 | 0.00 | 0.30 | 0.43 | 0.21 | 0.06 | 0.01 | 0.00 | |
Ti3+3 | 0.15 | 0.54 | 0.25 | 0.05 | 0.01 | 0.00 | 0.31 | 0.42 | 0.20 | 0.05 | 0.01 | 0.00 | |
Scenario 8 | |||||||||||||
TB | 0.05 | 0.16 | 0.21 | 0.29 | 0.37 | 0.50 | |||||||
gBOIN | 0.00 | 0.06 | 0.22 | 0.41 | 0.28 | 0.04 | 0.09 | 0.19 | 0.24 | 0.28 | 0.15 | 0.05 | |
Ti3+3 | 0.00 | 0.07 | 0.25 | 0.41 | 0.25 | 0.02 | 0.13 | 0.19 | 0.25 | 0.24 | 0.15 | 0.04 | |
Scenario 9 | |||||||||||||
TB | 0.11 | 0.30 | 0.38 | 0.49 | 0.60 | 0.78 | |||||||
gBOIN | 0.00 | 0.91 | 0.00 | 0.00 | 0.00 | 0.00 | 0.11 | 0.76 | 0.14 | 0.00 | 0.00 | 0.00 | |
Ti3+3 | 0.00 | 0.94 | 0.06 | 0.00 | 0.00 | 0.00 | 0.11 | 0.79 | 0.11 | 0.00 | 0.00 | 0.00 | |
Scenario 10 | |||||||||||||
TB | 0.05 | 0.16 | 0.21 | 0.29 | 0.37 | 0.50 | |||||||
gBOIN | 0.00 | 0.00 | 0.03 | 0.71 | 0.23 | 0.03 | 0.10 | 0.12 | 0.17 | 0.44 | 0.13 | 0.04 | |
Ti3+3 | 0.00 | 0.00 | 0.06 | 0.79 | 0.14 | 0.01 | 0.10 | 0.12 | 0.19 | 0.47 | 0.09 | 0.03 |
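The PCS comparison in Section 3.2 can be reproduced from the Selection % columns of Table 4 with a short Python sketch. It assumes that the MTD in each scenario is the dose whose true TB is closest to the $TTB$ of 0.31, and it also reports the proportion of trials selecting any dose above that MTD, a simple proxy for overdose selection that may differ from the paper's exact POS definition. Only scenarios 1, 6, and 10 are transcribed here; the remaining rows can be added in the same way.

```python
from typing import Dict, List, Tuple

TTB = 0.31  # target toxicity burden in the gBOIN comparison

# (true TB by dose, gBOIN selection, Ti3+3 selection), transcribed from Table 4.
SCENARIOS: Dict[int, Tuple[List[float], List[float], List[float]]] = {
    1: ([0.08, 0.13, 0.22, 0.32, 0.50, 0.70],
        [0.00, 0.02, 0.29, 0.56, 0.12, 0.01],
        [0.00, 0.07, 0.34, 0.50, 0.08, 0.00]),
    6: ([0.30, 0.42, 0.53, 0.67, 0.77, 0.86],
        [0.69, 0.29, 0.01, 0.00, 0.00, 0.00],
        [0.62, 0.19, 0.01, 0.00, 0.00, 0.00]),
    10: ([0.05, 0.16, 0.21, 0.29, 0.37, 0.50],
         [0.00, 0.00, 0.03, 0.71, 0.23, 0.03],
         [0.00, 0.00, 0.06, 0.79, 0.14, 0.01]),
}


def mtd_index(tb: List[float]) -> int:
    """Assumed MTD: the 0-based index of the dose whose true TB is closest to TTB."""
    return min(range(len(tb)), key=lambda d: abs(tb[d] - TTB))


def pcs_and_overdose(tb: List[float], selection: List[float]) -> Tuple[float, float]:
    """Return (PCS, share of trials selecting a dose above the assumed MTD)."""
    d = mtd_index(tb)
    return selection[d], sum(selection[d + 1:])


for s, (tb, gboin, ti33) in SCENARIOS.items():
    pcs_g, over_g = pcs_and_overdose(tb, gboin)
    pcs_t, over_t = pcs_and_overdose(tb, ti33)
    print(f"Scenario {s}: MTD = dose {mtd_index(tb) + 1}; "
          f"PCS gBOIN {pcs_g:.2f} vs Ti3+3 {pcs_t:.2f}; "
          f"selection above MTD {over_g:.2f} vs {over_t:.2f}")
```

Under this assumption, scenario 6 shows the largest PCS gap in favor of gBOIN, while Ti3+3 selects doses above the MTD less often in all three transcribed scenarios.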
4 Sensitivity Analysis
Table 5
Sensitivity analysis results of Ti3+3 with different $EI$ lengths. The $TTB$ is 0.30. The different $EI$s: $A=(0.20,0.35)$, $B=(0.20,0.40)$, $C=(0.25,0.40)$, $D=(0.25,0.35)$. The MTD in each scenario is in bold.
Selection % | Allocation % | ||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 1 | 2 | 3 | 4 | 5 | 6 | ||
Scenario 1 | |||||||||||||
TB | 0.08 | 0.13 | 0.22 | 0.32 | 0.50 | 0.70 | |||||||
A | 0.01 | 0.10 | 0.46 | 0.38 | 0.04 | 0.00 | 0.14 | 0.25 | 0.36 | 0.21 | 0.05 | 0.00 | |
B | 0.02 | 0.09 | 0.44 | 0.40 | 0.05 | 0.00 | 0.14 | 0.23 | 0.35 | 0.23 | 0.05 | 0.00 | |
C | 0.01 | 0.09 | 0.35 | 0.49 | 0.07 | 0.00 | 0.12 | 0.20 | 0.30 | 0.28 | 0.09 | 0.01 | |
D | 0.02 | 0.08 | 0.37 | 0.45 | 0.08 | 0.00 | 0.13 | 0.20 | 0.31 | 0.26 | 0.09 | 0.01 | |
Scenario 2 | |||||||||||||
TB | 0.05 | 0.09 | 0.19 | 0.28 | 0.47 | 0.66 | |||||||
A | 0.00 | 0.05 | 0.33 | 0.53 | 0.10 | 0.00 | 0.12 | 0.19 | 0.33 | 0.27 | 0.07 | 0.00 | |
B | 0.00 | 0.03 | 0.35 | 0.54 | 0.07 | 0.00 | 0.13 | 0.18 | 0.33 | 0.30 | 0.06 | 0.00 | |
C | 0.00 | 0.04 | 0.23 | 0.63 | 0.09 | 0.00 | 0.11 | 0.15 | 0.27 | 0.34 | 0.12 | 0.01 | |
D | 0.01 | 0.05 | 0.25 | 0.61 | 0.09 | 0.00 | 0.11 | 0.17 | 0.27 | 0.32 | 0.11 | 0.01 | |
Scenario 3 | |||||||||||||
TB | 0.11 | 0.26 | 0.33 | 0.44 | 0.55 | 0.75 | |||||||
A | 0.14 | 0.55 | 0.24 | 0.05 | 0.00 | 0.00 | 0.32 | 0.45 | 0.18 | 0.04 | 0.01 | 0.00 | |
B | 0.14 | 0.53 | 0.26 | 0.06 | 0.01 | 0.00 | 0.28 | 0.47 | 0.20 | 0.05 | 0.00 | 0.00 | |
C | 0.12 | 0.41 | 0.35 | 0.10 | 0.01 | 0.00 | 0.23 | 0.38 | 0.26 | 0.11 | 0.02 | 0.00 | |
D | 0.14 | 0.44 | 0.32 | 0.08 | 0.01 | 0.00 | 0.27 | 0.38 | 0.24 | 0.09 | 0.02 | 0.00 | |
Scenario 4 | |||||||||||||
TB | 0.07 | 0.22 | 0.30 | 0.40 | 0.52 | 0.71 | |||||||
A | 0.07 | 0.46 | 0.38 | 0.09 | 0.00 | 0.00 | 0.23 | 0.44 | 0.26 | 0.07 | 0.01 | 0.00 | |
B | 0.07 | 0.39 | 0.43 | 0.12 | 0.01 | 0.00 | 0.21 | 0.41 | 0.29 | 0.09 | 0.01 | 0.00 | |
C | 0.06 | 0.31 | 0.46 | 0.16 | 0.01 | 0.00 | 0.18 | 0.34 | 0.32 | 0.14 | 0.03 | 0.00 | |
D | 0.07 | 0.36 | 0.42 | 0.13 | 0.01 | 0.00 | 0.21 | 0.36 | 0.29 | 0.12 | 0.02 | 0.00 | |
Scenario 5 | |||||||||||||
TB | 0.00 | 0.04 | 0.06 | 0.07 | 0.11 | 0.22 | |||||||
A | 0.00 | 0.00 | 0.00 | 0.01 | 0.12 | 0.86 | 0.10 | 0.11 | 0.12 | 0.13 | 0.18 | 0.35 | |
B | 0.00 | 0.00 | 0.00 | 0.02 | 0.11 | 0.87 | 0.10 | 0.11 | 0.12 | 0.14 | 0.18 | 0.35 | |
C | 0.00 | 0.00 | 0.00 | 0.01 | 0.10 | 0.89 | 0.10 | 0.10 | 0.11 | 0.11 | 0.16 | 0.42 | |
D | 0.00 | 0.00 | 0.00 | 0.01 | 0.10 | 0.89 | 0.10 | 0.10 | 0.11 | 0.11 | 0.17 | 0.41 | |
Scenario 6 | |||||||||||||
TB | 0.30 | 0.42 | 0.53 | 0.67 | 0.77 | 0.86 | |||||||
A | 0.67 | 0.12 | 0.01 | 0.00 | 0.00 | 0.00 | 0.80 | 0.18 | 0.02 | 0.00 | 0.00 | 0.00 | |
B | 0.66 | 0.12 | 0.01 | 0.00 | 0.00 | 0.00 | 0.79 | 0.18 | 0.02 | 0.00 | 0.00 | 0.00 | |
C | 0.64 | 0.17 | 0.01 | 0.00 | 0.00 | 0.00 | 0.67 | 0.27 | 0.05 | 0.00 | 0.00 | 0.00 | |
D | 0.63 | 0.17 | 0.01 | 0.00 | 0.00 | 0.00 | 0.69 | 0.25 | 0.05 | 0.01 | 0.00 | 0.00 | |
Scenario 7 | |||||||||||||
TB | 0.13 | 0.30 | 0.38 | 0.49 | 0.60 | 0.78 | |||||||
A | 0.20 | 0.57 | 0.20 | 0.02 | 0.00 | 0.00 | 0.36 | 0.43 | 0.17 | 0.04 | 0.00 | 0.00 | |
B | 0.18 | 0.57 | 0.22 | 0.03 | 0.00 | 0.00 | 0.33 | 0.45 | 0.17 | 0.04 | 0.00 | 0.00 | |
C | 0.17 | 0.52 | 0.24 | 0.06 | 0.00 | 0.00 | 0.28 | 0.41 | 0.22 | 0.07 | 0.01 | 0.00 | |
D | 0.18 | 0.53 | 0.23 | 0.04 | 0.00 | 0.00 | 0.33 | 0.41 | 0.20 | 0.05 | 0.01 | 0.00 | |
Scenario 8 | |||||||||||||
TB | 0.05 | 0.16 | 0.21 | 0.29 | 0.37 | 0.50 | |||||||
A | 0.02 | 0.18 | 0.43 | 0.29 | 0.08 | 0.01 | 0.17 | 0.34 | 0.31 | 0.15 | 0.04 | 0.00 | |
B | 0.03 | 0.18 | 0.42 | 0.29 | 0.09 | 0.00 | 0.17 | 0.33 | 0.32 | 0.15 | 0.04 | 0.00 | |
C | 0.01 | 0.05 | 0.28 | 0.42 | 0.23 | 0.01 | 0.13 | 0.18 | 0.26 | 0.25 | 0.15 | 0.04 | |
D | 0.01 | 0.07 | 0.26 | 0.43 | 0.22 | 0.02 | 0.13 | 0.19 | 0.24 | 0.25 | 0.14 | 0.04 |
Table 6
Sensitivity analysis results of Ti3+3 with different cohort sizes. The $TTB$ is 0.30, and the $EI$ is $(0.25,0.35)$. The MTD in each scenario is in bold.
Selection % | Allocation % | ||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 1 | 2 | 3 | 4 | 5 | 6 | ||
Scenario 1 | |||||||||||||
TB | 0.08 | 0.13 | 0.22 | 0.32 | 0.50 | 0.70 | |||||||
Cohort size 1 | 0.04 | 0.14 | 0.35 | 0.43 | 0.03 | 0.00 | 0.09 | 0.17 | 0.30 | 0.31 | 0.10 | 0.02 | |
Cohort size 2 | 0.04 | 0.15 | 0.36 | 0.40 | 0.06 | 0.00 | 0.12 | 0.22 | 0.30 | 0.27 | 0.08 | 0.01 | |
Cohort size 3 | 0.02 | 0.08 | 0.37 | 0.45 | 0.08 | 0.00 | 0.13 | 0.20 | 0.31 | 0.26 | 0.09 | 0.01 | |
Scenario 2 | |||||||||||||
TB | 0.05 | 0.09 | 0.19 | 0.28 | 0.47 | 0.66 | |||||||
Cohort size 1 | 0.03 | 0.10 | 0.26 | 0.56 | 0.05 | 0.00 | 0.07 | 0.14 | 0.25 | 0.40 | 0.12 | 0.02 | |
Cohort size 2 | 0.02 | 0.09 | 0.29 | 0.53 | 0.08 | 0.00 | 0.11 | 0.18 | 0.28 | 0.33 | 0.10 | 0.01 | |
Cohort size 3 | 0.01 | 0.05 | 0.25 | 0.61 | 0.09 | 0.00 | 0.11 | 0.17 | 0.27 | 0.32 | 0.11 | 0.01 | |
Scenario 3 | |||||||||||||
TB | 0.11 | 0.26 | 0.33 | 0.44 | 0.55 | 0.75 | |||||||
Cohort size 1 | 0.22 | 0.38 | 0.29 | 0.10 | 0.00 | 0.00 | 0.26 | 0.32 | 0.26 | 0.12 | 0.03 | 0.01 | |
Cohort size 2 | 0.20 | 0.43 | 0.30 | 0.06 | 0.01 | 0.00 | 0.30 | 0.37 | 0.23 | 0.08 | 0.02 | 0.00 | |
Cohort size 3 | 0.14 | 0.44 | 0.32 | 0.08 | 0.01 | 0.00 | 0.27 | 0.38 | 0.24 | 0.09 | 0.02 | 0.00 | |
Scenario 4 | |||||||||||||
TB | 0.07 | 0.22 | 0.30 | 0.40 | 0.52 | 0.71 | |||||||
Cohort size 1 | 0.12 | 0.33 | 0.42 | 0.12 | 0.00 | 0.00 | 0.16 | 0.31 | 0.32 | 0.16 | 0.04 | 0.01 | |
Cohort size 2 | 0.15 | 0.34 | 0.39 | 0.12 | 0.01 | 0.00 | 0.24 | 0.34 | 0.29 | 0.11 | 0.02 | 0.00 | |
Cohort size 3 | 0.07 | 0.36 | 0.42 | 0.13 | 0.01 | 0.00 | 0.21 | 0.36 | 0.29 | 0.12 | 0.02 | 0.00 | |
Scenario 5 | |||||||||||||
TB | 0.00 | 0.04 | 0.06 | 0.07 | 0.11 | 0.22 | |||||||
Cohort size 1 | 0.00 | 0.01 | 0.01 | 0.03 | 0.12 | 0.82 | 0.04 | 0.05 | 0.05 | 0.07 | 0.16 | 0.64 | |
Cohort size 2 | 0.01 | 0.01 | 0.02 | 0.03 | 0.14 | 0.80 | 0.07 | 0.08 | 0.09 | 0.10 | 0.17 | 0.48 | |
Cohort size 3 | 0.00 | 0.00 | 0.00 | 0.01 | 0.10 | 0.89 | 0.10 | 0.10 | 0.11 | 0.11 | 0.17 | 0.41 | |
Scenario 6 | |||||||||||||
TB | 0.30 | 0.42 | 0.53 | 0.67 | 0.77 | 0.86 | |||||||
Cohort size 1 | 0.61 | 0.15 | 0.01 | 0.00 | 0.00 | 0.00 | 0.65 | 0.26 | 0.07 | 0.02 | 0.00 | 0.00 | |
Cohort size 2 | 0.66 | 0.15 | 0.01 | 0.00 | 0.00 | 0.00 | 0.71 | 0.23 | 0.05 | 0.01 | 0.00 | 0.00 | |
Cohort size 3 | 0.63 | 0.17 | 0.01 | 0.00 | 0.00 | 0.00 | 0.69 | 0.25 | 0.05 | 0.01 | 0.00 | 0.00 | |
Scenario 7 | |||||||||||||
TB | 0.13 | 0.30 | 0.38 | 0.49 | 0.60 | 0.78 | |||||||
Cohort size 1 | 0.26 | 0.44 | 0.24 | 0.05 | 0.00 | 0.00 | 0.30 | 0.34 | 0.22 | 0.10 | 0.03 | 0.01 | |
Cohort size 2 | 0.27 | 0.46 | 0.22 | 0.05 | 0.00 | 0.00 | 0.36 | 0.36 | 0.20 | 0.07 | 0.01 | 0.00 | |
Cohort size 3 | 0.18 | 0.53 | 0.23 | 0.04 | 0.00 | 0.00 | 0.33 | 0.41 | 0.20 | 0.05 | 0.01 | 0.00 | |
Scenario 8 | |||||||||||||
TB | 0.05 | 0.16 | 0.21 | 0.29 | 0.37 | 0.50 | |||||||
Cohort size 1 | 0.07 | 0.12 | 0.27 | 0.33 | 0.21 | 0.01 | 0.11 | 0.16 | 0.25 | 0.26 | 0.18 | 0.05 | |
Cohort size 2 | 0.05 | 0.13 | 0.25 | 0.37 | 0.19 | 0.01 | 0.13 | 0.22 | 0.26 | 0.25 | 0.13 | 0.02 | |
Cohort size 3 | 0.01 | 0.07 | 0.26 | 0.43 | 0.22 | 0.02 | 0.13 | 0.19 | 0.24 | 0.25 | 0.14 | 0.04 |
Sensitivity analyses are conducted to further evaluate the Ti3+3 design using the first eight scenarios considered in Section 3.2. First, we investigate the effect of the $EI$ length on the performance of the proposed method. Specifically, with $TT{B^{j}}$ and $TTB$ fixed at 0.30, we select four different $EI$s (Cases A, B, C, and D), and for each $EI$ and each of the eight scenarios we simulate 1,000 trials, each with a cohort size of 3 and a total sample size of 30. Simulation results with varying $EI$ lengths are shown in Table 5. Note that in Cases B ($EI=(0.20,0.40)$) and D ($EI=(0.25,0.35)$), the $EI$s are symmetric around the $TTB$, whereas in Cases A ($EI=(0.20,0.35)$) and C ($EI=(0.25,0.40)$), they are asymmetric. For all scenarios, patient safety (the percentage of patients treated at or below the true MTD) and PCS tend to improve with wider $EI$s, because a wider $EI$ allows a wider range of doses to be considered as the MTD. However, the $EI$ should not be so wide that it becomes clinically meaningless. Moreover, across all scenarios, Case A allocates fewer patients to doses above the MTD than Cases B and C, since the upper bound of the $EI$ in A is smaller than those in B and C. The overall performance is comparable across the four cases, suggesting that the proposed Ti3+3 design is robust to the choice of $EI$ length.
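The effect of the four $EI$ cases can be illustrated with a short Python sketch that checks which intervals are symmetric around $TTB=0.30$ and how each interval classifies the same observed burden. It uses a simplified three-zone comparison (escalate below the $EI$, stay within it, de-escalate above it) as a stand-in for the full Ti3+3 rules, which is an assumption, and the burdens 0.22 and 0.38 are arbitrary illustrative values rather than results from the paper.

```python
from typing import Dict, Tuple

TTB = 0.30  # target toxicity burden in the sensitivity analysis

# EI cases from Table 5.
CASES: Dict[str, Tuple[float, float]] = {
    "A": (0.20, 0.35),
    "B": (0.20, 0.40),
    "C": (0.25, 0.40),
    "D": (0.25, 0.35),
}


def is_symmetric(ei: Tuple[float, float], target: float = TTB, tol: float = 1e-9) -> bool:
    """True if the EI extends equally far below and above the target."""
    lower, upper = ei
    return abs((target - lower) - (upper - target)) < tol


def zone(burden: float, ei: Tuple[float, float]) -> str:
    """Simplified three-zone comparison (not the full Ti3+3 rule set)."""
    lower, upper = ei
    if burden < lower:
        return "escalate"
    if burden > upper:
        return "de-escalate"
    return "stay"


for name, ei in CASES.items():
    width = ei[1] - ei[0]
    print(f"Case {name}: EI={ei}, width={width:.2f}, symmetric={is_symmetric(ei)}, "
          f"0.22 -> {zone(0.22, ei)}, 0.38 -> {zone(0.38, ei)}")
```

Under this sketch, a burden of 0.22 leads to escalation under Cases C and D but not under A and B, and a burden of 0.38 triggers de-escalation only under Cases A and D, whose upper bound is 0.35. A smaller upper bound therefore moves patients away from overly toxic doses sooner, consistent with the allocation pattern reported for Case A.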
We also fix the total sample size at 30 and conduct a second sensitivity analysis of Ti3+3 with cohort sizes of 1, 2, and 3. Results with different cohort sizes are shown in Table 6. For all scenarios, the results for cohort sizes 1 and 2 are less desirable than those for cohort size 3. Specifically, the percentage of patients treated above the true MTD tends to increase, and PCS tends to decrease, as the cohort size gets smaller. Because estimating the toxicity burden $T{B_{d}}$ at each dose requires more information than estimating the probability of a binary DLT, we recommend implementing the proposed method with a cohort size of 3 or larger.
5 Discussion
We propose a practical rule-based Ti3+3 design that extends the i3+3 design by incorporating toxicity outcomes with multiple toxicity types and grades, with the aim of improving the efficacy and safety of phase I trials. Ti3+3 adopts a dose-finding algorithm similar to that of i3+3, which is simple and straightforward. Extensive simulations show that it exhibits desirable operating characteristics: compared with existing methods such as CRM-MC and gBOIN, Ti3+3 demonstrates similar PCS with better safety performance. We provide a freely available R Shiny tool at https://i3design.shinyapps.io/ti3plus3/ that generates dose-escalation decisions based on the Ti3+3 design and runs simulations under user-specified toxicity scenarios.
A major advantage of Ti3+3 over model-based and model-assisted designs in practical trials may be its simplicity, especially that of its dose-finding rules. In particular, the up-and-down rules can be directly assessed, easily understood, and executed by clinicians. It is important for clinicians to understand the dose-finding rules, since they are the final decision makers for each patient's dose and may need to explain these decisions to the patient. Although the decision rules of the model-assisted gBOIN design are also straightforward, they are derived from statistical inference that is usually not easy to explain to clinicians. The Ti3+3 design, on the other hand, is based on a set of simple rules that can be easily conveyed to physicians.
The proposed Ti3+3 design incorporates type-specific toxicity burdens to quantify the impact of each type of toxicity event on patients’ health, thereby reducing the risk of selecting a toxic dose that might be considered “safe” when evaluated solely on the overall toxicity burden. Furthermore, the type-specific target toxicity burdens and their equivalence intervals used in the proposed method are relatively easy to specify and interpret, since they can be viewed as re-scaled toxicity probabilities, as in conventional DLT-based dose-finding studies. Specifying the target and $EI$ for the overall toxicity burden is also straightforward given the pre-determined weight matrix.
The lower bound of the $EI$ can be interpreted as the smallest toxicity probability at which clinicians would not want to escalate the dose, and the upper bound as the highest toxicity probability at which clinicians would not want to de-escalate. Although the $EI$ is defined for the toxicity burden rather than the toxicity probability, the way the burden is constructed (Equation (2.1)) means that it is essentially a toxicity probability re-scaled to the highest grade K. This rescaling facilitates the elicitation and interpretation of the $EI$s.
The use of toxicity burden or multiple toxicity grades in dose-finding trials has been limited in practice, mostly because eliciting the weights and target at the design stage requires extensive collaboration between statisticians and clinicians. The numerical value of each severity weight reflects the relative effect on patients’ survival or quality of life of experiencing the toxicity at the given grade. Severity weights elicited through interactions between statisticians and clinicians are intrinsically subjective. Alternatively, weight elicitation can be informed by existing trial data (for the same drug or drug class) and/or medical databases, such as the FDA Adverse Event Reporting System (FAERS) [25]. Moreover, as noted in Mu et al. (2019), the elicitation should be iterative to ensure good operating characteristics of the design: simulation studies conducted by statisticians and input from multiple physicians are required to ensure that the dose assignment decisions reflect appropriate clinical significance and that the design is appropriately calibrated.