1 Introduction
This article considers the problem of designing dose-finding trials that simultaneously account for patient heterogeneity and late-onset toxicities. The design described herein is applied to a Phase I clinical trial conducted at the University of Virginia. Participants were classified into two groups (good prognosis and poor prognosis) according to their expected tolerance to a radiation treatment [15]. Instead of conducting two separate, parallel trials for the groups, the design used in Muller et al. [15] borrowed accumulating information across groups to identify group-specific maximum tolerated doses (MTDs) within a single trial. In addition, the dose-limiting toxicity (DLT) observation window is up to 90 days, and on average 1-2 patients are accrued monthly. Therefore, there is also a potential for late-onset toxicities, which the original trial design did not incorporate, motivating the new design proposed in this work. Although such phase I trials are increasing in modern oncology development, there is little practical statistical methodology for designing these trials. More details of the trial are provided below.
It is not uncommon for participants to be separated into prognostic groups that differ in their expected reaction to the treatment [12]. For instance, prior studies have shown that children diagnosed with acute leukemia have a higher tolerance to the treatment relative to adults [16]. In this case, we say that the groups are ordered in that one group has a higher expected probability of DLT than the other group when receiving the same dose. Using a standard Phase I design to conduct two independent trials is to be avoided due to the possibility of observing a reversal [8]. A reversal occurs when the MTD selection violates the known group order. For instance, the MTD for a group known to have a poor prognosis should not be greater than that of the good prognosis group. Additionally, if the investigators decide on conducting a standard design, disregarding heterogeneity, the dose level that is recommended would be too toxic for part of the population and sub-optimal for the other. O’Quigley et al. [19] proposed a straightforward extension of the Continual Reassessment Method (CRM) to two groups in which two parameters are estimated rather than one. The idea is that the additional parameter models the relationship between the two groups. Furthermore, every additional distinct group requires an extra parameter to be estimated according to the multi-sample CRM. Due to relatively small sample sizes and limited resources in Phase I trials, it is more efficient to exploit under-parameterized models rather than estimating additional continuous parameters. An intuitive and simple design extending the CRM to “shift” type models [20] accommodates patient heterogeneity and is practical in trials [15].
The method argues that since we are dealing with a discrete set of dose levels {${d_{1}},{d_{2}},\dots ,{d_{k}}$}, the clinical difference between the groups can be accounted for by shifting the final recommended dose level of one group by one, two, or more levels away from the other group.
The CRM, and its patient heterogeneity extensions, assume complete information on the DLT status of each participant when assigning doses. Participants can experience a DLT at any point in the evaluation window, but they have to complete the full evaluation window without DLT to be classified as a non-DLT outcome. Because many Phase I methods assume a short observation window, they face a challenge in situations where toxicities are defined over a long evaluation window relative to the patient accrual rate. For instance, suppose that DLT for a particular treatment can be observed at any point over 15 weeks, and the accrual rate is one patient per week. Since we have to wait for 15 weeks to score a patient’s outcome as a non-DLT, as many as 15 patients may be accrued before the first patient’s DLT outcome is observed. So what doses will these patients be administered? Waiting for 15 weeks until the evaluation of the previous patient is recorded would result in a long trial duration. Alternatively, putting them on the same dose as the first participant is an inefficient use of resources in a small Phase I trial. Multiple publications have addressed late-onset toxicities, such as [23, 1, 13, 30, 29, 27]. The two most popular designs in practice are the time-to-event (TITE)-CRM [6] and the TITE-BOIN [31].
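The arithmetic behind this example can be made explicit with a small sketch. The function names are hypothetical, and the second function assumes a fully sequential design in which every patient completes the whole window without a DLT before the next one is enrolled, which is a worst case rather than a description of any particular trial.

```python
import math

def accrued_before_first_outcome(window, accrual_interval):
    """Patients enrolled while the first patient is still under observation.

    Both arguments must be in the same time unit (weeks, months, ...).
    """
    return math.floor(window / accrual_interval)

def fully_sequential_duration(n_patients, window):
    """Worst-case duration when each patient completes the full window
    (no DLT) before the next patient is enrolled (assumption)."""
    return n_patients * window

# 15-week window, one patient per week, as in the text.
print(accrued_before_first_outcome(15, 1))
# Waiting out the window for each of those 15 patients, in weeks.
print(fully_sequential_duration(15, 15))
```

Under these assumptions the waiting strategy scales with the product of sample size and window length, which is why designs that use partial follow-up information are attractive.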
There are limited options for designing Phase I trials that simultaneously account for patient heterogeneity and late-onset toxicities. Salter et al. [22] proposed a modification to the TITE-CRM [6] using the two-parameter model of O’Quigley et al. [19] and maximum likelihood estimation. Chapple and Thall [2] presented a Bayesian design called Sub-TITE, which makes sequentially adaptive subgroup-specific decisions while possibly combining subgroups that have similar estimated dose-toxicity curves. A logistic regression model for the probability of toxicity is used to make the decisions based on computed posterior quantities. This scheme allows different subgroups to be combined for dose finding if the accumulated data indicate homogeneous groups. The Two-Stage Sub-TITE (2S-Sub-TITE) was subsequently presented by McGovern et al. [14]; to improve accuracy, it delays borrowing strength and dynamic clustering across subgroups at the start of the trial. In the 1st stage, separate models are estimated for each subgroup. The 2nd stage is initiated at some pre-specified point of patient accrual, and the Sub-TITE design is subsequently followed.
The objective of this paper is to develop a practical hybrid design that combines elements from the TITE-CRM and the shift model framework (Shift TITE-CRM) to address patient heterogeneity and late-onset toxicities simultaneously. The rationale for extending the shift model framework to the TITE-CRM is the computational simplicity of each method relative to the alternative methods described above. The rest of this article is laid out as follows. The following sub-section provides a motivating Phase I/II trial in which participants were separated into two different groups and late-onset toxicities were possible. In the Methods section, we demonstrate how the Shift TITE-CRM is executed by building on methodology from the TITE-CRM and the shift model CRM. We discuss the simulation design and investigate the operating characteristics of the proposed method in the Simulation Studies section. Finally, we conclude with remarks and future research in the Conclusions section.
Motivating Trial
Muller et al. [15] conducted a single-arm Phase I/II trial at the University of Virginia where patients received a radiation treatment via a novel real-time radiation oncology workflow for the delivery of high-dose single-fraction stereotactic body radiation therapy (SBRT). The study team believed that the therapy treatment at a dose higher than 8 Gy would lead to rapid, durable, and significant pain relief, etc. Among the 46 eligible patients for treatment, 31 were considered poor prognosis and classified as Group 1. 15 patients were considered good prognosis and classified as Group 2. DLTs were defined at grade $\ge 4$ toxicity occurring within 90 days of treatment. Patients in both groups were administered one of $k=4$ available dose levels, and dose allocation was conducted using a 2-stage shift model CRM for heterogeneous groups [26], but the potential for late-onset toxicity was not accounted for in the design. Acceptable safety was defined by any estimated DLT probability less than or equal to a maximum toxicity tolerance of $\Theta =0.20$.
In Group g, the trial is investigating four ordered dose levels: $\{{d_{g1}},{d_{g2}},{d_{g3}},{d_{g4}}\}$, and the primary objective is to identify the MTD in each group. Since patients in Group 1 have a poorer prognosis, the expected probability of toxicity should be higher than that of Group 2 at each dose level. Thus, the MTD for Group 1 should be lower than the MTD for Group 2. For instance, if the MTD was defined to be dose ${d_{23}}$ for Group 2, then the MTD for Group 1 could be one or two levels below ${d_{23}}$, given by the following set {${d_{11}}$, ${d_{12}}$}. For a fixed dose ${d_{2j}}$ administered in Group 2, the possible doses administered in Group 1 are $\{{d_{11}},{d_{12}},\dots ,{d_{1,j-1}}\}$. So the total number of possible shifts in the MTD between the two groups would be $j-1$, where the “true” shift could be any of these values. The true shift is sequentially estimated from a class of CRM shift models using the accumulated data from each new cohort of participants.
2 Methods
The proposed design can be implemented using two approaches: a likelihood-based approach or a Bayesian approach. We focus on the latter, since a likelihood approach would require an additional stage of rule-based allocation until at least one DLT and one non-DLT are observed (outcome heterogeneity) before modeling can start. Now, suppose there are K available dose levels and G distinct groups. Let ${d_{k}}$ denote the dose at level k for some Group g, where $k\in \{1,2,\dots ,K\}$ and $g\in \{1,2,\dots ,G\}$. We utilize the idea of the shift between group dose levels to incorporate the clinical relationship between the groups [9]. The shifts in the MTD then range from 0 to $K-1$ dose levels according to the clinical difference of one group relative to the other.
In general, assume that we have M possible shifts, indexed by $m=1,\dots ,M$, between the two groups. For every possible shift model m, the probability of DLT ${R_{mg}}({d_{k}})$ at dose k in Group g is modeled by
\[ {R_{mg}}({d_{k}})=\Pr ({Y_{gk}}=1|{d_{k}},g,m)={\Psi _{mg}}({d_{k}},{a_{m}})={P_{mgk}^{\exp ({a_{m}})}}\]
The “${P_{mgk}}$” values construct the skeleton, which is a set of pre-specified constants representing an initial guess for the probability of toxicity at each dose level. As mentioned before, the group with a poorer prognosis will have relatively larger skeleton values. The relative location of the MTDs is accounted for by selecting various skeletons with group-specific MTD positions reflecting all possible scenarios. Then, the skeletons are constructed to be both shifted and monotonic. The term “shift” refers to the relative location of MTDs between groups, while within each group, the skeleton is monotonic, assuming that DLT probabilities increase with dose. In the motivating trial with $K=4$ dose levels in each of $G=2$ groups, three shift models were considered; they are given in Table 1.
Since a Bayesian form of the CRM is used, a prior probability distribution $g({a_{m}})$ of the parameter ${a_{m}}$, and a prior probability $p(m)$ for each shift model should be pre-specified. For the power model, it is recommended to use a zero mean normal prior distribution for the parameter a for each shift model m, such as ${g_{m}}({a_{m}})\sim N(0,{\sigma _{{a_{m}}}^{2}})$ [17] and the concept of the least informative variance has shown good performance across a broad array of dose-toxicity scenarios [11]. Let ${n_{g}}$ be the number of patients in Group g at the end of the study, so that the total number of patients accrued is $n={\textstyle\sum _{g=1}^{G}}{n_{g}}$. The likelihood under shift model m is then given by,
\[ {L_{m}}({\Omega _{j}}|{a_{m}})={\textstyle\prod _{g=1}^{G}}{\textstyle\prod _{j=1}^{{n_{g}}}}{\big({w_{jg}}{\Psi _{mg}}({x_{jg}},{a_{m}})\big)^{{y_{jg}}}}{\big(1-{w_{jg}}{\Psi _{mg}}({x_{jg}},{a_{m}})\big)^{(1-{y_{jg}})}}\]
where ${\Omega _{j}}$ denotes the data accumulated up to the jth patient, ${w_{jg}}$ denotes the weight for the jth patient in Group g, ${\Psi _{mg}}({x_{jg}},{a_{m}})$ represents the probability of detecting a DLT in Group g for shift model m, corresponding to the dose ${x_{jg}}$ administered to patient j, and ${y_{jg}}$ denotes the DLT status of patient j in Group g. The weight ${w_{jg}}$ is a function of each patient's observation time and enters the dose-toxicity model linearly. Utilizing a linear function for the weights has been shown to perform well in most scenarios [6]. Similar to the TITE-CRM [6], the weights are constructed using a linear function where ${w_{jg}}=\frac{{u_{jg}}}{T}$, ${u_{jg}}$ is the time to observe a DLT for patient j in Group g, and T is the length of time that defines the DLT evaluation window. The posterior density of ${a_{m}}$ given the data accumulated after j participants is
\[ {f_{m}}({a_{m}}|{\Omega _{j}})=\frac{{L_{m}}({\Omega _{j}}|{a_{m}})\times g({a_{m}})}{{\int _{-\infty }^{\infty }}{L_{m}}({\Omega _{j}}|{a_{m}})\times g({a_{m}})d{a_{m}}}\]
and is used to generate an estimate ${\hat{a}_{m}}$ for ${a_{m}}$ for each shift model. Further, the posterior probabilities of the models given the data can be established and given by:
\[ \pi (m|{\Omega _{j}})=\frac{p(m){\int _{-\infty }^{\infty }}{L_{m}}({\Omega _{j}}|{a_{m}})\times g({a_{m}})d{a_{m}}}{{\sum \limits_{m=1}^{M}}p(m){\int _{-\infty }^{\infty }}{L_{m}}({\Omega _{j}}|{a_{m}})\times g({a_{m}})d{a_{m}}}\]
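The weighted likelihood and the posterior model probabilities above can be evaluated numerically. The following is a minimal sketch rather than the authors' implementation: the function names, the trapezoidal rule on a truncated range $[-6,6]$, the cap of the weight at 1, and the interim data are all assumptions introduced for illustration. The skeletons are taken from Table 1 and the prior variance 1.34 from the simulation settings.

```python
import math

def tite_weight(follow_up, window):
    """Linear weight w = u / T; capping at 1 is an assumption."""
    return min(follow_up / window, 1.0)

def weighted_likelihood(a, data, skeleton):
    """L_m(Omega | a) for one shift model.

    data: iterable of (group, dose_index, y, w) with 0-based indices.
    skeleton: skeleton[group][dose_index] holds P_mgk for this model.
    """
    lik = 1.0
    for g, k, y, w in data:
        psi = skeleton[g][k] ** math.exp(a)  # power working model
        lik *= w * psi if y == 1 else 1.0 - w * psi
    return lik

def normal_pdf(a, var):
    return math.exp(-a * a / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood(data, skeleton, var=1.34, lo=-6.0, hi=6.0, steps=2000):
    """Trapezoidal approximation to the integral of L_m(a) g(a) da."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        a = lo + i * h
        f = weighted_likelihood(a, data, skeleton) * normal_pdf(a, var)
        total += 0.5 * f if i in (0, steps) else f
    return total * h

def model_posteriors(data, models, priors=None, var=1.34):
    """pi(m | Omega) for each shift model, normalised to sum to one."""
    if priors is None:
        priors = [1.0 / len(models)] * len(models)
    weighted = [p * marginal_likelihood(data, sk, var)
                for p, sk in zip(priors, models)]
    total = sum(weighted)
    return [v / total for v in weighted]

# Skeletons from Table 1: index 0 = poor prognosis, index 1 = good prognosis.
good = [0.03, 0.07, 0.13, 0.20]
models = [
    [[0.07, 0.13, 0.20, 0.29], good],  # m = 1
    [[0.13, 0.20, 0.29, 0.38], good],  # m = 2
    [[0.20, 0.29, 0.38, 0.47], good],  # m = 3
]

# Hypothetical interim data, not taken from the paper. Observed DLTs get
# weight 1 here; a constant weight on a DLT term does not depend on a and
# cancels in the normalised posteriors.
data = [
    (0, 0, 1, 1.0), (0, 0, 1, 1.0), (0, 0, 0, tite_weight(3.0, 3.0)),
    (1, 3, 0, 1.0), (1, 3, 0, 1.0), (1, 3, 0, 1.0),
    (1, 3, 0, tite_weight(1.5, 3.0)),  # half-way through a 3-month window
]
post = model_posteriors(data, models)
print([round(p, 3) for p in post])
```

In this made-up data set the poor prognosis group shows DLTs at the lowest dose while the good prognosis group tolerates the highest dose, so the largest-shift model should receive the most posterior mass. With no data at all, the function returns the prior model probabilities.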
The design chooses the shift model with the largest posterior probability, $\pi (m|{\Omega _{j}})$, among the M models. It is expected that the more the data support a particular shift model m, the greater its posterior probability will be. So for Group g, a particular shift model is chosen, say h, that has the highest posterior probability among the models. Then, we take the working model ${\Psi _{hg}}({d_{k}},a)$ associated with this shift model, and estimate a by applying the Bayesian form of the CRM
\[\begin{aligned}{}{\hat{R}_{hg}}({d_{k}})& ={P_{hgk}^{\exp ({\hat{a}_{h}})}}\hspace{1em}\text{where,}\hspace{1em}\\ \\ {} {\hat{a}_{h}}& =\frac{{\int _{-\infty }^{\infty }}{a_{h}}\times {L_{h}}({\Omega _{j}}|{a_{h}})\times {g_{h}}({a_{h}})\hspace{0.1667em}d{a_{h}}}{{\int _{-\infty }^{\infty }}{L_{h}}({\Omega _{j}}|{a_{h}})\times {g_{h}}({a_{h}})\hspace{0.1667em}d{a_{h}}}\end{aligned}\]
We choose to estimate ${R_{hg}}({d_{k}})$ based on the “plug-in” estimate [18] for the MTD, which has been studied more thoroughly and systematically than other estimators in the literature [5]. When escalating, the design does not skip any dose level that has not yet been administered. Dose allocation for the next cohort of patients, or MTD selection at the end of the trial, is made by selecting the dose in each group that minimizes the distance between the estimated probability of toxicity and the target DLT rate, $|{\hat{R}_{hg}}({d_{k}})-\Theta |$.
3 Simulation Studies
3.1 Single Simulated Trial
This subsection illustrates the proposed design in a single simulated trial with a setting similar to the motivating trial by Muller et al. [15]. Participants are classified into $G=2$ distinct groups with $K=4$ available dose levels, ${d_{g1}},\dots ,{d_{g4}}$. Participants with poor prognosis are classified as Group 1, and participants with good prognosis are classified as Group 2. The magnitude of the clinical difference is expressed as the number of dose-level shifts between the groups. The true toxicity curves for Groups 1 and 2 are given by {0.03,0.11,0.21,0.33} and {0.01,0.03,0.11,0.21}, respectively, where we assumed a one-level shift in the MTD location between the two groups. The target DLT probability that defines the MTD is specified at $\Theta =0.20$, indicating that ${d_{3}}$ is the correct dose for Group 1, and ${d_{4}}$ is the correct dose for Group 2. The study is set to enroll a total of 46 participants, with an anticipated distribution of 50% in Group 1 and 50% in Group 2. The DLT endpoint was defined with respect to a 3-month follow-up period with one participant being enrolled every 0.5 months. We established the skeleton values using a systematic approach [10], and adjusted the position of these values to correspond to each of the three possible shift models given in Table 1. For each shift model, we assigned a prior probability $p(m)=\frac{1}{M}$ and implemented a normal prior $N(0,{\sigma _{a}^{2}}=1.34)$ on the parameter a. Failure times were generated under a conditionally uniform model. After each included participant, the model-based approach described in Section 2 is employed to assign the dose for the next participant. The MTD selection is generated at the end of the trial for each group, and a summary of the simulated trial is provided in Table 2.
The initial patient, classified as poor prognosis, received dose level ${d_{1}}$ and exhibited a non-DLT response. Consequently, the recommended dose for the second patient, also classified as poor prognosis, was ${d_{2}}$, resulting in a non-DLT response as well. When the third participant, classified as good prognosis, enrolled in the study, the design recommended escalating the dose to ${d_{3}}$, taking into account the better prognosis of participant 3 compared to participants 1 and 2. At the end of the trial we observed four consecutive non-DLTs in Group 1, and three consecutive non-DLTs in Group 2. The final estimated DLT probabilities for Group 1 were given by $\hat{R}({d_{1}})=0.067$, $\hat{R}({d_{2}})=0.125$, $\hat{R}({d_{3}})=0.194$, and $\hat{R}({d_{4}})=0.284$. The final estimated DLT probabilities for Group 2 were given by $\hat{R}({d_{1}})=0.028$, $\hat{R}({d_{2}})=0.067$, $\hat{R}({d_{3}})=0.125$, and $\hat{R}({d_{4}})=0.194$. Therefore, the recommended dose levels are ${d_{3}}$ and ${d_{4}}$ for Groups 1 and 2, respectively.
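The final estimates and recommendations in this example can be approximately reproduced from the plug-in formula. The sketch below assumes that the selected shift model is $m=1$ from Table 1 and uses the rounded value $\hat{a}=0.017$ from Table 2; the pairing with $m=1$ is an inference from the reported numbers, and the function names are hypothetical. Because $\hat{a}$ is rounded, the third decimal of some estimates may differ slightly from the reported values.

```python
import math

def plug_in_estimates(skeleton, a_hat):
    """Plug-in DLT estimates P_k ** exp(a_hat) for one group."""
    return [p ** math.exp(a_hat) for p in skeleton]

def closest_dose(estimates, target):
    """0-based index of the dose whose estimate is closest to the target."""
    return min(range(len(estimates)), key=lambda k: abs(estimates[k] - target))

# Skeleton for shift model m = 1 (Table 1); assumed to be the selected model.
skeleton_m1 = {
    "poor": [0.07, 0.13, 0.20, 0.29],
    "good": [0.03, 0.07, 0.13, 0.20],
}
a_hat = 0.017   # rounded posterior mean reported in Table 2
target = 0.20   # target DLT rate Theta

for group, skeleton in skeleton_m1.items():
    est = plug_in_estimates(skeleton, a_hat)
    rec = closest_dose(est, target)
    print(group, [round(e, 3) for e in est], "recommend d%d" % (rec + 1))
```

Under these assumptions the sketch recommends the third dose level for the poor prognosis group and the fourth for the good prognosis group, in agreement with the text.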
Table 1
Skeleton values by dose level for each possible shift model in the Motivating Trial.
Model | Prognosis group | 8 Gy | 10 Gy | 12.5 Gy | 15 Gy |
$m=1$ | 1 - Poor | 0.07 | 0.13 | 0.20 | 0.29 |
$m=1$ | 2 - Good | 0.03 | 0.07 | 0.13 | 0.20 |
$m=2$ | 1 - Poor | 0.13 | 0.20 | 0.29 | 0.38 |
$m=2$ | 2 - Good | 0.03 | 0.07 | 0.13 | 0.20 |
$m=3$ | 1 - Poor | 0.20 | 0.29 | 0.38 | 0.47 |
$m=3$ | 2 - Good | 0.03 | 0.07 | 0.13 | 0.20 |
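The skeletons in Table 1 are consistent with sliding a window of four values along a single monotone vector. The sketch below reproduces the table that way; the extended vector `base` is read off the table, and the helper name is hypothetical. It does not replace the systematic skeleton construction cited in the text [10].

```python
def shifted_skeletons(base, n_doses, shifts):
    """Return {shift: (poor, good)} skeleton pairs cut from one base vector.

    The good prognosis group uses the first n_doses values; the poor
    prognosis group uses the same window moved up by `shift` positions.
    """
    good = base[:n_doses]
    out = {}
    for s in shifts:
        poor = base[s:s + n_doses]
        if len(poor) != n_doses:
            raise ValueError("base vector too short for shift %d" % s)
        out[s] = (poor, good)
    return out

# Monotone values read off Table 1 (assumed common base vector).
base = [0.03, 0.07, 0.13, 0.20, 0.29, 0.38, 0.47]
table1 = shifted_skeletons(base, n_doses=4, shifts=[1, 2, 3])
for shift, (poor, good) in table1.items():
    print("m=%d" % shift, "poor:", poor, "good:", good)
```

With a target of 0.20, the value 0.20 sits at the fourth dose for the good prognosis group and at the third, second, or first dose for the poor prognosis group, which is how each model encodes a one-, two-, or three-level shift in the MTD.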
Table 2
A single simulated trial of 46 patients where “j” denotes patient ID, “g” denotes group designation, “${u_{j}}$” denotes time to toxicity, “${x_{j}}$” denotes dose administered, and “${y_{j}}$” denotes DLT status.
j | g | ${u_{j}}$ | ${x_{j}}$ | ${y_{j}}$ | j | g | ${u_{j}}$ | ${x_{j}}$ | ${y_{j}}$ |
1 | 1 | - | ${d_{1}}$ | 0 | 24 | 1 | - | ${d_{2}}$ | 0 |
2 | 1 | - | ${d_{2}}$ | 0 | 25 | 2 | - | ${d_{3}}$ | 0 |
3 | 2 | - | ${d_{3}}$ | 0 | 26 | 2 | - | ${d_{3}}$ | 0 |
4 | 2 | - | ${d_{4}}$ | 0 | 27 | 1 | - | ${d_{2}}$ | 0 |
5 | 1 | 1.33 | ${d_{4}}$ | 1 | 28 | 2 | 2.28 | ${d_{3}}$ | 1 |
6 | 1 | 1.22 | ${d_{4}}$ | 1 | 29 | 2 | - | ${d_{3}}$ | 0 |
7 | 2 | 1.82 | ${d_{4}}$ | 1 | 30 | 2 | - | ${d_{3}}$ | 0 |
8 | 1 | - | ${d_{2}}$ | 0 | 31 | 2 | - | ${d_{4}}$ | 0 |
9 | 1 | - | ${d_{1}}$ | 0 | 32 | 1 | - | ${d_{3}}$ | 0 |
10 | 2 | 2.01 | ${d_{4}}$ | 1 | 33 | 2 | - | ${d_{3}}$ | 0 |
11 | 1 | - | ${d_{1}}$ | 0 | 34 | 2 | - | ${d_{3}}$ | 0 |
12 | 1 | - | ${d_{1}}$ | 0 | 35 | 2 | - | ${d_{3}}$ | 0 |
13 | 1 | - | ${d_{1}}$ | 0 | 36 | 2 | - | ${d_{3}}$ | 0 |
14 | 2 | - | ${d_{3}}$ | 0 | 37 | 2 | - | ${d_{4}}$ | 0 |
15 | 1 | - | ${d_{1}}$ | 0 | 38 | 1 | - | ${d_{3}}$ | 0 |
16 | 1 | - | ${d_{1}}$ | 0 | 39 | 1 | 2.97 | ${d_{3}}$ | 1 |
17 | 2 | - | ${d_{3}}$ | 0 | 40 | 2 | - | ${d_{4}}$ | 0 |
18 | 1 | - | ${d_{2}}$ | 0 | 41 | 2 | - | ${d_{4}}$ | 0 |
19 | 2 | - | ${d_{3}}$ | 0 | 42 | 1 | - | ${d_{3}}$ | 0 |
20 | 1 | 1.55 | ${d_{2}}$ | 1 | 43 | 2 | - | ${d_{4}}$ | 0 |
21 | 1 | - | ${d_{2}}$ | 0 | 44 | 1 | - | ${d_{3}}$ | 0 |
22 | 2 | - | ${d_{3}}$ | 0 | 45 | 1 | - | ${d_{2}}$ | 0 |
23 | 1 | - | ${d_{2}}$ | 0 | 46 | 1 | - | ${d_{2}}$ | 0 |
$\hat{a}=0.017$ | Recommended dose for Group 1: ${d_{3}}$ | Recommended dose for Group 2: ${d_{4}}$ |
3.2 Simulation Setting
In this section, we evaluate the operating characteristics of our method using a simulation study that consists of three scenarios used in the 2-sample TITE-CRM paper [22]. We compare the performance of the two methods using these scenarios. Salter et al. [22] specified a target probability of DLT at $\Theta =0.20$ for a treatment that consists of $K=6$ dose levels, ${d_{1}},\dots ,{d_{6}}$. A total of 32 participants were accrued to the study, with an anticipated distribution of 50% in Group 1 and 50% in Group 2. The DLT endpoint was defined over a 6-month follow-up period with a new participant being accrued every 0.5 months. The true DLT probabilities for each of these scenarios are provided in the lines labeled $R({d_{k}})$ in Table 4. The skeleton values proposed in Salter et al. [22] were used as prior guesses of the DLT probabilities.
Next, we examined the performance of the proposed design in trials in which three patient groups were being studied. Across seven different toxicity scenarios, we simulated trials accruing a total of 36 participants, with an expected allocation of one-third in each of the three groups. The true toxicity probabilities for each of these scenarios are provided in the lines labeled $R({d_{k}})$ in Table 5. The scenarios were constructed to accommodate all possible clinical differences between the groups, knowing that $MT{D_{1}}\le MT{D_{2}}\le MT{D_{3}}$. The proportions of MTD selection for all scenarios are based on 1000 simulated trials generated in the programming language R. In each scenario, the starting dose was ${d_{1}}$ for all groups. Participants were followed for 6 months, and a new patient was accrued on a fixed schedule of one every 0.5 months. For each shift model, we assigned a prior probability $p(m)=\frac{1}{M}$ and implemented a normal prior $N(0,{\sigma _{a}^{2}}=1.34)$ on the parameter a. We investigated two different failure time models for the Shift TITE-CRM. S-TITE U generated the failure times under a conditionally uniform model, and S-TITE E generated failure times under a truncated exponential model, with an upper limit restricted to less than the follow-up period of 6 months. We established the skeleton using a systematic approach [10], and adjusted the position of these values to correspond to each of the six shift models given in Table 3. We also provide simulation results evaluating patient allocation, varying sample sizes, target DLT rates, and accrual rates. These additional results are given in the supplementary material.
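The two failure-time models can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the exponential rate of 0.5 per month is an arbitrary choice because the text does not report one, and the truncated exponential is drawn by inverting its cumulative distribution function.

```python
import math
import random

def dlt_time(window, rng, model="uniform", rate=0.5):
    """Draw a DLT time in [0, window), given that a DLT occurs."""
    u = rng.random()
    if model == "uniform":
        return u * window
    if model == "exponential":
        # Inverse CDF of an exponential(rate) truncated to [0, window).
        return -math.log(1.0 - u * (1.0 - math.exp(-rate * window))) / rate
    raise ValueError("unknown failure-time model: " + model)

def simulate_patient(p_dlt, window, rng, model="uniform", rate=0.5):
    """Return (y, u): DLT indicator and DLT time (None without a DLT)."""
    if rng.random() < p_dlt:
        return 1, dlt_time(window, rng, model, rate)
    return 0, None

# Five patients at a dose with true DLT probability 0.25 and a 6-month window.
rng = random.Random(2024)
sample = [simulate_patient(0.25, 6.0, rng, "exponential") for _ in range(5)]
print(sample)
```

Under the conditionally uniform model, DLT times are spread evenly over the window, whereas the truncated exponential model concentrates them earlier in follow-up; both keep every simulated DLT inside the evaluation window.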
Table 3
Skeleton value by dose level for three groups.
Model | Group 1: ${d_{1}}$ | Group 1: ${d_{2}}$ | Group 1: ${d_{3}}$ | Group 1: ${d_{4}}$ | Group 2: ${d_{1}}$ | Group 2: ${d_{2}}$ | Group 2: ${d_{3}}$ | Group 2: ${d_{4}}$ | Group 3: ${d_{1}}$ | Group 3: ${d_{2}}$ | Group 3: ${d_{3}}$ | Group 3: ${d_{4}}$ |
$m=1$ | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 |
$m=2$ | 0.15 | 0.25 | 0.35 | 0.45 | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 |
$m=3$ | 0.25 | 0.35 | 0.45 | 0.55 | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 |
$m=4$ | 0.15 | 0.25 | 0.35 | 0.45 | 0.15 | 0.25 | 0.35 | 0.45 | 0.05 | 0.15 | 0.25 | 0.35 |
$m=5$ | 0.25 | 0.35 | 0.45 | 0.55 | 0.15 | 0.25 | 0.35 | 0.45 | 0.05 | 0.15 | 0.25 | 0.35 |
$m=6$ | 0.25 | 0.35 | 0.45 | 0.55 | 0.25 | 0.35 | 0.45 | 0.55 | 0.05 | 0.15 | 0.25 | 0.35 |
3.3 Simulation Results
In the two-group setting, the proportions of MTD selection over the three scenarios are provided in Table 4, where the results of the Shift TITE-CRM and the 2-sample TITE-CRM are compared. The two methods had comparable ranges of the proportion of correct selection (PCS) of the group-specific MTDs over the three scenarios: the 2-sample TITE-CRM generated a PCS range of $\{0.37,0.58\}$ and the Shift TITE-CRM generated a PCS range of $\{0.426,0.579\}$. In the 1st scenario, ${d_{2}}$ and ${d_{3}}$ corresponded to true probabilities of DLT closest to $\Theta =0.20$ for Groups 1 and 2, respectively. The PCS in both groups was slightly higher with the 2-sample TITE-CRM than with the Shift TITE-CRM (0.49 vs. 0.458 in Group 1 and 0.58 vs. 0.579 in Group 2). In the 2nd scenario, the simulated trial assumed a greater clinical difference between the two groups, where ${d_{2}}$ and ${d_{4}}$ corresponded to true probabilities of DLT closest to the target DLT rate for Groups 1 and 2, respectively. The Shift TITE-CRM resulted in a higher $PC{S_{S}}=0.493$ relative to the 2-sample TITE-CRM $PC{S_{2Sample}}=0.44$ in Group 1. In Group 2, the Shift TITE-CRM outperformed the 2-sample TITE-CRM by approximately 10 percentage points in PCS (0.466 vs. 0.37). This scenario suggests that the proposed design outperforms the 2-sample TITE-CRM when there is a two-level shift in the MTD between the two groups. In the 3rd scenario, the two groups were assumed to have the same probabilities of DLT over the entire range of dose levels. Therefore, the dose with probability of DLT closest to $\Theta =0.20$ is ${d_{2}}$ in both groups. The two methods resulted in similar operating characteristics, with $PC{S_{2Sample}}=0.45$ for both groups, and $PC{S_{S}}=0.492$ and $PC{S_{S}}=0.426$ for Groups 1 and 2, respectively.
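As a small worked example of how the PCS values are read from Table 4, the sketch below locates the true MTD as the dose whose true DLT probability is closest to the target and then looks up the selection proportion at that dose. The helper names are hypothetical, the numbers are the scenario 2 rows of Table 4, and ties in distance are broken in favour of the lower dose, which is an assumption.

```python
def true_mtd(true_probs, target):
    """0-based index of the dose closest to the target (lowest dose on ties)."""
    return min(range(len(true_probs)), key=lambda k: abs(true_probs[k] - target))

def pcs(true_probs, selection, target):
    """Proportion of simulated trials that selected the true MTD."""
    return selection[true_mtd(true_probs, target)]

target = 0.20
# Scenario 2 of Table 4.
truth = {
    1: [0.02, 0.19, 0.31, 0.45, 0.51, 0.63],
    2: [0.03, 0.05, 0.11, 0.21, 0.39, 0.50],
}
s_tite = {
    1: [0.105, 0.493, 0.319, 0.078, 0.005, 0.000],
    2: [0.000, 0.057, 0.354, 0.466, 0.117, 0.006],
}
two_sample = {
    1: [0.12, 0.44, 0.35, 0.09, 0.02, 0.000],
    2: [0.000, 0.05, 0.35, 0.37, 0.18, 0.05],
}
for g in (1, 2):
    print("Group", g, "true MTD d%d" % (true_mtd(truth[g], target) + 1),
          "S-TITE", pcs(truth[g], s_tite[g], target),
          "2-Sample", pcs(truth[g], two_sample[g], target))
```

The lookup returns the same scenario 2 values as quoted in the text: 0.493 vs. 0.44 in Group 1 and 0.466 vs. 0.37 in Group 2.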
Table 4
The proportion of MTD Selection for Shift Models TITE and 2-sample TITE-CRM at each dose where “$R({d_{k}})$” denotes the true probability of DLT, “S-TITE” denotes the Shift TITE-CRM design, and “2-Sample TITE” denotes the 2-Sample TITE CRM design.
Scenario | Row | Group 1: ${d_{1}}$ | Group 1: ${d_{2}}$ | Group 1: ${d_{3}}$ | Group 1: ${d_{4}}$ | Group 1: ${d_{5}}$ | Group 1: ${d_{6}}$ | Group 2: ${d_{1}}$ | Group 2: ${d_{2}}$ | Group 2: ${d_{3}}$ | Group 2: ${d_{4}}$ | Group 2: ${d_{5}}$ | Group 2: ${d_{6}}$ |
1 | $R({d_{k}})$ | 0.08 | 0.20 | 0.35 | 0.50 | 0.70 | 0.80 | 0.01 | 0.05 | 0.18 | 0.40 | 0.55 | 0.70 |
1 | S-TITE | 0.303 | 0.458 | 0.223 | 0.016 | 0.000 | 0.000 | 0.008 | 0.193 | 0.579 | 0.205 | 0.015 | 0.000 |
1 | 2-Sample TITE | 0.22 | 0.49 | 0.25 | 0.05 | 0.000 | 0.000 | 0.001 | 0.15 | 0.58 | 0.20 | 0.04 | 0.000 |
2 | $R({d_{k}})$ | 0.02 | 0.19 | 0.31 | 0.45 | 0.51 | 0.63 | 0.03 | 0.05 | 0.11 | 0.21 | 0.39 | 0.50 |
2 | S-TITE | 0.105 | 0.493 | 0.319 | 0.078 | 0.005 | 0.000 | 0.000 | 0.057 | 0.354 | 0.466 | 0.117 | 0.006 |
2 | 2-Sample TITE | 0.12 | 0.44 | 0.35 | 0.09 | 0.02 | 0.000 | 0.000 | 0.05 | 0.35 | 0.37 | 0.18 | 0.05 |
3 | $R({d_{k}})$ | 0.07 | 0.23 | 0.31 | 0.35 | 0.45 | 0.57 | 0.07 | 0.23 | 0.31 | 0.35 | 0.45 | 0.57 |
3 | S-TITE | 0.276 | 0.492 | 0.191 | 0.037 | 0.004 | 0.000 | 0.122 | 0.426 | 0.345 | 0.083 | 0.023 | 0.001 |
3 | 2-Sample TITE | 0.25 | 0.45 | 0.23 | 0.07 | 0.03 | 0.00 | 0.25 | 0.45 | 0.25 | 0.04 | 0.01 | 0.000 |
Table 5
The proportion of MTD selection for each dose across the 7 scenarios where “$R({d_{k}})$” denotes the true probability of toxicity, “S-TITE U” denotes Shift TITE-CRM design with failure times generated uniformly, and “S-TITE E” denotes Shift TITE-CRM design with failure times generated exponentially.
Scenario | Row | Group 1: ${d_{1}}$ | Group 1: ${d_{2}}$ | Group 1: ${d_{3}}$ | Group 1: ${d_{4}}$ | Group 2: ${d_{1}}$ | Group 2: ${d_{2}}$ | Group 2: ${d_{3}}$ | Group 2: ${d_{4}}$ | Group 3: ${d_{1}}$ | Group 3: ${d_{2}}$ | Group 3: ${d_{3}}$ | Group 3: ${d_{4}}$ |
1 | $R({d_{k}})$ | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 |
1 | S-TITE U | 0.064 | 0.324 | 0.472 | 0.140 | 0.008 | 0.243 | 0.496 | 0.253 | 0.002 | 0.142 | 0.437 | 0.419 |
1 | S-TITE E | 0.031 | 0.338 | 0.458 | 0.173 | 0.005 | 0.231 | 0.477 | 0.287 | 0.001 | 0.156 | 0.418 | 0.425 |
2 | $R({d_{k}})$ | 0.15 | 0.25 | 0.35 | 0.45 | 0.05 | 0.15 | 0.25 | 0.35 | 0.05 | 0.15 | 0.25 | 0.35 |
2 | S-TITE U | 0.262 | 0.416 | 0.269 | 0.053 | 0.035 | 0.321 | 0.443 | 0.201 | 0.009 | 0.214 | 0.403 | 0.374 |
2 | S-TITE E | 0.245 | 0.432 | 0.252 | 0.071 | 0.018 | 0.316 | 0.444 | 0.222 | 0.007 | 0.206 | 0.426 | 0.361 |
3 | $R({d_{k}})$ | 0.22 | 0.33 | 0.42 | 0.52 | 0.05 | 0.17 | 0.27 | 0.37 | 0.03 | 0.12 | 0.24 | 0.40 |
3 | S-TITE U | 0.472 | 0.364 | 0.142 | 0.022 | 0.067 | 0.361 | 0.437 | 0.135 | 0.009 | 0.250 | 0.474 | 0.267 |
3 | S-TITE E | 0.442 | 0.395 | 0.142 | 0.021 | 0.035 | 0.341 | 0.435 | 0.189 | 0.000 | 0.199 | 0.472 | 0.329 |
4 | $R({d_{k}})$ | 0.13 | 0.27 | 0.45 | 0.55 | 0.16 | 0.26 | 0.36 | 0.46 | 0.05 | 0.15 | 0.25 | 0.35 |
4 | S-TITE U | 0.388 | 0.480 | 0.121 | 0.011 | 0.222 | 0.460 | 0.258 | 0.060 | 0.034 | 0.313 | 0.403 | 0.250 |
4 | S-TITE E | 0.338 | 0.506 | 0.145 | 0.011 | 0.168 | 0.488 | 0.272 | 0.072 | 0.028 | 0.303 | 0.380 | 0.289 |
5 | $R({d_{k}})$ | 0.22 | 0.32 | 0.40 | 0.52 | 0.15 | 0.24 | 0.33 | 0.43 | 0.05 | 0.15 | 0.25 | 0.40 |
5 | S-TITE U | 0.522 | 0.358 | 0.110 | 0.010 | 0.219 | 0.415 | 0.309 | 0.057 | 0.029 | 0.348 | 0.427 | 0.196 |
5 | S-TITE E | 0.522 | 0.348 | 0.117 | 0.013 | 0.224 | 0.416 | 0.290 | 0.070 | 0.019 | 0.315 | 0.455 | 0.211 |
6 | $R({d_{k}})$ | 0.25 | 0.35 | 0.45 | 0.55 | 0.25 | 0.35 | 0.45 | 0.55 | 0.05 | 0.15 | 0.25 | 0.35 |
6 | S-TITE U | 0.709 | 0.244 | 0.045 | 0.002 | 0.494 | 0.375 | 0.118 | 0.013 | 0.060 | 0.415 | 0.386 | 0.139 |
6 | S-TITE E | 0.698 | 0.251 | 0.049 | 0.002 | 0.509 | 0.357 | 0.120 | 0.014 | 0.053 | 0.409 | 0.391 | 0.147 |
7 | $R({d_{k}})$ | 0.24 | 0.40 | 0.54 | 0.66 | 0.11 | 0.26 | 0.41 | 0.55 | 0.03 | 0.10 | 0.25 | 0.40 |
7 | S-TITE U | 0.696 | 0.266 | 0.037 | 0.001 | 0.293 | 0.461 | 0.227 | 0.019 | 0.023 | 0.359 | 0.493 | 0.125 |
7 | S-TITE E | 0.607 | 0.336 | 0.055 | 0.002 | 0.187 | 0.501 | 0.280 | 0.032 | 0.010 | 0.280 | 0.513 | 0.197 |
In the three-group setting, the proportions of MTD selection across the seven scenarios are provided in Table 5. The range of PCS of the group-specific MTDs over the seven scenarios is $\{0.380,0.709\}$. In scenario 1, it is assumed that there is no difference between the groups. In Group 1, the S-TITE U generated a PCS = 0.472, similar to that of Group 2 with PCS = 0.496 and Group 3 with PCS = 0.437. Under S-TITE E, all of the groups showed a slight reduction in PCS. In scenario 2, it is assumed that there is a one-level shift in Groups 2 and 3 relative to Group 1. The PCS decreased on average across all three groups compared to scenario 1, while the correct dose remained the most frequently selected dose in every group. In scenario 3, we assumed a greater shift for Groups 2 and 3 relative to Group 1. S-TITE U generated a PCS = 0.472 for Group 1, PCS = 0.437 for Group 2, and PCS = 0.474 for Group 3. In scenario 4, we assumed that only Group 3 is clinically different from Group 1. According to the scenario 4 rows of Table 5, the simulations generated a PCS range of {0.380,0.506}. The 5th and 7th scenarios are considered the most heterogeneous with regard to their group-specific MTDs; all groups are clinically different from one another. For these scenarios, the “true” shift model is $m=5$ in Table 3. Across these two scenarios, the PCS for Group 1 is 0.522 and 0.696 under S-TITE U, and 0.522 and 0.607 under S-TITE E. Groups 2 and 3 generated a smaller PCS that ranges from 0.415 to 0.513 across the two methods. In the 6th scenario, Groups 1 and 2 are assumed to have ${d_{1}}$ as the correct dose, shifted two levels away from the correct dose ${d_{3}}$ of Group 3. The PCS of the MTD was high for Group 1 under both uniform and exponential assumptions. S-TITE U generated a PCS = 0.709, and S-TITE E generated a PCS = 0.698. In Groups 2 and 3, the PCS range was {0.386,0.509} across the two methods.
Overall, the results indicate that the Shift TITE-CRM performs well in terms of correctly recommending group-specific MTDs. The Shift TITE-CRM also demonstrates good operating characteristics in terms of patient allocation to doses at and around group-specific MTDs, as detailed in the supplementary material.
4 Conclusions
The hybrid design (Shift TITE-CRM) presented in this article is a practical tool for Phase I clinical trials that addresses patient heterogeneity and late-onset toxicities simultaneously in a single trial. The proposed design is presented in a Bayesian framework since likelihood estimation presents additional challenges in settings with more than one group. Prior to likelihood estimation, at least one toxic and one non-toxic outcome have to be observed to fit the model. So the design for the first stage has to be specified, and there is no clear optimal design to use, especially in the presence of patient heterogeneity. We examined the performance of the Shift TITE-CRM in terms of correctly selecting group-specific MTDs across three two-group scenarios, in which we compared its performance to the 2-sample TITE-CRM, and across seven three-group scenarios. In each of the three two-group scenarios, the 2-sample TITE-CRM was competitive with the Shift TITE-CRM in terms of PCS, except when the shift in MTD level was greater than one. In this case, our proposed Shift TITE-CRM demonstrated superior performance. We also examined the performance with different sample sizes, accrual rates, and target toxicity rates, with results provided in the supplementary material. We expect to have at most three groups in Phase I trials, and the proposed design generates good operating characteristics in a reasonable range of sample sizes for such trials. In our simulation study, we assumed a sample size of 36 patients separated into three groups. In a real-life Phase I trial, it is recommended to have a larger sample size when patients are separated into more than two groups, which will yield better precision in selecting the correct MTD, as shown in the supplementary material.
For future work, we are exploring the use of this method for partially ordered groups, where the group order is known between some but not all groups. We will also build an R package which will serve as a tool to conduct the Shift TITE-CRM design in Phase I clinical trials. Salter et al. [21] developed a SAS program to accommodate two groups using the TITE-CRM and likelihood estimation. In the R package, statisticians and clinicians will be able to simulate operating characteristics at the design stage and input their new data after accruing patients to recommend a dose for the next cohort of patients at the trial conduct stage. The skeleton values should be constructed according to the specific trial, which should be straightforward when the complete ordering of the groups is known. Note that an increase in the number of groups and/or dose levels leads to an increase in the required sample size to ensure good operating characteristics. An increase in the dimension also leads to the need to specify more skeleton values for the shift models, which can also add complexity. In partially ordered groups, the construction of the skeleton values could become more complex in some scenarios. In trials accounting for patient heterogeneity, borrowing information across groups early in the trial can lead to rapid dose escalation in less sensitive groups. Building on the approach of [28], we are exploring the integration of asymmetric loss functions within the CRM shift model framework to impose penalties on overdosing. This approach aims to mitigate the risk of overdosing in each group, particularly during the early stages of the study when sample sizes are small.