The New England Journal of Statistics in Data Science


Modeling the Mean with Time as a Categorical Variable in Longitudinal Designs for Smaller-Sized Clinical Trials: A Case Studies Approach Based on Three Phase 3 Clinical Trials in Rare Diseases
David Zahrieh   Yi Wang   Jennifer Le-Rademacher 1     All authors (4)

https://doi.org/10.51387/26-NEJSDS96
Pub. online: 28 January 2026      Type: Case Study, Application, And/or Practice Article      Open Access
Area: Biomedical Research

1 Contributed equally.

Accepted
8 December 2025
Published
28 January 2026

Abstract

Background: Generalized estimating equations (GEE) and mixed-model repeated measures (MMRM) can handle longitudinal continuous outcomes when modeling the mean with time included categorically. Because sample sizes in rare diseases are small, a compound symmetry (CS) covariance pattern is sometimes adopted. In this setting, there is scant literature in the rare disease community that provides practical advice about the use of both methods based on real datasets from trials conducted in rare diseases, including when to use the sandwich variance estimator with or without a bias correction.
Methods: To fill this gap, we simulated data from three longitudinal, phase 3 trials conducted in rare diseases to jointly review the operating characteristics of both methods: randomized trials in GNE myopathy (N = 44 placebo; N = 45 treatment) and pediatric X-linked hypophosphatemia (XLH) (N = 32 control; N = 29 treatment), and a single-arm trial in adult XLH (N = 14).
Results: In each trial, few participants discontinued; furthermore, <1.5% of the measurement occasions were missing outcome data, no missing outcome data pattern occurred in >1 participant, and the missing completely at random (MCAR) assumption was clinically justified. In the two trials with nonconstant variances/covariances over time, bias-corrected sandwich variance estimators with t-based inference were needed with MMRM and GEE. If the CS pattern was a good approximation, as seen in the pediatric XLH trial, then model-based standard errors with t-based inference performed well with both methods.
Conclusion: Based on a review of three case studies, the MCAR assumption was plausible and missingness was low. When modeling the mean response with time included categorically and with a parsimonious CS covariance structure, each method required careful consideration in its use.

1 Introduction

In longitudinal clinical trials the number and the collection times of the repeated measurements are oftentimes the same for all participants. Consequently, in moderately sized or large randomized trials, there are usually enough replications at each measurement occasion and within each treatment group to estimate the true underlying covariance matrix. Mixed-model repeated measures (MMRM), a likelihood-based regression model for modeling the mean with time included categorically, has been generally recommended for the primary analysis of continuous outcomes in longitudinal trial designs, with unstructured (UN) modeling of within-patient covariances [1]. With the focus on modeling the mean response with time included categorically, the corresponding generalized estimating equations (GEE) method can provide an alternative (i.e., marginal models with an identity link). When the longitudinal measurements are multivariate Gaussian, the solution to the estimating equations is the generalized least squares estimator of the regression coefficients from MMRM [2]. Like MMRM, GEE can handle missing data easily, albeit provided that the longitudinal outcomes satisfy the missing completely at random (MCAR) assumption [3].
A compelling argument can sometimes be made to support the MCAR assumption in well-conducted trials in rare diseases. Patient centeredness is central to trial design. Involving patients, patient advocates, and family members in trial design can increase retention and ensure adequate assessment of trial outcomes [4, 5]. Compared with non-rare disease trials, trials in rare diseases tend to enroll fewer patients, address a high unmet medical need, and use patient-centered or surrogate endpoints to assess efficacy [6, 7]. Such design characteristics can mitigate dropouts and incomplete data. The use of external controls, shorter placebo-phase duration, and designs that assure participants that they will receive the experimental treatment favor patient participation and retention and further reduce the risk of dropout [8, 9]. However, if MCAR is not clinically defensible and dropout is assumed to be missing at random (MAR), the estimating equations need to be weighted by the propensity to drop out [10]. Whereas it is generally appreciated that MMRM provides a valid framework under the MAR assumption, it is not well recognized that this validity is lost if the covariance pattern has not been correctly specified [11]; reliance on MMRM may therefore lead to a false sense of security.
Little a priori information about the covariance pattern is available in rare diseases. Furthermore, there are no reliable ways to identify the best covariance model in small samples [12]. If the data allow it, a UN covariance pattern should be considered; it minimizes the risk of a misspecified covariance structure. If convergence problems in numerical optimization are encountered, strategies to achieve convergence should be considered before adopting a parsimonious covariance pattern [1, 13, 14]. In some controlled or single-arm trials in rare diseases, however, there simply are not enough participants relative to the number of time points to estimate the covariance terms with a UN covariance pattern. In either case, and because we regard the correlation structure as a nuisance, the compound symmetry (CS) covariance pattern could pragmatically be considered in trials in rare diseases; it recognizes the dependence at the cost of only one extra covariance parameter.
Modeling time categorically imposes no structure on the mean response over time. It is unlikely, therefore, that the model for the mean response is misspecified; MMRM and GEE estimates of the regression coefficients will be valid. However, because the CS covariance pattern will likely be misspecified, the model-based standard errors (SEs) for MMRM and GEE estimates of the regression coefficients will be invalid [11, 15]. This is true in small and large samples [16]. With both methods, valid SEs of the regression coefficients can be based on the sandwich variance estimator, which is robust to any misspecification of the covariance [15]. However, the sandwich variance estimator is based on large-sample properties and its performance deteriorates in small samples [17–19]. Small-sample corrections for these SEs are needed to correct for bias.
Researchers have developed and compared a variety of bias-corrected SEs across a range of simulated scenarios, albeit separately for MMRM and GEE [18–27]. Furthermore, as opposed to relying on large-sample properties of the regression coefficients when building confidence intervals (CIs) and testing hypotheses for the regression coefficients, $t$- and $F$-distributions using degrees of freedom (DOF) approximations to account for uncertainty in estimation of the regression coefficients can be used. Because it is not always clear what defines a small sample, a suitable bias-corrected strategy should be considered for trials conducted in rare diseases along with the choice to use either MMRM or GEE.
While numerous statistical publications on these methods span decades, there is a dearth of literature in the rare disease community that uses real data to report on both methods together, applied to continuous outcomes collected longitudinally from trials in rare diseases, and that is presented succinctly and accessibly. Analysts in rare disease research may be unsure which method to apply in practice and how to use the chosen method correctly. Coupled with the need to specify the primary analysis method in a protocol before the trial begins [28], a case studies approach to prespecifying a modeling strategy in rare diseases was taken. Herein, we considered three unique, albeit representative, longitudinal trial designs with continuous efficacy outcomes conducted in rare diseases. We report the degree of missing data, evaluate the missing data pattern, and assess common characteristics of the empirical covariance pattern of the repeated measurements. We then simulated data directly from these trials to share with other analysts in the rare disease community a concise review of the operating characteristics of these methods based on real data. We provide a succinct summary and practical advice to complement the statistical literature on when and how to apply these methods in practice.
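To make the parameter counts concrete, the following minimal Python sketch builds a CS matrix and compares the number of free covariance parameters under UN and CS for the numbers of post-baseline occasions in the three trials considered here (6, 2, and 5). The function names and the numeric values passed to `cs_covariance` are illustrative assumptions and are not taken from the trials.

```python
def cs_covariance(n_times, variance, covariance):
    """Compound symmetry: one common variance on the diagonal and
    one common covariance everywhere off the diagonal."""
    return [[variance if j == k else covariance for k in range(n_times)]
            for j in range(n_times)]


def n_cov_params(n_times, pattern):
    """Number of free covariance parameters for a covariance pattern."""
    if pattern == "UN":
        # Every variance and every pairwise covariance is free.
        return n_times * (n_times + 1) // 2
    if pattern == "CS":
        # One variance plus one covariance.
        return 2
    raise ValueError("pattern must be 'UN' or 'CS'")


# Hypothetical values, for illustration only.
example = cs_covariance(3, variance=2.0, covariance=1.0)

for label, n_times in [("UX001-CL301", 6), ("UX023-CL301", 2), ("UX023-CL304", 5)]:
    print(label, "UN:", n_cov_params(n_times, "UN"), "CS:", n_cov_params(n_times, "CS"))
```

With 5 post-baseline occasions, a UN pattern has 15 free covariance parameters, which is more than the 14 participants in the single-arm trial; this is the situation in which a parsimonious pattern such as CS becomes attractive.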
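For reference, the estimators discussed in this paragraph can be written out for a marginal linear model with an identity link. The notation below ($X_i$, $V_i$, $r_i$, $B$, $H_{ii}$) is introduced here for illustration and is not taken from the article; $V_i$ denotes the working (e.g., CS) covariance matrix for participant $i$, and the bias-corrected form is the construction usually attributed to Mancl and DeRouen (2001) [19].

$\begin{aligned}
\hat{\beta} &= B^{-1}\sum_{i} X_i^{\top} V_i^{-1} Y_i, \qquad B=\sum_{i} X_i^{\top} V_i^{-1} X_i, \qquad r_i = Y_i - X_i\hat{\beta},\\
\widehat{\mathrm{Var}}_{\text{model}}(\hat{\beta}) &= B^{-1},\\
\widehat{\mathrm{Var}}_{\text{sandwich}}(\hat{\beta}) &= B^{-1}\left(\sum_{i} X_i^{\top} V_i^{-1}\, r_i r_i^{\top}\, V_i^{-1} X_i\right) B^{-1},\\
\widehat{\mathrm{Var}}_{\text{corrected}}(\hat{\beta}) &= B^{-1}\left(\sum_{i} X_i^{\top} V_i^{-1} (I-H_{ii})^{-1} r_i r_i^{\top} (I-H_{ii}^{\top})^{-1} V_i^{-1} X_i\right) B^{-1}, \qquad H_{ii} = X_i B^{-1} X_i^{\top} V_i^{-1}.
\end{aligned}$

The model-based form is valid only when $V_i$ is the true covariance. The sandwich form replaces that requirement with the empirical residual cross-products $r_i r_i^{\top}$, which tend to be too small in small samples; the corrected form inflates each residual through $(I-H_{ii})^{-1}$ to offset that bias.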

2 Methods

2.1 Efficacy Outcomes

The primary (or important secondary) efficacy outcomes from three completed, longitudinal, phase 3 FDA registration trials conducted in rare diseases were selected (Table 1): changes in the HHD upper extremity composite score, measured at baseline and every 8 weeks through study week 48 in UX001-CL301, a phase 3 randomized study evaluating sialic acid extended-release for GNE myopathy (N = 44 placebo; N = 45 treatment) [29]; changes in rickets severity, assessed by the Radiographic Global Impression of Change (RGI-C) global score and measured at weeks 40 and 64 in UX023-CL301, a randomized phase 3 trial (N = 32 control; N = 29 treatment) comparing the efficacy and safety of burosumab to conventional therapy for X-linked hypophosphatemia (XLH) in children [30]; and changes in alkaline phosphatase (ALP), measured at baseline and at weeks 4, 12, 24, 36, and 48 in UX023-CL304, a phase 3 single arm study in N = 14 participants investigating the effects of burosumab on osteomalacia in adults with XLH [31]. For each outcome, we reported the degree and pattern of missing data, evaluated the MCAR assumption using Little’s MCAR Test [32], and assessed the empirical covariance pattern of the changes from baseline.
Table 1
The three longitudinal, phase 3 clinical trials conducted in rare diseases, which formed the basis for the simulation study.
| Clinical Trial | No. Post-BL Occasions | Trial Duration | Sample Size: Randomized or Enrolled | Sample Size: Primary Analysis Set | Primary (or Important Secondary) Efficacy Endpoint | Prospectively Defined Primary Analysis | ClinTrials.Gov |
|---|---|---|---|---|---|---|---|
| UX001-CL301 | 6 | 48 Weeks | 89 | 88 | Primary endpoint: Change from baseline at Week 48 in the HHD Upper Extremity (UE) Composite Score | GEE, Change from Baseline, compound symmetry, adjusting for sex, region, and baseline value | NCT02377921 |
| UX023-CL301 (Pediatrics) | 2 | 64 Weeks | 61 | 61 | Primary endpoint: Rickets severity, assessed by the Radiographic Global Impression of Change (RGI-C) global score and measured at Week 40 | GEE, Change from Baseline, compound symmetry, adjusting for age and the baseline total rickets severity score | NCT02915705 |
| UX023-CL304 (Adults) | 5 | 48 Weeks | 14 | 14 | Important secondary endpoint: Change from baseline at Week 48 in alkaline phosphatase (ALP) | GEE, Change from Baseline, compound symmetry, adjusting for the baseline value | NCT02537431 |
BL = Baseline; GEE = generalized estimating equations.
Trials UX001-CL301 and UX023-CL301 were randomized controlled trials, while UX023-CL304 was a single arm trial.
The prespecified Primary Analysis Set was defined as all randomized (or enrolled) participants with a baseline measurement and at least one post-baseline measurement.
Standard GEE was applied such that estimation was based on method-of-moments assuming asymptotic normality using the SAS GENMOD procedure.
The prespecified GEE method, which was the primary analysis method and specified in the protocol prior to seeing the data, modeled the longitudinal changes from baseline assuming a CS covariance pattern adjusting for the corresponding baseline values (baseline total rickets severity score in UX023-CL301) and other prespecified potential confounding factors; the sandwich variance estimator of the regression coefficients without a small-sample correction was used and also prespecified. The SAS procedure GENMOD was used such that estimation was based on the method-of-moments and CIs and tests assumed asymptotic normality [so-called standard GEE; Liang and Zeger (1986)] [3]. In each trial, the prespecified GEE method was applied to the primary analysis set, defined in the protocol as all randomized (or enrolled) participants who had a baseline measurement and ≥1 post-baseline measurement and was used to assess a treatment effect at a single time point. The number of participants in each primary analysis set was 88 (43 placebo; 45 treatment), 61 (32 control; 29 treatment), and 14 in UX001-CL301, UX023-CL301, and UX023-CL304, respectively (Table 1).
Table 2
Data generating model for UX001-CL301, UX023-CL301, and UX023-CL304.
Clinical Trial MMRM
Data Generating Model
UX001-CL301 $\begin{aligned}{}E\left({Y_{i}}\mid \hspace{2.5pt}{X_{i}}\right)={\beta _{0}}+{\beta _{1}}I\left(\mathrm{G}=1\right)& +{\beta _{2}}I\left(\mathrm{W}=16\right)+{\beta _{3}}I\left(\mathrm{W}=24\right)+{\beta _{4}}I\left(\mathrm{W}=32\right)+{\beta _{5}}I\left(\mathrm{W}=40\right)+{\beta _{6}}I\left(\mathrm{W}=48\right)+{\beta _{7}}I\left(\mathrm{G}=1\right)\times I\left(\mathrm{W}=16\right)\\ {} & +\hspace{2.5pt}{\beta _{8}}I\left(\mathrm{G}=1\right)\times I\left(\mathrm{W}=24\right)+{\beta _{9}}I\left(\mathrm{G}=1\right)\times I\left(\mathrm{W}=32\right)+{\beta _{10}}I\left(\mathrm{G}=1\right)\times I\left(\mathrm{W}=40\right)\\ {} & +{\beta _{11}}I\left(\mathrm{G}=1\right)\times I\left(\mathrm{W}=48\right)+{\gamma _{1}}\mathrm{Base}+{\gamma _{2}}I\left(\mathrm{Sex}=\mathrm{Male}\right)+{\gamma _{3}}I\left(\mathrm{Region}=\mathrm{NonUS}\right)\end{aligned}$
Repeated measurements taken on $i=1,\hspace{2.5pt}\dots ,\hspace{2.5pt}88$ participants (45 treatment; 43 placebo) at the same set of 6 post-baseline occasions. ${Y_{i}}$ = Change from baseline. I represents an indicator variable. G = Group; W = Week; Base = Baseline value; Reference categories for variables G, W, Sex, and Region were placebo, week 8, female, and US, respectively. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance / covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariates Base, Sex, and Region were generated from a distribution reasonably consistent with the trial data: a truncated N(56.14, 27.954), 55% males, and 66% Non-US. 100,000 data sets were generated; on average, in any given simulated data set, 1.4% of the data were missing completely at random. The regression coefficients of primary interest are ${\beta _{1}}$ and ${\beta _{11}}$; specifically, the null hypothesis ${H_{0}}:\hspace{2.5pt}{\beta _{1}}+{\beta _{11}}=0$ corresponds to no treatment difference in the mean change from baseline at week 48 adjusting for the other covariates; this is similar to testing ${H_{0}}:\hspace{2.5pt}\mathrm{Treatment}\hspace{2.5pt}\mathrm{Difference}\hspace{2.5pt}\text{in}\hspace{2.5pt}\mathrm{LS}\hspace{2.5pt}\mathrm{Means}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}48=0$ (i.e., the marginal mean change from baseline is the same in each arm).
UX023-CL301 (Pediatrics) $\begin{aligned}{}E\left({Y_{i}}\mid \hspace{2.5pt}{X_{i}}\right)={\beta _{0}}+{\beta _{1}}I\left(\mathrm{G}=2\right)& +{\beta _{2}}I\left(\mathrm{W}=64\right)+{\beta _{3}}I\left(\mathrm{G}=2\right)\times I\left(\mathrm{W}=64\right)\\ {} & +{\gamma _{1}}\mathrm{TOTSCORE}+{\gamma _{2}}I\left(\mathrm{Age}=2\right)\end{aligned}$
Repeated measurements taken on $i=1,\hspace{2.5pt}\dots ,61$ participants (32 conventional therapy; 29 burosumab) at the same set of 2 post-baseline occasions. ${Y_{i}}$ = Change from baseline. I represents an indicator variable. G = Group; W = Week; TOTSCORE = Baseline Total Rickets Severity Score; Reference categories for variables G, W, and Age were burosumab, week 40, and < 5 years, respectively. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance / covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariates TOTSCORE and Age were generated from a distribution reasonably consistent with the trial data: a truncated N(3.18, 1.057) and 57% ≥ 5 years old. 100,000 data sets were generated; on average, in any given simulated data set, 0.5% of the data were missing completely at random. The regression coefficient of primary interest, ${\beta _{1}}$, corresponds to the treatment difference in the mean change from baseline at week 40 adjusting for the other covariates; this is similar to testing ${H_{0}}:\hspace{2.5pt}\mathrm{Treatment}\hspace{2.5pt}\mathrm{Difference}\hspace{2.5pt}\text{in}\hspace{2.5pt}\mathrm{LS}\hspace{2.5pt}\mathrm{Means}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}40=0$ (i.e., the marginal mean change from baseline is the same in each arm).
UX023-CL304 (Adults) $\begin{aligned}{}E\left({Y_{i}}\mid \hspace{2.5pt}{X_{i}}\right)={\beta _{0}}+{\beta _{1}}I\left(\mathrm{W}=12\right)& +{\beta _{2}}I\left(\mathrm{W}=24\right)+{\beta _{3}}I\left(\mathrm{W}=36\right)+{\beta _{4}}I\left(\mathrm{W}=48\right)\\ {} & +{\gamma _{1}}\mathrm{Base}\end{aligned}$
Repeated measurements taken on $i=1,\hspace{2.5pt}\dots ,14$ participants at the same set of 5 post-baseline occasions. ${Y_{i}}$ = Change from baseline in alkaline phosphatase. I represents an indicator variable. W = Week; Base = Baseline value; Reference category for the variable W was study week 4. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance / covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariate Base was generated from a distribution reasonably consistent with the trial data: a truncated N(113.14, 51.600). 500,000 data sets were generated; on average, in any given simulated data set, 1.2% of the data were missing completely at random. The regression coefficients of primary interest are ${\beta _{0}}$ and ${\beta _{4}}$; specifically, the null hypothesis ${H_{0}}:\hspace{2.5pt}{\beta _{0}}+{\beta _{4}}=0$ corresponds to no difference in the mean change from baseline at week 48 adjusting for the baseline value. This is similar to testing ${H_{0}}:\hspace{2.5pt}\mathrm{LS}\hspace{2.5pt}\mathrm{Mean}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}48=0$ (i.e., the marginal mean change from baseline equals zero).

2.2 Simulation Study

The aim was to understand the operating characteristics of the prespecified GEE method and of MMRM when applied to each trial, and to recommend when to use the sandwich variance estimator with or without a small-sample correction with both analytic approaches. Multivariate Gaussian longitudinal outcomes of changes from baseline were generated with a fixed covariance. For a linear model with a given covariance matrix, MMRM and GEE (marginal model with an identity link function) will give identical estimates of the regression coefficients; however, the methods differ in how they estimate the covariance [14]. Because GEE does not use likelihood methods, the estimated “model” is incomplete and not suitable for simulation. To study the operating characteristics of both methods, data were therefore generated from an estimated MMRM fit to each dataset’s outcome (Appendix Table 1). For the two-arm trials (UX001-CL301 and UX023-CL301), the parameterization of the mean part of the model included an intercept, indicators for time, a group indicator, the two-way interaction between the indicators for time and the group indicator, the baseline value, and a set of protocol-specified covariates. For the single arm trial (UX023-CL304), the mean part of the model comprised an intercept, indicators for time, and the baseline value. The baseline values and the protocol-specified covariates were generated from a distribution consistent with what was seen in the data. The models and primary statistical hypotheses expressed in mathematical terms are shown in Table 2. Based on re-analysis of the trials’ outcomes, we successfully estimated a UN covariance pattern. Compared with CS, the Akaike Information Criterion (AIC) [33] expressed a strong preference [34] (difference in AIC > 10) for a UN covariance pattern in UX001-CL301 and UX023-CL304.
The lack of a preference (difference in AIC < 2) for a UN covariance pattern in UX023-CL301 notwithstanding, a fixed UN covariance pattern was used in data generation for the longitudinal outcomes from all three trials. To generate data that were MCAR, we removed each datum independently of the other data with probability 1.4%, 0.5%, and 1.2% in datasets simulated from UX001-CL301, UX023-CL301, and UX023-CL304, respectively. These missing data percentages were based on the real trial data; further discussion is provided in the Results section.
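The two data-generation steps described above (multivariate Gaussian outcomes with a fixed covariance, followed by independent MCAR deletion) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the covariance values are the empirical week 40 and week 64 values reported for UX023-CL301 in Table 4 rather than the MMRM-estimated UN covariance the authors used, the zero mean vector is hypothetical, no covariates are generated, and the function names are ours.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_changes(mean, cov, n_subjects, p_missing, rng):
    """Draw multivariate Gaussian outcomes with a fixed covariance, then
    delete each value independently with probability p_missing (MCAR)."""
    low = cholesky(cov)
    data = []
    for _ in range(n_subjects):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        y = [m + sum(low[j][k] * z[k] for k in range(j + 1))
             for j, m in enumerate(mean)]
        # Missingness does not depend on any observed or unobserved value.
        data.append([None if rng.random() < p_missing else v for v in y])
    return data


rng = random.Random(2026)                 # fixed seed for reproducibility
cov = [[1.26, 1.14], [1.14, 1.35]]        # illustrative: empirical values from Table 4
sim = simulate_changes([0.0, 0.0], cov, n_subjects=61, p_missing=0.005, rng=rng)
n_missing = sum(v is None for row in sim for v in row)
print(len(sim), "participants;", n_missing, "missing values")
```

Because each value is deleted independently of everything else, the missingness mechanism is MCAR by construction, which is the property the simulation relies on.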
Table 3
For each dataset generated in the simulation study, the following models were applied.
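Clinical trials (all models): UX001-CL301, UX023-CL301 (Pediatrics), and UX023-CL304 (Adults)

| Model | Method | Covariance Pattern | Standard Error Adjustment | Small-Sample Correction to Standard Errors | Degrees of Freedom (DOF) in t Tests and t-based Confidence Limits |
|---|---|---|---|---|---|
| M0 | MMRM | UN | Kenward-Roger | No | Kenward-Roger approximation for the DOF based on the SE adjustment |
| M1 | MMRM | CS | None, Model-based | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M2 | MMRM | CS | Sandwich Estimator | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M3 | MMRM | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M4 | GEE | CS | None, Model-based | No | Asymptotic normal distribution (standard GEE) with z-based inference |
| M5* | GEE | CS | Sandwich Estimator | No | Asymptotic normal distribution (standard GEE) with z-based inference |
| M6 | GEE | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Asymptotic normal distribution (standard GEE) with z-based inference |
| M7 | GEE | CS | None, Model-based | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M8 | GEE | CS | Sandwich Estimator | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M9 | GEE | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Between-Within method DOF by Schluchter and Elashoff (1990) |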
GEE = generalized estimating equations; MMRM = mixed model repeated measures; SE = standard error; DOF = degrees of freedom; UN = Unstructured; CS = Compound symmetry.
* Prespecified GEE method, which was the primary analysis method defined in the protocol for each trial.
  • Note 1. Trials UX001-CL301 (N = 88) and UX023-CL301 (N = 61) were randomized controlled trials, while UX023-CL304 (N = 14) was a single arm trial.
  • Note 2. The SAS procedure PROC GLIMMIX with METHOD = RSPL was used to fit the MMRM models. The SAS keyword EMPIRICAL = FIRORES was used to apply the Mancl and DeRouen (2001) bias correction to the empirical/robust standard errors. The SAS keyword DDFM = BETWITHIN was used to obtain the between-within DOF approximation by Schluchter and Elashoff (1990): the DOF approximations used for the between- and within-subject effects were ${N_{1}}-{p_{1}}$ and ${N_{2}}-({N_{1}}+{p_{2}})$, respectively; here, ${N_{1}}=$ number of analyzable subjects; ${N_{2}}=$ number of nonmissing observations; ${p_{1}}=$ number of between-subject effects + intercept; ${p_{2}}=$ number of within-subject effects. For hypothesis tests that involved within-subject effects, ${N_{2}}-({N_{1}}+{p_{2}})$ DOF was used.
  • Note 3. The standard GEE method of moments approach was used that mirrors the SAS procedure PROC GENMOD, albeit with the use of the GEE WITH DIAGNOSTICS (Version 1.06) SAS/IML macro by John Preisser, University of North Carolina-Chapel Hill; the SAS/IML macro calculates the model-based, empirical, and bias-corrected SEs (Mancl and DeRouen, 2001).
  • Note 4. All models adjusted for the prespecified covariates.
Table 3 lists the modeling scenarios studied. MMRM was estimated using restricted maximum likelihood with the SAS procedure GLIMMIX. MMRM with unstructured covariance (Model M0) served as the benchmark. Here, the Kenward-Roger method [35] was considered, which combines a SE adjustment with a Satterthwaite-type DOF approximation. Models M1, M2, and M3 represented an MMRM with CS covariance pattern that, respectively, relied on model-based, sandwich-based, and bias-corrected sandwich-based standard errors by Mancl and DeRouen (2001) [19] by specifying the EMPIRICAL = FIRORES option in the GLIMMIX statement. For models M1-M3, inferences were based on the t-distribution using the between-within method [36] (DDFM = BETWITHIN) for computing DOF. The DOF approximation used for the between- and within-subject effects in the model were ${N_{1}}-{p_{1}}$ and ${N_{2}}-({N_{1}}+{p_{2}})$, respectively, where ${N_{1}}=$ number of analyzable subjects; ${N_{2}}=$ number of nonmissing observations; ${p_{1}}=$ number of between-subject regression coefficients + intercept; ${p_{2}}=$ number of within-subject regression coefficients. For hypothesis tests that involved within-subject effects, ${N_{2}}-({N_{1}}+{p_{2}})$ DOF was used. For GEE, estimation was based on method-of-moments assuming a CS covariance pattern (Models M4-M9). Because built-in bias-corrected SEs are not available with GENMOD, we used the SAS/IML GEE macro (Version-1.06) [37], which mirrors the estimation method in GENMOD, but also calculates bias-corrected SEs by Mancl and DeRouen (2001). Inference was based on asymptotic normality (M4-M6) or the t-distribution using the between-within DOF-approximation obtained with MMRM (M7-M9).
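The between-within DOF formulas above are simple enough to check by hand. The sketch below evaluates them for a configuration shaped like the single-arm trial; the inputs are assumptions made for illustration only (complete data on 14 participants at 5 occasions, so 70 observations; the intercept and baseline value counted as between-subject terms; the four week indicators counted as within-subject terms), and the function name is ours rather than part of any SAS interface.

```python
def between_within_dof(n_subjects, n_obs, p_between, p_within):
    """Between-within DOF approximation as described in the text.

    n_subjects : N1, number of analyzable subjects
    n_obs      : N2, number of nonmissing observations
    p_between  : p1, between-subject regression coefficients + intercept
    p_within   : p2, within-subject regression coefficients
    """
    dof_between = n_subjects - p_between
    dof_within = n_obs - (n_subjects + p_within)
    return dof_between, dof_within


# Hypothetical complete-data configuration resembling UX023-CL304.
dof_between, dof_within = between_within_dof(
    n_subjects=14, n_obs=14 * 5, p_between=2, p_within=4
)
print("between-subject DOF:", dof_between)  # 14 - 2 = 12
print("within-subject DOF:", dof_within)    # 70 - (14 + 4) = 52
```

Each missing observation lowers N2, and hence the within-subject DOF, by one, so the DOF used in a given simulated dataset depends on how many values were deleted.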
We compared the bias and SEs associated with the regression coefficients of primary interest and the coverage of their corresponding 95% CIs. We also compared the SEs and power of the tests expressed in terms of the least squares means (estimated marginal means): in the randomized trials UX001-CL301 and UX023-CL301, the test of the null hypothesis that the treatment difference in the marginal mean change from baseline equaled zero at week 48 and week 40, respectively; and in the single arm trial UX023-CL304, the test of the null hypothesis that the marginal mean change from baseline at week 48 equaled zero. Testing the primary hypotheses in terms of the estimated marginal means was prespecified in each study.
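As a rough illustration of how such operating characteristics are typically summarized across simulated datasets, the sketch below computes bias, average SE, CI coverage, and rejection rate from per-replicate estimates and SEs. It is not the authors' code: the function name, the toy inputs, and the single critical value `crit` (a z or t quantile supplied by the caller) are assumptions made for this example.

```python
def operating_characteristics(estimates, std_errors, true_value, crit):
    """Summarize simulation replicates for one parameter.

    crit is the two-sided critical value (e.g., a z or t quantile) used
    both for the confidence limits and for the test of H0: parameter = 0.
    """
    n = len(estimates)
    pairs = list(zip(estimates, std_errors))
    # A replicate "covers" when the true value lies inside estimate +/- crit * SE.
    covered = sum(1 for b, s in pairs if abs(b - true_value) <= crit * s)
    # A replicate "rejects" when |estimate / SE| exceeds the critical value.
    rejected = sum(1 for b, s in pairs if abs(b) > crit * s)
    return {
        "bias": sum(estimates) / n - true_value,
        "mean_se": sum(std_errors) / n,
        "coverage": covered / n,
        "power": rejected / n,
    }


# Toy replicates, for illustration only.
summary = operating_characteristics(
    estimates=[1.0, 1.5, 0.1, 0.9],
    std_errors=[0.4, 0.4, 0.4, 0.4],
    true_value=1.0,
    crit=2.0,
)
print(summary)
```

When the true value is zero, the same rejection rate estimates the type I error rather than the power.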

3 Results

Table 4
Missing data summary and empirical covariance pattern of the changes from baseline for each outcome according to the phase 3 clinical trial.
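| Phase 3 Clinical Trial | No. Randomized or Enrolled | No. Post-BL Occasions | No. Patients Discontinuing Early | Percent of BL and Post-BL Measurement Occasions Missing Data | Little’s MCAR Test (d) |
|---|---|---|---|---|---|
| UX001-CL301 | 89 (a) | 6 | 2 (b) | 1.4% (9 / 623) | $P=0.54$ |
| UX023-CL301 (Pediatrics) | 61 | 2 | 0 | 0.5% (1 / 183) | $P=0.19$ |
| UX023-CL304 (Adults) | 14 | 5 | 1 (c) | 1.2% (1 / 84) | $P=0.42$ |

Empirical Covariance Matrix (e), Changes from Baseline (upper triangle shown)

UX001-CL301
| | WK8 | WK16 | WK24 | WK32 | WK40 | WK48 |
|---|---|---|---|---|---|---|
| WK8 | 16.03 | 8.14 | 9.68 | 14.96 | 12.91 | 10.95 |
| WK16 | | 19.34 | 13.45 | 16.39 | 16.81 | 16.90 |
| WK24 | | | 27.72 | 23.29 | 18.94 | 17.14 |
| WK32 | | | | 41.50 | 29.96 | 26.17 |
| WK40 | | | | | 34.40 | 24.82 |
| WK48 | | | | | | 32.43 |

UX023-CL301 (Pediatrics)
| | WK40 | WK64 |
|---|---|---|
| WK40 | 1.26 | 1.14 |
| WK64 | | 1.35 |

UX023-CL304 (Adults)
| | WK4 | WK12 | WK24 | WK36 | WK48 |
|---|---|---|---|---|---|
| WK4 | 594.55 | 414.12 | 72.74 | −70.23 | −53.46 |
| WK12 | | 591.14 | 217.44 | 68.62 | 203.40 |
| WK24 | | | 239.92 | 217.23 | 336.04 |
| WK36 | | | | 301.54 | 427.17 |
| WK48 | | | | | 659.03 |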
MCAR = Missing completely at random; WK = Week
Trials UX001-CL301 and UX023-CL301 were randomized controlled trials, while UX023-CL304 was a single arm trial.
  • a Of the 89 randomized participants, 88 were included in the prespecified Primary Analysis Set, which was defined in the protocol as all randomized participants with a baseline measurement and at least one post-baseline measurement; a single participant did not have any of the 6 post-baseline measurements.
  • b One participant submitted outcome data for the first five post-BL measurement occasions, while the other participant did not submit any post-BL measurements; reason for study discontinuation was subject non-compliance for both participants.
  • c The single participant, who withdrew consent, submitted outcome data for the first four post-BL measurement occasions.
  • d Data are missing completely at random (MCAR) when the pattern of missing values does not depend on the data values. The null hypothesis for Little’s MCAR Test is that the data are MCAR. A P > 0.05 indicates weak evidence against the null hypothesis.
  • e The empirical covariance matrix was estimated from the participants included in the prespecified Primary Analysis Set, namely, the N = 88, N = 61, and N = 14 participants in trials UX001-CL301, UX023-CL301, and UX023-CL304, respectively, who had a baseline measurement and at least one post-baseline measurement.
Within each trial, Table 4 shows the proportion of missing data, results from applying Little’s MCAR Test, and the empirical covariance pattern of the repeated measurements. Two participants in UX001-CL301 and one participant in trial UX023-CL304 discontinued the trial prematurely, while none of the participants in UX023-CL301 discontinued early. The reasons for study discontinuation were subject non-compliance (both participants in UX001-CL301) and withdrawal of consent (single participant in UX023-CL304). In total, 1.4%, 0.5%, and 1.2% of the measurement occasions in UX001-CL301, UX023-CL301, and UX023-CL304, respectively, were missing outcome data. In each trial, no missing outcome data pattern occurred in >1 participant such that the missing outcome data resembled a random sample of all the outcome data. Coupled with the clinical plausibility of MCAR and the results from applying Little’s MCAR test (all P ≥ 0.19), the longitudinal outcome data within each trial reasonably satisfied the MCAR assumption. Based on the empirical covariances for each outcome, the variances and the pairwise covariances were not constant over time.
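One quick way to gauge how far each outcome departs from the constant-variance part of the CS assumption is to compare the largest and smallest variances on the diagonal of each empirical covariance matrix in Table 4. The sketch below does this and also computes the week 40 and week 64 correlation for UX023-CL301. The ratio is an informal descriptive summary chosen for this illustration, not a statistic used by the authors, and the names are ours.

```python
import math

# Diagonal entries (variances of the changes from baseline) from Table 4.
variances = {
    "UX001-CL301": [16.03, 19.34, 27.72, 41.50, 34.40, 32.43],
    "UX023-CL301": [1.26, 1.35],
    "UX023-CL304": [594.55, 591.14, 239.92, 301.54, 659.03],
}


def variance_ratio(diag):
    """Largest variance divided by smallest; 1.0 means constant variance."""
    return max(diag) / min(diag)


ratios = {trial: variance_ratio(diag) for trial, diag in variances.items()}
for trial, ratio in ratios.items():
    print(f"{trial}: max/min variance ratio = {ratio:.2f}")

# Correlation between week 40 and week 64 changes in UX023-CL301
# (covariance 1.14 from Table 4).
corr_peds = 1.14 / math.sqrt(1.26 * 1.35)
print(f"UX023-CL301 correlation (WK40, WK64) = {corr_peds:.2f}")
```

A ratio close to 1, as in UX023-CL301, is consistent with CS being a reasonable approximation, whereas ratios well above 2 in the other two trials point to clearly nonconstant variances.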
Figure 1
UX001-CL301 (N = 88 Two-Arm Study): Simulation Results for models M0-M9. Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combined a standard error adjustment with a Satterthwaite-type degree of freedom (DOF) approximation. M1-M3 represent an MMRM with compound symmetry (CS) covariance pattern that, respectively, used model-based, sandwich-based, and bias-corrected sandwich-based standard errors by Mancl and DeRouen (2001); t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF-approximation (M7-M9). Plots of the coverage of the 95% confidence intervals for the primary regression coefficients of interest (A) ${\beta _{1}}$ and (B) ${\beta _{11}}$, with the average standard errors of the respective regression coefficients listed at the right of each plot; the actual standard error estimated from the trial data for ${\beta _{1}}$ and ${\beta _{11}}$ is shown in parentheses for comparison. The red dashed line on both plots represents the targeted 95% confidence coefficient while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (C) Plot of the average standard errors of the treatment difference in the marginal mean change from baseline at week 48 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the standard error estimated from the trial data as a frame of reference. The power of the test of the primary hypothesis, ${H_{0}}:\hspace{2.5pt}\mathrm{Treatment}\hspace{2.5pt}\mathrm{Difference}\hspace{2.5pt}\text{in}\hspace{2.5pt}\mathrm{LS}\hspace{2.5pt}\mathrm{Means}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}48=0$, is listed at the right of the plot.
Appendix Tables 2A to 2C show the operating characteristics associated with each method, with and without the sandwich variance estimator and small-sample correction, based on 100,000 simulated datasets each from UX001-CL301 and UX023-CL301 and 500,000 from UX023-CL304. The prespecified GEE method assumed a CS covariance pattern with the sandwich variance estimator, no small-sample correction, and inference based on asymptotic normality, while the data were generated from an MMRM with a UN covariance pattern. On average, in any given dataset simulated from UX001-CL301, UX023-CL301, and UX023-CL304, 1.4%, 0.5%, and 1.2% of the longitudinal measurements, respectively, were missing under MCAR.
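As a rough aid for reading the coverage estimates, the Monte Carlo uncertainty implied by the stated numbers of simulated datasets can be sketched as follows. This is a minimal illustration, assuming the simulated datasets are independent so that the coverage indicator is binomial; the function name `coverage_mc_se` is hypothetical and not part of the original analysis.

```python
import math


def coverage_mc_se(coverage, n_sims):
    """Binomial Monte Carlo standard error of an estimated coverage probability."""
    return math.sqrt(coverage * (1.0 - coverage) / n_sims)


# Numbers of simulated datasets stated in the text (100,000 and 500,000).
for n_sims in (100_000, 500_000):
    se = coverage_mc_se(0.95, n_sims)
    print(f"n_sims={n_sims}: Monte Carlo SE at 95% coverage = {se:.5f}")
```

Under this assumption, departures from the nominal 95% level that are many multiples of the Monte Carlo standard error are unlikely to be simulation noise.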
nejsds96_g002.jpg
Figure 2
UX023-CL301 (N = 61 Two-Arm Study) – Simulation Results for models M0-M9: Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combines a standard error adjustment with a Satterthwaite-type degrees of freedom (DOF) approximation. M1-M3 represent an MMRM with a compound symmetry (CS) covariance pattern that, respectively, used model-based, sandwich-based, and bias-corrected sandwich-based standard errors by Mancl and DeRouen (2001); t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF approximation (M7-M9). (A) Plot of the coverage of the 95% confidence intervals for the primary regression coefficient of interest, ${\beta _{1}}$, with the average standard errors of the regression coefficient listed at the right of the plot; the actual standard error estimated from the trial data for ${\beta _{1}}$ is shown in parentheses for comparison. The red dashed line on the plot represents the targeted 95% confidence coefficient, while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (B) Plot of the average standard errors of the treatment difference in the marginal mean change from baseline at week 40 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the actual standard error estimated from the trial data as a frame of reference. The power of the test of the primary hypothesis, ${H_{0}}:\hspace{2.5pt}\mathrm{Treatment}\hspace{2.5pt}\mathrm{Difference}\hspace{2.5pt}\text{in}\hspace{2.5pt}LS\hspace{2.5pt}\mathrm{Means}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}40=0$, is listed at the right of the plot.

UX001-CL301

In the largest study (N = 88), with 6 post-baseline measurement occasions and little missing data, incorrect SEs resulted when the sandwich variance estimator was not used with MMRM and a misspecified CS covariance pattern. This led to 95% CIs that were too wide for ${\beta _{1}}$ and too narrow for ${\beta _{11}}$, and to power that was too large (relative to the MMRM model with correctly specified covariance) for the test of the primary hypothesis (Figure 1). Specifically, the coverage probabilities for ${\beta _{1}}$ (between-subject effect) and ${\beta _{11}}$ (within-subject effect) were 0.9903 and 0.9274, respectively, and the power of the test of the primary hypothesis was 0.1238 compared with 0.0977 achieved with MMRM that used a correctly specified covariance pattern and the Kenward-Roger adjustment. Findings were similar when the model-based SEs were used with GEE. Although the bias-corrected SEs for ${\beta _{1}}$, ${\beta _{11}}$, and the primary hypothesis test with GEE and MMRM were larger than the SEs seen in the data, t-based inference yielded coverage probabilities ≥95% for both ${\beta _{1}}$ and ${\beta _{11}}$, and power that was commensurate with the power achieved with MMRM that used a correctly specified covariance pattern and the Kenward-Roger adjustment.

UX023-CL301

In the modestly large study (N = 61) with 2 post-baseline measurement occasions and little missing data, any differences between the methods were subtle but no less instructive (Figure 2). A CS covariance pattern was a close approximation to the true underlying covariance pattern that was used to simulate the data. The elements of the 2x2 covariance matrix used in generating the data were ${\sigma _{1}^{2}}=0.5992$, ${\sigma _{2}^{2}}=0.5795$, and ${\sigma _{12}}={\sigma _{21}}=0.4437$. Consequently, both methods with model-based SEs and t-based inference achieved 95% coverage for ${\beta _{1}}$ (between-subject effect). MMRM with the Kenward-Roger adjustment also achieved the correct coverage. Using the sandwich variance estimator without and with the bias correction resulted in SEs, for ${\beta _{1}}$ and for the test of the primary hypothesis, that were modestly smaller and larger, respectively, than the SEs seen in the data.
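The closeness of the CS approximation can be checked with simple arithmetic on the covariance elements reported above. The sketch below is illustrative only: the pooled CS variance is taken as the plain average of the two variances, which is an assumption made here for simplicity and not the REML estimate a fitted model would return.

```python
import math

# Covariance elements reported in the text for UX023-CL301 (2 post-baseline visits).
var1, var2, cov12 = 0.5992, 0.5795, 0.4437

# Compound symmetry imposes one common variance and one common covariance.
# Assumption for illustration: pool the variances by a simple average.
cs_var = (var1 + var2) / 2.0

variance_ratio = var1 / var2                        # 1.0 under exact CS
correlation = cov12 / math.sqrt(var1 * var2)        # implied correlation
max_rel_dev = max(abs(var1 - cs_var), abs(var2 - cs_var)) / cs_var

print(f"variance ratio       = {variance_ratio:.3f}")
print(f"implied correlation  = {correlation:.3f}")
print(f"max relative deviation of a variance from the pooled value = {max_rel_dev:.3%}")
```

With only two measurement occasions there is a single covariance, so CS can be violated only through unequal variances, and here the two variances differ by a few percent.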

UX023-CL304

In the small, single-arm study (N = 14) with 5 post-baseline measurement occasions, for which there were also few missing data points, the CS covariance pattern was markedly misspecified for all models studied. Bias-corrected sandwich variance estimators with t-based inference were needed with both methods, consistently achieving coverage probabilities closest to 95% (${\beta _{0}}$ [between-subject effect] and ${\beta _{4}}$ [within-subject effect]: 0.9541 and 0.9417 with both MMRM and GEE using t-based inference). Additionally, the standard error and power for the test of the primary hypothesis were similar to those achieved by MMRM with correctly specified covariance and the Kenward-Roger adjustment. Other models that assumed a CS covariance pattern consistently performed considerably worse (Figure 3). Interestingly, for MMRM with correctly specified covariance, the Kenward-Roger adjustment to the standard error for the intercept, ${\beta _{0}}$, was consistent with the data; however, the Satterthwaite-type DOF approximation based on that adjusted standard error resulted in 92% coverage. For all other regression coefficients (${\beta _{1}},\hspace{2.5pt}{\beta _{2}},\hspace{2.5pt}{\beta _{3}},\hspace{2.5pt}\text{and}\hspace{2.5pt}{\beta _{4}}$), which were fixed effects that changed within subjects, the correct coverage was achieved (Appendix Table 2C).
nejsds96_g003.jpg
Figure 3
UX023-CL304 (N = 14 Single-Arm Study) – Simulation Results for models M0-M9: Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combines a standard error adjustment with a Satterthwaite-type degrees of freedom (DOF) approximation. M1-M3 represent an MMRM with a compound symmetry (CS) covariance pattern that, respectively, used model-based, sandwich-based, and bias-corrected sandwich-based standard errors by Mancl and DeRouen (2001); t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF approximation (M7-M9). Plots of the coverage of the 95% confidence intervals for the primary regression coefficients of interest (A) ${\beta _{0}}$ and (B) ${\beta _{4}}$, with the average standard errors of the respective regression coefficients listed at the right of each plot; the actual standard error estimated from the trial data for each regression coefficient is shown in parentheses for comparison. The red dashed line represents the targeted 95% confidence coefficient, while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (C) Plot of the average standard errors of the marginal mean change from baseline at week 48 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the actual standard error estimated from the trial data as a frame of reference. The power of the test of the primary hypothesis, ${H_{0}}:\hspace{2.5pt}LS\hspace{2.5pt}\mathrm{Means}\hspace{2.5pt}\mathrm{at}\hspace{2.5pt}\mathrm{Week}\hspace{2.5pt}48=0$, is listed at the right of the plot.

3.1 Summary of Results

In each trial, <1.5% of the measurement occasions were missing outcome data and the MCAR assumption was reasonably satisfied. Box 1 succinctly summarizes the main findings. Notably, the empirical variances and pairwise covariances were not constant over time in UX001-CL301 and UX023-CL304; assuming a CS covariance pattern therefore resulted in considerable misspecification of the covariances. Consequently, MMRM and GEE assuming a CS covariance pattern without the sandwich variance estimator performed badly. Bias-corrected SEs with t-based inference were needed in the smallest trial, UX023-CL304, with both methods to consistently achieve at least 94% coverage probability for the regression coefficients; furthermore, the power of the test of the primary hypothesis was commensurate with the power achieved with MMRM that used a correctly specified covariance pattern and the Kenward-Roger standard error adjustment. The largest trial, UX001-CL301, also benefited from bias-corrected SEs with t-based inference, which yielded coverage probabilities ≥95% and power identical to the power achieved when the covariance pattern was correctly specified. In the modestly large study, UX023-CL301, with only two post-baseline measurement occasions, adopting the sandwich variance estimator was not helpful. Given that the CS covariance pattern was a close approximation to the truth, the larger sampling variability associated with the sandwich variance estimator [17] resulted in coverage probabilities that were less than the nominal level; choosing model-based SEs with t-based inference with either method, or MMRM with the Kenward-Roger adjustment, achieved the correct coverage probability.
Box 1
Trial-specific simulation summary highlights when using a compound symmetry (CS) covariance pattern.
nejsds96_g004.jpg

4 Discussion

MMRM, a response profile analysis, arguably dates back to Greenhouse and Geisser (1959) [38] and has been widely adopted as the primary analysis of repeatedly measured continuous outcomes in randomized clinical trials (RCTs). For an example of a recently completed phase 3 trial (N = 65; NCT03399786) in a rare disease that applied MMRM as the primary analysis to assess a treatment effect at a single time point, albeit without the sandwich variance estimator and small-sample correction, see Raal et al (2020) [39]. Despite the two initial companion papers on GEE for longitudinal data [3, 40], the connections drawn between GEE and likelihood-based methods [41], and the widely available software, GEE has infrequently been used as the primary analysis for continuous outcomes in RCTs. Based on a recent comprehensive evaluation of statistical methods applied as the primary analysis to repeatedly measured continuous outcomes from RCTs published in 2019, Ren et al (2022) [42] estimated that 4% (4/96) adopted GEE to assess a treatment effect at a single time point compared with 36% (35/96) that used MMRM; the remaining 59% (57/96) oversimplified the longitudinal design and applied conventional methods such as t-tests and analysis of covariance to assess a treatment effect at a single time point. One rare disease trial (N = 53; NCT02208687) that adopted GEE with the sandwich variance estimator as the primary analysis was designed to investigate the effectiveness, at a single time point, of a self-management group program to improve social participation and endurance in patients with neuromuscular disease and chronic fatigue [43]. Either method can be considered for modeling longitudinal changes in the mean response in trials conducted in rare diseases with a parsimonious CS covariance structure; however, care needs to be exercised with both methods to ensure their correct use.
Inferences about longitudinal changes in the mean response and its relationship to treatment arm are sensitive to the chosen covariance model. This is true for both MMRM and GEE when modeling the mean with time included categorically, even though the covariances are treated as nuisance parameters with GEE. In rare disease trials, however, it is common to pragmatically prespecify a parsimonious CS covariance pattern because of the smaller sample sizes, or to prespecify a decision to adopt a CS covariance pattern if a UN covariance pattern is inestimable. In two of our three case studies, the variances and covariances of the longitudinal outcomes were not constant over time. When the CS covariance pattern does not closely approximate the true but unknown covariance pattern, incorrect SEs of the regression coefficients will result. As seen in our case studies, to obtain valid SEs, the sandwich variance estimator coupled with a small-sample bias correction must be considered with both methods.
With GEE, the sandwich variance estimator, as opposed to model-based SEs, is the default setting in most statistical packages; the opposite is true when fitting MMRM. Consider, for example, SAS and R. In SAS, the user needs to specify the EMPIRICAL option in the PROC GLIMMIX (or MIXED) statement to apply the sandwich variance estimator when fitting MMRM, while the sandwich variance estimator is the default when using GENMOD to fit the corresponding GEE. When fitting MMRM in R using the gls function in the nlme package [44], the analyst must additionally call the vcovCR function in the clubSandwich package [45], which returns a sandwich estimate of the variance-covariance matrix of the regression coefficients. The recently developed mmrm package [13] currently does not have an option to apply sandwich variance estimators. To fit the corresponding GEE in R, the user can turn to the $\mathit{gee}$ [46] or $\mathit{geepack}$ [47] packages; in both packages the sandwich variance estimator is the default. Defaulting to the sandwich variance estimator with GEE is not surprising, because its properties have been well studied with GEE [19, 48–51]. If the covariance is misspecified, however, not using the sandwich variance estimator can also result in misleading inferences for the regression coefficients in MMRM, as seen in our case studies.
Some small-sample corrections to the sandwich variance estimators are available in SAS and R to correct for bias along with some DOF approximations to the t distribution. Pustejovsky and Tipton (2017) [24] and Wang and coauthors (2016) [25] provide a comprehensive review of small-sample corrections available in R and guidance on which approach to use for the methods considered in this article. Gosho et al (2021) [22] provide a practical review of the small-sample corrections for MMRM that can be obtained using the GLIMMIX procedure directly or programmatically using IML.
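To make the bias correction concrete, the following is a minimal sketch for the simplest possible case: an intercept-only marginal model with identity link and a working independence structure, as might describe a single-arm mean. It is not the SAS/IML or GLIMMIX implementation used in the case studies; the function name `intercept_only_gee` and the toy data are hypothetical. For this special case the Mancl and DeRouen leverage matrix for participant i is H_ii = J / N (J a matrix of ones, N the total number of observations), so the adjusted cluster residual sum has a closed form.

```python
import math


def intercept_only_gee(clusters):
    """Mean estimate with robust and Mancl-DeRouen SEs (working independence).

    clusters: list of lists, one inner list of observed responses per participant.
    Requires at least two participants. Returns (estimate, se_robust, se_md).
    """
    n_total = sum(len(y) for y in clusters)
    estimate = sum(sum(y) for y in clusters) / n_total

    meat_robust = 0.0
    meat_md = 0.0
    for y in clusters:
        n_i = len(y)
        resid_sum = sum(v - estimate for v in y)          # 1' r_i
        meat_robust += resid_sum ** 2
        # 1' (I - H_ii)^{-1} r_i with H_ii = J / n_total simplifies to:
        adj_sum = resid_sum * n_total / (n_total - n_i)
        meat_md += adj_sum ** 2

    bread = 1.0 / n_total                                 # (X'X)^{-1} for a column of ones
    se_robust = math.sqrt(bread * meat_robust * bread)
    se_md = math.sqrt(bread * meat_md * bread)
    return estimate, se_robust, se_md


# Hypothetical toy data: 4 participants, 3 visits each (not trial data).
toy = [[1.0, 1.2, 1.1], [0.4, 0.5, 0.9], [1.8, 1.5, 1.7], [0.9, 1.0, 0.6]]
est, se_r, se_md = intercept_only_gee(toy)
print(f"estimate={est:.3f}, robust SE={se_r:.4f}, Mancl-DeRouen SE={se_md:.4f}")
```

In this balanced special case with K participants, the corrected SE is the uncorrected sandwich SE inflated by K/(K-1), which shows why the correction matters most when the number of participants is small. General designs need the full matrix form of the estimator.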
Longitudinal responses will invariably be incomplete. Unlike MCAR, when the missing response data mechanism is MAR, subjects with missing response data will no longer be a random subset of the sample. It is well known that MMRM provides a valid framework under MAR, a more defensible assumption than MCAR in any given trial, provided that the covariance pattern has been correctly specified [1]. If estimating a UN covariance pattern is prohibitive due to the small sample, resulting in misspecification, inferences about the mean response over time will be invalid when dropout is MAR. This point further underscores the importance of considering the sandwich variance estimator with MMRM (± a small-sample correction). When dropout is MAR as opposed to MCAR, the validity of the analysis with the standard GEE method based on available observations will be compromised [14]. Inverse probability weighted (IPW) [10] GEE should be adopted to adjust the analysis for the propensity to drop out; otherwise, GEE will yield biased estimates of the coefficients in the model for the mean response. The sandwich variance estimator when using IPW methods can be constructed to adjust for estimation of the weights. Because most statistical procedures allow for the inclusion of sampling weights, it is straightforward to apply IPW-GEE. While unlikely in a well-controlled longitudinal clinical trial [1], if the longitudinal responses are missing not at random (MNAR), none of the methods considered in this article will be valid [11].
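The weighting idea behind IPW-GEE can be sketched as follows, assuming monotone dropout and assuming that the conditional probabilities of remaining in the study have already been estimated (for example, from a dropout model, which is not shown). The function name `ipw_weights` and the probabilities are hypothetical.

```python
def ipw_weights(p_remain):
    """Inverse probability weights for one participant under monotone dropout.

    p_remain[j] is the estimated probability of being observed at visit j
    given that the participant was observed at visit j-1.
    The weight at visit j is the inverse of the cumulative product.
    """
    weights = []
    cumulative = 1.0
    for p in p_remain:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must lie in (0, 1]")
        cumulative *= p
        weights.append(1.0 / cumulative)
    return weights


# Hypothetical conditional probabilities of remaining at visits 1, 2, 3.
w = ipw_weights([1.0, 0.9, 0.8])
print([round(x, 4) for x in w])
```

Observed responses at later visits receive larger weights because they stand in for similar participants who dropped out; these weights would then be supplied as observation weights to the GEE fitting routine.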
In our case studies, the MCAR assumption was clinically defensible. The number of measurement occasions including baseline was ≤7, few patients discontinued prematurely, no missing outcome data pattern occurred in >1 participant, and the degree of missing outcome data was <1.5%. While the statistical test of the MCAR assumption likely had low power [52], its use combined with the empirical evidence provided justification to use GEE. Due to the hierarchy of dropout mechanisms [53], MMRM was also justified.
This research had limitations. First, we only considered GLIMMIX to estimate MMRM and the SAS/IML GEE macro, which reproduces the method-of-moments approach to estimation with GENMOD. SAS is widely used software and the primary analysis in each trial used GENMOD. Second, there are numerous bias corrections to sandwich variance estimators in the literature, but only a few are currently available with GLIMMIX and none with GENMOD. The choice of which bias correction to use would depend on several factors [19, 24, 25] and should be prespecified. In this article, we only considered the bias-corrected SEs recommended by Mancl and DeRouen (2001) [19]. Third, for t-based inference, we only considered the between-within DOF approximation for small-sample inference recommended by Schluchter and Elashoff (1990) [36]. When empirical or bias-corrected SEs are used with GLIMMIX, DOF approximation methods are limited to between-within, containment, and residual, where the latter two are not small-sample approximations; furthermore, GENMOD only performs inference assuming asymptotic normality. The determination of which DOF approximation to use is an active area of research, with many methods not accessible in standard SAS and R routines. Lastly, the research was limited to three case studies and, in each, <1.5% of the measurement occasions were missing outcome data. Although these case studies represented typical longitudinal designs in rare diseases, they were not an exhaustive representation and the conclusions may not necessarily extend to other design variations. In particular, dropout other than MCAR was not considered in the simulation study. However, to understand whether the findings held in the presence of moderate missingness, defined as 10% of the measurement occasions being MCAR, the simulation study was repeated. The simulation-based conclusions when using a CS covariance pattern remained; however, for the smallest trial (UX023-CL304; N = 14), estimation was unstable or not always possible when assuming an unstructured covariance pattern. In any case, the intent of the current research was not to pressure-test these methods but rather to elucidate the issues, and the steps to address them, using real datasets in rare diseases to aid an analyst working in a rare disease in selecting an appropriate modeling strategy.

5 Conclusion

Based on a review of three case studies recently conducted in rare diseases, the MCAR assumption was plausible and the missingness low. When modeling the mean response with time as a categorical variable, MMRM with UN covariance coupled with the Kenward-Roger standard error adjustment should be used if the data allow it. If not, then a parsimonious CS covariance structure can be considered. In the two case studies that exhibited nonconstant variance/covariances over time, the sandwich variance estimator and small-sample correction with t-based inference using the between-within DOF should be considered with both MMRM and GEE. If the CS pattern was a good approximation, as seen in the other case study, then model-based standard errors with t-based inference using the between-within DOF performed well with both methods.

Declaration of Any Potential Conflicts of Interest

At the time the research was conducted, David Zahrieh, Yi Wang, and Tony Koutsoukos were employees at Ultragenyx Pharmaceutical Inc. This research reflects the views of the authors and should not be construed to represent Ultragenyx’s views or policies.

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this research.

Presentations

This work was presented as a Poster at the 2023 Joint Statistical Meetings on August 7, 2023.
Appendix Table 1
True values for the regression coefficients and variance/covariance matrix used to generate the data in the simulation study.
nejsds96_g005.jpg
Appendix Table 2A
Simulation study and comprehensive results for study UX001-CL301.
nejsds96_g006.jpg
Appendix Table 2A
(continued)
nejsds96_g007.jpg
Appendix Table 2B
Simulation study and comprehensive results for study UX023-CL301 (Pediatrics).
nejsds96_g008.jpg
Appendix Table 2B
(continued)
nejsds96_g009.jpg
Appendix Table 2C
Simulation study and comprehensive results for study UX023-CL304 (Adults).
nejsds96_g010.jpg
Appendix Table 2C
(continued)
nejsds96_g011.jpg
Appendix Table 2C
(continued)
nejsds96_g012.jpg

References

[1] 
Mallinckrodt, C. H., Lane, P. W., Schnell, D., Peng, Y. and Mancuso, J. P. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Inf J. 42(4) 303–319 (2008). https://doi.org/10.1177/009286150804200402
[2] 
Mancl, L. A. and Leroux, B. G. Efficiency of regression estimates for clustered data. Biometrics. 52 500–511 (1996).
[3] 
Liang, K. Y. and Zeger, S. L. Longitudinal data analysis using generalized linear models. Biometrika. 73(1) 13–22 (1986). https://doi.org/10.1093/biomet/73.1.13. MR0836430
[4] 
Day, S., Jonker, A. H., Lau, L. P. L. et al. Recommendations for the design of small population clinical trials. Orphanet J Rare Dis. 13(1) 195 (2018). https://doi.org/10.1186/s13023-018-0931-2
[5] 
Gaasterland, C. M. W., Jansen-van der Weide, M. C., du Prie-Olthof, M. J. et al. The patient’s view on rare disease trial design - a qualitative study. Orphanet J Rare Dis. 14(1) 31 (2019). https://doi.org/10.1186/s13023-019-1002-z
[6] 
Hall, A. K. and Ludington, E. Considerations for successful clinical development for orphan indications. Expert Opinion on Orphan Drugs. 1(11) 847–850 (2013).
[7] 
Hilgers, R. D., König, F., Molenberghs, G. and Senn, S. J. Design and analysis of clinical trials for small rare disease populations. J Rare Dis Res Treat. 1(3) 53–60 (2016).
[8] 
Abrahamyan, L., Feldman, B. M., Tomlinson, G. et al. Alternative designs for clinical trials in rare diseases. Am J Med Genet C Semin Med Genet. 172(4) 313–331 (2016). https://doi.org/10.1002/ajmg.c.31533
[9] 
Pizzamiglio, C., Vernon, H. J., Hanna, M. G. and Pitceathly, R. D. S. Designing clinical trials for rare diseases: unique challenges and opportunities. Nat Rev Methods Primers. 2(1) (2022). https://doi.org/10.1038/s43586-022-00100-2
[10] 
Robins, J. M., Rotnitzky, A. and Zhao, L. P. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 90 106–121 (1995). MR1325118
[11] 
Fitzmaurice, G. M., Laird, N. M. and Ware, J. H. Applied longitudinal data analysis. 2nd ed. Probability and statistics. John Wiley & Sons (2011). MR2830137
[12] 
Gurka, M. J. Selecting the best linear mixed model under REML. The American Statistician. 60 19–26 (2006). https://doi.org/10.1198/000313006X90396. MR2224133
[13] 
mmrm: Mixed models for repeated measures (2022). https://CRAN.R-project.org/package=mmrm
[14] 
Lu, K. and Mehrotra, D. V. Specification of covariance structure in longitudinal data analysis for randomized clinical trials. Stat Med. 29(4) 474–488 (2010). https://doi.org/10.1002/sim.3820. MR2751783
[15] 
Liang, K. Y. and Zeger, S. L. Longitudinal data analysis using generalized linear models. Biometrika. 73(1) 13–22 (1986). https://doi.org/10.1093/biomet/73.1.13. MR0836430
[16] 
Gurka, M. J., Edwards, L. J. and Muller, K. E. Avoiding bias in mixed model inference for fixed effects. Stat Med. 30(22) 2696–2707 (2011). https://doi.org/10.1002/sim.4293. MR2843173
[17] 
Kauermann, G. and Carroll, R. J. The sandwich variance estimator: Efficiency properties and coverage probability of confidence intervals. Discussion Paper 189, Collaborative Research Center 386 (2000).
[18] 
Kauermann, G. and Carroll, R. J. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 96(456) 1387–1396 (2001). https://doi.org/10.1198/016214501753382309. MR1946584
[19] 
Mancl, L. A. and DeRouen, T. A. A covariance estimator for GEE with improved small-sample properties. Biometrics. 57(1) 126–134 (2001). https://doi.org/10.1111/j.0006-341x.2001.00126.x. MR1833298
[20] 
Bell, R. M. and McCaffrey, D. F. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology. 28 169–181 (2002).
[21] 
Gosho, M., Hirakawa, A., Noma, H., Maruo, K. and Sato, Y. Comparison of bias-corrected covariance estimators for MMRM analysis in longitudinal data with dropouts. Stat Methods Med Res. 26(5) 2389–2406 (2017). https://doi.org/10.1177/0962280215597938. MR3712239
[22] 
Gosho, M., Noma, H. and Maruo, K. Practical Review and Comparison of Modified Covariance Estimators for Linear Mixed Models in Small-sample Longitudinal Studies with Missing Data. Int Stat Rev. 89(3) 550–572 (2021). https://doi.org/10.1111/insr.12447. MR4411918
[23] 
Gosho, M., Sato, T. and Takeuchi, H. Robust covariance estimator for small-sample adjustment in the generalized estimating equations: a simulation study. Science Journal of Applied Mathematics and Statistics. 2(1) 20–25 (2014).
[24] 
Pustejovsky, J. E. and Tipton, E. Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business and Economic Statistics. 36(4) 672–683 (2018). https://doi.org/10.1080/07350015.2016.1247004. MR3871709
[25] 
Wang, M., Kong, L., Li, Z. and Zhang, L. Covariance estimators for generalized estimating equations (GEE) in longitudinal analysis with small samples. Stat Med. 35(10) 1706–1721 (2016). https://doi.org/10.1002/sim.6817. MR3513479
[26] 
Imbens, G. W. and Kolesar, M. Robust standard errors in small samples: Some practical advice. In Research NBoE, editor. NBER Working Paper Series. National Bureau of Economic Research (2012). Cambridge, MA.
[27] 
Pan, W. On the robust variance estimator in generalised estimating equations. Biometrika. 88(3) 901–906 (2001). https://doi.org/10.1093/biomet/88.3.901. MR1859421
[28] 
ICH E9 Statistical Principles for Clinical Trials (1998).
[29] 
Lochmüller, H., Behin, A., Caraco, Y. et al. A phase 3 randomized study evaluating sialic acid extended-release for GNE myopathy. Neurology. 92(18) E2109–E2117 (2019). https://doi.org/10.1212/Wnl.0000000000006932
[30] 
Imel, E. A., Glorieux, F. H., Whyte, M. P. et al. Burosumab versus conventional therapy in children with X-linked hypophosphataemia: a randomised, active-controlled, open-label, phase 3 trial. Lancet. 393(10189) 2416–2427 (2019). https://doi.org/10.1016/S0140-6736(19)30654-3
[31] 
Insogna, K. L., Rauch, F., Kamenicky, P. et al. Burosumab Improved Histomorphometric Measures of Osteomalacia in Adults with X-Linked Hypophosphatemia: A Phase 3, Single-Arm, International Trial. J Bone Miner Res. 34(12) 2183–2191 (2019). https://doi.org/10.1002/jbmr.3843
[32] 
Little, R. J. A. A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association. 83(404) 1198–1202 (1988). MR0997603
[33] 
Akaike, H. In Information theory and an extension of the maximum likelihood principle 267–281 (1973). MR0483125
[34] 
Burnham, K. P. and Anderson, D. R. Model selection and multimodel inference - A practical information-theoretic approach. 2 ed. Springer, New York, NY (2002). MR1919620
[35] 
Kenward, M. G. and Roger, J. H. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 53(3) 983–997 (1997). https://doi.org/10.2307/2533558
[36] 
Schluchter, M. D. and Elashoff, J. D. Small-sample adjustments to tests with unbalanced repeated measures assuming several covariance structures. Journal of Statistical Computation and Simulation. 37 69–87 (1990).
[37] 
Hammill, B. G. and Preisser, J. S. A SAS/IML software program for GEE and regression diagnostics. Comput Stat Data An. 51(2) 1197–1212 (2006). https://doi.org/10.1016/j.csda.2005.11.016. MR2297517
[38] 
Greenhouse, S. W. and Geisser, S. On methods in the analysis of profile data. Psychometrika. 24 95–112 (1959). https://doi.org/10.1007/BF02289823. MR0103783
[39] 
Raal, F. J., Rosenson, R. S., Reeskamp, L. F. et al. Evinacumab for Homozygous Familial Hypercholesterolemia. N Engl J Med. 383(8) 711–720 (2020). https://doi.org/10.1056/NEJMoa2004215
[40] 
Zeger, S. L. and Liang, K. Y. Longitudinal Data-Analysis for Discrete and Continuous Outcomes. Biometrics. 42(1) 121–130 (1986). https://doi.org/10.2307/2531248
[41] 
Zhao, L. P., Prentice, R. and Self, S. Multivariate mean parameter estimation by using a partly exponential model. Journal of the Royal Statistical Society, Series B. 54 805–811 (1992).
[42] 
Ren Zhang Y Jia Y, Y. et al. Analyses of repeatedly measured continuous outcomes in randomized controlled trials needed substantial improvements. J Clin Epidemiol. 143 105–117 (2022). https://doi.org/10.1016/j.jclinepi.2021.12.007
[43] 
Veenhuizen, Y., Cup, E. H. C., Jonker, M. A. et al. Self-management program improves participation in patients with neuromuscular disease: A randomized controlled trial. Neurology. 93(18) e1720–e1731 (2019). https://doi.org/10.1212/WNL.0000000000008393
[44] 
nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-163 (2023). https://CRAN.R-project.org/package=nlme
[45] 
clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections. R package version 0.5.8.9999 (2023). http://jepusto.github.io/clubSandwich/
[46] 
gee: Generalized estimation equation solver. R package version 4.13-19 (2015). https://cran.r-project.org/web/packages/gee/
[47] 
Halekoh, U., Hojsgaard, S. and Yan, J. The R Package geepack for Generalized Estimating Equations. J Stat Softw. 15(2) 1–11 (2006). https://doi.org/10.18637/jss.v015.i02
[48] 
Fay, M. P. and Graubard, B. I. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics. 57(4) 1198–1206 (2001). https://doi.org/10.1111/j.0006-341X.2001.01198.x. MR1950428
[49] 
Gunsolley, J. C., Getchell, C. and Chinchilli, V. M. Small sample characteristics of generalized estimating equations. Communications in Statistics, Simulation and Computation. 24 869–878 (1995).
[50] 
Hinkley, D. V. and Wang, S. Efficiency of robust standard errors for regression coefficients. Communications in Statistics, Theory and Methods. 20 1–11 (1991). https://doi.org/10.1080/03610929108830479. MR1114631
[51] 
Kauermann, G. and Carroll, R. J. The sandwich variance estimator: efficiency properties and coverage probability of confidence intervals. Journal of the American Statistical Association. 96 1387–1396 (2001). https://doi.org/10.1198/016214501753382309. MR1946584
[52] 
Thoemmes, F. and Enders, C. K. A structural equation model for testing whether data are missing completely at random. Presented at the Annual Meeting of the American Educational Research Association, Chicago, IL (2007).
[53] 
Fitzmaurice, G. M. Methods for handling dropouts in longitudinal clinical trials. Statistica Neerlandica. 57(1) 75–99 (2003). https://doi.org/10.1111/1467-9574.00222. MR2055522


Copyright
© 2026 New England Statistical Society
Open access article under the CC BY license.

Keywords
Generalized estimating equations; Longitudinal designs; Mixed model repeated measures; Rare diseases; Sandwich variance estimator; Small-sample correction

Figure 1
UX001-CL301 (N = 88 Two-Arm Study) – Simulation Results for models M0-M9: Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combines a standard error adjustment with a Satterthwaite-type degrees of freedom (DOF) approximation. M1-M3 represent an MMRM with a compound symmetry (CS) covariance pattern that used, respectively, model-based, sandwich-based, and bias-corrected (Mancl and DeRouen, 2001) sandwich-based standard errors; t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF approximation (M7-M9). Plots of the coverage of the 95% confidence intervals for the primary regression coefficients of interest (A) $\beta_{1}$ and (B) $\beta_{11}$, with the average standard errors of the respective regression coefficients listed at the right of each plot; the actual standard error estimated from the trial data for $\beta_{1}$ and $\beta_{11}$ is shown in parentheses for comparison. The red dashed line on both plots represents the targeted 95% confidence coefficient, while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (C) Plot of the average standard errors of the treatment difference in the marginal mean change from baseline at week 48 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the actual standard error estimated from the trial data as a frame of reference. The power for testing the primary hypothesis, $H_{0}: \text{Treatment Difference in LS Means at Week 48} = 0$, is listed at the right of the plot.
Figure 2
UX023-CL301 (N = 61 Two-Arm Study) – Simulation Results for models M0-M9: Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combines a standard error adjustment with a Satterthwaite-type degrees of freedom (DOF) approximation. M1-M3 represent an MMRM with a compound symmetry (CS) covariance pattern that used, respectively, model-based, sandwich-based, and bias-corrected (Mancl and DeRouen, 2001) sandwich-based standard errors; t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF approximation (M7-M9). (A) Plot of the coverage of the 95% confidence intervals for the primary regression coefficient of interest, $\beta_{1}$, with the average standard errors of the regression coefficient listed at the right of the plot; the actual standard error estimated from the trial data for $\beta_{1}$ is shown in parentheses for comparison. The red dashed line on the plot represents the targeted 95% confidence coefficient, while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (B) Plot of the average standard errors of the treatment difference in the marginal mean change from baseline at week 40 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the actual standard error estimated from the trial data as a frame of reference. The power for testing the primary hypothesis, $H_{0}: \text{Treatment Difference in LS Means at Week 40} = 0$, is listed at the right of the plot.
Figure 3
UX023-CL304 (N = 14 Single-Arm Study) – Simulation Results for models M0-M9: Model M0 represents the mixed-model repeated measures (MMRM) model with correctly specified unstructured (UN) covariance and use of the Kenward-Roger method, which combines a standard error adjustment with a Satterthwaite-type degrees of freedom (DOF) approximation. M1-M3 represent an MMRM with a compound symmetry (CS) covariance pattern that used, respectively, model-based, sandwich-based, and bias-corrected (Mancl and DeRouen, 2001) sandwich-based standard errors; t-based inference was used with the Between-Within method for determining DOF. M4-M9 represent a generalized estimating equations (GEE) model (marginal model with identity link) with inference based on asymptotic normality (M4-M6) or the t-distribution using the Between-Within DOF approximation (M7-M9). Plots of the coverage of the 95% confidence intervals for the primary regression coefficients of interest (A) $\beta_{0}$ and (B) $\beta_{4}$, with the average standard errors of the respective regression coefficients listed at the right of each plot; the actual standard error estimated from the trial data for each regression coefficient is shown in parentheses for comparison. The red dashed line represents the targeted 95% confidence coefficient, while the shaded pink region represents coverage between 94% and 96% as a frame of reference. (C) Plot of the average standard errors of the marginal mean change from baseline at week 48 with respect to the actual standard error estimated from the trial data (green dashed line); the shaded green region represents ± 5% of the actual standard error estimated from the trial data as a frame of reference. The power for testing the primary hypothesis, $H_{0}: \text{LS Means at Week 48} = 0$, is listed at the right of the plot.
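The coverage values plotted in Figures 1 to 3 are proportions of simulated confidence intervals that contain the true coefficient. A minimal sketch of that calculation is given here, assuming a Wald-type interval with a user-supplied critical value; the function names `coverage` and `monte_carlo_se` are illustrative and are not taken from the article's SAS code. The second function gives the binomial Monte Carlo standard error of a coverage estimate, which helps judge how much of the 94% to 96% reference band could be simulation noise.

```python
import math


def coverage(estimates, std_errors, truth, crit=1.96):
    """Proportion of intervals est +/- crit * se that contain the true value."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - crit * se <= truth <= est + crit * se
    )
    return hits / len(estimates)


def monte_carlo_se(p, n_reps):
    """Binomial standard error of an estimated coverage probability p."""
    return math.sqrt(p * (1.0 - p) / n_reps)


# Toy inputs (hypothetical): three simulated estimates with unit standard errors.
toy_coverage = coverage([0.0, 1.0, 3.0], [1.0, 1.0, 1.0], truth=0.0)

# With 100,000 replicates and nominal 95% coverage, the Monte Carlo
# standard error is roughly 0.0007, i.e. about 0.07 percentage points.
mc_se = monte_carlo_se(0.95, 100_000)
```

For t-based inference (models M0-M3 and M7-M9), `crit` would be replaced by the appropriate t quantile for the chosen degrees of freedom.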
Table 1
The three longitudinal, phase 3 clinical trials conducted in rare diseases, which formed the basis for the simulation study.
| Clinical Trial | No. Post-BL Occasions | Trial Duration | Sample Size: Randomized or Enrolled | Sample Size: Primary Analysis Set | Primary (or Important Secondary) Efficacy Endpoint | Prospectively Defined Primary Analysis | ClinicalTrials.gov |
|---|---|---|---|---|---|---|---|
| UX001-CL301 | 6 | 48 Weeks | 89 | 88 | Primary endpoint: Change from baseline at Week 48 in the HHD Upper Extremity (UE) Composite Score | GEE, Change from Baseline, compound symmetry, adjusting for sex, region, and baseline value | NCT02377921 |
| UX023-CL301 (Pediatrics) | 2 | 64 Weeks | 61 | 61 | Primary endpoint: Rickets severity, assessed by the Radiographic Global Impression of Change (RGI-C) global score and measured at Week 40 | GEE, Change from Baseline, compound symmetry, adjusting for age and the baseline total rickets severity score | NCT02915705 |
| UX023-CL304 (Adults) | 5 | 48 Weeks | 14 | 14 | Important secondary endpoint: Change from baseline at Week 48 in alkaline phosphatase (ALP) | GEE, Change from Baseline, compound symmetry, adjusting for the baseline value | NCT02537431 |
BL = Baseline; GEE = generalized estimating equations.
Trials UX001-CL301 and UX023-CL301 were randomized controlled trials, while UX023-CL304 was a single arm trial.
The prespecified Primary Analysis Set was defined as all randomized (or enrolled) participants with a baseline measurement and at least one post-baseline measurement.
Standard GEE was applied such that estimation was based on method-of-moments assuming asymptotic normality using the SAS GENMOD procedure.
Table 2
Data generating model for UX001-CL301, UX023-CL301, and UX023-CL304.
Clinical Trial: MMRM Data Generating Model

UX001-CL301
$\begin{aligned} E(Y_{i} \mid X_{i}) = {} & \beta_{0} + \beta_{1} I(\mathrm{G}=1) + \beta_{2} I(\mathrm{W}=16) + \beta_{3} I(\mathrm{W}=24) + \beta_{4} I(\mathrm{W}=32) + \beta_{5} I(\mathrm{W}=40) + \beta_{6} I(\mathrm{W}=48) \\ & + \beta_{7} I(\mathrm{G}=1) \times I(\mathrm{W}=16) + \beta_{8} I(\mathrm{G}=1) \times I(\mathrm{W}=24) + \beta_{9} I(\mathrm{G}=1) \times I(\mathrm{W}=32) + \beta_{10} I(\mathrm{G}=1) \times I(\mathrm{W}=40) \\ & + \beta_{11} I(\mathrm{G}=1) \times I(\mathrm{W}=48) + \gamma_{1} \mathrm{Base} + \gamma_{2} I(\mathrm{Sex}=\mathrm{Male}) + \gamma_{3} I(\mathrm{Region}=\mathrm{NonUS}) \end{aligned}$
Repeated measurements taken on $i=1,\dots,88$ participants (45 treatment; 43 placebo) at the same set of 6 post-baseline occasions. $Y_{i}$ = Change from baseline. I represents an indicator variable. G = Group; W = Week; Base = Baseline value. Reference categories for variables G, W, Sex, and Region were placebo, week 8, female, and US, respectively. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance/covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariates Base, Sex, and Region were generated from a distribution reasonably consistent with the trial data: a truncated N(56.14, 27.954), 55% males, and 66% Non-US. 100,000 data sets were generated; on average, in any given simulated data set, 1.4% of the data were missing completely at random. The regression coefficients of primary interest are $\beta_{1}$ and $\beta_{11}$; specifically, the null hypothesis $H_{0}: \beta_{1}+\beta_{11}=0$ corresponds to no treatment difference in the mean change from baseline at week 48 adjusting for the other covariates; this is similar to testing $H_{0}: \text{Treatment Difference in LS Means at Week 48} = 0$ (i.e., the marginal mean change from baseline is the same in each arm).

UX023-CL301 (Pediatrics)
$\begin{aligned} E(Y_{i} \mid X_{i}) = {} & \beta_{0} + \beta_{1} I(\mathrm{G}=2) + \beta_{2} I(\mathrm{W}=64) + \beta_{3} I(\mathrm{G}=2) \times I(\mathrm{W}=64) \\ & + \gamma_{1} \mathrm{TOTSCORE} + \gamma_{2} I(\mathrm{Age}=2) \end{aligned}$
Repeated measurements taken on $i=1,\dots,61$ participants (32 conventional therapy; 29 burosumab) at the same set of 2 post-baseline occasions. $Y_{i}$ = Change from baseline. I represents an indicator variable. G = Group; W = Week; TOTSCORE = Baseline Total Rickets Severity Score. Reference categories for variables G, W, and Age were burosumab, week 40, and < 5 years, respectively. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance/covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariates TOTSCORE and Age were generated from a distribution reasonably consistent with the trial data: a truncated N(3.18, 1.057) and 57% ≥ 5 years old. 100,000 data sets were generated; on average, in any given simulated data set, 0.5% of the data were missing completely at random. The regression coefficient of primary interest, $\beta_{1}$, corresponds to the treatment difference in the mean change from baseline at week 40 adjusting for the other covariates; testing $H_{0}: \beta_{1}=0$ is similar to testing $H_{0}: \text{Treatment Difference in LS Means at Week 40} = 0$ (i.e., the marginal mean change from baseline is the same in each arm).

UX023-CL304 (Adults)
$\begin{aligned} E(Y_{i} \mid X_{i}) = {} & \beta_{0} + \beta_{1} I(\mathrm{W}=12) + \beta_{2} I(\mathrm{W}=24) + \beta_{3} I(\mathrm{W}=36) + \beta_{4} I(\mathrm{W}=48) \\ & + \gamma_{1} \mathrm{Base} \end{aligned}$
Repeated measurements taken on $i=1,\dots,14$ participants at the same set of 5 post-baseline occasions. $Y_{i}$ = Change from baseline in alkaline phosphatase. I represents an indicator variable. W = Week; Base = Baseline value. Reference category for the variable W was study week 4. Unstructured (UN) covariance structure was assumed. The estimates of the regression coefficients and variance/covariance matrix from fitting the MMRM model to the trial data were used as the true values. The covariate Base was generated from a distribution reasonably consistent with the trial data: a truncated N(113.14, 51.600). 500,000 data sets were generated; on average, in any given simulated data set, 1.2% of the data were missing completely at random. The regression coefficients of primary interest are $\beta_{0}$ and $\beta_{4}$; specifically, the null hypothesis $H_{0}: \beta_{0}+\beta_{4}=0$ corresponds to no difference in the mean change from baseline at week 48 adjusting for the baseline value. This is similar to testing $H_{0}: \text{LS Mean at Week 48} = 0$ (i.e., the marginal mean change from baseline equals zero).
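To make the data-generating step concrete, here is a minimal sketch of one simulated data set with the structure of the UX023-CL301 model: two arms, two post-baseline occasions, correlated errors, and a small MCAR missingness rate. Several things here are assumptions and not taken from the article: the coefficient values in `BETA` are hypothetical (the true values are in Appendix Table 1), the covariates TOTSCORE and Age are omitted for brevity, the error covariance `SIGMA` borrows the empirical matrix reported in Table 4 as a stand-in for the MMRM-estimated matrix, and the function names are illustrative.

```python
import math
import random

# Hypothetical regression coefficients, for illustration only.
BETA = {"b0": 1.0, "b1": -0.5, "b2": 0.1, "b3": -0.1}
# Stand-in error covariance for weeks 40 and 64 (empirical matrix, Table 4).
SIGMA = [[1.26, 1.14], [1.14, 1.35]]
WEEKS = (40, 64)


def cholesky_2x2(s):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 ** 2)
    return l11, l21, l22


def simulate_trial(n_per_arm=(29, 32), p_missing=0.005, seed=1):
    """One simulated data set in long format; y is None when missing (MCAR)."""
    rng = random.Random(seed)
    l11, l21, l22 = cholesky_2x2(SIGMA)
    rows, subject = [], 0
    for group, n in enumerate(n_per_arm):  # group 0 = reference arm
        for _ in range(n):
            subject += 1
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            errors = (l11 * z1, l21 * z1 + l22 * z2)  # correlated pair
            for j, week in enumerate(WEEKS):
                later = 1 if j == 1 else 0  # indicator for the non-reference week
                mean = (BETA["b0"] + BETA["b1"] * group
                        + BETA["b2"] * later + BETA["b3"] * group * later)
                y = mean + errors[j]
                if rng.random() < p_missing:  # missingness independent of y
                    y = None
                rows.append({"id": subject, "group": group, "week": week, "y": y})
    return rows


rows = simulate_trial()
```

Repeating `simulate_trial` with different seeds and fitting each candidate model to every data set would give the replicate-level estimates and standard errors that the coverage and power summaries are built from.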
Table 3
For each dataset generated in the simulation study, the following models were applied.
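Clinical trials: UX001-CL301, UX023-CL301 (Pediatrics), and UX023-CL304 (Adults); all models were applied to each trial.

| Model | Method | Covariance Pattern | Standard Error Adjustment | Small-Sample Correction to Standard Errors | Degrees of Freedom (DOF) in t Tests and t-based Confidence Limits |
|---|---|---|---|---|---|
| M0 | MMRM | UN | Kenward-Roger | No | Kenward-Roger approximation for the DOF based on the SE adjustment |
| M1 | MMRM | CS | None, Model-based | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M2 | MMRM | CS | Sandwich Estimator | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M3 | MMRM | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M4 | GEE | CS | None, Model-based | No | Asymptotic normal distribution (standard GEE) with z-based inference |
| M5* | GEE | CS | Sandwich Estimator | No | Asymptotic normal distribution (standard GEE) with z-based inference |
| M6 | GEE | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Asymptotic normal distribution (standard GEE) with z-based inference |
| M7 | GEE | CS | None, Model-based | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M8 | GEE | CS | Sandwich Estimator | No | Between-Within method DOF by Schluchter and Elashoff (1990) |
| M9 | GEE | CS | Sandwich Estimator | Mancl and DeRouen (2001) | Between-Within method DOF by Schluchter and Elashoff (1990) |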
GEE = generalized estimating equations; MMRM = mixed model repeated measures; SE = standard error; DOF = degrees of freedom; UN = Unstructured; CS = Compound symmetry.
* Prespecified GEE method, which was the primary analysis method defined in the protocol for each trial.
  • Note 1. Trials UX001-CL301 (N = 88) and UX023-CL301 (N = 61) were randomized controlled trials, while UX023-CL304 (N = 14) was a single arm trial.
  • Note 2. The SAS procedure PROC GLIMMIX with METHOD = RSPL was used to fit the MMRM models. The SAS keyword EMPIRICAL = FIRORES was used to apply the bias correction of Mancl and DeRouen (2001) to the empirical (robust) standard errors. The SAS keyword DDFM = BETWITHIN was used to obtain the between-within DOF approximation of Schluchter and Elashoff (1990): the DOF used for the between-subject and within-subject effects were ${N_{1}}-{p_{1}}$ and ${N_{2}}-({N_{1}}+{p_{2}})$, respectively; here, ${N_{1}}=$ number of analyzable subjects; ${N_{2}}=$ number of nonmissing observations; ${p_{1}}=$ number of between-subject effects + intercept; ${p_{2}}=$ number of within-subject effects. For hypothesis tests that involved within-subject effects, ${N_{2}}-({N_{1}}+{p_{2}})$ DOF was used.
  • Note 3. The standard GEE method-of-moments approach, which mirrors the SAS procedure PROC GENMOD, was implemented with the GEE WITH DIAGNOSTICS (Version 1.06) SAS/IML macro by John Preisser, University of North Carolina-Chapel Hill; the macro calculates the model-based, empirical, and bias-corrected SEs (Mancl and DeRouen, 2001).
  • Note 4. All models adjusted for the prespecified covariates.
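The between-within degrees of freedom in Note 2 reduce to two subtractions. The sketch below simply encodes those formulas; the function name `between_within_dof` and the input counts are hypothetical and do not correspond to any of the three trials.

```python
def between_within_dof(n_subjects, n_obs, p_between, p_within):
    """Between-within DOF as described in Note 2.

    n_subjects: N1, number of analyzable subjects
    n_obs:      N2, number of nonmissing observations
    p_between:  p1, number of between-subject effects plus the intercept
    p_within:   p2, number of within-subject effects
    """
    dof_between = n_subjects - p_between
    dof_within = n_obs - (n_subjects + p_within)
    return dof_between, dof_within


# Hypothetical counts: 20 subjects, 95 nonmissing observations,
# 3 between-subject parameters (including the intercept), 4 within-subject effects.
dof_between, dof_within = between_within_dof(20, 95, 3, 4)
print(dof_between, dof_within)  # 17 71
```

Because the within-subject DOF grows with the number of observations while the between-subject DOF grows only with the number of subjects, the two can differ substantially in a small trial with many visits.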
Table 4
Missing data summary and empirical covariance pattern of the changes from baseline for each outcome according to the phase 3 clinical trial.
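| Phase 3 Clinical Trial | No. Randomized or Enrolled | No. Post-BL Occasions | No. Patients Discontinuing Early | Percent of BL and Post-BL Measurement Occasions with Missing Data | Little’s MCAR Test (d) |
|---|---|---|---|---|---|
| UX001-CL301 | 89 (a) | 6 | 2 (b) | 1.4% (9 / 623) | $P=0.54$ |
| UX023-CL301 (Pediatrics) | 61 | 2 | 0 | 0.5% (1 / 183) | $P=0.19$ |
| UX023-CL304 (Adults) | 14 | 5 | 1 (c) | 1.2% (1 / 84) | $P=0.42$ |

Empirical covariance matrix (e) of the changes from baseline (upper triangle shown):

UX001-CL301

| | WK8 | WK16 | WK24 | WK32 | WK40 | WK48 |
|---|---|---|---|---|---|---|
| WK8 | 16.03 | 8.14 | 9.68 | 14.96 | 12.91 | 10.95 |
| WK16 | | 19.34 | 13.45 | 16.39 | 16.81 | 16.90 |
| WK24 | | | 27.72 | 23.29 | 18.94 | 17.14 |
| WK32 | | | | 41.50 | 29.96 | 26.17 |
| WK40 | | | | | 34.40 | 24.82 |
| WK48 | | | | | | 32.43 |

UX023-CL301 (Pediatrics)

| | WK40 | WK64 |
|---|---|---|
| WK40 | 1.26 | 1.14 |
| WK64 | | 1.35 |

UX023-CL304 (Adults)

| | WK4 | WK12 | WK24 | WK36 | WK48 |
|---|---|---|---|---|---|
| WK4 | 594.55 | 414.12 | 72.74 | −70.23 | −53.46 |
| WK12 | | 591.14 | 217.44 | 68.62 | 203.40 |
| WK24 | | | 239.92 | 217.23 | 336.04 |
| WK36 | | | | 301.54 | 427.17 |
| WK48 | | | | | 659.03 |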
MCAR = Missing completely at random; WK = Week
Trials UX001-CL301 and UX023-CL301 were randomized controlled trials, while UX023-CL304 was a single arm trial.
  • a Of the 89 randomized participants, 88 were included in the prespecified Primary Analysis Set, which was defined in the protocol as all randomized participants with a baseline measurement and at least one post-baseline measurement; a single participant did not have any of the 6 post-baseline measurements.
  • b One participant submitted outcome data for the first five post-BL measurement occasions, while the other participant did not submit any post-BL measurements; reason for study discontinuation was subject non-compliance for both participants.
  • c The single participant, who withdrew consent, submitted outcome data for the first four post-BL measurement occasions.
  • d Data are missing completely at random (MCAR) when the pattern of missing values does not depend on the data values. The null hypothesis for Little’s MCAR Test is that the data are MCAR. A P > 0.05 indicates weak evidence against the null hypothesis.
  • e The empirical covariance matrix was estimated from the participants included in the prespecified Primary Analysis Set, namely, the N = 88, N = 61, and N = 14 participants in trials UX001-CL301, UX023-CL301, and UX023-CL304, respectively, who had a baseline measurement and at least one post-baseline measurement.
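One informal way to see how far each empirical covariance matrix in Table 4 departs from compound symmetry (equal variances and equal covariances) is to compare the spread of the variances and of the implied correlations. The sketch below does this for the pediatric and adult XLH matrices; the summary measures and the name `cs_summary` are illustrative choices, not a diagnostic used in the article.

```python
import math

# Empirical covariance matrices from Table 4, filled in symmetrically.
PEDIATRIC = [
    [1.26, 1.14],
    [1.14, 1.35],
]
ADULT = [
    [594.55, 414.12, 72.74, -70.23, -53.46],
    [414.12, 591.14, 217.44, 68.62, 203.40],
    [72.74, 217.44, 239.92, 217.23, 336.04],
    [-70.23, 68.62, 217.23, 301.54, 427.17],
    [-53.46, 203.40, 336.04, 427.17, 659.03],
]


def cs_summary(cov):
    """Ratio of largest to smallest variance and range of pairwise correlations."""
    k = len(cov)
    variances = [cov[i][i] for i in range(k)]
    corrs = [
        cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return {
        "variance_ratio": max(variances) / min(variances),
        "min_corr": min(corrs),
        "max_corr": max(corrs),
    }


pediatric_summary = cs_summary(PEDIATRIC)
adult_summary = cs_summary(ADULT)
```

Under exact compound symmetry the variance ratio is 1 and all correlations coincide. The pediatric matrix comes close (variance ratio of about 1.07 with a single correlation), whereas the adult matrix has a variance ratio above 2.7 and correlations that range from negative to above 0.9.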
Box 1
Trial-specific simulation summary highlights when using a compound symmetry (CS) covariance pattern.
Appendix Table 1
True values for the regression coefficients and variance/covariance matrix used to generate the data in the simulation study.
Appendix Table 2A
Simulation study and comprehensive results for study UX001-CL301.
Appendix Table 2B
Simulation study and comprehensive results for study UX023-CL301 (Pediatrics).
Appendix Table 2C
Simulation study and comprehensive results for study UX023-CL304 (Adults).

The New England Journal of Statistics in Data Science

  • ISSN: 2693-7166
  • Copyright © 2021 New England Statistical Society
