1 Introduction
Linear mixed effects (LME) models are frequently used to analyze repeated measures data because they flexibly model the within-subject correlation often present in this type of data. Usually, for mathematical convenience, it is assumed that both the random effects and the error terms follow normal distributions (N-LME). These restrictive assumptions, however, may result in a lack of robustness against departures from normality and in invalid statistical inferences, especially when the data show heavy tails. For instance, substantial bias in the maximum likelihood (ML) estimates of the regression parameters can result when the random-effects distribution is misspecified [11].
Several researchers have investigated alternative distributions for the errors in LME models. For example, [28] propose a robust hierarchical linear mixed model in which the random effects and the within-subject errors have a multivariate Student’s t-distribution (t-LME). [18] and [16] developed additional tools for t-LME models from likelihood-based and Bayesian perspectives. However, in longitudinal studies, such as those on environmental pollution and infectious diseases, measurements of some variables may be subject to threshold values below or above which they are not quantifiable. For instance, viral load measures the amount of actively replicating virus and, depending on the diagnostic assay used, may be subject to lower or upper detection limits beyond which it is not quantifiable. For this kind of data, [30] proposed the LME model for censored responses (LMEC), together with an exact EM-type algorithm. Recently, [21] and [23] have considered the Student’s t-distribution in the context of LMEC models (t-LMEC), including influence diagnostics under different perturbation schemes. A common feature of these classes of LMEC models is that the error terms are unconditionally independent. However, in longitudinal studies repeated measures are collected over time, and hence the error terms tend to be serially correlated.
There are several proposals in the literature that account for the time dependence in longitudinal data. For instance, [9] studied longitudinal data with random effects and AR(1) errors under the Bayesian paradigm; [4] extended the classical random-effects model to deal with time dependence, leaving the random-effects distribution unspecified; and [13] proposed a robust structure for a censored linear model based on the multivariate Student’s t-distribution, considering a damped exponential correlation (DEC) structure. More recently, [26] proposed a full likelihood-based approach for the Gaussian N-LMEC model with autoregressive errors of order p (AR(p)-LMEC), including the implementation of an EM algorithm for maximum likelihood (ML) estimation.
Although some proposals have addressed serial correlation among the observations in LMEC models, to the best of our knowledge there are no studies of the t-LMEC with serially correlated error structures such as DEC or AR(p). Thus, in this paper we propose the t-LMEC model under several useful correlation structures (CSs). We develop a full likelihood-based treatment, including a computationally efficient estimation method via the EM algorithm, with the likelihood function, predictions of unobservable values of the response, and asymptotic standard errors obtained as byproducts. The model developed here extends those previously presented by [21], [22] and [26] for the analysis of mixed-effects models with censored responses and HIV data.
The proposed algorithm and methods are implemented in the R package ARpLMEC [27], available on the CRAN repository.
The rest of the paper is organized as follows: Section 2 discusses some preliminary results related to the multivariate Student’s t-distribution and its truncated version. Some of its key properties are also presented. In Section 3, the related likelihood-based inference is presented, including estimation of the standard errors for the regression parameters. The application of the proposed method is presented in Sections 4 and 5 through some simulation studies and the analysis of data from a real HIV case study, respectively. Finally, Section 6 concludes with a short discussion of issues raised by this study and some possible directions for future research.
2 Preliminaries
We begin our exposition by defining the notation and presenting some basic concepts which are used throughout the development of our methodology. We denote a random variable by an upper-case letter and its realization by the corresponding lower-case letter, and we use boldface letters for vectors and matrices. ${\mathbf{I}_{p}}$ represents the $p\times p$ identity matrix and ${\mathbf{A}^{\top }}$ denotes the transpose of the matrix A. Thus, for $\mathbf{a}={({a_{1}},\dots ,{a_{p}})^{\top }}$ and $\mathbf{b}={({b_{1}},\dots ,{b_{p}})^{\top }}$, if the Borel set $\mathbb{A}$ in ${\mathbb{R}^{p}}$ has the form
(2.1)
\[\begin{aligned}{}\mathbb{A}& =\big\{({y_{1}},\dots ,{y_{p}})\in {\mathbb{R}^{p}}:\hspace{0.1667em}{a_{1}}\le {y_{1}}\le {b_{1}},\dots ,{a_{p}}\le {y_{p}}\le {b_{p}}\big\}\\ {} & =\big\{\mathbf{y}\in {\mathbb{R}^{p}}:\mathbf{a}\le \mathbf{y}\le \mathbf{b}\big\},\end{aligned}\]
then we use the shorthand notation $\{\mathbf{Y}\in \mathbb{A}\}=\{\mathbf{a}\le \mathbf{Y}\le \mathbf{b}\}$.

2.1 The Multivariate Student’s t-distribution
$\mathbf{Y}\sim {t_{p}}(\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )$ denotes a random vector Y following a p-variate Student’s t-distribution with location vector $\boldsymbol{\mu }$, positive-definite scale-covariance matrix $\boldsymbol{\Sigma }$ and degrees of freedom ν; ${t_{p}}(\mathbf{y}\mid \boldsymbol{\mu },\boldsymbol{\Sigma },\nu )$ denotes its probability density function (pdf).
The cumulative distribution function (cdf) of Y on $[\mathbf{a},\mathbf{b}]$ is denoted by
(2.2)
\[ {L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )=P(\mathbf{a}\le \mathbf{Y}\le \mathbf{b})={\int _{\mathbf{a}}^{\mathbf{b}}}{t_{p}}(\mathbf{y}\mid \boldsymbol{\mu },\boldsymbol{\Sigma },\nu )\hspace{0.1667em}d\mathbf{y}.\]
An important property of the random vector Y is that it can be written as a scale mixture of a multivariate normal (MVN) random vector and a positive random variable, i.e.,
(2.3)
\[ \mathbf{Y}=\boldsymbol{\mu }+{U^{-1/2}}\mathbf{Z},\]
where $\mathbf{Z}\sim {N_{p}}(\mathbf{0},\boldsymbol{\Sigma })$ is independent of $U\sim \text{Gamma}(\nu /2,\nu /2)$, and where ${N_{p}}(\boldsymbol{\mu },\boldsymbol{\Sigma })$ and $\text{Gamma}(a,b)$ denote the p-variate normal distribution with mean vector $\boldsymbol{\mu }$ and variance-covariance matrix $\boldsymbol{\Sigma }$ and the gamma distribution with mean $a/b$, respectively.
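This representation also yields a direct simulation recipe. The following minimal R sketch (ours, for illustration; the function name rmvt_mix and the parameter values are hypothetical) draws from ${t_{p}}(\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )$ via Eq. (2.3):

```r
# Simulate t_p(mu, Sigma, nu) via the scale mixture Y = mu + Z / sqrt(U),
# with Z ~ N_p(0, Sigma) and U ~ Gamma(nu/2, nu/2).
rmvt_mix <- function(n, mu, Sigma, nu) {
  p <- length(mu)
  Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)  # rows are N_p(0, Sigma)
  U <- rgamma(n, shape = nu / 2, rate = nu / 2)    # gamma mixing variable
  sweep(Z / sqrt(U), 2, mu, "+")                   # shift each row by mu
}
Y <- rmvt_mix(1000, mu = c(0, 0), Sigma = diag(2), nu = 4)  # illustrative draw
```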
2.2 The Multivariate Truncated Student’s t-distribution
A p-dimensional random vector Y is said to follow a doubly truncated Student’s t-distribution with location vector $\boldsymbol{\mu }$, scale-covariance matrix $\boldsymbol{\Sigma }$ and degrees of freedom ν over the truncation region $\mathbb{A}$, defined in Eq. (2.1), denoted by $\mathbf{Y}\sim T{t_{p}}(\boldsymbol{\mu },\boldsymbol{\Sigma },\nu ;\mathbb{A})$, if it has the pdf:
\[ T{t_{p}}(\mathbf{y}|\boldsymbol{\mu },\boldsymbol{\Sigma },\nu ;\mathbb{A})=\frac{{t_{p}}(\mathbf{y}|\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )}{{L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )},\hspace{0.1667em}\hspace{0.1667em}\mathbf{a}\le \mathbf{y}\le \mathbf{b},\]
where ${L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )$ is as defined in Eq. (2.2). The cdf of Y evaluated over the region $\{\mathbf{a}\le \mathbf{y}\le \mathbf{b}\}$ is:
(2.4)
\[\begin{aligned}{}T{T_{p}}(\mathbf{y}|\boldsymbol{\mu },\boldsymbol{\Sigma },\nu ;\mathbb{A})& =\frac{1}{{L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )}{\int _{\mathbf{a}}^{\mathbf{y}}}{t_{p}}(\mathbf{x}|\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )d\mathbf{x}\\ {} & =\frac{{L_{p}}(\mathbf{a},\mathbf{y};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )}{{L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )}.\end{aligned}\]

3 Linear Mixed-Effects Model with Censored Responses
3.1 Model Specification
We proceed as in [21, 28] by considering a generalization of the classical N-LME model in which the random terms are assumed to follow a Student’s t-distribution, as follows:
(3.1)
\[ {\mathbf{Y}_{i}}={\mathbf{X}_{i}}\boldsymbol{\beta }+{\mathbf{Z}_{i}}{\mathbf{b}_{i}}+{\boldsymbol{\epsilon }_{i}},\]
(3.2)
\[ \left(\begin{array}{c}{\mathbf{b}_{i}}\\ {} {\boldsymbol{\epsilon }_{i}}\end{array}\right)\stackrel{\mathrm{ind}.}{\sim }{t_{q+{n_{i}}}}\big(\mathbf{0},\text{Diag}(\mathbf{D},{\boldsymbol{\Omega }_{i}}),\nu \big),\hspace{1em}i=1,\dots ,n.\]
The subscript i is the subject index; $\text{Diag}(\mathbf{A},\mathbf{B})$ is a block-diagonal matrix whose blocks are the matrices A and B. ${\mathbf{Y}_{i}}={({Y_{i1}},\dots ,{Y_{i{n_{i}}}})^{\top }}$ is the ${n_{i}}\times 1$ vector of observed continuous responses for sample unit i; ${\mathbf{X}_{i}}$ is the ${n_{i}}\times s$ design matrix corresponding to the fixed effects $\boldsymbol{\beta }$, the $s\times 1$ vector of population-averaged regression coefficients; ${\mathbf{Z}_{i}}$ is the ${n_{i}}\times q$ design matrix corresponding to the $q\times 1$ vector of random effects ${\mathbf{b}_{i}}$; and ${\boldsymbol{\epsilon }_{i}}$ is the ${n_{i}}\times 1$ vector of random errors. The dispersion matrix $\mathbf{D}=\mathbf{D}(\boldsymbol{\alpha })$ depends on a reduced set of unknown parameters $\boldsymbol{\alpha }$. The covariance structure of the error vector is assumed to be ${\boldsymbol{\Omega }_{i}}={\sigma ^{2}}{\mathbf{R}_{i}}$, where ${\mathbf{R}_{i}}={\mathbf{R}_{i}}(\boldsymbol{\phi })$, with $\boldsymbol{\phi }={({\phi _{1}},\dots ,{\phi _{p}})^{\top }}$, is an ${n_{i}}\times {n_{i}}$ matrix that incorporates a time-dependence structure.
Note that ${\mathbf{b}_{i}}$ and ${\boldsymbol{\epsilon }_{i}}$ are uncorrelated, since $\text{Cov}({\mathbf{b}_{i}},{\boldsymbol{\epsilon }_{i}})=\mathbb{E}[{\mathbf{b}_{i}}{\boldsymbol{\epsilon }_{i}^{\top }}]=\mathbb{E}[\mathbb{E}({\mathbf{b}_{i}}{\boldsymbol{\epsilon }_{i}^{\top }}|{U_{i}})]=\mathbf{0}$, where ${U_{i}}$ is a scalar generated from $\text{Gamma}(\nu /2,\nu /2)$. Classical inference on the parameter vector $\boldsymbol{\theta }={({\boldsymbol{\beta }^{\top }},{\sigma ^{2}},{\boldsymbol{\alpha }^{\top }},{\boldsymbol{\phi }^{\top }},\nu )^{\top }}$ is based on the marginal distribution of ${\mathbf{Y}_{i}}$; thus ${\mathbf{Y}_{i}}\stackrel{\mathrm{ind}.}{\sim }{t_{{n_{i}}}}({\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}},\nu )$, for $i=1,\dots ,n$, where ${\boldsymbol{\mu }_{i}}={\mathbf{X}_{i}}\boldsymbol{\beta }$ and ${\boldsymbol{\Sigma }_{i}}={\sigma ^{2}}{\mathbf{R}_{i}}+{\mathbf{Z}_{i}}\mathbf{D}{\mathbf{Z}_{i}^{\top }}$. The estimates from the multivariate t-LME are more robust against outliers than those based on the standard LME; in this sense, [28] showed via a simulation study that the t-LME substantially outperforms the normal (standard) LME when outliers are present in the data. This issue has also been discussed by [21] in the context of censored mixed-effects models.
In this study, we follow [13, 15, 21, 23, 26] by considering non-informative censoring (or censoring at random), i.e., the response ${Y_{ij}}$ may not be fully observed for all $i,j$. Thus, let $({\mathbf{V}_{i}},{\mathbf{C}_{i}})$ be the observed data for the i-th subject, where ${\mathbf{V}_{i}}$ represents the vector of uncensored readings (${V_{ij}}={V_{0i}}$) or censoring intervals (${V_{1ij}}$, ${V_{2ij}}$), and ${\mathbf{C}_{i}}$ is the vector of censoring indicators, such that:
(3.3)
\[ {C_{ij}}=\left\{\begin{array}{l@{\hskip10.0pt}l}1\hspace{1em}& \text{if}\hspace{1em}{V_{1ij}}\le {Y_{ij}}\le {V_{2ij}},\\ {} 0\hspace{1em}& \text{if}\hspace{1em}{Y_{ij}}={V_{0i}},\end{array}\right.\]
for all $i\in \{1,\dots ,n\}$ and $j\in \{1,\dots ,{n_{i}}\}$, i.e., ${C_{ij}}=1$ if ${Y_{ij}}$ is located within a specific interval. Note that for a right-censored observation ${V_{2ij}}=+\infty $, and for a left-censored observation ${V_{1ij}}=-\infty $. Moreover, missing at random (MAR) observations can be handled by setting ${V_{1ij}}=-\infty $ and ${V_{2ij}}=+\infty $. The model defined in Eqs. (3.1)–(3.3) is henceforth called the t-LMEC model.
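As a concrete illustration of (3.3) under left censoring, the following R lines (ours; the values are hypothetical, with the limit of detection taken as ${\log _{10}}(50)$ as in the HIV application of Section 5) build the observed data $({\mathbf{V}_{i}},{\mathbf{C}_{i}})$:

```r
# Left censoring at a lower limit of detection 'lod': responses below 'lod'
# are only known to lie in (-Inf, lod], so C = 1 and (V1, V2) = (-Inf, lod).
y   <- c(3.20, 1.40, 2.75, 0.90)   # hypothetical log10 viral loads
lod <- log10(50)                   # lower detection limit
C   <- as.numeric(y < lod)         # censoring indicators, as in Eq. (3.3)
V1  <- ifelse(C == 1, -Inf, y)     # lower endpoints V_1ij (observed value if C = 0)
V2  <- ifelse(C == 1,  lod, y)     # upper endpoints V_2ij (observed value if C = 0)
```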
3.2 Within-subject Dependence Structures
To allow some flexibility when modeling the error covariance, we consider three dependence structures: unconditional independence (UNC), the autoregressive process of order p (AR(p)), and the damped exponential correlation (DEC), which are discussed next.
• Unconditional independence. The most common and simplest approach is to assume that the error terms are unconditionally independent (UNC), i.e., ${\mathbf{R}_{i}}={\mathbf{I}_{{n_{i}}}}$ for each $i=1,\dots ,n$. In practice, linear mixed models with UNC errors are very common; for example, this structure was considered by Matos et al. [21]. The resulting model will be denoted UNC-t-LMEC. However, in longitudinal studies, repeated measures are collected over time, and hence the error terms might be serially correlated.
To account for this within-subject serial correlation, we consider two other general structures (a short R sketch constructing the corresponding ${\mathbf{R}_{i}}$ matrices is given after this list).
• Autoregressive dependence of order p. In this case, we propose ${\mathbf{R}_{i}}$ as a structured AR(p) dependence matrix [8]. Specifically,
(3.4)
\[ {\mathbf{R}_{i}}={\mathbf{R}_{i}}(\boldsymbol{\phi })=\frac{1}{1-{\phi _{1}}{\rho _{1}}-\cdots -{\phi _{p}}{\rho _{p}}}\big[{\rho _{|r-s|}}\big],\]
where the autocorrelations ${\rho _{k}}$ satisfy
\[ {\rho _{k}}={\phi _{1}}{\rho _{k-1}}+\cdots +{\phi _{p}}{\rho _{k-p}},\hspace{1em}{\rho _{0}}=1,\hspace{1em}k=1,\dots ,p.\]
In addition, the roots of $1-{\phi _{1}}B-{\phi _{2}}{B^{2}}-\cdots -{\phi _{p}}{B^{p}}=0$ must lie outside the unit circle to ensure stationarity of the AR(p) model. Following [3], the autoregressive process can be reparameterized using a one-to-one, continuous and differentiable transformation in order to simplify the conditions for stationarity. For details on the estimation of the autoregressive coefficients we refer to [29]; see also [16, 17] for details on the estimation of LME models with AR(p) dependence. The model formulated in Eqs. (3.1)–(3.3) with ${\boldsymbol{\Omega }_{i}}={\sigma ^{2}}{\mathbf{R}_{i}}$, where ${\mathbf{R}_{i}}$ is given by Eq. (3.4), $i=1,\dots ,n$, will be denoted AR(p)-t-LMEC. To accommodate situations in which measurements are taken irregularly over discrete time, we compute ${\mathbf{R}_{i}}$ over a regular grid of times and then delete the rows and columns corresponding to the missing measurements.
• Damped exponential correlation. In this case, following [25], we propose to structure ${\mathbf{R}_{i}}$ as a damped exponential correlation (DEC) matrix, as follows:
(3.5)
\[ {\mathbf{R}_{i}}={\mathbf{R}_{i}}(\boldsymbol{\phi },{\mathbf{t}_{i}})=\big[{\phi _{1}^{|{t_{ij}}-{t_{ik}}{|^{{\phi _{2}}}}}}\big],\hspace{1em}0\le {\phi _{1}}\lt 1,\hspace{0.2778em}{\phi _{2}}\ge 0,\]
where ${\mathbf{t}_{i}}=({t_{i1}},\dots ,{t_{i{n_{i}}}})$ is the vector of observation times for subject i. The parameter ${\phi _{2}}$ controls the decay rate of the autocorrelation:
(i) if ${\phi _{2}}=0$, then ${\mathbf{R}_{i}}$ reduces to the compound symmetry correlation structure (DEC-SYM);
(ii) if ${\phi _{2}}=1$, then ${\mathbf{R}_{i}}$ reduces to the DEC-AR(1) correlation structure;
(iii) if $0\lt {\phi _{2}}\lt 1$, then ${\mathbf{R}_{i}}$ generates a decay rate slower than that of the DEC-AR(1) structure;
(iv) if ${\phi _{2}}\gt 1$, then ${\mathbf{R}_{i}}$ generates a decay rate faster than that of the DEC-AR(1) structure; and
(v) if ${\phi _{2}}\to \infty $, then ${\mathbf{R}_{i}}$ converges to the correlation matrix of a moving-average process of order 1 (DEC-MA(1)).
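As referenced above, the following R sketch (ours, not the ARpLMEC implementation; function names and example values are illustrative) constructs ${\mathbf{R}_{i}}$ for the AR(p) structure of Eq. (3.4), using the theoretical autocorrelations returned by stats::ARMAacf(), and for the DEC structure of Eq. (3.5):

```r
# AR(p) dependence matrix of Eq. (3.4): [rho_{|r-s|}] / (1 - sum_k phi_k rho_k).
# Note its diagonal is the process variance relative to the innovation
# variance sigma^2, not 1.
ar_corr <- function(phi, n) {
  rho <- ARMAacf(ar = phi, lag.max = n - 1)   # rho_0 = 1, rho_1, ..., rho_{n-1}
  p <- length(phi)
  scale <- 1 - sum(phi * rho[2:(p + 1)])      # 1 - phi_1 rho_1 - ... - phi_p rho_p
  toeplitz(as.numeric(rho)) / scale
}

# DEC matrix of Eq. (3.5): entries phi1^(|t_ij - t_ik|^phi2), unit diagonal.
dec_corr <- function(phi1, phi2, times) {
  d <- abs(outer(times, times, "-"))
  m <- phi1 ^ (d ^ phi2)
  diag(m) <- 1    # 0^0 = 1 in R, so fix the diagonal (relevant when phi2 = 0)
  m
}

Ri_ar  <- ar_corr(c(0.6, -0.2), n = 7)                    # AR(2) with n_i = 7
Ri_dec <- dec_corr(0.8, 1, times = c(0, 7, 14, 28, 56))   # DEC-AR(1)-type decay
# For irregular occasions in the AR(p) case, compute R_i on the regular grid
# and drop the rows/columns of the missing occasions:
Ri_obs <- Ri_ar[c(1, 2, 4, 5, 7), c(1, 2, 4, 5, 7)]
```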
3.3 The Likelihood Function
Let ${\mathbf{Y}_{i}}={({\mathbf{Y}_{i}^{o}},{\mathbf{Y}_{i}^{c}})^{\top }}$, where ${\mathbf{Y}_{i}^{o}}$ and ${\mathbf{Y}_{i}^{c}}$ represent the ${n_{i}^{o}}\times 1$ vector of observed outcomes and the ${n_{i}^{c}}\times 1$ vector of censored observations for subject i, respectively, with ${n_{i}}={n_{i}^{o}}+{n_{i}^{c}}$, such that ${C_{ij}}=0$ for all elements in ${\mathbf{Y}_{i}^{o}}$ and ${C_{ij}}=1$ for all elements in ${\mathbf{Y}_{i}^{c}}$. After reordering, ${\mathbf{Y}_{i}}$, ${\mathbf{V}_{i}}$, ${\boldsymbol{\mu }_{i}}$, and ${\boldsymbol{\Sigma }_{i}}$ can be partitioned as follows:
(3.6)
\[\begin{aligned}{}{\mathbf{Y}_{i}}& =\mathrm{vec}\big({\mathbf{Y}_{i}^{o}},{\mathbf{Y}_{i}^{c}}\big),\hspace{0.1667em}{\mathbf{V}_{i}}=\mathrm{vec}\big({\mathbf{V}_{i}^{o}},{\mathbf{V}_{i}^{c}}\big),\hspace{0.1667em}{\boldsymbol{\mu }_{i}^{\top }}=\big({\boldsymbol{\mu }_{i}^{o}},{\boldsymbol{\mu }_{i}^{c}}\big)\hspace{0.1667em}\hspace{0.1667em}\mathrm{and}\hspace{0.1667em}\hspace{0.1667em}\\ {} {\boldsymbol{\Sigma }_{i}}& =\left(\begin{array}{c@{\hskip10.0pt}c}{\boldsymbol{\Sigma }_{i}^{oo}}& {\boldsymbol{\Sigma }_{i}^{oc}}\\ {} {\boldsymbol{\Sigma }_{i}^{co}}& {\boldsymbol{\Sigma }_{i}^{cc}}\end{array}\right),\end{aligned}\]
where $\mathrm{vec}(\cdot )$ denotes the function which stacks vectors or matrices with the same number of columns. Using Proposition 1 (Appendix A), we have that
\[\begin{aligned}{}{\mathbf{Y}_{i}^{o}}& \sim {t_{{n_{i}^{o}}}}\big({\boldsymbol{\mu }_{i}^{o}},{\boldsymbol{\Sigma }_{i}^{oo}},\nu \big),\hspace{1em}\text{and}\hspace{1em}\\ {} {\mathbf{Y}_{i}^{c}}|{\mathbf{Y}_{i}^{o}}={\mathbf{y}_{i}^{o}}& \sim {t_{{n_{i}^{c}}}}\big({\boldsymbol{\mu }_{i}^{co}},{\mathbf{S}_{i}^{co}},\nu +{n_{i}^{o}}\big),\end{aligned}\]
where
\[\begin{aligned}{}{\boldsymbol{\mu }_{i}^{o}}& ={\mathbf{X}_{i}^{o}}\boldsymbol{\beta }\hspace{2.5pt}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{2.5pt}{\boldsymbol{\mu }_{i}^{c}}={\mathbf{X}_{i}^{c}}\boldsymbol{\beta },\\ {} {\boldsymbol{\mu }_{i}^{co}}& ={\boldsymbol{\mu }_{i}^{c}}+{\boldsymbol{\Sigma }_{i}^{co}}{\boldsymbol{\Sigma }_{i}^{oo-1}}({\mathbf{y}_{i}^{o}}-{\boldsymbol{\mu }_{i}^{o}}),\\ {} {\mathbf{S}_{i}^{co}}& =(\frac{\nu +{\delta ^{2}}({\mathbf{y}_{i}^{o}})}{\nu +{n_{i}^{o}}}){\mathbf{S}_{i}},\\ {} {\mathbf{S}_{i}}& ={\boldsymbol{\Sigma }_{i}^{cc}}-{\boldsymbol{\Sigma }_{i}^{co}}{\boldsymbol{\Sigma }_{i}^{oo-1}}{\boldsymbol{\Sigma }_{i}^{oc}}\hspace{2.5pt}\hspace{2.5pt}\text{and}\hspace{2.5pt}\\ {} {\delta ^{2}}({\mathbf{y}_{i}^{o}})& ={({\mathbf{y}_{i}^{o}}-{\boldsymbol{\mu }_{i}^{o}})^{\top }}{\boldsymbol{\Sigma }_{i}^{oo-1}}({\mathbf{y}_{i}^{o}}-{\boldsymbol{\mu }_{i}^{o}}).\end{aligned}\]
Let $\boldsymbol{\theta }={({\boldsymbol{\beta }^{\top }},{\sigma ^{2}},{\boldsymbol{\alpha }^{\top }},{\boldsymbol{\phi }^{\top }},\nu )^{\top }}$ be the parameter vector. As presented by [21], the likelihood function for subject i is given by:
(3.7)
\[\begin{aligned}{}{L_{i}}(\boldsymbol{\theta })& =f({\mathbf{y}_{i}}|\boldsymbol{\theta })=f({\mathbf{V}_{i}}|{\mathbf{C}_{i}},\boldsymbol{\theta })\\ {} & =f\big({\mathbf{y}_{i}^{o}}|\boldsymbol{\theta }\big)P\big({\mathbf{V}_{1i}^{c}}\le {\mathbf{Y}_{i}^{c}}\le {\mathbf{V}_{2i}^{c}}|{\mathbf{V}_{i}^{o}},\boldsymbol{\theta }\big)\\ {} & ={t_{{n_{i}^{o}}}}\big({\mathbf{V}_{i}^{o}};{\boldsymbol{\mu }_{i}^{o}},{\boldsymbol{\Sigma }_{i}^{oo}},\nu \big){L_{{n_{i}^{c}}}}\big({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{co}},{\mathbf{S}_{i}^{co}},\nu +{n_{i}^{o}}\big),\end{aligned}\]
where ${L_{p}}(\mathbf{a},\mathbf{b};\boldsymbol{\mu },\boldsymbol{\Sigma },\nu )$ is as defined in Eq. (2.2) and can be easily evaluated using the R package MomTrunc. The log-likelihood function for the observed data is given by $\ell (\boldsymbol{\theta }|\mathbf{y})={\textstyle\sum _{i=1}^{n}}\log {L_{i}}(\boldsymbol{\theta })$, and the ML estimates are obtained by maximizing $\ell (\boldsymbol{\theta }|\mathbf{y})$.
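For a subject with both observed and censored components, the likelihood (3.7) can be evaluated with standard R tools. The sketch below (ours, not the package implementation) uses mvtnorm::dmvt for the t density and mvtnorm::pmvt with type = "shifted" for the location-shifted t probability, which we assume accepts the non-integer degrees of freedom $\nu +{n_{i}^{o}}$:

```r
library(mvtnorm)

# log L_i(theta) in Eq. (3.7): t density of the observed components times the
# conditional t probability of the censoring rectangle [V1c, V2c].
loglik_i <- function(y_o, V1c, V2c, mu, Sigma, oo, nu) {
  # oo: logical vector flagging the observed (non-censored) components
  mu_o <- mu[oo]; mu_c <- mu[!oo]
  Soo <- Sigma[oo, oo, drop = FALSE]
  Sco <- Sigma[!oo, oo, drop = FALSE]
  Scc <- Sigma[!oo, !oo, drop = FALSE]
  no  <- sum(oo)
  Soo_inv <- solve(Soo)
  res   <- y_o - mu_o
  d2    <- drop(t(res) %*% Soo_inv %*% res)          # delta^2(y_i^o)
  mu_co <- drop(mu_c + Sco %*% Soo_inv %*% res)      # conditional location
  S_co  <- ((nu + d2) / (nu + no)) *
           (Scc - Sco %*% Soo_inv %*% t(Sco))        # conditional scale S_i^co
  dmvt(y_o, delta = mu_o, sigma = Soo, df = nu, log = TRUE) +
    log(pmvt(lower = V1c, upper = V2c, delta = mu_co,
             sigma = S_co, df = nu + no, type = "shifted"))
}
```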
3.4 The EM Algorithm
To obtain the ML estimates of the parameters of the t-LMEC model presented in Section 3.1, we implemented the ECM algorithm proposed by [24]. This algorithm preserves the simplicity and stability of the EM algorithm but replaces the M-step with a sequence of conditional maximization (CM) steps. Based on the scale-mixture property of the multivariate Student’s t-distribution, the t-LMEC model can be expressed as the following hierarchical model:
\[ \begin{aligned}{}{\mathbf{Y}_{i}}|{\mathbf{b}_{i}},{u_{i}}& \stackrel{\mathrm{ind}.}{\sim }{\mathrm{N}_{{n_{i}}}}\big({\boldsymbol{\mu }_{i}}+{\mathbf{Z}_{i}}{\mathbf{b}_{i}},{u_{i}^{-1}}{\boldsymbol{\Omega }_{i}}\big),\\ {} {\mathbf{b}_{i}}|{u_{i}}& \stackrel{\mathrm{ind}.}{\sim }{\mathrm{N}_{q}}\big(\mathbf{0},{u_{i}^{-1}}\mathbf{D}\big),\\ {} {u_{i}}& \stackrel{\mathrm{iid}.}{\sim }\text{Gamma}\bigg(\frac{\nu }{2},\frac{\nu }{2}\bigg).\end{aligned}\]
Thus, it is possible to apply the ECM algorithm by treating $\mathbf{Y}=({\mathbf{Y}_{1}^{\top }},\dots ,{\mathbf{Y}_{n}^{\top }})$, $\mathbf{b}=({\mathbf{b}_{1}^{\top }},\dots ,{\mathbf{b}_{n}^{\top }})$ and $\mathbf{u}={({u_{1}},\dots ,{u_{n}})^{\top }}$ as hypothetical missing variables, augmented by the observed variables $(\mathbf{V},\mathbf{C})$, where $\mathbf{V}=\mathrm{vec}({\mathbf{V}_{1}},\dots ,{\mathbf{V}_{n}})$ and $\mathbf{C}=\mathrm{vec}({\mathbf{C}_{1}},\dots ,{\mathbf{C}_{n}})$. Considering ${\mathbf{Y}_{c}}={({\mathbf{C}^{\top }},{\mathbf{V}^{\top }},{\mathbf{y}^{\top }},{\mathbf{b}^{\top }},{\mathbf{u}^{\top }})^{\top }}$ and $\boldsymbol{\theta }={({\boldsymbol{\beta }^{\top }},{\sigma ^{2}},{\boldsymbol{\alpha }^{\top }},{\boldsymbol{\phi }^{\top }},\nu )^{\top }}$, the complete-data log-likelihood is given by
(3.8)
\[ \begin{aligned}{}{\ell _{c}}(\boldsymbol{\theta }|{\mathbf{y}_{c}})& ={\sum \limits_{i=1}^{n}}\left[-\frac{{n_{i}}}{2}\log {\sigma ^{2}}-\frac{1}{2}\log (|{\mathbf{R}_{i}}|)\right.\\ {} & \hspace{1em}-\left.\frac{{u_{i}}}{2{\sigma ^{2}}}{({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}}-{\mathbf{Z}_{i}}{\mathbf{b}_{i}})^{\top }}{\mathbf{R}_{i}^{-1}}({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}}-{\mathbf{Z}_{i}}{\mathbf{b}_{i}})\right.\\ {} & \hspace{1em}-\left.\frac{1}{2}\log |\mathbf{D}|-\frac{{u_{i}}}{2}{\mathbf{b}_{i}^{\top }}{\mathbf{D}^{-1}}{\mathbf{b}_{i}}+\log h({u_{i}}|\nu )+C\right],\end{aligned}\]
where C is a constant that does not depend on $\boldsymbol{\theta }$ and $h({u_{i}}|\nu )$ denotes the pdf of the $\text{Gamma}(\nu /2,\nu /2)$ distribution. Given the current estimate $\boldsymbol{\theta }={\widehat{\boldsymbol{\theta }}^{(k)}}$ at iteration $(k)$, the E-step computes the conditional expectation of the complete-data log-likelihood function:
\[\begin{aligned}{}Q\big(\boldsymbol{\theta }|{\widehat{\boldsymbol{\theta }}^{(k)}}\big)& =\mathbb{E}\big[{\ell _{c}}(\boldsymbol{\theta }|{\mathbf{y}_{c}})|\mathbf{V},\mathbf{C},{\widehat{\boldsymbol{\theta }}^{(k)}}\big]\\ {} & ={\sum \limits_{i=1}^{n}}{Q_{1i}}\big(\boldsymbol{\beta },{\sigma ^{2}},\boldsymbol{\phi }|{\widehat{\boldsymbol{\theta }}^{(k)}}\big)+{\sum \limits_{i=1}^{n}}{Q_{2i}}\big(\boldsymbol{\alpha }|{\widehat{\boldsymbol{\theta }}^{(k)}}\big),\end{aligned}\]
where
\[\begin{aligned}{}& {Q_{1i}}\big(\boldsymbol{\beta },{\sigma ^{2}},\boldsymbol{\phi }|{\widehat{\boldsymbol{\theta }}^{(k)}}\big)\\ {} & \hspace{1em}=-\frac{{n_{i}}}{2}\log {\sigma ^{2}}-\frac{1}{2}\log (|{\mathbf{R}_{i}}|)\\ {} & \hspace{2em}-\frac{1}{2{\sigma ^{2}}}\Big[{\widehat{a}_{i}^{(k)}}-2{\boldsymbol{\mu }_{i}^{\top }}{\mathbf{R}_{i}^{-1}}\big({\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}-{\mathbf{Z}_{i}}{\widehat{{u_{i}}{\mathbf{b}_{i}}}^{(k)}}\big)\\ {} & \hspace{2em}+{\widehat{{u_{i}}}^{(k)}}{\boldsymbol{\mu }_{i}^{\top }}{\mathbf{R}_{i}^{-1}}{\boldsymbol{\mu }_{i}}\Big]\hspace{1em}\text{and}\end{aligned}\]
\[\begin{aligned}{}{Q_{2i}}\big(\boldsymbol{\alpha }|{\widehat{\boldsymbol{\theta }}^{(k)}}\big)& =-\frac{1}{2}\log |\mathbf{D}|-\frac{1}{2}\operatorname{tr}\big({\widehat{{u_{i}}{\mathbf{b}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}}{\mathbf{D}^{-1}}\big),\hspace{2.5pt}\text{with}\\ {} {\widehat{a}_{i}^{(k)}}& =\operatorname{tr}\big({\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}^{(k)}}{\mathbf{R}_{i}^{-1}}-2{\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}}{\mathbf{Z}_{i}^{\top }}{\mathbf{R}_{i}^{-1}}\\ {} & \hspace{1em}+{\widehat{{u_{i}}{\mathbf{b}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}}{\mathbf{Z}_{i}^{\top }}{\mathbf{R}_{i}^{-1}}{\mathbf{Z}_{i}}\big),\\ {} {\widehat{{u_{i}}{\mathbf{b}_{i}}}^{(k)}}& =\mathbb{E}\big[{u_{i}}{\mathbf{b}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\widehat{\boldsymbol{\theta }}^{(k)}}\big]={\boldsymbol{\varphi }_{i}}\big({\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}-{\widehat{{u_{i}}}^{(k)}}{\boldsymbol{\mu }_{i}}\big),\\ {} {\widehat{{u_{i}}{\mathbf{b}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}}& =\mathbb{E}\big[{u_{i}}{\mathbf{b}_{i}}{\mathbf{b}_{i}^{\top }}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\widehat{\boldsymbol{\theta }}^{(k)}}\big]\\ {} & ={\boldsymbol{\Lambda }_{i}}+{\boldsymbol{\varphi }_{i}}\big({\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}^{(k)}}-2{\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}{\boldsymbol{\mu }_{i}}\\ {} & \hspace{1em}+{\widehat{{u_{i}}}^{(k)}}{\boldsymbol{\mu }_{i}}{\boldsymbol{\mu }_{i}^{\top }}\big){\boldsymbol{\varphi }_{i}^{\top }},\\ {} {\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}}& =\mathbb{E}\big[{u_{i}}{\mathbf{y}_{i}}{\mathbf{b}_{i}^{\top }}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\widehat{\boldsymbol{\theta }}^{(k)}}\big]\\ {} & =\big({\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}^{(k)}}-{\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}{\boldsymbol{\mu }_{i}^{\top }}\big){\boldsymbol{\varphi }_{i}^{\top }},\end{aligned}\]
where ${\boldsymbol{\Lambda }_{i}}={({\mathbf{D}^{-1}}+{\mathbf{Z}_{i}^{\top }}{\mathbf{R}_{i}^{-1}}{\mathbf{Z}_{i}}/{\sigma ^{2}})^{-1}}$ and ${\boldsymbol{\varphi }_{i}}={\boldsymbol{\Lambda }_{i}}{\mathbf{Z}_{i}^{\top }}{\mathbf{R}_{i}^{-1}}/{\sigma ^{2}}$. In the conditional maximization (CM) steps, we update ${\widehat{\boldsymbol{\theta }}^{(k+1)}}$ by maximizing $Q(\boldsymbol{\theta }|{\widehat{\boldsymbol{\theta }}^{(k)}})$ over $\boldsymbol{\theta }$, as follows (a compact R sketch of the closed-form updates appears after Eq. (3.13)):
(3.9)
\[\begin{aligned}{}{\widehat{\boldsymbol{\beta }}^{(k+1)}}& ={\Bigg({\sum \limits_{i=1}^{n}}{\widehat{{u_{i}}}^{(k)}}{\mathbf{X}_{i}^{\top }}{\widehat{\mathbf{R}}_{i}^{-1(k)}}{\mathbf{X}_{i}}\Bigg)^{-1}}{\sum \limits_{i=1}^{n}}{\mathbf{X}_{i}^{\top }}{\widehat{\mathbf{R}}_{i}^{-1(k)}}\\ {} & \hspace{1em}\times \big({\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}-{\mathbf{Z}_{i}}{\widehat{{u_{i}}{\mathbf{b}_{i}}}^{(k)}}\big),\end{aligned}\](3.10)
\[\begin{aligned}{}{\widehat{{\sigma ^{2}}}^{(k+1)}}& =\frac{1}{N}{\sum \limits_{i=1}^{n}}\left[{\widehat{a}_{i}^{(k)}}-2{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)\top }}{\mathbf{R}_{i}^{-1}}\right.\\ {} & \hspace{1em}\times \left.\big({\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}-{\mathbf{Z}_{i}}{\widehat{{u_{i}}{\mathbf{b}_{i}}}^{(k)}}\big)\right.\\ {} & \hspace{1em}+\left.{\widehat{{u_{i}}}^{(k)}}{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)\top }}{\mathbf{R}_{i}^{-1}}{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)}}\right],\end{aligned}\](3.11)
\[\begin{aligned}{}{\widehat{\mathbf{D}}^{(k+1)}}& =\frac{1}{n}{\sum \limits_{i=1}^{n}}{\widehat{{u_{i}}{\mathbf{b}_{i}}{\mathbf{b}_{i}^{\top }}}^{(k)}},\end{aligned}\](3.12)
\[\begin{aligned}{}{\widehat{\phi }^{(k+1)}}& =\underset{\phi }{\arg \max }\bigg(-\frac{1}{2}\log (|{\mathbf{R}_{i}}|)\\ {} & \hspace{1em}-\frac{1}{2{\widehat{{\sigma ^{2}}}^{(k+1)}}}\Big[{\widehat{a}_{i}^{(k)}}-2{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)\top }}{\mathbf{R}_{i}^{-1}}\\ {} & \hspace{1em}\times \big({\widehat{{u_{i}}{\mathbf{y}_{i}}}^{(k)}}-{\mathbf{Z}_{i}}{\widehat{{u_{i}}{\mathbf{b}_{i}}}^{(k)}}\big)\\ {} & \hspace{1em}+{\widehat{{u_{i}}}^{(k)}}{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)\top }}{\mathbf{R}_{i}^{-1}}{\widehat{\boldsymbol{\mu }}_{i}^{(k+1)}}\Big]\bigg),\end{aligned}\](3.13)
\[\begin{aligned}{}{\widehat{\nu }^{(k+1)}}& =\underset{\nu }{\arg \max }\Bigg\{{\sum \limits_{i=1}^{n}}\log {L_{{n_{i}^{c}}}}\big({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{c{o^{(k+1)}}}},{\mathbf{S}_{i}^{c{o^{(k+1)}}}},\nu +{n_{i}^{o}}\big)\\ {} & \hspace{1em}+{\sum \limits_{i=1}^{n}}\log {t_{{n_{i}^{o}}}}\big({\mathbf{V}_{i}^{o}};{\boldsymbol{\mu }_{i}^{{o^{(k+1)}}}},{\boldsymbol{\Sigma }_{i}^{o{o^{(k+1)}}}},\nu \big)\Bigg\},\end{aligned}\]
where $N={\textstyle\sum _{i=1}^{n}}{n_{i}}$ in Eq. (3.10). The algorithm is iterated until a suitable convergence rule is satisfied; here, the process is iterated until the relative change of the log-likelihood, $|\ell ({\widehat{\boldsymbol{\theta }}^{(k+1)}})/\ell ({\widehat{\boldsymbol{\theta }}^{(k)}})-1|$, becomes small enough. We adopted the tolerance $\epsilon ={10^{-6}}$.
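As referenced above, the closed-form updates (3.9) and (3.11) are simple matrix sums; a minimal R sketch (ours; the list and vector names for the E-step quantities are hypothetical):

```r
# CM-steps (3.9) and (3.11). X, Z, Rinv (= R_i^{-1}), uy (= E[u_i y_i | ...]),
# ub and ubb are per-subject lists from the E-step; u is a numeric vector.
cm_step <- function(X, Z, Rinv, u, uy, ub, ubb) {
  n <- length(X)
  A <- Reduce(`+`, lapply(seq_len(n), function(i)
         u[i] * t(X[[i]]) %*% Rinv[[i]] %*% X[[i]]))
  g <- Reduce(`+`, lapply(seq_len(n), function(i)
         t(X[[i]]) %*% Rinv[[i]] %*% (uy[[i]] - Z[[i]] %*% ub[[i]])))
  beta <- solve(A, g)            # Eq. (3.9)
  D    <- Reduce(`+`, ubb) / n   # Eq. (3.11)
  list(beta = beta, D = D)
}
```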
As proposed by [26], a set of starting values can be obtained by computing ${\widehat{\boldsymbol{\beta }}^{(0)}}$, ${\widehat{{\sigma ^{2}}}^{(0)}}$, ${\widehat{\mathbf{D}}^{(0)}}$ and ${\widehat{\boldsymbol{\phi }}^{(0)}}$ from the fit of a normal linear mixed-effects model, using the R package nlme, as sketched below.
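For instance, a minimal sketch assuming a long-format data frame df with (hypothetical) columns y, x1, x2, time and id:

```r
library(nlme)
# Normal LME fit used only to produce starting values; for this step the
# censored responses can be crudely imputed, e.g. at the detection limit.
fit0 <- lme(y ~ x1 + x2, random = ~ time | id, data = df)
beta0   <- fixef(fit0)       # starting value for beta
sigma20 <- fit0$sigma^2      # starting value for sigma^2
D0      <- getVarCov(fit0)   # starting value for D
```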
On the other hand, it is important to stress that, from Eqs. (3.9)–(3.12), the E-step reduces to the computation of
\[\begin{aligned}{}\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\mathbb{E}\big[{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }\big],\hspace{0.1667em}\hspace{0.1667em}\hspace{1em}\\ {} \widehat{{u_{i}}{\mathbf{y}_{i}}}& =\mathbb{E}[{u_{i}}{\mathbf{y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }]\hspace{1em}\text{and}\\ {} \hspace{1em}\widehat{{u_{i}}}& =\mathbb{E}[{u_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }].\end{aligned}\]
These expected values can be determined using the first two marginal and conditional moments of the truncated multivariate Student’s t (TMVT) distribution under double truncation, presented in Propositions 3 and 4 (Appendix B). Thus:
1. If the i-th subject has only non-censored components, then:\[ \widehat{{u_{i}}}=\bigg(\frac{\nu +{n_{i}}}{\nu +{\delta ^{2}}({\mathbf{y}_{i}})}\bigg),\hspace{1em}\widehat{{u_{i}}{\mathbf{y}_{i}}}=\widehat{{u_{i}}}{\mathbf{y}_{i}},\hspace{1em}\widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}\hspace{-0.1667em}=\hspace{-0.1667em}\widehat{{u_{i}}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }},\]where ${\delta ^{2}}({\mathbf{y}_{i}})={({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}})^{\top }}{\boldsymbol{\Sigma }_{i}^{-1}}({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}})$.
2. If the i-th subject has only censored components, then we have:\[\begin{aligned}{}\widehat{{u_{i}}}& =\frac{{L_{{n_{i}}}}({\mathbf{V}_{1i}},{\mathbf{V}_{2i}};{\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}^{\ast }},\nu +2)}{{L_{{n_{i}}}}({\mathbf{V}_{1i}},{\mathbf{V}_{2i}};{\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}},\nu )},\\ {} \widehat{{u_{i}}{\mathbf{y}_{i}}}& =\widehat{{u_{i}}}\mathbb{E}({\mathbf{W}_{i}}),\\ {} \widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\widehat{{u_{i}}}\mathbb{E}\big({\mathbf{W}_{i}}{\mathbf{W}_{i}^{\top }}\big),\end{aligned}\]where ${\mathbf{W}_{i}}\sim T{t_{{n_{i}}}}({\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}^{\ast }},\nu +2;({\mathbf{V}_{1i}},{\mathbf{V}_{2i}}))$, ${\boldsymbol{\mu }_{i}}={\mathbf{X}_{i}}\boldsymbol{\beta }$, ${\boldsymbol{\Sigma }_{i}^{\ast }}=\frac{\nu }{\nu +2}{\boldsymbol{\Sigma }_{i}}$ and ${\boldsymbol{\Sigma }_{i}}={\mathbf{Z}_{i}}\mathbf{D}{\mathbf{Z}_{i}^{\top }}+{\boldsymbol{\Omega }_{i}}$.
3. If the i-th subject has censored and uncensored components and given that ${\mathbf{Y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}}$, ${\mathbf{Y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\mathbf{Y}_{i}^{o}}$ and ${\mathbf{Y}_{i}^{c}}|{\mathbf{V}_{i}}$, ${\mathbf{C}_{i}}$, ${\mathbf{Y}_{i}^{o}}$ are equivalent processes, then we have:\[\begin{aligned}{}\widehat{{u_{i}}}& =\bigg(\frac{{n_{i}^{o}}+\nu }{\nu +{\delta ^{2}}({\mathbf{y}_{i}^{o}})}\bigg)\\ {} & \hspace{1em}\times \frac{{L_{{n_{i}^{c}}}}({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{co}},{\widetilde{\mathbf{S}}_{i}^{co}},\nu +{n_{i}^{o}}+2)}{{L_{{n_{i}^{c}}}}({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{co}},{\mathbf{S}_{i}^{co}},\nu +{n_{i}^{o}})},\\ {} \widehat{{u_{i}}{\mathbf{y}_{i}}}& =\mathrm{vec}\big(\widehat{{u_{i}}}{\mathbf{y}_{i}^{o}},\widehat{{u_{i}}}\mathbb{E}[{\mathbf{W}_{i}}]\big),\\ {} \widehat{{u_{i}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\left(\begin{array}{c@{\hskip10.0pt}c}\widehat{{u_{i}}}{\mathbf{y}_{i}^{o}}{\mathbf{y}_{i}^{o\top }}& \widehat{{u_{i}}}{\mathbf{y}_{i}^{o}}{\mathbb{E}^{\top }}[{\mathbf{W}_{i}}]\\ {} \widehat{{u_{i}}}\mathbb{E}[{\mathbf{W}_{i}}]{\mathbf{y}_{i}^{{o^{\top }}}}& \widehat{{u_{i}}}\mathbb{E}[{\mathbf{W}_{i}}{\mathbf{W}_{i}^{\top }}]\end{array}\right),\end{aligned}\]where ${\mathbf{W}_{i}}\sim T{t_{{n_{i}^{c}}}}({\boldsymbol{\mu }_{i}^{co}},{\widetilde{\mathbf{S}}_{i}^{co}},\nu +{n_{i}^{o}}+2;({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}}))$, ${\widetilde{\mathbf{S}}_{i}^{co}}=(\frac{\nu +{\delta ^{2}}({\mathbf{y}_{i}^{o}})}{\nu +{n_{i}^{o}}+2}){\mathbf{S}_{i}}$ and ${\mathbf{S}_{i}}$, ${\mathbf{S}_{i}^{co}}$ and ${\boldsymbol{\mu }_{i}^{co}}$ are defined in Section 3.3.
Formulas for $\mathbb{E}[\mathbf{W}]$ and $\mathbb{E}[\mathbf{W}{\mathbf{W}^{\top }}]$, where $\mathbf{W}\sim T{t_{p}}(\boldsymbol{\mu },\boldsymbol{\Sigma },\nu ;\mathbb{A})$, have recently been implemented in the R package relliptical [31], using a slice sampling algorithm with Gibbs sampler steps.
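For illustration, a hedged R sketch (ours) of the E-step quantities in case 2 (a fully censored subject), combining mvtnorm::pmvt (type = "shifted", assumed to accept non-integer degrees of freedom) with truncated moments from MomTrunc; we assume the interface meanvarTMD(lower, upper, mu, Sigma, dist = "t", nu) returning the mean and second moment, with relliptical as an alternative:

```r
library(mvtnorm)
library(MomTrunc)

# E-step quantities for a fully censored subject (case 2): u_hat, E[u_i y_i]
# and E[u_i y_i y_i^T], with Sigma* = nu/(nu + 2) * Sigma as above.
estep_cens <- function(V1, V2, mu, Sigma, nu) {
  Sigma_star <- (nu / (nu + 2)) * Sigma
  num <- pmvt(lower = V1, upper = V2, delta = mu, sigma = Sigma_star,
              df = nu + 2, type = "shifted")       # L_{n_i}(...; Sigma*, nu+2)
  den <- pmvt(lower = V1, upper = V2, delta = mu, sigma = Sigma,
              df = nu, type = "shifted")           # L_{n_i}(...; Sigma, nu)
  u <- as.numeric(num) / as.numeric(den)
  mom <- meanvarTMD(lower = V1, upper = V2, mu = mu, Sigma = Sigma_star,
                    dist = "t", nu = nu + 2)       # moments of W_i (assumed API)
  list(u = u, uy = u * mom$mean, uyy = u * mom$EYY)
}
```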
3.5 The Expected Information Matrix
As developed by [21] and [26], we use the technique of [20] to obtain an asymptotic approximation for the variances of the fixed effects in our t-LMEC model. This approximation is given by:
\[ {\mathbf{I}_{s}}(\boldsymbol{\beta }|\mathbf{y})={\mathbf{I}_{c}}(\boldsymbol{\beta }|\mathbf{y})-{\mathbf{I}_{m}}(\boldsymbol{\beta }|\mathbf{y}),\]
where ${\mathbf{I}_{s}}(\boldsymbol{\beta }|\mathbf{y})$ represents the information about $\boldsymbol{\beta }$ in the observed data y, ${\mathbf{I}_{c}}(\boldsymbol{\beta }|\mathbf{y})$ is the conditional expectation of the complete-data information and ${\mathbf{I}_{m}}(\boldsymbol{\beta }|\mathbf{y})$ is the missing information. Therefore, the approximated covariance-matrix of $\widehat{\boldsymbol{\beta }}$ is given by:
\[ \widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta }})\approx {\mathbf{I}_{s}^{-1}}(\boldsymbol{\beta }){|_{\widehat{\boldsymbol{\theta }}}},\]
where
\[\begin{aligned}{}{\mathbf{I}_{s}}(\boldsymbol{\beta })& ={\sum \limits_{i=1}^{n}}\left\{\bigg(\frac{\nu +{n_{i}}}{\nu +{n_{i}}+2}\bigg){\mathbf{X}_{i}^{\top }}{\boldsymbol{\Sigma }_{i}^{-1}}{\mathbf{X}_{i}}\right.\\ {} & \hspace{1em}-\left.{\mathbf{X}_{i}^{\top }}{\boldsymbol{\Sigma }_{i}^{-1}}\bigg(\bigg(\frac{\nu +{n_{i}}+2}{\nu +{n_{i}}}\bigg){\mathrm{E}_{2}}-{\mathrm{E}_{1}}\bigg){\boldsymbol{\Sigma }_{i}^{-1}}{\mathbf{X}_{i}}\right\},\end{aligned}\]
where
\[\begin{aligned}{}{\mathrm{E}_{1}}& =(\widehat{{u_{i}}{\mathbf{y}_{i}}}-\widehat{{u_{i}}}{\boldsymbol{\mu }_{i}}){(\widehat{{u_{i}}{\mathbf{y}_{i}}}-\widehat{{u_{i}}}{\boldsymbol{\mu }_{i}})^{\top }}\hspace{1em}\text{and}\\ {} {\mathrm{E}_{2}}& =\big(\widehat{{u_{i}^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}-\widehat{{u_{i}^{2}}{\mathbf{y}_{i}}}{\boldsymbol{\mu }_{i}^{\top }}-{\boldsymbol{\mu }_{i}}\widehat{{u_{i}^{2}}{\mathbf{y}_{i}^{\top }}}+\widehat{{u_{i}^{2}}}{\boldsymbol{\mu }_{i}}{\boldsymbol{\mu }_{i}^{\top }}\big).\end{aligned}\]
Note that ${\mathrm{E}_{1}}$ depends on $\widehat{{u_{i}}}$ and $\widehat{{u_{i}}{\mathbf{y}_{i}}}$, which are computed as in Section 3.4, while ${\mathrm{E}_{2}}$ depends on the following quantities
\[\begin{aligned}{}\widehat{{u_{i}^{2}}}& =\mathbb{E}\bigg[{\bigg(\frac{\nu +{n_{i}}}{\nu +{\delta ^{2}}({\mathbf{y}_{i}})}\bigg)^{2}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }\bigg],\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}}& =\mathbb{E}\bigg[{\bigg(\frac{\nu +{n_{i}}}{\nu +{\delta ^{2}}({\mathbf{y}_{i}})}\bigg)^{2}}{\mathbf{y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }\bigg]\hspace{1em}\text{and}\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\mathbb{E}\bigg[{\bigg(\frac{\nu +{n_{i}}}{\nu +{\delta ^{2}}({\mathbf{y}_{i}})}\bigg)^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},\boldsymbol{\theta }\bigg].\end{aligned}\]
These expected values can be determined in closed form using Propositions 3 and 4, as follows (see the R sketch after this list):
1. If the i-th subject has only non-censored components, then,\[ \widehat{{u_{i}^{2}}}\hspace{-0.1667em}=\hspace{-0.1667em}{\bigg(\frac{\nu +{n_{i}}}{\nu +{\delta ^{2}}({\mathbf{y}_{i}})}\bigg)^{2}},\hspace{1em}\widehat{{u_{i}^{2}}{\mathbf{y}_{i}}}\hspace{-0.1667em}=\hspace{-0.1667em}\widehat{{u_{i}^{2}}}{\mathbf{y}_{i}},\hspace{1em}\widehat{{u_{i}^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}\hspace{-0.1667em}=\hspace{-0.1667em}\widehat{{u_{i}^{2}}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }},\]where ${\delta ^{2}}({\mathbf{y}_{i}})={({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}})^{\top }}{\boldsymbol{\Sigma }_{i}^{-1}}({\mathbf{y}_{i}}-{\boldsymbol{\mu }_{i}})$.
2. If the i-th subject has only censored components, then\[\begin{aligned}{}\widehat{{u_{i}^{2}}}& ={c_{p}}(\nu ,2)\frac{{L_{{n_{i}}}}({\mathbf{V}_{1i}},{\mathbf{V}_{2i}};{\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}^{\ast }},\nu +4)}{{L_{{n_{i}}}}({\mathbf{V}_{1i}},{\mathbf{V}_{2i}};{\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}},\nu )},\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}}& =\widehat{{u_{i}^{2}}}\mathbb{E}[{\mathbf{W}_{i}}],\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\widehat{{u_{i}^{2}}}\mathbb{E}\big[{\mathbf{W}_{i}}{\mathbf{W}_{i}^{\top }}\big],\end{aligned}\]where ${c_{p}}(\nu ,2)=\frac{({n_{i}}+\nu )(\nu +2)}{\nu ({n_{i}}+\nu +2)}$, ${\mathbf{W}_{i}}\sim T{t_{{n_{i}}}}({\boldsymbol{\mu }_{i}},{\boldsymbol{\Sigma }_{i}^{\ast }},\nu +4$; $({\mathbf{V}_{1i}},{\mathbf{V}_{2i}}))$, ${\boldsymbol{\Sigma }_{i}^{\ast }}=\frac{\nu }{\nu +4}{\boldsymbol{\Sigma }_{i}}$, ${\boldsymbol{\mu }_{i}}={\mathbf{X}_{i}}\boldsymbol{\beta }$, ${\boldsymbol{\Sigma }_{i}}={\mathbf{Z}_{i}}\mathbf{D}{\mathbf{Z}_{i}^{\top }}+{\boldsymbol{\Omega }_{i}}$.
3. If the i-th subject has censored and uncensored components and given that $\{{\mathbf{Y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}}\}$, $\{{\mathbf{Y}_{i}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\mathbf{Y}_{i}^{o}}\}$ and $\{{\mathbf{Y}_{i}^{c}}|{\mathbf{V}_{i}},{\mathbf{C}_{i}},{\mathbf{Y}_{i}^{o}}\}$ are equivalent processes, we have:\[\begin{aligned}{}\widehat{{u_{i}^{2}}}& =\frac{{d_{p}}({n_{i}^{o}},\nu ,2)}{{(\nu +{\delta ^{2}}({\mathbf{y}_{i}^{o}}))^{2}}}\\ {} & \hspace{1em}\times \frac{{L_{{n_{i}^{c}}}}({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{co}},{\widetilde{\mathbf{S}}_{i}^{co}},\nu +{n_{i}^{o}}+4)}{{L_{{n_{i}^{c}}}}({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}};{\boldsymbol{\mu }_{i}^{co}},{\mathbf{S}_{i}^{co}},\nu +{n_{i}^{o}})},\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}}& =\mathrm{vec}\big(\widehat{{u_{i}^{2}}}{\mathbf{y}_{i}^{o}},\widehat{{u_{i}^{2}}}\mathbb{E}[{\mathbf{W}_{i}}]\big),\\ {} \widehat{{u_{i}^{2}}{\mathbf{y}_{i}}{\mathbf{y}_{i}^{\top }}}& =\left(\begin{array}{c@{\hskip10.0pt}c}\widehat{{u_{i}^{2}}}{\mathbf{y}_{i}^{o}}{\mathbf{y}_{i}^{o\top }}& \widehat{{u_{i}^{2}}}{\mathbf{y}_{i}^{o}}{\mathbb{E}^{\top }}[{\mathbf{W}_{i}}]\\ {} \widehat{{u_{i}^{2}}}\mathbb{E}[{\mathbf{W}_{i}}]{\mathbf{y}_{i}^{{o^{\top }}}}& \widehat{{u_{i}^{2}}}\mathbb{E}[{\mathbf{W}_{i}}{\mathbf{W}_{i}^{\top }}]\end{array}\right),\end{aligned}\]where ${d_{p}}({n_{i}^{o}},\nu ,2)=\frac{(\nu +{n_{i}})({n_{i}^{o}}+\nu +2)({n_{i}^{o}}+\nu )}{{n_{i}}+\nu +2}$, ${\mathbf{W}_{i}}\sim T{t_{{n_{i}^{c}}}}({\boldsymbol{\mu }_{i}^{co}},{\widetilde{\mathbf{S}}_{i}^{co}},\nu +{n_{i}^{o}}+4;({\mathbf{V}_{1i}^{c}},{\mathbf{V}_{2i}^{c}}))$, ${\widetilde{\mathbf{S}}_{i}^{co}}=(\frac{\nu +{\delta ^{2}}({\mathbf{y}_{i}^{o}})}{\nu +{n_{i}^{o}}+4}){\mathbf{S}_{i}}$ and ${\mathbf{S}_{i}}$, ${\mathbf{S}_{i}^{co}}$ and ${\boldsymbol{\mu }_{i}^{co}}$ are presented in Section 3.3.
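As referenced above, the subject-i contribution to ${\mathbf{I}_{s}}(\boldsymbol{\beta })$ is a direct transcription of the formula of this subsection; a minimal R sketch (ours), assuming the E-step quantities are available under the hypothetical names u, uy, u2, u2y and u2yy:

```r
# Subject-i term of the empirical information I_s(beta) of Section 3.5.
info_i <- function(X, Sigma, mu, nu, u, uy, u2, u2y, u2yy) {
  n_i  <- nrow(X)
  Sinv <- solve(Sigma)
  E1 <- (uy - u * mu) %*% t(uy - u * mu)
  E2 <- u2yy - u2y %*% t(mu) - mu %*% t(u2y) + u2 * mu %*% t(mu)
  c1 <- (nu + n_i) / (nu + n_i + 2)
  c2 <- (nu + n_i + 2) / (nu + n_i)
  c1 * t(X) %*% Sinv %*% X -
    t(X) %*% Sinv %*% (c2 * E2 - E1) %*% Sinv %*% X
}
# Cov(beta_hat) is then approximated by solve(Reduce(`+`, list_of_info_i)).
```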
4 Simulation Studies
In this section, we conduct two simulation studies to analyze the performance of the proposed methods. The first study examines the asymptotic behavior of the ML estimators developed in Section 3.4, and the second examines the consistency of the approximate standard errors (SE) presented in Section 3.5. We consider the AR(p)-t-LMEC model defined in Eqs. (3.1)–(3.3), with left-censored cases and ${n_{i}}=7$ measurements for each subject. As recommended by [26], for both simulation schemes the parameter values used to generate the samples were $\beta ={(1,2,1)^{\top }}$ and ${\sigma ^{2}}=0.5$. For the dispersion matrices of the random errors and random effects we assume an autoregressive dependence of order $p=2$ with $\phi ={(0.6,-0.2)^{\top }}$ and $D=\left(\begin{array}{c@{\hskip10.0pt}c}0.490& 0.001\\ {} 0.001& 0.002\end{array}\right)$, respectively. The design matrices for each individual are given by ${\mathbf{X}_{i}}=({{\mathbf{X}_{i}}_{1}},{{\mathbf{X}_{i}}_{2}},{{\mathbf{X}_{i}}_{3}})$ and ${\mathbf{Z}_{i}}=({\mathbf{1}_{7}},{{\mathbf{Z}_{i}}_{2}})$, with ${\mathbf{X}_{ik}}={({X_{ik1}},\dots ,{X_{ik7}})^{\top }}$ and ${\mathbf{Z}_{i2}}={({Z_{i21}},\dots ,{Z_{i27}})^{\top }}$, where ${\mathbf{1}_{m}}$ represents a $1\times m$ vector of ones. The elements ${X_{ikj}}$ and ${Z_{i2j}}$ are generated independently from uniform distributions, ${X_{ikj}}\sim U(0,1)$ and ${Z_{i2j}}\sim U(1,3)$, for $i=1,\dots ,n$, $k=2,3$ and $j=1,\dots ,7$; these values are held fixed across replications. It is important to stress that, for both simulations, we fix the parameter ν at 4, so the parameter vector is $\boldsymbol{\theta }={({\boldsymbol{\beta }^{\top }},{\sigma ^{2}},{\boldsymbol{\alpha }^{\top }},{\boldsymbol{\phi }^{\top }})^{\top }}$.
4.1 Simulation Study 1
We generated $R=100$ datasets from the AR(2)-t-LMEC model for each combination of sample size $n\in \{50,100,200,400,600\}$ and censoring level $l\% \in \{0\% ,5\% ,20\% ,40\% \}$. For each combination, the AR(2)-t-LMEC and AR(2)-N-LMEC models were fitted, and for both models we computed the bias ($Bias$) and mean squared error ($MSE$) of the estimates. These measures are defined by:
\[\begin{aligned}{}Bias(\widehat{{\theta _{i}}})& =\frac{1}{R}{\sum \limits_{j=1}^{R}}\big({\widehat{{\theta _{i}}}^{(j)}}-{\theta _{i}}\big)\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\text{and}\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\\ {} MSE(\widehat{{\theta _{i}}})& =\frac{1}{R}{\sum \limits_{j=1}^{R}}{\big({\widehat{{\theta _{i}}}^{(j)}}-{\theta _{i}}\big)^{2}},\end{aligned}\]
where ${\widehat{{\theta _{i}}}^{(j)}}$ denotes the ML estimate of ${\theta _{i}}$ in the j-th replication, $j=1,\dots ,R$.

From Figures 1–2, we observe as a general rule that, when the distributional assumption is correct, the $Bias$ and $MSE$ tend to zero as the sample size increases, indicating that the ML estimates based on the proposed EM-type algorithm have good asymptotic properties under the t-LMEC model. However, from Figures 3–4, we note that when the distributional assumption is incorrect the ML estimates do not enjoy these properties; for example, the Bias of ${\widehat{\phi }_{1}}$ under the AR(2)-N-LMEC model departs from zero as the sample size increases.
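For reference, these Monte Carlo summaries are one-liners in R, assuming est is an $R\times d$ matrix whose j-th row holds ${\widehat{\boldsymbol{\theta }}^{(j)}}$ and theta is the true d-vector (names hypothetical):

```r
bias <- colMeans(est) - theta               # Bias(theta_i), componentwise
mse  <- colMeans(sweep(est, 2, theta)^2)    # MSE(theta_i), componentwise
```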
4.2 Simulation Study 2
In this simulation study, we consider sample sizes $n\in \{40,100\}$ and generate $R=100$ Monte Carlo samples from the AR(2)-N-LMEC and AR(2)-t-LMEC models, with censoring levels $l\% \in \{0\% ,5\% ,15\% ,25\% ,40\% \}$. We analyze the Monte Carlo standard errors of the estimates of $\boldsymbol{\beta }=({\beta _{1}},{\beta _{2}},{\beta _{3}})$ (MC-SE), the average of the standard errors computed from the empirical information matrix described in Section 3.5 (MC-IM-SE), and the percentage of the R samples in which the $95\% $ confidence interval, constructed assuming asymptotic normality, contains the true parameter value (MC-COV). These measures are defined by:
\[ \text{MC-IM-SE}(\widehat{{\boldsymbol{\beta }_{i}}})=\frac{1}{R}{\sum \limits_{j=1}^{R}}{\widehat{SE({\boldsymbol{\beta }_{i}})}^{(j)}}\]
and
\[ \text{MC-SE}(\widehat{{\boldsymbol{\beta }_{i}}})=\sqrt{\frac{1}{R-1}\Bigg({\sum \limits_{j=1}^{R}}{\big({\widehat{{\boldsymbol{\beta }_{i}}}^{(j)}}\big)^{2}}-{\Bigg(\frac{1}{R}{\sum \limits_{j=1}^{R}}{\widehat{{\boldsymbol{\beta }_{i}}}^{(j)}}\Bigg)^{2}}\Bigg)},\]
where, for $j=1,\dots ,R$, ${\widehat{{\boldsymbol{\beta }_{i}}}^{(j)}}$ and ${\widehat{SE({\boldsymbol{\beta }_{i}})}^{(j)}}$ denote the ML estimate of ${\boldsymbol{\beta }_{i}}$ and the SE estimate of ${\widehat{{\boldsymbol{\beta }_{i}}}^{(j)}}$ in the j-th replication, respectively.
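Equivalently, in R, assuming est and se are $R\times 3$ matrices of the ML estimates and their IM-based standard errors, and beta is the true vector (names hypothetical):

```r
mc_se    <- apply(est, 2, sd)     # MC-SE: Monte Carlo standard deviation
mc_im_se <- colMeans(se)          # MC-IM-SE: average IM-based standard error
lo <- est - qnorm(0.975) * se     # 95% asymptotic-normal confidence limits
hi <- est + qnorm(0.975) * se
mc_cov <- colMeans(sweep(lo, 2, beta) <= 0 & sweep(hi, 2, beta) >= 0)  # MC-COV
```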
Table 1
Standard errors of parameter estimates (MC-SE), average values of the standard errors (MC-IM-SE) and MC-COV, under the AR(2)-N-LMEC and AR(2)-t-LMEC models with $n=40$.
AR(2)-N-LMEC
Censoring Levels | Criteria | ${\beta _{1}}$ | ${\beta _{2}}$ | ${\beta _{3}}$
0% | MC-IM-SE | 0.2106 | 0.2298 | 0.2160 |
MC-SE | 0.1648 | 0.1791 | 0.1604 | |
MC-COV | 0.98 | 0.97 | 0.98 | |
5% | MC-IM-SE | 0.2220 | 0.2151 | 0.2066 |
MC-SE | 0.1840 | 0.1601 | 0.1770 | |
MC-COV | 0.97 | 0.96 | 0.96 | |
15% | MC-IM-SE | 0.2136 | 0.2155 | 0.2100 |
MC-SE | 0.1538 | 0.1661 | 0.1611 | |
MC-COV | 1 | 0.96 | 0.97 | |
25% | MC-IM-SE | 0.2177 | 0.2161 | 0.2076 |
MC-SE | 0.2009 | 0.1708 | 0.1840 | |
MC-COV | 0.96 | 0.98 | 0.93 | |
40% | MC-IM-SE | 0.2569 | 0.2559 | 0.2551 |
MC-SE | 0.2050 | 0.2440 | 0.2124 | |
MC-COV | 0.96 | 0.95 | 0.96 | |
AR(2)-t-LMEC
Censoring Levels | Criteria | ${\beta _{1}}$ | ${\beta _{2}}$ | ${\beta _{3}}$
0% | MC-IM-SE | 0.1469 | 0.1600 | 0.1467 |
MC-SE | 0.1366 | 0.1450 | 0.1338 | |
MC-COV | 0.95 | 0.98 | 0.96 | |
5% | MC-IM-SE | 0.1654 | 0.1575 | 0.1567 |
MC-SE | 0.1543 | 0.1418 | 0.1492 | |
MC-COV | 0.94 | 0.95 | 0.97 | |
15% | MC-IM-SE | 0.1834 | 0.1811 | 0.1743 |
MC-SE | 0.1443 | 0.1471 | 0.1315 | |
MC-COV | 0.98 | 0.97 | 0.98 | |
25% | MC-IM-SE | 0.2053 | 0.1973 | 0.2004 |
MC-SE | 0.1636 | 0.1451 | 0.1439 | |
MC-COV | 0.98 | 0.97 | 0.96 | |
40% | MC-IM-SE | 0.2525 | 0.2545 | 0.2503 |
MC-SE | 0.1690 | 0.1774 | 0.1781 | |
MC-COV | 0.98 | 0.94 | 0.93 |
Table 2
Standard errors of parameter estimates (MC-SE), average values of the standard errors (MC-IM-SE) and MC-COV, under the AR(2)-N-LMEC and AR(2)-t-LMEC models with $n=100$.
AR(2)-N-LMEC
Censoring Levels | Criteria | ${\beta _{1}}$ | ${\beta _{2}}$ | ${\beta _{3}}$
0% | MC-IM-SE | 0.1267 | 0.1296 | 0.1231 |
MC-SE | 0.1159 | 0.1240 | 0.1242 | |
MC-COV | 0.94 | 0.96 | 0.95 | |
5% | MC-IM-SE | 0.1144 | 0.1151 | 0.1241 |
MC-SE | 0.1040 | 0.1063 | 0.1201 | |
MC-COV | 0.95 | 0.97 | 0.95 | |
15% | MC-IM-SE | 0.1149 | 0.1240 | 0.1208 |
MC-SE | 0.1003 | 0.1052 | 0.1062 | |
MC-COV | 0.97 | 0.97 | 0.99 | |
25% | MC-IM-SE | 0.1259 | 0.1257 | 0.1245 |
MC-SE | 0.1103 | 0.1315 | 0.1151 | |
MC-COV | 0.99 | 0.95 | 0.96 | |
40% | MC-IM-SE | 0.1446 | 0.1412 | 0.1371 |
MC-SE | 0.1144 | 0.1260 | 0.1271 | |
MC-COV | 0.97 | 0.98 | 0.96 | |
AR(2)-t-LMEC
Censoring Levels | Criteria | ${\beta _{1}}$ | ${\beta _{2}}$ | ${\beta _{3}}$
0% | MC-IM-SE | 0.0915 | 0.0960 | 0.0913 |
MC-SE | 0.0938 | 0.0923 | 0.0882 | |
MC-COV | 0.97 | 0.95 | 0.96 | |
5% | MC-IM-SE | 0.1012 | 0.1006 | 0.1041 |
MC-SE | 0.0938 | 0.0962 | 0.0939 | |
MC-COV | 0.96 | 0.97 | 0.97 | |
15% | MC-IM-SE | 0.1096 | 0.1102 | 0.1145 |
MC-SE | 0.1003 | 0.0978 | 0.0941 | |
MC-COV | 0.97 | 0.97 | 0.97 | |
25% | MC-IM-SE | 0.1257 | 0.1272 | 0.1249 |
MC-SE | 0.0932 | 0.1133 | 0.1021 | |
MC-COV | 0.99 | 0.96 | 0.96 | |
40% | MC-IM-SE | 0.1495 | 0.1478 | 0.1554 |
MC-SE | 0.0907 | 0.0968 | 0.1085 | |
MC-COV | 0.98 | 0.99 | 0.98 |
5 Application
In this section, we study viral load data from patients infected with human immunodeficiency virus type 1 (HIV-1) in the clinical trial A5055, previously analyzed by [32]. This trial enrolled 44 HIV-1-infected patients treated with one of two potent antiretroviral regimens: (i) IDV 800 mg and RTV 200 mg, twice daily (every 12 hours), and (ii) IDV 400 mg and RTV 400 mg, twice daily (see [33] for more details). The number of RNA copies in blood plasma (viral load) and the immunological marker CD4 (cluster of differentiation 4) T-cell count were measured at several time points during treatment. It is important to note that the viral loads were not necessarily measured at equally spaced intervals of days.
Following [33], we investigate the longitudinal evolution of the RNA viral load (on the log-base-10 scale) of the 44 patients during antiretroviral treatment. We focus on 42 of them; two patients (#4 and #8, with 3 and 1 measurements, respectively) were excluded from the analysis because they dropped out very early.
The trajectories of the transformed RNA viral load, along with the detection limit (dotted red lines), are shown in Figure 5. The lower detection limit for the RNA viral load is $50\hspace{2.5pt}\text{copies}/\text{ml}$; thus 106 out of 312 viral load measurements (approximately 34%) fall below the detection limit and are treated as censored. We start by presenting the results obtained by fitting the LMEC model defined in Eqs. (3.1)–(3.3), as follows:
\[ {\mathbf{Y}_{i}}={\mathbf{X}_{i}}\boldsymbol{\beta }+{\mathbf{Z}_{i}}{\mathbf{b}_{i}}+{\boldsymbol{\epsilon }_{i}},\hspace{1em}i=1,\dots ,42,\]
where
• ${\mathbf{Y}_{i}}={({Y_{i1}},\dots ,{Y_{i{n_{i}}}})^{\top }}$, ${\mathbf{X}_{i}}=({\mathbf{1}_{{n_{i}}}^{\top }},{\mathbf{t}_{i}^{\top }},{\mathrm{Treat}_{i}}\times {\mathbf{1}_{{n_{i}}}^{\top }},{\mathrm{CD}\mathrm{4}^{1/2}}\times {\mathbf{1}_{{n_{i}}}^{\top }},{\mathrm{Treat}_{i}}\times {\mathbf{t}_{i}^{\top }})$, ${\mathbf{Z}_{i}}=({\mathbf{1}_{{n_{i}}}^{\top }},{\mathbf{t}_{i}^{\top }})$, $\boldsymbol{\beta }={({\beta _{0}},\dots ,{\beta _{4}})^{\top }}$, ${\mathbf{b}_{i}}={({b_{i0}},{b_{i1}})^{\top }}$ and ${\boldsymbol{\epsilon }_{i}}={({\epsilon _{1}},\dots ,{\epsilon _{{n_{i}}}})^{\top }}$, for $j=1,\dots ,{n_{i}}$ and $i=1,\dots ,42$;
• ${\mathbf{1}_{m}}$ represents a $1\times m$ vector of ones and ${\mathbf{t}_{i}}=({t_{i1}},\dots ,{t_{i{n_{i}}}})$;
• ${Y_{ij}}$ is the log10-transformation of the viral load of subject i at time ${t_{ij}}$ (in days);
• ${\mathrm{Treat}_{i}}$ is a treatment indicator (0 for treatment 1 and 1 for treatment 2);
• ${\mathrm{CD}\mathrm{4}_{ij}^{1/2}}$ represents the square root of the CD4 count of patient i, measured at time ${t_{ij}}$;
• the random intercept and random slope are represented by ${b_{i0}}$ and ${b_{i1}}$, respectively.
For comparison purposes, we fitted the N-LMEC model and its t-LMEC counterpart with the error dependence structures discussed in Section 3.2. Thus:

(1) Assuming the N-LMEC model. We made a preliminary analysis using unconditional independence (UNC) for the random effects and random errors, i.e., the model proposed by [30]. The scatter plot of the estimated random intercepts (${b_{0}}$ effect) against the random slopes (${b_{1}}$ effect), presented in Figure 6(a), suggests a possible linear relationship between them, so we may suspect a dependence structure for the random effects. From Figure 6(b), we observe that the behavior of the marginal residuals is apparently not totally random, highlighting a trend. Finally, in Figure 6(c)–(e), we show the QQ-plots for the ${b_{0}}$ and ${b_{1}}$ effects and the standardized marginal residuals; these clearly indicate that the normality assumption for the random effects and within-subject errors may be unrealistic. Moreover, the variogram of the standardized marginal residuals, presented in Figure 7, indicates a long-term autocorrelation, pointing to a possible intra-individual correlation over time. Table 3 presents the results for the A5055 dataset under the N-LMEC model with the different correlation structures; we observe that only the estimate of ${\sigma ^{2}}$ is affected by the choice of the correlation structure.
(2) Assuming the t-LMEC model. To address the possible serial correlation among within-subject errors, we fit several models: (i) the unconditionally independent (UNC) structure, (ii) the damped exponential correlation (DEC), and (iii) the autoregressive process of order $p\in \{1,2,3\}$. It is important to stress that the DEC structure yields different correlation structures as special cases, for instance compound symmetry (SYM) and DEC-AR(1). From Table 4, we observe no significant differences in the estimates of the fixed-effects and random-effects parameters across correlation structures; on the other hand, the estimates of ν and ${\sigma ^{2}}$ are affected by the choice of the correlation structure. Note from Figure 8(a)–(b) that the inclusion of a correlation structure improves the model fit. The QQ plots for the ${b_{0}}$ and ${b_{1}}$ effects and the standardized marginal residuals under the t-LMEC model with DEC-AR(1) correlation structure, presented in Figure 8(c)–(e), show a significant improvement with respect to the fit obtained under the N-LMEC model. In Figure 9(a)–(b), we present the simulated envelope plots of the marginal residuals from the DEC-AR(1)-N-LMEC and DEC-AR(1)-t-LMEC fits: some marginal residuals under the DEC-AR(1)-N-LMEC model fall outside the envelope, indicating the inadequacy of this model, whereas the residuals from the DEC-AR(1)-t-LMEC model lie within it. This shows that, even when the DEC-AR(1) correlation structure is adopted, a correct specification of the distributional assumption is essential. Finally, as suggested by [19], we performed a likelihood ratio test of the DEC-AR(1) dependence structure against UNC errors under the t-LMEC model, obtaining a p-value of 0.0058; as expected, this indicates strong dependence among within-patient errors across occasions.

Table 5 shows the log-likelihood, AIC and BIC for the t-LMEC and N-LMEC models under the different autocorrelation structures. The t-LMEC consistently outperforms its normal counterpart in all cases, and the criteria indicate a preference for the DEC-AR(1) structure, whose estimated error correlation is ${\widehat{r}}({t_{ij}},{t_{ik}})={0.899^{|{t_{ij}}-{t_{ik}}|}}$. Thus, we conclude that the t-LMEC model with the DEC-AR(1) correlation structure for the error is the most appropriate model for this dataset.
Figure 5
A5055 Data. Trajectories of ${\log _{10}}(RNA)$ for 42 HIV-1 infected patients who were randomized to one of two treatment regimens.
Figure 6
A5055 Data. Scatter diagrams of the random effect and marginal errors and Q-Q plots of estimates of random intercepts and slopes and marginal residuals for log10(RNA) using the N-LMEC model with conditional independence in the random effects and the random errors.
Figure 7
A5055 Data. Variogram from the marginal residuals for log10(RNA) using the N-LMEC model with conditional independence in the random effects and the random errors.
Table 3
A5055 Data. Summary of parameter estimates and the standard errors of fixed effects (in parentheses) under the N-LMEC model, with different correlation structures for the errors.
Correlation structures: damped exponential (DEC, DEC-AR(1), SYM) and autoregressive (AR(1), AR(2), AR(3))
Parameter estimates | UNC | DEC | DEC-AR(1) | SYM | AR(1) | AR(2) | AR(3)
${\beta _{0}}$ | 4.383 | 4.379 | 4.378 | 4.388 | 4.384 | 4.382 | 4.287 |
(0.533) | (0.002) | (0.002) | (0.010) | (0.540) | (0.553) | (0.587) | |
${\beta _{1}}$ | −0.004 | −0.004 | −0.004 | −0.004 | −0.004 | −0.004 | −0.003 |
(0.009) | (0.008) | (0.008) | (0.008) | (0.009) | (0.009) | (0.009) | |
${\beta _{2}}$ | 0.305 | 0.304 | 0.303 | 0.304 | 0.307 | 0.308 | 0.361 |
(0.309) | (0.058) | (0.058) | (0.013) | (0.310) | (0.309) | (0.316) | |
${\beta _{3}}$ | −0.111 | −0.111 | −0.111 | −0.112 | −0.111 | −0.111 | −0.113 |
(0.031) | (0.010) | (0.010) | (0.008) | (0.032) | (0.032) | (0.034) | |
${\beta _{4}}$ | −0.003 | −0.003 | −0.003 | −0.003 | −0.003 | −0.003 | −0.003 |
(0.005) | (0.004) | (0.004) | (0.004) | (0.005) | (0.005) | (0.006) | |
${\sigma ^{2}}$ | 0.733 | 0.733 | 0.732 | 0.765 | 0.427 | 0.417 | 0.193 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{11}}$ | 0.383 | 0.382 | 0.381 | 0.352 | 0.382 | 0.384 | 0.294 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{12}}$ | −0.004 | −0.004 | −0.004 | −0.004 | −0.004 | −0.004 | −0.003 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{22}}$ | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{1}}$ | – | 0.998 | 1.000 | 0.000 | 0.652 | 0.762 | 0.273 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{2}}$ | – | 0.106 | 1 | 0 | – | −0.484 | 0.102 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{3}}$ | – | – | – | – | – | – | −0.820 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) |
Table 4
A5055 Data. Summary of parameter estimates and the standard errors of fixed effects (in parentheses) under the t-LMEC model, with different correlation structures for the errors.
Correlation structures: damped exponential (DEC, DEC-AR(1), SYM) and autoregressive (AR(1), AR(2), AR(3))
Parameter estimates | UNC | DEC | DEC-AR(1) | SYM | AR(1) | AR(2) | AR(3)
${\beta _{0}}$ | 3.995 | 3.998 | 4.051 | 4.001 | 3.988 | 3.987 | 4.080 |
(0.604) | (0.509) | (0.531) | (0.513) | (0.607) | (0.606) | (0.643) | |
${\beta _{1}}$ | −0.005 | −0.005 | −0.004 | −0.005 | −0.005 | −0.005 | −0.004 |
(0.008) | (0.007) | (0.007) | (0.007) | (0.008) | (0.008) | (0.009) | |
${\beta _{2}}$ | 0.350 | 0.350 | 0.385 | 0.345 | 0.354 | 0.355 | 0.353 |
(0.309) | (0.250) | (0.264) | (0.253) | (0.308) | (0.309) | (0.323) | |
${\beta _{3}}$ | −0.086 | −0.086 | −0.092 | −0.086 | −0.086 | −0.086 | −0.094 |
(0.020) | (0.020) | (0.020) | (0.020) | (0.020) | (0.020) | (0.021) | |
${\beta _{4}}$ | −0.003 | −0.003 | −0.004 | −0.004 | −0.004 | −0.004 | −0.004 |
(0.005) | (0.005) | (0.005) | (0.005) | (0.005) | (0.006) | (0.006) | |
${\sigma ^{2}}$ | 0.473 | 0.476 | 0.510 | 0.507 | 0.286 | 0.282 | 0.175 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{11}}$ | 0.220 | 0.222 | 0.210 | 0.194 | 0.218 | 0.218 | 0.222 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{12}}$ | −0.002 | −0.002 | −0.002 | −0.002 | −0.002 | −0.002 | −0.002 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\alpha _{22}}$ | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 | >0.001 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{1}}$ | – | 0.105 | 0.899 | 0.066 | 0.634 | 0.731 | 0.276 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{2}}$ | – | 0.998 | 1 | 0 | – | −0.421 | 0.109 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
${\phi _{3}}$ | – | – | – | – | – | – | −0.789 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) | |
ν | 3.382 | 3.407 | 3.953 | 3.304 | 3.247 | 3.296 | 4.915 |
(–) | (–) | (–) | (–) | (–) | (–) | (–) |
Table 5
A5055 Data. Summary of selection criteria for the t-LMEC and N-LMEC model under different correlation structures.
Correlation structures: damped exponential (DEC, DEC-AR(1), SYM) and autoregressive (AR(1), AR(2), AR(3))
Model | Criteria | UNC | DEC | DEC-AR(1) | SYM | AR(1) | AR(2) | AR(3)
t-LMEC | loglik | −353.840 | −353.877 | −349.323 | −353.696 | −353.439 | −353.456 | −352.021 |
AIC | 729.681 | 731.755 | 720.646 | 729.393 | 728.878 | 730.913 | 730.042 | |
BIC | 770.854 | 776.671 | 761.819 | 770.566 | 770.052 | 775.829 | 778.701 |
N-LMEC | loglik | −362.571 | −362.643 | −362.671 | −362.569 | −362.343 | −362.385 | −355.407 |
AIC | 743.142 | 747.285 | 745.342 | 745.139 | 744.686 | 746.771 | 734.815 | |
BIC | 776.829 | 788.458 | 782.773 | 782.569 | 782.116 | 787.944 | 779.731 |
All computational procedures were carried out in a 64-bit Windows environment on a notebook with a 1.99 GHz Intel Core i7 processor and 12 GB of RAM; the EM algorithm took approximately 9.8 minutes to converge for each model.
6 Conclusions
In this paper, we proposed a robust linear mixed-effects model for longitudinal censored or missing data under the multivariate Student’s t-distribution, allowing within-subject correlation through some useful dependence structures. In practical implementation, the EM algorithm is used to obtain the ML estimates of the model parameters. The model proposed in this manuscript extends [21], [22] and [26] by developing additional tools for robust inference in practical data analysis.
Two simulation studies were performed to evaluate the proposed model. They validate the performance of our method and indicate an efficiency gain of the t-LMEC model over the N-LMEC model when the data present heavy tails. A real dataset, previously analyzed under a conditional independence structure, was also analyzed; the results show that the t-LMEC model with the DEC-AR(1) correlation structure fits the data better. The proposed methods are implemented in the new R package ARpLMEC, which is available for download from the CRAN repository. Moreover, we have prepared an R Markdown file containing the code to reproduce the results of the application; it is available with this manuscript via GitHub at https://github.com/hlachos/tlmec/.
Although the t-LMEC model considered here has shown great flexibility for modeling symmetric data with evidence of tails heavier than those of the normal distribution, its robustness against outliers can be seriously affected by the presence of skewness. Thus, it is of interest to generalize the t-LMEC model by considering a more flexible family of distributions, such as the scale mixtures of skew-normal (SMSN) distributions [14], to accommodate censoring, skewness and heavy tails simultaneously. It is also of interest to develop effective algorithms for non-linear censored mixed-effects (NLMEC) models or semiparametric structures in order to accommodate the substantial non-linear trends observed in the data.
Another possible extension is to study models that capture the conditional heteroscedasticity of the data, such as ARCH [12] and GARCH [5, 6] models, among others. Finally, another interesting avenue for research is to propose methods within a unified framework for checking the random-effects distribution in LMEC/NLMEC models, following [1] for generalized linear mixed models and [10] for non-linear mixed models. An in-depth investigation of such extensions is beyond the scope of the present paper, but these are interesting topics for further research.