How to estimate a latent process mixed model using lcmm function (2024)

Source: vignettes/latent_process_model_with_lcmm.Rmd

Background on the model

Each dynamic phenomenon can be characterized by a latent process \((\Lambda(t))\) which evolves in continuous time \(t\). When modeling repeated measures of a marker, we usually do not think of the marker as a latent process measured with error. Yet this is the underlying assumption of mixed model theory. The lcmm function exploits this framework to extend linear mixed model theory to any type of outcome (ordinal, binary, or continuous with any distribution).

The latent process mixed model

The latent process mixed model is introduced in Proust-Lima et al. (2006 - https://doi.org/10.1111/j.1541-0420.2006.00573.x and 2013 - https://doi.org/10.1111/bmsp.12000 ).

The quantity of interest defined as a latent process is modeled according to time using a linear mixed model:

\[\Lambda(t) = X(t) \beta + Z(t)u_i +w_i(t)\]

where:

  • \(X(t)\) and \(Z(t)\) are vectors of covariates (\(Z(t)\) is included in \(X(t)\));
  • \(\beta\) are the fixed effects (i.e., population mean effects);
  • \(u_i\) are the random effects (i.e., individual effects); they are distributed according to a zero-mean multivariate normal distribution with covariance matrix \(B\);
  • \((w_i(t))\) is a Gaussian process that might be added in the model to relax the intra-subject correlation structure.
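To make the structural equation concrete, here is a minimal base R sketch that simulates latent trajectories with a random intercept and a random slope, and without the optional Gaussian process \(w_i(t)\). All numerical values (fixed effects, covariance matrix \(B\), time grid, number of subjects) are hypothetical and chosen only for illustration; nothing here calls lcmm.

```r
set.seed(1)

n_subj <- 5                      # hypothetical number of subjects
times  <- seq(0, 3, by = 0.5)    # hypothetical measurement times

beta <- c(0, 0.4)                # fixed effects: intercept and slope (illustrative)
B <- matrix(c( 1.9, -0.4,
              -0.4, 0.17), nrow = 2)   # covariance of (u_0i, u_1i), illustrative

# Draw u_i ~ N(0, B): rows of z %*% chol(B) have covariance B
u <- matrix(rnorm(n_subj * 2), nrow = n_subj) %*% chol(B)

X <- cbind(1, times)             # design matrix X(t); here Z(t) = X(t)

# Lambda[j, i] = X(t_j) beta + Z(t_j) u_i
Lambda <- sapply(seq_len(n_subj),
                 function(i) drop(X %*% beta + X %*% u[i, ]))

matplot(times, Lambda, type = "l", lty = 1,
        xlab = "time", ylab = "latent process")
```

Each column of Lambda is one subject's latent trajectory; the spread between the lines comes only from the random effects.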

The relationship between the latent process of interest and the observations of the marker \(Y_{ij}\) (for subject \(i\) and occasion \(j\)) is simultaneously defined in an observation equation:

\[Y_{ij} = H( ~ \Lambda(t_{ij})+\epsilon_{ij} ~ ; \eta)\]

where:

  • \(t_{ij}\) is the time of measurement for subject \(i\) and occasion \(j\);
  • \(\epsilon_{ij}\) is an independent zero-mean Gaussian error;
  • \(H\) is the link function (parameterized by \(\eta\)) that transforms the latent process into the scale and metric of the marker.

Different parametric families are used. When the marker is continuous, \(H^{-1}\) is a parametric family of increasing monotonic functions among:

  • the linear transformation: this reduces to the linear mixed model (2 parameters)
  • the Beta cumulative distribution family rescaled (4 parameters)
  • the basis of quadratic I-splines with m knots (m+2 parameters)

When the marker is discrete (binary or ordinal), \(H\) is a threshold function: each level of \(Y\) corresponds to an interval of \(\Lambda(t_{ij})+\epsilon_{ij}\) whose boundaries are to be estimated.
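A threshold link can be sketched in base R as follows. The threshold values and the latent values are hypothetical, and the function names are introduced here for illustration. The snippet only shows how intervals of \(\Lambda(t_{ij})+\epsilon_{ij}\) map to ordinal levels and how, because \(\epsilon_{ij}\) is standard Gaussian, the level probabilities given the latent process are differences of normal CDFs.

```r
# Hypothetical thresholds for a 4-level ordinal marker (levels 0, 1, 2, 3)
thresholds <- c(-1, 0, 1.5)

# H: map a noisy latent value (Lambda + epsilon) to its level
H_threshold <- function(x, eta) findInterval(x, eta)

H_threshold(c(-2, -0.5, 0.3, 2), thresholds)
# levels 0, 1, 2, 3

# P(Y = k | Lambda) when epsilon ~ N(0, 1)
level_probs <- function(lambda, eta) {
  cum <- c(0, pnorm(eta - lambda), 1)   # P(Y <= k | Lambda), padded with 0 and 1
  diff(cum)
}

p <- level_probs(lambda = 0.2, eta = thresholds)
p   # one probability per level, summing to 1
```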

Identifiability

As in any latent variable model, the metric of the latent variable has to be defined. In lcmm, the variance of the errors is fixed to 1 and the mean intercept (in \(\beta\)) is fixed to 0.

Example with CES-D

In this vignette, the latent process mixed models implemented in lcmm are illustrated by the study of the linear trajectory of depressive symptoms (as measured by CES-D scale) according to \(age65\) and adjusted for male. Correlated random effects for the intercept and \(age65\) are included.

Model considered:

\[CESD_{ij} = H(~ \beta_{1}age65_{ij}+\beta_{2}male_{i}+\beta_{3}age65_{ij}male_{i} +u_{0i}+u_{1i}age65_{ij}+\epsilon_{ij} ~ ; ~ \eta)\]

where \(u_{i} \sim \mathcal{N}(0,B)\) and \(\epsilon_{ij} \sim \mathcal{N}(0,1)\).

The fixed part is \(\beta_{1}age65_{ij}+\beta_{2}male_{i}+\beta_{3}age65_{ij}male_{i}\); the random part is \(u_{0i}+u_{1i}age65_{ij}\).

Estimate the model for different continuous link functions \(H\)

We use the age variable recentered around 65 years old and in decades:
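The code for this step does not appear above. A minimal sketch, assuming the paquid dataset distributed with the lcmm package and its age column (the same rescaling is applied to datnew in the prediction section of this vignette), would be:

```r
library(lcmm)

# paquid is the example dataset distributed with lcmm
data(paquid)

# age recentered at 65 years and expressed in decades
paquid$age65 <- (paquid$age - 65) / 10
```

With this coding, a one-unit change in age65 corresponds to ten years of age.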

The latent process mixed model can be fitted with different link functions as shown below. This is done with argument link.

Linear link function

When defining the linear link function, the model reduces to a standard linear mixed model. The model can be fitted with lcmm function (with the linear link function by default):

mlin <- lcmm(CESD ~ age65*male, random=~ age65, subject='ID', data=paquid) #link= linear

It is the exact same model as one fitted by hlme. The only difference with a hlme object is the parameterization for the intercept and the residual standard error that are considered as rescaling parameters.

mlin2 <- hlme(CESD ~ age65*male, random=~ age65, subject='ID', data=paquid) # standard linear mixed model

The log-likelihoods are the same, but the estimated parameters \(\beta\) are not on the same scale.
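The change of scale can be made explicit. As a sketch, write the linear link with a location parameter \(\eta_1\) and a scale parameter \(\eta_2 > 0\) (this notation is an assumption introduced here, not taken from the lcmm output):

\[Y_{ij} = \eta_1 + \eta_2 \, ( \Lambda(t_{ij}) + \epsilon_{ij} ) = \eta_1 + \eta_2 X(t_{ij})\beta + \eta_2 Z(t_{ij})u_i + \eta_2 \epsilon_{ij}\]

Under this parameterization, \(\eta_1\) plays the role of the intercept of the standard linear mixed model and \(\eta_2\) that of its residual standard error, while the fixed effects and the random-effect covariance matrix on the marker scale are \(\eta_2\beta\) and \(\eta_2^2 B\). Both fits describe the same distribution for \(Y\), which is why the log-likelihoods coincide.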

Nonlinear link function 1: Beta cumulative distribution function

The rescaled cumulative distribution function (CDF) of a Beta distribution provides concave, convex or sigmoid transformations between the marker and its underlying latent process.

mbeta <- lcmm(CESD ~ age65*male, random=~ age65, subject='ID', data=paquid, link='beta')
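To see the range of shapes this family covers, the following base R sketch evaluates the Beta CDF on a marker rescaled to the unit interval. The shape parameters and the 0 to 52 range are illustrative assumptions, and the code does not reproduce lcmm's internal parameterization of the link.

```r
# Marker values on a hypothetical 0-52 scale, rescaled to [0, 1]
y   <- seq(0, 52, length.out = 201)
y01 <- (y - min(y)) / (max(y) - min(y))

# Beta CDFs with illustrative shape parameters
concave <- pbeta(y01, shape1 = 1, shape2 = 3)   # equals 1 - (1 - y01)^3
convex  <- pbeta(y01, shape1 = 3, shape2 = 1)   # equals y01^3
sigmoid <- pbeta(y01, shape1 = 3, shape2 = 3)   # S-shaped, symmetric around 0.5

matplot(y, cbind(concave, convex, sigmoid), type = "l", lty = 1, col = 1:3,
        xlab = "marker", ylab = "Beta CDF of the rescaled marker")
legend("bottomright", legend = c("concave", "convex", "sigmoid"),
       lty = 1, col = 1:3, bty = "n")
```

All three curves are increasing, which is what makes them usable as link functions; only their curvature differs.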

Nonlinear link function 2: Quadratic I-splines

The family of quadratic I-splines approximates any continuous increasing link function. It involves knots that are distributed within the range of the marker. By default, 5 equidistant knots located in the marker range are used:

mspl <- lcmm(CESD ~ age65*male, random=~ age65, subject='ID', data=paquid, link='splines')

The number of knots and their location may be specified. The number of knots is entered first, followed by -, then the location is specified with equi, quant or manual for, respectively, equidistant knots, knots at quantiles of the marker distribution, or interior knots entered manually in argument intnodes. For example, 7-equi-splines means I-splines with 7 equidistant knots, and 6-quant-splines means I-splines with 6 knots located at the quantiles of the marker distribution. The shortcut splines stands for 5-equi-splines.

For an example with 5 knots placed at the quantiles:

mspl5q <- lcmm(CESD ~ age65*male, random=~ age65, subject='ID', data=paquid, link='5-quant-splines')
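Knots can also be placed by hand with the manual option described above. The sketch below rests on two assumptions: that the knot count in the link name includes the two boundary knots (consistent with the five nodes 0 2 6 12 52 reported for 5-quant-splines in the summary output of this vignette), so that a 5-knot spline takes three interior values in intnodes; and that paquid and age65 are prepared as described earlier. The interior locations 4, 10 and 20 and the object name mspl5m are arbitrary.

```r
library(lcmm)
data(paquid)
paquid$age65 <- (paquid$age - 65) / 10

# I-splines with 5 knots: 2 boundary knots (range of CESD) plus 3 interior
# knots entered manually; the interior locations are illustrative only
mspl5m <- lcmm(CESD ~ age65 * male, random = ~ age65, subject = 'ID',
               data = paquid, link = '5-manual-splines',
               intnodes = c(4, 10, 20))
```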

Select the best model

Objects mlin, mbeta, mspl and mspl5q are latent process mixed models that assume the exact same trajectory for the underlying latent process but different link functions: linear, Beta CDF, I-splines with 5 equidistant knots (default with link='splines') and I-splines with 5 knots at quantiles, respectively. To select the most appropriate link function, one can compare these models. Usually this is done by comparing goodness-of-fit measures such as AIC or UACV.

The summarytable command gives the AIC (the UACV is in the output of each model):
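The call itself is not shown above. A sketch, assuming the four models of this vignette are refitted here so that the snippet runs on its own, and using the which argument of summarytable to select the displayed columns:

```r
library(lcmm)
data(paquid)
paquid$age65 <- (paquid$age - 65) / 10

# Same four models as above, differing only in the link function
mlin   <- lcmm(CESD ~ age65 * male, random = ~ age65, subject = 'ID',
               data = paquid)
mbeta  <- lcmm(CESD ~ age65 * male, random = ~ age65, subject = 'ID',
               data = paquid, link = 'beta')
mspl   <- lcmm(CESD ~ age65 * male, random = ~ age65, subject = 'ID',
               data = paquid, link = 'splines')
mspl5q <- lcmm(CESD ~ age65 * male, random = ~ age65, subject = 'ID',
               data = paquid, link = '5-quant-splines')

# One row per model: log-likelihood, convergence status,
# number of parameters and AIC
summarytable(mlin, mbeta, mspl, mspl5q,
             which = c("loglik", "conv", "npm", "AIC"))
```

Lower AIC indicates a better trade-off between fit and number of parameters.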

In this case, the model with a link function approximated by I-splines with 5 knots placed at the quantiles provides the best fit according to the AIC criterion.

The different estimated link functions can be compared in a plot:

col <- rainbow(5)
plot(mlin, which="linkfunction", bty='l', ylab="CES-D", col=col[1], lwd=2, xlab="underlying latent process")
plot(mbeta, which="linkfunction", add=TRUE, col=col[2], lwd=2)
plot(mspl, which="linkfunction", add=TRUE, col=col[3], lwd=2)
plot(mspl5q, which="linkfunction", add=TRUE, col=col[4], lwd=2)
legend(x="topleft", legend=c("linear", "beta","splines (5 equidistant)","splines (5 at quantiles)"), lty=1, col=col, bty="n", lwd=2)


We see that the two spline transformations are very close. The linear model does not seem appropriate, as shown by the gap between the linear curve and the spline curves. The beta transformation departs from the splines only for high values of the latent process.

Confidence bands of the transformations can be obtained by the Monte Carlo method:

linkspl5q <- predictlink(mspl5q,ndraws=2000)
plot(linkspl5q, col=col[4], lty=2, shades=TRUE)
legend(x="left", legend=c("95% confidence bands","for splines at quantiles"),lty=c(2,NA), col=c(col[4],NA), bty="n", lwd=1, cex=0.8)


Estimate the model with a discrete link function \(H\)

Sometimes, with markers that have only a restricted number of different levels, continuous link functions are not appropriate and the ordinal nature of the marker has to be handled. The lcmm function handles such a case with a threshold link function. However, the numerical complexity of the model with a threshold link function is much higher (due to a numerical integration over the random-effect distribution). This has to be kept in mind when fitting this model, and the number of random effects should be chosen parsimoniously.

Note that this model becomes a cumulative probit mixed model.

Here is an example with the \(HIER\) variable (4 levels); considering a threshold link function for CESD would involve too many parameters given its 0-52 range (e.g., 52 threshold parameters).

mthresholds <- lcmm(HIER ~ age65*male, random=~ age65, subject='ID', data=paquid, link='thresholds')

Postfit outputs

Summary

The summary of the model includes convergence, goodness of fit criteria and estimated parameters.

summary(mspl5q)

General latent class mixed model
     fitted by maximum likelihood method

lcmm(fixed = CESD ~ age65 * male, random = ~age65, subject = "ID",
    link = "5-quant-splines", data = paquid)

Statistical Model:
     Dataset: paquid
     Number of subjects: 500
     Number of observations: 2104
     Number of observations deleted: 146
     Number of latent classes: 1
     Number of parameters: 13
     Link function: Quadratic I-splines with nodes
0 2 6 12 52

Iteration process:
     Convergence criteria satisfied
     Number of iterations: 19
     Convergence criteria: parameters= 1.3e-08
                         : likelihood= 1.5e-07
                         : second derivatives= 3.1e-14

Goodness-of-fit statistics:
     maximum log-likelihood: -6320.08
     AIC: 12666.17
     BIC: 12720.96

     Discrete posterior log-likelihood: -6309.09
     Discrete AIC: 12644.18

     Mean discrete AIC per subject: 12.6442
     Mean UACV per subject: 12.6439
     Mean discrete LL per subject: -12.6182

Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:

                              coef      Se    Wald p-value
intercept (not estimated)        0
age65                      0.42421 0.06279   6.756 0.00000
male                      -0.83140 0.19742  -4.211 0.00003
age65:male                 0.23371 0.10300   2.269 0.02327

Variance-covariance matrix of the random-effects:
          intercept  age65
intercept   1.89911
age65      -0.39567 0.1711

Residual standard error (not estimated) = 1

Parameters of the link function:

               coef      Se    Wald p-value
I-splines1 -2.03816 0.13469 -15.132 0.00000
I-splines2  1.04627 0.02461  42.510 0.00000
I-splines3  0.74190 0.03773  19.665 0.00000
I-splines4  0.98399 0.03237  30.400 0.00000
I-splines5  1.55606 0.04480  34.735 0.00000
I-splines6  0.93273 0.16614   5.614 0.00000
I-splines7  1.38790 0.17687   7.847 0.00000

Graph of predicted trajectories according to a profile of covariates

The predicted trajectories can be computed in the natural scale of the dependent variable and according to a profile of covariates:

datnew <- data.frame(age=seq(65,95,length=100))
datnew$age65 <- (datnew$age - 65)/10
datnew$male <- 0
women <- predictY(mspl5q, newdata=datnew, var.time="age", draws=TRUE)
datnew$male <- 1
men <- predictY(mspl5q, newdata=datnew, var.time="age", draws=TRUE)

And then plotted:

plot(women, lwd=c(2,1), type="l", col=6, ylim=c(0,20), xlab="age in year",ylab="CES-D",bty="l", legend=NULL, shades = TRUE)
plot(men, add=TRUE, col=4, lwd=c(2,1), shades=TRUE)
legend(x="topleft", bty="n", ncol=2, lty=c(1,1,2,2), col=c(6,4,6,4), legend=c("women","men", "95% CI", "95% CI"), lwd=c(2,2,1,1))


Goodness of fit 1: plot of residuals

The subject-specific residuals (qqplot in bottom right panel) should be Gaussian.

plot(mspl5q, cex.main=0.9)

Goodness of fit 2: plot of predictions versus observations

The mean predictions and observations can be plotted according to time. Note that the predictions and observations are in the scale of the latent process (observations are transformed with the estimated link function):

plot(mspl5q, which="fit", var.time="age65", bty="l", xlab="(age-65)/10", break.times=8, ylab="latent process", lwd=2, marg=FALSE, ylim=c(-1,2), shades=TRUE, col=2)


To go further …

heterogeneous profiles of trajectories

The latent process mixed model extends to the heterogeneous case with latent classes. The same strategy as explained with hlme (see vignette ) can be used.

joint analysis of a time to event

The latent process mixed model extends to the case of a joint model. This is done in Jointlcmm and mpjlcmm. See the Jointlcmm vignette.

multiple markers of the same latent process

In some cases, several markers of the same underlying latent process may be measured. The latent process mixed model extends to that case. This is the purpose of multlcmm (see the vignette for continuous and ordinal outcomes).
