lcmm function - RDocumentation (2024)

Description

This function fits mixed models and latent class mixed models for different types of outcomes. It handles continuous longitudinal outcomes (Gaussian or non-Gaussian) as well as bounded quantitative, discrete and ordinal longitudinal outcomes. The different types of outcomes are taken into account using parameterized nonlinear link functions between the observed outcome and the underlying latent process of interest it measures. At the latent process level, the model estimates a standard linear mixed model or, when heterogeneity in the population is investigated, a latent class linear mixed model (in the same way as function hlme). The program also works when no random effect is included. Parameters of the nonlinear link function and of the latent process mixed model are estimated simultaneously by maximum likelihood.

Usage

lcmm( fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, cor = NULL, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior, pprior = NULL, range = NULL, subset = NULL, na.action = 1, posfix = NULL, partialH = FALSE, verbose = FALSE, returndata = FALSE, var.time = NULL, nproc = 1, clustertype = NULL, computeDiscrete = NULL)

Value

The list returned is:

ns

number of grouping units in the dataset

ng

number of latent classes

loglik

log-likelihood of the model

best

vector of parameter estimates in the same order as specified in B and detailed in the Details section

V

if the model converged (conv=1 or 3), vector containing the upper triangle of the variance-covariance matrix of the estimates in best, except for the variance-covariance parameters of the random effects, for which V contains the variance-covariance estimates of the Cholesky-transformed parameters displayed in cholesky. If conv=2, V contains the second derivatives of the log-likelihood.

gconv

vector of convergence criteria: 1. on the parameters, 2. on the likelihood, 3. on the derivatives

conv

status of convergence: =1 if the convergence criteria were satisfied, =2 if the maximum number of iterations was reached, =4 or 5 if a problem occurred during optimisation

call

the matched call

niter

number of Marquardt iterations

dataset

dataset

N

internal information used in related functions

idiag

internal information used in related functions

pred

table of individual predictions and residuals in the underlying latent process scale; it includes marginal predictions (pred_m), marginal residuals (resid_m), subject-specific predictions (pred_ss) and subject-specific residuals (resid_ss) averaged over classes, the transformed observations in the latent process scale (obs) and finally the class-specific marginal and subject-specific predictions (with the number of the latent class: pred_m_1, pred_m_2, ..., pred_ss_1, pred_ss_2, ...). If var.time is specified, the corresponding measurement time is also included. This output is not yet available when specifying a thresholds transformation.

pprob

table of posterior classification and posterior individual class-membership probabilities

Xnames

list of covariates included in the model

predRE

table containing individual predictions of the random effects: a column per random effect, a line per subject. This output is not yet available when specifying a thresholds transformation.

cholesky

vector containing the estimates of the Cholesky-transformed parameters of the variance-covariance matrix of the random effects

estimlink

table containing the simulated values of the marker and the corresponding estimated link function

epsY

strictly positive real number used to rescale the marker in (0,1) when the beta link function is used. By default, epsY=0.5.

linktype

indicator of link function type: 0 for linear, 1 for beta, 2 for splines and 3 for thresholds

linknodes

vector of nodes, useful only for the 'splines' link function

data

the original data set (if returndata is TRUE)
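The posterior probabilities reported in pprob combine, for each subject, the prior class-membership probabilities from the multinomial logistic model (classmb) with the class-specific likelihoods of the subject's repeated measures, via Bayes' rule; the posterior classification is the class with the highest posterior probability. The base R sketch below illustrates that computation under stated assumptions: class_prior and class_posterior are hypothetical helpers (not lcmm functions), the last class is assumed to be the reference category of the multinomial logistic model, and the class-specific log-likelihoods are taken as given rather than computed. The numeric values are made up for illustration.

```r
# Hypothetical helpers illustrating posterior classification; not lcmm code.

# Prior class-membership probabilities from a multinomial logistic model.
# ASSUMPTION: the last class is the reference (linear predictor fixed at 0).
# xi: (ng - 1) x (1 + p) matrix, one row per non-reference class;
#     column 1 = intercept, remaining columns = classmb covariate effects.
# x:  the p classmb covariate values of one subject (numeric(0) if none).
class_prior <- function(xi, x = numeric(0)) {
  eta <- c(xi %*% c(1, x), 0)   # linear predictors, reference class last
  w <- exp(eta - max(eta))      # subtract the max for numerical stability
  w / sum(w)
}

# Posterior probabilities by Bayes' rule, computed on the log scale.
# loglik_g: class-specific log-likelihoods of one subject's measures
#           (taken as given in this sketch).
class_posterior <- function(prior, loglik_g) {
  lw <- log(prior) + loglik_g
  w <- exp(lw - max(lw))
  w / sum(w)
}

# Example with ng = 3 classes and one classmb covariate (made-up values).
xi <- matrix(c(0.5, -0.2,   # class 1: intercept, covariate effect
               1.0,  0.3),  # class 2: intercept, covariate effect
             nrow = 2, byrow = TRUE)
prior <- class_prior(xi, x = 1.5)
post <- class_posterior(prior, loglik_g = c(-120.4, -118.9, -125.0))
which.max(post)  # posterior class of this subject
```

Working on the log scale avoids underflow, since subject-level likelihoods of long series are often far below the smallest representable double.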

Arguments

fixed

a two-sided linear formula object for specifying the fixed effects in the linear mixed model at the latent process level. The response outcome is on the left of ~ and the covariates are separated by + on the right of the ~. For identifiability purposes, the intercept specified by default should not be removed by a -1.

mixture

a one-sided formula object for the class-specific fixed effects in the latent process mixed model (to specify only for a number of latent classes greater than 1). Among the list of covariates included in fixed, the covariates with class-specific regression parameters are entered in mixture separated by +. By default, an intercept is included. If no intercept is wanted, -1 should be the first term included.

random

an optional one-sided formula for the random effects in the latent process mixed model. Covariates with a random effect are separated by +. By default, no random effect is included.

subject

name of the covariate representing the grouping structure.

classmb

an optional one-sided formula describing the covariates in the class-membership multinomial logistic model. Covariates included are separated by +. No intercept should be included in this formula.

ng

number of latent classes considered. If ng=1, neither mixture nor classmb should be specified. If ng>1, mixture is required.

idiag

optional logical for the variance-covariance structure of the random effects. If FALSE, an unstructured variance-covariance matrix is considered (by default). If TRUE, a diagonal variance-covariance matrix is considered.

nwg

optional logical for class-specific variance-covariance of the random effects. If FALSE, the variance-covariance matrix is common over latent classes (by default). If TRUE, a class-specific proportional parameter multiplies the variance-covariance matrix in each class (the proportional parameter in the last latent class equals 1 to ensure identifiability).

link

optional family of link functions to estimate. By default, the "linear" option specifies a linear link function leading to a standard linear mixed model (homogeneous or heterogeneous as estimated in hlme). Other possibilities include "beta" for estimating a link function from the family of Beta cumulative distribution functions, "thresholds" for using a threshold model to describe the correspondence between each level of an ordinal outcome and the underlying latent process, and "splines" for approximating the link function by I-splines. In this latter case, the number of nodes and the node locations should also be specified. The number of nodes is entered first, followed by -; then the location is specified with "equi", "quant" or "manual" for, respectively, equidistant nodes, nodes at quantiles of the marker distribution, or interior nodes entered manually in argument intnodes. It is followed by - and finally "splines" is indicated. For example, "7-equi-splines" means I-splines with 7 equidistant nodes, "6-quant-splines" means I-splines with 6 nodes located at the quantiles of the marker distribution, and "9-manual-splines" means I-splines with 9 nodes, the vector of 7 interior nodes being entered in the argument intnodes.

intnodes

optional vector of interior nodes. This argument is only required for an I-splines link function with nodes entered manually.

epsY

optional strictly positive real number used to rescale the marker in (0,1) when the beta link function is used. By default, epsY=0.5.

cor

optional Brownian motion or autoregressive process modeling the correlation between the observations. "BM" or "AR" should be specified, followed by the time variable between brackets. By default, no correlation is added.

data

optional data frame containing the variables named in fixed, mixture, random, classmb and subject.

B

optional specification for the initial values for the parameters. Three options are allowed: (1) a vector of initial values is entered (the order in which the parameters are included is detailed in the Details section); (2) nothing is specified, and a preliminary analysis involving the estimation of a standard linear mixed model is performed to choose initial values; (3) when ng>1, an lcmm object is entered. It should correspond to the exact same model structure but with ng=1. The program will automatically generate initial values from this model. This specification avoids the preliminary analysis indicated in (2). Note that, due to possible local maxima, the B vector should be specified and several different starting points should be tried.

convB

optional threshold for the convergence criterion based on the parameter stability. By default, convB=0.0001.

convL

optional threshold for the convergence criterion based on the log-likelihood stability. By default, convL=0.0001.

convG

optional threshold for the convergence criterion based on the derivatives. By default, convG=0.0001.

maxiter

optional maximum number of iterations for the Marquardt iterative algorithm. By default, maxiter=100.

nsim

number of points used to plot the estimated link function. By default, nsim=100.

prior

name of the covariate containing the prior on the latent class membership. The covariate should be an integer with values in 0,1,...,ng. When there is no prior, the value should be 0. When there is a prior for the subject, the value should be the number of the latent class (in 1,...,ng).

pprior

optional vector specifying the names of the covariates containing the prior probabilities of belonging to each latent class. These probabilities should be between 0 and 1 and should sum to 1 for each subject.

range

optional vector indicating the range of the outcome (that is, the minimum and maximum). By default, the range is defined according to the minimum and maximum observed values of the outcome. The option should be used only for Beta and splines transformations.

subset

optional vector giving the subset of observations in data to use. By default, all lines.

na.action

Integer indicating how NAs are managed. The default is 1 for 'na.omit'. The alternative is 2 for 'na.fail'. Other options such as 'na.pass' or 'na.exclude' are not implemented in the current version.

posfix

Optional vector specifying the indices in vector B of the parameters that should not be estimated. Default to NULL: all parameters are estimated.

partialH

optional logical for splines link functions only. Indicates whether the parameters of the link function can be dropped from the Hessian matrix to define convergence criteria.

verbose

logical indicating if information about computation should be reported. Default to FALSE.

returndata

logical indicating if data used for computation should be returned. Default to FALSE: data are not returned.

var.time

optional character indicating the name of the time variable.

nproc

the number of cores for parallel computation. Default to 1 (sequential mode).

clustertype

optional character indicating the type of cluster for parallel computation.

computeDiscrete

optional logical indicating if a discrete likelihood and UACV should be computed. By default, computeDiscrete=TRUE if the outcome only consists of integers.
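The splines syntax of the link argument can be made concrete with a small parser. The base R sketch below is hypothetical: parse_splines_link is not an lcmm function, and lcmm's own argument checking may differ. It splits a string such as "7-equi-splines" into the number of nodes and their location, checks that "manual" comes with nodes - 2 interior nodes in intnodes, and reports the n+2 link parameters that the Details section associates with n nodes.

```r
# Hypothetical parser for the "<nodes>-<location>-splines" link syntax.
parse_splines_link <- function(link, intnodes = NULL) {
  parts <- strsplit(link, "-", fixed = TRUE)[[1]]
  if (length(parts) != 3 || parts[3] != "splines")
    stop("expected '<nodes>-<equi|quant|manual>-splines', got '", link, "'")
  nodes <- suppressWarnings(as.integer(parts[1]))
  if (is.na(nodes) || nodes < 2)
    stop("the number of nodes must be an integer >= 2")
  location <- match.arg(parts[2], c("equi", "quant", "manual"))
  # With manual placement, intnodes holds the interior nodes only; the two
  # remaining (boundary) nodes are assumed to be the ends of the outcome range.
  if (location == "manual" && length(intnodes) != nodes - 2)
    stop("'", link, "' needs ", nodes - 2, " interior nodes in intnodes")
  list(nodes = nodes, location = location,
       n_link_params = nodes + 2)  # n + 2 link parameters for n nodes
}

str(parse_splines_link("7-equi-splines"))
parse_splines_link("9-manual-splines",
                   intnodes = c(2, 4, 6, 8, 10, 12, 14))$n_link_params
```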

Author

Cecile Proust-Lima, Amadou Diakite, Benoit Liquet and Viviane Philipps

cecile.proust-lima@inserm.fr

Details

A. THE PARAMETERIZED LINK FUNCTIONS

The lcmm function estimates mixed models and latent class mixed models for different types of outcomes by assuming a parameterized link function linking the outcome Y(t) with the underlying latent process L(t) it measures. To fix the latent process dimension, we chose to constrain the (first) intercept of the latent class mixed model at the latent process level to 0 and the standard error of the Gaussian measurement error to 1. These two parameters are replaced by additional parameters in the parameterized link function:

1. With the "linear" link function, 2 parameters are required that correspond directly to the intercept and the standard error: (Y(t) - b1)/b2 = L(t).

2. With the "beta" link function, 4 parameters are required for the following transformation: [h(Y(t)', b1, b2) - b3]/b4, where h is the Beta CDF with canonical parameters c1 and c2 that can be derived from b1 and b2 as c1 = exp(b1)/[exp(b2)*(1+exp(b1))] and c2 = 1/[exp(b2)*(1+exp(b1))], and Y(t)' is the rescaled outcome, i.e. Y(t)' = [Y(t) - min(Y(t)) + epsY] / [max(Y(t)) - min(Y(t)) + 2*epsY].

3. With the "splines" link function, n+2 parameters are required for the following transformation: b_1 + b_2*I_1(Y(t)) + ... + b_{n+2}*I_{n+1}(Y(t)), where I_1, ..., I_{n+1} is the basis of quadratic I-splines and n is the number of nodes. To constrain the parameters to be positive, except for b_1, the program estimates b_k^* (for k = 2, ..., n+2) so that b_k = (b_k^*)^2.

4. With the "thresholds" link function, for an ordinal outcome with levels 0, ..., C, a maximum of C parameters are required for the following transformation: Y(t) = c <=> b_c < L(t) <= b_{c+1}, with b_0 = -infinity and b_{C+1} = +infinity. The number of parameters is reduced if some levels do not have any information. For example, if a level c is not observed in the dataset, the corresponding threshold b_{c+1} is constrained to be the same as the previous one, b_c, and the number of parameters in the link function is reduced by 1.

To constrain the parameters to be increasing, except for the first parameter b_1, the program estimates b_k^* (for k = 2, ..., C) so that b_k = b_{k-1} + (b_k^*)^2.

Details of these parameterized link functions can be found in the referenced papers.
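The four transformations above can be written out directly. The base R sketch below is illustrative only, with hypothetical function names. The splines part takes an already-evaluated basis matrix as input: building quadratic I-splines is out of scope, so the example uses a toy nondecreasing ramp basis instead. The ordinal probabilities assume, as stated above, a Gaussian measurement error with standard deviation 1 around a latent mean mu. The sketch shows how the reparameterizations b_k = (b_k^*)^2 and b_k = b_{k-1} + (b_k^*)^2 guarantee a nondecreasing link and ordered thresholds whatever the unconstrained values b_k^*.

```r
# 1. Linear link: L = (Y - b1) / b2
link_linear <- function(y, b) (y - b[1]) / b[2]

# 2. Beta link: rescale Y into (0, 1) with epsY, apply the Beta CDF with
#    c1 = exp(b1) / (exp(b2) * (1 + exp(b1))) and
#    c2 = 1 / (exp(b2) * (1 + exp(b1))), then shift by b3 and scale by b4.
#    yrange plays the role of the range argument (default: observed range).
link_beta <- function(y, b, epsY = 0.5, yrange = c(min(y), max(y))) {
  yr <- (y - yrange[1] + epsY) / (yrange[2] - yrange[1] + 2 * epsY)
  c1 <- exp(b[1]) / (exp(b[2]) * (1 + exp(b[1])))
  c2 <- 1 / (exp(b[2]) * (1 + exp(b[1])))
  (pbeta(yr, c1, c2) - b[3]) / b[4]
}

# 3. Splines link: b_1 + sum_k b_{k+1} * I_k(Y), with b_k = (b_k^*)^2 for k >= 2.
#    basis: matrix of basis functions evaluated at Y (one column per function).
link_splines <- function(basis, bstar) {
  drop(bstar[1] + basis %*% (bstar[-1]^2))
}

# 4. Thresholds: b_1 free, b_k = b_{k-1} + (b_k^*)^2 for k >= 2.
thresholds_from_bstar <- function(bstar) cumsum(c(bstar[1], bstar[-1]^2))

# Y = c  <=>  b_c < L <= b_{c+1}: findInterval() with left.open = TRUE
# returns exactly c (0 if L <= b_1, C if L > b_C).
ordinal_level <- function(L, thresholds) {
  findInterval(L, thresholds, left.open = TRUE)
}

# P(Y = c | mu) for c = 0, ..., C, with N(0, 1) measurement error around mu.
ordinal_prob <- function(mu, thresholds) {
  diff(pnorm(c(-Inf, thresholds, Inf) - mu))
}

# Examples (illustrative parameter values)
y <- seq(0, 30, by = 0.5)
link_linear(c(0, 10, 20, 30), b = c(15, 5))          # -3 -1  1  3
lb <- link_beta(y, b = c(0, 0, 0.5, 0.1))
knots <- c(0, 10, 20, 30)
toy_basis <- sapply(1:3, function(k)                 # toy ramps, NOT I-splines
  pmin(pmax((y - knots[k]) / (knots[k + 1] - knots[k]), 0), 1))
lsp <- link_splines(toy_basis, bstar = c(-2, 1, -0.5, 2))
thr <- thresholds_from_bstar(c(-1, 1, 0, 1.5))       # -1 0 0 2.25
ordinal_level(c(-2, -0.5, 0, 1, 3), thr)             # 0 1 1 3 4
```

In the threshold example, b_3^* = 0 makes b_3 equal to b_2, so level 2 gets an empty interval and probability 0: this is the situation described above for a level that is not observed.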

B. THE VECTOR OF PARAMETERS B

The parameters in the vector of initial values B or in the vector of maximum likelihood estimates best are included in the following order:
(1) ng-1 parameters for the intercepts in the latent class membership model and, if covariates are included in classmb, ng-1 parameters for each one;
(2) for all covariates in fixed, one parameter if the covariate is not in mixture, and ng parameters if the covariate is also in mixture. For the intercept: when ng=1, it is not estimated and no parameter should be specified in B; when ng>1, the first intercept is not estimated and only ng-1 parameters should be specified in B;
(3) the variance of each random effect specified in random (including the intercept) if idiag=TRUE, or the lower triangular part of the variance-covariance matrix of all the random effects if idiag=FALSE;
(4) only if nwg=TRUE, ng-1 parameters for the class-specific proportional coefficients of the variance-covariance matrix of the random effects;
(5) in contrast with hlme, for identifiability purposes, the standard error of the Gaussian error is not estimated (it is fixed at 1) and should not be specified in B;
(6) the parameters of the link function: 2 for "linear", 4 for "beta", n+2 for "splines" with n nodes, and the number of levels minus one for "thresholds".
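The ordering above also determines how many values B must contain. The base R sketch below counts them for a given model structure. It is a hypothetical helper (n_params_B is not part of lcmm) written under these assumptions: the intercept is class-specific whenever ng>1 (the default in mixture), n_random counts the random intercept, and no correlation process is specified through cor (which would add parameters not listed above).

```r
# Hypothetical helper: length of B following points (1)-(6) above.
#   n_classmb : number of covariates in classmb
#   n_fixed   : covariates in fixed, intercept excluded, NOT in mixture
#   n_mixture : covariates in fixed, intercept excluded, also in mixture
#   n_random  : number of random effects, random intercept included
#   n_nodes   : number of nodes (splines); n_levels: number of levels (thresholds)
n_params_B <- function(ng = 1, n_classmb = 0, n_fixed = 0, n_mixture = 0,
                       n_random = 0, idiag = FALSE, nwg = FALSE,
                       link = c("linear", "beta", "splines", "thresholds"),
                       n_nodes = NULL, n_levels = NULL) {
  link <- match.arg(link)
  # (1) class-membership model: intercepts and classmb covariates
  p1 <- if (ng > 1) (ng - 1) * (1 + n_classmb) else 0
  # (2) fixed effects, plus ng - 1 intercepts (0 when ng = 1)
  p2 <- n_fixed + ng * n_mixture + (ng - 1)
  # (3) random-effect variances, or the lower triangle of their matrix
  p3 <- if (idiag) n_random else n_random * (n_random + 1) / 2
  # (4) class-specific proportional coefficients
  p4 <- if (nwg && ng > 1) ng - 1 else 0
  # (5) the Gaussian error standard error is fixed at 1: no parameter
  # (6) link function parameters
  p6 <- switch(link,
    linear = 2,
    beta = 4,
    splines = { stopifnot(!is.null(n_nodes)); n_nodes + 2 },
    thresholds = { stopifnot(!is.null(n_levels)); n_levels - 1 })
  p1 + p2 + p3 + p4 + p6
}

# fixed = Y ~ Time + I(Time^2), random = ~ Time, ng = 1, linear link
n_params_B(n_fixed = 2, n_random = 2)                 # 7
# same with ng = 2, mixture = ~ Time + I(Time^2), "5-equi-splines" link
n_params_B(ng = 2, n_mixture = 2, n_random = 2,
           link = "splines", n_nodes = 5)             # 16
```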

C. CAUTIONS REGARDING THE USE OF THE PROGRAM

Some caution is needed when using the program. Convergence criteria are very strict, as they are based on derivatives of the log-likelihood in addition to the parameter and log-likelihood stability. In some cases, the program may not converge and may reach the maximum number of iterations (maxiter, 100 by default). In this case, the user should check that parameter estimates at the last iteration are not on the boundaries of the parameter space. If they are, the identifiability of the model is critical. This may happen especially with splines parameters that are too close to 0 (lower boundary) or classmb parameters that are too high or too low (perfect classification). When identifiability of some parameters is suspected, the program can be run again from the former estimates, fixing the suspected parameters to their value with option posfix. This usually solves the problem. An alternative is to remove the parameters of the Beta or splines link function from the inverse of the Hessian with option partialH. If not, the program should be run again with other initial values, a higher maximum number of iterations, or less strict convergence tolerances.

Specifically when investigating heterogeneity (that is, with ng>1): (1) As the log-likelihood of a latent class model can have multiple maxima, a careful choice of the initial values is crucial for ensuring convergence toward the global maximum. The program can be run without entering the vector of initial values (see point 2). However, we recommend systematically entering initial values in B and trying different sets of initial values. (2) The automatic choice of initial values we provide requires the estimation of a preliminary linear mixed model. The user should be aware, first, that this preliminary analysis can take time for large datasets and, second, that the generated initial values may be far from the maximum, so that the algorithm may converge slowly or only to a local maximum. This is why several alternatives exist: the vector of initial values can be directly specified in B, or the initial values can be generated (automatically or randomly) from a model with ng=1. Finally, function gridsearch performs an automatic grid search.

D. NUMERICAL INTEGRATION WITH THE THRESHOLD LINK FUNCTION

With the exception of the threshold link function, maximum likelihood estimation implemented in lcmm does not require any numerical integration over the random effects, so the estimation procedure is relatively fast. See Proust et al. (2006) for more details on the estimation procedure.

However, with the threshold link function and when at least one random effect is specified, a numerical integration over the random-effects distribution is required in each computation of the individual contribution to the likelihood, which greatly complicates the estimation procedure. For the moment, we do not allow any option regarding the numerical integration techniques used. 1. When a single random effect is specified, we use a standard non-adaptive Gaussian quadrature with 30 points. 2. When at least two random effects are specified, we use a multivariate non-adaptive Gaussian quadrature implemented by Genz and Keister (1996) in the HRMSYM Fortran subroutine.
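To illustrate what one such integral looks like in the simplest case, the base R sketch below computes the marginal probability of one ordinal observation given a single random intercept u ~ N(0, sigma_u^2), integrating the threshold probabilities over u with a 30-point rule. The text does not name the one-dimensional rule; a Gauss-Hermite rule is assumed here, with nodes and weights obtained by the Golub-Welsch eigenvalue method. The function names are hypothetical and this is not lcmm's Fortran code. Because u plus the N(0, 1) measurement error is itself Gaussian in this special case, the exact answer is available for checking.

```r
# Gauss-Hermite nodes/weights for a standard normal weight (Golub-Welsch):
# the eigenvalues of the Jacobi matrix of the probabilists' Hermite
# polynomials are the nodes; the squared first components of the
# normalized eigenvectors are the weights (they sum to 1).
gh_normal <- function(n = 30) {
  J <- matrix(0, n, n)
  off <- sqrt(seq_len(n - 1))
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = e$vectors[1, ]^2)
}

# Marginal P(Y = level) for one observation: mu is the latent mean without
# the random intercept, thresholds = (b_1, ..., b_C), sigma_u = SD of u.
ordinal_marginal_prob <- function(level, mu, thresholds, sigma_u, n = 30) {
  gh <- gh_normal(n)
  cuts <- c(-Inf, thresholds, Inf)           # cuts[c + 1] = b_c
  u <- sigma_u * gh$nodes                    # quadrature points for u
  sum(gh$weights * (pnorm(cuts[level + 2] - mu - u) -
                    pnorm(cuts[level + 1] - mu - u)))
}

thr <- c(-1, 0.5, 2)
quad <- ordinal_marginal_prob(level = 1, mu = 0.3, thresholds = thr,
                              sigma_u = 1.5)
# Closed form for this special case: u + error ~ N(0, 1 + sigma_u^2)
s <- sqrt(1 + 1.5^2)
exact <- pnorm((0.5 - 0.3) / s) - pnorm((-1 - 0.3) / s)
```

With several random effects, or when the random effects enter through time-varying covariates across many observations of a subject, no such closed form exists, which is why a quadrature rule is needed in each likelihood evaluation.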

Further developments should allow for adaptive techniques and more options regarding the numerical integration technique.

E. POSTERIOR DISCRETE LIKELIHOOD

Models involving nonlinear continuous link functions assume continuous data, while models with a threshold link function assume discrete data. As a consequence, comparing likelihoods or likelihood-based criteria (such as AIC) between these models is not possible, as the former are based on a Lebesgue measure and the latter on a counting measure. To make the comparison possible, we compute the posterior discrete likelihood for all the models with a nonlinear continuous link function. This posterior likelihood considers the data as discrete; it is computed at the MLE (maximum likelihood estimates) using the counting measure, so that models with threshold or continuous link functions become comparable. Further details can be found in Proust-Lima, Amieva and Jacqmin-Gadda (2013).

In addition to the Akaike information criterion based on the discrete posterior likelihood, we also compute a universal approximate cross-validation criterion (UACV) to compare models based on different measures. See Commenges, Proust-Lima, Samieri and Liquet (2015) for further details.

References

Proust-Lima, Philipps and Liquet (2017). Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm. Journal of Statistical Software 78(2): 1-56. doi:10.18637/jss.v078.i02

Genz and Keister (1996). Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics 71: 299-309.

Proust and Jacqmin-Gadda (2005). Estimation of linear mixed models with a mixture of distribution for the random-effects. Comput Methods Programs Biomed 78: 165-73.

Proust, Jacqmin-Gadda, Taylor, Ganiayre and Commenges (2006). A nonlinear model with latent process for cognitive evolution using multivariate longitudinal data. Biometrics 62: 1014-24.

Proust-Lima, Dartigues and Jacqmin-Gadda (2011). Misuse of the linear mixed model when evaluating risk factors of cognitive decline. Amer J Epidemiol 174(9): 1077-88.

Proust-Lima, Amieva and Jacqmin-Gadda (2013). Analysis of multivariate mixed longitudinal data: a flexible latent process approach. British Journal of Mathematical and Statistical Psychology 66(3): 470-87.

Commenges, Proust-Lima, Samieri and Liquet (2015). A universal approximate cross-validation criterion for regular risk functions. Int J Biostat 11(1): 51-67.

See Also

postprob, plot.lcmm, plot.predict, hlme
