Description
This function fits joint latent class mixed models for a longitudinaloutcome and a right-censored (possibly left-truncated) time-to-event. Thefunction handles competing risks and Gaussian or non Gaussian (curvilinear)longitudinal outcomes. For curvilinear longitudinal outcomes, normalizingcontinuous functions (splines or Beta CDF) can be specified as inlcmm
.
Usage
Jointlcmm( fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, survival, hazard = "Weibull", hazardtype = "Specific", hazardnodes = NULL, hazardrange = NULL, TimeDepVar = NULL, link = NULL, intnodes = NULL, epsY = 0.5, range = NULL, cor = NULL, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior = NULL, pprior = NULL, logscale = FALSE, subset = NULL, na.action = 1, posfix = NULL, partialH = FALSE, verbose = FALSE, returndata = FALSE, var.time = NULL, nproc = 1, clustertype = NULL)jlcmm( fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, survival, hazard = "Weibull", hazardtype = "Specific", hazardnodes = NULL, hazardrange = NULL, TimeDepVar = NULL, link = NULL, intnodes = NULL, epsY = 0.5, range = NULL, cor = NULL, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter = 100, nsim = 100, prior = NULL, pprior = NULL, logscale = FALSE, subset = NULL, na.action = 1, posfix = NULL, partialH = FALSE, verbose = FALSE, returndata = FALSE, var.time = NULL, nproc = 1, clustertype = NULL)
Value
The list returned is:
- loglik
log-likelihood of the model
- best
vector of parameter estimates in the same order as specified in
B
and detailed in sectiondetails
- V
if the model converged (conv=1 or 3), vector containingthe upper triangle matrix of variance-covariance estimates of
Best
with exception for variance-covariance parameters of the random-effects forwhichV
contains the variance-covariance estimates of the Choleskytransformed parameters displayed incholesky
.If conv=2,V
contains the second derivatives of the log-likelihood.- gconv
vector ofconvergence criteria: 1. on the parameters, 2. on the likelihood, 3. on thederivatives
- conv
status of convergence: =1 if the convergencecriteria were satisfied, =2 if the maximum number of iterations was reached,=4 or 5 if a problem occured during optimisation
- call
the matchedcall
- niter
number of Marquardt iterations
- pred
table ofindividual predictions and residuals; it includes marginal predictions(pred_m), marginal residuals (resid_m), subject-specific predictions(pred_ss) and subject-specific residuals (resid_ss) averaged over classes,the observation (obs) and finally the class-specific marginal andsubject-specific predictions (with the number of the latent class:pred_m_1,pred_m_2,...,pred_ss_1,pred_ss_2,...). If
var.time
is specified, the corresponding measurement time is also included.- pprob
table ofposterior classification and posterior individual class-membershipprobabilities based on the longitudinal data and the time-to-event data
- pprobY
table of posterior classification and posterior individualclass-membership probabilities based only on the longitudinal data
- predRE
table containing individual predictions of the random-effects:a column per random-effect, a line per subject
- cholesky
vectorcontaining the estimates of the Cholesky transformed parameters of thevariance-covariance matrix of the random-effects
- scoretest
Statisticof the Score Test for the conditional independence assumption of thelongitudinal and survival data given the latent class structure. Under thenull hypothesis, the statistics is a Chi-square with p degrees of freedomwhere p indicates the number of random-effects in the longitudinal mixedmodel. See Jacqmin-Gadda and Proust-Lima (2009) for more details.
- predSurv
table of predictions giving for the window of times to event(called "time"), the predicted baseline risk function in each latent class(called "RiskFct") and the predicted cumulative baseline risk function ineach latent class (called "CumRiskFct").
- hazard
internal informationabout the hazard specification used in related functions
- data
theoriginal data set (if returndata is TRUE)
Arguments
two-sided linear formula object for the fixed-effects in thelinear mixed model. The response outcome is on the left of one-sided formula object for the class-specific fixed effectsin the linear mixed model (to specify only for a number of latent classesgreater than 1). Among the list of covariates included in optional one-sided formula for the random-effects in thelinear mixed model. Covariates with a random-effect are separated by name of the covariate representing the grouping structure(called subject identifier) specified with ''. optional one-sided formula describing the covariates in theclass-membership multinomial logistic model. Covariates included areseparated by optional number of latent classes considered. If optional logical for the structure of the variance-covariancematrix of the random-effects. If optional logical indicating if the variance-covariance of therandom-effects is class-specific. If two-sided formula object. The left side of the formulacorresponds to a optional family of hazard function assumed for the survivalmodel. By default, "Weibull" specifies a Weibull baseline risk function.Other possibilities are "piecewise" for a piecewise constant risk functionor "splines" for a cubic M-splines baseline risk function. For these twolatter families, the number of nodes and the location of the nodes should bespecified as well, separated by optional indicator for the type of baseline risk functionwhen ng>1. By default "Specific" indicates a class-specific baseline riskfunction. Other possibilities are "PH" for a baseline risk functionproportional in each latent class, and "Common" for a baseline risk functionthat is common over classes. In the presence of competing events, a vectorof hazardtypes should be given. optional vector containing interior nodes if optional vector indicating the range of the survival times (that isthe minimum and maximum). By default, the range is defined according to theminimum and maximum observed values of the survival times. The option should beused only for piecewise constant and Splines hazard functions. optional vector containing an intermediate timecorresponding to a change in the risk of event. This time-dependentcovariate can only take the form of a time variable with the assumption thatthere is no effect on the risk before this time and a constant effect on therisk of event after this time (example: initiation of a treatment to accountfor). optional family of link functions to estimate. By default,"linear" option specifies a linear link function leading to a standardlinear mixed model (hom*ogeneous or heterogeneous as estimated in optional vector of interior nodes. This argument is onlyrequired for a I-splines link function with nodes entered manually. optional definite positive real used to rescale the marker in(0,1) when the beta link function is used. By default, epsY=0.5. optional vector indicating the range of the outcome (that isthe minimum and maximum). By default, the range is defined according to theminimum and maximum observed values of the outcome. The option should beused only for Beta and Splines transformations. optional brownian motion or autoregressive process modeling thecorrelation between the observations. "BM" or "AR" should be specified,followed by the time variable between brackets. By default, no correlationis added.~
and thecovariates are separated by +
on the right of the ~
. Bydefault, an intercept is included. If no intercept, -1
should be thefirst term included on the right of ~
.fixed
, thecovariates with class-specific regression parameters are entered inmixture
separated by +
. By default, an intercept is included.If no intercept, -1
should be the first term included.+
. By default, an intercept is included. If no intercept, -1
should be the first term included.+
. No intercept should be included in this formula.ng=1
(bydefault) no mixture
nor classmb
should be specified. Ifng>1
, mixture
is required.FALSE
, a non structured matrix ofvariance-covariance is considered (by default). If TRUE
a diagonalmatrix of variance-covariance is considered.FALSE
the variance-covariancematrix is common over latent classes (by default). If TRUE
aclass-specific proportional parameter multiplies the variance-covariancematrix in each class (the proportional parameter in the last latent classequals 1 to ensure identifiability).surv()
object of type "counting" for right-censoredand left-truncated data (example: Surv(Time,EntryTime,Indicator)
) orof type "right" for right-censored data (example:Surv(Time,Indicator)
). Multiple causes of event can be considered inthe Indicator (0 for censored, k for cause k of event). The right side ofthe formula specifies the names of covariates to include in the survivalmodel with mixture()
when the effect is class-specific (example:Surv(Time,Indicator) ~
X1 + mixture(X2)
for a class-commoneffect of X1 and a class-specific effect of X2). In the presence ofcompeting events, covariate effects are common by default. Codecause(X3)
specifies a cause-specific covariate effect for X3 on eachcause of event while cause1(X3)
(or cause2(X3)
, ...) specifiesa cause-specific effect of X3 on the first (or second, ...) cause only.-
. The number of nodes is enteredfirst followed by -
, then the location is specified with "equi","quant" or "manual" for respectively equidistant nodes, nodes at quantilesof the times of event distribution or interior nodes entered manually inargument hazardnodes
. It is followed by -
and finally"piecewise" or "splines" indicates the family of baseline risk functionconsidered. Examples include "5-equi-splines" for M-splines with 5equidistant nodes, "6-quant-piecewise" for piecewise constant risk over 5intervals and nodes defined at the quantiles of the times of eventsdistribution and "9-manual-splines" for M-splines risk function with 9nodes, the vector of 7 interior nodes being entered in the argumenthazardnodes
. In the presence of competing events, a vector of hazardsshould be provided such as hazard=c("Weibull","splines"
with 2 causesof event, the first one modelled by a Weibull baseline cause-specific riskfunction and the second one by splines.splines
or piecewise
is specified for the baseline hazardfunction in hazard
.hlme
). Other possibilities include "beta" for estimating a linkfunction from the family of Beta cumulative distribution functions,"thresholds" for using a threshold model to describe the correspondencebetween each level of an ordinal outcome and the underlying latent process,and "Splines" for approximating the link function by I-splines. For thislatter case, the number of nodes and the nodes location should be alsospecified. The number of nodes is first entered followed by -
, thenthe location is specified with "equi", "quant" or "manual" for respectivelyequidistant nodes, nodes at quantiles of the marker distribution or interiornodes entered manually in argument intnodes
. It is followed by-
and finally "splines" is indicated. For example, "7-equi-splines"means I-splines with 7 equidistant nodes, "6-quant-splines" means I-splineswith 6 nodes located at the quantiles of the marker distribution and"9-manual-splines" means I-splines with 9 nodes, the vector of 7 interiornodes being entered in the argument intnodes
.
optional data frame containing the variables named infixed
, mixture
, random
, classmb
andsubject
.
optional specification for the initial values for the parameters.Three options are allowed: (1) a vector of initial values is entered (theorder in which the parameters are included is detailed in details
section). (2) nothing is specified. A preliminary analysis involving theestimation of a standard linear mixed model is performed to choose initialvalues. (3) when ng>1, a Jointlcmm object is entered. It should correspondto the exact same structure of model but with ng=1. The program willautomatically generate initial values from this model. This specificationavoids the preliminary analysis indicated in (2) Note that due to possiblelocal maxima, the B
vector should be specified and several differentstarting points should be tried.
optional threshold for the convergence criterion based on theparameter stability. By default, convB=0.0001.
optional threshold for the convergence criterion based on thelog-likelihood stability. By default, convL=0.0001.
optional threshold for the convergence criterion based on thederivatives. By default, convG=0.0001.
optional maximum number of iterations for the Marquardtiterative algorithm. By default, maxiter=150.
optional number of points for the predicted survival curves andpredicted baseline risk curves. By default, nsim=100.
optional name of a covariate containing a prior informationabout the latent class membership. The covariate should be an integer withvalues in 0,1,...,ng. Value O indicates no prior for the subject while avalue in 1,...,ng indicates that the subject belongs to the correspondinglatent class.
optional vector specifying the names of the covariates containing theprior probabilities to belong to each latent class. These probabilities should bebetween 0 and 1 and should sum up to 1 for each subject.
optional boolean indicating whether an exponential(logscale=TRUE) or a square (logscale=FALSE -by default) transformation isused to ensure positivity of parameters in the baseline risk functions. Seedetails section
a specification of the rows to be used: defaults to all rows.This can be any valid indexing vector for the rows of data or if that is notsupplied, a data frame made up of the variable used in formula.
Integer indicating how NAs are managed. The default is 1for 'na.omit'. The alternative is 2 for 'na.fail'. Other options such as'na.pass' or 'na.exclude' are not implemented in the current version.
Optional vector specifying the indices in vector B of theparameters that should not be estimated. Default to NULL, all parameters areestimated.
optional logical for Piecewise and Splines baseline riskfunctions and Splines link functions only. Indicates whether the parameters of thebaseline risk or link functions can be dropped from the Hessian matrix todefine convergence criteria.
logical indicating if information about computation should bereported. Default to TRUE.
logical indicating if data used for computation should bereturned. Default to FALSE, data are not returned.
optional character indicating the name of the time variable.
the number cores for parallel computation.Default to 1 (sequential mode).
optional character indicating the type of cluster for parallel computation.
Author
Cecile Proust Lima, Amadou Diakite and Viviane Philipps cecile.proust-lima@inserm.fr
Details
A. BASELINE RISK FUNCTIONS
For the baseline risk functions, the following parameterizations wereconsidered. Be careful, parametrisations changed in lcmm_V1.5:
1. With the "Weibull" function: 2 parameters are necessary w_1 and w_2 sothat the baseline risk function a_0(t) = w_1^2*w_2^2*(w_1^2*t)^(w_2^2-1) iflogscale=FALSE and a_0(t) = exp(w_1)*exp(w_2)(t)^(exp(w_2)-1) iflogscale=TRUE.
2. with the "piecewise" step function and nz nodes (y_1,...y_nz), nz-1parameters are necesssary p_1,...p_nz-1 so that the baseline risk functiona_0(t) = p_j^2 for y_j < t =< y_j+1 if logscale=FALSE and a_0(t) = exp(p_j)for y_j < t =< y_j+1 if logscale=TRUE.
3. with the "splines" function and nz nodes (y_1,...y_nz), nz+2 parametersare necessary s_1,...s_nz+2 so that the baseline risk function a_0(t) =sum_j s_j^2 M_j(t) if logscale=FALSE and a_0(t) = sum_j exp(s_j) M_j(t) iflogscale=TRUE where M_j is the basis of cubic M-splines.
Two parametrizations of the baseline risk function are proposed(logscale=TRUE or FALSE) because in some cases, especially when theinstantaneous risks are very close to 0, some convergence problems mayappear with one parameterization or the other. As a consequence, werecommend to try the alternative parameterization (changing logscale option)when a joint latent class model does not converge (maximum number ofiterations reached) where as convergence criteria based on the parametersand likelihood are small.
B. THE VECTOR OF PARAMETERS B
The parameters in the vector of initial values B
or in the vector ofmaximum likelihood estimates best
are included in the followingorder: (1) ng-1 parameters are required for intercepts in the latent classmembership model, and if covariates are included in classmb
, ng-1parameters should be entered for each one; (2) parameters for the baselinerisk function: 2 parameters for each Weibull, nz-1 for each piecewiseconstant risk and nz+2 for each splines risk; this number should bemultiplied by ng if specific hazard is specified; otherwise, ng-1 additionalproportional effects are expected if PH hazard is specified; otherwisenothing is added if common hazard is specified. In the presence of competingevents, the number of parameters should be adapted to the number of causesof event; (3) for all covariates in survival
, ng parameters arerequired if the covariate is inside a mixture()
, otherwise 1parameter is required. Covariates parameters should be included in the sameorder as in survival
. In the presence of cause-specific effects, thenumber of parameters should be multiplied by the number of causes; (4) forall covariates in fixed
, one parameter is required if the covariateis not in mixture
, ng parameters are required if the covariate isalso in mixture
. Parameters should be included in the same order asin fixed
; (5) the variance of each random-effect specified inrandom
(including the intercept) if idiag=TRUE
and theinferior triangular variance-covariance matrix of all the random-effects ifidiag=FALSE
; (6) only if nwg=TRUE
, ng-1 parameters forclass-specific proportional coefficients for the variance covariance matrixof the random-effects; (7) the variance of the residual error.
C. CAUTION
Some caution should be made when using the program:
(1) As the log-likelihood of a latent class model can have multiple maxima,a careful choice of the initial values is crucial for ensuring convergencetoward the global maximum. The program can be run without entering thevector of initial values (see point 2). However, we recommend tosystematically enter initial values in B
and try different sets ofinitial values.
(2) The automatic choice of initial values that we provide requires theestimation of a preliminary linear mixed model. The user should be awarethat first, this preliminary analysis can take time for large datatsets andsecond, that the generated initial values can be very not likely and evenmay converge slowly to a local maximum. This is a reason why severalalternatives exist. The vector of initial values can be directly specifiedin B
the initial values can be generated (automatically or randomly)from a model with ng=
. Finally, function gridsearch
performsan automatic grid search.
(3) Convergence criteria are very strict as they are based on derivatives ofthe log-likelihood in addition to the parameter and log-likelihoodstability. In some cases, the program may not converge and reach themaximum number of iterations fixed at 150. In this case, the user shouldcheck that parameter estimates at the last iteration are not on theboundaries of the parameter space. If the parameters are on the boundariesof the parameter space, the identifiability of the model is critical. Thismay happen especially when baseline risk functions involve splines (valueclose to the lower boundary - 0 with logscale=F -infinity with logscale=F)or classmb parameters that are too high or low (perfect classification) orlinkfunction parameters. When identifiability of some parameters issuspected, the program can be run again from the former estimates by fixingthe suspected parameters to their value with option posfix. This usuallysolves the problem. An alternative is to remove the parameters of the Betaof Splines link function from the inverse of the Hessian with optionpartialH. If not, the program should be run again with other initialvalues. Some problems of convergence may happen when the instantaneousrisks of event are very low and "piecewise" or "splines" baseline riskfunctions are specified. In this case, changing the parameterization of thebaseline risk functions with option logscale is recommended (see paragraph Afor details).
References
Proust-Lima C, Philipps V, Liquet B (2017). Estimation of Extended Mixed Models Using Latent Classes and Latent Processes: The R Package lcmm. Journal of Statistical Software, 78(2), 1-56. doi:10.18637/jss.v078.i02
Lin, H., Turnbull, B. W., McCulloch, C. E. and Slate, E. H. (2002). Latentclass models for joint analysis of longitudinal biomarker and event processdata: application to longitudinal prostate-specific antigen readings andprostate cancer. Journal of the American Statistical Association 97, 53-65.
Proust-Lima, C. and Taylor, J. (2009). Development and validation of adynamic prognostic tool for prostate cancer recurrence using repeatedmeasures of post-treatment PSA: a joint modelling approach. Biostatistics10, 535-49.
Jacqmin-Gadda, H. and Proust-Lima, C. (2010). Score test for conditionalindependence between longitudinal outcome and time-to-event given theclasses in the joint latent class model. Biometrics 66(1), 11-9
Proust-Lima, Sene, Taylor and Jacqmin-Gadda (2014). Joint latent classmodels of longitudinal and time-to-event data: a review. Statistical Methodsin Medical Research 23, 74-90.
See Also
postprob
, plot.Jointlcmm
,plot.predict
, epoce