Structural equation modeling (SEM) is a general modeling framework that incorporates many common statistical methods, including regression, analysis of variance (ANOVA), confirmatory factor analysis, and simultaneous equations.[35] SEM has advanced to include the simultaneous estimation of generalized linear equations (GSEM), such as logistic, Poisson, and Cox proportional hazards regression models.[36] GSEM offers several advantages over traditional analytic methodology. First, as demonstrated in the present analysis, it allows for the estimation of multiple equations simultaneously, so that associations between multiple predictor and outcome variables can be assessed in the same model, even when the distribution of outcome measures varies from dichotomous (e.g., VI), to ordinal (e.g., self-rated health), to Poisson (e.g., disability indicators), to continuous, to time-dependent (e.g., mortality) events. Second, constructs such as disability can be estimated net of random measurement error. Theoretically meaningful constructs can be developed by using latent variables (or factors), which are unobserved variables that are indirectly measured by multiple observed variables through a system of equations. Heuristically, the multiple observed variables are optimally combined into a composite representing the latent variable. This method has great potential for improving the measurement quality of health data collected in surveys, given the ability to adjust for random measurement error. Third, GSEM provides a powerful tool for the assessment of mediation effects (indirect pathways). Mediation is estimated and tested in a single step, with potentially more statistical power than traditional multistep methods.[37] Finally, GSEM software (e.g., M-Plus[38]) can incorporate sample weights and the complex sample survey design (clustering and stratification) into the analysis. This advance permits the appropriate application of GSEM to complex sample survey data including the NHIS.
The GSEM depicted in Figure 1 was fit to the data with the aforementioned additional covariate controls.[36] The equations for the VI and self-rated health outcomes are logistic regressions, that for the disability latent variable outcome is an ordinary linear regression, and that for the mortality outcome is a Cox proportional hazards regression. The disability latent factor is continuous, but the paths linking disability to the variables days in bed and days of restricted activity are Poisson. The disability latent variable combines these variables while removing the random measurement error from each. All equations are estimated simultaneously by using a weighted maximum-likelihood estimator,[39] with standard errors that are corrected for all complex sample design features of the NHIS.[40]
First, the model was estimated without the mortality outcome, to obtain traditional SEM fit statistics (model 1).[35,41,42] Second, the mortality outcome was added, and estimates were obtained from this model (model 2). Third, model 2 was re-estimated with the ordinal self-rated health variable treated as continuous, to calculate indirect effects on mortality through self-rated health. Results from this model were identical to those from model 2, in which self-rated health was treated as ordinal, with the exception of a slight change in the effects of nonocular conditions on disability and the change from logit to linear parameters for the self-rated health outcome. Finally, the mortality outcome was rescaled from time intervals in days to time intervals in years, to obtain baseline hazard estimates (model 3). Results from model 3 were the same as those from model 2 for all outcomes except mortality. Results from the mortality equations were substantively the same in size and significance of effects.
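The rescaling step can be illustrated with a minimal sketch. It assumes a constant (exponential) hazard in each group as a simple stand-in for the Cox model used in the study, and the event counts and follow-up totals are hypothetical, not study data. Under these assumptions, changing the time unit from days to years multiplies the baseline hazard by the number of days per year and leaves the log hazard ratio unchanged.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed conversion factor for this illustration

# Hypothetical (deaths, total follow-up in days) per group; not study data.
groups_days = {"no_VI": (120, 900_000.0), "VI": (45, 200_000.0)}


def constant_hazard(events, total_time):
    """Maximum-likelihood estimate of a constant hazard: events per unit time."""
    return events / total_time


def summarize(time_scale):
    """Return (baseline hazard, log hazard ratio) after dividing time by time_scale."""
    rates = {
        group: constant_hazard(events, days / time_scale)
        for group, (events, days) in groups_days.items()
    }
    log_hr = math.log(rates["VI"] / rates["no_VI"])
    return rates["no_VI"], log_hr


baseline_per_day, log_hr_days = summarize(1.0)
baseline_per_year, log_hr_years = summarize(DAYS_PER_YEAR)
```

In this simplified setting only the baseline hazard changes with the time unit, which is consistent with the mortality effects being substantively the same in models 2 and 3.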
Indirect or mediation effects are calculated by multiplying the two parameters involved in the mediation relationship.[35,43] For example, to obtain the effect of VI on mortality through the disability mechanism, the raw coefficient for the effect of VI on disability is multiplied by the raw coefficient for the effect of disability on mortality. The resulting product is exponentiated to obtain the hazard ratio (HR) for the indirect effect. Total effects are the effects of the independent variable (VI) on mortality through all pathways, including the direct pathway and all mediation pathways, after adjustment for model covariates. Total effects are calculated by summing the raw coefficients for the direct effects and the indirect effects. Standard errors for both indirect and total effects were obtained by using the delta method.[44]
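The arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the software's own routine: the coefficient values are hypothetical, the function names are invented for this example, and the covariance between the two path estimates is set to zero unless supplied. The standard error uses the first-order delta method for a product of two coefficients.

```python
import math


def indirect_effect(a, b, var_a, var_b, cov_ab=0.0):
    """Product-of-coefficients indirect effect with a first-order delta-method SE.

    a: raw coefficient for predictor -> mediator (e.g., VI -> disability)
    b: raw coefficient for mediator -> outcome (e.g., disability -> log hazard)
    """
    estimate = a * b
    # The gradient of a*b with respect to (a, b) is (b, a).
    variance = b**2 * var_a + a**2 * var_b + 2.0 * a * b * cov_ab
    return estimate, math.sqrt(variance)


def to_hazard_ratio(log_hr):
    """Exponentiate a log-hazard coefficient to obtain a hazard ratio."""
    return math.exp(log_hr)


# Hypothetical raw coefficients and standard errors (not study results).
a, se_a = 0.50, 0.10   # VI -> disability
b, se_b = 0.20, 0.05   # disability -> mortality (log hazard)
direct = 0.10          # VI -> mortality, direct path (log hazard)

indirect, se_indirect = indirect_effect(a, b, se_a**2, se_b**2)
total = direct + indirect  # with several mediators, sum every indirect path

hr_indirect = to_hazard_ratio(indirect)
hr_total = to_hazard_ratio(total)
```

A standard error for the total effect would additionally require the covariances between the direct coefficient and the two path coefficients, which this sketch does not assume.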
Descriptive and model-based analyses were completed with adjustments for sample weights and design effects (SUDAAN 9.0, 2004;[45] Research Triangle Institute, Research Triangle Park, NC, and M-Plus 4.21[38] statistical packages, respectively).