Example data: 634 older adults (age 80-100) self-reporting on 7 items assessing the Instrumental Activities of Daily Living (IADL) as follows:
Two versions of a response format were available:
Binary -> 0 = “needs help”, 1 = “does not need help”
Categorical -> 0 = “can’t do it”, 1=”big problems”, 2=”some problems”, 3=”no problems”
Higher scores indicate greater function. We will look at each response format in turn.
if (require(lavaan) == FALSE){
install.packages("lavaan")
}
library(lavaan)
The data are in a text file named adl.dat
orignally used in Mplus (so no column names were included at the top of the file). The file contains more items than we will use, so we select only items above from the whole file.
#read in data file (Mplus file format so having to label columns)
adlData = read.table(file = "adl.dat", header = FALSE, na.strings = ".", col.names = c("case", paste0("dpa", 1:14), paste0("dia", 1:7), paste0("cpa", 1:14), paste0("cia", 1:7)))
#select Situations items and PersonID variables
iadlDataInit = adlData[c(paste0("cia", 1:7))]
#remove cases with all items missing
removeCases = which(apply(X = iadlDataInit, MARGIN = 1, FUN = function (x){ length(which(is.na(x)))}) == 7)
iadlData = iadlDataInit[-removeCases,]
grm2Psyntax = "
IADL =~ cia1 + cia2 + cia3 + cia4 + cia5 + cia6 + cia7
"
grm2Pestimates = cfa(model = grm2Psyntax, data = iadlData, ordered = c("cia1", "cia2", "cia3", "cia4", "cia5", "cia6", "cia7"), std.lv = TRUE, parameterization="theta")
lavaan WARNING: 18 bivariate tables have empty cells; to see them, use:
lavInspect(fit, "zero.cell.tables")
summary(grm2Pestimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after 105 iterations
Used Total
Number of observations 546 634
Estimator DWLS Robust
Minimum Function Test Statistic 40.866 88.144
Degrees of freedom 14 14
P-value (Chi-square) 0.000 0.000
Scaling correction factor 0.483
Shift parameter 3.497
for simple second-order correction (Mplus variant)
Model test baseline model:
Minimum Function Test Statistic 48788.234 24684.141
Degrees of freedom 21 21
P-value 0.000 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.999 0.997
Tucker-Lewis Index (TLI) 0.999 0.995
Robust Comparative Fit Index (CFI) NA
Robust Tucker-Lewis Index (TLI) NA
Root Mean Square Error of Approximation:
RMSEA 0.059 0.099
90 Percent Confidence Interval 0.039 0.081 0.079 0.119
P-value RMSEA <= 0.05 0.211 0.000
Robust RMSEA NA
90 Percent Confidence Interval NA NA
Standardized Root Mean Square Residual:
SRMR 0.027 0.027
Weighted Root Mean Square Residual:
WRMR 0.986 0.986
Parameter Estimates:
Information Expected
Standard Errors Robust.sem
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
IADL =~
cia1 3.450 0.301 11.470 0.000 3.450 0.960
cia2 3.620 0.507 7.139 0.000 3.620 0.964
cia3 3.026 0.292 10.361 0.000 3.026 0.949
cia4 3.310 0.300 11.030 0.000 3.310 0.957
cia5 2.360 0.190 12.412 0.000 2.360 0.921
cia6 1.943 0.182 10.680 0.000 1.943 0.889
cia7 1.044 0.135 7.724 0.000 1.044 0.722
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.cia1 0.000 0.000 0.000
.cia2 0.000 0.000 0.000
.cia3 0.000 0.000 0.000
.cia4 0.000 0.000 0.000
.cia5 0.000 0.000 0.000
.cia6 0.000 0.000 0.000
.cia7 0.000 0.000 0.000
IADL 0.000 0.000 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
cia1|t1 -4.946 0.385 -12.838 0.000 -4.946 -1.377
cia1|t2 -3.636 0.324 -11.233 0.000 -3.636 -1.012
cia1|t3 -0.900 0.219 -4.111 0.000 -0.900 -0.251
cia2|t1 -5.502 0.658 -8.356 0.000 -5.502 -1.465
cia2|t2 -4.797 0.605 -7.928 0.000 -4.797 -1.277
cia2|t3 -2.973 0.463 -6.421 0.000 -2.973 -0.792
cia3|t1 -4.426 0.362 -12.226 0.000 -4.426 -1.389
cia3|t2 -3.702 0.331 -11.175 0.000 -3.702 -1.162
cia3|t3 -2.050 0.262 -7.821 0.000 -2.050 -0.643
cia4|t1 -4.490 0.355 -12.647 0.000 -4.490 -1.298
cia4|t2 -3.100 0.282 -10.977 0.000 -3.100 -0.897
cia4|t3 -1.098 0.225 -4.881 0.000 -1.098 -0.317
cia5|t1 -4.098 0.281 -14.607 0.000 -4.098 -1.599
cia5|t2 -2.195 0.190 -11.578 0.000 -2.195 -0.856
cia5|t3 -0.618 0.151 -4.089 0.000 -0.618 -0.241
cia6|t1 -3.605 0.255 -14.125 0.000 -3.605 -1.650
cia6|t2 -2.558 0.203 -12.628 0.000 -2.558 -1.171
cia6|t3 -1.531 0.171 -8.974 0.000 -1.531 -0.701
cia7|t1 -3.311 0.268 -12.345 0.000 -3.311 -2.291
cia7|t2 -2.695 0.199 -13.556 0.000 -2.695 -1.864
cia7|t3 -1.788 0.145 -12.304 0.000 -1.788 -1.237
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.cia1 1.000 1.000 0.078
.cia2 1.000 1.000 0.071
.cia3 1.000 1.000 0.098
.cia4 1.000 1.000 0.084
.cia5 1.000 1.000 0.152
.cia6 1.000 1.000 0.209
.cia7 1.000 1.000 0.479
IADL 1.000 1.000 1.000
Scales y*:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
cia1 0.278 0.278 1.000
cia2 0.266 0.266 1.000
cia3 0.314 0.314 1.000
cia4 0.289 0.289 1.000
cia5 0.390 0.390 1.000
cia6 0.458 0.458 1.000
cia7 0.692 0.692 1.000
R-Square:
Estimate
cia1 0.922
cia2 0.929
cia3 0.902
cia4 0.916
cia5 0.848
cia6 0.791
cia7 0.521
Latent Variables: Estimate Std.Err z-value P(>|z|) Std.lv Std.all IADL =~
cia1 3.450 0.301 11.470 0.000 3.450 0.960 cia2 3.620 0.507 7.139 0.000 3.620 0.964 cia3 3.026 0.292 10.361 0.000 3.026 0.949 cia4 3.310 0.300 11.030 0.000 3.310 0.957 cia5 2.360 0.190 12.412 0.000 2.360 0.921 cia6 1.943 0.182 10.680 0.000 1.943 0.889 cia7 1.044 0.135 7.724 0.000 1.044 0.722
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
cia1|t1 -4.946 0.385 -12.838 0.000 -4.946 -1.377
cia1|t2 -3.636 0.324 -11.233 0.000 -3.636 -1.012
cia1|t3 -0.900 0.219 -4.111 0.000 -0.900 -0.251
cia2|t1 -5.502 0.658 -8.356 0.000 -5.502 -1.465
cia2|t2 -4.797 0.605 -7.928 0.000 -4.797 -1.277
cia2|t3 -2.973 0.463 -6.421 0.000 -2.973 -0.792
cia3|t1 -4.426 0.362 -12.226 0.000 -4.426 -1.389
cia3|t2 -3.702 0.331 -11.175 0.000 -3.702 -1.162
cia3|t3 -2.050 0.262 -7.821 0.000 -2.050 -0.643
cia4|t1 -4.490 0.355 -12.647 0.000 -4.490 -1.298
cia4|t2 -3.100 0.282 -10.977 0.000 -3.100 -0.897
cia4|t3 -1.098 0.225 -4.881 0.000 -1.098 -0.317
cia5|t1 -4.098 0.281 -14.607 0.000 -4.098 -1.599
cia5|t2 -2.195 0.190 -11.578 0.000 -2.195 -0.856
cia5|t3 -0.618 0.151 -4.089 0.000 -0.618 -0.241
cia6|t1 -3.605 0.255 -14.125 0.000 -3.605 -1.650
cia6|t2 -2.558 0.203 -12.628 0.000 -2.558 -1.171
cia6|t3 -1.531 0.171 -8.974 0.000 -1.531 -0.701
cia7|t1 -3.311 0.268 -12.345 0.000 -3.311 -2.291
cia7|t2 -2.695 0.199 -13.556 0.000 -2.695 -1.864
cia7|t3 -1.788 0.145 -12.304 0.000 -1.788 -1.237
Logit = 1.7*probit, or Probit = Logit/1.7
IFA model: Probit(y=1) = –threshold + loading(Theta)
Threshold = expected probit of (y=0) for someone with Theta=0
When *-1, threshold \(\rightarrow\) intercept: expected probit for (y=1) instead
Loading = regression of item probit on Theta
For 4-category responses, the sub-models look like this: Probit(y= 0 vs 123) = -threshold$1 + loading(Theta) Probit(y= 01 vs 23) = -threshold$2 + loading(Theta) Probit y= 012 vs 3) = -threshold$3 + loading(Theta)
IRT model: Probit(y) = a(theta – difficulty) a = discrimination (rescaled slope) = loading b = difficulty (location on latent metric) = threshold/loading
For 4-category responses, the sub-models look like this: $1 Probit(y= 0 vs 123) = a(theta – difficulty$1) $2 Probit(y= 01 vs 23) = a(theta – difficulty$2) $3 Probit(y= 012 vs 3) = a(theta – difficulty$3)
LOCAL FIT VIA RAW RESIDUAL CORRELATIONS LEFTOVER POLYCHORIC CORRELATION (HOW FAR OFF FROM DATA)
With raw residuals, the scale is that of correlation (different from normalized)
resid(object = grm2Pestimates)
$type
[1] "raw"
$cov
cia1 cia2 cia3 cia4 cia5 cia6 cia7
cia1 0.000
cia2 0.011 0.000
cia3 0.011 0.019 0.000
cia4 -0.004 -0.029 -0.031 0.000
cia5 -0.032 -0.045 -0.073 0.030 0.000
cia6 -0.034 -0.041 -0.032 0.016 0.038 0.000
cia7 -0.033 -0.012 0.008 0.026 -0.024 0.032 0.000
$mean
cia1 cia2 cia3 cia4 cia5 cia6 cia7
0 0 0 0 0 0 0
$th
cia1|t1 cia1|t2 cia1|t3 cia2|t1 cia2|t2 cia2|t3 cia3|t1 cia3|t2 cia3|t3 cia4|t1 cia4|t2 cia4|t3 cia5|t1 cia5|t2 cia5|t3 cia6|t1 cia6|t2
4.442409e-07 2.380854e-07 -5.151503e-07 5.839083e-07 4.231467e-07 -5.816415e-07 8.124267e-08 -2.486851e-07 4.924815e-07 -8.218725e-08 1.260747e-07 2.750163e-07 -1.837495e-08 -2.242499e-08 1.384602e-07 -4.543028e-08 1.239583e-07
cia6|t3 cia7|t1 cia7|t2 cia7|t3
-2.758846e-07 4.511172e-07 2.015248e-07 1.683076e-07
lavaanCatItemPlot = function(lavObject, varname, sds = 3){
output = inspect(object = lavObject, what = "est")
if (!varname %in% rownames(output$lambda)) stop(paste(varname, "not found in lavaan object"))
if (dim(output$lambda)[2]>1) stop("plots only given for one factor models")
itemloading = output$lambda[which(rownames(output$lambda) == varname),1]
itemthresholds = output$tau[grep(pattern = varname, x = rownames(output$tau))]
factorname = colnames(output$lambda)
factormean = output$alpha[which(rownames(output$alpha) == factorname)]
factorvar = output$psi[which(rownames(output$psi) == factorname)]
factormin = factormean - 3*sqrt(factorvar)
factormax = factormean + 3*sqrt(factorvar)
factorX = seq(factormin, factormax, .01)
itemloc = which(lavObject@Data@ov$name == varname)
itemlevels = unlist(strsplit(x = lavObject@Data@ov$lnam[itemloc], split = "\\|"))
if (length(itemthresholds)>1){
plotdata = NULL
plotdata2 = NULL
itemY = NULL
itemY2 = NULL
itemX = NULL
itemText = NULL
for (level in 1:length(itemthresholds)){
itemY = pnorm(q = -1*itemthresholds[level] + itemloading*factorX)
itemY2 = cbind(itemY2, pnorm(q = -1*itemthresholds[level] + itemloading*factorX))
itemText = paste0("P(", varname, " > ", itemlevels[level], ")")
itemText2 = paste0("P(", varname, " = ", itemlevels[level], ")")
plotdata = rbind(plotdata, data.frame(factor = factorX, prob = itemY, plot = itemText))
if (level == 1){
plotdata2 = data.frame(factor = factorX, plot = itemText2, prob = matrix(1, nrow = dim(itemY2)[1], ncol=1) - itemY2[,level])
} else if (level == length(itemthresholds)){
plotdata2 = rbind(plotdata2, data.frame(factor = factorX, plot = itemText2, prob = itemY2[,level-1] - itemY2[,level]))
plotdata2 = rbind(plotdata2, data.frame(factor = factorX, plot = paste0("P(", varname, " = ", itemlevels[level+1], ")"), prob = itemY2[,level]))
} else {
plotdata2 = rbind(plotdata2, data.frame(factor = factorX, plot = itemText2, prob = itemY2[,level-1] - itemY2[,level]))
}
}
names(plotdata) = c(factorname , "Probability", "Cumulative")
ggplot(data = plotdata, aes_string(x = factorname, y = "Probability", colour = "Cumulative")) + geom_line(size = 2)
names(plotdata2) = c(factorname, "Response", "Probability")
ggplot(data = plotdata2, aes_string(x = factorname, y = "Probability", colour = "Response")) + geom_line(size = 2)
} else {
itemY = pnorm(q = -1*itemthresholds[1] + itemloading*factorX)
itemText2 = paste0("P(", varname, " = ", itemlevels[1], ")")
plotdata = data.frame(factor = factorX, prob = itemY, plot = itemText2)
names(plotdata) = c(factorname , "Probability", "Response")
ggplot(data = plotdata, aes_string(x = factorname, y = "Probability", colour = "Response")) + geom_line(size = 2)
}
}
lavaanCatItemPlot(lavObject = grm2Pestimates, varname = "cia2", sds = 3)
We can also set the item discriminations equal for a 1PL-like model:
grm1Psyntax = "
IADL =~ a*cia1 + a*cia2 + a*cia3 + a*cia4 + a*cia5 + a*cia6 + a*cia7
"
grm1Pestimates = cfa(model = grm1Psyntax, data = iadlData, ordered = c("cia1", "cia2", "cia3", "cia4", "cia5", "cia6", "cia7"), std.lv = TRUE, parameterization="theta")
lavaan WARNING: 18 bivariate tables have empty cells; to see them, use:
lavInspect(fit, "zero.cell.tables")
summary(grm1Pestimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after 32 iterations
Used Total
Number of observations 546 634
Estimator DWLS Robust
Minimum Function Test Statistic 196.654 181.733
Degrees of freedom 20 20
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.128
Shift parameter 7.402
for simple second-order correction (Mplus variant)
Model test baseline model:
Minimum Function Test Statistic 48788.234 24684.141
Degrees of freedom 21 21
P-value 0.000 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.996 0.993
Tucker-Lewis Index (TLI) 0.996 0.993
Robust Comparative Fit Index (CFI) NA
Robust Tucker-Lewis Index (TLI) NA
Root Mean Square Error of Approximation:
RMSEA 0.127 0.122
90 Percent Confidence Interval 0.111 0.144 0.106 0.138
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA NA
90 Percent Confidence Interval NA NA
Standardized Root Mean Square Residual:
SRMR 0.105 0.105
Weighted Root Mean Square Residual:
WRMR 2.164 2.164
Parameter Estimates:
Information Expected
Standard Errors Robust.sem
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
IADL =~
cia1 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia2 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia3 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia4 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia5 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia6 (a) 2.852 0.154 18.566 0.000 2.852 0.944
cia7 (a) 2.852 0.154 18.566 0.000 2.852 0.944
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.cia1 0.000 0.000 0.000
.cia2 0.000 0.000 0.000
.cia3 0.000 0.000 0.000
.cia4 0.000 0.000 0.000
.cia5 0.000 0.000 0.000
.cia6 0.000 0.000 0.000
.cia7 0.000 0.000 0.000
IADL 0.000 0.000 0.000
Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
cia1|t1 -4.161 0.197 -21.096 0.000 -4.161 -1.377
cia1|t2 -3.059 0.199 -15.404 0.000 -3.059 -1.012
cia1|t3 -0.757 0.173 -4.382 0.000 -0.757 -0.251
cia2|t1 -4.428 0.192 -23.062 0.000 -4.428 -1.465
cia2|t2 -3.860 0.189 -20.430 0.000 -3.860 -1.277
cia2|t3 -2.392 0.197 -12.172 0.000 -2.392 -0.792
cia3|t1 -4.197 0.193 -21.701 0.000 -4.197 -1.389
cia3|t2 -3.510 0.198 -17.755 0.000 -3.510 -1.162
cia3|t3 -1.943 0.188 -10.340 0.000 -1.943 -0.643
cia4|t1 -3.924 0.198 -19.846 0.000 -3.924 -1.298
cia4|t2 -2.709 0.195 -13.921 0.000 -2.709 -0.897
cia4|t3 -0.959 0.175 -5.485 0.000 -0.959 -0.317
cia5|t1 -4.831 0.259 -18.655 0.000 -4.831 -1.599
cia5|t2 -2.587 0.197 -13.150 0.000 -2.587 -0.856
cia5|t3 -0.728 0.170 -4.274 0.000 -0.728 -0.241
cia6|t1 -4.987 0.262 -19.018 0.000 -4.987 -1.650
cia6|t2 -3.537 0.206 -17.180 0.000 -3.537 -1.171
cia6|t3 -2.117 0.189 -11.208 0.000 -2.117 -0.701
cia7|t1 -6.922 0.474 -14.594 0.000 -6.922 -2.291
cia7|t2 -5.634 0.332 -16.985 0.000 -5.634 -1.864
cia7|t3 -3.738 0.249 -15.017 0.000 -3.738 -1.237
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.cia1 1.000 1.000 0.110
.cia2 1.000 1.000 0.110
.cia3 1.000 1.000 0.110
.cia4 1.000 1.000 0.110
.cia5 1.000 1.000 0.110
.cia6 1.000 1.000 0.110
.cia7 1.000 1.000 0.110
IADL 1.000 1.000 1.000
Scales y*:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
cia1 0.331 0.331 1.000
cia2 0.331 0.331 1.000
cia3 0.331 0.331 1.000
cia4 0.331 0.331 1.000
cia5 0.331 0.331 1.000
cia6 0.331 0.331 1.000
cia7 0.331 0.331 1.000
R-Square:
Estimate
cia1 0.890
cia2 0.890
cia3 0.890
cia4 0.890
cia5 0.890
cia6 0.890
cia7 0.890
Similarly, we can use the anova()
function to compare model fit:
anova(grm1Pestimates, grm2Pestimates)
Scaled Chi Square Difference Test (method = "satorra.2000")
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
grm2Pestimates 14 40.866
grm1Pestimates 20 196.654 67.889 3.9116 5.508e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1