CFA Example Using Forgiveness of Situations (N = 1103)

The Forgiveness of Situations Subscale includes six items, three of which are reverse-coded, on a seven-point scale:

  1. When things go wrong for reasons that can’t be controlled, I get stuck in negative thoughts about it. (R)
  2. With time I can be understanding of bad circumstances in my life.
  3. If I am disappointed by uncontrollable circumstances in my life, I continue to think negatively about them. (R)
  4. I eventually make peace with bad situations in my life.
  5. It’s really hard for me to accept negative situations that aren’t anybody’s fault. (R)
  6. Eventually I let go of negative thoughts about bad circumstances that are beyond anyone’s control.

Response Anchors:

Package Installation and Loading

For this example, we will be using the ggplot2 (for general plotting), melt2 (for data reshaping before plotting), and lavaan (for CFA) packages.

if (require(ggplot2)==FALSE){
  install.packages("ggplot2")
}
Loading required package: ggplot2
library(ggplot2)
if (require(reshape2) == FALSE){
  install.packages("reshape2")
}
Loading required package: reshape2
if (require(lavaan) == FALSE){
  install.packages("lavaan")
}
Loading required package: lavaan
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.
library(lavaan)
if (require(psych) == FALSE){
  install.packages("psych")
}
Loading required package: psych

Attaching package: ‘psych’

The following object is masked from ‘package:lavaan’:

    cor2cov

The following objects are masked from ‘package:ggplot2’:

    %+%, alpha
if (require(knitr) == FALSE){
  install.packages("knitr")
}
Loading required package: knitr
if (require(kableExtra) == FALSE){
  install.packages("kableExtra")
}
Loading required package: kableExtra

Data Import into R

The data are in a text file named Study2.dat orignally used in Mplus (so no column names were included at the top of the file). The file contains more items than we will use, so we select only the “Forgiveness of Situations” items from the whole file.

#read in data file (Mplus file format so having to label columns)
hfsData = read.table(file = "Study2.dat", header = FALSE, na.strings = "99999", col.names = c("PersonID", "Self1", "Self2r", "Self3", "Self4r", "Self5", "Self6r", "Other1r", "Other2", "Other3r", "Other4", "Other5r", "Other6", "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6", "Selfsub", "Othsub", "Sitsub", "HFSsum"))
#select Situations items and PersonID variables
hfsSituations = hfsData[c("PersonID", "Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]

Observed Sample Statistics

Sample Correlation Matrix

The observed correlation matrix, rounded to three digits:

#here the c() function selects only the variables, not the PersionID variable
round(cor(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")]), digits = 3)
      Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6
Sit1r 1.000 0.240 0.647 0.300 0.453 0.297
Sit2  0.240 1.000 0.317 0.570 0.255 0.457
Sit3r 0.647 0.317 1.000 0.369 0.482 0.356
Sit4  0.300 0.570 0.369 1.000 0.289 0.448
Sit5r 0.453 0.255 0.482 0.289 1.000 0.304
Sit6  0.297 0.457 0.356 0.448 0.304 1.000

Sample Means and Variances

The observed means, rounded to three digits:

apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(mean(x), digits = 3))
Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6 
4.547 5.289 4.896 5.359 4.860 5.321 

The observed variances (using \(N\) in the denominator to match ML estimated output and Mplus example), rounded to three digits:

apply(X = hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")], MARGIN = 2, FUN = function(x) round(var(x, na.rm = TRUE)*1102/1103, digits = 3))
Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6 
3.049 1.903 2.543 1.967 2.945 2.341 

Sample Covariances

To do a CFA analysis, you only really need means, variances, and either correlations or covariances among items. That said, modern methods of estimation use the raw data (often called full information) rather than the summary statistics as the raw data enable better missing data assumptions when using maximum likelihood and Bayesian estimation methods.

The sample covariance matrix can be found from the sample correlations and variances. Each covariance between a pair of variables \(y_1\) and \(y_2\) is denoted with a \(\sigma_{y_1, y_2}\) and each correlation is denoted with a \(\rho_{y_1, y_2}\). The variance of a variable is denoted by \(\sigma^2_{y_1}\) and the standard deviation of a variable is the square root of the variance \(\sqrt{\sigma^2_{y_1}}\). The covariance can be found by taking the correlation and multiplying it by the product of the standard deviations.

\[\sigma_{y_1, y_2} = \rho_{y_1, y_2}\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}. \]

Inversely, the correlation can be found by taking the covariance and dividing it by the product of the standard deviations:

\[\rho_{y_1, y_2} = \frac{\sigma_{y_1, y_2}}{\sqrt{\sigma^2_{y_1}}\sqrt{\sigma^2_{y_2}}}. \] Again, we change the denominator from \(N-1\) to \(N\) to be consistent with the Mplus example, which calculates covariances using maximum likelihood.

round(cov(hfsSituations[c("Sit1r", "Sit2", "Sit3r", "Sit4", "Sit5r", "Sit6")])*1102/1103, digits = 3)
      Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6
Sit1r 3.049 0.577 1.802 0.734 1.358 0.795
Sit2  0.577 1.903 0.697 1.103 0.604 0.965
Sit3r 1.802 0.697 2.543 0.824 1.319 0.868
Sit4  0.734 1.103 0.824 1.967 0.695 0.962
Sit5r 1.358 0.604 1.319 0.695 2.945 0.798
Sit6  0.795 0.965 0.868 0.962 0.798 2.341

Sample Item Response Distributions

The assumptions of CFA (i.e., normally distributed factors, no item-level factor interactions, and conditionally normal distributed items) lead to the overall assumption that our item responses must be normally distributed. From the histograms below, do you think these are normally distributed?

#stack data
melted = melt(hfsSituations, id.vars = "PersonID")
#plot by variable
ggplot(melted, aes(value)) + geom_bar() + facet_wrap(~ variable)

Lavaan Syntax

lavaan syntax is constructed using a long text string and saved in a character object. In the example below model01SyntaxLong = " opens the text string, which continues for multiple lines until the final " terminates the string. The variable model01Syntax now contains the text of the lavaan syntax. Within the syntax string, the R comment character # still functions to comment text around the syntax. Each part of the model syntax corresponds to a set of parameters in the CFA model. You can find information about lavaan at http://lavaan.ugent.be.

model01SyntaxLong = "
# Model 1 --  Fully Z-Scored Factor Identification Approach
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by using the * command; starting values can be specified by using start()*; labels can be implemented with []
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ; 
Sit1r ~ 1; Sit2 ~ 1; Sit3r  ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6; 
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 0
"

To run lavaan, the syntax string variable is passed to the lavaan(model = model01SyntaxLong, ...) function. In the function call, we also supply:

  • data = hfsSituations: The data frame which contains the variables in the syntax.
  • estimator = "MLR": Enables the use of robust maximum likelihood estimation, which helps for data that are not entirely normal.
  • mimic = "mplus": Ensures compatability with Mplus output, which is often needed in practice (and is certainly needed for our homework system).
  • std.lv = TRUE: Uses the Z-score method of identification, setting all latent variable means to zero, all latent variable variances to one, and estimating all factor loadings.
model01Estimates = lavaan(model = model01SyntaxLong, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)

If the model estimation was successful, no errors would be displayed (and no output would be shown). All model information and results are contained within the model01Estimates object. To access a summary of the results use the summary() function with the following arugments:

  • model01Estimates: The analysis to be summarized.
  • fit.measures=TRUE: Provides model fit indices in summary.
  • rsqaure=TRUE: Provides the proportion of variance “explained” for each variable (\(R^2\)).
  • standardized = TRUE: Provides standardized estimates in the summary.
summary(model01Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  20 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              427.937     307.804
  Degrees of freedom                                 9           9
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.390
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.787       0.732
  Tucker-Lewis Index (TLI)                       0.645       0.553

  Robust Comparative Fit Index (CFI)                         0.787
  Robust Tucker-Lewis Index (TLI)                            0.646

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11536.404  -11536.404
  Scaling correction factor                                  1.416
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         18          18
  Akaike (AIC)                               23108.808   23108.808
  Bayesian (BIC)                             23198.912   23198.912
  Sample-size adjusted Bayesian (BIC)        23141.739   23141.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.205       0.173
  90 Percent Confidence Interval          0.189  0.222       0.160  0.188
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.205
  90 Percent Confidence Interval                             0.185  0.224

Standardized Root Mean Square Residual:

  SRMR                                           0.086       0.086

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Sit =~                                                                
    Sit1r             1.234    0.069   17.906    0.000    1.234    0.707
    Sit2              0.702    0.074    9.441    0.000    0.702    0.509
    Sit3r             1.241    0.063   19.847    0.000    1.241    0.778
    Sit4              0.784    0.069   11.333    0.000    0.784    0.559
    Sit5r             1.023    0.053   19.179    0.000    1.023    0.596
    Sit6              0.819    0.069   11.942    0.000    0.819    0.535

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
    Sit               0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.000                               1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.500
    Sit2              0.259
    Sit3r             0.605
    Sit4              0.313
    Sit5r             0.355
    Sit6              0.286

Each section contains different portions of model information.

Unstandardized Model Parameter Estmates

FACTOR LOADINGS (regression slopes of item response on factor)
Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Sit =~                                                                
    Sit1r             1.234    0.069   17.906    0.000    1.234    0.707
    Sit2              0.702    0.074    9.441    0.000    0.702    0.509
    Sit3r             1.241    0.063   19.847    0.000    1.241    0.778
    Sit4              0.784    0.069   11.333    0.000    0.784    0.559
    Sit5r             1.023    0.053   19.179    0.000    1.023    0.596
    Sit6              0.819    0.069   11.942    0.000    0.819    0.535
Intercepts (of Items) – HERE, ARE ACTUAL ITEM MEANS BECAUSE FACTOR MEAN IS ZERO

Note: the last term in the list, Sit is the “intercept” (mean) of the factor. As it does not have a standard error, this indicates it was fixed to zero and not estimated.

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
    Sit               0.000                               0.000    0.000
Residual (Unique) Variances (variance of error terms)

Note: the last term in the list, Sit, is the variance of the factor. As it does not have a standard error, this indicates it was fixed to one (from the std.lv = TRUE option we used in the lavaan() function call) and not estimated.

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.000                               1.000    1.000

Making use of the unstandardized model estimates:

Writing out the model—individual predicted values:

  • \(Y_1 = \mu_1 + \lambda_1F + e_1 = 4.547 + 1.234F + e_1\)

Writing out the model—predicted item variances and covariances:

  • \(Var\left( Y_1 \right) = \left( \lambda^2_1 \right) Var\left(F\right) + Var\left(e_1\right) = (1.234^2)*(1) + 1.526 = 3.049\) (= original item variance)

  • \(Cov\left( Y_1, Y_2 \right) = \lambda_1 Var\left(F\right) \lambda_2 = (1.234)*(1)*(.702) = .866\) (actual covariance = .577, so the model over-predicted how related items 1 and 2 should be)

Standardized Model Parameter Estimates

The last two columns of the summary output, Std.lv and Std.all, contain the standardized model parameter estimates. To understand the difference between standardized and unstandardized parameter estimates, let’s start with the standardized estimates. Standardized regression coefficients (and factor loadings) have a scale that is “units of Y” per “unit of X”. That is, the slope/loading, represents the increase in the dependent variable \(X\) per unit of the independent variabel \(X\) (in our case, the factor). The units of \(Y\) are given by the standard deviation of \(Y\), or \(SD(Y)\). Similarly, the units of \(X\) are given by the standard deviation of \(X\), or \(SD(X)\). You can think of the units associated by the fraction \(\frac{SD(Y)}{SD(X)}\). So, the first factor loading (a value of 1.234 for the item Sit1r) indicates the numeric response to the item goes up by 1.234 for every one-unit increase in the factor Sit.

The process of standardization removes the units attached to the parameter. So, if the unstandardized factor loadings are expreesed in units of \(\frac{SD(Y)}{SD(X)}\), the standardized units are acheived by either dividing or multiplying by the appropriate standard deviation to make the numerator or denominator of \(\frac{SD(Y)}{SD(X)}\) equal to one. For the standardized estimates under std.lv, the units of the factor \((SD(X))\) are removed (yielding \(\frac{SD(Y)}{1}\)) by multiplying the estimate by the factor standard deviation. Because the factor standard deviation is set to one, all estimates in this column are the same as the unstandardized estimate. These unstandardized estimates can be used to see how parameters would look under the Z-score identifiction method another identification method was used.

The standardized estimates listed under std.all are formed by multiplying the estimate by the standard deviation of the factor and dividing that by the unconditional (raw) standard deviation of the item. For instance, the “fully” standardized factor loading of the first item is found by multiplying the unstandardized coefficient (1.234) by one (the factor standard deviation) and dividing by the item’s standard deviation (\(\sqrt{3.049}\) – the square root of the item variance shown at the beginning of this example). The resulting value, \(1.234\times\frac{1}{\sqrt{3.049}} = 0.707\), represents the factor loading would the analysis have been run (a) with a Z-score factor identification and (b) on an item that was a Z-score.

The process of standardization is the same for all parameters of the model: intercepts, loadings, and residual (unique) variances. The interpretation of standardized item intercepts is difficult and often these are not reported. The standardized versions of factor loadings and unique variances are commonly reported.

Moreover, in for items measuring one factor, the standardized loadings lead directly to the \(R^2\) estimate – the amount of variance in the item responses explained by the factor. The item \(R^2\) is found by the square of the unstandardized factor loading. The R-Square information is found at the end of the model summary, so long as the option rsquare = TRUE is specified in the summary() function call.

R-Square:
                   Estimate
    Sit1r             0.500
    Sit2              0.259
    Sit3r             0.605
    Sit4              0.313
    Sit5r             0.355
    Sit6              0.286

Shortening Lavaan Syntax: Lavaan Default Options

The syntax above is designed to show the mapping of all CFA parameters to all lavaan syntax elements. In reality, all you would need to write to define this model is:

model01SyntaxShort = "
Sit =~ Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
"

The syntax input into the sem() function uses some defaults:

  • All item intercepts are estimated (each item gets its own) and the factor mean is fixed at zero.
  • All item error (unique) variances are estimated (each item gets its own).
  • All factor variances and covariances are estimated (to do so, the first item after the =~ has its factor loading set to 1 – a marker item identification method)

Similarly, to run this syntax the sem fuction is now called. To see the results, the summary function is used. The sem function simplifies the syntax needed to conduct a confirmatory factor analysis. These are demonstrated below. Note the output is identical to what was run previously.

model01EstimatesShort = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model01EstimatesShort, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  20 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              427.937     307.804
  Degrees of freedom                                 9           9
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.390
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.787       0.732
  Tucker-Lewis Index (TLI)                       0.645       0.553

  Robust Comparative Fit Index (CFI)                         0.787
  Robust Tucker-Lewis Index (TLI)                            0.646

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11536.404  -11536.404
  Scaling correction factor                                  1.416
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         18          18
  Akaike (AIC)                               23108.808   23108.808
  Bayesian (BIC)                             23198.912   23198.912
  Sample-size adjusted Bayesian (BIC)        23141.739   23141.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.205       0.173
  90 Percent Confidence Interval          0.189  0.222       0.160  0.188
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.205
  90 Percent Confidence Interval                             0.185  0.224

Standardized Root Mean Square Residual:

  SRMR                                           0.086       0.086

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Sit =~                                                                
    Sit1r             1.234    0.069   17.906    0.000    1.234    0.707
    Sit2              0.702    0.074    9.441    0.000    0.702    0.509
    Sit3r             1.241    0.063   19.847    0.000    1.241    0.778
    Sit4              0.784    0.069   11.333    0.000    0.784    0.559
    Sit5r             1.023    0.053   19.179    0.000    1.023    0.596
    Sit6              0.819    0.069   11.942    0.000    0.819    0.535

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
    Sit               0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.000                               1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.500
    Sit2              0.259
    Sit3r             0.605
    Sit4              0.313
    Sit5r             0.355
    Sit6              0.286

Alternative Identification Methods

There are multiple equivalent ways of getting the same CFA model, but with different scaling for the factor mean and variance (i.e., different means of identification). Now let’s see the model parameters when using the marker item for model identification instead. In the marker item identification method:

  • The first factor loading of a factor (the first variable after the =~ symbol in lavaan syntax) is set to one
  • The factor variance is estimated
  • The factor mean is set to zero
  • All item intercepts and unique variances are estimated (as done in the Z-score identification method)

Factor Mean Zero; Factor Variance Estimated (marker item factor loading)

The lavaan syntax is the same for the marker item identification, but the call to the lavaan() function does not include the std.lv = TRUE option, which defaults to the marker item method of identification:

model02Estimates = sem(model = model01SyntaxShort, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model02Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  28 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              427.937     307.804
  Degrees of freedom                                 9           9
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.390
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.787       0.732
  Tucker-Lewis Index (TLI)                       0.645       0.553

  Robust Comparative Fit Index (CFI)                         0.787
  Robust Tucker-Lewis Index (TLI)                            0.646

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11536.404  -11536.404
  Scaling correction factor                                  1.416
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         18          18
  Akaike (AIC)                               23108.808   23108.808
  Bayesian (BIC)                             23198.912   23198.912
  Sample-size adjusted Bayesian (BIC)        23141.739   23141.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.205       0.173
  90 Percent Confidence Interval          0.189  0.222       0.160  0.188
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.205
  90 Percent Confidence Interval                             0.185  0.224

Standardized Root Mean Square Residual:

  SRMR                                           0.086       0.086

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Sit =~                                                                
    Sit1r             1.000                               1.234    0.707
    Sit2              0.569    0.083    6.831    0.000    0.702    0.509
    Sit3r             1.005    0.035   28.553    0.000    1.241    0.778
    Sit4              0.636    0.082    7.741    0.000    0.784    0.559
    Sit5r             0.829    0.053   15.698    0.000    1.023    0.596
    Sit6              0.664    0.081    8.143    0.000    0.819    0.535

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
    Sit               0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.523    0.170    8.953    0.000    1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.500
    Sit2              0.259
    Sit3r             0.605
    Sit4              0.313
    Sit5r             0.355
    Sit6              0.286

Unstandardized Model Results

FACTOR LOADINGS (regression slopes of item response on factor)

Here, loading for SIT1R is not tested and has no standard error (Std.Err) because it is fixed to one. Also, note the Std.lv estimates are identical to the unstandardized estimates in the Z-score factor identification method. Further, the Std.all estimates are equal to the Std.all estimates in the Z-score factor identification method. These will be equal for all identification methods.

Latent Variables:
                  Estimate  Std.Err  z-value  P(>|z|)  Std.lv  Std.all
  Sit =~
    Sit1r            1.000                               1.234    0.707
    Sit2             0.569    0.083    6.831    0.000    0.702    0.509
    Sit3r            1.005    0.035   28.553    0.000    1.241    0.778
    Sit4             0.636    0.082    7.741    0.000    0.784    0.559
    Sit5r            0.829    0.053   15.698    0.000    1.023    0.596
    Sit6             0.664    0.081    8.143    0.000    0.819    0.535
Intercepts (of Items) – EXPECTED Y WHEN FACTOR = 0, or for mean of factor in sample

Here, all are identical to what was found in the previous Z-score standardization. That is because we left the factor mean to be zero.

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
    Sit               0.000                               0.000    0.000
Residual (unique) variances and factor variance

Here, the factor variance is estimated, which is different from the previous identification method.

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.523    0.170    8.953    0.000    1.000    1.000

Estimating Factor Means: Marker Item Loading (=1) and Marker Item Intercept (=0)

To estimate this model, we have to return to the general lavaan model function. In the syntax below you see 1*Sit1r, setting the factor loading for Sit1r to one (the marker item factor loading). Futher, you see Sit1r ~ 0 setting the item intercept of Sit1r to zero, the marker item intercept. Finally, to estimate the factor mean, we use Sit ~ 1, which estimates the factor mean explicitly.

model03Syntax = "
# Model 1 --  Fully Z-Scored Factor Identification Approach
# Item factor loadings --> list the factor to the left of the =~ and the items to the right, separated by a plus
# Once the factor is defined it can be used in syntax as if it is an observed variable
# Parameters can be fixed to constant values by using the * command; starting values can be specified by using start()*; labels can be implemented with []
Sit =~ 1*Sit1r + Sit2 + Sit3r + Sit4 + Sit5r + Sit6
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ; 
Sit1r ~ 0; Sit2 ~ 1; Sit3r  ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;
# Item error (unique) variances and covariances --> use the ~~ command
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6; 
# Factor variance
Sit ~~ Sit
# Factor mean (intercept)
Sit ~ 1
"
model03Estimates = sem(model = model03Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(model03Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  55 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              427.937     307.804
  Degrees of freedom                                 9           9
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.390
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.787       0.732
  Tucker-Lewis Index (TLI)                       0.645       0.553

  Robust Comparative Fit Index (CFI)                         0.787
  Robust Tucker-Lewis Index (TLI)                            0.646

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11536.404  -11536.404
  Scaling correction factor                                  1.416
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         18          18
  Akaike (AIC)                               23108.808   23108.808
  Bayesian (BIC)                             23198.912   23198.912
  Sample-size adjusted Bayesian (BIC)        23141.739   23141.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.205       0.173
  90 Percent Confidence Interval          0.189  0.222       0.160  0.188
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.205
  90 Percent Confidence Interval                             0.185  0.224

Standardized Root Mean Square Residual:

  SRMR                                           0.086       0.086

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Sit =~                                                                
    Sit1r             1.000                               1.234    0.707
    Sit2              0.569    0.083    6.831    0.000    0.702    0.509
    Sit3r             1.005    0.035   28.553    0.000    1.241    0.778
    Sit4              0.636    0.082    7.741    0.000    0.784    0.559
    Sit5r             0.829    0.053   15.698    0.000    1.023    0.596
    Sit6              0.664    0.081    8.143    0.000    0.819    0.535

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             0.000                               0.000    0.000
   .Sit2              2.701    0.383    7.045    0.000    2.701    1.958
   .Sit3r             0.325    0.171    1.899    0.058    0.325    0.204
   .Sit4              2.469    0.380    6.502    0.000    2.469    1.760
   .Sit5r             1.092    0.246    4.431    0.000    1.092    0.636
   .Sit6              2.304    0.369    6.249    0.000    2.304    1.506
    Sit               4.547    0.053   86.474    0.000    3.684    3.684

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             1.526    0.149   10.217    0.000    1.526    0.500
   .Sit2              1.409    0.128   11.014    0.000    1.409    0.741
   .Sit3r             1.004    0.135    7.456    0.000    1.004    0.395
   .Sit4              1.352    0.127   10.672    0.000    1.352    0.687
   .Sit5r             1.899    0.118   16.025    0.000    1.899    0.645
   .Sit6              1.671    0.159   10.517    0.000    1.671    0.714
    Sit               1.523    0.170    8.953    0.000    1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.500
    Sit2              0.259
    Sit3r             0.605
    Sit4              0.313
    Sit5r             0.355
    Sit6              0.286

Here, you will note the model is identical to the previous two – the log-likelihood is -11536.404, same as previously. Now, the itercepts has the mean of item Sit1r set to zero and the mean of the factor estimated.

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r             0.000                               0.000    0.000
   .Sit2              2.701    0.383    7.045    0.000    2.701    1.958
   .Sit3r             0.325    0.171    1.899    0.058    0.325    0.204
   .Sit4              2.469    0.380    6.502    0.000    2.469    1.760
   .Sit5r             1.092    0.246    4.431    0.000    1.092    0.636
   .Sit6              2.304    0.369    6.249    0.000    2.304    1.506
    Sit               4.547    0.053   86.474    0.000    3.684    3.684

The mean of the factor is 4.547, which, in this case, is the mean of item Sit1r. That is because the factor loading of the item is set to 1.0. You will also notice that the other item intercepts have changed from previous models. In general, an item’s mean is found by adding the item intercept to the product of the factor loading times the factor mean, or \(\mu_i + \lambda_i\mu_F\).

Model Fit Information for a Single-Factor Model (same regardless of factor scaling method):

Calculating model degrees of freedom (v is number of observed variables in a model):

  • Total df = \(\frac{v(v+1)}{2} + v = 27\)
  • Number of parameters in our model: 18
  • Degrees of freedom = 27-18 = 9

Loglikelihood – use for testing differences in model fit across nested models

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11536.404  -11536.404
  Scaling correction factor                                  1.416
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction
  • Loglikelihood user model (H0): this is for your specified model
  • Scaling correction factor for the MLR correction: indicates how far off from normal=1
  • H1 Value: this is for a saturated (perfect) model

Information Criteria “smaller is better” – use for nested or non-nested model comparisons

  Number of free parameters                         18          18
  Akaike (AIC)                               23108.808   23108.808
  Bayesian (BIC)                             23198.912   23198.912
  Sample-size adjusted Bayesian (BIC)        23141.739   23141.739
  • is # of estimated parameters (“free” to be not 0)
  • AIC = (-2\(LL_{H0}\)) + (2estimated parameters)
  • BIC = (-2\(LL_{H0}\)) + (LN Nestimated parameters)
  • Sample-size adjusted Bayesian (BIC) = BIC replacing N with (N + 2) / 24

Chi-Square Test of Model Fit (Significance is bad here) for your specified model

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              427.937     307.804
  Degrees of freedom                                 9           9
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.390
    for the Yuan-Bentler correction (Mplus variant)
  • Estimator: Look at robust column for MLR version of test
  • Minimum Function Test Statistic: leftover after estimating our one-factor model
  • Scaling correction factor for the Yuan-Bentler correction (Mplus variant): indicates how far off from normal=1

Where does this \(\chi^2\) value for “model fit” come from? A rescaled −2LL model comparison of this one-factor model (H0) against the saturated model (H1) that perfectly reproduces the data covariances:

  • Step 1: Original −2ΔLL = −2*(LLfewer – LLmore) = −2(−11,536.404 + 11,322.435) = 427.938
  • Step 2: Scaling correction = [ (#parmsfewerscalefewer) – (#parmsmorescalemore) ] / (#parmsfewer – #parmsmore) = [ (18 * 1.4158) – (27 * 1.4073) ] / (18 – 27) = −12.501 / −9 = 1.3903
  • Step 3: Rescaled −2ΔLL = −2ΔLL / scaling correction = 427.938 / 1.903 = 307.803 ~matches model \(\chi^2\)
  • Step 4: Difference in df = #parmsmore – #parmsfewer = 27 – 18 = 9

Where the Saturated (H1) Model Comes From: All Means, Variances, and Covariances Estimated

How to fit the saturated (Unstructured) Baseline Model: Item means, variances, and covariances in original data:

modelSaturated = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ; 
Sit1r ~ 1; Sit2 ~ 1; Sit3r  ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;

# Item variances and covariances --> use the ~~ command

# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6; 

# Covariances:
Sit1r ~~ Sit2; Sit1r ~~ Sit3r; Sit1r ~~ Sit4; Sit1r ~~ Sit5r; Sit1r ~~ Sit6;
Sit2 ~~ Sit3r; Sit2 ~~ Sit4; Sit2 ~~ Sit5r; Sit2 ~~ Sit6;
Sit3r ~~ Sit4; Sit3r ~~ Sit5r; Sit3r ~~ Sit6;
Sit4 ~~ Sit5r; Sit4 ~~ Sit6;
Sit5r ~~ Sit6;
"
modelSaturatedEstimates = sem(model = modelSaturated, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelSaturatedEstimates, fit.measures = TRUE)

Note that H0 and H1 are now the same! Our H0 model IS the H1 saturated model: the log-likelihood of H0 and H1 are the same. Also, note there are 27 free parameters (six item means, six item variances, and 15 covariances).

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11322.435  -11322.435
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435

  Number of free parameters                         27          27
  Akaike (AIC)                               22698.870   22698.870
  Bayesian (BIC)                             22834.027   22834.027
  Sample-size adjusted Bayesian (BIC)        22748.268   22748.268

Also note how the model \(\chi^2\) test has zero degrees of freedom: this is because all 27 possible free parameters were estimated. This model fits the data perfectly.

  Estimator                                         ML      Robust
  Minimum Function Test Statistic                0.000       0.000
  Degrees of freedom                                 0           0
  Minimum Function Value               0.0000000000000
  Scaling correction factor                                     NA
    for the Yuan-Bentler correction (Mplus variant)

The Rest of the One-Factor Model Fit Statistics

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.787       0.732
  Tucker-Lewis Index (TLI)                       0.645       0.553

  Robust Comparative Fit Index (CFI)                         0.787
  Robust Tucker-Lewis Index (TLI)                            0.646
  
Root Mean Square Error of Approximation:

  RMSEA                                          0.205       0.173
  90 Percent Confidence Interval          0.189  0.222       0.160  0.188
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.205
  90 Percent Confidence Interval                             0.185  0.224

Standardized Root Mean Square Residual:

  SRMR                                           0.086       0.086
  • CFI/TLI: Want close to one (1 = saturated model)
  • RMSEA: Want close to zero (0 = saturated model)
  • SRMR (Standardized Root Mean Square Residual): want close to zero (0 = saturated model)

Chi-Square Test of Model Fit for the Baseline Model (for the “no covariances” model)

This compares the estimated model to the one that has no covariances. It is a last-resort type model in that if the estimated model has a test with a non-significant p-value, then this indicates there will be no factors to be found in the data.

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

Where does this χ2 value for “fit of the baseline model” come from? A rescaled −2LL model comparison of the independence model with NO covariances to the saturated model:

  • Step 1: Original −2ΔLL = −2*(LLfewer – LLmore) = −2(−12,312.952 + 11,322.435) = 1,981.034
  • Step 2: Scaling correction = [ (#parmsfewerscalefewer) – (#parmsmorescalemore) ] / (#parmsfewer – #parmsmore) = [ (12 * 0.9725) – (27 * 1.4073) ] / (12 – 27) = –26.372 / −15 = 1.7551

  • Step 3: Rescaled −2ΔLL = −2ΔLL / scaling correction = 1,981.034 / 1.7551 = 1,128.704 ~matches baseline \(\chi^2\)
  • Step 4: Difference in df = #parmsmore – #parmsfewer = 27 – 12 = 15

What’s the point? This baseline model fit test tells us whether there are any covariances at all (i.e., whether it even makes sense to try to fit latent factors to predict them).

How to fit the Independence (Null) Baseline Model: Item means and variances, but NO covariances

modelIndependence = "
# Item intercepts --> ~ 1 indicates means or intercepts
# You can put multiple lines of syntax on a single line using a ; 
Sit1r ~ 1; Sit2 ~ 1; Sit3r  ~ 1; Sit4 ~ 1; Sit5r ~ 1; Sit6 ~ 1;

# Item variances and covariances --> use the ~~ command

# Variances:
Sit1r ~~ Sit1r; Sit2 ~~ Sit2; Sit3r ~~ Sit3r; Sit4 ~~ Sit4; Sit5r ~~ Sit5r; Sit6 ~~ Sit6; 

"
modelIndependenceEstimates = sem(model = modelIndependence, data = hfsSituations, estimator = "MLR", mimic = "mplus")
summary(modelIndependenceEstimates, fit.measures = TRUE)

Note how the test of the model vs. saturated is identical to the test of the model vs. baseline: that indicates this model is the baseline model.

Although not 0, this is the worst possible RMSEA while still allowing separate means and variances per item in these data. RMSEA is a parsimony-corrected absolute fit index (so, its fit is relative to the saturated model).

CFI and TLI are 0 because they are “incremental fit” indices relative to the independence model (which this is).

SRMR is also an absolute fit index (relative to saturated model), so this is the worst it gets for these data, too.

Poor Model Fit: Need for Model Modifications

For this section, any of the three equivalent single factor models can be used.

Modification Indices

So global fit for the one-factor model is not so good (RMSEA = .173, CFI = .732). What do the voo-doo modification indices suggest we do to fix it? To get modification indices, use the modificationindices() function. Here, the function is called with options:

  • object = model01Estimates: The model estimates object from which to derive modification indices.
  • sort. = TRUE: Sort the output from highest to lowest modification index.
modificationindices(object = model01Estimates, sort. = TRUE)
     lhs op   rhs      mi mi.scaled    epc sepc.lv sepc.all sepc.nox
27  Sit2 ~~  Sit4 224.295   161.330  0.702   0.702    0.363    0.363
22 Sit1r ~~ Sit3r 199.679   143.624  1.023   1.023    0.368    0.368
29  Sit2 ~~  Sit6  88.836    63.897  0.486   0.486    0.230    0.230
21 Sit1r ~~  Sit2  68.985    49.619 -0.464  -0.464   -0.193   -0.193
34  Sit4 ~~  Sit6  64.710    46.544  0.415   0.415    0.193    0.193
23 Sit1r ~~  Sit4  50.442    36.281 -0.403  -0.403   -0.164   -0.164
26  Sit2 ~~ Sit3r  48.505    34.888 -0.357  -0.357   -0.162   -0.162
30 Sit3r ~~  Sit4  40.616    29.214 -0.336  -0.336   -0.150   -0.150
25 Sit1r ~~  Sit6  33.475    24.078 -0.358  -0.358   -0.134   -0.134
32 Sit3r ~~  Sit6  31.132    22.392 -0.319  -0.319   -0.131   -0.131
28  Sit2 ~~ Sit5r   7.117     5.119 -0.151  -0.151   -0.064   -0.064
33  Sit4 ~~ Sit5r   6.906     4.967 -0.149  -0.149   -0.062   -0.062
24 Sit1r ~~ Sit5r   6.417     4.615  0.176   0.176    0.059    0.059
31 Sit3r ~~ Sit5r   3.548     2.552  0.123   0.123    0.045    0.045
35 Sit5r ~~  Sit6   0.736     0.529 -0.053  -0.053   -0.020   -0.020

The output includes:

  • lhs: Left hand side for parameter (in lavaan syntax)
  • op: Operation of parameter: ~~ is for (residual) covariances; =~ is for additional factor loadings. All other lavaan syntax types are possible, too, but here item means and variances fit perfectly (see below)
  • rhs: Right hand side for parameter (in lavaan syntax)
  • mi: Modification index using ML
  • mi.scaled: Modification index using MLR (use this one)
  • epc: Expected parameter change (from zero)
  • sepc.lv: Standardized expected parameter change using std.lv standardization
  • sepc.all: Standardized expected parameter change using std.all standardization
  • sepc.nox: Standardized expected parameter change using std.nox standardization

Highest values are for error covariances (for unknown multidimensionality).

Another approach: Residual Covariance Matrices

Another approach—how about we examine local fit and see where the problems seem to be?

The means and variances of the items will be perfectly reproduced, so that’s not an issue. Misfit results from the difference between the observed and model-predicted covariances.

lavaan gives us the “residual” (defined as observed – predicted) or “leftover” covariance matrix, but it is scale dependent and thus not so helpful. We can use the resid() function to obtain the residual covariance matrix (observed covariances minus model-implied covariances):

resid(object = model01Estimates, type = "raw")
$type
[1] "raw"

$cov
      Sit1r  Sit2   Sit3r  Sit4   Sit5r  Sit6  
Sit1r  0.000                                   
Sit2  -0.290  0.000                            
Sit3r  0.271 -0.174  0.000                     
Sit4  -0.234  0.552 -0.149  0.000              
Sit5r  0.096 -0.114  0.050 -0.108  0.000       
Sit6  -0.216  0.390 -0.149  0.319 -0.040  0.000

$mean
Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6 
    0     0     0     0     0     0 

lavaan also gives us “normalized” residuals, which can be thought of as z-scores for how large the residual leftover covariance is in absolute terms. Because the denominator decreases with sample size, however, these values may be inflated in large samples, so look for relatively large values.

“Normalized” Residuals for Inter-Item Covariances = (observed – predicted) / SD(observed):

resid(object = model01Estimates, type = "normalized")
$type
[1] "normalized"

$cov
      Sit1r  Sit2   Sit3r  Sit4   Sit5r  Sit6  
Sit1r  0.000                                   
Sit2  -3.503  0.000                            
Sit3r  2.977 -2.253  0.000                     
Sit4  -2.928  6.560 -1.959  0.000              
Sit5r  0.960 -1.434  0.548 -1.372  0.000       
Sit6  -2.345  4.721 -1.756  3.925 -0.444  0.000

$mean
Sit1r  Sit2 Sit3r  Sit4 Sit5r  Sit6 
    0     0     0     0     0     0 

NEGATIVE NORMALIZED RESIDUAL: Less related than you predicted (don’t want to be together) POSITIVE NORMALIZED RESIDUAL: More related than you predicted (want to be more together)

Why might the normalized residuals (leftover correlations) for the positive-worded items be larger than for the negatively-worded items?

These results suggest that wording valence is playing a larger role in the pattern of covariance across items than what the one-factor model predicts. Rather than adding voo-doo covariances among the residuals for specific items, how about a two-factor model based on wording instead?

Model 4. Fully Z-Scored, 2-Factor Model

Here, we use the shortened syntax and the sem() function to get results.

model04Syntax = "
# SitP loadings (all estimated)
SitP =~ Sit2 + Sit4 + Sit6
# SitN loadings (all estimated)
SitN =~ Sit1r + Sit3r + Sit5r
"
model04Estimates = sem(model = model04Syntax, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04Estimates, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  23 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic               35.410      24.924
  Degrees of freedom                                 8           8
  P-value (Chi-square)                           0.000       0.002
  Scaling correction factor                                  1.421
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.986       0.985
  Tucker-Lewis Index (TLI)                       0.974       0.972

  Robust Comparative Fit Index (CFI)                         0.988
  Robust Tucker-Lewis Index (TLI)                            0.977

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11340.140  -11340.140
  Scaling correction factor                                  1.402
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         19          19
  Akaike (AIC)                               22718.281   22718.281
  Bayesian (BIC)                             22813.391   22813.391
  Sample-size adjusted Bayesian (BIC)        22753.042   22753.042

Root Mean Square Error of Approximation:

  RMSEA                                          0.056       0.044
  90 Percent Confidence Interval          0.038  0.075       0.028  0.061
  P-value RMSEA <= 0.05                          0.277       0.708

  Robust RMSEA                                               0.052
  90 Percent Confidence Interval                             0.030  0.076

Standardized Root Mean Square Residual:

  SRMR                                           0.029       0.029

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP =~                                                               
    Sit2              1.007    0.052   19.487    0.000    1.007    0.730
    Sit4              1.064    0.050   21.195    0.000    1.064    0.759
    Sit6              0.956    0.053   18.203    0.000    0.956    0.625
  SitN =~                                                               
    Sit1r             1.325    0.048   27.698    0.000    1.325    0.759
    Sit3r             1.349    0.044   30.514    0.000    1.349    0.846
    Sit5r             1.009    0.055   18.358    0.000    1.009    0.588

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP ~~                                                               
    SitN              0.564    0.041   13.775    0.000    0.564    0.564

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.834
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.821
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.477
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.604
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.070
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.832
    SitP              0.000                               0.000    0.000
    SitN              0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit2              0.888    0.097    9.173    0.000    0.888    0.467
   .Sit4              0.835    0.093    9.003    0.000    0.835    0.425
   .Sit6              1.428    0.134   10.684    0.000    1.428    0.610
   .Sit1r             1.294    0.103   12.547    0.000    1.294    0.425
   .Sit3r             0.724    0.092    7.857    0.000    0.724    0.285
   .Sit5r             1.926    0.119   16.128    0.000    1.926    0.654
    SitP              1.000                               1.000    1.000
    SitN              1.000                               1.000    1.000

R-Square:
                   Estimate
    Sit2              0.533
    Sit4              0.575
    Sit6              0.390
    Sit1r             0.575
    Sit3r             0.715
    Sit5r             0.346

This model fits better (we would call this good fit). We can also compare the fit of this model with Model 1 using the anova() function. Here, the null hypothesis is that Model 1 fits as well as Model 4.

anova(model01Estimates, model04Estimates)
Scaled Chi Square Difference Test (method = "satorra.bentler.2001")

                 Df   AIC   BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
model04Estimates  8 22718 22813  35.41                                  
model01Estimates  9 23109 23199 427.94     342.29       1  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This function conducts the MLR version of the likelihood ratio test for us. The significant difference indicates that Model 4 fits better than Model 1.

Let’s see: Any more local fit problems? The modification indices are better (values of a little over 10 seem okay).

modificationindices(object = model04Estimates, sort. = TRUE)
     lhs op   rhs     mi mi.scaled    epc sepc.lv sepc.all sepc.nox
29  SitN =~  Sit6 15.383    10.827  0.245   0.245    0.160    0.160
30  Sit2 ~~  Sit4 15.382    10.827  0.332   0.332    0.172    0.172
35  Sit4 ~~  Sit6 13.886     9.774 -0.273  -0.273   -0.127   -0.127
27  SitN =~  Sit2 13.886     9.773 -0.224  -0.224   -0.162   -0.162
24  SitP =~ Sit1r  9.244     6.507 -0.223  -0.223   -0.128   -0.128
44 Sit3r ~~ Sit5r  9.244     6.507 -0.277  -0.277   -0.101   -0.101
42 Sit1r ~~ Sit3r  7.549     5.313  0.409   0.409    0.147    0.147
26  SitP =~ Sit5r  7.549     5.313  0.191   0.191    0.111    0.111
32  Sit2 ~~ Sit1r  6.685     4.705 -0.115  -0.115   -0.048   -0.048
41  Sit6 ~~ Sit5r  5.847     4.115  0.138   0.138    0.052    0.052
40  Sit6 ~~ Sit3r  1.321     0.930  0.052   0.052    0.021    0.021
43 Sit1r ~~ Sit5r  0.667     0.469  0.071   0.071    0.024    0.024
25  SitP =~ Sit3r  0.667     0.469  0.060   0.060    0.037    0.037
33  Sit2 ~~ Sit3r  0.402     0.283 -0.025  -0.025   -0.012   -0.012
39  Sit6 ~~ Sit1r  0.346     0.244  0.030   0.030    0.011    0.011
36  Sit4 ~~ Sit1r  0.307     0.216 -0.025  -0.025   -0.010   -0.010
38  Sit4 ~~ Sit5r  0.201     0.142  0.022   0.022    0.009    0.009
34  Sit2 ~~ Sit5r  0.129     0.091  0.017   0.017    0.007    0.007
37  Sit4 ~~ Sit3r  0.101     0.071  0.013   0.013    0.006    0.006
31  Sit2 ~~  Sit6  0.023     0.016  0.010   0.010    0.005    0.005
28  SitN =~  Sit4  0.023     0.016  0.009   0.009    0.007    0.007

The normalized residuals have two values that are bigger in absolute value than 2:

resid(object = model04Estimates, type = "normalized")
$type
[1] "normalized"

$cov
      Sit2   Sit4   Sit6   Sit1r  Sit3r  Sit5r 
Sit2   0.000                                   
Sit4   0.370  0.000                            
Sit6   0.031 -0.676  0.000                     
Sit1r -2.125 -0.768  0.870  0.000              
Sit3r -0.896  0.192  1.658  0.172  0.000       
Sit5r  0.382  1.128  2.847  0.212 -0.464  0.000

$mean
 Sit2  Sit4  Sit6 Sit1r Sit3r Sit5r 
    0     0     0     0     0     0 

Because we have no real theoretical or defendable reason to fit any of these suggested parameters, we will not add any new parameters. This will be about as good as it gets.

Calculating Reliabilities

As we have demonstrated good model fit, estimates of trait reliabilities can now be found. Although I do not dispense such advice, many people use a CFA to indicate model fit then proceed to use sum scores for each trait in a model. I do not like this because sum scores are perfectly correlated with the factor scores from the parallel items model – but we never tested that model (should it not fit we may get a reversal in the rank ordering of people by score).

Reliabilities for Sum Scores (Omega)

That said, reliabilities for sum scores can be found by using CFA model parameters. This reliability coefficient is called “Omega” by Rod McDonald and is calculated:

\[ \rho_\omega = \frac{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2}{\sigma^2_F\left( \sum_{i=1}^{Items} \lambda_i \right)^2 \sum_{i=1}^{Items} \sum_{j=1}^{Items} \psi_{ij}}\]

In words, Omega = Var(Factor)* (Sum of loadings)^2 / [ Var(Factor)* (Sum of loadings)^2 + Sum of error variances + 2* Sum of error covariances]

For our data:

  • Omega for Positive Factor = .744
    (1.007+1.064+0.956)2 / (1.007+1.064+0.956)2 + (0.888+0.835+1.428) + 2*0

(alpha was .746)

  • Omega for Negative Factor = .775 (1.325+1.349+1.009)2 / (1.325+1.349+1.009)2 + (1.294+0.724+1.926) + 2*0

(alpha was .780)

Calculating Omega from lavaan

Through the use of parameter labels and new parameters, you can use lavaan to calculate Omega. In this syntax, the terms before the items are labels representing the factor loadings and unique variances. They can be used later in the syntax to form new parameters or to put constraints on the model.

model04SyntaxOmega = "
# SitP loadings (all estimated)
SitP =~ L2*Sit2 + L4*Sit4 + L6*Sit6
# SitN loadings (all estimated)
SitN =~ L1*Sit1r + L3*Sit3r + L5*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6; 
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((L2 + L4 + L6)^2) / ( ((L2 + L4 + L6)^2) + E2 + E4 + E6)
OmegaN := ((L1 + L3 + L5)^2) / ( ((L1 + L3 + L5)^2) + E1 + E3 + E5)
"
model04EstimatesOmega = sem(model = model04SyntaxOmega, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesOmega, fit.measures = FALSE, rsquare = FALSE, standardized = FALSE, header = FALSE)

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  SitP =~                                             
    Sit2      (L2)    1.007    0.052   19.487    0.000
    Sit4      (L4)    1.064    0.050   21.195    0.000
    Sit6      (L6)    0.956    0.053   18.203    0.000
  SitN =~                                             
    Sit1r     (L1)    1.325    0.048   27.698    0.000
    Sit3r     (L3)    1.349    0.044   30.514    0.000
    Sit5r     (L5)    1.009    0.055   18.358    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  SitP ~~                                             
    SitN              0.564    0.041   13.775    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Sit2              5.289    0.042  127.346    0.000
   .Sit4              5.359    0.042  126.896    0.000
   .Sit6              5.321    0.046  115.492    0.000
   .Sit1r             4.547    0.053   86.474    0.000
   .Sit3r             4.896    0.048  101.959    0.000
   .Sit5r             4.860    0.052   94.060    0.000
    SitP              0.000                           
    SitN              0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Sit1r     (E1)    1.294    0.103   12.547    0.000
   .Sit2      (E2)    0.888    0.097    9.173    0.000
   .Sit3r     (E3)    0.724    0.092    7.857    0.000
   .Sit4      (E4)    0.835    0.093    9.003    0.000
   .Sit5r     (E5)    1.926    0.119   16.128    0.000
   .Sit6      (E6)    1.428    0.134   10.684    0.000
    SitP              1.000                           
    SitN              1.000                           

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    OmegaP            0.744    0.020   37.956    0.000
    OmegaN            0.775    0.014   56.802    0.000

Calculating Omega through `lavaan also provides its standard error. Note: the Omega equation used above is for the Z-score factor identification method. If the marker-item method is used for factor loadings, then multiply the sum of the loadings by the variance of the factor, as shown in the equation above.

Of note about Omega: If we constrain all factor loadings of a trait to be equal (called Tau-Equivalent), Omega is equal to Alpha. In the syntax below, putting the labels LP and LN in multiple places causes the factor loadings for those places to be constrained to be equal:

model04SyntaxAlpha = "
# SitP loadings (all estimated)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (all estimated)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique Variances:
Sit1r ~~ E1*Sit1r; Sit2 ~~ E2*Sit2; Sit3r ~~ E3*Sit3r; Sit4 ~~ E4*Sit4; Sit5r ~~ E5*Sit5r; Sit6 ~~ E6*Sit6; 
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + E2 + E4 + E6)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + E1 + E3 + E5)
"
model04EstimatesAlpha = sem(model = model04SyntaxAlpha, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesAlpha, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  18 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic               73.500      55.487
  Degrees of freedom                                12          12
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.325
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.969       0.961
  Tucker-Lewis Index (TLI)                       0.961       0.951

  Robust Comparative Fit Index (CFI)                         0.971
  Robust Tucker-Lewis Index (TLI)                            0.963

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11359.185  -11359.185
  Scaling correction factor                                  1.163
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         15          15
  Akaike (AIC)                               22748.370   22748.370
  Bayesian (BIC)                             22823.457   22823.457
  Sample-size adjusted Bayesian (BIC)        22775.813   22775.813

Root Mean Square Error of Approximation:

  RMSEA                                          0.068       0.057
  90 Percent Confidence Interval          0.054  0.084       0.044  0.071
  P-value RMSEA <= 0.05                          0.021       0.167

  Robust RMSEA                                               0.066
  90 Percent Confidence Interval                             0.049  0.084

Standardized Root Mean Square Residual:

  SRMR                                           0.061       0.061

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP =~                                                               
    Sit2      (LP)    1.014    0.036   28.384    0.000    1.014    0.734
    Sit4      (LP)    1.014    0.036   28.384    0.000    1.014    0.733
    Sit6      (LP)    1.014    0.036   28.384    0.000    1.014    0.653
  SitN =~                                                               
    Sit1r     (LN)    1.254    0.032   38.954    0.000    1.254    0.735
    Sit3r     (LN)    1.254    0.032   38.954    0.000    1.254    0.804
    Sit5r     (LN)    1.254    0.032   38.954    0.000    1.254    0.682

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP ~~                                                               
    SitN              0.578    0.041   14.230    0.000    0.578    0.578

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.827
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.872
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.427
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.667
   .Sit3r             4.896    0.048  101.959    0.000    4.896    3.141
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.645
    SitP              0.000                               0.000    0.000
    SitN              0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r     (E1)    1.335    0.083   16.168    0.000    1.335    0.459
   .Sit2      (E2)    0.882    0.083   10.610    0.000    0.882    0.462
   .Sit3r     (E3)    0.857    0.069   12.339    0.000    0.857    0.353
   .Sit4      (E4)    0.887    0.075   11.779    0.000    0.887    0.463
   .Sit5r     (E5)    1.805    0.115   15.710    0.000    1.805    0.534
   .Sit6      (E6)    1.382    0.118   11.703    0.000    1.382    0.573
    SitP              1.000                               1.000    1.000
    SitN              1.000                               1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.541
    Sit2              0.538
    Sit3r             0.647
    Sit4              0.537
    Sit5r             0.466
    Sit6              0.427

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    OmegaP            0.746    0.020   38.193    0.000    0.746    0.764
    OmegaN            0.780    0.013   58.288    0.000    0.780    0.783

Compared with what the psych package gives:

alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")

Reliability analysis   
Call: alpha(x = hfsSituations[c("Sit2", "Sit4", "Sit6")], use = "all.obs")

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd
      0.74      0.74    0.67      0.49 2.9 0.014  5.3 1.2

 lower alpha upper     95% confidence boundaries
0.71 0.74 0.77 

 Reliability if an item is dropped:
     raw_alpha std.alpha G6(smc) average_r S/N alpha se
Sit2      0.62      0.62    0.45      0.45 1.6    0.023
Sit4      0.63      0.63    0.46      0.46 1.7    0.022
Sit6      0.73      0.73    0.57      0.57 2.7    0.016

 Item statistics 
        n raw.r std.r r.cor r.drop mean  sd
Sit2 1103  0.82  0.83  0.71   0.60  5.3 1.4
Sit4 1103  0.82  0.83  0.70   0.59  5.4 1.4
Sit6 1103  0.80  0.78  0.59   0.51  5.3 1.5

Non missing response frequency for each item
        1    2    3    4    5    6    7 miss
Sit2 0.02 0.02 0.05 0.13 0.31 0.24 0.22    0
Sit4 0.02 0.02 0.04 0.15 0.27 0.24 0.25    0
Sit6 0.03 0.03 0.06 0.14 0.23 0.25 0.27    0
alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")

Reliability analysis   
Call: alpha(x = hfsSituations[c("Sit1r", "Sit3r", "Sit5r")], use = "all.obs")

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd
      0.77      0.77     0.7      0.53 3.3 0.012  4.8 1.4

 lower alpha upper     95% confidence boundaries
0.74 0.77 0.79 

 Reliability if an item is dropped:
      raw_alpha std.alpha G6(smc) average_r S/N alpha se
Sit1r      0.65      0.65    0.48      0.48 1.9    0.021
Sit3r      0.62      0.62    0.45      0.45 1.7    0.023
Sit5r      0.78      0.79    0.65      0.65 3.7    0.013

 Item statistics 
         n raw.r std.r r.cor r.drop mean  sd
Sit1r 1103  0.85  0.85  0.74   0.63  4.5 1.7
Sit3r 1103  0.85  0.86  0.76   0.66  4.9 1.6
Sit5r 1103  0.78  0.78  0.58   0.51  4.9 1.7

Non missing response frequency for each item
         1    2    3    4    5    6    7 miss
Sit1r 0.05 0.08 0.16 0.19 0.18 0.17 0.17    0
Sit3r 0.03 0.05 0.12 0.20 0.21 0.20 0.20    0
Sit5r 0.04 0.06 0.13 0.18 0.18 0.19 0.22    0

Reliabilities for Factor Scores

Factor scores have a long history with many critiques and complaints. That said, factor scores can be constructed analogously across measurement models using Bayesian estmates. We will describe this more in a later lecture. For our purposes, in CFA, the two main types of Bayesian Estimates (MAP: Maximum A Posteriori and EAP: Expected A Posteriori) are identical. lavaan will not provide standard errors for factor scores, so I created a function factorScores() to not only create the factor scores using EAP (again, identical to MAP) but also to give standard errors for each score. The function takes the lavaan model output and returns a list of three elements: scores containing factor scores and standard errors, factorCov containing the theoretical covariance matrix of the factor scores, and factorCor the theoretical correlation matrix of the factor scores.

To determine the reliability of a factor score, we must return to our classical notion of what reliability means: \[ \rho = \frac{Var(True)}{Var(Total)} = \frac{Var(True)}{Var(True) + Var(Error)} = \frac{Var(F)}{Var(F)+SE(F)^2}\] In CFA, \(Var(True)\) is found by the estimated factor variance and \(Var(Error)\) comes from the squared standard error of a factor score (from an observation with complete data).

I created a function named factorScoreReliability() to calculate the reliability for factor scores. Using the function we can see that the reliabiltiy of our factor scores is .817 for SitP and .851 for SitN, which har higher than the Omega reliablities for sum scores. In general, factor score reliabilities are greater than what is found in Omega and Alpha due to the use of a prior distribution for the factor itself and due to the factor correlation allowing indirect information from other factors to help in the estimation of each factor.

factorScores = function(lavObject){
  output = inspect(object = lavObject, what = "est")
  
  sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
  modelData = lavObject@Data@X[[1]]
  
  scores = t(output$alpha%*%matrix(1, nrow=1, ncol=dim(modelData)[1]) +
    output$psi %*% t(output$lambda) %*% solve(sigma)%*%(t(modelData) - output$nu%*%matrix(1, nrow=1, ncol=dim(modelData)[1])))
  varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
  factorSE = sqrt(diag(varscores))
  names(factorSE) = paste0(names(factorSE), ".SE")
  factorSEmat = matrix(1, nrow=nrow(scores), ncol = 1) %*% matrix(factorSE, nrow = 1, ncol = ncol(scores))
  colnames(factorSEmat) = names(factorSE)
  
  result = data.frame(cbind(scores, factorSEmat))
  names(result)
  odds = seq(1, ncol(result)-1, 2)
  evens = seq(2, ncol(result), 2)
  result = result[c(odds,evens)]
  
  factorCov = varscores
  factorCorr = solve(sqrt(diag(diag(varscores)))) %*% varscores %*% solve(sqrt(diag(diag(varscores))))
  
  return(list(scores = result, factorCov = factorCov, factorCorr = factorCorr))
}
factorScoreReliability = function(lavObject){
  output = inspect(object = lavObject, what = "est")
  sigma = output$lambda %*% output$psi %*% t(output$lambda) + output$theta
  varscores = output$psi - output$psi %*% t(output$lambda) %*% solve(sigma) %*% output$lambda %*% output$psi
  
  return(diag(output$psi)/(diag(output$psi) + diag(varscores)))
  
}
factorScoreReliability(lavObject = model04Estimates)
     SitP      SitN 
0.8177844 0.8510698 

Testing Assumptions of Classical Test Theory

We can also compare the model fit of the CFA model with the tau-equivalent model, essentially testing whether or not CTT (and alpha) is appropriate:

anova(model04EstimatesAlpha, model04EstimatesOmega)
Scaled Chi Square Difference Test (method = "satorra.bentler.2001")

                      Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)    
model04EstimatesOmega  8 22718 22813 35.41                                  
model04EstimatesAlpha 12 22748 22824 73.50     33.635       4  8.851e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The significant difference indicates that the tau-equivalent assumptions of equal item loadings do not hold for all of the data. In the Mplus handout, Lesa Hoffman shows that tau equivalence holds for one of the subscales, but we don’t need to have tau equivalence to move forward with results as CFA is subsumes the tau-equivalent model (meaning it will be tau equivalent if the data are tau equivalent but can also be more general).

Another model often discussed in CTT is the parallel items model. The CFA version of the parallel items model is a model where all factor loadings for a factor are constrained to be equal and all unique variances for a factor are constrained to be equal. This model is one step more restrictive than the tau equivalent items model. Syntax for this model is as follows:

model04SyntaxSB = "
# SitP loadings (all estimated)
SitP =~ LP*Sit2 + LP*Sit4 + LP*Sit6
# SitN loadings (all estimated)
SitN =~ LN*Sit1r + LN*Sit3r + LN*Sit5r
# Unique Variances:
Sit1r ~~ EN*Sit1r; Sit2 ~~ EP*Sit2; Sit3r ~~ EN*Sit3r; Sit4 ~~ EP*Sit4; Sit5r ~~ EN*Sit5r; Sit6 ~~ EP*Sit6; 
# Calculate Omega Reliability for Sum Scores:
OmegaP := ((3*LP)^2) / ( ((3*LP)^2) + 3*EP)
OmegaN := ((3*LN)^2) / ( ((3*LN)^2) + 3*EN)
"
model04EstimatesSB = sem(model = model04SyntaxSB, data = hfsSituations, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model04EstimatesSB, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)
lavaan (0.5-23.1097) converged normally after  12 iterations

  Number of observations                          1103

  Number of missing patterns                         1

  Estimator                                         ML      Robust
  Minimum Function Test Statistic              181.247     132.341
  Degrees of freedom                                16          16
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.370
    for the Yuan-Bentler correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic             1981.034    1128.693
  Degrees of freedom                                15          15
  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.916       0.896
  Tucker-Lewis Index (TLI)                       0.921       0.902

  Robust Comparative Fit Index (CFI)                         0.918
  Robust Tucker-Lewis Index (TLI)                            0.924

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -11413.059  -11413.059
  Scaling correction factor                                  0.847
    for the MLR correction
  Loglikelihood unrestricted model (H1)     -11322.435  -11322.435
  Scaling correction factor                                  1.407
    for the MLR correction

  Number of free parameters                         11          11
  Akaike (AIC)                               22848.117   22848.117
  Bayesian (BIC)                             22903.181   22903.181
  Sample-size adjusted Bayesian (BIC)        22868.242   22868.242

Root Mean Square Error of Approximation:

  RMSEA                                          0.097       0.081
  90 Percent Confidence Interval          0.084  0.110       0.070  0.092
  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                               0.095
  90 Percent Confidence Interval                             0.080  0.110

Standardized Root Mean Square Residual:

  SRMR                                           0.088       0.088

Parameter Estimates:

  Information                                 Observed
  Standard Errors                   Robust.huber.white

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP =~                                                               
    Sit2      (LP)    1.005    0.035   28.455    0.000    1.005    0.698
    Sit4      (LP)    1.005    0.035   28.455    0.000    1.005    0.698
    Sit6      (LP)    1.005    0.035   28.455    0.000    1.005    0.698
  SitN =~                                                               
    Sit1r     (LN)    1.222    0.032   38.250    0.000    1.222    0.724
    Sit3r     (LN)    1.222    0.032   38.250    0.000    1.222    0.724
    Sit5r     (LN)    1.222    0.032   38.250    0.000    1.222    0.724

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  SitP ~~                                                               
    SitN              0.596    0.041   14.422    0.000    0.596    0.596

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit2              5.289    0.042  127.346    0.000    5.289    3.676
   .Sit4              5.359    0.042  126.896    0.000    5.359    3.724
   .Sit6              5.321    0.046  115.492    0.000    5.321    3.698
   .Sit1r             4.547    0.053   86.474    0.000    4.547    2.695
   .Sit3r             4.896    0.048  101.959    0.000    4.896    2.902
   .Sit5r             4.860    0.052   94.060    0.000    4.860    2.881
    SitP              0.000                               0.000    0.000
    SitN              0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Sit1r     (EN)    1.353    0.060   22.722    0.000    1.353    0.475
   .Sit2      (EP)    1.060    0.061   17.452    0.000    1.060    0.512
   .Sit3r     (EN)    1.353    0.060   22.722    0.000    1.353    0.475
   .Sit4      (EP)    1.060    0.061   17.452    0.000    1.060    0.512
   .Sit5r     (EN)    1.353    0.060   22.722    0.000    1.353    0.475
   .Sit6      (EP)    1.060    0.061   17.452    0.000    1.060    0.512
    SitP              1.000                               1.000    1.000
    SitN              1.000                               1.000    1.000

R-Square:
                   Estimate
    Sit1r             0.525
    Sit2              0.488
    Sit3r             0.525
    Sit4              0.488
    Sit5r             0.525
    Sit6              0.488

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    OmegaP            0.741    0.020   36.909    0.000    0.741    0.741
    OmegaN            0.768    0.015   52.547    0.000    0.768    0.768

The values resulting from Omega are the Spearman-Brown reliablity estimates. Note that reliablity for parallel items < reliability for tau equivalent items < reliability for CFA, but that the estimates are very close. That is common when a model fits the data.

Examining Factor Score Distributions

The positive factor scores have an estimated mean of 0 with a variance of 0.78 instead of 1.00 (due to the effect of the prior distribution – this is called “shrinkage”). The SE for each person’s factor score is .472. Treating factor scores as observed variables is like saying SE = 0.

95% confidence interval for positive factor score = Score ± 2*.472 = Score ± .944.

# calculate factor scores using function above:
fscores = factorScores(lavObject = model04Estimates)
#show variance of factor score:
var(fscores$scores$SitP)
[1] 0.7778887
# Histogram overlaid with kernel density curve
ggplot(fscores$scores, aes(x=SitP)) + 
    geom_histogram(aes(y=..density..),      # Histogram with density instead of count on y-axis
                   binwidth=.5,
                   colour="black", fill="white") + xlim(c(-4,4)) + labs(title = "Positive Situation Forgiveness Factor Score") +
    geom_density(alpha=.2, fill="#FF6666")    # Overlay with transparent density plot