## Reviewing Main Effects in General Linear Models (as estimated using least squares with the lm function in base R)

The models for this example come from Hoffman (2015) chapter 2. We will be examining the extent to which cognition (as measured by an information test outcome) can be predicted from age (centered at 85 years) grip strength (centered at 9 pounds), sex (with men as the reference group) and dementia status (none = 1, future = 2, and current = 3) in a sample of 550 older adults.

if (!require(psych)){
install.packages("psych")
}
library(psych)

### Data Import and Variable Centering/Creation

# Import data into R from CSV file.
ImportedData = read.csv(file = "mv18epsy905_lecture02.csv", stringsAsFactors = FALSE)
# Copy initial data to a new data frame for additional formatting:
Data01 = ImportedData
# Center age at 85 years old
Data01$age85 = Data01$age - 85
# Center grip strength at 9
Data01$grip9 = Data01$grip - 9
# Create dummy coded variables for dementia group (3 categories needs 2 variables):
# 1. Start by creating variables full of zeroes:
Data01$demNF = 0 # for no dementia in the future Data01$demNC = 0 # for no dementia currently
# 2. Change zeroes to ones for people in each respective group
#   Here, the term in the bracket provides a logical variable where if true, a 1 gets put into the demNF variable
Data01$demNF[Data01$demgroup == 2] = 1
Data01$demNC[Data01$demgroup == 3] = 1

### Descriptive Statistics for Data

# Selecting only three quantitative variables (the c() in the brackets consists of the variables to use)
describe(Data01[c("age", "grip", "cognition")])
          vars   n  mean    sd median trimmed   mad   min   max range  skew kurtosis   se
age          1 550 84.93  3.43  84.33   84.49  2.88 80.02 96.97 16.95  1.10     0.72 0.15
grip         2 550  9.11  2.98   9.00    9.11  2.97  0.00 19.00 19.00 -0.01    -0.17 0.13
cognition    3 550 24.82 10.99  25.00   25.18 11.86  0.00 44.00 44.00 -0.26    -0.62 0.47
# Creating a frequency table for categorical variables:
table(Data01[c("sexMW")])

0   1
227 323 
table(Data01[c("demgroup")])

1   2   3
399 109  42 

### Syntax and Output for Empty Model in Equation 2.3

Model: $$\text{Cognition}_i = \beta_0 + e_i$$,

where $$e_i \sim N(0, \sigma^2_e)$$

# Here the formula has the name of the dependent variable in the Data01 data frame
# ~ is the formula term for =
# 1 indicates we only want the intercept estimated
Model01 = lm(formula = cognition ~ 1, data = Data01)
summary(Model01)

Call:
lm(formula = cognition ~ 1, data = Data01)

Residuals:
Min       1Q   Median       3Q      Max
-24.8218  -7.5718   0.1782   8.1782  19.1782

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  24.8218     0.4686   52.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.99 on 549 degrees of freedom

In the output:

• Call: provides the arugments given to the lm() function
• Residuals: is often disregarded but provides quantiles of the distribution of $$e_i$$ (residuals)
• Coefficients: provides the estimates of the lm() coefficients, their standard errors, a Wald-ish test and a (possibly conditional) p-value
• Residual standard error: gives an estimate of $$\sigma_e = \sqrt{\sigma^2_e}$$, the residual standard error/deviation

Additionally, we can display an ANOVA table using the summary() function:

anova(Model01)
Analysis of Variance Table

Response: cognition
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 549  66297  120.76               

Here the ANOVA table only has residuals because the only parameter estimated is the intercept. Also, note that the Mean Square Residual of 120.76 provides an estimate of $$\sigma_{e}^2$$ (and $$\sqrt{120.76} = 10.99 = \sigma_{e}$$ from above).

### R and Output for Age, Grip, and Sex (0=M, 1=W) Model in Equation 2.7

Model: $$\text{Cognition}_i = \beta_0 + \beta_1\left(\text{Age}_i - 85\right) + \beta_2 \left(\text{Grip}_i - 9 \right) + \beta_3\left(\text{SexMW}_i \right) e_i$$,

where $$e_i \sim N(0, \sigma^2_e)$$

Model02 = lm(formula = cognition ~ age85 + grip9 + sexMW, data = Data01)
summary(Model02)

Call:
lm(formula = cognition ~ age85 + grip9 + sexMW, data = Data01)

Residuals:
Min      1Q  Median      3Q     Max
-27.758  -7.093   0.102   8.219  22.742

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  26.9594     0.7389  36.487  < 2e-16 ***
age85        -0.4338     0.1325  -3.275  0.00112 **
grip9         0.5460     0.1663   3.284  0.00109 **
sexMW        -3.7988     0.9904  -3.836  0.00014 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.46 on 546 degrees of freedom
Multiple R-squared:  0.09917,   Adjusted R-squared:  0.09422
F-statistic: 20.04 on 3 and 546 DF,  p-value: 2.476e-12

### Model Comparison

We can use the anova() function to compare Model02 to Model01. Later in the class, this won’t be an option, but it works for now.

anova(Model01, Model02)
Analysis of Variance Table

Model 1: cognition ~ 1
Model 2: cognition ~ age85 + grip9 + sexMW
Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    549 66297
2    546 59722  3    6574.7 20.036 2.476e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

### R Syntax and Output for Dementia Group Model in Equation 2.8

Model: $$\text{Cognition}_i = \beta_0 + \beta_1\left(\text{Age}_i - 85\right) + \beta_2 \left(\text{Grip}_i - 9 \right) + \beta_3\left(\text{SexMW}_i \right) + \beta_4 \text{DemNF}_i + \beta_5 \text{DemNC}_i + e_i$$,

where $$e_i \sim N(0, \sigma^2_e)$$

Model03 = lm(formula = cognition ~ age85 + grip9 + sexMW + demNF + demNC, data = Data01)
summary(Model03)

Call:
lm(formula = cognition ~ age85 + grip9 + sexMW + demNF + demNC,
data = Data01)

Residuals:
Min       1Q   Median       3Q      Max
-28.4870  -5.9566   0.1129   6.8300  20.5467

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  29.2643     0.6985  41.895  < 2e-16 ***
age85        -0.4057     0.1189  -3.412 0.000692 ***
grip9         0.6042     0.1498   4.034 6.26e-05 ***
sexMW        -3.6574     0.8914  -4.103 4.71e-05 ***
demNF        -5.7220     1.0191  -5.615 3.14e-08 ***
demNC       -16.4798     1.5228 -10.822  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.385 on 544 degrees of freedom
Multiple R-squared:  0.2773,    Adjusted R-squared:  0.2707
F-statistic: 41.75 on 5 and 544 DF,  p-value: < 2.2e-16

#### Main effect of DemNC $$\beta_5$$ =

We can use the model equation to calculate the dementia group means for predicted cognition. What model parameters would give the mean for the following groups?

#### Cognition for Current =

We can determine the differences between the dementia group means as follows:

#### Future vs. Current = Current - Future =

Here, we can use the parameters to get estimated differences, but we need more information for forming the standard errors. We will learn how to do this next week.

### Alternative Model03: Let R Code Group with factor()

We can use the factor() function to tell R we have a grouping variable, which will automatically code it for us:

Model04 = lm(formula = cognition ~ age85 + grip9 + sexMW + factor(demgroup), data = Data01)
summary(Model04)

Call:
lm(formula = cognition ~ age85 + grip9 + sexMW + factor(demgroup),
data = Data01)

Residuals:
Min       1Q   Median       3Q      Max
-28.4870  -5.9566   0.1129   6.8300  20.5467

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)        29.2643     0.6985  41.895  < 2e-16 ***
age85              -0.4057     0.1189  -3.412 0.000692 ***
grip9               0.6042     0.1498   4.034 6.26e-05 ***
sexMW              -3.6574     0.8914  -4.103 4.71e-05 ***
factor(demgroup)2  -5.7220     1.0191  -5.615 3.14e-08 ***
factor(demgroup)3 -16.4798     1.5228 -10.822  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.385 on 544 degrees of freedom
Multiple R-squared:  0.2773,    Adjusted R-squared:  0.2707
F-statistic: 41.75 on 5 and 544 DF,  p-value: < 2.2e-16

Here, the parameters are identical to what we saw in Model03. But compare the ANOVA tables:

#### Model03 ANOVA Table:

anova(Model03)
Analysis of Variance Table

Response: cognition
Df Sum Sq Mean Sq F value    Pr(>F)
age85       1   1926  1926.2  21.871 3.683e-06 ***
grip9       1   3039  3039.2  34.508 7.398e-09 ***
sexMW       1   1609  1609.3  18.273 2.260e-05 ***
demNF       1   1496  1496.1  16.988 4.350e-05 ***
demNC       1  10315 10315.2 117.124 < 2.2e-16 ***
Residuals 544  47911    88.1
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#### Model04 ANOVA Table:

anova(Model04)
Analysis of Variance Table

Response: cognition
Df Sum Sq Mean Sq F value    Pr(>F)
age85              1   1926  1926.2  21.871 3.683e-06 ***
grip9              1   3039  3039.2  34.508 7.398e-09 ***
sexMW              1   1609  1609.3  18.273 2.260e-05 ***
factor(demgroup)   2  11811  5905.7  67.056 < 2.2e-16 ***
Residuals        544  47911    88.1
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

What is the difference?