library(dplyr)
library(ggplot2)
library(lme4)
library(nlme)
Here we present the implementation of mixed modelling with heterogenous variance components. Normally, we assume that the variance is equally distributed across stimuli, participants, timepoints… Sometimes, variance may systematically vary between conditions, and it might be useful to model this explicitly. One example might be measures taken in time after a certain event (eg, taking a drug) across a wide span: observations closer in time to the event might be more similar between each other, with variance dropping while the drug takes its effect. However, with time, the effect of the drug tend to disappear and variance may increase again. So heterogeneous variance is like an hypothesis and needs to be verified with model selection.
data <- sleepstudy
head(data)
## Reaction Days Subject lunch
## 1 249.5600 0 308 0
## 2 258.7047 1 308 0
## 3 250.8006 2 308 1
## 4 321.4398 3 308 0
## 5 356.8519 4 308 0
## 6 414.6901 5 308 1
set.seed(88)
sg <- data.frame()
sb <- unique(data$Subject)
for (i in 1:length(sb)) {
g = sample(LETTERS[c(6,13)], size = 1)
sg = rbind(sg, cbind(Subject=paste(sb[i]), G=g))
}
head(sg)
## Subject G
## 1 308 F
## 2 309 F
## 3 310 M
## 4 330 F
## 5 331 F
## 6 332 M
data_2 <- inner_join(data, sg, by="Subject")
head(data_2)
## Reaction Days Subject lunch G
## 1 249.5600 0 308 0 F
## 2 258.7047 1 308 0 F
## 3 250.8006 2 308 1 F
## 4 321.4398 3 308 0 F
## 5 356.8519 4 308 0 F
## 6 414.6901 5 308 1 F
Is variance different in the two groups?
data_2 %>%
ggplot(aes(y=Reaction, x=Days, group=Subject)) +
geom_path() +
labs(caption="No evidence of heterogeneous variance")
data_2 %>%
ggplot(aes(y=Reaction, x=Days, group=Subject, color=G)) +
geom_path() +
labs(caption="Or maybe yes?")
it looks like group F have more outliers but more consistent variation, while group M have a slightly bigger variance (more spread out).
This model assumes common variance between and within person, irrespective of the variable G. This is the default model fit by lme with the standard specification of the ranom effect with 1 grouping variable:
# Common models for between and within person variance
model.01 = lme(fixed = Reaction ~ 1,
random = ~ 1 | Subject,
data = data_2,
method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
## Data: data_2
## AIC BIC logLik
## 1910.327 1919.889 -952.1633
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 35.75385 44.25907
##
## Fixed effects: Reaction ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 298.5079 9.049936 162 32.98453 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.4983313 -0.5501348 -0.1475698 0.5122894 3.3445880
##
## Number of Observations: 180
## Number of Groups: 18
VarCorr(model.01)[1]
## [1] "1278.338"
1278.338 is the common variance across the 2 groups attributable to between-persons differences. This, summed to the residual variance 1958.865, gives the total variance.
By extracting the Variance-Covariance Matrix, we may visualise the total variance on the diagonal, and the common covariance off-diagonal.
getVarCov(model.01,type="marginal")
## Subject 308
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 2 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 4 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3
## 5 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3
## 6 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3
## 7 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3
## 8 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3
## 9 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2
## 10 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 10
## 1 1278.3
## 2 1278.3
## 3 1278.3
## 4 1278.3
## 5 1278.3
## 6 1278.3
## 7 1278.3
## 8 1278.3
## 9 1278.3
## 10 3237.2
## Standard Deviations: 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896
The Variance-Covariance structure is the same for all subjects:
getVarCov(model.01,type="marginal",individuals = 15)
## Subject 369
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 2 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 4 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3
## 5 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3
## 6 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3
## 7 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3
## 8 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3
## 9 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2
## 10 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 10
## 1 1278.3
## 2 1278.3
## 3 1278.3
## 4 1278.3
## 5 1278.3
## 6 1278.3
## 7 1278.3
## 8 1278.3
## 9 1278.3
## 10 3237.2
## Standard Deviations: 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896
The variance attributed to subject is a measure of the intra-class correlation and can be converted to a correlation coefficient when divided by the total variance (in this case, ICC = 1959.865)
This kind of variance structure is also termed “compound symmetry”. The model assigns this structure by default, however, this can be explicitly specified in the random effect:
model.02 = lme(fixed = Reaction ~ 1,
random = list(Subject = pdSymm(form = ~ 1)),
data = data_2,
method = 'REML')
This gives exactly the same result with between-subject Variance = rVarCorr(model.01)[1]
. pdSymm creates a positive-definite Symmetric matrix:
pdSymm(diag(rep(x = 1, times = 3)), nam = c("A","B","C"))
## Positive definite matrix structure of class pdSymm representing
## A B C
## A 1 0 0
## B 0 1 0
## C 0 0 1
We specify a positive-definite diagonal matrix, meaning that variance changes with group between subjects, but it is common within subject (which makes sense since these people are either males or females but not both…)
model.01b = lme(fixed = Reaction ~ 1,
random = list(Subject = pdDiag(form = ~ factor(G))),
data = data_2,
method = 'REML')
summary(model.01b)
## Linear mixed-effects model fit by REML
## Data: data_2
## AIC BIC logLik
## 1912.203 1924.953 -952.1016
##
## Random effects:
## Formula: ~factor(G) | Subject
## Structure: Diagonal
## (Intercept) factor(G)M Residual
## StdDev: 33.69998 19.09395 44.25907
##
## Fixed effects: Reaction ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 297.9522 8.984658 162 33.16233 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.4839689 -0.5531604 -0.1335073 0.5225600 3.3425834
##
## Number of Observations: 180
## Number of Groups: 18
Here we have a differential term of Variance for G = M (the other level is taken as threshold).
VarCorr(model.01b)
## Subject = pdDiag(factor(G))
## Variance StdDev
## (Intercept) 1135.6884 33.69998
## factor(G)M 364.5791 19.09395
## Residual 1958.8652 44.25907
Let’s check the Variance-Covariance Matrix of 2 subjects pertaining to different groups; here are subject 308 (F), and 310 (M):
getVarCov(model.01b, type="marginal", individuals = c(1,3))
## Subject 308
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 2 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 3 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 4 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7
## 5 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7
## 6 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7
## 7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7
## 8 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7
## 9 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6
## 10 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 10
## 1 1135.7
## 2 1135.7
## 3 1135.7
## 4 1135.7
## 5 1135.7
## 6 1135.7
## 7 1135.7
## 8 1135.7
## 9 1135.7
## 10 3094.6
## Standard Deviations: 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629
## Subject 310
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 2 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 4 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3
## 5 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3
## 6 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3
## 7 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3
## 8 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3
## 9 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1
## 10 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 10
## 1 1500.3
## 2 1500.3
## 3 1500.3
## 4 1500.3
## 5 1500.3
## 6 1500.3
## 7 1500.3
## 8 1500.3
## 9 1500.3
## 10 3459.1
## Standard Deviations: 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814
The diagonal total variance and off-diagonal covariance change with G by a factor of 364.5791.
Since this is a longitudinal design, variances may also change within subject. This may happen, for example, when observations that are closer in time are more similar to each other than observations occurring at a later time point. In this case, we capitalise on the between and within subject variability across multiple time points, and we assign a weight depending on the time of observation:
model.01c = lme(fixed = Reaction ~ 1,
random = list(Subject = pdSymm(form = ~ 1)),
weights = varIdent(form = ~ 1 | factor(Days)),
data = data_2,
method = 'REML')
summary(model.01c)
## Linear mixed-effects model fit by REML
## Data: data_2
## AIC BIC logLik
## 1872.486 1910.735 -924.243
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 36.34266 44.26675
##
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | factor(Days)
## Parameter estimates:
## 0 1 2 3 4 5
## 1.0000000 0.7430690 0.6639201 0.2702922 0.3127369 0.8492301
## 6 7 8 9
## 1.1462055 1.0878625 1.4884096 1.8360810
## Fixed effects: Reaction ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 285.7469 8.758025 162 32.62687 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.6435227 -0.4410395 0.1204819 0.7318349 2.9048085
##
## Number of Observations: 180
## Number of Groups: 18
Between-subject variances:
VarCorr(model.01c)
## Subject = pdSymm(1)
## Variance StdDev
## (Intercept) 1320.789 36.34266
## Residual 1959.546 44.26675
1320.789 is the between-subject variation. However, there is also a variation within the same subject now, on the diagonal:
getVarCov(model.01c, type="marginal", individuals = c(1,3))
## Subject 308
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3280.3 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 2 1320.8 2402.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 3 1320.8 1320.8 2184.5 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 4 1320.8 1320.8 1320.8 1463.9 1320.8 1320.8 1320.8 1320.8 1320.8
## 5 1320.8 1320.8 1320.8 1320.8 1512.4 1320.8 1320.8 1320.8 1320.8
## 6 1320.8 1320.8 1320.8 1320.8 1320.8 2734.0 1320.8 1320.8 1320.8
## 7 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3895.2 1320.8 1320.8
## 8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3639.8 1320.8
## 9 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 5661.9
## 10 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 10
## 1 1320.8
## 2 1320.8
## 3 1320.8
## 4 1320.8
## 5 1320.8
## 6 1320.8
## 7 1320.8
## 8 1320.8
## 9 1320.8
## 10 7926.8
## Standard Deviations: 57.274 49.018 46.739 38.262 38.89 52.288 62.412 60.331 75.246 89.033
## Subject 310
## Marginal variance covariance matrix
## 1 2 3 4 5 6 7 8 9
## 1 3280.3 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 2 1320.8 2402.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 3 1320.8 1320.8 2184.5 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 4 1320.8 1320.8 1320.8 1463.9 1320.8 1320.8 1320.8 1320.8 1320.8
## 5 1320.8 1320.8 1320.8 1320.8 1512.4 1320.8 1320.8 1320.8 1320.8
## 6 1320.8 1320.8 1320.8 1320.8 1320.8 2734.0 1320.8 1320.8 1320.8
## 7 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3895.2 1320.8 1320.8
## 8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3639.8 1320.8
## 9 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 5661.9
## 10 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 10
## 1 1320.8
## 2 1320.8
## 3 1320.8
## 4 1320.8
## 5 1320.8
## 6 1320.8
## 7 1320.8
## 8 1320.8
## 9 1320.8
## 10 7926.8
## Standard Deviations: 57.274 49.018 46.739 38.262 38.89 52.288 62.412 60.331 75.246 89.033
And within-subject variances by day (a bit difficult to extract…):
wss.v <- function(model) {
res.std = summary(model)$sigma #to re-scale the values
wts = coef(model$modelStruct$varStruct, unconstrained=FALSE)
res.v.g = (res.std*wts)^2
res.v = data.frame(res.v.g)
colnames(res.v) = "wss"
return(res.v)
}
wss.v(model.01c)
## wss
## 1 1081.9662
## 2 863.7479
## 3 143.1602
## 4 191.6521
## 5 1413.2083
## 6 2574.4255
## 7 2319.0142
## 8 4341.1048
## 9 6606.0071
Variances decrease until the 4th day, then increases again (probably related to correlation between observations close in time).