library(dplyr)
library(ggplot2)
library(lme4)
library(nlme)

Here we present the implementation of mixed modelling with heterogenous variance components. Normally, we assume that the variance is equally distributed across stimuli, participants, timepoints… Sometimes, variance may systematically vary between conditions, and it might be useful to model this explicitly. One example might be measures taken in time after a certain event (eg, taking a drug) across a wide span: observations closer in time to the event might be more similar between each other, with variance dropping while the drug takes its effect. However, with time, the effect of the drug tend to disappear and variance may increase again. So heterogeneous variance is like an hypothesis and needs to be verified with model selection.

Example data

data <- sleepstudy
head(data)
##   Reaction Days Subject lunch
## 1 249.5600    0     308     0
## 2 258.7047    1     308     0
## 3 250.8006    2     308     1
## 4 321.4398    3     308     0
## 5 356.8519    4     308     0
## 6 414.6901    5     308     1

Add a categorical variable

set.seed(88)

sg <- data.frame()
sb <- unique(data$Subject)

for (i in 1:length(sb)) {
  g = sample(LETTERS[c(6,13)], size = 1)
  sg = rbind(sg, cbind(Subject=paste(sb[i]), G=g))
}

head(sg)
##   Subject G
## 1     308 F
## 2     309 F
## 3     310 M
## 4     330 F
## 5     331 F
## 6     332 M
data_2 <- inner_join(data, sg, by="Subject")
head(data_2)
##   Reaction Days Subject lunch G
## 1 249.5600    0     308     0 F
## 2 258.7047    1     308     0 F
## 3 250.8006    2     308     1 F
## 4 321.4398    3     308     0 F
## 5 356.8519    4     308     0 F
## 6 414.6901    5     308     1 F

Is variance different in the two groups?

data_2 %>%
  ggplot(aes(y=Reaction, x=Days, group=Subject)) +
  geom_path() +
  labs(caption="No evidence of heterogeneous variance")

data_2 %>%
  ggplot(aes(y=Reaction, x=Days, group=Subject, color=G)) +
  geom_path() +
  labs(caption="Or maybe yes?")

it looks like group F have more outliers but more consistent variation, while group M have a slightly bigger variance (more spread out).

Variance homogeneity model

This model assumes common variance between and within person, irrespective of the variable G. This is the default model fit by lme with the standard specification of the ranom effect with 1 grouping variable:

# Common models for between and within person variance
model.01 = lme(fixed = Reaction ~ 1,  
               random = ~ 1 | Subject,
                       data = data_2,
                   method = 'REML')
summary(model.01)
## Linear mixed-effects model fit by REML
##   Data: data_2 
##        AIC      BIC    logLik
##   1910.327 1919.889 -952.1633
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:    35.75385 44.25907
## 
## Fixed effects:  Reaction ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 298.5079  9.049936 162 32.98453       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4983313 -0.5501348 -0.1475698  0.5122894  3.3445880 
## 
## Number of Observations: 180
## Number of Groups: 18
VarCorr(model.01)[1]
## [1] "1278.338"

1278.338 is the common variance across the 2 groups attributable to between-persons differences. This, summed to the residual variance 1958.865, gives the total variance.

By extracting the Variance-Covariance Matrix, we may visualise the total variance on the diagonal, and the common covariance off-diagonal.

getVarCov(model.01,type="marginal") 
## Subject 308 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 2  1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 3  1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 4  1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3
## 5  1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3
## 6  1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3
## 7  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3
## 8  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3
## 9  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2
## 10 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
##        10
## 1  1278.3
## 2  1278.3
## 3  1278.3
## 4  1278.3
## 5  1278.3
## 6  1278.3
## 7  1278.3
## 8  1278.3
## 9  1278.3
## 10 3237.2
##   Standard Deviations: 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896

The Variance-Covariance structure is the same for all subjects:

getVarCov(model.01,type="marginal",individuals = 15) 
## Subject 369 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 2  1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 3  1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
## 4  1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3 1278.3
## 5  1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3 1278.3
## 6  1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3 1278.3
## 7  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3 1278.3
## 8  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2 1278.3
## 9  1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 3237.2
## 10 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3 1278.3
##        10
## 1  1278.3
## 2  1278.3
## 3  1278.3
## 4  1278.3
## 5  1278.3
## 6  1278.3
## 7  1278.3
## 8  1278.3
## 9  1278.3
## 10 3237.2
##   Standard Deviations: 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896 56.896

The variance attributed to subject is a measure of the intra-class correlation and can be converted to a correlation coefficient when divided by the total variance (in this case, ICC = 1959.865)

This kind of variance structure is also termed “compound symmetry”. The model assigns this structure by default, however, this can be explicitly specified in the random effect:

model.02 = lme(fixed = Reaction ~ 1,  
               random = list(Subject = pdSymm(form = ~ 1)),
                       data = data_2,
                   method = 'REML')

This gives exactly the same result with between-subject Variance = rVarCorr(model.01)[1]. pdSymm creates a positive-definite Symmetric matrix:

pdSymm(diag(rep(x = 1, times = 3)), nam = c("A","B","C"))
## Positive definite matrix structure of class pdSymm representing
##   A B C
## A 1 0 0
## B 0 1 0
## C 0 0 1

Variance heterogeneity model in between-subject residuals

We specify a positive-definite diagonal matrix, meaning that variance changes with group between subjects, but it is common within subject (which makes sense since these people are either males or females but not both…)

model.01b = lme(fixed = Reaction ~ 1,  
                random = list(Subject = pdDiag(form = ~ factor(G))),
                        data = data_2,
                method = 'REML')
summary(model.01b)
## Linear mixed-effects model fit by REML
##   Data: data_2 
##        AIC      BIC    logLik
##   1912.203 1924.953 -952.1016
## 
## Random effects:
##  Formula: ~factor(G) | Subject
##  Structure: Diagonal
##         (Intercept) factor(G)M Residual
## StdDev:    33.69998   19.09395 44.25907
## 
## Fixed effects:  Reaction ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 297.9522  8.984658 162 33.16233       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4839689 -0.5531604 -0.1335073  0.5225600  3.3425834 
## 
## Number of Observations: 180
## Number of Groups: 18

Here we have a differential term of Variance for G = M (the other level is taken as threshold).

VarCorr(model.01b)
## Subject = pdDiag(factor(G)) 
##             Variance  StdDev  
## (Intercept) 1135.6884 33.69998
## factor(G)M   364.5791 19.09395
## Residual    1958.8652 44.25907

Let’s check the Variance-Covariance Matrix of 2 subjects pertaining to different groups; here are subject 308 (F), and 310 (M):

getVarCov(model.01b, type="marginal", individuals = c(1,3)) 
## Subject 308 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 2  1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 3  1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
## 4  1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7 1135.7
## 5  1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7 1135.7
## 6  1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7 1135.7
## 7  1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7 1135.7
## 8  1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6 1135.7
## 9  1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 3094.6
## 10 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7 1135.7
##        10
## 1  1135.7
## 2  1135.7
## 3  1135.7
## 4  1135.7
## 5  1135.7
## 6  1135.7
## 7  1135.7
## 8  1135.7
## 9  1135.7
## 10 3094.6
##   Standard Deviations: 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629 55.629 
## Subject 310 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 2  1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 3  1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
## 4  1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3 1500.3
## 5  1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3 1500.3
## 6  1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3 1500.3
## 7  1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3 1500.3
## 8  1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1 1500.3
## 9  1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 3459.1
## 10 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3 1500.3
##        10
## 1  1500.3
## 2  1500.3
## 3  1500.3
## 4  1500.3
## 5  1500.3
## 6  1500.3
## 7  1500.3
## 8  1500.3
## 9  1500.3
## 10 3459.1
##   Standard Deviations: 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814 58.814

The diagonal total variance and off-diagonal covariance change with G by a factor of 364.5791.

Variance heterogeneity model in between- and within-subject residuals

Since this is a longitudinal design, variances may also change within subject. This may happen, for example, when observations that are closer in time are more similar to each other than observations occurring at a later time point. In this case, we capitalise on the between and within subject variability across multiple time points, and we assign a weight depending on the time of observation:

model.01c = lme(fixed = Reaction ~ 1,  
                random = list(Subject = pdSymm(form = ~ 1)),
                weights = varIdent(form = ~ 1 | factor(Days)),
                        data = data_2,
                    method = 'REML')
summary(model.01c)
## Linear mixed-effects model fit by REML
##   Data: data_2 
##        AIC      BIC   logLik
##   1872.486 1910.735 -924.243
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:    36.34266 44.26675
## 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | factor(Days) 
##  Parameter estimates:
##         0         1         2         3         4         5 
## 1.0000000 0.7430690 0.6639201 0.2702922 0.3127369 0.8492301 
##         6         7         8         9 
## 1.1462055 1.0878625 1.4884096 1.8360810 
## Fixed effects:  Reaction ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 285.7469  8.758025 162 32.62687       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6435227 -0.4410395  0.1204819  0.7318349  2.9048085 
## 
## Number of Observations: 180
## Number of Groups: 18

Between-subject variances:

VarCorr(model.01c)
## Subject = pdSymm(1) 
##             Variance StdDev  
## (Intercept) 1320.789 36.34266
## Residual    1959.546 44.26675

1320.789 is the between-subject variation. However, there is also a variation within the same subject now, on the diagonal:

getVarCov(model.01c, type="marginal", individuals = c(1,3)) 
## Subject 308 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3280.3 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 2  1320.8 2402.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 3  1320.8 1320.8 2184.5 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 4  1320.8 1320.8 1320.8 1463.9 1320.8 1320.8 1320.8 1320.8 1320.8
## 5  1320.8 1320.8 1320.8 1320.8 1512.4 1320.8 1320.8 1320.8 1320.8
## 6  1320.8 1320.8 1320.8 1320.8 1320.8 2734.0 1320.8 1320.8 1320.8
## 7  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3895.2 1320.8 1320.8
## 8  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3639.8 1320.8
## 9  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 5661.9
## 10 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
##        10
## 1  1320.8
## 2  1320.8
## 3  1320.8
## 4  1320.8
## 5  1320.8
## 6  1320.8
## 7  1320.8
## 8  1320.8
## 9  1320.8
## 10 7926.8
##   Standard Deviations: 57.274 49.018 46.739 38.262 38.89 52.288 62.412 60.331 75.246 89.033 
## Subject 310 
## Marginal variance covariance matrix
##         1      2      3      4      5      6      7      8      9
## 1  3280.3 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 2  1320.8 2402.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 3  1320.8 1320.8 2184.5 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
## 4  1320.8 1320.8 1320.8 1463.9 1320.8 1320.8 1320.8 1320.8 1320.8
## 5  1320.8 1320.8 1320.8 1320.8 1512.4 1320.8 1320.8 1320.8 1320.8
## 6  1320.8 1320.8 1320.8 1320.8 1320.8 2734.0 1320.8 1320.8 1320.8
## 7  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3895.2 1320.8 1320.8
## 8  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 3639.8 1320.8
## 9  1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 5661.9
## 10 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8 1320.8
##        10
## 1  1320.8
## 2  1320.8
## 3  1320.8
## 4  1320.8
## 5  1320.8
## 6  1320.8
## 7  1320.8
## 8  1320.8
## 9  1320.8
## 10 7926.8
##   Standard Deviations: 57.274 49.018 46.739 38.262 38.89 52.288 62.412 60.331 75.246 89.033

And within-subject variances by day (a bit difficult to extract…):

wss.v <- function(model) {
  res.std = summary(model)$sigma #to re-scale the values
  wts = coef(model$modelStruct$varStruct, unconstrained=FALSE)
  res.v.g = (res.std*wts)^2
  res.v = data.frame(res.v.g)
  colnames(res.v) = "wss"
  return(res.v)
}

wss.v(model.01c)
##         wss
## 1 1081.9662
## 2  863.7479
## 3  143.1602
## 4  191.6521
## 5 1413.2083
## 6 2574.4255
## 7 2319.0142
## 8 4341.1048
## 9 6606.0071

Variances decrease until the 4th day, then increases again (probably related to correlation between observations close in time).