5.1 Variably spaced measurement occasions

In Section 5.1 Singer and Willett (2003) demonstrate how you can fit the multilevel model for change for data with variably spaced measurement occasions using a subset of data from the Children of the National Longitudinal Study of Youth (US Bureau of Labor Statistics), which measured changes in scores on the reading subtest of the Peabody Individual Achievement Test (PIAT) in a sample of 89 African-American children across three waves around the ages of 6, 8, and 10.

For this example we use the reading_scores data set, a person-period data frame with 267 rows and 5 columns:

  • id: Child ID.
  • wave: Wave of measurement.
  • age_group: Expected age on each measurement occasion.
  • age: Age in years at time of measurement.
  • reading_score: Reading score on the reading subtest of the Peabody Individual Achievement Test (PIAT).
# Table 5.1, page 141:
reading_scores
#> # A tibble: 267 × 5
#>    id     wave age_group   age reading_score
#>    <fct> <dbl>     <dbl> <dbl>         <dbl>
#>  1 1         1       6.5  6               18
#>  2 1         2       8.5  8.33            35
#>  3 1         3      10.5 10.3             59
#>  4 2         1       6.5  6               18
#>  5 2         2       8.5  8.5             25
#>  6 2         3      10.5 10.6             28
#>  7 3         1       6.5  6.08            18
#>  8 3         2       8.5  8.42            23
#>  9 3         3      10.5 10.4             32
#> 10 4         1       6.5  6               18
#> # ℹ 257 more rows

Note that the structure of the reading_scores data is identical to the person-period data sets shown in previous chapters, except that it has three time-indicator variables:

  • The values of wave reflect the study’s design; they are time-structured across children, but have little substantive meaning.
  • The values of age_group reflect the child’s expected age on each measurement occasion; they are time-structured across children and have substantive meaning.
  • The values of age reflect the child’s actual age on each measurement occasion; they are variably spaced across children and have substantive meaning.

This demonstrates a distinctive feature of time-unstructured data sets—the possibility of having multiple representations of time. Thus, from the perspective of the age_group variable, the reading_scores data appears to be time-structured:

select(reading_scores, id, age_group, reading_score)
#> # A tibble: 267 × 3
#>    id    age_group reading_score
#>    <fct>     <dbl>         <dbl>
#>  1 1           6.5            18
#>  2 1           8.5            35
#>  3 1          10.5            59
#>  4 2           6.5            18
#>  5 2           8.5            25
#>  6 2          10.5            28
#>  7 3           6.5            18
#>  8 3           8.5            23
#>  9 3          10.5            32
#> 10 4           6.5            18
#> # ℹ 257 more rows

Whereas from the perspective of the age variable, the reading_scores data appears to be variably spaced:

select(reading_scores, id, age, reading_score)
#> # A tibble: 267 × 3
#>    id      age reading_score
#>    <fct> <dbl>         <dbl>
#>  1 1      6               18
#>  2 1      8.33            35
#>  3 1     10.3             59
#>  4 2      6               18
#>  5 2      8.5             25
#>  6 2     10.6             28
#>  7 3      6.08            18
#>  8 3      8.42            23
#>  9 3     10.4             32
#> 10 4      6               18
#> # ℹ 257 more rows

However, as Singer and Willett (2003) discuss, the specification, estimation, and interpretation of the multilevel model for change proceed in exactly the same way regardless of which temporal representation we use; thus, it is generally preferable to use the most accurate, unstructured temporal representation rather than forcing the data into a time-structured design.

Here we will fit an unconditional growth model using both the structured and unstructured temporal representations to demonstrate why the latter is generally preferable. As usual, we begin by inspecting empirical growth plots to help select the functional form for the level-1 submodel.

# Figure 5.1, page 143:
reading_scores |>
  filter(id %in% c(4, 27, 31, 33, 41, 49, 69, 77, 87)) |>
  pivot_longer(
    starts_with("age"), names_to = "time_indicator", values_to = "age"
  ) |>
  ggplot(aes(x = age, y = reading_score, colour = time_indicator)) +
    geom_point() +
    stat_smooth(method = "lm", se = FALSE, linewidth = .5) +
    scale_x_continuous(breaks = 5:12) +
    scale_color_brewer(palette = "Dark2") +
    coord_cartesian(xlim = c(5, 12), ylim = c(0, 80)) +
    facet_wrap(vars(id), labeller = label_both)

A linear individual growth model seems most parsimonious for both temporal representations. Following Singer and Willett (2003), we will centre both age_group and age on age 6.5 (the expected age at wave 1) so that the parameters of both models have identical interpretations, and label time in each model with a generic time variable.

reading_scores_fits <- map(
  list(age_group = "age_group", age = "age"),
  \(.time) {
    lmer(
      reading_score ~ I(time - 6.5) + (1 + I(time - 6.5) | id),
      data = mutate(reading_scores, time = .data[[.time]]),
      REML = FALSE
    )
  }
)

options(modelsummary_get = "all")

# Table 5.2, page 145:
reading_scores_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(time - 6.5)",
      "var__Observation",
      "var__(Intercept)",
      "var__I(time - 6.5)"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 8:10) |>
  tab_row_group(label = "Variance Components", rows = 5:7) |>
  tab_row_group(label = "Fixed Effects", rows = 1:4) |>
  cols_hide(effect)
                          age_group       age
Fixed Effects
  (Intercept)                21.163    21.061
                            (0.614)   (0.559)
  I(time - 6.5)               5.031     4.540
                            (0.296)   (0.261)
Variance Components
  var__Observation           27.043    27.447
  var__(Intercept)           11.046     5.107
  var__I(time - 6.5)          4.397     3.301
Goodness-of-Fit
  Deviance                   1819.9    1803.9
  AIC                        1831.9    1815.9
  BIC                        1853.5    1837.4

Comparing these models, we see that the age model fits the data better than the age_group model—with less unexplained variation in initial status and rates of change, and smaller AIC and BIC statistics.
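
The same comparison can be made directly from the fitted models (a quick check, not shown in the original text). Note that the two models are not nested, so we compare information criteria rather than performing a likelihood ratio test.

# Smaller values favour the model that uses each child's actual age.
map_dbl(reading_scores_fits, AIC)
map_dbl(reading_scores_fits, BIC)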

5.2 Varying numbers of measurement occasions

In Section 5.2 Singer and Willett (2003) demonstrate how you can fit the multilevel model for change for data with varying numbers of measurement occasions (i.e., unbalanced data) using a subset of data from the National Longitudinal Study of Youth tracking the labour market experiences of male high school dropouts (Murnane, Boudett, & Willett, 1999).

For this example we use the dropout_wages data set, a person-period data frame with 6402 rows and 9 columns:

  • id: Participant ID.
  • log_wages: Natural logarithm of wages.
  • experience: Labour force experience in years, tracked from dropouts’ first day of work.
  • ged: Binary indicator for whether the dropout obtained a GED.
  • postsecondary_education: Binary indicator for whether the dropout obtained post-secondary education.
  • black: Binary indicator for whether the dropout is black.
  • hispanic: Binary indicator for whether the dropout is hispanic.
  • highest_grade: Highest grade completed.
  • unemployment_rate: Unemployment rate in the local geographic area.
dropout_wages
#> # A tibble: 6,402 × 9
#>    id    log_wages experience   ged postsecondary_education black hispanic
#>    <fct>     <dbl>      <dbl> <dbl>                   <dbl> <dbl>    <dbl>
#>  1 31         1.49      0.015     1                   0.015     0        1
#>  2 31         1.43      0.715     1                   0.715     0        1
#>  3 31         1.47      1.73      1                   1.73      0        1
#>  4 31         1.75      2.77      1                   2.77      0        1
#>  5 31         1.93      3.93      1                   3.93      0        1
#>  6 31         1.71      4.95      1                   4.95      0        1
#>  7 31         2.09      5.96      1                   5.96      0        1
#>  8 31         2.13      6.98      1                   6.98      0        1
#>  9 36         1.98      0.315     1                   0.315     0        0
#> 10 36         1.80      0.983     1                   0.983     0        0
#> # ℹ 6,392 more rows
#> # ℹ 2 more variables: highest_grade <dbl>, unemployment_rate <dbl>

In the dropout_wages data, the number of measurement occasions varies widely across individuals, from 1 to 13 waves.

dropout_wages |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count")
#> # A tibble: 13 × 2
#>    waves count
#>    <int> <int>
#>  1     1    38
#>  2     2    39
#>  3     3    47
#>  4     4    35
#>  5     5    74
#>  6     6    92
#>  7     7   103
#>  8     8   123
#>  9     9   127
#> 10    10   113
#> 11    11    65
#> 12    12    26
#> 13    13     6

Indeed, examining the data from a subset of individuals, we can see that the dropout_wages data varies in both the number and spacing of measurement occasions.

# Table 5.3, page 147:
dropout_wages |>
  filter(id %in% c(206, 332, 1028)) |>
  select(id, experience, log_wages, black, highest_grade, unemployment_rate)
#> # A tibble: 20 × 6
#>    id    experience log_wages black highest_grade unemployment_rate
#>    <fct>      <dbl>     <dbl> <dbl>         <dbl>             <dbl>
#>  1 206        1.87      2.03      0            10              9.2 
#>  2 206        2.81      2.30      0            10             11   
#>  3 206        4.31      2.48      0            10              6.30
#>  4 332        0.125     1.63      0             8              7.1 
#>  5 332        1.62      1.48      0             8              9.6 
#>  6 332        2.41      1.80      0             8              7.2 
#>  7 332        3.39      1.44      0             8              6.20
#>  8 332        4.47      1.75      0             8              5.60
#>  9 332        5.18      1.53      0             8              4.60
#> 10 332        6.08      2.04      0             8              4.30
#> 11 332        7.04      2.18      0             8              3.40
#> 12 332        8.20      2.19      0             8              4.39
#> 13 332        9.09      4.04      0             8              6.70
#> 14 1028       0.004     0.872     1             8              9.3 
#> 15 1028       0.035     0.903     1             8              7.4 
#> 16 1028       0.515     1.39      1             8              7.3 
#> 17 1028       1.48      2.32      1             8              7.4 
#> 18 1028       2.14      1.48      1             8              6.30
#> 19 1028       3.16      1.70      1             8              5.90
#> 20 1028       4.10      2.34      1             8              6.9

Yet, as Singer and Willett (2003) discuss, a major advantage of the multilevel model for change is that it can easily be fit to unbalanced data like this—as long as the person-period data set includes enough people with enough waves of data for the model to converge, the analyses can proceed as usual.

Here we will fit three models to the dropout_wages data: an unconditional growth model (Model A), and two models that include predictors for race and the highest grade completed (Models B and C).

# Fit models ------------------------------------------------------------------
dropout_wages_fit_A <- lmer(
  log_wages ~ experience + (1 + experience | id),
  data = dropout_wages,
  REML = FALSE
)

dropout_wages_fit_B <- update(
  dropout_wages_fit_A,
  . ~ . + experience * I(highest_grade - 9) + experience * black
)

# The model fails to converge with the default optimizer (although the estimates
# are fine). Changing the optimizer achieves convergence.
dropout_wages_fit_C <- update(
  dropout_wages_fit_B,
  . ~ . - experience:I(highest_grade - 9) - black,
  control = lmerControl(optimizer = "bobyqa")
)

dropout_wages_fits <- list(
  `Model A` = dropout_wages_fit_A,
  `Model B` = dropout_wages_fit_B,
  `Model C` = dropout_wages_fit_C
)

# Make table ------------------------------------------------------------------

# Table 5.4, page 149:
dropout_wages_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 15:18) |>
  tab_row_group(label = "Variance Components", rows = 13:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)
                                     Model A    Model B    Model C
Fixed Effects
  (Intercept)                          1.716      1.717      1.721
                                      (0.011)    (0.013)    (0.011)
  I(highest_grade - 9)                            0.035      0.038
                                                 (0.008)    (0.006)
  black                                           0.015
                                                 (0.024)
  experience                           0.046      0.049      0.049
                                      (0.002)    (0.003)    (0.003)
  experience:I(highest_grade - 9)                 0.001
                                                 (0.002)
  experience:black                                -0.018     -0.016
                                                 (0.005)    (0.005)
Variance Components
  var__Observation                     0.095      0.095      0.095
  var__(Intercept)                     0.054      0.052      0.052
  var__experience                      0.002      0.002      0.002
Goodness-of-Fit
  Deviance                            4921.4     4873.8     4874.7
  AIC                                 4933.4     4893.8     4890.7
  BIC                                 4974.0     4961.4     4944.8

Likewise, even for data with varying numbers of measurement occasions, prototypical change trajectories can be derived from the model as usual.

prototypical_dropout_wages <- dropout_wages_fit_C |>
  estimate_prediction(
    data = crossing(
      experience = c(0, 12),
      highest_grade = c(0, 3) + 9,
      black = c(FALSE, TRUE)
    )
  ) |>
  rename(log_wages = Predicted) |>
  mutate(highest_grade = factor(highest_grade)) |>
  as_tibble()

# Figure 5.2, page 150:
ggplot(prototypical_dropout_wages, aes(x = experience, y = log_wages)) +
  geom_line(aes(colour = highest_grade, linetype = black)) +
  scale_x_continuous(breaks = seq(0, 12, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  scale_linetype_manual(values = c(2, 1)) +
  coord_cartesian(ylim = c(1.6, 2.4))

5.2.2 Practical problems that may arise when analyzing unbalanced data sets

The multilevel model may fail to converge, or be unable to estimate one or more variance components, when a data set is severely unbalanced or too few people have enough waves of data. In Section 5.2.2 Singer and Willett (2003) discuss two strategies for addressing these problems:

  • Removing boundary constraints, where the software is permitted to estimate negative variance components.
  • Fixing rates of change, where the model is simplified by removing the varying slope for change.

For this example we use a subset of the dropout_wages data purposefully constructed to be severely unbalanced.

dropout_wages_subset
#> # A tibble: 257 × 5
#>    id    log_wages experience black highest_grade
#>    <fct>     <dbl>      <dbl> <dbl>         <dbl>
#>  1 206        2.03      1.87      0            10
#>  2 206        2.30      2.81      0            10
#>  3 206        2.48      4.31      0            10
#>  4 266        1.81      0.322     0             9
#>  5 304        1.84      0.58      0             8
#>  6 329        1.42      0.016     0             8
#>  7 329        1.31      0.716     0             8
#>  8 329        1.88      1.76      0             8
#>  9 336        1.89      1.91      1             8
#> 10 336        1.28      2.51      1             8
#> # ℹ 247 more rows

First we will refit Model C to the dropout_wages_subset data (this refit corresponds to Model A in Table 5.5 below). Note that the estimated variance component for experience is practically zero, and the model summary includes the following message at the bottom: “boundary (singular) fit: see help(‘isSingular’)”.

dropout_wages_fit_A_subset <- update(
  dropout_wages_fit_C,
  data = dropout_wages_subset
)

summary(dropout_wages_fit_A_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: log_wages ~ experience + (1 + experience | id) + I(highest_grade -  
#>     9) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    299.9    328.3   -141.9    283.9      249 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -2.4109 -0.4754 -0.0290  0.4243  4.2842 
#> 
#> Random effects:
#>  Groups   Name        Variance  Std.Dev. Corr
#>  id       (Intercept) 8.215e-02 0.286615     
#>           experience  3.526e-06 0.001878 1.00
#>  Residual             1.150e-01 0.339068     
#> Number of obs: 257, groups:  id, 124
#> 
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04760  36.499
#> experience            0.05161    0.02108   2.449
#> I(highest_grade - 9)  0.04610    0.02447   1.884
#> experience:black     -0.05968    0.03477  -1.716
#> 
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.612              
#> I(hghst_-9)  0.051 -0.133       
#> exprnc:blck -0.129 -0.297  0.023
#> optimizer (bobyqa) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

The first strategy Singer and Willett (2003) suggest is to remove the software's boundary constraints; however, the lme4 package does not support removing boundary constraints to allow for negative variance components, so this strategy cannot be replicated here (Model B).

The second strategy is to simplify the model by fixing rates of change, which we do by removing the varying slope for experience. Here the model fits without issue.

dropout_wages_fit_C_subset <- update(
  dropout_wages_fit_A_subset,
  . ~ . - (1 + experience | id) + (1 | id)
)

summary(dropout_wages_fit_C_subset)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: 
#> log_wages ~ experience + I(highest_grade - 9) + (1 | id) + experience:black
#>    Data: dropout_wages_subset
#> Control: lmerControl(optimizer = "bobyqa")
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    295.9    317.2   -141.9    283.9      251 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -2.4202 -0.4722 -0.0290  0.4197  4.2439 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  id       (Intercept) 0.08425  0.2903  
#>  Residual             0.11480  0.3388  
#> Number of obs: 257, groups:  id, 124
#> 
#> Fixed effects:
#>                      Estimate Std. Error t value
#> (Intercept)           1.73734    0.04775  36.383
#> experience            0.05178    0.02093   2.474
#> I(highest_grade - 9)  0.04576    0.02450   1.868
#> experience:black     -0.06007    0.03458  -1.737
#> 
#> Correlation of Fixed Effects:
#>             (Intr) exprnc I(_-9)
#> experience  -0.614              
#> I(hghst_-9)  0.051 -0.135       
#> exprnc:blck -0.130 -0.294  0.024

Comparing Models A and C, note that their deviance statistics are identical, and that the AIC and BIC statistics are smaller in Model C, suggesting that: (1) Model C is an improvement over Model A; and (2) we cannot effectively model systematic interindividual differences in rates of change with this data set.
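
We can verify this with a likelihood ratio test (a quick check, not part of the original analysis): because the deviance statistics are identical to one decimal place, the chi-square difference is essentially zero on two degrees of freedom, so the extra variance components in Model A add nothing.

# Compare the intercept-only random effects model (Model C) against the
# random intercept and slope model (Model A); both were fit with ML.
anova(dropout_wages_fit_C_subset, dropout_wages_fit_A_subset)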

dropout_wages_fits_subset <- list(
  `Model A` = dropout_wages_fit_A_subset,
  `Model C` = dropout_wages_fit_C_subset
)

# Table 5.5, page 154:
dropout_wages_fits_subset |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "I(highest_grade - 9)",
      "black",
      "experience",
      "experience:I(highest_grade - 9)",
      "experience:black",
      "var__Observation",
      "var__(Intercept)",
      "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 12:14) |>
  tab_row_group(label = "Variance Components", rows = 9:11) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
                          Model A    Model C
Fixed Effects
  (Intercept)               1.737      1.737
                           (0.048)    (0.048)
  I(highest_grade - 9)      0.046      0.046
                           (0.024)    (0.024)
  experience                0.052      0.052
                           (0.021)    (0.021)
  experience:black         -0.060     -0.060
                           (0.035)    (0.035)
Variance Components
  var__Observation          0.115      0.115
  var__(Intercept)          0.082      0.084
  var__experience           0.000
Goodness-of-Fit
  Deviance                  283.9      283.9
  AIC                       299.9      295.9
  BIC                       328.3      317.2

5.3 Time-varying predictors

In Section 5.3 Singer and Willett (2003) demonstrate how to fit the multilevel model for change for data with time-varying predictors using a subset of data from Ginexi, Howe, and Caplan (2000), who measured changes in depressive symptoms after job loss in a sample of 254 recently unemployed men and women. Interviews were conducted in three waves at around 1, 5, and 12 months after job loss.

For this example we use the depression_unemployment data set, a person-period data frame with 674 rows and 5 columns:

  • id: Participant ID.
  • interview: Time of interview.
  • months: Months since job loss.
  • depression: Total score on the Center for Epidemiologic Studies’ Depression (CES-D) scale (Radloff, 1977).
  • unemployed: Binary indicator for whether the participant was unemployed at time of interview. Note that all participants were unemployed at the first interview, and changes in unemployment status were gathered during the second and third interviews.
depression_unemployment
#> # A tibble: 674 × 5
#>    id    interview months depression unemployed
#>    <fct>     <dbl>  <dbl>      <dbl>      <dbl>
#>  1 103           1  1.15          25          1
#>  2 103           2  5.95          16          1
#>  3 103           3 12.9           33          1
#>  4 641           1  0.789         27          1
#>  5 641           2  4.86           7          0
#>  6 641           3 11.8           25          0
#>  7 741           1  1.05          40          1
#>  8 846           1  0.624          2          1
#>  9 846           2  4.93          22          1
#> 10 846           3 11.8            0          0
#> # ℹ 664 more rows

In the depression_unemployment data, both the number and spacing of measurement occasions varies across individuals.

A total of 193 participants (76%) had three interviews, 34 participants (13.4%) had two interviews, and 27 participants (10.6%) had only one interview.

depression_unemployment |>
  group_by(id) |>
  summarise(waves = n()) |>
  count(waves, name = "count") |>
  mutate(proportion = count / sum(count))
#> # A tibble: 3 × 3
#>   waves count proportion
#>   <int> <int>      <dbl>
#> 1     1    27      0.106
#> 2     2    34      0.134
#> 3     3   193      0.760

The average time between job loss and the first interview was 27.6 days (SD = 10.7; range = 2-61), 151 days for the second interview (SD = 18.4; range = 111-220), and 359 days for the third interview (SD = 19.1; range = 319-458).

depression_unemployment |>
  group_by(interview) |>
  mutate(days = months * 30.4167) |>
  summarise(
    mean = mean(days),
    sd = sd(days),
    min = min(days),
    max = max(days)
  )
#> # A tibble: 3 × 5
#>   interview  mean    sd    min   max
#>       <dbl> <dbl> <dbl>  <dbl> <dbl>
#> 1         1  27.6  10.7   2.00  61.0
#> 2         2 151.   18.4 111.   220. 
#> 3         3 359.   19.1 319.   458.

Additionally, examining the data for a subset of individuals, we can see that the unemployed variable is a time-varying predictor with several unique patterns of change across participants.

# Table 5.6, page 161:
filter(depression_unemployment, id %in% c(7589, 55697, 67641, 65441, 53782))
#> # A tibble: 14 × 5
#>    id    interview months depression unemployed
#>    <fct>     <dbl>  <dbl>      <dbl>      <dbl>
#>  1 7589          1  1.31          36          1
#>  2 7589          2  5.09          40          1
#>  3 7589          3 11.8           39          1
#>  4 53782         1  0.427         22          1
#>  5 53782         2  4.24          15          0
#>  6 53782         3 11.1           21          1
#>  7 55697         1  1.35           7          1
#>  8 55697         2  5.78           4          1
#>  9 65441         1  1.08          27          1
#> 10 65441         2  4.70          15          1
#> 11 65441         3 11.3            7          0
#> 12 67641         1  0.329         32          1
#> 13 67641         2  4.11           9          0
#> 14 67641         3 10.9           10          0

Considering only participants with complete data, 78 were unemployed at every interview (pattern 1-1-1), 55 were always employed after the first interview (pattern 1-0-0), 41 were still unemployed at the second interview but employed at the third (pattern 1-1-0), and 19 were employed at the second interview but unemployed at the third (pattern 1-0-1).

unemployed_patterns <- depression_unemployment |>
  group_by(id) |>
  filter(n() == 3) |>
  summarise(unemployed_pattern = paste(unemployed, collapse = "-")) |>
  count(unemployed_pattern, name = "count")

unemployed_patterns
#> # A tibble: 4 × 2
#>   unemployed_pattern count
#>   <chr>              <int>
#> 1 1-0-0                 55
#> 2 1-0-1                 19
#> 3 1-1-0                 41
#> 4 1-1-1                 78

As with the previous examples, no special strategies are needed to fit the multilevel model for change with time-varying predictors. However, as Singer and Willett (2003) discuss, the inclusion of time-varying predictors in a model implies the existence of multiple continuous and discontinuous change trajectories—one for each possible pattern of the time-varying predictors.

Here we will fit four models to the depression_unemployment data: an unconditional growth model (Model A), a model that includes the main effect of a time-varying predictor (Model B), a model that includes an interaction effect with a time-varying predictor (Model C), and a model that allows a time-varying predictor to have both fixed and random effects (Model D).

Note that Singer and Willett (2003) fit Model D using SAS, which reports no issues with the model given the data; however, other programs (R, Mplus, SPSS, Stata) all have convergence or singularity problems, and it is not possible to obtain results that match the textbook. Each program reacts differently to this situation, but it is reasonable to conclude that the problem lies not with the software but with the model being too complex for the data.

# Fit models ------------------------------------------------------------------
depression_unemployment_fit_A <- lmer(
  depression ~ months + (1 + months | id),
  data = depression_unemployment,
  REML = FALSE
)

# The model fails to converge with the default optimizer (although the
# estimates are fine). Changing the optimizer achieves convergence.
depression_unemployment_fit_B <- update(
  depression_unemployment_fit_A,
  . ~ . + unemployed,
  control = lmerControl(optimizer = "bobyqa")
)

depression_unemployment_fit_C <- update(
  depression_unemployment_fit_B,
  . ~ . + months:unemployed
)

# The number of observations is less than the number of random effects levels
# for each term, which makes the random effects variances (probably)
# unidentifiable in this model and throws an error. In order to fit the model
# we need to ignore this check.
depression_unemployment_fit_D <- lmer(
  depression ~ 
    unemployed + unemployed:months + (1 + unemployed + months:unemployed | id),
  data = depression_unemployment,
  REML = FALSE,
  control = lmerControl(check.nobs.vs.nRE = "ignore")
)
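
# Given the convergence and singularity issues noted above, it is worth
# checking whether the resulting fit is singular. lme4's isSingular()
# reports whether any variance components sit at or near the boundary of
# the parameter space (a quick check, not part of the original analysis).
isSingular(depression_unemployment_fit_D)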

depression_unemployment_fits <- list(
  `Model A` = depression_unemployment_fit_A,
  `Model B` = depression_unemployment_fit_B,
  `Model C` = depression_unemployment_fit_C,
  `Model D` = depression_unemployment_fit_D
)

# Make table ------------------------------------------------------------------

# Table 5.7, page 163:
depression_unemployment_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "months" = "months",
      "black" = "black",
      "unemployed" = "unemployed",
      "months:unemployed" = "months:unemployed",
      "unemployed:months" = "months:unemployed",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__months" = "var__months",
      "var__unemployed" = "var__unemployed",
      "var__unemployed:months" = "var__unemployed:months"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 14:16) |>
  tab_row_group(label = "Variance Components", rows = 9:13) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
                            Model A    Model B    Model C    Model D
Fixed Effects
  (Intercept)                17.669     12.666      9.617     11.195
                             (0.776)    (1.242)    (1.889)    (0.790)
  months                     -0.422     -0.202      0.162
                             (0.083)    (0.093)    (0.194)
  unemployed                             5.111      8.529      6.927
                                        (0.989)    (1.878)    (0.930)
  months:unemployed                                -0.465     -0.303
                                                   (0.217)    (0.112)
Variance Components
  var__Observation           68.848     62.388     62.031     59.019
  var__(Intercept)           86.852     93.519     93.713     45.257
  var__months                 0.355      0.465      0.451
  var__unemployed                                              44.969
  var__unemployed:months                                        0.753
Goodness-of-Fit
  Deviance                   5133.1     5107.6     5103.0     5095.3
  AIC                        5145.1     5121.6     5119.0     5115.3
  BIC                        5172.2     5153.2     5155.2     5160.4

Plotting discontinuous change trajectories

Unlike previous examples, the addition of a time-varying predictor to the model implies that a given change trajectory may be composed of either one continuous segment or multiple discontinuous segments. Because of this, new strategies are required to construct a data set for prototypical individuals and plot their fitted change trajectories, where each segment has its own start and end times and predictions. This data set can be in either a wide or a long format: for wide formats, each segment must be plotted using the geom_segment() function from the ggplot2 package; for long formats, each segment must have its own grouping ID, but can otherwise be plotted using the geom_line() function as usual. We demonstrate both formats here by constructing prototypical change trajectories for Model B.

A convenient way to construct a data set for prototypical individuals in wide format is with the reframe() function from the dplyr package, which works similarly to the summarise() function but can return an arbitrary number of rows per group. Here we use it to (1) expand each unemployed_pattern string into a numeric vector using the str_extract_all() function from the stringr package; and (2) add start and end times for each segment. After this, prediction proceeds as usual, except that we use dplyr’s across() function to avoid repeating the same predict() code for the start and end of each segment.

prototypical_depression_B <- unemployed_patterns |>
  select(-count) |>
  group_by(unemployed_pattern) |>
  reframe(
    unemployed = str_extract_all(unemployed_pattern, "[:digit:]", simplify = TRUE),
    unemployed = as.numeric(unemployed),
    months_start = c(0, 5, 10),
    months_end = c(5, 10, 15),
  ) |>
  mutate(
    across(
      starts_with("months"),
      \(.time) {
        predict(
          depression_unemployment_fit_B,
          tibble(unemployed, months = .time),
          re.form = NA
        )
      },
      .names = "depression_{.col}"
    ),
    unemployed_pattern = factor(
      unemployed_pattern, levels = c("1-1-1", "1-0-0", "1-1-0", "1-0-1")
    )
  ) |>
  rename_with(
    \(.x) str_remove(.x, "months_"), .cols = starts_with("depression")
  )

prototypical_depression_B
#> # A tibble: 12 × 6
#>    unemployed_pattern unemployed months_start months_end depression_start
#>    <fct>                   <dbl>        <dbl>      <dbl>            <dbl>
#>  1 1-0-0                       1            0          5             17.8
#>  2 1-0-0                       0            5         10             11.7
#>  3 1-0-0                       0           10         15             10.6
#>  4 1-0-1                       1            0          5             17.8
#>  5 1-0-1                       0            5         10             11.7
#>  6 1-0-1                       1           10         15             15.8
#>  7 1-1-0                       1            0          5             17.8
#>  8 1-1-0                       1            5         10             16.8
#>  9 1-1-0                       0           10         15             10.6
#> 10 1-1-1                       1            0          5             17.8
#> 11 1-1-1                       1            5         10             16.8
#> 12 1-1-1                       1           10         15             15.8
#> # ℹ 1 more variable: depression_end <dbl>

Although we will plot the prototypical trajectories using the wide format data, note here that a convenient way to create a grouping ID for long format data is with the consecutive_id() function from the dplyr package, which generates a unique identifier that increments every time a variable changes. The resulting variable can then be passed to ggplot2’s group aesthetic to ensure the correct cases are connected together.

prototypical_depression_B |>
  pivot_longer(
    cols = c(starts_with("months"), starts_with("depression")),
    names_to = c(".value"),
    names_pattern = "(^.*(?=_))"
  ) |>
  group_by(unemployed_pattern) |>
  mutate(cid = consecutive_id(unemployed), .after = unemployed_pattern)
#> # A tibble: 24 × 5
#> # Groups:   unemployed_pattern [4]
#>    unemployed_pattern   cid unemployed months depression
#>    <fct>              <int>      <dbl>  <dbl>      <dbl>
#>  1 1-0-0                  1          1      0      17.8 
#>  2 1-0-0                  1          1      5      16.8 
#>  3 1-0-0                  2          0      5      11.7 
#>  4 1-0-0                  2          0     10      10.6 
#>  5 1-0-0                  2          0     10      10.6 
#>  6 1-0-0                  2          0     15       9.64
#>  7 1-0-1                  1          1      0      17.8 
#>  8 1-0-1                  1          1      5      16.8 
#>  9 1-0-1                  2          0      5      11.7 
#> 10 1-0-1                  2          0     10      10.6 
#> # ℹ 14 more rows

Now we can plot the four trajectories.

# Figure 5.3:
ggplot(prototypical_depression_B, aes(x = months_start, y = depression_start)) +
  geom_segment(aes(xend = months_end, yend = depression_end)) +
  coord_cartesian(ylim = c(5, 20)) +
  facet_wrap(vars(unemployed_pattern), labeller = label_both) +
  labs(x = "months", y = "depression")
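
For comparison, here is a minimal sketch of the long-format approach described above. It assumes the pivot_longer() pipeline shown earlier has been assigned to a (hypothetical) prototypical_depression_B_long object; the cid variable then serves as the group aesthetic so that geom_line() connects only the points within each segment.

# Sketch: assumes prototypical_depression_B_long holds the long-format data.
ggplot(prototypical_depression_B_long, aes(x = months, y = depression)) +
  geom_line(aes(group = cid)) +
  coord_cartesian(ylim = c(5, 20)) +
  facet_wrap(vars(unemployed_pattern), labeller = label_both)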

An alternative strategy for plotting discontinuous change trajectories suggested by Singer and Willett (2003) is to represent the wide variety of transition times using just two continuous trajectories that encompass the most extreme contrasts possible: here, someone who is consistently unemployed and someone who is consistently employed. With this approach, prototypical change trajectories can be predicted and plotted using the same strategies we have used for models with time-invariant predictors, while conveying the same (or more) information as the set of discontinuous trajectories above.

We demonstrate this alternative strategy for Models B, C, and D. Because of the depression_unemployment study’s design, we start the fitted trajectory for a consistently employed individual at 3.5 months—the earliest time a participant could have their second interview.

prototypical_depression <- depression_unemployment_fits[-1] |>
  map(
    \(.fit) {
      .fit |>
        estimate_prediction(
          data = tibble(months = c(0, 14, 3.5, 14), unemployed = c(1, 1, 0, 0))
        ) |>
        rename(depression = Predicted) |>
        mutate(unemployed = as.logical(unemployed)) |>
        as_tibble()
    }
  ) |>
  list_rbind(names_to = "model")

# Figure 5.4, page 167:
ggplot(prototypical_depression, aes(x = months, y = depression)) +
  geom_line(aes(colour = unemployed)) +
  scale_x_continuous(breaks = seq(0, 14, by = 2)) +
  scale_color_brewer(palette = "Dark2") +
  coord_cartesian(xlim = c(0, 14), ylim = c(5, 20)) +
  facet_wrap(vars(model))

When examining plots like these, Singer and Willett (2003) suggest thinking of the two extreme trajectories as an envelope representing the complete set of prototypical individuals implied by each model:

  • Because all participants were unemployed at the first interview (by design), each individual starts on the unemployed trajectory.
  • For the second interview—regardless of the transition time—those who become employed move to the employed trajectory, and those who don’t stay on the unemployed trajectory.
  • For the third interview—regardless of the transition time—those who become unemployed again move back to the unemployed trajectory, and those who don’t stay on the employed trajectory.

5.3.3 Recentring time-varying predictors

In Section 5.3.3 Singer and Willett (2003) return to the dropout_wages data to discuss three strategies for centring time-varying predictors:

  • Constant centring: Centre around a single substantively meaningful constant for all observations.
  • Within-person centring: Decompose the time-varying predictor into two constituent predictors where, for each individual, the first predictor is their within-person mean; and the second predictor is each measurement occasion’s deviation from their within-person mean.
  • Time-one centring: Decompose the time-varying predictor into two constituent predictors where, for each individual, the first predictor is the value of their first measurement occasion; and the second predictor is each measurement occasion’s deviation from the first measurement occasion.

We demonstrate each of these strategies by updating Model C, dropout_wages_fit_C, to include the main effect of the time-varying predictor unemployment_rate, fitting a model that uses constant centring (Model A2), within-person centring (Model B2), and time-one centring (Model C2).

# Fit models ------------------------------------------------------------------
dropout_wages_fit_A2 <- update(
  dropout_wages_fit_C,
  . ~ . + I(unemployment_rate - 7)
)

dropout_wages_fit_B2 <- update(
  dropout_wages_fit_C,
  . ~ . + unemployment_rate_mean + unemployment_rate_dev,
  data = mutate(
    dropout_wages,
    unemployment_rate_mean = mean(unemployment_rate),
    unemployment_rate_dev = unemployment_rate - unemployment_rate_mean,
    .by = id
  )
)

dropout_wages_fit_C2 <- update(
  dropout_wages_fit_C,
  . ~ . + unemployment_rate_first + unemployment_rate_dev,
  data = mutate(
    dropout_wages,
    unemployment_rate_first = first(unemployment_rate),
    unemployment_rate_dev = unemployment_rate - unemployment_rate_first,
    .by = id
  )
)

dropout_wages_fits_2 <- list(
  `Model A2` = dropout_wages_fit_A2,
  `Model B2` = dropout_wages_fit_B2,
  `Model C2` = dropout_wages_fit_C2
)

# Make table ------------------------------------------------------------------

# Table 5.8:
dropout_wages_fits_2 |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)" = "(Intercept)",
      "I(highest_grade - 9)" = "I(highest_grade - 9)",
      "I(unemployment_rate - 7)" = "unemployment_rate",
      "unemployment_rate_mean" = "unemployment_rate",
      "unemployment_rate_first" = "unemployment_rate",
      "unemployment_rate_dev" = "unemployment_rate_dev",
      "black" = "black",
      "experience" = "experience",
      "experience:I(highest_grade - 9)" = "experience:I(highest_grade - 9)",
      "experience:black" = "experience:black",
      "var__Observation" = "var__Observation",
      "var__(Intercept)" = "var__(Intercept)",
      "var__experience" = "var__experience"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    fmt = 4,
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 16:18) |>
  tab_row_group(label = "Variance Components", rows = 13:15) |>
  tab_row_group(label = "Fixed Effects", rows = 1:12) |>
  cols_hide(effect)
                            Model A2    Model B2    Model C2
Fixed Effects
  (Intercept)                 1.7490      1.8743      1.8693
                             (0.0114)    (0.0295)    (0.0260)
  I(highest_grade - 9)        0.0400      0.0402      0.0399
                             (0.0064)    (0.0064)    (0.0063)
  unemployment_rate          -0.0120     -0.0177     -0.0162
                             (0.0018)    (0.0035)    (0.0026)
  unemployment_rate_dev                  -0.0099     -0.0103
                                         (0.0021)    (0.0019)
  experience                  0.0441      0.0451      0.0448
                             (0.0026)    (0.0026)    (0.0026)
  experience:black           -0.0182     -0.0189     -0.0183
                             (0.0045)    (0.0045)    (0.0045)
Variance Components
  var__Observation            0.0948      0.0948      0.0948
  var__(Intercept)            0.0506      0.0510      0.0503
  var__experience             0.0016      0.0016      0.0016
Goodness-of-Fit
  Deviance                    4830.5      4827.0      4825.8
  AIC                         4848.5      4847.0      4845.8
  BIC                         4909.4      4914.6      4913.5

5.4 Recentring the effect of time

In Section 5.4 Singer and Willett (2003) discuss strategies for centring time-indicator variables using a subset of data from Tomarken, Shelton, Elkins, and Anderson (1997), who measured the relation between changes in positive mood and supplemental antidepressant medication over the course of a week in a sample of 73 men and women already receiving nonpharmacological therapy for depression.

For this example we use the antidepressants data set, a person-period data frame with 1242 rows and 6 columns:

  • id: Participant ID.
  • wave: Wave of measurement.
  • day: Day of measurement.
  • reading: Time of day a reading was taken.
  • positive_mood: Positive mood score.
  • treatment: Treatment condition (placebo pills = 0, antidepressant pills = 1).
antidepressants
#> # A tibble: 1,242 × 6
#>    id     wave   day reading positive_mood treatment
#>    <fct> <dbl> <dbl> <chr>           <dbl>     <dbl>
#>  1 1         1     0 8 AM             107.         1
#>  2 1         2     0 3 PM             100          1
#>  3 1         3     0 10 PM            100          1
#>  4 1         4     1 8 AM             100          1
#>  5 1         5     1 3 PM             100          1
#>  6 1         6     1 10 PM            100          1
#>  7 1         7     2 8 AM             100          1
#>  8 1         8     2 3 PM             100          1
#>  9 1         9     2 10 PM            100          1
#> 10 1        10     3 8 AM             107.         1
#> # ℹ 1,232 more rows

Note that the antidepressants data has three time-indicator variables, each providing a different representation of time:

  • The values of wave reflect the study’s design, but have little substantive meaning due to the conceptual difficulty of dividing one week into 21 components.
  • The values of day reflect the study’s design in a meaningful way, but fail to distinguish between morning, afternoon, and evening readings.
  • The values of reading also reflect the study’s design in a meaningful way—capturing the time of day each reading was taken—but fail to distinguish between days, and are difficult to analyze due to being a character vector.

To facilitate model fitting, we can create new time-indicator variables that are more meaningful and easier to analyze. Here we create two new time-indicator variables:

  • time_of_day: Time of day a reading was taken, expressed numerically (0 for morning readings; 0.33 for afternoon readings; 0.67 for evening readings).
  • time: Time of measurement expressed as a combination of day and time_of_day.
antidepressants <- antidepressants |>
  mutate(
    time_of_day = case_when(
      reading == "8 AM" ~ 0,
      reading == "3 PM" ~ 1/3,
      reading == "10 PM" ~ 2/3
    ),
    time = day + time_of_day,
    .after = reading
  )

antidepressants
#> # A tibble: 1,242 × 8
#>    id     wave   day reading time_of_day  time positive_mood treatment
#>    <fct> <dbl> <dbl> <chr>         <dbl> <dbl>         <dbl>     <dbl>
#>  1 1         1     0 8 AM          0     0              107.         1
#>  2 1         2     0 3 PM          0.333 0.333          100          1
#>  3 1         3     0 10 PM         0.667 0.667          100          1
#>  4 1         4     1 8 AM          0     1              100          1
#>  5 1         5     1 3 PM          0.333 1.33           100          1
#>  6 1         6     1 10 PM         0.667 1.67           100          1
#>  7 1         7     2 8 AM          0     2              100          1
#>  8 1         8     2 3 PM          0.333 2.33           100          1
#>  9 1         9     2 10 PM         0.667 2.67           100          1
#> 10 1        10     3 8 AM          0     3              107.         1
#> # ℹ 1,232 more rows

The advantage of the time variable is that it captures both aspects of time in the antidepressants data in a single variable, making it easy to centre on different time points in the study. Following Singer and Willett (2003), here we centre time on three different points in the study:

  • time: centred on initial status.
  • time_3.33: centred on the study’s midpoint.
  • time_6.67: centred on the study’s final wave.
# Table 5.9, page 182:
antidepressants |>
  select(-c(id, positive_mood, treatment)) |>
  mutate(time_3.33 = time - 3.33, time_6.67 = time - 6.67)
#> # A tibble: 1,242 × 7
#>     wave   day reading time_of_day  time time_3.33 time_6.67
#>    <dbl> <dbl> <chr>         <dbl> <dbl>     <dbl>     <dbl>
#>  1     1     0 8 AM          0     0        -3.33      -6.67
#>  2     2     0 3 PM          0.333 0.333    -3.00      -6.34
#>  3     3     0 10 PM         0.667 0.667    -2.66      -6.00
#>  4     4     1 8 AM          0     1        -2.33      -5.67
#>  5     5     1 3 PM          0.333 1.33     -2.00      -5.34
#>  6     6     1 10 PM         0.667 1.67     -1.66      -5.00
#>  7     7     2 8 AM          0     2        -1.33      -4.67
#>  8     8     2 3 PM          0.333 2.33     -0.997     -4.34
#>  9     9     2 10 PM         0.667 2.67     -0.663     -4.00
#> 10    10     3 8 AM          0     3        -0.33      -3.67
#> # ℹ 1,232 more rows

Here we will fit three models to the antidepressants data to demonstrate how centring time affects parameter estimates and interpretation: a model with time centred on initial status (Model A), a model with time centred on the study’s midpoint (Model B), and a model with time centred on the study’s final wave (Model C).

# Fit models ------------------------------------------------------------------
antidepressants_fit_A <- lmer(
  positive_mood ~ treatment * time + (1 + time | id),
  data = antidepressants,
  REML = FALSE
)

antidepressants_fit_B <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 3.33),
  control = lmerControl(optimizer = "bobyqa")
)

antidepressants_fit_C <- update(
  antidepressants_fit_A,
  data = mutate(antidepressants, time = time - 6.67)
)

antidepressants_fits <- list(
  `Model A` = antidepressants_fit_A,
  `Model B` = antidepressants_fit_B,
  `Model C` = antidepressants_fit_C
)

# Make table ------------------------------------------------------------------

# Table 5.10, page 184:
antidepressants_fits |>
  modelsummary(
    shape = term + effect + statistic ~ model,
    scales = c("vcov", NA),
    coef_map = c(
      "(Intercept)",
      "treatment",
      "time",
      "treatment:time",
      "var__Observation",
      "var__(Intercept)",
      "var__time",
      "cov__(Intercept).time"
    ),
    gof_map = tibble(
      raw = c("deviance", "AIC", "BIC"),
      clean = c("Deviance", "AIC", "BIC"),
      fmt = 1
    ),
    output = "gt"
  ) |>
  tab_row_group(label = "Goodness-of-Fit", rows = 13:15) |>
  tab_row_group(label = "Variance Components", rows = 9:12) |>
  tab_row_group(label = "Fixed Effects", rows = 1:8) |>
  cols_hide(effect)
                             Model A     Model B     Model C
Fixed Effects
  (Intercept)                167.463     159.411     151.335
                             (9.327)     (8.763)    (11.546)
  treatment                   -3.109      15.328      33.821
                            (12.333)    (11.543)    (15.163)
  time                        -2.418      -2.418      -2.418
                             (1.731)     (1.731)     (1.731)
  treatment:time               5.537       5.537       5.537
                             (2.278)     (2.278)     (2.278)
Variance Components
  var__Observation          1229.926    1229.930    1229.931
  var__(Intercept)          2111.536    2008.159    3324.498
  var__time                   63.736      63.736      63.735
  cov__(Intercept).time     -121.633      90.619     303.494
Goodness-of-Fit
  Deviance                   12680.5     12680.5     12680.5
  AIC                        12696.5     12696.5     12696.5
  BIC                        12737.4     12737.4     12737.4

Notice that the parameters related to the slope are identical across Models A, B, and C, but those related to the intercept differ. As Singer and Willett (2003) explain, this is because centring a time-indicator variable changes the point in time at which the fitted trajectory is anchored. For example, Model B’s intercept for the placebo group (159.411) is simply Model A’s trajectory evaluated 3.33 months after initial status: 167.463 - (2.418 × 3.33) ≈ 159.41.

We can visualize this anchoring effect by plotting prototypical change trajectories for the models fit to the antidepressants data: as the dashed vertical lines highlight, centring a time-indicator variable changes the location of the focal comparison between the control and treatment groups, causing the resulting estimates to describe the trajectories’ behaviour at that specific point in time.

Note that because Models A, B, and C are structurally identical, it does not matter which model is used here to make predictions—they all have the same prototypical change trajectories.
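
As a quick sanity check (a sketch, not part of the original text), the population-level prediction from Model A at the study’s midpoint should match Model B’s prediction at its own time origin, since Model B’s time variable is simply time - 3.33:

# Population-level (re.form = NA) predictions for the placebo and treatment
# groups at the midpoint of the study.
predict(antidepressants_fit_A, tibble(treatment = c(0, 1), time = 3.33), re.form = NA)
predict(antidepressants_fit_B, tibble(treatment = c(0, 1), time = 0), re.form = NA)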

prototypical_mood <- antidepressants_fit_A |>
  estimate_prediction(
    data = tibble(
      treatment = c(0, 0, 0, 1, 1, 1),
      time = c(0, 3.33, 6.67, 0, 3.33, 6.67)
    )
  ) |>
  rename(positive_mood = Predicted) |>
  mutate(treatment = as.logical(treatment))
  
# Figure 5.5, page 185:
ggplot(prototypical_mood, aes(x = time, y = positive_mood)) +
    geom_line(aes(colour = treatment)) +
    geom_line(aes(group = time), linetype = 2) +
    scale_x_continuous(breaks = seq(0, 7, by = 1)) +
    scale_color_brewer(palette = "Dark2") +
    coord_cartesian(ylim = c(140, 190))