Model overview

This document exists in the repository at https://github.com/codatmo/Liverpool as index.Rmd and rendered in html as index.html. The releases can be downloaded at https://github.com/codatmo/Liverpool/releases which includes this file.

There is a https://codatmo.github.io/Liverpool/model_reproduction_checklist.html which applies standard model validation to the model.

This model is part of the CoDatMo (Co)vid (Dat)a (Mo)deling site (https://codatmo.github.io/) which is intended to replicate and make available important COVID models written in Bayesian modeling languages like Stan or PyMC3.

Problem description

These models attempt to model the relationship between the numbers of phone calls to NHS 111 and/or NHS 111 online assessments reporting COVID-19 symptoms, and weekly recorded numbers of deaths from COVID-19.

Model description

The model is separated into two parts: the underlying deterministic transmission model governed by a system of ordinary differential equations (ODEs) for the spread of the disease through the population, and a stochastic observation model which links the states of the transmission model with the observed surveillance data.

Transmission model

The transmission model assumes a single geographical region with a large population of identical (for instance, in terms of age and sex) individuals who come into contact with one another uniformly at random, but do not come into contact with individuals from other areas. It also assumes a closed population, that is no individuals migrate into or out of the population, for example as a result of births, non-COVID related deaths, or changes in permanent residence. The model divides the population into six disease states: (S) susceptible, (E) exposed, (I) infectious, (R) recovered, (T) terminally ill, and (D) dead. The exposed, infectious, and terminally ill disease states are each further partitioned into two substates, making the expected times spent by people in these states follow Erlang distributions. Below is a graphical illustration of the transmission model.

Transmission Model

Transmission Model

Through the random mixing of the population, infectious individuals come into contact with and are allowed to transmit the virus to susceptible individuals. A susceptible individual who has become exposed through one of these contacts is not initially infectious. A period of time elapses, while the virus replicates in their body before they become infectious and can transmit the virus onto members of the remaining susceptible population. After being infectious for some time, the individual may recover and become indefinitely immune to reinfection. Should the individual fail to recover, however, they will become terminally ill for a while before, unfortunately, dying of the disease.

The number of individuals in each disease state varies with time according to the following system of ordinary differential equations:

\[\begin{align} \frac{dS}{dt} &= -\beta \frac{I_1 + I_2}{N} S \\ \frac{dE_1}{dt} &= \beta \frac{I_1 + I_2}{N} S - \frac{2}{d_L} E_1 \\ \frac{dE_2}{dt} &= \frac{2}{d_L} (E_1 - E_2) \\ \frac{dI_1}{dt} &= \frac{2}{d_L} E_2 - \frac{2}{d_I} I_1 \\ \frac{dI_2}{dt} &= \frac{2}{d_I} (I_1 - I_2) \\ \frac{dR}{dt} &= \frac{2}{d_I} I_2 \left(1 - \omega\right) \\ \frac{dT_1}{dt} &= \frac{2}{d_I} I_2 \omega - \frac{2}{d_T} T_1 \\ \frac{dT_2}{dt} &= \frac{2}{d_T} (T_1 - T_2) \\ \frac{dD}{dt} &= \frac{2}{d_T} T_2 \end{align}\]

where

  • \(S(t)\) is the number of susceptible individuals who have not yet been infected and are at risk of infection,
  • \(E_1(t) + E_2(t)\) is the number of exposed individuals who have been infected but are not yet infectious,
  • \(I_1(t) + I_2(t)\) is the number of infectious individuals,
  • \(T_1(t) + T_2(t)\) is the number of terminally ill individuals who have been infected and will die,
  • \(R(t)\) is the number of recovered individuals,
  • \(D(t)\) is the number of dead individuals,
  • \(N = S(t) + E_1(t) + E_2(t) + I_1(t) + I_2(t) + R(t) + T_1(t) + T_2(t) + D(t)\) is the constant total number of individuals in the population,
  • \(\beta(t)\) is the mean rate of contacts between individuals per unit time that are sufficient to lead to transmission if one of the individuals is infectious and the other is susceptible,
  • \(d_L\) is the mean time between infection and onset of infectiousness,
  • \(d_I\) is the mean time for which individuals are infectious,
  • \(d_T\) is the mean time for which individuals are terminally ill,
  • \(\omega\), the infection fatality ratio (IFR), is the proportion of infected individuals who will die.

The mean rate of effective contacts, \(\beta(t)\), is a piecewise linear function of time:

\[\begin{equation} \beta(t) = \sum_{i=1}^{N} \beta_i(t) \chi_{[t_{i-1}, t_i)}(t), \end{equation}\]

where the mean rate of effective contacts during the \(i\)th time interval, \(\beta_i(t)\), is given by

\[\begin{equation} \beta_i(t) = \frac{\beta_{r,i} - \beta_{l,i}}{t_i - t_{i-1}}(t - t_{i-1}) + \beta_{l, i} \end{equation}\]

and

\[\begin{equation} \chi_{[t_{i-1}, t_i)}(t) = \begin{cases} 1 & \mathrm{if} \, t \in [t_{i-1}, t_i) \\ 0 & \mathrm{if} \, t \notin [t_{i-1}, t_i) \end{cases} \end{equation}\]

Observation Model

The observation model captures the assumed stochastic process that generates the surveillance data from the the states of the transmission model. The weekly deaths and daily 111 call counts and online assessment counts are taken to have two completely different connections to the transmission model.

Weekly deaths

The transmission model provides a sequence \(D(1), D(2), ..., D(t), ...\) in which the \(t\)th element is the total number individuals in the population who have died of COVID-19 by day \(t\). The observation model differences the elements of this sequence and then aggregates the differences to get the number of individuals who have died of COVID-19 in each epidemiological week. The observed number of weekly deaths in week \(w\), \(\Delta D_{obs}(w)\), are assumed to have a negative binomial distribution parameterised by \(\Delta D(w)\) and a parameter that controls overdispersion relative to the square of the mean \(\phi_{deaths}\):

\[\begin{equation} \Delta D_{obs}(w) \sim \mathrm{NegBin}\left(\Delta D(w), \phi_{deaths}\right) \end{equation}\]

Daily 111 calls

The observation model supposes an unknown latency between individuals being exposed to the virus and calling NHS 111. This unknown latency is embodied by allowing the mean number of daily 111 calls at time \(t\), \(\mu_{\mathrm{calls}}(t)\), to depend not only on the number of new daily infection at time \(t\), \(\lambda(t)\), but also lagged values of the number of new daily infections. The observation model also assumes that the tendency for individuals to call NHS 111 if they have been infected varies with time. Putting both assumptions together leads to the expression for the mean number of daily 111 calls:

\[\begin{equation} \mu_{\mathrm{calls}}\left(t\right) = \rho_{\mathrm{calls}}\left(t\right) \sum_{l=0}^L w_l \lambda\left(t-l\right), \end{equation}\]

where

\[\begin{equation} \rho_{\mathrm{calls}}(t) = \sum_{i=1}^{N} \rho_{{\mathrm{calls}},i}(t) \chi_{[t_{i-1}, t_i)}(t), \end{equation}\]

where \(\rho_{\mathrm{calls},i}\left(t\right)\) is the proportion of infected individuals being converted into 111 call counts during the \(i\)th time interval, and the indicator for the \(i\)th time interval is

\[\begin{equation} \chi_{[t_{i-1}, t_i)}(t) = \begin{cases} 1 & \mathrm{if} \, t \in [t_{i-1}, t_i) \\ 0 & \mathrm{if} \, t \notin [t_{i-1}, t_i) \end{cases} \end{equation}\]

The observed number of daily 111 calls at time \(t\), \(C\left(t\right)\), is assumed to have a negative binomial distribution with mean \(\mu_{\mathrm{calls}}\left(t\right)\) and overdispersion relative to the square of the mean \(\phi_{\mathrm{calls}}\):

\[\begin{equation} C\left(t\right) \sim \mathrm{NegBin}\left(\mu_{\mathrm{calls}}\left(t\right), \phi_{\mathrm{calls}}\right) \end{equation}\]

Online assessments

The observation model for the observed NHS 111 online assessments is similar to the observation model for NHS 111 phone calls, but with separately estimated \(\rho_{\mathrm{online}}\), \(\mu_{{\mathrm{online}}}\), \(\phi_{\mathrm{online}}\), \(w_{{\mathrm{online}},l}\), etc.

Data

An R script for generating some simulated data (based on the priors in the model) is in src/scripts/generateSimulatedData.R which can be used as follows:

source("src/scripts/generateSimulatedData.R")
## Loading required package: DirichletReg
## Loading required package: Formula
generatedData <- generateSimulatedData(
  seed = 0,
  maxTime = 100,
  population = 1000000,
  n_beta_pieces = 20,
  n_rho_calls_111_pieces = 10,
  calls_111_start = 30
)

stan_data <- generatedData$stan_data
ground_truth <- generatedData$ground_truth

This data can then be fit to the deaths and calls model:

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

initf <- function() {
  n_beta_pieces <- stan_data$n_beta_pieces
  
  initial_state_raw = c(runif(1, min = 0.99999, max = 1.0), runif(1, min = 0.0, max = 1.0))
  beta_left = exp(runif(n_beta_pieces, min = -2, max = 0.5))
  beta_right = exp(runif(n_beta_pieces, min = -2, max = 0.5))
  dL = runif(1, min = 3.5, max = 4.0)
  dI = runif(1, min = 2.2, max = 2.6)
  dT = runif(1, min = 11.0, max = 13.0)
  omega = 1/(1 + exp(-runif(1, min = -5, max = -3)))
  
  list(
    initial_state_raw = initial_state_raw,
    beta_left = beta_left,
    beta_right = beta_right,
    dL = dL,
    dI = dI,
    dT = dT,
    omega = omega
  )
}


# generate a list of lists to specify initial values
liverpoolModel <- stan_model("src/models/deaths_and_111_calls.stan")
fit <- sampling(liverpoolModel, data = stan_data, init = initf)

Data generating process

Data munging and examination

Stan program

The Stan models are located at Liverpool/src/models:

functions {
  real[] seeiittd(real time,
                  real[] state,
                  real[] params,
                  real[] real_data,
                  int[] integer_data) {

    // Unpack integer data values
    int T = integer_data[1];
    int n_beta_pieces = integer_data[2];
    int n_disease_states = integer_data[3];

    // Unpack real data values
    real beta_left_t[n_beta_pieces] = real_data[1:n_beta_pieces];
    real beta_right_t[n_beta_pieces] = real_data[n_beta_pieces+1:2*n_beta_pieces];
    real population = real_data[2*n_beta_pieces+1];

    // Unpack parameter values
    real beta_left[n_beta_pieces] = params[1:n_beta_pieces];
    real grad_beta[n_beta_pieces] = params[n_beta_pieces+1:2*n_beta_pieces];
    real nu = params[2*n_beta_pieces+1];
    real gamma = params[2*n_beta_pieces+2];
    real kappa = params[2*n_beta_pieces+3];
    real omega = params[2*n_beta_pieces+4];

    // Unpack state
    real S = state[1];
    real E1 = state[2];
    real E2 = state[3];
    real I1 = state[4];
    real I2 = state[5];
    real T1 = state[6];
    real T2 = state[7];
    real D = state[8];

    real infection_rate;
    real nuE1 = nu * E1;
    real nuE2 = nu * E2;
    real gammaI1 = gamma * I1;
    real gammaI2 = gamma * I2;
    real kappaT1 = kappa * T1;
    real kappaT2 = kappa * T2;

    real dS_dt;
    real dE1_dt;
    real dE2_dt;
    real dI1_dt;
    real dI2_dt;
    real dT1_dt;
    real dT2_dt;
    real dD_dt;
    
    for (i in 1:n_beta_pieces) {
      if(time >= beta_left_t[i] && time < beta_right_t[i]) {
        real beta = grad_beta[i] * (time - beta_left_t[i]) + beta_left[i];
        infection_rate = beta * (I1 + I2) * S / population;
      }
    }

    dS_dt = -infection_rate;
    dE1_dt = infection_rate - nuE1;
    dE2_dt = nuE1 - nuE2;
    dI1_dt = nuE2 - gammaI1;
    dI2_dt = gammaI1 - gammaI2;
    dT1_dt = gammaI2 * omega - kappaT1;
    dT2_dt = kappaT1 - kappaT2;
    dD_dt = kappaT2;

    return {dS_dt, dE1_dt, dE2_dt, dI1_dt, dI2_dt, dT1_dt, dT2_dt, dD_dt};
  }

  real[ , ] integrate_ode_explicit_trapezoidal(real[] initial_state, real initial_time, real[] times, real[] params, real[] real_data, int[] integer_data) {
    real h;
    vector[size(initial_state)] dstate_dt_initial_time;
    vector[size(initial_state)] dstate_dt_tidx;
    vector[size(initial_state)] k;
    real state_estimate[size(times),size(initial_state)];

    h = times[1] - initial_time;
    dstate_dt_initial_time = to_vector(seeiittd(initial_time, initial_state, params, real_data, integer_data));
    k = h*dstate_dt_initial_time;
    state_estimate[1,] = to_array_1d(to_vector(initial_state) + h*(dstate_dt_initial_time + to_vector(seeiittd(times[1], to_array_1d(to_vector(initial_state)+k), params, real_data, integer_data)))/2);

    for (tidx in 1:size(times)-1) {
      h = (times[tidx+1] - times[tidx]);
      dstate_dt_tidx = to_vector(seeiittd(times[tidx], state_estimate[tidx], params, real_data, integer_data));
      k = h*dstate_dt_tidx;
      state_estimate[tidx+1,] = to_array_1d(to_vector(state_estimate[tidx,]) + h*(dstate_dt_tidx + to_vector(seeiittd(times[tidx+1], to_array_1d(to_vector(state_estimate[tidx,])+k), params, real_data, integer_data)))/2);
    }

    return state_estimate;
  }
}
data {
  real initial_time;
  int<lower=1> n_beta_pieces;
  real<lower=0> beta_left_t[n_beta_pieces];
  real<lower=0> beta_right_t[n_beta_pieces];
  int<lower=1> n_rho_calls_111_pieces;
  int<lower=0> rho_calls_111_left_t[n_rho_calls_111_pieces];
  int<lower=0> rho_calls_111_right_t[n_rho_calls_111_pieces];
  int<lower=1> T;
  real times[T];
  int<lower=1> n_disease_states;
  real<lower=0> population;
  int<lower=1> deaths_length;
  int<lower=1> deaths_starts[deaths_length];
  int<lower=1> deaths_stops[deaths_length];
  int<lower=0> deaths[deaths_length];
  int<lower=1> calls_111_length;
  int<lower=1> calls_111_start;
  int<lower=0> calls_111[calls_111_length];
  int real_data_length;
  real real_data[real_data_length];
  int integer_data_length;
  int integer_data[integer_data_length];
  int<lower=0, upper=1> compute_likelihood;
}
transformed data {
  real mu_dL = 4.00;
  real sigma_dL = 0.20;
  real mu_dI = 3.06;
  real sigma_dI = 0.21;
  real mu_dT = 16.00;
  real sigma_dT = 0.71;
  int max_lag = 13;
}
parameters {
  real<lower=0, upper=1> initial_state_raw[2];
  real<lower=0> beta_left[n_beta_pieces];
  real<lower=0> beta_right[n_beta_pieces];
  real<lower=0> dL;
  real<lower=0> dI;
  real<lower=0> dT;
  real<lower=0, upper=1> omega;
  real<lower=0> reciprocal_phi_deaths;
  real<lower=0> reciprocal_phi_calls_111;
  real<lower=0> rho_calls_111[n_rho_calls_111_pieces];
  simplex[max_lag+1] lag_weights_calls_111;
}
transformed parameters {
  real initial_state[n_disease_states];
  real grad_beta[n_beta_pieces];
  real nu;
  real gamma;
  real kappa;
  real phi_deaths;
  real phi_calls_111;
  real state_estimate[T,n_disease_states];
  vector[T+1] S;
  vector[T+1] E1;
  vector[T+1] E2;
  vector[T+1] I1;
  vector[T+1] I2;
  vector[T+1] T1;
  vector[T+1] T2;
  vector[T+1] D;
  vector[T] daily_infections;
  vector[T] daily_deaths;
  vector[T] calls_111_lagged_daily_infections;
  vector[T] daily_calls_111;

  initial_state[1] = (population-5.0)*initial_state_raw[1] + 1.0;
  initial_state[2] = (population-5.0)*(1.0-initial_state_raw[1])*initial_state_raw[2]/2.0 + 1.0;
  initial_state[3] = (population-5.0)*(1.0-initial_state_raw[1])*initial_state_raw[2]/2.0 + 1.0;
  initial_state[4] = (population-5.0)*(1.0-initial_state_raw[1])*(1.0-initial_state_raw[2])/2.0 + 1.0;
  initial_state[5] = (population-5.0)*(1.0-initial_state_raw[1])*(1.0-initial_state_raw[2])/2.0 + 1.0;
  initial_state[6] = 0.0;
  initial_state[7] = 0.0;
  initial_state[8] = 0.0;
  grad_beta = to_array_1d((to_vector(beta_right) - to_vector(beta_left))./(to_vector(beta_right_t) - 
              to_vector(beta_left_t)));
  nu = 2.0/dL;
  gamma = 2.0/dI;
  kappa = 2.0/dT;
  phi_deaths = 1.0 / reciprocal_phi_deaths;
  phi_calls_111 = 1.0 / reciprocal_phi_calls_111;

  {
    real params[2*n_beta_pieces+4];
    params[1:n_beta_pieces] = beta_left;
    params[n_beta_pieces+1:2*n_beta_pieces] = grad_beta;
    params[2*n_beta_pieces+1] = nu;
    params[2*n_beta_pieces+2] = gamma;
    params[2*n_beta_pieces+3] = kappa;
    params[2*n_beta_pieces+4] = omega;

    state_estimate = integrate_ode_explicit_trapezoidal(initial_state, initial_time, times, params, real_data, integer_data);
  }

  S = append_row(initial_state[1], to_vector(state_estimate[, 1]));
  E1 = append_row(initial_state[2], to_vector(state_estimate[, 2]));
  E2 = append_row(initial_state[3], to_vector(state_estimate[, 3]));
  I1 = append_row(initial_state[4], to_vector(state_estimate[, 4]));
  I2 = append_row(initial_state[5], to_vector(state_estimate[, 5]));
  T1 = append_row(initial_state[6], to_vector(state_estimate[, 6]));
  T2 = append_row(initial_state[7], to_vector(state_estimate[, 7]));
  D = append_row(initial_state[8], to_vector(state_estimate[, 8]));

  daily_infections = S[:T] - S[2:] + machine_precision();
  daily_deaths = D[2:] - D[:T];

  calls_111_lagged_daily_infections = lag_weights_calls_111[1]*daily_infections;

  for (i in 1:max_lag) {
    calls_111_lagged_daily_infections += lag_weights_calls_111[i+1]*
                                          append_row(rep_vector(0.0, i), daily_infections[:T-i]);
  }

  daily_calls_111 = rep_vector(0.0, T);

  for (i in 1:n_rho_calls_111_pieces) {
    daily_calls_111[rho_calls_111_left_t[i]:rho_calls_111_right_t[i]-1] = 
    calls_111_lagged_daily_infections[rho_calls_111_left_t[i]:rho_calls_111_right_t[i]-1] * 
    rho_calls_111[i];
  }
}

model {
  initial_state_raw[1] ~ beta(5.0, 0.5);
  initial_state_raw[2] ~ beta(1.1, 1.1);
  beta_left ~ normal(0, 0.5);
  beta_right ~ normal(0, 0.5);
  dL ~ normal(mu_dL, sigma_dL);
  dI ~ normal(mu_dI, sigma_dI);
  dT ~ normal(mu_dT, sigma_dT);
  omega ~ beta(100, 9803);
  reciprocal_phi_deaths ~ exponential(5);
  reciprocal_phi_calls_111 ~ exponential(5);
  rho_calls_111 ~ normal(0, 0.5);
  lag_weights_calls_111 ~ dirichlet(rep_vector(0.1, max_lag+1));

  if (compute_likelihood == 1) {
    for (i in 1:deaths_length) {
      target += neg_binomial_2_lpmf(deaths[i] | 
                sum(daily_deaths[deaths_starts[i]:deaths_stops[i]]), phi_deaths);
    }

    target += neg_binomial_2_lpmf(calls_111 | 
            daily_calls_111[calls_111_start:(calls_111_start-1)+calls_111_length], phi_calls_111);
  }
}
generated quantities {
  vector[T-1] growth_rate = (log(daily_infections[2:]) - log(daily_infections[:T-1]))*100;

  int pred_deaths[deaths_length];
  int pred_calls_111[calls_111_length];

  for (i in 1:deaths_length) {
    pred_deaths[i] = neg_binomial_2_rng(sum(daily_deaths[deaths_starts[i]:deaths_stops[i]]), 
                     phi_deaths);
  }

  for (i in 1:calls_111_length) {
    pred_calls_111[i] = neg_binomial_2_rng(daily_calls_111[calls_111_start - 1 + i], 
                        phi_calls_111);
  }
}

Running model

Model validation

Run posterior diagnostics

Prior predictive check

Plotting simulations from priors

Parameter recovery with simulated data

Posterior predictive check

Cross validation

SBC validation