1 Infectious Disease Modelling with Epidemix

1.1 Introduction to models with Epidemix

The Epidemix Shiny App is a R-based online app. It provides an easy interface to run simple epidemiological models without the need for installing or coding in R. Before starting the practical we will have a brief overview.

The interface is divided in two main parts, the inputs panel on the left side and the outputs on the right. The outputs panel shows the model results, either graphically or listed in a downloadable table.

The first decision you need to make is whether to use a generic model or a disease-specific one. At the moment, the only disease-specific models are for Covid-19, Avian flu and African Swine Fever.

drawing


The first drop-down Select model type allows you to choose the type of model you want to work with. As you can see in the figure below, there are few model classifications: Deterministic vs. Stochastic and Compartmental (COMP) vs. Individual Based Models (IBM). The third criterion with which these models are described regards the host population structure: homogeneous (all hosts can be in contact equally with each other), heterogeneous (hosts are divided in two groups), metapopulation (hosts are divided in several sub-populations), network (hosts are in contact with a set pre-defined neighbours) and spatial (contacts between hosts depend on their position in space).

drawing


In Select infection states to consider you can choose the epidemiological structure of your model, for example including an Exposed (infected but uninfectious) stage or specify if infectious individuals are asymptomatic, symptomatic or both (in this practical we won’t use the asymptomatic). In the plot output the infectious are indicated as Infectious-symptomatic, the exposed as Infected(latent), and the recovered as Recovered-immune.

drawing

For any panel, you can click the blue “i” to get a full explanation of the features. See the example for Select infection states to consider.

drawing

Individuals are referred to as ‘units’. By selecting Some units recover in the How many units recover from infection? option you can allow infected hosts not to recover and to be Removed instead. For example, this could be the result of disease-induced death or, in the case of non-human animal disease, culling (a new parameter will appear in the Define infection and transmission features panel).

By selecting No in the Does a unit which recovered from infection remain immune until removed? option you can allow immunity to wane, therefore recovered hosts can return susceptible (a new parameter will appear in the Define infection and transmission features panel).


In Define host population features you can choose the population size (unfortunately limited to 1000 host individuals) and if it’s a closed or open population (that is, whether hosts move in or out of the population), and in the latter case, the duration for which individuals stay in the population.

Note: times are specified in days, but this does not prevent you to consider them months or years, (just remember to write the parameters accordingly!).

drawing


The panel Define infection and transmission features allows you to change the parameters, such as the number of infected individuals at the start of the simulated epidemic, the Daily number of effective contacts per unit (corresponding to the transmission rate, \(\beta\)), and the infectious period length (corresponding to the inverse of the recovery rate, 1/\(\gamma\)). Depending upon the setting you choose in the previous panels, other parameters might need to be set. As an example, if you choose a network model you will have to specify the average number of contacts (\(\nu\)) and the probability of transmission given a contact (\(\rho\)). Note that: \(\beta\) = \(\nu\) x \(\rho\).

drawing


The last two panels are Choose control strategy and Set simulation parameters. The first allows you to choose a control (e.g. vaccination) but we will not use it in this practical. In the latter you can decide how long you want to run your model for and, if you are using a stochastic model, how many independent simulations you want to run.

drawing


1.2 Cholera and outbreak dynamics

You have been asked to provide some mathematical modelling predictions to help with a public health response to a potential cholera outbreak in a region that currently has no access to a vaccine supply. Cholera is a water-borne infectious disease caused by the bacterial pathogen Vibrio cholerae, and it is spread mostly by unsafe water and unsafe food that has been contaminated with human feces containing the bacteria. The WHO estimated between 1.3 and 4.0 million of cholera cases per year, and between 21,000 and 143,000 deaths.

The probably cases are centred around a village with a population size N of 1000. According to the public health authorities cholera has not previously been reported, and the population is thought to be fully susceptible to infection.

Lewnard and collegues studied the outbreak in Haiti (2010-2011) and their estimate for the infection \(R_0\) was 1.8 in the Artibonite area, where it is believed the outbreak started.


1.2.1 Do you think it is possible for cholera to invade and cause an outbreak? Explain your reasoning.

It is possible because \(R_0\) is > 1 and the population is fully susceptible to infection meaning that \(R_0\) is a good metric to assess whether an outbreak can occur.

1.2.2 Assuming that the latency period of the disease is negligible, and that recovered individuals acquire life-lasting immunity, which compartmental model would you choose? Why?

  • SIR
  • SEIR
  • SIRS

Because the latency period is negligible, there is no need for an Exposed compartment, and because of the life-lasting immunity, recovered individuals don’t go back to susceptible, therefore a simple SIR is the answer.


Understanding when an outbreak “turns over” (i.e. when the peak of the epidemic occurs in the absence of any intervention or behaviour change) is very important from an epidemic management point of view. For a simple model the turn-over point can be calculated as a function of \(R_0\).

1.2.3 Write down a mathematical expression that defines the fraction of susceptible individuals at the outbreak peak. Why does the number of new infections decline after the peak? Provide reasoning in terms of number of susceptible individuals.

1/\(R_0\)

Based on the above information from previous work, use Epidemix to run an epidemiological model representing the potential outbreak of cholera in your study village of N population size for 180 days (change the simulation length in the Set simulation parameters section), starting from the introduction of 1 infectious individual.

The interface does not allow you to directly specify \(R_0\) (= 1.8), but given that the infectious period length (1/\(\gamma\)) is 5 days, we can calculate \(\beta = R_0\gamma\), such that \(\beta = 1.8/5\)) [Note: what are the units of \(\beta\)?]. (Reminder: in Epidemix \(\beta\) is called Daily number of effective contacts per unit)

\(\beta\) is expressed in contacts/person/day

You should obtain a plot like this, in the right panel:

drawing

Determining the maximum number of simultaneously infected individuals is important for epidemic preparedness because it allows to calibrate the control efforts. For example, knowing the maximum number of severe cases at any one time allows you to prepare an adequate number of beds in the hospital, or to buy an adequate number of vaccine doses (you will see more about epidemic control in Week 4).

1.2.4 Verify graphically the previous question. How long does it take for the outbreak to reach the peak and turn-over (in days)? How many susceptible and infectious individuals there are at the peak of the outbreak? Is this consistent with your answer to the previous question?

(Hint: if you are unsure about the exact values appearing using the cursor on the plot, check the values in the results table. Please also note that the values shown by the cursor on the plot are an approximation.)

The outbreak peak happens after 40 days, there are approximately 555 susceptible and 118 infectious individuals. We would expect the fraction of susceptibles, s, to be 1/R0 at the peak of the outbreak. For cholera, this would therefore mean that s = 1/1.8 = 0.5555. Therefore our plot is consistent with what we were expecting from our theoretical result, as 555/1000 is approximately 0.55.

1.2.5 How long does the outbreak last?

(Note: we consider the outbreak over when there are less than 1 infectious individuals left in the population)

The outbreaks end after 103 days.


Deterministic SIR models predict that an outbreak in a closed population is eventually going to end. By doing some maths, starting from the SIR Ordinary Differential Equations (ODEs), you can show that: \[ s(t) = s(0)e^{-r(t)R_0} \] where s(t) is the fraction of susceptible individuals at time t (\(S(t)/N\)), s(0) is the initial fraction of susceptible (\(S(0)/N\)), and r(t) is the fraction of recovered at time t (\(R(t)/N\)). If you assume that at the beginning of the outbreak everyone in the population is susceptible (i.e s(0) = ~1), and that at the end of the outbreak, individuals can either be susceptible or recovered, you can get this formula for t \(\rightarrow \infty\) (i.e. in the ‘long run’) :

\[ s(\infty) = 1 - r(\infty) = e^{-r(\infty)R_0} \] According to this equation, the fraction of susceptibles at the end of the outbreak tends to 0 (and the fraction of recovered tends to 1) for increasing values of \(R_0\).

1.2.6 What’s the value of \(R_0\) above which 99% of the population is infected? (i.e. r(\(\infty\)) = 0.99). How about when 90% of the population is infected?

(Hint: you can either solve this mathematically or by increasing \(\beta\) in the Epidemix model until you reach 990 and 900 recovered individuals by the end of the outbreak)

The value of \(R_0\) which would lead to 99% of infected individuals is 4.65, while an \(R_0\) of ~2.56 would lead to 90%.

You will see that the cholera \(R_0\) we are considering (1.8) is much lower than the answer to the previous question.

1.2.7 What does it mean in terms of when the outbreak will end? Are there going to be any susceptible individuals in the study village after the cholera outbreak? How many? (Note: solve this using Epidemix)

It means that according to our SIR model results there is going to be a large fraction of susceptibles at the end of the outbreak, around a quarter (267).


Now that you have a working model, you want to make it more realistic. After some literature research, you find out that cholera latency period (i.e. from infection to infectiousness) is considered to be around 2 days.

1.2.8 Which model would you choose now? Why?

  • SIR
  • SEIR
  • SIRS

Since there is a latency period, a compartment E to represent the exposed/latent infected individual is necessary.

1.2.9 How do the outbreak dynamics change, in terms of: timing of the turn-over, outbreak duration, number of infected at the outbreak peak? How do these values compare to the SIR model? IsI this what you would expect? It may be helpful to think about the natural history of cholera, as well as its \(R_0\).

(Note: the total number of infected is the sum of the exposed and the infectious)

The outbreak turn-over happens after 63 days (compared to 40 days without the latent period) andthe outbreak ends after 123 days. Therefore, the predicted dynamics are slower than in the SIR model case. However, the maximum number of infected individuals is the same (about 118) with the turn over happening when the fraction of susceptible individuals is around 0.55.

In reality we know that cholera is a waterborne transmitted disease, in most cases people get infected by consuming contaminated water. Some mathematical models incorporate this complexity with a compartment modelling the bacterial load in the water supply.

Moreover, by doing some research on the field you find out that, in the study village, the population is divided into a few communities spread out in a wide area, which rarely interacts with each other.

1.2.10 Given the above characteristics of the pathogen transmission and of the study population, which SIR assumptions would be violated? Why?

  • Direct transmission
  • Homogeneous mixing
  • Both

The disease is not transmitted directly, because we need an intermediate medium which is water. The fact that the population is divided in groups does not conform with the homogenoeusly mixing assumption.


1.3 Influenza modelling

You have been asked to help with a public health response to this year’s flu season. This season it has been estimated that flu’s reproductive number is 1.5.

The control of airborne transmitted infectious diseases is particularly challenging in schools, where students come in close contact with each other for an extended amount of time. You have been given the task to study a flu outbreak in a school on an island in the Hebrides. The island community is very small, and the school only comprises 60 students.

A small population size can challenge the assumptions of deterministic models, because random probabilistic events have a bigger impact on outbreak dynamics than in bigger populations.

1.3.1 In general, during which outbreak phase are the dynamics is highly dependent on stochastic events? Why?

  • Disease invasion phase (i.e. when there are few infectious individuals)
  • Outbreak peak (i.e. when there are the most infectious individuals)
  • Both

Because when there are few infectious individuals the individual variability can lead to early disease extinction, while when there are more infectious individuals the dynamics approximate the average dynamics.

1.3.2 Which is the probability of a natural die-off of a flu outbreak with a single infectious introduction?

The probability is 1/\(R_0\), therefore 67%.

You want to verify the above question with Epidemix. To do that you need to switch to a Stochastic Homogeneous COMP model and set the parameters accordingly (\(R_0\) is 1.5, and the infectious period \(1/\gamma\) is 3 days, calculate the transmission rate \(\beta\) (like you did in Part 2). Run 30 simulations (you can edit Set simulation parameters).

In the outputs panel you should see something like this:

drawing

1.3.3 Why do you think the median number of Recovered-immune at the end of the outbreak is so low? If you click on the panel Explore individual simulations you can select and visualise the dynamics of each simulated outbreak (see figure below).

Because the majority of simulated outbreaks, 67%, should die off naturally, therefore an outbreak should happen in only 33% of the simulations.

If you click on the panel Explore individual simulations you can select and visualise the dynamics of each simulated outbreak (see figure below).

drawing

1.3.4 In how many simulations there are more than 10 infected individuals? Based on the two previous questions, how many did you expect?

(Note: the stochastic model can produce slightly different results.)

We would expect there to be 1- 1/1.5 = 33% chance that each introduction leads to an outbreak, therefore we would expect there to be 10 * 0.33 = 3-4 simulations that have outbreaks. There were more than 10 infected individuals in 9 simulations, I expect it to be 10 (30 * 33%).

In the next practical you will see how a different number of outbreak seeds (i.e. the number of initial infectious individuals) can lead to very different results.


For the purpose of your task, you want to estimate the average dynamics (sometimes called, ‘mean-field approximation’) of a potential flu outbreak on the entire island population, once enough people have seeded an outbreak that an epidemic is inevitable. Therefore, we take a step back and use a deterministic model.

Other than the school students we discussed before, the total island population consists of 850 people in total, that can be divided into 2 main groups: young and children up to 18 years of age (250) and adults (600).

1.3.5 Similar to what we did in the previous practical, use Epidemix to verify when the outbreak would turn-over (in the absence of any intervention or behaviour change), how many susceptible individuals and infectious individuals (i.e. named Infectious-symptomatic) there are at the turn-over, and how long would it take for the outbreak in the island to end.

The outbreak turn-over would happen after 33 days, when susceptible individuals are approximately 567 (850 * 1/1.5) and at the peak there are 54 infectious individuals. The outbreak is over by day 81.


Because of the different contact patterns characterizing schools, considering the entire population as homogeneously mixed would be a quite misleading assumption. A potential solution would be to divide the population in two groups, young & children (group 1) and adults (group 2). Epidemix allows you to do so by switching to a Deterministic heterogeneous model.

You can then set the total population size and the number of individuals in group 1 (the children & young group in our case study):

drawing

and the transmission rate within- and between-groups:

drawing

The potential infectious contacts between adults are more scarce than in the children population, so let’s assume a \(\beta_2\) of 0.3 (transmission rate within group 2, adults). For the purpose of the exercise set the between-group \(\beta\)s to 0 (\(\beta_{1-2}\) and \(\beta_{2-1}\)), while the \(\beta_1\), within children, and the infectious period length stay the same as before.

1.3.6 Comment the outbreak dynamics in the two groups. Why do you think this happens?

While the outbreak in the children and young group is similar as before, the one in the adults does not happen. This is because the \(R_0\) in the adults is < 1, 0.3 * 3 = 0.9.

This assumption is not very credible for various reasons: the contacts between children and their families, the adults working in the school, etc. To correct for this set the \(\beta_{1-2}\) and \(\beta_{2-1}\) to 0.1.

1.3.7 Comment on how the dynamic of the flu outbreak changes in the two groups (for students compare it with the answer to question 2.2.6). Why do you think this is important?

Now the flu spreads also in the adult group, causing a major outbreak where almost half of the adult group (280 over 600) gets infected by the end of the outbreak, on day 71. The peak happens on day 28, when there are 34 infectious and 441 susceptible individuals. In the children & young group, the outbreak peak happens 2 days earlier, on day 26, when there are 146 susceptible and 23 infectious individuals, and the outbreak is over on day 60.

It is important because it shows how a population that cannot maintain an infectious disease outbreak can be invaded even with limited contacts with another population.