In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid19 spread dynamics within New York City. To this end, we propose a multivariate compartmental system that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general statisticalmechanistic Bayesian hierarchical model. The opensource probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and outofsample forecasting, we find our simple and interpretable model provides quantifiable evidence for how reductions in human mobility altered early case dynamics in New York City.
The global Covid19 pandemic has underscored the importance of mathematical and statistical models in understanding disease dynamics, assessing policy efficacy, and examining counterfactual scenarios to formulate thorough costbenefit analyses. Lockdown measures can have drastic impact on individual wellbeing as well as society and the economy at large [
Raw Daily Case Counts and Mobility Time Series.
Mathematical modeling in epidemiology has a long history, famously dating back to the eighteenth century with the work of Bernoulli [
In modeling populationlevel data with a compartmental system, the population is typically subdivided into separate homogeneous groups. Here we focus on reviewing the simple SIR model developed in 1927 by A. G. McKendrick and W. O. Kermack to model a plague outbreak in Bombay [
The progression of the disease throughout the population depends upon the contact rate between susceptible and infectious individuals, the probability of transmission upon contact, and the prevalence of disease. To mathematically capture these factors and express the rate of new infections, let λ be defined as a per capita contact rate among individuals. In this way,
The remaining parameter γ is interpreted by considering that
The system of differential equations in (
A steady state or equilibrium of a dynamical system is a point
Through elementary matrix operations,
For general compartment models that extend the simplistic SIR framework, computing the basic reproductive number can be difficult. [
Let
Next,
In this section, we detail the construction of our compartmental model designed to formulate an understanding of how reduction in human mobility might have altered early infection dynamics within New York City. The model is parsimonious, in that it consists of only four compartments and shares many wellestablished mathematical properties of the SIR model discussed in the background section. The dynamical system proposed comprises a closed population divided into susceptible, lockdown
Proposed SLIR compartmental model.
The system of differential equations describing the changing system are
We present our methodology in a general hierarchical framework to facilitate Bayesian inference of compartmental system parameters in equation (
We first establish notation to represent the mechanistic system of differential equations in the middle of the hierarchy. Let the equations in (
The top of the hierarchy is described by formulating a measurement process for the two outcome variables. Let the first outcome of interest be labeled
In this section, we present two simulation studies and conclude with the New York City analysis. First, the proposed SLIR compartmental model is used to simulate data from two lockdown scenarios that affect human mobility differently. We then fit our Bayesian model to assess whether the true parameter values are adequately recovered. After, we analyze the realworld mobility and case count data that initially inspired the model formulation.
To illustrate the nonlinear dynamics of which our model can capture, the first simulation reflects the idealized scenario of strict adherence to lockdown and mitigation measures, when population movement is quickly reduced in the early stages of the outbreak and remains reduced for the next 90 days. In this case, individuals move from the
In the second scenario, we consider weak adherence to mitigation measures and illustrate the substantial change in dynamics by only altering the speed of flow back into the susceptible population from the
Model Fit to Simulated Data.
We fit our model to both scenarios using Stan’s NUTS algorithm with 4 chains and 5,000 iterations each, the first half of which are discarded as warmup. The convergence of the parameter chains are judged by inspecting the trace plots along with the GelmanRubin
Simulation Study.
Data  Parameter  Truth  Median  GR 

5  4.93  4.720–5.130  1.00  
Simulation 1  γ  0.1  0.090  0.086–0.150  1.00 
0.05  0.045  0.040–0.051  1.00  
0.1  0.095  0.040–0.051  1.00  
5  4.79  4.590–5.010  1.00  
Simulation 2  γ  0.1  0.099  0.096–0.104  1.00 
0.05  0.045  0.040–0.051  1.00  
0.1  0.095  0.086–0.106  1.00 
In both cases, the Bayesian hierarchical model is able to infer the structural parameters of the SLIR model. Notice the change in case counts of both scenarios resulting from different susceptible population sizes.
The entire leadup thus far was requisite background material for compartmental model inference and application to realworld data. As mentioned in the introduction, our main motivation for this article was to understand how a reduction in mobility affected early Covid19 dynamics specifically within NYC. For convenience, we restate the data sources. Case counts are reported by the official website of the City of New York, available at
To achieve parameter calibration, we fit in Stan our full hierarchical model using four chains run for 10,000 iterations each. We discard the first 5,000 as warmup. Sufficient posterior exploration is assessed by examining parameter chain plots below and assessing
New York City Analysis.
Data  Parameter  Median  GR 

5.130  4.841–5.448  1.00  
New York City  γ  0.212  0.191–0.234  1.00 
0.115  0.106–0.124  1.00  
0.022  0.019–0.024  1.00 
The fitted time series are presented below on the right, along with the raw data used to train the model. On the left, prior predictive distributions are included to illustrate the degree in which Bayesian learning occurs after observing the data. We also include the fit of an SIR model to illustrate its inability to capture the dynamics.
To assess the structural fit of our hypothesized SLIR mechanism, we next interpret parameter values to ensure they are logical and consistent with outside literature. The
The posterior of γ has a median value of 0.21 and a 95% credible interval of (0.191, 0.234). From this, we arrive at an estimate of approximately 5 days for the average infectious removal time. This estimate could be reasonable assuming asymptomatic or presymptomatic transmission is possible and that individuals isolate at the onset of symptoms; see [
We conclude the New York City analysis by assessing the sensitivity of the model to changing mobility levels. Additionally, we assess outofsample predictive capacity of the SLIR model. To perform the sensitivity analysis, we first generate a range of hypothetical mobility scenarios, from full mobility reduction through extremely stringent lockdown measures to a more mild decline. This is displayed in Figure
To conclude this section, we analyse the outofsample forecasting ability of the SLIR model when trained on only a subset of the ninety day period. We consider for illustration a two, three, and four week training intervals. These are indicated by vertical dashed lines in Figure
SLIR Prior and Posterior Predictive Checks.
SLIR Outofsample Forecasts.
The predictive median is illustrated with a solid line. With only two weeks observed, the predictive median is reasonably representative of the future trajectory but with very large uncertainty. The upper 95% predictive curve for the two week window reaches roughly 400,000 infections, but is clear that the predictive interval quickly contracts as more data is observed (cf. Figure
In this work, we have formulated a basic extension of the classical SIR model to jointly fit cell phone transit mobility data and case counts. By jointly modeling two outcome variables, we establish a mechanistic correspondence between reduced mobility and infection dynamics. The applied analysis and findings, however, are limited to NYC during the first 90 days. The disease progression throughout the region was wellapproximated by deterministic dynamics and modeling with a simple compartment system. In other geographic regions, the dynamics might not be as suitably wellbehaved or understood. It is difficult to capture the myriad of factors contributing to disease spread throughout a population, and often a stochastic model may be more appropriate. Additionally, the time horizon considered in our applied analysis is relatively short. As a disease progresses throughout the population and becomes more widespread, the strategy of informing the susceptible population numbers through cell phone mobility data becomes more difficult. It is also worth consideration about whether in general smartphone mobility data can be considered a representative sample of individuals’ adherence to mitigation protocols. It is therefore difficult to build a quantitative understanding of the dynamics between lockdown measures and disease progression over long periods of time. A direction for future research is to establish modeling strategies for such scenarios. This will introduce additional challenges of accounting for population immunity, vaccinations, and natural seasonality effects.
We briefly discuss for both discrete and continuous time how the deterministic SLIR model can be relaxed through a statespace (or Hidden Markov Model) framework, where stochasticity is introduced into the underlying driving epidemic process. We first discuss a discretetime extension by modifying equation (
An alternative approach is to incorporate continuoustime stochasticity directly by expressing
The hierarchical model of the previous section crucially depends upon the numerical solution to a coupled set of differential equations. We briefly review the popular numerical integration schemes and connect them with our hierarchical model of the compartmental system in (
Consider
Euler’s approximation is easy to execute and based upon a first order Taylor expansion. It is also the least accurate. The Trapezoidal and Modified Euler approximations in (
Higher order Taylor expansions produce other iterative schemes. Thus, second order methods emerge from
The RungeKutta methods are among the most conspicuous of numerical methods for solving systems of ordinary differential equations. The underlying idea is to achieve the same accuracy as Taylor series updates without requiring higher order derivatives of
More generally, the explicit RK methods of order
The appropriate step size
Our choice of implementation in Stan, as opposed to more traditional BUGS or JAGS [
Hamiltonian Monte Carlo (HMC) is more equipped to sample from complex posterior distributions with high autocorrelations than standard Metropolis schemes. The most popular presentation of Hamiltonian Monte Carlo is by way of analogy with statistical mechanics. Let 𝜃 be an arbitrary
We begin by recalling the more conspicuous Metropolis random walk and the concept of detailed balance. Let
The underlying idea behind HMC is that instead of generating the proposed value from a random distribution, we use a deterministic
The auxiliary variable,
In order to sample from (
We provide some further intuition on the timereversibility of the simple HMC algorithm. The key to this result is that the leapfrog iteration in (
This procedure, while maintaining the stationary distribution through timereversibility, introduces a host of complexities. Perhaps most importantly, tuning the many parameters needed in this process is inherently difficult. This motivated the development of an automatic procedure known as the NoUTurnSampler (NUTS) [