Periodic Sampling (COVID-19 Data)

This repository explores periodic trends in reported case and death data for multiple diseases. This work supports the paper 'Identification and Attribution of Weekly Periodic Biases in Global Epidemiological Time Series Data', currently available as a preprint on medRxiv.

We also provide a package implementing Bayesian inference methods, including Metropolis-Hastings and Gibbs sampling, to explore these periodic trends in real or synthetic Covid-19 case data.

Periodic Data Trends

Covid-19 Data

We import Covid-19 case and death data from the Johns Hopkins University CSSE database. This data is uploaded as separate .csv files on a daily basis, so routines in the analysis module are provided to generate location-specific files covering the full history of the pandemic.

For example:

from analysis import generate_location_df

input_dir = "COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/"
location_key = "England, United Kingdom"

country_df = generate_location_df(input_dir, location_key)
country_df.to_csv("data/England_data.csv")

More detailed examples (along with cleaning procedures for the data) are given in data_trends.ipynb. Currently these procedures are not packaged into a separate method, but this may be updated in the future.
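As a rough illustration of what this extraction involves, the sketch below assumes (hypothetically) that each daily report is named MM-DD-YYYY.csv and contains a Combined_Key column alongside cumulative Confirmed and Deaths counts, as in later JHU CSSE reports. The name location_daily_counts is illustrative only; it is not the implementation or signature of generate_location_df, and it omits the cleaning steps in data_trends.ipynb.

```python
import glob
import os

import pandas as pd


def location_daily_counts(input_dir: str, location_key: str) -> pd.DataFrame:
    """Hypothetical sketch: build a daily case/death series for one location.

    Assumes files named MM-DD-YYYY.csv with cumulative 'Confirmed' and
    'Deaths' columns and a 'Combined_Key' column naming the location.
    """
    rows = []
    for path in glob.glob(os.path.join(input_dir, "*.csv")):
        # The report date is encoded in the file name (minus ".csv")
        date = pd.to_datetime(os.path.basename(path)[:-4], format="%m-%d-%Y")
        report = pd.read_csv(path)
        if "Combined_Key" not in report.columns:
            continue  # early reports use a different schema; skipped here
        match = report[report["Combined_Key"] == location_key]
        if not match.empty:
            rows.append({"Date": date,
                         "Confirmed": match["Confirmed"].sum(),
                         "Deaths": match["Deaths"].sum()})
    if not rows:
        raise ValueError(f"No reports found for {location_key!r}")
    # File names sort month-first, so order by the parsed date instead
    df = pd.DataFrame(rows).set_index("Date").sort_index()
    # Reports are cumulative, so daily new counts are first differences
    df["New_Cases"] = df["Confirmed"].diff()
    df["New_Deaths"] = df["Deaths"].diff()
    return df
```

Differencing cumulative totals is also where negative daily counts (from retrospective corrections) typically appear, which is one reason a separate cleaning step is needed.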

Further information about this data (such as collection methods) can be found in a dedicated README. Pre-generated example data files are also available.

Other Diseases

We also provide daily case data from the 1918 Spanish flu and 2022 Haitian cholera epidemics in the other_diseases directory.

Periodic Reporting Trends

In this data we typically observe a strong oscillatory trend, as depicted in both the case and death data from England, UK. The raw daily data is shown in grey, with a 7-day moving average (as used in most publications) superimposed in colour.

There are consistent over- and under-reporting trends on particular weekdays across the duration of the pandemic. These may be quantified through a reporting factor, defined as the ratio of observed cases on a given day to the 7-day average centred on that day. The distribution of reporting factors for each dataset is given below:

A global analysis of these trends is further provided in global_pca.ipynb.
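The reporting factor defined above can be written as $r_t = C_t \big/ \tfrac{1}{7}\sum_{s=t-3}^{t+3} C_s$, where $C_t$ is the observed count on day $t$. A minimal pandas sketch, assuming a daily series indexed by date (the function names are illustrative, not taken from the repository), computes it and summarises its distribution by weekday:

```python
import pandas as pd


def reporting_factor(counts: pd.Series) -> pd.Series:
    """Ratio of each day's count to the centred 7-day mean around that day.

    The first and last three days have no full centred window and are NaN.
    """
    centred_mean = counts.rolling(window=7, center=True).mean()
    return counts / centred_mean


def weekday_summary(factors: pd.Series) -> pd.DataFrame:
    """Describe the reporting-factor distribution per weekday (0 = Monday)."""
    factors = factors.dropna()
    return factors.groupby(factors.index.dayofweek).describe()
```

Because a centred 7-day window contains each weekday exactly once, a purely multiplicative weekday bias whose factors average to one shows up directly as the reporting factor for that weekday.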

Origin of Bias

We further use a dataset from Public Health England (PHE) that distinguishes between the true date of death and the date to which the death was attributed in online reporting systems. From the analysis in periodicity_analysis.ipynb, we identify a weekly oscillation in the death data grouped by reporting date that is not present when the data is grouped by true event date, suggesting that this weekly trend is fully attributable to biases in the reporting process.
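One simple way to check for such a weekly oscillation, which is not necessarily the method used in periodicity_analysis.ipynb, is to detrend each series with a centred 7-day moving average and compare the periodogram power at a 7-day period against the typical power at other frequencies. The hypothetical weekly_power_ratio below uses only NumPy; a large ratio for a reporting-date series and a ratio of order one for an event-date series would be consistent with the finding above.

```python
import numpy as np


def weekly_power_ratio(counts) -> float:
    """Hypothetical diagnostic for the strength of a 7-day cycle.

    Detrends with a centred 7-day moving average, then divides the
    periodogram power at the frequency nearest 1/7 cycles per day by the
    median power over all non-zero frequencies.
    """
    x = np.asarray(counts, dtype=float)
    # 'valid' convolution gives one mean per full window; window i is centred on x[i + 3]
    trend = np.convolve(x, np.ones(7) / 7, mode="valid")
    resid = x[3:-3] - trend
    power = np.abs(np.fft.rfft(resid - resid.mean())) ** 2
    freqs = np.fft.rfftfreq(resid.size, d=1.0)
    weekly = np.argmin(np.abs(freqs - 1 / 7))
    return float(power[weekly] / np.median(power[1:]))
```

The 7-day moving average has zero response at a period of exactly 7 days, so subtracting it removes the slow epidemic trend while leaving any weekly component in the residual intact.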

$R_{t}$ Inference

Synthetic Data

To benchmark inference approaches against a known ground truth, we generate synthetic pandemic data using a renewal model framework. Various reporter functions are also provided, which can return or save this data in .csv format and apply reporting biases that replicate the trends described above.
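A renewal model generates new infections by weighting past incidence with a serial-interval distribution. The sketch below is a minimal Poisson renewal process under assumed inputs (a constant reproduction number R and a user-supplied serial interval); it is not the repository's RenewalModel, whose parameterisation may differ, although the argument names T and N_0 mirror the example that follows.

```python
from typing import Optional

import numpy as np


def simulate_renewal(R: float, T: int, N_0: int, serial_interval,
                     seed: Optional[int] = None) -> np.ndarray:
    """Sketch of a Poisson renewal model (not the package's RenewalModel).

    I_t ~ Poisson(R * sum_s w_s * I_{t-s}), with I_0 = N_0 and w[0] the
    serial-interval weight for a lag of one day.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(serial_interval, dtype=float)
    w = w / w.sum()  # normalise so the weights form a distribution
    cases = np.zeros(T, dtype=int)
    cases[0] = N_0
    for t in range(1, T):
        lags = min(t, w.size)
        # Past incidence, most recent first, weighted by the serial interval
        infectiousness = np.dot(w[:lags], cases[t - 1::-1][:lags])
        cases[t] = rng.poisson(R * infectiousness)
    return cases
```

For example, `simulate_renewal(R=0.99, T=200, N_0=500, serial_interval=np.exp(-np.arange(1, 11) / 3.0), seed=0)` uses an arbitrary, purely illustrative serial interval; a realistic choice would be a discretised distribution estimated for the disease in question.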

An example of this process is given below:

from synthetic_data import RenewalModel, Reporter

model = RenewalModel(R0=0.99)
model.simulate(T=200, N_0=500)

rep = Reporter(model.case_data)
truth_df = rep.unbiased_report()
bias_df = rep.fixed_bias_report(bias = [0.5, 1.4, 1.2, 1.1, 1.1, 1.1, 0.6],
                                multinomial_dist=True)

This would generate the following data:

All functions have complete docstrings to record their functionality and expected arguments. Further detail is also given in the README for the periodic_sampling module.
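A fixed reporting bias can be thought of as scaling each day's count by a weekday factor. The hypothetical apply_weekly_bias below sketches two variants: deterministic rounding, and a multinomial split of each complete week's total that preserves weekly totals. The multinomial variant is an assumption about what such an option might do; it is not taken from Reporter.fixed_bias_report.

```python
import numpy as np


def apply_weekly_bias(true_cases, bias, multinomial=False, seed=None):
    """Hypothetical sketch of a fixed weekday reporting bias.

    bias[k] is the factor for day index t with t % 7 == k. Deterministic
    mode rounds true_cases * bias; multinomial mode redistributes each
    complete week's total with probabilities proportional to
    true_cases * bias, leaving any trailing partial week unchanged.
    """
    x = np.asarray(true_cases, dtype=int)
    b = np.asarray(bias, dtype=float)
    factors = b[np.arange(x.size) % 7]
    if not multinomial:
        return np.rint(x * factors).astype(int)
    rng = np.random.default_rng(seed)
    reported = x.copy()
    for start in range(0, x.size - 6, 7):  # complete weeks only
        week = slice(start, start + 7)
        weights = x[week] * factors[week]
        if weights.sum() > 0:
            reported[week] = rng.multinomial(x[week].sum(), weights / weights.sum())
    return reported
```

The multinomial form keeps the weekly totals exactly, so a 7-day average of the biased series matches that of the true series at week boundaries, while individual days still show the weekday pattern.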

Inference Methods

Both Metropolis-Hastings and Gibbs sampling methods are implemented for Bayesian inference. These have separate parameter and sampling classes, and a combined ('mixed') sampling method is also implemented to allow inference on multiple parameters of different types. We also use independent sampling for the discrete case values when inferring the ground-truth time series.
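A random-walk Metropolis-Hastings step proposes a move and accepts it with probability min(1, p(proposal)/p(current)). The sketch below is a generic, self-contained illustration on a conjugate Poisson-Gamma model, chosen so that the exact posterior mean, (a + Σx)/(b + n), is available for checking; it does not use the package's parameter or sampler classes.

```python
import numpy as np


def log_posterior(lam, data, a=1.0, b=1.0):
    """Log of Poisson likelihood times Gamma(a, rate=b) prior, up to a constant."""
    if lam <= 0:
        return -np.inf
    return (a - 1 + np.sum(data)) * np.log(lam) - (b + len(data)) * lam


def metropolis_hastings(log_post, x0, n_samples, step=0.5, seed=None):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, lp = x0, log_post(x0)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        samples[i] = x
    return samples
```

For data [3, 5, 4, 6, 2] with a = b = 1, the exact posterior is Gamma(21, rate 6) with mean 3.5, which the sample mean after discarding a burn-in period should approximate.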

This flexible implementation is applicable to a wide range of problems, with some examples from Ben Lambert's "A Student's Guide to Bayesian Statistics" given in exampler.ipynb. These methods are then applied to the inference of the true time series from the biased time series, under various assumptions described in a separate README.
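Gibbs sampling instead draws each parameter in turn from its full conditional distribution. As a textbook-style illustration (not taken from exampler.ipynb), the sketch below samples the mean and precision of a normal model with a normal prior on the mean and a gamma prior on the precision, both of which give closed-form conditionals.

```python
import numpy as np


def gibbs_normal(y, n_samples, m0=0.0, s0=100.0, a0=1.0, b0=1.0, seed=None):
    """Gibbs sampler for y_i ~ N(mu, 1/tau).

    Priors: mu ~ N(m0, s0^2) and tau ~ Gamma(a0, rate=b0).
    Returns an array of shape (n_samples, 2) holding (mu, tau) draws.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, total = y.size, y.sum()
    mu, tau = y.mean(), 1.0 / y.var()  # start at the sample estimates
    out = np.empty((n_samples, 2))
    for i in range(n_samples):
        # mu | tau, y is normal (conjugate update of the prior)
        prec = 1.0 / s0**2 + n * tau
        mean = (m0 / s0**2 + tau * total) / prec
        mu = rng.normal(mean, 1.0 / np.sqrt(prec))
        # tau | mu, y is gamma; NumPy parameterises by scale = 1 / rate
        rate = b0 + 0.5 * np.sum((y - mu) ** 2)
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / rate)
        out[i] = mu, tau
    return out
```

Each update conditions on the latest value of the other parameter, which is the same alternating structure a mixed sampler uses when some parameters have tractable conditionals and others do not.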

We also implement several methods in Stan using the No-U-Turn Sampler (NUTS), which can handle larger populations without the computational limits imposed on our mixed sampler by independent sampling of the time series. An example of the predicted time series and reproduction number profile (based on the posterior mean) is given below:

KCGallagher/periodic-sampling

Folders and files

Latest commit

History

Repository files navigation

Periodic Sampling (COVID-19 Data)

Periodic Data Trends

Covid-19 Data

Other Diseases

Periodic Reporting Trends

Origin of Bias

$R_{t}$ Inference

Synthetic Data

Inference Methods

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages