hodcroftlab/poliovirus_recombination

Enterovirus C: Phylogenetic & Recombination Analysis

STILL UNDER CONSTRUCTION

This repository provides the code for a phylogenetic and recombination analysis of the Enterovirus C species, presented in my (unpublished) MSc thesis.

Whole-genome (>6000 bp) Enterovirus C sequences (taxid:138950) were downloaded from NCBI Virus in May 2024; a total of 1940 EV-C sequences were retained for the analysis.
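As an illustration of the length criterion above, here is a minimal sketch of a FASTA filter that keeps only sequences longer than 6000 bp. It is not the code used in this repository; the function names `read_fasta` and `filter_by_length` are hypothetical, and the sketch only covers length, not any other filtering that may have been applied.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = 6000
) -> Dict[str, str]:
    """Keep records whose sequence is strictly longer than min_length."""
    return {header: seq for header, seq in records if len(seq) > min_length}
```

To apply it to a file, pass an open file handle to `read_fasta`, for example `filter_by_length(read_fasta(open(path)))`, where `path` points to a downloaded FASTA file.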

Snakemake was used as a workflow management system; the Snakefile provided in this repository contains the computational pipeline for both the phylogenetic and recombination analysis. The scripts directory contains individual Python and R scripts that are called by the Snakefile.

Phylogenetic Analysis using Nextstrain

Phylogenetic analysis was performed with the Nextstrain phylogenetics pipeline on the whole-genome alignment, on alignments of each individual gene, and on the 5' untranslated region. The trees generated by this code can be explored interactively on the Nextstrain website.

To install the Nextstrain environment, follow these instructions. Once the Nextstrain environment has been set up and activated, phylogenetic analysis can be performed by executing snakemake --cores 1 export_all. Alternatively, the rules specified in the Snakefile can be executed individually in a step-wise manner. The generated trees (in JSON format) can be visualized using the auspice view command (not included in the Snakefile; use auspice view -h for help).
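The exported JSON trees can also be inspected programmatically. The following is a rough sketch, assuming the files follow the Auspice v2 dataset layout with a single root node under the top-level `tree` key and child nodes nested under `children`; the helper names `tip_names` and `load_tip_names` are hypothetical and not part of this repository.

```python
import json
from typing import Any, Dict, List


def tip_names(root: Dict[str, Any]) -> List[str]:
    """Collect the names of all tips (nodes without children) below root."""
    names = []
    stack = [root]
    # An explicit stack avoids Python's recursion limit on deep trees.
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            # Push in reverse so tips come out in their original order.
            stack.extend(reversed(children))
        else:
            names.append(node["name"])
    return names


def load_tip_names(path: str) -> List[str]:
    """Read a dataset JSON (assumed Auspice v2 layout) and return tip names."""
    with open(path) as handle:
        dataset = json.load(handle)
    return tip_names(dataset["tree"])
```

Calling `len(load_tip_names(path))` on one of the exported files gives a quick check of how many sequences ended up in a tree.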

Refer to the Nextstrain publication and Nextstrain documentation for more information on the project.

Recombination Analysis

Recombination analysis was performed using a custom similarity plotting approach (inspired by SimPlot and SimPlot++) and the recombination detection method VirusRecom. The code to generate the similarity plots is provided in the scripts/custom_simplots.py and scripts/custom_simplots_extended.py scripts.
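For readers unfamiliar with similarity plots, the general idea can be sketched as a sliding-window identity calculation between an aligned query and reference sequence. This is a simplified illustration, not the implementation in scripts/custom_simplots.py: the function name `window_similarity`, the default window and step sizes, and the choice to use plain identity while skipping gapped columns are all assumptions.

```python
from typing import List, Tuple


def window_similarity(
    query: str,
    reference: str,
    window: int = 200,
    step: int = 20,
    gap: str = "-",
) -> List[Tuple[int, float]]:
    """Return (window midpoint, fraction of identical sites) along an alignment.

    Both sequences must come from the same alignment (equal length).
    Columns where either sequence has a gap are ignored.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")

    points = []
    for start in range(0, len(query) - window + 1, step):
        matches = 0
        compared = 0
        pairs = zip(query[start:start + window], reference[start:start + window])
        for q, r in pairs:
            if q == gap or r == gap:
                continue  # skip gapped columns
            compared += 1
            if q.upper() == r.upper():
                matches += 1
        similarity = matches / compared if compared else float("nan")
        points.append((start + window // 2, similarity))
    return points
```

Plotting the returned similarity values against the midpoints for several reference sequences gives a curve per reference; a crossover between curves along the genome is the kind of pattern that suggests a possible recombination breakpoint.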

The similarity plots and VirusRecom results for all 1940 sequences can be downloaded here for exploration.

... more to come soon!

About

Phylogenetics and recombination analysis of Enterovirus C with a focus on vaccine-derived polioviruses
