EVclades is an Enterovirus-adapted version of Art Poon’s fluclades pipeline for automated clade assignment.

hodcroftlab/EVclades

Genetic Taxonomy of Enterovirus D68 Clades

EVclades is an Enterovirus-adapted version of Art Poon’s fluclades pipeline for automated clade assignment. This version has been customized for Enterovirus D68 (EV-D68), using protein-level phylogenies and metadata to define robust, reproducible clades based on tree structure and sequence divergence.

Data

Sequence and metadata files (sequences.fasta, metadata.tsv, reference.gbk, etc.) were loaded using the Nextclade D68 ingest pipeline.

Scripts

  • Snakefile – Manages the workflow, calling all relevant scripts and tools.

    Integrates Nextstrain commands including augur index, augur filter, nextclade3 run (for alignment), augur tree, and augur refine, alongside the custom Python and R scripts below.

  • relabel-fasta.py – Replaces FASTA headers using a CSV generated by the filtering step and RIVM subgenotype annotations.

  • compress-seqs.py – Removes exact duplicate sequences from FASTA input, retaining the first occurrence and writing duplicates to a CSV for traceability.

  • subtyping.py – Implements nodewise clustering by calculating divergence and patristic distances at internal nodes to assign sequences to clades.

  • chainsaw.py – Python script for edgewise clustering based on internal branch lengths. Requires Biopython.

    • Run with no arguments to print a histogram of branch lengths.

    • Use -cutoff to define a threshold for subtree cutting.

    • Use -f to select output format: summary (default) or labels (CSV listing tip-to-subtree assignments).

      Also computes normalized mutual information between subtree assignments and known subtype labels.

  • auto-chainsaw.py – Automates chainsaw.py runs across a range of cutoffs to explore clustering behavior.

    Used to generate data for Figures 2A and 3A. Input trees are reconstructed with FastTree2; outputs are written to stdout in CSV format.

  • plot-trees.R – Uses ggfree to visualize full EV-D68 phylogenies, with branch coloring based on clade/subtype assignments.

  • chainsaw-plot.R – Plots the number of subtrees produced by chainsaw.py as a function of the internal branch length cutoff.

    Helps visualize parameter sensitivity for EV-D68 protein phylogenies.

  • coldates.R – Generates a barplot of EV-D68 sequence deposition by year. Originally created for supplementary material.

  • subtree-grid.R – Produces grid-based summary figures to visually compare subtree clustering results across different parameters.
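
The duplicate-removal step described for compress-seqs.py can be sketched in plain Python as follows. This is a minimal illustration, not the script itself: the function names (read_fasta, compress, write_duplicates), the CSV column names, and the case-insensitive comparison are all assumptions made for the example.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def compress(records):
    """Keep the first record seen for each distinct sequence.

    Returns (unique, duplicates), where unique is a list of
    (header, sequence) and duplicates is a list of
    (dropped_header, retained_header).
    """
    first_seen = {}  # sequence -> header of its first occurrence
    unique, duplicates = [], []
    for header, seq in records:
        key = seq.upper()  # assumption: comparison ignores case
        if key in first_seen:
            duplicates.append((header, first_seen[key]))
        else:
            first_seen[key] = header
            unique.append((header, seq))
    return unique, duplicates


def write_duplicates(duplicates, handle):
    """Write the dropped/retained header pairs as CSV (hypothetical columns)."""
    writer = csv.writer(handle)
    writer.writerow(["duplicate", "retained"])
    writer.writerows(duplicates)


# Toy input: record "c" repeats the sequence of "a" in lower case.
example = io.StringIO(">a\nACGT\nAC\n>b\nGGTT\n>c\nacgtac\n")
unique, duplicates = compress(read_fasta(example))
```

Recording the retained header next to each dropped one is what makes the compression traceable: every removed sequence can be mapped back to its representative after clade assignment.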

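The edgewise clustering idea behind chainsaw.py (cut internal branches longer than a cutoff, then compare the resulting subtrees to known labels) can be illustrated with a small self-contained sketch. It is not the script's implementation: the Node class, the cut_tree and nmi functions, the rooted traversal, and the choice to normalize mutual information by the arithmetic mean of the two entropies are assumptions made for the example. The actual script works on trees parsed with Biopython.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list = field(default_factory=list)


def cut_tree(root, cutoff):
    """Map each tip name to a subtree id after cutting every
    internal branch whose length exceeds the cutoff."""
    labels, next_id = {}, 0
    stack = [(root, 0)]
    while stack:
        node, cluster = stack.pop()
        if not node.children:  # tip: record its subtree
            labels[node.name] = cluster
            continue
        for child in node.children:
            if child.children and child.length > cutoff:
                next_id += 1  # long internal branch starts a new subtree
                stack.append((child, next_id))
            else:
                stack.append((child, cluster))
    return labels


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """Mutual information of two labelings divided by their mean entropy."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
    denom = (entropy(x) + entropy(y)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy tree: two cherries, one of them on a long internal branch.
tree = Node(children=[
    Node(length=0.5, children=[Node("A", 0.01), Node("B", 0.02)]),
    Node(length=0.05, children=[Node("C", 0.01), Node("D", 0.03)]),
])
known = {"A": "a1", "B": "a1", "C": "b1", "D": "b1"}  # hypothetical subtype labels

labels = cut_tree(tree, cutoff=0.1)
tips = sorted(labels)
score = nmi([labels[t] for t in tips], [known[t] for t in tips])
```

Sweeping the cutoff over a range of values and recording the number of distinct subtree ids and the score at each step gives the kind of table that auto-chainsaw.py is described as producing.
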