EVclades is an Enterovirus-adapted version of Art Poon’s fluclades pipeline for automated clade assignment. This version has been customized for Enterovirus D68 (EV-D68), using protein-level phylogenies and metadata to define robust, reproducible clades based on tree structure and sequence divergence.
Sequence and metadata files (`sequences.fasta`, `metadata.tsv`, `reference.gbk`, etc.) were loaded using the Nextclade D68 ingest pipeline.
- `Snakefile` – Manages the workflow, calling all relevant scripts and tools. Integrates Nextstrain commands, including `augur index`, `augur filter`, `nextclade3 run` (for alignment), `augur tree`, and `augur refine`, alongside the custom Python and R scripts below.
- `relabel-fasta.py` – Replaces FASTA headers using a CSV generated by the filtering step and RIVM subgenotype annotations.
- `compress-seqs.py` – Removes exact duplicate sequences from FASTA input, retaining the first occurrence and writing duplicates to a CSV for traceability.
- `subtyping.py` – Implements nodewise clustering, assigning sequences to clades by calculating divergence and patristic distances at internal nodes.
- `chainsaw.py` – Python script for edgewise clustering based on internal branch lengths. Requires Biopython.
  - Run with no arguments to print a histogram of branch lengths.
  - Use `-cutoff` to define a threshold for subtree cutting.
  - Use `-f` to select the output format: `summary` (default) or `labels` (a CSV listing tip-to-subtree assignments).
  - Also computes normalized mutual information between subtree assignments and known subtype labels.
- `auto-chainsaw.py` – Automates `chainsaw.py` runs across a range of cutoffs to explore clustering behavior. Used to generate data for Figures 2A and 3A. Input trees are reconstructed with FastTree2; outputs are written to stdout in CSV format.
- `plot-trees.R` – Uses `ggfree` to visualize full EV-D68 phylogenies, with branches colored by clade/subtype assignment.
- `chainsaw-plot.R` – Plots the number of subtrees produced by `chainsaw.py` as a function of the internal branch length cutoff. Helps visualize parameter sensitivity for EV-D68 protein phylogenies.
- `coldates.R` – Generates a barplot of EV-D68 sequence deposition by year. Originally created for supplementary material.
- `subtree-grid.R` – Produces grid-based summary figures to visually compare subtree clustering results across different parameters.