Nextstrain Template

This repository provides a comprehensive Nextstrain analysis of "your virus". You can choose to perform either a shorter run with specific proteins or a full genome run.

For those unfamiliar with Nextstrain or needing installation guidance, please refer to the Nextstrain documentation.

Prerequisites

Ensure you have the following installed:

Python 3.8 or higher
Micromamba or Conda
Snakemake 7
Nextstrain CLI
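
A quick way to confirm the prerequisites above is to look for the tools on your PATH. The snippet below is a minimal sketch, not part of this repository: the executable names in `REQUIRED_TOOLS` are assumptions (swap `micromamba` for `conda` if that is what you use), and `missing_tools` and `python_is_recent` are hypothetical helper names.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to your own setup.
REQUIRED_TOOLS = ["micromamba", "snakemake", "nextstrain"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_is_recent():
        print("Python 3.8 or higher is required")
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"Not found on PATH: {tool}")
```

If Snakemake and the Nextstrain CLI are installed inside the `nextstrain` environment, run the check after activating that environment.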

Nextstrain Environment

Install the Nextstrain environment by following these instructions.

Installation

Clone the repository:

git clone git@github.com:hodcroftlab/template_nextstrain.git
cd template_nextstrain

Install the Nextstrain environment:

micromamba create -n nextstrain \
  --override-channels --strict-channel-priority \
  -c conda-forge -c bioconda --yes \
  augur auspice nextclade \
  snakemake=7 git ncbi-datasets-cli

micromamba activate nextstrain

Update/install additional dependencies:

sudo apt-get update
sudo apt-get install -y unzip

micromamba install -c conda-forge -c bioconda csvtk seqkit tsv-utils ipdb entrez-direct
micromamba install -c conda-forge fuzzywuzzy python-dotenv ipykernel

Enhancing the Analysis

The data for this analysis is available from NCBI Virus. Instructions for downloading sequences are provided under Sequences.

Repository Organization

This repository includes the following directories and files:

scripts: Custom Python scripts called by the snakefile.
snakefile: The entire computational pipeline, managed using Snakemake. Snakemake documentation can be found here.
ingest: Contains Python scripts and the snakefile for automatic downloading of <your_virus> sequences and metadata.
<protein_xy>: Sequences and configuration files for the specific protein_xy run.
whole_genome: Sequences and configuration files for the whole genome run.

Configuration Files

The config, protein_xy/config, and whole_genome/config directories contain necessary configuration files:

config.yaml: Configuration file for setting parameters and options for the analysis
colors.tsv: Color scheme
geo_regions.tsv: Geographical locations
lat_longs.tsv: Latitude and longitude data
dropped_strains.txt: Accessions to exclude during augur filter
clades_genome.tsv: Clade definitions for manually labeling clades on a Nextstrain tree (see documentation here)
reference_sequence.gb: Reference sequence (add manually)
auspice_config.json: Auspice configuration file (must be present in all data folders)

The reference sequence used is XYZ, accession number, sampled in 19XX.
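
Because several of these files have to be added or edited by hand, it can help to check that nothing is missing before starting a build. The sketch below assumes that each of the three config directories holds every file named in the list above, which may not match your setup (for example, a clades file may exist only for the whole genome run). `missing_config_files` is a hypothetical helper, not a script shipped with this repository.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "config.yaml",
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
    "auspice_config.json",
]


def missing_config_files(config_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [name for name in expected if not (config_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: all three directories should contain the full set of files.
    for directory in ["config", "protein_xy/config", "whole_genome/config"]:
        missing = missing_config_files(directory)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{directory}: {status}")
```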

Usage Examples

Running a Build

Activate the Nextstrain environment:

micromamba activate nextstrain

To perform a build, run:

snakemake --cores 9 all

For specific builds:

protein_xy build:

snakemake auspice/<your_virus>_protein_xy.json --cores 9

Whole genome build:

snakemake auspice/<your_virus>_whole-genome.json --cores 9
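
The two targets above follow the pattern `auspice/<your_virus>_<build>.json`. Note that the whole genome target is written with a hyphen (`whole-genome`), while the directory uses an underscore (`whole_genome`). As a small illustration, the hypothetical helper below assembles the same commands from a virus name and a build name; `my_virus` is a placeholder, not a real dataset.

```python
def build_command(virus, build, cores=9):
    """Return the snakemake command for one build as an argument list."""
    target = f"auspice/{virus}_{build}.json"
    return ["snakemake", target, "--cores", str(cores)]


if __name__ == "__main__":
    # "my_virus" stands in for <your_virus>.
    for build in ["protein_xy", "whole-genome"]:
        print(" ".join(build_command("my_virus", build)))
```

The returned list can be passed to `subprocess.run(..., check=True)` if you want to launch the build from Python.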

Visualizing the Build

To visualize the build, use Auspice:

auspice view --datasetDir auspice

To run two visualizations simultaneously, set a different port for the second one before starting Auspice (the default port is 4000):

export PORT=4001

Ingest

For more information on how to run the ingest, please refer to the README in the ingest folder.

Sequences

Sequences can be downloaded manually or automatically.

Manual Download: Visit NCBI Virus, search for <your_virus> or Taxid XXXXXX, and download the sequences.
Automated Download: The ingest functionality, included in the main snakefile, handles automatic downloading.

The ingest pipeline is based on the Nextstrain RSV ingest workflow. Running the ingest pipeline produces data/metadata.tsv and data/sequences.fasta.
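
After an ingest run, a quick consistency check is to compare the record IDs in the two output files. The sketch below assumes that `data/metadata.tsv` is tab-separated with a header row and that its ID column is named `accession`; check your own metadata for the actual column name. The function names are hypothetical, not part of the ingest workflow.

```python
import csv


def fasta_ids(fasta_path):
    """Return the set of record IDs (first word of each '>' header line)."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.add(header.split()[0])
    return ids


def metadata_ids(metadata_path, id_column="accession"):
    """Return the set of IDs found in `id_column` of a tab-separated file."""
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column] for row in reader}


def compare_ids(metadata_path, fasta_path, id_column="accession"):
    """Return (IDs only in the metadata, IDs only in the FASTA), both sorted."""
    meta = metadata_ids(metadata_path, id_column)
    seqs = fasta_ids(fasta_path)
    return sorted(meta - seqs), sorted(seqs - meta)


if __name__ == "__main__":
    only_meta, only_fasta = compare_ids("data/metadata.tsv", "data/sequences.fasta")
    print(f"{len(only_meta)} IDs only in metadata, {len(only_fasta)} only in sequences")
```

Two empty lists mean every sequence has a metadata row and every metadata row has a sequence.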

Acknowledgments

Contact

For questions or support, please contact [[email protected]].

Feel free to adjust the content according to your project's specifics.
