Snakefile to run the GATK Mitochondria Short Variant Discovery pipeline, converted from the Terra WDL scripts into Snakefiles. It follows the logic described on the GATK website. The input FASTA files and blacklist files for ChrM were downloaded from here
The GetContamination rule needs the haplocheckCLI.jar file from the repository below:
git clone https://github.com/leklab/haplocheckCLI.git
Snakemake uses the following format to run Snakefiles on a high-performance cluster:
/path/to/snakemake -s /path/to/snakefile --cluster-config /path/to/cluster_qsub.yaml --cluster "qsub <options built from the variables in cluster_qsub.yaml>" --jobs <number_of_jobs_to_submit_at_a_time>
The following command runs this Snakefile (MitochondriaPipeline.snakefile) from my personal directory:
/mnt/storage/apps/anaconda3/bin/snakemake -s /home/mi724/Tools/MitochondriaPipeline/MitochondriaPipeline.snakefile --cluster-config /home/mi724/Tools/MitochondriaPipeline/config/cluster_qsub.yaml --cluster "qsub -l h_vmem={cluster.h_vmem},h_rt={cluster.h_rt} -pe {cluster.pe} -binding {cluster.binding}" --jobs 30 --rerun-incomplete
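In the --cluster string above, placeholders such as {cluster.h_vmem} are filled in per job from the matching entry in cluster_qsub.yaml. The Python sketch below mimics that substitution with str.format so you can preview the qsub command a job would receive. The parameter values are hypothetical and are not taken from the real config file.

```python
from types import SimpleNamespace

# The --cluster template used in the command above.
QSUB_TEMPLATE = (
    "qsub -l h_vmem={cluster.h_vmem},h_rt={cluster.h_rt} "
    "-pe {cluster.pe} -binding {cluster.binding}"
)


def render_qsub(template: str, params: dict) -> str:
    """Fill {cluster.<key>} placeholders from a per-rule parameter dict.

    str.format resolves attribute lookups such as cluster.h_vmem, so the dict
    is wrapped in a SimpleNamespace to expose its keys as attributes. A key
    missing from params raises AttributeError.
    """
    return template.format(cluster=SimpleNamespace(**params))


# Hypothetical values; the real ones live in config/cluster_qsub.yaml.
example = {"h_vmem": "8G", "h_rt": "24:00:00", "pe": "smp 4", "binding": "linear:4"}
print(render_qsub(QSUB_TEMPLATE, example))
# qsub -l h_vmem=8G,h_rt=24:00:00 -pe smp 4 -binding linear:4
```

A missing key fails loudly rather than producing a malformed qsub line, which is usually what you want when checking a new config entry.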
The config file contains paths to scripts, tools and input files, as well as parameters.
This file contains the paths to the input BAM files from TCGA. Follow the format used in the file when adding more files.
This file contains parameters for the qsub submissions. The default entry applies to all rules; if a rule is listed by name in the file, the values given there override the defaults for that rule.
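The default-plus-override behaviour can be sketched in a few lines of Python. This assumes the file uses Snakemake's "__default__" key for the shared entry and that a rule entry only needs the keys it changes; the rule name "CallMt" and all values are hypothetical, and the YAML is shown already parsed into a dict.

```python
def resolve_rule_params(cluster_cfg: dict, rule_name: str) -> dict:
    """Return the qsub parameters that apply to one rule.

    Starts from a copy of the "__default__" entry and overlays any keys set
    under the rule's own name; rules without an entry get the defaults.
    """
    params = dict(cluster_cfg.get("__default__", {}))
    params.update(cluster_cfg.get(rule_name, {}))
    return params


# Hypothetical contents of cluster_qsub.yaml after parsing.
cluster_cfg = {
    "__default__": {"h_vmem": "4G", "h_rt": "12:00:00", "pe": "smp 1", "binding": "linear:1"},
    "CallMt": {"h_vmem": "16G", "pe": "smp 4", "binding": "linear:4"},
}

print(resolve_rule_params(cluster_cfg, "CallMt"))
# {'h_vmem': '16G', 'h_rt': '12:00:00', 'pe': 'smp 4', 'binding': 'linear:4'}
print(resolve_rule_params(cluster_cfg, "RevertSam"))
# {'h_vmem': '4G', 'h_rt': '12:00:00', 'pe': 'smp 1', 'binding': 'linear:1'}
```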
This file contains the R commands that were used in the WDL script for the CollectWgsMetrics task.
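For reference, here is a Python sketch of the kind of extraction such an R step performs. It assumes, as in the GATK mitochondria WDL, that the R code reads the Picard CollectWgsMetrics output and reports MEAN_COVERAGE (rounded down) and MEDIAN_COVERAGE; check that against the R file before relying on it. This is an illustrative alternative, not the code the pipeline runs, and the metrics text is made up and truncated.

```python
from __future__ import annotations

import math


def read_wgs_coverage(metrics_text: str) -> tuple[int, float]:
    """Return (floored MEAN_COVERAGE, MEDIAN_COVERAGE) from Picard metrics text.

    Picard writes '#'-prefixed header lines, then a '## METRICS CLASS' line,
    then a tab-separated column header row followed by the metric values.
    """
    lines = metrics_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            row = dict(zip(header, values))
            return math.floor(float(row["MEAN_COVERAGE"])), float(row["MEDIAN_COVERAGE"])
    raise ValueError("no '## METRICS CLASS' section found")


# Made-up, truncated metrics text (real files have many more columns).
example = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.WgsMetrics\n"
    "GENOME_TERRITORY\tMEAN_COVERAGE\tSD_COVERAGE\tMEDIAN_COVERAGE\n"
    "16569\t2150.7\t310.2\t2201\n"
)
print(read_wgs_coverage(example))  # (2150, 2201.0)
```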
This file contains the R commands that were used in the WDL script for the CoverageAtEveryBase task.
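Because reads are also aligned to the shifted MT reference, per-base coverage measured there has to be mapped back to original chrM coordinates before it can be combined with the non-shifted coverage. The sketch below shows that coordinate arithmetic. The 8000-base shift is an assumption (it matches the GATK "shifted_by_8000_bases" chrM resources), and the function names and the split into control and non-control dicts are illustrative rather than taken from the R commands.

```python
from __future__ import annotations

CHRM_LENGTH = 16569  # length of the hg38 chrM sequence
SHIFT = 8000  # ASSUMPTION: offset of the shifted chrM reference


def shifted_to_original(pos: int, shift: int = SHIFT, length: int = CHRM_LENGTH) -> int:
    """Map a 1-based position on the shifted chrM back to the original chrM.

    Assumes shifted = original[shift:] + original[:shift] (0-based slices),
    so shifted position 1 is original position shift + 1 and the tail wraps.
    """
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside 1..{length}")
    return (pos - 1 + shift) % length + 1


def merge_coverage(non_control: dict[int, int], shifted_control: dict[int, int]) -> dict[int, int]:
    """Combine per-base depth from both alignments, keyed by original position.

    non_control: depth already in original coordinates.
    shifted_control: depth measured on the shifted reference.
    """
    merged = dict(non_control)
    for pos, depth in shifted_control.items():
        merged[shifted_to_original(pos)] = depth
    return dict(sorted(merged.items()))


print(shifted_to_original(1), shifted_to_original(8570))  # 8001 1
print(merge_coverage({576: 10, 577: 12}, {8570: 30}))  # {1: 30, 576: 10, 577: 12}
```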
This file contains the commands used to copy the ChrMT input files from Google Cloud Storage to argos.
Snakefile to run the analysis.
- SubsetBamToChrM
- RevertSam
- AlignAndMarkDuplicates - On Not Shifted MT Reference Genome
- AlignAndMarkDuplicates - On Shifted MT Reference Genome
- CollectWgsMetrics
- M2 as CallMt
- M2 as CallShiftedMt
- LiftoverAndCombineVcfs
- MergeStats
- Filter as InitialFilter
- SplitMultiAllelicsAndRemoveNonPassSites
- GetContamination
- Filter as FilterContamination
- CoverageAtEveryBase
- SplitMultiAllelicSites
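Snakemake does not run rules in the order they are written; it builds a dependency graph from each rule's inputs and outputs and schedules jobs from that. The sketch below reproduces the idea with Python's graphlib. The dependency edges are hypothetical, inferred from the list above and the GATK workflow rather than read from the Snakefile, and the WDL aliases (AlignToMt, CallMt, and so on) stand in for the actual rule names.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL edges: each rule maps to the rules whose outputs it consumes.
RULE_INPUTS = {
    "RevertSam": {"SubsetBamToChrM"},
    "AlignToMt": {"RevertSam"},
    "AlignToShiftedMt": {"RevertSam"},
    "CollectWgsMetrics": {"AlignToMt"},
    "CallMt": {"AlignToMt"},
    "CallShiftedMt": {"AlignToShiftedMt"},
    "LiftoverAndCombineVcfs": {"CallMt", "CallShiftedMt"},
    "MergeStats": {"CallMt", "CallShiftedMt"},
    "InitialFilter": {"LiftoverAndCombineVcfs", "MergeStats"},
    "SplitMultiAllelicsAndRemoveNonPassSites": {"InitialFilter"},
    "GetContamination": {"SplitMultiAllelicsAndRemoveNonPassSites"},
    "FilterContamination": {"InitialFilter", "GetContamination"},
    "CoverageAtEveryBase": {"AlignToMt", "AlignToShiftedMt"},
    "SplitMultiAllelicSites": {"FilterContamination"},
}

# static_order() yields every rule after all of its inputs; rules with no
# mutual dependency (e.g. the two alignments) may appear in either order.
order = list(TopologicalSorter(RULE_INPUTS).static_order())
for step, rule in enumerate(order, start=1):
    print(step, rule)
```

Independent branches such as AlignToMt and AlignToShiftedMt are what let --jobs submit several cluster jobs at once.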
- SubsetBamToChrM
- RevertSam
- AlignAndCall.AlignAndCall as AlignAndCall
- CoverageAtEveryBase
- SplitMultiAllelicSites
- AlignAndMarkDuplicates.AlignmentPipeline as AlignToMt
- AlignAndMarkDuplicates.AlignmentPipeline as AlignToShiftedMt
- CollectWgsMetrics
- M2 as CallMt
- M2 as CallShiftedMt
- LiftoverAndCombineVcfs
- MergeStats
- Filter as InitialFilter
- SplitMultiAllelicsAndRemoveNonPassSites
- GetContamination
- Filter as FilterContamination
- GetBwaVersion
- AlignAndMarkDuplicates