My question
I split the genome into 12 chromosomes and run several GenotypeGVCFs jobs in parallel. As the picture shows, each process uses only about 1% CPU, so the run is very slow on my server.
My environment:
Slurm HPC server with a shared file system. Could the shared file system be causing this?
GATK version: gatk-package-4.6.1.0-local.jar
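To tell whether the jobs are stalled on I/O rather than computing, a rough check like the one below could be run on the compute node. This is only a sketch: `summarize_states` is a hypothetical helper, not part of GATK or Slurm. It tallies the first letter of the `ps` STAT column, where `D` (uninterruptible sleep) usually means a process is blocked on disk or network file system I/O.

```shell
#!/bin/bash
# Hypothetical helper (not part of GATK or Slurm).
# Reads lines of "PID STAT %CPU COMMAND..." on stdin and prints one
# "state count" line per first letter of STAT, sorted by state.
summarize_states() {
    awk '{ s = substr($2, 1, 1); n[s]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example invocation on the node running the jobs:
#   ps -u "$(id -un)" -o pid=,stat=,pcpu=,args= | grep '[G]enotypeGVCFs' | summarize_states
```

If most of the GenotypeGVCFs processes show up under `D` while CPU usage stays near 1%, that would be consistent with the jobs waiting on the file system, though it does not prove it.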
My command:
sbatch_script="sbatch_${group_name}.sh"
cat << EOL > $sbatch_script
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=$threads
#SBATCH --mem=$MEM
#SBATCH --time=0
#SBATCH --partition=CPU
#SBATCH --output=logs/$P.${group_name}.log
#SBATCH --error=logs/$P.${group_name}.log
threads=$threads
P=$P
group_file=$group_file
echo "Running parallel tasks for group: \$group_file"
parallel --tmpdir $gulei/TMP -j $N_tasks --halt 2 --delay 1 '
chr=\$(echo {1} | awk -F"Chr" "{print \\\$2}" | awk -F":" "{print \\\$1}"); \
gatk --java-options "-Xmx${JAVA_MEM}g -Djava.io.tmpdir=./tmp" \
GenotypeGVCFs -R $ref -V gendb://genomeDB_$P/Chr\$chr \
--max-genotype-count 2048 --genomicsdb-shared-posixfs-optimizations \
-new-qual \
-O $P/$P.Chr\$chr.raw.vcf.gz 1>$gatk_logs/$P.Chr\$chr.GenotypeGVCFs.log 2>&1
' ::: \$(cat \$group_file)
EOL
sbatch $sbatch_script
done
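One way to test the shared-file-system suspicion would be to copy a single chromosome's GenomicsDB workspace to node-local storage and time the same command against it. The sketch below assumes the node has a local scratch directory. `stage_workspace` and the `/local/scratch` path are hypothetical names used for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: copy one GenomicsDB workspace to node-local scratch
# and print the path of the copy.
stage_workspace() {
    local src=$1                          # workspace on the shared file system
    local scratch=${2:-${TMPDIR:-/tmp}}   # assumed to be a node-local directory
    local dst
    dst="$scratch/$(basename "$src").$$"  # suffix with the shell PID to avoid clashes
    mkdir -p "$scratch" && cp -r "$src" "$dst" && echo "$dst"
}

# Example (paths are placeholders, not taken from my real setup):
#   local_db=$(stage_workspace "genomeDB_$P/Chr1" /local/scratch)
#   time gatk GenotypeGVCFs -R "$ref" -V "gendb://$local_db" -O test.Chr1.vcf.gz
```

If the local copy runs at normal CPU usage while the shared copy does not, that would point toward the file system rather than the GATK settings.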