Chromosome sorting in header generates problems when using tryvamos combineVCF #19
Hi Hugo, Thanks for using vamos! As more long-read data come out, we recently updated the combineVCF function to save memory on large sample sizes. This requires sorted input VCFs and a specified chromosome order for all TR entries. TR entries should already be sorted in the VCFs if you run vamos > 2.1. The chromosome order, though, is determined from the VCF headers for now. This issue is caused by inconsistent chromosome orders in the VCF headers, which are in turn determined by the BAM headers. So probably one of the input BAMs has chrM before chrY in its header. For now, I have fixed the chromosome order to match our most recent catalog, v2.1 (https://zenodo.org/records/13263615). You do not need to change the combineVCF command. Please try the latest push and let me know if there are any more problems. Also, please make sure that all samples are analyzed using the same version of the motif catalog (e.g., v2.1). Best,
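To find which input file carries the odd header, the contig order of each VCF header can be compared pairwise. The following is a minimal sketch, not part of tryvamos: `contig_order` and `inconsistent_pairs` are hypothetical helper names, and it assumes the headers declare chromosomes with standard `##contig=<ID=...>` lines.

```python
import re

# Standard VCF contig declaration, e.g. ##contig=<ID=chrY,length=57227415>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_order(header_lines):
    """Return contig IDs in the order they appear in the VCF header lines."""
    order = []
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match:
            order.append(match.group(1))
    return order


def inconsistent_pairs(order_a, order_b):
    """Return pairs of shared contigs that appear in opposite order in the two headers."""
    rank_b = {name: i for i, name in enumerate(order_b)}
    shared = [name for name in order_a if name in rank_b]
    conflicts = []
    for i, first in enumerate(shared):
        for second in shared[i + 1:]:
            # first precedes second in A; flag it if B has them the other way round
            if rank_b[first] > rank_b[second]:
                conflicts.append((first, second))
    return conflicts
```

Calling `inconsistent_pairs` on a header that lists chrY before chrM and one that lists chrM before chrY returns the pair `("chrY", "chrM")`; an empty list means the two headers agree on every shared contig.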
Hi Bida, I see. I will double check the BAM files later to confirm that, but it makes sense that they were the cause. But I guess something like throwing an error once this happens to avoid it blowing up is a good strategy, and it's sort of what I was looking for. I ended up just manually editing the VCF header as a quick fix, since it was just one file, and then ran everything without a problem. But I'll have to rerun this subcommand more times, so I'll test this new push and I think it will work 👍 Thanks for the reply!
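The manual header fix described above could also be scripted. This is only a sketch under assumptions: `reorder_contig_lines` is a hypothetical helper, the header is assumed to use standard `##contig=<ID=...>` lines, and it only rewrites the header, leaving the records untouched.

```python
import re

CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def reorder_contig_lines(header_lines, chrom_order):
    """Return header lines with ##contig entries rearranged to follow chrom_order.

    Non-contig lines keep their positions. Contigs missing from chrom_order
    are placed after the listed ones, in their original relative order.
    """
    rank = {name: i for i, name in enumerate(chrom_order)}
    contig_positions = [i for i, line in enumerate(header_lines) if CONTIG_RE.match(line)]
    contig_lines = [header_lines[i] for i in contig_positions]

    def sort_key(line):
        name = CONTIG_RE.match(line).group(1)
        return rank.get(name, len(rank))

    result = list(header_lines)
    # sorted() is stable, so unlisted contigs keep their original relative order
    for position, line in zip(contig_positions, sorted(contig_lines, key=sort_key)):
        result[position] = line
    return result
```

Reading the `##` lines of a file into a list, passing them through this function, and writing them back ahead of the unchanged records reproduces the quick fix for a single file.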
Hi Hugo, Yes. The problem is a bug caused by an inconsistency between the VCF entries and the header (since the header is taken from the BAM, while the VCF entries are basically constrained by the motif catalog). The new push drops the old logic. It now allows a customized chromosome order (defaulting to the one in our latest catalog) and checks the header entries. It will give an error message if no header contains a specified chromosome. Best,
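The header check described above might look roughly like the sketch below. It is not the tryvamos implementation: `validate_chromosome_order` is a hypothetical name, and it reads the rule as "raise an error when a specified chromosome appears in none of the input headers".

```python
def validate_chromosome_order(chrom_order, header_contigs):
    """Check a user-specified chromosome order against the input VCF headers.

    chrom_order: list of chromosome names in the desired output order.
    header_contigs: dict mapping each VCF file name to the contig IDs in its header.
    Raises ValueError if a specified chromosome is absent from every header.
    """
    seen = set()
    for contigs in header_contigs.values():
        seen.update(contigs)

    missing = [name for name in chrom_order if name not in seen]
    if missing:
        raise ValueError(
            "chromosomes not found in any VCF header: " + ", ".join(missing)
        )
    return list(chrom_order)
```

Failing early like this replaces the silent infinite loop with an explicit message naming the offending chromosome.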
Hi,
I was trying to combine multiple VCF files generated by `vamos`, using `tryvamos.py combineVCF`. I was doing this for multiple sets of VCFs, and it was successful for all of them but one, which kept crashing after running out of memory, even when given 400 times more memory than the other working runs.
After some debugging, I found the problem. The process would get stuck inside a while loop in combineVCF.getHeader, line 71. The reason was that, in the `ordering` dictionary, chrY would "point" to chrM and chrM to chrY, causing this infinite loop. I traced back why this was happening and found that in one of the VCFs in this set, chrM showed up before chrY in the VCF header, whereas in all the others it was the opposite.
As said above, all the VCF files I'm using were generated by vamos, some of them by v2.1.3 and others by v2.1.5, all of them given sorted BAM files as inputs. There's only one case where chrM shows up before chrY, so I'm not sure whether it was my mistake when generating this specific VCF (although I generated it the same way as all the others) or whether there's some bug that can lead to this behavior.
Best regards,
Hugo
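The failure mode described in this report can be reproduced in a few lines. This is a hypothetical reconstruction, not the actual code of combineVCF.getHeader: it assumes the ordering is stored as a "next chromosome" map built header by header, with later headers overwriting earlier entries. `build_successors` and `walk` are illustrative names.

```python
def build_successors(contig_orders):
    """Map each contig to the contig that follows it, one header at a time.

    Later headers overwrite earlier entries, so two headers that disagree
    on the order of a pair can leave links that point at each other.
    """
    successors = {}
    for order in contig_orders:
        for current, following in zip(order, order[1:]):
            successors[current] = following
    return successors


def walk(successors, start):
    """Follow successor links from start, raising instead of looping forever."""
    visited = [start]
    seen = {start}
    current = start
    while current in successors:
        current = successors[current]
        if current in seen:
            raise ValueError(
                f"cycle in chromosome ordering at {current!r} after {visited}"
            )
        seen.add(current)
        visited.append(current)
    return visited


# One header lists chrY before chrM, the other chrM before chrY.
conflicting = build_successors([["chrX", "chrY", "chrM"], ["chrX", "chrM", "chrY"]])
```

Here `conflicting` has chrY pointing to chrM and chrM pointing to chrY. Without the `seen` guard, the while loop in `walk` would never exit and `visited` would keep growing; with it, the walk stops with an error that names the repeated chromosome.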