
Sequence db size != result db size #996


Open
jpecar opened this issue May 19, 2025 · 5 comments

jpecar commented May 19, 2025

Hi,

My users reported hitting this error with MMseqs2-17 on our HPC BeeGFS parallel storage. I worked with them to isolate a small, quick way to reproduce it consistently.

We run the following command, and it works as expected:

mmseqs easy-linclust input.faa ccl_clus tmp --min-seq-id 1.0 -c 1.0 --cov-mode 0 --threads 1 -v 3 --remove-tmp-files 1

As soon as we increase --threads to 2, we hit the db size inconsistency.

I did some digging in the code and suspect that the issue is timing-related and comes from OpenMP scheduling in Util::ompCountLines. I see that a similar case was already worked around in issue #210. Can you comment on this and suggest a workaround or patch?

milot-mirdita (Member)

That is a very surprising error. Could you upload the input set so I can try to reproduce this locally?

milot-mirdita (Member)

Does this only happen when tmp is on BeeGFS?

jpecar (Author) commented May 19, 2025

input.faa.gz

So far, yes: we have only hit this on BeeGFS.

We also have another scenario that hits this error in Foldseek, but I don't know which version of MMseqs2 is embedded in that binary.

milot-mirdita (Member)

Could you please upload the full terminal output of one of the runs that crashes with this issue?

jpecar (Author) commented May 19, 2025

output_fail.txt

This is stdout only; stderr contains nothing but the error from the title.
