Question about using local database #6

yamn29302 · 2025-06-05T07:04:27Z

Hello again!

I have some questions about how to work with the -lbdb option.
I suppose this option takes a protein database as input, so I downloaded the desired genomic protein sequences from Ensembl. (I'm working on plant species, and I'm also wondering whether this source is suitable for this package.)

The sequences were concatenated into a single FASTA file using cat and piped into makeblastdb. Everything worked fine.
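For reference, the concatenation and database build described above can be sketched in Python roughly as follows. This is only an illustration: the file and database names are hypothetical, and the makeblastdb command is assembled but not run. The options used (`-in`, `-dbtype`, `-out`) are standard makeblastdb options.

```python
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Concatenate FASTA files into one file, like `cat a.fa b.fa > all.fa`."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file's header starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output


def makeblastdb_command(fasta, db_name):
    """Build (but do not run) a makeblastdb call for a protein database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "prot", "-out", str(db_name)]
```

The returned list could be passed to `subprocess.run(..., check=True)` on a machine where BLAST+ is installed.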

And then, I call uorf4u, but the following message pops up.

So here are my questions.

Should the BLAST database be made up of protein sequences?
Can the BLAST database be built from a source other than GenBank, such as Ensembl?
The error appears at the "retrieve upstream sequences" step, so I wonder how the software does this in local mode. The local database is a protein database, so it doesn't contain any upstream sequence information.

BTW, I think it would be nice to add more about this option to the "Example-driven guide" section of the documentation.

Thanks

Y.Z. Zhou

art-egorov · 2025-06-05T12:39:18Z

Hello!

Yes, I believe the issue is that upstream sequences can only be retrieved for proteins whose IDs are found in the NCBI database. This is because the process relies on identifying assemblies from the identical protein database before retrieving the upstream sequences. Yep, this should be stated more clearly in the documentation; I will update it.

Best

yamn29302 · 2025-06-05T15:54:34Z

Hello!

Thanks for the explanation.

I wonder whether this package will be updated with a "full local" mode, for cases where we want to process unpublished sequences or genomes that have no annotation on NCBI. Here is a possible way of doing it. Sorry, I'm more of a biologist than a programmer, so I can't help much with the coding.

Use a different database instead of BLAST.
I've used a package called cblaster. It searches for the presence of multiple target genes and defines them as a gene cluster, which also works for "above gene" tasks. It stores genomic data in a different way, likely based on SQL and DIAMOND. The input to the makedb command is gbk/gff files, which should contain the nucleotide sequences, the amino acid sequences, and the position information. I guess this approach can handle both the mORF amino acid sequences and the uORF upstream sequences properly.
https://cblaster.readthedocs.io/en/latest/guide/makedb_module.html
After searching the database, retrieve N bases upstream of each hit (N can be specified by the user, with a default value).
Run the uorf4u algorithm and return the result.
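The "retrieve N bases upstream of each hit" step could look roughly like the sketch below. This is not how uorf4u or cblaster does it; it only illustrates the idea, assuming the genome sequence and the hit coordinates (0-based, end-exclusive) are already in hand. The function names and the default of 500 bases are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def upstream_region(genome, start, end, strand, n=500):
    """Return up to n bases upstream of a coding sequence.

    genome     : str, the contig or chromosome sequence
    start, end : 0-based, end-exclusive coordinates on the plus strand
    strand     : "+" or "-"
    n          : number of upstream bases (hypothetical default)
    """
    if strand == "+":
        # Clamp at the contig start so short contigs give a shorter region.
        return genome[max(0, start - n):start]
    if strand == "-":
        # On the minus strand, "upstream" lies after `end` on the plus strand.
        return reverse_complement(genome[end:end + n])
    raise ValueError("strand must be '+' or '-'")
```

For a minus-strand gene the region is taken from the plus-strand bases after the gene and reverse-complemented, so the returned sequence always reads 5' to 3' toward the start codon.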

Best

Y.Z. Zhou

art-egorov · 2025-06-09T07:57:33Z

Hello again!
Sorry for the delayed reply.
You can also use uorf4u if you don't expect to find your proteins in NCBI. In that case you can provide a set of upstream sequences, and they will be searched for conserved uORFs.
Below is a screenshot from the documentation page (https://gca-vh-lab.github.io/uorf4u/ExampleDrivenGuide/cmd_guide/)
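If the upstream regions were extracted locally, they first need to be written out as FASTA. A minimal sketch of that step, with hypothetical record names; the exact uorf4u option for FASTA input should be taken from the documentation page above rather than from this example.

```python
def format_fasta(records, width=60):
    """Format a {name: sequence} mapping as FASTA text, wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Split each sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def write_fasta(records, path, width=60):
    """Write the formatted records to `path` and return the path."""
    with open(path, "w") as handle:
        handle.write(format_fasta(records, width))
    return path
```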

Best
