fix: Add support for Drosophila melanogaster #142
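Walkthrough
The changes update two scripts by refining how attributes and parsed values are handled. In the R script, the attribute "ensembl_transcript_id_version" is requested only when it is available, and columns in other_annotations are renamed with any_of.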
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant RScript as get-transcript-info.R
    participant AttrFunc as use_if_available
    participant Data as available_attributes
    RScript->>AttrFunc: Check for "ensembl_transcript_id_version"
    alt Attribute Exists
        AttrFunc-->>RScript: Return attribute value
    else Attribute Missing
        AttrFunc-->>RScript: Return NULL/skip
    end
    RScript->>RScript: Rename column with any_of in other_annotations
```
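To illustrate the conditional-attribute pattern in the diagram above, here is a minimal Python sketch. It is not the R code from get-transcript-info.R: `use_if_available`, `build_attribute_list`, `rename_if_present`, and the "always requested" attribute list are hypothetical stand-ins, and the available attributes are passed in by hand rather than queried from the Ensembl mart.

```python
from typing import Dict, Iterable, List


def use_if_available(attribute: str, available: Iterable[str]) -> List[str]:
    # Hypothetical analog of the R helper: keep the attribute only if the mart lists it.
    return [attribute] if attribute in set(available) else []


def build_attribute_list(available: Iterable[str]) -> List[str]:
    available = list(available)
    # Illustrative "always requested" attributes; the real script's list may differ.
    required = ["ensembl_gene_id", "ensembl_transcript_id"]
    # Versioned transcript IDs are optional: absent for Drosophila melanogaster per this PR.
    optional = use_if_available("ensembl_transcript_id_version", available)
    return required + optional


def rename_if_present(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    # Rename columns found in `mapping`; entries for absent columns are ignored,
    # loosely mirroring what any_of() achieves inside a dplyr rename().
    return [mapping.get(col, col) for col in columns]
```

With a mart that lacks `ensembl_transcript_id_version`, `build_attribute_list` simply omits it instead of failing, and `rename_if_present` leaves the remaining columns untouched.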
```mermaid
sequenceDiagram
    participant Func as extract_study_items
    participant Input as raw_input_string
    participant Split as parts_list
    Func->>Input: Receive input string
    Func->>Split: Split string by delimiter
    Note over Func: Use parts[-2] for gene, parts[-1] for value
    Func->>Func: Process and assign gene data
```
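The indexing in the diagram above (gene from `parts[-2]`, value from `parts[-1]`) can be sketched as a single-item parser. This is a hypothetical helper, not `extract_study_items` itself; it assumes each item looks like `gene:value` and that the value is numeric.

```python
from typing import Tuple


def parse_item(item: str, sep: str = ":") -> Tuple[str, float]:
    # Hypothetical helper: split one "gene:value" item, tolerating extra colons.
    parts = item.split(sep)
    if len(parts) < 2:
        raise ValueError(f"expected at least one {sep!r} in {item!r}")
    # Last field is the value, second-to-last is the gene. For a family-prefixed
    # name such as "His4:CG33869:0.73842" this keeps "CG33869" and drops "His4".
    return parts[-2], float(parts[-1])
```

Note the design consequence: indexing from the end makes the parser robust to any number of colons, at the cost of discarding the gene-family prefix.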
Thanks for these generalizations. Good catches, and sorry for the inconvenience.
🤖 I have created a release *beep* *boop*

---

## [2.9.0](v2.8.4...v2.9.0) (2025-03-11)

### Features

* Make PCA plots interactive and switch to HTML format ([#137](#137)) ([d0e6e78](d0e6e78))

### Bug Fixes

* add support for drosophila melanogaster and other non-standard species ([#142](#142)) ([dedfdf7](dedfdf7))
* Improve error handling for missing or empty column values in dataframe with duplicate definitions ([#138](#138)) ([7f3d590](7f3d590))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
I noticed two errors that occurred while working with RNA-Seq data from Drosophila melanogaster and tried to fix them:
1. Fruit fly gene names sometimes contain a colon when the gene belongs to a gene family (e.g. His4:CG33869 and His4:CG31611). Because postprocess_go_enrichment.py also uses a colon to separate the gene name from its value, splitting such an entry yields three parts (His4, CG33869, 0.73842) instead of two, which causes an error when selecting the name and value.
2. For Drosophila melanogaster, the "ensembl_transcript_id_version" attribute does not exist in the Ensembl mart. This attribute is part of the 3-prime attributes, which (I assume) are not needed for standard RNA-Seq analysis.
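Error 1 in the list above can be reproduced with a minimal Python sketch. Both helpers are hypothetical and not taken from postprocess_go_enrichment.py: `naive_parse` assumes exactly one colon, as the original code apparently did, while `parse_keep_full_name` shows one possible alternative that splits only at the last colon with `str.rsplit`, so the family prefix is preserved. This is not necessarily the approach the PR took.

```python
from typing import Tuple


def naive_parse(item: str) -> Tuple[str, str]:
    # Assumes exactly one colon; "His4:CG33869:0.73842" yields three parts,
    # so the two-name unpacking raises ValueError.
    gene, value = item.split(":")
    return gene, value


def parse_keep_full_name(item: str) -> Tuple[str, str]:
    # Split only at the last colon, keeping "His4:CG33869" as the gene name.
    gene, value = item.rsplit(":", 1)
    return gene, value
```

Calling `naive_parse("His4:CG33869:0.73842")` raises `ValueError` (too many values to unpack), whereas `parse_keep_full_name` returns `("His4:CG33869", "0.73842")`.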