[C4GT Community]: Support full GEO based downloads #229
Comments
Hi Saket Sir, I'm from C4GT and would like to contribute to this feature request for pysradb. I'm interested in extending the GEO support to download and parse GEO Matrix files, and I'm ready to work on adding CLI flags, handling file quirks, and implementing conversion to a clean .tsv format. Please assign this issue to me.

Hi @saketkc, please assign this issue to me. I can work on it, as I have very good skills in Python and machine learning.

Hi @saketkc, I’d love to contribute to this feature under C4GT. I have experience with Python, bioinformatics file parsing, and CLI tooling. I can implement GEO matrix file download, parsing to …

Hi Saket, I’ve thoroughly reviewed the GitHub repository and examined the codebase related to this project. I believe that adding support for GEO Matrix files, as outlined in the feature request, would be a valuable and impactful enhancement. I'm confident in my ability to contribute to this feature and would love the opportunity to work on it. I’ve also drafted an initial approach for the implementation and would be happy to discuss it further. Could you please assign this issue to me?
Best regards,

Hi @saketkc, I'm from C4GT and would like to contribute to the GEO Matrix file support feature in pysradb. I'm ready to handle CLI flags, manage file quirks, and implement conversion to a clean .tsv format. Kindly assign this issue to me.

Hello @saketkc, I’ve successfully implemented the functionality to identify and download GEO Matrix files based on GEO accession numbers (e.g., GSE10072) and convert them into …

Here’s what has been accomplished:
- Dynamic GEO Matrix File Downloading: Constructs URLs and downloads …
- Testing

I have thoroughly tested the script using the GEO accession … I’m now ready to submit my contribution and would appreciate any feedback or suggestions. Let me know if further refinements are required.

Python script of my project:

```python
def download_geo_matrix(accession):
    ...

def parse_geo_matrix_to_tsv(input_file, output_file):
    ...

def main():
    ...

if __name__ == "__main__":
    main()
```
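
The script in the comment above names a `download_geo_matrix` step, but its body was not preserved. What follows is a separate, minimal sketch of the "construct URL, then download" idea, not that contributor's code. It assumes the public NCBI GEO FTP layout served over HTTPS. In that layout, a series folder replaces the last three digits of the accession with `nnn`, for example `.../geo/series/GSE10nnn/GSE10072/matrix/`. The helper names `series_matrix_url` and `download` are hypothetical and are not pysradb APIs. Series spanning several platforms ship one `GSExxx-GPLyyy_series_matrix.txt.gz` per platform, so a fuller implementation would need to list the `matrix/` directory rather than guess a single filename.

```python
import re
import shutil
import urllib.request
from pathlib import Path
from typing import Union

# Assumed base URL: the public GEO FTP tree, which NCBI also serves over HTTPS.
GEO_SERIES_BASE = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_matrix_url(accession: str) -> str:
    """Build the URL of the single-platform series matrix for a GSE accession.

    Hypothetical helper: series with several platforms ship files named
    GSExxx-GPLyyy_series_matrix.txt.gz instead, which this does not cover.
    """
    accession = accession.strip().upper()
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    # GEO buckets series into folders named by replacing the last three
    # digits with "nnn": GSE10072 -> GSE10nnn, GSE1 -> GSEnnn.
    folder = "GSE" + digits[:-3] + "nnn"
    return (
        f"{GEO_SERIES_BASE}/{folder}/{accession}/matrix/"
        f"{accession}_series_matrix.txt.gz"
    )


def download(url: str, dest_dir: Union[str, Path] = ".") -> Path:
    """Stream `url` into `dest_dir`, keeping the remote file name (hypothetical helper)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rstrip("/").rsplit("/", 1)[-1]
    # urlopen handles https:// as well as file:// URLs; copyfileobj copies in
    # chunks so a large matrix is never held in memory all at once.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With network access, `download(series_matrix_url("GSE10072"), "data")` would save `data/GSE10072_series_matrix.txt.gz`. Keeping URL construction separate from downloading makes the URL rule testable offline.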

Hello @saketkc sir, I'd love to take up this issue as part of my contribution to the C4GT Community. Here's how I plan to approach it:

🔧 Implementation Plan:

🧪 Test Plan:

Would love your input on this approach! Let me know if there are any constraints or preferences I should keep in mind. Thanks!

Hi @saketkc, I'm excited to express my interest in contributing to the pysradb project, particularly in extending its capabilities to support GEO Matrix file downloads and parsing. I’m especially enthusiastic about the opportunity to simplify access to processed expression data for GEO users by transforming matrix files into clean, analysis-ready TSV files. I’m confident that my skills align well with the goals of this enhancement, from developing efficient parsers to implementing intuitive CLI extensions and ensuring robust documentation and testing. I'm fully committed to delivering a seamless experience that will help researchers spend less time on data wrangling and more time on scientific discovery. I'm looking forward to learning from the team and contributing meaningfully to pysradb's growth!
Best regards,

Hi @saket Choudhary, I’d like to work on extending pysradb to support GEO Matrix file downloads and parsing. Here’s how I’ll approach it:
- GEO Matrix Download
- TSV Conversion
- CLI & Documentation
- Testing

Can you please assign this task to me? I will keep the implementation light and well-documented.
Description
Currently, `pysradb` primarily focuses on fetching metadata and data from SRA. However, many GEO datasets are linked to SRA, and users often require access to GEO-specific files, especially GEO Matrix files, which contain processed expression data. This feature request aims to extend `pysradb` to:
- Download GEO Matrix files (… `.txt` or `.gz` format).
- Convert them to … `.tsv` format, making it easier for users to load and analyze them.

Why this is useful:
Many GEO users want quick access to processed expression data. Integrating GEO matrix file support would make `pysradb` a one-stop tool for both raw and processed data, improving its utility for transcriptomics, genomics, and bioinformatics users.

Additional Context:
… (starting with `!`) followed by a data table.
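
A minimal parser for that layout is sketched below. It relies on an assumption about the series matrix format: the data table sits between the marker lines `!series_matrix_table_begin` and `!series_matrix_table_end`, every other `!` line is a tab-separated metadata record, and text values are wrapped in double quotes. The function names are hypothetical. Keys such as `!Sample_characteristics_ch1` can repeat, so metadata values are collected as lists rather than overwritten.

```python
import csv
from typing import Dict, Iterable, List, TextIO, Tuple


def parse_series_matrix(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[List[str]]], List[str], List[List[str]]]:
    """Split a GEO series matrix into (metadata, header, rows).

    metadata maps each "!" key (without the "!") to a list of value lists,
    because keys such as !Sample_characteristics_ch1 appear more than once.
    """
    metadata: Dict[str, List[List[str]]] = {}
    header: List[str] = []
    rows: List[List[str]] = []
    in_table = False
    # csv handles the tab delimiter and strips GEO's double quotes around values.
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue  # blank line
        key = fields[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table:
            if not header:
                header = fields  # first table line: ID_REF, then sample IDs
            else:
                rows.append(fields)
        elif key.startswith("!"):
            metadata.setdefault(key[1:], []).append(fields[1:])
    return metadata, header, rows


def write_tsv(header: List[str], rows: List[List[str]], out: TextIO) -> None:
    """Write the expression table as tab-separated text with minimal quoting."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

For a compressed download, one option is to pass the file handle from `gzip.open(path, "rt", newline="")` directly to `parse_series_matrix`. Returning the metadata separately keeps the `.tsv` a clean rectangular table while still preserving the `!` header information.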
Goals
- Add functionality to identify and download GEO Matrix files given a GEO accession (e.g., GSEXXXXXX).
- Implement a parser that reads the downloaded matrix file and outputs it as a `.tsv`.
- Handle common GEO Matrix file quirks (such as metadata headers or comments starting with `!`).
- Update the documentation with examples.
- Provide CLI flags/subcommands, e.g., …
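
The issue leaves the exact CLI flags open. The sketch below shows one possible shape using the standard `argparse` subcommand pattern. The subcommand `geo-matrix` and the flags `--out-dir`, `--to-tsv`, and `--keep-metadata` are hypothetical placeholders, not existing pysradb options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an illustrative GEO matrix subcommand (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="pysradb")
    subparsers = parser.add_subparsers(dest="command", required=True)

    geo = subparsers.add_parser(
        "geo-matrix", help="download a GEO series matrix and optionally convert it"
    )
    geo.add_argument("accession", help="GEO series accession, e.g. GSE10072")
    geo.add_argument("--out-dir", default=".", help="directory for downloaded files")
    # store_true flags default to False when omitted.
    geo.add_argument(
        "--to-tsv",
        action="store_true",
        help="also write the expression table as a .tsv file",
    )
    geo.add_argument(
        "--keep-metadata",
        action="store_true",
        help="write the '!' header lines to a separate file",
    )
    return parser
```

For example, `build_parser().parse_args(["geo-matrix", "GSE10072", "--to-tsv"])` yields a namespace with `accession="GSE10072"` and `to_tsv=True`. Making conversion opt-in keeps the plain download path unchanged for users who only want the raw file.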

Bonus (Optional):
- … compressed (`.gz`) and uncompressed formats.

Expected Outcome
The final module will allow a user to download the full GEO record with the matrix file parsed as a dataframe.
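
For the optional compressed/uncompressed item, one approach is to sniff the two-byte gzip magic number (`1f 8b`) instead of trusting the file extension. This covers the case where a file was decompressed but kept its `.gz` name. The sketch below makes that choice. The helper name `open_text` is hypothetical, and it assumes UTF-8 text, replacing undecodable bytes rather than failing.

```python
import gzip
from pathlib import Path
from typing import IO, Union

GZIP_MAGIC = b"\x1f\x8b"


def open_text(path: Union[str, Path]) -> IO[str]:
    """Open a series matrix as text, whether or not it is gzip-compressed."""
    path = Path(path)
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    # newline="" lets csv.reader see raw line endings, as the csv docs advise.
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8", errors="replace", newline="")
    return open(path, "r", encoding="utf-8", errors="replace", newline="")
```

The returned handle is an ordinary text stream. It can be fed to a line-based parser or to a dataframe reader, so the rest of the pipeline does not need to know whether the download was compressed.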
Acceptance Criteria
No response
Implementation Details
Some GEO support already exists in `pysradb`; extend the existing GEO class to add support for GEO-based downloads.
Mockups/Wireframes
No response
Product Name
pysradb
Organisation Name
C4GT
Domain
No response
Tech Skills Needed
Python
Organizational Mentor
Saket Choudhary
Angel Mentor
No response
Complexity
Medium
Category
Research