Migrate from MongoClient to AsyncMongoClient #1145

Draft
wants to merge 11 commits into
base: main
from

eecavanna
Collaborator

🚧 THIS BRANCH IS UNDER CONSTRUCTION 🚧

On this branch, I migrated the Runtime from MongoClient to AsyncMongoClient. That involved upgrading pymongo and following the steps listed in the official migration guide.
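For readers unfamiliar with the migration, here is a minimal sketch of what a typical call site looks like after the switch. It is an illustration, not code from this branch: the function name `fetch_study_names`, the database and collection names, and the connection URI are all hypothetical, and it assumes a pymongo release that ships the async API.

```python
import asyncio
from typing import Any


async def fetch_study_names(collection: Any, limit: int = 10) -> list[str]:
    """Return up to `limit` document names from an async collection.

    `collection` is assumed to behave like pymongo's AsyncCollection,
    e.g. AsyncMongoClient(...)["nmdc"]["study_set"] (hypothetical names).
    """
    # Sync: collection.count_documents({}). Async: the call must be awaited.
    total = await collection.count_documents({})
    if total == 0:
        return []

    # find() is NOT awaited; it returns an async cursor without doing I/O yet.
    cursor = collection.find({}, {"name": 1}).limit(limit)

    names: list[str] = []
    # Sync: `for doc in cursor`. Async: `async for doc in cursor`.
    async for doc in cursor:
        names.append(doc.get("name", ""))
    return names


async def main() -> None:
    """Example wiring against a real server (not invoked automatically)."""
    # Imported here so the helper above can be exercised without pymongo.
    from pymongo import AsyncMongoClient

    client = AsyncMongoClient("mongodb://localhost:27017")  # hypothetical URI
    try:
        print(await fetch_study_names(client["nmdc"]["study_set"]))
    finally:
        # Sync: client.close(). Async: close() is a coroutine.
        await client.close()
```

To try it against a local MongoDB instance, run `asyncio.run(main())`. The main thing to watch for during a migration like this is a forgotten `await`: the call then returns a coroutine object instead of a result.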

Details

The summary is written in the past tense, but it isn't true yet. Everything here is in progress.

Related issue(s)

Fixes #787

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I tested these changes by...

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

@eecavanna eecavanna linked an issue Aug 14, 2025 that may be closed by this pull request
@eecavanna eecavanna self-assigned this Aug 14, 2025
@eecavanna
Collaborator Author

I profiled individual requests using pyinstrument and found that the /nmdcschema/{collection_name} endpoint spends by far most of its time generating the pagination token.

image

As of now, on this branch, that part of the endpoint still uses the synchronous Mongo client; it is the only part that does. I will focus on migrating it to the asynchronous client. I did that last night, but it broke Dagster in my local environment, so I'll do it differently today.
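For context, here is one generic way to produce a pagination token with an async cursor. This is a sketch of keyset pagination on `_id`, not a description of how the Runtime currently builds its token (this thread doesn't say). The function name `fetch_page` and its parameters are hypothetical, and `collection` is assumed to behave like pymongo's AsyncCollection.

```python
import asyncio
from typing import Any, Optional


async def fetch_page(
    collection: Any,
    page_size: int,
    page_token: Optional[Any] = None,
) -> tuple[list[dict], Optional[Any]]:
    """Return one page of documents plus a token for the next page.

    The token is simply the `_id` of the last document on the page
    (an assumption made for this sketch); None means "no more pages".
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")

    # Resume strictly after the last `_id` the caller has already seen.
    query = {} if page_token is None else {"_id": {"$gt": page_token}}

    # Ask for one extra document so we can tell whether another page exists
    # without issuing a separate count query.
    cursor = collection.find(query).sort("_id", 1).limit(page_size + 1)
    docs = await cursor.to_list()

    has_more = len(docs) > page_size
    page = docs[:page_size]
    next_token = page[-1]["_id"] if has_more else None
    return page, next_token
```

A caller would loop, passing each returned token back in until it is None. Because the token is derived from the page itself, this approach needs no extra write or lookup per request; whether that fits the Runtime's existing token format is an open question.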

Development

Successfully merging this pull request may close these issues.

Explore migrating from pymongo MongoClient to AsyncMongoClient