Improve handling of polymorphism in the optimizer #720

Merged (36 commits) on Apr 4, 2025

diesieben07
Contributor

@diesieben07 diesieben07 commented Mar 18, 2025

This improves handling of polymorphic types in the optimizer when using Django Polymorphic, InheritanceManager from django-model-utils, or a custom solution.

Description

The following improvements are made:

  • For Django-Polymorphic models, the optimizer automatically adds the polymorphic_ctype field to only. django-polymorphic uses this field to determine each object's concrete class, so leaving it deferred would otherwise cause extra queries.
  • A bug is fixed where the optimizer did not honor the prefix when adding the primary key to only. This manifested when using Django Polymorphic.
  • Support for django-model-utils' InheritanceManager is added. When used, the optimizer will use it to support polymorphic models, just like with django-polymorphic.
  • The logic for extracting possible subtypes (interface implementations and union members) was extracted into a separate function. Previously this logic was only used at the top level and not for optimizing connections or OffsetPaginated. This resulted in the optimizer missing fields from subtypes when using connections or OffsetPaginated.
  • The logic for extracting possible subtypes has been made aware of Django Polymorphic and InheritanceManager. When either is used, it returns not just the types for supertypes of the model currently being optimized but also those for its subtypes (speaking in terms of the Django model hierarchy, not the GraphQL one). This lets the optimizer see fields in subtypes of polymorphic models and optimize them.
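A minimal plain-Python sketch of the prefix fix, assuming the related model's primary key is named `id` and using django-polymorphic's `polymorphic_ctype` field name. The helper `build_only_fields` is hypothetical and not strawberry-django's code; it only illustrates that, for a relation reached through a prefix such as `project`, the primary key must be added as `project__id` rather than a bare `id`.

```python
LOOKUP_SEP = "__"  # Django's separator for lookups that span relations


def build_only_fields(
    fields: list[str],
    prefix: str = "",
    pk_name: str = "id",
    is_polymorphic: bool = False,
) -> list[str]:
    """Return the names to pass to QuerySet.only() for one (possibly related) model.

    `prefix` is the relation path from the root queryset, e.g. "project".
    Every name, including the primary key, must carry that prefix.
    """
    names = list(fields)
    # Always load the primary key, under the same prefix as the other fields.
    if pk_name not in names:
        names.append(pk_name)
    # django-polymorphic reads polymorphic_ctype to find each row's concrete
    # class; leaving it deferred could trigger an extra query per object.
    if is_polymorphic and "polymorphic_ctype" not in names:
        names.append("polymorphic_ctype")
    if not prefix:
        return names
    return [f"{prefix}{LOOKUP_SEP}{name}" for name in names]


print(build_only_fields(["name"], prefix="project", is_polymorphic=True))
# ['project__name', 'project__id', 'project__polymorphic_ctype']
```

Before the fix, the equivalent of the second entry would have been a bare `id`, which refers to the root model instead of the related one.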

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

Contributor

sourcery-ai bot commented Mar 18, 2025

Reviewer's Guide by Sourcery

This pull request improves the handling of polymorphic types in the optimizer. It fixes a bug where the optimizer did not honor the prefix when adding the primary key to only, and it automatically adds the polymorphic_ctype field to only when using Django-Polymorphic models. It also extracts the logic for extracting possible subtypes into a separate function and makes it aware of Django Polymorphic, enabling optimization of fields in subtypes of Django Polymorphic models. Finally, it adds tests to cover changes to the optimizer for polymorphic types.

Updated class diagram for OptimizerStore

```mermaid
classDiagram
    class OptimizerStore {
        only: List[str]
        select_related: Set[str]
        prefetch_related: Dict[str, Prefetch]
        defer: Set[str]
        only_load: Set[str]
        clear()
        add_select_related(field_path: str)
        add_prefetch_related(field_path: str, queryset: QuerySet)
        add_defer(field_path: str)
        add_only_load(field_path: str)
        apply(queryset: QuerySet, info: ResolveInfo, config: OptimizerConfig) : QuerySet
    }

    note for OptimizerStore "The OptimizerStore class stores optimization hints for a Django QuerySet."
```
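The diagram lists the store's fields, and the reviewer's refactoring suggestion in this thread merges stores with `store | concrete_store`. The following is a simplified, hypothetical stand-in (`SimpleOptimizerStore`, not strawberry-django's actual `OptimizerStore`) showing what such a merge might look like: `select_related` paths are unioned, `only` keeps its order while dropping duplicates, and prefetches are combined by lookup path. Prefetch objects are represented by plain values so the sketch runs without Django.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SimpleOptimizerStore:
    """Hypothetical, simplified stand-in for the optimizer's hint store."""

    only: list[str] = field(default_factory=list)
    select_related: set[str] = field(default_factory=set)
    # Maps a lookup path to its prefetch (a Prefetch object in real Django code).
    prefetch_related: dict[str, object] = field(default_factory=dict)

    def __or__(self, other: SimpleOptimizerStore) -> SimpleOptimizerStore:
        # Keep the first occurrence of each `only` name, preserving order.
        merged_only = list(dict.fromkeys(self.only + other.only))
        return SimpleOptimizerStore(
            only=merged_only,
            select_related=self.select_related | other.select_related,
            # On a clash the right-hand store's prefetch wins (a simplification).
            prefetch_related={**self.prefetch_related, **other.prefetch_related},
        )


art = SimpleOptimizerStore(only=["id", "artist"], select_related={"owner"})
research = SimpleOptimizerStore(only=["id", "topic"], prefetch_related={"papers": "papers-prefetch"})
merged = art | research
print(merged.only)  # ['id', 'artist', 'topic']
```

Returning a new store instead of mutating either operand keeps the per-subtype stores reusable, which matters when the same concrete type is visited more than once.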

File-Level Changes

Adds support for Django Polymorphic models by automatically adding the polymorphic_ctype field to only to prevent extra queries.
  • Adds polymorphic_ctype to the only list in the optimizer store when using Django Polymorphic.
  • Extends the only list with prefixed polymorphic internal model fields.
  Files: strawberry_django/optimizer.py

Fixes a bug where the optimizer did not respect the prefix when adding the primary key to only.
  • Adds the prefix to the primary key when adding it to the only list.
  Files: strawberry_django/optimizer.py

Refactors subtype extraction logic into a separate function for reuse in optimizing connections and OffsetPaginated queries.
  • Extracts the logic for extracting possible subtypes into the get_possible_concrete_types function.
  • Uses get_possible_concrete_types when optimizing connections.
  • Uses get_possible_concrete_types when optimizing OffsetPaginated queries.
  Files: strawberry_django/optimizer.py, strawberry_django/utils/inspect.py

Enhances subtype extraction logic to be aware of Django Polymorphic models, enabling optimization of fields in subtypes.
  • Modifies get_possible_concrete_types to consider subtypes when Django Polymorphic is used.
  • Adds is_polymorphic_model to check if a model is a PolymorphicModel.
  • Adds _can_optimize_subtypes to check if subtypes can be optimized.
  Files: strawberry_django/optimizer.py, strawberry_django/utils/inspect.py

Adds tests to cover changes to the optimizer for polymorphic types.
  • Adds test_polymorphic_query_optimization_working to validate that we're not selecting extra fields.
  • Adds test_polymorphic_paginated_query to test polymorphic paginated queries.
  • Adds test_polymorphic_offset_paginated_query to test polymorphic offset paginated queries.
  • Adds test_polymorphic_relation to test polymorphic relations.
  • Adds test_polymorphic_nested_list to test polymorphic nested lists.
  Files: tests/polymorphism/test_optimizer.py, tests/polymorphism/models.py, tests/polymorphism/schema.py, tests/polymorphism_custom/test_optimizer.py, tests/polymorphism_custom/schema.py, tests/polymorphism_custom/models.py

Adds a check to use Prefetch when the model is using django-polymorphic.
  • Adds _must_use_prefetch_related to check if Prefetch should be used.
  • Uses _must_use_prefetch_related to determine if Prefetch should be used.
  Files: strawberry_django/optimizer.py

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @diesieben07 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a test case for when disable_optimization is set to True on a Django definition.
  • It might be worth adding a comment explaining why _interfaces is being cached.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

@codecov-commenter

codecov-commenter commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 91.75258% with 8 lines in your changes missing coverage. Please review.

Project coverage is 88.27%. Comparing base (6fe362b) to head (be91823).

Files with missing lines: strawberry_django/utils/inspect.py (patch coverage 80.95%, 8 lines missing ⚠️)
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #720      +/-   ##
==========================================
+ Coverage   88.21%   88.27%   +0.05%     
==========================================
  Files          42       42              
  Lines        3853     3914      +61     
==========================================
+ Hits         3399     3455      +56     
- Misses        454      459       +5     
```

Member

@bellini666 bellini666 left a comment

LGTM in general 😊

Just 2 small asks and we can merge/release this

@advl

advl commented Mar 23, 2025

Arrived on this PR by chance; I'm currently also exploring polymorphic models with strawberry-django. I'm glad to see there are other folks working on this problem!

... Except I'm using model-utils with InheritanceManager, which, as I learnt the hard way, is a bit more efficient than django-polymorphic (one query in most cases vs. two for django-polymorphic). I wouldn't have imagined the library evolving to support this, but seeing this PR, I wonder if there is also room for model-utils-related optimizations in optimizer.py?

@diesieben07
Contributor Author

@bellini666 Thanks! I've addressed your notes. I'm not sure why there were 3 pipeline failures this time; the error seems unrelated.

@advl Adapting this for InheritanceManager shouldn't be too difficult. The biggest hurdle I can see at the moment is that the optimizer currently only looks at the Model class and decides what to do based on that. We'd have to look into how to incorporate the manager here. That said, InheritanceManager doesn't do any real magic, so in theory the optimizer could do all of this even without InheritanceManager.
Definitely something for another PR though, imho.

@diesieben07 diesieben07 requested a review from bellini666 March 24, 2025 22:44
@diesieben07 diesieben07 marked this pull request as draft March 25, 2025 14:07
@diesieben07
Contributor Author

I have converted this PR to a draft, because I just noticed a flaw with the approach I have taken. It does not work for nested relations (e.g. "get all projects (in a polymorphic way) for a company").
I will look into fixing that first.

Apologies for not catching that earlier.

@diesieben07 diesieben07 force-pushed the optimizer-polymorphic branch from 35e79fe to e911379 Compare March 25, 2025 15:19
@diesieben07 diesieben07 marked this pull request as ready for review March 25, 2025 15:27
@diesieben07
Contributor Author

diesieben07 commented Mar 25, 2025

I have fixed the problem with relations not being optimized and added even more tests.
I have also rebased the code onto the current main branch.

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @diesieben07 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a helper function to encapsulate the logic for extracting possible concrete types, as it's used in multiple places.
  • The changes look good overall, but it would be helpful to have a more detailed explanation of the changes in the documentation.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

```python
if dj_definition is None or dj_definition.disable_optimization:
    return None

if not issubclass(model, dj_definition.model):
```
Contributor

question: Clarify the polymorphic model branch condition.

The condition that falls back to optimizing with the parent model when model is not a subclass of dj_definition.model, yet may be a polymorphic model, is subtle. It would be worth double-checking that the intended relationship (issubclass(dj_definition.model, model)) is the correct one and that the two directions of the model hierarchy are not confused.
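The two directions can be illustrated with a plain-Python sketch. Ordinary classes stand in for Django models, and `type_models` is a hypothetical mapping from GraphQL type names (the interface implementations or union members) to the model each type wraps; `possible_concrete_types` and its `polymorphic` flag are illustrative only, not the PR's actual `get_possible_concrete_types`. A type is always a candidate when the queried model is its model or a subclass of it (`issubclass(model, type_model)`, as in the snippet above); only when the queryset can yield subclass instances (django-polymorphic or `select_subclasses()`) do types for subclasses of the queried model (`issubclass(type_model, model)`) also qualify.

```python
class Project: ...
class ArtProject(Project): ...
class ResearchProject(Project): ...
class Company: ...


def possible_concrete_types(
    model: type, type_models: dict[str, type], polymorphic: bool
) -> list[str]:
    """Return the GraphQL type names a row of `model` might be resolved as."""
    result = []
    for type_name, type_model in type_models.items():
        if issubclass(model, type_model):
            # Direction 1: the queried model is the type's model or a subclass of it.
            result.append(type_name)
        elif polymorphic and issubclass(type_model, model):
            # Direction 2: a polymorphic queryset may return subclass instances.
            result.append(type_name)
    return result


TYPE_MODELS = {
    "ProjectType": Project,
    "ArtProjectType": ArtProject,
    "ResearchProjectType": ResearchProject,
    "CompanyType": Company,
}

print(possible_concrete_types(Project, TYPE_MODELS, polymorphic=False))
# ['ProjectType']
print(possible_concrete_types(Project, TYPE_MODELS, polymorphic=True))
# ['ProjectType', 'ArtProjectType', 'ResearchProjectType']
```

Swapping the arguments of the first `issubclass` check would silently turn direction 1 into direction 2, which is the confusion the question above is guarding against.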

```
@@ -820,16 +825,25 @@
remote_field = model_field.remote_field
remote_model = remote_field.model
```
Contributor

issue (complexity): Consider extracting the logic for iterating over concrete polymorphic types into helper functions to reduce nesting and improve readability, especially in functions like _get_model_hints and optimize

It looks like the added branches for polymorphic and concrete type handling have increased the nesting and interleaved concerns in a few functions. To reduce the complexity without reverting your changes, consider extracting the logic for iterating over concrete polymorphic types into helper functions. For example, you could create a dedicated function to aggregate “concrete stores” so that the loops in functions like `_get_model_hints` are simpler:

```python
def _aggregate_concrete_stores(model, schema, f_type, info, config, cache, level):
    """Merge the optimizer hints of every concrete type that f_type may resolve to."""
    store = None
    for concrete_type in get_possible_concrete_types(model, schema, f_type):
        concrete_store = _get_model_hints(
            model,
            schema,
            concrete_type,
            parent_type=_get_gql_definition(schema, concrete_type),
            info=info,
            config=config,
            cache=cache,
            level=level + 1,
        )
        # Combine stores so that fields from every subtype are optimized.
        if concrete_store is not None:
            store = concrete_store if store is None else store | concrete_store
    return store
```

Then the loop can be replaced with a single call:

```python
field_store = _aggregate_concrete_stores(
    remote_model,
    schema,
    f_type,
    info=field_info,
    config=config,
    cache=cache,
    level=level,
)
```

This helps centralize the polymorphic type handling and reduces the nesting in your core functions, making the code easier to follow. Similar extraction techniques can be applied elsewhere (for instance, in the loops within the optimize and is_optimized functions) where concrete type iteration is performed.

@diesieben07
Contributor Author

As discussed on Discord, I have expanded the scope of this pull request to also include support for InheritanceManager.
In addition, I have added a section in the documentation that covers the optimizer's polymorphic abilities.

@advl

advl commented Mar 27, 2025

Thank you @diesieben07

@stygmate
Contributor

@diesieben07 Thank you as well :) You are resolving a lot of headaches with this PR 👍

@stygmate
Contributor

stygmate commented Mar 28, 2025

@diesieben07 One more question: have you tested the optimizer with a polymorphic relation inside a polymorphic subtype (with InheritanceManager)? (I have that case 😅)

I'd like to optimize this type of query:

```graphql
query MyQuery {
  communications {
    edges {
      node {
        id
        ... on MessageDoss {
          dossier {
            ... on DossierE {
              id
            }
            ... on DossierV {
              id
            }
          }
        }
        ... on NotiDoss {
          id
        }
      }
    }
  }
}
```

@diesieben07
Contributor Author

@stygmate I have not, but I don't see why it wouldn't work. I'll add a test case for it, though!

@stygmate
Contributor

@diesieben07 I'm trying your branch, and the specific case I pointed out does not seem to work. The objects returned by the relation (a ForeignKey) are instances of the base Django model, so I get a type resolution error. I suspect select_subclasses is not applied in this case.

@diesieben07
Contributor Author

@stygmate I'm looking into it!

@diesieben07
Contributor Author

Okay, I am not sure how to make this work. Here are the models that I used for testing: https://gist.github.com/diesieben07/7360ebf934d174abcf865c23856bef97

Assume we want to get all notifications (polymorphic) and then for every CommunicationNotification we also want to get its corresponding Communication object (again, polymorphic).

The top level "Notification" query will use select_subclasses() to get a polymorphic queryset.
To make CommunicationNotification#communication polymorphic as well, we can't use select_related('communicationnotification__communication'), because that's not polymorphic. So we use prefetch_related instead and make sure to use an InheritanceQuerySet. But now we have both select_related('communicationnotification') (from select_subclasses) and prefetch_related('communicationnotification__communication'), and this doesn't work, because you cannot prefetch_related into an object that was fetched by select_related.

@stygmate Can you let me know how you would write this query manually, without the optimizer?

@diesieben07
Contributor Author

diesieben07 commented Mar 28, 2025

I looked further into this, and there are multiple existing issues in both Django-Polymorphic and InheritanceManager regarding relations in subclasses.

Django Polymorphic

Neither select_related nor prefetch_related on a PolymorphicQuerySet support fetching relations that aren't in the base class. It does not matter whether the relation is polymorphic or not. There is some work upstream here: jazzband/django-polymorphic#545, but it has only "minimal support for prefetch_related" and I have not looked into whether it handles relations to a polymorphic model.

InheritanceManager

The problem is, as I outlined above, that it relies solely on select_related. This prevents using prefetch_related for relations in the subclasses, because a subclass relation can't be both select_related and prefetch_related.

The issue here is that prefetch_related_objects relies on the list it's given being homogeneous, which it isn't in this case. It makes its decisions based on the first object, which might now be an entirely different subtype that doesn't have the right fields for prefetching.
What needs to be done is to apply prefetch_related_objects to the parent objects instead. Then you can do prefetch_related('subclass__subclass_relation') and it'll "just work".
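A toy model of this failure mode, in plain Python rather than Django: like `prefetch_related_objects`, the hypothetical `first_object_prefetch` below decides what it can do by looking only at the first object, so a mixed list of subclass instances breaks it, while walking every object back to its base-model instance first yields a homogeneous list. The class names, the `project_ptr` attribute (standing in for Django's multi-table parent link) and both helpers are assumptions made for illustration.

```python
class Project:
    """Stand-in for the base model."""

    def __init__(self, pk: int) -> None:
        self.pk = pk


class ArtProject(Project):
    def __init__(self, parent: Project, artist: str) -> None:
        super().__init__(parent.pk)
        self.project_ptr = parent  # stand-in for the multi-table parent link
        self.artist = artist


class ResearchProject(Project):
    def __init__(self, parent: Project, topic: str) -> None:
        super().__init__(parent.pk)
        self.project_ptr = parent
        self.topic = topic


def first_object_prefetch(objs: list, attr: str) -> list:
    """Toy prefetcher that, like prefetch_related_objects, inspects only objs[0]."""
    if not hasattr(objs[0], attr):
        raise AttributeError(f"cannot prefetch {attr!r} on {type(objs[0]).__name__}")
    # The check above was made for the first object only; every other object is
    # assumed to look the same, which fails for a mixed list of subtypes.
    return [getattr(obj, attr) for obj in objs]


def to_base_objects(objs: list, base_cls: type) -> list:
    """Follow parent links until each object is an instance of exactly base_cls."""
    result = []
    for obj in objs:
        while type(obj) is not base_cls:
            obj = obj.project_ptr
        result.append(obj)
    return result


p1, p2 = Project(1), Project(2)
mixed = [ArtProject(p1, "Ada"), ResearchProject(p2, "graphs")]
try:
    first_object_prefetch(mixed, "artist")
except AttributeError as exc:
    print("mixed list failed:", exc)

bases = to_base_objects(mixed, Project)
print(first_object_prefetch(bases, "pk"))  # [1, 2]
```

The `to_base_objects` loop mirrors the `get_path_to_parent` walk in the hack that follows; in real Django the prefetch would then go through a path such as `subclass__subclass_relation` on the base objects.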

The following crude hack makes this work in my test:

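```python
from django.db.models import prefetch_related_objects
from model_utils.managers import InheritanceQuerySet


class CustomInheritanceQuerySet(InheritanceQuerySet):

    def _prefetch_related_objects(self):
        # Run the prefetch on the base-model instances instead of the
        # (heterogeneous) subclass instances returned by select_subclasses().
        _base_objs = []
        for obj in self._result_cache:
            # Walk the parent links (e.g. project_ptr) back up to self.model.
            path = obj._meta.get_path_to_parent(self.model)
            for p in path:
                obj = getattr(obj, p.join_field.name)
            _base_objs.append(obj)

        prefetch_related_objects(_base_objs, *self._prefetch_related_lookups)
        self._prefetch_done = True
```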

I don't use either of these libraries myself, and the activity in the respective repos is not promising. I will defer to someone else to fix these issues upstream. Never give up!

@stygmate
Contributor

@diesieben07 I will check all of that tomorrow.

@diesieben07
Contributor Author

@stygmate
Contributor

@diesieben07

I'm not sure we're talking about the same thing.
I wrote a project (without Strawberry) to illustrate my case.
It is available here: https://github.com/stygmate/test_django_inheritance.

Create the virtual environment with Poetry, then run migrations, then the management command populate_db, and finally execute my test using another management command called test_query.

What I would like is for the optimizer to add this:
https://github.com/stygmate/test_django_inheritance/blob/763d58d081e2923f317883672ec923ce641b8bf8/myapp/management/commands/test_query.py#L56

This is a different issue, but in this case, the foreign key returns the base model, whereas it should return a subtype so that strawberry-django can select the correct GraphQL type.

However, with this approach, everything fits into a single SQL query. The query is displayed on the screen during the execution of the test_query command.

@diesieben07
Contributor Author

@stygmate At first glance, the first problem is that you're calling both select_related and select_subclasses, which overwrite each other unless you do extra work.
I'll take a closer look this evening.

@diesieben07
Contributor Author

diesieben07 commented Mar 30, 2025

@stygmate I looked into it further, and as it stands that will never work with InheritanceManager and there's not much the optimizer can do about that.
The reason is that the logic for which subclass to return is implemented entirely within the iteration of the QuerySet. Without InheritanceIterable (which InheritanceQuerySet returns) there are no subclasses. All select_subclasses does is translate the models you give it into select_related. But since there's no queryset involved for the return value of a ForeignKey, you won't get subclasses there.
It can be made to work with prefetch_related, however, because that supports custom querysets. But prefetch_related has problems with InheritanceQuerySet as well, which I have fixed in the PR linked above. If and when that PR is merged, the code this PR adds to the optimizer will work fine for your use case.
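The point that select_subclasses() only translates subclasses into select_related() lookups can be sketched in plain Python. Assuming Django's default multi-table-inheritance naming, where a parent reaches a child through the child's lowercased class name, the hypothetical helper `subclass_lookups` walks a class hierarchy and produces the lookup paths such a translation would need, including nested subclasses. It is an illustration, not model-utils' code; the real manager works from Django model metadata rather than `__subclasses__()`, and choosing the subclass instance per row still happens later, while the queryset is iterated.

```python
class Project: ...
class ArtProject(Project): ...
class ResearchProject(Project): ...
class FieldResearchProject(ResearchProject): ...


def subclass_lookups(model: type, prefix: str = "") -> list[str]:
    """Return select_related()-style paths to every subclass of `model`.

    Assumes Django's default naming: a parent reaches a multi-table child via
    the child's lowercased class name, joined with "__" for deeper levels.
    """
    paths = []
    for sub in model.__subclasses__():
        name = sub.__name__.lower()
        path = f"{prefix}__{name}" if prefix else name
        paths.append(path)
        # Recurse so grandchildren become e.g. "researchproject__fieldresearchproject".
        paths.extend(subclass_lookups(sub, path))
    return paths


print(sorted(subclass_lookups(Project)))
# ['artproject', 'researchproject', 'researchproject__fieldresearchproject']
```

Because these are plain select_related() joins, nothing in them applies when a ForeignKey descriptor loads a single related object, which is why subclasses do not appear there.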

@stygmate
Contributor

@diesieben07 I'll try using your model-utils version combined with this PR and run some tests.
Thanks for your time and effort! 🙂

@bellini666
Member

@diesieben07 thank you so much for this ❤️

Really well done and properly tested.

Just some small things to adjust and it should be good to merge.

Btw, regarding your discussion with @stygmate, is there anything else that you want to implement here? Or is it good to merge after the adjustments I'm asking for?

@diesieben07
Contributor Author

@bellini666 Thank you! Not sure if I missed it, but I can't see any requested changes from you.

As for the discussion, in my opinion the still present issues with relations in polymorphic models are ones that the optimizer can't fix and that need to be fixed in Django-Model-Utils (see the PR I linked above) and Django-Polymorphic.

Member

@bellini666 bellini666 left a comment

Oh, my bad! Forgot to apply the comments 😅

@diesieben07 diesieben07 requested a review from bellini666 March 30, 2025 12:12
Member

@bellini666 bellini666 left a comment

Ty so much for this! ❤️

@bellini666 bellini666 merged commit 9e1007b into strawberry-graphql:main Apr 4, 2025
28 checks passed
5 participants