
Use selectivity in LookupJoin cost estimate #3099


Open. Wants to merge 24 commits into base: main

Conversation

angelamayxie
Contributor

@angelamayxie angelamayxie commented Jul 15, 2025

Fixes dolthub/dolt#9520

Our previous LookupJoin cost estimate did not take selectivity into account. Selectivity is the expected proportion of right-hand side rows that satisfy the join condition for each row from the left-hand side. As a result, the cost estimate for LookupJoins was too high, and queries were slow because LookupJoins were not picked when they should have been.

A Dolt test is failing because the costs are different now. Will fix in the bump PR.

Contributor

@fulghum fulghum left a comment


I think this generally makes sense. We should dig deeper into the plan changes and make sure we think they're still correct. It would be ideal to measure the execution time of some of the changed query plans and spot check that they are actually faster with the lookup joins.

angelamayxie and others added 15 commits July 16, 2025 16:22
- Updated ExpectedPlan, ExpectedEstimates, and ExpectedAnalysis sections
- Changed MergeJoin to SemiLookupJoin with simplified structure
- Reduced cost estimates from 3300.000 to 2300.000
- Copied actual test output to replace expected output
- Reduced test failures from 838 to 832

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Applied cost-only automation script (28 improvements)
- Updated LeftOuterMergeJoin to LeftOuterLookupJoin (partial)
- Total reduction: 34 test failures fixed
- Continue systematically until zero failures

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Contributor

@reltuk reltuk left a comment


LGTM!

Development

Successfully merging this pull request may close these issues.

PRIMARY KEY isn't always used in left joins
3 participants