ESQL - KNN functions with non-pushed down filters #131708
base: main
…e that will be detected later.
assert block instanceof DocBlock : "LuceneQueryExpressionEvaluator expects DocBlock as input";
DocVector docs = (DocVector) block.asVector();
// Search for DocVector block
Block docBlock = null;
Made LuceneQueryEvaluator more robust
assert page.getBlockCount() >= 2 : "Expected at least 2 blocks, got " + page.getBlockCount();
assert page.getBlock(0).asVector() instanceof DocVector : "Expected a DocVector, got " + page.getBlock(0).asVector();
assert page.getBlock(1).asVector() instanceof DoubleVector : "Expected a DoubleVector, got " + page.getBlock(1).asVector();
assert page.getBlockCount() > scoreBlockPosition : "Expected to get a score block in position " + scoreBlockPosition;
This was an uncovered bug
from colors metadata _score
| eval composed_name = locate(color, " ") > 0
| where knn(rgb_vector, [128,128,0], 140) and composed_name == false
| where knn(rgb_vector, [128,128,0], 10) and composed_name == false
We can see the change in action - we no longer need to use a large number for k to maintain semantics, nor to use limit at the end.
@@ -38,10 +39,14 @@ public Failures verify(LogicalPlan plan, boolean skipRemoteEnrichVerification) {
if (failures.hasFailures() == false) {
if (p instanceof PostOptimizationVerificationAware pova) {
pova.postOptimizationVerification(failures);
} else if (p instanceof PostOptimizationPlanVerificationAware popva) {
Added PostOptimizationPlanVerificationAware to have the plan available for validations. This is similar to PostAnalysisPlanVerificationAware.
@bpintea, WDYT?
…er-non-pushed-down # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Pinging @elastic/es-analytical-engine (Team:Analytics)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
…n-pushed-down' into non-issue/knn-prefilter-non-pushed-down
KNN functions include other filter conjunctions as pre-filters as per #131004.
However, when a KNN function has non-pushable expressions as prefilters, it cannot be pushed down to Lucene. KNN can still be executed in the compute engine, but the prefilters will then be applied as postfilters. This means that the KNN function can return fewer than the top k results.
In order to retrieve the true top k results and maintain knn semantics, the following transformation will be made:
A query like:
WHERE knn(field1, [..], 10) AND non-pushable-filter
Will be replaced with:
This way, knn becomes an exact search issued after the non-pushable filters.
The score for knn is then calculated to obtain the top K, and rows with a score of zero are removed (this can happen when a minimum similarity is used).
This aims to maintain knn semantics and prevent filters from acting as post-filters.
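
To illustrate the difference between the two orderings, here is a minimal, self-contained Java sketch. It is not the code from this PR and does not use any Elasticsearch or Lucene API: the class KnnFilterSketch, its records, the sample colors, and the similarity formula (1 / (1 + squared Euclidean distance)) are all assumptions made up for the example. It only shows why running knn first and filtering afterwards can return fewer than k rows, while filtering first and then scoring every remaining row exactly keeps the top k.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: none of these names exist in Elasticsearch.
public class KnnFilterSketch {
    public record Doc(String name, double[] vector) {}
    public record Scored(Doc doc, double score) {}

    // Assumed similarity: 1 / (1 + squared Euclidean distance).
    // Scores below minSimilarity are reported as zero.
    static double score(double[] a, double[] b, double minSimilarity) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        double s = 1.0 / (1.0 + sum);
        return s >= minSimilarity ? s : 0.0;
    }

    // Exact (brute force) top k: score every doc, drop zero scores, keep the k best.
    static List<Scored> topK(List<Doc> docs, double[] query, int k, double minSimilarity) {
        return docs.stream()
            .map(d -> new Scored(d, score(d.vector(), query, minSimilarity)))
            .filter(s -> s.score() > 0)
            .sorted(Comparator.comparingDouble(Scored::score).reversed())
            .limit(k)
            .toList();
    }

    // Post-filter ordering: knn runs first, the filter then discards some of its k hits.
    public static List<Scored> knnThenFilter(List<Doc> docs, double[] query, int k, Predicate<Doc> filter) {
        return topK(docs, query, k, 0.0).stream().filter(s -> filter.test(s.doc())).toList();
    }

    // Ordering described above: filter first, then exact knn over the surviving docs.
    public static List<Scored> filterThenExactKnn(
        List<Doc> docs, double[] query, int k, double minSimilarity, Predicate<Doc> filter
    ) {
        return topK(docs.stream().filter(filter).toList(), query, k, minSimilarity);
    }

    // Made-up sample data, loosely modelled on the colors test index.
    public static List<Doc> sampleDocs() {
        return List.of(
            new Doc("olive", new double[] { 128, 128, 0 }),
            new Doc("olive drab", new double[] { 107, 142, 35 }),
            new Doc("yellow green", new double[] { 154, 205, 50 }),
            new Doc("maroon", new double[] { 128, 0, 0 }),
            new Doc("gold", new double[] { 255, 215, 0 })
        );
    }

    public static void main(String[] args) {
        double[] query = { 128, 128, 0 };
        Predicate<Doc> notComposed = d -> d.name().contains(" ") == false;
        System.out.println("post-filter rows: " + knnThenFilter(sampleDocs(), query, 2, notComposed).size());
        System.out.println("filter-first rows: " + filterThenExactKnn(sampleDocs(), query, 2, 0.0, notComposed).size());
    }
}
```

With this sample data and k = 2, the post-filter ordering keeps only "olive" because "olive drab" is one of the two nearest neighbours and is then filtered out, whereas the filter-first ordering returns two rows ("olive" and "maroon"). Passing a high minimum similarity to filterThenExactKnn shows the other effect mentioned above: rows whose score becomes zero are dropped, so fewer than k rows can still come back.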