ESQL - KNN functions with non-pushed down filters #131708

Draft
wants to merge 27 commits into
base: main
from

Conversation

carlosdelest
Member

@carlosdelest carlosdelest commented Jul 22, 2025

KNN functions include other filter conjunctions as pre-filters as per #131004.

However, when a KNN function has non-pushable expressions as prefilters, it cannot be pushed down to Lucene. KNN can still be executed in the compute engine, but the prefilters are then applied as postfilters. This means that fewer than the top k results may be returned by the KNN function.

In order to retrieve the true top k results and maintain knn semantics, the following transformation will be made:

A query like:

WHERE knn(field1, [..], 10) AND non-pushable-filter

Will be replaced with:

 | WHERE non-pushable-filter
 | EVAL knn_score = SCORE(exact_nn(field1, [..]))
 | TOPN 10 knn_score DESC
 | WHERE knn_score > 0
 | DROP knn_score

This way, knn becomes an exact search issued after the non-pushable filters.

The knn score is then calculated to obtain the top k, and rows with a score of zero are removed (which can happen when a minimum similarity is used).

This aims to maintain knn semantics and to prevent filters from acting as post-filters.
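
To illustrate why the order of operations matters, here is a minimal, self-contained sketch in plain Java. It is not the ES|QL implementation: the `Doc` record, the sample vectors, and the `score` function are all hypothetical stand-ins chosen for illustration. It compares running the top-k search first and filtering afterwards (post-filter) with filtering first and then running an exact top-k over the surviving rows, which is the shape of the rewritten plan above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

public class KnnPrefilterSketch {
    /** Hypothetical document: a name plus a 3-dimensional vector. */
    record Doc(String name, double[] vector) {}

    /** Illustrative similarity (higher is closer); NOT Elasticsearch's scoring formula. */
    static double score(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + sum);
    }

    /** Exact top-k by score, descending (the role played by EVAL + TOPN). */
    static List<Doc> topK(List<Doc> docs, double[] query, int k) {
        return docs.stream()
            .sorted(Comparator.comparingDouble((Doc d) -> score(d.vector(), query)).reversed())
            .limit(k)
            .toList();
    }

    /** knn first, filter afterwards: may return fewer than k rows. */
    static List<Doc> postFilter(List<Doc> docs, double[] query, int k, Predicate<Doc> filter) {
        return topK(docs, query, k).stream().filter(filter).toList();
    }

    /** Filter first, then exact top-k over the survivors. */
    static List<Doc> preFilter(List<Doc> docs, double[] query, int k, Predicate<Doc> filter) {
        return topK(docs.stream().filter(filter).toList(), query, k);
    }

    /** Made-up sample data, for illustration only. */
    static List<Doc> sampleDocs() {
        return List.of(
            new Doc("olive", new double[] {128, 128, 0}),
            new Doc("olive drab", new double[] {107, 142, 35}),
            new Doc("green", new double[] {0, 128, 0}),
            new Doc("dark khaki", new double[] {189, 183, 107}),
            new Doc("yellow", new double[] {255, 255, 0}),
            new Doc("navy", new double[] {0, 0, 128})
        );
    }

    public static void main(String[] args) {
        double[] query = {128, 128, 0};
        Predicate<Doc> singleWord = d -> !d.name().contains(" ");
        // Post-filter: "olive drab" takes one of the 3 slots and is then dropped.
        System.out.println(postFilter(sampleDocs(), query, 3, singleWord).stream().map(Doc::name).toList());
        // Pre-filter: all 3 slots go to rows that pass the filter.
        System.out.println(preFilter(sampleDocs(), query, 3, singleWord).stream().map(Doc::name).toList());
    }
}
```

With k = 3, the post-filter variant returns only two rows here, while the pre-filter variant returns three. The sketch omits the final `knn_score > 0` step, since its toy score is never zero.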

@carlosdelest carlosdelest added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Search Relevance/Vectors Vector search :Analytics/ES|QL AKA ESQL Team:Search - Relevance The Search organization Search Relevance team >non-issue v9.2.0 and removed v9.2.0 labels Jul 22, 2025
assert block instanceof DocBlock : "LuceneQueryExpressionEvaluator expects DocBlock as input";
DocVector docs = (DocVector) block.asVector();
// Search for DocVector block
Block docBlock = null;
Member Author

Made LuceneQueryEvaluator more robust
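
As a rough sketch of the idea (not the actual change), the snippet below scans a page for the first doc block instead of assuming it sits at a fixed position. `Block`, `DocBlock`, `DoubleBlock`, and `findDocBlock` are hypothetical stand-ins defined here for illustration; they are not the compute engine's real types or signatures.

```java
import java.util.List;
import java.util.Optional;

public class DocBlockLookupSketch {
    /** Hypothetical stand-ins for the compute engine's block types. */
    interface Block {}
    record DocBlock(int[] docIds) implements Block {}
    record DoubleBlock(double[] values) implements Block {}

    /** Returns the first DocBlock in the page, wherever it is, or empty if there is none. */
    static Optional<DocBlock> findDocBlock(List<Block> page) {
        for (Block block : page) {
            if (block instanceof DocBlock doc) {
                return Optional.of(doc);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The doc block is in position 1 here, not position 0.
        List<Block> page = List.of(new DoubleBlock(new double[] {0.5}), new DocBlock(new int[] {7}));
        System.out.println(findDocBlock(page).isPresent());
    }
}
```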

assert page.getBlockCount() >= 2 : "Expected at least 2 blocks, got " + page.getBlockCount();
assert page.getBlock(0).asVector() instanceof DocVector : "Expected a DocVector, got " + page.getBlock(0).asVector();
assert page.getBlock(1).asVector() instanceof DoubleVector : "Expected a DoubleVector, got " + page.getBlock(1).asVector();
assert page.getBlockCount() > scoreBlockPosition : "Expected to get a score block in position " + scoreBlockPosition;
Member Author

This was an uncovered bug


from colors metadata _score
| eval composed_name = locate(color, " ") > 0
| where knn(rgb_vector, [128,128,0], 140) and composed_name == false
| where knn(rgb_vector, [128,128,0], 10) and composed_name == false
Member Author

This shows the change in action: we no longer need a large value of k to maintain semantics, nor a limit at the end.

@@ -38,10 +39,14 @@ public Failures verify(LogicalPlan plan, boolean skipRemoteEnrichVerification) {
if (failures.hasFailures() == false) {
if (p instanceof PostOptimizationVerificationAware pova) {
pova.postOptimizationVerification(failures);
} else if (p instanceof PostOptimizationPlanVerificationAware popva) {
Member Author

@carlosdelest carlosdelest Jul 22, 2025

Added PostOptimizationPlanVerificationAware so that the plan is available for validations. This is similar to PostAnalysisPlanVerificationAware.

@bpintea , WDYT?

…er-non-pushed-down

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@carlosdelest carlosdelest marked this pull request as ready for review July 23, 2025 08:31
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search - Relevance The Search organization Search Relevance team labels Jul 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

…n-pushed-down' into non-issue/knn-prefilter-non-pushed-down
@carlosdelest carlosdelest marked this pull request as draft July 24, 2025 07:33
Labels
:Analytics/ES|QL AKA ESQL >non-issue :Search Relevance/Vectors Vector search Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0
2 participants