Improve Expanding Lookup Join performance by pushing a filter to the right side of the lookup join #132889
);
builder.append(
    new FieldAttribute(Source.EMPTY, "Positions", new EsField("Positions", DataType.INTEGER, Collections.emptyMap(), false))
);
@nik9000
FilterOperator.FilterOperatorFactory needs an ExpressionEvaluator.Factory which needs a Layout.
How do I build a Layout here?
I attached 2 columns from the EnrichQuerySourceOperator and then whatever else we have in request.extractFields. It seems to work, because we don't refer to the first 2 columns.
EnrichQuerySourceOperator says there are 2 columns, so I added them, but I am not sure that what I have is correct. In particular, the Docs column should be a vector, right? But there is no vector datatype.
These usually come from LocalExecutionPlanner.
So I suppose you could copy stuff from there. Or use it somehow.
}

var evaluatorFactory = EvalMapper.toEvaluator(
    FoldContext.small()/*is this correct*/,
@nik9000 Should I use FoldContext.small() here?
This'll be used when folding arguments - though maybe we'd never need any memory because we've already folded to literals. Imagine:
HASH(v, "md5") == "whatever"
The "md5" gets folded.
Hi @julian-elastic, I've created a changelog YAML for you.
} else {
    inputOperator = queryOperator;
}
Operator postJoinFilter = filterExecOperator(filterExec, inputOperator, shardContext.context, driverContext, builder);
@nik9000 Is this how I need to build the filter? It does not seem to be working. I am debugging, but I could use some help if you spot any issues.
Seems sane.
To be merged with #133166.
Improve Expanding Lookup Join performance by pushing a filter to the right side of the lookup join.
As this is a performance optimization, we don't want to break the behavior for old nodes in CCS. The filter that we push down is optional, and it is always reapplied after the lookup join. As a result, if all nodes involved are new, we get the performance benefits. Otherwise there might be partial or no performance benefits, but we will still execute the query successfully and return correct results if the query worked before this optimization.
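The reasoning above (an optional pushed filter plus a mandatory post-join filter) can be illustrated with a toy model. This is only a sketch: the class and method names are hypothetical and are not Elasticsearch code, and the join is a plain nested loop over in-memory lists. It shows that the result is the same whether or not the right side honors the pushed filter, because the post-join filter always runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: none of these names are Elasticsearch classes.
class OptionalPushdownSketch {

    // Expanding join: each left key pairs with every right row that has the same key.
    static List<List<String>> join(List<String> leftKeys, List<Map.Entry<String, String>> right) {
        List<List<String>> out = new ArrayList<>();
        for (String key : leftKeys) {
            for (Map.Entry<String, String> row : right) {
                if (row.getKey().equals(key)) {
                    out.add(List.of(key, row.getValue()));
                }
            }
        }
        return out;
    }

    // The post-join filter always runs, so it alone guarantees correct results.
    static List<List<String>> postFilter(List<List<String>> joined, Predicate<String> onRightValue) {
        return joined.stream().filter(r -> onRightValue.test(r.get(1))).collect(Collectors.toList());
    }

    // "Old node" path: the right side ignores the pushed filter.
    static List<List<String>> withoutPushdown(
        List<String> left, List<Map.Entry<String, String>> right, Predicate<String> filter) {
        return postFilter(join(left, right), filter);
    }

    // "New node" path: the right side is filtered first, then the same filter is reapplied.
    static List<List<String>> withPushdown(
        List<String> left, List<Map.Entry<String, String>> right, Predicate<String> filter) {
        List<Map.Entry<String, String>> filteredRight =
            right.stream().filter(r -> filter.test(r.getValue())).collect(Collectors.toList());
        return postFilter(join(left, filteredRight), filter);
    }

    public static void main(String[] args) {
        List<String> left = List.of("a", "a", "b");
        List<Map.Entry<String, String>> right = List.of(
            Map.entry("a", "keep"), Map.entry("a", "drop"), Map.entry("b", "keep"), Map.entry("c", "keep"));
        Predicate<String> filter = "keep"::equals;
        // Both paths yield the same rows; only the intermediate row count differs.
        System.out.println(withPushdown(left, right, filter).equals(withoutPushdown(left, right, filter))); // prints true
    }
}
```

In this model the pushed filter only shrinks the intermediate result; dropping it changes cost, not output.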
Preliminary results indicate around a 90x improvement with the optimization for Lucene-pushable filters, on a test case that is specifically designed to demonstrate the benefits of this optimization. Customers are likely to see more limited benefits. The test case is an expanding lookup join of a 100,000-row table with a 10,000-row lookup table, with a filter of selectivity 0.1% (it keeps 10 out of the 10,000 rows of the lookup table). In the non-optimized version the filter is not pushed to the right, and we can get an explosion of records: we have 100,000 x 10,000 = 1,000,000,000 rows after the join, and then we filter them down to only 1,000,000 rows. With the optimization we apply the filter early, so after the expanding join we only have 100,000 x 10 = 1,000,000 rows. This reduced the maximum number of rows by a factor of 1,000 and made the query about 90 times faster.
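As a quick sanity check of the row counts quoted above, the arithmetic can be written out. This is a sketch under one assumption taken from the figures in the description: the join is fully expanding, so every left row matches every surviving lookup row. The class and method names are hypothetical.

```java
// Back-of-the-envelope check of the row counts in the description.
// Assumption: every left row matches every surviving lookup row.
class ExpandingJoinRowCounts {

    // Rows produced by a fully expanding join.
    static long rowsAfterJoin(long leftRows, long lookupRowsKept) {
        return leftRows * lookupRowsKept;
    }

    public static void main(String[] args) {
        long leftRows = 100_000L;
        long lookupRows = 10_000L;
        long lookupRowsKept = 10L; // 0.1% selectivity: 10 of 10,000 lookup rows survive

        long peakWithoutPushdown = rowsAfterJoin(leftRows, lookupRows);  // filter applied after the join
        long peakWithPushdown = rowsAfterJoin(leftRows, lookupRowsKept); // filter applied before the join

        System.out.println("peak rows without pushdown: " + peakWithoutPushdown);
        System.out.println("peak rows with pushdown:    " + peakWithPushdown);
        System.out.println("reduction factor:           " + (peakWithoutPushdown / peakWithPushdown));
    }
}
```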
Right Pushable filters with optimization
Running filtered join query...
test pushable join with filter on keyword: {"took":125,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":124,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":124,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":121,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":134,"documents_found":100000,"values":[[1000000]]}
Right Pushable filters without optimization
Running filtered join query...
test pushable join with filter on keyword: {"took":11315,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11348,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11330,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11271,"documents_found":100000,"values":[[1000000]]}
test pushable join with filter on keyword: {"took":11258,"documents_found":100000,"values":[[1000000]]}
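The "around 90x" figure can be cross-checked against the ten took values listed above. The following is a small sketch, assuming took is reported in milliseconds and that comparing the means of the five runs is a fair summary; the class and method names are hypothetical.

```java
import java.util.stream.LongStream;

// Compares the mean "took" of the runs with and without the optimization.
// Assumption: "took" is in milliseconds and the mean is a reasonable summary.
class SpeedupFromTook {

    static double mean(long... tookValues) {
        return LongStream.of(tookValues).average().orElseThrow();
    }

    static double speedup(long[] baseline, long[] optimized) {
        return mean(baseline) / mean(optimized);
    }

    public static void main(String[] args) {
        long[] withOptimization = {125, 124, 124, 121, 134};
        long[] withoutOptimization = {11315, 11348, 11330, 11271, 11258};

        System.out.printf("mean with optimization:    %.1f%n", mean(withOptimization));
        System.out.printf("mean without optimization: %.1f%n", mean(withoutOptimization));
        System.out.printf("speedup:                   %.1fx%n", speedup(withoutOptimization, withOptimization));
    }
}
```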
Script