-
Notifications
You must be signed in to change notification settings - Fork 25.3k
ESQL: Allow remote enrich after LOOKUP JOIN #131286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: Allow remote enrich after LOOKUP JOIN #131286
Conversation
Hi @alex-spies, I've created a changelog YAML for you. |
TODO:
|
if (f instanceof BinaryExec be) { | ||
// Remove any LOOKUP JOIN or inline Join from INLINE STATS do avoid double execution | ||
return be.left(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main fix: don't duplicate the join node after a remote enrich.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think there's a saner approach. Any LOOKUP JOIN before a remote ENRICH is implicitly remote, too. We should treat them as such, otherwise we will have to redo a lot of validation and look into edge cases that are already solved for remote LOOKUP JOIN.
This approach is implemented in 0f65dd5.
Hi @alex-spies, I've updated the changelog YAML for you. |
Alternative approach: rather than reinventing the wheel, let's just mark any LOOKUP JOINs upstream from a remote ENRICH as remote, too, so the validation automatically kicks in.
* Fix the planning of {@code | ENRICH _remote:policy} when there's a preceding {@code | LOOKUP JOIN}, | ||
* see <a href="https://github.com/elastic/elasticsearch/issues/129372">java.lang.ClassCastException when combining LOOKUP JOIN and remote ENRICH</a> | ||
*/ | ||
REMOTE_ENRICH_AFTER_LOOKUP_JOIN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it shouldn't be working if ENABLE_LOOKUP_JOIN_ON_REMOTE
is off. Not sure if it's important but maybe we should make it the same?
@@ -117,7 +117,7 @@ public void testEnrichAfterStop() throws Exception { | |||
SimplePauseFieldPlugin.allowEmitting.countDown(); | |||
|
|||
try (EsqlQueryResponse resp = stopAction.actionGet(30, TimeUnit.SECONDS)) { | |||
// Compare this to CrossClustersEnrichIT.testEnrichTwiceThenAggs - the results from c2 will be absent | |||
// Compare this to CrossClusterEnrichIT.testEnrichTwiceThenAggs - the results from c2 will be absent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. There's one more in CrossClusterEnrichUnavailableClustersIT btw if we already fixing it :)
@@ -482,9 +482,15 @@ public PlanFactory visitEnrichCommand(EsqlBaseParser.EnrichCommandContext ctx) { | |||
} | |||
|
|||
List<NamedExpression> keepClauses = visitList(this, ctx.enrichWithClause(), NamedExpression.class); | |||
|
|||
// If this is a remote-only ENRICH, any upstream LOOKUP JOINs need to be treated as remote-only, too. | |||
LogicalPlan updatedChild = (mode == Mode.REMOTE) == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Maybe we should flip the condition to avoid (==) == false
?
Fix #129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](#129372 (comment)) and my draft #131286 for enabling this.)
Fix elastic#129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](elastic#129372 (comment)) and my draft elastic#131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Fix elastic#129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](elastic#129372 (comment)) and my draft elastic#131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Fix elastic#129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](elastic#129372 (comment)) and my draft elastic#131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Fix elastic#129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](elastic#129372 (comment)) and my draft elastic#131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
* ESQL: Disallow remote enrich after lu join (#131426) Fix #129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](#129372 (comment)) and my draft #131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
Fix #129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](#129372 (comment)) and my draft #131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Fix #129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](#129372 (comment)) and my draft #131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
* ESQL: Disallow remote enrich after lu join (#131426) Fix #129372 Due to how remote ENRICH is [planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93), it interacts in special ways with pipeline breakers, in particular LIMIT and TopN; when these are encountered upstream from a remote ENRICH, these nodes are copied and executed a second time after the remote ENRICH. We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the lookup to be remote as well; this has its own interactions with pipeline breakers: in particular, LIMITs and TopNs cannot just be duplicated after LOOKUP JOIN, as LOOKUP JOIN may add new rows. For now, let's just forbid any usage of remote ENRICH after LOOKUP JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't support that in 9.1/8.19, anyway. There is separate work that enables remote LOOKUP JOINs on remote clusters and adds the correct validations; we can later build support for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment [here](#129372 (comment)) and my draft #131286 for enabling this.) (cherry picked from commit 06e39c0) # Conflicts: # x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java * [CI] Auto commit changes from spotless * Add test_lookup index to default test analyzer * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
Relates #129372