Skip to content

ESQL: Allow remote enrich after LOOKUP JOIN #131286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Jul 15, 2025

Relates #129372

@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies
Copy link
Contributor Author

alex-spies commented Jul 15, 2025

TODO:

  • ENRICH _remote:policy turns any upstream LOOKUP JOIN into a remote join, conceptually. TopNs and LIMITs cannot be upstream from that; we need to validate against this, just like in Remote Lookup Join implementation #129013.
  • We need verifier tests for this case (pipeline breakers upstream from join in case of a downstream remote enrich).
  • a couple more verifier tests
  • a couple more csv tests
  • a bunch of physical plan optimizer tests
  • Integration tests in CCS scenarios

Comment on lines 109 to 112
if (f instanceof BinaryExec be) {
// Remove any LOOKUP JOIN or inline Join from INLINE STATS do avoid double execution
return be.left();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main fix: don't duplicate the join node after a remote enrich.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think there's a saner approach. Any LOOKUP JOIN before a remote ENRICH is implicitly remote, too. We should treat them as such, otherwise we will have to redo a lot of validation and look into edge cases that are already solved for remote LOOKUP JOIN.

This approach is implemented in 0f65dd5.

@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've updated the changelog YAML for you.

Alternative approach: rather than reinventing the wheel, let's just mark
any LOOKUP JOINs upstream from a remote ENRICH as remote, too, so the
validation automatically kicks in.
* Fix the planning of {@code | ENRICH _remote:policy} when there's a preceding {@code | LOOKUP JOIN},
* see <a href="https://github.com/elastic/elasticsearch/issues/129372">java.lang.ClassCastException when combining LOOKUP JOIN and remote ENRICH</a>
*/
REMOTE_ENRICH_AFTER_LOOKUP_JOIN,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it shouldn't be working if ENABLE_LOOKUP_JOIN_ON_REMOTE is off. Not sure if it's important but maybe we should make it the same?

@@ -117,7 +117,7 @@ public void testEnrichAfterStop() throws Exception {
SimplePauseFieldPlugin.allowEmitting.countDown();

try (EsqlQueryResponse resp = stopAction.actionGet(30, TimeUnit.SECONDS)) {
// Compare this to CrossClustersEnrichIT.testEnrichTwiceThenAggs - the results from c2 will be absent
// Compare this to CrossClusterEnrichIT.testEnrichTwiceThenAggs - the results from c2 will be absent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. There's one more in CrossClusterEnrichUnavailableClustersIT btw if we already fixing it :)

@@ -482,9 +482,15 @@ public PlanFactory visitEnrichCommand(EsqlBaseParser.EnrichCommandContext ctx) {
}

List<NamedExpression> keepClauses = visitList(this, ctx.enrichWithClause(), NamedExpression.class);

// If this is a remote-only ENRICH, any upstream LOOKUP JOINs need to be treated as remote-only, too.
LogicalPlan updatedChild = (mode == Mode.REMOTE) == false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Maybe we should flip the condition to avoid (==) == false?

elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2025
Fix #129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](#129372 (comment))
and my draft #131286 for
enabling this.)
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jul 18, 2025
Fix elastic#129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](elastic#129372 (comment))
and my draft elastic#131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jul 18, 2025
Fix elastic#129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](elastic#129372 (comment))
and my draft elastic#131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jul 18, 2025
Fix elastic#129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](elastic#129372 (comment))
and my draft elastic#131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Jul 18, 2025
Fix elastic#129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](elastic#129372 (comment))
and my draft elastic#131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2025
* ESQL: Disallow remote enrich after lu join (#131426)

Fix #129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](#129372 (comment))
and my draft #131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2025
Fix #129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](#129372 (comment))
and my draft #131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2025
Fix #129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](#129372 (comment))
and my draft #131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
elasticsearchmachine pushed a commit that referenced this pull request Jul 18, 2025
* ESQL: Disallow remote enrich after lu join (#131426)

Fix #129372

Due to how remote ENRICH is
[planned](https://github.com/elastic/elasticsearch/blob/32e50d0d94e27ee559d24bf9d5463ba6e64d1788/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/mapper/Mapper.java#L93),
it interacts in special ways with pipeline breakers, in particular LIMIT
and TopN; when these are encountered upstream from a remote ENRICH,
these nodes are copied and executed a second time after the remote
ENRICH.

We'd like to allow remote ENRICH after LOOKUP JOIN, but that forces the
lookup to be remote as well; this has its own interactions with pipeline
breakers: in particular, LIMITs and TopNs cannot just be duplicated
after LOOKUP JOIN, as LOOKUP JOIN may add new rows.

For now, let's just forbid any usage of remote ENRICH after LOOKUP
JOINs; remote ENRICH is mostly relevant for CCS, and LOOKUP JOIN doesn't
support that in 9.1/8.19, anyway.

There is separate work that enables remote LOOKUP JOINs on remote
clusters and adds the correct validations; we can later build support
for remote ENRICH + LOOKUP JOIN on top of that. (C.f. my comment
[here](#129372 (comment))
and my draft #131286 for
enabling this.)

(cherry picked from commit 06e39c0)

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-clusters/src/javaRestTest/java/org/elasticsearch/xpack/esql/ccq/MultiClusterSpecIT.java
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java

* [CI] Auto commit changes from spotless

* Add test_lookup index to default test analyzer

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants