Add support for Lookup Join on Multiple Fields #131559

Merged
merged 45 commits into from
Aug 13, 2025
Changes from 32 commits
Commits
b1ee80e
Lookup Join on Multiple Columns POC WIP
julian-elastic Jul 18, 2025
2c90817
Update docs/changelog/131559.yaml
julian-elastic Jul 18, 2025
4db37fd
Looking join on multiple fields WIP
julian-elastic Jul 21, 2025
ef894f3
Merge branch 'main' into lookupJoin
julian-elastic Jul 21, 2025
e367e8c
Fix more UTs
julian-elastic Jul 21, 2025
744479f
Bugfixes
julian-elastic Jul 21, 2025
c17c993
Bugfixes
julian-elastic Jul 21, 2025
d9462cc
Fix serialization error
julian-elastic Jul 21, 2025
a17fa88
Fix UT error
julian-elastic Jul 22, 2025
3fd48e4
Add more test datasets
julian-elastic Jul 22, 2025
64a07c7
Add UTs for join on 2,3,4 columns
julian-elastic Jul 22, 2025
ea171b1
Merge branch 'main' into lookupJoin
julian-elastic Jul 22, 2025
bae7007
Add handling for remote not supporting LOOKUP JOIN on multiple fields
julian-elastic Jul 23, 2025
6bd2937
Merge branch 'main' into lookupJoin
julian-elastic Jul 23, 2025
71adaa8
Change documentation
julian-elastic Jul 23, 2025
43aa7e1
Fix docs
julian-elastic Jul 24, 2025
7e0d8d7
Add more UTs
julian-elastic Jul 25, 2025
b6c615c
Merge branch 'main' into lookupJoin
julian-elastic Jul 25, 2025
5d9f68f
Address code review feedback
julian-elastic Jul 29, 2025
fc1c63b
Merge branch 'main' into lookupJoin
julian-elastic Jul 29, 2025
c179cea
Add Generative tests for Lookup Join On Multiple Columns
julian-elastic Jul 29, 2025
be2ce94
Merge branch 'main' into lookupJoin
julian-elastic Jul 29, 2025
59c16d9
Remove debugging code
julian-elastic Jul 29, 2025
dd52c02
Address a rare issue in Generative tests
julian-elastic Jul 30, 2025
8b2594b
Address docs issues
julian-elastic Jul 30, 2025
606c099
Merge branch 'main' into lookupJoin
julian-elastic Jul 30, 2025
e585342
Mode docs changes
julian-elastic Jul 30, 2025
72c3ad7
Merge branch 'main' into lookupJoin
julian-elastic Jul 30, 2025
f742160
Address code review feedback
julian-elastic Jul 30, 2025
ed6946b
Enhance LookupFromIndexIT
julian-elastic Jul 30, 2025
806933f
Fix failing UT
julian-elastic Jul 31, 2025
62956af
Merge branch 'main' into lookupJoin
julian-elastic Jul 31, 2025
326cb82
Address more code review comments
julian-elastic Jul 31, 2025
1dbe524
Address more code review comments, part 2
julian-elastic Jul 31, 2025
00b41ed
MatchConfig refactoring and add serialization test
julian-elastic Jul 31, 2025
a036aa6
bugfix
julian-elastic Jul 31, 2025
28d0c7c
Merge branch 'main' into lookupJoin
julian-elastic Jul 31, 2025
ce73957
Add HeapAttackIT cases with join on multiple fields
julian-elastic Aug 1, 2025
cfdb440
Merge branch 'main' into lookupJoin
julian-elastic Aug 1, 2025
96a6891
bugfix
julian-elastic Aug 1, 2025
d90799d
Update docs/changelog/131559.yaml
julian-elastic Aug 12, 2025
ec7f10a
Merge branch 'main' into lookupJoin
julian-elastic Aug 12, 2025
1cf34bc
Address code review comments
julian-elastic Aug 12, 2025
6acb8ef
Merge branch 'main' into lookupJoin
julian-elastic Aug 13, 2025
1102c1b
fix issue with docs
julian-elastic Aug 13, 2025
5 changes: 5 additions & 0 deletions docs/changelog/131559.yaml
Contributor

We may want to declare this change as notable. I'm not sure what the bar is for that, @leemthompo?

Contributor

@leemthompo, Jul 31, 2025

If you and @tylerperk agree, sounds like this passes the notable bar 😄 👍

Contributor Author

@leemthompo, how do I declare this change as notable? Can you point me to an example? Or do you add it to some other release notes list after it is merged?

Contributor

@alex-spies, by notable I guess you mean adding the release highlight label?

Contributor

@leemthompo I saw that we flag important release notes as notable, like here. Is this normally added via the release highlight label?

Both seem to make sense, so I'll go and mark this as a release highlight and see what the bot does :)

Contributor

@leemthompo, Aug 12, 2025

flag important release notes as notable

TIL 😄

@@ -0,0 +1,5 @@
pr: 131559
summary: Add support for LOOKUP JOIN on multiple fields
area: ES|QL
type: enhancement
issues: [ ]
@@ -17,13 +17,22 @@ FROM <source_index>
| LOOKUP JOIN <lookup_index> ON <field_name>
```

```esql
FROM <source_index>
| LOOKUP JOIN <lookup_index> ON <field_name1>, <field_name2>, <field_name3>
```

**Parameters**

`<lookup_index>`
: The name of the lookup index. This must be a specific index name - wildcards, aliases, and remote cluster references are not supported. Indices used for lookups must be configured with the [`lookup` index mode](/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting).

`<field_name>`
: The field to join on. This field must exist in both your current query results and in the lookup index. If the field contains multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows).
`<field_name>` or `<field_name1>, <field_name2>, <field_name3>`
: The field(s) to join on. Can be either:
* A single field name
* A comma-separated list of field names {applies_to}`stack: ga 9.2`
: These fields must exist in both your current query results and in the lookup index. If the fields contain multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows).


**Description**

@@ -32,7 +41,7 @@ results table by finding documents in a lookup index that share the same
join field value as your result rows.

For each row in your results table that matches a document in the lookup
index based on the join field, all fields from the matching document are
index based on the join fields, all fields from the matching document are
added as new columns to that row.

If multiple documents in the lookup index match a single row in your
13 changes: 8 additions & 5 deletions docs/reference/query-languages/esql/esql-lookup-join.md
@@ -33,11 +33,14 @@ For example, you can use `LOOKUP JOIN` to:
The `LOOKUP JOIN` command adds fields from the lookup index as new columns to your results table based on matching values in the join field.

The command requires two parameters:
- The name of the lookup index (which must have the `lookup` [`index.mode setting`](/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting))
- The name of the field to join on

* The name of the lookup index (which must have the `lookup` [`index.mode setting`](/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting))
* The field(s) to join on. Can be either:
* A single field name
* A comma-separated list of field names {applies_to}`stack: ga 9.2`

```esql
LOOKUP JOIN <lookup_index> ON <field_name>
LOOKUP JOIN <lookup_index> ON <field_name> # Join on a single field
LOOKUP JOIN <lookup_index> ON <field_name1>, <field_name2>, <field_name3> # Join on multiple fields
```

:::{image} ../images/esql-lookup-join.png
@@ -200,7 +203,7 @@ The following are the current limitations with `LOOKUP JOIN`:
* Indices in [`lookup` mode](/reference/elasticsearch/index-settings/index-modules.md#index-mode-setting) are always single-sharded.
* Cross cluster search is unsupported initially. Both source and lookup indices must be local.
* Currently, only matching on equality is supported.
* `LOOKUP JOIN` can only use a single match field and a single index. Wildcards are not supported.
* In Stack versions `9.0-9.1`, `LOOKUP JOIN` can only use a single match field and a single index. Wildcards are not supported.
* Aliases, datemath, and datastreams are supported, as long as the index pattern matches a single concrete index {applies_to}`stack: ga 9.1.0`.
* The name of the match field in `LOOKUP JOIN lu_idx ON match_field` must match an existing field in the query. This may require `RENAME`s or `EVAL`s to achieve.
* The query will circuit break if there are too many matching documents in the lookup index, or if the documents are too large. More precisely, `LOOKUP JOIN` works in batches of, normally, about 10,000 rows; a large amount of heap space is needed if the matching documents from the lookup index for a batch are multiple megabytes or larger. This is roughly the same as for `ENRICH`.
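The parameter descriptions and limitations above imply a few per-row rules for a multi-field join: every listed field must match, a multi-valued key matches nothing, and a row without a match is kept with `null` in the added columns. The Java sketch below is a toy in-memory model written for this page (the class and method names are hypothetical, and this is not how ES|QL executes the join). It also assumes, in line with the `LOOKUP JOIN` documentation, that a row matching several lookup documents is emitted once per match, and it treats missing (`null`) keys as non-matching.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of "LOOKUP JOIN <lookup_index> ON f1, f2, ..." row semantics.
 * Hypothetical helper for illustration only: rows are column-name -> value maps,
 * and a multi-valued field is represented as a java.util.List.
 */
final class LookupJoinModel {

    /** One key field matches only when both sides hold the same single, non-null value. */
    static boolean keyMatches(Object left, Object right) {
        if (left == null || right == null) {
            return false; // assumption: missing keys never match
        }
        if (left instanceof List<?> || right instanceof List<?>) {
            return false; // multi-valued entries do not match anything
        }
        return left.equals(right);
    }

    /** Left-joins every input row against the lookup docs; all join fields must match (logical AND). */
    static List<Map<String, Object>> lookupJoin(
            List<Map<String, Object>> input,
            List<Map<String, Object>> lookup,
            List<String> onFields,
            List<String> addedColumns) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> row : input) {
            boolean matchedAny = false;
            for (Map<String, Object> doc : lookup) {
                boolean allFieldsMatch = true;
                for (String field : onFields) {
                    if (!keyMatches(row.get(field), doc.get(field))) {
                        allFieldsMatch = false;
                        break;
                    }
                }
                if (allFieldsMatch) {
                    matchedAny = true;
                    Map<String, Object> joined = new LinkedHashMap<>(row);
                    for (String column : addedColumns) {
                        joined.put(column, doc.get(column));
                    }
                    output.add(joined); // one output row per matching lookup document
                }
            }
            if (!matchedAny) {
                Map<String, Object> unmatched = new LinkedHashMap<>(row);
                for (String column : addedColumns) {
                    unmatched.put(column, null); // no match: added columns are null
                }
                output.add(unmatched);
            }
        }
        return output;
    }
}
```

Under this model, joining on `id_int` and `name_str`, the two `1,Alice` rows in the `multi_column_joinable_lookup.csv` test data added by this PR would turn one matching input row into two output rows.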
@@ -355,6 +355,7 @@ static TransportVersion def(int id) {
public static final TransportVersion PIPELINE_TRACKING_INFO = def(9_131_0_00);
public static final TransportVersion COMPONENT_TEMPLATE_TRACKING_INFO = def(9_132_0_00);
public static final TransportVersion TO_CHILD_BLOCK_JOIN_QUERY = def(9_133_0_00);
public static final TransportVersion ESQL_LOOKUP_JOIN_ON_MANY_FIELDS = def(9_134_0_00);

/*
* STOP! READ THIS FIRST! No, really,
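The commit list includes handling for remote clusters that do not support `LOOKUP JOIN` on multiple fields, and this hunk registers `ESQL_LOOKUP_JOIN_ON_MANY_FIELDS`, but the code that consults the constant is not part of the diff shown. A common pattern for such constants is to compare against the peer's version before sending a new wire format. The sketch below models that idea with plain `int`s; apart from the numeric id copied from `def(9_134_0_00)`, every name in it is hypothetical and is not the Elasticsearch `TransportVersion` API.

```java
import java.util.List;

/**
 * Hypothetical model of gating a feature on a peer's transport version.
 * Plain ints stand in for Elasticsearch's TransportVersion objects.
 */
final class LookupJoinVersionGate {
    // Same numeric id as def(9_134_0_00) in the diff.
    static final int ESQL_LOOKUP_JOIN_ON_MANY_FIELDS = 9_134_0_00;

    /**
     * Returns the match fields unchanged if the peer can read them,
     * otherwise fails fast instead of silently dropping fields.
     */
    static List<String> matchFieldsForPeer(int peerVersion, List<String> matchFields) {
        if (peerVersion >= ESQL_LOOKUP_JOIN_ON_MANY_FIELDS || matchFields.size() == 1) {
            // Newer peers accept any number of fields; older peers still accept exactly one.
            return matchFields;
        }
        throw new IllegalArgumentException(
            "LOOKUP JOIN on " + matchFields.size() + " fields requires transport version "
                + ESQL_LOOKUP_JOIN_ON_MANY_FIELDS + " or later, but the peer is on " + peerVersion);
    }
}
```

Failing loudly here, rather than sending only the first field, avoids a remote silently computing a join on fewer fields than the query asked for.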
@@ -37,7 +37,7 @@
*/
public final class EnrichQuerySourceOperator extends SourceOperator {
private final BlockFactory blockFactory;
private final QueryList queryList;
private final LookupEnrichQueryGenerator queryList;
private int queryPosition = -1;
private final ShardContext shardContext;
private final IndexReader indexReader;
@@ -51,7 +51,7 @@ public final class EnrichQuerySourceOperator extends SourceOperator {
public EnrichQuerySourceOperator(
BlockFactory blockFactory,
int maxPageSize,
QueryList queryList,
LookupEnrichQueryGenerator queryList,
ShardContext shardContext,
Warnings warnings
) {
@@ -0,0 +1,61 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.compute.operator.lookup;

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

import java.util.List;

/**
* A {@link LookupEnrichQueryGenerator} that combines multiple {@link QueryList}s into a single query.
* Each query in the resulting query will be a conjunction of all queries from the input lists at the same position.
* In the future we can extend this to support more complex expressions, such as disjunctions or negations.
*/
public class ExpressionQueryList implements LookupEnrichQueryGenerator {
private final List<QueryList> queryLists;

public ExpressionQueryList(List<QueryList> queryLists) {
if (queryLists.size() < 2) {
throw new IllegalArgumentException("ExpressionQueryList must have at least two QueryLists");
}
this.queryLists = queryLists;
}

@Override
public Query getQuery(int position) {
BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (QueryList queryList : queryLists) {
Query q = queryList.getQuery(position);
if (q == null) {
// if any of the matchFields are null, it means there is no match for this position
// A AND NULL is always NULL, so we can skip this position
return null;
}
builder.add(q, BooleanClause.Occur.FILTER);
}
return builder.build();
}

@Override
public int getPositionCount() {
int positionCount = queryLists.get(0).getPositionCount();
for (QueryList queryList : queryLists) {
if (queryList.getPositionCount() != positionCount) {
throw new IllegalArgumentException(
"All QueryLists must have the same position count, expected: "
+ positionCount
+ ", but got: "
+ queryList.getPositionCount()
);
}
}
return positionCount;
}
}
@@ -0,0 +1,30 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.compute.operator.lookup;

import org.apache.lucene.search.Query;
import org.elasticsearch.core.Nullable;

/**
* An interface to generate queries for the lookup and enrich operators.
* This interface is used to retrieve queries based on a position index.
*/
public interface LookupEnrichQueryGenerator {

/**
* Returns the query at the given position.
*/
@Nullable
Query getQuery(int position);

/**
* Returns the number of queries in this generator
*/
int getPositionCount();

}
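`ExpressionQueryList` ANDs together one `QueryList` per join field and gives up on a position as soon as any field yields no query, while `LookupEnrichQueryGenerator` is the small interface both implement. The Java sketch below mirrors that structure without Lucene: a "query" is modeled as a `Predicate` over a lookup document, `null` still means "this position cannot match", and all type names are hypothetical. The scan at the end is an illustrative consumer written only against the interface, loosely analogous to `EnrichQuerySourceOperator` now accepting any `LookupEnrichQueryGenerator`; it is not that operator's logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Lucene-free stand-in for LookupEnrichQueryGenerator (hypothetical names). */
interface DocQueryGenerator {
    /** Returns the query for one input position, or null if that position cannot match. */
    Predicate<Map<String, Object>> getQuery(int position);

    int getPositionCount();
}

/** One join field: equality between an input column value and the same-named lookup field. */
record FieldEqualsGenerator(String field, List<Object> column) implements DocQueryGenerator {
    @Override
    public Predicate<Map<String, Object>> getQuery(int position) {
        Object value = column.get(position);
        if (value == null) {
            return null; // no value, so no query for this position
        }
        return doc -> value.equals(doc.get(field));
    }

    @Override
    public int getPositionCount() {
        return column.size();
    }
}

/** Conjunction of per-field generators, mirroring the AND and null short-circuit of ExpressionQueryList. */
record AndGenerator(List<DocQueryGenerator> parts) implements DocQueryGenerator {
    AndGenerator {
        if (parts.size() < 2) {
            throw new IllegalArgumentException("need at least two generators");
        }
    }

    @Override
    public Predicate<Map<String, Object>> getQuery(int position) {
        Predicate<Map<String, Object>> combined = doc -> true;
        for (DocQueryGenerator part : parts) {
            Predicate<Map<String, Object>> q = part.getQuery(position);
            if (q == null) {
                return null; // one missing clause means the whole conjunction cannot match
            }
            combined = combined.and(q);
        }
        return combined;
    }

    @Override
    public int getPositionCount() {
        int count = parts.get(0).getPositionCount();
        for (DocQueryGenerator part : parts) {
            if (part.getPositionCount() != count) {
                throw new IllegalArgumentException("all generators must have the same position count");
            }
        }
        return count;
    }
}

/** Illustrative consumer that depends only on the interface. */
final class GeneratorScan {
    /** For each position, the indices of lookup docs satisfying its query (empty when there is no query). */
    static List<List<Integer>> matches(DocQueryGenerator gen, List<Map<String, Object>> docs) {
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < gen.getPositionCount(); p++) {
            Predicate<Map<String, Object>> q = gen.getQuery(p);
            List<Integer> hits = new ArrayList<>();
            if (q != null) {
                for (int d = 0; d < docs.size(); d++) {
                    if (q.test(docs.get(d))) {
                        hits.add(d);
                    }
                }
            }
            result.add(hits);
        }
        return result;
    }
}
```

Returning `null` rather than an always-false query lets a consumer skip a position without running anything, which is the shortcut the `A AND NULL` comment in `ExpressionQueryList` describes.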
@@ -49,7 +49,7 @@
/**
* Generates a list of Lucene queries based on the input block.
*/
public abstract class QueryList {
public abstract class QueryList implements LookupEnrichQueryGenerator {
protected final SearchExecutionContext searchExecutionContext;
protected final AliasFilter aliasFilter;
protected final MappedFieldType field;
@@ -74,7 +74,8 @@ protected QueryList(
/**
* Returns the number of positions in this query list
*/
int getPositionCount() {
@Override
public int getPositionCount() {
return block.getPositionCount();
}

@@ -87,7 +88,8 @@ int getPositionCount() {
*/
public abstract QueryList onlySingleValues(Warnings warnings, String multiValueWarningMessage);

final Query getQuery(int position) {
@Override
public final Query getQuery(int position) {
final int valueCount = block.getValueCount(position);
if (onlySingleValueParams != null && valueCount != 1) {
if (valueCount > 1) {
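The `QueryList.getQuery` hunk is cut off, but its first lines show that in single-value mode (`onlySingleValueParams != null`) any position without exactly one value is handled specially, with a separate branch for more than one value. A plausible reading, consistent with the documented rule that multi-valued entries match nothing, is sketched below for a single key column. The `AnyOf` type, the warning text, and the assumption that the default mode matches any of several values are hypothetical and not taken from the diff.

```java
import java.util.List;

/** Hypothetical model of per-position query building for one key column (not the real QueryList). */
final class SingleValueQuerySketch {
    /** Toy query: a lookup key matches if it equals any of these values. */
    record AnyOf(List<Object> values) {}

    /**
     * @param values           key values at one position; empty means null/missing, size > 1 means multi-valued
     * @param onlySingleValues models the onlySingleValues(...) mode: anything but exactly one value yields no query
     * @param warnings         collects a warning when a multi-valued position is skipped
     * @return a query, or null when the position cannot match
     */
    static AnyOf queryFor(List<Object> values, boolean onlySingleValues, List<String> warnings) {
        if (onlySingleValues && values.size() != 1) {
            if (values.size() > 1) {
                warnings.add("multi-valued key at this position will not match"); // hypothetical message
            }
            return null;
        }
        if (values.isEmpty()) {
            return null; // nothing to look up
        }
        return new AnyOf(List.copyOf(values)); // assumed default mode: match any of the values
    }
}
```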
@@ -113,6 +113,8 @@ public MultiClusterSpecIT(
"SortEvalBeforeLookup",
"SortBeforeAndAfterMultipleJoinAndMvExpand",
"LookupJoinAfterTopNAndRemoteEnrich",
"LookupJoinOnTwoFieldsAfterTop",
"LookupJoinOnTwoFieldsMultipleTimes",
// Lookup join after LIMIT is not supported in CCS yet
"LookupJoinAfterLimitAndRemoteEnrich"
);
@@ -215,13 +215,24 @@ private List<String> availableIndices() throws IOException {
.toList();
}

public record LookupIdx(String idxName, String key, String keyType) {}
public record LookupIdxColumn(String name, String type) {}

public record LookupIdx(String idxName, List<LookupIdxColumn> keys) {}

private List<LookupIdx> lookupIndices() {
List<LookupIdx> result = new ArrayList<>();
// we don't have key info from the dataset loader, let's hardcode it for now
result.add(new LookupIdx("languages_lookup", "language_code", "integer"));
result.add(new LookupIdx("message_types_lookup", "message", "keyword"));
result.add(new LookupIdx("languages_lookup", List.of(new LookupIdxColumn("language_code", "integer"))));
result.add(new LookupIdx("message_types_lookup", List.of(new LookupIdxColumn("message", "keyword"))));
List<LookupIdxColumn> multiColumnJoinableLookupKeys = List.of(
new LookupIdxColumn("id_int", "integer"),
new LookupIdxColumn("name_str", "keyword"),
new LookupIdxColumn("is_active_bool", "boolean"),
new LookupIdxColumn("ip_addr", "ip"),
new LookupIdxColumn("other1", "keyword"),
new LookupIdxColumn("other2", "integer")
);
result.add(new LookupIdx("multi_column_joinable_lookup", multiColumnJoinableLookupKeys));
return result;
}

@@ -11,10 +11,15 @@
import org.elasticsearch.xpack.esql.qa.rest.generative.GenerativeRestTest;
import org.elasticsearch.xpack.esql.qa.rest.generative.command.CommandGenerator;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import static org.elasticsearch.test.ESTestCase.randomFrom;
import static org.elasticsearch.test.ESTestCase.randomInt;
import static org.elasticsearch.test.ESTestCase.randomSubsetOf;

public class LookupJoinGenerator implements CommandGenerator {

@@ -29,15 +34,47 @@ public CommandDescription generate(
) {
GenerativeRestTest.LookupIdx lookupIdx = randomFrom(schema.lookupIndices());
String lookupIdxName = lookupIdx.idxName();
String idxKey = lookupIdx.key();
String keyType = lookupIdx.keyType();
int joinColumnsCount = randomInt(lookupIdx.keys().size() - 1) + 1; // at least one column must be used for the join
List<GenerativeRestTest.LookupIdxColumn> joinColumns = randomSubsetOf(joinColumnsCount, lookupIdx.keys());
List<String> keyNames = new ArrayList<>();
List<String> joinOn = new ArrayList<>();
Set<String> usedColumns = new HashSet<>();
for (GenerativeRestTest.LookupIdxColumn joinColumn : joinColumns) {
String idxKey = joinColumn.name();
String keyType = joinColumn.type();

var candidateKeys = previousOutput.stream().filter(x -> x.type().equals(keyType)).toList();
if (candidateKeys.isEmpty()) {
var candidateKeys = previousOutput.stream().filter(x -> x.type().equals(keyType)).toList();
if (candidateKeys.isEmpty()) {
continue; // no candidate keys of the right type, skip this column
}
EsqlQueryGenerator.Column key = randomFrom(candidateKeys);
if (usedColumns.contains(key.name()) || usedColumns.contains(idxKey)) {
continue; // already used this column from the lookup index, or will discard the main index column by RENAME'ing below, skip
} else {
usedColumns.add(key.name());
usedColumns.add(idxKey);
}
keyNames.add(key.name());
joinOn.add(idxKey);
}
if (keyNames.isEmpty()) {
return EMPTY_DESCRIPTION;
}
EsqlQueryGenerator.Column key = randomFrom(candidateKeys);
String cmdString = "| rename " + key.name() + " as " + idxKey + " | lookup join " + lookupIdxName + " on " + idxKey;
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < keyNames.size(); i++) {
stringBuilder.append("| rename ");
stringBuilder.append(keyNames.get(i));
stringBuilder.append(" as ");
stringBuilder.append(joinOn.get(i));
}
stringBuilder.append(" | lookup join ").append(lookupIdxName).append(" on ");
for (int i = 0; i < keyNames.size(); i++) {
stringBuilder.append(joinOn.get(i));
if (i < keyNames.size() - 1) {
stringBuilder.append(", ");
}
}
String cmdString = stringBuilder.toString();
return new CommandDescription(LOOKUP_JOIN, this, cmdString, Map.of());
}
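`LookupJoinGenerator` now picks one or more lookup keys, renames a type-compatible source column to each key, and joins on the comma-separated key list. The sketch below isolates just the string-assembly step with `String.join`, assuming the columns have already been chosen; the class and method names are hypothetical. Unlike the `StringBuilder` loop in the diff, which appends each `| rename` directly after the previous one with no space in between, this version separates the pieces with single spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds "| rename a as k1 | rename b as k2 | lookup join <idx> on k1, k2". */
final class LookupJoinCommandSketch {
    static String build(String lookupIndex, List<String> sourceColumns, List<String> lookupKeys) {
        if (sourceColumns.isEmpty() || sourceColumns.size() != lookupKeys.size()) {
            throw new IllegalArgumentException("need exactly one source column per lookup key");
        }
        List<String> renames = new ArrayList<>();
        for (int i = 0; i < sourceColumns.size(); i++) {
            // Rename the chosen source column so it carries the lookup index's key name.
            renames.add("| rename " + sourceColumns.get(i) + " as " + lookupKeys.get(i));
        }
        return String.join(" ", renames)
            + " | lookup join " + lookupIndex
            + " on " + String.join(", ", lookupKeys);
    }
}
```

ES|QL's `RENAME` also accepts several `old AS new` pairs separated by commas, so a single `RENAME` would be an alternative; the sketch keeps one command per key to stay close to the generator.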

@@ -63,6 +63,16 @@ public class CsvTestsDataLoader {
private static final TestDataset HOSTS = new TestDataset("hosts");
private static final TestDataset APPS = new TestDataset("apps");
private static final TestDataset APPS_SHORT = APPS.withIndex("apps_short").withTypeMapping(Map.of("id", "short"));
private static final TestDataset MULTI_COLUMN_JOINABLE = new TestDataset(
"multi_column_joinable",
"mapping-multi_column_joinable.json",
"multi_column_joinable.csv"
);
private static final TestDataset MULTI_COLUMN_JOINABLE_LOOKUP = new TestDataset(
"multi_column_joinable_lookup",
"mapping-multi_column_joinable_lookup.json",
"multi_column_joinable_lookup.csv"
).withSetting("lookup-settings.json");
private static final TestDataset LANGUAGES = new TestDataset("languages");
private static final TestDataset LANGUAGES_LOOKUP = LANGUAGES.withIndex("languages_lookup").withSetting("lookup-settings.json");
private static final TestDataset LANGUAGES_LOOKUP_NON_UNIQUE_KEY = LANGUAGES_LOOKUP.withIndex("languages_lookup_non_unique_key")
@@ -219,7 +229,9 @@ public class CsvTestsDataLoader {
Map.entry(LOGS.indexName, LOGS),
Map.entry(MV_TEXT.indexName, MV_TEXT),
Map.entry(DENSE_VECTOR.indexName, DENSE_VECTOR),
Map.entry(COLORS.indexName, COLORS)
Map.entry(COLORS.indexName, COLORS),
Map.entry(MULTI_COLUMN_JOINABLE.indexName, MULTI_COLUMN_JOINABLE),
Map.entry(MULTI_COLUMN_JOINABLE_LOOKUP.indexName, MULTI_COLUMN_JOINABLE_LOOKUP)
);

private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enrich-policy-languages.json");
@@ -0,0 +1,18 @@
id_int,name_str,is_active_bool,ip_addr,extra1,extra2
1,Alice,true,192.168.1.1,foo,100
2,Bob,false,192.168.1.2,bar,200
3,Charlie,true,192.168.1.3,baz,300
4,David,false,192.168.1.4,qux,400
5,Eve,true,192.168.1.5,quux,500
6,,true,192.168.1.6,corge,600
7,Grace,false,,grault,700
8,Hank,true,192.168.1.8,garply,800
9,Ivy,false,192.168.1.9,waldo,900
10,John,true,192.168.1.10,fred,1000
,Kate,false,192.168.1.11,plugh,1100
[12],Liam,true,192.168.1.12,xyzzy,1200
13,Mia,false,192.168.1.13,thud,1300
[14],Nina,true,192.168.1.14,foo2,1400
15,Oscar,false,192.168.1.15,bar2,1500
[17,18],Olivia,true,192.168.1.17,xyz,17000
[1,19,21],Sophia,true,192.168.1.21,zyx,21000
@@ -0,0 +1,19 @@
id_int,name_str,is_active_bool,ip_addr,other1,other2
1,Alice,true,192.168.1.1,alpha,1000
1,Alice,true,192.168.1.2,beta,2000
2,Bob,false,192.168.1.3,gamma,3000
3,Charlie,true,192.168.1.3,delta,4000
3,Charlie,false,192.168.1.3,epsilon,5000
4,David,false,192.168.1.4,zeta,6000
5,Eve,true,192.168.1.5,eta,7000
5,Eve,true,192.168.1.5,theta,8000
6,,true,192.168.1.6,iota,9000
7,Grace,false,,kappa,10000
8,Hank,true,192.168.1.8,lambda,11000
,Kate,false,192.168.1.11,mu,12000
12,Liam,true,192.168.1.12,nu,13000
13,Mia,false,192.168.1.13,xi,14000
[14],Nina,true,192.168.1.14,omicron,15000
16,Paul,true,192.168.1.16,pi,16000
[17,18],Olivia,true,192.168.1.17,rho,17000
[1,19,20],Sophia,true,192.168.1.21,sigma,21000