Simplified Linear and RRF Retrievers Docs #130559
Merged: Mikep86 merged 11 commits into `elastic:main` from `Mikep86:simplified-linear-and-rrf-retrievers-docs` on Jul 8, 2025.

Commits (all by Mikep86; changes from 9 of them are shown):

- `da1e049` Added documentation about the multi-field query format
- `10873a3` Added field boosting anchor
- `5c70949` Added links to examples
- `a88ab17` Add applies to tags
- `3608990` Remove "exclusively"
- `c982b24` Normalizer documentation adjustments
- `35248db` Add better score breakdown example
- `79149a0` Change field boosting section name
- `d4de19d` Add anchor for wildcard field patterns section
- `8d9db25` Change wording
- `c467c1f` Temporarily comment out broken links

@@ -258,13 +258,55 @@

#### Parameters [linear-retriever-parameters]

::::{note}
Either `query` or `retrievers` must be specified.
Combining `query` and `retrievers` is not supported.
::::

`query` {applies_to}`stack: ga 9.1`
: (Optional, String)

The query to use with the [multi-field query format](#multi-field-query-format).

`fields` {applies_to}`stack: ga 9.1`
: (Optional, array of strings)

The fields to query with the [multi-field query format](#multi-field-query-format).
Fields can include boost values using the `^` notation (e.g., `"field^2"`).
If not specified, the index's default fields are used, as defined by the `index.query.default_field` index setting (`*` by default).

`normalizer` {applies_to}`stack: ga 9.1`
: (Optional, String)

The normalizer to use with the [multi-field query format](#multi-field-query-format).
See [normalizers](#linear-retriever-normalizers) for supported values.
Required when `query` is specified.

::::{warning}
Avoid using `none`, as that disables normalization and may bias the result set towards lexical matches.
See [field grouping](#multi-field-field-grouping) for more information.
::::

`retrievers`
: (Optional, array of objects)

A list of sub-retriever configurations whose result sets are merged through a weighted sum.
Each configuration can specify a different weight and normalizer for its retriever.

`rank_window_size`
: (Optional, integer)

This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance.
The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param).
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
Defaults to 10.

`filter`
: (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

Each entry in the `retrievers` array specifies the following parameters:

`retriever`
: (Required, a `retriever` object)

@@ -279,64 +321,74 @@

`normalizer`
: (Optional, String)

Specifies how the retriever’s score will be normalized before applying the specified `weight`.
See [normalizers](#linear-retriever-normalizers) for supported values.
Defaults to `none`.

See also [this hybrid search example](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-linear-retriever), which uses a linear retriever to show how to configure and apply normalizers to each retriever independently.

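A minimal Python sketch of the weighted sum that the `linear` retriever computes: each retriever's scores are normalized, multiplied by that retriever's `weight`, and summed per document. It only illustrates the arithmetic and is not Elasticsearch's implementation. The helper names (`minmax`, `no_normalization`, `linear_combine`) are hypothetical, `minmax` uses the formula listed under Normalizers, and two details are assumptions of this sketch: a document missing from a retriever's result set gets nothing from that retriever, and a result set whose scores are all equal normalizes to `0.0`.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # document ID -> score from one retriever


def minmax(scores: Scores) -> Scores:
    """score = (score - min) / (max - min) over one retriever's result set."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption of this sketch: equal scores map to 0.0 (avoids dividing by zero).
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def no_normalization(scores: Scores) -> Scores:
    """Stand-in for the `none` normalizer: scores pass through unchanged."""
    return dict(scores)


def linear_combine(
    entries: List[Tuple[Scores, float, Callable[[Scores], Scores]]],
    size: int,
) -> List[Tuple[str, float]]:
    """Normalize each result set, apply its weight, sum per document, prune to size."""
    combined: Dict[str, float] = {}
    for scores, weight, normalize in entries:
        for doc, score in normalize(scores).items():
            # A document missing from this result set receives nothing from it.
            combined[doc] = combined.get(doc, 0.0) + weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


# Hypothetical result sets on very different score scales.
lexical = {"doc1": 12.0, "doc2": 8.0, "doc3": 4.0}
vector = {"doc2": 0.75, "doc3": 0.5, "doc4": 0.25}

print(linear_combine([(lexical, 1.0, minmax), (vector, 1.5, minmax)], size=3))
# [('doc2', 2.0), ('doc1', 1.0), ('doc3', 0.75)]
```

Without normalization, the lexical scores (4 to 12) would swamp the vector scores (0.25 to 0.75) regardless of the weights; `minmax` first puts both result sets on a 0 to 1 scale, so the `weight` values decide the balance.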
#### Normalizers [linear-retriever-normalizers]

The `linear` retriever supports the following normalizers:

* `none`: No normalization.
* `minmax`: Normalizes scores based on the following formula:

  ```
  score = (score - min) / (max - min)
  ```
* `l2_norm`: Normalizes scores using the L2 norm of the score values.
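The two normalizers can be sketched in a few lines of Python. The `minmax` function follows the formula above; for `l2_norm`, this sketch assumes the usual meaning of dividing each score by the L2 norm (the square root of the sum of squared scores). The assumption that normalization runs over one retriever's result set at a time, and the zero-handling branches, are choices of this sketch rather than documented Elasticsearch behavior.

```python
import math
from typing import List


def minmax(scores: List[float]) -> List[float]:
    """score = (score - min) / (max - min), as in the formula above."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores map to 0.0 to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def l2_norm(scores: List[float]) -> List[float]:
    """Divide each score by the L2 norm (Euclidean length) of all the scores."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0.0:
        # Assumption: an all-zero result set stays all zeros.
        return [0.0 for _ in scores]
    return [s / norm for s in scores]


scores = [3.0, 4.0, 0.0]
print(minmax(scores))   # [0.75, 1.0, 0.0]
print(l2_norm(scores))  # [0.6, 0.8, 0.0]
```

`minmax` always maps the best hit to `1.0` and the worst to `0.0`, while `l2_norm` keeps the scores' relative proportions (here 3:4) and only rescales them.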

## RRF Retriever [rrf-retriever]

An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.

#### Parameters [rrf-retriever-parameters]

::::{note}
Either `query` or `retrievers` must be specified.
Combining `query` and `retrievers` is not supported.
::::

`query` {applies_to}`stack: ga 9.1`
: (Optional, String)

The query to use with the [multi-field query format](#multi-field-query-format).

`fields` {applies_to}`stack: ga 9.1`
: (Optional, array of strings)

The fields to query with the [multi-field query format](#multi-field-query-format).
If not specified, the index's default fields are used, as defined by the `index.query.default_field` index setting (`*` by default).

`retrievers`
: (Optional, array of retriever objects)

A list of child retrievers whose returned top documents are combined using the RRF formula.
Each child retriever carries an equal weight as part of the RRF formula. Two or more child retrievers are required.

`rank_constant`
: (Optional, integer)

This value determines how much influence documents in individual result sets per query have over the final ranked result set. A higher value gives lower-ranked documents more influence. This value must be greater than or equal to `1`. Defaults to `60`.

`rank_window_size`
: (Optional, integer)

This value determines the size of the individual result sets per query.
A higher value will improve result relevance at the cost of performance.
The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param).
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
Defaults to 10.

`filter`
: (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

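As a rough illustration of how `rank_constant` and `rank_window_size` interact, here is a Python sketch of reciprocal rank fusion: each document scores `1 / (rank_constant + rank)` for every child result set it appears in, with ranks starting at 1, and the fused list is pruned to `size`. The function name and signature are hypothetical, the validation mirrors the parameter constraints listed above, and this is not Elasticsearch's implementation.

```python
from typing import Dict, List


def rrf(result_sets: List[List[str]], rank_constant: int = 60,
        rank_window_size: int = 10, size: int = 10) -> List[str]:
    """Fuse ranked lists of document IDs (best first) with reciprocal rank fusion."""
    if rank_constant < 1:
        raise ValueError("rank_constant must be >= 1")
    if rank_window_size < 1 or rank_window_size < size:
        raise ValueError("rank_window_size must be >= size and >= 1")
    scores: Dict[str, float] = {}
    for ranked_docs in result_sets:
        # Only each child's top `rank_window_size` hits take part; ranks start at 1.
        for rank, doc in enumerate(ranked_docs[:rank_window_size], start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (rank_constant + rank)
    fused = sorted(scores, key=lambda doc: scores[doc], reverse=True)
    return fused[:size]  # prune to the request's `size`


lexical_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
print(rrf([lexical_hits, vector_hits], size=3))  # ['a', 'c', 'b']
```

Because each contribution is `1 / (rank_constant + rank)`, a larger `rank_constant` narrows the gap between adjacent ranks (with the default of `60`, rank 1 is worth 1/61 and rank 2 is worth 1/62), which is why higher values give lower-ranked documents more influence.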
### Example: Hybrid search [rrf-retriever-example-hybrid]

A simple hybrid search example (lexical search + dense vector search) combining a `standard` retriever with a `knn` retriever using RRF:

@@ -976,6 +1028,181 @@

}
```

## Multi-field query format [multi-field-query-format]
```yaml {applies_to}
stack: ga 9.1
```

The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
This format automatically generates appropriate inner retrievers based on the field types and query parameters.
This makes it possible to search an index while knowing little to nothing about its schema, with normalization across lexical and semantic matches handled for you.

### Field grouping [multi-field-field-grouping]

The multi-field query format groups queried fields into two categories:

- **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
- **Semantic fields**: [`semantic_text` fields](/reference/elasticsearch/mapping-reference/semantic-text.md).

Each field group is queried separately, and the scores (or ranks) are normalized so that each group contributes 50% to the final score (or rank).
This balances the importance of lexical and semantic fields.

Most indices contain more lexical than semantic fields, and without this grouping the results would often be biased towards lexical field matches.

::::{warning}
In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`).
If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
::::

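The 50/50 split can be sketched for the `linear` case in Python. This sketch assumes each field's score has already been normalized (for example with `minmax`) and that fields within a group are equally weighted, so a group's score is the average of its field scores. The type sets, the helper names, and the rule that an empty group is dropped are assumptions made for illustration; Elasticsearch's actual query generation is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical type sets: the docs name `keyword` and `text` only as examples.
LEXICAL_TYPES = {"text", "keyword"}
SEMANTIC_TYPES = {"semantic_text"}


def group_fields(mapping: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split field names into lexical and semantic groups by mapping type."""
    lexical = [name for name, ftype in mapping.items() if ftype in LEXICAL_TYPES]
    semantic = [name for name, ftype in mapping.items() if ftype in SEMANTIC_TYPES]
    return lexical, semantic


def grouped_score(normalized: Dict[str, float], lexical: List[str],
                  semantic: List[str]) -> float:
    """Average each group's normalized field scores, then split the total evenly."""
    groups = [group for group in (lexical, semantic) if group]  # drop empty groups
    share = 1.0 / len(groups)
    return sum(
        share * sum(normalized.get(name, 0.0) for name in group) / len(group)
        for group in groups
    )


mapping = {
    "title": "text",
    "description": "text",
    "title_semantic": "semantic_text",
    "description_semantic": "semantic_text",
    "title_vector": "dense_vector",  # in neither group in this sketch
}
lexical, semantic = group_fields(mapping)
scores = {"title": 1.0, "description": 1.0,
          "title_semantic": 0.0, "description_semantic": 0.5}
print(lexical, semantic)
print(grouped_score(scores, lexical, semantic))  # 0.625
```

Even with perfect scores on both lexical fields, lexical matches contribute at most `0.5` here. With the `none` normalizer, raw scores are not on a common scale, so this balance is not guaranteed, which is what the warning above describes.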
### Linear retriever field boosting [multi-field-field-boosting]

When using the `linear` retriever, fields can be boosted using the `^` notation:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3", <1>
        "description^2", <2>
        "title_semantic", <3>
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}
```

1. 3x weight
2. 2x weight
3. 1x weight (default)

Due to how the [field group scores](#multi-field-field-grouping) are normalized, per-field boosts have no effect on the range of the final score.
Instead, they affect the importance of the field's score within its group.

For example, if the schema looks like:

```console
PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "title_semantic"
      },
      "description": {
        "type": "text",
        "copy_to": "description_semantic"
      },
      "title_semantic": {
        "type": "semantic_text"
      },
      "description_semantic": {
        "type": "semantic_text"
      }
    }
  }
}
```

And we run this query:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title",
        "description",
        "title_semantic",
        "description_semantic"
      ],
      "normalizer": "minmax"
    }
  }
}
```

The score breakdown would be:

* Lexical fields (50% of score):
  * `title`: 50% of lexical fields group score, 25% of final score
  * `description`: 50% of lexical fields group score, 25% of final score
* Semantic fields (50% of score):
  * `title_semantic`: 50% of semantic fields group score, 25% of final score
  * `description_semantic`: 50% of semantic fields group score, 25% of final score

If we apply per-field boosts like so:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3",
        "description^2",
        "title_semantic",
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}
```

The score breakdown would change to:

* Lexical fields (50% of score):
  * `title`: 60% of lexical fields group score, 30% of final score
  * `description`: 40% of lexical fields group score, 20% of final score
* Semantic fields (50% of score):
  * `title_semantic`: 33.3% of semantic fields group score, 16.7% of final score
  * `description_semantic`: 66.7% of semantic fields group score, 33.3% of final score

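The boosted breakdown above can be reproduced with a short Python sketch, assuming that each field group receives an equal share of the final score and that a group's share is split among its fields in proportion to their `^` boosts. The helpers `parse_field` and `final_score_shares` are hypothetical, and the groups are passed in explicitly rather than derived from the mapping.

```python
from typing import Dict, List, Tuple


def parse_field(spec: str) -> Tuple[str, float]:
    """Split "name^boost" into (name, boost); the boost defaults to 1."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def final_score_shares(groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Each group gets an equal share; its fields split it in proportion to boost."""
    group_share = 1.0 / len(groups)
    shares: Dict[str, float] = {}
    for specs in groups.values():
        parsed = [parse_field(spec) for spec in specs]
        total = sum(boost for _, boost in parsed)
        for name, boost in parsed:
            shares[name] = group_share * boost / total
    return shares


shares = final_score_shares({
    "lexical": ["title^3", "description^2"],
    "semantic": ["title_semantic", "description_semantic^2"],
})
for field, share in shares.items():
    print(f"{field}: {share:.1%}")
# title: 30.0%
# description: 20.0%
# title_semantic: 16.7%
# description_semantic: 33.3%
```

Because each boost is divided by its group's boost total, doubling every boost in a group leaves the shares unchanged; only the ratios between boosts within a group matter.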
### Wildcard field patterns [multi-field-wildcard-field-patterns]

Field names support the `*` wildcard character to match multiple fields:

```console
GET books/_search
{
  "retriever": {
    "rrf": {
      "query": "machine learning",
      "fields": [
        "title*", <1>
        "*_text" <2>
      ]
    }
  }
}
```

1. Match fields that start with `title`
2. Match fields that end with `_text`

Note, however, that wildcard field patterns will only resolve to fields that either:

- Support term queries, such as `keyword` and `text` fields
- Are `semantic_text` fields

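A sketch of how wildcard patterns could be resolved against a mapping under the rules above: `*` matches any run of characters, and only lexical or `semantic_text` fields are kept. The pattern is translated to a regular expression by hand because Python's `fnmatch` would also treat `?` and `[...]` as wildcards, which these docs do not mention. The type sets and function names are hypothetical; this is not Elasticsearch's field-resolution code.

```python
import re
from typing import Dict, List

# Hypothetical type sets: the docs name `keyword` and `text` only as examples.
LEXICAL_TYPES = {"text", "keyword"}
SEMANTIC_TYPES = {"semantic_text"}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a field pattern in which only `*` is special (any run of characters)."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def resolve_fields(patterns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return mapped fields that match any pattern and are lexical or semantic_text."""
    regexes = [pattern_to_regex(p) for p in patterns]
    eligible = LEXICAL_TYPES | SEMANTIC_TYPES
    return [
        name
        for name, field_type in mapping.items()
        if field_type in eligible and any(rx.fullmatch(name) for rx in regexes)
    ]


mapping = {
    "title": "text",
    "title_semantic": "semantic_text",
    "title_vector": "dense_vector",  # matches `title*` but is skipped by type
    "body_text": "text",
    "summary": "keyword",            # eligible type, but matches no pattern
}
print(resolve_fields(["title*", "*_text"], mapping))
# ['title', 'title_semantic', 'body_text']
```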
### Limitations

- **Single index**: Multi-field queries currently work with single-index searches only.
- **Cross-cluster search (CCS)**: Multi-field queries do not support remote cluster searches.

### Examples

- [RRF with the multi-field query format](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-rrf-multi-field-query-format)
- [Linear retriever with the multi-field query format](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-linear-multi-field-query-format)

## Common usage guidelines [retriever-common-parameters]

Review comments:

Should this warning go under the Normalizers section instead of params? Or be duplicated?

Ah nevermind, I see you have it down under the field boosting section, I think 3x might be overkill 😉 But we could consider moving it. I'll leave that up to you.

I think it's good where it is. Also, I'm reticent to include the qualifier of "when performing hybrid search" because:

We could add general advice about when normalization is required (outside of the multi-field query format), but this linear retriever example already does that (and we link to it on this page), so it seems repetitive.