Implement support for weighted rrf #130658

Open
wants to merge 38 commits into
base: main
from

Conversation

mridula-s109
Contributor

@mridula-s109 mridula-s109 commented Jul 4, 2025

Implement support for weighted RRF

Summary

This PR implements support for weighted RRF (Reciprocal Rank Fusion) retrievers, allowing users to specify custom weights for each sub-retriever within an RRF retriever configuration. This addresses a common customer request to customize the influence of different retrievers in the RRF scoring process.

Core Implementation

  • Enhanced RRFRetrieverBuilder: Extended to support both weighted and non-weighted retriever configurations
  • New RRFRetrieverComponent: Added weight validation and handling for individual retrievers
  • Backward Compatibility: Maintains support for existing RRF configurations without weights (default weight = 1.0)
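A minimal sketch of the defaulting and validation described above, not the PR's actual code. `WeightedComponentSketch` and its string-valued retriever are hypothetical stand-ins; the default of 1.0 and the non-negative check are taken from this conversation.

```java
import java.util.Objects;

/** Hypothetical stand-in for a retriever/weight pair; not the PR's RRFRetrieverComponent. */
final class WeightedComponentSketch {
    static final float DEFAULT_WEIGHT = 1.0f;

    final String retrieverName;
    final float weight;

    WeightedComponentSketch(String retrieverName, Float weight) {
        this.retrieverName = Objects.requireNonNull(retrieverName, "retriever is required");
        // A missing weight falls back to the default, so configurations without weights keep working.
        this.weight = weight == null ? DEFAULT_WEIGHT : weight;
        if (this.weight < 0 || Float.isNaN(this.weight)) {
            throw new IllegalArgumentException("[weight] must be non-negative, found [" + this.weight + "]");
        }
    }

    public static void main(String[] args) {
        System.out.println(new WeightedComponentSketch("standard", null).weight); // default applies
        System.out.println(new WeightedComponentSketch("knn", 0.3f).weight);      // explicit weight kept
    }
}
```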

Key Features

1. Weighted Retriever Support

Users can now specify weights for individual retrievers:

{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "retriever": {
            "standard": { 
              "query": { "match": { "description": "pizza" } } 
            }
          },
          "weight": 0.7
        },
        {
          "retriever": {
            "knn": { 
              "field": "vector", 
              "query_vector": [1,2,3], 
              "k": 10 
            }
          },
          "weight": 0.3
        }
      ]
    }
  }
}
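To illustrate how such weights could influence the fused ranking, here is a rough sketch. It assumes the usual RRF formula with each retriever's contribution multiplied by its weight, i.e. score(d) = sum over retrievers of weight_i / (rank_constant + rank_i(d)). The PR description does not spell out the formula, so treat that as an assumption. `WeightedRrfSketch` and `fuse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedRrfSketch {
    /**
     * Fuses ranked lists of document ids. rankings.get(i) is the result list of
     * retriever i, best hit first; weights[i] scales that retriever's contribution.
     */
    static List<String> fuse(List<List<String>> rankings, float[] weights, int rankConstant) {
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < rankings.size(); i++) {
            List<String> ranking = rankings.get(i);
            for (int pos = 0; pos < ranking.size(); pos++) {
                int rank = pos + 1; // ranks are 1-based
                double contribution = weights[i] / (double) (rankConstant + rank);
                scores.merge(ranking.get(pos), contribution, Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by id to keep the output deterministic.
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }

    public static void main(String[] args) {
        // Two retrievers that disagree on the order of documents "a" and "b".
        List<List<String>> rankings = List.of(List.of("a", "b"), List.of("b", "a"));
        System.out.println(fuse(rankings, new float[] { 0.7f, 0.3f }, 60)); // first retriever dominates
        System.out.println(fuse(rankings, new float[] { 0.3f, 0.7f }, 60)); // second retriever dominates
    }
}
```

With the weights swapped, the document favoured by the more heavily weighted retriever moves to the top, which is the kind of ordering change a weighted request is meant to produce.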

@mridula-s109 mridula-s109 marked this pull request as draft July 4, 2025 23:39
@Mikep86 Mikep86 self-requested a review July 15, 2025 13:32
@mridula-s109 mridula-s109 force-pushed the SEARCH-1026-implement-support-for-weighted-rrf branch from f1eede5 to 0640099 Compare July 17, 2025 17:05
Contributor

@Mikep86 Mikep86 left a comment

Nice progress, this is looking better. The hybrid object parsing is definitely a beast.

Comment on lines +60 to +68
static final ConstructingObjectParser<RRFRetrieverComponent, RetrieverParserContext> PARSER = new ConstructingObjectParser<>(
    "rrf_component",
    false,
    (args, context) -> {
        RetrieverBuilder retrieverBuilder = (RetrieverBuilder) args[0];
        Float weight = (Float) args[1];
        return new RRFRetrieverComponent(retrieverBuilder, weight);
    }
);
Contributor

Where do we actually use this?

Contributor Author

@mridula-s109 mridula-s109 Jul 25, 2025

It is used in the static method:
public static RRFRetrieverComponent fromXContent(XContentParser parser, RetrieverParserContext context) which typically calls:
return PARSER.apply(parser, context);

Contributor

I get that's where it's normally used, but where is it actually used in this implementation? RRFRetrieverComponent#fromXContent, in its current form, doesn't use it.

Comment on lines +96 to +139
if (RETRIEVER_FIELD.match(firstFieldName, parser.getDeprecationHandler())
    || WEIGHT_FIELD.match(firstFieldName, parser.getDeprecationHandler())) {
    // This is a structured component - parse manually
    RetrieverBuilder retriever = null;
    Float weight = null;

    do {
        String fieldName = parser.currentName();
        if (RETRIEVER_FIELD.match(fieldName, parser.getDeprecationHandler())) {
            if (retriever != null) {
                throw new ParsingException(parser.getTokenLocation(), "only one retriever can be specified");
            }
            parser.nextToken();
            parser.nextToken();
            String retrieverType = parser.currentName();
            retriever = parser.namedObject(RetrieverBuilder.class, retrieverType, context);
            context.trackRetrieverUsage(retriever.getName());
            parser.nextToken();
        } else if (WEIGHT_FIELD.match(fieldName, parser.getDeprecationHandler())) {
            if (weight != null) {
                throw new ParsingException(parser.getTokenLocation(), "[weight] field can only be specified once");
            }
            parser.nextToken();
            weight = parser.floatValue();
        } else {
            if (retriever != null) {
                throw new ParsingException(parser.getTokenLocation(), "only one retriever can be specified");
            }
            throw new ParsingException(
                parser.getTokenLocation(),
                "unknown field [{}], expected [{}] or [{}]",
                fieldName,
                RETRIEVER_FIELD.getPreferredName(),
                WEIGHT_FIELD.getPreferredName()
            );
        }
    } while (parser.nextToken() == XContentParser.Token.FIELD_NAME);

    if (retriever == null) {
        throw new ParsingException(parser.getTokenLocation(), "retriever component must contain a retriever");
    }

    return new RRFRetrieverComponent(retriever, weight);
} else {
Contributor

I understand this is complex, but is there an opportunity to use a ConstructingObjectParser here once we know this is a structured component? @ioanatia WDYT?

Contributor Author

I am a bit skeptical of that change; let's see what @ioanatia thinks as well.

Contributor

@Mikep86 Mikep86 left a comment

A bit of cleanup left, but this is coming along 👍

retrievers.add(retriever.retrieverSource());
}
float[] weights = new float[retrievers.size()];
Arrays.fill(weights, 1.0f);
Contributor

We don't need to hard-code a weight of 1.0 here. The weight validator ensures that every WeightedRetrieverSource's weight is 1.0, so we can use that to populate the weights array.
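One way to read this suggestion, as a rough sketch only: copy each source's own weight into the array instead of filling it with a constant. The `WeightedSource` record below is a hypothetical stand-in for WeightedRetrieverSource, and `weightsOf` is an illustrative name, not a method from the PR.

```java
import java.util.Arrays;
import java.util.List;

final class WeightsFromSourcesSketch {
    /** Hypothetical stand-in for a retriever source that carries its own weight. */
    record WeightedSource(String name, float weight) {}

    /** Builds the weights array from the sources themselves rather than a hard-coded value. */
    static float[] weightsOf(List<WeightedSource> sources) {
        float[] weights = new float[sources.size()];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = sources.get(i).weight();
        }
        return weights;
    }

    public static void main(String[] args) {
        List<WeightedSource> sources = List.of(new WeightedSource("standard", 1.0f), new WeightedSource("knn", 1.0f));
        System.out.println(Arrays.toString(weightsOf(sources)));
    }
}
```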

Comment on lines -253 to +301
).stream().map(RetrieverSource::from).toList();
);

if (fieldsInnerRetrievers.isEmpty() == false) {
// TODO: This is a incomplete solution as it does not address other incomplete copy issues
// (such as dropping the retriever name and min score)
rewritten = new RRFRetrieverBuilder(fieldsInnerRetrievers, rankWindowSize, rankConstant);
int size = fieldsInnerRetrievers.size();
List<RetrieverSource> sources = new ArrayList<>(size);
float[] weights = new float[size];
Arrays.fill(weights, RRFRetrieverComponent.DEFAULT_WEIGHT);
for (int i = 0; i < size; i++) {
sources.add(RetrieverSource.from(fieldsInnerRetrievers.get(i)));
weights[i] = RRFRetrieverComponent.DEFAULT_WEIGHT;
}
rewritten = new RRFRetrieverBuilder(sources, null, null, rankWindowSize, rankConstant, weights);
Contributor

Not a functional difference, but we could clean this up a bit and remove some duplicated logic:

  • Keep .stream().map(RetrieverSource::from).toList() to build a list of RetrieverSources. We need that anyways.
  • Use createDefaultWeights to create the weights array.
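The two points above might look roughly like the following. This is a sketch under assumptions: the signature of createDefaultWeights is guessed (a size in, an array of default weights out), and `RetrieverSource` here is a simplified stand-in reduced to a name.

```java
import java.util.Arrays;
import java.util.List;

final class RewriteCleanupSketch {
    static final float DEFAULT_WEIGHT = 1.0f;

    /** Hypothetical stand-in for RetrieverSource, reduced to a name. */
    record RetrieverSource(String retriever) {
        static RetrieverSource from(String retriever) {
            return new RetrieverSource(retriever);
        }
    }

    /** Assumed shape of the helper the review mentions: one default weight per source. */
    static float[] createDefaultWeights(int size) {
        float[] weights = new float[size];
        Arrays.fill(weights, DEFAULT_WEIGHT);
        return weights;
    }

    public static void main(String[] args) {
        List<String> fieldsInnerRetrievers = List.of("standard", "knn");
        // Keep the stream mapping for the sources; let the helper build the weights.
        List<RetrieverSource> sources = fieldsInnerRetrievers.stream().map(RetrieverSource::from).toList();
        float[] weights = createDefaultWeights(sources.size());
        System.out.println(sources.size() + " sources, weights " + Arrays.toString(weights));
    }
}
```

This removes both the manual loop and the redundant `Arrays.fill` followed by per-index assignment seen in the hunk.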

Comment on lines +65 to +67
for (int i = 0; i < innerRetrievers.size(); i++) {
weights[i] = randomFloat();
}
Contributor

Nit: we could populate the weights array in the while (retrieverCount > 0) loop

Comment on lines +228 to +254
public void testRRFRetrieverParsingWithDefaultWeights() throws IOException {
    String restContent = """
        {
          "retriever": {
            "rrf": {
              "retrievers": [
                {
                  "test": {
                    "value": "first"
                  }
                },
                {
                  "test": {
                    "value": "second"
                  }
                }
              ],
              "rank_window_size": 100,
              "rank_constant": 10,
              "min_score": 20.0,
              "_name": "foo_rrf"
            }
          }
        }
        """;
    checkRRFRetrieverParsing(restContent);
}
Contributor

How is this test functionally different than testRRFRetrieverParsing?

}
""";

expectParsingException(negativeWeightContent, "weight] must be non-negative");
Contributor

Nit: Missing a [ here

Comment on lines +101 to +106
try (XContentParser parser = createParser(JsonXContent.jsonXContent, legacyJson)) {
SearchSourceBuilder ssb = new SearchSourceBuilder().parseXContent(parser, true, nf -> true);
assertThat(ssb.retriever(), instanceOf(RRFRetrieverBuilder.class));
RRFRetrieverBuilder rrf = (RRFRetrieverBuilder) ssb.retriever();
assertArrayEquals(new float[] { 1.0f, 1.0f }, rrf.weights(), 0.001f);
}
Contributor

Nit: we could factor this duplicated code out. It doesn't necessarily need to be refactored into a named method, we could make a local BiConsumer (or CheckedBiConsumer) that takes the JSON string and the expected weight array.
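A sketch of what such a local helper could look like. Everything here is illustrative: `parseWeights` is a toy stand-in for the real SearchSourceBuilder parsing (it only collects explicit "weight" values and does not model defaults), and the `CheckedBiConsumer` interface is declared locally so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WeightAssertionHelperSketch {
    /** Minimal local equivalent of a two-argument consumer that may throw a checked exception. */
    @FunctionalInterface
    interface CheckedBiConsumer<T, U, E extends Exception> {
        void accept(T t, U u) throws E;
    }

    private static final Pattern WEIGHT = Pattern.compile("\"weight\"\\s*:\\s*([0-9.]+)");

    /** Toy stand-in for real request parsing: collects the "weight" values in order. */
    static float[] parseWeights(String json) {
        List<Float> found = new ArrayList<>();
        Matcher matcher = WEIGHT.matcher(json);
        while (matcher.find()) {
            found.add(Float.parseFloat(matcher.group(1)));
        }
        float[] weights = new float[found.size()];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = found.get(i);
        }
        return weights;
    }

    static void runChecks() throws Exception {
        // One local lambda replaces the repeated parse-and-assert blocks.
        CheckedBiConsumer<String, float[], Exception> checkWeights = (json, expected) -> {
            float[] actual = parseWeights(json);
            if (actual.length != expected.length) {
                throw new AssertionError("expected " + expected.length + " weights, got " + actual.length);
            }
            for (int i = 0; i < expected.length; i++) {
                if (Math.abs(actual[i] - expected[i]) > 0.001f) {
                    throw new AssertionError("weight " + i + " was " + actual[i]);
                }
            }
        };
        checkWeights.accept("{\"weight\": 0.7}, {\"weight\": 0.3}", new float[] { 0.7f, 0.3f });
        checkWeights.accept("{\"weight\": 2}", new float[] { 2.0f });
    }

    public static void main(String[] args) throws Exception {
        runChecks();
        System.out.println("all weight checks passed");
    }
}
```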

- match: { hits.hits.0._id: "1" }

---
"Weighted RRF retriever defaults to weight 1":
Contributor

@Mikep86 Mikep86 Jul 25, 2025

Very big nit here, but this test name is misleading. It doesn't actually check that the default weight is one (testRRFRetrieverParsingSyntax does that). It only checks that weight is optional. Maybe change the test name to be more accurate?

Contributor

Could we add a test where we show that we can use weight to boost a document in the result set? In other words, show that by changing weight we can change the result order in an expected way.

Labels
>enhancement
:SearchOrg/Relevance (Label for the Search (solution/org) Relevance team)
Team:Search - Relevance (The Search organization Search Relevance team)
Team:SearchOrg (Meta label for the Search Org (Enterprise Search))
v9.2.0
4 participants