Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. #14729

vigyasharma · 2025-05-29T00:12:33Z

Late Interaction models, like ColBERT and ColPali, capture rich semantic interaction between documents and queries, and have been shown to outperform single-vector (no-interaction) models on search relevance. These models operate by using multi-vector representations for query (and document) embeddings.

One challenge with including late interaction models in search has been working with multi-vectors at scale. This change provides an efficient workaround by adding support for reranking the results of a query using late interaction multi-vectors.

The typical envisioned use case is to run the full-corpus search using ANN search on single-valued vectors, followed by a second pass that reranks the results using late-interaction multi-vector scores. This PR creates:

A LateInteractionField that stores multi-vectors in BinaryDocValues
A DoubleValuesSource that scores query and document multi-vectors.
A FunctionScore query that wraps a provided query and reranks its result with late-interaction model scores.
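
For readers unfamiliar with late-interaction scoring, here is a minimal sketch of a sum-of-max-similarity ("sumMaxSim") score in plain Java. It assumes dot product as the token-level similarity; the class and method names are illustrative and are not taken from this PR's code.

```java
/** Illustrative sketch only; names are hypothetical, not Lucene's API. */
final class SumMaxSimSketch {

  /** Dot product of two token vectors; both must have the same dimension. */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " vs " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * For each query token vector, find its best-matching document token
   * vector, then sum those best-match similarities over all query tokens.
   */
  static float sumMaxSim(float[][] query, float[][] doc) {
    if (doc.length == 0) {
      throw new IllegalArgumentException("document multi-vector is empty");
    }
    float total = 0f;
    for (float[] q : query) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : doc) {
        best = Math.max(best, dot(q, d));
      }
      total += best;
    }
    return total;
  }
}
```

The cost is proportional to (query tokens) x (document tokens) x (dimension) per document, which is why this kind of score is usually applied only to a small candidate set.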

Note: This first approach does not add additional metadata to FieldInfo. As a result, we are unable to ensure consistency in shape for multi-vectors indexed in the same field across documents.
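
To illustrate how a multi-vector could be carried in a single binary value, here is a rough sketch that packs a float[][] into a byte[] with a leading dimension header, so each stored value describes its own shape. The layout and all names are assumptions made for illustration; this thread does not specify the encoding LateInteractionField actually uses.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoding sketch; not the PR's actual byte layout. */
final class MultiVectorCodecSketch {

  /** Layout assumed here: [int dimension][float values of each token vector, in order]. */
  static byte[] encode(float[][] vectors) {
    if (vectors.length == 0 || vectors[0].length == 0) {
      throw new IllegalArgumentException("multi-vector must be non-empty");
    }
    int dim = vectors[0].length;
    ByteBuffer buf =
        ByteBuffer.allocate(Integer.BYTES + vectors.length * dim * Float.BYTES);
    buf.putInt(dim);
    for (float[] v : vectors) {
      // Shape is checked per value, since no field-level metadata enforces it.
      if (v.length != dim) {
        throw new IllegalArgumentException(
            "inconsistent dimension: expected " + dim + ", got " + v.length);
      }
      for (float x : v) {
        buf.putFloat(x);
      }
    }
    return buf.array();
  }

  /** Reads the dimension header, then rebuilds the token vectors. */
  static float[][] decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int dim = buf.getInt();
    if (dim <= 0) {
      throw new IllegalArgumentException("invalid dimension: " + dim);
    }
    int count = buf.remaining() / (dim * Float.BYTES);
    float[][] out = new float[count][dim];
    for (float[] v : out) {
      for (int i = 0; i < dim; i++) {
        v[i] = buf.getFloat();
      }
    }
    return out;
  }
}
```

With a self-describing value like this, a mismatch between query and document dimensions can only be detected at scoring time, which matches the limitation described in the note.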

github-actions · 2025-05-29T00:13:28Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

vigyasharma · 2025-05-29T00:16:13Z

This change builds on the work shared here by @jimczi, thanks Jim!

benwtrent · 2025-05-29T12:44:54Z

I like this idea and it's a good starting point for allowing late-interaction brute-force ranking (e.g., reranking).

romseygeek · 2025-05-30T15:57:33Z

The typical envisioned use case is to run the full-corpus search using ANN search on single-valued vectors, followed by a second pass that reranks the results using late-interaction multi-vector scores.

This sounds like it would fit nicely with the Rescorer infrastructure? Looking more closely, it seems that Rescorers don't currently support adjusting scores via a DoubleValuesSource, but this shouldn't be too tricky to add (and indeed, the existing Rescorer implementations could quite easily be reworked to use DoubleValuesSource everywhere which would cut down on a lot of code duplication).

vigyasharma · 2025-06-01T06:47:10Z

This sounds like it would fit nicely with the Rescorer infrastructure?

I wasn't aware of Query Rescorers, thanks @romseygeek for pointing me in that direction. At the outset, it seems like we could create a DoubleValuesSourceRescorer extends Rescorer – that rescores based on the DoubleValuesSource instead of a second query.

However, this is exactly what FunctionScoreQuery does, and the benefits of one over the other are not immediately obvious to me. I see that FunctionScoreQuery, being a Query, exposes a SegmentCacheable weight, which could be useful? Though I believe we only cache hits (disi) and not scores, so maybe it doesn't matter for this use case?

Do you see any benefits to using Rescorer over FunctionScoreQuery?

romseygeek · 2025-06-01T15:18:28Z

The advantage of a Rescorer is that it is explicitly only run over the hits in a TopDocs instance, whereas FunctionScoreQuery will run over the entire docid space if you let it. So it's a natural fit for a late-interaction search process - run your first query over the whole document set to get a preliminary top-k, and then pass the resulting TopDocs to your rescorer.
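
The two-pass flow described above can be sketched in plain Java as follows. This is not Lucene's Rescorer API: Hit is a hypothetical stand-in for a first-pass result, and the second-pass score is supplied as an arbitrary function of the doc id. The point is only that the expensive function is evaluated once per first-pass hit and never over the whole docid space.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

/** Illustrative sketch of rescoring a first-pass top-k; names are hypothetical. */
final class TopKRescoreSketch {

  /** A (doc id, score) pair standing in for one first-pass hit. */
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Applies the second-pass score to the first-pass hits only, then returns
   * the best topN by the new score (ties broken by ascending doc id).
   */
  static Hit[] rescore(Hit[] firstPass, IntToDoubleFunction secondPassScore, int topN) {
    Hit[] rescored = new Hit[firstPass.length];
    for (int i = 0; i < firstPass.length; i++) {
      int doc = firstPass[i].doc;
      rescored[i] = new Hit(doc, secondPassScore.applyAsDouble(doc));
    }
    Arrays.sort(
        rescored,
        (a, b) -> {
          int byScore = Double.compare(b.score, a.score);
          return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
    return Arrays.copyOf(rescored, Math.min(topN, rescored.length));
  }
}
```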

vigyasharma · 2025-06-12T20:41:34Z

The advantage of a Rescorer is that it is explicitly only run over the hits in a TopDocs instance, whereas FunctionScoreQuery will run over the entire docid space if you let it.

Makes sense. I was only thinking of knn vector queries, which return just the top-N matches, but reranking can apply more generally to any query.

I've raised #14776 to add support for DoubleValuesSource based rescorers. Once that is merged, I'll modify this PR to use DoubleValuesSourceRescorer instead of a FunctionScoreQuery.

benwtrent · 2025-06-13T11:27:57Z

@vigyasharma I really like how this is evolving and how we are unifying these vector scoring providers. Great stuff!

mingshl · 2025-06-26T04:56:33Z

This is an exciting feature! I think it's a great idea to create a LateInteractionField that can store multi-vector values in documents. I have a question about search, though: looking at the function score query, it seems that a search request first runs a knn query, and then we fetch the top N documents for the MaxSim reranker.

In this case, we need to use a single vector for the knn query, and then use multi-vectors for reranking.

If we want to use a late interaction model for search, then after we get the multi-vectors from the model and try to construct the function score query:

first, we need a way to pool the multi-vector into a single vector and put it into the knn query;
second, we would put the multi-vector into the lateInteractionFloatRerankQuery.

I can see that this way we save a lot of computation by avoiding too many MaxSim calculations when the array of vectors is big.

I am wondering whether that is worth a new query type that can handle both pooling and reranking, basically the above two steps together.

lucene/lucene/queries/src/test/org/apache/lucene/queries/function/TestFunctionScoreQuery.java

Lines 447 to 452 in 8dc05f4

    
            Arrays.stream(knnHits.scoreDocs).map(k -> k.doc).collect(Collectors.toSet());
    FunctionScoreQuery lateIQuery =
        FunctionScoreQuery.lateInteractionFloatRerankQuery(
            knnQuery, LATE_I_FIELD, lateIQueryVector, vectorSimilarityFunction);
    TopDocs lateIHits = s.search(lateIQuery, 10);
    StoredFields storedFields = reader.storedFields();
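
As a rough illustration of the pooling step mentioned in this comment, here is a mean-pooling sketch in plain Java. The comment does not say which pooling strategy to use; element-wise averaging is an assumption chosen for simplicity, and the names are hypothetical.

```java
/** Illustrative sketch; mean pooling is an assumed strategy, not one named in the thread. */
final class MeanPoolSketch {

  /** Averages the token vectors element-wise into one vector of the same dimension. */
  static float[] meanPool(float[][] tokens) {
    if (tokens.length == 0) {
      throw new IllegalArgumentException("multi-vector is empty");
    }
    int dim = tokens[0].length;
    float[] pooled = new float[dim];
    for (float[] t : tokens) {
      if (t.length != dim) {
        throw new IllegalArgumentException(
            "inconsistent dimension: expected " + dim + ", got " + t.length);
      }
      for (int i = 0; i < dim; i++) {
        pooled[i] += t[i];
      }
    }
    for (int i = 0; i < dim; i++) {
      pooled[i] /= tokens.length;
    }
    return pooled;
  }
}
```

The pooled vector would feed the first-pass knn query, while the original multi-vector is kept for the rerank step.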

dungba88

I like this implementation! It's much simpler and decomposable than #13525.

dungba88 · 2025-06-26T04:29:22Z

lucene/core/src/java/org/apache/lucene/search/LateInteractionFloatValuesSource.java

+  }
+
+  /** Defines the function to compute similarity score between query and document multi-vectors */
+  public enum ScoreFunction {


I think this follows the VectorSimilarityFunction convention of defining the function as an enum. But have we thought about allowing a @FunctionalInterface as a parameter, to allow custom scoring without having to modify Lucene core? It's not a strong opinion, as I guess people can still define their own DoubleValuesSource (but may have to duplicate this code).
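
A sketch of what the functional-interface alternative might look like, in plain Java. MultiVectorScorer and MAX_OF_MAX are hypothetical names invented for this illustration; nothing like them is confirmed to exist in the PR.

```java
/** Hypothetical interface; callers could pass a lambda instead of picking an enum constant. */
@FunctionalInterface
interface MultiVectorScorer {
  float score(float[][] query, float[][] doc);
}

/** Illustrative sketch of a custom scoring function supplied from outside the library. */
final class CustomScorerSketch {

  /** Example custom function: the single best token-to-token dot product. */
  static final MultiVectorScorer MAX_OF_MAX =
      (query, doc) -> {
        float best = Float.NEGATIVE_INFINITY;
        for (float[] q : query) {
          for (float[] d : doc) {
            float dot = 0f;
            // Assumes q and d have the same dimension; a real implementation would check.
            for (int i = 0; i < q.length; i++) {
              dot += q[i] * d[i];
            }
            best = Math.max(best, dot);
          }
        }
        return best;
      };
}
```

The trade-off is the usual one: an enum keeps the set of functions closed and easy to serialize or compare, while an interface lets users plug in their own scoring without changes to the library.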

dungba88 · 2025-06-26T11:34:12Z

lucene/core/src/java/org/apache/lucene/search/LateInteractionFloatValuesSource.java

+            if (q.length != d.length) {
+              throw new IllegalArgumentException(
+                  "Provided multi-vectors are incompatible. "
+                      + "Their composing token vectors should have the same dimension, got "


minor: maybe we can say got query dimension = ..., document dimension = ...
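
One way the suggested message could read, as a standalone sketch. The helper name is hypothetical, and the wording after "got" is a guess at what the reviewer has in mind.

```java
/** Illustrative sketch of the more explicit error message; the helper name is hypothetical. */
final class DimensionCheckSketch {

  /** Throws if the query and document token vectors differ in dimension. */
  static void checkDimensions(float[] q, float[] d) {
    if (q.length != d.length) {
      throw new IllegalArgumentException(
          "Provided multi-vectors are incompatible. "
              + "Their composing token vectors should have the same dimension, got "
              + "query dimension = "
              + q.length
              + ", document dimension = "
              + d.length);
    }
  }
}
```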

vigyasharma · 2025-06-26T18:35:54Z

@mingshl you're right, this only adds support to rerank the results of an initial query using late interaction multi-vectors. We would typically need to first run a knn vector query on some indexed vector field, then rerank results using the late interaction field.

...

I am wondering whether that is worth a new query type that can handle both pooling and reranking, basically the above two steps together.

This is an interesting idea. We would also have to index the pooled vector values. So essentially have two fields in Lucene, a KnnFloatVectorField that indexes the pooled single-vector value into an hnsw graph for searching, and another LateInteractionField that stores the un-pooled multi-vectors for reranking.

It seems useful to provide convenience wrappers such that users only provide the late-interaction multi-vectors, and they are internally pooled, indexed, stored and queried through the two fields. I'm not sure if this should be in Lucene, or in higher level layers like OpenSearch/Elasticsearch/Solr. We should pick this up in a spin-off issue.

mingshl · 2025-06-26T18:58:45Z

@mingshl you're right, this only adds support to rerank the results of an initial query using late interaction multi-vectors. We would typically need to first run a knn vector query on some indexed vector field, then rerank results using the late interaction field.

I am wondering whether that is worth a new query type that can handle both pooling and reranking, basically the above two steps together.

This is an interesting idea. We would also have to index the pooled vector values. So essentially have two fields in Lucene, a KnnFloatVectorField that indexes the pooled single-vector value into an hnsw graph for searching, and another LateInteractionField that stores the un-pooled multi-vectors for reranking.

It seems useful to provide convenience wrappers such that users only provide the late-interaction multi-vectors, and they are internally pooled, indexed, stored and queried through the two fields. I'm not sure if this should be in Lucene, or in higher level layers like OpenSearch/Elasticsearch/Solr. We should pick this up in a spin-off issue.

Agreed that the pooling, indexing, storing and querying should be the work of a higher-level layer. A new query type and an ingest processor would help in this case. Let's move the discussion to a separate RFC: opensearch-project/OpenSearch#18091

vigyasharma added 12 commits May 28, 2025 16:30

initial late interaction field

84cc999

impl for LateI values source, query and field

b235bbb

tests for late interaction field

0f0e544

LateI values source test and tidy

2d5525a

remove some null tests

12dfb10

test sumMaxSim score fn

bb79d09

late I query test

22d96cf

missing docstring

015116e

rename files

70c4812

docstring fix

a132aec

add lateI value source docString

f7dbefd

tidy

7e0f13a

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking May 29, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking May 29, 2025

github-actions bot added module:core/search module:queries labels May 29, 2025

changes entry

8dc05f4

github-actions bot added this to the 10.3.0 milestone May 29, 2025

vigyasharma mentioned this pull request Jun 12, 2025

Add a rescorer that uses DoubleValuesSource values to re-score first pass hits #14776

Merged

vigyasharma mentioned this pull request Jun 25, 2025

[FEATURE][RFC] Introduce MultiVector Field Type For Late-interaction Score opensearch-project/k-NN#2706

Open

dungba88 reviewed Jun 26, 2025

mingshl mentioned this pull request Jun 26, 2025

[Feature Request] Support for multi-stage retriever and re-ranker in OpenSearch to use late interaction embedding models like ColBert, ColPali etc. opensearch-project/OpenSearch#18091

Open
