Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. #14729

Open
wants to merge 13 commits into
base: main

vigyasharma
Contributor

Late Interaction models, like ColBERT and ColPali, capture rich semantic interaction between documents and queries, and have been shown to outperform single-vector (no-interaction) models on search relevance. These models operate by using multi-vector representations for query (and document) embeddings.

One challenge with including late interaction models in search has been working with multi-vectors at scale. This change provides an efficient workaround by adding support for reranking the results of a query using late interaction multi-vectors.

The typical envisioned use case is to search the full corpus with ANN search on single-valued vectors, followed by a second pass that reranks the results using late-interaction multi-vector scores. This PR creates:

  1. A LateInteractionField that stores multi-vectors in BinaryDocValues
  2. A DoubleValuesSource that scores query and document multi-vectors.
  3. A FunctionScore query that wraps a provided query and reranks its result with late-interaction model scores.
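
For readers unfamiliar with late-interaction scoring, here is a minimal plain-Java sketch of the ColBERT-style "sum of MaxSim" score: for each query token vector, take the best dot product against any document token vector, then sum those maxima. It is illustrative only. The class and method names are hypothetical and not the PR's API, and dot product is assumed as the per-token similarity.

```java
/** Illustrative sketch only; names are hypothetical, not the PR's API. */
final class MaxSimSketch {

  /** Dot product of two token vectors of equal dimension. */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " vs " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Sums, over query token vectors, the best dot product against any document token vector. */
  static float sumMaxSim(float[][] query, float[][] doc) {
    if (doc.length == 0) {
      throw new IllegalArgumentException("document multi-vector is empty");
    }
    float total = 0f;
    for (float[] q : query) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : doc) {
        best = Math.max(best, dot(q, d));
      }
      total += best;
    }
    return total;
  }
}
```

The cost is proportional to (query tokens) x (document tokens) x dimension per document, which is why this is applied as a rerank over a small candidate set rather than over the whole corpus.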

Note: This first approach does not add any metadata to FieldInfo. As a result, we cannot enforce a consistent shape for multi-vectors indexed in the same field across documents.
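
To make the storage idea concrete, here is one way a multi-vector could be packed into a single binary value. This is an assumption for illustration: the description only says the field stores multi-vectors in BinaryDocValues and does not show the actual encoding. In this hypothetical layout each value carries its own token dimension as a header, so a reader can recover the shape without any field-level metadata.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; the layout and names are assumptions, not the PR's format. */
final class MultiVectorCodecSketch {

  /** Packs [dimension (int)] followed by all token vectors as little-endian floats. */
  static byte[] encode(float[][] vectors) {
    if (vectors.length == 0 || vectors[0].length == 0) {
      throw new IllegalArgumentException("multi-vector must be non-empty");
    }
    int dim = vectors[0].length;
    ByteBuffer buf =
        ByteBuffer.allocate(Integer.BYTES + vectors.length * dim * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(dim);
    for (float[] v : vectors) {
      if (v.length != dim) {
        throw new IllegalArgumentException(
            "all token vectors must share one dimension, got " + dim + " and " + v.length);
      }
      for (float f : v) {
        buf.putFloat(f);
      }
    }
    return buf.array();
  }

  /** Reads the dimension header, then derives the token count from the remaining bytes. */
  static float[][] decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    int dim = buf.getInt();
    if (dim <= 0) {
      throw new IllegalArgumentException("invalid dimension header: " + dim);
    }
    int count = buf.remaining() / (dim * Float.BYTES);
    float[][] out = new float[count][dim];
    for (int i = 0; i < count; i++) {
      for (int j = 0; j < dim; j++) {
        out[i][j] = buf.getFloat();
      }
    }
    return out;
  }
}
```

A per-value header keeps each document self-describing, but it does not by itself stop two documents in the same field from using different dimensions, which is the limitation the note above describes.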

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions github-actions bot added this to the 10.3.0 milestone May 29, 2025
@vigyasharma
Contributor Author

This change builds on the work shared here by @jimczi, thanks Jim!

@benwtrent
Member

I like this idea, and it's a good starting point for allowing late-interaction brute-force ranking (e.g., reranking).

@romseygeek
Contributor

> Typical envisioned use-case is to do the full corpus search using ANN search on single-valued vectors, followed by a second pass that reranks results using late-interaction multi-vector scores.

This sounds like it would fit nicely with the Rescorer infrastructure? Looking more closely, it seems that Rescorers don't currently support adjusting scores via a DoubleValuesSource, but this shouldn't be too tricky to add (and indeed, the existing Rescorer implementations could quite easily be reworked to use DoubleValuesSource everywhere which would cut down on a lot of code duplication).

@vigyasharma
Contributor Author

> This sounds like it would fit nicely with the Rescorer infrastructure?

I wasn't aware of Query Rescorers, thanks @romseygeek for pointing me in that direction. At the outset, it seems like we could create a DoubleValuesSourceRescorer extends Rescorer – that rescores based on the DoubleValuesSource instead of a second query.

However, this is exactly what FunctionScoreQuery does, and the benefits of one over the other are not immediately obvious to me. I see that FunctionScoreQuery, being a Query, exposes a SegmentCacheable weight, which could be useful? Though I believe we only cache hits (disi) and not scores, so maybe it doesn't matter for this use case?

Do you see any benefits to using Rescorer over FunctionScoreQuery?

@romseygeek
Contributor

The advantage of a Rescorer is that it is explicitly only run over the hits in a TopDocs instance, whereas FunctionScoreQuery will run over the entire docid space if you let it. So it's a natural fit for a late-interaction search process: run your first query over the whole document set to get a preliminary top-k, and then pass the resulting TopDocs to your rescorer.
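
The two-pass flow described here can be sketched in plain Java without any Lucene types. This is only an illustration of the idea, under the assumption that the second-pass scorer is an arbitrary function of the doc id. `Hit` and `rescore` are hypothetical names, not Lucene's Rescorer API. The point is that the expensive function is called once per first-pass hit and never for other documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Illustrative sketch only; these are hypothetical names, not Lucene classes. */
final class TopKRescoreSketch {

  /** A (doc id, score) pair, standing in for an entry of a first-pass result list. */
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Rescores only the first-pass hits, then returns the best topN by the new score. */
  static List<Hit> rescore(List<Hit> firstPass, IntToDoubleFunction secondPass, int topN) {
    List<Hit> rescored = new ArrayList<>(firstPass.size());
    for (Hit hit : firstPass) {
      // The expensive scorer runs once per candidate, never over the full doc id space.
      rescored.add(new Hit(hit.doc, secondPass.applyAsDouble(hit.doc)));
    }
    // Highest new score first; ties broken by ascending doc id for determinism.
    rescored.sort(
        (a, b) -> {
          int byScore = Double.compare(b.score, a.score);
          return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
    int limit = Math.max(0, Math.min(topN, rescored.size()));
    return new ArrayList<>(rescored.subList(0, limit));
  }
}
```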

@vigyasharma
Contributor Author

> The advantage of a Rescorer is that it is explicitly only run over the hits in a TopDocs instance, whereas FunctionScoreQuery will run over the entire docid space if you let it.

Makes sense. I was only thinking of knn vector queries, which return just the top-N matches, but reranking can apply more generally to any query.

I've raised #14776 to add support for DoubleValuesSource based rescorers. Once that is merged, I'll modify this PR to use DoubleValuesSourceRescorer instead of a FunctionScoreQuery.

@benwtrent
Member

@vigyasharma I really like how this is evolving and how we are unifying these vector scoring providers. Great stuff!

@mingshl

mingshl commented Jun 26, 2025

This is an exciting feature! I think it's a great idea to create a LateInteractionField that can store multi-vector values in documents. I have a question about search, though: looking at the FunctionScoreQuery, it seems that a search request first runs a knn query, and then the top N documents are fetched for the MaxSim reranker.

In this case, we need to use a single vector for knn query, and then use multi-vectors for reranking.

If we want to use a late interaction model for search, then after we get the multi-vectors from the model and construct the function score query:

first, we need a way to pool the multi-vector into a single vector and pass it to the knn query;
second, we pass the multi-vector to lateInteractionFloatRerankQuery.

This way, we avoid a large number of MaxSim calculations when the array of vectors is big.

I wonder whether this is worth a new query type that handles both pooling and reranking, basically the two steps above together.

Arrays.stream(knnHits.scoreDocs).map(k -> k.doc).collect(Collectors.toSet());
FunctionScoreQuery lateIQuery =
    FunctionScoreQuery.lateInteractionFloatRerankQuery(
        knnQuery, LATE_I_FIELD, lateIQueryVector, vectorSimilarityFunction);
TopDocs lateIHits = s.search(lateIQuery, 10);
StoredFields storedFields = reader.storedFields();
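
The pooling step mentioned in this comment is not specified further. As a rough sketch, assuming simple mean pooling (one of several possible choices, and not something the PR defines), collapsing a multi-vector into the single vector used for the knn query could look like this. `MeanPoolSketch` is a hypothetical name.

```java
/** Illustrative sketch only; mean pooling is an assumed choice, not part of the PR. */
final class MeanPoolSketch {

  /** Averages all token vectors component-wise into one vector of the same dimension. */
  static float[] meanPool(float[][] multiVector) {
    if (multiVector.length == 0) {
      throw new IllegalArgumentException("multi-vector is empty");
    }
    int dim = multiVector[0].length;
    float[] pooled = new float[dim];
    for (float[] token : multiVector) {
      if (token.length != dim) {
        throw new IllegalArgumentException(
            "all token vectors must share one dimension, got " + dim + " and " + token.length);
      }
      for (int i = 0; i < dim; i++) {
        pooled[i] += token[i];
      }
    }
    for (int i = 0; i < dim; i++) {
      pooled[i] /= multiVector.length;
    }
    return pooled;
  }
}
```

The pooled vector would go to the first-pass knn query, while the original multi-vector is kept for the rerank pass.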

Contributor

@dungba88 dungba88 left a comment

I like this implementation! It's much simpler and more decomposable than #13525.

}

/** Defines the function to compute similarity score between query and document multi-vectors */
public enum ScoreFunction {
Contributor

I think this follows the VectorSimilarityFunction convention of defining the function as an enum. But have we thought about accepting a @FunctionalInterface as a parameter, to allow custom scoring without having to modify Lucene core? It's not a strong opinion, as I guess people can still define their own DoubleValuesSource (though they may have to duplicate this code).
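
To illustrate the suggestion, here is a sketch of what a pluggable scoring function might look like. Everything here is hypothetical: `MultiVectorScorer`, `MEAN_MAX_DOT` and `rerankScore` are invented for the example and do not exist in Lucene or in this PR.

```java
/** Hypothetical interface; callers can supply any scoring function as a lambda. */
@FunctionalInterface
interface MultiVectorScorer {
  float score(float[][] query, float[][] doc);
}

/** Illustrative sketch only; none of these names are part of Lucene. */
final class CustomScorerSketch {

  /** Example custom function: the mean (rather than sum) of per-query-token max dot products. */
  static final MultiVectorScorer MEAN_MAX_DOT =
      (query, doc) -> {
        if (query.length == 0 || doc.length == 0) {
          throw new IllegalArgumentException("multi-vectors must be non-empty");
        }
        float total = 0f;
        for (float[] q : query) {
          float best = Float.NEGATIVE_INFINITY;
          for (float[] d : doc) {
            if (q.length != d.length) {
              throw new IllegalArgumentException(
                  "dimension mismatch: " + q.length + " vs " + d.length);
            }
            float dot = 0f;
            for (int i = 0; i < q.length; i++) {
              dot += q[i] * d[i];
            }
            best = Math.max(best, dot);
          }
          total += best;
        }
        return total / query.length;
      };

  /** Stand-in for a rerank entry point that accepts the function as a parameter. */
  static float rerankScore(MultiVectorScorer scorer, float[][] query, float[][] doc) {
    return scorer.score(query, doc);
  }
}
```

With this shape, a predefined enum could still implement the interface, so the built-in functions and user-supplied lambdas would share one code path.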

if (q.length != d.length) {
throw new IllegalArgumentException(
"Provided multi-vectors are incompatible. "
+ "Their composing token vectors should have the same dimension, got "
Contributor

minor: maybe we can say got query dimension = ..., document dimension = ...
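
As a sketch of the suggested wording, the check could name both sides explicitly. The surrounding method is hypothetical, and only the message text follows the snippet under review.

```java
/** Illustrative sketch only; the method name is hypothetical. */
final class DimensionCheckSketch {

  /** Throws if the query and document token vectors differ in dimension. */
  static void checkDimensions(float[] q, float[] d) {
    if (q.length != d.length) {
      throw new IllegalArgumentException(
          "Provided multi-vectors are incompatible. "
              + "Their composing token vectors should have the same dimension, got "
              + "query dimension = " + q.length
              + ", document dimension = " + d.length);
    }
  }
}
```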

@vigyasharma
Contributor Author

@mingshl you're right, this only adds support to rerank the results of an initial query using late interaction multi-vectors. We would typically need to first run a knn vector query on some indexed vector field, then rerank results using the late interaction field.

...

> I wonder whether this is worth a new query type that handles both pooling and reranking, basically the two steps above together.

This is an interesting idea. We would also have to index the pooled vector values. So essentially have two fields in Lucene, a KnnFloatVectorField that indexes the pooled single-vector value into an hnsw graph for searching, and another LateInteractionField that stores the un-pooled multi-vectors for reranking.

It seems useful to provide convenience wrappers such that users only provide the late-interaction multi-vectors, and they are internally pooled, indexed, stored and queried through the two fields. I'm not sure if this should be in Lucene, or in higher level layers like OpenSearch/Elasticsearch/Solr. We should pick this up in a spin-off issue.

@mingshl

mingshl commented Jun 26, 2025

> @mingshl you're right, this only adds support to rerank the results of an initial query using late interaction multi-vectors. We would typically need to first run a knn vector query on some indexed vector field, then rerank results using the late interaction field.

> I am thinking if that worths a new query type that can handle both pooling and rerank, basically the above two steps together.

> This is an interesting idea. We would also have to index the pooled vector values. So essentially have two fields in Lucene, a KnnFloatVectorField that indexes the pooled single-vector value into an hnsw graph for searching, and another LateInteractionField that stores the un-pooled multi-vectors for reranking.

> It seems useful to provide convenience wrappers such that users only provide the late-interaction multi-vectors, and they are internally pooled, indexed, stored and queried through the two fields. I'm not sure if this should be in Lucene, or in higher level layers like OpenSearch/Elasticsearch/Solr. We should pick this up in a spin-off issue.

Agreed that pooling, indexing, storing and querying should be the job of a higher-level layer. A new query type and an ingest processor would help in this case. Let's move the discussion to a separate RFC: opensearch-project/OpenSearch#18091

5 participants