Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. #14729
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
I like this idea, and it's a good starting point for allowing late-interaction brute-force ranking (e.g. reranking).
This sounds like it would fit nicely with the
I wasn't aware of Query Rescorers, thanks @romseygeek for pointing me in that direction. At the outset, it seems like we could create a However, this is exactly what the Do you see any benefits to using
The advantage of a
Makes sense. I was thinking of only the knn vector queries that return top-N matches only, but reranking can apply more generally to any query. I've raised #14776 to add support for
@vigyasharma I really like how this is evolving and how we are unifying these vector scoring providers. Great stuff!
This is an exciting feature! I think it's a great idea to create a In this case, we need to use a single vector for the knn query, and then use multi-vectors for reranking. If we want to use a late interaction model for search, then after we get the multi-vectors from the model and try to construct the function score query, we first need a way to pool the multi-vector into a single vector and put it into the knn query. This way, we can avoid a lot of MaxSim computation when the array of vectors is big. I wonder whether that is worth a new query type that handles both pooling and reranking, basically the above two steps together. lucene/lucene/queries/src/test/org/apache/lucene/queries/function/TestFunctionScoreQuery.java Lines 447 to 452 in 8dc05f4
I like this implementation! It's much simpler and decomposable than #13525.
}

/** Defines the function to compute similarity score between query and document multi-vectors */
public enum ScoreFunction {
I think this follows the VectorSimilarityFunction convention of defining the function as an enum. But have we thought about allowing a @FunctionalInterface as a parameter, to allow custom scoring without having to modify Lucene core? It's not a strong opinion, as I guess people can still define their own DoubleValueSource (but maybe will have to duplicate this code).
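To make the suggestion above concrete, here is a minimal sketch of what a caller-supplied scoring function could look like. The interface name `MultiVectorSimilarity`, its `compare` method, and the `CustomScoringSketch` class are hypothetical names invented for illustration; they are not part of this PR or of Lucene.

```java
/** Hypothetical functional interface for custom multi-vector scoring; not a Lucene API. */
@FunctionalInterface
interface MultiVectorSimilarity {
  /** Scores a query multi-vector against a document multi-vector. */
  float compare(float[][] queryVectors, float[][] docVectors);
}

/** Illustrative holder for a custom scoring function defined outside of core. */
final class CustomScoringSketch {
  private CustomScoringSketch() {}

  /** Plain dot product of two equal-length vectors. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * An example of a custom function a user might plug in: the single best
   * token-to-token dot product across all query and document token pairs.
   */
  static final MultiVectorSimilarity MAX_OF_MAX =
      (q, d) -> {
        float best = Float.NEGATIVE_INFINITY;
        for (float[] qv : q) {
          for (float[] dv : d) {
            best = Math.max(best, dot(qv, dv));
          }
        }
        return best;
      };
}
```

With a parameter of this shape, a new scoring variant would be a lambda at the call site rather than a new enum constant in core, at the cost of losing the closed, enumerable set of functions that the enum gives.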
if (q.length != d.length) {
throw new IllegalArgumentException(
"Provided multi-vectors are incompatible. "
+ "Their composing token vectors should have the same dimension, got "
minor: maybe we can say got query dimension = ..., document dimension = ...
lucene/core/src/java/org/apache/lucene/search/LateInteractionFloatValuesSource.java
@mingshl you're right, this only adds support to rerank the results of an initial query using late interaction multi-vectors. We would typically need to first run a knn vector query on some indexed vector field, then rerank results using the late interaction field. ...
This is an interesting idea. We would also have to index the pooled vector values. So essentially have two fields in Lucene, a It seems useful to provide convenience wrappers such that users only provide the late-interaction multi-vectors, and they are internally pooled, indexed, stored and queried through the two fields. I'm not sure if this should be in Lucene, or in higher level layers like OpenSearch/Elasticsearch/Solr. We should pick this up in a spin-off issue.
Agreed that pooling, indexing, storing and querying should be the work of a higher-level layer. A new query type and an ingest processor would help in this case. Let's move the discussion to a separate RFC: opensearch-project/OpenSearch#18091
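For readers following the pooling discussion, one possible pooling step is sketched below. The thread does not say which pooling function is intended; element-wise mean pooling is an assumption here, and `PoolingSketch.meanPool` is a hypothetical helper, not a Lucene or OpenSearch API.

```java
/** Illustrative only: collapses a multi-vector into one vector for a first-pass knn query. */
final class PoolingSketch {
  private PoolingSketch() {}

  /**
   * Element-wise mean of the token vectors (an assumed pooling choice).
   * All token vectors must share the same dimension.
   */
  static float[] meanPool(float[][] multiVector) {
    if (multiVector.length == 0) {
      throw new IllegalArgumentException("multi-vector must contain at least one token vector");
    }
    int dim = multiVector[0].length;
    float[] pooled = new float[dim];
    for (float[] tokenVector : multiVector) {
      if (tokenVector.length != dim) {
        throw new IllegalArgumentException(
            "token vectors must have the same dimension, got " + dim + " and " + tokenVector.length);
      }
      for (int i = 0; i < dim; i++) {
        pooled[i] += tokenVector[i];
      }
    }
    for (int i = 0; i < dim; i++) {
      pooled[i] /= multiVector.length;
    }
    return pooled;
  }
}
```

The pooled vector would be indexed in a regular single-valued vector field and used for the first-pass knn query, while the original multi-vector is kept in a separate field for reranking.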
Late Interaction models, like ColBERT and ColPali, capture rich semantic interaction between documents and queries, and have been shown to outperform single-vector (no-interaction) models on search relevance. These models operate by using multi-vector representations for query (and document) embeddings.
One challenge with including late interaction models in search has been working with multi-vectors at scale. This change provides an efficient workaround by adding support for reranking the results of a query using late interaction multi-vectors.
The typical envisioned use case is to run the full-corpus search as an ANN search on single-valued vectors, followed by a second pass that reranks the results using late-interaction multi-vector scores. This PR creates:
Note: This first approach does not add additional metadata to FieldInfo. As a result, we are unable to ensure consistency in shape for multi-vectors indexed in the same field across documents.
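As an illustration of the second pass described above, the sketch below scores each first-pass candidate with a sum-of-max dot product (the MaxSim-style score commonly associated with ColBERT) and re-sorts by that score. It is a standalone sketch under stated assumptions, not the code in this PR: `LateInteractionRerankSketch`, `sumMaxSim`, and `rerank` are hypothetical names, and the in-memory map stands in for reading multi-vectors from an index field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative second-pass reranker; not a Lucene API. */
final class LateInteractionRerankSketch {
  private LateInteractionRerankSketch() {}

  /** For each query token vector, take its best dot product over document tokens, then sum. */
  static float sumMaxSim(float[][] query, float[][] doc) {
    float total = 0f;
    for (float[] qv : query) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] dv : doc) {
        float dot = 0f;
        for (int i = 0; i < qv.length; i++) {
          dot += qv[i] * dv[i];
        }
        best = Math.max(best, dot);
      }
      total += best;
    }
    return total;
  }

  /**
   * Reorders first-pass candidate ids by descending late-interaction score.
   * docVectors is a stand-in for per-document multi-vectors read from the index.
   */
  static List<Integer> rerank(
      List<Integer> candidates, float[][] query, Map<Integer, float[][]> docVectors) {
    List<Integer> sorted = new ArrayList<>(candidates);
    sorted.sort(
        Comparator.comparingDouble((Integer id) -> sumMaxSim(query, docVectors.get(id)))
            .reversed());
    return sorted;
  }
}
```

Because the expensive multi-vector comparison runs only over the candidates returned by the first pass, its cost scales with the candidate count rather than with corpus size.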