-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Support semantic reranking using contextual snippets instead of entire field text #129369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💚 CLA has been signed |
cla/check |
...k/inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java
Show resolved
Hide resolved
|
||
import static org.elasticsearch.search.rank.feature.RerankSnippetConfig.DEFAULT_NUM_SNIPPETS; | ||
|
||
public class TextSimilarityRerankingRankFeaturePhaseRankShardContext extends RerankingRankFeaturePhaseRankShardContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the separation here between TextSimilarityRerankingRankFeaturePhaseRankShardContext
and RerankingRankFeaturePhaseRankShardContext
? I don't see RerankingRankFeaturePhaseRankShardContext
being used elsewhere so maybe we can just have TextSimilarityRerankingRankFeaturePhaseRankShardContext
and implement the full logic there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's a little extra, but it clearly conveys the fact that snippets only pertain to text similarity reranking and not other rerankers we may add in the future. If you feel strongly about it I will refactor back, but I don't think this is a bad thing per se.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just rename RerankingRankFeaturePhaseRankShardContext
into TextSimilarityRerankingRankFeaturePhaseRankShardContext
? I don't see what RerankingRankFeaturePhaseRankShardContext
provides as an abstract class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just rename RerankingRankFeaturePhaseRankShardContext into TextSimilarityRerankingRankFeaturePhaseRankShardContext? I don't see what RerankingRankFeaturePhaseRankShardContext provides as an abstract class.
@jimczi this class is referenced elsewhere including the random reranker. I personally think it's cleaner to have the separation, but if you feel strongly about it I will put snippet configuration in the parent class and remove the abstraction. I am commenting one more time instead of changing it, because I actually prefer this separation to putting things specific to text similarity that are used by the parent reranker. However, I won't die on that hill if you absolutely hate it.
server/src/main/java/org/elasticsearch/search/retriever/CompoundRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
*/ | ||
public class SnippetRankInput implements Writeable { | ||
|
||
private final RerankSnippetConfig snippets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the separation between SnippetRankInput
and RerankSnippetConfig
necessary? Why not injecting numSnippets
and snippetQueryBuilder
directly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little convoluted because we don't allow size of snippets in the API, I'll see if I can clean it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in d56726c
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Outdated
Show resolved
Hide resolved
private final SnippetRankInput snippetRankInput; | ||
|
||
// Rough approximation of token size vs. characters in highlight fragments. | ||
// TODO highlighter should be able to set fragment size by token not length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi Here is the TODO I referenced in my earlier response 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the feature flag. I left some minor comments to address but the feature looks good to me.
|
||
import static org.elasticsearch.search.rank.feature.RerankSnippetConfig.DEFAULT_NUM_SNIPPETS; | ||
|
||
public class TextSimilarityRerankingRankFeaturePhaseRankShardContext extends RerankingRankFeaturePhaseRankShardContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just rename RerankingRankFeaturePhaseRankShardContext
into TextSimilarityRerankingRankFeaturePhaseRankShardContext
? I don't see what RerankingRankFeaturePhaseRankShardContext
provides as an abstract class.
...k/inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankBuilder.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/rank/feature/RerankSnippetInput.java
Outdated
Show resolved
Hide resolved
…inference/rank/textsimilarity/TextSimilarityRankBuilder.java Co-authored-by: Jim Ferenczi <[email protected]>
…inference/rank/textsimilarity/TextSimilarityRerankingRankFeaturePhaseRankShardContext.java Co-authored-by: Jim Ferenczi <[email protected]>
Followup to the POC described in #128255
Adds the ability to rerank based on a smaller number of snippets.
Example, with a default of one snippet:
Example, specifying snippets:
Not specifying snippets will continue to send the entire field contents into the reranker model.