Add AnytimeRankingSearcher for SLA-Aware Early Termination with Bin-Based Score Boosting #14525

Status: Open · wants to merge 47 commits into main

Conversation

@atris (Contributor) commented Apr 18, 2025

Add AnytimeRankingSearcher for SLA-aware early termination with bin-based score boosting

This patch adds AnytimeRankingSearcher, a new low-latency search implementation that supports early termination under SLA constraints, combined with bin-aware score boosting.

Architecture

Index-time binning uses a configurable post-indexing pass to assign each document to one of bin.count bins. This pass is activated via field attributes (doBinning=true, bin.count=N, etc.) and is triggered after all standard postings are written. Binning uses a segment-local sparse similarity graph where each node is a document and edges represent cosine similarity between term frequency vectors.

The bin distribution is computed via recursive graph bisection. The graph is recursively split into halves using a seeded heuristic that assigns each document to the closer of two seed nodes based on edge weights. This ensures intra-bin similarity and minimizes cross-bin connectivity. A fixed number of bins is produced, and the assignment is saved to a .binmap file.
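
As a rough illustration of this step, the sketch below recursively splits a doc set by comparing each document's edge weights to two seed nodes. The class name, the naive seed selection, and the power-of-two bin count are simplifying assumptions for the sketch, not the actual DocBinningGraphBuilder code.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: recursive bisection over a segment-local similarity graph.
class BisectionSketch {
  // graph.get(doc) maps neighbor doc IDs to edge weights (cosine similarity of term vectors).
  static void bisect(int[] docs, Map<Integer, Map<Integer, Float>> graph,
                     int depth, int maxDepth, int[] binOf, int firstBin) {
    if (depth == maxDepth || docs.length <= 1) {
      for (int doc : docs) {
        binOf[doc] = firstBin; // leaves of the recursion become bins
      }
      return;
    }
    int seedA = docs[0];                  // naive seed choice, for the sketch only
    int seedB = docs[docs.length - 1];
    List<Integer> left = new ArrayList<>(), right = new ArrayList<>();
    for (int doc : docs) {
      float toA = graph.getOrDefault(doc, Map.of()).getOrDefault(seedA, 0f);
      float toB = graph.getOrDefault(doc, Map.of()).getOrDefault(seedB, 0f);
      // each doc goes to the seed it is more strongly connected to
      (toA >= toB ? left : right).add(doc);
    }
    int binsPerHalf = 1 << (maxDepth - depth - 1); // assumes bin.count is a power of two
    bisect(left.stream().mapToInt(Integer::intValue).toArray(),
           graph, depth + 1, maxDepth, binOf, firstBin);
    bisect(right.stream().mapToInt(Integer::intValue).toArray(),
           graph, depth + 1, maxDepth, binOf, firstBin + binsPerHalf);
  }
}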

In approximate mode (graph.builder=approx), we avoid building explicit term vectors. Instead, token co-occurrence is tracked using per-term BitSets, and documents are grouped using lightweight overlap heuristics. This trades off precision for speed and scales better on large segments.
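
A minimal sketch of the overlap-heuristic idea follows: per-term BitSets mark which docs contain each term, and each doc joins the bin of the seed doc it shares the most terms with. Seed selection, tie-breaking, and data structures (e.g. whether Lucene's FixedBitSet is used) are assumptions here and differ in the real ApproximateDocGraphBuilder.

import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;

// Sketch only: assign docs to bins by counting terms shared with a few seed docs.
class ApproxBinningSketch {
  static int[] assignBins(Collection<BitSet> docsPerTerm, int maxDoc, int binCount) {
    // Spread seeds evenly across the doc ID space (illustrative choice).
    int[] seeds = new int[binCount];
    for (int b = 0; b < binCount; b++) seeds[b] = (int) ((long) b * maxDoc / binCount);

    int[] binOf = new int[maxDoc];
    int[] overlap = new int[binCount];
    for (int doc = 0; doc < maxDoc; doc++) {
      Arrays.fill(overlap, 0);
      for (BitSet docsWithTerm : docsPerTerm) {
        if (!docsWithTerm.get(doc)) continue;           // term absent from this doc
        for (int b = 0; b < binCount; b++) {
          if (docsWithTerm.get(seeds[b])) overlap[b]++; // term shared with seed b
        }
      }
      int best = 0;
      for (int b = 1; b < binCount; b++) {
        if (overlap[b] > overlap[best]) best = b;
      }
      binOf[doc] = best;                                // doc joins the most similar seed's bin
    }
    return binOf;
  }
}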

At search time, BinMapReader loads the bin assignments, and BinScoreReader makes them accessible to search collectors. BinBoostCalculator assigns a boost score to each bin based on estimated bin quality (e.g. average term frequency or rank share in a warmup run). This boost is applied additively during ranking, allowing the collector to prioritize high-quality bins earlier and exit faster under SLA pressure.
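
Conceptually, the additive boost amounts to something like the following inside a per-leaf scoring path; this is a simplified sketch, and the array-based accessors shown here are assumptions rather than the patch's actual BinMapReader/BinScoreReader APIs.

// Sketch of additive bin boosting during ranking (accessors are hypothetical).
// binOf[doc] comes from the .binmap data loaded by BinMapReader, and
// binBoost[bin] from BinBoostCalculator via BinScoreReader.
float boostedScore(float rawScore, int doc, int[] binOf, float[] binBoost) {
  int bin = binOf[doc];            // segment-local bin assignment
  return rawScore + binBoost[bin]; // boost is applied additively to the raw score
}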

Binning Modes (Index Time)

This patch supports two modes of document binning during indexing:
• Absolute mode: computes exact bin assignments using full document similarity graphs.
• Approximate mode: enabled when document count exceeds a threshold; skips graph construction and uses faster heuristics to assign bins.

Bin assignment is handled by DocBinningGraphBuilder and switches to ApproximateDocGraphBuilder automatically when needed.
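
The automatic switch can be pictured as a doc-count check along these lines; the DocGraphBuilder supertype, the constructor signatures, and the threshold value are assumptions for the sketch, not taken from the patch.

// Sketch of builder selection for the "auto" strategy (threshold is illustrative).
static final int APPROX_THRESHOLD = 100_000;

static DocGraphBuilder chooseBuilder(String binBuilder, int maxDoc) {
  if ("exact".equals(binBuilder)) return new DocBinningGraphBuilder();
  if ("approx".equals(binBuilder)) return new ApproximateDocGraphBuilder();
  // "auto": exact graph on small segments, approximate heuristics on large ones
  return maxDoc > APPROX_THRESHOLD
      ? new ApproximateDocGraphBuilder()
      : new DocBinningGraphBuilder();
}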

To enable binning, field attributes must be set:

fieldType.putAttribute("postingsFormat", "Lucene103");
fieldType.putAttribute("doBinning", "true");
fieldType.putAttribute("bin.count", "4");      // total number of bins
fieldType.putAttribute("bin.builder", "auto"); // binning strategy: "exact", "approx", or "auto"

Search-Time Integration

At search time, bin boosts are loaded using BinScoreReader. To enable anytime ranking:

// topK = number of hits to return, slaMs = per-query latency budget in milliseconds
AnytimeRankingSearcher searcher = new AnytimeRankingSearcher(reader, topK, slaMs, fieldName);
TopDocs results = searcher.search(query);

Internally:
• Bin scores are applied per segment at query time.
• The collector monitors elapsed time and stops scoring once the SLA budget is exhausted (see the sketch below).
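
A rough sketch of the time check inside the collector, using Lucene's CollectionTerminatedException to stop a leaf early; the class structure and field names are illustrative, not the patch's actual collector.

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorable;

// Sketch of the SLA check inside a leaf collector.
class SlaLeafCollectorSketch implements LeafCollector {
  private final long deadlineNanos; // query start + SLA budget
  private final float binBoost;     // additive boost for the current segment/bin
  private Scorable scorer;

  SlaLeafCollectorSketch(long slaMs, float binBoost) {
    this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(slaMs);
    this.binBoost = binBoost;
  }

  @Override
  public void setScorer(Scorable scorer) {
    this.scorer = scorer;
  }

  @Override
  public void collect(int doc) throws IOException {
    if (System.nanoTime() > deadlineNanos) {
      // SLA budget exhausted: stop scoring this segment; hits collected so far are kept.
      throw new CollectionTerminatedException();
    }
    float boosted = scorer.score() + binBoost; // bin boost applied additively
    // ... push (doc, boosted) into the top-k priority queue ...
  }
}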

Test Coverage

Includes a full test (TestAnytimeRankingSearchQuality) that:

• Indexes 10k docs with periodically placed relevant content
• Runs baseline and anytime search
• Computes NDCG, precision, and recall (see the metric sketch below)
• Asserts average and max position delta across result sets
• Verifies minimal degradation under SLA constraints
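
For reference, NDCG can be computed along these lines; this is a generic sketch of the metric, not necessarily the exact code used in TestAnytimeRankingSearchQuality.

// Generic NDCG@k: DCG of the returned ranking divided by DCG of the ideal ranking.
static double ndcgAtK(double[] gainsInRankOrder, double[] idealGainsDescending, int k) {
  double dcg = 0, idcg = 0;
  for (int i = 0; i < k; i++) {
    double discount = Math.log(i + 2) / Math.log(2); // log2(rank + 1)
    if (i < gainsInRankOrder.length) dcg += gainsInRankOrder[i] / discount;
    if (i < idealGainsDescending.length) idcg += idealGainsDescending[i] / discount;
  }
  return idcg == 0 ? 0 : dcg / idcg;
}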

Performance

• AnytimeRankingSearcher provides ~2–3x speedup at low SLA targets
• Recall, precision, and NDCG remain at 95% or more of baseline
• Position delta of relevant docs remains bounded

Notes

• Readers are wrapped using BinScoreUtil.wrap(reader) to enable bin-aware scoring (see the usage sketch after these notes)
• Compound readers are tracked and closed explicitly
• BinFilter skipping is not implemented yet — will be added in a follow-up patch
• Fallback to approximate binning ensures indexing remains scalable for large segments
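
Putting the reader wrapping together with the searcher, usage looks roughly like this; it is a sketch based on the notes above, and the exact return types and constructor arguments may differ in the patch. The values 10 (topK), 50 (slaMs), and "body" (field) are example inputs.

// Wrap the reader so collectors can see bin scores, then search with an SLA budget.
DirectoryReader raw = DirectoryReader.open(dir);
IndexReader reader = BinScoreUtil.wrap(raw); // enables bin-aware scoring
AnytimeRankingSearcher searcher = new AnytimeRankingSearcher(reader, 10, 50, "body");
TopDocs results = searcher.search(query);
reader.close(); // compound readers are tracked and closed explicitly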

Benchmarks

Search Latency (Top-K Retrieval): benchmark chart (image not reproduced here)

Relevance Score Comparison: benchmark chart (image not reproduced here)

@atris (Contributor, Author) commented Apr 20, 2025

@jpountz This PR is now ready for review. I will post luceneutil benchmarks tomorrow. Please let me know if anything else is needed from me.

@jpountz (Contributor) commented Apr 22, 2025

Can you link the paper that you implemented? I'll need some time to digest these 5k lines. :)

@atris (Contributor, Author) commented Apr 23, 2025

@jpountz Thanks! Here is the paper: https://arxiv.org/abs/2104.08976

Note that the core inspiration of this PR's approach comes from the paper, but the implementation diverges in certain ways:

The paper uses bins mainly for hard cutoffs and filtering. This PR instead uses bins to compute adaptive score boosts and wires them directly into the new Collector.

The PR also adds:
• index-time graph-based binning (exact + approximate), which adds minimal indexing latency but gives significant improvements in search time
• bin-level boosting at the segment level

So while the high-level idea overlaps, the implementation is more ambitious and also opens the door to follow-ups such as bin skipping and multi-field support, where graph intersection or fusion could identify documents that are strongly connected across multiple semantic dimensions.

@jpountz (Contributor) commented Apr 24, 2025

> the implementation is more ambitious

I like ambition, but it also makes this change harder to review/integrate, especially with the high LOC count. I would suggest splitting this PR into multiple PRs, for instance:

  • First PR just works with indexes created with existing recursive graph bisection and uses basic heuristics to determine which ranges of doc IDs to score first (e.g. using impacts) to hopefully increase the top-k score quickly. No extra data stored in the index. All code under lucene/misc rather than core.
  • Another PR can introduce the SLA-based termination logic.
  • Another PR can introduce the topical clustering mechanism of Kulkarni and Callan that the paper suggests combining with recursive graph bisection.
  • Another PR can discuss augmenting index formats to enhance the range selection logic.

@atris (Contributor, Author) commented Apr 24, 2025

@jpountz thanks for looking!

Just for my understanding: should the first PR contain the index-time binning logic that is currently in this PR, just with a simpler, rank-based model of bin boosting?

github-actions (bot) commented May 9, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions bot added the Stale label on May 9, 2025.