Speed up advancing within a sparse block in IndexedDISI. #14371

vsop-479 · 2025-03-19T12:18:27Z

Description

Similar to #13692.
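For context: in IndexedDISI, a SPARSE block stores the low 16 bits of each doc ID as a short, in ascending order, so advancing within the block means scanning forward until a value is at least the target's low 16 bits. Below is a minimal, self-contained sketch of that linear-scan baseline. The block is modeled as a plain `short[]` (an assumption; the real code reads from the `disi.slice` IndexInput), and the class and method names are hypothetical.

```java
/** Hypothetical model of a SPARSE block: low 16 bits of each doc ID, sorted ascending. */
final class SparseBlockLinearScan {

  /**
   * Returns the index of the first entry whose unsigned value is >= the low 16 bits of
   * target, or docs.length if there is none (the caller would then move to the next block).
   */
  static int advanceWithinBlock(short[] docs, int from, int target) {
    final int targetInBlock = target & 0xFFFF;
    for (int i = from; i < docs.length; i++) {
      // Stored values are unsigned 16-bit numbers, so widen without sign extension.
      if (Short.toUnsignedInt(docs[i]) >= targetInBlock) {
        return i;
      }
    }
    return docs.length;
  }
}
```

The full doc ID of a hit is recovered as `block | Short.toUnsignedInt(docs[i])`, where `block` holds the high bits (for example `3 << 16` for the fourth block).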

gf2121 · 2025-03-20T10:08:48Z

Thanks @vsop-479, have you been able to measure the performance of your patch?

I had a similar idea recently. If you look at the newest code in Lucene101PostingsReader, you may find we are using VectorMask to speed this up. That was what I had in mind: get a MemorySegment slice if it is not null, and work on it with VectorMask.

lucene/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java

Lines 781 to 787 in b756024

    for (; from + INT_SPECIES.length() < to; from += INT_SPECIES.length() + 1) {
      if (buffer[from + INT_SPECIES.length()] >= target) {
        IntVector vector = IntVector.fromArray(INT_SPECIES, buffer, from);
        VectorMask<Integer> mask = vector.compare(VectorOperators.LT, target);
        return from + mask.trueCount();
      }
    }
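To make the quoted loop easier to follow without the incubator Vector API, here is a scalar sketch of the same windowed search. Each step probes the last element of a window of T + 1 sorted entries; only when that element is >= target does it count how many of the first T entries are < target, which is what `mask.trueCount()` computes in the vector code. `T = 8` stands in for `INT_SPECIES.length()` and is an assumption, and the trailing linear scan is an assumed fallback for the remainder, not part of the quoted lines.

```java
/** Scalar sketch of the windowed "find next >= target" search over a sorted int array. */
final class WindowedFindNextGEQ {

  /** Plays the role of INT_SPECIES.length(); 8 is an assumed lane count. */
  static final int T = 8;

  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    for (; from + T < to; from += T + 1) {
      if (buffer[from + T] >= target) {
        // Emulates vector.compare(LT, target).trueCount() over buffer[from .. from + T).
        int count = 0;
        for (int i = from; i < from + T; i++) {
          if (buffer[i] < target) {
            count++;
          }
        }
        // The array is sorted, so the first entry >= target sits right after the smaller ones.
        return from + count;
      }
      // buffer[from + T] < target: the whole window of T + 1 entries can be skipped.
    }
    // Fallback for the remaining entries: a plain linear scan.
    for (int i = from; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }
}
```

The window is T + 1 rather than T because the probed element is checked before the vector load, so a whole window can be skipped with a single scalar comparison.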

vsop-479 · 2025-03-20T13:03:35Z

Thanks for your feedback, @gf2121. This patch is still in progress and has not been measured yet.

> If you look at the newest code in Lucene101PostingsReader, you may find we are using VectorMask to speed this up

Thanks for the reminder. I noticed the vectorization approach when I found that #13692 had been reverted. But I am not sure we can use vectors for IndexedDISI.slice.

> That was what I had in mind: get a MemorySegment slice if it is not null, and work on it with VectorMask.

That would be nice; I just noticed ShortVector#fromMemorySegment.
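One detail matters when the slice is loaded as shorts (for example via ShortVector#fromMemorySegment): the in-block doc values are unsigned 16-bit numbers, but a Java `short`, like a plain signed lane comparison, treats values >= 32768 as negative. The sketch below shows the problem in scalar form, together with two standard fixes. The first widens the values the way `Short.toUnsignedInt` does. The second flips the sign bit on both operands so that signed order matches unsigned order, a trick that could be applied lane-wise. The jdk.incubator.vector API also defines unsigned comparison operators such as `VectorOperators.UNSIGNED_LT`. Class and method names here are hypothetical.

```java
/** Sketch: comparing 16-bit in-block doc values as unsigned numbers. */
final class UnsignedShortCompare {

  /** Wrong for values >= 32768: a signed short comparison sees them as negative. */
  static boolean signedLess(short a, short b) {
    return a < b;
  }

  /** Correct: widen without sign extension, as Short.toUnsignedInt does. */
  static boolean unsignedLess(short a, short b) {
    return Short.toUnsignedInt(a) < Short.toUnsignedInt(b);
  }

  /**
   * Also correct, and stays within 16 bits: XOR-ing the sign bit maps unsigned value u to the
   * signed value u - 32768, which preserves order, so a signed comparison then gives the
   * unsigned answer.
   */
  static boolean flippedLess(short a, short b) {
    return (short) (a ^ 0x8000) < (short) (b ^ 0x8000);
  }
}
```

For example, an in-block value of 40000 is stored as the short -25536, so `signedLess((short) 5, (short) 40000)` is false, while both unsigned variants correctly report 5 < 40000.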

vsop-479 · 2025-03-24T06:48:24Z

@gf2121
For what it's worth, I implemented this patch and measured it with luceneutil on wikimedium10m.

           Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
           HighTermDayOfYearSort      470.94     (11.4%)      441.83      (8.5%)   -6.2% ( -23% -   15%) 0.051
                 MedSloppyPhrase      183.03      (8.5%)      175.29      (6.1%)   -4.2% ( -17% -   11%) 0.070
                     LowSpanNear      411.55     (10.3%)      396.07      (9.5%)   -3.8% ( -21% -   17%) 0.230
           BrowseMonthSSDVFacets       27.44      (8.5%)       26.47      (5.4%)   -3.6% ( -16% -   11%) 0.113
                         MedTerm     1178.02      (9.8%)     1142.75      (6.9%)   -3.0% ( -17% -   15%) 0.261
                        HighTerm      929.38      (7.6%)      902.10      (7.9%)   -2.9% ( -17% -   13%) 0.232
                       MedPhrase      251.86      (7.8%)      245.03      (9.7%)   -2.7% ( -18% -   16%) 0.330
                    OrHighNotLow     1175.04      (9.4%)     1143.49     (11.8%)   -2.7% ( -21% -   20%) 0.425
                   OrNotHighHigh      877.79      (7.1%)      858.18      (7.3%)   -2.2% ( -15% -   13%) 0.326
                         LowTerm     1419.65      (9.5%)     1389.75     (10.4%)   -2.1% ( -20% -   19%) 0.504
             MedIntervalsOrdered       59.98      (8.2%)       58.76      (7.2%)   -2.0% ( -16% -   14%) 0.403
                      TermDTSort      454.05     (11.0%)      444.95     (12.6%)   -2.0% ( -23% -   24%) 0.593
                          Fuzzy1      130.17      (3.4%)      128.06      (4.1%)   -1.6% (  -8% -    6%) 0.172
                    OrNotHighMed     1033.67     (10.7%)     1017.41     (10.3%)   -1.6% ( -20% -   21%) 0.636
                     AndHighHigh      388.27      (8.1%)      382.25      (7.5%)   -1.5% ( -15% -   15%) 0.530
                HighSloppyPhrase      132.66      (5.2%)      130.62      (6.4%)   -1.5% ( -12% -   10%) 0.402
             LowIntervalsOrdered      637.45      (6.9%)      627.85      (6.9%)   -1.5% ( -14% -   13%) 0.488
                          IntNRQ      438.20     (11.4%)      431.78     (10.2%)   -1.5% ( -20% -   22%) 0.668
            HighTermTitleBDVSort      123.85      (8.9%)      122.20      (8.7%)   -1.3% ( -17% -   17%) 0.633
       BrowseDayOfYearSSDVFacets       26.84     (10.1%)       26.50      (9.3%)   -1.3% ( -18% -   20%) 0.683
                          Fuzzy2      102.89      (2.1%)      101.68      (3.6%)   -1.2% (  -6% -    4%) 0.204
                 LowSloppyPhrase      683.96     (10.5%)      676.73     (12.0%)   -1.1% ( -21% -   23%) 0.766
                         Respell      134.93      (2.2%)      133.51      (2.6%)   -1.0% (  -5% -    3%) 0.167
        AndHighHighDayTaxoFacets       67.12      (4.2%)       66.44      (4.6%)   -1.0% (  -9% -    8%) 0.471
                        Wildcard      379.66      (8.4%)      376.74      (9.8%)   -0.8% ( -17% -   19%) 0.790
                     MedSpanNear      227.24      (4.8%)      225.86      (5.6%)   -0.6% ( -10% -   10%) 0.715
               HighTermMonthSort     1811.33     (12.1%)     1803.90     (13.4%)   -0.4% ( -23% -   28%) 0.919
                      AndHighMed      820.66      (8.4%)      817.55      (9.8%)   -0.4% ( -17% -   19%) 0.895
                   OrHighNotHigh      757.11      (8.3%)      754.48      (7.8%)   -0.3% ( -15% -   17%) 0.892
                    OrNotHighLow     1757.90     (11.0%)     1754.29      (8.9%)   -0.2% ( -18% -   22%) 0.948
            MedTermDayTaxoFacets      148.12      (4.6%)      148.00      (5.1%)   -0.1% (  -9% -   10%) 0.956
                        PKLookup      293.33      (4.4%)      293.35      (2.9%)    0.0% (  -7% -    7%) 0.995
                         Prefix3      707.43     (14.9%)      708.70     (12.0%)    0.2% ( -23% -   31%) 0.967
                    HighSpanNear       75.55      (3.7%)       75.81      (4.5%)    0.3% (  -7% -    8%) 0.793
         AndHighMedDayTaxoFacets      217.31      (6.1%)      218.27      (5.5%)    0.4% ( -10% -   12%) 0.808
          OrHighMedDayTaxoFacets       47.73      (4.8%)       47.97      (3.9%)    0.5% (  -7% -    9%) 0.712
                           range     6721.40     (10.5%)     6784.00      (9.0%)    0.9% ( -16% -   22%) 0.763
               HighTermTitleSort      138.34      (5.1%)      139.89      (4.1%)    1.1% (  -7% -   10%) 0.443
                       OrHighMed      677.06     (14.7%)      687.35     (10.5%)    1.5% ( -20% -   31%) 0.707
            HighIntervalsOrdered      119.78     (10.0%)      121.82      (8.4%)    1.7% ( -15% -   22%) 0.562
                       OrHighLow     1099.81      (6.6%)     1118.62      (9.2%)    1.7% ( -13% -   18%) 0.499
                      HighPhrase       20.60      (4.2%)       20.98      (5.4%)    1.8% (  -7% -   11%) 0.232
                    OrHighNotMed      931.08      (6.7%)      950.36      (9.9%)    2.1% ( -13% -   19%) 0.438
     BrowseRandomLabelSSDVFacets       20.46      (4.4%)       20.95      (8.6%)    2.4% ( -10% -   16%) 0.270
                      OrHighHigh      263.33     (12.8%)      272.70     (13.9%)    3.6% ( -20% -   34%) 0.398
                      AndHighLow     2129.61     (16.0%)     2216.44     (11.3%)    4.1% ( -19% -   37%) 0.351
                       LowPhrase      218.81      (7.0%)      227.97     (11.5%)    4.2% ( -13% -   24%) 0.164
       BrowseDayOfYearTaxoFacets       35.03     (35.9%)       36.90     (34.8%)    5.4% ( -48% -  118%) 0.632
            BrowseDateTaxoFacets       34.70     (36.1%)       36.80     (35.2%)    6.0% ( -47% -  120%) 0.592
           BrowseMonthTaxoFacets       39.54     (33.0%)       42.11     (28.8%)    6.5% ( -41% -  101%) 0.508
            BrowseDateSSDVFacets        5.19     (20.1%)        5.59     (19.9%)    7.7% ( -26% -   59%) 0.222
     BrowseRandomLabelTaxoFacets       36.35     (51.7%)       39.68     (53.7%)    9.2% ( -63% -  236%) 0.583

vsop-479 · 2025-03-24T06:54:00Z

Maybe I should measure it with DVBench in luceneutil, or add a JMH benchmark.

gf2121 · 2025-03-24T07:37:55Z

Thanks for running benchmark @vsop-479 !

> Maybe I should measure it with DVBench in luceneutil, or add a JMH benchmark.

Yes, you are right; a JMH benchmark would be great. So far, luceneutil has no tasks that measure IndexedDISI.

vsop-479 · 2025-03-26T03:35:00Z

> a JMH benchmark would be great.

I measured it with AdvanceSparseDISIBenchmark:

Benchmark                                             Mode  Cnt    Score   Error   Units
AdvanceSparseDISIBenchmark.advance                   thrpt   15  669.502 ± 4.531  ops/ms
AdvanceSparseDISIBenchmark.advanceBinarySearch       thrpt   15  358.620 ± 1.102  ops/ms
AdvanceSparseDISIBenchmark.advanceExact              thrpt   15  752.444 ± 1.810  ops/ms
AdvanceSparseDISIBenchmark.advanceExactBinarySearch  thrpt   15  547.818 ± 2.278  ops/ms

Even with the interval between target docs set to 10, there is still a large regression: about 46% lower throughput for advance and 27% for advanceExact. Maybe this binary search version calls disi.slice.seek too many times.
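A sketch that makes the seek cost visible: each binary-search probe repositions the slice (a seek, then a read), whereas the linear scan positions once and then reads sequentially. `CountingSlice` is a hypothetical stand-in for the IndexInput slice that counts repositionings, and the lower-bound search below is the textbook technique rather than the PR's actual code.

```java
/** Hypothetical stand-in for an IndexInput slice: explicit seeks plus sequential reads. */
final class CountingSlice {
  private final short[] data;
  private int pos;
  int seeks;

  CountingSlice(short[] data) {
    this.data = data;
  }

  void seek(int index) {
    pos = index;
    seeks++;
  }

  int readUnsignedShort() {
    return Short.toUnsignedInt(data[pos++]);
  }
}

final class SparseBlockBinarySearch {
  /**
   * Lower-bound binary search over [from, to): returns the first index whose value is
   * >= targetInBlock, or to if there is none. Every probe costs one seek.
   */
  static int advance(CountingSlice slice, int from, int to, int targetInBlock) {
    int lo = from, hi = to; // invariant: the answer lies in [lo, hi]
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      slice.seek(mid);
      if (slice.readUnsignedShort() < targetInBlock) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

For a block of 4000 entries this performs at most 12 seeks per call. When consecutive targets are only about 10 docs apart, a sequential scan from the current position typically reads just a few entries, which may explain why the binary-search variant loses in this benchmark.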

> you may find we are using VectorMask to speed this up. That was what I had in mind: get a MemorySegment slice if it is not null, and work on it with VectorMask.

I will try VectorMask when I get a chance.

vsop-479 · 2025-03-28T09:36:33Z

@gf2121
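I implemented the VectorMask approach. There is still a slowdown (roughly 24%, although advanceVector has a large error margin). I suspect the reason is my laptop (Mac M2), whose 128-bit NEON vectors offer fewer lanes.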

Benchmark                                  Mode  Cnt    Score    Error   Units
AdvanceSparseDISIBenchmark.advance        thrpt   15  654.472 ±  2.349  ops/ms
AdvanceSparseDISIBenchmark.advanceVector  thrpt   15  498.590 ± 66.751  ops/ms

vsop-479 · 2025-03-29T02:54:52Z

@gf2121
I also implemented advanceExact with vectors; there is still a slowdown (about 28%). I will try to measure it on another machine with more vector lanes.

Benchmark                                       Mode  Cnt    Score    Error   Units
AdvanceSparseDISIBenchmark.advanceExact        thrpt   15  727.403 ± 33.060  ops/ms
AdvanceSparseDISIBenchmark.advanceExactVector  thrpt   15  520.427 ±  0.868  ops/ms

vsop-479 · 2025-04-01T08:32:15Z

Adjusted the ENABLE_ADVANCE_WITHIN_BLOCK_VECTOR_OPTO threshold to 16, so the vector path requires at least 16 lanes (as with AVX or AVX-512).
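The lane count is simply the vector bit width divided by the element width. Assuming the threshold is about 16-bit short elements (an assumption; the comment does not name the element type), a 128-bit NEON vector on an M2 gives 8 lanes, a 256-bit register gives 16, and AVX-512 gives 32. The names below are hypothetical.

```java
/** Sketch: lanes available for a given vector width and element width. */
final class LaneMath {

  static int lanes(int vectorBits, int elementBits) {
    return vectorBits / elementBits;
  }

  /** Hypothetical gate mirroring "at least 16 lanes" for 16-bit (short) elements. */
  static boolean enableShortVectorOpto(int vectorBits) {
    return lanes(vectorBits, Short.SIZE) >= 16;
  }
}
```

Under this assumption the vector path would be disabled on the M2 laptop and enabled on 256-bit and 512-bit x86 machines.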

vsop-479 · 2025-04-03T06:31:51Z

@gf2121, I measured it on a Linux server (preferredBitSize=512; FMA enabled), and there is still a massive slowdown: roughly 58% lower throughput for advance and 38% for advanceExact. I will dig more...

Benchmark                                       Mode  Cnt    Score   Error   Units
AdvanceSparseDISIBenchmark.advance             thrpt   15  386.100 ± 0.162  ops/ms
AdvanceSparseDISIBenchmark.advanceVector       thrpt   15  162.697 ± 0.581  ops/ms
AdvanceSparseDISIBenchmark.advanceExact        thrpt   15  437.998 ± 0.644  ops/ms
AdvanceSparseDISIBenchmark.advanceExactVector  thrpt   15  271.823 ± 0.625  ops/ms

github-actions · 2025-04-18T00:24:47Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

vsop-479 added 2 commits March 18, 2025 10:05

Typo: Tableentry -> TableEntry

8398fa8

Temp.

2cbc694

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Mar 19, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Mar 19, 2025

vsop-479 marked this pull request as draft March 19, 2025 12:18

github-actions bot added the module:core/codecs label Mar 19, 2025

Temp.

d2783bc

vsop-479 added 9 commits March 21, 2025 09:41

Since we have read last doc, compare it first.

adcebaf

Compare remaining.

5e09d61

Tmp.

f2a1f57

Print disi.

f33f0b3

Add Short.toUnsignedInt.

e0f4d27

Remove System#out#println.

84e5615

Implement binary search for IndexedDISI$SPARSE#advanceWithinBlock.

0bafac0

Cast long.

a0772d5

Change BINARY_SEARCH_WINDOW_SIZE from static to static final.

44a4db9

vsop-479 marked this pull request as ready for review March 24, 2025 06:57

Add benchmark.

8b65647

github-actions bot added the module:core/search label Mar 26, 2025

vsop-479 added 5 commits March 28, 2025 17:01

Implemented advance sparse DISI with vector.

8f30a41

Revert to resolve conflict.

6a5f369

Merge branch 'main' into binary_search_IndexedDISI

c0a49e3

Implemented VectorUtil#advanceWithinBlock.

e27cbf0

Tidy.

dbb6b9e

vsop-479 added 2 commits March 28, 2025 20:14

Change advanceExactWithinBlockBinarySearch to method.

3899413

Implemented advanceExact with vector.

4587a47

vsop-479 added 2 commits April 1, 2025 14:12

Adjust ENABLE_ADVANCE_WITHIN_BLOCK_VECTOR_OPTO to 16.

5b7731f

Tidy.

d036da9

github-actions bot added the Stale label Apr 18, 2025
