Speed up GroupingSelectors when using a descending sort on a high cardinality field #13664

@HoustonPutman

Description

I ran a benchmark (using Solr, admittedly, not Lucene) that compares the speed of various sorted queries. The fields mentioned in the benchmark are the fields that were sorted on.

Benchmark parameters:

  • 1M documents
    • High cardinality fields had a unique value per document (an incrementing counter)
    • Low cardinality fields had 500 unique values
    • All fields were single-valued
  • Three field types were tested: LongPointField, StrField and TrieLongField (no longer in Lucene)
    • All sort fields were tested using both DocValues and Uninversion.
  • Queries
    • 10 results were requested, for both grouped and non-grouped
    • Each combination of field type and docValues/Uninversion was run in 8 configurations, a matrix of:
      • Grouping and Non-grouping
      • Sort asc and desc
      • High Cardinality values and Low Cardinality values
    • The grouping:
      • All queries were grouped by the same string field (250,000 unique values, docValues enabled)
      • The sort field was used for both the group sorting and the document sorting within the groups
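
For anyone trying to reproduce this, here is a rough sketch of how the 8-configuration matrix above could be turned into Solr request parameters. The field names (`group_s`, `high_l`, `low_l`) and the `q=*:*` query are hypothetical, since the actual schema and query are not given here; `sort`, `rows`, `group`, `group.field` and `group.sort` are standard Solr parameters. The strings are left un-URL-encoded for readability.

```java
import java.util.ArrayList;
import java.util.List;

public class SortBenchmarkMatrix {
    // Hypothetical name for the string field used for grouping.
    static final String GROUP_FIELD = "group_s";

    /** Builds the (unencoded) Solr parameters for one cell of the matrix. */
    static String params(String sortField, boolean grouped, boolean descending) {
        String sort = sortField + (descending ? " desc" : " asc");
        // q=*:* is an assumption; rows=10 matches the 10 requested results.
        StringBuilder sb = new StringBuilder("q=*:*&rows=10&sort=" + sort);
        if (grouped) {
            // Same sort for ordering the groups (sort) and the docs within each group (group.sort).
            sb.append("&group=true&group.field=").append(GROUP_FIELD)
              .append("&group.sort=").append(sort);
        }
        return sb.toString();
    }

    /** Enumerates grouping x direction x cardinality: 2 * 2 * 2 = 8 configurations. */
    static List<String> configurations(String highCardField, String lowCardField) {
        List<String> out = new ArrayList<>();
        for (boolean grouped : new boolean[] {false, true}) {
            for (boolean descending : new boolean[] {false, true}) {
                for (String field : new String[] {highCardField, lowCardField}) {
                    out.add(params(field, grouped, descending));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sort field names for one field type / docValues combination.
        for (String p : configurations("high_l", "low_l")) {
            System.out.println(p);
        }
    }
}
```

Running this once per field type and docValues/Uninversion combination (3 field types x 2 = 6 combinations) would give 48 query shapes in total.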
[Image: Screenshot 2024-08-16 at 12 12 18 PM]

Overall there are some interesting findings:

  • For un-grouped queries (sorting by a high cardinality field), ascending sorts were 5x faster than descending sorts. For grouped queries, ascending sorts were 40x faster. That is ultimately what this issue is meant to address, so I highlighted it in the chart above.
    • Note: Low cardinality fields had no difference in speed
  • DocValues and uninverted fields have similar sorting performance for grouped queries. (This is likely related to "Sorting by DocValues while grouping is slower than old good FieldCache" [LUCENE-9328], #10368.)

I know that since this benchmark is using Solr, it's only so useful here. So I can utilize lucenebench to try to recreate this as well, if that would help.

I also have the flame graphs for these benchmarks, which aren't as useful as a screenshot but I will provide one anyways (LongPointField, docValues, highCardinality, grouped, sorted descending):
[Image: Screenshot 2024-08-16 at 12 05 02 PM]
