refactor: Multi Column Aggregate Function State Serialization Interface #18398


Merged: 24 commits, Jul 27, 2025

Conversation

forsaken628
Collaborator

@forsaken628 forsaken628 commented Jul 21, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Serializing the aggregate function state into multiple typed columns instead of a single binary column reduces the serialized size and the amount of IO.

Breaking change: aggregate indexes must be rebuilt manually.
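The idea can be sketched as follows. This is a minimal, hypothetical illustration (the names `AvgState`, `serialize_states`, and `deserialize_states` are not Databend's actual API): an avg-style state holding a sum and a count is written out as two typed columns rather than one opaque binary blob, so exchange and spill can use native columnar encoding for each field.

```rust
// Hypothetical sketch, not Databend's real interface: a (sum, count)
// aggregate state serialized as two typed columns instead of one blob.
struct AvgState {
    sum: f64,
    count: u64,
}

// Multi-column serialization: each state field goes to its own typed vector.
fn serialize_states(states: &[AvgState]) -> (Vec<f64>, Vec<u64>) {
    let mut sums = Vec::with_capacity(states.len());
    let mut counts = Vec::with_capacity(states.len());
    for s in states {
        sums.push(s.sum);
        counts.push(s.count);
    }
    (sums, counts)
}

// Deserialization zips the typed columns back into states row by row.
fn deserialize_states(sums: &[f64], counts: &[u64]) -> Vec<AvgState> {
    sums.iter()
        .zip(counts)
        .map(|(&sum, &count)| AvgState { sum, count })
        .collect()
}

fn main() {
    let states = vec![
        AvgState { sum: 10.0, count: 4 },
        AvgState { sum: 3.5, count: 1 },
    ];
    let (sums, counts) = serialize_states(&states);
    let back = deserialize_states(&sums, &counts);
    assert_eq!(back.len(), 2);
    println!("{} {}", sums[0], counts[1]);
}
```

Because each column keeps its native fixed-width type, fixed-size states avoid per-row length prefixes and framing overhead that a single binary column would carry.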

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Jul 21, 2025
@forsaken628 forsaken628 changed the title from "refactor: New Aggregate Function State Serialization Interface" to "refactor: Multi Column Aggregate Function State Serialization Interface" on Jul 23, 2025
@forsaken628 forsaken628 marked this pull request as ready for review July 23, 2025 12:42
@forsaken628 forsaken628 requested review from sundy-li and b41sh July 23, 2025 12:42
@sundy-li
Member

Do you have any performance test results for the agg state serialized size and query latency?

@sundy-li sundy-li requested a review from zhang2014 July 24, 2025 03:43
@forsaken628 forsaken628 added the ci-cloud Build docker image for cloud test label Jul 24, 2025
@forsaken628
Collaborator Author

Do you have any performance test results for the agg state serialized size and query latency?

Benchmark numbers are still to be added. Per-function optimization is the next step; this PR refactors the interface first.

@zhang2014 zhang2014 added the ci-benchmark Benchmark: run all test label Jul 24, 2025
Contributor

Docker Image for PR

  • tag: pr-18398-ccff018-1753333673

note: this image tag is only available for internal use.

@forsaken628 forsaken628 added the ci-benchmark Benchmark: run all test label Jul 25, 2025
Contributor

Docker Image for PR

  • tag: pr-18398-00301bc-1753415560

note: this image tag is only available for internal use.

@forsaken628 forsaken628 force-pushed the aggr-ser branch 2 times, most recently from b3ea7a9 to 0078deb Compare July 25, 2025 11:56
@forsaken628 forsaken628 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Jul 25, 2025
Contributor

Docker Image for PR

  • tag: pr-18398-0467ff9-1753462537

note: this image tag is only available for internal use.

@forsaken628 forsaken628 added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Jul 26, 2025
Contributor

Docker Image for PR

  • tag: pr-18398-d586479-1753524149

note: this image tag is only available for internal use.

@forsaken628
Collaborator Author

forsaken628 commented Jul 26, 2025

benchmark:

SQL:

explain analyze select sum(number+1) from numbers(100000000) group by number IGNORE_RESULT;

v1.2.778 (baseline)
size: medium
duration: 2.2s

    └── AggregateFinal
        ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
        ├── group by: [number]
        ├── aggregate functions: [sum(number), count()]
        ├── estimated rows: 100000000.00
        ├── cpu time: 32.391624724s
        ├── wait time: 3.605208609s
        ├── exchange bytes: 818.73 MiB
        ├── output rows: 95.7 million
        ├── output bytes: 2.15 GiB

pr
size: medium
duration: 1.8s

    └── AggregateFinal
        ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
        ├── group by: [number]
        ├── aggregate functions: [sum(number), count()]
        ├── estimated rows: 100000000.00
        ├── cpu time: 26.94341643s
        ├── wait time: 3.161924207s
        ├── exchange bytes: 415.09 MiB
        ├── output rows: 89.46 million
        ├── output bytes: 2.01 GiB
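From the two runs above, the exchange volume at AggregateFinal drops from 818.73 MiB to 415.09 MiB, a reduction of roughly 49%. The arithmetic:

```rust
fn main() {
    // Figures taken from the two explain analyze runs above.
    let main_exchange_mib = 818.73_f64;
    let pr_exchange_mib = 415.09_f64;
    let reduction_pct = (1.0 - pr_exchange_mib / main_exchange_mib) * 100.0;
    println!("exchange bytes reduced by {:.1}%", reduction_pct); // ~49.3%
    assert!((reduction_pct - 49.3).abs() < 0.1);
}
```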

@BohuTANG
Member

👍
Let me benchmark this PR next

@BohuTANG
Member

checksb test passed!

@BohuTANG
Member

BohuTANG commented Jul 27, 2025

Databend Query Performance Comparison

Query: select sum(number+1),count(number) from numbers(10000000000) group by number IGNORE_RESULT;
Setup: 2 nodes (32C256G each), 10B numbers (74.51 GiB)
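Note that the plans below show the optimizer rewriting `sum(number+1)` into `sum(number) + 1 * count(number)`, which is why both `sum` and `count` states flow through the exchange. The identity behind that rewrite is easy to check directly (a standalone sketch, not Databend code):

```rust
fn main() {
    // sum(x + c) == sum(x) + c * count(x), here with c = 1.
    let numbers: Vec<u64> = (0..1000).collect();
    let direct: u64 = numbers.iter().map(|n| n + 1).sum();
    let sum: u64 = numbers.iter().sum();
    let count = numbers.len() as u64;
    assert_eq!(direct, sum + 1 * count);
    println!("{}", direct); // 500500
}
```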

Results Summary

| Metric | This PR | Main | Difference |
| --- | --- | --- | --- |
| Execution Time | 2m 50s | 3m 3s | 🟢 13s faster |
| Spilled Data | 46.54 GiB | 86.79 GiB | 🟢 40.25 GiB less |

Key Differences

| Component | This PR | Main | Difference |
| --- | --- | --- | --- |
| AggregateFinal - CPU Time | 4470.8s | 5206.9s | 🟢 736.1s less |
| AggregateFinal - Wait Time | 1245.5s | 2082.9s | 🟢 837.4s less |
| AggregateFinal - Exchange Bytes | 20.09 GiB | 37.38 GiB | 🟢 17.29 GiB less |
| Spill Write Time | 1729.7s | 3105.9s | 🟢 1376.2s less |
| Spill Read Time | 1223.4s | 1853.7s | 🟢 630.3s less |
| Bytes Spilled (Write) | 46.54 GiB | 86.79 GiB | 🟢 40.25 GiB less |
| Bytes Spilled (Read) | 46.54 GiB | 86.79 GiB | 🟢 40.25 GiB less |
| EvalScalar - CPU Time | 86.2s | 77.7s | 🔴 8.5s more |

Execution Plans

This PR (2m 50s)

Exchange
├── output columns: [count(number) (#2), sum(number) + 1 * count(number) (#3)]
├── exchange type: Merge
└── EvalScalar
    ├── output columns: [count(number) (#2), sum(number) + 1 * count(number) (#3)]
    ├── expressions: [sum(number) (#1) + CAST(1 * count(number) (#2) AS UInt64 NULL)]
    ├── estimated rows: 10000000000.00
    ├── cpu time: 86.236814334s
    ├── output rows: 10 billion
    ├── output bytes: 150.18 GiB
    └── AggregateFinal
        ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
        ├── group by: [number]
        ├── aggregate functions: [sum(number), count()]
        ├── estimated rows: 10000000000.00
        ├── cpu time: 4470.835086584s
        ├── wait time: 1245.548067478s
        ├── exchange bytes: 20.09 GiB
        ├── output rows: 10 billion
        ├── output bytes: 224.68 GiB
        ├── numbers remote spilled by write: 128
        ├── bytes remote spilled by write: 46.54 GiB
        ├── remote spilled time by write: 1729.697s
        ├── numbers remote spilled by read: 16384
        ├── bytes remote spilled by read: 46.54 GiB
        ├── remote spilled time by read: 1223.388s
        └── Exchange
            ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
            ├── exchange type: Hash(0)
            └── AggregatePartial
                ├── group by: [number]
                ├── aggregate functions: [sum(number), count()]
                ├── estimated rows: 10000000000.00
                ├── cpu time: 1197.109558377s
                └── TableScan
                    ├── table: default.system.numbers
                    ├── output columns: [number (#0)]
                    ├── read rows: 10000000000
                    ├── read size: 74.51 GiB
                    ├── partitions total: 152588
                    ├── partitions scanned: 152588
                    ├── push downs: [filters: [], limit: NONE]
                    ├── estimated rows: 10000000000.00
                    ├── cpu time: 26.056168242s
                    ├── output rows: 10 billion
                    ├── output bytes: 74.51 GiB
                    └── bytes scanned: 74.51 GiB

Main (3m 3s)

Exchange
├── output columns: [count(number) (#2), sum(number) + 1 * count(number) (#3)]
├── exchange type: Merge
└── EvalScalar
    ├── output columns: [count(number) (#2), sum(number) + 1 * count(number) (#3)]
    ├── expressions: [sum(number) (#1) + CAST(1 * count(number) (#2) AS UInt64 NULL)]
    ├── estimated rows: 10000000000.00
    ├── cpu time: 77.694080059s
    ├── output rows: 10 billion
    ├── output bytes: 150.18 GiB
    └── AggregateFinal
        ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
        ├── group by: [number]
        ├── aggregate functions: [sum(number), count()]
        ├── estimated rows: 10000000000.00
        ├── cpu time: 5206.939129139s
        ├── wait time: 2082.870426479s
        ├── exchange bytes: 37.38 GiB
        ├── output rows: 10 billion
        ├── output bytes: 224.68 GiB
        ├── numbers remote spilled by write: 128
        ├── bytes remote spilled by write: 86.79 GiB
        ├── remote spilled time by write: 3105.909s
        ├── numbers remote spilled by read: 16384
        ├── bytes remote spilled by read: 86.79 GiB
        ├── remote spilled time by read: 1853.687s
        └── Exchange
            ├── output columns: [sum(number) (#1), count(number) (#2), numbers.number (#0)]
            ├── exchange type: Hash(0)
            └── AggregatePartial
                ├── group by: [number]
                ├── aggregate functions: [sum(number), count()]
                ├── estimated rows: 10000000000.00
                ├── cpu time: 1172.800108227s
                └── TableScan
                    ├── table: default.system.numbers
                    ├── output columns: [number (#0)]
                    ├── read rows: 10000000000
                    ├── read size: 74.51 GiB
                    ├── partitions total: 152588
                    ├── partitions scanned: 152588
                    ├── push downs: [filters: [], limit: NONE]
                    ├── estimated rows: 10000000000.00
                    ├── cpu time: 25.544488872s
                    ├── output rows: 10 billion
                    ├── output bytes: 74.51 GiB
                    └── bytes scanned: 74.51 GiB

@forsaken628 forsaken628 merged commit cb63b57 into databendlabs:main Jul 27, 2025
86 checks passed
@forsaken628 forsaken628 deleted the aggr-ser branch July 27, 2025 13:01