-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add implementation for exponential histogram merging and percentiles #131220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7711093
to
a46914a
Compare
libs/exponential-histogram/README.md
Outdated
|
||
This offers significant benefits for distributions with fewer distinct values: | ||
If we have at least as many buckets as we have distinct values to store in the histogram, we can almost exactly represent this distribution. | ||
This can be achieved by simply maintaining the scale at the maximum supported value (so the buckets become the smallest). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are tricky to grasp. Is there some external reference explaining the differences between dense and sparse storage, and how the number of buckets and scale are associated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there are any sources explaining this clearly. The DDSketch
and UDDSketch
papers use sparse representations in the papers, but practically all implementations are dense due to the O(1)
insertion time I assume.
I added a concrete example in 5e1ca08, hope this makes it clearer?
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Outdated
Show resolved
Hide resolved
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Outdated
Show resolved
Hide resolved
...ponential-histogram/src/main/java/org/elasticsearch/exponentialhistogram/DownscaleStats.java
Show resolved
Hide resolved
long negCount = getTotalCount(histo.negativeBuckets()); | ||
long posCount = getTotalCount(histo.positiveBuckets()); | ||
|
||
long totalCount = zeroCount + negCount + posCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely that we'll get many quantile calls for the same histogram, so we might as well store this number in the histogram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was part of the TODOs, but should be simple enough to do it now.
If we expect many calls for the same histogram, it could make sense to build an array representing the prefix-sum of the counts. This allows for percentile computation in O(log n) instead of O(n) at the cost of a little memory. But this is definitely something for later if we see the need for it.
...ogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramQuantile.java
Outdated
Show resolved
Hide resolved
...ogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramQuantile.java
Outdated
Show resolved
Hide resolved
...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java
Outdated
Show resolved
Hide resolved
...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java
Outdated
Show resolved
Hide resolved
...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java
Outdated
Show resolved
Hide resolved
...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java
Outdated
Show resolved
Hide resolved
...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java
Outdated
Show resolved
Hide resolved
ba259bd
to
b308838
Compare
.../src/main/java/org/elasticsearch/exponentialhistogram/FixedCapacityExponentialHistogram.java
Show resolved
Hide resolved
Applied the changes of the first reviews and a bit of cleanup / refactoring. I also am hopeful that the tests do not introduce flakes, as I ran the randomized ones locally continously for quite a while. |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Still a WIP.
gradle check
?