Description
We should support OpenMetrics. I've started this work in a temporary branch and will start opening PRs.
This issue is meant to be the master list of things to do.
This is a breaking change that we should release as 3.0.
Before that, we should ship minor versions that emit deprecation warnings about the things we'll be changing.
Todo
- Basic Formatting hygiene
  - Format negotiation with OpenMetrics Accept Header
  - Add tests to the OpenMetrics formatter I wrote
  - `# EOF`
  - `# UNIT` including validation
  - Forcing `_total` on counters. This is a breaking change, released in 3 stages:
    - Warn on counters that are not suffixed with `_total`
    - (next major version) Forcefully add `_total` to counters. Those that come with `_total` suffix, remove it and warn, in preparation for next major.
    - (next major version) Error on counters suffixed `_total`
- Timestamps Output
  - Merge PR "Add :most_recent aggregation to DirectFileStore" (#172), which will help with outputting "updated_at" timestamps for time series
  - `series_created` series
- Exemplars
- New Metric types
  - StateSet
  - Info
- Document breaking changes (UPGRADING.md)
  - If you called `Metric#values` or `Stores#all_values`, you'll get objects with timestamps instead of simply floats
  - Naming convention things: You must pass the unit and not include it in the name. You must not add `_total` to your counters. We'll add both automatically
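A rough sketch of how the formatting items in the list above could fit together. Every name here (`OpenMetricsSketch`, `negotiate`, `counter_base_name`, `family_name`, `render_counter`) is hypothetical and not part of the client today. It assumes the formatter appends the unit and `_total` itself, and it skips label-value escaping. The negotiation is deliberately simplistic: it only checks whether the OpenMetrics media type is acceptable at all and does not rank q-values across media types.

```ruby
# Hypothetical helpers for illustration; none of these names exist in the client.
module OpenMetricsSketch
  OPENMETRICS_TYPE = 'application/openmetrics-text'

  # Picks :openmetrics when the Accept header lists the OpenMetrics media type
  # with a non-zero q-value, and falls back to :prometheus otherwise.
  def self.negotiate(accept_header)
    accept_header.to_s.split(',').each do |entry|
      media_type, *params = entry.split(';').map(&:strip)
      next unless media_type == OPENMETRICS_TYPE

      q = params.map { |param| param[/\Aq=(.+)\z/, 1] }.compact.first
      return :openmetrics if q.nil? || q.to_f > 0
    end
    :prometheus
  end

  # Strips a user-supplied `_total` (with a warning) so that the formatter
  # can add the suffix back uniformly.
  def self.counter_base_name(name)
    return name unless name.end_with?('_total')

    warn "counter #{name}: pass the name without _total, it is added automatically"
    name.delete_suffix('_total')
  end

  # Family name with the unit appended. The unit must be passed separately,
  # never as part of the name (assumed validation rule).
  def self.family_name(name, unit)
    return name if unit.nil?
    raise ArgumentError, "invalid unit: #{unit.inspect}" unless unit.match?(/\A[a-zA-Z_:][a-zA-Z0-9_:]*\z/)
    raise ArgumentError, "#{name} already includes its unit" if name.end_with?("_#{unit}")

    "#{name}_#{unit}"
  end

  # Renders an exposition containing a single counter family, terminated by
  # `# EOF`. `samples` maps a label hash to a float value.
  def self.render_counter(name, unit:, help:, samples:)
    family = family_name(counter_base_name(name), unit)
    lines = ["# TYPE #{family} counter"]
    lines << "# UNIT #{family} #{unit}" if unit
    lines << "# HELP #{family} #{help}"
    samples.each do |labels, value|
      label_str = labels.map { |k, v| "#{k}=\"#{v}\"" }.join(',')
      label_str = "{#{label_str}}" unless label_str.empty?
      lines << "#{family}_total#{label_str} #{value}"
    end
    lines << '# EOF'
    lines.join("\n") + "\n"
  end
end

# Usage: prints TYPE, UNIT and HELP lines, one `_total` sample, then `# EOF`.
puts OpenMetricsSketch.render_counter(
  'process_cpu', unit: 'seconds', help: 'CPU time.', samples: { {} => 12.5 }
)
```

Keeping the suffix handling in one place like this would let the warn / strip / error stages be swapped in per release without touching the rendering code.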
Questions:
- Is this the most up-to-date documentation we have?
- The `_created` child series:

  a. Is it only supported for `counter`, `summary` and `histogram`?

  b. Is this the time the metric got declared / added to the registry? Or is this the time the metric was first observed?

  c. Do we report one `_created` value per metric or per time series? In other words, if I have:

         http_server_requests_total{code="200"} 10.0 1590078339.214435
         http_server_requests_total{code="400"} 1.0 1590078339.214435

     Should I have:

         http_server_requests_created{code="200"} 1590078336.554018
         http_server_requests_created{code="400"} 1590078336.554018

     Or just:

         http_server_requests_created 1590078336.554018

     And if the former, do we have a separate one per Histogram Bucket?

     Note: in the documentation, all examples happen to not have labels. Moreover, trying to figure this out by experimentation, I think there may be a problem with the Python parser. For Counters, it only accepts the second example (no labels for `_created`). For Histograms, it only accepts "all the labels minus `le`". Both kind of make sense independently, but they seem like they should be mutually exclusive.

- Exemplars

  a. Are they only available for `counter` and `histogram`?

  b. We are reporting an exemplar, of the many the app could have reported. Does it have to be the latest? Can it be an "unstable" pick between the many? (The reason I ask is that if different scrapes are ok to report different exemplars, even if the metric hasn't changed, then we don't need to share those exemplars between processes, and we can just keep them in the Metric object itself, instead of persisting to the files.)
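To make the last question concrete, here is a minimal sketch of the "keep exemplars in the Metric object" option. It assumes the answer is that any recent exemplar is acceptable, which is still open. `InMemoryExemplars`, `record` and `suffix_for` are hypothetical names, and label-value escaping is omitted. Each process remembers only the latest exemplar it saw per label set, so two scrapes served by different processes could show different exemplars.

```ruby
# Hypothetical class for illustration; not part of the client library.
class InMemoryExemplars
  Exemplar = Struct.new(:labels, :value, :timestamp)

  def initialize
    @mutex = Mutex.new
    @latest = {}
  end

  # Remembers the most recent exemplar for a label set, in this process only.
  # Nothing is written to the shared files.
  def record(series_labels, exemplar_labels, value, timestamp = Time.now.to_f)
    @mutex.synchronize do
      @latest[series_labels] = Exemplar.new(exemplar_labels, value, timestamp)
    end
  end

  # Returns the exemplar suffix to append to a sample line, or an empty
  # string if this process has not recorded one for that label set.
  def suffix_for(series_labels)
    exemplar = @mutex.synchronize { @latest[series_labels] }
    return '' unless exemplar

    labels = exemplar.labels.map { |k, v| "#{k}=\"#{v}\"" }.join(',')
    " # {#{labels}} #{exemplar.value} #{exemplar.timestamp}"
  end
end

# Usage: the sample line gains a trailing `# {trace_id="abc123"} ...` section.
exemplars = InMemoryExemplars.new
exemplars.record({ code: '200' }, { trace_id: 'abc123' }, 1.0, 1590078339.2)
puts "http_server_requests_total{code=\"200\"} 10.0#{exemplars.suffix_for({ code: '200' })}"
```

If exemplars instead have to be stable across scrapes, this approach would not be enough and they would need to go through the store like the metric values do.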