At GitLab, we've been running prometheus-client-mmap to record and report Prometheus metrics. It's an early fork of this project and uses a C extension with `mmap()` to optimize both the read and write paths.
We'd like to switch to `DirectFileStore`, but we see a number of issues:
- File descriptor usage: I don't believe `DirectFileStore` recording one metric per file is going to work at scale. With thousands of metrics, that's a lot of file descriptors that may have to be opened to read and write metrics. In `prometheus-client-mmap` we use one file per metric type (counter, histogram, gauge). I see there was an attempt to use one file per process in "One file per process instead of per metric/process for DirectFileStore [DOESNT WORK]" (#161).
- Read performance (related to "CPU usage growing over time with DirectFileStore" #232, "DirectFileStore creates too many files for long running applications" #143, and "slow response on metrics http endpoint" #194): Aggregating thousands of metrics in Ruby is pretty CPU and memory intensive. We found that this was difficult to do efficiently even with a Go exporter that reads the `prometheus-client-mmap` metrics, since garbage collection becomes an issue due to memory allocations. We have a prototype Rust port that handles reading of metrics and is faster than the C implementation.
- Aggregating metrics in a separate process: The metrics stored by `prometheus-client-mmap` can be aggregated by a separate process since the metric type is in the filename. With `DirectFileStore`, I believe the types are registered in the registry, so there's no way an outside process can determine the metric type just by scanning the `.bin` files.
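To illustrate the last point, here is a minimal sketch of how an outside process could group metric files by type when the type is part of the filename. The `<type>_<pid>.db` naming scheme, the `KNOWN_TYPES` list, and both helper names are assumptions made for this example; they are not the exact layout used by either library.

```ruby
# Hypothetical naming scheme for this sketch: "<type>_<pid>.db",
# for example "counter_1234.db". Not the actual on-disk layout.
KNOWN_TYPES = %w[counter gauge histogram summary].freeze

# Returns the metric type as a symbol, or nil when the filename
# does not start with a recognized type prefix.
def metric_type_from_filename(path)
  prefix = File.basename(path).split('_', 2).first
  KNOWN_TYPES.include?(prefix) ? prefix.to_sym : nil
end

# Groups file paths by the type encoded in their names, so an
# aggregator can pick the right merge strategy per group.
def group_files_by_type(paths)
  paths.group_by { |path| metric_type_from_filename(path) }
end
```

A file whose name carries only the metric name falls into the `nil` group here, which is the situation described above: the aggregator cannot tell how to merge it without access to the registry.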
I would like to propose a path forward:
- Switch `DirectFileStore` to use one file per process, or one file per metric type. The latter is simpler from our experience. If we do the former, we'd probably want to encode the metric type in the file.
- Add an optional Rust extension for reading metrics generated by `DirectFileStore`. We can easily adapt the work in our Rust port for `DirectFileStore`.
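If the file-per-process route were chosen, one way to encode the metric type in the file would be a small fixed-size header. The sketch below is purely illustrative: the `PMET` marker, the type codes, and the 8-byte layout are all hypothetical and do not describe any existing format.

```ruby
# Hypothetical 8-byte header: 4-byte marker, 1-byte type code, 3 padding bytes.
MAGIC = 'PMET'.b
TYPE_CODES = { counter: 1, gauge: 2, histogram: 3, summary: 4 }.freeze

# Builds the header for a given metric type.
def encode_header(type)
  code = TYPE_CODES.fetch(type)
  # 'a4' = 4 raw bytes, 'C' = unsigned 8-bit integer, 'x3' = 3 null bytes
  [MAGIC, code].pack('a4Cx3')
end

# Reads the metric type back from the first bytes of a file.
def decode_header(bytes)
  magic, code = bytes.unpack('a4C')
  raise ArgumentError, 'unrecognized file header' unless magic == MAGIC

  type = TYPE_CODES.key(code)
  raise ArgumentError, "unknown type code #{code.inspect}" if type.nil?

  type
end
```

With something like this, a reader in any language could determine the type from the first few bytes without consulting the Ruby registry.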
A side point: I believe the metrics are always written in the native endian format. I propose that we enforce little endian (which is what x86 and ARM64 use) to avoid cross-platform confusion.
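As a sketch of what enforcing little endian could look like on the Ruby side: `Array#pack` has an explicit little-endian double directive (`E`), alongside the native-endian one (`D`). The helper names below are made up for this example, and it assumes values are stored as 8-byte doubles.

```ruby
# Encodes a Float as an 8-byte IEEE 754 double, always little-endian,
# regardless of the host's byte order ('D' would use native order).
def encode_value(value)
  [value].pack('E')
end

# Decodes 8 little-endian bytes back into a Float.
def decode_value(bytes)
  bytes.unpack1('E')
end

encode_value(1.0).bytes # => [0, 0, 0, 0, 0, 0, 240, 63]
```

On x86 and ARM64 the `E` and `D` directives produce the same bytes, so pinning the format would only change behavior on big-endian hosts.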
@dmagliola What do you think?