Description
What would you like to be added?
Kepler (reboot) currently reads process stats through procfs. The current procfs.AllProcs() implementation gathers process information with synchronous filesystem operations, which can be slow and resource-intensive on systems with a large number of processes. This issue proposes replacing procfs.AllProcs() with a more efficient implementation built on io_uring, the Linux asynchronous I/O interface, so that many reads can be submitted and completed in batches.
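For reference, the synchronous pattern described above can be sketched roughly as follows. This is not the procfs library's code: `parsePID` and `readStatsSync` are hypothetical names, and reading only the `stat` file is an assumption made to keep the example short. It is meant to show where the per-process open/read/close cost comes from.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// parsePID reports whether a directory entry name looks like a process ID.
func parsePID(name string) (int, bool) {
	pid, err := strconv.Atoi(name)
	if err != nil || pid <= 0 {
		return 0, false
	}
	return pid, true
}

// readStatsSync reads <root>/<pid>/stat for every numeric entry under root.
// Each process costs one blocking open/read/close sequence, so the number of
// system calls grows linearly with the number of processes.
func readStatsSync(root string) (map[int][]byte, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, fmt.Errorf("list %s: %w", root, err)
	}
	stats := make(map[int][]byte, len(entries))
	for _, e := range entries {
		pid, ok := parsePID(e.Name())
		if !ok {
			continue // skip non-process entries such as "self" or "meminfo"
		}
		data, err := os.ReadFile(filepath.Join(root, e.Name(), "stat"))
		if err != nil {
			continue // the process may have exited between listing and reading
		}
		stats[pid] = data
	}
	return stats, nil
}

func main() {
	stats, err := readStatsSync("/proc")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("read stat for %d processes\n", len(stats))
}
```

An io_uring-based variant would aim to queue these reads and collect the completions in batches instead of blocking on each file in turn.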
Goals
- Reduce the latency and CPU usage of process information retrieval.
- Maintain compatibility with existing procfs.AllProcs() API signatures and functionality.
- Ensure robustness and error handling comparable to the current implementation.
Proposed Solution
- Integrate an io_uring library:
  - Add a Go library that supports io_uring operations, such as github.com/axboe/liburing-go or https://github.com/pawelgaczynski/giouring (or evaluate alternatives).
- Refactor procfs.AllProcs():
  - Replace synchronous filesystem reads (e.g., /proc/[pid]/*) with io_uring-based asynchronous I/O operations.
  - Optimize directory scanning and file reading using batched io_uring submissions.
- Preserve API contract:
  - Ensure the function returns the same data structure ([]Proc) and handles errors consistently.
  - Maintain support for all fields currently populated by procfs.AllProcs().
- Add benchmarks:
  - Implement performance benchmarks to compare the new io_uring-based implementation against the existing procfs.AllProcs().
  - Measure latency, CPU usage, and memory footprint under various system loads.
- Handle compatibility:
  - Add a fallback to the original procfs.AllProcs() implementation for systems/kernels where io_uring is not supported (e.g., older Linux versions).
  - Include runtime checks for io_uring support.
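The fallback decision could start from something like the sketch below. It only gates on the kernel release string, using the 5.1 minimum stated in this issue. The function names are hypothetical, and a version check alone is an assumption-laden shortcut: io_uring can be blocked by seccomp profiles or disabled by the administrator even on new kernels, so a real runtime check would also need to try setting up a ring and fall back if that fails.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Minimum kernel version, taken from the constraint stated in this issue.
const (
	minMajor = 5
	minMinor = 1
)

// parseKernelRelease extracts the major and minor version from a release
// string such as "5.15.0-91-generic" or "6.4-rc1".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor component may carry a suffix, e.g. "4-rc1".
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	minor, err = strconv.Atoi(minorStr)
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// kernelMayHaveIoUring reports whether the release meets the minimum version.
// Unparseable release strings are treated as unsupported.
func kernelMayHaveIoUring(release string) bool {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false
	}
	return major > minMajor || (major == minMajor && minor >= minMinor)
}

// selectScanner names the implementation to use (hypothetical labels).
func selectScanner(release string) string {
	if kernelMayHaveIoUring(release) {
		return "io_uring"
	}
	return "procfs"
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release, using procfs fallback:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	fmt.Printf("kernel %s -> %s scanner\n", release, selectScanner(release))
}
```

Treating every parse failure as "unsupported" keeps the existing procfs path as the safe default.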
Acceptance Criteria
- The new implementation is at least 20% faster than the current procfs.AllProcs() in benchmark tests on a system with 1000+ processes.
- No regressions in functionality (all existing fields in Proc struct are correctly populated).
- Error handling is robust, with clear error messages for io_uring-related failures.
- Fallback mechanism works correctly on systems without io_uring support.
- Code passes all existing unit tests and new tests for the io_uring implementation.
- Benchmarks are added to the repository to track performance.
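One way to check the 20% criterion is sketched below. It assumes "20% faster" means a 20% reduction in time per operation, which this issue does not spell out. `speedupPercent`, `meetsTarget`, and `listProc` are hypothetical names, and `listProc` is only a stand-in workload: both measurements run the same function here, so the reported speedup will be near zero until the candidate is swapped for the io_uring-based scan.

```go
package main

import (
	"fmt"
	"os"
	"testing"
)

// speedupPercent returns the reduction in time per operation as a
// percentage of the baseline. A non-positive baseline yields 0.
func speedupPercent(baselineNs, candidateNs float64) float64 {
	if baselineNs <= 0 {
		return 0
	}
	return (baselineNs - candidateNs) * 100 / baselineNs
}

// meetsTarget reports whether the candidate is at least targetPercent faster.
func meetsTarget(baselineNs, candidateNs, targetPercent float64) bool {
	return speedupPercent(baselineNs, candidateNs) >= targetPercent
}

// listProc is a stand-in workload that lists /proc once.
func listProc() error {
	_, err := os.ReadDir("/proc")
	return err
}

// measure runs fn under the standard benchmark driver and returns ns/op.
func measure(fn func() error) float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := fn(); err != nil {
				b.Fatal(err)
			}
		}
	})
	return float64(r.NsPerOp())
}

func main() {
	testing.Init() // needed when calling testing.Benchmark outside "go test"
	baseline := measure(listProc)
	candidate := measure(listProc) // replace with the io_uring-based scan
	fmt.Printf("baseline %.0f ns/op, candidate %.0f ns/op, speedup %.1f%%, meets 20%% target: %v\n",
		baseline, candidate, speedupPercent(baseline, candidate),
		meetsTarget(baseline, candidate, 20))
}
```

In the repository itself, the same comparison would more naturally live in ordinary `Benchmark...` functions run with `go test -bench`, on a host with 1000+ processes as the criterion requires.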
Tasks
- Research and select an appropriate io_uring Go library.
- Prototype io_uring-based process scanning.
- Refactor procfs.AllProcs() to use io_uring.
- Write unit tests for the new implementation.
- Implement benchmarks comparing old and new implementations.
- Add fallback logic for non-io_uring systems.
- Update documentation for procfs.AllProcs() to note the new implementation.
- Submit PR with changes for review.
Why is this needed?
Additional Context
Why io_uring? io_uring provides a high-performance, asynchronous I/O interface: requests are queued in a ring shared with the kernel and submitted in batches, which can significantly reduce system call overhead and improve scalability for filesystem operations.
Constraints: The solution must support Linux kernel versions 5.1 and above (minimum for io_uring support).
Related Issues: Link to any related performance issues or discussions (e.g., #123 if applicable).
Credits
Thanks to @cmcantalupo (GeoPM) for the pointer to using io_uring.