[spike] Benchmark replacing procfs.AllProcs() with io_uring-based Implementation #2186

Open
@sthaha

What would you like to be added?

Kepler (reboot) currently reads process stats through procfs. The current implementation of procfs.AllProcs() relies on synchronous filesystem operations to gather process information, which can be slow and resource-intensive on systems with a large number of processes. This issue proposes replacing procfs.AllProcs() with an implementation built on io_uring, the Linux kernel's asynchronous I/O interface, so that reads can be submitted in batches instead of one blocking system call at a time.

Goals

Reduce the latency and CPU usage of process information retrieval.
Maintain compatibility with existing procfs.AllProcs() API signatures and functionality.
Ensure robustness and error handling comparable to the current implementation.

Proposed Solution

Integrate io_uring Library: Add a Go library that supports io_uring operations, such as github.com/axboe/liburing-go or https://github.com/pawelgaczynski/giouring (or evaluate alternatives).

Refactor procfs.AllProcs(): Replace synchronous filesystem reads (e.g., /proc/<pid>/*) with io_uring-based asynchronous I/O operations.
Optimize directory scanning and file reading using batched io_uring submissions.
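To make "batched submissions" concrete, here is a minimal sketch of the grouping step only. It makes no io_uring calls. It just shows how per-process read requests could be split into groups no larger than the ring's queue depth, so that each group goes to the kernel in one submission. The helper names (statPaths, batches) and the queue depth are hypothetical and not part of Kepler or any io_uring library.

```go
package main

import "fmt"

// statPaths builds the stat file path for each PID under root.
// Hypothetical helper for illustration only.
func statPaths(root string, pids []int) []string {
	paths := make([]string, 0, len(pids))
	for _, pid := range pids {
		paths = append(paths, fmt.Sprintf("%s/%d/stat", root, pid))
	}
	return paths
}

// batches splits paths into groups of at most depth entries.
// In an io_uring implementation, depth would be bounded by the
// submission queue size chosen when the ring is created.
func batches(paths []string, depth int) [][]string {
	if depth <= 0 {
		depth = 1 // avoid an infinite loop on a bad depth
	}
	var out [][]string
	for start := 0; start < len(paths); start += depth {
		end := start + depth
		if end > len(paths) {
			end = len(paths)
		}
		out = append(out, paths[start:end])
	}
	return out
}

func main() {
	paths := statPaths("/proc", []int{1, 2, 3, 4, 5})
	for i, batch := range batches(paths, 2) {
		// A real implementation would queue one read per path here
		// and submit the whole batch in a single call.
		fmt.Printf("submission %d: %v\n", i, batch)
	}
}
```

The spike should measure how the chosen depth affects latency, since a larger batch means fewer submissions but more buffers held in memory at once.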

Preserve API Contract:

Ensure the function returns the same data structure ([]Proc) and handles errors consistently.
Maintain support for all fields currently populated by procfs.AllProcs().

Add Benchmarks:

Implement performance benchmarks to compare the new io_uring-based implementation against the existing procfs.AllProcs().
Measure latency, CPU usage, and memory footprint under various system loads.

Handle Compatibility:

Add fallback to the original procfs.AllProcs() implementation for systems/kernels where io_uring is not supported (e.g., older Linux versions).
Include runtime checks for io_uring support.
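The fallback could be wired up roughly as follows. This sketch assumes a kernel-release comparison as the runtime check. A version check alone is not sufficient in practice, because io_uring can be blocked by policy (for example seccomp filters) on a new enough kernel, so a real probe would also try to create a ring and fall back on failure. All names here (kernelAtLeast, selectScanner, uringScan, classicScan) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

var errUnsupported = errors.New("kernel older than 5.1, io_uring unavailable")

// kernelAtLeast reports whether a release string such as
// "5.15.0-91-generic" is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// Keep only the leading digits of the minor part ("1-rc1" -> "1").
	end := 0
	for end < len(parts[1]) && parts[1][end] >= '0' && parts[1][end] <= '9' {
		end++
	}
	gotMinor, err := strconv.Atoi(parts[1][:end])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return gotMajor > major || (gotMajor == major && gotMinor >= minor), nil
}

// scanner is a placeholder for whatever signature AllProcs keeps.
type scanner func() (string, error)

// selectScanner returns fast when probe succeeds and fallback
// otherwise, plus a flag and the reason for any fallback.
func selectScanner(probe func() error, fast, fallback scanner) (scanner, bool, error) {
	if err := probe(); err != nil {
		return fallback, false, err
	}
	return fast, true, nil
}

func uringScan() (string, error)   { return "io_uring path (stub)", nil }
func classicScan() (string, error) { return "procfs path (stub)", nil }

func main() {
	probe := func() error {
		raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
		if err != nil {
			return err
		}
		ok, err := kernelAtLeast(strings.TrimSpace(string(raw)), 5, 1)
		if err != nil {
			return err
		}
		if !ok {
			return errUnsupported
		}
		return nil
	}
	scan, usingUring, reason := selectScanner(probe, uringScan, classicScan)
	out, _ := scan()
	fmt.Println(out, "| io_uring:", usingUring, "| fallback reason:", reason)
}
```

Returning the probe error alongside the chosen scanner lets the caller log why the fallback was taken, which lines up with the goal of clear error messages for io_uring-related failures.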

Acceptance Criteria

  • The new implementation is at least 20% faster than the current procfs.AllProcs() in benchmark tests on a system with 1000+ processes.
  • No regressions in functionality (all existing fields in Proc struct are correctly populated).
  • Error handling is robust, with clear error messages for io_uring-related failures.
  • Fallback mechanism works correctly on systems without io_uring support.
  • Code passes all existing unit tests and new tests for the io_uring implementation.
  • Benchmarks are added to the repository to track performance.

Tasks

  • Research and select an appropriate io_uring Go library.
  • Prototype io_uring-based process scanning.
  • Refactor procfs.AllProcs() to use io_uring.
  • Write unit tests for the new implementation.
  • Implement benchmarks comparing old and new implementations.
  • Add fallback logic for non-io_uring systems.
  • Update documentation for procfs.AllProcs() to note the new implementation.
  • Submit PR with changes for review.

Why is this needed?

Additional Context
Why io_uring?: io_uring provides a high-performance, asynchronous I/O interface that can significantly reduce system call overhead and improve scalability for filesystem operations.
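Constraints: The solution must support Linux kernel versions 5.1 and above (5.1 is the first release with io_uring). Note that opening files through the ring (IORING_OP_OPENAT) requires kernel 5.6 or later, so on 5.1 to 5.5 only the reads themselves could go through io_uring.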
Related Issues: Link to any related performance issues or discussions (e.g., #123 if applicable).

Credits

Thanks to @cmcantalupo (GeoPM) for the pointer to io_uring.
