|
| 1 | +# Kepler Exporter Design |
| 2 | + |
| 3 | +This document provides a simple overview of the design of the Kepler |
| 4 | +Exporter, a tool for measuring and reporting the power utilization of a |
| 5 | +system and its running processes. |
| 6 | + |
| 7 | +> **_NOTE_**: This guide is not intended to replace the detailed Kepler |
| 8 | +[overview](https://sustainable-computing.io/usage/deep_dive/) but rather provide |
| 9 | +a starting point for a new contributor to this repo. |
| 10 | + |
| 11 | +Kepler monitors system processes by tracking task switches in the kernel and |
| 12 | +logging stats. These stats are then used to estimate the power usage of the system |
| 13 | +and its associated processes. Kepler collects power data using: |
| 14 | + |
| 15 | +- eBPF/Hardware Counters |
| 16 | +- Real-time Component Power Meters (e.g., RAPL) |
| 17 | +- Platform Power Meters (ACPI/IPMI, etc.) |
| 18 | + |
| 19 | +Below is a high-level representation of the Kepler Exporter components: |
| 20 | + |
| 21 | + |
| 22 | + |
| 23 | +Metrics in Kepler can be broken down into two categories: |
| 24 | + |
| 25 | +1. Resource metrics. |
| 26 | +1. Energy metrics. |
| 27 | + |
| 28 | +## Exporter Introduction |
| 29 | + |
| 30 | +Package: `cmd/exporter` |
| 31 | + |
| 32 | +The `Exporter` is the main Kepler program, performing the following operations: |
| 33 | + |
| 34 | +- Starting various power collection implementations needed to collect metrics from |
| 35 | +the platform and its components (DRAM, uncore, core, package). |
| 36 | +- Creating a BPF exporter. |
| 37 | +- Creating a collector manager to collect and expose the collected metrics. |
| 38 | +- Creating an HTTP endpoint that exposes metrics to Prometheus. |
| 39 | + |
| 40 | +Below is the startup sequence for the Kepler Exporter: |
| 41 | + |
| 42 | + |
| 43 | + |
| 44 | +> **_NOTE_**: Depending on the environment that Kepler was deployed in, |
| 45 | +the system power consumption metrics collection will vary (Baremetal/VM). |
| 46 | +For more details on this please see the [detailed documentation][1]. |
| 47 | + |
| 48 | +The following sections will cover the main functionality of the various Kepler |
| 49 | +components. |
| 50 | + |
| 51 | +## BPF Exporter |
| 52 | + |
| 53 | +Package: `pkg/bpf` |
| 54 | + |
| 55 | +The BPF exporter is created in the main Kepler program through |
| 56 | +the collector manager instantiation: |
| 57 | + |
| 58 | +```golang |
| 59 | + m := manager.New(bpfExporter) |
| 60 | +``` |
| 61 | + |
| 62 | +The role of the BPF exporter is to set up the BPF programs that collect |
| 63 | +the low-level resource metrics associated with each process. Its |
| 64 | +functionality includes: |
| 65 | + |
| 66 | +1. Modifying the eBPF program sampling rate and number of CPUs. |
| 67 | +1. Loading the eBPF program. |
| 68 | +1. Attaching the `KeplerSchedSwitchTrace` eBPF program. |
| 69 | +1. Attaching the `KeplerIrqTrace` eBPF program if `config.ExposeIRQCounterMetrics` |
| 70 | + is enabled. |
| 71 | +1. Initializing `enabledSoftwareCounters`. |
| 72 | +1. Attaching the `KeplerReadPageTrace` eBPF program. |
| 73 | +1. Attaching the `KeplerWritePageTrace` eBPF program. |
| 74 | +1. If `config.ExposeHardwareCounterMetrics` is enabled, it creates the following |
| 75 | + hardware events: |
| 76 | + 1. CpuInstructionsEventReader |
| 77 | + 1. CpuCyclesEventReader |
| 78 | + 1. CacheMissEventReader |
| 79 | + |
| 80 | + It also initializes `enabledHardwareCounters`. |
| 81 | + |
| 82 | +## CollectorManager |
| 83 | + |
| 84 | +The Kepler Exporter (`cmd/exporter`) creates an instance of the `CollectorManager`. |
| 85 | +The `CollectorManager` contains the following items: |
| 86 | + |
| 87 | +- `StatsCollector` that is responsible for collecting resource and energy consumption |
| 88 | + metrics. It uses the model implementation to estimate the process energy (total |
| 89 | + and per component) and node energy (using the resource stats). |
| 90 | +- `PrometheusCollector` which is a Prometheus exporter that exposes the Kepler |
| 91 | + metrics on a Prometheus-friendly URL. |
| 92 | +- `Watcher` that watches the Kubernetes API server for pod events. |
| 93 | + |
| 94 | +```golang |
| 95 | +type CollectorManager struct { |
| 96 | + // StatsCollector is responsible for collecting resource and energy consumption metrics and calculating them when needed |
| 97 | + StatsCollector *collector.Collector |
| 98 | + |
| 99 | + // PrometheusCollector implements the external Collector interface provided by the Prometheus client |
| 100 | + PrometheusCollector *exporter.PrometheusExporter |
| 101 | + |
| 102 | + // Watcher registers with the Kubernetes API server to watch for pod events and add or remove pods from the ContainerStats map |
| 103 | + Watcher *kubernetes.ObjListWatcher |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +On initialization the `CollectorManager` also creates the power estimator models. |
| 108 | + |
| 109 | +### StatsCollector |
| 110 | + |
| 111 | +Package: `pkg/manager` |
| 112 | + |
| 113 | +`StatsCollector` is responsible for updating the following stats: |
| 114 | + |
| 115 | +- Node Stats |
| 116 | +- Process stats |
| 117 | +- Container stats |
| 118 | +- VM stats |
| 119 | + |
| 120 | +> **_NOTE_**: These stats are updated by various subsystems. |
| 121 | +
|
| 122 | +When the [collector manager](#collectormanager) is started by the Kepler Exporter, |
| 123 | +it kicks off an endless loop that updates the stats periodically. |
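The periodic update loop described above can be sketched in Go with a `time.Ticker`. This is a minimal illustration, not Kepler's actual code: `runPeriodically`, `runDemo`, the 10 ms interval, and the stand-in update function are all hypothetical names and values chosen for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPeriodically calls update once per interval until ctx is cancelled.
// Hypothetical helper: it only illustrates the "loop and refresh stats" idea.
func runPeriodically(ctx context.Context, interval time.Duration, update func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Re-check cancellation so update never runs after a stop request.
			if ctx.Err() != nil {
				return
			}
			update()
		}
	}
}

// runDemo drives the loop with a stand-in update function that cancels the
// context after three updates, and returns how many updates ran.
func runDemo() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	updates := 0
	runPeriodically(ctx, 10*time.Millisecond, func() {
		updates++ // assumption: stands in for refreshing node/process stats
		if updates == 3 {
			cancel()
		}
	})
	return updates
}

func main() {
	fmt.Println("updates:", runDemo())
}
```

In a long-running exporter the context would typically be cancelled only on shutdown, which makes the loop effectively endless.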
| 124 | + |
| 125 | +For process statistics, the Process collector uses a BPF process collector to |
| 126 | +retrieve information collected by the Kepler BPF programs and stored in BPF |
| 127 | +maps. This information includes the process/executable name, PID, cgroup, |
| 128 | +CPU cycles, CPU instructions, cache misses, and cache hits. The BPF process |
| 129 | +collector checks if these processes belong to a VM or container. It also |
| 130 | +aggregates all the kernel processes' metrics (which have a cgroup |
| 131 | +ID of 1 and a PID of 1). If GPU statistics are available per process, |
| 132 | +the stats are extended to include GPU compute and memory utilization. |
| 133 | + |
| 134 | +Node energy stats are also retrieved (if collection is supported). These stats |
| 135 | +include the underlying component stats (core, uncore, dram, package, gpus, ...), |
| 136 | +as well as the overall platform stats (Idle + Dynamic energy), and the |
| 137 | +process energy consumption. The process energy consumption is estimated |
| 138 | +using its resource utilization and the node components' energy consumption. |
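To make the idea of attributing node energy by resource utilization concrete, here is a simplified sketch of a ratio-style split. It is an assumption-laden illustration rather than Kepler's implementation: `ratioEnergy`, the `procUsage` type, and the sample numbers are hypothetical, and idle energy is deliberately left out.

```go
package main

import "fmt"

// ratioEnergy attributes a share of the node's dynamic energy (in mJ) to a
// process in proportion to its share of a resource usage counter.
// Hypothetical helper for illustration only.
func ratioEnergy(nodeDynamicMilliJoules, processUsage, nodeUsage uint64) uint64 {
	if nodeUsage == 0 {
		// No usage recorded on the node: nothing to attribute.
		return 0
	}
	share := float64(processUsage) / float64(nodeUsage)
	return uint64(float64(nodeDynamicMilliJoules) * share)
}

// procUsage pairs a process name with a usage counter (for example CPU
// instructions). Both fields are made up for this example.
type procUsage struct {
	name  string
	usage uint64
}

func main() {
	const nodeDynamic = 1000 // assumed node dynamic energy in mJ
	procs := []procUsage{{"proc-a", 300}, {"proc-b", 700}}

	// The node-level usage is the sum of the per-process counters.
	var total uint64
	for _, p := range procs {
		total += p.usage
	}
	for _, p := range procs {
		fmt.Printf("%s: %d mJ\n", p.name, ratioEnergy(nodeDynamic, p.usage, total))
	}
}
```

With these sample values, the process responsible for 30% of the counted usage is assigned 30% of the node's dynamic energy.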
| 139 | + |
| 140 | +`StatsCollector` eventually passes all the metrics it collects to `pkg/model` through |
| 141 | +`UpdateProcessEnergy()`, which estimates the power consumption of each process. |
| 142 | + |
| 143 | +> **_NOTE_**: For details on the Ratio Power Model, refer to this [explanation][2]. |
| 144 | +
|
| 145 | +### PrometheusCollector |
| 146 | + |
| 147 | +Package: `pkg/manager` |
| 148 | + |
| 149 | +`PrometheusCollector` supports multiple collectors: container, node, VM and process. |
| 150 | +The various collectors implement the `prometheus.Collector` interface. Each of these |
| 151 | +collectors fetches the Kepler metrics and exposes them on a Prometheus-friendly URL. |
| 152 | +In Kepler, the stats structures are shared between the PrometheusCollector(s) and |
| 153 | +the StatsCollector(s). |
| 154 | + |
| 155 | +The `prometheus.Collector` interface defines the following functions: |
| 156 | + |
| 157 | +- `Describe` sends the super-set of all possible descriptors of metrics |
| 158 | + collected by this Collector to the provided channel and returns once |
| 159 | + the last descriptor has been sent. |
| 160 | +- `Collect` is called by the Prometheus registry when collecting metrics. |
| 161 | + The implementation sends each collected metric via the provided channel |
| 162 | + and returns once the last metric has been sent. |
| 163 | + |
| 164 | +## Power Model Estimator |
| 165 | + |
| 166 | +The power model estimator estimates power usage from the low-level resource stats. |
| 167 | + |
| 168 | +> **_NOTE_**: For more details, please see the [power estimation documentation][3]. |
| 169 | +
|
| 170 | +[1]: https://sustainable-computing.io/usage/deep_dive/#collecting-system-power-consumption-vms-versus-bms |
| 171 | +[2]: https://sustainable-computing.io/usage/deep_dive/#ratio-power-model-explained |
| 172 | +[3]: https://sustainable-computing.io/kepler_model_server/power_estimation/ |