https://github.com/sustainable-computing-io/kepler

They’re trying to figure out how to attribute power consumption to individual containers.

They measure execution time at the kernel level.

To get power figures they have to measure the instructions the CPU executes, but power per instruction isn’t consistent because of what the CPU/kernel does to save power (e.g., the governor clocking the CPU down).

Approach: use software counters to measure power consumption, use ML models to approximate it (given the inconsistency above), and use hardware resource utilization to attribute power consumption to processes/containers/pods.
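A minimal sketch of utilization-based attribution, assuming the simplest possible policy: split the node's above-idle power across containers in proportion to each one's CPU time over the same interval. The function name, the container names, and the choice to leave idle power unattributed are all hypothetical, not Kepler's actual code.

```python
# Hypothetical sketch: proportional attribution of node power by CPU-time share.
from typing import Dict


def attribute_power(node_power_watts: float,
                    cpu_time_ms: Dict[str, float],
                    idle_power_watts: float = 0.0) -> Dict[str, float]:
    """Split dynamic (above-idle) node power across containers by CPU-time share.

    Idle power is deliberately left unattributed here; how to divide it is a
    separate policy decision.
    """
    dynamic = max(node_power_watts - idle_power_watts, 0.0)
    total = sum(cpu_time_ms.values())
    if total == 0:
        # No CPU activity in this interval: nothing to attribute.
        return {name: 0.0 for name in cpu_time_ms}
    return {name: dynamic * t / total for name, t in cpu_time_ms.items()}


if __name__ == "__main__":
    shares = attribute_power(120.0, {"web": 300.0, "db": 600.0, "batch": 100.0},
                             idle_power_watts=20.0)
    print(shares)  # {'web': 30.0, 'db': 60.0, 'batch': 10.0}
```

Proportional splitting is only one option; a real attributor could weight other resources (memory, GPU) too, which is where the models come in.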

They built Kepler (Kubernetes-based Efficient Power Level Exporter).

It uses eBPF to measure runtimes.
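To make the runtime measurement concrete, here is a user-space simulation of the bookkeeping an eBPF program hooked on context switches could do: record when a task gets a CPU, and charge it the elapsed time when it is switched out. The event type and field names are hypothetical; the real program runs in the kernel and its exact hook points may differ.

```python
# Hypothetical user-space simulation of per-PID on-CPU time accounting,
# in the spirit of an eBPF program attached to context switches.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SwitchEvent:
    ts_ns: int      # timestamp of the context switch, in nanoseconds
    prev_pid: int   # task leaving the CPU
    next_pid: int   # task getting the CPU


def on_cpu_time(events: List[SwitchEvent]) -> Dict[int, int]:
    """Sum the nanoseconds each PID spent on a CPU."""
    switched_in_at: Dict[int, int] = {}  # pid -> ts when it last got a CPU
    runtime: Dict[int, int] = {}
    for ev in sorted(events, key=lambda e: e.ts_ns):
        t0 = switched_in_at.pop(ev.prev_pid, None)
        if t0 is not None:  # skip tasks already running before tracing began
            runtime[ev.prev_pid] = runtime.get(ev.prev_pid, 0) + ev.ts_ns - t0
        switched_in_at[ev.next_pid] = ev.ts_ns
    return runtime


if __name__ == "__main__":
    trace = [
        SwitchEvent(0, prev_pid=0, next_pid=100),
        SwitchEvent(5_000_000, prev_pid=100, next_pid=200),
        SwitchEvent(8_000_000, prev_pid=200, next_pid=100),
        SwitchEvent(10_000_000, prev_pid=100, next_pid=0),
    ]
    print(on_cpu_time(trace))  # {100: 7000000, 200: 3000000}
```

Keying by PID works across CPUs because a task can only be running on one CPU at a time.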

eBPF gathers the info and feeds it into an aggregation mechanism. They then model the server’s energy usage from several sources (CPU, memory, GPU, cgroupfs, hwmon) and export the result to Prometheus.
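One simple way to "model the server's energy usage" is a linear model: idle power plus a weighted sum of per-component utilization features. The sketch below assumes that form; the feature names and coefficients are made-up illustrations, not values Kepler uses.

```python
# Hypothetical linear power model: intercept (idle watts) plus a weighted sum
# of resource-utilization features. Coefficients here are illustrative only.
from typing import Dict


def estimate_power(features: Dict[str, float],
                   coefficients: Dict[str, float],
                   intercept_watts: float) -> float:
    """Return estimated power in watts: intercept + sum(coef * feature)."""
    return intercept_watts + sum(
        coefficients.get(name, 0.0) * value  # unknown features contribute 0
        for name, value in features.items()
    )


if __name__ == "__main__":
    coeffs = {"cpu_util": 80.0, "mem_util": 10.0, "gpu_util": 150.0}
    sample = {"cpu_util": 0.5, "mem_util": 0.4, "gpu_util": 0.0}
    print(estimate_power(sample, coeffs, intercept_watts=30.0))
```

A linear form keeps the estimate cheap to evaluate on every scrape; where a node exposes measured power (e.g., via hwmon), the same features can be used to fit the coefficients.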

They export the “energy stats” as metric counters.
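Exporting energy as a counter means a monotonically increasing joules total; average watts over a window is then the counter's increase divided by elapsed time, which is what a PromQL `rate()` over such a counter approximates. A minimal sketch, assuming a hypothetical metric name `example_container_joules_total` and a hand-rendered Prometheus text exposition format:

```python
# Hypothetical sketch: accumulate estimated power into a joules counter,
# render it in Prometheus text exposition format, and recover average watts.
from typing import Dict


class EnergyCounter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.joules: Dict[str, float] = {}  # container -> cumulative joules

    def add(self, container: str, watts: float, interval_s: float) -> None:
        # Energy (J) = power (W) x time (s); counters must never decrease.
        if watts < 0 or interval_s < 0:
            raise ValueError("counter increments must be non-negative")
        self.joules[container] = self.joules.get(container, 0.0) + watts * interval_s

    def expose(self) -> str:
        lines = [f"# TYPE {self.name} counter"]
        for container, value in sorted(self.joules.items()):
            lines.append(f'{self.name}{{container="{container}"}} {value}')
        return "\n".join(lines) + "\n"


def average_watts(j0: float, t0: float, j1: float, t1: float) -> float:
    """Average power between two (joules, seconds) samples of the counter."""
    return (j1 - j0) / (t1 - t0)


if __name__ == "__main__":
    c = EnergyCounter("example_container_joules_total")
    c.add("web", 30.0, 15.0)  # 450 J after the first 15 s
    c.add("web", 30.0, 15.0)  # 900 J after 30 s
    print(c.expose())
    print(average_watts(450.0, 15.0, 900.0, 30.0))
```

Counters survive scrape gaps better than instantaneous gauges: a missed scrape loses resolution, not energy.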

The Kepler process emits data into Prometheus, but Kepler itself doesn’t necessarily know how to do power estimates; it downloads the model from a “model server”. The model server also consumes the Prometheus data to update itself.
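The feedback loop can be sketched with the simplest possible trainer: fit a one-feature ordinary least squares model of measured watts against CPU utilization, then serialize it so an exporter could download it. The JSON field names are hypothetical, and the actual model server may train different model types on richer features.

```python
# Hypothetical model-server sketch: fit watts ~ cpu_util by ordinary least
# squares and publish the coefficients as JSON for an exporter to download.
import json
from typing import List, Tuple


def fit_ols(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (intercept, slope) minimizing squared error of watts ~ util."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("utilization must vary to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def publish_model(samples: List[Tuple[float, float]]) -> str:
    intercept, slope = fit_ols(samples)
    return json.dumps({"intercept_watts": intercept, "cpu_util_coef": slope})


if __name__ == "__main__":
    observed = [(0.0, 30.0), (0.5, 70.0), (1.0, 110.0)]  # (util, measured W)
    print(publish_model(observed))
```

Until enough samples exist to fit anything, an exporter would need some default model to fall back on, which is the bootstrap question below.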

Q: How do you bootstrap the power metrics given you need to have the model before you know what to emit?

  • They start with a janky initial model.

Q: How much energy overhead does kepler add? :)

  • They actually publish a dashboard for that: ~0.02 cores, 150 MB, ~100 packets/second (I think?)