SREcon by Vijay Samuel

Scale: 1.25M endpoints scraped · 41M samples ingested per second · 2B active time series · 8k QPS · 1 year of raw metric retention

Legacy:

  • modified version of OpenTSDB
  • HBase scaling/availability issues
  • custom protocols / DR inefficiencies / data inconsistency
  • too much tribal knowledge required to operate

Goals:

  • better scaling profile
  • community-focused ingest (standard, open protocols)
  • richer ad-hoc query support (PromQL)
  • support cloud-native monitoring

Prom issues:

  • no HTTP push support (at the time)
  • federating hundreds of Prometheus instances is tedious
  • scaling is hard (vertical only, or manual sharding)

Initial goals:

  • keep it simple: fan out writes as necessary
  • any duplication is dealt with at query time (dedup sketch after this list)
  • use the Prometheus TSDB as-is
  • simple tenanting based on namespaces
  • integrate tightly with k8s: discover shards using the kube API, with an operator to manage clusters
  • support both Grafana & PromQL
  • support alerting & recording rules
  • support 2x the cardinality limits of the legacy system
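
A minimal Go sketch of the "fan out writes, dedup at query time" idea. Everything here (the sample type, channel-based shard transport, the dedup key) is an illustrative assumption, not eBay's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// One sample as returned from a shard. Labels are flattened into a
// canonical string for brevity; a real system keys on full label sets.
type sample struct {
	series string
	ts     int64
	value  float64
}

// fanOutWrite sends every batch to all shards; duplication is accepted
// on the write path and resolved at read time. Transport is stubbed out
// with channels here.
func fanOutWrite(batch []sample, shards []chan []sample) {
	var wg sync.WaitGroup
	for _, sh := range shards {
		wg.Add(1)
		go func(sh chan []sample) {
			defer wg.Done()
			sh <- batch
		}(sh)
	}
	wg.Wait()
}

// dedupe merges shard results, keeping one sample per (series, timestamp);
// any surviving copy is equivalent, so overwriting is safe.
func dedupe(results ...[]sample) []sample {
	seen := map[string]sample{}
	for _, r := range results {
		for _, s := range r {
			seen[fmt.Sprintf("%s@%d", s.series, s.ts)] = s
		}
	}
	out := make([]sample, 0, len(seen))
	for _, s := range seen {
		out = append(out, s)
	}
	return out
}

func main() {
	a := []sample{{`up{namespace="checkout"}`, 1000, 1}}
	b := []sample{{`up{namespace="checkout"}`, 1000, 1}, {`up{namespace="checkout"}`, 2000, 1}}
	fmt.Println(len(dedupe(a, b))) // 2: the overlapping sample collapses
}
```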

Attempt 1: Centralized approach

  • couldn’t keep up with new use cases (100% YoY growth for many years)
  • tenant-based query routing is hard: every query must carry the namespace keyword (routing sketch below)
  • hard to support high cardinality (horizontal scaling limits, plus PromQL’s single-threadedness)
  • too many dependencies (load balancers, cross-region networking, etc.)
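
What "requires the namespace keyword always" looks like in practice: the router can only pick a tenant's shards if the query carries a namespace matcher. A hedged sketch; all names are hypothetical, not the real router:

```go
package main

import "fmt"

// routeQuery picks backend shards from the namespace matcher on the
// query. No matcher, no routing: the failure mode that made tenant-based
// routing painful.
func routeQuery(matchers map[string]string, shardsByNS map[string][]string) ([]string, error) {
	ns, ok := matchers["namespace"]
	if !ok {
		return nil, fmt.Errorf("query has no namespace matcher; cannot route")
	}
	shards, ok := shardsByNS[ns]
	if !ok {
		return nil, fmt.Errorf("unknown namespace %q", ns)
	}
	return shards, nil
}

func main() {
	shardsByNS := map[string][]string{"checkout": {"tsdb-7", "tsdb-8"}}
	fmt.Println(routeQuery(map[string]string{"namespace": "checkout"}, shardsByNS))
	fmt.Println(routeQuery(map[string]string{"job": "cart"}, shardsByNS)) // fails: no namespace
}
```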

Attempt 2: introduce aggregators

  • aggregators drop high-cardinality labels before anything reaches storage, so clients don’t have to pre-aggregate (sketch below)
  • 60% CPU / 80% memory savings
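
A sketch of that label-dropping aggregation, with label sets simplified to maps (an illustration of the technique, not eBay's aggregator): samples whose label sets collapse together once the dropped labels are removed get summed, so far fewer series reach storage.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type sample struct {
	labels map[string]string
	value  float64
}

// key builds a canonical series key after removing the dropped labels.
func key(ls map[string]string, drop map[string]bool) string {
	var parts []string
	for k, v := range ls {
		if !drop[k] {
			parts = append(parts, k+"="+v)
		}
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// aggregate sums samples that become identical once labels are dropped;
// the shrunken series set is where the CPU/memory savings come from.
func aggregate(in []sample, drop map[string]bool) map[string]float64 {
	out := map[string]float64{}
	for _, s := range in {
		out[key(s.labels, drop)] += s.value
	}
	return out
}

func main() {
	in := []sample{
		{labels: map[string]string{"app": "cart", "pod": "cart-0"}, value: 3},
		{labels: map[string]string{"app": "cart", "pod": "cart-1"}, value: 4},
	}
	fmt.Println(aggregate(in, map[string]bool{"pod": true})) // map[app=cart:7]
}
```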

Five people wrote a multi-region POC of Monarch in 5 days

  • they don’t have the in-house tech Google has
  • spec management is hard
  • the whitepaper didn’t fully describe the system

Attributes of “planet scale”

  • data needs to be close to the source (deployed per region/AZ/k8s cluster)
  • rollups get bubbled up
    • raw on leaf,
    • zone has region data,
    • root = high-level rollups
  • queries are federated
    • the field-hints index was helpful (Monarch paper; “bloom filter”-esque); see the sketch after this list
  • “control loops watch over git and deliver the changes into the cluster”
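
A toy version of the field-hints idea, assuming trigram hints over label values (the Monarch paper's exact encoding differs): a parent fans a query out only to children whose hint set might contain a match. Like a bloom filter, false positives are acceptable; false negatives are not.

```go
package main

import "fmt"

// trigrams returns the set of 3-byte substrings of s.
func trigrams(s string) map[string]bool {
	out := map[string]bool{}
	for i := 0; i+3 <= len(s); i++ {
		out[s[i:i+3]] = true
	}
	return out
}

// mightHave reports whether a child could hold a series whose label value
// equals v: every trigram of v must appear in the child's hints.
func mightHave(hints map[string]bool, v string) bool {
	for t := range trigrams(v) {
		if !hints[t] {
			return false
		}
	}
	return true
}

// route returns the children worth querying for label value v.
func route(children map[string]map[string]bool, v string) []string {
	var targets []string
	for name, hints := range children {
		if mightHave(hints, v) {
			targets = append(targets, name)
		}
	}
	return targets
}

func main() {
	children := map[string]map[string]bool{
		"zone-a": trigrams("checkout-service"),
		"zone-b": trigrams("search-service"),
	}
	fmt.Println(route(children, "checkout-service")) // [zone-a]
}
```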

Indexers push index data up through the levels; queriers look downward through them (sketch below).

Query routing therefore depends on the index data that has been read.
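
One way to read those two lines, as a sketch: hints bubble up the tree (leaf → zone → root), and a query walks back down, pruning subtrees whose index says there can be no match. Exact label-value sets stand in here for the real, compact field-hints index; the shape, not the encoding, is the point.

```go
package main

import "fmt"

// node is one level of the hierarchy (root, zone, or leaf). hints is the
// set of label values known at or below this node.
type node struct {
	name     string
	hints    map[string]bool
	children []*node
}

// pushUp builds each level's index as the union of its children's:
// indexers push index data up through the levels.
func (n *node) pushUp() map[string]bool {
	if n.hints == nil {
		n.hints = map[string]bool{}
	}
	for _, c := range n.children {
		for v := range c.pushUp() {
			n.hints[v] = true
		}
	}
	return n.hints
}

// query walks down from the root, descending only into subtrees whose
// index says the value might be there: queriers look downward.
func (n *node) query(v string) []string {
	if !n.hints[v] {
		return nil
	}
	if len(n.children) == 0 {
		return []string{n.name}
	}
	var leaves []string
	for _, c := range n.children {
		leaves = append(leaves, c.query(v)...)
	}
	return leaves
}

func main() {
	root := &node{name: "root", children: []*node{
		{name: "zone-a", children: []*node{
			{name: "leaf-a1", hints: map[string]bool{"cart": true}},
		}},
		{name: "zone-b", children: []*node{
			{name: "leaf-b1", hints: map[string]bool{"search": true}},
		}},
	}}
	root.pushUp()
	fmt.Println(root.query("cart")) // [leaf-a1]: only zone-a is consulted
}
```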