Latency for web services isn't normally distributed; it's typically long-tailed and right-skewed. This means you can't meaningfully apply standard deviation to it. It just doesn't make sense.
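To make that concrete, here's a minimal sketch (assuming log-normal latencies as a stand-in for a real service's long tail; the parameters are illustrative) showing how far "mean + 2 standard deviations" lands from the percentile it would mark if latency actually were normal:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated request latencies in ms: log-normal is a common stand-in
# for the long right tail real services exhibit.
latencies = rng.lognormal(mean=4.0, sigma=1.0, size=100_000)

mu, sigma = latencies.mean(), latencies.std()

# For a normal distribution, mean + 2*sigma sits at roughly p97.7.
# For skewed latency data the two diverge badly, and the extreme
# tail is further out still.
print(f"mean + 2*stddev : {mu + 2 * sigma:8.1f} ms")
print(f"actual p97.7    : {np.percentile(latencies, 97.7):8.1f} ms")
print(f"actual p99.9    : {np.percentile(latencies, 99.9):8.1f} ms")
```

On data like this, mean + 2σ undershoots the percentile it's supposed to track, and tells you almost nothing about the tail your users actually feel.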
I came across this with people setting up a "this is weird"-style anomaly detection alert because they hadn't been able to take the time to think through the correct latency numbers for their service. No shade: it requires real, actual thought, which isn't always easy to come by.
Instead, you probably want to pick a latency threshold like 500ms @ p95 and alarm on how long your service spends at or above that threshold.
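Here's a minimal sketch of that idea in Python. The threshold, percentile budget, window length, and function names are all illustrative assumptions, not any particular monitoring tool's API; the useful observation is that "p95 > 500ms" is the same statement as "more than 5% of requests took longer than 500ms", which is cheap to check:

```python
import time
from collections import deque

# Illustrative numbers: pick the threshold and budget that match
# what your service actually promises its callers.
LATENCY_THRESHOLD_MS = 500   # the "500ms" part
PERCENTILE_BUDGET = 0.05     # the "@ p95" part: at most 5% may exceed it
WINDOW_SECONDS = 60          # how much recent traffic to judge against
SUSTAINED_SECONDS = 300      # how long a breach must last before we alarm

window: deque[tuple[float, float]] = deque()  # (timestamp, latency_ms)
violation_started_at: float | None = None     # when the current breach began

def record(latency_ms: float, now: float | None = None) -> bool:
    """Record one request's latency; return True if we should alarm."""
    global violation_started_at
    now = time.monotonic() if now is None else now

    # Keep only the last WINDOW_SECONDS of observations.
    window.append((now, latency_ms))
    while window and window[0][0] < now - WINDOW_SECONDS:
        window.popleft()

    # Breached if more than 5% of recent requests were at/above 500ms,
    # i.e. the window's p95 is above the threshold.
    slow = sum(1 for _, ms in window if ms >= LATENCY_THRESHOLD_MS)
    breached = slow > PERCENTILE_BUDGET * len(window)

    if not breached:
        violation_started_at = None
        return False
    if violation_started_at is None:
        violation_started_at = now
    return now - violation_started_at >= SUSTAINED_SECONDS
```

In practice you'd more likely express this as an alert rule in whatever monitoring system you already run, but the shape is the same: a fixed threshold, a percentile budget, and a "for how long" clause, instead of a standard-deviation band that the data doesn't support.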