incident severity

Incidents can be different levels of bad. Here is a strawman of how I’ve thought of these at previous jobs (hat tip to Lily Rapaport for the language). This is different from alarm severity.

SEV0/Critical: The business is significantly impacted

Core revenue-generating functionality is down or severely degraded for most users. This is when you page leadership because someone needs to draft customer communications.

Examples:

Website completely down.
Checkout completely broken—nobody can buy anything.
Fulfillment operations offline—warehouses can’t ship orders.
Authentication offline—users can’t log in.

Rough thresholds: 10%+ of traffic seeing errors on critical paths, 500+ users affected, or complete outages of revenue-generating services.

SEV1/High: This is bad and needs fixing right now

Major functionality is impaired and a significant share of users is affected. Revenue impact is measurable but not catastrophic. You need immediate response, but the entire business isn’t offline.

Examples:

Search returns no results for 30% of queries (browse still works).
Subscription processing hours behind.
Payment processing with 15% failure rate.
Mobile app crashes for 20% of users.

Rough thresholds: 5-10% of traffic affected on critical paths, 100-500 users, 20+ support tickets about the same problem.

SEV2/Medium: Annoying but not catastrophic

Moderate functionality impairment limited to specific features. The impact is noticeable, but workarounds exist and core shopping works.

Examples:

Personalized recommendations show generic results.
Product images load slowly.
Promo codes occasionally don’t apply (5% failure rate).
Email notifications delayed 2+ hours.

Rough thresholds: 1-5% of traffic affected, 20-100 users, 5-20 support tickets, non-critical feature degradation.

SEV3/Low: Minor issues and edge cases

Minor functionality impairment with limited or no user impact. Often affects internal tools or edge case bugs.

Examples:

CMS slow for content editors.
Inventory sync delayed 1-2 hours.
Admin dashboard graphs not loading.
Minor UI glitch on rarely-used feature.

Rough thresholds: Less than 1% of traffic, 1-20 users, 1-3 support tickets, or internal-only systems.
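
The rough thresholds above can be sketched as a small classifier. This is only an illustration, not a definitive policy: the `Impact` fields and the `classify` function are hypothetical names made up for this sketch. It also makes three assumptions the text does not spell out: any single signal is enough to reach a level, boundary values (exactly 10%, exactly 100 users, and so on) go to the more severe level, and traffic errors off the critical path never rate above SEV2.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV0 = 0  # Critical
    SEV1 = 1  # High
    SEV2 = 2  # Medium
    SEV3 = 3  # Low


@dataclass
class Impact:
    # All field names are hypothetical, chosen for this sketch.
    traffic_error_pct: float = 0.0         # percent of traffic seeing errors
    on_critical_path: bool = False         # does the failure hit a critical path?
    users_affected: int = 0
    support_tickets: int = 0
    complete_revenue_outage: bool = False  # a revenue-generating service is fully down


def classify(impact: Impact) -> Severity:
    """Return the most severe level that any single signal reaches."""
    if impact.complete_revenue_outage:
        return Severity.SEV0

    levels = [Severity.SEV3]  # anything that is an incident is at least SEV3

    # Traffic: the 10% and 5% bands only apply on critical paths (assumption:
    # off the critical path, traffic errors are capped at SEV2).
    pct = impact.traffic_error_pct
    if impact.on_critical_path and pct >= 10:
        levels.append(Severity.SEV0)
    elif impact.on_critical_path and pct >= 5:
        levels.append(Severity.SEV1)
    elif pct >= 1:
        levels.append(Severity.SEV2)

    # Users affected: 500+, 100-500, 20-100.
    if impact.users_affected >= 500:
        levels.append(Severity.SEV0)
    elif impact.users_affected >= 100:
        levels.append(Severity.SEV1)
    elif impact.users_affected >= 20:
        levels.append(Severity.SEV2)

    # Support tickets about the same problem: 20+, 5-20.
    if impact.support_tickets >= 20:
        levels.append(Severity.SEV1)
    elif impact.support_tickets >= 5:
        levels.append(Severity.SEV2)

    # Lower number means more severe, so the minimum wins.
    return min(levels)


print(classify(Impact(traffic_error_pct=12, on_critical_path=True)).name)  # SEV0
print(classify(Impact(users_affected=150)).name)                           # SEV1
```

Taking the most severe signal errs on the side of over-paging. The numbers are rough, so a human should still feel free to override the result in either direction.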

The notes of Justin Abrahms
