War rooms with
auto-attached runbooks.

Detect, page, acknowledge, resolve. On-call rotations, escalation policies, and post-mortems that write themselves.

Auto-detect · On-call rotations · Escalation policies · Post-mortem drafts · Runbook attach
01 · INCIDENT LIFECYCLE

Seven phases.
One timeline.

Every incident progresses through a structured lifecycle. Each phase is timestamped, attributed, and auditable.

incident #INC-2847 · API Gateway Outage
DETECT       T+00:00  Quorum failure confirmed (6/8 probes)
PAGE         T+00:04  On-call @maya paged via PagerDuty + SMS
ACK          T+00:47  @maya acknowledged, war room opened
INVESTIGATE  T+02:12  Runbook rb_db_failover auto-attached
MITIGATE     T+08:30  Traffic rerouted to standby region
RESOLVE      T+14:22  Root cause patched, monitors green
POST-MORTEM  T+2h     Draft auto-generated from timeline
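
As an illustration only, a timestamped and attributed lifecycle like the one above might be modeled as follows. This is a minimal Python sketch, not the Relays API: `Incident`, `Event`, and `advance` are hypothetical names, and the strict one-phase-at-a-time ordering is an assumption.

```python
from dataclasses import dataclass, field
from datetime import timedelta

# Phase names come from the timeline above; everything else is hypothetical.
PHASES = ["DETECT", "PAGE", "ACK", "INVESTIGATE", "MITIGATE", "RESOLVE", "POST-MORTEM"]


@dataclass
class Event:
    phase: str
    offset: timedelta  # time since detection (the "T+" value)
    actor: str         # who or what triggered the phase
    note: str


@dataclass
class Incident:
    incident_id: str
    events: list = field(default_factory=list)

    def advance(self, phase, offset, actor, note):
        """Record the next phase; reject out-of-order or back-dated entries."""
        done = len(self.events)
        expected = PHASES[done] if done < len(PHASES) else None
        if phase != expected:
            raise ValueError(f"expected {expected}, got {phase}")
        if self.events and offset < self.events[-1].offset:
            raise ValueError("offset precedes the previous phase")
        self.events.append(Event(phase, offset, actor, note))

    @property
    def current_phase(self):
        return self.events[-1].phase if self.events else None


incident = Incident("INC-2847")
incident.advance("DETECT", timedelta(0), "monitor", "Quorum failure confirmed (6/8 probes)")
incident.advance("PAGE", timedelta(seconds=4), "relays", "On-call @maya paged")
incident.advance("ACK", timedelta(seconds=47), "@maya", "War room opened")
```

Because every event carries an offset and an actor, the same list can serve as the audit trail.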
02 · ON-CALL ROTATIONS

Always someone
awake.

Define primary, secondary, weekend, and follow-the-sun rotations. Relays pages the right person based on time, timezone, and override schedules.

on-call schedule · platform-engineering
Rotation        Responder     Window                   Channel
Primary         @maya         Mon-Fri 09:00-18:00 PST  PagerDuty + SMS
Secondary       @jordan       Mon-Fri 09:00-18:00 PST  Slack + Email
Weekend         @alex         Sat-Sun 00:00-24:00 UTC  PagerDuty + Phone
Follow-the-sun  @kenji (NRT)  Mon-Fri 18:00-02:00 PST  Opsgenie
Follow-the-sun  @liam (AMS)   Mon-Fri 02:00-09:00 PST  Opsgenie
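
To make the routing idea concrete, here is a rough Python sketch of picking a responder from the schedule above. It is not how Relays is implemented: `Window`, `SCHEDULE`, and `on_call` are hypothetical names, PST is treated as a fixed UTC-8 offset, the secondary rotation is left out, and a window that crosses midnight is assumed to belong to the day it starts on.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # fixed offset; ignores daylight saving
UTC = timezone.utc


@dataclass(frozen=True)
class Window:
    responder: str
    days: frozenset  # weekday numbers, Monday = 0
    start: time
    end: time        # end <= start wraps past midnight; start == end is a full day
    tz: timezone

    def covers(self, moment):
        local = moment.astimezone(self.tz)
        day, clock = local.weekday(), local.time()
        if self.start < self.end:
            return day in self.days and self.start <= clock < self.end
        # Wrapping window: the evening part belongs to today's shift,
        # the early-morning part to the shift that began yesterday.
        if clock >= self.start:
            return day in self.days
        return clock < self.end and (day - 1) % 7 in self.days


WEEKDAYS = frozenset(range(5))
WEEKEND = frozenset({5, 6})

SCHEDULE = [
    Window("@maya", WEEKDAYS, time(9), time(18), PST),
    Window("@kenji", WEEKDAYS, time(18), time(2), PST),
    Window("@liam", WEEKDAYS, time(2), time(9), PST),
    Window("@alex", WEEKEND, time(0), time(0), UTC),
]


def on_call(moment, overrides=()):
    """Return the responder for a timezone-aware `moment`; overrides win."""
    for start, end, responder in overrides:
        if start <= moment < end:
            return responder
    for window in SCHEDULE:
        if window.covers(moment):
            return window.responder
    return None  # no window covers this moment
```

Checking overrides before the regular windows keeps a one-off swap from requiring any change to the schedule itself.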
03 · ESCALATION POLICIES

Unacked?
Escalate.

Define multi-tier escalation policies with configurable timeouts. If the primary does not acknowledge, escalation continues until someone does.

escalation policy · critical-infra
0m   Page primary on-call      @maya    PagerDuty
5m   No ack -- page secondary  @jordan  SMS + Slack
10m  No ack -- page team lead  @chen    Phone call
15m  No ack -- page VP Eng     @sarah   Phone + Email
30m  Still open -- notify CTO  @david   All channels
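
The policy above can be read as a list of tiers, each with a timeout and a stop condition. The Python sketch below is one possible reading, not the product's behavior: `Tier`, `POLICY`, and `notified` are hypothetical names, and it assumes the "No ack" tiers stop once someone acknowledges while the "Still open" tier fires until the incident is resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    after_minutes: int
    target: str
    channels: tuple
    stops_on: str = "ack"  # "ack" or "resolve" (assumed semantics)


POLICY = [
    Tier(0, "@maya", ("PagerDuty",)),
    Tier(5, "@jordan", ("SMS", "Slack")),
    Tier(10, "@chen", ("Phone call",)),
    Tier(15, "@sarah", ("Phone", "Email")),
    Tier(30, "@david", ("All channels",), stops_on="resolve"),
]


def notified(elapsed, acked_at=None, resolved_at=None):
    """Targets contacted by `elapsed` minutes, given ack/resolve times in minutes."""
    targets = []
    for tier in POLICY:
        if tier.after_minutes > elapsed:
            break  # later tiers have not timed out yet
        stop_at = acked_at if tier.stops_on == "ack" else resolved_at
        if stop_at is not None and stop_at <= tier.after_minutes:
            continue  # condition was already met before this tier fired
        targets.append(tier.target)
    return targets
```

For example, `notified(12)` returns `["@maya", "@jordan", "@chen"]`: with no acknowledgement after 12 minutes, the first three tiers have fired.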
04 · POST-MORTEM

Auto-drafted.
Human-reviewed.

Relays generates a post-mortem draft from the incident timeline, monitor data, and resolution notes. Your team reviews and publishes instead of starting from a blank page.

post-mortem · INC-2847 · auto-draft · DRAFT
Summary
API Gateway returned 503 for 14m 22s. Root cause: connection pool exhaustion after failed deploy of v4.2.1.
Impact
2,847 requests failed (0.3% of daily volume). 12 customers reported issues. SLO burn rate: 4.2x.
Timeline
07:12 detect -- 07:12 page -- 07:13 ack -- 07:20 mitigate -- 07:26 resolve
Root Cause
Deploy v4.2.1 introduced a connection leak in the auth middleware. Pool saturated at 500 connections after 8 minutes.
Action Items
1. Add connection pool metrics alert (P1)
2. Canary deploy for auth changes (P2)
3. Circuit breaker on pool exhaustion (P1)
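
As a rough idea of how the Summary and Timeline lines of such a draft could be derived from recorded phase offsets, here is a small Python sketch. The function names are hypothetical, and the detection date and seconds used in the example are invented for illustration; only the phase offsets come from the incident timeline.

```python
from datetime import datetime, timedelta


def draft_timeline(detected_at, offsets):
    """Render 'HH:MM phase' entries joined by ' -- ', in the order given."""
    parts = []
    for phase, offset in offsets:
        stamp = (detected_at + offset).strftime("%H:%M")
        parts.append(f"{stamp} {phase}")
    return " -- ".join(parts)


def draft_summary(service, status, duration):
    """One-line outage summary, e.g. '<service> returned <status> for Xm Ys.'"""
    minutes, seconds = divmod(int(duration.total_seconds()), 60)
    return f"{service} returned {status} for {minutes}m {seconds}s."


# Hypothetical detection time; the offsets are the T+ values from INC-2847.
detected_at = datetime(2024, 1, 10, 7, 12, 20)
offsets = [
    ("detect", timedelta(0)),
    ("page", timedelta(seconds=4)),
    ("ack", timedelta(seconds=47)),
    ("mitigate", timedelta(minutes=8, seconds=30)),
    ("resolve", timedelta(minutes=14, seconds=22)),
]
print(draft_summary("API Gateway", 503, offsets[-1][1]))
print(draft_timeline(detected_at, offsets))
```

The root cause and action items still need a human; the sketch only covers the parts that follow mechanically from the timeline.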

Resolve faster.
Learn more.