# Runbook: Global Shadow + Kill-Switch Override

- Canonical URL: https://docs.fairvisor.com/docs/cookbook/global-shadow-bypass-runbook/
- Section: docs
> Break-glass runbook for opening traffic while preserving evaluation telemetry.


## Purpose / When to use

Use this runbook only during major incident windows when normal mitigation cannot restore service quickly enough.

## Blast radius & risk level

- Risk level: critical
- Primary risk: reducing active enforcement guardrails while incident mode is enabled

## Signals / symptoms

- Sustained, customer-impacting rejects across broad traffic
- Rollback path unavailable or too slow for immediate recovery
- Collateral impact of the current controls exceeds what is acceptable

## Detection queries

```promql
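# Per-series reject rate over the last 5 minutes (run each expression as its own query).
rate(fairvisor_decisions_total{action="reject"}[5m])
# Highest value of the global shadow gauge over the last minute.
max_over_time(fairvisor_global_shadow_active[1m])
# Highest value of the kill-switch override gauge over the last minute.
max_over_time(fairvisor_kill_switch_override_active[1m])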
```

## Triage checklist

1. Confirm incident commander approval for break-glass mode.
2. Set a short TTL (15-30 minutes) and assign a named owner.
3. Confirm the rollback bundle is prepared before activation.
4. Decide scenario:
   - Scenario A: global shadow only
   - Scenario B: global shadow + kill-switch override

## Mitigation playbook

Scenario A (preferred first):

```json
{
  "bundle_version": 1101,
  "global_shadow": {
    "enabled": true,
    "reason": "inc-2026-02-20-reject-spike",
    "expires_at": "2026-02-20T19:00:00Z"
  }
}
```

Expected effect:

- client traffic opens (`allow` path)
- rule evaluation continues under shadow semantics, so evaluation telemetry is preserved
- kill-switch remains active

Scenario B (true break-glass):

```json
{
  "bundle_version": 1102,
  "global_shadow": {
    "enabled": true,
    "reason": "inc-2026-02-20-global-open",
    "expires_at": "2026-02-20T19:15:00Z"
  },
  "kill_switch_override": {
    "enabled": true,
    "reason": "inc-2026-02-20-global-open",
    "expires_at": "2026-02-20T19:15:00Z"
  }
}
```

Expected effect:

- client traffic opens
- kill-switch pre-check is skipped
- telemetry and metrics remain available for evaluation paths

## Verification checklist

1. Active bundle version confirmed.
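2. Override metrics (`fairvisor_global_shadow_active`, plus `fairvisor_kill_switch_override_active` for Scenario B) show the expected activation state.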
3. Reject rate drops to acceptable range.
4. Core endpoints and latency recover.
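
Step 2 can be expressed as a small check against the two override gauges. This is only a sketch: `activation_matches` is a hypothetical helper, and it assumes each gauge is nonzero while its override is active, which is inferred from the `== 0` rollback checks in this runbook rather than confirmed behavior. Fetching the gauge values from your metrics backend is left out.

```python
def activation_matches(scenario: str, shadow_active: float, override_active: float) -> bool:
    """Return True when the gauge values match the chosen scenario.

    Scenario A expects global shadow on and the kill-switch override off.
    Scenario B expects both on.
    """
    expected = {
        "A": (True, False),
        "B": (True, True),
    }
    if scenario not in expected:
        raise ValueError(f"unknown scenario: {scenario!r}")
    # Assumption: a gauge value above zero means the override is active.
    observed = (shadow_active > 0, override_active > 0)
    return observed == expected[scenario]
```

A mismatch in Scenario A where the override gauge is nonzero is worth treating as an incident finding in its own right, since it means the kill-switch pre-check is being skipped without that having been the intent.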

## Exit criteria

- root cause fixed
- safe normal policy prepared
- override TTL no longer needed

## Rollback / recovery path

Deploy a bundle without the override blocks (or with `enabled: false`) and with an incremented version:

```json
{
  "bundle_version": 1103,
  "policies": [ ... ],
  "kill_switches": [ ... ]
}
```

Then confirm:

- `fairvisor_global_shadow_active == 0`
- `fairvisor_kill_switch_override_active == 0`
- reject behavior returns to normal policy semantics
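
If the rollback bundle is derived from the bundle that is currently active, the transformation is mechanical: drop the two override blocks and bump the version. A minimal sketch follows, assuming bundles are handled as plain dictionaries; `rollback_bundle` is a hypothetical helper, and in practice the prepared "safe normal policy" bundle may differ from the active one in other ways too.

```python
import copy

# Top-level keys used for the override blocks in this runbook's examples.
OVERRIDE_KEYS = ("global_shadow", "kill_switch_override")


def rollback_bundle(active: dict) -> dict:
    """Return a copy of the active bundle with override blocks removed
    and bundle_version incremented by one. The input is not modified."""
    rolled = {
        key: copy.deepcopy(value)
        for key, value in active.items()
        if key not in OVERRIDE_KEYS
    }
    rolled["bundle_version"] = active["bundle_version"] + 1
    return rolled
```

Applied to the Scenario B bundle (version 1102), this produces a version 1103 bundle that contains neither `global_shadow` nor `kill_switch_override`.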

## Post-incident notes

Record:

- reason for break-glass activation
- scenario used (A or B)
- TTL duration and owner
- residual risk while active
- follow-up controls to avoid a repeat

## Do not

- Do not enable Scenario B before validating that Scenario A is insufficient.
- Do not run overrides without explicit expiry and owner.
- Do not leave overrides active after incident closure.
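
The expiry and owner rule lends itself to a pre-flight check before a bundle is deployed. The sketch below is illustrative only: `override_problems` is a hypothetical helper, the owner is passed in separately because the bundle examples in this runbook carry no owner field, and `expires_at` is assumed to use the `YYYY-MM-DDTHH:MM:SSZ` form shown in those examples.

```python
from datetime import datetime, timezone

# Top-level keys used for the override blocks in this runbook's examples.
OVERRIDE_KEYS = ("global_shadow", "kill_switch_override")


def override_problems(bundle: dict, owner: str, now: datetime) -> list[str]:
    """Return a list of human-readable problems; an empty list means the check passed.

    `now` must be a timezone-aware datetime.
    """
    problems = []
    if not owner.strip():
        problems.append("no named owner")
    for key in OVERRIDE_KEYS:
        block = bundle.get(key)
        # Absent or disabled overrides need no expiry.
        if not block or not block.get("enabled"):
            continue
        if not block.get("reason"):
            problems.append(f"{key}: missing reason")
        raw = block.get("expires_at")
        if not raw:
            problems.append(f"{key}: missing expires_at")
            continue
        # Assumed timestamp format: UTC with a literal trailing "Z".
        expires = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if expires <= now:
            problems.append(f"{key}: expires_at is not in the future")
    return problems
```

Returning a list instead of raising on the first failure lets the incident commander see every gap at once before approving activation.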

## Related docs

- [Runbook: Reject Spike](/docs/cookbook/reject-spike-runbook/)
- [Runbook: Kill-Switch Incident Response](/docs/cookbook/kill-switch-incident-response/)
- [Runbook: Bad Bundle Rollback](/docs/cookbook/bad-bundle-rollback-runbook/)

