# Runbook: Shadow Mode Rollout

- Canonical URL: https://docs.fairvisor.com/docs/cookbook/shadow-mode-rollout/
- Section: docs
- Last updated: n/a
> Phased rollout protocol for new policies using shadow-first enforcement.


## Purpose / When to use

Use this runbook whenever introducing a new limiter rule, changing descriptor keys, or tightening thresholds.

## Blast radius & risk level

- Risk level: medium
- Primary risk: false-positive rejects if promoted too early

## Signals / symptoms

- Planned policy change on production paths
- Unknown reject behavior for real traffic mix
- Need to validate descriptors and thresholds safely

## Detection queries

```promql
sum by (reason) (rate(fairvisor_decisions_total{action="allow"}[5m]))
rate(fairvisor_descriptor_missing_total[5m])
```

Log focus (would-reject analysis):

```bash
docker logs -f fairvisor | fairvisor logs --action=allow
```

## Triage checklist

1. Confirm selector scope (path/method) is intentionally narrow.
2. Confirm descriptor keys exist in traffic.
3. Confirm fallback behavior if rule `match` does not trigger.
4. Confirm runbook owner for promotion/rollback.

## Mitigation playbook

Phase 1: shadow deploy

```json
{
  "id": "candidate-policy",
  "spec": {
    "mode": "shadow",
    "selector": { "pathPrefix": "/api/" },
    "rules": [
      {
        "name": "candidate-limit",
        "limit_keys": ["header:x-api-key"],
        "algorithm": "token_bucket",
        "algorithm_config": { "tokens_per_second": 20, "burst": 40 }
      }
    ]
  }
}
```

Phase 2: observe

1. Run in shadow for at least one representative traffic cycle.
2. Measure would-reject concentration and descriptor misses.
3. Tune limits if unexpected patterns appear.

Phase 3: promote

1. Flip `mode` to `enforce`.
2. Increment `bundle_version`.
3. Deploy with active monitoring.

## Verification checklist

1. No sustained reject spike after promotion.
2. `Retry-After` distribution remains acceptable.
3. Core endpoint latency remains within SLO.
4. Descriptor missing metrics stay near baseline.

## Exit criteria

- Policy rejects align with intended abuse/surge controls
- No meaningful customer-facing regression
- Final config documented with rationale

## Rollback / recovery path

1. Revert policy to `shadow`, or restore previous bundle.
2. Validate reject-rate normalization.
3. Re-open tuning cycle before next enforce attempt.

## Post-incident notes

Record:

- shadow observation window duration
- top would-reject reasons
- threshold deltas between shadow and enforce

## Do not

- Do not skip shadow for new descriptor keys.
- Do not promote during unrelated production incident windows.
- Do not treat one quiet hour as sufficient baseline.

## Related docs

- [Shadow Mode](/docs/policy/shadow-mode/)
- [Runbook: Reject Spike](/docs/cookbook/reject-spike-runbook/)
- [Policy Lint Checklist](/docs/reference/policy-lint-checklist/)

