# Runbook: SaaS Disconnect

- Canonical URL: https://docs.fairvisor.com/docs/cookbook/saas-disconnect-runbook/
> Operational procedure for when an edge loses SaaS connectivity but continues to enforce policy locally.


## Purpose / When to use

Use this runbook when an edge reports that SaaS is disconnected (`fairvisor_saas_reachable == 0`) and config updates are no longer flowing.

## Blast radius & risk level

- Risk level: medium
- Primary risk: stale policy state and delayed event delivery while traffic still passes through enforcement

## Signals / symptoms

- `fairvisor_saas_reachable == 0`
- `fairvisor status` shows disconnected state
- config pull and heartbeat errors increase

## Detection queries

```promql
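# 1 = SaaS reachable, 0 = disconnected
fairvisor_saas_reachable
# SaaS call rate per operation and status over the last 5 minutes
sum by (operation, status) (rate(fairvisor_saas_calls_total[5m]))
# Rate of failed event sends over the last 5 minutes
rate(fairvisor_events_sent_total{status="error"}[5m])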
```

CLI checks:

```bash
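# Connection state as reported by the edge
fairvisor status --edge-url=http://localhost:8080
# Readiness endpoint on the edge
curl -sS http://localhost:8080/readyz
# SaaS-related metrics only (rg is ripgrep; grep -E accepts the same pattern)
curl -sS http://localhost:8080/metrics | rg 'fairvisor_saas_reachable|fairvisor_saas_calls_total|fairvisor_events_sent_total'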
```
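
When the check needs to run from a script or cron job, the gauge value can be extracted from the metrics text instead of being read by eye. The following is a minimal sketch, assuming the edge exposes `fairvisor_saas_reachable` in the standard Prometheus text format, with or without labels and with no spaces inside label values. The function name `saas_reachable_value` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of fairvisor_saas_reachable from
# Prometheus text-format metrics supplied on stdin.
# Assumption: the sample line looks like "fairvisor_saas_reachable <value>"
# or "fairvisor_saas_reachable{labels} <value>".
saas_reachable_value() {
  awk '
    /^#/ { next }                                    # skip HELP/TYPE comment lines
    $1 ~ /^fairvisor_saas_reachable([{].*[}])?$/ {   # bare or labelled gauge
      print $2; found = 1; exit
    }
    END { if (!found) exit 1 }                       # non-zero status if the metric is absent
  '
}
```

Usage, against the edge URL used in the checks above: `curl -sS http://localhost:8080/metrics | saas_reachable_value`. A non-zero exit status means the metric was not found at all, which is worth treating separately from a value of `0`.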

## Triage checklist

1. Verify env values: `FAIRVISOR_SAAS_URL`, `FAIRVISOR_EDGE_ID`, `FAIRVISOR_EDGE_TOKEN`.
2. Validate outbound DNS/TLS/connectivity to SaaS endpoint.
3. Check token validity and recent rotation events.
4. Confirm edge still has active bundle loaded.
5. Assess event backlog risk window.
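
The first two triage steps can be partly scripted. The sketch below is illustrative only: `missing_env` and `url_host` are hypothetical helper names, and the commented connectivity commands assume the SaaS endpoint is served over TLS on port 443, which the source does not state. It reports variable names only, so the token value is never printed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of required variables that are unset
# or empty, and return non-zero if any are missing.
missing_env() {
  local name missing=()
  for name in FAIRVISOR_SAAS_URL FAIRVISOR_EDGE_ID FAIRVISOR_EDGE_TOKEN; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}

# Hypothetical helper: extract the host part of an http(s) URL so it can be
# handed to DNS and TLS tools. Does not handle IPv6 literals.
url_host() {
  local rest="${1#*://}"        # drop the scheme
  rest="${rest%%/*}"            # drop the path
  rest="${rest##*@}"            # drop any userinfo
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Example follow-up checks (need network access; port 443 is an assumption):
#   host=$(url_host "$FAIRVISOR_SAAS_URL")
#   getent hosts "$host"                                                   # DNS
#   openssl s_client -connect "$host:443" -servername "$host" </dev/null   # TLS
```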

## Mitigation playbook

Safe-first path:

1. Keep edge serving with last known good bundle.
2. Fix network/auth root cause.
3. Confirm heartbeat and config pull resume.

If urgent policy change is needed during outage:

1. Switch the edge to standalone operation using a known-good local bundle.
2. Apply the change through an explicit, rollback-safe file workflow that keeps the previous bundle available for restore.
3. Resume SaaS mode only after connectivity stabilizes.
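
One way to make the file workflow rollback-safe is to keep a copy of the live bundle before replacing it, and to replace it with an atomic rename. The sketch below shows that pattern under stated assumptions: the function names, the `.prev` suffix, and the idea that the edge reads its bundle from a single local file are all hypothetical, and how the edge picks up the new file (reload or restart) is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a rollback-safe bundle swap. Paths and the ".prev"
# naming are illustrative, not Fairvisor conventions.

# install_bundle NEW LIVE: save LIVE as LIVE.prev, then replace LIVE atomically.
install_bundle() {
  local new="$1" live="$2" tmp
  [ -s "$new" ] || { echo "new bundle missing or empty: $new" >&2; return 1; }
  if [ -f "$live" ]; then
    cp -p "$live" "$live.prev" || return 1   # rollback copy of the current bundle
  fi
  tmp="$live.tmp.$$"
  cp "$new" "$tmp" || return 1               # stage next to the live file
  mv -f "$tmp" "$live"                       # rename within one directory is atomic on POSIX filesystems
}

# rollback_bundle LIVE: restore the copy saved by install_bundle.
rollback_bundle() {
  local live="$1"
  [ -f "$live.prev" ] || { echo "no rollback copy for $live" >&2; return 1; }
  cp -p "$live.prev" "$live.tmp.$$" && mv -f "$live.tmp.$$" "$live"
}
```

Note that a second `install_bundle` call overwrites `.prev` with whatever is live at that moment, so after a bad install it is safer to run `rollback_bundle` before trying again.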

## Verification checklist

1. `fairvisor_saas_reachable` stable at `1`.
2. SaaS call errors return to baseline.
3. Expected bundle version/hash observed.
4. Event send success recovers.
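
"Stable at `1`" is easier to verify with several consecutive samples than with a single read. The following sketch takes a sampler command and succeeds only if every sample prints `1`. The helper name `stable_for`, the sample count, and the interval are hypothetical placeholders; choose values that cover your actual polling and heartbeat intervals.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if CMD prints "1" on COUNT consecutive samples.
# Usage: stable_for COUNT INTERVAL_SECONDS CMD [ARGS...]
stable_for() {
  local count="$1" interval="$2" i value
  shift 2
  for ((i = 1; i <= count; i++)); do
    value="$("$@")" || return 1        # sampler failed, e.g. edge not answering
    [ "$value" = "1" ] || return 1     # gauge dropped to 0 or printed something unexpected
    if (( i < count )); then
      sleep "$interval"                # wait between samples, not after the last one
    fi
  done
}

# Example sampler (assumes the gauge is exposed without labels):
#   sample() {
#     curl -sS http://localhost:8080/metrics |
#       awk '$1 == "fairvisor_saas_reachable" { print $2 }'
#   }
#   stable_for 5 30 sample && echo "reachable on 5 consecutive samples"
```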

## Exit criteria

- Continuous connectivity for at least one full set of polling and heartbeat cycles
- No unresolved auth/network errors
- Pending config drift reconciled

## Rollback / recovery path

1. If reconnect fails repeatedly, maintain standalone control with explicit change freeze.
2. Keep incident bridge open until SaaS path stable.
3. Reconcile any temporary local changes back into control plane source of truth.

## Post-incident notes

Record:

- outage duration
- root cause class (network/auth/control-plane)
- config drift count while disconnected
- event backlog loss or recovery details

## Do not

- Do not repeatedly redeploy policy changes without verification while disconnected.
- Do not assume disconnected means enforcement is off.
- Do not rotate tokens without coordinated validation.

## Related docs

- [SaaS Connection Guide](/docs/deployment/saas/)
- [Runbook: Bad Bundle Rollback](/docs/cookbook/bad-bundle-rollback-runbook/)
- [Troubleshooting](/docs/reference/troubleshooting/)

