# Operator FAQ

- Canonical URL: https://docs.fairvisor.com/docs/reference/operator-faq/
- Section: docs
- Last updated: n/a
> Practical Q/A for platform and on-call teams operating Fairvisor Edge.


## Does Fairvisor require an external datastore?
No. Core limiter state is in nginx shared dict memory.

## What is the fastest health check set?
`/livez`, `/readyz`, and a sampled `/v1/decision` request.
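
A minimal probe sketch for the two health endpoints, assuming that HTTP 200 means healthy (the status-code convention is an assumption, not confirmed here). The sampled `/v1/decision` request is left out because its request shape depends on your integration. `http_status` and `check_probes` are hypothetical helper names.

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still useful.
        return err.code
    except (URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return 0


def check_probes(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Probe /livez and /readyz; True means the probe returned 200 (assumed healthy)."""
    root = base_url.rstrip("/")
    return {path: fetch(root + path) == 200 for path in ("/livez", "/readyz")}
```

Passing a different `fetch` callable makes the check easy to exercise without a running instance.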

## Why does /v1/decision return 503?
Usually one of three causes: no active bundle, an invalid startup environment, or a runtime initialization failure.

## Can I run without SaaS?
Yes. Use standalone mode with `FAIRVISOR_CONFIG_FILE`.

## How do I force rollback quickly?
Re-apply the last known-good bundle and reload workers.

## How do I debug sudden 429 spikes?
Start with reject reason distribution, then map to policy/rule.
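
As a rough illustration of the first step, the sketch below turns a list of observed reject reasons (for example, `X-Fairvisor-Reason` values collected from access logs) into a ranked distribution. How you collect the reasons is up to your logging setup, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def reason_distribution(reasons: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (reason, share of rejects) pairs, most frequent reason first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(reason, n / total) for reason, n in counts.most_common()]
```

The dominant reason tells you which policy or rule to inspect first.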

## How do I detect descriptor wiring bugs?
Watch `fairvisor_descriptor_missing_total` and verify gateway forwarding.

## Is Retry-After random?
It includes deterministic per-identity jitter to spread retries.
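
One way such jitter can be derived is by hashing the identity, so the same caller always gets the same offset while different callers spread out. The sketch below illustrates the idea only; it is not Fairvisor's actual algorithm, and the hash choice and `max_jitter_seconds` parameter are assumptions.

```python
import hashlib


def retry_after_seconds(
    base_seconds: int, identity: str, max_jitter_seconds: int = 5
) -> int:
    """Add a stable, identity-derived offset in [0, max_jitter_seconds] to the base delay."""
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as a deterministic pseudo-random number.
    jitter = int.from_bytes(digest[:4], "big") % (max_jitter_seconds + 1)
    return base_seconds + jitter
```

Because the offset depends only on the identity, a client that retries sees a consistent value, while a crowd of clients does not retry in lockstep.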

## Does reverse_proxy change decision logic?
No, the decision logic is the same. Only the source of request context differs from `decision_service` mode.

## Can I test limits safely in production?
Yes, via `mode: shadow` and phased promotion to enforce.

## Should I fail-open or fail-closed at gateway?
Route-dependent. High-risk endpoints usually fail-closed.

## What is the minimum alert set?
A reject spike, `no_bundle_loaded`, and SaaS disconnected (SaaS mode only).

## How do I size FAIRVISOR_SHARED_DICT_SIZE?
Start at `128m`, load test with real key cardinality, then increase.
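
For a first back-of-envelope number before load testing, you can multiply expected key cardinality by a per-entry size and a headroom factor. The `bytes_per_entry` default below is a placeholder assumption, not a measured Fairvisor figure; replace it with a value observed in your own load test.

```python
import math


def estimate_shared_dict_mib(
    distinct_keys: int, bytes_per_entry: int = 256, headroom: float = 2.0
) -> int:
    """Rough size estimate in MiB; bytes_per_entry is a placeholder, not a measured value."""
    total_bytes = distinct_keys * bytes_per_entry * headroom
    return math.ceil(total_bytes / (1024 * 1024))
```

Treat the result as a starting point for the load test, not a substitute for it.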

## Why do counters reset after restart?
The shared dict lives in the memory of the running nginx instance and is not persisted, so a restart clears all in-memory state.

## How do I trace one reject end-to-end?
Use `X-Fairvisor-Reason`, `Retry-After`, `RateLimit*`, and metrics. For policy/rule attribution, use debug session headers (`X-Fairvisor-Debug-*`).
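
When collecting evidence for a single reject, it can help to pull only the relevant headers out of the response. The sketch below filters a header mapping down to the names mentioned above; the helper name is hypothetical, and the matching is case-insensitive because HTTP header names are.

```python
from typing import Dict, Mapping


def reject_trace(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the headers useful for tracing a reject, with lowercased names."""
    trace = {}
    for name, value in headers.items():
        key = name.lower()
        if (
            key in ("x-fairvisor-reason", "retry-after")
            or key.startswith("ratelimit")
            or key.startswith("x-fairvisor-debug-")
        ):
            trace[key] = value
    return trace
```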

## What if policy matches unexpectedly broad traffic?
Audit `pathPrefix` and method filters; add narrower selectors first.

## Is there a public admin policy API?
Not in the current OSS runtime; policies are delivered by file or through SaaS.

## Can I trust client-supplied X-Original-* headers?
Only when they arrive through a trusted gateway boundary; never accept them from untrusted public clients.
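
A sketch of the boundary check, assuming the gateway is identified by its peer IP address. The network range and function name are hypothetical; in practice this stripping usually belongs in the gateway or edge proxy configuration rather than in application code.

```python
import ipaddress
from typing import Dict, Mapping

# Hypothetical range for the trusted gateway tier; replace with your own.
TRUSTED_GATEWAY_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def sanitize_headers(headers: Mapping[str, str], peer_ip: str) -> Dict[str, str]:
    """Drop X-Original-* headers unless the direct peer is a trusted gateway."""
    peer = ipaddress.ip_address(peer_ip)
    if any(peer in network for network in TRUSTED_GATEWAY_NETWORKS):
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-original-")
    }
```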

## What should be in incident postmortem notes?
Trigger, impacted policy/rule, mitigation, rollback steps, and preventive checks.

## What is the latency impact of Fairvisor on my gateway?
Sub-millisecond per decision: state lives in in-process shared memory, so there is no network hop per request. The actual overhead is reported in the `X-Fairvisor-Latency-Us` header.

## How do I configure different limits for free vs paid plans?
Use `jwt:plan` as the identity key and define separate rules per plan value, or use a single rule with per-key overrides if your plan maps to a descriptor value.

## Can I limit by both user and organization simultaneously?
Yes. Add two rules in the same policy: one keyed on `jwt:user_id` and one on `jwt:org_id`. Both counters are checked independently; either can reject the request.
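
The behavior can be pictured with two plain counters. This is a simplified model, not Fairvisor's implementation: it ignores time windows, and it assumes a rejected request consumes nothing from either counter. The class name and limits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class IndependentCounters:
    """One counter per (key name, key value); any exhausted counter rejects."""

    def __init__(self, limits: Dict[str, int]):
        self.limits = limits  # e.g. {"jwt:user_id": 2, "jwt:org_id": 3}
        self.counts = defaultdict(int)

    def allow(self, identity: Dict[str, str]) -> Tuple[bool, Optional[str]]:
        keys = [(name, identity[name]) for name in self.limits]
        # Check every counter first so a reject leaves all counters untouched.
        for name, value in keys:
            if self.counts[(name, value)] >= self.limits[name]:
                return False, name
        for key in keys:
            self.counts[key] += 1
        return True, None
```

With a user limit of 2 and an org limit of 3, one busy user is stopped by the user counter, while the org counter stops the organization as a whole once its members have used three requests between them.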

## What is the difference between TPM and TPD limits?
TPM (tokens per minute) limits short-term throughput, which protects upstream capacity. TPD (tokens per day) limits cumulative daily spend, which suits per-user or per-org quotas. Both can be configured on the same rule.

## How do I handle burst traffic without rejecting legitimate users?
Configure the token bucket `capacity` (burst allowance) above the steady-state `rate`. Bursts up to `capacity` are absorbed; sustained rate beyond `rate` is rejected. Shadow mode lets you tune these values against real traffic patterns before enforcing.
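
The relationship between `rate` and `capacity` is easiest to see in a generic token bucket. The sketch below is a textbook version for illustration, not Fairvisor's internal limiter; time is passed in explicitly so the behavior is reproducible.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is absorbed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=3`, three back-to-back requests pass, the fourth is rejected, and one more request is admitted after a second of refill.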

## How do I rotate API keys without downtime?
Update the policy bundle to accept the new key descriptor value (add it alongside the old one), push the bundle, wait for hot-reload, then decommission the old key. No restart required.

## How do I know which rule triggered a reject?
Check `X-Fairvisor-Reason` for the reject code, then use debug session headers (`X-Fairvisor-Debug-*`) to get rule and policy attribution. See [Decision Tracing](/docs/reference/decision-tracing/).

## What metrics should I alert on?
Minimum set: `fairvisor_requests_rejected_total` spike, `fairvisor_no_bundle_loaded` (non-zero), and `fairvisor_saas_disconnected_seconds` (if using SaaS). See [SLO & Alerting](/docs/reference/slo-alerting/).

## How do I test a new policy before deploying it?
Use the `fairvisor policy test` CLI command with a fixture file, or deploy in `mode: shadow` to validate against real traffic. Shadow mode never blocks traffic but tracks would-reject decisions.

## What happens if a JWT is missing or malformed?
The JWT claim identity key falls back to a sentinel value (empty or `__invalid__`). Configure a catch-all rule or selector to handle unauthenticated traffic explicitly.
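
The fallback can be pictured as follows. This sketch only decodes the payload and does not verify the signature, so it is not suitable for authentication. Which failure maps to which sentinel is not specified above, so the single configurable `sentinel` parameter is an assumption, as is the function name.

```python
import base64
import json
from typing import Optional


def jwt_claim(token: Optional[str], claim: str, sentinel: str = "__invalid__") -> str:
    """Return a string claim from a JWT payload, or the sentinel on any failure.

    Illustration only: the signature is NOT verified.
    """
    if not token:
        return sentinel
    parts = token.split(".")
    if len(parts) != 3:
        return sentinel
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers bad base64, bad UTF-8, and bad JSON.
        return sentinel
    if not isinstance(claims, dict):
        return sentinel
    value = claims.get(claim)
    return value if isinstance(value, str) and value else sentinel
```

A catch-all rule keyed on the sentinel value then gives unauthenticated traffic its own explicit budget.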

## How do I monitor Fairvisor in production?
Expose the `/metrics` endpoint to Prometheus. Key metrics: request rate, reject rate by reason, latency histogram, bundle reload events, and (in SaaS mode) heartbeat lag. See [Metrics reference](/docs/reference/metrics/).

## Can I apply different policies to different API paths?
Yes. Selectors support `pathPrefix` and HTTP method filters. Define separate policies per route and Fairvisor will apply only the matching rules per request.
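
To reason about which requests a selector captures, a simplified matcher helps. It assumes `pathPrefix` is a plain string-prefix comparison and that an omitted method filter matches every method; both are assumptions about the semantics, and the `Selector` class is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Selector:
    path_prefix: str
    methods: Optional[FrozenSet[str]] = None  # None is assumed to match any method


def matches(selector: Selector, method: str, path: str) -> bool:
    """True if the request path starts with the prefix and the method is allowed."""
    if not path.startswith(selector.path_prefix):
        return False
    return selector.methods is None or method.upper() in selector.methods
```

Under plain prefix matching, a prefix such as `/v1` also captures `/v1beta/...`, which is a common cause of a policy matching broader traffic than intended.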

## How do I prevent one tenant from starving others?
Use per-tenant identity keys (e.g. `jwt:org_id`). Each tenant gets independent counter buckets, so one tenant hitting its limit does not affect others.

## What is the difference between warn, throttle, and reject actions?
- `warn`: allows the request but sets headers indicating the budget is nearly exhausted.
- `throttle`: applies a response delay.
- `reject`: returns 429/503 immediately.

Staged actions let you warn at 80%, throttle at 95%, and hard-reject at 100%.
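
The staged thresholds amount to a simple mapping from budget usage to action. The sketch below uses the 80/95/100 percentages from the text; the function name and the `allow` label for requests below the first threshold are assumptions.

```python
def staged_action(
    used: float, limit: float, warn_at: float = 0.80, throttle_at: float = 0.95
) -> str:
    """Map budget usage to an action: allow, warn, throttle, or reject."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= 1.0:
        return "reject"
    if ratio >= throttle_at:
        return "throttle"
    if ratio >= warn_at:
        return "warn"
    return "allow"
```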

## How do I audit which policy was active at the time of an incident?
Bundle version and load timestamp are included in the `/v1/status` response. SaaS mode records a full policy change log with timestamps and actor.

## Can Fairvisor enforce limits on non-LLM APIs?
Yes. The token bucket and cost-based budget limiters work on any HTTP API. LLM-specific limiters (TPM/TPD, token refund) apply only when configured on LLM endpoints, but rate and budget controls apply universally.

## How do I set a global emergency kill switch for all traffic?
Add a kill switch rule with a selector that matches all paths (`pathPrefix: /`) and a descriptor that evaluates to true for the traffic you want to block. Push the bundle and it takes effect on the next request.

## What is `fairvisor_descriptor_missing_total` and why is it spiking?
This counter increments when Fairvisor cannot extract the configured identity key from a request (e.g. a missing JWT claim or header). A spike usually means a gateway misconfiguration or a policy change that references a descriptor not being forwarded. Check gateway auth header passthrough and selector configuration.

