# HA Installation

- Canonical URL: https://docs.fairvisor.com/docs/deployment/ha-installation/
- Section: docs
- Last updated: n/a
> High-availability deployment with two parallel Fairvisor installations.


This guide describes a practical HA pattern for Fairvisor when you run **two parallel installations** (A and B) behind one gateway.

## Key constraint

Fairvisor OSS keeps limiter state in local memory (`ngx.shared.dict`) per instance.

- distributed/global shared counters are **not** supported
- each installation enforces limits independently

How you configure limits depends entirely on how your load balancer routes traffic.

## Option 1: Random / round-robin balancing

Each request is routed to any instance regardless of the client key. On average, each instance sees ~50% of the traffic for any given key, so you divide each shared rate and budget limit by 2.

```text
Client
  -> Gateway / LB (round-robin)
     -> Fairvisor A (50% limits)
     -> Fairvisor B (50% limits)
         -> Upstream
```

**How it works:** a client sending 120 req/s against a 100 req/s global limit distributes ~60 req/s to each instance. Each instance enforces its 50 req/s limit and rejects ~10 req/s, so ~100 req/s is allowed in total.
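This arithmetic can be checked with a short simulation. A simplified fixed-window counter stands in for Fairvisor's token bucket here; the routing loop and names are illustrative, not Fairvisor code:

```python
# Simulate one second of round-robin traffic against two half-limit instances.
# A plain fixed-window counter is a stand-in for the real token bucket, but
# the division-by-2 arithmetic is the same.

PER_INSTANCE_LIMIT = 50   # global 100 req/s limit divided by 2
REQUESTS = 120            # one second of client traffic

allowed = {"A": 0, "B": 0}
rejected = 0

for i in range(REQUESTS):
    instance = "A" if i % 2 == 0 else "B"   # round-robin routing
    if allowed[instance] < PER_INSTANCE_LIMIT:
        allowed[instance] += 1
    else:
        rejected += 1

print(allowed, rejected)  # each instance admits 50 and rejects 10; 100 allowed in total
```

Each instance independently admits 50 requests, matching the global target of 100 req/s without any shared state.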

### How to set limits (÷2)

- `token_bucket.tokens_per_second` → divide by 2
- `token_bucket.burst` → divide by 2
- `cost_based.budget` → divide by 2
- `token_bucket_llm.tokens_per_minute` → divide by 2
- `token_bucket_llm.tokens_per_day` → divide by 2
- per-request caps (`max_*`) stay unchanged
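As a sketch, deriving a per-installation config from a target global policy could be automated like this. The `per_installation` helper and the flat config shape are hypothetical (real policies nest these fields under `algorithm_config`); only the field names come from the list above:

```python
# Minimal sketch: divide shared-budget fields by 2, leave everything else
# (e.g. max_* per-request caps) untouched. Assumes integer limit values.

HALVED_FIELDS = {
    "tokens_per_second", "burst", "budget",
    "tokens_per_minute", "tokens_per_day",
}

def per_installation(global_config: dict) -> dict:
    """Return a copy of the config with shared limits divided by 2."""
    return {
        key: value // 2 if key in HALVED_FIELDS else value
        for key, value in global_config.items()
    }

print(per_installation({"tokens_per_second": 100, "burst": 200, "max_tokens_per_request": 4096}))
# {'tokens_per_second': 50, 'burst': 100, 'max_tokens_per_request': 4096}
```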

### Traffic requirements

- keep distribution close to 50/50
- avoid sticky routing that sends a key mostly to one installation
- monitor per-installation reject rates and drift — heavy skew means one instance rejects early while the other still has budget

## Option 2: Sticky balancing (per-key)

Each client key (API key, user ID, tenant) is consistently routed to the same installation. Each instance enforces the **full** limit independently for the keys it owns.

```text
Client key X  -> Gateway / LB (hash on key)  -> Fairvisor A (100% limits)
Client key Y  -> Gateway / LB (hash on key)  -> Fairvisor B (100% limits)
                                                   -> Upstream
```

**How it works:** each instance only ever sees traffic for its own set of keys and enforces the full configured limit for each. No division needed.

Configure your load balancer to hash on the header or field that corresponds to your `limit_keys` (e.g. `Authorization`, `X-Tenant-ID`).
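The effect of hashing on the limit key can be illustrated with a minimal sketch. The `route` function and instance names are hypothetical, not part of Fairvisor or any particular load balancer API:

```python
# Stable per-key routing: hashing the same header value always selects the
# same installation, so each instance can enforce the full limit for the
# keys it owns without coordination.
import hashlib

INSTALLATIONS = ["fairvisor-a", "fairvisor-b"]

def route(limit_key: str) -> str:
    """Deterministically map a limit key to one installation."""
    digest = hashlib.sha256(limit_key.encode()).digest()
    return INSTALLATIONS[digest[0] % len(INSTALLATIONS)]

# The same key always lands on the same instance:
assert route("tenant-42") == route("tenant-42")
```

A real deployment would use the load balancer's own consistent-hashing feature rather than custom code, but the invariant is the same: one key, one installation.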

### How to set limits (unchanged)

Use your target limits directly — no division. Both installations carry identical policy.

### Traffic requirements

- sticky routing must be consistent — a key that hops between instances will see independent (effectively doubled) limits
- monitor per-installation traffic volume to detect key distribution imbalance

## Examples

### Token bucket

Target global policy:

```json
{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200
  }
}
```

| Balancing | Per-installation config |
|-----------|------------------------|
| Round-robin | `tokens_per_second: 50`, `burst: 100` |
| Sticky | `tokens_per_second: 100`, `burst: 200` |

### Cost budget

Target global daily budget: `100000`

| Balancing | Per-installation `budget` |
|-----------|--------------------------|
| Round-robin | `50000` |
| Sticky | `100000` |

### LLM token budgets

Target global: `tokens_per_minute = 120000`, `tokens_per_day = 2000000`

| Balancing | `tokens_per_minute` | `tokens_per_day` |
|-----------|---------------------|------------------|
| Round-robin | `60000` | `1000000` |
| Sticky | `120000` | `2000000` |

Keep `max_prompt_tokens`, `max_completion_tokens`, `max_tokens_per_request` unchanged in both cases.

## Operational checklist

1. Deploy installation A and B in separate failure domains (nodes/zones)
2. Decide on balancing mode and configure limits accordingly
3. Validate both `/readyz` endpoints in health checks
4. Test failover (all traffic to one side) and confirm expected behavior
5. Alert on reject-rate imbalance between A and B

## Failure behavior

If one installation is down, all traffic shifts to the remaining one.

- **Round-robin:** the surviving instance has only half the limit — effective global limit halves during the outage.
- **Sticky:** the surviving instance takes over all keys and still enforces the full limits; keys that fail over start from fresh counters with no accumulated history.
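The capacity impact can be stated as back-of-envelope arithmetic, assuming the 100 req/s token-bucket example from above:

```python
# Effective global capacity while one installation is down.

GLOBAL_LIMIT = 100  # req/s target across both installations

# Round-robin: each instance is configured at half the limit, so the
# survivor can only admit half the global rate during the outage.
round_robin_degraded = GLOBAL_LIMIT // 2

# Sticky: the survivor enforces the full limit per key, so capacity is
# unchanged (though failed-over keys start with fresh counters).
sticky_degraded = GLOBAL_LIMIT

print(round_robin_degraded, sticky_degraded)  # 50 100
```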

Options for degraded mode:

- accept temporarily stricter limiting for the duration of the failure
- maintain an alternate emergency policy bundle with adjusted limits to apply during failover

## Related pages

- [Helm](/docs/deployment/helm/)
- [Performance Tuning](/docs/deployment/performance-tuning/)
- [Gateway Failure Policy](/docs/reference/gateway-failure-policy/)

