# Token Bucket

- Canonical URL: https://docs.fairvisor.com/docs/algorithms/token-bucket/
> Continuous-time token bucket rate limiter — algorithm, configuration, and state format.


The token bucket is the primary algorithm for **request-rate limiting** (requests per second). It allows short bursts while enforcing a steady-state rate over time.

## How it works

The implementation uses a **continuous-time token bucket** (also called a *leaky bucket as a meter*):

1. Each unique limit key has a bucket with capacity `burst` tokens
2. Tokens refill at `tokens_per_second` per second, continuously
3. Each request consumes `cost` tokens (default 1)
4. If the bucket has enough tokens, the request is allowed; otherwise rejected

Refill is calculated lazily on each request — there is no background timer:

```
elapsed     = max(0, now - last_refill)
new_tokens  = min(tokens + elapsed × tokens_per_second, burst)

if new_tokens >= cost:
    allow, set tokens = new_tokens - cost
else:
    reject, retry_after = ceil((cost - new_tokens) / tokens_per_second)
```

The `max(0, ...)` guards against backward clock jumps.
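The lazy-refill logic above can be modeled in a few lines of Python. This is a standalone sketch for illustration only: the class and method names are invented here, and the production limiter keeps its state in a shared dict rather than an object.

```python
import math
import time

class TokenBucket:
    """Continuous-time token bucket with lazy refill.

    Model of the pseudocode above; `now` parameters exist so the
    sketch can be exercised deterministically in tests."""

    def __init__(self, tokens_per_second, burst, now=None):
        self.rate = tokens_per_second
        self.burst = burst
        self.tokens = float(burst)  # bucket starts full
        self.last_refill = time.monotonic() if now is None else now

    def try_consume(self, cost=1, now=None):
        """Return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)  # guard backward clock jumps
        self.tokens = min(self.tokens + elapsed * self.rate, self.burst)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0
        # Rejected: state is not decremented; report whole seconds
        # until `cost` tokens will be available.
        retry_after = math.ceil((cost - self.tokens) / self.rate)
        return False, retry_after
```

Note that a rejected request leaves the token count unchanged, so a client retrying after `retry_after` seconds will find enough tokens.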

## State

State is stored in `ngx.shared.fairvisor_counters` as a single string entry per limit key:

```
Key:    tb:{rule_name}:{composite_limit_key}
Value:  "{tokens}:{last_refill}"   (both as 6-decimal floats)
```

Example:
```
tb:per-org-rps:org-abc  →  "87.432100:1736940000.000000"
```
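The two-field value format can be packed and unpacked as follows. This is a Python sketch of the encoding described above; the actual limiter does this in Lua, and the function names here are illustrative.

```python
def encode_state(tokens: float, last_refill: float) -> str:
    # Both fields serialized as 6-decimal floats, colon-separated.
    return f"{tokens:.6f}:{last_refill:.6f}"

def decode_state(value: str) -> tuple[float, float]:
    # Split on the first colon and parse both fields back to floats.
    tokens_s, last_refill_s = value.split(":", 1)
    return float(tokens_s), float(last_refill_s)
```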

## Configuration

```json
{
  "algorithm": "token_bucket",
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 200,
    "cost_source": "fixed",
    "fixed_cost": 1,
    "default_cost": 1
  }
}
```

| Field | Type | Required | Default | Constraint | Description |
|---|---|---|---|---|---|
| `tokens_per_second` | number | yes* | — | > 0 | Steady-state refill rate |
| `rps` | number | yes* | — | > 0 | Alias for `tokens_per_second` |
| `burst` | number | **yes** | — | ≥ `tokens_per_second` | Maximum bucket capacity (max burst size) |
| `cost_source` | string | no | `"fixed"` | — | `"fixed"`, `"header:<name>"`, or `"query:<name>"` |
| `fixed_cost` | number | no | `1` | > 0 | Token cost per request when `cost_source = "fixed"` |
| `default_cost` | number | no | `1` | > 0 | Fallback cost when the source header/query param is absent or non-numeric |

*One of `tokens_per_second` or `rps` must be provided.

## Variable cost

You can charge a variable number of tokens per request by reading the cost from a header or query parameter:

```json
{
  "algorithm_config": {
    "tokens_per_second": 100,
    "burst": 1000,
    "cost_source": "header:x-request-weight",
    "default_cost": 1
  }
}
```

If `X-Request-Weight: 5` is sent, 5 tokens are consumed. If the header is absent or contains a non-numeric value, `default_cost` (1) is used.
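The cost-resolution rules can be sketched like this. The function name and the positive-cost guard are assumptions for illustration; only the config field semantics (`cost_source`, `fixed_cost`, `default_cost`) come from the table above.

```python
def resolve_cost(cost_source, fixed_cost, default_cost, headers, query):
    """Resolve the per-request token cost from the configured source.

    Illustrative sketch; parameter names mirror the config fields."""
    if cost_source == "fixed":
        return fixed_cost
    kind, _, name = cost_source.partition(":")  # "header:x-foo" -> ("header", "x-foo")
    if kind == "header":
        raw = headers.get(name.lower())  # header names compared case-insensitively
    else:
        raw = query.get(name)
    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return default_cost  # absent or non-numeric -> fallback
    # Assumption: non-positive costs also fall back to the default.
    return cost if cost > 0 else default_cost
```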

## Response headers

On every allowed request the following headers are set (in `enforce` mode), per [draft-ietf-httpapi-ratelimit-headers](https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/) and `Retry-After` per [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110#field.retry-after):

```
RateLimit-Limit: 200
RateLimit-Remaining: 87
RateLimit-Reset: 1
RateLimit: "per-org-rps";r=87;t=1
```

On rejection:

```
HTTP 429 Too Many Requests
Retry-After: 2
RateLimit-Limit: 200
RateLimit-Remaining: 0
RateLimit-Reset: 2
X-Fairvisor-Reason: token_bucket_exceeded
```

## `Retry-After` jitter

The `Retry-After` value is the raw `ceil((cost - tokens) / tokens_per_second)` plus a **deterministic per-identity jitter** of up to 50%. The jitter is seeded from the identity (JWT subject, IP, path) so the same client always gets the same offset — different clients get different offsets. This prevents the thundering herd problem when many clients hit the limit at the same time.
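One way to implement such a deterministic jitter is to hash the identity and scale the result into a 0–50% offset. The hash function and exact scaling below are assumptions, not the limiter's actual code; the point is that the offset is a pure function of the identity.

```python
import hashlib
import math

def retry_after_with_jitter(base: int, identity: str) -> int:
    """Add a deterministic 0-50% jitter to a base Retry-After value.

    The jitter fraction is derived from a hash of the identity, so the
    same client always receives the same offset while different clients
    spread out across the interval (sketch; hash choice is assumed)."""
    digest = hashlib.sha256(identity.encode()).digest()
    fraction = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF  # in [0.0, 1.0]
    return base + math.ceil(base * 0.5 * fraction)
```

Because the offset is stable per identity, a given client's retries stay evenly paced instead of re-synchronizing with everyone else's.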

## Failure behavior

If the shared dict write fails (for example, under dict memory pressure), the algorithm fails open:

- the request is allowed
- token state is left unchanged (no partial decrement is applied)
- the failure is logged for metrics

Traffic is never blocked due to storage failure.
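The fail-open behavior amounts to a wrapper like the following sketch, where any storage-layer error is swallowed and the request is allowed (the wrapper name and the `(allowed, retry_after)` return shape are assumptions):

```python
def check_limit_fail_open(consume_fn, *args):
    """Fail-open wrapper: a storage error allows the request through.

    Sketch only; the real limiter also records the failure for metrics."""
    try:
        return consume_fn(*args)
    except Exception:
        # Storage failure: allow the request, leave token state untouched.
        return True, 0
```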

## Tuning

| Scenario | Recommendation |
|---|---|
| Interactive API (users expect burst then steady) | `burst = 5 × tokens_per_second` |
| Strict steady-rate enforcement | `burst = tokens_per_second` |
| LLM tool calls (each call is expensive) | `burst = tokens_per_second` + set `cost_source` to a per-call cost header |
| Background batch job (burst allowed at start) | `burst = 10 × tokens_per_second` |

## Examples

### Global rate limit with IP partitioning

```json
{
  "name": "global-rps",
  "limit_keys": ["ip:address"],
  "algorithm": "token_bucket",
  "algorithm_config": {
    "rps": 5,
    "burst": 10
  }
}
```

Each unique IP address gets its own 10-token bucket that refills at 5 tokens per second.

### Per-API-key tiered limits

```json
{
  "rules": [
    {
      "name": "enterprise",
      "limit_keys": ["header:x-api-key"],
      "algorithm": "token_bucket",
      "algorithm_config": { "rps": 1000, "burst": 2000 },
      "match": { "jwt:plan": "enterprise" }
    },
    {
      "name": "free",
      "limit_keys": ["header:x-api-key"],
      "algorithm": "token_bucket",
      "algorithm_config": { "rps": 10, "burst": 20 }
    }
  ]
}
```

