# Cost-Based Budget

- Canonical URL: https://docs.fairvisor.com/docs/algorithms/cost-budget/
> Fixed-period quota limiter with staged warn/throttle/reject actions.


`cost_based` enforces a **cumulative budget per period**. It is designed for daily or hourly spend caps where each request has a variable cost (money, units, weighted operations). It supports graceful degradation near budget exhaustion and a hard stop at the limit. This makes it complementary to token bucket: token bucket limits instantaneous rate, while `cost_based` limits total consumption.

## How it works

For each limiter key, Fairvisor tracks a single period counter:

```text
current_usage(period, key)
```

On each request:

1. Resolve request cost (`fixed`, `header:*`, `query:*`)
2. Compute current period start (`5m`, `1h`, `1d`, `7d`)
3. Atomically increment usage by cost
4. Compute `usage_percent = usage / budget * 100`
5. Pick highest matching staged action
6. Allow / throttle / reject
7. If rejected, roll back increment (request is not charged)
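
The steps above can be sketched end to end as follows. This is a minimal illustration, not Fairvisor's implementation: the in-memory `counters` dict stands in for the shared counter store, `handle_request` is a hypothetical name, the period is fixed at `5m`, and the 80% and 95% stages are hard-coded from the example configuration on this page.

```python
# Hypothetical in-memory stand-in for the shared counter store.
counters: dict[str, float] = {}

PERIOD_SECONDS = 300  # "5m"; fixed here to keep the sketch short


def handle_request(key: str, cost: float, budget: float, now: int) -> str:
    """Return "allow", "warn", "throttle" or "reject" for one request."""
    # Step 2: align to the start of the current period.
    period_start = now - (now % PERIOD_SECONDS)
    counter_key = f"{key}:{period_start}"

    # Step 3: increment usage by cost (a plain read-modify-write here;
    # the real store is expected to do this atomically).
    usage = counters.get(counter_key, 0) + cost
    counters[counter_key] = usage

    # Step 4: usage as a percentage of the budget.
    usage_percent = usage / budget * 100

    # Steps 5-7: going over budget rejects and rolls back the charge.
    if usage > budget:
        counters[counter_key] = usage - cost
        return "reject"
    if usage_percent >= 95:
        return "throttle"
    if usage_percent >= 80:
        return "warn"
    return "allow"
```

With a budget of 100, successive costs of 70, 15, 11 and 10 in the same period yield `allow`, `warn`, `throttle` and `reject`, and the counter stays at 96 after the rejected request.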

## Period alignment

| Period | Alignment |
|---|---|
| `5m` | start of current 5-minute UTC slot (`:00`, `:05`, `:10`, ...) |
| `1h` | start of current UTC clock hour |
| `1d` | start of current UTC day |
| `7d` | start of current UTC week (Monday 00:00 UTC) |

Counters need no explicit TTL because the period start is embedded in the key. When the period rolls over, requests are counted under a new key.
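
The alignment rules in the table can be expressed as integer arithmetic on Unix timestamps. The sketch below is an illustration under that assumption; `period_start` is a hypothetical helper, not a Fairvisor function. The only non-obvious case is `7d`: the Unix epoch fell on a Thursday, so week boundaries are shifted by four days to land on Monday 00:00 UTC.

```python
PERIOD_SECONDS = {"5m": 300, "1h": 3600, "1d": 86400, "7d": 604800}

# 1970-01-01 was a Thursday, so the first Monday 00:00 UTC is 4 days later.
MONDAY_OFFSET = 4 * 86400


def period_start(period: str, now: int) -> int:
    """Return the epoch second at which the current period began."""
    length = PERIOD_SECONDS[period]
    offset = MONDAY_OFFSET if period == "7d" else 0
    # Floor to a multiple of the period length, relative to the offset.
    return (now - offset) // length * length + offset


# 1761177600 is Thursday 2025-10-23 00:00:00 UTC.
now = 1761177600 + 3661
print(period_start("1d", now))  # start of that UTC day
print(period_start("7d", now))  # Monday 2025-10-20 00:00:00 UTC
```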

## Cost source resolution

`cost_key` determines where cost is read from:

- `fixed`
- `header:<name>`
- `query:<name>`

If the header or query value is missing, invalid, or non-positive, `default_cost` is used.

Practical pattern:

- start with `cost_key: "fixed"` and `fixed_cost: 1`
- migrate to weighted pricing using `header:x-request-cost`
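
A minimal sketch of the resolution rules, assuming headers arrive as a dict with lower-cased names and query parameters as a plain dict. `resolve_cost` is a hypothetical helper, and treating non-finite values such as `inf` as invalid is an assumption of this sketch.

```python
import math


def resolve_cost(cost_key, headers, query, fixed_cost=1, default_cost=1):
    """Resolve the cost of one request according to cost_key."""
    if cost_key == "fixed":
        return fixed_cost

    source, _, name = cost_key.partition(":")
    if source == "header":
        raw = headers.get(name.lower())
    elif source == "query":
        raw = query.get(name)
    else:
        raw = None

    try:
        cost = float(raw)
    except (TypeError, ValueError):
        # Missing (None) or not a number.
        return default_cost

    # Non-positive or non-finite values fall back to the default.
    if not math.isfinite(cost) or cost <= 0:
        return default_cost
    return cost
```

For example, `resolve_cost("header:x-request-cost", {"x-request-cost": "12.5"}, {})` returns `12.5`, while a missing or negative header value returns `default_cost`.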

## Staged actions

`staged_actions` is a threshold table sorted in ascending order of `threshold_percent`. The active stage is the one with the highest threshold that current usage has reached.

Example:

```json
"staged_actions": [
  { "threshold_percent": 80, "action": "warn" },
  { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
  { "threshold_percent": 100, "action": "reject" }
]
```

Action semantics:

- `warn`: request allowed, warning marker is set
- `throttle`: request delayed by `delay_ms`, then allowed
- `reject`: request denied with `budget_exceeded`

Important runtime nuance:

- the config must contain a `reject` stage at `100%`
- a `reject` stage with a threshold below `100%` does not reject requests while usage is still within budget
- rejection happens only when usage goes over budget
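
Stage selection, including the nuance above, might look like the sketch below. `pick_action` is a hypothetical name. The behavior at exactly 100% usage is an assumption inferred from "rejection happens only when usage goes over budget": the request that lands exactly on the budget is still served under the highest non-reject stage.

```python
STAGES = [
    {"threshold_percent": 80, "action": "warn"},
    {"threshold_percent": 95, "action": "throttle", "delay_ms": 500},
    {"threshold_percent": 100, "action": "reject"},
]


def pick_action(usage, budget, staged_actions=STAGES):
    """Return the active stage dict, or None for a plain allow."""
    # Rejection is tied to exceeding the budget, not to a threshold match.
    if usage > budget:
        return {"action": "reject"}

    usage_percent = usage / budget * 100
    active = None
    # Stages are assumed sorted ascending, so the last match is the highest.
    for stage in staged_actions:
        if stage["action"] == "reject":
            continue
        if usage_percent >= stage["threshold_percent"]:
            active = stage
    return active
```

With a budget of 1000, usage of 500 returns `None`, 800 returns the `warn` stage, 960 returns the `throttle` stage, and 1001 returns `reject`.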

## Rejection and rollback

When the increment pushes usage over budget:

- Fairvisor performs a compensating `dict:incr(..., -cost)`
- the rejected request is not counted as consumed budget
- the response reason is `budget_exceeded`

This keeps accounting aligned with accepted traffic.
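
One consequence of the rollback is that a large rejected request does not block a smaller one that still fits. The sketch below traces this with a hypothetical `BudgetCounter` class; it mirrors the increment-then-compensate pattern described above but is not the actual runtime code.

```python
class BudgetCounter:
    """Toy single-key counter with increment-then-compensate semantics."""

    def __init__(self, budget):
        self.budget = budget
        self.usage = 0

    def charge(self, cost):
        self.usage += cost  # optimistic increment
        if self.usage > self.budget:
            self.usage -= cost  # compensating decrement: request not charged
            return False
        return True


counter = BudgetCounter(budget=10)
results = [counter.charge(cost) for cost in (4, 4, 4, 2)]
print(results)        # [True, True, False, True]
print(counter.usage)  # 10
```

The third request (cost 4) would bring usage to 12, so it is rejected and rolled back to 8. The fourth request (cost 2) then fits exactly.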

## State format

Keys are stored in `ngx.shared.fairvisor_counters`:

```text
cb:{rule_name}:{composite_limit_key}:{period_start_epoch}
```

Example:

```text
cb:daily-budget:org-abc:1761177600
```
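
As an illustration of the key layout, the hypothetical helper below assembles a key from its three parts. Because the period start is part of the key, the next period produces a different key with no cleanup step.

```python
def counter_key(rule_name: str, composite_limit_key: str, period_start_epoch: int) -> str:
    """Build a counter key in the cb:{rule}:{limit_key}:{period_start} layout."""
    return f"cb:{rule_name}:{composite_limit_key}:{period_start_epoch}"


today = counter_key("daily-budget", "org-abc", 1761177600)
tomorrow = counter_key("daily-budget", "org-abc", 1761177600 + 86400)
print(today)     # cb:daily-budget:org-abc:1761177600
print(tomorrow)  # cb:daily-budget:org-abc:1761264000
```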

## Configuration

```json
{
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 1000,
    "period": "5m",
    "cost_key": "header:x-cost-units",
    "fixed_cost": 1,
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}
```

| Field | Required | Default | Validation |
|---|---|---|---|
| `budget` | yes | - | positive number |
| `period` | yes | - | one of `5m`, `1h`, `1d`, `7d` |
| `cost_key` | no | `fixed` | `fixed`, `header:<name>`, `query:<name>` |
| `fixed_cost` | no | `1` | positive (used for `fixed`) |
| `default_cost` | no | `1` | positive |
| `staged_actions` | yes | - | non-empty, strictly ascending thresholds, must include `reject@100` |

`staged_actions[]` fields:

| Field | Required | Validation |
|---|---|---|
| `threshold_percent` | yes | number in `[0, 100]` |
| `action` | yes | `warn`, `throttle`, `reject` |
| `delay_ms` | conditional | required and `> 0` for `throttle` |
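
The validation rules in both tables can be checked before deploying a rule. The sketch below is a hypothetical pre-flight validator written from the tables above; its error messages and function names are illustrative and do not reproduce Fairvisor's own validation output.

```python
VALID_PERIODS = {"5m", "1h", "1d", "7d"}
VALID_ACTIONS = {"warn", "throttle", "reject"}


def _number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _positive(value) -> bool:
    return _number(value) and value > 0


def validate(cfg: dict) -> list[str]:
    """Return a list of problems found in an algorithm_config dict."""
    errors = []
    if not _positive(cfg.get("budget")):
        errors.append("budget must be a positive number")
    if cfg.get("period") not in VALID_PERIODS:
        errors.append("period must be one of 5m, 1h, 1d, 7d")

    cost_key = cfg.get("cost_key", "fixed")
    prefixed = isinstance(cost_key, str) and cost_key.startswith(("header:", "query:"))
    if cost_key != "fixed" and not (prefixed and cost_key.split(":", 1)[1]):
        errors.append("cost_key must be fixed, header:<name> or query:<name>")
    for field in ("fixed_cost", "default_cost"):
        if not _positive(cfg.get(field, 1)):
            errors.append(f"{field} must be positive")

    stages = cfg.get("staged_actions") or []
    if not stages:
        errors.append("staged_actions must be non-empty")
    previous = None
    for stage in stages:
        threshold = stage.get("threshold_percent")
        action = stage.get("action")
        if not _number(threshold) or not 0 <= threshold <= 100:
            errors.append("threshold_percent must be a number in [0, 100]")
            continue
        if previous is not None and threshold <= previous:
            errors.append("thresholds must be strictly ascending")
        previous = threshold
        if action not in VALID_ACTIONS:
            errors.append("action must be warn, throttle or reject")
        if action == "throttle" and not _positive(stage.get("delay_ms")):
            errors.append("throttle requires delay_ms > 0")
    if stages and not any(
        s.get("action") == "reject" and s.get("threshold_percent") == 100 for s in stages
    ):
        errors.append("staged_actions must include reject at 100")
    return errors
```

Passing the `algorithm_config` object from the example above returns an empty list.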

## Response headers

On allowed requests (in `enforce` mode):

```text
RateLimit-Limit: 1000
RateLimit-Remaining: 753
RateLimit-Reset: <seconds until period end>
```

On rejection:

```text
HTTP 429 Too Many Requests
Retry-After: <seconds until next period start>
RateLimit-Limit: 1000
RateLimit-Remaining: 0
X-Fairvisor-Reason: budget_exceeded
```

`Retry-After` is `ceil(next_period_start - now)`, minimum 1 second.
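
The `Retry-After` rule can be written directly from that formula. In the sketch below, `retry_after` and `rejection_headers` are hypothetical helper names; the header names and values are the ones listed in the rejection response above.

```python
import math


def retry_after(next_period_start: float, now: float) -> int:
    """Seconds until the next period starts, rounded up, never below 1."""
    return max(1, math.ceil(next_period_start - now))


def rejection_headers(budget: int, next_period_start: float, now: float) -> dict[str, str]:
    """Assemble the headers shown for a budget_exceeded rejection."""
    return {
        "Retry-After": str(retry_after(next_period_start, now)),
        "RateLimit-Limit": str(budget),
        "RateLimit-Remaining": "0",
        "X-Fairvisor-Reason": "budget_exceeded",
    }
```

The minimum of 1 matters at period boundaries: a request rejected a fraction of a second before rollover would otherwise see a `Retry-After` of `0`.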

## Failure behavior

If the shared dict increment fails (for example, under dict memory pressure), the algorithm fails open:

- the request is allowed
- the error detail is returned to the caller for logging and metrics
- traffic is not blocked because of a storage failure
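
The fail-open behavior can be sketched as a wrapper around the increment. Everything here is illustrative: `StorageError`, `check_budget` and the `increment` callable are hypothetical names, and the sketch uses a Python exception where the real shared dict API would report an error value.

```python
class StorageError(Exception):
    """Hypothetical error raised when the counter store cannot increment."""


def check_budget(increment, cost, budget):
    """Return (allowed, error_detail).

    `increment` is a hypothetical callable that adds `cost` to the counter
    and returns the new usage, or raises StorageError on failure.
    """
    try:
        usage = increment(cost)
    except StorageError as exc:
        # Fail open: allow the request and surface the error for logging.
        return True, f"counter increment failed: {exc}"
    return usage <= budget, None


def broken_increment(cost):
    raise StorageError("no memory")


print(check_budget(broken_increment, 5, 100))
# (True, 'counter increment failed: no memory')
```

Returning the error alongside the decision lets the caller emit a metric without changing the outcome for the client.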

## Tuning

1. Choose budget from real commercial quota (not synthetic RPS)
2. Start with one warning stage (`70-85%`) and one throttle stage (`90-97%`)
3. Keep throttle delay modest; runtime caps throttle sleep at 30s anyway
4. Always keep a hard reject at 100%
5. Use fixed cost first, then move to weighted cost once producers can send trustworthy cost headers

## Example

```json
{
  "name": "org-daily-spend",
  "limit_keys": ["jwt:org_id"],
  "algorithm": "cost_based",
  "algorithm_config": {
    "budget": 500,
    "period": "1d",
    "cost_key": "header:x-request-cost",
    "default_cost": 1,
    "staged_actions": [
      { "threshold_percent": 80, "action": "warn" },
      { "threshold_percent": 95, "action": "throttle", "delay_ms": 200 },
      { "threshold_percent": 100, "action": "reject" }
    ]
  }
}
```

## Combine with other controls

Typical production stack:

- `token_bucket` for short-term burst control
- `cost_based` for period quota
- `token_bucket_llm` for token economics on LLM endpoints

All matched rules must pass, so you can enforce both real-time and period constraints together.

