Errors & rate limits

Every error is returned as an OpenAI-style envelope with a meaningful HTTP status, so existing OpenAI error handling keeps working.

Error shape

Errors return a JSON body matching OpenAI’s error envelope, with the HTTP status set to match the error. Branch programmatically on error.code, and read error.message for a human-readable reason.

```json
{
  "error": {
    "message": "Invalid API key.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null
  }
}
```
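A minimal sketch of reading that envelope in Python, assuming you already have the HTTP status and the raw response body from whatever HTTP client you use. The ApiError and parse_error names are hypothetical helpers, not part of an official SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Fields of the OpenAI-style error envelope plus the HTTP status."""

    status: int
    message: str
    type: str
    code: str
    param: Optional[str]


def parse_error(status: int, body: str) -> ApiError:
    """Decode an error response body into an ApiError."""
    err = json.loads(body)["error"]
    return ApiError(
        status=status,
        message=err["message"],
        type=err["type"],
        code=err["code"],
        param=err.get("param"),  # JSON null becomes None
    )


body = """{"error": {"message": "Invalid API key.", "type": "authentication_error",
           "code": "invalid_api_key", "param": null}}"""
err = parse_error(401, body)
if err.code == "invalid_api_key":
    # Branch on the machine-readable code; show the message to humans.
    print(f"Auth failed ({err.status}): {err.message}")
```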

Status codes

| Status | Code | Meaning |
| --- | --- | --- |
| 400 | invalid_request_error | Malformed body, or a content cap exceeded (too many messages or characters). |
| 401 | invalid_api_key | Missing, malformed, or revoked fw_live_ key. |
| 404 | model_not_found | The requested model isn’t a known niche slug. |
| 422 | upstream_rejected | The model runtime rejected the request itself. |
| 429 | rate_limited | Too many requests, or more than 4 in flight for one key. |
| 502 | upstream_unavailable | The model runtime was unreachable. Safe to retry. |
| 504 | upstream_timeout | The model didn’t respond within 60 seconds. Safe to retry. |
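The table can be mirrored in client code. This sketch keys the documented error.code values by HTTP status and sorts each status into "retry", "fix", or "unknown". ERROR_CODES, RETRYABLE, and classify are hypothetical names, and treating 422 as non-retryable is an assumption: the table only says the model runtime rejected the request itself.

```python
from typing import Dict, FrozenSet

# error.code values from the status table, keyed by HTTP status.
ERROR_CODES: Dict[int, str] = {
    400: "invalid_request_error",
    401: "invalid_api_key",
    404: "model_not_found",
    422: "upstream_rejected",
    429: "rate_limited",
    502: "upstream_unavailable",
    504: "upstream_timeout",
}

# Transient statuses that are safe to retry.
RETRYABLE: FrozenSet[int] = frozenset({429, 502, 504})


def classify(status: int) -> str:
    """Return 'retry', 'fix', or 'unknown' for an HTTP error status."""
    if status in RETRYABLE:
        return "retry"
    if status in ERROR_CODES:
        # 400/401/404 per the docs; 422 handled the same way (assumption).
        return "fix"
    return "unknown"
```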

Rate limits

Two limits protect shared capacity. First, a single key may have at most 4 requests in flight at once: each worker loads one model at a time, so piling more requests onto one key only slows it down. Second, per-plan request throughput limits are enforced on top of that. Both limits return HTTP 429.
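One way to respect the per-key in-flight limit on the client is an asyncio.Semaphore per key, sketched below. KeyLimiter, MAX_IN_FLIGHT_PER_KEY, and the "fw_live_placeholder" key are illustrative assumptions, and the per-plan throughput limit is not modeled here.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, TypeVar

T = TypeVar("T")

MAX_IN_FLIGHT_PER_KEY = 4  # documented per-key in-flight limit


class KeyLimiter:
    """Hypothetical helper: caps concurrent requests per API key."""

    def __init__(self, limit: int = MAX_IN_FLIGHT_PER_KEY) -> None:
        # One semaphore per key, created on first use.
        self._sems: DefaultDict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(limit)
        )

    async def run(self, key: str, call: Callable[[], Awaitable[T]]) -> T:
        # Waits while `key` already has `limit` requests outstanding.
        async with self._sems[key]:
            return await call()


async def demo() -> int:
    """Fire 10 fake requests on one key and report peak concurrency."""
    limiter = KeyLimiter()
    active = peak = 0

    async def fake_request() -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for a real HTTP call
        active -= 1

    await asyncio.gather(
        *(limiter.run("fw_live_placeholder", fake_request) for _ in range(10))
    )
    return peak


print(asyncio.run(demo()))  # 4
```

Keeping one semaphore per key means separate keys do not throttle each other on the client, which matches how the in-flight limit is scoped.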

Tip: For batch or burst workloads, queue requests client-side and keep no more than 4 outstanding per key. Need higher throughput? A private deployment gives you dedicated capacity.
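A sketch of that tip, assuming an asyncio client: every job is enqueued up front and a fixed pool of 4 workers drains the queue, each awaiting one request at a time. run_batch and fake_call are hypothetical; fake_call stands in for your real API call. If a job raises, the exception propagates out of asyncio.gather, so combine this with retries for transient errors.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


async def run_batch(
    jobs: List[Callable[[], Awaitable[T]]], max_outstanding: int = 4
) -> List[Optional[T]]:
    """Drain a client-side queue with at most `max_outstanding` requests in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for index, job in enumerate(jobs):
        queue.put_nowait((index, job))
    results: List[Optional[T]] = [None] * len(jobs)

    async def worker() -> None:
        # Each worker awaits one request at a time, so the number of
        # workers bounds how many requests are outstanding at once.
        while True:
            try:
                index, job = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained
            results[index] = await job()  # keep submission order

    await asyncio.gather(*(worker() for _ in range(max_outstanding)))
    return results


async def main() -> None:
    async def fake_call(n: int) -> int:
        await asyncio.sleep(0.01)  # placeholder for a real API request
        return n * n

    jobs = [lambda n=n: fake_call(n) for n in range(10)]
    print(await run_batch(jobs))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


asyncio.run(main())
```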

Handling errors

  • Retry 429 / 502 / 504 with exponential backoff and jitter; these errors are transient.
  • Don’t retry 400 / 401 / 404; fix the request, key, or model instead.
  • Cap output with max_tokens to keep latency predictable (default 512, max 1024).
  • On long conversations, trim history to stay under the per-request message and character caps.
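A minimal synchronous sketch of that retry policy: 429, 502, and 504 are retried with exponential backoff and full jitter (a random delay between 0 and the capped exponential value), and any other status is raised immediately. ApiStatusError, with_retries, and the default attempt count and delays are illustrative assumptions, not values defined by this API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUSES = {429, 502, 504}  # transient errors


class ApiStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except ApiStatusError as exc:
            attempt += 1
            # 400/401/404 and friends: fix the request instead of retrying.
            if exc.status not in RETRYABLE_STATUSES or attempt >= max_attempts:
                raise
            # Full jitter: uniform in [0, min(max_delay, base * 2**(attempt-1))].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


outcomes = iter([502, 429])  # fail twice with transient errors, then succeed


def flaky() -> str:
    status = next(outcomes, None)
    if status is not None:
        raise ApiStatusError(status)
    return "ok"


print(with_retries(flaky, sleep=lambda _: None))  # ok
```

Injecting `sleep` keeps the helper testable without real waits; in production, leave it as time.sleep.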