API reference

Errors & rate limits

Every error is returned in an OpenAI-style envelope with a matching HTTP status, so existing OpenAI error handling keeps working.

Error shape

Errors return a JSON body matching OpenAI’s envelope, with the HTTP status set appropriately. Read `error.code` to branch programmatically and `error.message` for a human-readable reason.

json

{
  "error": {
    "message": "Invalid API key.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null
  }
}
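
As a rough illustration, the envelope above could be read in Python along these lines. This is a sketch, not an official client: it assumes the HTTP status and raw response body have already been obtained from whatever HTTP library is in use, and `ApiError` and `parse_error` are hypothetical names introduced here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Hypothetical container for the fields of the error envelope."""
    status: int
    message: str
    type: Optional[str] = None
    code: Optional[str] = None
    param: Optional[str] = None


def parse_error(status: int, body: str) -> ApiError:
    """Parse an error envelope, tolerating bodies that are not the expected JSON."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # e.g. a plain-text body from an intermediate proxy
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ApiError(
        status=status,
        message=err.get("message") or body,  # fall back to the raw body
        type=err.get("type"),
        code=err.get("code"),
        param=err.get("param"),
    )


sample = (
    '{"error": {"message": "Invalid API key.", '
    '"type": "authentication_error", "code": "invalid_api_key", "param": null}}'
)
err = parse_error(401, sample)
if err.code == "invalid_api_key":
    print("Check the API key:", err.message)
```

Branching on `code` rather than on the message text keeps the handler stable if the wording of `message` changes.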

Status codes

Status	Code	Meaning
400	invalid_request_error	Malformed body, or a content cap exceeded (too many messages or characters).
401	invalid_api_key	Missing, malformed, or revoked `fw_live_` key.
404	model_not_found	The requested `model` isn’t a known niche slug.
422	upstream_rejected	The model runtime rejected the request itself.
429	rate_limited	Too many requests, or more than 4 in flight for one key.
502	upstream_unavailable	The model runtime was unreachable. Safe to retry.
504	upstream_timeout	The model didn’t respond within 60s. Safe to retry.

Rate limits

Two limits protect shared capacity. First, a single key may have at most 4 requests in flight at once. A model loads one at a time per worker, so piling requests onto one key only slows it down. Second, per-plan request throughput is enforced on top of that. Both limits surface as 429 with the code `rate_limited`.

Tip. For batch or burst workloads, queue requests client-side and keep no more than 4 outstanding per key. Need higher throughput? A private deployment gives you dedicated capacity.
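
One way to keep at most 4 requests outstanding per key is a semaphore around each call. The sketch below uses Python's `asyncio` and assumes one process per key; `run_bounded`, `demo`, and `fake_request` are hypothetical names, and `fake_request` stands in for a real API call.

```python
import asyncio

MAX_IN_FLIGHT = 4  # per-key in-flight limit stated in this section


async def run_bounded(jobs, limit=MAX_IN_FLIGHT):
    """Run zero-argument coroutine factories with at most `limit` active at once.

    Results are returned in the same order as `jobs`.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:  # waits here while `limit` jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    """Simulate 10 requests and record the peak number running concurrently."""
    active = 0
    peak = 0

    async def fake_request(i):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # placeholder for the real network call
        active -= 1
        return i

    jobs = [lambda i=i: fake_request(i) for i in range(10)]
    results = await run_bounded(jobs)
    return results, peak
```

Calling `asyncio.run(demo())` returns the results together with the observed peak concurrency. This only bounds in-flight requests; the per-plan throughput limit can still produce a 429, so retry handling is still needed.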

Handling errors

Retry 429 / 502 / 504 with exponential backoff and jitter — these are transient.
Don’t retry 400 / 401 / 404 — fix the request, key, or model instead.
Cap output with `max_tokens` to keep latency predictable (default 512, max 1024).
Trim history to stay under the per-request character caps on long conversations.
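
The retry and trimming guidance above might be sketched as follows. The base delay, delay cap, and attempt count are illustrative assumptions rather than documented values. `send` is a hypothetical callable returning `(status, body)`, and `max_chars` is a placeholder, since the actual character caps are not stated here.

```python
import random
import time

RETRYABLE = {429, 502, 504}  # statuses described above as transient


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Anything outside RETRYABLE is returned immediately for the caller to handle.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))


def trim_history(messages, max_chars):
    """Keep the newest messages whose combined content length fits max_chars.

    The newest message is always kept, even if it alone exceeds the cap.
    """
    kept, total = [], 0
    for msg in reversed(messages):
        total += len(msg["content"])
        if total > max_chars and kept:
            break
        kept.append(msg)
    return list(reversed(kept))
```

The jitter spreads retries from many clients so they do not hit the service in lockstep. `trim_history` treats every message alike; a real client may want to always preserve a system message, and it does not address the separate cap on message count.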
