API reference
Errors & rate limits
Every error is an OpenAI-style envelope with a clear HTTP status — so existing error handling keeps working.
Error shape
Errors return a JSON body matching OpenAI’s envelope, with the HTTP status set appropriately. Read error.code to branch programmatically and error.message for a human-readable reason.
json
{
  "error": {
    "message": "Invalid API key.",
    "type": "authentication_error",
    "code": "invalid_api_key",
    "param": null
  }
}
Status codes
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request_error | Malformed body, or a content cap exceeded (too many messages or characters). |
| 401 | invalid_api_key | Missing, malformed, or revoked fw_live_ key. |
| 404 | model_not_found | The requested model isn’t a known niche slug. |
| 422 | upstream_rejected | The model runtime rejected the request itself. |
| 429 | rate_limited | Too many requests, or more than 4 in flight for one key. |
| 502 | upstream_unavailable | The model runtime was unreachable. Safe to retry. |
| 504 | upstream_timeout | The model didn’t respond within 60s. Safe to retry. |
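As a minimal sketch of branching on error.code, the snippet below parses an error body and decides whether the failure is worth retrying. The helper names (`parse_error`, `is_retryable`) are hypothetical and not part of the API. The retryable set is taken from the codes in the table that are rate-limit or "safe to retry" cases; treating 422 and any unknown code as non-retryable is an assumption.

```python
import json

# Codes treated as transient: rate limiting plus the two "Safe to retry" rows.
RETRYABLE_CODES = {"rate_limited", "upstream_unavailable", "upstream_timeout"}


def parse_error(body: str) -> dict:
    """Return the inner "error" object from an error response body."""
    return json.loads(body)["error"]


def is_retryable(error: dict) -> bool:
    """True when error.code is one of the transient codes above.

    Assumption: anything else (including upstream_rejected) is not retried.
    """
    return error.get("code") in RETRYABLE_CODES


# Sample body copied from the error shape shown earlier.
body = (
    '{"error": {"message": "Invalid API key.", '
    '"type": "authentication_error", '
    '"code": "invalid_api_key", "param": null}}'
)
err = parse_error(body)
print(err["code"], err["message"], is_retryable(err))
```

Branching on the code string rather than on the message keeps the logic stable if the human-readable text changes.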
Rate limits
Two limits protect shared capacity. A single key may have at most 4 requests in flight at once — a model loads one at a time per worker, so piling requests onto one key only slows it down. Per-plan request throughput is enforced on top of that. Both surface as 429.
Tip. For batch or burst workloads, queue requests client-side and keep no more than 4 outstanding per key. Need higher throughput? A private deployment gives you dedicated capacity.
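One way to keep at most 4 requests outstanding per key is a fixed-size worker pool, which also acts as the client-side queue. The sketch below is illustrative only: `send_request` is a hypothetical stand-in that sleeps instead of calling the API, and the counters exist only to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # per-key in-flight limit described above

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # highest concurrency observed, for demonstration


def send_request(payload: str) -> str:
    """Hypothetical stand-in for the real API call; it only sleeps."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    try:
        time.sleep(0.01)  # simulate request latency
        return f"done:{payload}"
    finally:
        with _lock:
            _in_flight -= 1


def run_batch(payloads: list[str]) -> list[str]:
    """Send every payload while never exceeding MAX_IN_FLIGHT at once."""
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        # pool.map queues the remaining work and preserves input order.
        return list(pool.map(send_request, payloads))


results = run_batch([f"job-{i}" for i in range(10)])
print(len(results), "requests sent; peak in flight:", peak_in_flight)
```

If several processes share one key, each pool would need a proportionally smaller size, since the limit applies to the key rather than to the process.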
Handling errors
- Retry 429 / 502 / 504 with exponential backoff and jitter — these are transient.
- Don’t retry 400 / 401 / 404 — fix the request, key, or model instead.
- Cap output with max_tokens to keep latency predictable (default 512, max 1024).
- Trim history to stay under the per-request character caps on long conversations.
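The retry guidance above can be sketched as exponential backoff with full jitter. This is an illustration under stated assumptions, not a prescribed client: `send` is a hypothetical callable returning a `(status, body)` pair, and the base delay, cap, and attempt count are arbitrary example values.

```python
import random
import time

RETRYABLE_STATUSES = {429, 502, 504}  # transient statuses listed above


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out.

    send is a hypothetical callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body  # success, or an error that needs a fix
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body  # still failing after the final attempt


# Simulated sender: two transient failures, then success.
responses = iter([(429, "{}"), (502, "{}"), (200, "ok")])
delays = []  # record delays instead of sleeping, to keep the demo fast
status, body = call_with_retry(lambda: next(responses), sleep=delays.append)
print(status, body, len(delays))
```

Jitter spreads retries from many clients over time, so a burst of 429 responses is less likely to be followed by a synchronized burst of retries.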