
OpenAI compatibility

Flywheel implements the OpenAI Chat Completions API. Any OpenAI SDK works — change the base URL and key, set the model to a niche slug, and ship.

Drop-in swap

Point your existing OpenAI client at the Flywheel base URL, authenticate with a fw_live_ key, and set model to a niche slug. Requests and responses follow the OpenAI object shapes, so the rest of your code is unchanged.

from openai import OpenAI

# change these two lines:
client = OpenAI(
    base_url="https://gyld.dev/api/v1",
    api_key="fw_live_…",
)

# everything else is the same OpenAI code you already have:
resp = client.chat.completions.create(
    model="fitness",                 # a niche slug, not an OpenAI model
    messages=[{"role": "user", "content": "Hi"}],
)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gyld.dev/api/v1",   // change this
  apiKey: process.env.FLYWHEEL_API_KEY, // and this
});

const resp = await client.chat.completions.create({
  model: "fitness",
  messages: [{ role: "user", content: "Hi" }],
});

Tip. Self-hosting? The same approach applies: vLLM and llama.cpp also serve the OpenAI API, so you can point the same SDK at your local endpoint, for example http://localhost:8000/v1, with any placeholder key.
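Because only the base URL and key differ between the hosted API and a local server, the switch can live in one small function. The following is a minimal sketch, not part of Flywheel or the OpenAI SDK: the helper name `client_settings`, the target names, and the `FLYWHEEL_TARGET` environment variable are all hypothetical, and the local URL is the example from the tip above.

```python
import os

# Base URLs taken from this page; the target names are illustrative only.
BASE_URLS = {
    "flywheel": "https://gyld.dev/api/v1",
    "local": "http://localhost:8000/v1",
}


def client_settings(target=None):
    """Return keyword arguments for OpenAI(...) for the chosen target."""
    # FLYWHEEL_TARGET is a hypothetical variable used only by this sketch.
    target = target or os.environ.get("FLYWHEEL_TARGET", "flywheel")
    if target not in BASE_URLS:
        raise ValueError(f"unknown target: {target!r}")
    if target == "local":
        # Per the tip above, a local server takes any placeholder key.
        api_key = "placeholder"
    else:
        # FLYWHEEL_API_KEY is expected to hold a fw_live_ key.
        api_key = os.environ.get("FLYWHEEL_API_KEY", "")
    return {"base_url": BASE_URLS[target], "api_key": api_key}
```

With this in place, `OpenAI(**client_settings("local"))` and `OpenAI(**client_settings("flywheel"))` build clients for either target without touching the calling code.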

What’s supported

Field	Status
`model`	✓ Required — the niche slug.
`messages`	✓ `system`, `user`, `assistant` roles.
`max_tokens`	✓ Default 512, capped at 1024.
`temperature`	✓ Supported.
Response shape	✓ Standard `chat.completion` object + `usage`.
Error envelope	✓ Standard OpenAI error JSON + HTTP status.
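The supported fields map onto a small request payload. The sketch below builds one and mirrors the documented `max_tokens` default and cap on the client side, so the value you send is the value you should expect to be used. The helper name `build_request` and the constants are hypothetical, and the client-side clamp is a precaution assumed for illustration, not documented server behavior.

```python
DEFAULT_MAX_TOKENS = 512   # default listed in the table above
MAX_TOKENS_CAP = 1024      # cap listed in the table above
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_request(model, messages, max_tokens=None, temperature=None):
    """Build a Chat Completions payload using only the supported fields."""
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    if max_tokens is None:
        max_tokens = DEFAULT_MAX_TOKENS
    payload = {
        "model": model,  # a niche slug such as "fitness"
        "messages": messages,
        "max_tokens": min(max_tokens, MAX_TOKENS_CAP),
    }
    # Only send temperature when the caller set it explicitly.
    if temperature is not None:
        payload["temperature"] = temperature
    return payload
```

The result can be passed straight to the SDK, for example `client.chat.completions.create(**build_request("fitness", [{"role": "user", "content": "Hi"}], max_tokens=2000))`, which would send `max_tokens` as 1024.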

What’s ignored

Fields outside that set are accepted and ignored rather than rejected, so an existing OpenAI payload never fails just because it carries extra keys. Today that includes tools (function calling), response_format (JSON mode), n, logprobs, top_p, presence_penalty, and frequency_penalty. Native streaming is handled separately; see Streaming.
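Since ignored fields do not raise an error, code that depends on them (JSON mode, for instance) can fail quietly. One way to surface this is to check a payload before sending it. The sketch below is an assumption-laden illustration: `split_ignored` is a hypothetical helper, and `IGNORED_FIELDS` contains only the parameter names given in the text above, so it may be incomplete (related keys such as `tool_choice` are not listed there).

```python
# Parameter names this page lists as accepted but ignored today.
IGNORED_FIELDS = {
    "tools",
    "response_format",
    "n",
    "logprobs",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
}


def split_ignored(payload):
    """Return (payload without ignored fields, sorted names of ignored fields found)."""
    ignored = sorted(key for key in payload if key in IGNORED_FIELDS)
    kept = {key: value for key, value in payload.items() if key not in IGNORED_FIELDS}
    return kept, ignored
```

A caller could log the second return value as a warning, or raise when it contains a field the application relies on, such as `response_format`.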

Note. Need one of these? Tell us what you’re building. These niche models are deliberately small and focused, and we add capabilities where they earn their place.
