Models

The model family

Each model is a fine-tune of the same base (Qwen3.6-35B-A3B, Apache-2.0) for one vertical, with a point-of-use guardrail baked into the weights.

The family

Call a model by its slug (the model field). Every slug below is live on Hugging Face for self-hosting and on the hosted API. The catalog has the full card — taglines, evals, and install commands — for each.

Slug	Niche	Hugging Face
`legal-intake`	Law-Firm Intake Coordinator	`flywheel-ai/legal-intake`
`healthcare-frontdesk`	Healthcare Front Desk	`flywheel-ai/healthcare-frontdesk`
`automotive`	Auto Repair & Service Ops	`flywheel-ai/automotive`
`home-services`	Home Services (HVAC · Plumbing · Electrical)	`flywheel-ai/home-services`
`beauty-wellness`	Beauty & Wellness (Salon / Spa / Med-Spa)	`flywheel-ai/beauty-wellness`
`restaurant`	Restaurant & Hospitality	`flywheel-ai/restaurant`
`fitness`	Fitness	`flywheel-ai/fitness`
`construction`	Construction & Trades	`flywheel-ai/construction`
`agency-ops`	AI-Automation Agency Operator	`flywheel-ai/agency-ops`
`real-estate`	Real Estate Agent Ops	`flywheel-ai/real-estate`

Base & sizes

All models share one base — Qwen3.6-35B-A3B, an Apache-2.0 mixture-of-experts model — so they have identical runtime characteristics and you can swap niches without re-plumbing. Each ships in two builds:

GGUF (Q4_K_M) — ~20 GB. Laptop- and CPU-friendly; runs in llama.cpp.
bf16 safetensors — ~65 GB. Full precision for GPU serving with vLLM.

Tip.Because it’s a sparse MoE, only a fraction of the parameters activate per token — so the Q4 GGUF is genuinely usable on a single modern machine. See Self-hosting.

Picking a model

Choose the model whose vertical matches your business — a gym uses fitness, an auto shop uses automotive. Each is tuned for the language, tasks, and guardrails of that trade, so a niche model out-answers a general model of the same size on its home turf. If no niche fits, a private model can be trained for yours.

Versioning & guardrails

Models are versioned (v1.0, …) and a new version ships only when it measurably beats the last. With consent, real usage trains the next version — the flywheel. Every model also carries a point-of-use guardrail (its output is decision support, not professional advice) baked into both the weights and the runner config, so it travels with the model wherever you run it.

Browse all models →

← OpenAI compatibility Self-hosting →