Tell the router what matters — cost, latency, quality, or keeping data local — and it picks the best (provider, model) pair for each call. Layer conditions on top, and let automatic fallback handle the bad days.
Pass a strategy hint as the model name and the router resolves it to a concrete model at request time — so defaults can change without a redeploy.
strategy:costSends each request to the cheapest model that still meets the quality bar — capped by per-key budgets and alerts before the bill surprises you.
strategy:latencyOptimizes for fastest time-to-first-token. Ideal for interactive chat, autocomplete, and anything a human is waiting on.
strategy:qualityAlways picks the strongest available reasoner for the task — reserved for evals, planning, and high-stakes generations.
strategy:localPrefers models on your own VPS or GPU node, so PII-flagged traffic never leaves your perimeter. Falls back to cloud only if you allow it.
Layer rules on top of a strategy: prompt-token range, keyword match, industry tag, reasoning or tool requirements, max input cost. First matching rule by priority wins.
If your top pick is rate-limited, degraded, or offline, the router retries the next healthy model in the chain transparently — no client changes, no incident.
Pass "auto" and the router infers intent from the request itself, then applies the right strategy — so you can tune behavior centrally without touching client code.
Same OpenAI-compatible API. Swap the base URL, pick a strategy, and the router handles the rest.