● ROUTING STRATEGIES

One request. The right model, every time.

Tell the router what matters — cost, latency, quality, or keeping data local — and it picks the best (provider, model) pair for each call. Layer conditions on top, and let automatic fallback handle the bad days.

Four strategies, one alias each.

Pass a strategy hint as the model name and the router resolves it to a concrete model at request time — so defaults can change without a redeploy.
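Here is a minimal sketch of how a router might interpret the `model` field. It assumes the `strategy:` prefix and the `auto` value shown on this page are the only special names, and that any other value passes through as a concrete model name. The page does not say that, so treat it as an assumption. `parse_model_field`, `resolve`, and the `defaults` table are hypothetical. Because `resolve` reads the table on every call, editing it changes routing for the next request without a redeploy.

```python
from typing import Dict, Optional, Tuple

# The four aliases listed on this page; the passthrough rule below is an assumption.
STRATEGIES = {"cost", "latency", "quality", "local"}
PREFIX = "strategy:"


def parse_model_field(model: str) -> Tuple[str, Optional[str]]:
    """Classify the request's `model` field (hypothetical helper).

    Returns ("auto", None), ("strategy", <name>), or ("model", <model>).
    """
    if model == "auto":
        return ("auto", None)
    if model.startswith(PREFIX):
        name = model[len(PREFIX):]
        if name not in STRATEGIES:
            raise ValueError(f"unknown strategy: {name!r}")
        return ("strategy", name)
    # Assumed: any other value is treated as a concrete model name.
    return ("model", model)


def resolve(model: str, defaults: Dict[str, str]) -> str:
    """Map the `model` field to a concrete model at request time."""
    kind, name = parse_model_field(model)
    if kind == "strategy":
        # Looked up per request, so changing `defaults` takes effect immediately.
        return defaults[name]
    if kind == "model":
        return model
    # "auto" needs the request itself to be classified first.
    raise NotImplementedError("'auto' requires request classification")
```

For example, `resolve("strategy:cost", {"cost": "provider-a/small-model"})` returns `"provider-a/small-model"`. Both names are placeholders.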

Cost

strategy:cost

Sends each request to the cheapest model that still meets the quality bar, capped by per-key budgets, with alerts that fire before the bill surprises you.

Latency

strategy:latency

Optimizes for the fastest time to first token (TTFT). Ideal for interactive chat, autocomplete, and anything a human is waiting on.

Quality

strategy:quality

Always picks the strongest available reasoner for the task. Best reserved for evals, planning, and high-stakes generations.

Local

strategy:local

Prefers models on your own VPS or GPU nodes, so PII-flagged traffic stays inside your perimeter by default. Falls back to cloud only if you allow it.
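You can read the four strategies as four different selection rules over the same candidate list. The sketch below assumes the router tracks a price, a recent TTFT, a quality score, a local/cloud flag, and a health status for each candidate. The `Candidate` fields, the `pick` function, the default `quality_bar`, and the choice to rank local models by quality are all hypothetical. They illustrate the idea and are not the router's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical per-(provider, model) metrics a router might track.
    provider: str
    model: str
    usd_per_1k_tokens: float  # blended price estimate
    ttft_ms: float            # recent time to first token
    quality: float            # 0..1 score from some eval suite
    is_local: bool            # runs on your own VPS or GPU node
    healthy: bool = True


def pick(strategy: str, candidates: List[Candidate], *, quality_bar: float = 0.7,
         est_tokens: int = 1000, budget_left_usd: float = float("inf"),
         allow_cloud_fallback: bool = False) -> Optional[Candidate]:
    """Return the candidate a strategy would choose, or None if nothing qualifies."""
    pool = [c for c in candidates if c.healthy]
    if strategy == "cost":
        # Cheapest model that clears the quality bar and fits the remaining budget.
        ok = [c for c in pool
              if c.quality >= quality_bar
              and c.usd_per_1k_tokens * est_tokens / 1000 <= budget_left_usd]
        return min(ok, key=lambda c: c.usd_per_1k_tokens, default=None)
    if strategy == "latency":
        # Fastest time to first token among healthy models.
        return min(pool, key=lambda c: c.ttft_ms, default=None)
    if strategy == "quality":
        # Strongest model regardless of price.
        return max(pool, key=lambda c: c.quality, default=None)
    if strategy == "local":
        # Best local model; cloud only when explicitly allowed.
        local = [c for c in pool if c.is_local]
        if local:
            return max(local, key=lambda c: c.quality)
        if allow_cloud_fallback:
            return max(pool, key=lambda c: c.quality, default=None)
        return None
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Returning `None` instead of silently widening the pool lets the caller decide what an empty result means. For `local`, that is exactly the "only if you allow it" case.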

How the router decides.

Conditions

Layer rules on top of a strategy: prompt-token range, keyword match, industry tag, reasoning or tool requirements, and max input cost. Rules are evaluated in priority order, and the first match wins.
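Here is a sketch of first-match rule evaluation. It assumes that a lower `priority` number means the rule is checked earlier, and that every condition set on a rule must hold for the rule to match. `Request`, `Rule`, and `choose_strategy` are hypothetical names. The max-input-cost condition is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Request:
    prompt_tokens: int
    text: str
    industry: Optional[str] = None
    needs_tools: bool = False
    needs_reasoning: bool = False


@dataclass
class Rule:
    priority: int   # assumed: lower number is evaluated first
    strategy: str   # strategy to apply when the rule matches
    token_range: Optional[Tuple[int, int]] = None
    keywords: List[str] = field(default_factory=list)
    industry: Optional[str] = None
    requires_tools: Optional[bool] = None
    requires_reasoning: Optional[bool] = None

    def matches(self, req: Request) -> bool:
        # Every condition that is set must hold; unset conditions are ignored.
        if self.token_range is not None:
            lo, hi = self.token_range
            if not lo <= req.prompt_tokens <= hi:
                return False
        if self.keywords and not any(k.lower() in req.text.lower() for k in self.keywords):
            return False
        if self.industry is not None and req.industry != self.industry:
            return False
        if self.requires_tools is not None and req.needs_tools != self.requires_tools:
            return False
        if self.requires_reasoning is not None and req.needs_reasoning != self.requires_reasoning:
            return False
        return True


def choose_strategy(req: Request, rules: List[Rule], default: str) -> str:
    # sorted() is stable, so rules with equal priority keep their listed order.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(req):
            return rule.strategy
    return default
```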

Automatic fallback

If your top pick is rate-limited, degraded, or offline, the router transparently retries the request on the next healthy model in the chain: no client changes, no incident.
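Here is a minimal sketch of a fallback chain. It skips models that a health check reports as down, tries the rest in order, and moves on when a call fails with a retryable error. `ModelUnavailable`, `is_healthy`, and `send` are hypothetical stand-ins for whatever health tracking and provider adapters the router actually uses.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model is rate-limited, degraded, or offline."""


def call_with_fallback(chain: Sequence[str], is_healthy: Callable[[str], bool],
                       send: Callable[[str], T]) -> T:
    """Try each model in `chain` in order and return the first successful response."""
    errors = []
    for model in chain:
        if not is_healthy(model):
            # Known-bad models are skipped without spending a request on them.
            errors.append(f"{model}: skipped (unhealthy)")
            continue
        try:
            return send(model)
        except ModelUnavailable as exc:
            # Retryable failure: record it and move to the next model.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models in the chain failed: " + "; ".join(errors))
```

The client sees an error only when the whole chain is exhausted. A single provider's failure stays invisible to it.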

Auto classification

Pass "auto" and the router infers intent from the request itself, then applies the right strategy — so you can tune behavior centrally without touching client code.
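The page does not say how intent is inferred. As an illustration only, here is a deliberately simple keyword-and-length heuristic that stands in for whatever classifier the router really uses. The regexes, the `short_chars` threshold, and `classify` are all hypothetical. The function maps a chat request to one of the four strategy names. It checks for likely PII first, so that such traffic lands on `local`.

```python
import re
from typing import Dict, List

# Hypothetical signals for illustration; not the router's real classifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
REASONING_HINTS = re.compile(r"\b(step by step|plan(ning)?|prove|analy[sz]e)\b", re.IGNORECASE)


def classify(messages: List[Dict[str, str]], short_chars: int = 200) -> str:
    """Map a chat request to one of the four strategy names."""
    text = " ".join(m.get("content", "") for m in messages)
    if EMAIL.search(text):
        # Likely PII: keep it on local models.
        return "local"
    if REASONING_HINTS.search(text):
        # Looks like multi-step reasoning: use the strongest model.
        return "quality"
    if len(text) <= short_chars:
        # Short, chat-like prompt: someone is probably waiting on it.
        return "latency"
    # Everything else: cheapest model that meets the quality bar.
    return "cost"
```

The word boundaries in `REASONING_HINTS` matter. Without them, "plan" would match inside words like "explanation".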

Route smarter without rewriting clients.

Same OpenAI-compatible API. Swap the base URL, pick a strategy, and the router handles the rest.
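Here is a minimal client-side sketch that uses only the Python standard library. The request body is an ordinary OpenAI-style chat completion, and the strategy alias goes in the `model` field. The base URL, the `ROUTER_BASE_URL` environment variable, and `build_chat_request` are placeholders, not documented values. The `/chat/completions` path assumes the router mirrors the OpenAI endpoint layout.

```python
import json
import os
import urllib.request

# Placeholder endpoint; substitute the base URL your router account provides.
BASE_URL = os.environ.get("ROUTER_BASE_URL", "https://router.example.com/v1")


def build_chat_request(messages, strategy="strategy:cost", api_key="YOUR_KEY"):
    """Build (but do not send) a chat completion request that routes by strategy."""
    body = {"model": strategy, "messages": messages}
    return urllib.request.Request(
        url=f"{BASE_URL.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. With the official OpenAI Python SDK, the equivalent change is to construct the client with `OpenAI(base_url=..., api_key=...)` and pass the alias as `model`.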
