Plan on Opus. Run the swarm on Sonnet.
Cut the cost of your agent fleet — without switching agents.
OpenThomas is a tiny local proxy that sits on the wire between your coding agents and the model providers they call. It watches every request and, where it can save you money, quietly routes it to a cheaper model — keeping the expensive model exactly where it earns its price, and nowhere else.
Its headline trick: making Claude Code Dynamic Workflows affordable.
Claude Code Dynamic Workflows are incredible and expensive. In Anthropic's own words:
"Claude dynamically writes orchestration scripts that run tens to hundreds of parallel subagents in a single session."
"Dynamic workflows can consume substantially more tokens than a typical Claude Code session."
Every one of those hundreds of subagents inherits the session model — Opus 4.8 — and there's no built-in knob to make the swarm cheaper. The planner needs Opus. The two hundred workers grepping files and running tests do not.
OpenThomas tells the planner apart from the workers on the wire and routes only the workers to a cheaper model:
```
Claude plans the work ........ claude-opus-4-8   ← untouched
  └─ subagent #1 grep ........ claude-sonnet-4-6 ← ~5× cheaper
  └─ subagent #2 edit ........ claude-sonnet-4-6
  └─ subagent … ×200 ......... claude-sonnet-4-6
```
Sonnet is roughly 5× cheaper per token than Opus ($3 / $15 vs $15 / $75 per million input / output tokens). The workers account for the bulk of a workflow's tokens, so the total bill falls by most of that 80%, while the planning, verification, and final answer you actually read stay on Opus.
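To make "most of that" concrete, here is the arithmetic with the per-million prices quoted above. The token counts, and the assumption that the workers use 90% of a workflow's tokens, are illustrative only, not measurements.

```typescript
// Per-million-token prices quoted above, in USD.
type Price = { input: number; output: number };
const OPUS: Price = { input: 15, output: 75 };
const SONNET: Price = { input: 3, output: 15 };

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost in USD of one usage bucket at one model's prices.
function cost(u: Usage, p: Price): number {
  return (u.inputTokens * p.input + u.outputTokens * p.output) / 1_000_000;
}

// Illustrative split (an assumption): the workers use 90% of the workflow's tokens.
const planner: Usage = { inputTokens: 1_000_000, outputTokens: 100_000 };
const workers: Usage = { inputTokens: 9_000_000, outputTokens: 900_000 };

const allOpus = cost(planner, OPUS) + cost(workers, OPUS);  // everything on Opus
const routed = cost(planner, OPUS) + cost(workers, SONNET); // workers moved to Sonnet
const savedPct = Math.round((1 - routed / allOpus) * 100);

console.log(`$${allOpus.toFixed(2)} -> $${routed.toFixed(2)} (${savedPct}% saved)`);
// → $225.00 -> $63.00 (72% saved)
```

The saving works out to 80% of the workers' share of the all-Opus bill: here 0.8 × 90% = 72%. The larger the workers' share, the closer the total gets to the full 80% cut.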
The classifier is exact, not a guess. Claude Code's planner always carries the orchestrator-only Agent tool (the tool that spawns subagents); subagents never do, because they can't nest. Verified against 672 real calls: 100% of planners kept, 100% of subagents caught, and no planner call ever downgraded. Tiny background calls (the security monitor, title generation) fall below a minimum token count, the token floor, and are left alone.
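The rule described above can be sketched as a pure function over an Anthropic Messages API request body, which lists the tools offered to the model by name in a tools array. This is a minimal illustration, not OpenThomas's actual code: the tool name "Agent" follows the text, while the 2,000-token floor and the four-characters-per-token estimate are hypothetical placeholders.

```typescript
// Only the request fields this sketch needs (Anthropic Messages API shape).
interface MessagesRequest {
  model: string;
  system?: unknown;
  messages: { role: string; content: unknown }[];
  tools?: { name: string }[];
}

type Role = "planner" | "subagent" | "background";

const ORCHESTRATOR_TOOL = "Agent"; // the planner-only tool named in the text
const TOKEN_FLOOR = 2_000;         // hypothetical threshold, not OpenThomas's real value

// Crude size estimate (~4 characters per token); an assumption for illustration.
function estimateTokens(req: MessagesRequest): number {
  const chars = JSON.stringify(req.system ?? "").length + JSON.stringify(req.messages).length;
  return Math.ceil(chars / 4);
}

function classify(req: MessagesRequest): Role {
  // Planner check comes first, so a planner call can never be downgraded.
  if ((req.tools ?? []).some((t) => t.name === ORCHESTRATOR_TOOL)) return "planner";
  // Small calls (security monitor, title generation) stay below the floor and are left alone.
  if (estimateTokens(req) < TOKEN_FLOOR) return "background";
  // Everything else is a subagent and is eligible for the cheaper model.
  return "subagent";
}

const longTask = { role: "user", content: "x".repeat(20_000) };
console.log(classify({ model: "claude-opus-4-8", messages: [longTask], tools: [{ name: "Agent" }] })); // planner
console.log(classify({ model: "claude-opus-4-8", messages: [longTask], tools: [{ name: "Bash" }] }));  // subagent
console.log(classify({ model: "claude-opus-4-8", messages: [{ role: "user", content: "Title?" }] }));  // background
```

Checking for the tool before looking at size is the design choice that matters: it is what guarantees a planner is never downgraded, however small the call.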
It's on by default the moment Claude Code is wired. Point the downgrade at a different model, or turn it off, with one click in the dashboard or one line of ~/.openthomas/routing.json.
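As an illustration of what that one setting decides, the sketch below resolves which model serves a call from its role and a downgrade setting. The RoutingConfig shape and its field names (subagentDowngrade, enabled, target) are hypothetical; the real routing.json schema may differ.

```typescript
// Hypothetical shape; the real ~/.openthomas/routing.json schema may differ.
interface RoutingConfig {
  subagentDowngrade: {
    enabled: boolean;
    target: string; // model that subagent calls are served by
  };
}

type Role = "planner" | "subagent" | "background";

// Pick the model that actually serves a call, given its role and the config.
function resolveModel(requested: string, role: Role, cfg: RoutingConfig): string {
  if (role !== "subagent") return requested;            // planners and background calls untouched
  if (!cfg.subagentDowngrade.enabled) return requested; // feature switched off
  return cfg.subagentDowngrade.target;                  // reroute the worker
}

const onByDefault: RoutingConfig = { subagentDowngrade: { enabled: true, target: "claude-sonnet-4-6" } };
const switchedOff: RoutingConfig = { subagentDowngrade: { enabled: false, target: "claude-sonnet-4-6" } };

console.log(resolveModel("claude-opus-4-8", "planner", onByDefault));  // claude-opus-4-8
console.log(resolveModel("claude-opus-4-8", "subagent", onByDefault)); // claude-sonnet-4-6
console.log(resolveModel("claude-opus-4-8", "subagent", switchedOff)); // claude-opus-4-8
```

In this sketch, retargeting means changing target and turning the feature off means flipping enabled; no other call path is affected.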
```sh
npm install -g @openthomas/openthomas
openthomas wire   # detect your agents, install the tap, start the daemon
# …use Claude Code exactly as you do now: run a dynamic workflow…
openthomas        # open the dashboard at http://localhost:9877
```

openthomas wire is reversible: openthomas unwire restores every file it touched. No accounts, no telemetry, no cloud: your traffic and your traces stay on your machine. See PRIVACY.md.
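A common way to make an installer reversible is to record each file's original state (including "did not exist") the first time it is modified, then write those originals back on uninstall. The sketch below shows that pattern against an in-memory Map standing in for the filesystem; wire and unwire here are hypothetical functions, not OpenThomas's implementation.

```typescript
// In-memory stand-in for the filesystem: path -> contents.
type Files = Map<string, string>;

// Original state of every touched path; null means the file did not exist before.
const backups = new Map<string, string | null>();

// Hypothetical wire step: back up a file on first touch, then write the new contents.
function wire(files: Files, path: string, contents: string): void {
  if (!backups.has(path)) {
    const original = files.get(path);
    backups.set(path, original === undefined ? null : original);
  }
  files.set(path, contents);
}

// Hypothetical unwire step: restore originals and delete files that wire created.
function unwire(files: Files): void {
  backups.forEach((original, path) => {
    if (original === null) files.delete(path);
    else files.set(path, original);
  });
  backups.clear();
}

const files: Files = new Map([["agent/settings.json", '{"theme":"dark"}']]);
wire(files, "agent/settings.json", '{"theme":"dark","proxy":"on"}');
wire(files, "agent/settings.json", '{"theme":"dark","proxy":"on","v":2}'); // second touch: backup kept
wire(files, "agent/hook.sh", "#!/bin/sh");                                 // newly created file
unwire(files);
console.log(files.get("agent/settings.json"), files.has("agent/hook.sh")); // {"theme":"dark"} false
```

Backing up only on the first touch matters: if a file were backed up again on a later edit, the "original" would capture the tool's own change instead of the user's file.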
The dashboard answers the three questions that actually matter when a fleet is burning tokens:
- How many agents are running right now, and which.
- What each agent is spending — live, per model.
- What each task is spending — the whole run, not just one call, with the cheaper-model swaps and the dollars they saved called out.
And one control: which model each agent uses, including the subagent-downgrade target. That's the whole product.
```
$ openthomas list
ID           STARTED             AGENT       STATUS COST    SERVED
──────────── ─────────────────── ─────────── ────── ─────── ─────────────────────────
ru_aBc1xYz9  14:23:11            claude-code done   $0.04   opus-4-8 (planner)
ru_fOj6Ce1H  14:23:11            claude-code done   $0.009  sonnet-4-6 ← opus-4-8 ↓
```
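A per-task total amounts to summing every call in a run and, for each swapped call, pricing what it would have cost on the model originally requested. The sketch below assumes the per-million prices quoted earlier and a hypothetical Call record; the dashboard's actual bookkeeping may differ, and the token counts are made up.

```typescript
type Model = "opus-4-8" | "sonnet-4-6";

// Per-million-token prices (USD) quoted earlier in this README.
const PRICES: Record<Model, { input: number; output: number }> = {
  "opus-4-8": { input: 15, output: 75 },
  "sonnet-4-6": { input: 3, output: 15 },
};

// Hypothetical per-call record: what the agent asked for and what actually served it.
interface Call {
  requested: Model;
  served: Model;
  inputTokens: number;
  outputTokens: number;
}

// Price a call's tokens at a given model's rates.
function callCost(model: Model, c: Call): number {
  const p = PRICES[model];
  return (c.inputTokens * p.input + c.outputTokens * p.output) / 1_000_000;
}

// Total spend for one task, plus the dollars saved by cheaper-model swaps.
function summarize(calls: Call[]): { spent: number; saved: number; swaps: number } {
  let spent = 0;
  let saved = 0;
  let swaps = 0;
  for (const c of calls) {
    const actual = callCost(c.served, c);
    spent += actual;
    if (c.served !== c.requested) {
      swaps += 1;
      saved += callCost(c.requested, c) - actual; // counterfactual minus actual
    }
  }
  return { spent, saved, swaps };
}

const run: Call[] = [
  { requested: "opus-4-8", served: "opus-4-8", inputTokens: 2_000, outputTokens: 200 },   // planner
  { requested: "opus-4-8", served: "sonnet-4-6", inputTokens: 2_000, outputTokens: 200 }, // downgraded worker
];
const s = summarize(run);
console.log(s.spent.toFixed(3), s.saved.toFixed(3), s.swaps); // 0.054 0.036 1
```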
You don't rewrite anything or adopt a framework. OpenThomas generates the wiring on the fly for whatever agents you already run, and saves money across all of them:
| Agent | Auto-wire | Cost-saver |
|---|---|---|
| Claude Code | ✅ | subagent downgrade (Dynamic Workflows) + per-route model routing |
| OpenClaw | ✅ | per-route model routing / failover |
| Hermes | ✅ | per-route model routing / failover |
| Codex | ✅ | per-route model routing |
| Claude Desktop | ✅ | per-route model routing |
| Cursor / Gemini CLI | ⓘ manual | point the base URL at the wire |
Because it taps the wire, OpenThomas works on agents it doesn't own and bills nothing extra — your subagent calls go to the same provider on the same credentials, just cheaper.
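Sending a call to "the same provider on the same credentials" means a rerouted request differs from the original in one field only. In the Anthropic Messages API the model is a top-level model field in the JSON body and the key travels in an x-api-key header. The rerouteModel helper below is a hypothetical minimal sketch of that rewrite, not OpenThomas's code; the key shown is a placeholder.

```typescript
interface OutgoingRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // JSON-encoded request body
}

// Rewrite only the model; the URL (same provider) and headers (same credentials) pass through.
function rerouteModel(req: OutgoingRequest, targetModel: string): OutgoingRequest {
  const body = JSON.parse(req.body) as Record<string, unknown>;
  return {
    url: req.url,
    headers: { ...req.headers },
    body: JSON.stringify({ ...body, model: targetModel }),
  };
}

const original: OutgoingRequest = {
  url: "https://api.anthropic.com/v1/messages",
  headers: { "x-api-key": "sk-ant-placeholder", "anthropic-version": "2023-06-01" },
  body: JSON.stringify({ model: "claude-opus-4-8", max_tokens: 1024, messages: [] }),
};
const rerouted = rerouteModel(original, "claude-sonnet-4-6");
console.log(JSON.parse(rerouted.body).model); // claude-sonnet-4-6
```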
OpenThomas runs as a single local daemon. It never phones home, sends no telemetry, and forwards your agent's traffic only to the provider your agent already calls. The one sanctioned outbound call is a daily version check against the public npm registry; it carries no data and can be disabled with updateCheck: false. The full contract is in PRIVACY.md.
Free and open source (MIT), entirely. Solo-built and used daily by the author; tested against real Claude Code, Claude Desktop, OpenClaw, Codex, and Hermes traffic. Bug reports and PRs welcome.