openthomas-com/openthomas

OpenThomas

Plan on Opus. Run the swarm on Sonnet.

Cut the cost of your agent fleet — without switching agents.

OpenThomas is a tiny local proxy that sits on the wire between your coding agents and the model providers they call. It watches every request and, where it can save you money, quietly routes it to a cheaper model — keeping the expensive model exactly where it earns its price, and nowhere else.

Its headline trick: making Claude Code Dynamic Workflows affordable.

The problem

Claude Code Dynamic Workflows are incredible and expensive. In Anthropic's own words:

"Claude dynamically writes orchestration scripts that run tens to hundreds of parallel subagents in a single session."

"Dynamic workflows can consume substantially more tokens than a typical Claude Code session."

Every one of those hundreds of subagents inherits the session model — Opus 4.8 — and there's no built-in knob to make the swarm cheaper. The planner needs Opus. The two hundred workers grepping files and running tests do not.

The fix

OpenThomas tells the planner apart from the workers on the wire and routes only the workers to a cheaper model:

  Claude plans the work ........  claude-opus-4-8   ← untouched
  └─ subagent  #1  grep ........  claude-sonnet-4-6 ← ~5× cheaper
  └─ subagent  #2  edit ........  claude-sonnet-4-6
  └─ subagent  … ×200 ..........  claude-sonnet-4-6

Sonnet costs one fifth as much per token as Opus: $3 input / $15 output per million tokens, versus $15 / $75. Workers account for the bulk of a workflow's tokens, so most of that 80% per-token saving reaches your bill, while the planning, verification, and final answer you actually read stay on Opus.
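As a rough check on that arithmetic, the sketch below prices a workflow twice: once with every call on Opus, and once with only the worker calls moved to Sonnet. The prices are the ones quoted above. The token counts in the usage lines are made-up figures for illustration, not measurements, and the helper names are hypothetical.

```typescript
// USD per million tokens, input and output, at the prices quoted above.
interface Price {
  inPerM: number;
  outPerM: number;
}

const OPUS: Price = { inPerM: 15, outPerM: 75 };
const SONNET: Price = { inPerM: 3, outPerM: 15 };

// Token totals for one group of calls (all planner calls, or all worker calls).
interface Usage {
  inTokens: number;
  outTokens: number;
}

// Dollar cost of a usage bucket at a given price.
function cost(u: Usage, p: Price): number {
  return (u.inTokens * p.inPerM + u.outTokens * p.outPerM) / 1_000_000;
}

// Compare an all-Opus run with one where only the workers are served by Sonnet.
function workflowSavings(planner: Usage, workers: Usage) {
  const allOpus = cost(planner, OPUS) + cost(workers, OPUS);
  const routed = cost(planner, OPUS) + cost(workers, SONNET);
  return { allOpus, routed, savedFraction: 1 - routed / allOpus };
}

// Hypothetical run: the planner is a small slice, the workers are the bulk.
const r = workflowSavings(
  { inTokens: 200_000, outTokens: 20_000 },
  { inTokens: 4_000_000, outTokens: 400_000 },
);
console.log(r.allOpus, r.routed, `${(r.savedFraction * 100).toFixed(1)}%`); // 94.5 22.5 76.2%
```

The saving approaches the 80% per-token gap as the workers' share of spend grows. In this made-up run the workers are about 95% of the all-Opus cost, so the bill falls by about 76%.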

The classifier is exact, not a guess. Claude Code's planner always carries the orchestrator-only Agent tool (the tool that spawns subagents); subagents never do, because they can't nest. Verified against 672 real calls: 100% of planners kept, 100% of subagents caught, zero planner calls ever downgraded. Tiny background calls (the security monitor, title generation) fall below a token floor and pass through on their original model.
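The rule fits in a few lines. The sketch below illustrates it under stated assumptions and is not OpenThomas's source. It assumes request bodies follow the Anthropic Messages API shape (a `model` string and an optional `tools` array of objects with a `name`), that the orchestrator tool is named exactly `Agent`, and it uses a hypothetical 2,000-token floor with a crude four-characters-per-token estimate in place of a real tokenizer.

```typescript
// Minimal subset of an Anthropic Messages API request body.
interface ToolDef {
  name: string;
}
interface MessagesRequest {
  model: string;
  system?: unknown;
  tools?: ToolDef[];
  messages: unknown[];
}

type Role = "planner" | "subagent" | "background";

// Assumptions for this sketch: the exact tool name, and a hypothetical floor value.
const ORCHESTRATOR_TOOL = "Agent";
const TOKEN_FLOOR = 2_000;

// Crude size estimate (~4 characters of JSON per token); not a real tokenizer.
function estimateTokens(req: MessagesRequest): number {
  const text = JSON.stringify({ system: req.system, messages: req.messages });
  return Math.ceil(text.length / 4);
}

function classify(req: MessagesRequest): Role {
  // Only the orchestrator is offered the Agent tool, so its presence marks the planner.
  // Checking this first means a planner is never downgraded, whatever its size.
  if ((req.tools ?? []).some((t) => t.name === ORCHESTRATOR_TOOL)) return "planner";
  // Small calls (security monitor, title generation) fall under the floor.
  if (estimateTokens(req) < TOKEN_FLOOR) return "background";
  return "subagent";
}

// Only subagent calls get the cheaper model; every other body is forwarded untouched.
function route(req: MessagesRequest, cheapModel: string): MessagesRequest {
  return classify(req) === "subagent" ? { ...req, model: cheapModel } : req;
}

// Illustrative worker call: ordinary tools, no Agent tool, a large prompt.
const worker: MessagesRequest = {
  model: "claude-opus-4-8",
  tools: [{ name: "Grep" }, { name: "Bash" }],
  messages: [{ role: "user", content: "x".repeat(20_000) }],
};
console.log(classify(worker), route(worker, "claude-sonnet-4-6").model); // subagent claude-sonnet-4-6
```

`route` returns a copy rather than mutating the incoming body, so the original request stays available for logging what the agent asked for.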

Subagent downgrade is on by default as soon as Claude Code is wired. Send subagents to a different model, or turn the downgrade off, with one click in the dashboard or one line in ~/.openthomas/routing.json.

Quick start

npm install -g @openthomas/openthomas

openthomas wire     # detect your agents, install the tap, start the daemon
# …use Claude Code exactly as you do now — run a dynamic workflow…
openthomas          # open the dashboard at http://localhost:9877

openthomas wire is reversible: openthomas unwire restores every file it touched. No accounts, no telemetry, no cloud — your traffic and your traces stay on your machine. See PRIVACY.md.
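A common way to make edits like these reversible is a manifest: record each file's original content (or its absence) the first time it is touched, then replay the manifest on undo. The sketch below shows that technique with an in-memory map standing in for the filesystem. It is not taken from OpenThomas, the function names and paths are hypothetical, and this README does not say which files `wire` edits.

```typescript
// In-memory stand-in for the filesystem, keyed by path, so the sketch runs anywhere.
type Files = Map<string, string>;

// Original content of every touched path; null means the file did not exist before.
type Manifest = Map<string, string | null>;

// Write a file, remembering its pre-wire state only the first time this path is touched.
function wireFile(files: Files, manifest: Manifest, path: string, content: string): void {
  if (!manifest.has(path)) {
    const before = files.get(path);
    manifest.set(path, before === undefined ? null : before);
  }
  files.set(path, content);
}

// Put every touched path back exactly as it was, deleting files that wiring created.
function unwireAll(files: Files, manifest: Manifest): void {
  manifest.forEach((original, path) => {
    if (original === null) files.delete(path);
    else files.set(path, original);
  });
  manifest.clear();
}

const files: Files = new Map([["/tmp/demo/agent.json", "original"]]);
const manifest: Manifest = new Map();
wireFile(files, manifest, "/tmp/demo/agent.json", "patched once");
wireFile(files, manifest, "/tmp/demo/agent.json", "patched twice"); // manifest still holds "original"
wireFile(files, manifest, "/tmp/demo/hook.sh", "new file");
unwireAll(files, manifest);
console.log(files.get("/tmp/demo/agent.json"), files.has("/tmp/demo/hook.sh")); // original false
```

Recording only the first original matters: if wiring runs twice, the second run must not replace the true pre-wire content with an already-patched copy.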

What you see

The dashboard answers the three questions that actually matter when a fleet is burning tokens:

  • How many agents are running right now, and which.
  • What each agent is spending — live, per model.
  • What each task is spending — the whole run, not just one call, with the cheaper-model swaps and the dollars they saved called out.

And one control: which model each agent uses, including the subagent-downgrade target. That's the whole product.

$ openthomas list
ID            STARTED              AGENT        STATUS  COST     SERVED
────────────  ───────────────────  ───────────  ──────  ───────  ─────────────────────────
ru_aBc1xYz9   14:23:11             claude-code  done    $0.04    opus-4-8  (planner)
ru_fOj6Ce1H   14:23:11             claude-code  done    $0.009   sonnet-4-6 ← opus-4-8 ↓
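Per-task figures like these are a roll-up over every call in a run. Sum what each call actually cost, and for each swapped call add the difference between the requested model's price and the served model's price. Below is a minimal sketch, assuming a hypothetical per-call record (OpenThomas's real trace format isn't described here) and the per-million prices quoted earlier. The token counts in the usage lines are invented.

```typescript
type Model = "claude-opus-4-8" | "claude-sonnet-4-6";

// USD per million tokens (input, output), using the prices quoted earlier in this README.
const PRICES: Record<Model, { inPerM: number; outPerM: number }> = {
  "claude-opus-4-8": { inPerM: 15, outPerM: 75 },
  "claude-sonnet-4-6": { inPerM: 3, outPerM: 15 },
};

// Hypothetical per-call record: the model the agent asked for vs. the one actually served.
interface CallRecord {
  requested: Model;
  served: Model;
  inTokens: number;
  outTokens: number;
}

function price(model: Model, inTokens: number, outTokens: number): number {
  const p = PRICES[model];
  return (inTokens * p.inPerM + outTokens * p.outPerM) / 1_000_000;
}

// Whole-run totals: actual spend, number of cheaper-model swaps, and dollars those swaps saved.
function taskTotals(calls: CallRecord[]): { spent: number; swaps: number; saved: number } {
  let spent = 0;
  let swaps = 0;
  let saved = 0;
  for (const c of calls) {
    const actual = price(c.served, c.inTokens, c.outTokens);
    spent += actual;
    if (c.served !== c.requested) {
      swaps += 1;
      saved += price(c.requested, c.inTokens, c.outTokens) - actual;
    }
  }
  return { spent, swaps, saved };
}

const t = taskTotals([
  { requested: "claude-opus-4-8", served: "claude-opus-4-8", inTokens: 2_000, outTokens: 500 },
  { requested: "claude-opus-4-8", served: "claude-sonnet-4-6", inTokens: 1_000, outTokens: 400 },
]);
console.log(t.spent.toFixed(4), t.swaps, t.saved.toFixed(4)); // 0.0765 1 0.0360
```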

Keep your agents — OpenThomas wraps the wire, not the agent

You do not rewrite anything or adopt a framework. OpenThomas generates the wrapping on the fly for whatever you already run, and saves money across all of it:

Agent                 Auto-wire   Cost-saver
Claude Code           yes         subagent downgrade (Dynamic Workflows) + per-route model routing
OpenClaw              yes         per-route model routing / failover
Hermes                yes         per-route model routing / failover
Codex                 yes         per-route model routing
Claude Desktop        yes         per-route model routing
Cursor / Gemini CLI   manual      point the base URL at the wire

Because it taps the wire, OpenThomas works with agents it doesn't own and adds no bill of its own: your subagent calls go to the same provider, on the same credentials, just to a cheaper model.
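Concretely, forwarding on the wire means re-sending the same path, with the agent's own credential headers, to the provider the agent already uses, and changing at most the `model` field. Below is a minimal sketch under assumptions. The upstream is taken to be Anthropic's public API base URL, which fits Claude Code; other agents would map to their own provider. The function and constant names are hypothetical. The sketch stops short of the network call so it can run standalone.

```typescript
// Assumed upstream for this sketch: Anthropic's public API.
const UPSTREAM_BASE = "https://api.anthropic.com";

// Headers tied to the local hop or to the original body length; the HTTP client recomputes them.
const HOP_HEADERS = new Set(["host", "content-length", "connection"]);

interface Outbound {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the upstream request: same path, same credentials, optionally a cheaper model.
function forward(
  path: string,
  headers: Record<string, string>,
  rawBody: string,
  swapTo?: string,
): Outbound {
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Credential and version headers (x-api-key, authorization, anthropic-version) pass through.
    if (!HOP_HEADERS.has(name.toLowerCase())) kept[name] = value;
  }
  const body = JSON.parse(rawBody) as Record<string, unknown>;
  if (swapTo !== undefined) body.model = swapTo;
  return { url: UPSTREAM_BASE + path, headers: kept, body: JSON.stringify(body) };
}

const out = forward(
  "/v1/messages",
  { host: "localhost", "x-api-key": "sk-your-key", "anthropic-version": "2023-06-01", "content-length": "42" },
  JSON.stringify({ model: "claude-opus-4-8", max_tokens: 1024, messages: [] }),
  "claude-sonnet-4-6",
);
console.log(out.url, Object.keys(out.headers).join(","));
// https://api.anthropic.com/v1/messages x-api-key,anthropic-version
```

Dropping `content-length` matters because rewriting `model` can change the body's length. A stale value would corrupt the upstream request.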

Privacy

OpenThomas runs as a single local daemon. It never phones home, sends no telemetry, and forwards your agent's traffic only to the provider your agent already calls. The one sanctioned outbound call is a daily version check against the public npm registry; it carries no data and can be disabled (updateCheck: false). The full contract is in PRIVACY.md.
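The update-check rule (at most once a day, never when `updateCheck` is `false`) can be sketched as a pure decision function plus a version comparison. This is an illustration only. The `Settings` shape and function names are hypothetical, this README does not say which file holds `updateCheck`, and the actual request to the npm registry is left out so the sketch has no network dependency.

```typescript
// Hypothetical settings shape; only the updateCheck key comes from the text above.
interface Settings {
  updateCheck?: boolean; // absent means enabled
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True when a version check is allowed now: not opted out, and none in the last 24 hours.
function shouldCheckForUpdate(settings: Settings, lastCheckMs: number | null, nowMs: number): boolean {
  if (settings.updateCheck === false) return false;
  if (lastCheckMs === null) return true;
  return nowMs - lastCheckMs >= DAY_MS;
}

// Compare dotted numeric versions such as "1.4.2". Any "-prerelease" suffix is dropped first,
// a simplification: full semver would rank 1.2.0 above 1.2.0-beta.1.
function isNewer(latest: string, current: string): boolean {
  const parse = (v: string) => v.replace(/-.*$/, "").split(".").map(Number);
  const a = parse(latest);
  const b = parse(current);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

const now = Date.UTC(2025, 0, 2);
console.log(shouldCheckForUpdate({ updateCheck: false }, null, now)); // false
console.log(shouldCheckForUpdate({}, now - DAY_MS / 2, now)); // false
console.log(isNewer("1.10.0", "1.9.3")); // true
```

Comparing components numerically rather than as strings is what makes `1.10.0` rank above `1.9.3`.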

Status

Entirely free and open source (MIT). Solo-built and used daily by the author; tested against real Claude Code, Claude Desktop, OpenClaw, Codex, and Hermes traffic. Bug reports and PRs welcome.

About

Cut the cost of your agent fleet without switching agents. Makes Claude Code Dynamic Workflows cheap: the planner stays on Opus, the hundreds of parallel subagents run on Sonnet. Free, MIT, local-first.
