Open-source threat modeling that doesn't stop at the report. Point it at repos and/or a design doc; it builds a STRIDE threat model (DFD, per-element coverage matrix, OWASP/CWE/MITRE-grounded threats, reachability triage), then proposes a fix and proves it in a sandbox — replaying the exploit, asserting it no longer fires, and committing that as a permanent test — and produces a ready-to-open PR. The skills get sharper every time you run it.
Synthesis is built and maintained by UnitOne; the repository lives at UnitOneAI/ThreatModel.
MCP-first: the engine is exposed as MCP tools, so any agent (Claude Code, Cursor,
your own orchestrator) calls the same loop. It also ships a CLI. A real scan needs
a model — a hosted key, an OpenAI-compatible endpoint, or the bundled local model.
If none is configured, a real run refuses rather than faking results; --test mode
produces templated fixtures for CI/UI demos only and is loudly labeled.
The one command (mix any number of repos and docs; they're merged into one model):

    synthesis analyze <repo-or-doc> [<repo-or-doc> ...]
    # e.g.
    synthesis analyze https://github.com/org/api https://github.com/org/worker arch.md threats.md --mode fix

--mode is one of:

- quick: one LLM pass (≈ STRIDE-GPT / Threat Forge)
- agentic: planner → parallel skill reviewers → critic (the agentic loop)
- fix: + sandbox-validated remediation (diff + evidence) ← nobody else does this for free

Apache-2.0. The loop, skills, fixer, and a local Intent Graph are open. The federated cross-customer graph and the managed exploit-tier runtime are the paid tier — see Open-core.
| | Synthesis | LLM threat-modelers (STRIDE-GPT, Threat Forge) | SAST (Semgrep, CodeQL) | Manual STRIDE |
|---|---|---|---|---|
| Builds a STRIDE model from repos + design docs | ✅ | doc/description only | ❌ (code only) | ✅ (by hand) |
| Control IDs that resolve — hallucinated CWE/OWASP rejected | ✅ | ✅ | ✅ | |
| Reachability triage (cut unreachable noise) | ✅ | ❌ | partial | ✅ (by hand) |
| Proposes a fix | ✅ | ❌ | some autofix | ❌ |
| Proves the fix — replays the exploit, asserts it no longer fires | ✅ | ❌ | ❌ | ❌ |
| Improves with use (local skill flywheel) | ✅ | ❌ | rules are static | n/a |
| MCP-first — any agent calls the same loop | ✅ | ❌ | ❌ | ❌ |
| Open source, runs fully local/offline | ✅ (Apache-2.0) | varies | ✅ | n/a |
The row nobody else checks for free is "Proves the fix". Detection is commoditized; the agentic loop + sandbox replay is where Synthesis is different.
See the whole pipeline run end-to-end on templated fixtures: zero setup, nothing to
configure. (Loudly labeled demo: true; not a real scan. See Models.)

    pip install synthesis-engine
    echo "a REST API with JWT + an LLM agent + postgres, file upload" \
      | synthesis analyze --doc - --mode fix --test
    # from a repo clone you can also just run: make demo

Prefer the browser? Open this repo in GitHub Codespaces (green Code button →
Codespaces → Create). The devcontainer installs everything and runs make demo
on first boot.

    pip install 'synthesis-engine[mcp]'        # engine + MCP server
    # pip install 'synthesis-engine[local]'    # + bundled local model (no API key needed)
    export ANTHROPIC_API_KEY=sk-ant-...        # a real scan needs a model; pick one:
    # or: export OPENAI_BASE_URL=... OPENAI_API_KEY=...   (any OpenAI-compatible / local server)
    # or: pip install 'synthesis-engine[local]' && export SYNTHESIS_USE_LOCAL=1
    synthesis analyze https://github.com/org/api https://github.com/org/worker \
      --doc design.md --mode fix --focus "unauthenticated peer; ransomware IT→OT"

    cp .env.example .env   # add a key, or set SYNTHESIS_USE_LOCAL=1 for the local model
    docker compose run --rm synthesis analyze https://github.com/org/repo --mode fix

    make dev          # editable install + dev extras
    make demo         # see the machinery in TEST mode (no key)
    make test lint    # offline tests + ruff
    make scan-skills  # the skill injection-scan gate
    make build        # wheel (verifies skills are packaged)
    make serve        # start the MCP server (stdio)

Storage note: the local Intent Graph is SQLite, so keep
SYNTHESIS_DB on a real local disk, not a network/FUSE mount (file locking).

The engine tries providers in this order; the first available wins:
| Order | Provider | How | Notes |
|---|---|---|---|
| 1 | Anthropic | `ANTHROPIC_API_KEY` | best quality. Claude Fable 5 (claude-fable-5) recommended for threat modeling; Mythos 5 (claude-mythos-5, cyber-class) via the Project Glasswing preview. Also Sonnet/Opus/Haiku. |
| 2 | OpenAI-compatible | `OPENAI_BASE_URL` + `OPENAI_API_KEY` | vLLM, Ollama, LM Studio, local servers |
| 3 | Bundled local model | `pip install '.[local]'` + `SYNTHESIS_USE_LOCAL=1` | Default Qwen3-4B-Instruct (Apache-2.0, ~2.5GB), pulled from HF Hub at a pinned revision on first use and cached. Security-domain upgrade: SYNTHESIS_LOCAL_MODEL=foundation-sec (Foundation-Sec-8B, Cisco, ~4.9GB) or foundation-sec-apache (Apache base). Full override via SYNTHESIS_LOCAL_REPO/FILE/REVISION. |
| n/a | Test mode | `--test` / `SYNTHESIS_TEST_MODE=1` | CI/UI demo only: deterministic templated fixtures, not a real scan. Refuses to masquerade: every output is stamped demo: true + a warning. |
If none is configured, a real run returns an error telling you how to fix it — it will not silently emit fixtures. The local model gives a genuine, free, fully offline scan; test mode does not.
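
The ladder above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: the function name `pick_provider`, the exception class, and the returned provider labels are hypothetical. Only the environment variable names come from the table. It also assumes that an explicit test-mode request wins over any configured key, which the table does not state.

```python
from typing import Mapping


class NoProviderConfigured(RuntimeError):
    """Raised when a real run has no model to call (hypothetical class)."""


def pick_provider(env: Mapping[str, str], test_mode: bool = False) -> str:
    """Return a label for the first available provider, in ladder order."""
    # Test mode is an explicit opt-in, never a silent fallback.
    # Assumption: it takes precedence over configured providers.
    if test_mode or env.get("SYNTHESIS_TEST_MODE") == "1":
        return "test"
    # 1. Hosted Anthropic key.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # 2. Any OpenAI-compatible endpoint; both variables are required.
    if env.get("OPENAI_BASE_URL") and env.get("OPENAI_API_KEY"):
        return "openai-compatible"
    # 3. Bundled local model.
    if env.get("SYNTHESIS_USE_LOCAL") == "1":
        return "local"
    # Nothing configured: refuse and say how to fix it, never emit fixtures.
    raise NoProviderConfigured(
        "no model configured: set ANTHROPIC_API_KEY, or OPENAI_BASE_URL and "
        "OPENAI_API_KEY, or SYNTHESIS_USE_LOCAL=1"
    )
```

In a shell session this would typically be called as `pick_provider(os.environ)`. The point of the shape is that the error path and the test path are separate branches, so a missing key can never be mistaken for a demo run.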
Threat modeling is visual. Every model renders as a self-contained HTML report — data-flow diagram (Mermaid, with trust-zone subgraphs and attacker/asset/exposed highlighting), STRIDE-per-element coverage matrix, threat actors, trust zones, OWASP coverage, and a threat table with a per-threat fix drawer (mitigation, code diff, honesty-gate badges).

    synthesis analyze ./arch.md --html report.html   # write a report alongside a scan
    synthesis report <model_id> --open               # render a stored model + open it
    synthesis ui                                     # local web app → http://127.0.0.1:8765

`synthesis ui` is a stdlib-only web app (no framework, localhost-bound), styled to the
UnitOne design system, with a left sidebar:
- Threat Models — list of every model you've run (persisted), and an Add Threat Model Source form to scan N repos + N docs together. Open one to see the full visual report (DFD, 3-column threat-analysis overview, STRIDE matrix, threats).
- Fix Queue — every generated fix across models: the diff, the skill that produced it, the component it touches, and security/behavior-verified badges.
- Learn · Skills — the skill auto-evolution surface: each skill's confidence cap and accept/reject history (the flywheel state).
Start the server: `synthesis serve` (stdio). Register it with your agent, e.g. for
Claude Code / Cursor:

    {
      "mcpServers": {
        "synthesis": { "command": "synthesis", "args": ["serve"] }
      }
    }

Tools exposed:

| Tool | What it does |
|---|---|
| `threat_model(repos, doc, mode, focus)` | generate a model (DFD, STRIDE matrix, threats, fixes) |
| `fix(model_id, threat_id)` | run the fixer on one threat → diff + sandbox validation (+ PR when the GitHub App is configured) |
| `get_model(model_id)` | fetch a stored model |
| `accept_threat(model_id, threat_id, accepted)` | human verdict → calibrates the skill (flywheel) |
| `list_skills()` / `skill_stats()` | the live skill index / current confidence caps |
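
For an orchestrator that speaks MCP directly, each tool call is a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the request payload. The helper name `tool_call` is hypothetical, and the argument value types (a list for `repos`, a path string for `doc`) are assumptions, since the table lists parameter names only. A real client also has to perform the MCP `initialize` handshake before sending any call.

```python
import json
from itertools import count

# Monotonic request IDs, as JSON-RPC expects a unique id per request.
_ids = count(1)


def tool_call(name: str, **arguments) -> str:
    """Serialize one MCP tools/call request for the given tool (hypothetical helper)."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument shapes below are assumptions, not the documented schema.
scan = tool_call(
    "threat_model",
    repos=["https://github.com/org/api"],
    doc="design.md",
    mode="fix",
    focus="unauthenticated peer",
)
verdict = tool_call(
    "accept_threat", model_id="tm-abc123", threat_id="t-def456", accepted=False
)
```

Each string would be written to the server's stdin as one message when using the stdio transport.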

    Phase 0  INGEST    N repos + doc → merged context + DFD seed
    Phase A  PLAN      planner inventories components, SELECTS skills from the index
    Phase B  ANALYZE   one read-only reviewer per (component × skill), in parallel,
                       emitting STRIDE entries with validated control IDs
    Phase C  MERGE     dedupe, validate IDs resolve, reachability noise-cut
    Phase D  CRITIQUE  challenge high-sev threats; downgrade unreachable ones
    Phase E  FIX       propose diff → sandbox validate → characterization test → PR-ready output

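
As an illustration of what Phase C does, here is a minimal sketch of the three merge steps: dedupe, reject control IDs that do not resolve, and cut unreachable findings. The `Threat` record, the `merge` function, and the tiny `KNOWN_CONTROL_IDS` set are all hypothetical stand-ins. A real implementation would resolve IDs against full CWE/OWASP catalogues, and may handle unreachable findings differently (Phase D downgrades rather than drops them).

```python
from dataclasses import dataclass
from typing import Iterable, List

# Stand-in catalogue for illustration; these three CWE IDs are real,
# but a real check would resolve against the full catalogue.
KNOWN_CONTROL_IDS = {"CWE-89", "CWE-287", "CWE-434"}


@dataclass(frozen=True)
class Threat:
    """Hypothetical shape of one reviewer finding."""
    component: str
    stride: str
    control_id: str
    reachable: bool


def merge(findings: Iterable[Threat]) -> List[Threat]:
    """Dedupe, drop unresolvable control IDs, drop unreachable findings."""
    seen = set()
    kept = []
    for threat in findings:
        if threat in seen:  # parallel reviewers can report the same thing
            continue
        seen.add(threat)
        if threat.control_id not in KNOWN_CONTROL_IDS:  # hallucinated ID
            continue
        if not threat.reachable:  # assumption: noise-cut means drop
            continue
        kept.append(threat)
    return kept
```

Keeping the three filters in one pass preserves the order reviewers emitted findings in, which makes the merged output stable across runs.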
The honesty gate. We own the security regression (replay the PoC, assert it now fails, commit it as a permanent test). We do not own functional regression — that needs your test suite. A fix that passes security but has no functional coverage ships as "security-verified, behavior-UNVERIFIED — human review required," never dressed up as fully tested.
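
The gate's labelling rule can be written as a small decision function. This is a sketch under assumptions: the `Verdict` enum and `honesty_gate` function are hypothetical names, `suite_passed=None` is used here to mean "no functional coverage exists", and treating a failing functional suite as a rejection is an assumption the paragraph above does not spell out.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Hypothetical labels for a proposed fix."""
    REJECTED = "rejected"
    SECURITY_ONLY = "security-verified, behavior-UNVERIFIED"
    FULLY_VERIFIED = "security-verified, behavior-verified"


def honesty_gate(exploit_still_fires: bool, suite_passed: Optional[bool]) -> Verdict:
    """Label a fix from the PoC replay result and the functional suite result."""
    # The security regression is owned by the tool: the replayed PoC must fail.
    if exploit_still_fires:
        return Verdict.REJECTED
    # No functional coverage: never dress the fix up as fully tested.
    if suite_passed is None:
        return Verdict.SECURITY_ONLY
    # Assumption: a fix that breaks the existing suite is rejected.
    return Verdict.FULLY_VERIFIED if suite_passed else Verdict.REJECTED
```

A caller would route `Verdict.SECURITY_ONLY` results to human review rather than auto-merge.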
Sandbox tiers. Rungs 1–3 (build / lint+SAST / tests) run in a network-isolated Docker container on your own code. Rung 4 (exploit no longer reproduces) and rung 5 (no behavioral regression) want hardware-grade isolation (gVisor / Firecracker per invocation) to safely run untrusted, agent-generated exploit reproducers — that hardened runtime is the managed tier. Same interface, swappable runtime.
Capabilities are skills — agentskills.io-style markdown in skills/ with a
frontmatter header. Adding a SKILL.md adds a capability; the planner reads the live
index and selects skills per component (add an LLM component → ai-security/llm-top-10
fires, no code change).
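
To make the "no code change" claim concrete, here is a minimal sketch of index-driven selection. The index shape (skill name mapped to the component kinds it applies to), the function `select_skills`, and every skill name except `ai-security/llm-top-10` are hypothetical. The real planner is an LLM step and the real frontmatter fields may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical index: skill name -> component kinds it applies to.
# Only "ai-security/llm-top-10" is a skill named in this README.
SKILL_INDEX: Dict[str, Set[str]] = {
    "ai-security/llm-top-10": {"llm"},
    "web/authn": {"api"},
}


def select_skills(
    components: Iterable[Tuple[str, str]],
    index: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (component name, skill) pairs for every skill that applies."""
    plan = []
    for name, kind in components:
        for skill in sorted(index):  # sorted for a deterministic plan
            if kind in index[skill]:
                plan.append((name, skill))
    return plan


components = [("chat-agent", "llm"), ("orders-api", "api")]
plan = select_skills(components, SKILL_INDEX)

# Adding a skill is a data change only: extend the index, rerun the planner.
extended = dict(SKILL_INDEX, **{"web/file-upload": {"api"}})
plan_after = select_skills(components, extended)
```

Because selection reads the index at plan time, the same function produces a larger plan once a new skill file is indexed, which is the behavior the paragraph above describes.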
After each run, outcomes calibrate each skill's local confidence_cap
(synthesis stats shows the state), and accepted threats on similar components
warm-start the next run. This is the flywheel — and it runs locally, so
"auto-evolving" is true for a self-hoster on day one.

    synthesis skills                              # the index
    synthesis stats                               # confidence caps move as you accept/reject findings
    synthesis accept tm-abc123 t-def456 --reject  # feed a verdict back

| | Tier | Why |
|---|---|---|
| Loop, skills, fixer, local Intent Graph + calibration | OSS (Apache-2.0) | the tool genuinely improves on your codebase |
| Federated Intent Graph (every customer's outcomes improve every customer's planner) | Paid | the network effect — the actual moat |
| Managed gVisor/Firecracker exploit-tier (rungs 4–5 at scale) | Paid | safely running untrusted exploit code is an operational liability |
| Pre-warmed, signed skill releases | Paid | OSS starts cold; managed ships pre-trained |
The license protects nothing — the pooled memory does. federated_warm_start() in
memory.py is the seam a managed client overrides; the OSS build returns local-only.
v0.1 — engine, MCP server, CLI, skills, local Intent Graph, Docker sandbox (rungs
1–3), and the provider ladder (hosted / OpenAI-compatible / bundled local model /
test mode) all working; tests run offline with pip install '.[dev]' && pytest (no
key, no model). Real PRs need a GitHub App (synthetic PRs until configured); rungs 4–5 need
the managed runtime. Built and maintained by UnitOne.
