Synthesis

Open-source threat modeling that doesn't stop at the report. Point it at repos and/or a design doc; it builds a STRIDE threat model (DFD, per-element coverage matrix, OWASP/CWE/MITRE-grounded threats, reachability triage), then proposes a fix and proves it in a sandbox — replaying the exploit, asserting it no longer fires, and committing that as a permanent test — and produces a ready-to-open PR. The skills get sharper every time you run it.

Synthesis is built and maintained by UnitOne; the repository lives at UnitOneAI/ThreatModel.

MCP-first: the engine is exposed as MCP tools, so any agent (Claude Code, Cursor, your own orchestrator) calls the same loop. It also ships a CLI. A real scan needs a model: a hosted key, an OpenAI-compatible endpoint, or the bundled local model. If none is configured, a real run refuses rather than faking output; --test mode produces templated fixtures for CI/UI demos only and is loudly labeled.

The one command (mix any number of repos and docs — they're merged into one model):

synthesis analyze <repo-or-doc> [<repo-or-doc> ...]
# e.g.
synthesis analyze https://github.com/org/api https://github.com/org/worker arch.md threats.md --mode fix

Modes (--mode):

quick    one LLM pass                                   (≈ STRIDE-GPT / Threat Forge)
agentic  planner → parallel skill reviewers → critic    (the agentic loop)
fix      agentic + sandbox-validated remediation (diff + evidence)  ← nobody else does this for free

Apache-2.0. The loop, skills, fixer, and a local Intent Graph are open. The federated cross-customer graph and the managed exploit-tier runtime are the paid tier — see Open-core.

How it compares

	Synthesis	LLM threat-modelers (STRIDE-GPT, Threat Forge)	SAST (Semgrep, CodeQL)	Manual STRIDE
Builds a STRIDE model from repos + design docs	✅	doc/description only	❌ (code only)	✅ (by hand)
Control IDs that resolve — hallucinated CWE/OWASP rejected	✅	⚠️ ungrounded	✅	✅
Reachability triage (cut unreachable noise)	✅	❌	partial	✅ (by hand)
Proposes a fix	✅	❌	some autofix	❌
Proves the fix — replays the exploit, asserts it no longer fires	✅	❌	❌	❌
Improves with use (local skill flywheel)	✅	❌	rules are static	n/a
MCP-first — any agent calls the same loop	✅	❌	❌	❌
Open source, runs fully local/offline	✅ (Apache-2.0)	varies	✅	n/a

The one row nobody else offers for free is "Proves the fix". Detection is commoditized; the agentic loop plus sandbox replay is where Synthesis differs.

Quickstart

Try it in 60 seconds (no key, no model)

See the whole pipeline run end-to-end on templated fixtures: zero setup, nothing to configure. (The output is loudly labeled demo: true; it is not a real scan. See Models / providers.)

pip install synthesis-engine
echo "a REST API with JWT + an LLM agent + postgres, file upload" \
  | synthesis analyze --doc - --mode fix --test
# from a repo clone you can also just run:  make demo

Prefer the browser? Open this repo in GitHub Codespaces (green Code button → Codespaces → Create) — the devcontainer installs everything and runs make demo on first boot.

pip

pip install 'synthesis-engine[mcp]'   # engine + MCP server
# pip install 'synthesis-engine[local]'   # + bundled local model (no API key needed)

export ANTHROPIC_API_KEY=sk-ant-...   # a real scan needs a model — pick one:
# or:  export OPENAI_BASE_URL=... OPENAI_API_KEY=...   (any OpenAI-compatible / local server)
# or:  pip install 'synthesis-engine[local]' && export SYNTHESIS_USE_LOCAL=1

synthesis analyze https://github.com/org/api https://github.com/org/worker \
  --doc design.md --mode fix --focus "unauthenticated peer; ransomware IT→OT"

Docker

cp .env.example .env          # add a key, or set SYNTHESIS_USE_LOCAL=1 for the local model
docker compose run --rm synthesis analyze https://github.com/org/repo --mode fix

Make (less typing)

make dev          # editable install + dev extras
make demo         # see the machinery in TEST mode (no key)
make test lint    # offline tests + ruff
make scan-skills  # the skill injection-scan gate
make build        # wheel (verifies skills are packaged)
make serve        # start the MCP server (stdio)

Storage note: the local Intent Graph is SQLite. Keep SYNTHESIS_DB on a real local disk, not a network/FUSE mount, where the file locking SQLite depends on is often unreliable.

Models / providers

The engine tries providers in this order; the first available wins:

Order	Provider	How	Notes
1	Anthropic	`ANTHROPIC_API_KEY`	best quality. Claude Fable 5 (`claude-fable-5`) recommended for threat modeling; Mythos 5 (`claude-mythos-5`, cyber-class) via the Project Glasswing preview. Also Sonnet/Opus/Haiku.
2	OpenAI-compatible	`OPENAI_BASE_URL` + `OPENAI_API_KEY`	vLLM, Ollama, LM Studio, local servers
3	Bundled local model	`pip install '.[local]'` + `SYNTHESIS_USE_LOCAL=1`	Default Qwen3-4B-Instruct (Apache-2.0, ~2.5GB), pulled from HF Hub at a pinned revision on first use and cached. Security-domain upgrade: `SYNTHESIS_LOCAL_MODEL=foundation-sec` (Foundation-Sec-8B, Cisco, ~4.9GB) or `foundation-sec-apache` (Apache base). Full override via `SYNTHESIS_LOCAL_REPO/FILE/REVISION`.
—	Test mode	`--test` / `SYNTHESIS_TEST_MODE=1`	CI/UI demo only — deterministic templated fixtures, not a real scan. Refuses to masquerade: every output is stamped `demo: true` + a warning.

If none is configured, a real run returns an error telling you how to fix it — it will not silently emit fixtures. The local model gives a genuine, free, fully offline scan; test mode does not.
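The provider ladder above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: `pick_provider` and the returned labels are hypothetical names, and only the environment variables come from the table. Test mode is deliberately left out, since the text does not say how it interacts with the ladder.

```python
from typing import Mapping


def pick_provider(env: Mapping[str, str]) -> str:
    """Return the first configured provider, in the documented order."""
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_BASE_URL") and env.get("OPENAI_API_KEY"):
        return "openai-compatible"
    if env.get("SYNTHESIS_USE_LOCAL") == "1":
        return "local"
    # Nothing configured: refuse with a fixable error instead of emitting fixtures.
    raise RuntimeError(
        "no model configured: set ANTHROPIC_API_KEY, or OPENAI_BASE_URL and "
        "OPENAI_API_KEY, or SYNTHESIS_USE_LOCAL=1"
    )


# Hypothetical local endpoint, used only to exercise the second branch.
print(pick_provider({"OPENAI_BASE_URL": "http://localhost:8000/v1", "OPENAI_API_KEY": "x"}))
```

Passing the environment in as a mapping (for example `os.environ`) keeps the selection logic testable without touching the real process environment.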

Visual report & local UI

Threat modeling is visual. Every model renders as a self-contained HTML report — data-flow diagram (Mermaid, with trust-zone subgraphs and attacker/asset/exposed highlighting), STRIDE-per-element coverage matrix, threat actors, trust zones, OWASP coverage, and a threat table with a per-threat fix drawer (mitigation, code diff, honesty-gate badges).

synthesis analyze ./arch.md --html report.html     # write a report alongside a scan
synthesis report <model_id> --open                 # render a stored model + open it
synthesis ui                                       # local web app → http://127.0.0.1:8765

synthesis ui is a stdlib-only web app (no framework, localhost-bound), styled to the UnitOne design system, with a left sidebar:

Threat Models — list of every model you've run (persisted), and an Add Threat Model Source form to scan N repos + N docs together. Open one to see the full visual report (DFD, 3-column threat-analysis overview, STRIDE matrix, threats).
Fix Queue — every generated fix across models: the diff, the skill that produced it, the component it touches, and security/behavior-verified badges.
Learn · Skills — the skill auto-evolution surface: each skill's confidence cap and accept/reject history (the flywheel state).

Use it from an agent (MCP)

Start the server: synthesis serve (stdio). Register it with your agent, e.g. for Claude Code / Cursor:

{
  "mcpServers": {
    "synthesis": { "command": "synthesis", "args": ["serve"] }
  }
}

Tools exposed:

Tool	What it does
`threat_model(repos, doc, mode, focus)`	generate a model (DFD, STRIDE matrix, threats, fixes)
`fix(model_id, threat_id)`	run the fixer on one threat → diff + sandbox validation (+ PR when the GitHub App is configured)
`get_model(model_id)`	fetch a stored model
`accept_threat(model_id, threat_id, accepted)`	human verdict → calibrates the skill (flywheel)
`list_skills()` / `skill_stats()`	the live skill index / current confidence caps
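For agents that speak MCP directly, a tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `threat_model`. The parameter names come from the table; the argument types (a list for `repos`, a path string for `doc`) and the helper name `build_tool_call` are assumptions for illustration.

```python
import json
from typing import Any


def build_tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "threat_model",
    {
        # Argument shapes are assumed; only the parameter names are documented.
        "repos": ["https://github.com/org/api"],
        "doc": "design.md",
        "mode": "fix",
        "focus": "unauthenticated peer",
    },
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this envelope for you; the sketch only shows what crosses the stdio transport.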

How it works

Phase 0  INGEST    N repos + doc → merged context + DFD seed
Phase A  PLAN      planner inventories components, SELECTS skills from the index
Phase B  ANALYZE   one read-only reviewer per (component × skill), in parallel,
                   emitting STRIDE entries with validated control IDs
Phase C  MERGE     dedupe, validate IDs resolve, reachability noise-cut
Phase D  CRITIQUE  challenge high-sev threats; downgrade unreachable ones
Phase E  FIX       propose diff → sandbox validate → characterization test → PR-ready output
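Phases B and C can be pictured roughly as follows: fan out one reviewer per (component, skill) pair, then drop findings whose control ID does not resolve and dedupe the rest. Everything here is a hypothetical stand-in (the `review` stub, the `KNOWN_CONTROL_IDS` set, the threat dict shape); the real reviewers call a model and the real catalog is far larger.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical stand-in for the catalog of control IDs that resolve.
KNOWN_CONTROL_IDS = {"CWE-22", "CWE-89", "CWE-287"}


def review(component: str, skill: str) -> list[dict]:
    """Hypothetical read-only reviewer; a real one would call the model."""
    if skill == "files":
        # An ID outside the catalog, standing in for a hallucinated reference.
        return [{"component": component, "stride": "Tampering", "control_id": "CWE-99999"}]
    return [{"component": component, "stride": "Spoofing", "control_id": "CWE-287"}]


def analyze(components: list[str], skills: list[str]) -> list[dict]:
    """Phase B: one reviewer per (component, skill) pair, run in parallel."""
    pairs = list(product(components, skills))
    with ThreadPoolExecutor() as pool:
        batches = pool.map(lambda pair: review(*pair), pairs)
        return [threat for batch in batches for threat in batch]


def merge(threats: list[dict]) -> list[dict]:
    """Phase C: reject unresolved control IDs, then dedupe."""
    seen, merged = set(), []
    for threat in threats:
        key = (threat["component"], threat["stride"], threat["control_id"])
        if threat["control_id"] in KNOWN_CONTROL_IDS and key not in seen:
            seen.add(key)
            merged.append(threat)
    return merged


found = merge(analyze(["api", "worker"], ["authn", "api-security", "files"]))
print([(t["component"], t["control_id"]) for t in found])
# [('api', 'CWE-287'), ('worker', 'CWE-287')]
```

Six reviewer calls produce six findings; the two with an unresolvable ID are rejected and the duplicate per component collapses, leaving two.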

The honesty gate. We own the security regression (replay the PoC, assert it now fails, commit it as a permanent test). We do not own functional regression — that needs your test suite. A fix that passes security but has no functional coverage ships as "security-verified, behavior-UNVERIFIED — human review required," never dressed up as fully tested.
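A security regression test of the kind described above might look like the sketch below: the original exploit input is replayed against the patched code, and the test passes only if the exploit no longer fires. The path-traversal scenario, `UPLOAD_ROOT`, and `resolve_upload` are invented for illustration and are not taken from Synthesis itself.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads")  # hypothetical upload directory


def resolve_upload(name: str) -> Path:
    """Patched resolver: refuse any name that escapes UPLOAD_ROOT."""
    root = UPLOAD_ROOT.resolve()
    candidate = (root / name).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes upload root: {name!r}")
    return candidate


def test_path_traversal_poc_no_longer_fires() -> None:
    """Replay the original exploit input and assert the fix rejects it."""
    poc = "../../etc/passwd"
    try:
        resolve_upload(poc)
    except ValueError:
        return  # exploit blocked: the security regression holds
    raise AssertionError("PoC still resolves outside the upload root")


test_path_traversal_poc_no_longer_fires()
```

Note what this test does not cover: it says nothing about whether legitimate uploads still behave as before. That functional side is what your own test suite has to establish.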

Sandbox tiers. Rungs 1–3 (build / lint+SAST / tests) run in a network-isolated Docker container on your own code. Rung 4 (exploit no longer reproduces) and rung 5 (no behavioral regression) want hardware-grade isolation (gVisor / Firecracker per invocation) to safely run untrusted, agent-generated exploit reproducers — that hardened runtime is the managed tier. Same interface, swappable runtime.
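One way to picture the rung ladder is a loop that climbs in order and marks rungs 4 and 5 as skipped when no hardened runtime is available. This is a sketch under assumptions: the `climb` function, the result strings, and the stop-at-first-failure behavior are illustrative choices, not documented engine behavior.

```python
from typing import Callable

# Rungs 1-3 run in the local Docker sandbox; rungs 4-5 need the hardened runtime.
RUNGS = ["build", "lint+SAST", "tests", "exploit replay", "behavioral regression"]
LOCAL_RUNGS = 3


def climb(checks: dict[str, Callable[[], bool]], hardened: bool = False) -> dict[str, str]:
    """Run rungs in order, stopping at the first failure (an assumed policy)."""
    results: dict[str, str] = {}
    failed = False
    for index, rung in enumerate(RUNGS, start=1):
        if failed:
            results[rung] = "not run"
        elif index > LOCAL_RUNGS and not hardened:
            results[rung] = "skipped (needs managed runtime)"
        elif checks[rung]():
            results[rung] = "passed"
        else:
            results[rung] = "failed"
            failed = True
    return results


always_pass = {rung: (lambda: True) for rung in RUNGS}
print(climb(always_pass)["exploit replay"])  # skipped (needs managed runtime)
```

Because the runtime is just a flag on the same loop, swapping in a hardened backend changes which rungs execute without changing the interface.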

Skills & auto-evolution

Capabilities are skills — agentskills.io-style markdown in skills/ with a frontmatter header. Adding a SKILL.md adds a capability; the planner reads the live index and selects skills per component (add an LLM component → ai-security/llm-top-10 fires, no code change).

After each run, outcomes calibrate each skill's local confidence_cap (synthesis stats shows the state), and accepted threats on similar components warm-start the next run. This is the flywheel — and it runs locally, so "auto-evolving" is true for a self-hoster on day one.

synthesis skills      # the index
synthesis stats       # confidence caps move as you accept/reject findings
synthesis accept tm-abc123 t-def456 --reject   # feed a verdict back
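The text does not say how a verdict moves a skill's confidence_cap, so the following is purely an assumed update rule, chosen for simplicity: a Laplace-smoothed acceptance rate. `SkillStats` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    """Hypothetical per-skill accept/reject history."""

    accepted: int = 0
    rejected: int = 0

    def record(self, accepted: bool) -> None:
        """Feed one human verdict back into the history."""
        if accepted:
            self.accepted += 1
        else:
            self.rejected += 1

    @property
    def confidence_cap(self) -> float:
        # Assumed rule: smoothed acceptance rate, 0.5 with no history.
        return (self.accepted + 1) / (self.accepted + self.rejected + 2)


stats = SkillStats()
for verdict in (True, True, False):
    stats.record(verdict)
print(round(stats.confidence_cap, 2))  # 0.6
```

The smoothing keeps a brand-new skill at a neutral 0.5 and prevents a single early rejection from zeroing it out, which is one plausible reason to prefer it over a raw ratio.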

Open-core

	Tier	Why
Loop, skills, fixer, local Intent Graph + calibration	OSS (Apache-2.0)	the tool genuinely improves on your codebase
Federated Intent Graph (every customer's outcomes improve every customer's planner)	Paid	the network effect — the actual moat
Managed gVisor/Firecracker exploit-tier (rungs 4–5 at scale)	Paid	safely running untrusted exploit code is an operational liability
Pre-warmed, signed skill releases	Paid	OSS starts cold; managed ships pre-trained

The license protects nothing — the pooled memory does. federated_warm_start() in memory.py is the seam a managed client overrides; the OSS build returns local-only.
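The seam described above could be shaped like an overridable method. Only the name `federated_warm_start` comes from the text; the class names, the method signature, and the pooled example data are assumptions made for this sketch.

```python
class IntentGraph:
    """Hypothetical local store of accepted threats, keyed by component kind."""

    def __init__(self) -> None:
        self._accepted: dict[str, list[str]] = {}

    def remember(self, component_kind: str, threat: str) -> None:
        """Record an accepted threat for later warm starts."""
        self._accepted.setdefault(component_kind, []).append(threat)

    def federated_warm_start(self, component_kind: str) -> list[str]:
        # OSS behavior as described in the text: local-only results.
        return list(self._accepted.get(component_kind, []))


class ManagedIntentGraph(IntentGraph):
    """Hypothetical managed client that overrides the seam."""

    def federated_warm_start(self, component_kind: str) -> list[str]:
        local = super().federated_warm_start(component_kind)
        # A real client would query the pooled graph; this value is a stub.
        pooled = ["pooled finding (stub)"]
        return local + pooled


for graph in (IntentGraph(), ManagedIntentGraph()):
    graph.remember("jwt-auth", "alg=none accepted")
    print(type(graph).__name__, graph.federated_warm_start("jwt-auth"))
```

The planner calls the same method either way, so the OSS and managed builds differ only in what the override returns.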

Status

v0.1: engine, MCP server, CLI, skills, local Intent Graph, Docker sandbox (rungs 1–3), and the provider ladder (hosted / OpenAI-compatible / bundled local model / test mode) are all working; tests run offline with pip install '.[dev]' && pytest (no key, no model). Real PRs need a GitHub App (synthetic PRs until configured); rungs 4–5 need the managed runtime. Built and maintained by UnitOne.
