Version: 1.0.0 Effective: 2026-02-26 Maintained by: jp-cruz/LegionForge
LegionForge is a security-native AI agent framework. The following threats are in scope and actively defended against in the codebase:
| Threat | Defense | Implementation |
|---|---|---|
| Direct prompt injection | 24-pattern regex detector + adaptive Guardian rules | src/security/core.py:detect_injection(), Guardian _check_6 |
| Indirect prompt injection | RAG provenance scoring, trust threshold; document_summarize uses delimited content (<external_content> tags + SystemMessage boundary) |
src/database.py:store_document_with_provenance(), src/agents/researcher.py:document_summarize() |
| Tool poisoning / rug-pull | SHA-256 hash validation at registration + Ed25519 signing | src/security/core.py:verify_tool_before_invocation() |
| Tool revocation bypass | 10-second TTL revocation cache in Guardian sidecar | src/security/guardian.py:_check_0_tool_revocation() |
| Capability amplification | Negative capability list enforced by Guardian | src/security/guardian.py:_check_2_capability_boundary() |
| Privilege escalation | JWT task tokens: child capabilities ⊆ parent capabilities | src/security/acl.py:derive_task_token() |
| TOCTOU (time-of-check / time-of-use) | approved_snapshot verified post-execution in SecureToolNode |
src/base_graph.py:SecureToolNode |
| Resource bomb / economic DOS | Pre-execution token cost estimator + rate limiter | src/rate_limiter.py, src/safeguards.py |
| Credential theft | macOS Keychain storage; PII redaction from all outbound calls | src/security/core.py:sanitize_output() |
| Audit log tampering | SHA-256 hash chain on audit_log table; verified on startup — tamper detection halts startup (RuntimeError) |
src/database.py:verify_audit_log_chain(), init_db() |
| Supply chain | AI-BOM; Ed25519-signed crystallized tool manifests | src/tools/signing.py |
| Agent sequence violation | Sequence contracts registered per-agent; checked at every tool call | src/security/guardian.py:_check_4_sequence() |
| Crystallization bypass | AST guards (subscript, MRO traversal, globals) in pre-HITL analyzer | src/tools/crystallization_analyzer.py |
These are accepted risks for local development that must be resolved before the public 1.0 release. They are tracked here so they are not forgotten.
Severity: Medium (local dev) → High (any shared or internet-accessible deployment)
Status: ✅ Resolved 2026-03-04 (PR #212 + PR #213).
Resolution:
pg_hba.conf updated — Unix socket connections use peer auth (OS username match, no
password required for CLI psql); TCP connections use scram-sha-256:
local all all peer
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256
Strong passwords generated for jp and legionforge_app roles; stored in ~/.pgpass
(chmod 0600). macOS Keychain is attempted but not required (subprocess ACL blocks
programmatic reads/writes — ~/.pgpass is the reliable credential path).
_get_postgres_password() in src/database.py now raises RuntimeError if no credential
is found, rather than silently returning an empty password. New-developer opt-in:
export POSTGRES_TRUST_AUTH=true before make db-init (PR #213).
See also: src/database.py:_get_postgres_password(), ~/.pgpass.
- Embedding-level semantic poisoning — RAG poisoning at the vector level is an open research problem. Provenance scoring and trust-threshold flagging exist; embedding-level anomaly detection is deferred.
- Transitive Python dependency vulnerabilities —
pip-audit/ hash pinning is accepted residual risk; remediation via Dependabot alerts. - Dependabot #4 —
langchain-coreSSRF (LOW, accepted risk) — CVE affectsChatOpenAI.get_num_tokens_from_messages()when called withimage_urlmessage parts. LegionForge never calls this method with image content (text-only agents). Fix requires migrating the entire langchain stack from 0.3.x → 1.x, which is a planned Phase 9 upgrade. Risk is accepted for Phase 8; tracked in PHASE_PLAN.md Phase 9 prerequisites. - GGUF model integrity —
make verify-modelsprints SHA256 hashes for pinning.gguf_sha256: ""in the hardware profile skips model integrity until the operator pins.
This policy was evaluated against:
- NIST SP 800-61r3 §3.2.2 "Containment Strategy" — automated containment is appropriate when delay in response causes measurable additional damage (e.g., data exfiltration, privilege persistence, lateral movement).
- MITRE ATT&CK for Enterprise — Privilege Escalation (TA0004) and Lateral Movement (TA0008) techniques warrant immediate containment; Reconnaissance (TA0043) and Collection (TA0009) typically warrant logging + alerting.
- OWASP ASVS v4.0 §11.1.6 — automated session termination on detected attack patterns.
The agent run is terminated immediately. No further tool calls are made. The event is logged
to threat_events and the audit log.
| Trigger | MITRE Tactic |
|---|---|
| Command/code injection detected in tool args | Execution (TA0002) |
| Self-probe detected (agent querying its own credentials or config) | Discovery (TA0007) |
| Privilege escalation attempt (child token exceeds parent capabilities) | Privilege Escalation (TA0004) |
| TOCTOU mismatch (post-exec snapshot differs from approved snapshot) | Defense Evasion (TA0005) |
| Guardian sidecar unavailable (fail-safe: halt, never fail-open) | — |
CRITICAL-severity finding in pentest run with stop_on_critical=True |
— |
Rationale: These attacks — if permitted to continue — cause immediate, irreversible harm: injected code executes, escalated privileges persist, or audit integrity is lost. Delay causes damage (NIST SP 800-61r3 §3.2.2).
The tool call is blocked. The event is logged to threat_events. The agent continues
running but cannot invoke the blocked tool/capability. The operator is alerted via the
threat_events table and /status endpoint.
| Trigger | MITRE Tactic |
|---|---|
| Injection pattern detected in user input (not in tool args) | Initial Access (TA0001) |
| Credential probe (attempt to read keys not in approved scope) | Credential Access (TA0006) |
| Rate limit exceeded (token budget at ≥ 80% of daily cap) | Impact (TA0040) |
| Agent sequence contract violation | Defense Evasion (TA0005) |
| Unregistered tool invocation attempt | Execution (TA0002) |
| Revoked tool invocation attempt | Execution (TA0002) |
Rationale: These events are significant but allow safe continuation. The attacker does not gain additional capability from the blocked call, and continued logging provides richer forensic data. Halting on every probe would create excessive false-positive disruption (NIST SP 800-61r3 §3.2.3 "Eradication vs. Continued Monitoring").
The tool call succeeds with reduced fidelity. The event is logged.
| Trigger | Degradation |
|---|---|
| Local LLM unavailable | Fall back to cloud API or simplified response |
| Token budget at 100% | Return cached/stubbed response; no new LLM calls |
| Non-critical external tool error | Return error message to agent; agent decides next step |
LegionForge ships a built-in automated red-team agent (src/agents/pentest_agent.py).
Current baseline (Phase 6):
- 24 attack functions across 8 classes (3 variants each)
- Expected result on clean deployment: 0 bypasses, 24 defenses held
- Run:
make pentest(requires Docker Desktop + PostgreSQL)
When a bypass is found and a rule is approved via
POST /pentest/rules/{finding_id}/approve, it is automatically promoted into Guardian's
enforcement pipeline via promote_pentest_rule_to_threat_rule() and takes effect within
10 seconds — no Guardian restart required.
If you discover a security vulnerability in LegionForge:
- DO NOT open a public GitHub Issue with exploit details.
- Open a GitHub Issue at jp-cruz/LegionForge
with the title prefix
[SECURITY]and a brief, non-exploitable description. - We will acknowledge within 72 hours and open a private channel for details.
- Disclosure window: 90 days from acknowledgment, or sooner if a fix is released.
- Credit will be given in the release notes unless you prefer anonymity.
- Authentication/authorization bypass in Guardian or the health endpoints
- Injection bypasses not caught by the 24-pattern detector
- Audit log hash chain manipulation
- Ed25519 key material exposure
- JWT task token forgery
- Attacks requiring physical access to the operator's machine
- Social engineering of the operator
- Theoretical vulnerabilities with no practical exploit path
- Issues in Ollama, PostgreSQL, or Docker that are upstream responsibility
- Primary: GitHub Issues — jp-cruz/LegionForge
- Owner: John Paul "Jp" Cruz (@jp-cruz)
Injection detection uses a two-tier model for tool argument scanning
(SecureToolNode step 4a). User-input scanning (run_* functions) is
separate and controlled by prompt_injection_guard.
Tier 1 — HALT-WORTHY (force_end=True): Unambiguous injection attempts with essentially zero legitimate use in tool arguments. Any of these in an LLM-generated tool arg indicates the agent context is compromised.
| Pattern | Example trigger |
|---|---|
ignore all previous instructions |
Classic override |
disregard all previous |
Override variant |
forget all previous instructions |
Override variant |
override all previous instructions |
Override variant |
jailbreak |
Literal jailbreak keyword |
dan mode |
DAN activation |
dan 2.0, dan 11.0, etc. |
Numbered DAN variants |
enable/activate/unlock … mode |
Mode activation |
reveal/show/print your system prompt |
System prompt exfiltration |
what are your instructions |
Self-probe |
<system>, <instruction> |
XML injection delimiters |
[INST], [/INST] |
Llama-format injection |
<|im_start|>, <|im_end|> |
ChatML injection |
Tier 2 — LOG-ONLY (INJECTION_DETECTED, action_taken=LOGGED, confidence=0.5):
Real injection signals that also appear in legitimate research and educational
content. The event is logged to threat_events and the run continues.
Examples: act as, pretend you are, simulate being, roleplay as,
developer mode, from now on you must, hypothetically speaking,
for educational/research purposes, imagine you were, decode from base64.
Trade-off accepted: Tier 2 false positives are possible (e.g., a legitimate research query about why LLMs comply with adversarial instructions could contain "hypothetically speaking"). Phase 8 will replace this with a context-aware classifier that considers query intent and surrounding context.
Implementation:
src/security/core.py:_HALT_ON_INJECTION_PATTERNS— frozenset of Tier 1 patternssrc/security/core.py:has_halt_worthy_injection()— predicate used bySecureToolNode
Location: config/hardware_profiles/<profile>.yaml → security.prompt_injection_guard
What it controls: Whether user-supplied task inputs (run_* functions) are
scanned for injection patterns before being passed to the agent graph.
security:
prompt_injection_guard: true # production default — scan all task inputs
prompt_injection_guard: false # dev/test only — skip user-input scanWhat it does NOT control:
SecureToolNodetool-arg injection detection is always-on regardless of this setting. It cannot be disabled via config. This is intentional — tool args come from LLM output (not directly from the user), and a compromised context that generates Tier 1 patterns must be halted regardless of environment.
Affected run functions: run_agent(), run_researcher(), run_orchestrator(),
run_observer(), run_crystallizer(). Not run_threat_analyst() — that agent's
task is synthesized internally (see below).
Every run_* function calls both:
SafeguardedState.initial(agent_id="<name>")— setsstate["agent_id"]issue_task_token(agent_id="<name>", ...)— embeds identity in the JWT
The string passed to both MUST be identical. If they diverge, threat events
in threat_events and the JWT audit trail in audit_log will attribute the same
run to different identities, making forensic reconstruction unreliable.
| Agent | agent_id string |
|---|---|
| base_graph.py | "base_agent" |
| researcher.py | "researcher" |
| orchestrator.py | "orchestrator" |
| observer.py | "observer" |
| crystallizer.py | "crystallizer" |
| threat_analyst.py | "threat_analyst" |
New agents MUST add a row to this table and verify consistency before merging.
Rule: SafeguardedState.initial() MUST be called BEFORE sanitize_text() in
every run_* function.
Why: initial() generates the run_id UUID. If injection is detected in the
task input, log_threat_event() needs run_id to attach the event to the correct
run. Calling sanitize_text() first means injection events are logged with
run_id=None or a stale value — forensically useless.
# CORRECT — run_id available for DB logging
init = SafeguardedState.initial(agent_id="my_agent")
task, meta = sanitize_text(task, check_injection=settings.security.prompt_injection_guard)
if meta["injection_detected"]:
await log_threat_event(run_id=init["run_id"], ...)
# WRONG — run_id not yet generated when injection is detected
task, meta = sanitize_text(task)
init = SafeguardedState.initial()Exception: run_threat_analyst() has no sanitize_text() call. Its task string
is synthesized internally from a validated integer — not user-controlled text.
| Date | Change |
|---|---|
| 2026-02-26 | Initial SECURITY.md — v1.0, covers Phases 0–7 |
| 2026-02-26 | Added §"Injection Detection Architecture" — pattern tiering, prompt_injection_guard, agent_id invariant, run_id ordering rule |
| 2026-02-26 | Session 1 hardening: tool-result injection tiering (Fix 1); document_summarize content delimiter (Fix 2); GUARDIAN_REQUIRE_AUTH default → true (Fix 3); audit log tamper → RuntimeError halt (Fix 4) |
| 2026-03-02 | Added §"Known Security Gaps — Pre-1.0 Blockers": PostgreSQL trust auth accepted for local dev, documented full remediation path, flagged as v1.0 release blocker |
| 2026-03-04 | PR #210: extended exfiltration detection (3 new patterns: leak/dump/expose verbs, system message synonyms, "what were you told") + NFKC/zero-width normalization in detect_injection() |
| 2026-03-04 | PR #211: check_hitl_required() made async; DESTRUCTIVE_PATTERN events now logged to threat_events table (LOG tier confidence=0.6, HALT tier confidence=1.0) |
| 2026-03-04 | PR #212 + #213: PostgreSQL trust → scram-sha-256 migration complete; pre-v1.0 blocker closed |