Protocol SIFT++

60-second version (for reviewers). An autonomous Windows-memory DFIR agent that is forensically defensible:

It cannot alter evidence — destructive actions don't exist in its tool server (siftpp-spoliation-test: 14/14 attacks refused, evidence hash unchanged).

It catches its own hallucinations — a Skeptic agent independently reruns tools to refute each finding (confirmed/inferred/refuted) and sends weak ones back for re-investigation. (An independent re-run even refuted its own "DKOM rootkit" confirmation, correctly flagging it as a Volatility symbol artifact — docs/examples/srl-2018-linux/.)

It keeps a tamper-evident chain of custody — a hash-chained audit log where editing any one record is detected (siftpp-tamper-test).

Windows run on a SANS APT memory image: 4 confirmed of 10 findings, 2 self-corrections, evidence integrity verified, 302-record audit (hash chain OK). An independent Linux reproduction refuted one of those confirmed findings (DKOM/rootkit) as a tool artifact, leaving 3 retained confirmed findings: precision 1.00, recall 0.75, F1 0.86 under the manual-review proxy.

Second independent public Windows memory case (DigitalCorpora M57 / Pat): 4 confirmed of 9 findings, 2 self-corrections, 265-record audit (hash chain OK). The confirmed findings match the documented "Advanced Keylogger" (precision/recall 1.00 vs a public answer key).

The 5-minute demo video is a live Linux-terminal screencast. The investigation recorded on camera is committed in full at docs/examples/srl-2018-live/ (4 confirmed of 8, 1 self-correction iteration, 230-record chain). Replay it: uv run siftpp-trace docs/examples/srl-2018-live/audit.jsonl --replay.

See the architecture diagram and the real report + logs. See it with no API key: uv run siftpp-demo · attack it: uv run siftpp-spoliation-test.

Most autonomous-IR agents compete on speed or breadth. Almost none can prove they never touched the evidence, fewer catch their own hallucinations, and almost none publish their accuracy, including misses. Protocol SIFT++ does all three.

A self-verifying, autonomous DFIR analyst for SANS FIND EVIL! 2026, built around read-only forensic tools, adversarial verification, and tamper-evident audit logs.

Protocol SIFT++ builds on the Protocol SIFT idea and adds the missing accuracy loop: an Investigator proposes findings, a Skeptic independently tries to refute them, and weak findings are sent back for automatic reinvestigation.
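The loop described above can be sketched in a few lines. This is a minimal illustration, not the project's orchestrator: `Finding`, `run_loop`, and the two toy callables are hypothetical names, and the real agents call an LLM and forensic tools rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    claim: str
    status: str = "draft"  # later: confirmed | inferred | refuted


def run_loop(
    investigate: Callable[[List[Finding]], List[Finding]],
    challenge: Callable[[Finding], str],
    max_iterations: int = 3,
) -> Tuple[List[Finding], int]:
    """Propose findings, let the skeptic grade them, send weak ones back."""
    settled: List[Finding] = []
    weak: List[Finding] = []
    corrections = 0
    for iteration in range(max_iterations):
        drafts = investigate(weak)  # weak findings are the re-investigation input
        weak = []
        for finding in drafts:
            finding.status = challenge(finding)
            (settled if finding.status == "confirmed" else weak).append(finding)
        if not weak:
            break
        if iteration < max_iterations - 1:
            corrections += 1  # another pass will actually run
    # Anything still weak keeps its last verdict (inferred or refuted).
    return settled + weak, corrections


# Toy stand-ins (assumptions): the first draft overstates, the retry narrows it.
def toy_investigate(weak: List[Finding]) -> List[Finding]:
    if not weak:
        return [Finding("process P is malware family X")]
    return [Finding("process P holds an outbound connection")]


def toy_challenge(finding: Finding) -> str:
    return "confirmed" if "connection" in finding.claim else "inferred"


findings, corrections = run_loop(toy_investigate, toy_challenge)
print(corrections, [f.status for f in findings])
```

In this toy run the overstated attribution is graded `inferred`, sent back once, and replaced by a narrower claim that the skeptic confirms.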

What makes it different from a typical "AI finds evil" agent: every safety claim is provable by attacking it. The agent is architecturally incapable of altering evidence (14/14 destructive attempts refused, evidence SHA-256 unchanged), its audit log is tamper-evident (edit one record and verify_chain fails), and every finding cites the exact tool command plus its output hash. Forensic defensibility, not a prompt that says "be careful."
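One way to read "architecturally incapable" is that the tool server can only build commands from a fixed allowlist, so a destructive request has nothing to dispatch to. The sketch below assumes that design. `ALLOWED_PLUGINS`, `ToolRefused`, and `build_command` are hypothetical names, and the three plugins listed are illustrative rather than the project's actual allowlist.

```python
from __future__ import annotations

# Illustrative allowlist (assumption): read-only Volatility 3 plugins only.
ALLOWED_PLUGINS = frozenset({"windows.pslist", "windows.psscan", "windows.netscan"})


class ToolRefused(Exception):
    """Raised when a request names anything outside the allowlist."""


def build_command(plugin: str, image_path: str) -> list[str]:
    # Exact-match lookup: extra flags or shell fragments never match an entry.
    if plugin not in ALLOWED_PLUGINS:
        raise ToolRefused(f"not an allowlisted tool: {plugin!r}")
    # Argument list, never a shell string, so nothing is interpreted by a shell.
    return ["vol", "-f", image_path, plugin]


print(build_command("windows.pslist", "mem.img"))
```

Because the check is an exact match against a closed set, a request such as `"windows.pslist; rm mem.img"` is refused rather than sanitized.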

Judging scorecard — verify each claim in seconds

| FIND EVIL! criterion | What Protocol SIFT++ does | Verify it yourself |
| --- | --- | --- |
| Autonomous + real-time self-correction (tiebreaker) | Investigator/Skeptic loop, no human in the loop; SANS and M57 runs each forced 2 corrections, and an independent re-run refuted its own confirmed "DKOM rootkit" as a tool artifact | `uv run siftpp-trace docs/examples/srl-2018-live/audit.jsonl --replay` (the on-camera run); `docs/examples/srl-2018-linux/report.md` -> Refuted; `uv run siftpp-demo` (no key) |
| IR accuracy / catches its own hallucinations | Skeptic re-runs tools to refute each finding -> `confirmed`/`inferred`/`refuted`; cross-run correction removed DKOM FP; SANS manual-review proxy F1 0.86; public answer key (M57): confirmed precision/recall/F1 = 1.00 | `docs/ACCURACY_REPORT.md` |
| Depth > breadth | Primary SANS APT case, every claim verified, reproduced on Windows and Linux with byte-identical evidence; also reproduced on a second independent public Windows memory case | `docs/examples/srl-2018-base-file-memory/` + `docs/examples/srl-2018-linux/` + `docs/examples/m57-pat-2009-12-05/` |
| Architectural (not prompt) guardrails | Read-only MCP server: no shell, no dump/write/network tool exists - spoliation is impossible by construction | `uv run siftpp-spoliation-test` -> 14/14 refused, evidence unchanged |
| Audit trail to specific tool executions | Hash-chained append-only log (tamper-evident) + every finding cites command + output SHA-256 | `uv run siftpp-tamper-test` -> edit detected; `verify_chain(...)` -> `(True, 302)` |
| Usability / docs | One command, no API key, runs on Windows + Linux/SIFT; full docs | `uv run siftpp-demo` |

What most autonomous-IR agents lack — and SIFT++ has: an adversarial Skeptic that challenges every finding, a cryptographic chain of custody (hash-chained audit + per-finding evidence hashes), and safety that's provable by attacking it. A broad agent that can't verify itself just produces more unverified claims, faster.

Deliberate scope (why "narrow" is the point)

Depth over breadth (criterion #3). One primary competition case, fully verified and reproduced, then one second independent public Windows memory case to prove the loop is not case-specific. The loop is tool-, model-, and OS-agnostic (Volatility 3 today; Windows + Linux/SIFT; DeepSeek or Anthropic), so breadth is configuration, not a redesign.
Platform & framework (per the rules). Built end to end with Claude Code (under OpenClaw); the runtime is MCP — Claude Code's native tool protocol — plus an Anthropic-SDK agent loop, i.e. the "comparable agentic architecture" the rules permit. Runs on the SANS SIFT Workstation: the Linux path is verified on Ubuntu 22.04 (SIFT's base OS), and the demo screencast is recorded in that Linux terminal. SIFT quick start: docs/TRY_IT_OUT.md.
Verification > volume. We chose one primary case, with every claim adversarially checked and reproduced across operating systems, because the rubric explicitly rewards depth over breadth.
Terminal- and artifact-native, not a dashboard. Output is structured (report.json) and tamper-evident (audit.jsonl) so it feeds downstream tooling and stays court-defensible — the forensic idiom, not a demo UI.
Honest accuracy. No public answer key exists for this SANS sample, so instead of a self-graded score we use adversarial verification, cross-run reproduction, and disclosed misses.

Final Case Run

Selected SANS sample:

SRL-2018 Compromised Enterprise Network / base-file-memory.7z

Final DeepSeek run on the extracted memory image:

4 confirmed of 10 findings; 2 self-correction iteration(s); evidence integrity verified.
audit log: 302 records, hash chain OK

Cross-platform correction: an independent Linux re-run refuted the Windows DKOM/rootkit confirmation as a Volatility symbol/KDBG artifact. The corrected confirmed set used for the accuracy table is therefore 3 true positives, 0 false positives, and 1 false negative: precision 1.00, recall 0.75, F1 0.86.
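The arithmetic behind those three numbers can be checked directly. The helper below is a generic precision/recall/F1 calculation, not code from the repository; `prf` is a hypothetical name.

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from raw counts, guarding empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Corrected confirmed set: 3 true positives, 0 false positives, 1 false negative.
p, r, f1 = prf(tp=3, fp=0, fn=1)
print(f"precision {p:.2f}, recall {r:.2f}, F1 {f1:.2f}")
# prints: precision 1.00, recall 0.75, F1 0.86
```

F1 is 1.5 / 1.75 ≈ 0.857, which rounds to the reported 0.86.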

The key corrected finding involved ngentask.exe: the Investigator initially overstated the malware attribution, the Skeptic downgraded it twice, and the system converged on a narrower confirmed behavioral claim tied to psscan and netscan evidence.

Second independent public memory case:

DigitalCorpora M57 Pat / pat-2009-12-05.winddramimage
4 confirmed of 9 findings; 2 self-correction iteration(s); evidence integrity verified.
audit log: 265 records, hash chain OK

The M57 run shows the same correction behavior on a separate XP memory image: the Skeptic downgraded over-strong "exfiltration" and "persistence" claims, and the system converged on narrower confirmed facts such as ToolKeyloggerDLL.dll loaded into explorer.exe, ToolKeylogger.exe as an explorer.exe child, and matching pslist/psscan process sets.

Why It Matters

AI-assisted attackers move quickly, which pushes defenders toward autonomous response, but autonomous responders can hallucinate. Protocol SIFT++ targets both of FIND EVIL!'s top scoring areas:

Autonomous execution with real-time self-correction.
IR accuracy and hallucination catching.

The project is intentionally narrow: one primary SANS Windows memory case plus one public independent Windows memory case, a curated Volatility 3 toolset, strong evidence citations, and a visible correction loop.

Architecture

flowchart LR
  E["Evidence image"] --> G["EvidenceGuard: sha256, size, mtime"]
  G --> M["Read-only MCP server"]
  M --> V["Volatility 3 allowlist"]
  V --> T["Tool output"]

  O["Orchestrator"] --> I["Investigator"]
  O --> S["Skeptic"]
  I --> M
  S --> M
  I --> F["Draft findings"]
  F --> S
  S --> R["confirmed / inferred / refuted"]
  R -->|weak or refuted| I

  O --> A["Audit JSONL hash chain"]
  O --> P["report.md + report.json"]

The agents never receive a generic shell. The MCP server exposes only curated read-only Volatility tools and checks evidence integrity around every tool call.
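As a rough illustration of checking integrity around a tool call, the sketch below snapshots the three attributes named in the diagram (SHA-256, size, mtime) before and after the call and fails if they differ. It is an assumption about how such a guard could work, not the project's EvidenceGuard. `Snapshot`, `take_snapshot`, `guarded_call`, and `EvidenceChanged` are hypothetical names.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Snapshot:
    sha256: str
    size: int
    mtime_ns: int


class EvidenceChanged(Exception):
    """Raised when the evidence file differs after a tool call."""


def take_snapshot(path: str) -> Snapshot:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # read-only handle
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    st = os.stat(path)
    return Snapshot(digest.hexdigest(), st.st_size, st.st_mtime_ns)


def guarded_call(path: str, tool: Callable[[str], T]) -> T:
    before = take_snapshot(path)
    result = tool(path)
    after = take_snapshot(path)
    if before != after:
        raise EvidenceChanged(f"{path}: {before} -> {after}")
    return result
```

Hashing a multi-gigabyte image twice per call is expensive. A real guard might compare size and mtime on every call and hash less often; which trade-off the project makes is not stated here.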

Quick Start

Install dependencies with uv, then run the deterministic local demo:

uv run siftpp-demo

Prove the forensic guardrails by attacking them (no key needed):

uv run siftpp-spoliation-test
uv run siftpp-tamper-test

Replay the investigation shown in the demo video, straight from its tamper-evident audit log (no key needed):

uv run siftpp-trace docs/examples/srl-2018-live/audit.jsonl --replay

Download the selected SANS case:

uv run siftpp-download-case

Run the real investigation with DeepSeek:

uv run siftpp-investigate `
  --provider deepseek `
  --evidence evidence\srl-2018-base-file-memory\extracted\base-file-memory.img `
  --out analysis\srl-2018-base-file-memory `
  --case-id srl-2018-base-file-memory `
  --offline `
  --max-iterations 3

Set DEEPSEEK_API_KEY in the environment or an ignored local .env file. Do not commit API keys.

Outputs

The real run writes:

analysis/srl-2018-base-file-memory/report.md
analysis/srl-2018-base-file-memory/report.json
analysis/srl-2018-base-file-memory/audit.jsonl
analysis/srl-2018-base-file-memory/mcp-server.jsonl

Verify the audit chain:

uv run python -c `
  "from protocol_siftpp.audit import verify_chain; print(verify_chain('analysis/srl-2018-base-file-memory/audit.jsonl'))"

Expected:

(True, 302)
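For readers unfamiliar with hash-chained logs, here is a minimal sketch of the idea: each record stores the previous record's hash, so editing any one record breaks every link after it. The record layout (`prev`, `payload`, `hash`), the genesis value, and the function names are assumptions for illustration. The actual format used by `protocol_siftpp.audit` may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev"


def record_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": record_hash(prev, payload)})


def verify_sketch(log: list) -> tuple:
    """Return (True, record_count) or (False, index_of_first_bad_record)."""
    prev = GENESIS
    for index, rec in enumerate(log):
        if rec["prev"] != prev or rec["hash"] != record_hash(prev, rec["payload"]):
            return False, index
        prev = rec["hash"]
    return True, len(log)


log = []
for step in ("pslist", "psscan", "netscan"):
    append_record(log, {"tool": step})
print(verify_sketch(log))          # intact chain
log[1]["payload"]["tool"] = "edited"
print(verify_sketch(log))          # the edit is detected at record 1
```

An attacker who edits a record would have to recompute every later hash, and that rewrite is detectable only if the final hash is also recorded somewhere outside the log.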

Deliverables

Development Checks

uv run pytest
uv run ruff check .

License

MIT.
