Blackbox

QuickCall instruments AI coding agents on real dev machines — every prompt, every tool call, every correction. It spots the patterns: where agents go wrong, what conventions they miss, which mistakes keep happening across sessions.

Blackbox is the engine that does the heavy lifting. Drop in session traces from Claude Code, Codex CLI, or pi.dev, and it runs them through a multi-stage LLM pipeline to pull out root causes, recurring failures, and anti-patterns. The output feeds back into QuickCall so future agent sessions start smarter.

flowchart TD
    subgraph Upload
        A["POST /analyze<br/>upload JSONL files"] --> B["Detect source<br/>_detect_source()"]
        B --> C["Normalize<br/>_normalize_file()"]
    end

    C --> D["Return 202 Accepted<br/>run_id → background task"]

    subgraph Pipeline
        P0["P0 Normalize<br/>count + index messages"]
        P1["P1 Classify<br/>LLM label each user turn<br/>batches run concurrently"]
        P2["P2 Context<br/>build windows around triggers"]
        P3["P3 Root-Cause<br/>LLM per trigger window"]
        P4a["P4a Behavior<br/>rule type + confidence"]
        P4b["P4b Cluster<br/>group recurring patterns"]
        P4c["P4c Convention<br/>dont_do / do_instead"]
        P5["P5 Aggregate<br/>deduplicate + score severity"]
        P6["P6 Scope<br/>map to repos + devs"]
    end

    subgraph Client
        POLL["GET /runs/:id<br/>poll status"]
        OUT["GET /runs/:id/findings<br/>recurring findings JSON"]
    end

    P0 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4a
    P3 --> P4b
    P3 --> P4c
    P4a --> P5
    P4b --> P5
    P4c --> P5
    P5 --> P6
    P6 --> POLL
    POLL --> OUT

Run

uv run uvicorn src.main:app --host 0.0.0.0 --port 8000

API

All responses are JSON.

`POST /analyze` — Upload trace files

Multipart form upload. Returns immediately with a run_id; the analysis runs as a background task.

Request:

curl -X POST http://localhost:8000/analyze \
  -F "<field>=@<first-trace>.jsonl" \
  -F "<field>=@<second-trace>.jsonl"

Response (202):

{
  "run_id": "run_a3f7e2d1",
  "status": "pending",
  "message": "Analysis started"
}

The source is auto-detected from file content. To override detection, pass ?source=claude_code, ?source=codex_cli, or ?source=pi.
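A client can make the override explicit before uploading. The sketch below builds the upload URL and rejects anything other than the three documented values. The helper name `analyze_url` and its default base URL are illustrative assumptions, not part of Blackbox.

```python
from typing import Optional
from urllib.parse import urlencode

# The three override values documented above.
KNOWN_SOURCES = ("claude_code", "codex_cli", "pi")


def analyze_url(base_url: str = "http://localhost:8000",
                source: Optional[str] = None) -> str:
    """Build the POST /analyze URL, optionally forcing the trace source.

    Hypothetical client-side helper; Blackbox does not ship this function.
    """
    url = base_url.rstrip("/") + "/analyze"
    if source is None:
        # No override: the server auto-detects from file content.
        return url
    if source not in KNOWN_SOURCES:
        raise ValueError(
            f"unknown source {source!r}; expected one of {KNOWN_SOURCES}"
        )
    return url + "?" + urlencode({"source": source})


if __name__ == "__main__":
    print(analyze_url())
    print(analyze_url(source="codex_cli"))
```

Validating on the client avoids uploading large trace files only to have the request rejected for a mistyped source name.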

`GET /runs/{run_id}` — Check run status

Poll this until status is "done".

Response:

{
  "run_id": "run_a3f7e2d1",
  "status": "done",
  "created_at": "2026-06-02T18:05:31.106757",
  "completed_at": "2026-06-02T18:05:37.927401",
  "stages": {
    "p0_normalize": {"status": "done", ...},
    "p1_classify":  {"status": "done", ...},
    "p2_context":   {"status": "done", ...},
    "p3_rca":       {"status": "done", ...},
    "p4a_behavior": {"status": "done", ...},
    "p4b_cluster":  {"status": "done", ...},
    "p4c_convention":{"status": "done", ...},
    "p5_aggregate": {"status": "done", ...},
    "p6_scope":     {"status": "done", ...}
  }
}

Each stage moves through pending → running → done, or ends in error.
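A polling client might look like the sketch below. It uses only the standard library. The function names, the poll interval, and the timeout are assumptions, and so is treating a run-level "error" status as terminal: the README documents "error" for stages but does not show it for the run itself.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Assumption: the run-level status uses the same terminal values as stages.
TERMINAL_STATUSES = {"done", "error"}


def fetch_run(run_id: str, base_url: str = "http://localhost:8000") -> Dict:
    """GET /runs/{run_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/runs/{run_id}") as resp:
        return json.load(resp)


def wait_for_run(run_id: str,
                 fetch: Callable[[str], Dict] = fetch_run,
                 interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll the run until it reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']!r} after {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    # Requires a running Blackbox server and an existing run_id.
    final = wait_for_run("run_a3f7e2d1")
    print(final["status"])
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a live server.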

`GET /runs/{run_id}/findings` — Filtered findings (recurring only)

Returns findings that appear across 2+ sessions.

curl http://localhost:8000/runs/run_a3f7e2d1/findings

Response:

[
  {
    "session_id": "sess_abc123",
    "agents_md_rule": "Use specific error handling...",
    "category": "missing_context",
    "severity": 3,
    "is_recurring": true,
    "pattern_label": "error_handling",
    ...
  }
]
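The "2+ sessions" rule can be illustrated with a few lines of Python. This is a sketch under a stated assumption: recurrence is keyed on `pattern_label` and counted over distinct `session_id` values. The README does not say how Blackbox groups findings internally, so the grouping key and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def recurring_findings(findings: Iterable[Dict], min_sessions: int = 2) -> List[Dict]:
    """Keep findings whose pattern shows up in at least `min_sessions` sessions.

    Assumption: `pattern_label` identifies a pattern; Blackbox may group differently.
    """
    findings = list(findings)
    sessions_by_pattern = defaultdict(set)
    for finding in findings:
        sessions_by_pattern[finding["pattern_label"]].add(finding["session_id"])
    return [
        finding for finding in findings
        if len(sessions_by_pattern[finding["pattern_label"]]) >= min_sessions
    ]


if __name__ == "__main__":
    sample = [
        {"session_id": "sess_a", "pattern_label": "error_handling"},
        {"session_id": "sess_b", "pattern_label": "error_handling"},
        {"session_id": "sess_a", "pattern_label": "naming"},
    ]
    for finding in recurring_findings(sample):
        print(finding["session_id"], finding["pattern_label"])
```

Counting distinct sessions rather than raw occurrences means a mistake repeated many times inside one session does not count as recurring.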

`GET /runs/{run_id}/findings/all` — All findings

Same structure as above, plus summary fields: total_findings, severity_distribution, category_distribution, and filtered_findings (the recurring subset).
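The summary fields can be read as simple counts over the findings list. The sketch below is a minimal illustration, assuming the two distributions are per-value counts and that `filtered_findings` is selected by the `is_recurring` flag. The exact response shape is not shown in this README, and `summarize` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(findings: Iterable[Dict]) -> Dict:
    """Compute summary fields named after those in the /findings/all response.

    Assumption: distributions are value -> count mappings.
    """
    findings = list(findings)
    return {
        "total_findings": len(findings),
        "severity_distribution": dict(Counter(f["severity"] for f in findings)),
        "category_distribution": dict(Counter(f["category"] for f in findings)),
        "filtered_findings": [f for f in findings if f.get("is_recurring")],
    }


if __name__ == "__main__":
    sample = [
        {"severity": 3, "category": "missing_context", "is_recurring": True},
        {"severity": 1, "category": "missing_context", "is_recurring": False},
    ]
    print(summarize(sample))
```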

`GET /runs/{run_id}/stages/{stage_name}` — Raw stage output

Access any pipeline stage directly:

p0_normalize — normalized sessions with message counts
p1_classify — per-session message classifications
p5_aggregate — full findings + metadata

`GET /health`

{"status": "ok", "model": "kimi-k2.6"}

Pipeline Stages

| Stage | What it does |
| --- | --- |
| p0_normalize | Parse uploaded JSONL → unified message format |
| p1_classify | Label each user message (question, new_task, correction, etc.) |
| p2_context | Build context windows around trigger turns |
| p3_rca | LLM root-cause analysis on triggers |
| p4a_behavior | Classify findings by rule type |
| p4b_cluster | Group recurring findings into patterns |
| p4c_convention | Identify wrong_approach conventions |
| p5_aggregate | Deduplicate, score severity, filter recurring |
| p6_scope | Map findings to repos / developers |
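The stage ordering in the flowchart can be expressed as a dependency graph. The sketch below uses the standard library's `graphlib` (Python 3.9+) to group stages into waves that could run together. The dependency map is transcribed from the flowchart edges. Whether Blackbox actually schedules P4a, P4b, and P4c concurrently is not stated, so the wave grouping is an illustration, and `execution_waves` is a hypothetical name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# stage -> stages it depends on, transcribed from the flowchart edges.
DEPENDENCIES: Dict[str, Set[str]] = {
    "p0_normalize": set(),
    "p1_classify": {"p0_normalize"},
    "p2_context": {"p1_classify"},
    "p3_rca": {"p2_context"},
    "p4a_behavior": {"p3_rca"},
    "p4b_cluster": {"p3_rca"},
    "p4c_convention": {"p3_rca"},
    "p5_aggregate": {"p4a_behavior", "p4b_cluster", "p4c_convention"},
    "p6_scope": {"p5_aggregate"},
}


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group stages into waves; every stage in a wave has all its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES)):
        print(number, ", ".join(wave))
```

With the map above, the three P4 stages land in the same wave because each depends only on p3_rca, and p5_aggregate waits for all three.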

Supported Formats

| Source | Detection |
| --- | --- |
| Claude Code | JSONL with `"type":"user"`, `"uuid"`, `"sessionId"`, `"version"` |
| Codex CLI | JSONL with rollout filename or `"type":"conversation"` |
| pi.dev | JSONL with `"type":"session"`, `"type":"message"` |
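The detection rules in the table can be sketched as a small function. This is not the project's `_detect_source()`, whose code is not shown here. It is a hypothetical `detect_source` that makes several assumptions explicit: the Claude Code keys are expected on the same record, a "rollout filename" means the file name contains the word "rollout", and a record with `"type":"session"` is enough to identify pi.dev.

```python
import json
from pathlib import Path
from typing import List, Optional


def detect_source(filename: str, first_lines: List[str]) -> Optional[str]:
    """Guess the trace source from the file name and the first JSONL records.

    Illustrative only; the matching rules are assumptions based on the table.
    """
    records = []
    for line in first_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict):
            records.append(record)

    types = {r["type"] for r in records if isinstance(r.get("type"), str)}

    # Claude Code: a user record that also carries uuid, sessionId and version.
    for record in records:
        if record.get("type") == "user" and {"uuid", "sessionId", "version"}.issubset(record):
            return "claude_code"

    # Codex CLI: rollout file name, or a conversation record.
    if "rollout" in Path(filename).name or "conversation" in types:
        return "codex_cli"

    # pi.dev: assumption that a session record is sufficient.
    if "session" in types:
        return "pi"

    return None


if __name__ == "__main__":
    line = '{"type": "user", "uuid": "u1", "sessionId": "s1", "version": "1.0"}'
    print(detect_source("trace.jsonl", [line]))
```

Returning `None` for unrecognized input leaves room for the caller to fall back to the `?source=` override.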

Environment

OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.moonshot.ai/v1
MODEL=kimi-k2.6
CONCURRENCY=30
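For reference, reading these variables in Python could look like the sketch below. It is not the project's configuration code. The `Settings` class, the `load_settings` name, the choice to treat the first three variables as required, and the fallback of 30 for CONCURRENCY (taken from the example value above) are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    concurrency: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables listed above from a mapping such as os.environ.

    Assumption: the first three are required and CONCURRENCY falls back to 30.
    """
    required = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    concurrency = int(env.get("CONCURRENCY", "30"))
    if concurrency < 1:
        raise ValueError("CONCURRENCY must be a positive integer")

    return Settings(
        api_key=env["OPENAI_API_KEY"],
        base_url=env["OPENAI_BASE_URL"],
        model=env["MODEL"],
        concurrency=concurrency,
    )


if __name__ == "__main__":
    settings = load_settings()
    print(settings.model, settings.concurrency)
```

Failing fast on a missing key surfaces configuration problems at startup instead of in the middle of a pipeline run.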

Tests

uv run pytest
