quickcall-dev/blackbox

Blackbox

QuickCall instruments AI coding agents on real dev machines — every prompt, every tool call, every correction. It spots the patterns: where agents go wrong, what conventions they miss, which mistakes keep happening across sessions.

Blackbox is the engine that does the heavy lifting. Drop in session traces from Claude Code, Codex CLI, or pi.dev, and it runs them through a multi-stage LLM pipeline to pull out root causes, recurring failures, and anti-patterns. The output feeds back into QuickCall so future agent sessions start smarter.

flowchart TD
    subgraph Upload
        A["POST /analyze<br/>upload JSONL files"] --> B["Detect source<br/>_detect_source()"]
        B --> C["Normalize<br/>_normalize_file()"]
    end

    C --> D["Return 202 Accepted<br/>run_id → background task"]

    subgraph Pipeline
        P0["P0 Normalize<br/>count + index messages"]
        P1["P1 Classify<br/>LLM label each user turn<br/>batches run concurrently"]
        P2["P2 Context<br/>build windows around triggers"]
        P3["P3 Root-Cause<br/>LLM per trigger window"]
        P4a["P4a Behavior<br/>rule type + confidence"]
        P4b["P4b Cluster<br/>group recurring patterns"]
        P4c["P4c Convention<br/>dont_do / do_instead"]
        P5["P5 Aggregate<br/>deduplicate + score severity"]
        P6["P6 Scope<br/>map to repos + devs"]
    end

    subgraph Client
        POLL["GET /runs/:id<br/>poll status"]
        OUT["GET /runs/:id/findings<br/>recurring findings JSON"]
    end

    P0 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4a
    P3 --> P4b
    P3 --> P4c
    P4a --> P5
    P4b --> P5
    P4c --> P5
    P5 --> P6
    P6 --> POLL
    POLL --> OUT

Run

uv run uvicorn src.main:app --host 0.0.0.0 --port 8000

API

All responses are JSON.

POST /analyze — Upload trace files

Multipart form upload. Returns immediately with a run_id; the analysis runs in the background.

Request:

curl -X POST http://localhost:8000/analyze \
  -F "[email protected]" \
  -F "[email protected]"

Response (202):

{
  "run_id": "run_a3f7e2d1",
  "status": "pending",
  "message": "Analysis started"
}

Auto-detects source from file content. Override with ?source=claude_code or ?source=codex_cli or ?source=pi.
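A small client-side helper can keep the `?source=` override honest. This is only a sketch: `analyze_url` and `KNOWN_SOURCES` are hypothetical names, and the accepted values are simply the three listed above.

```python
from urllib.parse import urlencode

# Source values listed in this README for the ?source= override.
KNOWN_SOURCES = {"claude_code", "codex_cli", "pi"}


def analyze_url(base_url, source=None):
    """Build the POST /analyze URL, optionally pinning the trace source."""
    url = base_url.rstrip("/") + "/analyze"
    if source is None:
        return url  # let the server auto-detect from file content
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return url + "?" + urlencode({"source": source})


print(analyze_url("http://localhost:8000"))
print(analyze_url("http://localhost:8000", source="codex_cli"))
```

Rejecting unknown values on the client avoids uploading large trace files only to have the request refused.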


GET /runs/{run_id} — Check run status

Poll this until status is "done".

Response:

{
  "run_id": "run_a3f7e2d1",
  "status": "done",
  "created_at": "2026-06-02T18:05:31.106757",
  "completed_at": "2026-06-02T18:05:37.927401",
  "stages": {
    "p0_normalize": {"status": "done", ...},
    "p1_classify":  {"status": "done", ...},
    "p2_context":   {"status": "done", ...},
    "p3_rca":       {"status": "done", ...},
    "p4a_behavior": {"status": "done", ...},
    "p4b_cluster":  {"status": "done", ...},
    "p4c_convention":{"status": "done", ...},
    "p5_aggregate": {"status": "done", ...},
    "p6_scope":     {"status": "done", ...}
  }
}

Stages progress: pending → running → done / error.
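A polling loop for this endpoint might look like the sketch below. It assumes the top-level `status` field uses the same values as the stages (including `"error"`), which this README does not state explicitly. `fetch_run` and `wait_for_run` are hypothetical helper names, and the interval and timeout are arbitrary.

```python
import json
import time
import urllib.request


def fetch_run(base_url, run_id):
    """GET /runs/{run_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/runs/{run_id}") as resp:
        return json.load(resp)


def wait_for_run(run_id, fetch, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the run reaches "done"; raise on "error" or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(run_id)
        if run["status"] == "done":
            return run
        if run["status"] == "error":  # assumed top-level failure value
            raise RuntimeError(f"run {run_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']!r}")
        sleep(interval)


# Usage against a local server (not executed here):
# run = wait_for_run(
#     "run_a3f7e2d1",
#     lambda rid: fetch_run("http://localhost:8000", rid),
# )
```

Passing `fetch` and `sleep` as arguments keeps the loop easy to exercise without a running server.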


GET /runs/{run_id}/findings — Filtered findings (recurring only)

Returns findings that appear across 2+ sessions.

curl http://localhost:8000/runs/run_a3f7e2d1/findings

Response:

[
  {
    "session_id": "sess_abc123",
    "agents_md_rule": "Use specific error handling...",
    "category": "missing_context",
    "severity": 3,
    "is_recurring": true,
    "pattern_label": "error_handling",
    ...
  }
]
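Once the findings are fetched, grouping them by `pattern_label` gives a quick view of which patterns recur. The sketch below uses only fields shown in the response above; the sample records (apart from `sess_abc123`) are made up for illustration, and it assumes a larger `severity` number means a more serious finding.

```python
from collections import defaultdict


def group_by_pattern(findings):
    """Group findings by pattern_label, most severe pattern first."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding["pattern_label"]].append(finding)
    # Assumption: a larger "severity" number means a more serious finding.
    return sorted(
        groups.items(),
        key=lambda item: max(f["severity"] for f in item[1]),
        reverse=True,
    )


# Hypothetical sample data shaped like the response above.
findings = [
    {"session_id": "sess_abc123", "pattern_label": "error_handling", "severity": 3},
    {"session_id": "sess_def456", "pattern_label": "error_handling", "severity": 2},
    {"session_id": "sess_def456", "pattern_label": "naming", "severity": 1},
]
for label, items in group_by_pattern(findings):
    sessions = sorted({f["session_id"] for f in items})
    print(label, len(items), sessions)
```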

GET /runs/{run_id}/findings/all — All findings

Same structure as above but includes total_findings, severity_distribution, category_distribution, filtered_findings (recurring subset).


GET /runs/{run_id}/stages/{stage_name} — Raw stage output

Access any pipeline stage directly:

  • p0_normalize — normalized sessions with message counts
  • p1_classify — per-session message classifications
  • p5_aggregate — full findings + metadata

GET /health

{"status": "ok", "model": "kimi-k2.6"}

Pipeline Stages

| Stage | What it does |
| --- | --- |
| p0_normalize | Parse uploaded JSONL → unified message format |
| p1_classify | Label each user message (question, new_task, correction, etc.) |
| p2_context | Build context windows around trigger turns |
| p3_rca | LLM root-cause analysis on triggers |
| p4a_behavior | Classify findings by rule type |
| p4b_cluster | Group recurring findings into patterns |
| p4c_convention | Identify wrong_approach conventions |
| p5_aggregate | Deduplicate, score severity, filter recurring |
| p6_scope | Map findings to repos / developers |
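The stage dependencies drawn in the flowchart form a small DAG: P3 fans out to the three P4 stages, which join again at P5. The sketch below encodes those edges and derives the order in which stages become runnable. It illustrates the graph only; whether the engine actually schedules the P4 stages in parallel is not stated here, and `STAGE_DEPS` and `execution_waves` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it waits on, as drawn in the flowchart.
STAGE_DEPS = {
    "p0_normalize": set(),
    "p1_classify": {"p0_normalize"},
    "p2_context": {"p1_classify"},
    "p3_rca": {"p2_context"},
    "p4a_behavior": {"p3_rca"},
    "p4b_cluster": {"p3_rca"},
    "p4c_convention": {"p3_rca"},
    "p5_aggregate": {"p4a_behavior", "p4b_cluster", "p4c_convention"},
    "p6_scope": {"p5_aggregate"},
}


def execution_waves(deps):
    """Return lists of stages whose dependencies are all satisfied together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(STAGE_DEPS):
    print(wave)
```

The three P4 stages land in the same wave because each depends only on `p3_rca`.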

Supported Formats

| Source | Detection |
| --- | --- |
| Claude Code | JSONL with "type":"user", "uuid", "sessionId", "version" |
| Codex CLI | JSONL with rollout filename or "type":"conversation" |
| pi.dev | JSONL with "type":"session", "type":"message" |
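The detection rules in the table can be read as a set of heuristics over the filename and the first few records. The function below is an illustrative sketch of that reading, not the repository's `_detect_source()`. The check order, the requirement that a Claude Code user record carries all three keys, and the requirement that pi.dev traces contain both record types are assumptions.

```python
import json


def detect_source(filename, lines):
    """Guess the trace source from the filename and the first JSONL records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing detection
        if isinstance(record, dict):
            records.append(record)

    types = {r.get("type") for r in records}
    claude_keys = {"uuid", "sessionId", "version"}
    if any(r.get("type") == "user" and claude_keys.issubset(r) for r in records):
        return "claude_code"
    if "rollout" in filename or "conversation" in types:
        return "codex_cli"
    if "session" in types and "message" in types:
        return "pi"
    return None  # unknown format; a caller could fall back to ?source=


sample = ['{"type": "session"}', '{"type": "message", "text": "hi"}']
print(detect_source("trace.jsonl", sample))
```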

Environment

OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.moonshot.ai/v1
MODEL=kimi-k2.6
CONCURRENCY=30
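One way to read these variables at startup is sketched below. `Settings` and `load_settings` are hypothetical names, and treating the values shown above as fallback defaults is an assumption; the service may well require all of them to be set.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    concurrency: int


def load_settings(env=None):
    """Read the four variables above; fail fast on a missing key or bad number."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    concurrency = int(env.get("CONCURRENCY", "30"))
    if concurrency < 1:
        raise ValueError("CONCURRENCY must be a positive integer")
    return Settings(
        api_key=api_key,
        # Assumed defaults: the example values from this README.
        base_url=env.get("OPENAI_BASE_URL", "https://api.moonshot.ai/v1"),
        model=env.get("MODEL", "kimi-k2.6"),
        concurrency=concurrency,
    )


print(load_settings({"OPENAI_API_KEY": "sk-example", "CONCURRENCY": "8"}))
```

Accepting a plain mapping makes the loader easy to test without touching the real process environment.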

Tests

uv run pytest

What you get

  1. POST /analyze → run_id (immediate)
  2. Poll GET /runs/{run_id} until status: "done"
  3. Fetch GET /runs/{run_id}/findings for actionable recurring issues

Docs

License

Apache 2.0 — see LICENSE.

About

Analysis engine inside QuickCall — ingests AI coding session traces, runs multi-stage LLM pipeline, surfaces root causes and recurring failure patterns
