Kasper is a plugin for opencode that monitors agent sessions, scores adherence to user instructions via LLM-as-judge, and injects corrective instructions into AGENTS.md and per-agent prompt files.
Unofficial plugin: This is an independent project and is not affiliated with, endorsed by, or maintained by the opencode team.

- LLM-as-Judge Scoring: Evaluates every session on 5 dimensions (instruction following, completeness, proactiveness, code quality, and communication)
- Automatic Improvements: Detects recurring weaknesses and injects fixes into `AGENTS.md` or per-agent prompts (auto or manual approval)
- Idle-Aware Evaluation: Sessions scored only when idle or complete, preventing partial-turn scoring
- Per-Agent Scoring: Separate aggregates and weakness profiles per agent
- Batch & Retroactive Scoring: Score past sessions via `/kasper score session <id>` or bulk with `last N`
- Subagent Tracking: Tracks subagent calls and evaluates child sessions independently
- Real-time Status: `/kasper status` shows an In Progress banner with the current evaluation pass, weakness merge, and any pending improvements
- Backups & Safety: Timestamped backups before every change; atomic writes with file locks
- Prompt Resolution: Honours opencode's `agent.<name>.prompt` `{file:...}` and `{path:...}` directives; will not overwrite the wrong file

```bash
npm install @atonev/opencode-kasper
```

Register the plugin in your `opencode.json` (or in your global `~/.config/opencode/opencode.json`):

```json
{
  "plugin": ["@atonev/opencode-kasper"]
}
```

That's the entire plugin registration; there are no plugin-level options to pass on the plugin line. All kasper configuration (auto-update, model, thresholds, etc.) lives in a separate file, described in Configuration below.

Verify: Start a session and run `/kasper status`.


| Command | Description |
|---|---|
| `/kasper status [agent]` | Aggregate scores, top weaknesses, recent sessions, sparkline trend, and live In Progress banner |
| `/kasper score session <id>` | Evaluate a past session by id, or several at once with `last N`, `since YYYY-MM-DD`, or `range X Y` |
| `/kasper improve [agent]` | Numbered table of improvement suggestions; pass `--dry-run` to preview without applying |
| `/kasper apply [n\|all]` | Apply pending improvements (`n` for an index, `all` for everything queued) |
| `/kasper history [agent]` | Session history with score breakdowns |
| `/kasper auto on\|off` | Toggle auto-apply for improvements in the current session |
| `/kasper migrate <name> [--show]` | Extract an inline `opencode.json` agent prompt to a file so kasper can edit it. With `--show`, just reports the current source |
| `/kasper reset` | Clear all state (asks you to confirm by running `/kasper reset --force`) |
| `/kasper help` | Show all commands |


| Tool | Description |
|---|---|
| `kasper_status` | Aggregate scores, per-agent breakdown, weaknesses |
| `kasper_improve` | Numbered improvement suggestions; supports `--dry-run` |
| `kasper_apply` | Apply by `[N]` index, or `all` |
| `kasper_history` | Adherence history and trends |
| `kasper_score_session` | Evaluate one or more sessions (single id, `last N`, `since YYYY-MM-DD`, `range X Y`) |
| `kasper_reset` | Clear all state |

Loaded from `~/.config/opencode/kasper.jsonc`, `.opencode/kasper.jsonc`, or the `kasper` key in `opencode.json`. Project values override global values; missing fields fall back to the defaults below.

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Master switch. Set to `false` to disable the plugin without uninstalling |
| `auto_update` | boolean | `true` | Auto-apply improvements to `AGENTS.md` / agent prompts. Set to `false` to require `/kasper apply` for every change |
| `scoring_threshold` | number (0.0–1.0) | `0.6` | Sessions scoring strictly below this trigger weakness detection and improvement suggestions |
| `model` | string | `opencode/deepseek-v4-flash-free` | Provider/model used as the LLM judge. Pick a fast, cheap model |
| `detail_level` | `minimal` \| `standard` \| `thorough` | `standard` | How much context the judge sees. `minimal` is cheapest, `thorough` is most accurate |


| Field | Type | Default | Description |
|---|---|---|---|
| `min_session_messages` | integer (1–50) | `1` | Skip sessions with fewer than this many user messages |
| `min_observations_for_update` | integer (1–10) | `2` | A weakness must be observed at least this many times before an improvement is generated |
| `weakness_decay_days` | integer (0–365) | `30` | After this many days without recurrence, weaknesses are forgotten. `0` disables decay |
| `evaluation_poll_interval_ms` | integer (1000–300000) | `10000` | How often the background loop scans for idle sessions |
| `quiet` | boolean | `false` | Suppress non-warning toast notifications |

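To make the interaction between `min_observations_for_update` and `weakness_decay_days` concrete, here is a minimal sketch of the gating rule those two fields describe. The `Weakness` shape and the function name are hypothetical, and the exact boundary handling (for example, whether a weakness last seen exactly 30 days ago has decayed) is an assumption rather than kasper's documented behaviour.

```typescript
// Hypothetical shape for a tracked weakness; kasper's real state format may differ.
interface Weakness {
  label: string;
  observations: number;
  lastSeen: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the weakness has been seen often enough and has not decayed.
function shouldGenerateImprovement(
  weakness: Weakness,
  now: Date,
  minObservations = 2, // min_observations_for_update default
  decayDays = 30, // weakness_decay_days default; 0 disables decay
): boolean {
  const ageDays = (now.getTime() - weakness.lastSeen.getTime()) / MS_PER_DAY;
  // Assumption: a weakness counts as forgotten once its age exceeds decayDays.
  const decayed = decayDays > 0 && ageDays > decayDays;
  return !decayed && weakness.observations >= minObservations;
}
```

With the defaults, a weakness seen twice within the last month qualifies, while one seen once, or one last seen two months ago, does not.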

| Field | Type | Default | Description |
|---|---|---|---|
| `scoring_retries` | integer (0–10) | `2` | Retries when the scoring model returns invalid JSON |
| `scoring_timeout_ms` | integer (10000–600000) | `120000` | Per-attempt timeout for the scoring model call |
| `max_score_input_chars` | integer (1000–50000) | `10000` | Cap on input sent to the scoring model per session |
| `max_agent_guidance_chars` | integer (200–5000) | `1200` | Cap on the size of each generated improvement (`AGENTS.md` or per-agent) |
| `improvement_expiry_days` | integer (0–365) | `60` | Inactive improvements are pruned after this many days. `0` = never expire |

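Since `scoring_retries` is driven by the judge returning invalid JSON, a validation step along the following lines could decide when an attempt counts as failed. This is only a sketch: the flat reply shape (one number per dimension) and both function names are assumptions, and kasper's real judge output may be structured differently. The model call itself and the per-attempt timeout are left out so the example stays synchronous.

```typescript
const DIMENSIONS = [
  "instruction_following",
  "completeness",
  "proactiveness",
  "code_quality",
  "communication",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
type Scores = Record<Dimension, number>;

// Returns validated scores, or null when the reply should count as a failed attempt.
function parseJudgeReply(raw: string): Scores | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const scores = {} as Scores;
  for (const dim of DIMENSIONS) {
    const value = record[dim];
    // Assumption: out-of-range or missing scores are treated like invalid JSON.
    if (typeof value !== "number" || value < 0 || value > 1) return null;
    scores[dim] = value;
  }
  return scores;
}

// Walks pre-collected replies: one initial attempt plus `retries` retries.
function firstValidReply(replies: string[], retries = 2): Scores | null {
  for (const raw of replies.slice(0, retries + 1)) {
    const parsed = parseJudgeReply(raw);
    if (parsed !== null) return parsed;
  }
  return null;
}
```

With the default of 2 retries, at most three replies are considered before the session is given up on.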

| Field | Type | Default | Description |
|---|---|---|---|
| `strict_sanitize` | boolean | `true` | Reject generated improvements containing URLs, code blocks, or instruction-injection markers |
| `agent_prompt_inject_mode` | `section` \| `inline` | `section` | How kasper writes improvements into an agent prompt file. `section` adds a visible `## Kasper Inferred Instructions` block at the end. `inline` appends the guidance directly with no section header, wrapped only in `<!-- kasper-injected:begin/end -->` HTML comments for dedupe and rollback. `AGENTS.md` injection is always `section` style |
| `debug` | boolean | `false` | Log SDK events and extra diagnostics. Disable in production |

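As a rough illustration of what `strict_sanitize` guards against, the sketch below rejects text containing a URL, a fenced code block, or an injection-style phrase. The patterns are illustrative guesses, especially the injection markers, which the documentation does not enumerate; `rejectionReason` is a hypothetical name, not part of kasper's API.

```typescript
// Illustrative patterns only; the marker list is a guess, not kasper's actual rule set.
const URL_PATTERN = /\bhttps?:\/\/\S+/i;
const CODE_FENCE_PATTERN = /`{3}|~{3}/;
const INJECTION_MARKERS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /<\/?system>/i,
];

// Returns why an improvement would be rejected, or null when it looks safe.
function rejectionReason(improvement: string): string | null {
  if (URL_PATTERN.test(improvement)) return "contains a URL";
  if (CODE_FENCE_PATTERN.test(improvement)) return "contains a code block";
  if (INJECTION_MARKERS.some((pattern) => pattern.test(improvement))) {
    return "contains an injection marker";
  }
  return null;
}
```

Returning a reason rather than a bare boolean makes it easier to log why a generated improvement was dropped.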

| Field | Type | Default | Description |
|---|---|---|---|
| `state_dir` | string | `.opencode/kasper` (relative to cwd) | Where session scores, state, and backups are written. Absolute path or project-relative |

Settings can live in `~/.config/opencode/kasper.jsonc` (recommended: full JSONC support, including comments) or in a nested `kasper` key in `opencode.json`:

```json
{
  "plugin": ["@atonev/opencode-kasper"],
  "kasper": {
    "auto_update": true,
    "scoring_threshold": 0.65,
    "model": "opencode/minimax-m2.5-free"
  }
}
```

The `kasper.jsonc` file always wins over the `kasper` key in `opencode.json` if both exist. Project-local config overrides global config; the global config file lives at `~/.config/opencode/kasper.jsonc` (or wherever `XDG_CONFIG_HOME` points).

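The precedence rules above can be pictured as a layered merge. The sketch below covers only a subset of fields and assumes that scope outranks file type, so any project-level source beats any global one; the text does not spell out how a global `kasper.jsonc` ranks against a project `kasper` key, so treat that ordering as a guess. `ConfigSources` and `resolveConfig` are hypothetical names.

```typescript
type KasperConfig = {
  enabled: boolean;
  auto_update: boolean;
  scoring_threshold: number;
  model: string;
};

// Documented defaults (subset only).
const DEFAULTS: KasperConfig = {
  enabled: true,
  auto_update: true,
  scoring_threshold: 0.6,
  model: "opencode/deepseek-v4-flash-free",
};

interface ConfigSources {
  globalKey?: Partial<KasperConfig>; // "kasper" key in the global opencode.json
  globalFile?: Partial<KasperConfig>; // ~/.config/opencode/kasper.jsonc
  projectKey?: Partial<KasperConfig>; // "kasper" key in the project opencode.json
  projectFile?: Partial<KasperConfig>; // .opencode/kasper.jsonc
}

// Later spreads win, so sources are listed from lowest to highest priority.
function resolveConfig(sources: ConfigSources): KasperConfig {
  return {
    ...DEFAULTS,
    ...sources.globalKey,
    ...sources.globalFile,
    ...sources.projectKey,
    ...sources.projectFile,
  };
}
```

Fields that no source sets keep their default, which matches the documented fallback behaviour.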
Kasper follows opencode's own agent resolution rules when deciding where to read and write an agent's prompt. For an agent named `<name>`:

- Project `opencode.json`: if `agent.<name>.prompt` is defined, that wins. Project config overrides global config.
- Global `opencode.json`: used if no project entry exists.
- Convention: if no `agent.<name>` is defined in either config, kasper falls back to:
  - `<projectRoot>/.opencode/agent/<name>.md`
  - `<projectRoot>/.opencode/agents/<name>.md`
  - `~/.config/opencode/agent/<name>.md`
  - `~/.config/opencode/agents/<name>.md`

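The lookup order above could be sketched roughly as follows. All names here are hypothetical, and two details are assumptions: that the convention paths are tried in the order listed, and that the first existing file wins. File existence is passed in as a callback so the example needs no real filesystem.

```typescript
import * as path from "node:path";

type PromptSource =
  | { kind: "config"; scope: "project" | "global"; prompt: string }
  | { kind: "convention"; file: string }
  | { kind: "unresolved" };

// Fallback locations, in the order they are listed above.
function conventionCandidates(projectRoot: string, configHome: string, name: string): string[] {
  return [
    path.posix.join(projectRoot, ".opencode", "agent", `${name}.md`),
    path.posix.join(projectRoot, ".opencode", "agents", `${name}.md`),
    path.posix.join(configHome, "agent", `${name}.md`),
    path.posix.join(configHome, "agents", `${name}.md`),
  ];
}

// Project config first, then global config, then the convention paths.
function resolvePromptSource(
  projectPrompt: string | undefined,
  globalPrompt: string | undefined,
  candidates: string[],
  exists: (file: string) => boolean,
): PromptSource {
  if (projectPrompt !== undefined) return { kind: "config", scope: "project", prompt: projectPrompt };
  if (globalPrompt !== undefined) return { kind: "config", scope: "global", prompt: globalPrompt };
  const file = candidates.find((candidate) => exists(candidate));
  return file === undefined ? { kind: "unresolved" } : { kind: "convention", file };
}
```

Here `configHome` stands for `~/.config/opencode` after `~` has been expanded.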
The `prompt` value is interpreted in three ways:

- Raw string: the prompt is inline. Kasper refuses to edit it; run `/kasper migrate <name>` to extract it to a file.
- `{file:/abs/path/to/prompt.md}`: the prompt is loaded from that file. Kasper reads and writes that exact file. `~` is expanded; relative paths resolve against the config file's directory.
- `{path:/abs/path/to/prompt.md}`: alias for `{file:...}`.

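A minimal sketch of how the three interpretations above might be told apart is shown below. The function and type names are hypothetical, and the regular expression is a guess at the directive syntax based only on the examples given here. POSIX path handling is used to keep the output predictable.

```typescript
import * as path from "node:path";

type PromptValue =
  | { kind: "inline"; text: string }
  | { kind: "file"; file: string };

// Classifies a prompt value as inline text or a {file:...} / {path:...} directive.
function interpretPrompt(value: string, configDir: string, homeDir: string): PromptValue {
  const raw = /^\{(?:file|path):(.+)\}$/.exec(value.trim())?.[1];
  if (raw === undefined) return { kind: "inline", text: value };

  let target = raw.trim();
  // Expand a leading "~" to the home directory.
  if (target === "~" || target.startsWith("~/")) {
    target = path.posix.join(homeDir, target.slice(1));
  }
  // Relative paths resolve against the config file's directory; absolute paths pass through.
  return { kind: "file", file: path.posix.resolve(configDir, target) };
}
```

An inline result is the case where kasper refuses to edit and `/kasper migrate <name>` is needed first.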
After `/kasper migrate`, the source `opencode.json` is rewritten to replace the inline prompt with a `{file:...}` directive, with comments and formatting preserved. Restart opencode for the new prompt file to take effect.

Why this matters: before this resolution existed, kasper would silently create an empty `<projectRoot>/.opencode/agents/<name>.md` whenever the real prompt was defined via `{file:...}`; the only signal that anything happened was a stray stub file with the injected section. The resolver eliminates that class of bug entirely.

How kasper writes an improvement into the resolved prompt file is controlled by `agent_prompt_inject_mode`:

- `section` (default): appends a clearly labelled block:

  ```markdown
  ## Kasper Inferred Instructions
  <!-- kasper: 2026-06-08T12:00:00.000Z -->
  <guidance>
  ```

  Self-documenting, easy to spot in a diff, easy to roll back via `/kasper rollback <agent>`.

- `inline`: appends the guidance directly at the end of the file with no section header:

  ```markdown
  <!-- kasper-injected:begin -->
  <guidance>
  <!-- kasper-injected:end -->
  ```

  The HTML comments exist only for dedupe and rollback bookkeeping; they render as nothing in the prompt. Use this when the visible `## Kasper Inferred Instructions` heading is unwanted (e.g. you have a strict prompt structure that you don't want kasper touching).

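The two injection modes can be illustrated with a small renderer. This is a sketch only: `renderInjection` and `stripInlineInjection` are hypothetical names, and the exact whitespace kasper emits around the markers is an assumption.

```typescript
type InjectMode = "section" | "inline";

// Builds the text that would be appended to the prompt file for each mode.
function renderInjection(mode: InjectMode, guidance: string, now: Date): string {
  if (mode === "section") {
    return [
      "## Kasper Inferred Instructions",
      `<!-- kasper: ${now.toISOString()} -->`,
      guidance,
    ].join("\n");
  }
  return ["<!-- kasper-injected:begin -->", guidance, "<!-- kasper-injected:end -->"].join("\n");
}

// Removes previously injected inline blocks, e.g. before re-injecting or on rollback.
function stripInlineInjection(prompt: string): string {
  return prompt
    .replace(/\n*<!-- kasper-injected:begin -->[\s\S]*?<!-- kasper-injected:end -->\n*/g, "\n")
    .trimEnd();
}
```

The begin/end markers are what make the inline block removable without a heading to search for.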
Each session is scored 0.0–1.0 across five dimensions:

| Dimension | Description |
|---|---|
| `instruction_following` | Did the agent do exactly what was asked? |
| `completeness` | Did the agent fully complete the task? |
| `proactiveness` | Did the agent act appropriately? |
| `code_quality` | Quality and maintainability of code produced |
| `communication` | Clarity and helpfulness of explanations |

Scores display as 🟢 ≥80%, 🟡 ≥60%, 🔴 <60%. The `/kasper status` command shows an ASCII sparkline of the last 7 session scores.
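The display rules can be sketched in a few lines. The badge thresholds come straight from the text; the unweighted mean used for the overall score and the character ramp used for the sparkline are assumptions made for illustration, since neither is specified here.

```typescript
// Traffic-light badge using the documented thresholds.
function badge(score: number): string {
  if (score >= 0.8) return "🟢";
  if (score >= 0.6) return "🟡";
  return "🔴";
}

// Assumption: the session score is the unweighted mean of the dimension scores.
function overall(dimensionScores: number[]): number {
  if (dimensionScores.length === 0) return 0;
  return dimensionScores.reduce((sum, s) => sum + s, 0) / dimensionScores.length;
}

// Hypothetical ASCII ramp from low to high; kasper's actual characters may differ.
const RAMP = "_.-=+*#";

// Renders the most recent `width` scores as one character each.
function sparkline(scores: number[], width = 7): string {
  return scores
    .slice(-width)
    .map((s) => {
      const clamped = Math.min(1, Math.max(0, s));
      const index = Math.min(RAMP.length - 1, Math.floor(clamped * RAMP.length));
      return RAMP.charAt(index);
    })
    .join("");
}
```

A score of exactly 0.6 renders as 🟡, which is consistent with `scoring_threshold` treating only scores strictly below the threshold as weak.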

- Observe: Hooks on `chat.message` and `session.created` accumulate session data; configurable polling catches idle sessions
- Evaluate: LLM-as-judge scores each session across 5 dimensions; large sessions are split into per-pair evaluation
- Improve: Recurring weaknesses trigger suggestions (`AGENTS.md` or per-agent prompt); auto-applied or queued for review
- Measure: Score delta tracking shows before/after improvement impact in `/kasper history`

The `/kasper status` In Progress banner surfaces what's happening right now: paused state, the active evaluation pass (with elapsed time and queued session count), the cross-session weakness merge LLM call, and any pending improvements waiting for the next auto-update tick.

- Forward-looking only. Only sessions created after plugin start are auto-scored. Use `/kasper score session last <N>` for retroactive batch scoring.
- Current config only. Scoring uses today's `AGENTS.md` and prompts, not the versions active when the session originally ran.
- Subagents. Subagent sessions are always eligible for auto-scoring. A subagent session needs only 1 user message to be picked up (primary sessions use `min_session_messages`). The result is rolled up into the parent agent's per-agent stats.

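Taken together, the rules above suggest an eligibility check along these lines. The `SessionInfo` shape and the function name are hypothetical, and the idle or complete check that the evaluator also applies is omitted to keep the sketch short.

```typescript
// Hypothetical summary of a session; kasper's internal representation may differ.
interface SessionInfo {
  createdAt: Date;
  userMessages: number;
  isSubagent: boolean;
}

// True when a session would be picked up for automatic scoring.
function isAutoScorable(
  session: SessionInfo,
  pluginStartedAt: Date,
  minSessionMessages = 1, // min_session_messages default
): boolean {
  // Forward-looking only: sessions created before plugin start are skipped.
  if (session.createdAt.getTime() <= pluginStartedAt.getTime()) return false;
  // Subagent sessions need a single user message; primary sessions use the configured minimum.
  const required = session.isSubagent ? 1 : minSessionMessages;
  return session.userMessages >= required;
}
```

Sessions rejected by the first rule can still be scored manually with `/kasper score session`.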
```bash
bun install          # Install dependencies
bun run build        # Compile TypeScript
bun run typecheck    # Type-check only
bun run lint         # Lint with biome
bun test             # 393 unit tests (387 pass, 6 skip)
bun run test:e2e     # End-to-end tests (requires OPENCODE_E2E=1 and the opencode binary)
```

MIT; see LICENSE.

Example kasper configuration:

```jsonc
{
  "enabled": true,
  "auto_update": true,
  "scoring_threshold": 0.65,
  "model": "opencode/minimax-m2.5-free",
  "min_observations_for_update": 3,
  "quiet": true,
  "state_dir": "/var/lib/kasper"
}
```