Name	Name	Last commit message	Last commit date
parent directory ..
.claude-plugin	.claude-plugin
.codex-plugin	.codex-plugin
agents	agents
commands	commands
scripts	scripts
skills/evaluation-methodology	skills/evaluation-methodology
src/plugin_eval	src/plugin_eval
tests	tests
README.md	README.md
pyproject.toml	pyproject.toml
uv.lock	uv.lock

Name

Last commit message

Last commit date

skills/evaluation-methodology

plugin-eval

Three-layer quality evaluation framework for Claude Code plugins.

Quick Start

cd plugins/plugin-eval
uv sync

# Evaluate a skill (static only, instant)
uv run plugin-eval score path/to/skill --depth quick

# Evaluate with LLM judge (~30s)
uv run plugin-eval score path/to/skill --depth standard

# Full certification (all layers, ~5 min)
uv run plugin-eval certify path/to/skill

Layers

Static Analysis — Structural checks, anti-pattern detection. Instant, free.
LLM Judge — Semantic evaluation (triggering, orchestration, output, scope). ~30s, 4 calls.
Monte Carlo — Statistical reliability via 50–100 simulated runs. ~2–5 min.

Commands

CLI	Claude Code	Description
`plugin-eval score`	`/eval`	Score a plugin or skill
`plugin-eval certify`	`/certify`	Full certification with badge
`plugin-eval compare`	`/compare`	Head-to-head comparison
`plugin-eval init`	—	Build corpus for Elo ranking

Documentation

See docs/plugin-eval.md for the full reference: layers, dimensions, scoring formula, anti-patterns, statistical methods, and project structure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

plugin-eval

Quick Start

Layers

Commands

Documentation

FilesExpand file tree

plugin-eval

Directory actions

More options

Directory actions

More options

Latest commit

History

plugin-eval

Folders and files

parent directory

README.md

plugin-eval

Quick Start

Layers

Commands

Documentation