docs(skill-eval): ailly-skill-eval authoring skill and guide by DavidSouther · Pull Request #195 · DavidSouther/ailly

DavidSouther · 2026-06-02T14:41:52Z

Summary

Adds skills/ailly-skill-eval/SKILL.md and skills/ailly-skill-eval/references/method.md so a maintainer handed only that directory can reconstruct the skill-eval method end to end.
Fixes two clippy findings in tests/skill_eval_guide.rs (collapsible_if and too_many_lines).
Extracts two deferred decisions from the design doc into TASKS.md Follow-ups (patterns-eval README reconciliation and the paired-skills split trigger).

Test plan

cargo test --test skill_eval_guide passes
mise run lint passes (no warnings)
TASKS.md Follow-ups section contains patterns-eval README reconciliation and ailly-skill-eval paired-skills split
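The last test-plan item can be checked mechanically rather than by eye. Below is a minimal sketch in Rust. It assumes TASKS.md uses a second-level `## Follow-ups` heading. The `section` helper and the substring matching are hypothetical illustrations, not the repository's actual test.

```rust
/// Hypothetical helper: return the body of the markdown section whose heading
/// line equals `heading`, up to (not including) the next heading at the same
/// or a shallower level. Deeper subheadings stay inside the section.
fn section<'a>(markdown: &'a str, heading: &str) -> Option<&'a str> {
    // Heading level = number of leading '#' characters (2 for "## Follow-ups").
    let level = heading.chars().take_while(|&c| c == '#').count();
    let mut offset = 0; // byte offset of the current line's start
    let mut body_start = None; // byte offset just after the matched heading line
    for line in markdown.split_inclusive('\n') {
        let text = line.trim_end();
        if let Some(start) = body_start {
            let hashes = text.chars().take_while(|&c| c == '#').count();
            // A heading at the same or a shallower level ends the section.
            if hashes > 0 && hashes <= level && text[hashes..].starts_with(' ') {
                return Some(&markdown[start..offset]);
            }
        } else if text == heading {
            body_start = Some(offset + line.len());
        }
        offset += line.len();
    }
    // The section runs to the end of the file, or the heading was never found.
    body_start.map(|start| &markdown[start..])
}
```

A test would then assert that `section(&tasks, "## Follow-ups")` contains both entry titles.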

🤖 Generated with Claude Code

Implement the three invocation `script` checkers (newtype, configuring-logging, emitting-logs) as strict TypeScript structural validators derived from each SKILL.md "Common Mistakes" section, replacing the placeholder checkers. Pin the candidate project to TypeScript via context/AGENTS.md so the checkers target a known grammar. Turn the README's falsification claim into a checked invariant: ci.sh asserts `improved > 0` for the invocation arm and hard-fails without an API key rather than silently skipping. Make the checkers idiom-robust (fenced-code extraction, expression-bodied arrow constructors, brand-helper variants). Includes a cleanup pass: clippy clean, stale `assemble` system-message counts fixed, and deferred decisions (no-key CI policy, non-TS checkers) extracted to TASKS.md.

Co-Authored-By: "Ailly <[email protected]>"
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
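Fenced-code extraction lets a checker validate only the code a model emitted and ignore the surrounding prose. Below is a minimal sketch in Rust. It assumes responses use triple-backtick fences with an optional info string such as `ts`. The `fenced_blocks` name and the language filter are hypothetical, and the PR's checkers may extract code differently.

```rust
/// Hypothetical sketch: collect the bodies of fenced code blocks whose info
/// string's first word is one of `langs` (an empty `langs` accepts every
/// block). An unterminated fence is dropped.
fn fenced_blocks(response: &str, langs: &[&str]) -> Vec<String> {
    const FENCE: &str = "```";
    let mut blocks = Vec::new();
    // `Some((keep, body))` while inside a fence; `keep` records whether the
    // opening info string matched `langs`.
    let mut current: Option<(bool, Vec<&str>)> = None;
    for line in response.lines() {
        let trimmed = line.trim_start();
        match current.take() {
            None => {
                if let Some(info) = trimmed.strip_prefix(FENCE) {
                    let lang = info.split_whitespace().next().unwrap_or("");
                    let keep = langs.is_empty() || langs.contains(&lang);
                    current = Some((keep, Vec::new()));
                }
            }
            // A fence line while inside a block closes it.
            Some((keep, body)) if trimmed.starts_with(FENCE) => {
                if keep {
                    blocks.push(body.join("\n"));
                }
            }
            Some((keep, mut body)) => {
                body.push(line);
                current = Some((keep, body));
            }
        }
    }
    blocks
}
```

Accepting both `ts` and `typescript` as info strings is one example of the idiom robustness the commit describes. The structural checks would then run on the extracted text only.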

Self-contained skill at skills/ailly-skill-eval/ that teaches the patterns-eval method in general: a holistic SKILL.md (project anatomy, the discovery/invocation axes, the assertion palette, falsification as an optional layer, and the assemble/run/eval/report workflow) and a long-form references/method.md that walks through each part with its rationale, sourced to what e2e/patterns-eval/ actually builds. Both files link out to DESIGN.md for the schema rather than restating it. The guide names the built falsification arm baseline.yaml per the Fidelity rule and states the `improved > 0 && regressed == 0` gate with the null-result reading verbatim. Tracks the new skills/ tree in .gitignore. The feature test tests/skill_eval_guide.rs asserts that the artifact is reconstructable from its directory alone; the review-and-refactor follow-up is queued in TASKS.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Co-Authored-By: "Ailly <[email protected]>"
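The `improved > 0 && regressed == 0` gate can be expressed as a small classifier. Below is a minimal sketch in Rust. The `Verdict` names are hypothetical. Treating "no improvements and no regressions" as the null result is an assumption about what the guide's null-result reading covers, and the guide's own wording is not reproduced here.

```rust
/// Hypothetical outcome labels for one eval arm.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    /// At least one case improved and none regressed: the gate passes.
    Pass,
    /// Nothing moved in either direction (assumed null result); not a pass.
    NullResult,
    /// At least one case regressed; fails regardless of improvements.
    Regressed,
}

fn gate(improved: u32, regressed: u32) -> Verdict {
    if improved > 0 && regressed == 0 {
        Verdict::Pass
    } else if improved == 0 && regressed == 0 {
        Verdict::NullResult
    } else {
        // Both earlier branches cover regressed == 0, so here regressed > 0.
        Verdict::Regressed
    }
}
```

Separating the null result from a regression means a run in which the skill changed nothing is reported as exactly that, rather than silently passing or being mistaken for a regression.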

…o TASKS

Collapse the nested `if`/`if let` in `link_targets` into a single guard per `clippy::collapsible_if`, and suppress `too_many_lines` on the monolithic feature test with an `#[expect]` attribute whose reason explains why splitting the test would harm the assertion-to-metric mapping. Adds two Follow-up entries from the design's deferred decisions: patterns-eval README reconciliation (baseline.yaml vs. invocation-baseline) and the ailly-skill-eval paired-skills split trigger.

Co-Authored-By: Ailly <[email protected]>
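The `collapsible_if` fix replaces a nested `if`/`if let` with one condition. Below is a minimal sketch of that shape. It assumes `link_targets` pulls inline-link targets out of markdown; the body is hypothetical and is not the PR's code. It expresses the single guard with `Option::filter`, which compiles on any edition. On the 2024 edition, clippy may instead suggest a let chain.

```rust
/// Hypothetical sketch: return the targets of inline markdown links
/// (`[text](target)`), skipping empty targets.
fn link_targets(markdown: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        // One guard instead of `if let Some(end) = ... { if end > 0 { ... } }`:
        // `filter` rejects the empty-target case within the same condition.
        if let Some(end) = after.find(')').filter(|&end| end > 0) {
            targets.push(&after[..end]);
        }
        rest = after;
    }
    targets
}
```

For the other finding, `#[expect(clippy::too_many_lines, reason = "...")]` differs from `#[allow]` in one respect: it triggers the `unfulfilled_lint_expectations` warning if the lint stops firing. A later split of the test therefore surfaces the stale suppression.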

DavidSouther and others added 3 commits June 1, 2026 15:05

DavidSouther merged commit 5126354 into main_two Jun 2, 2026
1 of 3 checks passed

DavidSouther deleted the 2026-06-01-A-skill-testing-docs branch June 2, 2026 15:35


DavidSouther merged 3 commits into main_two from 2026-06-01-A-skill-testing-docs

DavidSouther commented Jun 2, 2026


1 participant
