docs(skill-eval): ailly-skill-eval authoring skill and guide #195

Merged
DavidSouther merged 3 commits into main_two from 2026-06-01-A-skill-testing-docs
Jun 2, 2026
Conversation

@DavidSouther

Owner

Summary

  • Adds skills/ailly-skill-eval/SKILL.md and skills/ailly-skill-eval/references/method.md so a maintainer handed only that directory can reconstruct the skill-eval method end to end.
  • Fixes two clippy findings in tests/skill_eval_guide.rs (collapsible_if and too_many_lines).
  • Extracts two deferred decisions from the design doc into TASKS.md Follow-ups (patterns-eval README reconciliation and the paired-skills split trigger).

Test plan

  • cargo test --test skill_eval_guide passes
  • mise run lint passes (no warnings)
  • TASKS.md Follow-ups section contains patterns-eval README reconciliation and ailly-skill-eval paired-skills split

🤖 Generated with Claude Code

DavidSouther and others added 3 commits June 1, 2026 15:05
Implement the three invocation `script` checkers (newtype, configuring-logging,
emitting-logs) as strict TypeScript structural validators derived from each
SKILL.md "Common Mistakes" section, replacing the placeholder checkers. Pin the
candidate project to TypeScript via context/AGENTS.md so checkers target a known
grammar. Turn the README's falsification claim into a checked invariant: ci.sh
asserts `improved > 0` for the invocation arm and hard-fails without an API key
rather than silently skipping. Make checkers idiom-robust (fenced-code
extraction, expression-bodied arrow constructors, brand-helper variants).

Includes cleanup pass: clippy clean, fixed stale assemble system-message
counts, extracted deferred decisions (no-key CI policy, non-TS checkers) to
TASKS.md.

Co-Authored-By: "Ailly <[email protected]>"
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Self-contained skill at skills/ailly-skill-eval/ that teaches the
patterns-eval method generally: a holistic SKILL.md (project anatomy,
the discovery/invocation axes, the assertion palette, falsification as
an optional layer, the assemble/run/eval/report workflow) and a
long-form references/method.md walking each part with its rationale,
sourced to what e2e/patterns-eval/ actually builds.

Both files link out to DESIGN.md for schema rather than restating it.
The guide names the built falsification arm baseline.yaml per the
Fidelity rule and states the improved > 0 && regressed == 0 gate with
the null-result reading verbatim.
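
The gate named above can be sketched as a small predicate. This is an illustration only: the `ArmDelta` type and the `gate_passes` function are hypothetical names, not taken from the guide or from ci.sh, and the sketch assumes the two counters are plain per-case tallies comparing the skill arm against the baseline arm.

```rust
/// Hypothetical tally of per-case outcomes when the skill arm is compared
/// against the baseline arm. Field names mirror the gate wording only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArmDelta {
    /// Cases that fail under the baseline but pass with the skill (assumed meaning).
    pub improved: u32,
    /// Cases that pass under the baseline but fail with the skill (assumed meaning).
    pub regressed: u32,
}

/// Returns true only when the skill helped on at least one case
/// and made no case worse: `improved > 0 && regressed == 0`.
pub fn gate_passes(delta: ArmDelta) -> bool {
    delta.improved > 0 && delta.regressed == 0
}
```

Under these assumptions, a run with `improved == 0` and `regressed == 0` does not pass the gate, so a run in which the skill changes nothing is not counted as a success.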

Tracks the new skills/ tree in .gitignore. Feature test
tests/skill_eval_guide.rs asserts the artifact is reconstructable from
its directory alone; queues the review-and-refactor follow-up in TASKS.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Co-Authored-By: "Ailly <[email protected]>"
…o TASKS

Collapse nested `if`/`if let` into a single guard in `link_targets` per
clippy::collapsible_if, and suppress `too_many_lines` on the monolithic
feature test with an `#[expect]` + reason explaining why splitting would
harm the assertion-to-metric mapping.
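
As a rough illustration of the shape of this change, the sketch below collapses a nested `if` / `if let` into a single condition. The real body of `link_targets` is not shown in this PR, so the function here is a hypothetical stand-in: the `parse_target` helper, the Markdown-link parsing, and the "skip http links" condition are all assumptions made for the example. The actual code may use a different form (for instance a let-chain); this version uses `Option::filter` so it compiles on any edition.

```rust
/// Hypothetical helper: extracts the target of the first Markdown link
/// `[text](target)` on a line, if there is one.
fn parse_target(line: &str) -> Option<&str> {
    let start = line.find("](")? + 2;
    let len = line[start..].find(')')?;
    Some(&line[start..start + len])
}

/// Hypothetical stand-in for `link_targets`: collects relative link targets.
///
/// The nested form that clippy::collapsible_if flags would look like:
///
///     if let Some(target) = parse_target(line) {
///         if !target.starts_with("http") {
///             targets.push(target);
///         }
///     }
///
/// Below, both checks are folded into one condition.
pub fn link_targets(markdown: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    for line in markdown.lines() {
        if let Some(target) = parse_target(line).filter(|t| !t.starts_with("http")) {
            targets.push(target);
        }
    }
    targets
}

// The lint suppression described above has this general shape (shown as a
// comment because a function this short would leave the expectation unfulfilled):
//
//     #[expect(clippy::too_many_lines, reason = "<why the test stays monolithic>")]
//     fn some_long_feature_test() { /* ... */ }
```

Unlike `#[allow]`, `#[expect]` reports a warning if the lint stops firing, so the suppression does not outlive the condition that justified it.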

Adds two Follow-up entries from the design's deferred decisions:
patterns-eval README reconciliation (baseline.yaml vs. invocation-baseline)
and the ailly-skill-eval paired-skills split trigger.

Co-Authored-By: Ailly <[email protected]>
@DavidSouther DavidSouther merged commit 5126354 into main_two Jun 2, 2026
1 of 3 checks passed
@DavidSouther DavidSouther deleted the 2026-06-01-A-skill-testing-docs branch June 2, 2026 15:35