docs(skill-eval): ailly-skill-eval authoring skill and guide #195
Merged
Implement the three invocation `script` checkers (newtype, configuring-logging, emitting-logs) as strict TypeScript structural validators derived from each SKILL.md "Common Mistakes" section, replacing the placeholder checkers. Pin the candidate project to TypeScript via context/AGENTS.md so the checkers target a known grammar. Turn the README's falsification claim into a checked invariant: ci.sh asserts `improved > 0` for the invocation arm and hard-fails without an API key rather than silently skipping. Make the checkers idiom-robust (fenced-code extraction, expression-bodied arrow constructors, brand-helper variants). Includes a cleanup pass: clippy clean, fixed stale assemble system-message counts, and extracted deferred decisions (no-key CI policy, non-TS checkers) to TASKS.md.

Co-Authored-By: "Ailly <[email protected]>"
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
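The checkers themselves are TypeScript and their code is not shown here. As an illustration in Rust (the repository's own language), the sketch below shows what a "fenced-code extraction" step can look like: pull the bodies of fenced blocks tagged with an accepted language out of a model response, ignore prose and other-language fences, and hand only the code to structural validation. The name `extract_fenced` and the accepted tags are hypothetical; the real checkers may treat `~~~` fences, indented code, or unterminated blocks differently.

```rust
/// Scanner state for the hypothetical `extract_fenced` helper.
enum State {
    Outside,      // not inside any fence
    Keep(String), // inside a fence whose tag is accepted; accumulating its body
    Skip,         // inside a fence with some other tag (e.g. ```json)
}

/// Hypothetical helper: return the bodies of ``` fenced blocks whose info
/// string's first word matches one of `langs` (case-insensitive).
/// An unterminated block at the end of `text` is dropped.
fn extract_fenced(text: &str, langs: &[&str]) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut state = State::Outside;
    for line in text.lines() {
        let trimmed = line.trim_start();
        let is_fence = trimmed.starts_with("```");
        state = match state {
            State::Outside if is_fence => {
                // Info string: first word after the backticks, e.g. "ts".
                let info = trimmed.trim_start_matches('`').split_whitespace().next().unwrap_or("");
                if langs.iter().any(|l| info.eq_ignore_ascii_case(l)) {
                    State::Keep(String::new())
                } else {
                    State::Skip
                }
            }
            State::Outside => State::Outside,
            State::Keep(body) if is_fence => {
                // Closing fence: the accepted block is complete.
                blocks.push(body);
                State::Outside
            }
            State::Keep(mut body) => {
                body.push_str(line);
                body.push('\n');
                State::Keep(body)
            }
            State::Skip if is_fence => State::Outside,
            State::Skip => State::Skip,
        };
    }
    blocks
}
```

Tracking "inside a skipped fence" as its own state matters: without it, the closing fence of a `json` block would be misread as the opening of a new block.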
Self-contained skill at skills/ailly-skill-eval/ that teaches the patterns-eval method generally: a holistic SKILL.md (project anatomy, the discovery/invocation axes, the assertion palette, falsification as an optional layer, the assemble/run/eval/report workflow) and a long-form references/method.md walking each part with its rationale, sourced to what e2e/patterns-eval/ actually builds. Both files link out to DESIGN.md for schema rather than restating it. The guide names the built falsification arm baseline.yaml per the Fidelity rule and states the improved > 0 && regressed == 0 gate with the null-result reading verbatim. Tracks the new skills/ tree in .gitignore. Feature test tests/skill_eval_guide.rs asserts the artifact is reconstructable from its directory alone; queues the review-and-refactor follow-up in TASKS.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Co-Authored-By: "Ailly <[email protected]>"
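A minimal sketch of the pass/fail logic, assuming the eval reports per-arm `improved` and `regressed` counts: the gate the guide states, plus the no-key policy from the checker commit (fail loudly instead of skipping). `ArmCounts`, `gate_passes`, and `require_api_key` are hypothetical names; in the repository the invocation-arm assertion lives in `ci.sh`, not in Rust.

```rust
/// Hypothetical per-arm summary: how many cases got better or worse
/// relative to the baseline arm.
#[derive(Debug, Clone, Copy)]
struct ArmCounts {
    improved: u32,
    regressed: u32,
}

/// The gate as stated in the guide: something must improve and nothing may regress.
fn gate_passes(c: ArmCounts) -> bool {
    c.improved > 0 && c.regressed == 0
}

/// Hypothetical no-key policy: a missing or blank key is an error, never a skip.
/// The key is passed in rather than read from a specific environment variable,
/// since the variable name is not given here.
fn require_api_key(key: Option<&str>) -> Result<&str, String> {
    match key {
        Some(k) if !k.trim().is_empty() => Ok(k),
        _ => Err("no API key: refusing to skip the invocation arm".to_string()),
    }
}
```

Returning an error for a missing key, instead of an early success, is what keeps a keyless CI run from reporting a green result that never exercised the arm.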
…o TASKS

Collapse nested `if`/`if let` into a single guard in `link_targets` per clippy::collapsible_if, and suppress `too_many_lines` on the monolithic feature test with an `#[expect]` and a reason explaining why splitting would harm the assertion-to-metric mapping. Adds two Follow-up entries from the design's deferred decisions: patterns-eval README reconciliation (baseline.yaml vs. invocation-baseline) and the ailly-skill-eval paired-skills split trigger.

Co-Authored-By: Ailly <[email protected]>
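For illustration, a hypothetical `link_targets` that scans inline Markdown links and keeps relative targets, using one `&&` guard of the kind `clippy::collapsible_if` asks for, plus a hypothetical `missing_targets` helper that reports relative targets not shipped in the skill directory. The signatures and filtering rules (skip targets with a `://` scheme, strip `#fragment`, skip pure in-page anchors) are assumptions, not the feature test's actual code; reference-style links and targets containing `)` are not handled.

```rust
/// Hypothetical signature: targets of inline links `[text](target)` that are
/// relative paths, with any `#fragment` removed.
fn link_targets(markdown: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        let end = match after.find(')') {
            Some(end) => end,
            None => break, // unterminated link: stop scanning
        };
        // `method.md#axes` is checked as `method.md`; `#summary` becomes "".
        let target = after[..end].split('#').next().unwrap_or("");
        // A single guard instead of nested `if`s (clippy::collapsible_if).
        if !target.is_empty() && !target.contains("://") {
            targets.push(target);
        }
        rest = &after[end + 1..];
    }
    targets
}

/// Hypothetical check: relative targets that are neither present in the
/// skill directory nor explicitly allowed to point outside it.
fn missing_targets<'a>(markdown: &'a str, present: &[&str], allowed_external: &[&str]) -> Vec<&'a str> {
    link_targets(markdown)
        .into_iter()
        .filter(|t| !present.iter().any(|p| p == t) && !allowed_external.iter().any(|a| a == t))
        .collect()
}
```

An explicit allow-list for outside targets fits the described layout, where both files deliberately link out to DESIGN.md while everything else should resolve inside the skill directory.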
Summary
- Adds `skills/ailly-skill-eval/SKILL.md` and `skills/ailly-skill-eval/references/method.md` so that a maintainer handed only that directory can reconstruct the skill-eval method end to end.
- Resolves the clippy findings in `tests/skill_eval_guide.rs` (`collapsible_if` fixed, `too_many_lines` suppressed with a reasoned `#[expect]`).
- Adds two `TASKS.md` Follow-ups (patterns-eval README reconciliation and the paired-skills split trigger).

Test plan
- `cargo test --test skill_eval_guide` passes
- `mise run lint` passes (no warnings)
- `TASKS.md` Follow-ups section contains `patterns-eval README reconciliation` and `ailly-skill-eval paired-skills split`

🤖 Generated with Claude Code