skill-validator: restore 15K aggregate cap as the real Copilot CLI skill-menu budget #803

Open
Evangelink wants to merge 4 commits into dotnet:main from Evangelink:restore-15k-skill-budget

@Evangelink

Member

Why

The skill-validator's per-plugin aggregate description cap (SkillProfiler.MaxAggregateDescriptionLength) had been raised 15,000 → 20,000 → 22,000, justified by a code comment asserting that 15K was "a local repo policy, NOT a documented Copilot/agentskills constraint."

That assertion is wrong. The GitHub Copilot CLI renders the model-facing <available_skills> menu under a hard 15,000-character budget (the agent SDK's SKILL_CHAR_BUDGET, default 15e3, confirmed in CLI 1.0.36 and 1.0.61). Skills are listed alphabetically by name and emitted with their full <description> only until the budget is exhausted; every skill past the cut-off collapses to a bare name with no description and can no longer be reliably model-activated.

Raising the validator cap didn't add headroom; it masked silent menu truncation. This is the root cause behind the dotnet-test plugin-arm activation failures (e.g. run-tests, test-*): these skills sit alphabetically late, fell into the name-only overflow, and never activated in plugin eval runs even though they activate fine in isolation. Description tuning can't fix that, because the description is never shown.
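
To make the truncation concrete, here is a minimal Python sketch of the menu behavior described above. It is not the CLI's actual implementation: the accounting rule (only description characters count, and everything after the first description that does not fit becomes name-only) is an assumption, and every skill name except run-tests is hypothetical.

```python
from dataclasses import dataclass

SKILL_CHAR_BUDGET = 15_000  # default reported in this PR description


@dataclass
class Skill:
    name: str
    description: str


def render_menu(skills, budget=SKILL_CHAR_BUDGET):
    """Return (name, description_or_None) pairs in menu order.

    Assumption: only description characters count against the budget, and
    once one description does not fit, all later skills are name-only.
    """
    used = 0
    overflowed = False
    menu = []
    for skill in sorted(skills, key=lambda s: s.name):
        if not overflowed and used + len(skill.description) <= budget:
            used += len(skill.description)
            menu.append((skill.name, skill.description))
        else:
            overflowed = True
            menu.append((skill.name, None))
    return menu


# Hypothetical plugin: two alphabetically early skills use 14,000 characters.
skills = [
    Skill("run-tests", "r" * 4_000),
    Skill("analyze-coverage", "a" * 8_000),
    Skill("build-project", "b" * 6_000),
    Skill("test-filtering", "t" * 2_000),
]
menu = render_menu(skills)
name_only = [name for name, description in menu if description is None]
print(name_only)  # ['run-tests', 'test-filtering']
```

Under these assumptions the alphabetically late skills lose their descriptions regardless of how well those descriptions are written, which matches the plugin-arm symptom reported for dotnet-test.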

What

  • SkillProfiler.MaxAggregateDescriptionLength: 22,000 → 15,000, with the comment rewritten to document the real Copilot CLI budget (and correct the prior claim).
  • Aggregate now excludes disable-model-invocation: true skills. The CLI drops those from the menu entirely, so they don't consume the budget. This makes the cap satisfiable by hiding reference / agent-orchestrated primitives rather than only by trimming descriptions.
  • InvestigatingResults.md: documents plugin-arm-only non-activation caused by skill-menu budget overflow, and how to fix it.
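
The aggregate rule in the list above can be sketched as follows. This is an illustrative Python model rather than the validator's C# code: the dictionary shape, the helper names, and the skill names test-reference and test-primitive are hypothetical, and the lengths are chosen only to total the ~20.7K figure mentioned for dotnet-test.

```python
MAX_AGGREGATE_DESCRIPTION_LENGTH = 15_000  # mirrors the restored cap


def aggregate_description_length(skills):
    """Sum description lengths of skills that remain visible in the menu."""
    return sum(
        len(skill["description"])
        for skill in skills
        if not skill.get("disable-model-invocation", False)
    )


def check_plugin(skills, cap=MAX_AGGREGATE_DESCRIPTION_LENGTH):
    """Return (passes, visible_total) for one plugin's skills."""
    total = aggregate_description_length(skills)
    return total <= cap, total


# Hypothetical plugin totalling 20,700 description characters.
plugin = [
    {"name": "run-tests", "description": "x" * 9_000},
    {"name": "test-reference", "description": "y" * 7_000},
    {"name": "test-primitive", "description": "z" * 4_700},
]
ok_before, total_before = check_plugin(plugin)  # over the cap

# Hide the two reference/primitive skills from the model-facing menu.
plugin[1]["disable-model-invocation"] = True
plugin[2]["disable-model-invocation"] = True
ok_after, total_after = check_plugin(plugin)  # under the cap

print(ok_before, total_before)  # False 20700
print(ok_after, total_after)    # True 9000
```

The drop in the total equals the combined description length of the two hidden skills, which is the same property the Verification section reports checking.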

⚠️ Sequencing

dotnet-test currently aggregates ~20.7K chars (the only plugin over 15K), so skill-check will fail for it until it is slimmed below the cap — via disable-model-invocation on reference/primitive skills (see #800) plus description trims. This PR should merge once dotnet-test is ≤ 15K visible. All other plugins are already under the cap (next largest: dotnet-msbuild at ~14.5K).

Verification

  • skill-validator builds clean (0 warnings).
  • Confirmed the cap is enforced and that disable-model-invocation skills are excluded from the aggregate (flagging two reference skills dropped the reported total by exactly their description lengths).

…opilot CLI skill-menu budget

The per-plugin aggregate description cap had been raised 15,000 -> 20,000
-> 22,000 under the belief that 15K was 'a local repo policy, NOT a
documented Copilot constraint'. That belief was wrong: the GitHub Copilot
CLI renders the model-facing <available_skills> menu under a hard 15,000-
char budget (the agent SDK's SKILL_CHAR_BUDGET, default 15e3, confirmed in
CLI 1.0.36 and 1.0.61). Skills are listed alphabetically and emitted with
their full <description> only until the budget is exhausted; every skill
past the cut-off collapses to a bare name with no description and can no
longer be reliably model-activated. Raising the validator cap merely
masked this silent menu truncation — e.g. dotnet-test's run-tests and
test-* skills stopped activating in plugin eval runs because they fell
into the name-only overflow.

Changes:
- SkillProfiler.MaxAggregateDescriptionLength: 22,000 -> 15,000, with the
  comment rewritten to document the real Copilot CLI budget (and correct
  the prior 'not a documented constraint' claim).
- CheckCommand aggregate now excludes skills marked
  'disable-model-invocation: true' — the CLI drops those from the menu, so
  they do not consume the budget. This makes the cap satisfiable by hiding
  reference / agent-orchestrated primitives rather than only by trimming.
- InvestigatingResults.md: document plugin-arm-only non-activation caused
  by skill-menu budget overflow, and how to fix it.

Note: dotnet-test currently exceeds 15K and must be slimmed below it
(via disable-model-invocation on reference/primitive skills plus
description trims) before this cap can go green repo-wide.

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings June 22, 2026 15:37
@github-actions

Contributor

Note

This PR is from a fork and modifies infrastructure files (eng/ or .github/).

Changes to infrastructure typically need to be submitted from a branch in dotnet/skills (not a fork) so that CI workflows run with the correct permissions and secrets.

Please consider recreating this PR from an upstream branch. If you don't have push access to dotnet/skills, ask a maintainer to push your branch for you.

Copilot AI left a comment

Contributor

Pull request overview

Aligns skill-validator’s per-plugin aggregate description cap with the Copilot CLI’s effective 15,000-character skill-menu budget, and updates validation/docs to prevent silent <available_skills> truncation from masking plugin-arm non-activation.

Changes:

  • Restores SkillProfiler.MaxAggregateDescriptionLength to 15,000 and rewrites the rationale/commentary to reflect the Copilot CLI menu budget behavior.
  • Updates check to exclude skills with disable-model-invocation: true from the aggregate description total (matching CLI menu behavior).
  • Documents “plugin-arm-only non-activation due to menu overflow” troubleshooting steps in InvestigatingResults.md.
Summary per file:

| File | Description |
| --- | --- |
| eng/skill-validator/src/docs/InvestigatingResults.md | Adds guidance for diagnosing plugin-only non-activation caused by Copilot CLI skill-menu budget overflow and suggests mitigations. |
| eng/skill-validator/src/Check/SkillProfiler.cs | Lowers the aggregate description cap to 15,000 and documents it as a Copilot CLI budget constraint. |
| eng/skill-validator/src/Check/CheckCommand.cs | Excludes disable-model-invocation: true skills from the aggregate description calculation during plugin checks. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread eng/skill-validator/src/Check/CheckCommand.cs
Comment thread eng/skill-validator/src/Check/SkillProfiler.cs
@AbhitejJohn

Collaborator

@Evangelink: Can we recreate this from a branch in the repo, please?

…ion check

Address review: replace Regex.IsMatch(pattern-string) with a
[GeneratedRegex] partial method (AOT-friendly, no per-call cache lookup),
matching FrontmatterParser's style. Runs once per skill during checks.

Co-authored-by: Copilot <[email protected]>
@github-actions bot added the waiting-on-author label on Jun 22, 2026
@github-actions

Contributor

👋 @Evangelink — this PR has 2 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

@github-actions bot added the pr-state/ready-for-eval label and removed the waiting-on-author label on Jun 22, 2026
github-actions Bot added a commit that referenced this pull request Jun 22, 2026
@github-actions

Contributor

Skill Validation Results

Skill Scenario Quality Skills Loaded Overfit Verdict
system-text-json-net11 Serialize JSON in .NET 11 with PascalCase property names 4.0/5 → 5.0/5 🟢 ✅ system-text-json-net11; tools: skill ✅ 0.06 [1]
system-text-json-net11 Type-safe JsonTypeInfo access without exceptions in .NET 11 3.0/5 → 5.0/5 🟢 ✅ system-text-json-net11; tools: skill, edit, view ✅ 0.06
system-text-json-net11 Non-activation: camelCase JSON serialization on .NET 8 5.0/5 → 5.0/5 ℹ️ not activated (expected) ✅ 0.06 [2]
optimizing-ef-core-queries Optimize bulk operations with EF Core 7+ ExecuteUpdate and ExecuteDelete 5.0/5 → 5.0/5 ⚠️ NOT ACTIVATED 🟡 0.22

[1] (Isolated) Quality improved but weighted score is -2.1% due to: tokens (65782 → 85343), tool calls (5 → 6)
[2] (Isolated) Quality unchanged but weighted score is -16.2% due to: judgment, tokens (51994 → 80828), tool calls (4 → 6)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 803 in dotnet/skills, download eval artifacts with gh run download 27975446091 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/78e9dda103845334f0ab7d390467ad30e744f360/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

@github-actions bot added the ready-to-merge label and removed the pr-state/ready-for-eval label on Jun 22, 2026
@github-actions

Contributor

✅ Approved by @AbhitejJohn. cc @dotnet/skills-merge-approvers — ready to merge.

Copilot AI review requested due to automatic review settings June 23, 2026 06:59

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread eng/skill-validator/src/Check/CheckCommand.cs Outdated
@github-actions bot added the waiting-on-author label and removed the ready-to-merge label on Jun 23, 2026
…ck-scalar false positives

The regex-based check matched any line in the frontmatter, so a block-scalar description that merely mentioned 'disable-model-invocation: true' on its own line was wrongly treated as disabling model invocation. Parse the frontmatter with the existing YAML deserializer (which correctly handles block scalars) by adding a DisableModelInvocation field to SkillFrontmatter, and drop the regex entirely.

Co-authored-by: Copilot <[email protected]>
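
The false positive described in this commit message can be reproduced with a short Python sketch. Both the regex shape and the sample frontmatter are assumptions for illustration; the PR's actual fix parses the frontmatter with the existing YAML deserializer, whereas the column-0 scan below is only a simplified stand-in for real YAML parsing.

```python
import re

# Hypothetical frontmatter: the description is a block scalar whose text
# happens to mention the flag on its own (indented) line.
FRONTMATTER = """\
name: example-skill
description: |
  Reference notes. To hide a skill from the menu, add
  disable-model-invocation: true
  to its frontmatter.
"""

# Assumed shape of the earlier check: match the key on any line.
NAIVE = re.compile(r"^\s*disable-model-invocation:\s*true\s*$", re.MULTILINE)


def naive_is_disabled(frontmatter):
    """Line-based check that cannot tell keys from block-scalar text."""
    return NAIVE.search(frontmatter) is not None


def top_level_is_disabled(frontmatter):
    """Accept the key only at column 0, where top-level YAML keys live.

    Simplified stand-in for a YAML parser: indented lines (block-scalar
    content, nested mappings) are skipped rather than interpreted.
    """
    for line in frontmatter.splitlines():
        if line[:1].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep and key == "disable-model-invocation":
            return value.strip().lower() == "true"
    return False


print(naive_is_disabled(FRONTMATTER))      # True (false positive)
print(top_level_is_disabled(FRONTMATTER))  # False

really_disabled = FRONTMATTER + "disable-model-invocation: true\n"
print(top_level_is_disabled(really_disabled))  # True
```

A real parser is still the safer choice, since the stand-in ignores quoting, trailing comments, and other YAML syntax.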
@Evangelink

Member Author

/evaluate

@Evangelink Evangelink enabled auto-merge (squash) June 23, 2026 10:11
Labels

waiting-on-author PR state label

4 participants