Add tests for ModelCapabilities InferFromName and fuzzy prefix matching by PureWeen · Pull Request #775 · PureWeen/PolyPilot

PureWeen · 2026-04-27T10:39:35Z

Summary

Adds comprehensive [Theory] tests to ModelCapabilitiesTests in SessionOrganizationTests.cs covering three previously untested methods.

Coverage added

InferFromName (7 branches):

Opus family → ReasoningExpert | CodeExpert | ToolUse
Sonnet family → CodeExpert | ToolUse | Fast
Haiku family → Fast | CostEfficient
Gemini family → ReasoningExpert | LargeContext | Vision (+ Fast | CostEfficient because "gemini" contains "mini" — documented in test comment)
Codex variant → adds CodeExpert
Mini variant → adds Fast | CostEfficient
Max variant → adds ReasoningExpert
Unknown → None

GetCapabilities fuzzy prefix matching:

claude-opus-4.6-1m, claude-opus-4.6-preview, claude-opus-4.6-extended → resolve to claude-opus-4.6 flags
claude-sonnet-4.5-preview, claude-sonnet-4.5-20250514 → resolve to claude-sonnet-4.5 flags
gpt-4.1-preview → resolves to gpt-4.1 flags (unambiguous; no shorter registry key matches)
Fuzzy match returns same flags as exact registry lookup

GetStrengths (all 4 paths):

Registry hit → returns non-empty string without "Unknown" or "Inferred"
Fuzzy match → returns registry strengths string
Inferred (novel model name) → returns "Inferred: ..." with content
Completely unknown → returns "Unknown model"

Test results

All 39 new tests pass. Build clean.

- Add [Theory][InlineData] tests for InferFromName covering all 7 pattern branches: opus, sonnet, haiku, gemini family; codex, mini, max variants - Add fuzzy prefix matching tests for GetCapabilities: versioned slugs like 'claude-opus-4.6-1m' and 'claude-sonnet-4.5-preview' resolve to registry entries; 'gpt-4.1-preview' unambiguously matches 'gpt-4.1' - Add GetStrengths tests for all four code paths: registry hit, fuzzy match, inferred (returns 'Inferred: ...' prefix with content), and fully unknown (returns 'Unknown model') - Document the 'gemini'/'mini' substring quirk in InferFromName via test comment Co-authored-by: Copilot <[email protected]>

github-actions · 2026-04-27T10:52:49Z

🟢 Design-level finding — Near-duplicate coverage with `ModelNameInferenceTests`

The new InferFromName_* tests in ModelCapabilitiesTests (Opus/Sonnet/Haiku/Codex/Mini/Max/Gemini/Unknown branches) substantially overlap with existing coverage in the ModelNameInferenceTests class (~line 2952 in the same file), which exercises the same InferFromName branch categories with slightly different input slugs and weaker HasFlag assertions. The new tests add incremental value (exact combined-flag assertions, Theory/InlineData multi-input cases, and a Sonnet+Codex combination test), but most branch-level coverage is near-duplicate.

Suggestion: Consider consolidating the overlapping InferFromName tests into one class to reduce maintenance overhead — or at minimum add a comment cross-referencing the two test suites so future maintainers understand the intentional overlap.

Flagged by: 3/3 reviewers (1 initially + 2 confirmed in follow-up)

Generated by Expert Code Review (auto) for issue #775 · ● 5.9M · ◷

github-actions

Good test coverage for InferFromName, fuzzy prefix matching, and GetStrengths — all 39 tests correctly reflect actual production behavior. Three findings below (0 critical, 1 moderate, 2 minor).

Generated by Expert Code Review (auto) for issue #775 · ● 5.9M

github-actions · 2026-04-27T10:52:49Z

+    [Theory]
+    [InlineData("claude-sonnet-4.5-20250514")]
+    [InlineData("claude-sonnet-4.5-preview")]
+    public void GetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags(string slug)
+    {
+        var caps = ModelCapabilities.GetCapabilities(slug);
+        // claude-sonnet-4.5 registry entry: CodeExpert|ToolUse|Fast
+        Assert.True(caps.HasFlag(ModelCapability.CodeExpert));
+        Assert.True(caps.HasFlag(ModelCapability.ToolUse));
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+    }


🟢 MINOR — Fuzzy-match test relies on non-deterministic Dictionary iteration order

"claude-sonnet-4.5-20250514".StartsWith("claude-sonnet-4") is true, so this slug matches both the "claude-sonnet-4.5" and "claude-sonnet-4" registry keys in the foreach loop. The code returns whichever the Dictionary iterator yields first — which isn't guaranteed across .NET versions.

Both entries have identical caps today (CodeExpert | ToolUse | Fast), so this is safe for now. If they ever diverge, this test becomes flaky.

Suggestion: Add a comment noting the assumption:

// Both claude-sonnet-4.5 and claude-sonnet-4 share identical caps today; // if they diverge, fuzzy matching will need longest-prefix-wins logic.

github-actions · 2026-04-27T10:52:50Z

+    }
+
+    [Fact]
+    public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()


🟢 MINOR — Misleading test method name

This tests "gpt-4.1-preview" fuzzy-matching to "gpt-4.1", but the name says "GptMiniVariant" — there's no "mini" in the slug. Suggest renaming to GetCapabilities_FuzzyPrefix_GptPreviewVariant_ReturnsGpt41Flags.

github-actions · 2026-04-27T10:52:50Z

+        // Note: "gemini" contains "mini" as a substring, so the mini-variant logic also fires,
+        // adding Fast|CostEfficient. This is expected production code behaviour.
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+        Assert.True(caps.HasFlag(ModelCapability.CostEfficient));


🟡 MODERATE — Test cements a production code bug as "expected behaviour"

InferFromName (ModelCapabilities.cs:80) uses lower.Contains("mini"), which falsely matches "ge**mini**". Calling this "expected production code behaviour" discourages fixing it, and any future fix to use word-boundary matching would break these assertions.

Suggestion: Reword the comment to acknowledge this as a known limitation:

// Known issue: "gemini" contains "mini" as a substring, causing the mini-variant // logic to also fire. If InferFromName is updated to use word-boundary matching, // these two asserts should be removed.

github-actions · 2026-04-27T10:52:50Z

+        // Note: "gemini" contains "mini" as a substring, so the mini-variant logic also fires,
+        // adding Fast|CostEfficient. This is expected production code behaviour.
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+        Assert.True(caps.HasFlag(ModelCapability.CostEfficient));


🟡 MODERATE — Test locks in a substring-collision defect as "expected behaviour"

"gemini" contains "mini" at positions 2–5, so the production code's lower.Contains("mini") check in InferFromName (ModelCapabilities.cs:80) fires on every Gemini-family slug. This incorrectly tags premium Gemini models with Fast | CostEfficient — flags that semantically mean "cheap/lightweight."

The test comment calls this "expected production code behaviour," which effectively approves the defect and will prevent future maintainers from recognising it as a bug. Any novel Gemini model inferred from its name will be misclassified, potentially skewing role-assignment logic in GetRoleWarnings.

Suggestion: Reword the test comment to acknowledge this as a known defect (e.g., // KNOWN BUG: "gemini" contains "mini" substring — see #<issue>), and consider a production fix using word-boundary matching (e.g., Regex.IsMatch(lower, @"\bmini\b") or checking for -mini/_mini tokens).

Flagged by: 3/3 reviewers

github-actions · 2026-04-27T10:52:50Z

+    }
+
+    [Fact]
+    public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()


🟢 MINOR — Misleading test name

GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags tests "gpt-4.1-preview", which is a preview variant, not a mini variant. The name will mislead future maintainers into thinking this covers mini-model behaviour.

Suggestion: Rename to GetCapabilities_FuzzyPrefix_GptPreviewVariant_ReturnsGpt41Flags (or similar) to match the actual input slug.

Flagged by: 2/3 reviewers

github-actions · 2026-04-27T10:52:50Z

+    [Theory]
+    [InlineData("claude-sonnet-4.5-20250514")]
+    [InlineData("claude-sonnet-4.5-preview")]
+    public void GetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags(string slug)


🟢 MINOR — Fuzzy-prefix test relies on Dictionary iteration order

"claude-sonnet-4.5-preview" prefix-matches two registry keys: "claude-sonnet-4" and "claude-sonnet-4.5". The foreach over the internal Dictionary<string, ...> has no guaranteed iteration order in .NET. This test passes today because both keys happen to share identical capability flags (CodeExpert | ToolUse | Fast), but if either entry is updated independently, this test could become non-deterministically fragile.

Suggestion: Either use test inputs that unambiguously match a single registry key, or add a comment documenting that correctness depends on both keys sharing the same flags.

Flagged by: 2/3 reviewers

PureWeen · 2026-04-27T16:37:30Z

/review

PureWeen · 2026-04-27T16:40:49Z

/review

PureWeen · 2026-04-27T16:57:11Z

/review

github-actions · 2026-04-27T16:57:40Z

✅ Expert Code Review completed successfully!

github-actions

Expert Code Review — Adversarial Consensus

Good test coverage for InferFromName, fuzzy prefix matching, and GetStrengths. The 39 new tests correctly reflect production behavior and follow the project's conventions. Three new findings below (supplementing the existing review's findings on gemini/mini collision, GptMiniVariant naming, and Dictionary iteration order).

Findings

#	Severity	Consensus	File	Finding
1	🟡 MODERATE	3/3 reviewers	`SessionOrganizationTests.cs:2554`	`GetStrengths_FuzzyMatchedModels` assertions don't verify correct registry mapping — fuzzy result should be compared to canonical exact-slug result
2	🟢 MINOR	3/3 reviewers	`SessionOrganizationTests.cs:2451`	Variant-only tests (`Codex`, `Mini`, `Max`) use `HasFlag` instead of `Assert.Equal` — can't detect spurious flags
3	🟢 MINOR	2/3 reviewers	`SessionOrganizationTests.cs:2544`	`GetStrengths_KnownModels` uses only negative assertions — can't catch wrong-entry regressions (1 reviewer argued positive assertions couple to copy edits)

Discarded (single reviewer, no follow-up consensus): Gemini test method name omits 2 of 5 validated flags.

Assessment

Test coverage: Comprehensive — all InferFromName family/variant branches, fuzzy prefix matching, and all 4 GetStrengths paths are covered
Test correctness: All tests accurately reflect current production behavior
CI status: PR description reports all 39 tests pass, build clean
Risk: Low — test-only change, no production code modified

Methodology: 3 independent reviewers with adversarial consensus (initial review + targeted follow-ups for disputed findings)

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

api.nuget.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 10.2M

github-actions · 2026-04-27T17:11:31Z

+    public void InferFromName_CodexVariant_AddsCodeExpert()
+    {
+        var caps = ModelCapabilities.InferFromName("new-model-codex");
+        Assert.True(caps.HasFlag(ModelCapability.CodeExpert));


🟢 MINOR — Variant-only tests use HasFlag instead of Assert.Equal (3/3 reviewers)

These inputs ("new-model-codex", "new-model-mini", "new-model-max") don't match any family branch, so the expected flags are fully deterministic (e.g., codex → exactly CodeExpert). HasFlag can't detect regressions that add spurious flags — for example, if a code change accidentally added ToolUse to codex variants, this test would still pass.

Suggestion: Use Assert.Equal to match the pattern of the family tests:

Assert.Equal(ModelCapability.CodeExpert, caps);

Same for the mini and max variant tests.

github-actions · 2026-04-27T17:11:31Z

+        Assert.NotEmpty(strengths);
+        Assert.DoesNotContain("Unknown", strengths, StringComparison.OrdinalIgnoreCase);
+        Assert.DoesNotContain("Inferred", strengths, StringComparison.OrdinalIgnoreCase);


🟢 MINOR — Known-model strengths test uses only negative assertions (2/3 reviewers)

NotEmpty + DoesNotContain("Unknown") + DoesNotContain("Inferred") confirms the registry-match path was taken, but can't catch a regression where GetStrengths returns the wrong registry entry's text. One reviewer disagreed, noting that adding positive assertions on prose content couples tests to copy edits — a reasonable tradeoff concern.

Suggestion (optional): If desired, add one keyword per model: e.g., Assert.Contains("reasoning", ...) for opus, Assert.Contains("cost", ...) for haiku. Alternatively, accept the current level as sufficient for a "path coverage" test.

github-actions · 2026-04-27T17:11:31Z

+        var strengths = ModelCapabilities.GetStrengths(slug);
+        Assert.NotEmpty(strengths);
+        Assert.DoesNotContain("Unknown", strengths, StringComparison.OrdinalIgnoreCase);
+        Assert.DoesNotContain("Inferred", strengths, StringComparison.OrdinalIgnoreCase);


🟡 MODERATE — Fuzzy-match strengths test doesn't verify correct registry mapping (3/3 reviewers)

These negative-only assertions (NotEmpty, DoesNotContain("Unknown"), DoesNotContain("Inferred")) prove the result didn't fall through to the inferred/unknown fallback paths, but can't verify which registry entry was matched. If the fuzzy loop's iteration order changes and "claude-opus-4.6-1m" resolves to a different model's strengths, this test still passes.

Suggestion: Pin the fuzzy result to its expected canonical entry:

Assert.Equal(ModelCapabilities.GetStrengths("claude-opus-4.6"), strengths); // (and similarly for "claude-sonnet-4.5-preview" → "claude-sonnet-4.5")

This mirrors the pattern already used in GetCapabilities_FuzzyPrefix_MatchesSameAsExact.

PureWeen · 2026-04-27T17:14:29Z

/review

PureWeen · 2026-04-27T17:24:34Z

/review

github-actions · 2026-04-27T17:24:58Z

✅ Expert Code Review completed successfully!

PureWeen · 2026-04-27T20:13:11Z

/review

github-actions · 2026-04-27T20:18:44Z

✅ Expert Code Review completed successfully!

github-actions

Found 2 new findings (both 🟡) beyond the 6 already-known items:

🟡 Sonnet fuzzy-prefix test is a false positive — GetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags passes even if the fuzzy-matching loop is deleted, because InferFromName returns identical flags (CodeExpert|ToolUse|Fast) to the sonnet registry entries. The opus fuzzy tests don't have this problem (LargeContext distinguishes the paths). A haiku variant test would provide real fuzzy-path coverage.
🟡 Combined family+variant test is tautological — InferFromName_SonnetCodexCombined can't detect whether the codex branch fires, because sonnet already provides CodeExpert. Using a haiku+codex combination would make the codex contribution observable.

No 🔴 bugs or regressions found. The remaining tests (opus fuzzy, InferFromName families, GetStrengths paths, unknown model) are well-constructed.

Generated by Expert Code Review for issue #775 · ● 11.8M

github-actions · 2026-04-27T20:39:08Z

+        Assert.True(caps.HasFlag(ModelCapability.ToolUse));
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+    }
+
+    [Fact]
+    public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()
+    {
+        // "gpt-4.1" has Fast|CostEfficient|ToolUse; "gpt-4.1-preview" unambiguously fuzzy-matches it


🟡 False-positive test — cannot detect fuzzy-matching regressions

Both sonnet registry entries (claude-sonnet-4.5, claude-sonnet-4) have flags CodeExpert|ToolUse|Fast. InferFromName for any slug containing "sonnet" also returns CodeExpert|ToolUse|Fast (line 74 of ModelCapabilities.cs). The flag sets are identical.

This means if the entire fuzzy-prefix loop in GetCapabilities (lines 54-57) were deleted, these tests would still pass — InferFromName produces the exact same result. Compare with the opus fuzzy tests above, which do provide real coverage because LargeContext is in the registry entry but NOT in InferFromName's opus output.

Concrete failing scenario: Delete the foreach fuzzy loop from GetCapabilities → opus fuzzy tests fail (good), sonnet fuzzy tests still pass (bad — zero coverage of the code path they claim to test).

Suggested fix: Replace with a model whose registry entry diverges from InferFromName, or add an Assert.Equal against the exact registry value paired with a separate test proving InferFromName returns a different value. Alternatively, test a haiku variant (registry has ToolUse, InferFromName doesn't):

[Fact] public void GetCapabilities_FuzzyPrefix_HaikuVariant_ReturnsRegistryFlags() { var caps = ModelCapabilities.GetCapabilities("claude-haiku-4.5-fast"); // Registry: Fast|CostEfficient|ToolUse; InferFromName: Fast|CostEfficient (no ToolUse) Assert.True(caps.HasFlag(ModelCapability.ToolUse)); // only passes via fuzzy }

github-actions · 2026-04-27T20:39:08Z

+    {
+        // sonnet family gives CodeExpert|ToolUse|Fast; codex adds CodeExpert (idempotent via flags)
+        var caps = ModelCapabilities.InferFromName("sonnet-codex-experimental");
+        Assert.True(caps.HasFlag(ModelCapability.CodeExpert));
+        Assert.True(caps.HasFlag(ModelCapability.ToolUse));
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+    }
+
+    [Fact]


🟡 Tautological test — codex branch is unobservable

The sonnet family already provides CodeExpert (line 74 of ModelCapabilities.cs), and CodeExpert is the only flag the codex variant adds (line 79). So the |= from the codex branch is a no-op here — the test result is identical whether or not the codex variant code exists.

Concrete failing scenario: Delete line 79 (if (lower.Contains("codex")) caps |= ModelCapability.CodeExpert;) → this test still passes, because sonnet already supplied CodeExpert|ToolUse|Fast.

Suggested fix: Use a family that does NOT include CodeExpert, so the codex variant's contribution is observable:

[Fact] public void InferFromName_HaikuCodexCombined_AddsBothFamilyAndVariantFlags() { // haiku → Fast|CostEfficient; codex → |= CodeExpert (observable addition) var caps = ModelCapabilities.InferFromName("haiku-codex-v2"); Assert.True(caps.HasFlag(ModelCapability.Fast)); Assert.True(caps.HasFlag(ModelCapability.CostEfficient)); Assert.True(caps.HasFlag(ModelCapability.CodeExpert)); // fails if codex branch removed }

PureWeen · 2026-04-27T21:16:43Z

/review

github-actions · 2026-04-27T21:17:21Z

✅ Expert Code Review completed successfully!

github-actions · 2026-04-27T21:30:29Z

Expert Code Review — PR #775

Methodology: 3 independent reviewers with adversarial consensus (2 follow-up rounds for disputed findings)

Findings

#	Severity	Consensus	File	Line(s)	Finding
1	🟡 MODERATE	3/3	`SessionOrganizationTests.cs` L2438–2444	2438–2444	Gemini tests encode "mini" substring false positive as expected behavior. `InferFromName("gemini-4-pro")` gets `Fast\|CostEfficient` because `"gemini"` contains `"mini"` at index 2. The test asserts these flags with a comment calling it "expected production code behaviour," cementing a real bug as a contract. When the production code is fixed (e.g., `lower.Contains("-mini")`), these tests will regress. Consider adding a `// BUG:` or `// TODO:` comment referencing a tracking issue instead of "expected behaviour."
2	🟡 MODERATE	3/3 (after follow-up)	`SessionOrganizationTests.cs` L2428–2432	2428–2432	Haiku inference omits `ToolUse`, inconsistent with registry. The registry entry for `claude-haiku-4.5` has `Fast \| CostEfficient \| ToolUse`, but `InferFromName`'s haiku branch only produces `Fast \| CostEfficient`. The test asserts this gap without comment. Novel haiku models (e.g., `claude-haiku-5`) falling through to inference will trigger a spurious "This model may not support tool use well" warning from `GetRoleWarnings`. Consider adding a comment noting the `ToolUse` omission, or filing a follow-up to add `ToolUse` to the haiku inference branch.
3	🟢 MINOR	2/3	`SessionOrganizationTests.cs` L2515	2515	Misleading test method name. `GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags` tests `"gpt-4.1-preview"` — a preview variant, not a mini variant. The name suggests mini-variant coverage that doesn't exist. Rename to `GetCapabilities_FuzzyPrefix_Gpt41PreviewVariant_ReturnsGpt41Flags`.
4	🟢 MINOR	3/3	`SessionOrganizationTests.cs` L2502–2512	2502–2512	Fuzzy match tests are silently order-dependent. `"claude-sonnet-4.5-preview"` prefix-matches both `"claude-sonnet-4"` and `"claude-sonnet-4.5"` in the registry. Tests pass because both entries have identical flags today, masking the ambiguity. If flags diverge or insertion order changes, results become non-deterministic. Consider adding a comment noting this assumption, or adding a test that explicitly verifies longest-prefix-wins behavior.

Discarded Findings

Finding	Flagged by	Outcome
Inferred strengths string omits "tool use" text for opus models	1/3	Disproven — `ToolUse` is intentionally not mapped in the `GetStrengths` inferred path; test correctly reflects production behavior

CI Status

⚪ Pending — no check runs reported yet.

Test Coverage Assessment

This is a test-only PR adding 39 new [Theory]/[Fact] tests covering InferFromName, GetCapabilities (fuzzy prefix matching), and GetStrengths (all 4 code paths). The tests are well-structured, use appropriate xUnit patterns, and assertions are verified correct against the production code. No production code changes require additional test coverage.

Overall Assessment

Clean test-only PR with good coverage. The two moderate findings are about tests encoding existing production quirks (gemini/mini substring match, haiku missing ToolUse) as expected behavior without tracking comments — worth addressing to prevent future confusion, but not blocking.

Generated by Expert Code Review · 3 independent reviewers with adversarial consensus

Warning

⚠️ Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

api.nuget.org
dc.services.visualstudio.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"
    - "dc.services.visualstudio.com"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 12.9M · ◷

github-actions

Reviewed the full production code in ModelCapabilities.cs and all 19 new tests. One finding:

🟡 Gemini "mini" substring contamination locked in by test — InferFromName line 80 uses Contains("mini") which matches the substring inside "gemini", incorrectly adding Fast|CostEfficient to all Gemini models. The test asserts these flags as "expected". Fix: use Contains("-mini") in production code and update the test accordingly.

No other regressions, bugs, security, data-loss, race conditions, or test-correctness issues found. All other assertions are accurate and robust against Dictionary iteration order.

Warning

⚠️ Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

api.nuget.org
dc.services.visualstudio.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"
    - "dc.services.visualstudio.com"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 12.9M

github-actions · 2026-04-27T21:30:30Z

+        // adding Fast|CostEfficient. This is expected production code behaviour.
+        Assert.True(caps.HasFlag(ModelCapability.Fast));
+        Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
+    }


🟡 Test locks in a production bug — "gemini" contains "mini" as a substring

The production code at ModelCapabilities.cs:80 uses lower.Contains("mini"), which matches the "mini" inside "ge**mini**". This causes every Gemini model inferred via InferFromName to incorrectly receive Fast | CostEfficient.

This test asserts those incorrect flags and documents it as "expected production code behaviour", which locks in the bug — any future fix to the variant matching would break these assertions.

Concrete impact: GetStrengths("gemini-4-pro") returns "Inferred: reasoning, fast, cost-efficient, multimodal, large context", mislabeling a hypothetical premium model as "fast, cost-efficient".

Suggested fix — in ModelCapabilities.cs line 80, change:

if (lower.Contains("mini")) caps |= ...

to:

if (lower.Contains("-mini")) caps |= ...

All real "mini" variant slugs use a -mini suffix (gpt-5-mini, gpt-5.1-codex-mini). Then update this test to not assert Fast/CostEfficient for Gemini models.

github-actions · 2026-04-28T17:55:59Z

Expert Code Review — PR #775

Methodology: 3 independent reviewers with adversarial consensus (+ 2 targeted follow-ups for 1 disputed finding)

3 findings posted as inline comments (1 🟡 moderate, 2 🟢 minor)

#	Severity	Consensus	File	Line(s)	Finding
1	🟡 MODERATE	2/3 reviewers	`SessionOrganizationTests.cs`	2504	Non-deterministic fuzzy match — `claude-sonnet-4.5-preview` matches two registry keys (`claude-sonnet-4` and `claude-sonnet-4.5`); `Dictionary<>` iteration order is unspecified
2	🟢 MINOR	2/3 reviewers	`SessionOrganizationTests.cs`	2515	Misleading test name `GptMiniVariant` — test slug `gpt-4.1-preview` has no "mini" substring
3	🟢 MINOR	3/3 (after follow-up)	`SessionOrganizationTests.cs`	2492–2496	Fuzzy-prefix tests use `HasFlag` instead of `Assert.Equal` — can't detect spurious extra flags

Assessment

CI status: All 6 check runs passed ✅
Test coverage: Comprehensive — all InferFromName family/variant branches, fuzzy prefix matching, and all 4 GetStrengths paths covered
Risk: Low — test-only change, no production code modified
Prior reviews: 4 prior automated reviews exist with 8 findings already posted; the 3 findings above are new

Generated by Expert Code Review · 3 independent reviewers with adversarial consensus

Generated by Expert Code Review · ● 12.1M · ◷

github-actions

One genuine finding — the new GetStrengths tests for inferred models provide false confidence because ToolUse is inferred but never rendered in the strengths string.

GetStrengths silently drops ToolUse for all inferred models (Medium)

InferFromName sets ToolUse for opus (line 73) and sonnet (line 74), but the strengths rendering block (lines 100–106 of ModelCapabilities.cs) has no branch for ToolUse — it's the only capability flag without one. So GetStrengths("claude-opus-99") returns "Inferred: reasoning, code" with no mention of tool use, even though GetCapabilities("claude-opus-99") returns ReasoningExpert | CodeExpert | ToolUse.

The new test GetStrengths_InferredOpus_ContainsReasoningAndCode passes by checking "reasoning" and "code" but never catches the missing "tool" — adding Assert.Contains("tool", strengths, OrdinalIgnoreCase) would expose the bug.

Fix: add if (inferred.HasFlag(ModelCapability.ToolUse)) parts.Add("tool use"); to the rendering block, and add a "tool" assertion to the test.

Generated by Expert Code Review · ● 12.1M

github-actions · 2026-04-28T17:56:00Z

+
+    [Theory]
+    [InlineData("claude-sonnet-4.5-20250514")]
+    [InlineData("claude-sonnet-4.5-preview")]


🟡 MODERATE — Non-deterministic fuzzy match: claude-sonnet-4.5-preview matches two registry keys (2/3 reviewers)

"claude-sonnet-4.5-preview" satisfies StartsWith(key) for both "claude-sonnet-4.5" and "claude-sonnet-4" in the registry dictionary. Dictionary<> enumeration order is unspecified in .NET, so which key "wins" the fuzzy match is nondeterministic.

The test passes today only because both keys share identical flags (CodeExpert|ToolUse|Fast) and the same Strengths string. If either entry diverges in a future registry update, this test becomes unpredictably flaky.

The same ambiguity affects GetStrengths_FuzzyMatchedModels_ReturnRegistryStrengths (line 2551) which also uses "claude-sonnet-4.5-preview".

Fix: Use a slug that unambiguously matches exactly one registry key, or add an assertion that verifies the exact matched key.

github-actions · 2026-04-28T17:56:01Z

+    public void GetCapabilities_FuzzyPrefix_OpusVariant_ReturnsOpusFlags(string slug)
+    {
+        var caps = ModelCapabilities.GetCapabilities(slug);
+        // claude-opus-4.6 registry entry: ReasoningExpert|CodeExpert|ToolUse|LargeContext
+        Assert.True(caps.HasFlag(ModelCapability.ReasoningExpert));


🟢 MINOR — Fuzzy-prefix tests use HasFlag instead of Assert.Equal (3/3 reviewers after follow-up)

All three fuzzy-prefix tests (OpusVariant, SonnetVariant, GptMiniVariant) use Assert.True(caps.HasFlag(...)), which only verifies minimum flags are present. If GetCapabilities starts returning extra unintended flags for fuzzy-prefix slugs, these tests would silently pass.

Since fuzzy-prefix resolution is expected to mirror a specific registry entry exactly, Assert.Equal(expected, caps) is the stronger and more appropriate contract check — consistent with the InferFromName family tests above which already use Assert.Equal.

Note: GetCapabilities_FuzzyPrefix_MatchesSameAsExact (line 2526) already uses Assert.Equal for opus, which is good — extending this pattern to all fuzzy-prefix tests would close the gap.

github-actions · 2026-04-28T17:56:01Z

+    }
+
+    [Fact]
+    public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()


🟢 MINOR — Misleading test name GptMiniVariant (2/3 reviewers)

"gpt-4.1-preview" contains no "mini" substring and no mini-variant logic fires — this test exercises the basic prefix fuzzy path (gpt-4.1-preview → gpt-4.1). The name will confuse future readers searching for mini-variant coverage.

Suggestion: Rename to something like GetCapabilities_FuzzyPrefix_Gpt41Preview_ReturnsGpt41Flags.

github-actions

Expert Code Review: 1 finding posted inline (minor). See summary comment for details.

Generated by Expert Code Review · ● 2.7M

github-actions · 2026-04-28T18:17:06Z

+        Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
+    }
+
+    [Fact]
+    public void InferFromName_CodexVariant_AddsCodeExpert()
+    {


🟢 MINOR · 2/3 reviewers

Enshrining a production quirk as required test behavior. These assertions lock in the "gemini" → "mini" substring false-positive as expected, which means a future fix to use word-boundary matching in InferFromName would break these tests.

Suggestion: Consider splitting the gemini assertions — keep ReasoningExpert | LargeContext | Vision as the required core, and assert Fast | CostEfficient separately with a comment like // TODO: Known quirk — "gemini" contains "mini". Remove when InferFromName uses word-boundary matching. This way a future fix only needs to update the quirk assertion, not the core gemini test.

PureWeen · 2026-04-28T18:36:38Z

/review

github-actions Bot reviewed Apr 27, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

github-actions Bot added the needs-review label Apr 27, 2026

This comment has been minimized.

Sign in to view

github-actions Bot reviewed Apr 27, 2026

View reviewed changes

github-actions Bot reviewed Apr 28, 2026

View reviewed changes

Conversation

PureWeen commented Apr 27, 2026

Summary

Coverage added

Test results

Uh oh!

github-actions Bot commented Apr 27, 2026

🟢 Design-level finding — Near-duplicate coverage with ModelNameInferenceTests

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Expert Code Review — Adversarial Consensus

Findings

Assessment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

PureWeen commented Apr 27, 2026

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

PureWeen commented Apr 27, 2026

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

🟢 Design-level finding — Near-duplicate coverage with `ModelNameInferenceTests`

github-actions Bot commented Apr 27, 2026 •

edited

Loading

github-actions Bot commented Apr 27, 2026 •

edited

Loading

github-actions Bot commented Apr 27, 2026 •

edited

Loading

github-actions Bot commented Apr 27, 2026 •

edited

Loading