Skip to content

Add tests for ModelCapabilities InferFromName and fuzzy prefix matching#775

Open
PureWeen wants to merge 1 commit intomainfrom
op-test-modelcapabilities-inference
Open

Add tests for ModelCapabilities InferFromName and fuzzy prefix matching#775
PureWeen wants to merge 1 commit intomainfrom
op-test-modelcapabilities-inference

Conversation

@PureWeen
Copy link
Copy Markdown
Owner

Summary

Adds comprehensive [Theory] tests to ModelCapabilitiesTests in SessionOrganizationTests.cs covering three previously untested methods.

Coverage added

InferFromName (7 branches):

  • Opus family → ReasoningExpert | CodeExpert | ToolUse
  • Sonnet family → CodeExpert | ToolUse | Fast
  • Haiku family → Fast | CostEfficient
  • Gemini family → ReasoningExpert | LargeContext | Vision (+ Fast | CostEfficient because "gemini" contains "mini" — documented in test comment)
  • Codex variant → adds CodeExpert
  • Mini variant → adds Fast | CostEfficient
  • Max variant → adds ReasoningExpert
  • Unknown → None

GetCapabilities fuzzy prefix matching:

  • claude-opus-4.6-1m, claude-opus-4.6-preview, claude-opus-4.6-extended → resolve to claude-opus-4.6 flags
  • claude-sonnet-4.5-preview, claude-sonnet-4.5-20250514 → resolve to claude-sonnet-4.5 flags
  • gpt-4.1-preview → resolves to gpt-4.1 flags (unambiguous; no shorter registry key matches)
  • Fuzzy match returns same flags as exact registry lookup

GetStrengths (all 4 paths):

  • Registry hit → returns non-empty string without "Unknown" or "Inferred"
  • Fuzzy match → returns registry strengths string
  • Inferred (novel model name) → returns "Inferred: ..." with content
  • Completely unknown → returns "Unknown model"

Test results

All 39 new tests pass. Build clean.

- Add [Theory][InlineData] tests for InferFromName covering all 7 pattern
  branches: opus, sonnet, haiku, gemini family; codex, mini, max variants
- Add fuzzy prefix matching tests for GetCapabilities: versioned slugs like
  'claude-opus-4.6-1m' and 'claude-sonnet-4.5-preview' resolve to registry
  entries; 'gpt-4.1-preview' unambiguously matches 'gpt-4.1'
- Add GetStrengths tests for all four code paths: registry hit, fuzzy match,
  inferred (returns 'Inferred: ...' prefix with content), and fully unknown
  (returns 'Unknown model')
- Document the 'gemini'/'mini' substring quirk in InferFromName via test comment

Co-authored-by: Copilot <[email protected]>
@github-actions
Copy link
Copy Markdown
Contributor

🟢 Design-level finding — Near-duplicate coverage with ModelNameInferenceTests

The new InferFromName_* tests in ModelCapabilitiesTests (Opus/Sonnet/Haiku/Codex/Mini/Max/Gemini/Unknown branches) substantially overlap with existing coverage in the ModelNameInferenceTests class (~line 2952 in the same file), which exercises the same InferFromName branch categories with slightly different input slugs and weaker HasFlag assertions. The new tests add incremental value (exact combined-flag assertions, Theory/InlineData multi-input cases, and a Sonnet+Codex combination test), but most branch-level coverage is near-duplicate.

Suggestion: Consider consolidating the overlapping InferFromName tests into one class to reduce maintenance overhead — or at minimum add a comment cross-referencing the two test suites so future maintainers understand the intentional overlap.

Flagged by: 3/3 reviewers (1 initially + 2 confirmed in follow-up)

Generated by Expert Code Review (auto) for issue #775 · ● 5.9M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test coverage for InferFromName, fuzzy prefix matching, and GetStrengths — all 39 tests correctly reflect actual production behavior. Three findings below (0 critical, 1 moderate, 2 minor).

Generated by Expert Code Review (auto) for issue #775 · ● 5.9M

Comment on lines +2502 to +2512
[Theory]
[InlineData("claude-sonnet-4.5-20250514")]
[InlineData("claude-sonnet-4.5-preview")]
public void GetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags(string slug)
{
var caps = ModelCapabilities.GetCapabilities(slug);
// claude-sonnet-4.5 registry entry: CodeExpert|ToolUse|Fast
Assert.True(caps.HasFlag(ModelCapability.CodeExpert));
Assert.True(caps.HasFlag(ModelCapability.ToolUse));
Assert.True(caps.HasFlag(ModelCapability.Fast));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Fuzzy-match test relies on non-deterministic Dictionary iteration order

"claude-sonnet-4.5-20250514".StartsWith("claude-sonnet-4") is true, so this slug matches both the "claude-sonnet-4.5" and "claude-sonnet-4" registry keys in the foreach loop. The code returns whichever the Dictionary iterator yields first — which isn't guaranteed across .NET versions.

Both entries have identical caps today (CodeExpert | ToolUse | Fast), so this is safe for now. If they ever diverge, this test becomes flaky.

Suggestion: Add a comment noting the assumption:

// Both claude-sonnet-4.5 and claude-sonnet-4 share identical caps today;
// if they diverge, fuzzy matching will need longest-prefix-wins logic.

}

[Fact]
public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Misleading test method name

This tests "gpt-4.1-preview" fuzzy-matching to "gpt-4.1", but the name says "GptMiniVariant" — there's no "mini" in the slug. Suggest renaming to GetCapabilities_FuzzyPrefix_GptPreviewVariant_ReturnsGpt41Flags.

Comment on lines +2441 to +2444
// Note: "gemini" contains "mini" as a substring, so the mini-variant logic also fires,
// adding Fast|CostEfficient. This is expected production code behaviour.
Assert.True(caps.HasFlag(ModelCapability.Fast));
Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Test cements a production code bug as "expected behaviour"

InferFromName (ModelCapabilities.cs:80) uses lower.Contains("mini"), which falsely matches "ge**mini**". Calling this "expected production code behaviour" discourages fixing it, and any future fix to use word-boundary matching would break these assertions.

Suggestion: Reword the comment to acknowledge this as a known limitation:

// Known issue: "gemini" contains "mini" as a substring, causing the mini-variant
// logic to also fire. If InferFromName is updated to use word-boundary matching,
// these two asserts should be removed.

Comment on lines +2441 to +2444
// Note: "gemini" contains "mini" as a substring, so the mini-variant logic also fires,
// adding Fast|CostEfficient. This is expected production code behaviour.
Assert.True(caps.HasFlag(ModelCapability.Fast));
Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Test locks in a substring-collision defect as "expected behaviour"

"gemini" contains "mini" at positions 2–5, so the production code's lower.Contains("mini") check in InferFromName (ModelCapabilities.cs:80) fires on every Gemini-family slug. This incorrectly tags premium Gemini models with Fast | CostEfficient — flags that semantically mean "cheap/lightweight."

The test comment calls this "expected production code behaviour," which effectively approves the defect and will prevent future maintainers from recognising it as a bug. Any novel Gemini model inferred from its name will be misclassified, potentially skewing role-assignment logic in GetRoleWarnings.

Suggestion: Reword the test comment to acknowledge this as a known defect (e.g., // KNOWN BUG: "gemini" contains "mini" substring — see #<issue>), and consider a production fix using word-boundary matching (e.g., Regex.IsMatch(lower, @"\bmini\b") or checking for -mini/_mini tokens).

Flagged by: 3/3 reviewers

}

[Fact]
public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Misleading test name

GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags tests "gpt-4.1-preview", which is a preview variant, not a mini variant. The name will mislead future maintainers into thinking this covers mini-model behaviour.

Suggestion: Rename to GetCapabilities_FuzzyPrefix_GptPreviewVariant_ReturnsGpt41Flags (or similar) to match the actual input slug.

Flagged by: 2/3 reviewers

[Theory]
[InlineData("claude-sonnet-4.5-20250514")]
[InlineData("claude-sonnet-4.5-preview")]
public void GetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags(string slug)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Fuzzy-prefix test relies on Dictionary iteration order

"claude-sonnet-4.5-preview" prefix-matches two registry keys: "claude-sonnet-4" and "claude-sonnet-4.5". The foreach over the internal Dictionary<string, ...> has no guaranteed iteration order in .NET. This test passes today because both keys happen to share identical capability flags (CodeExpert | ToolUse | Fast), but if either entry is updated independently, this test could become non-deterministically fragile.

Suggestion: Either use test inputs that unambiguously match a single registry key, or add a comment documenting that correctness depends on both keys sharing the same flags.

Flagged by: 2/3 reviewers

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

2 similar comments
@PureWeen
Copy link
Copy Markdown
Owner Author

/review

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Expert Code Review completed successfully!

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert Code Review — Adversarial Consensus

Good test coverage for InferFromName, fuzzy prefix matching, and GetStrengths. The 39 new tests correctly reflect production behavior and follow the project's conventions. Three new findings below (supplementing the existing review's findings on gemini/mini collision, GptMiniVariant naming, and Dictionary iteration order).

Findings

# Severity Consensus File Finding
1 🟡 MODERATE 3/3 reviewers SessionOrganizationTests.cs:2554 GetStrengths_FuzzyMatchedModels assertions don't verify correct registry mapping — fuzzy result should be compared to canonical exact-slug result
2 🟢 MINOR 3/3 reviewers SessionOrganizationTests.cs:2451 Variant-only tests (Codex, Mini, Max) use HasFlag instead of Assert.Equal — can't detect spurious flags
3 🟢 MINOR 2/3 reviewers SessionOrganizationTests.cs:2544 GetStrengths_KnownModels uses only negative assertions — can't catch wrong-entry regressions (1 reviewer argued positive assertions couple to copy edits)

Discarded (single reviewer, no follow-up consensus): Gemini test method name omits 2 of 5 validated flags.

Assessment

  • Test coverage: Comprehensive — all InferFromName family/variant branches, fuzzy prefix matching, and all 4 GetStrengths paths are covered
  • Test correctness: All tests accurately reflect current production behavior
  • CI status: PR description reports all 39 tests pass, build clean
  • Risk: Low — test-only change, no production code modified

Methodology: 3 independent reviewers with adversarial consensus (initial review + targeted follow-ups for disputed findings)

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • api.nuget.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 10.2M

public void InferFromName_CodexVariant_AddsCodeExpert()
{
var caps = ModelCapabilities.InferFromName("new-model-codex");
Assert.True(caps.HasFlag(ModelCapability.CodeExpert));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Variant-only tests use HasFlag instead of Assert.Equal (3/3 reviewers)

These inputs ("new-model-codex", "new-model-mini", "new-model-max") don't match any family branch, so the expected flags are fully deterministic (e.g., codex → exactly CodeExpert). HasFlag can't detect regressions that add spurious flags — for example, if a code change accidentally added ToolUse to codex variants, this test would still pass.

Suggestion: Use Assert.Equal to match the pattern of the family tests:

Assert.Equal(ModelCapability.CodeExpert, caps);

Same for the mini and max variant tests.

Comment on lines +2544 to +2546
Assert.NotEmpty(strengths);
Assert.DoesNotContain("Unknown", strengths, StringComparison.OrdinalIgnoreCase);
Assert.DoesNotContain("Inferred", strengths, StringComparison.OrdinalIgnoreCase);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Known-model strengths test uses only negative assertions (2/3 reviewers)

NotEmpty + DoesNotContain("Unknown") + DoesNotContain("Inferred") confirms the registry-match path was taken, but can't catch a regression where GetStrengths returns the wrong registry entry's text. One reviewer disagreed, noting that adding positive assertions on prose content couples tests to copy edits — a reasonable tradeoff concern.

Suggestion (optional): If desired, add one keyword per model: e.g., Assert.Contains("reasoning", ...) for opus, Assert.Contains("cost", ...) for haiku. Alternatively, accept the current level as sufficient for a "path coverage" test.

Comment on lines +2554 to +2557
var strengths = ModelCapabilities.GetStrengths(slug);
Assert.NotEmpty(strengths);
Assert.DoesNotContain("Unknown", strengths, StringComparison.OrdinalIgnoreCase);
Assert.DoesNotContain("Inferred", strengths, StringComparison.OrdinalIgnoreCase);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Fuzzy-match strengths test doesn't verify correct registry mapping (3/3 reviewers)

These negative-only assertions (NotEmpty, DoesNotContain("Unknown"), DoesNotContain("Inferred")) prove the result didn't fall through to the inferred/unknown fallback paths, but can't verify which registry entry was matched. If the fuzzy loop's iteration order changes and "claude-opus-4.6-1m" resolves to a different model's strengths, this test still passes.

Suggestion: Pin the fuzzy result to its expected canonical entry:

Assert.Equal(ModelCapabilities.GetStrengths("claude-opus-4.6"), strengths);
// (and similarly for "claude-sonnet-4.5-preview" → "claude-sonnet-4.5")

This mirrors the pattern already used in GetCapabilities_FuzzyPrefix_MatchesSameAsExact.

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

1 similar comment
@PureWeen
Copy link
Copy Markdown
Owner Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Expert Code Review completed successfully!

@github-actions

This comment has been minimized.

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Expert Code Review completed successfully!

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 new findings (both 🟡) beyond the 6 already-known items:

  1. 🟡 Sonnet fuzzy-prefix test is a false positiveGetCapabilities_FuzzyPrefix_SonnetVariant_ReturnsSonnetFlags passes even if the fuzzy-matching loop is deleted, because InferFromName returns identical flags (CodeExpert|ToolUse|Fast) to the sonnet registry entries. The opus fuzzy tests don't have this problem (LargeContext distinguishes the paths). A haiku variant test would provide real fuzzy-path coverage.

  2. 🟡 Combined family+variant test is tautologicalInferFromName_SonnetCodexCombined can't detect whether the codex branch fires, because sonnet already provides CodeExpert. Using a haiku+codex combination would make the codex contribution observable.

No 🔴 bugs or regressions found. The remaining tests (opus fuzzy, InferFromName families, GetStrengths paths, unknown model) are well-constructed.

Generated by Expert Code Review for issue #775 · ● 11.8M

Comment on lines +2510 to +2517
Assert.True(caps.HasFlag(ModelCapability.ToolUse));
Assert.True(caps.HasFlag(ModelCapability.Fast));
}

[Fact]
public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()
{
// "gpt-4.1" has Fast|CostEfficient|ToolUse; "gpt-4.1-preview" unambiguously fuzzy-matches it
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 False-positive test — cannot detect fuzzy-matching regressions

Both sonnet registry entries (claude-sonnet-4.5, claude-sonnet-4) have flags CodeExpert|ToolUse|Fast. InferFromName for any slug containing "sonnet" also returns CodeExpert|ToolUse|Fast (line 74 of ModelCapabilities.cs). The flag sets are identical.

This means if the entire fuzzy-prefix loop in GetCapabilities (lines 54-57) were deleted, these tests would still pass — InferFromName produces the exact same result. Compare with the opus fuzzy tests above, which do provide real coverage because LargeContext is in the registry entry but NOT in InferFromName's opus output.

Concrete failing scenario: Delete the foreach fuzzy loop from GetCapabilities → opus fuzzy tests fail (good), sonnet fuzzy tests still pass (bad — zero coverage of the code path they claim to test).

Suggested fix: Replace with a model whose registry entry diverges from InferFromName, or add an Assert.Equal against the exact registry value paired with a separate test proving InferFromName returns a different value. Alternatively, test a haiku variant (registry has ToolUse, InferFromName doesn't):

[Fact]
public void GetCapabilities_FuzzyPrefix_HaikuVariant_ReturnsRegistryFlags()
{
    var caps = ModelCapabilities.GetCapabilities("claude-haiku-4.5-fast");
    // Registry: Fast|CostEfficient|ToolUse; InferFromName: Fast|CostEfficient (no ToolUse)
    Assert.True(caps.HasFlag(ModelCapability.ToolUse)); // only passes via fuzzy
}

Comment on lines +2471 to +2479
{
// sonnet family gives CodeExpert|ToolUse|Fast; codex adds CodeExpert (idempotent via flags)
var caps = ModelCapabilities.InferFromName("sonnet-codex-experimental");
Assert.True(caps.HasFlag(ModelCapability.CodeExpert));
Assert.True(caps.HasFlag(ModelCapability.ToolUse));
Assert.True(caps.HasFlag(ModelCapability.Fast));
}

[Fact]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Tautological test — codex branch is unobservable

The sonnet family already provides CodeExpert (line 74 of ModelCapabilities.cs), and CodeExpert is the only flag the codex variant adds (line 79). So the |= from the codex branch is a no-op here — the test result is identical whether or not the codex variant code exists.

Concrete failing scenario: Delete line 79 (if (lower.Contains("codex")) caps |= ModelCapability.CodeExpert;) → this test still passes, because sonnet already supplied CodeExpert|ToolUse|Fast.

Suggested fix: Use a family that does NOT include CodeExpert, so the codex variant's contribution is observable:

[Fact]
public void InferFromName_HaikuCodexCombined_AddsBothFamilyAndVariantFlags()
{
    // haiku → Fast|CostEfficient; codex → |= CodeExpert (observable addition)
    var caps = ModelCapabilities.InferFromName("haiku-codex-v2");
    Assert.True(caps.HasFlag(ModelCapability.Fast));
    Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
    Assert.True(caps.HasFlag(ModelCapability.CodeExpert)); // fails if codex branch removed
}

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Expert Code Review completed successfully!

@github-actions
Copy link
Copy Markdown
Contributor

Expert Code Review — PR #775

Methodology: 3 independent reviewers with adversarial consensus (2 follow-up rounds for disputed findings)

Findings

# Severity Consensus File Line(s) Finding
1 🟡 MODERATE 3/3 SessionOrganizationTests.cs L2438–2444 2438–2444 Gemini tests encode "mini" substring false positive as expected behavior. InferFromName("gemini-4-pro") gets Fast|CostEfficient because "gemini" contains "mini" at index 2. The test asserts these flags with a comment calling it "expected production code behaviour," cementing a real bug as a contract. When the production code is fixed (e.g., lower.Contains("-mini")), these tests will regress. Consider adding a // BUG: or // TODO: comment referencing a tracking issue instead of "expected behaviour."
2 🟡 MODERATE 3/3 (after follow-up) SessionOrganizationTests.cs L2428–2432 2428–2432 Haiku inference omits ToolUse, inconsistent with registry. The registry entry for claude-haiku-4.5 has Fast | CostEfficient | ToolUse, but InferFromName's haiku branch only produces Fast | CostEfficient. The test asserts this gap without comment. Novel haiku models (e.g., claude-haiku-5) falling through to inference will trigger a spurious "This model may not support tool use well" warning from GetRoleWarnings. Consider adding a comment noting the ToolUse omission, or filing a follow-up to add ToolUse to the haiku inference branch.
3 🟢 MINOR 2/3 SessionOrganizationTests.cs L2515 2515 Misleading test method name. GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags tests "gpt-4.1-preview" — a preview variant, not a mini variant. The name suggests mini-variant coverage that doesn't exist. Rename to GetCapabilities_FuzzyPrefix_Gpt41PreviewVariant_ReturnsGpt41Flags.
4 🟢 MINOR 3/3 SessionOrganizationTests.cs L2502–2512 2502–2512 Fuzzy match tests are silently order-dependent. "claude-sonnet-4.5-preview" prefix-matches both "claude-sonnet-4" and "claude-sonnet-4.5" in the registry. Tests pass because both entries have identical flags today, masking the ambiguity. If flags diverge or insertion order changes, results become non-deterministic. Consider adding a comment noting this assumption, or adding a test that explicitly verifies longest-prefix-wins behavior.

Discarded Findings

Finding Flagged by Outcome
Inferred strengths string omits "tool use" text for opus models 1/3 Disproven — ToolUse is intentionally not mapped in the GetStrengths inferred path; test correctly reflects production behavior

CI Status

⚪ Pending — no check runs reported yet.

Test Coverage Assessment

This is a test-only PR adding 39 new [Theory]/[Fact] tests covering InferFromName, GetCapabilities (fuzzy prefix matching), and GetStrengths (all 4 code paths). The tests are well-structured, use appropriate xUnit patterns, and assertions are verified correct against the production code. No production code changes require additional test coverage.

Overall Assessment

Clean test-only PR with good coverage. The two moderate findings are about tests encoding existing production quirks (gemini/mini substring match, haiku missing ToolUse) as expected behavior without tracking comments — worth addressing to prevent future confusion, but not blocking.

Generated by Expert Code Review · 3 independent reviewers with adversarial consensus

Warning

⚠️ Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

  • api.nuget.org
  • dc.services.visualstudio.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"
    - "dc.services.visualstudio.com"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 12.9M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full production code in ModelCapabilities.cs and all 19 new tests. One finding:

🟡 Gemini "mini" substring contamination locked in by testInferFromName line 80 uses Contains("mini") which matches the substring inside "gemini", incorrectly adding Fast|CostEfficient to all Gemini models. The test asserts these flags as "expected". Fix: use Contains("-mini") in production code and update the test accordingly.

No other regressions, bugs, security, data-loss, race conditions, or test-correctness issues found. All other assertions are accurate and robust against Dictionary iteration order.

Warning

⚠️ Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

  • api.nuget.org
  • dc.services.visualstudio.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "api.nuget.org"
    - "dc.services.visualstudio.com"

See Network Configuration for more information.

Generated by Expert Code Review for issue #775 · ● 12.9M

// adding Fast|CostEfficient. This is expected production code behaviour.
Assert.True(caps.HasFlag(ModelCapability.Fast));
Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Test locks in a production bug — "gemini" contains "mini" as a substring

The production code at ModelCapabilities.cs:80 uses lower.Contains("mini"), which matches the "mini" inside "ge**mini**". This causes every Gemini model inferred via InferFromName to incorrectly receive Fast | CostEfficient.

This test asserts those incorrect flags and documents it as "expected production code behaviour", which locks in the bug — any future fix to the variant matching would break these assertions.

Concrete impact: GetStrengths("gemini-4-pro") returns "Inferred: reasoning, fast, cost-efficient, multimodal, large context", mislabeling a hypothetical premium model as "fast, cost-efficient".

Suggested fix — in ModelCapabilities.cs line 80, change:

if (lower.Contains("mini")) caps |= ...

to:

if (lower.Contains("-mini")) caps |= ...

All real "mini" variant slugs use a -mini suffix (gpt-5-mini, gpt-5.1-codex-mini). Then update this test to not assert Fast/CostEfficient for Gemini models.

@github-actions
Copy link
Copy Markdown
Contributor

Expert Code Review — PR #775

Methodology: 3 independent reviewers with adversarial consensus (+ 2 targeted follow-ups for 1 disputed finding)

3 findings posted as inline comments (1 🟡 moderate, 2 🟢 minor)

# Severity Consensus File Line(s) Finding
1 🟡 MODERATE 2/3 reviewers SessionOrganizationTests.cs 2504 Non-deterministic fuzzy match — claude-sonnet-4.5-preview matches two registry keys (claude-sonnet-4 and claude-sonnet-4.5); Dictionary<> iteration order is unspecified
2 🟢 MINOR 2/3 reviewers SessionOrganizationTests.cs 2515 Misleading test name GptMiniVariant — test slug gpt-4.1-preview has no "mini" substring
3 🟢 MINOR 3/3 (after follow-up) SessionOrganizationTests.cs 2492–2496 Fuzzy-prefix tests use HasFlag instead of Assert.Equal — can't detect spurious extra flags

Assessment

  • CI status: All 6 check runs passed ✅
  • Test coverage: Comprehensive — all InferFromName family/variant branches, fuzzy prefix matching, and all 4 GetStrengths paths covered
  • Risk: Low — test-only change, no production code modified
  • Prior reviews: 4 prior automated reviews exist with 8 findings already posted; the 3 findings above are new

Generated by Expert Code Review · 3 independent reviewers with adversarial consensus

Generated by Expert Code Review · ● 12.1M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One genuine finding — the new GetStrengths tests for inferred models provide false confidence because ToolUse is inferred but never rendered in the strengths string.

GetStrengths silently drops ToolUse for all inferred models (Medium)

InferFromName sets ToolUse for opus (line 73) and sonnet (line 74), but the strengths rendering block (lines 100–106 of ModelCapabilities.cs) has no branch for ToolUse — it's the only capability flag without one. So GetStrengths("claude-opus-99") returns "Inferred: reasoning, code" with no mention of tool use, even though GetCapabilities("claude-opus-99") returns ReasoningExpert | CodeExpert | ToolUse.

The new test GetStrengths_InferredOpus_ContainsReasoningAndCode passes by checking "reasoning" and "code" but never catches the missing "tool" — adding Assert.Contains("tool", strengths, OrdinalIgnoreCase) would expose the bug.

Fix: add if (inferred.HasFlag(ModelCapability.ToolUse)) parts.Add("tool use"); to the rendering block, and add a "tool" assertion to the test.

Generated by Expert Code Review · ● 12.1M


[Theory]
[InlineData("claude-sonnet-4.5-20250514")]
[InlineData("claude-sonnet-4.5-preview")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — Non-deterministic fuzzy match: claude-sonnet-4.5-preview matches two registry keys (2/3 reviewers)

"claude-sonnet-4.5-preview" satisfies StartsWith(key) for both "claude-sonnet-4.5" and "claude-sonnet-4" in the registry dictionary. Dictionary<> enumeration order is unspecified in .NET, so which key "wins" the fuzzy match is nondeterministic.

The test passes today only because both keys share identical flags (CodeExpert|ToolUse|Fast) and the same Strengths string. If either entry diverges in a future registry update, this test becomes unpredictably flaky.

The same ambiguity affects GetStrengths_FuzzyMatchedModels_ReturnRegistryStrengths (line 2551) which also uses "claude-sonnet-4.5-preview".

Fix: Use a slug that unambiguously matches exactly one registry key, or add an assertion that verifies the exact matched key.

Comment on lines +2492 to +2496
public void GetCapabilities_FuzzyPrefix_OpusVariant_ReturnsOpusFlags(string slug)
{
var caps = ModelCapabilities.GetCapabilities(slug);
// claude-opus-4.6 registry entry: ReasoningExpert|CodeExpert|ToolUse|LargeContext
Assert.True(caps.HasFlag(ModelCapability.ReasoningExpert));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Fuzzy-prefix tests use HasFlag instead of Assert.Equal (3/3 reviewers after follow-up)

All three fuzzy-prefix tests (OpusVariant, SonnetVariant, GptMiniVariant) use Assert.True(caps.HasFlag(...)), which only verifies minimum flags are present. If GetCapabilities starts returning extra unintended flags for fuzzy-prefix slugs, these tests would silently pass.

Since fuzzy-prefix resolution is expected to mirror a specific registry entry exactly, Assert.Equal(expected, caps) is the stronger and more appropriate contract check — consistent with the InferFromName family tests above which already use Assert.Equal.

Note: GetCapabilities_FuzzyPrefix_MatchesSameAsExact (line 2526) already uses Assert.Equal for opus, which is good — extending this pattern to all fuzzy-prefix tests would close the gap.

}

[Fact]
public void GetCapabilities_FuzzyPrefix_GptMiniVariant_ReturnsGpt4Flags()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — Misleading test name GptMiniVariant (2/3 reviewers)

"gpt-4.1-preview" contains no "mini" substring and no mini-variant logic fires — this test exercises the basic prefix fuzzy path (gpt-4.1-preview → gpt-4.1). The name will confuse future readers searching for mini-variant coverage.

Suggestion: Rename to something like GetCapabilities_FuzzyPrefix_Gpt41Preview_ReturnsGpt41Flags.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert Code Review: 1 finding posted inline (minor). See summary comment for details.

Generated by Expert Code Review · ● 2.7M

Comment on lines +2444 to +2449
Assert.True(caps.HasFlag(ModelCapability.CostEfficient));
}

[Fact]
public void InferFromName_CodexVariant_AddsCodeExpert()
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR · 2/3 reviewers

Enshrining a production quirk as required test behavior. These assertions lock in the "gemini""mini" substring false-positive as expected, which means a future fix to use word-boundary matching in InferFromName would break these tests.

Suggestion: Consider splitting the gemini assertions — keep ReasoningExpert | LargeContext | Vision as the required core, and assert Fast | CostEfficient separately with a comment like // TODO: Known quirk — "gemini" contains "mini". Remove when InferFromName uses word-boundary matching. This way a future fix only needs to update the quirk assertion, not the core gemini test.

@PureWeen
Copy link
Copy Markdown
Owner Author

/review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant