test(coding-agent): fix compaction test broken by model catalog window drift #118
Merged
…indow

The 'threshold compaction for error messages using last successful usage' test hardcoded 190K token usage and relied on Claude Sonnet 4.5's 200K context window so that usage crossed the compaction threshold (contextWindow - reserveTokens). The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's window to 1M, so 190K no longer crosses the threshold and the test failed in the publish-npm job (which runs the full coding-agent suite), blocking releases.

Derive the near-limit usage from the model's actual contextWindow (window - 8K, above the 16384 reserve) so the assertion is robust to catalog window changes. Behavior is unchanged; only the test fixture is made drift-proof.

Co-Authored-By: Claude Fable 5 <[email protected]>
No issues found across 1 file
Problem

The Release publish-npm job failed on one test in packages/coding-agent, 'threshold compaction for error messages using last successful usage'.

This blocked publishing v2026.7.5 (nothing was published). It is not a regression in this cycle's code: the test source and all compaction logic are unchanged since v2026.7.4 (whose publish job passed this test).

Root cause

The test uses getModel("anthropic", "claude-sonnet-4-5") and sets up 190K tokens of "last successful usage", expecting threshold compaction to fire. shouldCompact triggers when contextTokens > contextWindow - reserveTokens (default reserveTokens = 16384).

The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's contextWindow from 200000 to 1000000:

- 190000 > 200000 - 16384 = 183616 → true → compaction fired → test passed.
- 190000 > 1000000 - 16384 = 983616 → false → no compaction → test failed.

So the test hardcoded an assumption about a live-catalog value. This is the same drift class as the sibling example-model-id fix (#116).
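The two comparisons above can be reproduced with a small sketch. The function name shouldCompactSketch and its signature are assumptions made for illustration; the real shouldCompact in packages/coding-agent may take different arguments. Only the inequality and the 16384 default come from the description.

```typescript
// Hypothetical stand-in for the predicate described above; the real
// shouldCompact signature in packages/coding-agent may differ.
const DEFAULT_RESERVE_TOKENS = 16384;

function shouldCompactSketch(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number = DEFAULT_RESERVE_TOKENS,
): boolean {
  // Compaction fires only when usage strictly exceeds the threshold.
  return contextTokens > contextWindow - reserveTokens;
}

// Same hardcoded usage, two catalog values for the window.
const withOldWindow = shouldCompactSketch(190000, 200000); // threshold 183616
const withNewWindow = shouldCompactSketch(190000, 1000000); // threshold 983616

console.log(withOldWindow, withNewWindow); // prints: true false
```

The hardcoded 190000 is the only input that stays fixed, which is why the outcome flips when the catalog value changes.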
Fix

Derive the near-limit usage from the model's actual contextWindow (window - 8000, which stays above the compaction threshold because 8000 is smaller than the 16384 reserve) so the assertion holds for any window size. Behavior is unchanged; only the test fixture is made drift-proof.

Testing

npm run check (incl. check:neo) green.

🤖 Generated with Claude Code
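As a rough illustration of why the derived fixture in the Fix section is independent of the window size, the sketch below recomputes the usage for both catalog values. The helper names nearLimitUsage and crossesThreshold are hypothetical and are not taken from the test file; the constants 8000 and 16384 are the ones stated in the description.

```typescript
// Values taken from the PR description.
const RESERVE_TOKENS_SKETCH = 16384;
const HEADROOM_TOKENS = 8000;

// Hypothetical helper: the real test may simply inline this arithmetic.
function nearLimitUsage(contextWindow: number): number {
  return contextWindow - HEADROOM_TOKENS;
}

// Same inequality as the compaction threshold described in the root cause.
function crossesThreshold(contextTokens: number, contextWindow: number): boolean {
  return contextTokens > contextWindow - RESERVE_TOKENS_SKETCH;
}

for (const contextWindow of [200000, 1000000]) {
  const usage = nearLimitUsage(contextWindow);
  // Prints "200000 192000 true" and then "1000000 992000 true".
  console.log(contextWindow, usage, crossesThreshold(usage, contextWindow));
}
```

Because window - 8000 > window - 16384 reduces to 8000 < 16384, the comparison no longer depends on the catalog value at all, provided the reserve stays above the chosen headroom.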
Summary by cubic
Fix publish-blocking compaction test by deriving near-limit token usage from the model's contextWindow, making the test resilient to model catalog window changes.

The fixture uses model.contextWindow - 8_000 (above the 16,384 reserve) so threshold compaction reliably triggers regardless of window size.

Written for commit a9609d5.