
test(coding-agent): fix compaction test broken by model catalog window drift #118

Merged: code-yeongyu merged 1 commit into main from fix/compaction-test-context-window-drift on Jul 5, 2026
@code-yeongyu code-yeongyu commented Jul 5, 2026

Owner

Problem

The Release publish-npm job failed on one test in packages/coding-agent:

agent-session-auto-compaction-queue.test.ts > should trigger threshold compaction for error messages using last successful usage
AssertionError: expected "_runAutoCompaction" to be called with arguments: [ 'threshold', false ]
Number of calls: 0

This blocked publishing v2026.7.5 (nothing was published). It is not a regression in this cycle's code — the test source and all compaction logic are unchanged since v2026.7.4 (whose publish job passed this test).

Root cause

The test uses getModel("anthropic", "claude-sonnet-4-5") and sets up 190K tokens of "last successful usage", expecting threshold compaction to fire. shouldCompact triggers when contextTokens > contextWindow - reserveTokens (default reserveTokens = 16384).

The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's contextWindow from 200000 to 1000000:

  • Old: 190000 > 200000 - 16384 = 183616 → true → compaction fired → test passed.
  • New: 190000 > 1000000 - 16384 = 983616 → false → no compaction → test failed.

So the test hardcoded an assumption about a live-catalog value. This is the same drift class as the sibling example-model-id fix (#116).
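
The threshold arithmetic can be reproduced with a small sketch. `shouldCompactSketch` is a hypothetical stand-in written for illustration; the real `shouldCompact` in packages/coding-agent may have a different signature, and only the comparison `contextTokens > contextWindow - reserveTokens` and the 16384 default are taken from the description above.

```typescript
// Hypothetical stand-in for the predicate described in this PR.
// Only the comparison and the default reserve come from the PR text;
// the function name and parameter order are assumptions.
const DEFAULT_RESERVE_TOKENS = 16_384;

function shouldCompactSketch(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number = DEFAULT_RESERVE_TOKENS,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// Same 190K usage, evaluated against the old and the refreshed window.
const firedWithOldWindow = shouldCompactSketch(190_000, 200_000); // 190000 > 183616
const firedWithNewWindow = shouldCompactSketch(190_000, 1_000_000); // 190000 > 983616

console.log(firedWithOldWindow, firedWithNewWindow); // true false
```

The fixed 190K figure only crosses the threshold while the window stays near 200K, which is why a catalog refresh alone was enough to flip the result.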

Fix

Derive the near-limit usage from the model's actual contextWindow (window - 8000, which stays above the compaction threshold of window - 16384), so the assertion holds for any window size. Behavior is unchanged; only the test fixture is made drift-proof.
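
As a rough illustration of why the derived fixture is drift-proof, the sketch below computes usage as `contextWindow - 8000` and checks it against the threshold for several window sizes. The helper names (`nearLimitUsage`, `crossesThreshold`) and the sample windows are hypothetical; the actual test reads the window from `getModel("anthropic", "claude-sonnet-4-5")`, which is not reproduced here.

```typescript
// Values taken from the PR text; helper names are illustrative only.
const RESERVE_TOKENS = 16_384;
const NEAR_LIMIT_MARGIN = 8_000;

// Usage derived from the window instead of a hardcoded 190K.
function nearLimitUsage(contextWindow: number): number {
  return contextWindow - NEAR_LIMIT_MARGIN;
}

// Same comparison the PR describes for shouldCompact.
function crossesThreshold(contextTokens: number, contextWindow: number): boolean {
  return contextTokens > contextWindow - RESERVE_TOKENS;
}

// Sample windows: the old and new Sonnet 4.5 values plus an arbitrary one.
const sampleWindows = [200_000, 1_000_000, 128_000];
const allCross = sampleWindows.every((w) => crossesThreshold(nearLimitUsage(w), w));

console.log(allCross); // true
```

Because the margin (8000) is smaller than the reserve (16384), the derived usage exceeds the threshold by 8384 tokens whatever the window is, provided the reserve keeps its default.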

Testing

  • The failing test now passes; full file 6/6 green (RED before, GREEN after).
  • Full npm run check (incl. check:neo) green.

🤖 Generated with Claude Code


Summary by cubic

Fix publish-blocking compaction test by deriving near-limit token usage from the model’s contextWindow, making the test resilient to model catalog window changes.

  • Bug Fixes
    • Replace hardcoded 190K tokens with model.contextWindow - 8_000 (above the threshold of contextWindow - 16,384) so threshold compaction reliably triggers regardless of window size.
    • No behavior changes; only the test fixture is updated.

Written for commit a9609d5. Summary will update on new commits.


…indow

The 'threshold compaction for error messages using last successful usage' test
hardcoded 190K token usage and relied on Claude Sonnet 4.5's 200K context window
so that usage crossed the compaction threshold (contextWindow - reserveTokens).
The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's
window to 1M, so 190K no longer crosses the threshold and the test failed in the
publish-npm job (which runs the full coding-agent suite), blocking releases.

Derive the near-limit usage from the model's actual contextWindow (window - 8K,
above the 16384 reserve) so the assertion is robust to catalog window changes.
Behavior is unchanged; only the test fixture is made drift-proof.

Co-Authored-By: Claude Fable 5 <[email protected]>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@code-yeongyu code-yeongyu merged commit c2dad55 into main Jul 5, 2026
3 checks passed
@code-yeongyu code-yeongyu deleted the fix/compaction-test-context-window-drift branch July 5, 2026 17:05