test(coding-agent): fix compaction test broken by model catalog window drift by code-yeongyu · Pull Request #118 · code-yeongyu/senpi

code-yeongyu · 2026-07-05T16:59:43Z

Problem

The Release publish-npm job failed on one test in packages/coding-agent:

agent-session-auto-compaction-queue.test.ts > should trigger threshold compaction for error messages using last successful usage
AssertionError: expected "_runAutoCompaction" to be called with arguments: [ 'threshold', false ]
Number of calls: 0

This blocked publishing v2026.7.5 (nothing was published). It is not a regression in this cycle's code — the test source and all compaction logic are unchanged since v2026.7.4 (whose publish job passed this test).

Root cause

The test uses getModel("anthropic", "claude-sonnet-4-5") and sets up 190K tokens of "last successful usage", expecting threshold compaction to fire. shouldCompact triggers when contextTokens > contextWindow - reserveTokens (default reserveTokens = 16384).

The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's contextWindow from 200000 to 1000000:

Old: 190000 > 200000 - 16384 = 183616 → true → compaction fired → test passed.
New: 190000 > 1000000 - 16384 = 983616 → false → no compaction → test failed.

So the test hardcoded an assumption about a live-catalog value. This is the same drift class as the sibling example-model-id fix (#116).
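The arithmetic above can be reproduced with a small sketch. This is not the project's actual shouldCompact; the function name shouldCompactSketch, its parameter list, and the constant name are assumptions made for illustration. Only the comparison and the default of 16384 come from the description.

```typescript
// Hypothetical stand-in for the threshold check described above.
// The real shouldCompact in packages/coding-agent may have a different signature.
const DEFAULT_RESERVE_TOKENS = 16384;

function shouldCompactSketch(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number = DEFAULT_RESERVE_TOKENS,
): boolean {
  // Compaction fires only when usage is strictly above (window - reserve).
  return contextTokens > contextWindow - reserveTokens;
}

// Same 190K usage, evaluated against the old and the refreshed catalog window.
const withOldWindow = shouldCompactSketch(190_000, 200_000); // compares against 183616
const withNewWindow = shouldCompactSketch(190_000, 1_000_000); // compares against 983616
```

With the old 200000 window the check is true, and with the refreshed 1000000 window it is false, which matches the zero recorded calls to _runAutoCompaction.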

Fix

Derive the near-limit usage from the model's actual contextWindow: window - 8000 always exceeds the compaction threshold of window - 16384, so the assertion holds for any window size. Behavior is unchanged; only the test fixture is made drift-proof.
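A minimal sketch of why the derived fixture is independent of the window size follows. The helper names nearLimitUsage and crossesThreshold are hypothetical and do not appear in the test file; the 8000 margin and the 16384 reserve are the values quoted above.

```typescript
// Illustrative only: helper names are assumptions, not the project's API.
const RESERVE_TOKENS = 16384; // default reserve quoted in the description
const NEAR_LIMIT_MARGIN = 8_000; // margin used by the fixed fixture

// Usage derived from the window instead of a hardcoded 190K.
function nearLimitUsage(contextWindow: number): number {
  return contextWindow - NEAR_LIMIT_MARGIN;
}

// Same strict comparison as the threshold rule described in the root cause.
function crossesThreshold(contextTokens: number, contextWindow: number): boolean {
  return contextTokens > contextWindow - RESERVE_TOKENS;
}

// Check the derived usage against the old and the refreshed window sizes.
const candidateWindows = [200_000, 1_000_000];
const thresholdResults = candidateWindows.map((contextWindow) =>
  crossesThreshold(nearLimitUsage(contextWindow), contextWindow),
);
```

Because the margin (8000) is smaller than the reserve (16384), the derived usage sits above the threshold for every window in the list, so a future catalog refresh should not flip the result.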

Testing

The failing test now passes; full file 6/6 green (RED before, GREEN after).
Full npm run check (incl. check:neo) green.

🤖 Generated with Claude Code

Summary by cubic

Fix publish-blocking compaction test by deriving near-limit token usage from the model’s contextWindow, making the test resilient to model catalog window changes.

Bug Fixes
- Replace hardcoded 190K tokens with model.contextWindow - 8_000 (above the contextWindow - 16,384 threshold) so threshold compaction reliably triggers regardless of window size.
- No behavior changes; only the test fixture is updated.

Written for commit a9609d5. Summary will update on new commits.

…indow

The 'threshold compaction for error messages using last successful usage' test hardcoded 190K token usage and relied on Claude Sonnet 4.5's 200K context window so that usage crossed the compaction threshold (contextWindow - reserveTokens). The generated model catalog was refreshed from models.dev, growing Sonnet 4.5's window to 1M, so 190K no longer crosses the threshold and the test failed in the publish-npm job (which runs the full coding-agent suite), blocking releases.

Derive the near-limit usage from the model's actual contextWindow (window - 8K, above the 16384 reserve) so the assertion is robust to catalog window changes. Behavior is unchanged; only the test fixture is made drift-proof.

Co-Authored-By: Claude Fable 5 <[email protected]>

cubic-dev-ai

No issues found across 1 file

cubic-dev-ai Bot reviewed Jul 5, 2026

code-yeongyu merged commit c2dad55 into main Jul 5, 2026
3 checks passed

code-yeongyu deleted the fix/compaction-test-context-window-drift branch July 5, 2026 17:05
