Fix cache position phase detection 🤖🤖🤖 #243

Open
fallintoplace wants to merge 1 commit into NVIDIA:main from fallintoplace:fix/cache-position-phase-detection

@fallintoplace


Summary

  • Add a shared is_prefilling helper that accounts for cache_position being zero-based.
  • Use the helper across prefill-only and decoding press hooks so phase detection is consistent.
  • Add focused regression coverage for single-token prefill versus single-token decoding.
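
To illustrate how one shared helper keeps prefill-only and decoding hooks in agreement, here is a minimal sketch. It assumes the helper takes the cache_position values and the query length q_len; the signature, the plain-list input (the real code works on tensors), and the ToyPrefillPress / ToyDecodingPress classes are all hypothetical and not taken from the PR diff.

```python
from typing import Sequence


def is_prefilling(cache_position: Sequence[int], q_len: int) -> bool:
    """Hypothetical helper: True while the prompt is still being processed.

    cache_position is zero-based, so the number of tokens seen so far is
    cache_position[-1] + 1 rather than cache_position[-1].
    """
    return cache_position[-1] + 1 <= q_len


class ToyPrefillPress:
    """Hypothetical press that only acts during prefill."""

    def forward_hook(self, cache_position: Sequence[int], q_len: int) -> str:
        if not is_prefilling(cache_position, q_len):
            return "skip"  # decoding step: leave the cache untouched
        return "compress"


class ToyDecodingPress:
    """Hypothetical press that only acts during decoding."""

    def forward_hook(self, cache_position: Sequence[int], q_len: int) -> str:
        if is_prefilling(cache_position, q_len):
            return "skip"  # prefill step: nothing to do yet
        return "compress"


if __name__ == "__main__":
    # Single-token prefill: position 0, one query token.
    print(ToyPrefillPress().forward_hook([0], 1), ToyDecodingPress().forward_hook([0], 1))
    # First decode step after a one-token prompt: position 1, one query token.
    print(ToyPrefillPress().forward_hook([1], 1), ToyDecodingPress().forward_hook([1], 1))
```

Because both toy hooks call the same function, they cannot disagree about which phase a given step belongs to.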

Root cause

cache_position is zero-based, so checks like cache_position[-1] <= q_len missed the +1 when comparing against q_len. That made cache_position=[1] with q_len=1 look like prefill even though it is the first decode step after a one-token prompt.
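
The off-by-one can be reproduced with plain integers. The sketch below is an assumption-laden illustration rather than the repository's code: legacy_looks_like_prefill mirrors the cache_position[-1] <= q_len check quoted above, and tokens_seen is a hypothetical name for the count implied by a zero-based index.

```python
from typing import Sequence


def legacy_looks_like_prefill(cache_position: Sequence[int], q_len: int) -> bool:
    # The check described above: compares a zero-based index with a length.
    return cache_position[-1] <= q_len


def tokens_seen(cache_position: Sequence[int]) -> int:
    # A zero-based last index of k means k + 1 tokens have been processed.
    return cache_position[-1] + 1


# First decode step after a one-token prompt.
cache_position, q_len = [1], 1

legacy_result = legacy_looks_like_prefill(cache_position, q_len)
seen = tokens_seen(cache_position)

# The legacy check reports prefill, although more tokens have been seen
# than the current query holds, which can only happen while decoding.
print(legacy_result, seen, seen > q_len)
```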

Validation

  • uv run pytest tests/test_phase_detection.py
  • uv run pytest tests/test_decoding_compression.py::test_decoding_press_reuse_across_sequences
  • uv run flake8 kvpress/presses/base_press.py kvpress/presses/decoding_press.py kvpress/presses/prefill_decoding_press.py kvpress/presses/cam_press.py kvpress/presses/dms_press.py kvpress/presses/fastkvzip_press.py tests/test_phase_detection.py
  • uv run mypy kvpress/presses/base_press.py kvpress/presses/decoding_press.py kvpress/presses/prefill_decoding_press.py kvpress/presses/cam_press.py kvpress/presses/dms_press.py kvpress/presses/fastkvzip_press.py --check-untyped-defs
  • make style

@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
