Skip to content

Fix HTTP-continuation loop: externalize table-function scan state as a cursor#1

Merged
rustyconover merged 1 commit into
mainfrom
fix/http-continuation-cursor
Jun 24, 2026
Merged

Fix HTTP-continuation loop: externalize table-function scan state as a cursor#1
rustyconover merged 1 commit into
mainfrom
fix/http-continuation-cursor

Conversation

@rustyconover

Copy link
Copy Markdown
Contributor

The bug

All five vision table functions (classify x4 overloads, image_classes()) were
TableFunctionGenerator[Args] with state: None, emitting all rows in one
out.emit() then out.finish().

Over the stateless http transport the framework round-trips a producer's
per-scan state through a continuation token and emits at most one producer batch
per response. A position-less (None) state restarts from row 0 on every resume
and loops forever once the result exceeds one batch. image_classes() = 1000
rows is the smoking gun. subprocess/unix keep live state in-process, so they hide
the bug; only http (and a serialize-between-ticks unit test) expose it.

The fix (mirrors vgi-search ScanState)

  • Add ROWS_PER_TICK = 64 and ScanState(ArrowSerializableDataclass) with
    started/offset/rows_ipc, plus result_to_ipc/ipc_to_table helpers.
  • Convert all five functions to TableFunctionGenerator[Args, ScanState] with
    initial_state. First tick materializes the full output batch into
    state.rows_ipc; each tick emits one bounded ROWS_PER_TICK slice from
    offset, advances offset before out.emit() (so a tick suspended at the
    limit-1 boundary serializes the advanced offset), and finishes when drained.
  • The NULL/empty-image early-out (zero-row batch + finish) is preserved. Rows and
    schema are byte-identical to before.

Validation (fail-old / pass-new)

  • Test harness invoke_table_function(..., serialize_state=True) now models http:
    caps each tick to one producer batch and wire-serializes ScanState between
    every tick, with a 1000-tick guard. On the old emit-all/None code it
    overruns the guard (re-emits row 0); on the cursor code it terminates with
    each row emitted exactly once. Confirmed: the new tests fail on the old
    tables.py (guard trips) and pass on the new one.
  • TestScanStateRoundTrip (model-gated, 1000-row image_classes) and
    TestCursorSurvivesContinuation (offline, model.classify_image monkeypatched
    to 200 synthetic preds so it runs without the ONNX weights), plus small-chunk
    (ROWS_PER_TICK=2) and no-preds cases.
  • test/sql/classify.test asserts image_classes() pages correctly over http:
    count, dense idx 0..999, ordered head at OFFSET 500.

Verified locally: pytest -q green; SQL E2E green on subprocess, http, and
unix
transports; ruff format/check, mypy, pydoclint all clean.

🤖 Generated with Claude Code

…a cursor

The five vision table functions were TableFunctionGenerator[Args] with
`state: None`, emitting all rows in one `out.emit()` then `out.finish()`. Over
the stateless HTTP transport the framework round-trips per-scan state through a
continuation token and emits at most one producer batch per response; a
position-less state restarts from row 0 on every resume and loops forever once
the result exceeds one batch. `image_classes()` (1000 rows) is the smoking gun.
subprocess/unix keep live state in-process and hide the bug.

Fix (mirrors vgi-search's ScanState):
- Add `ROWS_PER_TICK = 64` and `ScanState(ArrowSerializableDataclass)` with
  `started`/`offset`/`rows_ipc` (+ `result_to_ipc`/`ipc_to_table` helpers).
- Convert all five functions to TableFunctionGenerator[Args, ScanState] with
  `initial_state`. First tick materializes the full output batch into
  `state.rows_ipc`; each tick emits one bounded ROWS_PER_TICK slice from
  `offset`, advances `offset` BEFORE emit (so a tick suspended at the limit-1
  boundary serializes the advanced offset), and finishes when drained. The
  NULL/empty-image early-out (zero-row batch + finish) is preserved. Rows and
  schema are byte-identical.

Validation:
- harness `invoke_table_function(..., serialize_state=True)` models http by
  capping each tick to one producer batch and wire-serializing ScanState between
  every tick, with a 1000-tick guard. On the old emit-all/None code it overruns
  the guard (re-emits row 0); on the cursor code it terminates once per row.
- New TestScanStateRoundTrip (model-gated, 1000-row image_classes) and
  TestCursorSurvivesContinuation (offline, monkeypatched classify_image -> 200
  synthetic preds) tests, plus small-chunk and no-preds cases.
- classify.test asserts image_classes() pages correctly over http (count, dense
  idx, ordered head at offset 500). Verified locally green on subprocess, http,
  and unix transports.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@rustyconover rustyconover merged commit 22a2c93 into main Jun 24, 2026
10 checks passed
@rustyconover rustyconover deleted the fix/http-continuation-cursor branch June 24, 2026 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant