Fix HTTP-continuation loop: externalize table-function scan state as a cursor#1
Merged
Merged
Conversation
…a cursor The five vision table functions were TableFunctionGenerator[Args] with `state: None`, emitting all rows in one `out.emit()` then `out.finish()`. Over the stateless HTTP transport the framework round-trips per-scan state through a continuation token and emits at most one producer batch per response; a position-less state restarts from row 0 on every resume and loops forever once the result exceeds one batch. `image_classes()` (1000 rows) is the smoking gun. subprocess/unix keep live state in-process and hide the bug. Fix (mirrors vgi-search's ScanState): - Add `ROWS_PER_TICK = 64` and `ScanState(ArrowSerializableDataclass)` with `started`/`offset`/`rows_ipc` (+ `result_to_ipc`/`ipc_to_table` helpers). - Convert all five functions to TableFunctionGenerator[Args, ScanState] with `initial_state`. First tick materializes the full output batch into `state.rows_ipc`; each tick emits one bounded ROWS_PER_TICK slice from `offset`, advances `offset` BEFORE emit (so a tick suspended at the limit-1 boundary serializes the advanced offset), and finishes when drained. The NULL/empty-image early-out (zero-row batch + finish) is preserved. Rows and schema are byte-identical. Validation: - harness `invoke_table_function(..., serialize_state=True)` models http by capping each tick to one producer batch and wire-serializing ScanState between every tick, with a 1000-tick guard. On the old emit-all/None code it overruns the guard (re-emits row 0); on the cursor code it terminates once per row. - New TestScanStateRoundTrip (model-gated, 1000-row image_classes) and TestCursorSurvivesContinuation (offline, monkeypatched classify_image -> 200 synthetic preds) tests, plus small-chunk and no-preds cases. - classify.test asserts image_classes() pages correctly over http (count, dense idx, ordered head at offset 500). Verified locally green on subprocess, http, and unix transports. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
All five vision table functions (
classifyx4 overloads,image_classes()) wereTableFunctionGenerator[Args]withstate: None, emitting all rows in oneout.emit()thenout.finish().Over the stateless http transport the framework round-trips a producer's
per-scan state through a continuation token and emits at most one producer batch
per response. A position-less (
None) state restarts from row 0 on every resumeand loops forever once the result exceeds one batch.
image_classes()= 1000rows is the smoking gun. subprocess/unix keep live state in-process, so they hide
the bug; only http (and a serialize-between-ticks unit test) expose it.
The fix (mirrors vgi-search
ScanState)ROWS_PER_TICK = 64andScanState(ArrowSerializableDataclass)withstarted/offset/rows_ipc, plusresult_to_ipc/ipc_to_tablehelpers.TableFunctionGenerator[Args, ScanState]withinitial_state. First tick materializes the full output batch intostate.rows_ipc; each tick emits one boundedROWS_PER_TICKslice fromoffset, advancesoffsetbeforeout.emit()(so a tick suspended at thelimit-1 boundary serializes the advanced offset), and finishes when drained.
schema are byte-identical to before.
Validation (fail-old / pass-new)
invoke_table_function(..., serialize_state=True)now models http:caps each tick to one producer batch and wire-serializes
ScanStatebetweenevery tick, with a 1000-tick guard. On the old emit-all/
Nonecode itoverruns the guard (re-emits row 0); on the cursor code it terminates with
each row emitted exactly once. Confirmed: the new tests fail on the old
tables.py(guard trips) and pass on the new one.TestScanStateRoundTrip(model-gated, 1000-rowimage_classes) andTestCursorSurvivesContinuation(offline,model.classify_imagemonkeypatchedto 200 synthetic preds so it runs without the ONNX weights), plus small-chunk
(
ROWS_PER_TICK=2) and no-preds cases.test/sql/classify.testassertsimage_classes()pages correctly over http:count, dense
idx0..999, ordered head at OFFSET 500.Verified locally:
pytest -qgreen; SQL E2E green on subprocess, http, andunix transports; ruff format/check, mypy, pydoclint all clean.
🤖 Generated with Claude Code