Investigate: ProcessingState doesn't trigger response close when input EOSes mid-run #610

While debugging a flaky Restate E2E test (`InvokerMemoryTest`), we observed the SDK keeping its HTTP/2 response stream open for the full 5-second runtime drain timeout when the runtime closes the request stream mid-attempt (e.g., on OOM yield).

Tracing the path:

- `HttpRequestFlowAdapter.handleRequestEnd` (`sdk-http-vertx/src/main/java/dev/restate/sdk/http/vertx/HttpRequestFlowAdapter.java:95`) → `inputMessagesSubscriber.onComplete()` on EOS.
- `StateMachineImpl.onComplete` (`sdk-core/src/main/java/dev/restate/sdk/core/statemachine/StateMachineImpl.java:160`) → `currentState.onInputClosed(stateContext)` then `triggerNextEventSignal()`.
- `WaitingStartState` / `WaitingReplayEntriesState` correctly throw → `hitError` → response closes. ✓
- The default `onInputClosed` in `State.java:195-198` (used by `ProcessingState`) only marks input closed; it doesn't transition or close the response.
- `ProcessingState.doProgress` (`ProcessingState.java:83-90`) does close the response (`hitSuspended`), but only when it is called, and only if no run is currently executing.

Scenario that hangs: the user coroutine has emitted a `ProposeRunCompletion` and is parked awaiting the ack. The server EOSes the request stream → `onInputClosed` marks input closed → `triggerNextEventSignal` runs whatever listener was registered. The awaiting-ack coroutine isn't necessarily that listener, so nothing re-enters `doProgress`, and the response stays open until something else (eventually) pokes the state machine.
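
To make the hang easier to reason about, here is a minimal toy model of the path traced above. It is a sketch only: `ToyStateMachine` and its fields are hypothetical names that mirror the terms used in this issue, not the SDK's real classes or signatures, and the single listener slot is an assumption about how the next-event signal behaves.

```java
// Toy model only: names mirror the issue text, NOT the real sdk-core classes.
class ToyStateMachine {
    boolean inputClosed = false;
    boolean responseClosed = false;
    boolean runExecuting = false;

    // One-shot listener slot (assumption): whatever the next-event signal wakes up.
    Runnable nextEventListener = null;

    // Analogue of the default onInputClosed: only records that input ended.
    void onInputClosed() {
        inputClosed = true;
    }

    // Analogue of triggerNextEventSignal: fires the registered listener, if any.
    void triggerNextEventSignal() {
        Runnable listener = nextEventListener;
        nextEventListener = null;
        if (listener != null) {
            listener.run();
        }
    }

    // Analogue of doProgress: the only place that closes the response on suspension.
    void doProgress() {
        if (inputClosed && !runExecuting) {
            responseClosed = true; // "hitSuspended" in the issue's terms
        }
    }

    // Analogue of StateMachineImpl.onComplete, invoked on request EOS.
    void onComplete() {
        onInputClosed();
        triggerNextEventSignal();
    }
}
```

In this model, if the registered listener is anything other than a call to `doProgress` (for example a no-op `() -> {}`), then after `onComplete()` the machine has `inputClosed == true` but `responseClosed == false`. The response only closes once something calls `doProgress()` explicitly, which matches the observed tail.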

Question: is this intended? Options to investigate:

1. Have `onInputClosed` in `ProcessingState` (and similar) actively schedule a re-run of `doProgress` so the state machine can decide to suspend.
2. Cancel any pending user-code coroutine that's awaiting a completion when input closes.
3. Document the current behavior as a deliberate contract (the runtime must drain) and accept the 5 s tail.
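
For discussion, option 1 could look roughly like the following. This is a hedged sketch against a toy state, not a patch: `Option1ProcessingState` and `onRunFinished` are hypothetical names, and the real `onInputClosed` takes a state context that is omitted here.

```java
// Toy sketch of option 1: NOT the real ProcessingState.
class Option1ProcessingState {
    boolean inputClosed = false;
    boolean responseClosed = false;
    boolean runExecuting = false;

    // Same guard as described in the issue: suspend only when input is
    // closed and no run is currently executing.
    void doProgress() {
        if (inputClosed && !runExecuting) {
            responseClosed = true; // "hitSuspended" in the issue's terms
        }
    }

    // Option 1: closing the input re-evaluates progress right away,
    // instead of waiting for an unrelated listener to do it.
    void onInputClosed() {
        inputClosed = true;
        doProgress();
    }

    // Hypothetical hook: when a run finishes, re-check so a deferred
    // suspension can still happen.
    void onRunFinished() {
        runExecuting = false;
        doProgress();
    }
}
```

With this shape, a coroutine parked on a `ProposeRunCompletion` ack (no run executing) would see the response close as soon as input closes, while an in-flight run would defer the close until `onRunFinished`. Whether re-entering `doProgress` from `onInputClosed` is safe in the real state machine is exactly what needs investigating.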

Symptom on the restate side: `Response stream draining timeout!` fired 108 times across 14 invocations in 120 s in a real CI run (https://github.com/restatedev/restate/actions/runs/26099619862/job/76748911672); companion issue filed against restate to revisit whether the server should drain at all on the yield path.
