While debugging a flaky restate E2E test (InvokerMemoryTest) we observed the SDK keeping its HTTP/2 response stream open for the full 5-second runtime drain timeout when the runtime closes the request stream mid-attempt (e.g., on OOM yield).
Tracing the path:
- HttpRequestFlowAdapter.handleRequestEnd (sdk-http-vertx/src/main/java/dev/restate/sdk/http/vertx/HttpRequestFlowAdapter.java:95) → inputMessagesSubscriber.onComplete() on EOS.
- StateMachineImpl.onComplete (sdk-core/src/main/java/dev/restate/sdk/core/statemachine/StateMachineImpl.java:160) → currentState.onInputClosed(stateContext) then triggerNextEventSignal().
- WaitingStartState / WaitingReplayEntriesState correctly throw → hitError → response closes. ✓
- The default onInputClosed in State.java:195-198 (used by ProcessingState) only marks input closed; it doesn't transition or close the response.
- ProcessingState.doProgress (ProcessingState.java:83-90) does close the response (hitSuspended), but only when called and only if no run is currently executing.
Scenario that hangs: the user coroutine has emitted a ProposeRunCompletion and is parked awaiting the ack. Server EOSes the request → onInputClosed marks input closed → triggerNextEventSignal runs whatever listener was registered, but the awaiting-ack coroutine isn't necessarily that listener, so it never re-enters doProgress. The response stays open until something else (eventually) pokes the state machine.
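To make the hang concrete, here is a toy model of the interaction. It is a sketch, not the SDK code: the class InputClosedHangSketch and all of its fields are hypothetical, the method names only mirror the ones mentioned above, and the one-shot listener and the suspension condition are simplifying assumptions.

```java
// Toy model only. Names mirror the issue text; none of this is the real SDK code.
final class InputClosedHangSketch {
    boolean inputClosed = false;
    boolean responseClosed = false;
    boolean runExecuting = false;
    // Hypothetical one-shot listener, standing in for whatever was registered last.
    Runnable nextEventListener = null;

    // Analogue of the default onInputClosed: it only records the fact.
    void onInputClosed() {
        inputClosed = true;
    }

    // Fires the registered listener once, if there is one.
    void triggerNextEventSignal() {
        Runnable listener = nextEventListener;
        nextEventListener = null;
        if (listener != null) {
            listener.run();
        }
    }

    // Analogue of StateMachineImpl.onComplete on request EOS.
    void onComplete() {
        onInputClosed();
        triggerNextEventSignal();
    }

    // Analogue of doProgress: closes the response only when it is actually called,
    // input is closed, and no run is executing (simplified condition).
    void doProgress() {
        if (inputClosed && !runExecuting) {
            responseClosed = true;
        }
    }

    public static void main(String[] args) {
        // Coroutine parked on the ack: nothing registered that re-enters doProgress.
        InputClosedHangSketch parked = new InputClosedHangSketch();
        parked.onComplete();
        System.out.println("parked, response closed: " + parked.responseClosed); // false

        // Same EOS, but the registered listener happens to re-enter doProgress.
        InputClosedHangSketch listening = new InputClosedHangSketch();
        listening.nextEventListener = listening::doProgress;
        listening.onComplete();
        System.out.println("listening, response closed: " + listening.responseClosed); // true
    }
}
```

In the first case input is marked closed, yet the response stays open because nothing calls doProgress again, which matches the tail observed in the test.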
Question: is this intended? Options to investigate:
- Have onInputClosed in ProcessingState (and similar) actively schedule a re-run of doProgress so the state machine can decide to suspend.
- Cancel any pending user-code coroutine that's awaiting a completion when input closes.
- Document the current behavior as a deliberate contract (the runtime must drain) and accept the 5 s tail.
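As a rough illustration of the first option, the sketch below re-drives the progress check directly from the input-closed hook. Everything here is hypothetical: ReDriveOnInputClosedSketch, its fields, and onRunFinished are made-up names for a toy model, and a real change would have to fit the SDK's actual state and scheduling code. The re-check when a run finishes is an extra assumption of this sketch, added because doProgress only suspends when no run is executing.

```java
// Toy model of option 1 only. Hypothetical names; not the SDK implementation.
final class ReDriveOnInputClosedSketch {
    boolean inputClosed = false;
    boolean responseClosed = false;
    boolean runExecuting = false;

    // Simplified analogue of doProgress: suspend once input is closed and no run is executing.
    void doProgress() {
        if (inputClosed && !runExecuting && !responseClosed) {
            responseClosed = true; // stands in for hitSuspended closing the response
        }
    }

    // Option 1: the input-closed hook re-runs the progress check itself,
    // so it no longer depends on which listener happens to be registered.
    void onInputClosed() {
        inputClosed = true;
        doProgress();
    }

    // Assumption of this sketch: finishing a run re-checks as well, so an EOS that
    // arrived while the run was executing still leads to suspension afterwards.
    void onRunFinished() {
        runExecuting = false;
        doProgress();
    }

    public static void main(String[] args) {
        ReDriveOnInputClosedSketch idle = new ReDriveOnInputClosedSketch();
        idle.onInputClosed();
        System.out.println("idle, response closed: " + idle.responseClosed); // true

        ReDriveOnInputClosedSketch busy = new ReDriveOnInputClosedSketch();
        busy.runExecuting = true;
        busy.onInputClosed();
        System.out.println("busy, response closed: " + busy.responseClosed); // false
        busy.onRunFinished();
        System.out.println("after run, response closed: " + busy.responseClosed); // true
    }
}
```

The second option (cancelling the parked coroutine) would need a different hook and is not modelled here.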
Symptom on the restate side: the "Response stream draining timeout!" log line fired 108 times across 14 invocations in 120 s in a real CI run (https://github.com/restatedev/restate/actions/runs/26099619862/job/76748911672); companion issue filed against restate to revisit whether the server should drain at all on the yield path.