Skip to content

Empty intermediate SSE fix#4321

Open
mzegla wants to merge 1 commit into
mainfrom
vlm_race_fix
Open

Empty intermediate SSE fix#4321
mzegla wants to merge 1 commit into
mainfrom
vlm_race_fix

Conversation

@mzegla

@mzegla mzegla commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets a streaming edge case where the final “finish-only” flush can lead to an extra/empty intermediate SSE iteration by propagating an isLast flag from OVMSTextStreamer through servable callbacks into DeltaChannel.

Changes:

  • Extend OVMSTextStreamer::Callback to include an isLast boolean derived from finish_reason != NONE.
  • Extend DeltaChannel::push() to accept isLast and (currently) mark the channel complete when isLast=true.
  • Update GenAI + legacy LM/VLM servables to forward isLast through their streamer callbacks.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/llm/visual_language_model/legacy/servable.cpp Updates legacy VLM callbacks to forward isLast into the delta channel / unary accumulator.
src/llm/servable.hpp Updates DeltaChannel::push() signature and completion behavior when pushing the last delta.
src/llm/servable.cpp Updates base GenAI streaming callback signature to accept/forward isLast.
src/llm/ovms_text_streamer.hpp Updates OVMSTextStreamer::Callback typedef and documents isLast.
src/llm/ovms_text_streamer.cpp Computes isLast from finish_reason and forwards it to the callback, including finish-only flush handling.
src/llm/language_model/legacy/servable.cpp Updates legacy LM streaming callback signature and forwards isLast into the delta channel.
Comments suppressed due to low confidence (1)

src/llm/visual_language_model/legacy/servable.cpp:162

  • The streamer callback can receive finish-only deltas (no "delta" object). In that case, rapidjson::Document may be empty/non-object, and calling HasMember() directly can trigger RapidJSON assertions/UB. Guard on delta.IsObject() before HasMember() so the unary VLM path safely ignores finish-only documents.
        auto unaryCallback = [& ctx = *legacyExecutionContext](rapidjson::Document delta, bool /*isLast*/) -> ov::genai::StreamingStatus {
            if (delta.HasMember("delta") && delta["delta"].IsObject() &&
                delta["delta"].HasMember("content") && delta["delta"]["content"].IsString()) {
                ctx.accumulatedUnaryText += delta["delta"]["content"].GetString();
            }

Comment on lines +203 to 208
if (isLast) {
// Parser produced no delta for the final flush (e.g. generation ended on a
// special token the parser absorbed). Still fire the callback with an empty
// Document so the caller can emit the finish_reason chunk.
return m_callback(rapidjson::Document{}, true);
}
Comment thread src/llm/servable.hpp
Comment on lines +84 to 92
// When isLast is true, also marks the channel complete atomically so consumers
// always see the final document and the completion flag in the same observation.
void push(rapidjson::Document delta, bool isLast = false) {
{
std::lock_guard<std::mutex> lock(m_mutex);
m_deltas.push_back(std::move(delta));
if (isLast)
m_complete = true;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants