GenAI servables - refactor input processing#4318

Draft
mzegla wants to merge 1 commit into main from servables_input_flow_refactor_1

mzegla (Collaborator) commented Jun 23, 2026

No description provided.

mzegla force-pushed the servables_input_flow_refactor_1 branch 3 times, most recently from 7ba1037 to f8f5106 (June 23, 2026 14:49)
mzegla force-pushed the servables_input_flow_refactor_1 branch from f8f5106 to 6fe4461 (June 23, 2026 14:53)
mzegla requested a review from Copilot (June 23, 2026 14:54)

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors GenAI servable input handling to use a unified InputRequest + InputProcessor chain, moving generation-config extraction into the API handler and deferring multimodal image decoding (and related validation) out of the OpenAI request parsers.

Changes:

  • Introduces InputRequest and an InputProcessor pipeline (raw prompt extraction, chat template application, tokenization, deferred image decoding, text-content normalization).
  • Updates LM/VLM servables and executors to consume executionContext->inputRequest (and removes legacy prepareInputs overrides in VLM servables).
  • Updates OpenAI handlers/tests to preserve multimodal content arrays in ChatHistory, removes processedJson/imageHistory, and adjusts tools parsing assertions.
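
The flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual declarations: `ChatMessage`, `RawPromptStep`, `ByteTokenizationStep`, `runChain`, and `makeCompletionsChain` are hypothetical names, errors are plain strings instead of `absl::Status`, and the tokenizer is a toy that emits one id per byte.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins; not the PR's actual declarations.
struct ChatMessage {
    std::string role;
    std::string content;
};
using ChatHistory = std::vector<ChatMessage>;
using InputPayload = std::variant<std::string, ChatHistory>;

struct InputRequest {
    InputPayload input;                  // raw prompt or chat history
    std::string promptText;              // filled by a prompt-building step
    std::vector<std::int64_t> inputIds;  // filled by a tokenization step
};

// One step of the chain. Returns an error message on failure.
class InputProcessor {
public:
    virtual ~InputProcessor() = default;
    virtual std::optional<std::string> process(InputRequest& req) const = 0;
};

// Copies a raw prompt into promptText (completions-style input).
class RawPromptStep : public InputProcessor {
public:
    std::optional<std::string> process(InputRequest& req) const override {
        const auto* prompt = std::get_if<std::string>(&req.input);
        if (prompt == nullptr) {
            return std::string("expected a raw prompt payload");
        }
        req.promptText = *prompt;
        return std::nullopt;
    }
};

// Toy tokenizer: one id per byte. A real step would call the model tokenizer.
class ByteTokenizationStep : public InputProcessor {
public:
    std::optional<std::string> process(InputRequest& req) const override {
        req.inputIds.clear();
        for (unsigned char c : req.promptText) {
            req.inputIds.push_back(static_cast<std::int64_t>(c));
        }
        return std::nullopt;
    }
};

using ProcessorChain = std::vector<std::unique_ptr<InputProcessor>>;

// Runs the steps in order and stops at the first error.
inline std::optional<std::string> runChain(const ProcessorChain& chain, InputRequest& req) {
    for (const auto& step : chain) {
        if (auto error = step->process(req)) {
            return error;
        }
    }
    return std::nullopt;
}

inline ProcessorChain makeCompletionsChain() {
    ProcessorChain chain;
    chain.push_back(std::make_unique<RawPromptStep>());
    chain.push_back(std::make_unique<ByteTokenizationStep>());
    return chain;
}

// Usage: a raw prompt flows through both steps.
inline void exampleChainUsage() {
    InputRequest req;
    req.input = std::string("hi");
    assert(!runChain(makeCompletionsChain(), req).has_value());
    assert(req.promptText == "hi");
    assert(req.inputIds.size() == 2);
}
```

Keeping each step behind one small interface means a deployment can assemble a different chain (for example, skipping tokenization) without touching the steps themselves.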

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `src/test/llm/llmnode_test.cpp` | Updates tests to use `executionContext.inputRequest.inputIds`. |
| `src/test/llm/input_processing/raw_prompt_extractor_test.cpp` | Adds unit tests for RawPromptExtractor. |
| `src/test/llm/input_processing/image_decoding_processor_test.cpp` | Adds unit tests for ImageDecodingProcessor behavior without actual decoding. |
| `src/test/http_openai_handler_test.cpp` | Updates parsing tests to assert ChatHistory preservation and tool map contents (no processedJson/imageHistory). |
| `src/llm/visual_language_model/legacy/servable.hpp` | Removes VLM legacy inputText/inputImages fields and sets isVLM in processor context. |
| `src/llm/visual_language_model/legacy/servable.cpp` | Switches to extractInputRequest(); removes legacy VLM prepareInputs implementation. |
| `src/llm/visual_language_model/legacy/legacy_executor.cpp` | Uses inputRequest.promptText/inputImages/generationConfig for generation. |
| `src/llm/visual_language_model/continuous_batching/servable.hpp` | Removes VLM CB inputText/inputImages fields and sets isVLM in processor context. |
| `src/llm/visual_language_model/continuous_batching/servable.cpp` | Uses inputRequest.* when adding requests; removes legacy VLM prepareInputs. |
| `src/llm/servable.hpp` | Replaces inputIds/GenerationConfigBuilder in execution context with InputRequest; adds InputProcessorContext. |
| `src/llm/servable.cpp` | Refactors base parseRequest/prepareInputs to build and process InputRequest. |
| `src/llm/servable_initializer.cpp` | Populates InputProcessorContext (tokenizer + optional Python template processor). |
| `src/llm/language_model/legacy/servable.cpp` | Uses inputRequest for generation config and NPU input-length validation. |
| `src/llm/language_model/legacy/legacy_executor.cpp` | Uses inputRequest.inputIds/generationConfig for generation. |
| `src/llm/language_model/continuous_batching/servable.cpp` | Uses inputRequest for scheduler limits and pipeline add_request. |
| `src/llm/io_processing/input_request.hpp` | Adds InputRequest and InputPayload variant. |
| `src/llm/io_processing/input_processors/tokenization_processor.hpp` | Adds tokenization processor definition. |
| `src/llm/io_processing/input_processors/tokenization_processor.cpp` | Implements tokenization into req.inputIds. |
| `src/llm/io_processing/input_processors/text_content_normalization_processor.hpp` | Adds text-only content-array normalizer (LM paths). |
| `src/llm/io_processing/input_processors/text_content_normalization_processor.cpp` | Implements content-array flattening to string with `\n` joins. |
| `src/llm/io_processing/input_processors/raw_prompt_extractor.hpp` | Adds raw prompt extractor (COMPLETIONS path). |
| `src/llm/io_processing/input_processors/image_decoding_processor.hpp` | Adds deferred image decoding processor (VLM paths). |
| `src/llm/io_processing/input_processors/image_decoding_processor.cpp` | Implements image decoding + `<ov_genai_image_N>` injection into message content. |
| `src/llm/io_processing/input_processors/chat_template_processor.hpp` | Adds chat template processor (Python and native paths). |
| `src/llm/io_processing/input_processors/chat_template_processor.cpp` | Implements prompt building from ChatHistory. |
| `src/llm/io_processing/input_processor.hpp` | Adds orchestrator selecting processors based on config + payload variant. |
| `src/llm/io_processing/input_processor.cpp` | Builds and executes the processor chain. |
| `src/llm/io_processing/input_processor_context.hpp` | Adds per-deployment resources for input processing. |
| `src/llm/io_processing/input_processing_config.hpp` | Adds deployment-level processing config (isVLM). |
| `src/llm/io_processing/base_input_processor.hpp` | Adds base interface for processing steps. |
| `src/llm/BUILD` | Adds Bazel targets/deps for new IO processing components. |
| `src/llm/apis/openai_responses.cpp` | Preserves content arrays in ChatHistory and removes Python processedJson path + eager image decoding. |
| `src/llm/apis/openai_request.hpp` | Removes processedJson and imageHistory from OpenAIRequest. |
| `src/llm/apis/openai_completions.cpp` | Preserves multimodal content arrays in ChatHistory and removes eager image decoding + processedJson rebuild. |
| `src/llm/apis/openai_api_handler.hpp` | Removes getProcessedJson/getImageHistory; adds extractInputRequest(). |
| `src/llm/apis/openai_api_handler.cpp` | Implements extractInputRequest() and removes processedJson mutations from tools parsing. |
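
The text-content normalization listed above (flattening a content array to a string with `\n` joins) could look roughly like the sketch below. `ContentPart` and `flattenTextParts` are hypothetical names, and skipping non-text parts is an assumption of this sketch; the PR may handle them differently, for example by rejecting the request.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical content part; the real ChatHistory holds JSON-like values.
struct ContentPart {
    std::string type;  // e.g. "text" or "image_url"
    std::string text;  // only meaningful when type == "text"
};

// Joins the text parts with '\n'. Skipping non-text parts is an assumption
// of this sketch, not confirmed behavior of the PR.
inline std::string flattenTextParts(const std::vector<ContentPart>& parts) {
    std::string flattened;
    bool first = true;
    for (const auto& part : parts) {
        if (part.type != "text") {
            continue;
        }
        if (!first) {
            flattened += '\n';
        }
        flattened += part.text;
        first = false;
    }
    return flattened;
}

// Usage: two text parts around an image part collapse to two lines.
inline void exampleFlattenUsage() {
    std::vector<ContentPart> parts;
    parts.push_back(ContentPart{"text", "Describe"});
    parts.push_back(ContentPart{"image_url", ""});
    parts.push_back(ContentPart{"text", "this."});
    assert(flattenTextParts(parts) == "Describe\nthis.");
}
```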

Comment on lines +19 to +21
#include <string>
#include <unordered_map>
#include <utility>
Comment on lines +39 to +44

    for (size_t i = 0; i < chatHistory.size(); i++) {
        const auto content = chatHistory[i]["content"];
        if (content.as_string().value_or("").find("<ov_genai_image_") != std::string::npos) {
            return absl::InvalidArgumentError("Message contains restricted <ov_genai_image> tag");
        }
    }
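
The loop above inspects `content` only when it is a plain string. Because this PR keeps multimodal content arrays in ChatHistory, a check of this kind might also need to look inside text parts. The sketch below illustrates that idea under stated assumptions: `ContentPart`, `MessageContent`, and both helper functions are hypothetical, and a `std::variant` stands in for the JSON-like value used in the real code.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for a message's content: a string or a part array.
struct ContentPart {
    std::string type;
    std::string text;
};
using MessageContent = std::variant<std::string, std::vector<ContentPart>>;

// True when the text contains the reserved image-tag prefix.
inline bool hasRestrictedTag(const std::string& text) {
    return text.find("<ov_genai_image_") != std::string::npos;
}

// Checks plain-string content and every text part of array content.
inline bool contentHasRestrictedTag(const MessageContent& content) {
    if (const auto* text = std::get_if<std::string>(&content)) {
        return hasRestrictedTag(*text);
    }
    for (const auto& part : std::get<std::vector<ContentPart>>(content)) {
        if (part.type == "text" && hasRestrictedTag(part.text)) {
            return true;
        }
    }
    return false;
}

// Usage: the tag is detected in both content shapes.
inline void exampleTagCheckUsage() {
    MessageContent plain = std::string("see <ov_genai_image_0>");
    std::vector<ContentPart> parts;
    parts.push_back(ContentPart{"text", "see <ov_genai_image_1>"});
    MessageContent array = parts;
    assert(contentHasRestrictedTag(plain));
    assert(contentHasRestrictedTag(array));
}
```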
Comment on lines +69 to +71

    } else if (type == "text") {
        textContent += part["text"].as_string().value_or("");
    }
Comment thread src/llm/servable.cpp
Comment on lines 207 to +210

      if (getProperties()->maxModelLength.has_value()) {
    -     if (executionContext->inputIds.get_size() > getProperties()->maxModelLength.value()) {
    +     if (req.inputIds.get_size() > getProperties()->maxModelLength.value()) {
              std::stringstream ss;
    -         ss << "Number of prompt tokens: " << executionContext->inputIds.get_size() << " exceeds model max length: " << getProperties()->maxModelLength.value();
    +         ss << "Number of prompt tokens: " << req.inputIds.get_size()
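
The length check in this hunk can be isolated as a small function for illustration. This is a sketch only: `checkPromptLength` is a hypothetical helper, the token count is passed in as a plain number rather than read from a tensor, and the error is returned as a string rather than a status object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Returns an error message when a configured max length is exceeded.
// No limit configured (std::nullopt) means every prompt length is accepted.
inline std::optional<std::string> checkPromptLength(
    std::size_t promptTokens, std::optional<std::size_t> maxModelLength) {
    if (maxModelLength.has_value() && promptTokens > maxModelLength.value()) {
        std::stringstream ss;
        ss << "Number of prompt tokens: " << promptTokens
           << " exceeds model max length: " << maxModelLength.value();
        return ss.str();
    }
    return std::nullopt;
}

// Usage: a prompt at the limit passes, one token over fails.
inline void exampleLengthCheckUsage() {
    const std::optional<std::size_t> limit = 4;
    assert(!checkPromptLength(4, limit).has_value());
    assert(checkPromptLength(5, limit).has_value());
}
```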
Comment on lines +499 to +507

    InputRequest req;
    req.generationConfig = configBuilder.getConfig();
    if (endpoint == Endpoint::COMPLETIONS) {
        req.input = request.prompt.value_or("");
    } else {
        // CHAT_COMPLETIONS and RESPONSES both use ChatHistory.
        // Copied (not moved) so the handler retains its own copy for response serialization.
        req.input = request.chatHistory;
    }
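
The endpoint-dependent payload selection in this hunk can be reproduced in isolation. The sketch below assumes simplified types: `ParsedRequest`, `ChatMessage`, and `buildInputRequest` are hypothetical, and the generation config is omitted. It shows that the variant holds a string for COMPLETIONS and a copy of the chat history otherwise, leaving the source request intact.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

enum class Endpoint { COMPLETIONS, CHAT_COMPLETIONS, RESPONSES };

// Hypothetical, simplified request types for illustration only.
struct ChatMessage {
    std::string role;
    std::string content;
};
using ChatHistory = std::vector<ChatMessage>;

struct ParsedRequest {
    std::optional<std::string> prompt;
    ChatHistory chatHistory;
};

struct InputRequest {
    std::variant<std::string, ChatHistory> input;
};

// Picks the payload alternative from the endpoint type.
inline InputRequest buildInputRequest(Endpoint endpoint, const ParsedRequest& request) {
    InputRequest req;
    if (endpoint == Endpoint::COMPLETIONS) {
        req.input = request.prompt.value_or("");
    } else {
        // Copy, so the caller keeps its own chat history.
        req.input = request.chatHistory;
    }
    return req;
}

// Usage: each endpoint family yields a different variant alternative.
inline void exampleBuildUsage() {
    ParsedRequest parsed;
    parsed.prompt = "Once upon";
    parsed.chatHistory.push_back(ChatMessage{"user", "Hi"});

    const InputRequest completions = buildInputRequest(Endpoint::COMPLETIONS, parsed);
    assert(std::holds_alternative<std::string>(completions.input));

    const InputRequest chat = buildInputRequest(Endpoint::CHAT_COMPLETIONS, parsed);
    assert(std::holds_alternative<ChatHistory>(chat.input));
    assert(parsed.chatHistory.size() == 1);  // source left untouched by the copy
}
```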
Comment on lines +49 to +65

    if (isChatPath) {
    #if (PYTHON_DISABLE == 0)
        processors.emplace_back(std::make_unique<ChatTemplateProcessor>(
            context.tokenizer,
            *context.templateProcessor,
            context.modelsPath));
    #else
        processors.emplace_back(std::make_unique<ChatTemplateProcessor>(context.tokenizer));
    #endif
    } else {
        processors.emplace_back(std::make_unique<RawPromptExtractor>());
    }

    if (!context.config.isVLM) {
        processors.emplace_back(std::make_unique<TokenizationProcessor>(
            context.tokenizer, addSpecialTokens));
    }
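
The selection logic in this hunk can be modeled without the real processor classes. The sketch below is illustrative only: `ChainConfig` and `planChain` are hypothetical, and the function returns step names instead of constructing processors. It mirrors only the two decisions visible in the hunk, so other steps mentioned in the review summary (such as image decoding) are not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical inputs to chain selection.
struct ChainConfig {
    bool isChatPath;  // true for chat-style payloads, false for raw prompts
    bool isVLM;       // true for visual language model deployments
};

// Returns the names of the steps the hunk would add, in order.
inline std::vector<std::string> planChain(const ChainConfig& config) {
    std::vector<std::string> steps;
    if (config.isChatPath) {
        steps.emplace_back("ChatTemplateProcessor");
    } else {
        steps.emplace_back("RawPromptExtractor");
    }
    // In the hunk, tokenization is added only for non-VLM deployments.
    if (!config.isVLM) {
        steps.emplace_back("TokenizationProcessor");
    }
    return steps;
}

// Usage: a text-only chat deployment gets templating followed by tokenization.
inline void examplePlanUsage() {
    const std::vector<std::string> steps = planChain(ChainConfig{true, false});
    assert(steps.size() == 2);
    assert(steps[0] == "ChatTemplateProcessor");
    assert(steps[1] == "TokenizationProcessor");
}
```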