Background
pathwaycom/llm-app#129 added a Video RAG with TwelveLabs template containing two reusable building blocks:
- TwelveLabsVideoParser (pw.UDF): uploads video bytes as a TwelveLabs asset and turns them into a rich text description via the Pegasus video-understanding model. Output is the standard list[(text, metadata)], so it chunks/embeds/indexes exactly like the built-in PDF parsers.
- MarengoEmbedder (BaseEmbedder): a retriever embedder backed by the Marengo multimodal model, returning 512-dim vectors in a shared text/image/audio/video embedding space.
Both currently live in a per-template pathway_twelvelabs package. This issue proposes promoting them into pathway.xpacks.llm as first-class, supported components.
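To make the parser's output contract concrete, here is a dependency-free sketch of the list[(text, metadata)] shape. It is not the template's code: parse_video, DescribeFn, the default prompt, and the metadata keys are all hypothetical, and the describe callable only stands in for the Pegasus call that the real pw.UDF would make against the TwelveLabs API.

```python
from typing import Callable

# Hypothetical stand-in for the Pegasus call: raw video bytes plus a prompt in,
# text description out. The real parser would call the TwelveLabs API here.
DescribeFn = Callable[[bytes, str], str]


def parse_video(
    contents: bytes,
    describe: DescribeFn,
    prompt: str = "Describe this video in detail.",
) -> list[tuple[str, dict]]:
    """Return one (text, metadata) pair, the shape downstream chunkers expect."""
    text = describe(contents, prompt)
    # Metadata keys are illustrative assumptions, not the template's actual keys.
    metadata = {"modality": "video", "size_bytes": len(contents)}
    return [(text, metadata)]


# Usage with a fake describer, so the sketch runs without network access.
docs = parse_video(
    b"\x00\x01\x02",
    lambda data, prompt: "A person waves at the camera.",
)
```

Because the result is plain text plus a metadata dict, anything that already consumes parser output (splitters, embedders, indexes) should not need to know the source was a video.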
Motivation
- Fills a real gap in modality coverage.
xpacks/llm/parsers.py already ships ImageParser and AudioParser but has no video parser. MarengoEmbedder would be the first xpacks embedder in a shared multimodal space. New modality, zero new infrastructure.
- Already conforms to xpacks conventions.
MarengoEmbedder subclasses BaseEmbedder like OpenAIEmbedder/GeminiEmbedder/BedrockEmbedder and reuses udfs.async_executor, ExponentialBackoffRetryStrategy, CacheStrategy, max_batch_size. TwelveLabsVideoParser mirrors the existing pw.UDF parsers. Minimal adaptation required.
- Generic, not template-specific.
Prompt, model, API key, retries, and caching are all parameterized; no template-specific assumptions in the classes themselves.
Proposed scope
- Move both classes into pathway/xpacks/llm/: fold them into embedders.py / parsers.py with exports in the respective __init__.
- Make twelvelabs an optional dependency via the existing lazy-import + ImportError guidance pattern, so plain import pathway is unaffected. Create a twelvelabs section in pyproject.toml so that the dependency can be installed via pip install pathway[twelvelabs].
- Add tests under python/pathway/xpacks/llm/tests/; keep the live smoke test gated behind TWELVELABS_API_KEY.
Once the components are ported into Pathway and released, the llm-app template should be refactored to import them directly from Pathway.
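The optional-dependency and gated-test items in the proposed scope could look roughly like the sketch below. It is an illustration under stated assumptions, not Pathway's actual helper: optional_import and LiveSmokeTest are hypothetical names, and stdlib unittest is used only to keep the example dependency-free (in a pytest suite, pytest.mark.skipif would play the same role).

```python
import importlib
import os
import unittest
from types import ModuleType


def optional_import(module: str, extra: str) -> ModuleType:
    """Import `module` only when called; on failure, point at the pip extra.

    Hypothetical helper: the name and message wording are assumptions.
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"'{module}' is required for this component. "
            f"Install it with: pip install 'pathway[{extra}]'"
        ) from exc


# The live test only runs when a key is present in the environment.
HAS_KEY = bool(os.environ.get("TWELVELABS_API_KEY"))


@unittest.skipUnless(HAS_KEY, "TWELVELABS_API_KEY not set; skipping live smoke test")
class LiveSmokeTest(unittest.TestCase):
    def test_sdk_is_importable(self) -> None:
        # The SDK is imported inside the test, never at module import time.
        sdk = optional_import("twelvelabs", extra="twelvelabs")
        self.assertIsNotNone(sdk)
```

Deferring the import to call time is what keeps a plain import pathway working for users who never touch the TwelveLabs components, while the skip condition lets the rest of the suite run in CI without credentials.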