Add optional TwelveLabs Pegasus footage analysis by mohit-twelvelabs · Pull Request #140 · openvideodev/openvideo

mohit-twelvelabs · 2026-06-25T10:21:30Z

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

What this adds

An opt-in TwelveLabs Pegasus backend for the video footage analyzer in the asset indexer (apps/director/src/rag). Today, AssetIndexerService describes each detected scene by extracting 3 keyframes and sending them to Gemini Flash. This PR adds PegasusAnalyzerService, which instead analyzes each scene window natively with Pegasus — a video-understanding model that reasons over the actual footage (motion, temporal context, on-screen text) rather than sampled stills.

It returns the exact same { description, objects, topics, keywords } shape the visual timeline already expects, so the two backends are drop-in interchangeable.

Why it helps OpenVideo

The "Semantic Search" and "AI Director" features are only as good as the scene descriptions feeding the vector store. Keyframe sampling can miss anything that happens between frames — actions, camera moves, transient on-screen text. Pegasus watches the whole clip, which tends to produce richer, more accurate descriptions for footage-heavy content, improving downstream retrieval and auto-editing. It also removes per-scene local ffmpeg keyframe extraction from that path.

Opt-in / non-breaking

Default behavior is unchanged. The Gemini keyframe path stays the default.
Pegasus is used only when both TWELVELABS_API_KEY is set and VISUAL_ANALYZER=twelvelabs.
On any Pegasus error, the indexer transparently falls back to the existing Gemini path — indexing never fails because of this.
New env vars are documented in AGENTS.md.
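
For illustration, the opt-in gate and the fallback described above could be sketched roughly as follows. This is a minimal sketch, not the PR's code: it is written in Python for brevity, and `pegasus_enabled`, `analyze_scene`, and the two analyzer callables are hypothetical names. Only the two env var names and the "fall back to Gemini on any error" behavior come from the description.

```python
import os
from typing import Any, Callable, Dict, Mapping, Optional

SceneAnalysis = Dict[str, Any]          # {description, objects, topics, keywords}
Analyzer = Callable[[str], SceneAnalysis]  # hypothetical: scene reference -> analysis


def pegasus_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when both opt-in conditions hold."""
    env = os.environ if env is None else env
    has_key = bool(env.get("TWELVELABS_API_KEY"))
    # Assumption: the value is matched exactly and case-sensitively.
    return has_key and env.get("VISUAL_ANALYZER") == "twelvelabs"


def analyze_scene(
    scene_ref: str,
    pegasus: Analyzer,
    gemini: Analyzer,
    env: Optional[Mapping[str, str]] = None,
) -> SceneAnalysis:
    """Try Pegasus when enabled; otherwise, or on any error, use Gemini."""
    if pegasus_enabled(env):
        try:
            return pegasus(scene_ref)
        except Exception:
            # Any Pegasus failure falls through to the default path,
            # so indexing does not fail because of the optional backend.
            pass
    return gemini(scene_ref)
```

Passing the environment as a mapping keeps the gate testable without touching `os.environ`.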

How it was tested

Added pegasus-analyzer.service.spec.ts (Vitest, matching the existing auto-caption.skill.spec.ts style):
- No-network unit tests for isEnabled() gating and the JSON/markdown/fallback parsing logic.
- A live wiring test that is skipped automatically unless TWELVELABS_API_KEY is present.
Ran the suite against the real API: all 7 pass, including a live Pegasus 1.5 analysis of a public sample video returning the structured scene JSON. Marengo embeddings (512-dim) were also verified live while validating the SDK.
oxlint and oxfmt are clean on all changed files; the new service type-checks against the twelvelabs-js SDK types.
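
As a rough illustration of the JSON/markdown/fallback parsing those unit tests cover, a parser along these lines would produce the `{ description, objects, topics, keywords }` shape. This is a sketch under assumptions: `parse_scene_analysis` is a hypothetical name, and the exact fallback (raw text as the description, empty lists elsewhere) is a guess at reasonable behavior rather than the PR's implementation.

```python
import json
import re
from typing import Any, Dict, List

_TICKS = "`" * 3  # a markdown code fence, built indirectly to keep this block readable
_FENCED = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def _as_str_list(value: Any) -> List[str]:
    """Coerce a JSON value to a list of strings, or an empty list."""
    if not isinstance(value, list):
        return []
    return [str(v) for v in value if isinstance(v, (str, int, float))]


def parse_scene_analysis(raw: str) -> Dict[str, Any]:
    """Parse model output that may be bare JSON, fenced JSON, or free text."""
    text = raw.strip()
    match = _FENCED.search(text)
    candidate = match.group(1).strip() if match else text
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Fallback (assumed): keep the raw text as the description.
        return {"description": text, "objects": [], "topics": [], "keywords": []}
    return {
        "description": str(data.get("description", "")),
        "objects": _as_str_list(data.get("objects")),
        "topics": _as_str_list(data.get("topics")),
        "keywords": _as_str_list(data.get("keywords")),
    }
```

Normalizing every field means downstream consumers can rely on all four keys being present regardless of which path produced the result.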

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Adds PegasusAnalyzerService as an opt-in visual analysis backend for the asset indexer. When VISUAL_ANALYZER=twelvelabs and TWELVELABS_API_KEY are set, video scenes are described natively by Pegasus (no keyframe sampling) instead of the default Gemini path. Falls back to Gemini on any error. Non-breaking: default behavior is unchanged when the key is absent.

vercel · 2026-06-25T10:21:34Z

@mohit-twelvelabs is attempting to deploy a commit to the openvideo Team on Vercel.

A member of the Team first needs to authorize it.

xo-o · 2026-06-25T23:11:52Z

I would prefer to add TwelveLabs support here: processor-modal/src/services/twelvelabs_analyzer.py

…or-modal/src/services/twelvelabs_analyzer.py

Moves the opt-in TwelveLabs Pegasus footage analysis from the director app (pegasus-analyzer.service.ts) to the processor-modal Python service layer as TwelveLabsVisionAnalyzer, implementing the VisionAnalyzer interface alongside GeminiVisionAnalyzer per maintainer request.

- New service uses httpx against the TwelveLabs v1.3 REST API (x-api-key), matching the sibling DeepgramTranscriber/GeminiVisionAnalyzer conventions.
- Pegasus analyzes a video URL + scene window (its native strength); the frame/text interface methods lazily delegate to Gemini, so a TwelveLabs-only deployment needs no Google key.
- VideoIndexer selects the backend via VISUAL_ANALYZER=twelvelabs + a key; default Gemini path is unchanged when unset. Per-scene fallback to keyframes on any Pegasus error.
- Removes the TS Pegasus service, its spec, DI wiring, and the twelvelabs-js dependency from the director app.
- Adds a focused test (no-network parse/selection + key-gated live analyze) and documents the opt-in env vars in .env.example and README.

mohit-twelvelabs · 2026-06-26T00:03:48Z

Thanks for the steer, @xo-o — agreed that's the right home. I moved the TwelveLabs integration over to apps/processor-modal/src/services/twelvelabs_analyzer.py (commit ae67c44), implementing the existing VisionAnalyzer interface as TwelveLabsVisionAnalyzer alongside GeminiVisionAnalyzer and following the sibling-service conventions there (httpx against the TwelveLabs v1.3 REST API with x-api-key, os.getenv config, VisionAnalysisError on misconfig, same parse-with-fallback as the Gemini analyzer).

A few notes:

Still fully opt-in and non-breaking: the indexer only selects Pegasus when VISUAL_ANALYZER=twelvelabs and TWELVELABS_API_KEY are set; otherwise the default Gemini path is unchanged. Per-scene fallback to keyframes on any Pegasus error.
Pegasus analyzes the source video URL + scene window (its native strength — it needs >=4s of real footage, not stills), so the frame/text interface methods lazily delegate to Gemini. That means a TwelveLabs-only deployment doesn't even need a Google key.
Removed the old TS service (pegasus-analyzer.service.ts), its spec, the DI wiring in rag.module.ts/asset-indexer.service.ts, and the twelvelabs-js dependency from the director app.
Added a focused test (no-network parse/selection + a live analyze test gated on TWELVELABS_API_KEY) and documented the opt-in env vars in .env.example and the README. Verified the live Pegasus scene-analysis call end to end.
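
The lazy delegation mentioned in the notes above could look something like this sketch. The class and method names here (`LazyFrameDelegate`, `analyze_frames`, the factory argument) are hypothetical and are not taken from the actual VisionAnalyzer interface. The point is only that the Gemini-backed analyzer is constructed on first use, so a deployment that never calls the frame methods never needs its credentials.

```python
from typing import Any, Callable, Dict, List, Optional

Analysis = Dict[str, Any]


class LazyFrameDelegate:
    """Defers building the frame analyzer until a frame method is called."""

    def __init__(self, make_frame_analyzer: Callable[[], Any]) -> None:
        # Store the factory only; nothing is constructed (and no key is
        # read or validated) at this point.
        self._make = make_frame_analyzer
        self._delegate: Optional[Any] = None

    def _get(self) -> Any:
        if self._delegate is None:
            self._delegate = self._make()
        return self._delegate

    def analyze_frames(self, frames: List[bytes]) -> Analysis:
        # Hypothetical method name; forwards to the lazily built delegate.
        return self._get().analyze_frames(frames)
```

A video-native analyzer could hold one of these and forward only its frame/text methods to it, keeping the video path free of the second provider's configuration.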

Happy to adjust naming or the selection seam if you'd prefer something different.

— Mohit (@mohit-twelvelabs, TwelveLabs)

xo-o · 2026-06-28T11:01:29Z

TwelveLabs integration is being added in #144, along with other refactorings to support additional providers.

mohit-twelvelabs · 2026-06-28T11:35:54Z

That's great to hear, @xo-o — really glad TwelveLabs is landing in openvideo via #144, and the multi-provider refactor sounds like the right foundation. Happy to close this in favor of #144 whenever you'd like.

If it's useful, a couple of things I verified while building this that might save you time in #144:

Pegasus (pegasus1.5) doesn't accept a bare video_id — pass video=VideoContext_Url(url=...) or VideoContext_AssetId(asset_id=...). Direct local-file asset upload caps at 200MB (public URLs go up to 4GB), and the analysis window needs to be at least 4s.
Marengo (marengo3.0) embeddings: /v1.3/embed wants multipart/form-data for every request including text-only; the raw JSON vector key is float (the Python SDK aliases it to float_).
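
The Pegasus limits noted above lend themselves to a cheap pre-flight check before any request is made. The sketch below makes no API calls; `preflight_problems` is a hypothetical helper, and treating "MB"/"GB" as binary units (MiB/GiB) is an assumption, since the comment does not say which is meant.

```python
from typing import List, Optional

MIN_WINDOW_SECONDS = 4.0
# Assumption: limits interpreted as binary units; adjust if decimal is meant.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024   # direct local-file asset upload
MAX_URL_BYTES = 4 * 1024 ** 3          # public URL source


def preflight_problems(
    start_s: float,
    end_s: float,
    size_bytes: Optional[int] = None,
    is_url: bool = True,
) -> List[str]:
    """Return human-readable reasons a scene window would likely be rejected."""
    problems: List[str] = []
    if end_s - start_s < MIN_WINDOW_SECONDS:
        problems.append(
            "window is %.2fs; at least %.0fs required"
            % (end_s - start_s, MIN_WINDOW_SECONDS)
        )
    if size_bytes is not None:
        limit = MAX_URL_BYTES if is_url else MAX_UPLOAD_BYTES
        if size_bytes > limit:
            problems.append("source is %d bytes; limit is %d" % (size_bytes, limit))
    return problems
```

A caller could route any scene with a non-empty result straight to the keyframe path instead of spending a request that is expected to fail.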

Happy to review the TwelveLabs parts of #144 or help with anything API-side — just tag me. Thanks for picking it up!

mohit-twelvelabs wants to merge 2 commits into openvideodev:main from mohit-twelvelabs:feat/twelvelabs-integration
