Add optional TwelveLabs Pegasus footage analysis #140

Open
mohit-twelvelabs wants to merge 2 commits into openvideodev:main from mohit-twelvelabs:feat/twelvelabs-integration

Conversation

@mohit-twelvelabs

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

What this adds

An opt-in TwelveLabs Pegasus backend for the video footage analyzer in the asset indexer (apps/director/src/rag). Today, AssetIndexerService describes each detected scene by extracting 3 keyframes and sending them to Gemini Flash. This PR adds PegasusAnalyzerService, which instead analyzes each scene window natively with Pegasus — a video-understanding model that reasons over the actual footage (motion, temporal context, on-screen text) rather than sampled stills.

It returns the exact same { description, objects, topics, keywords } shape the visual timeline already expects, so the two backends are drop-in interchangeable.

Why it helps OpenVideo

The "Semantic Search" and "AI Director" features are only as good as the scene descriptions feeding the vector store. Keyframe sampling can miss anything that happens between frames — actions, camera moves, transient on-screen text. Pegasus watches the whole clip, which tends to produce richer, more accurate descriptions for footage-heavy content, improving downstream retrieval and auto-editing. It also removes per-scene local ffmpeg keyframe extraction from that path.

Opt-in / non-breaking

  • Default behavior is unchanged. The Gemini keyframe path stays the default.
  • Pegasus is used only when both TWELVELABS_API_KEY is set and VISUAL_ANALYZER=twelvelabs.
  • On any Pegasus error, the indexer transparently falls back to the existing Gemini path — indexing never fails because of this.
  • New env vars are documented in AGENTS.md.

How it was tested

  • Added pegasus-analyzer.service.spec.ts (Vitest, matching the existing auto-caption.skill.spec.ts style):
    • No-network unit tests for isEnabled() gating and the JSON/markdown/fallback parsing logic.
    • A live wiring test that is skipped automatically unless TWELVELABS_API_KEY is present.
  • Ran the suite against the real API: all 7 pass, including a live Pegasus 1.5 analysis of a public sample video returning the structured scene JSON. Marengo embeddings (512-dim) were also verified live while validating the SDK.
  • oxlint and oxfmt are clean on all changed files; the new service type-checks against the twelvelabs-js SDK types.

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Adds PegasusAnalyzerService as an opt-in visual analysis backend for the
asset indexer. When VISUAL_ANALYZER=twelvelabs and TWELVELABS_API_KEY are
set, video scenes are described natively by Pegasus (no keyframe sampling)
instead of the default Gemini path. Falls back to Gemini on any error.
Non-breaking: default behavior is unchanged when the key is absent.
@vercel

vercel Bot commented Jun 25, 2026

@mohit-twelvelabs is attempting to deploy a commit to the openvideo Team on Vercel.

A member of the Team first needs to authorize it.

@xo-o

xo-o commented Jun 25, 2026

Contributor

I would prefer to add TwelveLabs support here: processor-modal/src/services/twelvelabs_analyzer.py

…or-modal/src/services/twelvelabs_analyzer.py

Moves the opt-in TwelveLabs Pegasus footage analysis from the director app
(pegasus-analyzer.service.ts) to the processor-modal Python service layer as
TwelveLabsVisionAnalyzer, implementing the VisionAnalyzer interface alongside
GeminiVisionAnalyzer per maintainer request.

- New service uses httpx against the TwelveLabs v1.3 REST API (x-api-key),
  matching the sibling DeepgramTranscriber/GeminiVisionAnalyzer conventions.
- Pegasus analyzes a video URL + scene window (its native strength); the
  frame/text interface methods lazily delegate to Gemini, so a TwelveLabs-only
  deployment needs no Google key.
- VideoIndexer selects the backend via VISUAL_ANALYZER=twelvelabs + a key;
  default Gemini path is unchanged when unset. Per-scene fallback to keyframes
  on any Pegasus error.
- Removes the TS Pegasus service, its spec, DI wiring, and the twelvelabs-js
  dependency from the director app.
- Adds a focused test (no-network parse/selection + key-gated live analyze)
  and documents the opt-in env vars in .env.example and README.
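
The "lazily delegate" point in the commit message can be illustrated with a small Python sketch. It is an illustration only: `FrameAnalyzer`, `LazyFrameDelegate`, `StubFrameAnalyzer`, and the `analyze_frame` method are hypothetical names, not the actual `VisionAnalyzer` interface. The idea shown is that the fallback analyzer (and any API-key check in its constructor) is created on first use, so a deployment that never calls the frame methods never needs the second provider's key.

```python
from typing import Callable, Optional, Protocol


class FrameAnalyzer(Protocol):
    """Hypothetical slice of an analyzer interface: describe one still frame."""

    def analyze_frame(self, frame: bytes) -> str: ...


class LazyFrameDelegate:
    """Builds the delegate analyzer on first use rather than at construction."""

    def __init__(self, factory: Callable[[], FrameAnalyzer]) -> None:
        self._factory = factory
        self._delegate: Optional[FrameAnalyzer] = None

    def analyze_frame(self, frame: bytes) -> str:
        if self._delegate is None:
            # The factory, and any key validation inside it, only runs here.
            self._delegate = self._factory()
        return self._delegate.analyze_frame(frame)


class StubFrameAnalyzer:
    """Stand-in for a real frame analyzer; counts how often it is constructed."""

    created = 0

    def __init__(self) -> None:
        StubFrameAnalyzer.created += 1

    def analyze_frame(self, frame: bytes) -> str:
        return f"{len(frame)} bytes"


delegate = LazyFrameDelegate(StubFrameAnalyzer)  # nothing is constructed yet
```

Passing the class itself as the factory keeps construction deferred; a real implementation might pass a lambda that reads its configuration at call time.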
@mohit-twelvelabs

Author

Thanks for the steer, @xo-o — agreed that's the right home. I moved the TwelveLabs integration over to apps/processor-modal/src/services/twelvelabs_analyzer.py (commit ae67c44), implementing the existing VisionAnalyzer interface as TwelveLabsVisionAnalyzer alongside GeminiVisionAnalyzer and following the sibling-service conventions there (httpx against the TwelveLabs v1.3 REST API with x-api-key, os.getenv config, VisionAnalysisError on misconfig, same parse-with-fallback as the Gemini analyzer).

A few notes:

  • Still fully opt-in and non-breaking: the indexer only selects Pegasus when VISUAL_ANALYZER=twelvelabs and TWELVELABS_API_KEY are set; otherwise the default Gemini path is unchanged. Per-scene fallback to keyframes on any Pegasus error.
  • Pegasus analyzes the source video URL + scene window (its native strength — it needs >=4s of real footage, not stills), so the frame/text interface methods lazily delegate to Gemini. That means a TwelveLabs-only deployment doesn't even need a Google key.
  • Removed the old TS service (pegasus-analyzer.service.ts), its spec, the DI wiring in rag.module.ts/asset-indexer.service.ts, and the twelvelabs-js dependency from the director app.
  • Added a focused test (no-network parse/selection + a live analyze test gated on TWELVELABS_API_KEY) and documented the opt-in env vars in .env.example and the README. Verified the live Pegasus scene-analysis call end to end.
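
The selection rule and per-scene fallback described in these notes could be sketched as follows. This is a hedged illustration, not the code in commit ae67c44: `pegasus_enabled`, `describe_scene`, and the `SceneDescriber` callable type are hypothetical, and exact-match comparison of `VISUAL_ANALYZER` is an assumption. Only the two environment variable names and the "fall back on any Pegasus error" behavior come from the comment above.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical signature: (video_url, start_seconds, end_seconds) -> scene dict.
SceneDescriber = Callable[[str, float, float], dict]


def pegasus_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when both opt-in variables are set (exact match is an assumption)."""
    env = os.environ if env is None else env
    return env.get("VISUAL_ANALYZER") == "twelvelabs" and bool(
        env.get("TWELVELABS_API_KEY")
    )


def describe_scene(
    video_url: str,
    start: float,
    end: float,
    *,
    primary: SceneDescriber,
    fallback: SceneDescriber,
    use_primary: bool,
) -> dict:
    """Try the opt-in backend first; on any error use the default path instead."""
    if use_primary:
        try:
            return primary(video_url, start, end)
        except Exception:
            # Broad by design: any backend failure falls back for this scene only.
            # Real code would log the error before continuing.
            pass
    return fallback(video_url, start, end)
```

Keeping the fallback inside the per-scene call means one failing scene does not switch the whole indexing run back to the default backend.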

Happy to adjust naming or the selection seam if you'd prefer something different.

— Mohit (@mohit-twelvelabs, TwelveLabs)

@xo-o

xo-o commented Jun 28, 2026

Contributor

TwelveLabs integration is being added here (#144), along with other refactorings to support additional providers.

@mohit-twelvelabs

Author

That's great to hear, @xo-o — really glad TwelveLabs is landing in openvideo via #144, and the multi-provider refactor sounds like the right foundation. Happy to close this in favor of #144 whenever you'd like.

If it's useful, a couple of things I verified while building this that might save you time in #144:

  • Pegasus (pegasus1.5) doesn't accept a bare video_id — pass video=VideoContext_Url(url=...) or VideoContext_AssetId(asset_id=...). Direct local-file asset upload caps at 200MB (public URLs go up to 4GB), and the analysis window needs to be at least 4s.
  • Marengo (marengo3.0) embeddings: /v1.3/embed wants multipart/form-data for every request including text-only; the raw JSON vector key is float (the Python SDK aliases it to float_).
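
The limits in the two bullets above lend themselves to a cheap local preflight before any API call. The following Python sketch assumes decimal megabytes and gigabytes (the comment does not say which unit convention applies), and the function names and the `segment` dict layout are hypothetical. The 4 s minimum window, the 200MB and 4GB caps, and the raw `float` key are the only details taken from the comment.

```python
MIN_WINDOW_SECONDS = 4.0
# Assumption: decimal units; the source only says "200MB" and "4GB".
MAX_LOCAL_UPLOAD_BYTES = 200 * 1000 * 1000
MAX_URL_BYTES = 4 * 1000 * 1000 * 1000


def preflight_errors(
    start: float, end: float, size_bytes: int, is_local_upload: bool
) -> list[str]:
    """Return human-readable problems; an empty list means the request looks valid."""
    errors: list[str] = []
    if end - start < MIN_WINDOW_SECONDS:
        errors.append(
            f"analysis window is {end - start:.2f}s; need at least {MIN_WINDOW_SECONDS}s"
        )
    limit = MAX_LOCAL_UPLOAD_BYTES if is_local_upload else MAX_URL_BYTES
    if size_bytes > limit:
        errors.append(f"video is {size_bytes} bytes; limit for this source is {limit}")
    return errors


def embedding_vector(segment: dict) -> list[float]:
    """Read the vector from a raw JSON segment, where the key is "float".

    The Python SDK exposes the same field as the attribute float_, so code that
    switches between raw HTTP and the SDK needs to account for both spellings.
    """
    return [float(value) for value in segment["float"]]
```

Scenes shorter than the minimum window could be routed straight to the keyframe path instead of being sent to Pegasus and failing there.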

Happy to review the TwelveLabs parts of #144 or help with anything API-side — just tag me. Thanks for picking it up!
