brickgale/caption-sync

Caption Sync

Automated video generation with intelligent caption timing using Remotion.

Use as:

  • 📦 npm Library - Import into your automation projects
  • 🎬 CLI Tool - Run commands directly

Features

  • ⏱️ Accurate Timing: Word-level timestamps from speech-to-text transcription (Whisper or ElevenLabs) for perfect sync
  • 🎯 Automatic Caption Timing: Fast mode with weighted word-count distribution
  • 📱 Mobile-First: Defaults to 1080x1920 vertical format (TikTok, Reels, Shorts)
  • 🎬 Green Screen Ready: Pure green background for chroma key compositing
  • 🤖 Speech-to-Text: Transcribe audio and generate captions from actual speech
  • 📏 Smart Character Limits: Automatically split long captions at word boundaries
  • 🎨 Customizable Styling: Configure colors, fonts, letter spacing, and animations
  • 📍 Flexible Placement: Position captions at top, bottom, or center
  • ✨ Smooth Animations: Fade, subtle slide-up, or no animation options
  • 🎬 Remotion-based: Professional video output with React components
  • 📚 TypeScript Support: Full type definitions included

Quick Start

As npm Library

npm install caption-sync
import { generateVideo } from 'caption-sync';

generateVideo({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  outputPath: './out/video.mp4',
});

Requirements: Node.js v20.6+, FFmpeg installed

👉 See docs/LIBRARY.md for complete API documentation


As CLI Tool

Requirements:

  • Node.js v20.6+ (for built-in .env file support)
  • FFmpeg (installation commands below)
  • Linux users: Chrome dependencies (see below)

Install:

git clone <repository-url>
cd caption-sync
npm install

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

Linux: Chrome dependencies

sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 \
  libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
  libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2

Generate video:

npm run generate

Two timing modes available:

  1. Fast mode (word-count estimation):

    npm run generate-captions -- --max-characters=50
  2. Accurate mode (word-level timestamps from audio):

    export OPENAI_API_KEY=your-api-key
    npm run generate-captions -- --transcribe --max-characters=50

    Recommended: Uses Whisper API for perfect sync with actual speech timing
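
The fast mode's weighted distribution can be sketched as follows (an illustration of the idea, not the library's actual implementation): each caption receives a slice of the total audio duration proportional to its word count.

```typescript
// Illustrative sketch of fast-mode timing: distribute a known audio
// duration across captions, weighted by each caption's word count.
interface TimedCaption {
  text: string;
  startMs: number;
  endMs: number;
}

function distributeByWordCount(
  captions: string[],
  totalDurationMs: number,
): TimedCaption[] {
  const counts = captions.map((c) => c.trim().split(/\s+/).length);
  const totalWords = counts.reduce((sum, n) => sum + n, 0);
  let cursorMs = 0;
  return captions.map((text, i) => {
    // Each caption's share of the timeline is proportional to its word count.
    const shareMs = (counts[i] / totalWords) * totalDurationMs;
    const timed = {
      text,
      startMs: Math.round(cursorMs),
      endMs: Math.round(cursorMs + shareMs),
    };
    cursorMs += shareMs;
    return timed;
  });
}
```

Accurate mode replaces these estimates with real word-level timestamps from the transcription API.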

Preview in Remotion Studio:

npm run dev

Output: out/video.mp4

Configuration

Caption Sync is highly customizable. You can configure:

  • 🎨 Styling: Colors, fonts, shadows, backgrounds
  • 📍 Placement: Top, bottom, or center captions
  • ✨ Animations: Fade, slide-up, or none
  • 📐 Resolution: Vertical, square, HD, or FHD presets
  • 📏 Character Limits: Control caption length
  • 🎬 Chroma Key: Green screen backgrounds for compositing

👉 See docs/CONFIGURATION.md for complete configuration guide
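
As a rough sketch, a configuration covering the options above might look like the object below. The field names here are hypothetical, chosen only to illustrate the categories; the real keys are documented in docs/CONFIGURATION.md.

```typescript
// Hypothetical configuration shape — field names are illustrative,
// not the library's verified API.
const captionConfig = {
  width: 1080,                  // vertical preset (TikTok, Reels, Shorts)
  height: 1920,
  maxCharactersPerCaption: 50,  // character limit before word-boundary split
  placement: "bottom",          // "top" | "bottom" | "center"
  animation: "fade",            // "fade" | "slide-up" | "none"
  chromaKey: true,              // pure green background for compositing
  style: {
    color: "#ffffff",
    fontFamily: "Inter",
    letterSpacing: 1,
  },
};
```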

Speech-to-Text

Use OpenAI Whisper or ElevenLabs to transcribe audio and get word-by-word highlighting:

Option 1: ElevenLabs Speech-to-Text (Default)

  1. Get an API key from elevenlabs.io/app/settings/api-keys
  2. Add to .env file:
    echo "ELEVENLABS_API_KEY=your-api-key-here" > .env
  3. Generate with word highlighting:
    npm run generate -- --transcribe

Option 2: OpenAI Whisper

⚠️ Note: OpenAI provider is available but has not been tested yet. Use ElevenLabs for verified functionality.

  1. Get an API key from platform.openai.com
  2. Add to .env file:
    echo "OPENAI_API_KEY=your-api-key-here" >> .env
  3. Generate with OpenAI:
    npm run generate -- --transcribe --provider=openai

💡 Node v20.6+ can load .env files natively (via the --env-file flag) - no extra packages needed!

Programmatic Usage with Providers

import { generateCaptions, createTranscriptionProvider } from 'caption-sync';

// Using ElevenLabs (default - tested and verified)
const captions = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'elevenlabs',
  elevenLabsApiKey: process.env.ELEVENLABS_API_KEY,
});

// Using OpenAI (available but untested)
const captions2 = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'openai',
  openaiApiKey: process.env.OPENAI_API_KEY,
});

// Or use providers directly (Strategy Pattern)
const provider = createTranscriptionProvider({
  provider: 'elevenlabs',
  apiKey: process.env.ELEVENLABS_API_KEY!,
});
const words = await provider.transcribeWithTimestamps('./audio.mp3');

How It Works

  1. Text Processing: Script is sanitized and split into phrases
  2. Optional Transcription: Audio transcribed with the configured speech-to-text provider for accurate word-level timing
  3. Weighted Timing: Without transcription, duration is distributed across captions by word count
  4. Video Rendering: Remotion generates frames with animated captions
  5. Output: Final MP4 video with synced captions
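
Step 1's character-limit splitting can be sketched like this (an illustration of the approach, not the library's exact algorithm): captions grow word by word and break whenever the next word would exceed the limit.

```typescript
// Illustrative sketch: split a script into captions at word boundaries,
// keeping each caption at or under a character limit where possible.
function splitAtWordBoundaries(script: string, maxChars: number): string[] {
  const words = script.trim().split(/\s+/);
  const captions: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) captions.push(current);
      // A single word longer than the limit still becomes its own caption.
      current = word;
    }
  }
  if (current) captions.push(current);
  return captions;
}
```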

Documentation

  • docs/LIBRARY.md - complete API documentation
  • docs/CONFIGURATION.md - complete configuration guide
Troubleshooting

  • FFmpeg not found: Ensure FFmpeg is installed and in PATH
  • OpenAI API errors: Check that OPENAI_API_KEY is set correctly in .env
  • Captions too long: Reduce MAX_CHARACTERS_PER_CAPTION in config
  • Linux browser errors: Install Chrome dependencies (see installation above)

License

MIT

About

🎙️ Syncing Audio & Captions
