Automated video generation with intelligent caption timing using Remotion.
Use as:
- 📦 npm Library - Import into your automation projects
- 🎬 CLI Tool - Run commands directly
- ⏱️ Accurate Timing: Word-level timestamps from Whisper API for perfect sync
- 🎯 Automatic Caption Timing: Fast mode with weighted word-count distribution
- 📱 Mobile-First: Defaults to 1080x1920 vertical format (TikTok, Reels, Shorts)
- 🎬 Green Screen Ready: Pure green background for chroma key compositing
- 🤖 Speech-to-Text: Transcribe audio and generate captions from actual speech
- 📏 Smart Character Limits: Automatically split long captions at word boundaries
- 🎨 Customizable Styling: Configure colors, fonts, letter spacing, and animations
- 📍 Flexible Placement: Position captions at top, bottom, or center
- ✨ Smooth Animations: Fade, subtle slide-up, or no animation options
- 🎬 Remotion-based: Professional video output with React components
- 📚 TypeScript Support: Full type definitions included
npm install caption-sync
import { generateVideo } from 'caption-sync';
generateVideo({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  outputPath: './out/video.mp4',
});
Requirements: Node.js v20.6+, FFmpeg installed
👉 See docs/LIBRARY.md for complete API documentation
Requirements:
- Node.js v20.6+ (for built-in .env file support)
- FFmpeg (see install instructions below)
- Linux users: Chrome dependencies (see below)
Install:
git clone <repository-url>
cd caption-sync
npm install
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
Linux: Chrome dependencies
sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 \
libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2Generate video:
npm run generate
Two timing modes available:
- Fast mode (word-count estimation):
  npm run generate-captions -- --max-characters=50
- Accurate mode (word-level timestamps from audio):
  export OPENAI_API_KEY=your-api-key
  npm run generate-captions -- --transcribe --max-characters=50
  ✨ Recommended: Uses the Whisper API to sync captions with the actual speech timing
Preview in Remotion Studio:
npm run dev
Output: out/video.mp4
Caption Sync is highly customizable. You can configure:
- 🎨 Styling: Colors, fonts, shadows, backgrounds
- 📍 Placement: Top, bottom, or center captions
- ✨ Animations: Fade, slide-up, or none
- 📐 Resolution: Vertical, square, HD, or FHD presets
- 📏 Character Limits: Control caption length
- 🎬 Chroma Key: Green screen backgrounds for compositing
👉 See docs/CONFIGURATION.md for complete configuration guide
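To illustrate what a character limit does to a long caption, here is a minimal TypeScript sketch of splitting text at word boundaries. The function name `splitAtWordBoundaries` is hypothetical and is not part of the caption-sync API; the library's actual splitting rules may differ.

```typescript
// Hypothetical helper for illustration only; not exported by caption-sync.
// Greedily packs whole words into captions no longer than maxCharacters.
function splitAtWordBoundaries(text: string, maxCharacters: number): string[] {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const captions: string[] = [];
  let current = "";

  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxCharacters || current === "") {
      // The word fits, or it is a single over-long word that gets its own caption.
      current = candidate;
    } else {
      // Adding the word would exceed the limit: close the caption, start a new one.
      captions.push(current);
      current = word;
    }
  }

  if (current) captions.push(current);
  return captions;
}

console.log(splitAtWordBoundaries("the quick brown fox jumps", 10));
// [ 'the quick', 'brown fox', 'jumps' ]
```

Words are never cut in half in this sketch, so a single word longer than the limit produces a caption that exceeds it.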
Use OpenAI Whisper or ElevenLabs to transcribe audio and get word-by-word highlighting:
- Get an API key from elevenlabs.io/app/settings/api-keys
- Add to .env file:
  echo "ELEVENLABS_API_KEY=your-api-key-here" > .env
- Generate with word highlighting:
  npm run generate -- --transcribe
⚠️ Note: OpenAI provider is available but has not been tested yet. Use ElevenLabs for verified functionality.
- Get an API key from platform.openai.com
- Add to .env file:
  echo "OPENAI_API_KEY=your-api-key-here" >> .env
- Generate with OpenAI:
  npm run generate -- --transcribe --provider=openai
💡 Node v20.6+ has built-in .env file support (the --env-file flag) - no extra packages needed!
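Word-by-word highlighting comes down to knowing, for each rendered frame, which transcribed word is being spoken. The sketch below shows one way to do that lookup. The `WordTimestamp` shape and the `activeWordIndex` function are assumptions made for illustration; the objects actually returned by the transcription providers may be structured differently.

```typescript
// Assumed shape of a transcribed word; times are in seconds.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: returns the index of the word spoken at the given frame,
// or -1 when the frame falls in a pause between words.
function activeWordIndex(words: WordTimestamp[], frame: number, fps: number): number {
  const seconds = frame / fps;
  return words.findIndex((w) => seconds >= w.start && seconds < w.end);
}

const sampleWords: WordTimestamp[] = [
  { word: "hello", start: 0, end: 0.5 },
  { word: "world", start: 0.6, end: 1.0 },
];

console.log(activeWordIndex(sampleWords, 20, 30)); // 1 (frame 20 at 30 fps is about 0.667 s)
```

A linear scan is enough for caption-length word lists; a binary search would only matter for very long transcripts.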
import { generateCaptions, createTranscriptionProvider } from 'caption-sync';
// Using ElevenLabs (default - tested and verified)
const captions = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'elevenlabs',
  elevenLabsApiKey: process.env.ELEVENLABS_API_KEY,
});
// Using OpenAI (available but untested)
const captions2 = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'openai',
  openaiApiKey: process.env.OPENAI_API_KEY,
});
// Or use providers directly (Strategy Pattern)
const provider = createTranscriptionProvider({
  provider: 'elevenlabs',
  apiKey: process.env.ELEVENLABS_API_KEY!,
});
const words = await provider.transcribeWithTimestamps('./audio.mp3');
- Text Processing: Script is sanitized and split into phrases
- Optional Transcription: Audio transcribed with Whisper for accurate timing
- Weighted Timing: Duration distributed by word count
- Video Rendering: Remotion generates frames with animated captions
- Output: Final MP4 video with synced captions
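The weighted timing step above can be sketched as follows: each phrase receives a share of the audio duration proportional to its word count. This is a minimal illustration under stated assumptions, not the library's implementation. The names `TimedCaption` and `distributeByWordCount` are hypothetical, and the real weighting may account for other factors.

```typescript
// Hypothetical result type; times are in seconds.
interface TimedCaption {
  text: string;
  startSeconds: number;
  endSeconds: number;
}

// Hypothetical helper: splits totalSeconds across phrases in proportion to word count.
function distributeByWordCount(phrases: string[], totalSeconds: number): TimedCaption[] {
  const counts = phrases.map((p) => p.trim().split(/\s+/).filter(Boolean).length);
  const totalWords = counts.reduce((sum, n) => sum + n, 0);
  if (totalWords === 0) return [];

  const captions: TimedCaption[] = [];
  let wordsSoFar = 0;
  for (let i = 0; i < phrases.length; i++) {
    // Boundaries come from cumulative word counts, so rounding error does not
    // accumulate and the last caption ends exactly at totalSeconds.
    const startSeconds = (wordsSoFar / totalWords) * totalSeconds;
    wordsSoFar += counts[i];
    const endSeconds = (wordsSoFar / totalWords) * totalSeconds;
    captions.push({ text: phrases[i], startSeconds, endSeconds });
  }
  return captions;
}

console.log(distributeByWordCount(["one two three", "four"], 8));
// The first phrase spans 0 to 6 seconds, the second 6 to 8 seconds.
```

This estimate assumes a steady speaking rate, which is why the transcription-based mode is the better choice when pauses or pace changes matter.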
- Configuration Guide - Styling, placement, animations, and more
- Library API - Use as npm package in your projects
- Transcription Providers - OpenAI, ElevenLabs, and adding new providers
- Examples - Code examples and advanced usage
- Animation Guide - Caption animation options
- FFmpeg not found: Ensure FFmpeg is installed and in PATH
- OpenAI API errors: Check that OPENAI_API_KEY is set correctly in .env
- Captions too long: Reduce MAX_CHARACTERS_PER_CAPTION in config
- Linux browser errors: Install Chrome dependencies (see installation above)
MIT