brickgale/caption-sync

Caption Sync

Automated video generation with intelligent caption timing using Remotion.

Use as:

  • 📦 npm Library - Import into your automation projects
  • 🎬 CLI Tool - Run commands directly

Features

  • ⏱️ Accurate Timing: Word-level timestamps from speech-to-text transcription (Whisper or ElevenLabs) for perfect sync
  • 🎯 Automatic Caption Timing: Fast mode with weighted word-count distribution
  • 📱 Mobile-First: Defaults to 1080x1920 vertical format (TikTok, Reels, Shorts)
  • 🎬 Green Screen Ready: Pure green background for chroma key compositing
  • 🤖 Speech-to-Text: Transcribe audio and generate captions from actual speech
  • 📏 Smart Character Limits: Automatically split long captions at word boundaries
  • 🎨 Customizable Styling: Configure colors, fonts, letter spacing, and animations
  • 📍 Flexible Placement: Position captions at top, bottom, or center
  • ✨ Smooth Animations: Fade, subtle slide-up, or no animation options
  • 🎬 Remotion-based: Professional video output with React components
  • 📚 TypeScript Support: Full type definitions included

Quick Start

As npm Library

npm install caption-sync
import { generateVideo } from 'caption-sync';

generateVideo({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  outputPath: './out/video.mp4',
});

Requirements: Node.js v20.6+, FFmpeg installed

👉 See docs/LIBRARY.md for complete API documentation


As CLI Tool

Requirements:

  • Node.js v20.6+ (for built-in .env file support)
  • FFmpeg (installation commands below)
  • Linux users: Chrome dependencies (see below)

Install:

git clone <repository-url>
cd caption-sync
npm install

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

Linux: Chrome dependencies

sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 \
  libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
  libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2

Generate video:

npm run generate

Two timing modes available:

  1. Fast mode (word-count estimation):

    npm run generate-captions -- --max-characters=50
  2. Accurate mode (word-level timestamps from audio):

    export OPENAI_API_KEY=your-api-key
    npm run generate-captions -- --transcribe --max-characters=50

    Recommended: Uses Whisper API for perfect sync with actual speech timing
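
The fast mode's weighted distribution can be sketched as follows (an illustration of the idea, not the library's actual implementation): each caption receives a slice of the total audio duration proportional to its word count.

```typescript
// Illustrative sketch of fast-mode timing: distribute a known audio
// duration across captions, weighted by each caption's word count.
interface TimedCaption {
  text: string;
  startMs: number;
  endMs: number;
}

function distributeByWordCount(
  captions: string[],
  totalDurationMs: number,
): TimedCaption[] {
  const counts = captions.map((c) => c.trim().split(/\s+/).length);
  const totalWords = counts.reduce((sum, n) => sum + n, 0);
  let cursorMs = 0;
  return captions.map((text, i) => {
    // Each caption's share of the timeline is proportional to its word count.
    const shareMs = (counts[i] / totalWords) * totalDurationMs;
    const timed = {
      text,
      startMs: Math.round(cursorMs),
      endMs: Math.round(cursorMs + shareMs),
    };
    cursorMs += shareMs;
    return timed;
  });
}
```

Accurate mode replaces these estimates with real word-level timestamps from the transcription API.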

Preview in Remotion Studio:

npm run dev

Output: out/video.mp4

Configuration

Caption Sync is highly customizable. You can configure:

  • 🎨 Styling: Colors, fonts, shadows, backgrounds
  • 📍 Placement: Top, bottom, or center captions
  • ✨ Animations: Fade, slide-up, or none
  • 📐 Resolution: Vertical, square, HD, or FHD presets
  • 📏 Character Limits: Control caption length
  • 🎬 Chroma Key: Green screen backgrounds for compositing

👉 See docs/CONFIGURATION.md for complete configuration guide
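
As a rough sketch, a configuration covering the options above might look like the object below. The field names here are hypothetical, chosen only to illustrate the categories; the real keys are documented in docs/CONFIGURATION.md.

```typescript
// Hypothetical configuration shape — field names are illustrative,
// not the library's verified API.
const captionConfig = {
  width: 1080,                  // vertical preset (TikTok, Reels, Shorts)
  height: 1920,
  maxCharactersPerCaption: 50,  // character limit before word-boundary split
  placement: "bottom",          // "top" | "bottom" | "center"
  animation: "fade",            // "fade" | "slide-up" | "none"
  chromaKey: true,              // pure green background for compositing
  style: {
    color: "#ffffff",
    fontFamily: "Inter",
    letterSpacing: 1,
  },
};
```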

Speech-to-Text

Use OpenAI Whisper or ElevenLabs to transcribe audio and get word-by-word highlighting:

Option 1: ElevenLabs Speech-to-Text (Default)

  1. Get an API key from elevenlabs.io/app/settings/api-keys
  2. Add to .env file:
    echo "ELEVENLABS_API_KEY=your-api-key-here" > .env
  3. Generate with word highlighting:
    npm run generate -- --transcribe

Option 2: OpenAI Whisper

⚠️ Note: OpenAI provider is available but has not been tested yet. Use ElevenLabs for verified functionality.

  1. Get an API key from platform.openai.com
  2. Add to .env file:
    echo "OPENAI_API_KEY=your-api-key-here" >> .env
  3. Generate with OpenAI:
    npm run generate -- --transcribe --provider=openai

💡 Node v20.6+ can load .env files natively (via the --env-file flag) - no extra packages needed!

Programmatic Usage with Providers

import { generateCaptions, createTranscriptionProvider } from 'caption-sync';

// Using ElevenLabs (default - tested and verified)
const captions = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'elevenlabs',
  elevenLabsApiKey: process.env.ELEVENLABS_API_KEY,
});

// Using OpenAI (available but untested)
const captions2 = await generateCaptions({
  scriptPath: './script.txt',
  audioPath: './audio.mp3',
  useTranscription: true,
  transcriptionProvider: 'openai',
  openaiApiKey: process.env.OPENAI_API_KEY,
});

// Or use providers directly (Strategy Pattern)
const provider = createTranscriptionProvider({
  provider: 'elevenlabs',
  apiKey: process.env.ELEVENLABS_API_KEY!,
});
const words = await provider.transcribeWithTimestamps('./audio.mp3');

How It Works

  1. Text Processing: Script is sanitized and split into phrases
  2. Optional Transcription: Audio transcribed with the configured speech-to-text provider for accurate word-level timing
  3. Weighted Timing: Without transcription, duration is distributed across captions by word count
  4. Video Rendering: Remotion generates frames with animated captions
  5. Output: Final MP4 video with synced captions
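
Step 1's character-limit splitting can be sketched like this (an illustration of the approach, not the library's exact algorithm): captions grow word by word and break whenever the next word would exceed the limit.

```typescript
// Illustrative sketch: split a script into captions at word boundaries,
// keeping each caption at or under a character limit where possible.
function splitAtWordBoundaries(script: string, maxChars: number): string[] {
  const words = script.trim().split(/\s+/);
  const captions: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) captions.push(current);
      // A single word longer than the limit still becomes its own caption.
      current = word;
    }
  }
  if (current) captions.push(current);
  return captions;
}
```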

Documentation

  • docs/LIBRARY.md - complete API documentation
  • docs/CONFIGURATION.md - complete configuration guide
Troubleshooting

  • FFmpeg not found: Ensure FFmpeg is installed and in PATH
  • OpenAI API errors: Check that OPENAI_API_KEY is set correctly in .env
  • Captions too long: Reduce MAX_CHARACTERS_PER_CAPTION in config
  • Linux browser errors: Install Chrome dependencies (see installation above)

License

MIT

About

🎙️ Syncing Audio & Captions
