Extract audio from video files and transcribe it using the OpenAI Whisper API.
- Video to Audio Extraction: Automatically extracts MP3 audio from video files using ffmpeg
- Language Support: 99+ languages with auto-detection
- Idempotent: Skips re-extraction and re-transcription if outputs already exist
- File Size Warning: Alerts when audio exceeds OpenAI's 25MB limit
No Speaker Diarization: OpenAI's whisper-1 model does not support speaker diarization. For multi-speaker videos, use one of:
- `stt_video_using_assemblyai.sh`: speaker labels A, B, C, ...
- `stt_video_using_speechmatics.sh`: speaker labels S1, S2, S3, ...
```bash
# Required: ffmpeg for audio extraction
sudo pacman -S ffmpeg  # Arch Linux
# or
sudo apt install ffmpeg  # Debian/Ubuntu

# Required: stt_openai.py in PATH
# Clone: https://github.com/CLIAI/handy_scripts
```

```bash
# Set your OpenAI API key
export OPENAI_API_KEY="your_api_key_here"
```

Get your API key at: https://platform.openai.com/api-keys
```bash
# Basic transcription (auto-detect language)
./stt_video_using_openai.sh video.mp4

# Specify language
./stt_video_using_openai.sh video.mp4 en

# German video
./stt_video_using_openai.sh video.mp4 de
```

```
Usage: stt_video_using_openai.sh video_file [language_code]

Arguments:
  video_file     Path to the video file to transcribe
  language_code  Language code (default: auto)
```
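The usage line takes one required and one optional positional argument. Below is a minimal sketch of parsing them, assuming the language falls back to `auto` when omitted. The function name `parse_args` and the file-existence check are illustrative additions, not taken from the actual script.

```bash
#!/bin/sh
# Hypothetical sketch: parse "video_file [language_code]" as the usage line describes.
# Prints the video path and the language (default "auto"), one per line.
parse_args() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
        echo "Usage: stt_video_using_openai.sh video_file [language_code]" >&2
        return 2
    fi
    local video_file="$1"
    local language="${2:-auto}"   # missing or empty -> auto-detect
    if [ ! -f "$video_file" ]; then
        echo "Error: '$video_file' is not a file" >&2
        return 1
    fi
    printf '%s\n%s\n' "$video_file" "$language"
}
```

For example, `parse_args video.mp4 de` prints `video.mp4` and `de` on separate lines, provided `video.mp4` exists.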
```bash
./stt_video_using_openai.sh lecture.mp4
```

Creates:
- `lecture.mp4.mp3`: extracted audio
- `lecture.mp4.mp3.openai.json`: full API response
- `lecture.mp4.mp3.txt`: plain-text transcript
```bash
./stt_video_using_openai.sh interview.mp4 en
```

```bash
./stt_video_using_openai.sh video.mp4
# Prompts:
# Language code [auto]: de
```

Input: `video.mp4`

Output:
- `video.mp4.mp3`: extracted audio (128 kbps, 44.1 kHz)
- `video.mp4.mp3.openai.json`: full OpenAI API response
- `video.mp4.mp3.txt`: human-readable transcript
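Output names are formed by appending suffixes to the full input filename, so the video's extension is kept. The sketch below shows that naming rule. `output_paths` is a hypothetical helper, not part of the script.

```bash
#!/bin/sh
# Hypothetical helper: derive the three output paths from the input video path.
# Suffixes are appended to the full filename (video.mp4 -> video.mp4.mp3, ...).
output_paths() {
    local audio="$1.mp3"
    printf '%s\n' "$audio" "$audio.openai.json" "$audio.txt"
}

output_paths video.mp4
# video.mp4.mp3
# video.mp4.mp3.openai.json
# video.mp4.mp3.txt
```

Appending rather than replacing the extension means `talk.mp4` and `talk.mkv` in the same directory produce different audio files instead of both writing `talk.mp3`.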
```
┌─────────────┐     ┌─────────────────┐     ┌─────────────────────┐
│  video.mp4  │────▶│  video.mp4.mp3  │────▶│  video.mp4.mp3.txt  │
└─────────────┘     └─────────────────┘     └─────────────────────┘
         (ffmpeg extract)        (OpenAI Whisper)
```
- Check dependencies: verifies that `stt_openai.py` is in PATH
- Extract audio: uses ffmpeg to extract MP3 (skipped if the MP3 already exists)
- Check file size: warns if the audio exceeds the 25MB limit
- Transcribe: calls `stt_openai.py` with the language option
- Output: displays the transcript location and content
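The dependency check can be approximated with the POSIX `command -v` builtin. This is a sketch, not the script's actual code. The names `require_cmd`, `require_api_key` and `check_dependencies` are hypothetical, and the messages only loosely mirror the errors listed under troubleshooting.

```bash
#!/bin/sh
# Hypothetical sketch of the "Check dependencies" step.

# Fail if a command is not found in PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: '$1' is not in your PATH." >&2
        return 1
    fi
}

# Fail if OPENAI_API_KEY is unset or empty.
require_api_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "Error: OPENAI_API_KEY environment variable not set." >&2
        return 1
    fi
}

# Stops at the first missing prerequisite.
check_dependencies() {
    require_cmd ffmpeg &&
        require_cmd stt_openai.py &&
        require_api_key
}
```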
OpenAI Whisper API has a 25MB file size limit.
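At a fixed target bitrate, MP3 size grows roughly linearly with duration: bytes ≈ bitrate (bits/s) × duration (s) / 8. The sketch below inverts that formula to estimate the longest audio that fits under the limit. It assumes the limit means 25 × 1024 × 1024 bytes and ignores MP3 framing overhead, so treat the results as upper bounds.

```bash
#!/bin/sh
# Rough estimate of the longest audio (seconds) under the size limit at a given bitrate.
# Assumes limit = 25 MiB; ignores MP3 header/framing overhead.
LIMIT_BYTES=$((25 * 1024 * 1024))

max_seconds() {
    local kbps="$1"
    # bytes * 8 = bits; divide by bits per second (kbps * 1000); integer result
    echo $(( LIMIT_BYTES * 8 / (kbps * 1000) ))
}

for rate in 128 64; do
    s=$(max_seconds "$rate")
    echo "${rate} kbps: ~${s} s (~$((s / 60)) min)"
done
# 128 kbps: ~1638 s (~27 min)
# 64 kbps: ~3276 s (~54 min)
```

By the same formula, the 600-second chunks in option 2 come to about 600 × 128000 / 8 = 9,600,000 bytes each at 128 kbps. That is well below the limit.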
For larger files:
```bash
# Option 1: Use lower bitrate extraction
# (writing to video.mp4.mp3 lets the script reuse this file instead of re-extracting)
ffmpeg -i video.mp4 -vn -ab 64k -ar 16000 -y video.mp4.mp3

# Option 2: Split into chunks
ffmpeg -i video.mp4 -f segment -segment_time 600 -vn -ab 128k chunk_%03d.mp3

# Option 3: Use AssemblyAI or Speechmatics (no size limit)
./stt_video_using_assemblyai.sh video.mp4
./stt_video_using_speechmatics.sh video.mp4
```

```
$ ./stt_video_using_openai.sh video.mp4 en
# Extracts audio, transcribes
$ ./stt_video_using_openai.sh video.mp4 en
# Skips extraction (MP3 exists), skips transcription (TXT exists)
File video.mp4.mp3 already exists.
SKIPPING: transcription of video.mp4.mp3 as video.mp4.mp3.txt already exists
```

To force re-processing, delete the existing `.mp3` and/or `.txt` files.
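Each step runs only if its output file does not already exist. Below is a minimal sketch of that pattern, built around a hypothetical `run_unless_exists` helper that takes the output path followed by the command to run. The commented ffmpeg line reuses flags from this document with the 128k/44.1kHz settings. The script's exact invocation may differ.

```bash
#!/bin/sh
# Hypothetical helper: run a command only if its output file is missing.
# Usage: run_unless_exists OUTPUT CMD [ARGS...]
run_unless_exists() {
    local out="$1"
    shift
    if [ -e "$out" ]; then
        echo "SKIPPING: $out already exists" >&2
        return 0
    fi
    "$@"
}

# Example (extraction is skipped on a second run):
# run_unless_exists video.mp4.mp3 ffmpeg -i video.mp4 -vn -ab 128k -ar 44100 -y video.mp4.mp3
```

One consequence of checking only for existence is that a partially written file from an interrupted run is also treated as done. That is another reason deleting the outputs is the way to force re-processing.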
Any format that ffmpeg can read, for example:
- MP4, MKV, AVI, MOV, WMV, FLV, WebM
```
The script stt_openai.py is required to run this program.
It is not currently in your PATH.
```

Solution: Add the handy_scripts directory to your PATH.

```
ffmpeg: command not found
```

Solution: Install ffmpeg for your system.

```
Error: OPENAI_API_KEY environment variable not set.
```

Solution: `export OPENAI_API_KEY="your_key"`

```
WARNING: Audio file is larger than 25MB (OpenAI limit).
Consider splitting the file or using AssemblyAI/Speechmatics instead.
```

Solution: Use a lower bitrate, split the file, or use an alternative STT service.
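The size warning amounts to comparing the file's byte count against the limit. Below is a sketch using POSIX `wc -c`. The helper name `check_audio_size` and the 25 × 1024 × 1024 byte threshold are assumptions, since the script's exact cutoff is not stated here. The function returns 1 on oversize so a caller can decide whether to continue. The script itself is described only as warning.

```bash
#!/bin/sh
# Hypothetical size check; threshold assumed to be 25 MiB.
MAX_BYTES=$((25 * 1024 * 1024))

check_audio_size() {
    [ -f "$1" ] || return 2
    local size
    # tr strips the padding some wc implementations add
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt "$MAX_BYTES" ]; then
        echo "WARNING: Audio file is larger than 25MB (OpenAI limit)." >&2
        echo "Consider splitting the file or using AssemblyAI/Speechmatics instead." >&2
        return 1
    fi
    return 0
}
```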
| Feature | OpenAI | AssemblyAI | Speechmatics |
|---|---|---|---|
| Speaker diarization | No* | Yes | Yes |
| Max file size | 25 MB | 5 GB | Unlimited |
| Languages | 99+ | 99+ | 55+ |
| Translation | Yes | No | No |
\*The gpt-4o-transcribe-diarize model supports diarization but requires a different API.
- stt_openai.py - Underlying transcription tool
- stt_video_using_assemblyai.sh - AssemblyAI video transcription (with diarization)
- stt_video_using_speechmatics.sh - Speechmatics video transcription (with diarization)
Part of the CLIAI handy_scripts collection.