Extract audio from video files and transcribe using AssemblyAI with optional speaker diarisation.
- Video to Audio Extraction: Automatically extracts MP3 audio from video files using ffmpeg
- Speaker Diarisation: Identify and label multiple speakers (A, B, C, etc.)
- Language Support: 99+ languages including auto-detection
- Idempotent: Skip re-extraction and re-transcription if outputs already exist
- Interactive Prompts: Prompts for speaker count and language if not provided
# Required: ffmpeg for audio extraction
sudo pacman -S ffmpeg # Arch Linux
# or
sudo apt install ffmpeg # Debian/Ubuntu
# Required: stt_assemblyai.py in PATH
# Clone: https://github.com/CLIAI/handy_scripts
# Set your AssemblyAI API key
export ASSEMBLYAI_API_KEY="your_api_key_here"
Get your API key at: https://www.assemblyai.com/
# Basic transcription (prompts for speaker count and language)
./stt_video_using_assemblyai.sh video.mp4
# Specify expected speakers
./stt_video_using_assemblyai.sh video.mp4 2
# Specify speakers and language
./stt_video_using_assemblyai.sh video.mp4 3 en
Usage: stt_video_using_assemblyai.sh video_file [expected_speakers [language_code]]
Arguments:
  video_file         Path to the video file to transcribe
  expected_speakers  Number of speakers (0=auto-detect, 1=no diarisation)
  language_code      Language code (default: en)
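The argument handling described above could be sketched roughly as follows. This is an illustration rather than the script's actual source: the function name `resolve_args` is hypothetical, and the prompt wording is taken from the interactive example in this README.

```shell
# Hypothetical helper (POSIX sh): fills in expected_speakers and language_code,
# prompting only for values that were not given on the command line.
# Prints the resolved video file, speaker count, and language, one per line.
resolve_args() {
    video_file="${1:-}"
    speakers="${2:-}"
    language="${3:-}"

    if [ -z "$video_file" ]; then
        echo "Usage: stt_video_using_assemblyai.sh video_file [expected_speakers [language_code]]" >&2
        return 1
    fi

    # Prompt for missing values; an empty answer (or EOF) keeps the default.
    if [ -z "$speakers" ]; then
        printf 'Expected speakers [0] (0==any): ' >&2
        read -r speakers || true
        speakers="${speakers:-0}"
    fi
    if [ -z "$language" ]; then
        printf 'Language code [en]: ' >&2
        read -r language || true
        language="${language:-en}"
    fi

    printf '%s\n' "$video_file" "$speakers" "$language"
}
```

Prompts go to stderr so that the resolved values on stdout can be captured by a caller.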
./stt_video_using_assemblyai.sh lecture.mp4 1 en
Creates:
- lecture.mp4.mp3 - Extracted audio
- lecture.mp4.mp3.assemblyai.json - Full API response
- lecture.mp4.mp3.txt - Plain text transcript
./stt_video_using_assemblyai.sh interview.mp4 2 en
Creates a transcript with speaker labels:
Speaker A: Welcome to the show.
Speaker B: Thanks for having me.
Speaker A: Let's talk about your latest project.
./stt_video_using_assemblyai.sh meeting.mp4 0 en
Enables diarisation but lets AssemblyAI determine the number of speakers.
./stt_video_using_assemblyai.sh video.mp4
# Prompts:
# Expected speakers [0] (0==any): 3
# Language code [en]: de
Input: video.mp4
Output:
- video.mp4.mp3 - Extracted audio (128k, 44.1kHz)
- video.mp4.mp3.assemblyai.json - Full AssemblyAI API response
- video.mp4.mp3.txt - Human-readable transcript
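The output names are formed by appending suffixes to the input path, and the audio is extracted at 128k and 44.1kHz. A minimal sketch of both steps is shown below. The helper names `output_paths` and `extract_audio` are hypothetical, and the exact ffmpeg options the script passes are an assumption; the ones used here are standard ffmpeg options that would produce audio with those properties.

```shell
# Hypothetical helper: prints the three output paths for a given video file,
# one per line (audio, full API response, plain-text transcript).
output_paths() {
    audio_file="${1}.mp3"
    printf '%s\n' "$audio_file" "${audio_file}.assemblyai.json" "${audio_file}.txt"
}

# Hypothetical helper: extracts a 128 kbit/s, 44.1 kHz MP3 track.
# -vn drops the video stream, -b:a sets the audio bitrate, -ar the sample rate.
# Assumption: the real script may use different options.
extract_audio() {
    ffmpeg -i "$1" -vn -c:a libmp3lame -b:a 128k -ar 44100 "${1}.mp3"
}
```

For example, `output_paths lecture.mp4` prints `lecture.mp4.mp3`, `lecture.mp4.mp3.assemblyai.json`, and `lecture.mp4.mp3.txt`.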
┌───────────┐     ┌───────────────┐     ┌───────────────────┐
│ video.mp4 │────▶│ video.mp4.mp3 │────▶│ video.mp4.mp3.txt │
└───────────┘     └───────────────┘     └───────────────────┘
        (ffmpeg extract)      (AssemblyAI STT)
- Check dependencies: Verifies stt_assemblyai.py is in PATH
- Extract audio: Uses ffmpeg to extract MP3 (skipped if exists)
- Transcribe: Calls stt_assemblyai.py with appropriate flags
- Output: Displays transcript location and content
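The dependency check in the first step can be done with the POSIX `command -v` builtin. The following is a sketch under that assumption; `require_commands` is a hypothetical name, and the real script may check differently.

```shell
# Hypothetical helper: succeeds only if every named command is found in PATH.
# Reports each missing command on stderr before returning non-zero.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Error: $cmd not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_commands ffmpeg stt_assemblyai.py || exit 1
```

Checking all commands before returning means a single run reports every missing dependency, not just the first.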
$ ./stt_video_using_assemblyai.sh video.mp4 2 en
# Extracts audio, transcribes
$ ./stt_video_using_assemblyai.sh video.mp4 2 en
# Skips extraction (MP3 exists), skips transcription (TXT exists)
File video.mp4.mp3 already exists.
SKIPPING: transcription of video.mp4.mp3 as video.mp4.mp3.txt already exists
To force re-processing: Delete existing .mp3 and/or .txt files
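The skip-if-exists behaviour shown above can be expressed as a small wrapper that runs a step only when its output file is missing. This is a sketch of the pattern, not the script's own code; `run_unless_exists` is a hypothetical name.

```shell
# Hypothetical helper: runs the given command only when the output file
# does not exist yet. First argument is the output path, the rest is the command.
run_unless_exists() {
    output_file="$1"
    shift
    if [ -e "$output_file" ]; then
        echo "SKIPPING: $output_file already exists" >&2
        return 0
    fi
    "$@"
}
```

A call such as `run_unless_exists video.mp4.mp3.txt some_transcribe_command video.mp4.mp3` (where `some_transcribe_command` is a placeholder) does nothing on a second run until the .txt file is deleted.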
| Value | Behavior |
|---|---|
| 0 | Diarisation enabled, auto-detect speaker count |
| 1 | No diarisation (single speaker) |
| 2+ | Diarisation with expected speaker hint |
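The mapping in the table can be written as a simple branch on the expected_speakers value. The sketch below only prints a description of the selected mode; it deliberately does not show the flags passed to stt_assemblyai.py, since those are not documented here. `diarisation_mode` is a hypothetical name.

```shell
# Hypothetical helper: describes the diarisation mode for an
# expected_speakers value, following the table above.
diarisation_mode() {
    # Reject empty or non-numeric input.
    case "$1" in
        ''|*[!0-9]*)
            echo "Error: expected_speakers must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "diarisation: auto-detect speaker count"
    elif [ "$1" -eq 1 ]; then
        echo "no diarisation"
    else
        echo "diarisation: expect $1 speakers"
    fi
}
```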
Any format supported by ffmpeg:
- MP4, MKV, AVI, MOV, WMV, FLV, WebM
- And many more
The script stt_assemblyai.py is required to run this program.
It is not currently in your PATH.
Please ensure that it is available.
One way to do this is by cloning the repository
https://github.com/CLIAI/handy_scripts
into a directory in your PATH.
Solution: Add the handy_scripts directory to your PATH
ffmpeg: command not found
Solution: Install ffmpeg for your system
Error: ASSEMBLYAI_API_KEY environment variable not set.
Solution: export ASSEMBLYAI_API_KEY="your_key"
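To catch the missing-key error before any audio is extracted, a wrapper or shell profile could test the variable up front. This is a minimal sketch; `check_api_key` is a hypothetical name, and the message simply mirrors the error shown above.

```shell
# Hypothetical helper: fails with the same message as above when
# ASSEMBLYAI_API_KEY is unset or empty.
check_api_key() {
    if [ -z "${ASSEMBLYAI_API_KEY:-}" ]; then
        echo "Error: ASSEMBLYAI_API_KEY environment variable not set." >&2
        return 1
    fi
}

# Usage: check_api_key || exit 1
```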
- stt_assemblyai.py - Underlying transcription tool
- stt_assemblyai_speaker_mapper.py - Map speaker labels (A, B) to actual names
- stt_video_using_speechmatics.sh - Alternative using Speechmatics API
Part of the CLIAI handy_scripts collection.