Transcribe audio files using the Speechmatics API with support for speaker diarisation, 55+ languages, and batch processing.
- Speaker Diarisation: Identify and label multiple speakers (S1, S2, S3, etc.)
- Language Support: 55+ languages including auto-detection
- Multiple Output Formats: JSON (full API response) + TXT (human-readable transcript)
- Idempotent: Skip re-transcription if output already exists
- Progress Monitoring: Real-time status updates with verbose logging
- Region Selection: EU, US, and AU endpoints for data residency compliance
- Operating Points: Standard (faster) or Enhanced (more accurate) models
- META Warning Messages: Automatic disclaimer about potential transcription errors
# Install dependencies (handled automatically by uv)
# - requests>=2.31
# Set your Speechmatics API key
export SPEECHMATICS_API_KEY="your_api_key_here"

Get your API key at: https://portal.speechmatics.com/
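The script fails early when the key is missing (see the Troubleshooting section's error message). A minimal sketch of how such a check might look; `get_api_key` is a hypothetical helper name, not necessarily what the script uses:

```python
import os

def get_api_key() -> str:
    """Read the Speechmatics API key from the environment, failing early if unset."""
    key = os.environ.get("SPEECHMATICS_API_KEY")
    if not key:
        raise SystemExit(
            "Error: SPEECHMATICS_API_KEY environment variable not set.\n"
            "Get your API key at: https://portal.speechmatics.com/"
        )
    return key
```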
# Basic transcription
./stt_speechmatics.py audio.mp3
# With speaker diarisation
./stt_speechmatics.py -d audio.mp3
# Specify language (German)
./stt_speechmatics.py -l de audio.mp3
# Enhanced accuracy mode
./stt_speechmatics.py --operating-point enhanced audio.mp3
# US region with diarisation and max 3 speakers
./stt_speechmatics.py -R us -d --max-speakers 3 audio.mp3

By default, all transcript outputs include a META warning message to remind readers that STT transcripts may contain errors.
Automatically prepends a disclaimer to transcript files:
- TXT files: YAML front matter format at top of file
- JSON files: First key:
{"_meta_note": "{message}", ...}
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS. This transcript was generated by automated speech recognition technology and should be treated as a rough transcription for reference purposes. Common types of errors include: incorrect word recognition (especially homophones, proper nouns, technical terminology, or words in noisy audio conditions), missing or incorrect punctuation, speaker misidentification in multi-speaker scenarios, and timing inaccuracies. For best comprehension and to mentally correct potential errors, please consider: the broader conversational context, relevant domain knowledge, technical background of the subject matter, and any supplementary information about the speakers or topic. This transcript is intended to convey the general content and flow of the conversation rather than serving as a verbatim, word-perfect record. When critical accuracy is required, please verify important details against the original audio source.
---
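The two prepend rules above can be sketched as follows. This is an illustrative reimplementation (function names are hypothetical), relying on the fact that Python dicts preserve insertion order, so `_meta_note` stays the first key when serialised:

```python
import json

META = "THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS."

def with_meta_txt(transcript: str) -> str:
    # YAML front matter block prepended to the plain-text transcript
    return f"---\nmeta: {META}\n---\n{transcript}"

def with_meta_json(response: dict) -> dict:
    # Building a new dict with _meta_note first keeps it as the
    # first key in the JSON output (insertion order is preserved).
    return {"_meta_note": META, **response}
```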
Via command-line flag:
./stt_speechmatics.py --no-meta-message audio.mp3
# or
./stt_speechmatics.py --disable-meta-message audio.mp3Via environment variable (system-wide):
export STT_META_MESSAGE_DISABLE=1
./stt_speechmatics.py audio.mp3

Via environment variable (custom message text):

export STT_META_MESSAGE="DRAFT - UNVERIFIED TRANSCRIPT"
./stt_speechmatics.py audio.mp3

./stt_speechmatics.py audio.mp3

Creates:
- audio.mp3.speechmatics.json - Full Speechmatics API response
- audio.mp3.txt - Plain text transcript
# Auto-detect number of speakers
./stt_speechmatics.py -d audio.mp3
# Limit to maximum 3 speakers
./stt_speechmatics.py -d --max-speakers 3 audio.mp3
# Adjust speaker detection sensitivity (higher = more speakers)
./stt_speechmatics.py -d --speaker-sensitivity 0.7 audio.mp3

Creates:
- audio.mp3.speechmatics.json - Full response with speaker labels
- audio.mp3.txt - Formatted transcript: Speaker S1:\t text
Note: Speechmatics uses S1, S2, S3, etc. for speaker labels (not A, B, C like AssemblyAI).
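Producing the `Speaker S1:\t text` format from the word-level `results` array (the shape shown in the JSON example later in this document) amounts to grouping consecutive words by speaker label. A sketch, assuming word items carry `type`, `alternatives`, and `speaker` fields; `format_diarised` is a hypothetical helper:

```python
def format_diarised(results: list[dict]) -> str:
    """Group word-level results into 'Speaker S1:\\t...' lines, starting a
    new line whenever the speaker label changes."""
    lines: list[str] = []
    current = None
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word items
        word = item["alternatives"][0]["content"]
        speaker = item.get("speaker", "UU")  # UU = unknown speaker
        if speaker != current:
            lines.append(f"Speaker {speaker}:\t{word}")
            current = speaker
        else:
            lines[-1] += f" {word}"
    return "\n".join(lines)
```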
# English (default)
./stt_speechmatics.py audio.mp3
# German
./stt_speechmatics.py -l de audio.mp3
# French
./stt_speechmatics.py -l fr audio.mp3
# Japanese
./stt_speechmatics.py -l ja audio.mp3
# Mandarin Chinese
./stt_speechmatics.py -l cmn audio.mp3

Supported language codes: en, de, fr, es, it, pt, nl, pl, ru, ja, ko, zh, cmn, ar, hi, and 40+ more (ISO 639-1/639-3).
# Standard (faster processing)
./stt_speechmatics.py --operating-point standard audio.mp3
# Enhanced (higher accuracy)
./stt_speechmatics.py --operating-point enhanced audio.mp3

# EU region (default)
./stt_speechmatics.py -R eu audio.mp3
# US region
./stt_speechmatics.py -R us audio.mp3
# Australia region
./stt_speechmatics.py -R au audio.mp3

Regions:
- eu/eu1: https://eu1.asr.api.speechmatics.com/v2
- us/us1: https://us1.asr.api.speechmatics.com/v2
- au/au1: https://au1.asr.api.speechmatics.com/v2
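Since each region has a short alias (`eu` for `eu1`, etc.), the lookup reduces to a small table. An illustrative sketch (names are hypothetical, not taken from the script):

```python
ENDPOINTS = {
    "eu": "https://eu1.asr.api.speechmatics.com/v2",
    "us": "https://us1.asr.api.speechmatics.com/v2",
    "au": "https://au1.asr.api.speechmatics.com/v2",
}

def base_url(region: str) -> str:
    # Strip a trailing "1" so that "eu1" and "eu" resolve to the same endpoint.
    return ENDPOINTS[region.rstrip("1")]
```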
# Specify output file
./stt_speechmatics.py -o transcript.txt audio.mp3
# Creates: transcript.txt, audio.mp3.speechmatics.json

# Print transcript to stdout, no file creation
./stt_speechmatics.py -o - audio.mp3

# Basic info (-v)
./stt_speechmatics.py -v audio.mp3
# Detailed debug output (-vvvvv)
./stt_speechmatics.py -vvvvv audio.mp3

Log levels:
- -v (1+): INFO - Progress updates
- -vvvvv (5+): DEBUG - Full API request/response details
# Transcribe from URL (skips upload step)
./stt_speechmatics.py https://example.com/audio.mp3

audio_input - Path to audio file or URL to transcribe
Supported formats: MP3, MP4, WAV, FLAC, M4A, OGG, and more
-d, --diarisation - Enable speaker diarisation (S1, S2, S3, etc.)
--max-speakers N - Maximum number of speakers (minimum: 2, default: unlimited)
--speaker-sensitivity FLOAT - Detection sensitivity (0-1, default: 0.5)
-o, --output PATH - Output file path (default: {audio_input}.txt)
  - Use - for stdout only (no files created)
-q, --quiet - Suppress status messages (output only transcript)
-l, --language CODE - Language code (default: en)
--operating-point {standard,enhanced} - Model accuracy (default: server decides)
-R, --region {eu,eu1,us,us1,au,au1} - API endpoint region (default: eu)
-v, --verbose - Increase verbosity (use multiple times: -v, -vv, -vvvvv)
--no-meta-message, --disable-meta-message - Disable META warning message
Environment Variables:
STT_META_MESSAGE_DISABLE=1 - Disable the META message system-wide
STT_META_MESSAGE="text" - Set a custom message
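The precedence between these variables and the CLI flags might be resolved as sketched below (the function name and exact precedence order are assumptions; the flag and variable names come from this document):

```python
import os

DEFAULT_META = "THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS."

def resolve_meta_message(cli_disabled: bool = False):
    """Return the META message to prepend, or None when disabled.
    cli_disabled represents --no-meta-message / --disable-meta-message."""
    if cli_disabled or os.environ.get("STT_META_MESSAGE_DISABLE") == "1":
        return None
    # STT_META_MESSAGE overrides the built-in default text.
    return os.environ.get("STT_META_MESSAGE", DEFAULT_META)
```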
Input: audio.mp3
Output:
- audio.mp3.speechmatics.json - Full Speechmatics API response (always created)
- audio.mp3.txt - Human-readable transcript (default output)
Complete API response including:
- job - Job metadata (id, created_at, duration)
- metadata - Transcription config used
- results - Word-level results with timing and confidence
- _meta_note - META warning message (if enabled)
Example:
{
"_meta_note": "THIS IS AN AUTOMATED SPEECH-TO-TEXT...",
"job": {
"id": "abc123",
"created_at": "2025-01-06T10:00:00Z",
"duration": 180
},
"results": [
{
"type": "word",
"alternatives": [{"content": "Hello", "confidence": 0.98}],
"start_time": 0.5,
"end_time": 0.9,
"speaker": "S1"
}
]
}

---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT...
---
Hello everyone, welcome to the show. Today we're discussing...
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT...
---
Speaker S1: Hello everyone, welcome to the show.
Speaker S2: Thanks for having me.
Speaker S1: Today we're discussing artificial intelligence.
Format: Speaker {label}:\t{text}\n (tab after colon)
The tool checks if output files already exist before transcribing:
$ ./stt_speechmatics.py audio.mp3
# Transcribes audio, creates files
$ ./stt_speechmatics.py audio.mp3
# Skips transcription, displays existing transcript
SKIPPING: transcription of audio.mp3 as audio.mp3.txt already exists

To force re-transcription, delete the existing .txt file.
Process with stt_speechmatics_speaker_mapper.py:
# Step 1: Transcribe with diarisation
./stt_speechmatics.py -d audio.mp3
# Step 2: Map speaker labels to names
./stt_speechmatics_speaker_mapper.py -m "Alice Anderson,Bob Martinez" audio.mp3.speechmatics.json

Error: SPEECHMATICS_API_KEY environment variable not set.
Get your API key at: https://portal.speechmatics.com/

Solution: export SPEECHMATICS_API_KEY="your_key"
ERROR: Error creating job: [Errno 2] No such file or directory: 'audio.mp3'

Solution: Check that the file path is correct.
ERROR: Error creating job: 401 Unauthorized
REST RESPONSE: {"error": "Invalid API key"}

Solution: Verify the API key is correct.
ERROR: Error waiting for job: Job rejected: Unsupported audio format

Solution: Convert the audio to a supported format (MP3, WAV, etc.)
Jobs typically take less than half the audio duration. A 40-minute file should complete within 20 minutes.
For long audio (>1 hour), use verbose mode to monitor progress:
./stt_speechmatics.py -v audio.mp3

Output:
INFO: Processing audio input...
INFO: output filename: audio.mp3.txt
INFO: Submitting transcription job...
INFO: Job created: abc123...
INFO: Waiting for job to complete...
INFO: Job status: running
INFO: Job status: running
INFO: Job status: done
INFO: Retrieving transcript...
INFO: Writing output files...
INFO: Server response written to audio.mp3.speechmatics.json
INFO: Output written to audio.mp3.txt
INFO: Done.
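The repeated `Job status: running` lines above come from a polling loop. A sketch of that pattern with an injectable status callable (a stand-in for the actual Speechmatics job-status request; the function name and parameters are assumptions):

```python
import time

def wait_for_job(fetch_status, poll_interval: float = 5.0, timeout: float = 3600.0) -> None:
    """Poll a job-status callable until it reports 'done'."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        print(f"INFO: Job status: {status}")
        if status == "done":
            return
        if status == "rejected":
            raise RuntimeError("Job rejected")
        if time.monotonic() > deadline:
            raise TimeoutError("Job did not complete in time")
        time.sleep(poll_interval)
```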
Major languages include:
- European: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Swedish, Norwegian, Danish, Finnish, Czech, Hungarian, Romanian, Greek, Bulgarian, Croatian
- Asian: Japanese, Korean, Mandarin, Cantonese, Hindi, Thai, Vietnamese, Indonesian, Malay, Tamil, Tagalog
- Other: Arabic, Hebrew, Turkish, Ukrainian, Swahili, Welsh
See full list at: https://docs.speechmatics.com/introduction/supported-languages
- stt_speechmatics_speaker_mapper.py - Map speaker labels (S1, S2) to actual names
- stt_assemblyai.py - Alternative tool using AssemblyAI API
- stt_openai_OR_local_whisper_cli.py - Alternative tool using OpenAI Whisper
| Feature | Speechmatics | AssemblyAI |
|---|---|---|
| Speaker labels | S1, S2, S3... | A, B, C... |
| Unknown speaker | UU | - |
| Languages | 55+ | 99+ |
| Word error rate | 6.8% | ~5% |
| Latency | 150ms p95 | Similar |
| Regions | EU, US, AU | EU, US |
Cause: Filename too long (>255 characters)
Solution: Tool automatically truncates long filenames while preserving extensions
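Truncation while preserving the extension might be implemented as below. This is a sketch under the assumption of a 255-character limit per path component; the helper name is hypothetical:

```python
MAX_NAME = 255  # typical filesystem limit for a single path component

def safe_output_name(audio_name: str, suffix: str = ".txt") -> str:
    """Truncate the base name so that name + suffix fits within MAX_NAME
    characters, keeping the suffix intact."""
    name = audio_name + suffix
    if len(name) <= MAX_NAME:
        return name
    keep = MAX_NAME - len(suffix)
    return audio_name[:keep] + suffix
```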
Cause: Forgot -d flag
Solution: Re-run with -d flag
Cause: Language not specified
Solution: Specify language explicitly with -l flag
Possible causes:
- Poor audio quality
- Background noise
- Non-standard accents
Solutions:
- Use --operating-point enhanced for better accuracy
- Specify the language explicitly with the -l flag
- Use a higher-quality audio source
Part of the CLIAI handy_scripts collection.