Post-processing tool to replace speaker labels (A, B, C) with actual speaker names in AssemblyAI transcription JSON files.
When stt_assemblyai.py generates transcriptions with speaker diarisation (-d flag), it produces speaker labels like "Speaker A", "Speaker B", etc. This tool allows you to replace those generic labels with actual names after you've reviewed the transcript and identified who each speaker is.
- Format-agnostic: Uses recursive JSON traversal to find and replace ALL `"speaker"` keys, regardless of JSON structure
- Future-proof: Works even if AssemblyAI changes their JSON format
- Multiple input methods: Comma-separated, file-based (4 formats), interactive prompts, or LLM-assisted detection
- Non-destructive: Creates new `.mapped.json` and `.mapped.txt` files, preserves originals
- Idempotent: Can remap the same transcript multiple times with different mappings
- LLM-powered (optional): Automatically detect speaker names from conversation context using AI
NEW: Use AI to automatically identify speaker names from transcript context!
Install Instructor library for LLM integration:
# Core dependencies
pip install instructor pydantic
# Provider-specific (install as needed)
pip install openai # For OpenAI
pip install anthropic # For Anthropic/Claude
pip install google-generativeai # For Google Gemini
# For local Ollama (no API key needed)
# Install Ollama from ollama.com
ollama pull llama3.2

# Automatic detection with OpenAI
./stt_assemblyai_speaker_mapper.py --llm-detect openai/gpt-4o-mini audio.json
# Local/offline with Ollama (free, no API key)
./stt_assemblyai_speaker_mapper.py --llm-detect ollama/llama3.2 audio.json
# Interactive mode with AI suggestions
./stt_assemblyai_speaker_mapper.py --llm-interactive anthropic/claude-3-5-haiku audio.json

| Provider | Format | Example | Requirements |
|---|---|---|---|
| OpenAI | `openai/MODEL` | `openai/gpt-4o-mini` | API key |
| Anthropic | `anthropic/MODEL` | `anthropic/claude-3-5-haiku` | API key |
| Google | `google/MODEL` | `google/gemini-2.0-flash-exp` | API key |
| Groq | `groq/MODEL` | `groq/llama-3.1-70b-versatile` | API key (ultra-fast) |
| Ollama | `ollama/MODEL` | `ollama/llama3.2` | Local (no API key) |
| 100+ more | `litellm/...` | via LiteLLM | Varies |
Set API keys:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."

AI analyzes the transcript and applies best-guess speaker names:
./stt_assemblyai_speaker_mapper.py --llm-detect openai/gpt-4o-mini audio.json

Output:
INFO: Analyzing transcript with LLM...
INFO: LLM confidence: high
INFO: LLM reasoning: Names explicitly mentioned in conversation
INFO: Detected 2 speaker(s): A, B
INFO: Applied mappings:
INFO: A → Alice Anderson
INFO: B → Bob Smith
Created: audio.assemblyai.mapped.json, audio.mapped.txt
AI suggests names, you confirm or override:
./stt_assemblyai_speaker_mapper.py --llm-interactive openai/gpt-4o-mini audio.json

Interaction:
=== AI-Detected Speaker Mappings ===
A => Alice Anderson
B => Bob Smith
C => Unknown
=== Review and Confirm ===
Enter=accept | name=override | skip=abort | help=commands | play=audio
about=edit context file | !cmd: run shell commands
A => [Alice Anderson]: _ ← Press Enter to accept
B => [Bob Smith]: Robert ← Type to override
C => [Unknown]: about ← Opens editor to add context
→ Opening audio.about.md in nano...
C => [Unknown]: Charlie Chaplin ← Now provide name
Try AI, fall back to manual if it fails:
./stt_assemblyai_speaker_mapper.py --llm-detect-fallback ollama/llama3.2 audio.json

If the LLM fails (API error, timeout, etc.), the tool automatically switches to manual interactive mode.
./stt_assemblyai_speaker_mapper.py \
--llm-detect ollama/llama3.2 \
--llm-endpoint http://gpu-server:11434 \
audio.json

# Send more utterances for better context (default: 20)
./stt_assemblyai_speaker_mapper.py \
--llm-detect openai/gpt-4o-mini \
--llm-sample-size 30 \
audio.json

./stt_assemblyai_speaker_mapper.py -vv --llm-detect openai/gpt-4o-mini audio.json

Shows detailed LLM reasoning and confidence scores.
| Provider | Speed | Cost/transcript | Quality | Offline |
|---|---|---|---|---|
| Groq | ⚡⚡⚡ | ~$0.001 | ⭐⭐⭐⭐ | No |
| OpenAI gpt-4o-mini | ⚡⚡ | ~$0.005 | ⭐⭐⭐⭐⭐ | No |
| Anthropic Haiku | ⚡⚡ | ~$0.002 | ⭐⭐⭐⭐ | No |
| Anthropic Sonnet | ⚡⚡ | ~$0.020 | ⭐⭐⭐⭐⭐ | No |
| Ollama (local) | ⚡ | Free | ⭐⭐⭐ | Yes |
Recommended: Start with groq/llama-3.1-70b-versatile (fast + cheap) or ollama/llama3.2 (free + offline).
NEW: All transcript outputs now include a META warning message by default to remind readers that STT transcripts may contain errors.
Automatically prepends a disclaimer to transcript files warning about potential transcription errors:
- TXT files: YAML front matter format at top of file
- JSON files: Appears as first key:
{"_meta_note": "{message}", ...}
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS. This transcript was generated by automated speech recognition technology and should be treated as a rough transcription for reference purposes. Common types of errors include: incorrect word recognition (especially homophones, proper nouns, technical terminology, or words in noisy audio conditions), missing or incorrect punctuation, speaker misidentification in multi-speaker scenarios, and timing inaccuracies. For best comprehension and to mentally correct potential errors, please consider: the broader conversational context, relevant domain knowledge, technical background of the subject matter, and any supplementary information about the speakers or topic. This transcript is intended to convey the general content and flow of the conversation rather than serving as a verbatim, word-perfect record. When critical accuracy is required, please verify important details against the original audio source.
---
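The prepend logic can be sketched as follows. This is a minimal illustration, not the tool's actual code; the function name `prepend_meta` is made up, but the two environment variables are the ones documented below.

```python
import os

# Placeholder for the full default warning shown above.
DEFAULT_META = ("THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT "
                "AND MAY CONTAIN TRANSCRIPTION ERRORS. ...")

def prepend_meta(transcript_text: str) -> str:
    """Return the transcript with YAML front matter, honouring env overrides."""
    if os.environ.get("STT_META_MESSAGE_DISABLE") == "1":
        return transcript_text  # META message disabled system-wide
    message = os.environ.get("STT_META_MESSAGE", DEFAULT_META)
    return f"---\nmeta: {message}\n---\n{transcript_text}"
```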
Via command-line flag:
./stt_assemblyai_speaker_mapper.py --no-meta-message -m "Alice,Bob" audio.json
# or
./stt_assemblyai_speaker_mapper.py --disable-meta-message -m "Alice,Bob" audio.json

Via environment variable (system-wide):
export STT_META_MESSAGE_DISABLE=1
./stt_assemblyai_speaker_mapper.py -m "Alice,Bob" audio.json

You can provide your own custom warning message:
export STT_META_MESSAGE="DRAFT TRANSCRIPT - NOT VERIFIED - FOR INTERNAL USE ONLY"
./stt_assemblyai_speaker_mapper.py -m "Alice,Bob" audio.json

TXT file (audio.mapped.txt):
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN...
---
Alice Anderson: Hello everyone, welcome to the show.
Bob Martinez: Thanks for having me.
JSON file (audio.assemblyai.mapped.json):
{
"_meta_note": "THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN...",
"utterances": [
{
"speaker": "Alice Anderson",
"text": "Hello everyone, welcome to the show."
}
]
}

The LLM analyzes a strategic sample of the transcript looking for:
- Direct name mentions: "Hi Alice", "Thanks Bob"
- Introductions: "I'm...", "My name is..."
- Context clues: Professional roles, relationships, topics
- Speaking patterns: Formality, expertise signals
It returns structured suggestions with confidence levels:
- High: Names explicitly mentioned
- Medium: Strong contextual clues
- Low: Weak inference (often returns "Unknown")
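The shape of the structured response can be sketched roughly as below. This uses plain dataclasses for illustration only; the tool itself uses Pydantic models with Instructor, and these field names are assumptions rather than the actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SpeakerSuggestion:
    label: str       # e.g. "A"
    name: str        # suggested name, or "Unknown"
    confidence: str  # "high" | "medium" | "low"

@dataclass
class DetectionResult:
    suggestions: list[SpeakerSuggestion] = field(default_factory=list)
    reasoning: str = ""

# Example of what a detection result might look like:
result = DetectionResult(
    suggestions=[SpeakerSuggestion("A", "Alice Anderson", "high"),
                 SpeakerSuggestion("B", "Unknown", "low")],
    reasoning="Only one name explicitly mentioned",
)
```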
Two types of context files can improve LLM speaker detection:
- Directory context (`SPEAKER.CONTEXT.md`) - Applies to all audio files in a directory tree
- File-specific context (`{audiofile}.about.md`) - Applies to a single audio file
Create a SPEAKER.CONTEXT.md file in any directory. It applies to all audio files in that directory and subdirectories (similar to .gitignore).
Search behavior:
- Searches in the audio file's directory first
- Walks up parent directories until found
- Searches both original path AND resolved symlink path
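The walk-up search can be sketched with `pathlib` as below. The function name is illustrative, and this simplified version omits the symlink-resolution step the tool also performs.

```python
from pathlib import Path
from typing import Optional

def find_speaker_context(audio_path: str) -> Optional[Path]:
    """Walk from the audio file's directory up to the filesystem root,
    returning the first SPEAKER.CONTEXT.md found (or None)."""
    directory = Path(audio_path).resolve().parent
    for candidate_dir in [directory, *directory.parents]:
        candidate = candidate_dir / "SPEAKER.CONTEXT.md"
        if candidate.is_file():
            return candidate
    return None
```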
Example structure:
project/
├── SPEAKER.CONTEXT.md ← Applies to all files below
├── meetings/
│ ├── meeting1.mp3
│ └── meeting2.mp3
└── interviews/
├── SPEAKER.CONTEXT.md ← Overrides for this subdir
└── interview1.mp3
Example content:
# Project Context
This project contains recordings from Company X.
Common speakers:
* Greg Williams - CEO, leads most meetings
* Alice Chen - CTO, discusses technical topics
* Bob Smith - Sales Director, client-facing calls
Topics: product roadmap, engineering, sales pipeline

Provide file-specific context with .about.md files.
- Path: `{audiofile}.about.md` (e.g., `meeting.mp3.about.md`)
- Purpose: Provide context about speakers, roles, and topics
- Format: Free-form markdown
During interactive mode, type `about` to open the file in your editor:
=== Review and Confirm ===
A => [Unknown]: about
→ Opening meeting.mp3.about.md in nano...
✓ About file saved: meeting.mp3.about.md
A => [Unknown]:
The file is opened in $EDITOR (or $VISUAL, or nano as fallback).
Create the file before running speaker detection:
cat > meeting.mp3.about.md << 'EOF'
## Meeting Context
This is a product planning meeting between:
* Alice Chen - Product Manager, leads the discussion
* Bob Smith - Engineering Lead, discusses technical feasibility
* Carol Davis - Designer, presents UI mockups
Topics covered: Q1 roadmap, feature prioritization, design review
EOF

When an .about.md file exists:
- Content is automatically loaded and passed to the LLM prompt
- LLM uses this context alongside transcript analysis
- Significantly improves accuracy when names aren't mentioned in audio
Example prompt addition:
CONTEXT PROVIDED BY USER:
## Meeting Context
This is a product planning meeting between:
* Alice Chen - Product Manager
* Bob Smith - Engineering Lead
Use the above context to help identify speakers...
Use in interactive !commands:
!cat {about} # View about file content
!less {about} # Page through about file
!$EDITOR {about} # Edit about file

Placeholders:
- `{about}` - Full path to about file
- `{ab}` - Short alias
NEW: Let AI identify speakers automatically!
# Step 1: Transcribe audio with speaker diarisation
./stt_assemblyai.py -d audio.mp3
# Creates: audio.mp3.assemblyai.json, audio.mp3.txt
# Step 2: Let AI identify speakers (single command!)
./stt_assemblyai_speaker_mapper.py --llm-detect openai/gpt-4o-mini audio.mp3.assemblyai.json
# Creates: audio.mp3.assemblyai.mapped.json, audio.mp3.mapped.txt
# Step 3: Review results
cat audio.mp3.mapped.txt
# Output:
# Alice Anderson: Hello there
# Bob Smith: Hi, how are you?
# Alice Anderson: I'm doing well

Even better - interactive with AI suggestions:
# Step 2 alternative: AI suggests, you confirm/override
./stt_assemblyai_speaker_mapper.py --llm-interactive openai/gpt-4o-mini audio.mp3.assemblyai.json
# Shows:
# === AI-Detected Speaker Mappings ===
# A => Alice Anderson
# B => Bob Smith
#
# === Review and Confirm (press Enter to accept, or type to override) ===
# A => [Alice Anderson]: ← Press Enter to accept
# B => [Bob Smith]: ← Press Enter to accept

# Step 1: Transcribe audio with speaker diarisation
./stt_assemblyai.py -d audio.mp3
# Creates: audio.mp3.assemblyai.json, audio.mp3.txt
# Step 2: Review transcript to identify speakers
cat audio.mp3.txt
# Output:
# Speaker A: Hello there
# Speaker B: Hi, how are you?
# Speaker A: I'm doing well
# Step 3: Detect speakers in JSON (optional)
./stt_assemblyai_speaker_mapper.py --detect audio.mp3.assemblyai.json
# Output: Detected speakers: A, B
# Step 4: Apply speaker name mapping manually
./stt_assemblyai_speaker_mapper.py -m "Alice Anderson,Beat Barrinson" audio.mp3.assemblyai.json
# Creates: audio.mp3.assemblyai.mapped.json, audio.mp3.mapped.txt
# Step 5: Review mapped transcript
cat audio.mp3.mapped.txt
# Output:
# Alice Anderson: Hello there
# Beat Barrinson: Hi, how are you?
# Alice Anderson: I'm doing well

./stt_assemblyai_speaker_mapper.py --detect audio.assemblyai.json

Output:
Detected speakers: A, B, C
./stt_assemblyai_speaker_mapper.py -m "Alice Anderson,Beat Barrinson,Charlie Chaplin" audio.assemblyai.json

Maps speakers in sorted order:
- A → Alice Anderson
- B → Beat Barrinson
- C → Charlie Chaplin
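The sorted-order pairing boils down to zipping sorted labels with the given names. This is a sketch of the idea, not the tool's actual code:

```python
# Pair sorted detected labels with the comma-separated names, in order.
detected = {"B", "A", "C"}  # labels found in the JSON, in any order
names = "Alice Anderson,Beat Barrinson,Charlie Chaplin".split(",")
mapping = dict(zip(sorted(detected), names))
# mapping: {"A": "Alice Anderson", "B": "Beat Barrinson", "C": "Charlie Chaplin"}
```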
File: speakers.txt
Alice Anderson
Beat Barrinson
Charlie Chaplin
Usage:
./stt_assemblyai_speaker_mapper.py -M speakers.txt audio.assemblyai.json

Mapping: Sorted speakers → Sequential names
- A → Alice Anderson
- B → Beat Barrinson
- C → Charlie Chaplin
File: speakers.txt
A: Alice Anderson
B: Beat Barrinson
C: Charlie Chaplin
Usage:
./stt_assemblyai_speaker_mapper.py -M speakers.txt audio.assemblyai.json

Mapping: Direct key-to-value
File: speakers.txt
Speaker A: Alice Anderson
Speaker B: Beat Barrinson
Usage:
./stt_assemblyai_speaker_mapper.py -M speakers.txt audio.assemblyai.json

Mapping: Full label as key
File: speakers.txt
A: Alice Anderson
Speaker B: Beat Barrinson
C: Charlie Chaplin
Usage:
./stt_assemblyai_speaker_mapper.py -M speakers.txt audio.assemblyai.json

Mapping: Handles both formats in the same file
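Parsing the mixed format can be sketched as below; the function name is illustrative, not the tool's internal API. Both `A:` and `Speaker A:` keys normalise to the bare label, and `#` comment lines are skipped.

```python
def parse_mapping_lines(lines):
    """Parse 'A: Name' / 'Speaker A: Name' lines into a {label: name} dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition(":")
        key = key.strip()
        if key.startswith("Speaker "):  # normalise "Speaker B" -> "B"
            key = key[len("Speaker "):]
        mapping[key] = value.strip()
    return mapping
```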
./stt_assemblyai_speaker_mapper.py --interactive audio.assemblyai.json

Interaction:
=== Detected Speakers ===
Name for 'A' (press Enter to keep): Alice Anderson
Name for 'B' (press Enter to keep): Beat Barrinson
Name for 'C' (press Enter to keep):
INFO: Detected 3 speaker(s): A, B, C
INFO: Applied mappings:
INFO: A → Alice Anderson
INFO: B → Beat Barrinson
INFO: Wrote JSON: audio.assemblyai.mapped.json
INFO: Wrote TXT: audio.mapped.txt
Created: audio.assemblyai.mapped.json, audio.mapped.txt
./stt_assemblyai_speaker_mapper.py -vv -f -m "Host,Guest" interview.json

./stt_assemblyai_speaker_mapper.py -o final_transcript -m "Alice,Bob" audio.json
# Creates: final_transcript.json, final_transcript.txt

./stt_assemblyai_speaker_mapper.py --txt-only -m "Alice,Bob" audio.json
# Creates only: audio.mapped.txt

./stt_assemblyai_speaker_mapper.py --json-only -m "Alice,Bob" audio.json
# Creates only: audio.assemblyai.mapped.json

- `input_json` - Path to AssemblyAI JSON file (e.g., `audio.assemblyai.json`)
- `-m, --speaker-map STR` - Comma-separated speaker names (e.g., `"Alice,Bob,Charlie"`)
- `-M, --speaker-map-file PATH` - File with speaker mappings (auto-detects format)
- `--interactive` - Interactively prompt for speaker names
- `-o, --output BASE` - Output base name (default: auto-generate with `.mapped`)
- `-f, --force` - Overwrite existing output files
- `--txt-only` - Generate only .txt file (skip .json)
- `--json-only` - Generate only .json file (skip .txt)
- `--detect` - Only show detected speakers and exit (no processing)
- `-v, --verbose` - Increase verbosity (count-based: `-v` = INFO, `-vvvvv` = DEBUG)
- `-q, --quiet` - Suppress all non-error output
- `--no-meta-message`, `--disable-meta-message` - Disable the META warning message about transcription errors

Environment Variables:
- `STT_META_MESSAGE_DISABLE=1` - Disable META message system-wide
- `STT_META_MESSAGE="text"` - Use custom META message
- Input: `audio.mp3.assemblyai.json`
- JSON output: `audio.mp3.assemblyai.mapped.json` (full JSON with speaker fields replaced)
- TXT output: `audio.mp3.mapped.txt` (formatted transcript with tab after speaker name)
Alice Anderson: Hello there
Beat Barrinson: Hi, how are you?
Alice Anderson: I'm doing well
Note: Tab character (\t) after colon for easy parsing/alignment
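The TXT layout (speaker, colon, tab, text) can be sketched like this; the `utterances` sample data here is illustrative:

```python
# Format utterances into the tab-separated TXT layout.
utterances = [
    {"speaker": "Alice Anderson", "text": "Hello there"},
    {"speaker": "Beat Barrinson", "text": "Hi, how are you?"},
]
txt = "\n".join(f"{u['speaker']}:\t{u['text']}" for u in utterances)
# txt == "Alice Anderson:\tHello there\nBeat Barrinson:\tHi, how are you?"
```

Splitting each line on the first `:\t` later recovers the speaker and text unambiguously.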
All "speaker" key values are replaced throughout the JSON structure:
{
"utterances": [
{
"speaker": "Alice Anderson",
"text": "Hello there",
"confidence": 0.95,
"start": 100,
"end": 1500,
"words": [
{
"text": "Hello",
"start": 100,
"end": 500,
"confidence": 0.98,
"speaker": "Alice Anderson"
}
]
}
]
}

The tool uses recursive JSON traversal to find and replace speaker values, making it robust against JSON structure changes:
def replace_speakers_recursive(obj, speaker_map):
if isinstance(obj, dict):
for key, value in obj.items():
if key == "speaker" and isinstance(value, str):
# Replace speaker value
obj[key] = speaker_map.get(value, value)
else:
# Recurse into nested structures
replace_speakers_recursive(value, speaker_map)
elif isinstance(obj, list):
for item in obj:
replace_speakers_recursive(item, speaker_map)

Benefits:
- Works with ANY JSON structure containing `"speaker"` keys
- Future-proof: handles AssemblyAI API changes
- Comprehensive: catches speaker references in unexpected locations
- Portable: could work with other STT providers' JSON formats
The tool validates your mapping and provides helpful warnings:
WARNING: Unmapped speakers (keeping original): C

You provided a mapping for A and B, but speaker C exists in the transcript. C will remain as "Speaker C".
WARNING: Extra mappings for non-existent speakers: D

You provided a mapping for speaker D, but no speaker D exists in the JSON.
WARNING: Empty speaker mapping - no changes will be made

No valid mappings were found (e.g., empty file or all speakers skipped in interactive mode).
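These checks reduce to set differences between the labels detected in the JSON and the keys of the mapping. A sketch (variable names are illustrative):

```python
# Compare detected speaker labels against the user-supplied mapping keys.
detected = {"A", "B", "C"}                    # labels found in the JSON
mapping = {"A": "Alice", "B": "Bob", "D": "Dana"}

unmapped = sorted(detected - mapping.keys())  # left as original labels, warned
extra = sorted(mapping.keys() - detected)     # warned about and ignored
# unmapped == ["C"], extra == ["D"]
```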
ERROR: File not found: audio.assemblyai.json

ERROR: Invalid JSON: Expecting value: line 1 column 1 (char 0)

ERROR: No speakers detected in JSON (no 'speaker' keys found)

This means the JSON doesn't contain speaker diarisation data. Run stt_assemblyai.py with the -d flag to enable diarisation.
ERROR: No mapping source provided (use -m, -M, or --interactive)

ERROR: Output file(s) already exist: audio.mapped.txt, audio.assemblyai.mapped.json
ERROR: Use -f/--force to overwrite

You can map only some speakers:
# Only map speaker A, keep B and C as-is
./stt_assemblyai_speaker_mapper.py -m "Alice" audio.json

Mapping files support comment lines (lines starting with #):
# Project interview speakers
A: Alice Anderson
B: Beat Barrinson
# C was not identified yet
The tool is idempotent - you can remap the same file multiple times:
# First attempt (wrong names)
./stt_assemblyai_speaker_mapper.py -m "John,Jane" audio.json
# Correct attempt
./stt_assemblyai_speaker_mapper.py -f -m "Alice,Bob" audio.json

Works seamlessly with stt_video_using_assemblyai.sh:
# Extract and transcribe
./stt_video_using_assemblyai.sh -d video.mp4
# Review transcript
cat video.mp4.txt
# Map speakers
./stt_assemblyai_speaker_mapper.py -m "Host,Guest" video.mp4.assemblyai.json

Cause: No transcript segments found in JSON
Solution: Check that JSON contains diarisation data with --detect flag
Cause: Speakers are mapped in sorted order (A, B, C)
Solution: Use explicit key:value format in mapping file:
B: Bob (first speaker chronologically)
A: Alice (second speaker chronologically)
Cause: AssemblyAI updated their API response format
Solution: The recursive traversal should handle this automatically. If not, file an issue with sample JSON.
python3 test_stt_assemblyai_speaker_mapper.py

# Create sample JSON
cat > sample.json << 'EOF'
{
"utterances": [
{"speaker": "A", "text": "Hello"},
{"speaker": "B", "text": "Hi there"}
]
}
EOF
# Test detection
./stt_assemblyai_speaker_mapper.py --detect sample.json
# Test mapping
./stt_assemblyai_speaker_mapper.py -m "Alice,Bob" sample.json

- `stt_assemblyai.py` - Main transcription tool (creates the JSON files this tool processes)
- `stt_video_using_assemblyai.sh` - Wrapper script for video transcription
- `google_cloud_ai/multi-speaker_markup_from_dialog_transcript.py` - Similar tool for Google Cloud AI TTS
Part of the CLIAI handy_scripts collection.