Interactive voice recording tool with real-time audio level monitoring and multiple transcription backend options (OpenAI Whisper API or local whisper.cpp).
- Interactive Voice Recording: Press Enter to stop recording
- Real-time Audio Level Indicator: Visual feedback with progress bar
- Multiple Transcription Backends:
- OpenAI Whisper API (cloud, fast, high quality)
- Local whisper.cpp (offline, with --speed-up)
- Local whisper.cpp (offline, standard)
- Output Options: Stdout, file, append mode, or clipboard
- META Warning Messages: Automatic disclaimer about potential transcription errors
- Silent Mode: Non-interactive operation for scripting
# Install dependencies (handled automatically by uv)
# - numpy>=1.24
# - soundfile>=0.12
# - sounddevice>=0.4
# - prompt-toolkit>=3.0
# - openai>=1.0
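Since uv resolves dependencies automatically, the script presumably carries a PEP 723 inline metadata header that uv reads at launch. A sketch of what that header might look like (the exact block in the script may differ):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "numpy>=1.24",
#     "soundfile>=0.12",
#     "sounddevice>=0.4",
#     "prompt-toolkit>=3.0",
#     "openai>=1.0",
# ]
# ///
```

Running `uv run stt_openai_OR_local_whisper_cli.py` then installs these into an ephemeral environment with no manual pip step.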
# Set your OpenAI API key (for API transcription)
export OPENAI_API_KEY="sk-..."
# For local transcription, install whisper.cpp:
# https://github.com/ggerganov/whisper.cpp

# Interactive recording with OpenAI API transcription
./stt_openai_OR_local_whisper_cli.py
# Record and save to file
./stt_openai_OR_local_whisper_cli.py -o transcript.txt
# Record and append to existing file
./stt_openai_OR_local_whisper_cli.py -a notes.txt
# Copy result to clipboard
./stt_openai_OR_local_whisper_cli.py -c
# Silent mode (for scripting)
./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt

By default, all transcript outputs include a META warning message to remind readers that STT transcripts may contain errors.
Automatically prepends a disclaimer to transcript outputs:
- File output (`-o`, `-a`): META prepended to the beginning of the file
- Stdout output: META prepended to the printed transcript
- Clipboard (`-c`): META not included (clean copy for pasting)
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS. This transcript was generated by automated speech recognition technology and should be treated as a rough transcription for reference purposes. Common types of errors include: incorrect word recognition (especially homophones, proper nouns, technical terminology, or words in noisy audio conditions), missing or incorrect punctuation, speaker misidentification in multi-speaker scenarios, and timing inaccuracies. For best comprehension and to mentally correct potential errors, please consider: the broader conversational context, relevant domain knowledge, technical background of the subject matter, and any supplementary information about the speakers or topic. This transcript is intended to convey the general content and flow of the conversation rather than serving as a verbatim, word-perfect record. When critical accuracy is required, please verify important details against the original audio source.
---
Via command-line flag:
./stt_openai_OR_local_whisper_cli.py --no-meta-message -o transcript.txt
# or
./stt_openai_OR_local_whisper_cli.py --disable-meta-message -o transcript.txt

Via environment variable (system-wide):
export STT_META_MESSAGE_DISABLE=1
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

Custom message via environment variable:

export STT_META_MESSAGE="VOICE NOTE - UNVERIFIED"
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

To file (transcript.txt):
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is my voice note for today. I wanted to mention...
To clipboard (-c):
This is my voice note for today. I wanted to mention...
Note: no META message is added when copying to the clipboard, so pastes stay clean
$ ./stt_openai_OR_local_whisper_cli.py
Recording. Press option and ENTER (or just ENTER for default):
1. transcript with openai API (default)
2. transcript locally with whisper.cpp (with `--speed-up`)
3. transcript locally with whisper.cpp
4. TODO
Recording. Press option and ENTER (or just ENTER for default)... 5.3sec ####______

Audio Level Indicator:
- `#` - Audio detected above threshold
- `_` - Audio below threshold (silence)
- Updates every 100ms
To stop: Press Enter (or type option number + Enter)
$ ./stt_openai_OR_local_whisper_cli.py -s
# Records until Enter is pressed, no prompts displayed

./stt_openai_OR_local_whisper_cli.py
# Speak your note, press Enter
# Transcript printed to stdout

./stt_openai_OR_local_whisper_cli.py -o meeting-notes.txt
# Speak, press Enter
# Creates: meeting-notes.txt

./stt_openai_OR_local_whisper_cli.py -a daily-notes.txt
# Speak, press Enter
# Appends to existing file

./stt_openai_OR_local_whisper_cli.py -c
# Speak, press Enter
# Transcript copied to clipboard (requires xclip)

./stt_openai_OR_local_whisper_cli.py
# When prompted, type: 2
# Uses local whisper.cpp with --speed-up

./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt
# No prompts, records until Enter

./stt_openai_OR_local_whisper_cli.py -x -o transcript.txt
# For use in automated pipelines

- `-o, --output FILE` - Save transcript to file
- `-a, --append FILE` - Append transcript to existing file
- `-c, --clipboard` - Copy transcript to clipboard (requires `xclip`)
- `-s, --silent` - Silent mode (no prompts, wait for Enter)
- `-x, --non-interactive` - Non-interactive mode for pipelines (WIP)
- `--no-meta-message`, `--disable-meta-message` - Disable META warning message
Environment Variables:
- `STT_META_MESSAGE_DISABLE=1` - Disable the META message system-wide
- `STT_META_MESSAGE="text"` - Replace the built-in disclaimer with a custom message
Pros:
- Fast (~2-5 seconds)
- High accuracy
- No local resources
- Always up-to-date model
Cons:
- Requires internet
- Costs ~$0.006/minute
- Requires API key
Usage: Press Enter or type 1
Pros:
- Offline
- Free
- Private
Cons:
- Slower (~5-15 seconds)
- Requires local model files
- Lower accuracy than API
Usage: Type 2 + Enter
Requirements:
- whisper.cpp installed: https://github.com/ggerganov/whisper.cpp
- Large model: `/usr/share/whisper.cpp-model-large/large.bin`
Pros:
- Offline
- Free
- Best local accuracy
Cons:
- Slowest (~10-30 seconds)
- Requires local model files
Usage: Type 3 + Enter
- Sample Rate: 16kHz (standard for speech recognition)
- Channels: Mono (1 channel)
- Format: 16-bit WAV
- Temporary Storage: Files created in `/tmp` and deleted after transcription
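The same format can be reproduced with the standard library's `wave` module (the tool itself uses soundfile per its dependency list, but the parameters match):

```python
import struct
import tempfile
import wave

SAMPLE_RATE = 16_000  # 16 kHz mono, standard for speech recognition

def write_wav(samples: list[int]) -> str:
    """Write 16-bit mono PCM samples to a temporary WAV file."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return tmp.name
```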
The tool uses an adaptive threshold (default: 0.15) to distinguish speech from silence:
- Tracks minimum and maximum RMS (root mean square) audio levels
- Normalizes current level to 0-100% scale
- Displays visual indicator: `#` for speech, `_` for silence
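In pure Python (the script uses numpy, but the math is identical), the RMS calculation and bar rendering could look like this sketch:

```python
import math

def rms(chunk: list[float]) -> float:
    """Root-mean-square amplitude of one block of samples."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))

def level_bar(level: float, lo: float, hi: float, width: int = 10) -> str:
    """Normalize an RMS level into the observed [lo, hi] range
    and render it as a fixed-width bar of '#' and '_' characters."""
    norm = (level - lo) / (hi - lo) if hi > lo else 0.0
    filled = round(norm * width)
    return "#" * filled + "_" * (width - filled)
```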
- Initialize audio device (sounddevice)
- Start streaming audio capture
- Real-time RMS calculation for level indicator
- Buffer audio frames in queue
- On Enter: Stop recording, write frames to WAV file
- Transcribe WAV file with selected backend
- Clean up temporary file
- Output transcript
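The buffering at the heart of that pipeline can be sketched library-agnostically; the tool wires a callback of this shape into sounddevice's `InputStream`, and `make_capture` is an illustrative name:

```python
import queue

def make_capture():
    """Buffering behind the recorder: the audio callback fills a queue from
    the driver thread; the main thread drains it after Enter stops the stream."""
    frames: queue.Queue = queue.Queue()

    def callback(indata):
        # Called once per audio block; copy, since the driver reuses its buffer.
        frames.put(list(indata))

    def drain():
        chunks: list[float] = []
        while not frames.empty():
            chunks.extend(frames.get())
        return chunks

    return callback, drain
```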
$ ./stt_openai_OR_local_whisper_cli.py
# ... recording ...
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

$ ./stt_openai_OR_local_whisper_cli.py -o note.txt
$ cat note.txt
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

$ ./stt_openai_OR_local_whisper_cli.py -a notes.txt
# Appends to end of file (creates it if it doesn't exist)

$ ./stt_openai_OR_local_whisper_cli.py -c
# Transcript copied to clipboard
# No META message included (for clean pasting)

Requires: xclip command-line tool
# Install xclip
sudo apt install xclip # Debian/Ubuntu
sudo pacman -S xclip   # Arch Linux

ValueError: Please set the OPENAI_API_KEY environment variable.
Solution: export OPENAI_API_KEY="sk-..."

ERROR: The soundfile module is not available. Please install it using 'pip install soundfile'.
Solution: Ensure dependencies are installed (automatic with uv)

An error occurred while initializing the sound device: No Default Input Device Available
Solution: Check that a microphone is connected and permissions are correct

Recording interrupted by user.
Expected behavior: Press Ctrl+C to cancel recording

xclip is not available. Please install xclip or use a different method to copy to clipboard.
Solution: sudo apt install xclip

Command 'whisper.cpp' not found
Solution: Install whisper.cpp and ensure it's in PATH
The audio level indicator updates 10 times per second (100ms refresh) showing:
- Current audio level as a percentage of detected range
- Duration of recording
- Visual bar graph with `#` and `_` characters
Example:
Recording... 12.5sec #####_____
- 12.5 seconds recorded
- Currently detecting speech (5/10 bars)
Audio is recorded to a temporary WAV file:
/tmp/tmpXXXXXX.wav   # Random filename

Cleanup:
- File automatically deleted after successful transcription
- If process crashes, system temp cleaner will eventually remove it
When using option 2 or 3, the tool:
- Copies WAV to temporary directory
- Runs whisper.cpp with specified flags
- Reads generated .txt file
- Displays timing information
- Cleans up temporary directory
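A sketch of that subprocess flow (the binary name `main`, the `-m`/`-f`/`-otxt` flags, and the model path are assumptions based on the whisper.cpp CLI; verify against your install):

```python
import subprocess
from pathlib import Path

MODEL = "/usr/share/whisper.cpp-model-large/large.bin"

def build_cmd(wav_path: str, speed_up: bool = False) -> list[str]:
    """Assemble the whisper.cpp invocation; -otxt writes <wav_path>.txt."""
    cmd = ["main", "-m", MODEL, "-f", wav_path, "-otxt"]
    if speed_up:
        cmd.append("--speed-up")
    return cmd

def transcribe_local(wav_path: str, speed_up: bool = False) -> str:
    """Run whisper.cpp and read back the generated transcript file."""
    subprocess.run(build_cmd(wav_path, speed_up), check=True, capture_output=True)
    return Path(wav_path + ".txt").read_text().strip()
```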
Timing output:
0.52user 0.03system 0:05.23elapsed 10%CPU

# Morning routine
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt
# Throughout the day
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt
# Each recording appends to the same file

# Quick notes during meeting
./stt_openai_OR_local_whisper_cli.py -c
# Paste into meeting notes document

# Draft email body
./stt_openai_OR_local_whisper_cli.py -c
# Paste into email client

# Dictate complex explanation
./stt_openai_OR_local_whisper_cli.py -o comment.txt
# Copy/paste into code

| Feature | OpenAI API | whisper.cpp --speed-up | whisper.cpp standard |
|---|---|---|---|
| Speed | ⚡⚡⚡ 2-5s | ⚡⚡ 5-15s | ⚡ 10-30s |
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Offline | ❌ No | ✅ Yes | ✅ Yes |
| Cost | $0.006/min | Free | Free |
| Privacy | Cloud | Local | Local |
| Setup | API key | whisper.cpp + model | whisper.cpp + model |
Recommendation:
- For quick notes: OpenAI API (fast, convenient)
- For private/offline: whisper.cpp standard (best local quality)
- For batch processing: whisper.cpp --speed-up (balance speed/quality)
Symptoms: Audio level indicator shows all `_` (underscores)
Possible causes:
- Microphone muted
- Wrong input device selected
- Microphone permissions denied
Solutions:
- Check system audio settings
- Verify microphone permissions
- Test microphone with other applications
Causes:
- System overload
- USB audio device issues
- Buffer underruns
Solutions:
- Close unnecessary applications
- Use built-in microphone instead of USB
- Check system audio settings
Cause: Using whisper.cpp standard mode
Solution: Use option 2 (--speed-up) or option 1 (OpenAI API)
Possible causes:
- Background noise
- Speaking too quietly/quickly
- Accent or dialect issues
- Poor microphone quality
Solutions:
- Record in quiet environment
- Speak clearly and at moderate pace
- Use OpenAI API (better accuracy)
- Use higher quality microphone
- stt_assemblyai.py - Transcribe audio files with AssemblyAI (better for recordings)
- stt_assemblyai_speaker_mapper.py - Map speaker labels to names
- stt_video_using_assemblyai.sh - Extract and transcribe video audio
Part of the CLIAI handy_scripts collection.