Interactive voice recording tool with real-time audio level monitoring and multiple transcription backend options (OpenAI Whisper API or local whisper.cpp).
- Interactive Voice Recording: Press Enter to stop recording
- Real-time Audio Level Indicator: Visual feedback with progress bar
- Multiple Transcription Backends:
- OpenAI Whisper API (cloud, fast, high quality)
- Local whisper.cpp (offline, with --speed-up)
- Local whisper.cpp (offline, standard)
- Output Options: Stdout, file, append mode, or clipboard
- META Warning Messages: Automatic disclaimer about potential transcription errors
- Silent Mode: Non-interactive operation for scripting
# Install dependencies (handled automatically by uv)
# - numpy>=1.24
# - soundfile>=0.12
# - sounddevice>=0.4
# - prompt-toolkit>=3.0
# - openai>=1.0
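Since uv resolves dependencies automatically, the script presumably carries a PEP 723 inline metadata header that uv reads at launch. A sketch of what that header might look like (the exact block in the script may differ):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "numpy>=1.24",
#     "soundfile>=0.12",
#     "sounddevice>=0.4",
#     "prompt-toolkit>=3.0",
#     "openai>=1.0",
# ]
# ///
```

Running `uv run stt_openai_OR_local_whisper_cli.py` then installs these into an ephemeral environment with no manual pip step.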
# Set your OpenAI API key (for API transcription)
export OPENAI_API_KEY="sk-..."
# For local transcription, install whisper.cpp:
# https://github.com/ggerganov/whisper.cpp

# Interactive recording with OpenAI API transcription
./stt_openai_OR_local_whisper_cli.py
# Record and save to file
./stt_openai_OR_local_whisper_cli.py -o transcript.txt
# Record and append to existing file
./stt_openai_OR_local_whisper_cli.py -a notes.txt
# Copy result to clipboard
./stt_openai_OR_local_whisper_cli.py -c
# Silent mode (for scripting)
./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt

By default, all transcript outputs include a META warning message to remind readers that STT transcripts may contain errors.
Automatically prepends a disclaimer to transcript outputs:
- File output (`-o`, `-a`): META prepended to the beginning of the file
- Stdout output: META prepended to the printed transcript
- Clipboard (`-c`): META not included (clean copy for pasting)
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS. This transcript was generated by automated speech recognition technology and should be treated as a rough transcription for reference purposes. Common types of errors include: incorrect word recognition (especially homophones, proper nouns, technical terminology, or words in noisy audio conditions), missing or incorrect punctuation, speaker misidentification in multi-speaker scenarios, and timing inaccuracies. For best comprehension and to mentally correct potential errors, please consider: the broader conversational context, relevant domain knowledge, technical background of the subject matter, and any supplementary information about the speakers or topic. This transcript is intended to convey the general content and flow of the conversation rather than serving as a verbatim, word-perfect record. When critical accuracy is required, please verify important details against the original audio source.
---
Via command-line flag:
./stt_openai_OR_local_whisper_cli.py --no-meta-message -o transcript.txt
# or
./stt_openai_OR_local_whisper_cli.py --disable-meta-message -o transcript.txt

Via environment variable (system-wide):
export STT_META_MESSAGE_DISABLE=1
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

Custom message via environment variable:

export STT_META_MESSAGE="VOICE NOTE - UNVERIFIED"
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

To file (transcript.txt):
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is my voice note for today. I wanted to mention...
To clipboard (-c):
This is my voice note for today. I wanted to mention...
Note: no META message is added when copying to the clipboard, so pastes stay clean
$ ./stt_openai_OR_local_whisper_cli.py
Recording. Press option and ENTER (or just ENTER for default):
1. transcript with openai API (default)
2. transcript locally with whisper.cpp (with `--speed-up`)
3. transcript locally with whisper.cpp
4. TODO
Recording. Press option and ENTER (or just ENTER for default)... 5.3sec ####______

Audio Level Indicator:
- `#` - Audio detected above threshold
- `_` - Audio below threshold (silence)
- Updates every 100ms
To stop: Press Enter (or type option number + Enter)
$ ./stt_openai_OR_local_whisper_cli.py -s
# Records until Enter is pressed, no prompts displayed

./stt_openai_OR_local_whisper_cli.py
# Speak your note, press Enter
# Transcript printed to stdout

./stt_openai_OR_local_whisper_cli.py -o meeting-notes.txt
# Speak, press Enter
# Creates: meeting-notes.txt

./stt_openai_OR_local_whisper_cli.py -a daily-notes.txt
# Speak, press Enter
# Appends to existing file

./stt_openai_OR_local_whisper_cli.py -c
# Speak, press Enter
# Transcript copied to clipboard (requires xclip)

./stt_openai_OR_local_whisper_cli.py
# When prompted, type: 2
# Uses local whisper.cpp with --speed-up

./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt
# No prompts, records until Enter

./stt_openai_OR_local_whisper_cli.py -x -o transcript.txt
# For use in automated pipelines

- `-o, --output FILE` - Save transcript to file
- `-a, --append FILE` - Append transcript to existing file
- `-c, --clipboard` - Copy transcript to clipboard (requires `xclip`)
- `-s, --silent` - Silent mode (no prompts, wait for Enter)
- `-x, --non-interactive` - Non-interactive mode for pipelines (WIP)
- `--no-meta-message`, `--disable-meta-message` - Disable META warning message
Environment Variables:
- `STT_META_MESSAGE_DISABLE=1` - Disable the META message system-wide
- `STT_META_MESSAGE="text"` - Replace the built-in disclaimer with a custom message
Pros:
- Fast (~2-5 seconds)
- High accuracy
- No local resources
- Always up-to-date model
Cons:
- Requires internet
- Costs ~$0.006/minute
- Requires API key
Usage: Press Enter or type 1
Pros:
- Offline
- Free
- Private
Cons:
- Slower (~5-15 seconds)
- Requires local model files
- Lower accuracy than API
Usage: Type 2 + Enter
Requirements:
- whisper.cpp installed: https://github.com/ggerganov/whisper.cpp
- Large model: `/usr/share/whisper.cpp-model-large/large.bin`
Pros:
- Offline
- Free
- Best local accuracy
Cons:
- Slowest (~10-30 seconds)
- Requires local model files
Usage: Type 3 + Enter
- Sample Rate: 16kHz (standard for speech recognition)
- Channels: Mono (1 channel)
- Format: 16-bit WAV
- Temporary Storage: Files created in `/tmp` and deleted after transcription
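The same format can be reproduced with the standard library's `wave` module (the tool itself uses soundfile per its dependency list, but the parameters match):

```python
import struct
import tempfile
import wave

SAMPLE_RATE = 16_000  # 16 kHz mono, standard for speech recognition

def write_wav(samples: list[int]) -> str:
    """Write 16-bit mono PCM samples to a temporary WAV file."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return tmp.name
```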
The tool uses an adaptive threshold (default: 0.15) to distinguish speech from silence:
- Tracks minimum and maximum RMS (root mean square) audio levels
- Normalizes current level to 0-100% scale
- Displays visual indicator: `#` for speech, `_` for silence
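In pure Python (the script uses numpy, but the math is identical), the RMS calculation and bar rendering could look like this sketch:

```python
import math

def rms(chunk: list[float]) -> float:
    """Root-mean-square amplitude of one block of samples."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))

def level_bar(level: float, lo: float, hi: float, width: int = 10) -> str:
    """Normalize an RMS level into the observed [lo, hi] range
    and render it as a fixed-width bar of '#' and '_' characters."""
    norm = (level - lo) / (hi - lo) if hi > lo else 0.0
    filled = round(norm * width)
    return "#" * filled + "_" * (width - filled)
```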
- Initialize audio device (sounddevice)
- Start streaming audio capture
- Real-time RMS calculation for level indicator
- Buffer audio frames in queue
- On Enter: Stop recording, write frames to WAV file
- Transcribe WAV file with selected backend
- Clean up temporary file
- Output transcript
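The buffering at the heart of that pipeline can be sketched library-agnostically; the tool wires a callback of this shape into sounddevice's `InputStream`, and `make_capture` is an illustrative name:

```python
import queue

def make_capture():
    """Buffering behind the recorder: the audio callback fills a queue from
    the driver thread; the main thread drains it after Enter stops the stream."""
    frames: queue.Queue = queue.Queue()

    def callback(indata):
        # Called once per audio block; copy, since the driver reuses its buffer.
        frames.put(list(indata))

    def drain():
        chunks: list[float] = []
        while not frames.empty():
            chunks.extend(frames.get())
        return chunks

    return callback, drain
```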
$ ./stt_openai_OR_local_whisper_cli.py
# ... recording ...
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

$ ./stt_openai_OR_local_whisper_cli.py -o note.txt
$ cat note.txt
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

$ ./stt_openai_OR_local_whisper_cli.py -a notes.txt
# Appends to end of file (creates it if it doesn't exist)

$ ./stt_openai_OR_local_whisper_cli.py -c
# Transcript copied to clipboard
# No META message included (for clean pasting)

Requires: xclip command-line tool
# Install xclip
sudo apt install xclip # Debian/Ubuntu
sudo pacman -S xclip   # Arch Linux

ValueError: Please set the OPENAI_API_KEY environment variable.
Solution: export OPENAI_API_KEY="sk-..."

ERROR: The soundfile module is not available. Please install it using 'pip install soundfile'.
Solution: Ensure dependencies are installed (automatic with uv)

An error occurred while initializing the sound device: No Default Input Device Available
Solution: Check that a microphone is connected and permissions are correct

Recording interrupted by user.
Expected behavior: Press Ctrl+C to cancel recording

xclip is not available. Please install xclip or use a different method to copy to clipboard.
Solution: sudo apt install xclip

Command 'whisper.cpp' not found
Solution: Install whisper.cpp and ensure it's in PATH
The audio level indicator updates 10 times per second (100ms refresh) showing:
- Current audio level as a percentage of detected range
- Duration of recording
- Visual bar graph with `#` and `_` characters
Example:
Recording... 12.5sec #####_____
- 12.5 seconds recorded
- Currently detecting speech (5/10 bars)
Audio is recorded to a temporary WAV file:
/tmp/tmpXXXXXX.wav   # Random filename

Cleanup:
- File automatically deleted after successful transcription
- If process crashes, system temp cleaner will eventually remove it
When using option 2 or 3, the tool:
- Copies WAV to temporary directory
- Runs whisper.cpp with specified flags
- Reads generated .txt file
- Displays timing information
- Cleans up temporary directory
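A sketch of that subprocess flow (the binary name `main`, the `-m`/`-f`/`-otxt` flags, and the model path are assumptions based on the whisper.cpp CLI; verify against your install):

```python
import subprocess
from pathlib import Path

MODEL = "/usr/share/whisper.cpp-model-large/large.bin"

def build_cmd(wav_path: str, speed_up: bool = False) -> list[str]:
    """Assemble the whisper.cpp invocation; -otxt writes <wav_path>.txt."""
    cmd = ["main", "-m", MODEL, "-f", wav_path, "-otxt"]
    if speed_up:
        cmd.append("--speed-up")
    return cmd

def transcribe_local(wav_path: str, speed_up: bool = False) -> str:
    """Run whisper.cpp and read back the generated transcript file."""
    subprocess.run(build_cmd(wav_path, speed_up), check=True, capture_output=True)
    return Path(wav_path + ".txt").read_text().strip()
```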
Timing output:
0.52user 0.03system 0:05.23elapsed 10%CPU

# Morning routine
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt
# Throughout the day
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt
# Each recording appends to the same file

# Quick notes during meeting
./stt_openai_OR_local_whisper_cli.py -c
# Paste into meeting notes document

# Draft email body
./stt_openai_OR_local_whisper_cli.py -c
# Paste into email client

# Dictate complex explanation
./stt_openai_OR_local_whisper_cli.py -o comment.txt
# Copy/paste into code

| Feature | OpenAI API | whisper.cpp --speed-up | whisper.cpp standard |
|---|---|---|---|
| Speed | ⚡⚡⚡ 2-5s | ⚡⚡ 5-15s | ⚡ 10-30s |
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Offline | ❌ No | ✅ Yes | ✅ Yes |
| Cost | $0.006/min | Free | Free |
| Privacy | Cloud | Local | Local |
| Setup | API key | whisper.cpp + model | whisper.cpp + model |
Recommendation:
- For quick notes: OpenAI API (fast, convenient)
- For private/offline: whisper.cpp standard (best local quality)
- For batch processing: whisper.cpp --speed-up (balance speed/quality)
Symptoms: Audio level indicator shows all `_` (underscores)
Possible causes:
- Microphone muted
- Wrong input device selected
- Microphone permissions denied
Solutions:
- Check system audio settings
- Verify microphone permissions
- Test microphone with other applications
Causes:
- System overload
- USB audio device issues
- Buffer underruns
Solutions:
- Close unnecessary applications
- Use built-in microphone instead of USB
- Check system audio settings
Cause: Using whisper.cpp standard mode
Solution: Use option 2 (--speed-up) or option 1 (OpenAI API)
Possible causes:
- Background noise
- Speaking too quietly/quickly
- Accent or dialect issues
- Poor microphone quality
Solutions:
- Record in quiet environment
- Speak clearly and at moderate pace
- Use OpenAI API (better accuracy)
- Use higher quality microphone
- stt_assemblyai.py - Transcribe audio files with AssemblyAI (better for recordings)
- stt_assemblyai_speaker_mapper.py - Map speaker labels to names
- stt_video_using_assemblyai.sh - Extract and transcribe video audio
Part of the CLIAI handy_scripts collection.