OpenAI/Whisper Voice Recording & Transcription CLI

Interactive voice recording tool with real-time audio level monitoring and multiple transcription backend options (OpenAI Whisper API or local whisper.cpp).

Features

  • Interactive Voice Recording: Press Enter to stop recording
  • Real-time Audio Level Indicator: Visual feedback with progress bar
  • Multiple Transcription Backends:
    1. OpenAI Whisper API (cloud, fast, high quality)
    2. Local whisper.cpp (offline, with --speed-up)
    3. Local whisper.cpp (offline, standard)
  • Output Options: Stdout, file, append mode, or clipboard
  • META Warning Messages: Automatic disclaimer about potential transcription errors
  • Silent Mode: Prompt-free operation for scripting (recording still stops on Enter)

Prerequisites

# Install dependencies (handled automatically by uv)
# - numpy>=1.24
# - soundfile>=0.12
# - sounddevice>=0.4
# - prompt-toolkit>=3.0
# - openai>=1.0

# Set your OpenAI API key (for API transcription)
export OPENAI_API_KEY="sk-..."

# For local transcription, install whisper.cpp:
# https://github.com/ggerganov/whisper.cpp

Quick Start

# Interactive recording with OpenAI API transcription
./stt_openai_OR_local_whisper_cli.py

# Record and save to file
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

# Record and append to existing file
./stt_openai_OR_local_whisper_cli.py -a notes.txt

# Copy result to clipboard
./stt_openai_OR_local_whisper_cli.py -c

# Silent mode (for scripting)
./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt

META Transcript Warning Message

By default, all transcript outputs include a META warning message to remind readers that STT transcripts may contain errors.

What It Does

Automatically prepends a disclaimer to transcript outputs:

  • File output (-o, -a): META prepended to beginning of file
  • Stdout output: META prepended to printed transcript
  • Clipboard (-c): META not included (clean copy for pasting)

Default Message

---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT AND MAY CONTAIN TRANSCRIPTION ERRORS. This transcript was generated by automated speech recognition technology and should be treated as a rough transcription for reference purposes. Common types of errors include: incorrect word recognition (especially homophones, proper nouns, technical terminology, or words in noisy audio conditions), missing or incorrect punctuation, speaker misidentification in multi-speaker scenarios, and timing inaccuracies. For best comprehension and to mentally correct potential errors, please consider: the broader conversational context, relevant domain knowledge, technical background of the subject matter, and any supplementary information about the speakers or topic. This transcript is intended to convey the general content and flow of the conversation rather than serving as a verbatim, word-perfect record. When critical accuracy is required, please verify important details against the original audio source.
---

Disabling the META Message

Via command-line flag:

./stt_openai_OR_local_whisper_cli.py --no-meta-message -o transcript.txt
# or
./stt_openai_OR_local_whisper_cli.py --disable-meta-message -o transcript.txt

Via environment variable (system-wide):

export STT_META_MESSAGE_DISABLE=1
./stt_openai_OR_local_whisper_cli.py -o transcript.txt

Custom META Message

export STT_META_MESSAGE="VOICE NOTE - UNVERIFIED"
./stt_openai_OR_local_whisper_cli.py -o transcript.txt
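
The precedence between the flag and the two environment variables could be sketched as below. This is a minimal illustration, not the script's actual code: the function names `resolve_meta_message` and `with_meta` are hypothetical, `DEFAULT_META` is an abbreviated stand-in for the full default text, and it assumes that only the value `1` disables the message and that a custom message keeps the `meta:` prefix and `---` delimiters.

```python
import os

# Abbreviated stand-in; the real default is the long disclaimer shown earlier.
DEFAULT_META = (
    "THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT "
    "AND MAY CONTAIN TRANSCRIPTION ERRORS."
)


def resolve_meta_message(env=None, disabled_by_flag=False):
    """Return the META text to use, or None when the message is disabled."""
    env = os.environ if env is None else env
    # Assumption: the flag and STT_META_MESSAGE_DISABLE=1 both win over a custom text.
    if disabled_by_flag or env.get("STT_META_MESSAGE_DISABLE") == "1":
        return None
    # Assumption: an empty STT_META_MESSAGE falls back to the default text.
    return env.get("STT_META_MESSAGE") or DEFAULT_META


def with_meta(transcript, meta):
    """Prepend the META block using the '---' delimited layout."""
    if meta is None:
        return transcript
    return f"---\nmeta: {meta}\n---\n{transcript}"
```

For clipboard output, a caller would pass `meta=None` so that the transcript is returned unchanged.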

Example Output

To file (transcript.txt):

---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is my voice note for today. I wanted to mention...

To clipboard (-c):

This is my voice note for today. I wanted to mention...

Note: The META message is omitted from clipboard output so the text pastes cleanly.

Interactive Recording

Normal Mode

$ ./stt_openai_OR_local_whisper_cli.py
Recording. Press option and ENTER (or just ENTER for default):
1. transcript with openai API (default)
2. transcript locally with whisper.cpp (with `--speed-up`)
3. transcript locally with whisper.cpp
4. TODO

Recording. Press option and ENTER (or just ENTER for default)... 5.3sec ####______

Audio Level Indicator:

  • # - Audio detected above threshold
  • _ - Audio below threshold (silence)
  • Updates every 100ms

To stop: Press Enter (or type option number + Enter)

Silent Mode

$ ./stt_openai_OR_local_whisper_cli.py -s
# Records until Enter is pressed, no prompts displayed

Usage Examples

1. Quick Voice Note

./stt_openai_OR_local_whisper_cli.py
# Speak your note, press Enter
# Transcript printed to stdout

2. Save to File

./stt_openai_OR_local_whisper_cli.py -o meeting-notes.txt
# Speak, press Enter
# Creates: meeting-notes.txt

3. Append to Daily Notes

./stt_openai_OR_local_whisper_cli.py -a daily-notes.txt
# Speak, press Enter
# Appends to existing file

4. Copy to Clipboard

./stt_openai_OR_local_whisper_cli.py -c
# Speak, press Enter
# Transcript copied to clipboard (requires xclip)

5. Local Transcription (Offline)

./stt_openai_OR_local_whisper_cli.py
# When prompted, type: 2
# Uses local whisper.cpp with --speed-up

6. Silent Scripting Mode

./stt_openai_OR_local_whisper_cli.py -s -o transcript.txt
# No prompts, records until Enter

7. Non-Interactive Pipeline

./stt_openai_OR_local_whisper_cli.py -x -o transcript.txt
# For use in automated pipelines

Command-Line Options

Output Control

  • -o, --output FILE - Save transcript to file
  • -a, --append FILE - Append transcript to existing file
  • -c, --clipboard - Copy transcript to clipboard (requires xclip)

Behavior

  • -s, --silent - Silent mode (no prompts, wait for Enter)
  • -x, --non-interactive - Non-interactive mode for pipelines (WIP)

META Message Control

  • --no-meta-message, --disable-meta-message - Disable META warning message

Environment Variables:

  • STT_META_MESSAGE_DISABLE=1 - Disable system-wide
  • STT_META_MESSAGE="text" - Custom message
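
For readers who want to see how the options listed in this section fit together, here is a hedged `argparse` sketch. It only mirrors the flags documented here; the function name `build_parser`, the help strings, and the destination name `no_meta_message` are illustrative assumptions, and the real script may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Record audio and transcribe it."
    )
    # Output control
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="save transcript to FILE")
    parser.add_argument("-a", "--append", metavar="FILE",
                        help="append transcript to FILE")
    parser.add_argument("-c", "--clipboard", action="store_true",
                        help="copy transcript to clipboard (requires xclip)")
    # Behavior
    parser.add_argument("-s", "--silent", action="store_true",
                        help="no prompts; still waits for Enter")
    parser.add_argument("-x", "--non-interactive", action="store_true",
                        help="non-interactive mode for pipelines (WIP)")
    # META message control: two spellings of the same switch
    parser.add_argument("--no-meta-message", "--disable-meta-message",
                        dest="no_meta_message", action="store_true",
                        help="disable the META warning message")
    return parser
```

Example: `build_parser().parse_args(["-s", "-o", "transcript.txt"])` yields a namespace with `silent=True` and `output="transcript.txt"`.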

Transcription Backend Options

Option 1: OpenAI Whisper API (Default)

Pros:

  • Fast (~2-5 seconds)
  • High accuracy
  • No local resources
  • Always up-to-date model

Cons:

  • Requires internet
  • Costs ~$0.006/minute
  • Requires API key

Usage: Press Enter or type 1
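
To get a feel for the approximate per-minute price quoted above, a rough estimate can be computed as follows. This is a back-of-the-envelope sketch: the helper name `estimate_cost_usd` is hypothetical, and it assumes simple linear pricing with no rounding or minimum charge, which may not match actual billing.

```python
# Approximate figure quoted in this README; actual pricing may differ.
OPENAI_PRICE_PER_MINUTE_USD = 0.006


def estimate_cost_usd(duration_seconds):
    """Estimate API cost for a recording, assuming linear per-minute pricing."""
    return duration_seconds / 60.0 * OPENAI_PRICE_PER_MINUTE_USD


# A 10-minute recording comes to roughly six cents under this assumption.
print(f"${estimate_cost_usd(600):.3f}")
```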

Option 2: Local whisper.cpp (--speed-up)

Pros:

  • Offline
  • Free
  • Private

Cons:

  • Slower (~5-15 seconds)
  • Requires local model files
  • Lower accuracy than API

Usage: Type 2 + Enter

Requirements: whisper.cpp installed and available in PATH, plus a local model file

Option 3: Local whisper.cpp (Standard)

Pros:

  • Offline
  • Free
  • Best local accuracy

Cons:

  • Slowest (~10-30 seconds)
  • Requires local model files

Usage: Type 3 + Enter

Audio Recording Details

Technical Specifications

  • Sample Rate: 16kHz (standard for speech recognition)
  • Channels: Mono (1 channel)
  • Format: 16-bit WAV
  • Temporary Storage: Files created in /tmp and deleted after transcription

Audio Level Threshold

The tool uses an adaptive threshold (default: 0.15) to distinguish speech from silence:

  • Tracks minimum and maximum RMS (root mean square) audio levels
  • Normalizes current level to 0-100% scale
  • Displays visual indicator: # for speech, _ for silence
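
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's implementation: the class name `LevelMeter`, the 10-character bar width, and the rule that a normalized level below the threshold renders as all `_` are guesses based on the example output in this README. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


class LevelMeter:
    """Track min/max RMS and render a #/_ bar (hypothetical helper)."""

    def __init__(self, threshold=0.15, width=10):
        self.threshold = threshold  # default value taken from this README
        self.width = width          # assumption: 10-character bar
        self.min_rms = None
        self.max_rms = None

    @staticmethod
    def rms(samples):
        """Root mean square of a block of samples."""
        if not samples:
            return 0.0
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def update(self, samples):
        """Feed one block; return its level normalized to the 0.0-1.0 range."""
        level = self.rms(samples)
        self.min_rms = level if self.min_rms is None else min(self.min_rms, level)
        self.max_rms = level if self.max_rms is None else max(self.max_rms, level)
        span = self.max_rms - self.min_rms
        return (level - self.min_rms) / span if span > 0 else 0.0

    def render(self, normalized):
        """Draw the bar; levels under the threshold count as silence."""
        filled = round(normalized * self.width) if normalized >= self.threshold else 0
        return "#" * filled + "_" * (self.width - filled)
```

Because the range is learned from the audio seen so far, the first block always normalizes to 0 and the bar only becomes meaningful once both quiet and loud blocks have been observed.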

Recording Flow

  1. Initialize audio device (sounddevice)
  2. Start streaming audio capture
  3. Real-time RMS calculation for level indicator
  4. Buffer audio frames in queue
  5. On Enter: Stop recording, write frames to WAV file
  6. Transcribe WAV file with selected backend
  7. Clean up temporary file
  8. Output transcript
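
Steps 4 to 7 of this flow can be sketched with the standard library alone. The sketch below is hedged: it uses the `wave` module in place of the `soundfile` dependency the tool actually lists, the function names are hypothetical, the frames are assumed to be raw 16-bit little-endian PCM bytes, and `transcribe` is a caller-supplied stand-in for whichever backend is selected.

```python
import os
import queue
import tempfile
import wave

SAMPLE_RATE = 16_000  # 16 kHz, as listed under Technical Specifications
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)


def write_wav(frames, path):
    """Drain a queue of raw PCM byte chunks into a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        while not frames.empty():
            wav.writeframes(frames.get_nowait())


def record_and_transcribe(frames, transcribe):
    """Write buffered frames to a temp WAV, transcribe it, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave reopens the file by path
    try:
        write_wav(frames, path)
        return transcribe(path)
    finally:
        # Design choice of this sketch: remove the file even if transcription fails.
        os.remove(path)
```

Usage: fill a `queue.Queue()` with byte chunks from the audio callback, then call `record_and_transcribe(frames, backend_function)` once Enter is pressed.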

Output Formats

Stdout (Default)

$ ./stt_openai_OR_local_whisper_cli.py
# ... recording ...
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

File Output

$ ./stt_openai_OR_local_whisper_cli.py -o note.txt
$ cat note.txt
---
meta: THIS IS AN AUTOMATED SPEECH-TO-TEXT (STT) TRANSCRIPT...
---
This is a test recording for demonstrating the voice note feature.

Append Mode

$ ./stt_openai_OR_local_whisper_cli.py -a notes.txt
# Appends to the end of the file (creates the file if it doesn't exist)

Clipboard Mode

$ ./stt_openai_OR_local_whisper_cli.py -c
# Transcript copied to clipboard
# No META message included (for clean pasting)

Requires: xclip command-line tool

# Install xclip
sudo apt install xclip  # Debian/Ubuntu
sudo pacman -S xclip    # Arch Linux
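
A clipboard helper built on xclip might look like the sketch below. It is an assumption about how such a helper could work, not the script's code: the function name `copy_to_clipboard` is hypothetical, and the `which` and `run` parameters exist only to make the sketch testable without xclip installed. The `-selection clipboard` option is a standard xclip flag.

```python
import shutil
import subprocess


def copy_to_clipboard(text, which=shutil.which, run=subprocess.run):
    """Pipe text into xclip's clipboard selection, failing early if xclip is missing."""
    if which("xclip") is None:
        raise RuntimeError(
            "xclip is not available. Please install xclip or use a "
            "different method to copy to clipboard."
        )
    # xclip reads the data to copy from standard input.
    run(["xclip", "-selection", "clipboard"],
        input=text.encode("utf-8"), check=True)
```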

Error Handling

Missing OpenAI API Key

ValueError: Please set the OPENAI_API_KEY environment variable.

Solution: export OPENAI_API_KEY="sk-..."
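
A minimal sketch of how the key check and the API call could fit together is shown below, using the `openai>=1.0` client listed under Prerequisites. The function name `transcribe_with_openai` is hypothetical, and the model name `whisper-1` is an assumption; the script may use a different model or options.

```python
import os


def transcribe_with_openai(wav_path):
    """Transcribe a WAV file via the OpenAI API (illustrative sketch)."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise ValueError("Please set the OPENAI_API_KEY environment variable.")

    # Imported lazily so the key check above runs before the import.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with open(wav_path, "rb") as audio:
        # Assumption: "whisper-1" is the model in use.
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```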

soundfile Module Not Available

ERROR: The soundfile module is not available. Please install it using 'pip install soundfile'.

Solution: Ensure dependencies are installed (automatic with uv)

Sound Device Initialization Failed

An error occurred while initializing the sound device: No Default Input Device Available

Solution: Check that a microphone is connected and permissions are correct

Keyboard Interrupt

Recording interrupted by user.

Expected behavior: this message appears when you press Ctrl+C to cancel a recording

xclip Not Available

xclip is not available. Please install xclip or use a different method to copy to clipboard.

Solution: sudo apt install xclip

whisper.cpp Not Found

Command 'whisper.cpp' not found

Solution: Install whisper.cpp and ensure it's in PATH

Advanced Features

Real-Time Audio Monitoring

The audio level indicator updates 10 times per second (100ms refresh) showing:

  • Current audio level as a percentage of detected range
  • Duration of recording
  • Visual bar graph with # and _ characters

Example:

Recording... 12.5sec #####_____
  • 12.5 seconds recorded
  • Currently detecting speech (5/10 bars)

Temporary File Handling

Audio is recorded to a temporary WAV file:

/tmp/tmpXXXXXX.wav  # Random filename

Cleanup:

  • File automatically deleted after successful transcription
  • If process crashes, system temp cleaner will eventually remove it

Local whisper.cpp Integration

When using option 2 or 3, the tool:

  1. Copies WAV to temporary directory
  2. Runs whisper.cpp with specified flags
  3. Reads generated .txt file
  4. Displays timing information
  5. Cleans up temporary directory
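
The steps above could be sketched roughly as follows. Everything here is hedged: the function names are hypothetical, `binary` and `model_path` are placeholders the caller must supply, and the flags `-m`, `-f`, `-otxt`, and `--speed-up` reflect the whisper.cpp command-line example as it existed in older releases (newer releases may not accept `--speed-up`). The sketch also assumes the transcript is written next to the input as `<input>.txt`, and it omits the timing display from step 4.

```python
import os
import shutil
import subprocess
import tempfile


def build_whisper_command(binary, model_path, wav_path, speed_up=False):
    """Assemble the whisper.cpp command line (flags are assumptions, see above)."""
    cmd = [binary, "-m", model_path, "-f", wav_path, "-otxt"]
    if speed_up:
        cmd.append("--speed-up")
    return cmd


def transcribe_locally(binary, model_path, wav_path, speed_up=False,
                       run=subprocess.run):
    """Copy the WAV to a scratch directory, run whisper.cpp, read the .txt result."""
    workdir = tempfile.mkdtemp()
    try:
        local_wav = shutil.copy(wav_path, workdir)  # returns the new file path
        run(build_whisper_command(binary, model_path, local_wav, speed_up),
            check=True)
        # Assumption: -otxt writes the transcript to "<input>.txt".
        with open(local_wav + ".txt", encoding="utf-8") as fh:
            return fh.read().strip()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

The `run` parameter defaults to `subprocess.run` and is injectable only so the flow can be exercised without a whisper.cpp build.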

Timing output:

0.52user 0.03system 0:05.23elapsed 10%CPU

Workflow Integration

Daily Journaling

# Morning routine
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt

# Throughout the day
./stt_openai_OR_local_whisper_cli.py -a journal-2025-01-15.txt

# Each recording appends to same file

Meeting Notes

# Quick notes during meeting
./stt_openai_OR_local_whisper_cli.py -c
# Paste into meeting notes document

Email Composition

# Draft email body
./stt_openai_OR_local_whisper_cli.py -c
# Paste into email client

Code Comments

# Dictate complex explanation
./stt_openai_OR_local_whisper_cli.py -o comment.txt
# Copy/paste into code

Comparison: API vs Local

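| Feature  | OpenAI API | whisper.cpp --speed-up | whisper.cpp standard |
|----------|------------|------------------------|----------------------|
| Speed    | ⚡⚡⚡ 2-5s  | ⚡⚡ 5-15s              | ⚡ 10-30s             |
| Accuracy | ⭐⭐⭐⭐⭐  | ⭐⭐⭐                 | ⭐⭐⭐⭐             |
| Offline  | ❌ No      | ✅ Yes                 | ✅ Yes               |
| Cost     | $0.006/min | Free                   | Free                 |
| Privacy  | Cloud      | Local                  | Local                |
| Setup    | API key    | whisper.cpp + model    | whisper.cpp + model  |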

Recommendation:

  • For quick notes: OpenAI API (fast, convenient)
  • For private/offline: whisper.cpp standard (best local quality)
  • For batch processing: whisper.cpp --speed-up (balance speed/quality)

Troubleshooting

Problem: No audio detected

Symptoms: Audio level indicator shows all _ (underscores)

Possible causes:

  • Microphone muted
  • Wrong input device selected
  • Microphone permissions denied

Solutions:

  • Check system audio settings
  • Verify microphone permissions
  • Test microphone with other applications

Problem: Choppy/distorted audio

Causes:

  • System overload
  • USB audio device issues
  • Buffer underruns

Solutions:

  • Close unnecessary applications
  • Use built-in microphone instead of USB
  • Check system audio settings

Problem: Transcription takes very long

Cause: Using whisper.cpp standard mode

Solution: Use option 2 (--speed-up) or option 1 (OpenAI API)

Problem: Low transcription accuracy

Possible causes:

  • Background noise
  • Speaking too quietly/quickly
  • Accent or dialect issues
  • Poor microphone quality

Solutions:

  • Record in quiet environment
  • Speak clearly and at moderate pace
  • Use OpenAI API (better accuracy)
  • Use higher quality microphone

Related Tools

  • stt_assemblyai.py - Transcribe audio files with AssemblyAI (better for recordings)
  • stt_assemblyai_speaker_mapper.py - Map speaker labels to names
  • stt_video_using_assemblyai.sh - Extract and transcribe video audio

License

Part of the CLIAI handy_scripts collection.