Skip to content

PinW/whisper-key-local

Repository files navigation

Whisper Key - Local Speech-to-Text

Global hotkeys to record speech and transcribe directly to your cursor.

Questions or ideas? Discord

✨ Features

  • Global Hotkey: Start recording speech from any app
  • Auto-Paste: Transcribe directly to cursor
  • Auto-Send: Optionally auto-send with ENTER keypress
  • Local/Offline: Voice data never leaves your computer
  • CPU Ready: Small, efficient models available
  • GPU Ready: Support for both NVIDIA & AMD cards
  • Cross-platform: Works on Windows and macOS
  • Voice Commands: Trigger shortcuts, text snippets, and shell commands by voice — docs
  • Configurable: Customize hotkeys, models, and much more

🚀 Quick Start

From PyPI (Recommended)

Requires Python 3.11-3.13

# With pipx (isolated environment)
pipx install whisper-key-local

# Or with pip
pip install whisper-key-local

Then run: whisper-key (or wk for short)

Windows App

  1. Download whisper-key.exe from the latest release
  2. Run whisper-key.exe

From Source

git clone https://github.com/PinW/whisper-key-local.git
cd whisper-key-local
pip install -e .
python whisper-key.py

🎤 Basic Usage

Hotkey Windows macOS
Start recording Ctrl+Win Fn+Ctrl
Stop & transcribe Ctrl Fn
Stop & auto-send Alt Option
Cancel recording Esc Shift
Voice command mode Alt+Win Fn+Command

Open the system tray / menu bar icon to:

  • Toggle auto-paste vs clipboard-only
  • Change transcription model
  • Select audio device

🗣️ Voice Commands

Speak trigger phrases to run shell commands and more. Define in:

  • Windows: %APPDATA%\whisperkey\commands.yaml
  • macOS: ~/.whisperkey/commands.yaml
commands:
  # Send a keyboard shortcut
  - trigger: "undo"
    hotkey: "ctrl+z"
  # Deliver pre-written text
  - trigger: "my email"
    type: "[email protected]"
  # Run a shell command
  - trigger: "open notepad"
    run: 'notepad.exe'

See the Voice Commands Guide for full details.

⚡ GPU Acceleration

Whisper Key detects your GPU on first launch and offers one-press install of the required runtime libraries. Supports NVIDIA (CUDA) and AMD (ROCm).

For manual setup or troubleshooting, see the GPU Setup Guide.

⚙️ Configuration

Local settings at:

  • Windows: %APPDATA%\whisperkey\user_settings.yaml
  • macOS: ~/.whisperkey/user_settings.yaml

Delete this file and restart app to reset to defaults.

Option Default Notes
Whisper
whisper.model tiny Any model defined in whisper.models
whisper.device cpu cpu or cuda (NVIDIA/AMD GPU) — setup guide
whisper.compute_type int8 int8/float16/float32
whisper.language auto auto or language code (en, es, fr, etc.)
whisper.beam_size 5 Higher = more accurate but slower (1-10)
whisper.initial_prompt "" Guide transcription style, language variant, or script
whisper.hotwords [] Words the model should favor (names, technical terms)
whisper.models (see config) Add custom HuggingFace or local models
Post-Processing
post_processing.strip_trailing_period false Strip trailing period from output
post_processing.corrections {} Fix recurring misheard words, e.g. CAPEX: [cap x]
Hotkeys
hotkey.recording_hotkey ctrl+win / fn+ctrl Windows / macOS
hotkey.stop_key ctrl / fn Stop recording
hotkey.auto_send_key alt / option Stop + paste + Enter
hotkey.cancel_combination esc / shift Cancel recording
hotkey.recording_mode toggle toggle or push_to_talk
hotkey.command_hotkey alt+win / fn+command Voice command mode
Voice Activity Detection
vad.vad_precheck_enabled true Prevent hallucinations on silence
vad.vad_onset_threshold 0.7 Speech detection start (0.0-1.0)
vad.vad_offset_threshold 0.55 Speech detection end (0.0-1.0)
vad.vad_min_speech_duration 0.1 Min speech segment (seconds)
vad.vad_realtime_enabled true Auto-stop on silence
vad.vad_silence_timeout_seconds 30.0 Seconds before auto-stop
Audio
audio.host null Audio API (WASAPI, Core Audio, etc.)
audio.channels 1 1 = mono, 2 = stereo
audio.dtype float32 float32/int16/int24/int32
audio.max_duration 900 Max recording seconds (0 = unlimited)
audio.input_device default Device ID or "default"
Clipboard
clipboard.auto_paste true false = clipboard only
clipboard.delivery_method paste paste (Ctrl+V) or type (direct injection)
clipboard.paste_hotkey ctrl+v / cmd+v Paste key simulation
clipboard.paste_pre_paste_delay 0.05 Delay after copy, before paste hotkey (seconds)
clipboard.paste_preserve_clipboard true Restore clipboard after paste
clipboard.paste_clipboard_restore_delay 0.5 Delay before clipboard restore (seconds)
clipboard.type_also_copy_to_clipboard false Also copy to clipboard in type mode
clipboard.type_auto_enter_delay 0.15 Delay before ENTER after typing (seconds)
clipboard.type_auto_enter_delay_per_100_chars 0.1 Extra ENTER delay per 100 typed chars (seconds)
Logging
logging.level INFO DEBUG/INFO/WARNING/ERROR/CRITICAL
logging.file.enabled true Write to app.log
logging.log_transcriptions false Include transcribed text in log (privacy)
logging.console.enabled true Print to console
logging.console.level WARNING Console verbosity
Audio Feedback
audio_feedback.enabled true Play sounds on record/stop
audio_feedback.transcription_complete_enabled false Play sound on transcription complete
audio_feedback.ready_enabled true Play sound when app finishes loading
audio_feedback.start_sound assets/sounds/... Custom sound file path
audio_feedback.stop_sound assets/sounds/... Custom sound file path
audio_feedback.cancel_sound assets/sounds/... Custom sound file path
audio_feedback.transcription_complete_sound assets/sounds/... Custom sound file path
audio_feedback.ready_sound assets/sounds/... Custom sound file path
System Tray
system_tray.enabled true Show tray icon
system_tray.tooltip Whisper Key Hover text
Terminal Title
terminal_title.idle "" Tab title prefix when idle: static string or [prefix, seconds] animation frames
terminal_title.recording 🔴 blink Tab title prefix while recording
terminal_title.processing "" Tab title prefix while transcribing
Console
console.start_hidden false Hide console after startup (whisper-key-hideable.exe only)
Update
update.mode prompt prompt or auto
Voice Commands
voice_commands.enabled true Enable voice command mode

📁 Model Cache

Default path for transcription models (via HuggingFace):

  • Windows: %USERPROFILE%\.cache\huggingface\hub\
  • macOS: ~/.cache/huggingface/hub/

Contributing

Check the roadmap for planned features and see CONTRIBUTING.md for guidelines. Please open an issue before starting work on new features.

📦 Dependencies

Cross-platform: faster-whisper · numpy · sounddevice · soxr · pyperclip · ruamel.yaml · pystray · Pillow · playsound3 · ten-vad · hf-xet

Windows: global-hotkeys · pywin32

macOS: pyobjc-framework-Quartz · pyobjc-framework-ApplicationServices

About

Global hotkeys to record speech and transcribe directly to your cursor

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors