Skip to content

Flemma-Dev/voxize

Repository files navigation

Voxize

Voice-to-text overlay for Linux — built for devs who talk to AI.

Voxize — clean text ready to paste

I built Voxize because I spend most of my day talking to AI — writing prompts for Claude Code, describing bugs, explaining architecture decisions. Typing all of that was the bottleneck. With Voxize, I press a hotkey, speak naturally, and get clean text in my clipboard 2x faster than I could type it.

What you're reading right now was dictated, not typed.

How it works

Press your global hotkey and a translucent overlay appears on top of whatever you're working on. Start speaking — your words stream in live:

Live transcription streaming as you speak

When you're done, hit Stop. Voxize runs the full audio through OpenAI's batch transcription for maximum accuracy, then cleans up filler words and punctuation with a lightweight AI pass. The result goes straight to your clipboard — paste and move on.

Three phases, one hotkey:

  1. Live preview — real-time streaming gives you visual feedback as you speak (via gpt-4o-mini-transcribe)
  2. Batch transcription — full audio processed in one pass, dramatically more accurate than real-time segmentation (via gpt-4o-transcribe)
  3. AI cleanup — fixes filler words, punctuation, and formatting (via gpt-5.4-nano)

The live preview is throwaway — just visual feedback. The real transcription happens after you stop, and it's dramatically more accurate.

Why it nails technical terms

OpenAI's transcription models are surprisingly well-aligned with programming vocabulary. Terms like subprocess, WebSocket, asyncio, and JWT come through accurately without any hints.

For project-specific jargon, drop a WHISPER.txt file in your working directory:

Glossary: worktree, subagent, GLib.idle_add, libadwaita, PyGObject

Voxize detects the focused window's working directory (via the Window Calls GNOME Shell extension) and loads the file as vocabulary guidance. Domain-specific terms that would normally get mangled come through clean. The content is passed directly to the cleanup model, so it can be any free-form instructions — not just a glossary.

Integrations

Voxize plugs into your desktop environment to work seamlessly. All of these are optional — Voxize degrades gracefully without them — but the experience is much better with them.

  • Window Calls (GNOME Shell extension) — this is how Voxize knows where you're working. It detects the focused window's PID, resolves its working directory (even through Ghostty → tmux → nvim chains), and loads your WHISPER.txt. Also used for always-on-top in the meeting recorder. Without it, vocabulary guidance is silently skipped.
  • PipeWire — the audio backbone. Dictation captures via PortAudio/sounddevice, meeting recording via pw-cat --record (two streams: mic + system audio). Volume ducking uses pw-dump and wpctl to silence your browser while recording. Ships with modern GNOME.
  • FFmpeg — the meeting recorder uses ffmpeg to compress WAV → Opus after recording and to downmix stereo to mono before transcription. ffprobe reads recording duration. Not needed for dictation.
  • ElevenLabs Scribe — powers the meeting recorder's post-recording transcription with speaker diarization. Not needed for dictation or for recording meetings — only for transcribing them.
  • GNOME Keyring — API keys are stored securely via secret-tool, not environment variables or config files.

Getting started

Note

Voxize is Linux/Wayland/GNOME only. Pin a commit if you need a stable target.

NixOS package

An example NixOS package is available at StanAngeloff/nix-meridian. Add it to your system configuration and bind voxize to a global hotkey — no dev shell needed.

Development workflow

  1. Clone and enter the dev shell (all system deps handled by Nix):

    git clone https://github.com/Flemma-Dev/voxize.git
    cd voxize
    nix develop
  2. Store your OpenAI API key in the GNOME Keyring:

    secret-tool store --label='OpenAI API Key' service openai key api
  3. Run:

    uv run python -m voxize

To bind Voxize to a global hotkey (GNOME Settings → Keyboard → Custom Shortcuts):

nix develop /path/to/voxize --command bash -c "cd /path/to/voxize && uv run python -m voxize"

Tip

Use nix-direnv to cache the dev shell — avoids the cold-start cost on every hotkey press.

Meeting recorder

Voxize also includes a meeting recorder that captures both your microphone and system audio into a stereo Opus file — left channel mic, right channel system.

uv run python -m voxize.meeting

Meeting session list

After recording, a built-in workbench lets you transcribe with speaker diarization, rename speakers, and generate meeting titles — all without leaving the app. You can supply key terms before transcribing to improve accuracy on domain-specific vocabulary.

Why ElevenLabs for meetings, not OpenAI? Dictation prompts are short, technical, and latency-sensitive — OpenAI's gpt-4o-transcribe excels there, especially with WHISPER.txt vocabulary guidance. Meetings are longer, more conversational, and need speaker diarization — ElevenLabs' Scribe v2 ranks #1 across 49 models with a 2.3% word error rate, nearly twice as accurate as OpenAI's batch model. Voxize uses the best tool for each job.

Recording needs no API key. To enable transcription, store your ElevenLabs key in the GNOME Keyring:

secret-tool store --label='ElevenLabs API Key' service elevenlabs key api

Configuration

Voxize reads $XDG_CONFIG_HOME/voxize/voxize.toml on startup. A commented template with all defaults is created on first run — uncomment any line to override.

Key settings:

  • Volume ducking — automatically quiets Chrome, Firefox, and Brave while recording
  • Auto-close — overlay closes after 30s of inactivity in the ready state. Override with VOXIZE_AUTOCLOSE=0 to disable
  • Session retention — 500 sessions / 14 days by default, configurable per-app (dictation and meetings prune independently)

Session data lives in ~/.local/state/voxize/ — audio, transcripts, costs, and debug logs for each session.

License

AGPL-3.0

About

Voice-to-text tool for Linux (Wayland/GNOME)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages