🎙️ Phoneme

Local-first voice transcription for power users.

Hit a hotkey. Speak. Get text anywhere.

Phoneme runs 100% offline by default. No cloud required, no subscriptions, no telemetry.

🧠 Philosophy

Principle	What It Means
🔒 Privacy First	Voice never leaves your machine. No forced updates, no tracking.
⚡ Flexible	Local Whisper + Ollama for privacy, or cloud APIs (OpenAI, Anthropic, Groq, Gemini, Deepgram, and more) for speed. Each step picks its own provider.
🔌 Extensible	JSON output → your scripts. Obsidian, Notion, Jira, Discord, Python—wherever you want.

🎯 Why Voice?

You think faster than you type. The average person speaks at 150 words per minute but types at only 40. That gap is where ideas die.

Capture thoughts before they evaporate. Voice lets you seize ideas in their natural habitat—while walking, showering, driving, cooking. No app to open, no cursor to find. Just hit a hotkey and think out loud.

Speak to AI like a human. When you dictate a prompt, you give cleaner context—natural pauses, emphasis, clarifications that you'd never type out. The models understand you better when you sound like yourself.

Accessibility is for everyone. RSI, carpal tunnel, vision strain, dyslexia, tremors—typing isn't universal. Voice removes barriers. But even without disabilities, your wrists will thank you after your 10,000th daily keystroke.

No punctuation, no spelling, no backspace. Just pure thought flow. The AI cleans it up. You focus on what to say, not how to format it.

Multitasking is real. Record a meeting while taking notes. Capture a shower thought while soaping. Dictate a bug fix while compiling. Voice doesn't steal your eyes or hands from the task at hand.

Mobile-first life. Your phone is always there. Typing on glass at 20 WPM isn't. Voice makes your pocket computer actually useful for more than consumption.

⚙️ How It Works

Phoneme uses a decoupled, pipeline-driven architecture.

%%{init: {'flowchart': {'curve': 'basis', 'useMaxWidth': false}, 'theme': 'dark', 'themeVariables': { 'fontSize': '12px' }}}%%
flowchart TD
    Input[🎤 Voice] -->|Hotkey| Daemon[Daemon]
    
    subgraph T [Transcribe]
        Daemon --> Whisper{Whisper}
        Whisper -->|Local/Cloud| Raw[Raw Text]
    end
    
    subgraph E [Enrich]
        Raw --> Diarize{Diarize}
        Diarize -->|Opt| Tagged[Tagged]
    end
    
    subgraph P [Process]
        Tagged --> LLM{LLM}
        LLM -->|Opt| Final[Final]
    end
    
    Final --> Catalog[(SQLite)]
    Final --> Hooks[[Hooks]]
    Hooks --> Dest[Obsidian/Webhooks/Type]

✨ Core Features

🎙️ Local transcription by default: A bundled whisper.cpp server runs on your machine — audio never leaves your PC. The First Run Wizard detects your RAM/VRAM and picks the right model.
🔌 Bring-your-own provider: Transcription, live preview, cleanup, summary, auto-title, and auto-tags each pick their own provider+model independently. Local whisper.cpp/Ollama for privacy, or cloud APIs (OpenAI, Anthropic, Groq, Gemini, Deepgram, AssemblyAI, ElevenLabs, and many more) for speed. One-click presets, live model lists.
👥 Meeting Mode (Dual-Track Capture): Capture both your microphone and your computer's audio as two linked tracks sharing a wall-clock timeline, merged into one chronological transcript. Optional speaker diarization (offline ONNX, or cloud) labels who spoke on any Zoom, Teams, or Meet call.
⌨️ Transcribe-in-Place (Ctrl+Alt+I): Speak with a global hotkey and Phoneme types (or pastes) your dictated words into the focused application (Word, Slack, Chrome, VS Code). A zero-latency fast lane skips the queue entirely so text lands the moment you stop talking.
✨ Smart Cleanup, AI Summary & Auto-Titles: Pipe raw transcripts through an LLM to fix stutters, reformat, or translate — and optionally generate a per-recording summary, on demand or automatically. Every recording gets a readable auto-title (a free heuristic, or an optional LLM title). Three transcript layers (raw → cleaned → edited) are kept so nothing is lost.
🔍 Keyword + Semantic Search: Manage thousands of recordings with SQLite FTS5 full-text search, or search by meaning with an offline, chunked hybrid index — per-passage ONNX embeddings fused with keyword ranking (RRF), cached in memory for fast recall, so a query finds the recording whether you remember the gist or the one distinctive word. More-like-this finds a recording's neighbours from its stored vectors (no re-embedding). Bring your own embedding model.
🏷️ Organize at scale: Tags with a full manager, ⭐ favorites, saved searches that snapshot every filter, AI auto-tag suggestions you approve before they apply, and a side-by-side view for any two transcripts.
📤 Import & export anything: Import .wav / .mp3 / .m4a / .flac straight into the pipeline; export the whole library as a portable zip, or any recording as SRT / WebVTT captions.
⌨️ Keyboard everything: Opt-in vim-style navigation drives all three panes (and the queue), g-chords jump anywhere, the list zooms with Ctrl+=/-, and ? shows the full cheat sheet.
🩺 Provider-aware Self-healing: A header health pill + Doctor watch the local servers and follow the effective connection each feature uses (cloud keys included); one click (or phoneme doctor --fix) sweeps a hung/orphaned whisper-server and respawns it from config.
♻️ Clean lifecycle: The daemon owns the work and outlives any window. Quit stops it cleanly (finalizing an in-flight take, killing its whisper/Ollama children) — or leave it running headless. A Phoneme-launched Ollama is started on demand and never touches an Ollama you already had running.
💻 CLI is a Peer: Every GUI action is a CLI command (phoneme record --start). Bind it to AutoHotkey, Stream Deck, or Kanata.

🆚 Alternatives & Similar Projects

Phoneme isn't for everyone, and that's fine. If one of these fits your needs better, use it:

Wispr Flow — Highly polished, commercial, cloud-based. Types directly into your focused app.
MacWhisper & Superwhisper — Excellent local dictation for macOS.
AudioPen — Cloud web app that beautifully summarizes rambling thoughts.

Reach for Phoneme when you want it local-first, open-source, Windows-native, and endlessly scriptable.

📚 Documentation

Full documentation index →

Users

Guide	Topic
Getting Started	Install, wizard, first recording
Providers & Models	Pick STT/LLM providers, keys, local vs cloud
Meeting Mode	Dual-track capture + wall-clock sync
Hotkeys & Recording Modes	Hold, toggle, CLI bindings
Settings Overview	Every settings screen (with screenshots)
Smart Cleanup & Summary	LLM post-processing + AI summary
Semantic Search	Meaning-based recall
FAQ	Common questions
Troubleshooting	Fixes and diagnostics

Developers

Guide	Topic
CONTRIBUTING.md	Dev setup, IPC workflow, PR checklist
Architecture	The full journey: three processes, a recording's life, recall path
Internals	Subsystem deep dives: async topology, audio, catalog/search, alignment math
Backend Guide	Rust workspace map, actors, supervision, SQLx/WAL
Config Reference	Full `config.toml` schema
IPC Integration	NDJSON named pipe
CLI Reference	All commands
Testing & CI	Local checks matching GitHub Actions
Roadmap	Planned features & direction
Changelog	Shipped releases

🚀 Quick Start

Download the latest MSI from the Releases page. The included First Run Wizard will detect your hardware and configure the optimal Whisper model automatically!

# Power users can bypass the UI entirely and use the CLI:
phoneme record --start
phoneme record --stop
phoneme list

📄 License

MIT OR Apache-2.0.

Phoneme is built by @namefailed. It is not a commercial product, has no telemetry, and never will.

💖 Support

If you find Phoneme useful, please consider supporting my work:

Name		Name	Last commit message	Last commit date
Latest commit History 957 Commits
.github		.github
archive_internal		archive_internal
bin		bin
crates		crates
docs		docs
frontend		frontend
hooks		hooks
installer		installer
scripts		scripts
src-tauri		src-tauri
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
ROADMAP.md		ROADMAP.md
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎙️ Phoneme

🧠 Philosophy

🎯 Why Voice?

⚙️ How It Works

✨ Core Features

🆚 Alternatives & Similar Projects

📚 Documentation

Users

Developers

🚀 Quick Start

📄 License

💖 Support

About

Licenses found

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🎙️ Phoneme

🧠 Philosophy

🎯 Why Voice?

⚙️ How It Works

✨ Core Features

🆚 Alternatives & Similar Projects

📚 Documentation

Users

Developers

🚀 Quick Start

📄 License

💖 Support

About

Topics

Resources

License

Licenses found

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages