Offline voice dictation for Windows — hold a hotkey, speak, release to type into the focused window.
ASR runtime uses sherpa-onnx (SenseVoice / Paraformer).
| Path | Role |
|---|---|
runtime/ |
Python ASR server (voxtype-runtime.exe, WebSocket voxtype-voice-v1) |
app/ |
Tauri client — settings, overlay, hotkeys, direct typing, Quicker HTTP API |
plugin/ |
Quicker plugin — button triggers dictation |
catalog/models.json |
Public model catalog for settings UI |
# 1. ASR runtime
cd runtime
uv sync
uv run download-asr-model
uv run voxtype-runtime
# 2. Tauri client (new terminal)
cd app
pnpm install
pnpm tauri dev- Runtime health:
http://127.0.0.1:6016/health - Client API:
http://127.0.0.1:6020/health - Default hotkey: F9 (hold to dictate, release to type)
- GPU acceleration is on by default (CUDA on Windows/Linux, CoreML on macOS); toggle in settings or disable with CPU-only inference
Data directory: %LOCALAPPDATA%\VoxType\ (models, settings).
Model weights are not bundled (too large). Model download URLs are written in catalog/models.json and ship inside the installer at catalog/models.json — the app reads this file locally (no remote catalog fetch). Users download weights from the settings UI on first use.
Push tag v0.1.x → GitHub Actions builds a single NSIS installer (VoxType_<version>_x64-setup.exe) that includes:
- Tauri desktop app (settings, overlay, hotkeys, typing)
voxtype-runtime(PyInstaller onedir, next to the app underruntime/voxtype-runtime/)catalog/models.json
git tag v0.1.3
git push origin v0.1.3Local full build:
cd runtime && pwsh -NoProfile -File ./scripts/build-win.ps1
cd ../app && pnpm install && pnpm tauri buildInstall the plugin/ package into your action {packagePath}:
load {packagePath}/VoxType.Plugin.{version}.dll
type VoxType.Plugin.Launcher, VoxType.Plugin
Control via quicker_in_param (same pattern as QuickerRpc):
| Param | Action |
|---|---|
start |
Begin recording |
stop |
End, type into focused window → Quicker var voxtype_text |
toggle |
Toggle recording |
download |
Download NSIS to Downloads → var voxtype_installer_path (user runs setup) |
C#: Launcher.Start(context), StartFromQuickerInParam("start", context), StartDictation / StopDictation.
HTTP (no plugin): POST http://127.0.0.1:6020/dictate/start|stop|toggle
Full design (HTTP vs pipe): quicker-rpc docs/voxtype-quicker-integration.md when monorepo sibling exists, or see scripts/setup-quicker-action.ps1.
Hold-to-dictate interaction was influenced by CapsWriter-Offline.
MIT