Fork-friendly Python runtime implementing the quicker-voice-v1 protocol (docs/voice-input-plugin.md).
Inspired by CapsWriter-Offline (C/S + offline ASR), but uses a simpler JSON+binary PCM WebSocket contract for QuickerAgent Composer integration.
cd voice-asr-runtime
uv sync
uv run download-asr-model # first time: ~160 MB SenseVoice int8 (ITN/punctuation)
uv run quicker-voice-runtime
- HTTP health: http://127.0.0.1:6016/health
- WebSocket: ws://127.0.0.1:6016 (subprotocol quicker-voice-v1)
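
A quick way to confirm the runtime is up is to probe the health endpoint. The sketch below uses only the Python standard library; the helper names `endpoints` and `is_healthy` are hypothetical, and it assumes only that `/health` answers with HTTP 200 when the runtime is ready (the response body is not inspected, since its shape is not described here).

```python
import urllib.request

SUBPROTOCOL = "quicker-voice-v1"  # WebSocket subprotocol name from this README


def endpoints(host: str = "127.0.0.1", port: int = 6016) -> dict:
    """Build the HTTP health and WebSocket URLs for a runtime instance."""
    return {
        "health": f"http://{host}:{port}/health",
        "ws": f"ws://{host}:{port}",
    }


def is_healthy(host: str = "127.0.0.1", port: int = 6016, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(endpoints(host, port)["health"], timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError; this also covers refused
        # connections and timeouts when the runtime is not running.
        return False
```

Calling `is_healthy()` after `uv run quicker-voice-runtime` should return `True` once the server is listening on the default port.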
From agent-gui:
pnpm voice:dev-server
Then, in QuickerAgent settings, disable mock and hold the Composer microphone.
| Backend | When | Output |
|---|---|---|
| stub | No model files | [stub] 收到约 Xs 音频 ("received about X s of audio"); for protocol/UI testing |
| sherpa-onnx | models/sensevoice/ + uv sync | Real offline ASR (SenseVoice, ITN/punctuation) |
| sherpa-onnx (fallback) | models/paraformer-zh/ | Paraformer zh-small, no auto punctuation |
See models/README.md for model layout.
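
The backend table above can be read as a simple priority order. The following is a minimal sketch of that order, not the runtime's actual selection code: `pick_backend` and `_has_files` are hypothetical names, and "model files present" is assumed to mean that the directory exists and contains at least one file.

```python
from pathlib import Path
from typing import Optional, Tuple


def _has_files(directory: Path) -> bool:
    """Assumption: a model counts as present if its directory holds any file."""
    return directory.is_dir() and any(p.is_file() for p in directory.iterdir())


def pick_backend(models_root: Path) -> Tuple[str, Optional[Path]]:
    """Return (backend name, model directory) following the table's priority."""
    sensevoice = models_root / "sensevoice"
    paraformer = models_root / "paraformer-zh"
    if _has_files(sensevoice):
        return ("sherpa-onnx", sensevoice)
    if _has_files(paraformer):
        return ("sherpa-onnx (fallback)", paraformer)
    # No model files at all: fall back to the stub used for protocol/UI testing.
    return ("stub", None)
```

For example, `pick_backend(Path("models"))` returns the stub entry on a fresh checkout before `uv run download-asr-model` has been run.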
| Variable | Default | Description |
|---|---|---|
| QUICKER_VOICE_HOST | 127.0.0.1 | Bind address |
| QUICKER_VOICE_PORT | 6016 | HTTP + WS port |
| QUICKER_VOICE_MODEL_DIR | auto | Sherpa model directory |
| QUICKER_VOICE_MODEL_TYPE | auto | sensevoice / paraformer / whisper |
| QUICKER_VOICE_PROVIDER | cpu | ONNX provider: cpu / directml (Windows GPU) / cuda / coreml |
| QUICKER_VOICE_NUM_THREADS | 4 | CPU thread count when provider is cpu |
| QUICKER_VOICE_LOG_LEVEL | INFO | Logging |
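
To make the defaults concrete, here is a hedged sketch of how these variables could be read into a typed config object. `VoiceConfig` is a hypothetical class for illustration and may not match how the runtime parses its environment; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceConfig:
    host: str
    port: int
    model_dir: str
    model_type: str
    provider: str
    num_threads: int
    log_level: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VoiceConfig":
        """Read QUICKER_VOICE_* variables, falling back to the documented defaults."""
        def get(name: str, default: str) -> str:
            return env.get(f"QUICKER_VOICE_{name}", default)

        return cls(
            host=get("HOST", "127.0.0.1"),
            port=int(get("PORT", "6016")),
            model_dir=get("MODEL_DIR", "auto"),
            model_type=get("MODEL_TYPE", "auto"),
            provider=get("PROVIDER", "cpu"),
            num_threads=int(get("NUM_THREADS", "4")),
            log_level=get("LOG_LEVEL", "INFO"),
        )
```

Passing an explicit mapping, for example `VoiceConfig.from_env({"QUICKER_VOICE_PROVIDER": "directml"})`, keeps the function easy to test without touching the real environment.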
| Trigger | What runs |
|---|---|
| Git tag v*.*.* push | .github/workflows/release.yml: runtime zip + channel manifest → GitHub Release + Bitiful |
| Git tag model-sensevoice push | Same workflow: model zip → GitHub Release + Bitiful |
| Local one-shot | publish/Publish-VoiceAsrRelease.ps1, with optional -UploadBitiful for manual mirror sync |
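
The tag rules in the trigger table can be illustrated with a small classifier. This is a sketch only: `release_kind` is a hypothetical helper, and `fnmatchcase` merely approximates the `v*.*.*` tag filter (the workflow's own pattern matching may differ in edge cases).

```python
from fnmatch import fnmatchcase
from typing import Optional


def release_kind(tag: str) -> Optional[str]:
    """Map a pushed Git tag to the kind of release it would trigger."""
    if tag == "model-sensevoice":
        return "model"    # fixed tag: model zip
    if fnmatchcase(tag, "v*.*.*"):
        return "runtime"  # versioned tag: runtime zip + channel manifest
    return None           # any other tag does not trigger a release
```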
# Runtime only (typical)
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful -UpdateChannelJson
# First-time or model update
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -PublishModel -UploadBitiful -UpdateChannelJson
Release layout
| Asset | GitHub tag | Filename |
|---|---|---|
| Runtime | v0.1.1 | voice-asr-runtime-0.1.1-win-x64.zip |
| Model | model-sensevoice (fixed) | voice-asr-model-sensevoice.zip |
# CI: push runtime tag
git tag v0.1.1 && git push origin v0.1.1
# CI: push model (only when model files change)
git tag -f model-sensevoice && git push origin model-sensevoice --force
# Local full pipeline (monorepo root)
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful -UpdateChannelJson
# voice-asr-runtime repo only
pwsh -NoProfile -File ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful
# Bitiful only (after GitHub release exists)
pwsh -NoProfile -File ./publish/Upload-VoiceAsrToBitiful.ps1 -Version 0.1.0 -UseLocalVoiceRoot
Bitiful upload runs automatically in GitHub Actions on every release tag. Configure repo Secrets: BITIFUL_ACCESS_KEY, BITIFUL_SECRET_KEY, BITIFUL_BUCKET_NAME; optional Variables: BITIFUL_ENDPOINT_URL, BITIFUL_VOICE_ASR_OBJECT_PREFIX. Local fallback: publish/Upload-VoiceAsrToBitiful.ps1 (see publish/.env.example).
Domestic mirror (Bitiful) — same bucket layout as QuickerAgent:
| Asset | URL pattern |
|---|---|
| Runtime zip | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/voice-asr-runtime-<ver>-win-x64.zip |
| Model zip | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/voice-asr-model-sensevoice.zip |
| version.txt | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/version.txt |
The Tauri one-click install (一键安装) tries *MirrorUrl first (Bitiful), then the GitHub release; it verifies *Sha256 when set in voice-plugin-channel.json.
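
The mirror-first, checksum-verified download described above can be sketched as follows. This is an illustration in Python, not the Tauri installer's code: `fetch_verified` and `http_get` are hypothetical names, and treating a checksum mismatch as a reason to try the next source is an assumption.

```python
import hashlib
import urllib.request
from typing import Callable, Optional, Sequence


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL into memory with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_verified(
    urls: Sequence[str],
    fetch: Callable[[str], bytes] = http_get,
    sha256: Optional[str] = None,
) -> bytes:
    """Try each URL in order (mirror first) and return the first good payload.

    When sha256 is given, a payload is accepted only if its digest matches.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # network errors, HTTP errors, timeouts
            errors.append(f"{url}: {exc}")
            continue
        if sha256 and hashlib.sha256(data).hexdigest() != sha256.lower():
            errors.append(f"{url}: sha256 mismatch")
            continue
        return data
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

The `fetch` parameter is injectable so the fallback order can be exercised in tests without network access.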
pwsh -NoProfile -File ./scripts/build-win.ps1
pwsh -NoProfile -File ./scripts/package-release.ps1
# -> publish/voice-asr-runtime-<ver>-win-x64.zip
# -> publish/voice-asr-model-sensevoice.zip
User install (Tauri): Settings → Local voice input → One-click install (设置 → 本地语音输入 → 一键安装).
Dev without network: Tauri install copies from voice-asr-runtime/dist/ + models/sensevoice/ when present.
Installed layout:
Documents/QuickerAgent/plugins/voice-asr/
manifest.json
settings.json
runtime/quicker-voice-runtime.exe
runtime/_internal/...
models/sensevoice/tokens.txt
models/sensevoice/model.int8.onnx
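
A small check can confirm that an install matches the layout listed above. The sketch below is illustrative: `missing_files` is a hypothetical helper, and the default root assumes that Documents resolves to a folder under the user's home directory, which may not hold on systems with a redirected Documents folder. Files under runtime/_internal/ are not checked because their names are not listed here.

```python
from pathlib import Path

# Relative paths taken from the installed layout in this README.
EXPECTED = (
    "manifest.json",
    "settings.json",
    "runtime/quicker-voice-runtime.exe",
    "models/sensevoice/tokens.txt",
    "models/sensevoice/model.int8.onnx",
)

# Assumption: Documents lives directly under the home directory.
DEFAULT_ROOT = Path.home() / "Documents" / "QuickerAgent" / "plugins" / "voice-asr"


def missing_files(plugin_root: Path = DEFAULT_ROOT) -> list:
    """Return the expected relative paths that are absent under plugin_root."""
    return [rel for rel in EXPECTED if not (plugin_root / rel).is_file()]
```

An empty list from `missing_files()` suggests the runtime and the SenseVoice model were both unpacked.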
This directory is designed to be split out:
cd voice-asr-runtime
git init
git add .
git commit -m "chore: initial quicker-voice-runtime fork"
QuickerAgent consumes it via:
- dev: pnpm voice:dev-server → uv run quicker-voice-runtime
- release (planned): Tauri copies/spawns packaged quicker-voice-runtime.exe under Documents/QuickerAgent/plugins/voice-asr/
Host ↔ Runtime messages are documented in docs/voice-input-plugin.md.
Client reference: agent-gui/lib/voice-input/voice-input-ws-client.ts.
MIT — see LICENSE.