QuickerHub/voice-asr-runtime

quicker-voice-runtime — local ASR server for QuickerAgent

Fork-friendly Python runtime implementing the quicker-voice-v1 protocol (docs/voice-input-plugin.md).

Inspired by CapsWriter-Offline (client/server architecture with offline ASR), but uses a simpler JSON + binary PCM WebSocket contract for QuickerAgent Composer integration.

Quick start

cd voice-asr-runtime
uv sync
uv run download-asr-model   # first time: ~160 MB SenseVoice int8 (ITN/punctuation)
uv run quicker-voice-runtime
  • HTTP health: http://127.0.0.1:6016/health
  • WebSocket: ws://127.0.0.1:6016 (subprotocol quicker-voice-v1)
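
To confirm the server is up before connecting a client, a small probe can poll the health endpoint. The sketch below is illustrative only: the helper names `endpoints` and `is_healthy` are hypothetical, and it assumes that an HTTP 200 from /health means "ready" (the response body format is not described here, so it is not inspected).

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Subprotocol name taken from the Quick start section; a WebSocket client
# would pass this during the handshake.
SUBPROTOCOL = "quicker-voice-v1"


def endpoints(host: str = "127.0.0.1", port: int = 6016) -> dict[str, str]:
    """Build the health and WebSocket URLs for a given host and port."""
    return {
        "health": f"http://{host}:{port}/health",
        "websocket": f"ws://{host}:{port}",
    }


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on the health URL answers with HTTP 200.

    Assumption: status 200 is treated as healthy; connection errors,
    timeouts, and non-2xx statuses all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `is_healthy(endpoints()["health"])` after `uv run quicker-voice-runtime` should tell you whether the default port is answering.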

From agent-gui:

pnpm voice:dev-server

Then, in QuickerAgent settings, disable mock and hold the Composer microphone button to dictate.

Backends

| Backend | When | Output |
| --- | --- | --- |
| stub | No model files | [stub] 收到约 Xs 音频 ("received about X s of audio"); for protocol/UI testing |
| sherpa-onnx | models/sensevoice/ + uv sync | Real offline ASR (SenseVoice, ITN/punctuation) |
| sherpa-onnx (fallback) | models/paraformer-zh/ | Paraformer zh-small, no auto punctuation |

See models/README.md for model layout.
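
The selection order in the table can be pictured roughly as follows. This is a sketch under stated assumptions, not the runtime's actual logic: the function name `pick_backend` is hypothetical, a model directory counts as "present" when it exists and is non-empty, and "uv sync" is approximated by checking whether the `sherpa_onnx` module is importable.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def _has_files(directory: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def pick_backend(models_root: Path, sherpa_available: bool | None = None) -> str:
    """Return 'sensevoice', 'paraformer', or 'stub' (hypothetical labels)."""
    if sherpa_available is None:
        # Assumption: an importable sherpa_onnx package stands in for "uv sync".
        sherpa_available = importlib.util.find_spec("sherpa_onnx") is not None
    if sherpa_available and _has_files(models_root / "sensevoice"):
        return "sensevoice"
    if sherpa_available and _has_files(models_root / "paraformer-zh"):
        return "paraformer"
    # No usable model files: fall back to the stub backend for protocol/UI testing.
    return "stub"
```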

Environment

| Variable | Default | Description |
| --- | --- | --- |
| QUICKER_VOICE_HOST | 127.0.0.1 | Bind address |
| QUICKER_VOICE_PORT | 6016 | HTTP + WS port |
| QUICKER_VOICE_MODEL_DIR | auto | Sherpa model directory |
| QUICKER_VOICE_MODEL_TYPE | auto | sensevoice / paraformer / whisper |
| QUICKER_VOICE_PROVIDER | cpu | ONNX provider: cpu / directml (Windows GPU) / cuda / coreml |
| QUICKER_VOICE_NUM_THREADS | 4 | CPU thread count when provider is cpu |
| QUICKER_VOICE_LOG_LEVEL | INFO | Logging level |
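
As an illustration of how these variables and defaults fit together, the following sketch reads them into a typed config object. The `VoiceConfig` class and `load_config` function are hypothetical names, and rejecting unknown providers is an assumption; the runtime's own parsing may differ.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Provider values listed in the table above.
KNOWN_PROVIDERS = {"cpu", "directml", "cuda", "coreml"}


@dataclass(frozen=True)
class VoiceConfig:
    host: str = "127.0.0.1"
    port: int = 6016
    model_dir: str = "auto"
    model_type: str = "auto"
    provider: str = "cpu"
    num_threads: int = 4
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] | None = None) -> VoiceConfig:
    """Read QUICKER_VOICE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    d = VoiceConfig()
    provider = env.get("QUICKER_VOICE_PROVIDER", d.provider).lower()
    if provider not in KNOWN_PROVIDERS:
        # Assumption: unknown providers are treated as a configuration error.
        raise ValueError(f"unknown provider: {provider!r}")
    return VoiceConfig(
        host=env.get("QUICKER_VOICE_HOST", d.host),
        port=int(env.get("QUICKER_VOICE_PORT", d.port)),
        model_dir=env.get("QUICKER_VOICE_MODEL_DIR", d.model_dir),
        model_type=env.get("QUICKER_VOICE_MODEL_TYPE", d.model_type),
        provider=provider,
        num_threads=int(env.get("QUICKER_VOICE_NUM_THREADS", d.num_threads)),
        log_level=env.get("QUICKER_VOICE_LOG_LEVEL", d.log_level),
    )
```

Passing an explicit mapping, for example `load_config({"QUICKER_VOICE_PORT": "7000"})`, keeps the function easy to test without touching the real environment.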

Release automation

| Trigger | What runs |
| --- | --- |
| Git tag v*.*.* push | .github/workflows/release.yml: runtime zip + channel manifest → GitHub Release + Bitiful |
| Git tag model-sensevoice push | Same workflow: model zip → GitHub Release + Bitiful |
| Local one-shot | publish/Publish-VoiceAsrRelease.ps1, with optional -UploadBitiful for manual mirror sync |

# Runtime only (typical)
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful -UpdateChannelJson

# First-time or model update
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -PublishModel -UploadBitiful -UpdateChannelJson

Release layout

| Asset | GitHub tag | Filename |
| --- | --- | --- |
| Runtime | v0.1.1 | voice-asr-runtime-0.1.1-win-x64.zip |
| Model | model-sensevoice (fixed) | voice-asr-model-sensevoice.zip |

# CI: push runtime tag
git tag v0.1.1 && git push origin v0.1.1

# CI: push model (only when model files change)
git tag -f model-sensevoice && git push origin model-sensevoice --force

# Local full pipeline (monorepo root)
pwsh ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful -UpdateChannelJson

# voice-asr-runtime repo only
pwsh -NoProfile -File ./publish/Publish-VoiceAsrRelease.ps1 -SkipBuild -UploadBitiful

# Bitiful only (after GitHub release exists)
pwsh -NoProfile -File ./publish/Upload-VoiceAsrToBitiful.ps1 -Version 0.1.0 -UseLocalVoiceRoot

Bitiful upload runs automatically in GitHub Actions on every release tag. Configure repo Secrets: BITIFUL_ACCESS_KEY, BITIFUL_SECRET_KEY, BITIFUL_BUCKET_NAME; optional Variables: BITIFUL_ENDPOINT_URL, BITIFUL_VOICE_ASR_OBJECT_PREFIX. Local fallback: publish/Upload-VoiceAsrToBitiful.ps1 (see publish/.env.example).

Domestic (mainland China) mirror on Bitiful, using the same bucket layout as QuickerAgent:

| Asset | URL pattern |
| --- | --- |
| Runtime zip | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/voice-asr-runtime-<ver>-win-x64.zip |
| Model zip | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/voice-asr-model-sensevoice.zip |
| version.txt | https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr/version.txt |
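
For scripting against the mirror, the URL patterns can be expanded from a release tag. This is a minimal sketch: the function names are hypothetical, and it assumes `<ver>` is the tag with its leading "v" removed, as the release layout (tag v0.1.1, file voice-asr-runtime-0.1.1-win-x64.zip) suggests.

```python
from __future__ import annotations

# Base path copied from the URL patterns above.
MIRROR_BASE = "https://s3.bitiful.net/quicker-pkgs/quicker-rpc/voice-asr"


def runtime_zip_name(tag_or_version: str) -> str:
    """Map 'v0.1.1' or '0.1.1' to the runtime zip filename."""
    version = tag_or_version[1:] if tag_or_version.startswith("v") else tag_or_version
    return f"voice-asr-runtime-{version}-win-x64.zip"


def mirror_urls(tag_or_version: str) -> dict[str, str]:
    """Return the three mirror URLs for a given runtime version."""
    return {
        "runtime": f"{MIRROR_BASE}/{runtime_zip_name(tag_or_version)}",
        # The model zip uses a fixed filename, independent of the runtime version.
        "model": f"{MIRROR_BASE}/voice-asr-model-sensevoice.zip",
        "version": f"{MIRROR_BASE}/version.txt",
    }
```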

The Tauri one-click install (一键安装) tries the *MirrorUrl fields first (Bitiful), then falls back to the GitHub release; it verifies the matching *Sha256 values when they are set in voice-plugin-channel.json.
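
The mirror-first, checksum-verified download can be sketched as below. This is not the Tauri installer's code: `fetch_with_fallback` is a hypothetical helper, the downloader is injected as a plain callable, and moving on to the next source after a checksum mismatch is an assumption about how a failed verification is handled.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    expected_sha256: Optional[str] = None,
) -> bytes:
    """Try each URL in order and return the first payload that passes checks.

    `urls` should list the mirror first and the GitHub release second.
    When `expected_sha256` is None, no checksum verification is performed.
    """
    last_error: Optional[Exception] = None
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # network-level failure: try the next source
            last_error = exc
            continue
        if expected_sha256 is not None:
            digest = hashlib.sha256(data).hexdigest()
            if digest != expected_sha256.lower():
                # Assumption: a mismatch is treated like a failed download.
                last_error = ValueError(f"SHA-256 mismatch for {url}")
                continue
        return data
    raise RuntimeError("all download sources failed") from last_error
```

Injecting `fetch` keeps the ordering and verification logic testable without network access; a real caller could pass a function built on `urllib.request.urlopen`.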

Packaging (Windows)

pwsh -NoProfile -File ./scripts/build-win.ps1
pwsh -NoProfile -File ./scripts/package-release.ps1
# -> publish/voice-asr-runtime-<ver>-win-x64.zip
# -> publish/voice-asr-model-sensevoice.zip

User install (Tauri): Settings → Local voice input → One-click install (设置 → 本地语音输入 → 一键安装)

Development without network access: the Tauri install copies from voice-asr-runtime/dist/ and models/sensevoice/ when they are present.

Installed layout:

Documents/QuickerAgent/plugins/voice-asr/
  manifest.json
  settings.json
  runtime/quicker-voice-runtime.exe
  runtime/_internal/...
  models/sensevoice/tokens.txt
  models/sensevoice/model.int8.onnx
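
A quick way to check an installation against this layout is to list which expected files are missing. The sketch below is illustrative: `missing_files` is a hypothetical helper, the file list is copied from the layout above (the contents of runtime/_internal/ are elided there and therefore not checked), and locating Documents under the user's home directory is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Files named explicitly in the installed layout above.
EXPECTED_FILES = [
    "manifest.json",
    "settings.json",
    "runtime/quicker-voice-runtime.exe",
    "models/sensevoice/tokens.txt",
    "models/sensevoice/model.int8.onnx",
]


def default_plugin_dir() -> Path:
    """Assumed location: <home>/Documents/QuickerAgent/plugins/voice-asr."""
    return Path.home() / "Documents" / "QuickerAgent" / "plugins" / "voice-asr"


def missing_files(plugin_dir: Path) -> list[str]:
    """Return the expected relative paths that are not present as files."""
    return [rel for rel in EXPECTED_FILES if not (plugin_dir / rel).is_file()]
```

An empty list from `missing_files(default_plugin_dir())` suggests the install completed; any entries point at what to re-download.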

Fork as standalone repo

This directory is designed to be split out:

cd voice-asr-runtime
git init
git add .
git commit -m "chore: initial quicker-voice-runtime fork"

QuickerAgent consumes it via:

  • dev: pnpm voice:dev-server → uv run quicker-voice-runtime
  • release (planned): Tauri copies/spawns packaged quicker-voice-runtime.exe under Documents/QuickerAgent/plugins/voice-asr/

Protocol

Host ↔ Runtime messages are documented in docs/voice-input-plugin.md.

Client reference: agent-gui/lib/voice-input/voice-input-ws-client.ts.

License

MIT — see LICENSE.
