A FastAPI-based service for real-time speech-to-text using faster-whisper and WebRTC VAD.
# Install system requirements
sudo apt install portaudio19-dev
# Install python dependencies
python3 src/setup.py
source src/stt-venv/bin/activate

Start the service:
cd src/
python app.py

Python example:
import requests

# Upload a WAV file to the transcription endpoint as multipart form data.
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:47102/transcribe",
        files={"file": f},
    )

print(response.json()["text"])

| Method | Path | Description |
|---|---|---|
| GET | /health | Check service health and loaded model |
| POST | /transcribe | Transcribe audio, with optional segments, word timestamps, or translation |
| POST | /vad/analyze | Analyze uploaded audio for voice activity |
| GET | /vad/status | Check VAD availability |
| WebSocket | /ws/vad | Real-time voice activity detection |
| WebSocket | /ws/stt | Streaming speech-to-text |
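
The two GET endpoints in the table can be probed before sending any audio. The following is a minimal sketch using only the Python standard library. It assumes the service is listening on `localhost:47102` (the port used in the Python example) and that both endpoints return JSON bodies; the response fields are not documented here, so the script prints the raw payload. The helper names `endpoint_url` and `get_json` are illustrative and not part of the service.

```python
import json
import urllib.request

# Assumption: same host and port as the /transcribe example.
BASE_URL = "http://localhost:47102"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 5.0):
    """Send a GET request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print whatever each status endpoint returns; the schema is not assumed.
    for path in ("/health", "/vad/status"):
        print(path, get_json(path))
```

If the service is not running, `urlopen` raises `urllib.error.URLError`, which makes this a quick way to confirm the server started correctly.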