ServiceNow · weiz9 · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/.env.example b/.env.example
@@ -117,7 +117,7 @@ EVA_MODEL__TTS_PARAMS='{"api_key": "your_cartesia_api_key", "model": "sonic"}'
 # --- Framework (S2S / AudioLLM) ---
 #i Base framework for S2S or AudioLLM pipelines.
 #d enum
-#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice
+#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice,deepgram
 #v EVA_FRAMEWORK=openai_realtime
 
 # ==============================================

diff --git a/docs/assistant_server_contract.md b/docs/assistant_server_contract.md
@@ -535,3 +535,59 @@ the run to fail or produce `None` latency fields in the result.
 | `audio_assistant.wav` | Yes | TTS quality metrics |
 | `framework_logs.jsonl` | Yes | Turn boundary metrics |
 | `pipecat_metrics.jsonl` | Yes | `model_response_latency` in `ConversationResult` |
+
+---
+
+## 13. Reference implementation: Deepgram Voice Agent
+
+`src/eva/assistant/deepgram_server.py` (`framework: deepgram`) bridges to Deepgram's
+**Voice Agent API** (unified STT→LLM→TTS over one WebSocket) via the `deepgram-sdk`
+`client.agent.v1.connect()` interface. It is the closest analogue to the Gemini Live
+server and a good template for a new S2S framework.
+
+Notable points specific to Deepgram:
+
+- **Config.** `framework: deepgram`, `model: {s2s: deepgram, s2s_params: {...}}`. Recognised
+  `s2s_params`: `api_key` and `model` (both **required**; `model` is the exact Deepgram LLM id,
+  e.g. `gpt-4o-mini` or `claude-haiku-4-5`), `think_provider` (default `open_ai`; use `anthropic`
+  for Claude models, `aws_bedrock`/`google`/`groq` for the others), `think_label` (optional short
+  metrics/run_id label — Deepgram still receives `model`), `listen_model` (STT, default `nova-3`),
+  `speak_model` (TTS, default `aura-2-thalia-en`). The conversation language comes from the run's
+  `language` (base server), not `s2s_params`.
+- **Managed vs. BYO think.** By default Deepgram runs the LLM with its *managed* provider
+  (Deepgram's own credentials/quota). To drive think with **your own** credentials — for an
+  apples-to-apples comparison with the cascade pipeline (same Bedrock model) and to bypass
+  managed prompt-size limits — add any of these `s2s_params`:
+  - `think_credentials` (dict) → `think.provider.credentials`. Only `aws_bedrock` takes
+    credentials (`{type: iam|sts, region, access_key_id, secret_access_key, session_token?}`).
+    If omitted for `aws_bedrock`, the server builds IAM/STS creds from `AWS_ACCESS_KEY_ID` /
+    `AWS_SECRET_ACCESS_KEY` / `AWS_SESSION_TOKEN` env vars (region from `think_region`,
+    else `AWS_REGION`/`AWS_DEFAULT_REGION`, else `us-east-1`).
+  - `think_endpoint` (dict `{url, headers}`) → `think.endpoint`. This is how `anthropic`/
+    `open_ai`/`google` BYO works (they have no `credentials` field): point at your own URL and
+    pass your key in `headers`.
+  - `think_params` (dict) → merged into `think.provider` (e.g. `temperature`, `version`,
+    `reasoning_mode`).
+  - `context_length` (`"max"` | int) → `think.context_length`, for long prompts.
+  - `think_region` (str) → region for the `aws_bedrock` env-credential builder.
+- **Fail-loud.** A think outage (provider error, or no model reply after the caller speaks —
+  greeting only) is logged at ERROR with guidance, instead of silently producing a clean-looking
+  1-turn conversation. Watch for `produced NO think response` / `conversation FAILED`.
+- **Evaluation.** Although configured via `s2s`, Deepgram is scored as a **cascade** pipeline
+  (`get_pipeline_type` → `CASCADE`), since it runs STT→LLM→TTS internally — so STT/TTS metrics
+  (`stt_wer`, `transcription_accuracy_key_entities`, `speakability`) run. `pipeline_parts` exposes
+  `{stt, llm, tts}` so the run_id/folder shows the three component models.
+- **Settings.** Sent once on connect via `send_settings(AgentV1Settings)`. Built from a plain
+  dict and validated with `AgentV1Settings.model_validate(...)`, which resolves the
+  discriminated provider unions. Audio is `linear16` @ 24 kHz both directions with output
+  `container: "none"` (raw PCM); `agent.greeting` carries `INITIAL_MESSAGE`.
+- **Tools.** Configured under `agent.think.functions` (no `endpoint` ⇒ *client-side*), so the
+  agent emits `FunctionCallRequest` events; reply with `send_function_call_response`.
+- **Events.** The session yields raw `bytes` (TTS audio) or JSON events: `Welcome`,
+  `SettingsApplied`, `ConversationText`, `History`, `UserStartedSpeaking`, `AgentThinking`,
+  `AgentStartedSpeaking`, `AgentAudioDone`, `FunctionCallRequest`, `Error`, `Warning`. Acted-on
+  events are `ConversationText` (transcript), `AgentAudioDone` (turn end), `UserStartedSpeaking`
+  (barge-in), `FunctionCallRequest` (tool call), `Error`/`FatalError` (fail-loud); the rest are
+  informational.
+- **Limitation.** The Voice Agent event stream exposes no token-usage event, so token usage is
+  not reported for this framework. Latency is still emitted on the first audio chunk per turn.
diff --git a/src/eva/__init__.py b/src/eva/__init__.py
@@ -7,7 +7,7 @@
 
 # Bump simulation_version when changes affect benchmark outputs (agent code,
 # user simulator, orchestrator, simulation prompts, agent configs, tool mocks).
-simulation_version = "2.0.1"
+simulation_version = "2.0.3"
 
 # Bump metrics_version when changes affect metric computation (metrics code,
 # judge prompts, pricing tables, postprocessor).