Voice-enabled applications are transforming how we interact with technology, and this module guides you through building real-time, interactive voice solutions using advanced APIs and tools. The Voice live API in Azure Speech in Foundry Tools enables low-latency, high-quality speech-to-speech interactions for voice agents. The API is designed for developers seeking scalable and efficient voice-driven experiences because it eliminates the need to manually orchestrate multiple components.
After completing this module, you'll be able to:
- Implement the Azure Speech Voice Live API to enable real-time, bidirectional communication.
- Set up and configure the agent session.
- Develop and manage event handlers to create dynamic and interactive user experiences.
`learn-pr/wwl-data-ai/develop-voice-live-agent/includes/2-voice-live-api.md`
The Voice live API provides real-time communication using WebSocket connections.
- Events are categorized into client events (sent from client to server) and server events (sent from server to client).
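To make the client/server event split concrete, here is a minimal sketch of serializing a client event for the WebSocket. The event type and field names are assumptions for illustration; the authoritative schema is in the Voice Live API reference.

```python
import json

# Hypothetical client event: the `session.update` type and the field names
# below are illustrative assumptions, not the authoritative schema.
session_update = {
    "type": "session.update",           # client -> server event type (assumed)
    "session": {
        "input_audio_format": "pcm16",  # one of the supported audio formats
    },
}

# Client events travel to the server as JSON text frames over the WebSocket.
message = json.dumps(session_update)
```

Server events arrive the same way in the opposite direction: JSON text frames that your handlers deserialize and dispatch on the `type` field.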
Key features include:
- Real-time audio processing with support for multiple formats like PCM16 and G.711.
- Advanced voice options, including OpenAI voices and Azure custom voices.
- Avatar integration using WebRTC for video and animation.
For a table of supported models and regions, visit the [Voice Live API overview].
## Connect to the Voice Live API
The Voice live API supports two authentication methods: Microsoft Entra (keyless) and API key. Microsoft Entra uses token-based authentication for a Microsoft Foundry resource. You apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
For the recommended keyless authentication with Microsoft Entra ID, you need to assign the **Cognitive Services User** role to your user account or a managed identity. You generate a token using the Azure CLI or Azure SDKs. The token must be generated with the `https://ai.azure.com/.default` scope, or the legacy `https://cognitiveservices.azure.com/.default` scope. Use the token in the `Authorization` header of the WebSocket connection request, with the format `Bearer <token>`.
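As a sketch of the keyless flow, the following uses `DefaultAzureCredential` from the azure-identity package to request a token with the recommended scope and format it for the `Authorization` header. The helper names are hypothetical; the scope values and header format come from the text above.

```python
def build_auth_header(token: str) -> dict:
    """Format an Entra token as a Bearer token for the Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def get_voice_live_token() -> str:
    # Imported locally so build_auth_header stays usable even when the
    # azure-identity package is not installed.
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    # Recommended scope; `https://cognitiveservices.azure.com/.default`
    # is the legacy alternative.
    return credential.get_token("https://ai.azure.com/.default").token


# Usage (requires a signed-in Azure credential, e.g. via `az login`):
# headers = build_auth_header(get_voice_live_token())
```

`DefaultAzureCredential` tries a chain of credential sources (environment, managed identity, Azure CLI), so the same code works locally and when deployed with a managed identity that holds the **Cognitive Services User** role.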
For key access, an API key can be provided in one of two ways.
### WebSocket endpoint
The endpoint to use varies depending on how you want to access your resources. You can access resources through a connection to the Foundry project when implementing an agent, or through a direct connection to a model.
**Project connection:** The endpoint is `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-10-01`.

**Model connection:** The endpoint is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-10-01`.
The endpoint is the same for all models. The only difference is the required `model` query parameter, or, when using the Agent service, the `agent_id` and `project_id` parameters.
> [!TIP]
> Use high-resolution video settings for enhanced visual quality in avatar interactions.