Commit bb8875e

Merge pull request #53837 from GraemeMalcolm/main
Updated Voice live module
2 parents 9f3324b + 841dbbe commit bb8875e

13 files changed

Lines changed: 519 additions & 125 deletions

learn-pr/wwl-data-ai/develop-voice-live-agent/1-introduction.yml

Lines changed: 3 additions & 3 deletions
@@ -4,11 +4,11 @@ title: Introduction
 metadata:
   title: Introduction
   description: Introduction
-  ms.date: 10/9/2025
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
 azureSandbox: false
-durationInMinutes: 3
+durationInMinutes: 1
 content: |
-  [!include[](includes/1-introduction.md)]
+  [!include[](includes/1-introduction.md)]

learn-pr/wwl-data-ai/develop-voice-live-agent/2-voice-live-api.yml

Lines changed: 2 additions & 2 deletions
@@ -4,11 +4,11 @@ title: Explore the Azure Voice Live API
 metadata:
   title: Explore the Azure Voice Live API
   description: Explore the Azure Voice Live API
-  ms.date: 10/9/2025
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
 azureSandbox: false
 durationInMinutes: 5
 content: |
-  [!include[](includes/2-voice-live-api.md)]
+  [!include[](includes/2-voice-live-api.md)]

learn-pr/wwl-data-ai/develop-voice-live-agent/3-voice-live-sdk.yml

Lines changed: 2 additions & 2 deletions
@@ -4,11 +4,11 @@ title: Explore the AI Voice Live client library for Python
 metadata:
   title: Explore the AI Voice Live Client Library for Python
   description: Explore the AI Voice Live client library for Python
-  ms.date: 10/9/2025
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
 azureSandbox: false
 durationInMinutes: 5
 content: |
-  [!include[](includes/3-voice-live-sdk.md)]
+  [!include[](includes/3-voice-live-sdk.md)]
Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.develop-voice-live-agent.voice-live-agent
+title: Create a Voice Live agent
+metadata:
+  title: Create a Voice Live agent
+  description: Create a Voice Live agent
+  ms.date: 03/12/2026
+  author: jeffkoms
+  ms.author: jeffko
+  ms.topic: unit
+azureSandbox: false
+durationInMinutes: 5
+content: |
+  [!include[](includes/3b-voice-live-agent.md)]
Lines changed: 5 additions & 5 deletions

@@ -1,14 +1,14 @@
 ### YamlMime:ModuleUnit
 uid: learn.wwl.develop-voice-live-agent.exercise-develop-agent
-title: Exercise - Develop an Azure AI Voice Live agent
+title: Exercise - Develop a Voice Live agent
 metadata:
-  title: Exercise - Develop an Azure AI Voice Live Agent
-  description: Exercise - Develop an Azure AI Voice Live agent
-  ms.date: 10/9/2025
+  title: Exercise - Develop a Voice Live Agent
+  description: Exercise - Develop a Voice Live agent
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
 azureSandbox: false
 durationInMinutes: 30
 content: |
-  [!include[](includes/4-exercise-develop-agent.md)]
+  [!include[](includes/4-exercise-develop-agent.md)]

learn-pr/wwl-data-ai/develop-voice-live-agent/5-knowledge-check.yml

Lines changed: 56 additions & 56 deletions
@@ -5,7 +5,7 @@ title: Module assessment
 metadata:
   title: Module Assessment
   description: Module assessment
-  ms.date: 10/9/2025
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
@@ -14,58 +14,58 @@ durationInMinutes: 5
 content: |
 quiz:
   questions:
-  - content: "What are the two authentication methods supported by the Voice Live API?"
-    choices:
-    - content: "OAuth 2.0 and JWT (JSON Web Tokens)"
-      isCorrect: false
-      explanation: "OAuth 2.0 and JWT tokens aren't supported authentication methods for the Voice Live API."
-    - content: "Basic authentication and API keys"
-      isCorrect: false
-      explanation: "API keys are supported, but basic authentication isn't a supported authentication method for the Voice Live API."
-    - content: "Microsoft Entra (keyless) and API key"
-      isCorrect: true
-      explanation: "Microsoft Entra (keyless) and API key are both supported authentication methods for the Voice Live API."
-  - content: "Which scope is required when generating a token for Microsoft Entra authentication?"
-    choices:
-    - content: "`https://cognitiveservices.azure.com/.default`"
-      isCorrect: true
-      explanation: "The scope `https://cognitiveservices.azure.com/.default`, or `https://ai.azure.com/.default`, is required for Microsoft Entra authentication."
-    - content: "`https://management.azure.com/.default`"
-      isCorrect: false
-      explanation: "This is the Azure Resource Manager scope, not for AI services"
-    - content: "`https://graph.microsoft.com/.default`"
-      isCorrect: false
-      explanation: "This is the Microsoft Graph API scope, not for AI services"
-  - content: "Which protocol is used for avatar streaming integration in Voice Live API?"
-    choices:
-    - content: "HTTP/2"
-      isCorrect: false
-      explanation: "HTTP/2 is used for standard web communication."
-    - content: "WebRTC"
-      isCorrect: true
-      explanation: "The Voice live API supports WebRTC-based avatar streaming for interactive applications"
-    - content: "gRPC"
-      isCorrect: false
-      explanation: "gRPC is used to enable communication between microservices by using HTTP/2."
-  - content: "Which event should be handled to stop audio playback when a user interrupts the voice agent?"
-    choices:
-    - content: "`ServerEventType.RESPONSE_AUDIO_DELTA`"
-      isCorrect: false
-      explanation: "This event is for receiving audio chunks to play, not for stopping playback"
-    - content: "`ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED`"
-      isCorrect: true
-      explanation: "This event can be used to stop audio playback, and cancel any current response."
-    - content: "`ServerEventType.SESSION_UPDATED`"
-      isCorrect: false
-      explanation: "This event indicates session readiness, not user speech interruption."
-  - content: "What is the recommended authentication method for production applications using the SDK?"
-    choices:
-    - content: "API key authentication"
-      isCorrect: false
-      explanation: "API key authentication isn't recommended for production applications."
-    - content: "Microsoft Entra authentication with DefaultAzureCredential"
-      isCorrect: true
-      explanation: "Microsoft Entra authentication with DefaultAzureCredential is the recommended approach for production applications."
-    - content: "Basic username/password authentication"
-      isCorrect: false
-      explanation: "Basic username/password authentication isn't a supported authentication method."
+  - content: "What are the two authentication methods supported by the Voice Live API?"
+    choices:
+    - content: "OAuth 2.0 and JWT (JSON Web Tokens)"
+      isCorrect: false
+      explanation: "OAuth 2.0 and JWT tokens aren't supported authentication methods for the Voice Live API."
+    - content: "Basic authentication and API keys"
+      isCorrect: false
+      explanation: "API keys are supported, but basic authentication isn't a supported authentication method for the Voice Live API."
+    - content: "Microsoft Entra (keyless) and API key"
+      isCorrect: true
+      explanation: "Microsoft Entra (keyless) and API key are both supported authentication methods for the Voice Live API."
+  - content: "Which scope is required when generating a token for Microsoft Entra authentication?"
+    choices:
+    - content: "`https://cognitiveservices.azure.com/.default`"
+      isCorrect: true
+      explanation: "The scope `https://cognitiveservices.azure.com/.default`, or `https://ai.azure.com/.default`, is required for Microsoft Entra authentication."
+    - content: "`https://management.azure.com/.default`"
+      isCorrect: false
+      explanation: "This is the Azure Resource Manager scope, not for AI services"
+    - content: "`https://graph.microsoft.com/.default`"
+      isCorrect: false
+      explanation: "This is the Microsoft Graph API scope, not for AI services"
+  - content: "Which protocol is used for avatar streaming integration in Voice Live API?"
+    choices:
+    - content: "HTTP/2"
+      isCorrect: false
+      explanation: "HTTP/2 is used for standard web communication."
+    - content: "WebRTC"
+      isCorrect: true
+      explanation: "The Voice live API supports WebRTC-based avatar streaming for interactive applications"
+    - content: "gRPC"
+      isCorrect: false
+      explanation: "gRPC is used to enable communication between microservices by using HTTP/2."
+  - content: "Which event should be handled to stop audio playback when a user interrupts the voice agent?"
+    choices:
+    - content: "`ServerEventType.RESPONSE_AUDIO_DELTA`"
+      isCorrect: false
+      explanation: "This event is for receiving audio chunks to play, not for stopping playback"
+    - content: "`ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED`"
+      isCorrect: true
+      explanation: "This event can be used to stop audio playback, and cancel any current response."
+    - content: "`ServerEventType.SESSION_UPDATED`"
+      isCorrect: false
+      explanation: "This event indicates session readiness, not user speech interruption."
+  - content: "What is the recommended authentication method for production applications using the SDK?"
+    choices:
+    - content: "API key authentication"
+      isCorrect: false
+      explanation: "API key authentication isn't recommended for production applications."
+    - content: "Microsoft Entra authentication with DefaultAzureCredential"
+      isCorrect: true
+      explanation: "Microsoft Entra authentication with DefaultAzureCredential is the recommended approach for production applications."
+    - content: "Basic username/password authentication"
+      isCorrect: false
+      explanation: "Basic username/password authentication isn't a supported authentication method."
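
As context for the interruption question in the quiz above, here is a minimal sketch of barge-in handling. The event names mirror the quiz's `ServerEventType` members as plain strings, and `Playback` and `Connection` are hypothetical stand-ins, not real SDK types.

```python
class Playback:
    """Stand-in for a local audio output stream."""
    def __init__(self):
        self.playing = True

    def stop(self):
        self.playing = False


class Connection:
    """Stand-in for the WebSocket session to the service."""
    def __init__(self):
        self.cancelled = False

    def cancel_response(self):
        self.cancelled = True


def handle_event(event_type, playback, connection):
    if event_type == "INPUT_AUDIO_BUFFER_SPEECH_STARTED":
        # The user started talking over the agent: stop playback immediately
        # and cancel the response the model is still generating.
        playback.stop()
        connection.cancel_response()
        return "interrupted"
    if event_type == "RESPONSE_AUDIO_DELTA":
        # Normal case: an audio chunk arrived; queue it for playback.
        return "play_chunk"
    return "ignore"
```

The key design point the quiz tests: `INPUT_AUDIO_BUFFER_SPEECH_STARTED` is the trigger for stopping playback, while `RESPONSE_AUDIO_DELTA` only delivers audio to play.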

learn-pr/wwl-data-ai/develop-voice-live-agent/6-summary.yml

Lines changed: 3 additions & 3 deletions
@@ -4,11 +4,11 @@ title: Summary
 metadata:
   title: Summary
   description: Summary
-  ms.date: 10/9/2025
+  ms.date: 03/12/2026
   author: jeffkoms
   ms.author: jeffko
   ms.topic: unit
 azureSandbox: false
-durationInMinutes: 3
+durationInMinutes: 1
 content: |
-  [!include[](includes/6-summary.md)]
+  [!include[](includes/6-summary.md)]
Lines changed: 3 additions & 3 deletions

@@ -1,8 +1,8 @@
-Voice-enabled applications are transforming how we interact with technology, and this module guides you through building a real-time, interactive voice solutions using advanced APIs and tools. The Azure AI Voice live API is a solution enabling low-latency, high-quality speech to speech interactions for voice agents. The API is designed for developers seeking scalable and efficient voice-driven experiences as it eliminates the need to manually orchestrate multiple components.
+Voice-enabled applications are transforming how we interact with technology, and this module guides you through building a real-time, interactive voice solutions using advanced APIs and tools. The Voice live API in Azure Speech in Foundry Tools is a solution enabling low-latency, high-quality speech to speech interactions for voice agents. The API is designed for developers seeking scalable and efficient voice-driven experiences as it eliminates the need to manually orchestrate multiple components.
 
 After completing this module, you'll be able to:
 
-- Implement the Azure AI Voice Live API to enable real-time, bidirectional communication.
+- Implement the Azure Speech Voice Live API to enable real-time, bidirectional communication.
 - Set up and configure the agent session.
 - Develop and manage event handlers to create dynamic and interactive user experiences.
-- Build and deploy a Python-based web app with real-time voice interaction capabilities to Azure.
+- Use Voice Live with a Foundry Agent.

learn-pr/wwl-data-ai/develop-voice-live-agent/includes/2-voice-live-api.md

Lines changed: 4 additions & 6 deletions
@@ -8,6 +8,7 @@ The Voice live API provides real-time communication using WebSocket connections.
 - Events are categorized into client events (sent from client to server) and server events (sent from server to client).
 
 Key features include:
+
 - Real-time audio processing with support for multiple formats like PCM16 and G.711.
 - Advanced voice options, including OpenAI voices and Azure custom voices.
 - Avatar integration using WebRTC for video and animation.
@@ -20,7 +21,7 @@ For a table of supported models and regions, visit the [Voice Live API overview]
 
 ## Connect to the Voice Live API
 
-The Voice live API supports two authentication methods: Microsoft Entra (keyless) and API key. Microsoft Entra uses token-based authentication for a Microsoft Foundry resource. You apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
+The Voice live API supports two authentication methods: Microsoft Entra (keyless) and API key. Microsoft Entra uses token-based authentication for a Microsoft Foundry resource. You apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
 
 For the recommended keyless authentication with Microsoft Entra ID, you need to assign the **Cognitive Services User** role to your user account or a managed identity. You generate a token using the Azure CLI or Azure SDKs. The token must be generated with the `https://ai.azure.com/.default` scope, or the legacy `https://cognitiveservices.azure.com/.default` scope. Use the token in the `Authorization` header of the WebSocket connection request, with the format `Bearer <token>`.
 
@@ -31,10 +32,10 @@ For key access, an API key can be provided in one of two ways. You can use an `a
 
 ### WebSocket endpoint
 
-The endpoint to use varies depending on how you want to access your resources. You can access resources through a connection to the AI Foundry project (Agent), or through a connection to the model.
+The endpoint to use varies depending on how you want to access your resources. You can access resources through a connection to the Foundry project when implementing an agent, or through a direct connection to a model.
 
 - **Project connection:** The endpoint is `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-10-01`
-- **Model connection:** The endpoint is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-10-01`.
+- **Model connection:** The endpoint is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-10-01`.
 
 The endpoint is the same for all models. The only difference is the required `model` query parameter, or, when using the Agent service, the `agent_id` and `project_id` parameters.
 
@@ -137,6 +138,3 @@
 
 > [!TIP]
 > Use high-resolution video settings for enhanced visual quality in avatar interactions.
-
-
-
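
The endpoint and authentication text changed in this file can be sketched as follows. This is a minimal illustration under assumptions: the resource name, model, agent ID, and project ID are placeholders, and the helper names (`model_endpoint`, `project_endpoint`, `auth_header`) are hypothetical; only the URL shapes and the `Bearer` header format follow the documented text.

```python
from urllib.parse import urlencode

API_VERSION = "2025-10-01"

def model_endpoint(resource_name: str, model: str) -> str:
    # Direct model connection: *.cognitiveservices.azure.com with a `model` parameter.
    query = urlencode({"api-version": API_VERSION, "model": model})
    return f"wss://{resource_name}.cognitiveservices.azure.com/voice-live/realtime?{query}"

def project_endpoint(resource_name: str, agent_id: str, project_id: str) -> str:
    # Project (agent) connection: *.services.ai.azure.com with `agent_id` and `project_id`.
    query = urlencode({"api-version": API_VERSION, "agent_id": agent_id, "project_id": project_id})
    return f"wss://{resource_name}.services.ai.azure.com/voice-live/realtime?{query}"

def auth_header(token: str) -> dict:
    # A Microsoft Entra token (for example, obtained via DefaultAzureCredential
    # with the https://ai.azure.com/.default scope) applied as a Bearer token.
    return {"Authorization": f"Bearer {token}"}
```

Either endpoint is then opened as a WebSocket with the `Authorization` header attached to the connection request.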
