Commit 2faedec

Merge pull request #54430 from Orin-Thomas/orthomas-28APR26a
Freshness & Technical Review
2 parents 2bdfdb1 + 8ac0b36 commit 2faedec

15 files changed

Lines changed: 703 additions & 1682 deletions

learn-pr/paths/tensorflow-fundamentals/index.yml

Lines changed: 1 addition & 2 deletions
@@ -18,7 +18,6 @@ prerequisites: |
   - Basic knowledge about how to use Jupyter Notebooks
   - Basic understanding of machine learning
 iconUrl: /training/achievements/tensor-intro-trophy.svg
-hidden: true
 levels:
 - beginner
 - intermediate
@@ -32,7 +31,7 @@ modules:
 - learn.tensorflow.intro-machine-learning-keras
 - learn.tensorflow.intro-computer-vision
 - learn.tensorflow.intro-natural-language-processing
-- learn.tensorflow.intro-audio-classification-tensorflow
+- learn.tensorflow.intro-audio-classification
 - learn.tensorflow.intro-machine-learning-tensorflow
 trophy:
   uid: learn.tensorflow.tensorflow-fundamentals.trophy

learn-pr/tensorflow/intro-audio-classification-tensorflow/1-introduction.yml

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
 ### YamlMime:ModuleUnit
-uid: learn.tensorflow.intro-audio-classification-tensorflow.introduction
+uid: learn.tensorflow.intro-audio-classification.introduction
 title: Introduction
 metadata:
   title: Introduction
   description: Introduction
   author: Orin-Thomas
   ms.author: orthomas
-  ms.date: 08/03/2021
+  ms.date: 04/20/2026
+  ms.update-cycle: 180-days
   ms.topic: unit
+  ms.collection: ce-advocates-ai-copilot
   ms.custom:
   - team=nextgen
   - team=cloud_advocates
Lines changed: 20 additions & 18 deletions
@@ -1,44 +1,46 @@
 ### YamlMime:ModuleUnit
-uid: learn.tensorflow.intro-audio-classification-tensorflow.understand-audio-data
+uid: learn.tensorflow.intro-audio-classification.understand-audio-data
 title: Understanding audio data
 metadata:
   title: Understanding audio data
   description: Understanding audio data
   author: Orin-Thomas
   ms.author: orthomas
-  ms.date: 08/03/2021
+  ms.date: 04/20/2026
+  ms.update-cycle: 180-days
   ms.topic: unit
+  ms.collection: ce-advocates-ai-copilot
   ms.custom:
   - team=nextgen
   - team=cloud_advocates
   ms.product: learning-tensorflow
   ms.contributors:
   - cassieb-08202021
 durationInMinutes: 10
-sandbox: true
-notebook: notebooks/2-understand-audio-data.ipynb
+content: |
+  [!include[](includes/2-understand-audio-data.md)]
 quiz:
   title: Check your knowledge
   questions:
   - content: "What is the sample rate?"
     choices:
-    - content: "Frequency mapped to time."
+    - content: "The number of audio samples captured per second."
+      isCorrect: true
+      explanation: "Correct. A 16 kHz sample rate means 16,000 samples are captured each second."
+    - content: "Frequency mapped over time."
       isCorrect: false
-      explanation: "Incorrect, frequency mapped to time is a Spectrogram."
-    - content: "The audio channels."
+      explanation: "Incorrect. Frequency content over time is represented by a spectrogram."
+    - content: "The number of audio channels."
       isCorrect: false
-      explanation: "Although audio channels can be used in sampling, this is not what sample rate is"
-    - content: "Sampling analog sound at consistent intervals of time to create a digital sound representation."
-      isCorrect: true
-      explanation: "Correct!"
+      explanation: "Incorrect. Channels describe how many separate audio signals are stored, such as mono or stereo."
   - content: "What is the waveform?"
     choices:
-    - content: "Frequency mapped to time."
-      isCorrect: false
-      explanation: "Incorrect, frequency mapped to time is a Spectrogram."
-    - content: "Sample rate and frequency visualized."
+    - content: "The amplitude of an audio signal over time."
       isCorrect: true
-      explanation: "Correct! We can visualize our data using a waveform to map sample rate and frequency"
-    - content: "The audio channels."
+      explanation: "Correct. A waveform shows how the signal amplitude changes across samples or time."
+    - content: "Frequency mapped over time."
+      isCorrect: false
+      explanation: "Incorrect. Frequency content over time is represented by a spectrogram."
+    - content: "The number of audio channels."
       isCorrect: false
-      explanation: "Incorrect."
+      explanation: "Incorrect. Channels describe separate audio signals, not the waveform itself."
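The sample-rate and waveform definitions in the corrected quiz answers above can be illustrated with a short sketch. This is plain NumPy rather than code from the module; the 440 Hz tone and the variable names are hypothetical choices for illustration:

```python
import numpy as np

# A 16 kHz sample rate means 16,000 amplitude measurements per second.
sample_rate = 16_000                      # samples captured per second
duration_s = 1.0
t = np.arange(int(sample_rate * duration_s)) / sample_rate

# The waveform is the signal's amplitude over time: here, a 440 Hz sine tone.
waveform = 0.5 * np.sin(2 * np.pi * 440.0 * t)

print(len(waveform))  # one second of audio at 16 kHz -> 16000 samples
```

One second of audio at 16 kHz is therefore an array of 16,000 amplitude values, which is exactly what the quiz's "samples captured per second" answer describes.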
Lines changed: 30 additions & 22 deletions
@@ -1,49 +1,57 @@
 ### YamlMime:ModuleUnit
-uid: learn.tensorflow.intro-audio-classification-tensorflow.visualizations-transforms
+uid: learn.tensorflow.intro-audio-classification.visualizations-transforms
 title: Visualizing and transforming data
 metadata:
   title: Visualizing and transforming data
   description: Visualizing and transforming data
   author: Orin-Thomas
   ms.author: orthomas
-  ms.date: 08/03/2021
+  ms.date: 04/20/2026
+  ms.update-cycle: 180-days
   ms.topic: unit
+  ms.collection: ce-advocates-ai-copilot
   ms.custom:
   - team=nextgen
   - team=cloud_advocates
   ms.product: learning-tensorflow
   ms.contributors:
   - cassieb-08202021
 durationInMinutes: 15
-sandbox: true
-notebook: notebooks/3-visualizations-transforms.ipynb
+content: |
+  [!include[](includes/3-visualizations-transforms.md)]
 quiz:
   title: Check your knowledge
   questions:
   - content: "When you resample the audio, you are..."
     choices:
-    - content: "Increasing the size."
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Reducing the size."
+    - content: "Changing the number of samples used to represent each second of audio."
       isCorrect: true
-      explanation: "Correct! We can reduce the size of the file by reducing the sample rate for the audio track."
+      explanation: "Correct. Resampling changes the sample rate; it can downsample or upsample depending on the target rate."
+    - content: "Always increasing the size."
+      isCorrect: false
+      explanation: "Incorrect. Resampling can increase or decrease the number of samples."
+    - content: "Always reducing the size."
+      isCorrect: false
+      explanation: "Incorrect. Downsampling can reduce size, but resampling also includes upsampling."
   - content: "What is a spectrogram?"
     choices:
-    - content: "Maps the frequency to time of an audio file."
+    - content: "A visualization of frequency content over time, usually with intensity or color showing magnitude."
       isCorrect: true
-      explanation: "Correct!"
-    - content: "The audio channels."
+      explanation: "Correct. A spectrogram shows how the strength of different frequencies changes over time."
+    - content: "The number of audio channels."
       isCorrect: false
-      explanation: "Incorrect."
-    - content: "Sample rate and frequency visualized."
+      explanation: "Incorrect. Channels describe separate audio signals, such as left and right stereo channels."
+    - content: "The amplitude of the audio signal over time."
       isCorrect: false
-      explanation: "Incorrect this is a waveform."
-  - content: "Audio classification can only be done with computer vision on spectrograms."
+      explanation: "Incorrect. That describes a waveform."
+  - content: "Which input representation can be used for audio classification?"
     choices:
-    - content: "True"
-      isCorrect: False
-      explanation: "Incorrect. There's more than one way to build audio classification models."
-    - content: "False"
-      isCorrect: True
-      explanation: "Correct! There's more than one way to build audio classification models."
+    - content: "Waveforms, engineered audio features, or spectrogram tensors, depending on the model design."
+      isCorrect: true
+      explanation: "Correct. This module uses spectrograms, but audio classifiers can also learn from raw waveforms or other audio features."
+    - content: "Only PNG images created from spectrograms."
+      isCorrect: false
+      explanation: "Incorrect. Saving spectrograms as images is optional and can add unnecessary file I/O or resizing artifacts."
+    - content: "Only the number of audio channels."
+      isCorrect: false
+      explanation: "Incorrect. Channel count is useful metadata, but it doesn't represent the audio pattern to classify."
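As a rough illustration of the spectrogram concept these quiz questions cover, here is a minimal NumPy sketch of a magnitude spectrogram. The `spectrogram` helper and its frame parameters are hypothetical stand-ins for what `tf.signal.stft` computes in the TensorFlow module, not the module's own code:

```python
import numpy as np

def spectrogram(waveform, frame_length=255, frame_step=128):
    """Magnitude spectrogram: frequency content over time.

    Hypothetical helper mirroring a short-time Fourier transform:
    slice the waveform into overlapping frames, window each frame,
    and take the magnitude of its FFT.
    """
    n_frames = 1 + (len(waveform) - frame_length) // frame_step
    frames = np.stack([
        waveform[i * frame_step : i * frame_step + frame_length]
        for i in range(n_frames)
    ])
    window = np.hanning(frame_length)           # reduce spectral leakage
    return np.abs(np.fft.rfft(frames * window, axis=-1))

# One second of a 440 Hz tone at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (time_frames, frequency_bins) -> (124, 128)
```

The resulting 2-D array is the frequency-over-time representation the quiz describes: one axis is time frames, the other is frequency bins, and the values are magnitudes. It is this tensor, not a saved image, that a classifier can consume directly.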
Lines changed: 6 additions & 4 deletions
@@ -1,19 +1,21 @@
 ### YamlMime:ModuleUnit
-uid: learn.tensorflow.intro-audio-classification-tensorflow.speech-model
+uid: learn.tensorflow.intro-audio-classification.speech-model
 title: Build the model
 metadata:
   title: Build the model
   description: Build the model
   author: Orin-Thomas
   ms.author: orthomas
-  ms.date: 08/03/2021
+  ms.date: 04/20/2026
+  ms.update-cycle: 180-days
   ms.topic: unit
+  ms.collection: ce-advocates-ai-copilot
   ms.custom:
   - team=nextgen
   - team=cloud_advocates
   ms.product: learning-tensorflow
   ms.contributors:
   - cassieb-08202021
 durationInMinutes: 15
-sandbox: true
-notebook: notebooks/4-speech-model.ipynb
+content: |
+  [!include[](includes/4-speech-model.md)]
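To make the spectrogram-to-prediction flow in the "Build the model" unit concrete, here is a shape-arithmetic sketch of how a spectrogram tensor could move through a small convolutional network. The layer sizes (3x3 valid-padding convolutions, 2x2 pooling, 64 filters) are illustrative assumptions, not the module's exact architecture:

```python
def conv2d_out(h, w, kernel=3, stride=1):
    """Output height/width of a 'valid'-padding convolution."""
    return (h - kernel) // stride + 1, (w - kernel) // stride + 1

def pool_out(h, w, pool=2):
    """Output height/width of a non-overlapping max pool."""
    return h // pool, w // pool

# One clip's spectrogram tensor: (time_frames, frequency_bins).
h, w = 124, 128
h, w = conv2d_out(h, w)   # Conv2D, 3x3 kernel  -> (122, 126)
h, w = pool_out(h, w)     # MaxPool 2x2         -> (61, 63)
h, w = conv2d_out(h, w)   # Conv2D, 3x3 kernel  -> (59, 61)
h, w = pool_out(h, w)     # MaxPool 2x2         -> (29, 30)

# Flattened feature count feeding the Dense layers
# before the final 2-logit (yes/no) output.
flat = h * w * 64
print(flat)  # 29 * 30 * 64 = 55680
```

The point of the arithmetic is that the binary yes/no classifier never sees "audio" as such: it sees a fixed-shape 2-D tensor whose spatial structure convolutions can exploit, just as they would for an image.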

learn-pr/tensorflow/intro-audio-classification-tensorflow/5-summary.yml

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
 ### YamlMime:ModuleUnit
-uid: learn.tensorflow.intro-audio-classification-tensorflow.summary
+uid: learn.tensorflow.intro-audio-classification.summary
 title: Summary
 metadata:
   title: Summary
   description: Summary
   author: Orin-Thomas
   ms.author: orthomas
-  ms.date: 08/03/2021
+  ms.date: 04/20/2026
+  ms.update-cycle: 180-days
   ms.topic: unit
+  ms.collection: ce-advocates-ai-copilot
   ms.custom:
   - team=nextgen
   - team=cloud_advocates
Lines changed: 12 additions & 7 deletions
@@ -1,12 +1,17 @@
-Ever wonder how the voice assistants actually work? How do they understand the words that we say? When you think about voice assistants you have the first step, which is speech to text, then the Natural Language Processing (NLP) step, which is the word embedding (turning words into numbers), then you have a classification of the utterance (what people say) to the intent (what they want the voice assistant to do). If you are following this learning path, you will have learned how the NLP part works already. Now we want to look at how we get the text from the spoken audio. Audio classification can be used for many things, not just speech assistants. For example, in music you can classify genres, or detect illness by the tone in someone's voice, and even more applications that we haven't even thought of yet.
+Ever wonder how voice assistants recognize short commands such as "yes," "no," or "stop"? Full speech assistants usually combine many systems, including audio capture, speech recognition, natural language processing, and intent classification. This module focuses on one smaller but important task: keyword classification from short audio clips.
 
-In this learn module we will be learning how to do audio classification with TensorFlow. There are multiple ways to build an audio classification model. You can use the waveform, tag sections of a wave file, or even use computer vision on the spectrogram image. In this tutorial, we will first break down how to understand audio data, from analog to digital representations, then we will build the model using computer vision on the spectrogram images. That's right, you can turn audio into an image representation and then use computer vision to classify the word spoken! We will be building a simple model that can understand `yes` and `no`. The dataset we will be using is the open dataset Speech Commands which are built into TensorFlow datasets. This dataset has 36 total different words/sounds to be used for classification. Each utterance is stored as a one-second (or less) WAVE format file. We will only be using `yes` and `no` for a binary classification.
+There are multiple ways to build an audio classification model. A model can learn directly from waveforms, from engineered audio features, or from spectrograms that represent frequency content over time. In this module, you use TensorFlow to transform audio waveforms into spectrogram tensors and train a simple convolutional neural network to classify the words `yes` and `no`.
+
+The examples use the smaller mini Speech Commands dataset that TensorFlow provides for tutorials. The original [Speech Commands dataset](https://www.tensorflow.org/datasets/catalog/speech_commands) ([Warden, 2018](https://arxiv.org/abs/1804.03209)) contains more than 105,000 one-second or shorter WAV files across 35 spoken words. The mini Speech Commands dataset contains eight commands, and this module uses only the `yes` and `no` folders for binary classification.
 
 ## Learning objectives
-- Understand some key features of audio data.
-- Introduction to how to build audio machine learning models.
-- Learn how to build a binary classification model from wave files.
+
+- Understand key features of audio data, including sample rate, amplitude, channels, and waveforms.
+- Convert audio waveforms into spectrogram tensors.
+- Build and evaluate a binary keyword classification model from WAV files.
 
 ## Prerequisites
-- Knowledge of Python
-- Basic understand of machine learning
+
+- Basic Python knowledge
+- Basic understanding of machine learning
+- A Python environment that supports TensorFlow 2.10 or later, with TensorFlow and Matplotlib installed. Use a Python version supported by the TensorFlow release you install. For setup guidance, see [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) and [Install Matplotlib](https://matplotlib.org/stable/users/installing/index.html).
