learn-pr/tensorflow/intro-natural-language-processing-tensorflow/3-embeddings.yml (3 additions, 4 deletions)

@@ -6,9 +6,8 @@ metadata:
   description: Embeddings are a way to represent words using some vector representation that has nice semantic properties. We discuss different embeddings, and how using embeddings can improve classification accuracy.
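The "nice semantic properties" mentioned in the description can be sketched with cosine similarity: related words end up with nearby vectors. A minimal pure-Python illustration; the vectors below are made-up values for the sketch, not trained embeddings:

```python
import math

# Toy 4-dimensional "embeddings" (illustrative values, not trained weights).
# A real embedding layer learns vectors like these during training.
embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.7, 0.2, 0.9],
    "apple": [0.1, 0.2, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words are closer in embedding space.
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # low
```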
learn-pr/tensorflow/intro-natural-language-processing-tensorflow/4-recurrent-networks.yml (3 additions, 4 deletions)

@@ -6,9 +6,8 @@ metadata:
   description: While traditional fully connected networks don't allow us to capture word order, RNN is a mechanism that can capture patterns in sequences. We show how to use RNN for text classification, and discuss different RNN architectures, such as LSTM and GRU.
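A rough sketch of why a recurrent network can capture word order while a bag-of-words model cannot: a 1-dimensional Elman-style cell with hypothetical weights and "embeddings" (not the module's TensorFlow code). The same words in a different order produce a different final state:

```python
import math

def rnn_final_state(tokens, emb, w_in, w_rec):
    """Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0  # 1-dimensional hidden state, for illustration only
    for tok in tokens:
        h = math.tanh(w_in * emb[tok] + w_rec * h)
    return h

# Hypothetical 1-D "embeddings", chosen only for the sketch.
emb = {"not": -1.0, "good": 1.0}

a = rnn_final_state(["not", "good"], emb, w_in=0.8, w_rec=0.5)
b = rnn_final_state(["good", "not"], emb, w_in=0.8, w_rec=0.5)

# Same multiset of words, different order -> different final states,
# which a bag-of-words representation could never distinguish.
print(a, b)
```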
learn-pr/tensorflow/intro-natural-language-processing-tensorflow/6-knowledge-check.yml (27 additions, 19 deletions)

@@ -4,23 +4,22 @@ title: Module assessment
 metadata:
   title: Module assessment
   description: Check your knowledge
-  ms.date: 07/07/2021
+  ms.date: 03/29/2026
   author: Orin-Thomas
   ms.author: orthomas
   ms.topic: unit
-  ms.custom: team=nextgen
 module_assessment: true
 durationInMinutes: 5
 quiz:
   questions:
-  - content: "Suppose your text corpus contains 80,000 different words. Which of the below would you complete reducing the dimensionality of the input vector to neural classifier?"
+  - content: "Suppose your text corpus contains 80,000 different words. Which of the following would help reduce the dimensionality of the input vector to a neural classifier?"
     choices:
     - content: "Randomly select 10% of the words and ignore the rest."
       isCorrect: false
       explanation: "It's definitely not a good idea, especially because you risk omitting semantically important words"
     - content: "Use convolutional layer before fully connected classifier layer"
       isCorrect: false
-      explanation: "Convolutional layers don't reduce the dimensionality of input vectors"
+      explanation: "Convolutional layers extract spatial features from an already-encoded input, but they don't reduce the vocabulary dimension itself. An embedding layer is the standard approach to map sparse high-dimensional word representations into dense low-dimensional vectors."
     - content: "Use embedding layer before fully connected classifier layer"
      isCorrect: true
      explanation: "This is correct"
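To make the correct answer above concrete: an embedding layer is just a trainable lookup table, so an 80,000-word vocabulary can feed the classifier through, say, 64 dense inputs per word instead of 80,000. A minimal sketch; the 64-dimensional size and the random (untrained) values are assumptions for illustration:

```python
import random

VOCAB_SIZE = 80_000  # vocabulary size from the question above
EMBED_DIM = 64       # assumed embedding size; a common choice

random.seed(0)

# Conceptually a VOCAB_SIZE x EMBED_DIM table; rows are created lazily
# here just to keep the sketch lightweight.
_table = {}

def embed(word_index):
    """Return the dense vector for a word index (one table row).

    Equivalent to multiplying an 80,000-dim one-hot vector by the
    table, but implemented as an O(EMBED_DIM) row lookup.
    """
    assert 0 <= word_index < VOCAB_SIZE
    if word_index not in _table:
        _table[word_index] = [random.uniform(-0.05, 0.05)
                              for _ in range(EMBED_DIM)]
    return _table[word_index]

vec = embed(12345)
print(len(vec))  # 64 inputs to the classifier instead of 80,000
```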
@@ -45,32 +44,41 @@ quiz:
     choices:
     - content: "A network is applied for each input element and output from the previous application is passed to the next one"
       isCorrect: true
-      explanation: "This is correct."
+      explanation: "This is correct. The same network weights are applied at each time step, and the hidden state from the previous step is passed as input to the next, creating a recurrence."
     - content: "It's trained by a recurrent process"
       isCorrect: false
-      explanation: "Recurrent neural network is trained in the same manner as any other neural network"
-    - content: "It consists of layers which include other subnetworks"
+      explanation: "Recurrent neural networks are trained using backpropagation through time, but that is the training algorithm, not the reason they're called recurrent."
+    - content: "It consists of layers, which include other subnetworks"
       isCorrect: false
-      explanation: "While you can consider recurrent block to be a combination of two linear layers, it has nothing to do with recurrence"
+      explanation: "While you can consider a recurrent block to be a combination of two linear layers, nesting subnetworks has nothing to do with recurrence."
+    - content: "The network processes the entire input multiple times in repeated passes"
+      isCorrect: false
+      explanation: "An RNN processes the input sequence once, stepping through it one token at a time. The recurrence refers to passing state between steps, not revisiting the entire input."
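The correct choice above can be sketched as a loop that re-applies one cell with the same fixed weights, threading the state from each step into the next. The weights here are arbitrary illustrative values:

```python
def rnn_step(x, h, w_in=0.5, w_rec=0.9, bias=0.1):
    """One application of the SAME cell: new state from input and old state."""
    return max(0.0, w_in * x + w_rec * h + bias)  # ReLU for simplicity

def run(sequence):
    h = 0.0
    states = []
    for x in sequence:       # the one cell is re-applied per element...
        h = rnn_step(x, h)   # ...receiving the previous step's output
        states.append(h)
    return states

# Parameter count is fixed (w_in, w_rec, bias) no matter how long
# the sequence is; only the state is carried forward.
print(run([1.0, 0.0, 2.0]))
```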
   - content: "What is the main idea behind LSTM network architecture?"
     choices:
     - content: "Fixed number of LSTM blocks for the whole dataset"
       isCorrect: false
-      explanation: "Number of LSTM blocks depend on the sequence length in the minibatch"
+      explanation: "The number of LSTM blocks depends on the sequence length in the minibatch, not on the dataset as a whole."
     - content: "It contains many layers of recurrent neural networks"
       isCorrect: false
-      explanation: "LSTM can consist of one or more levels"
-    - content: "Explicit state management with forgetting and state triggering"
+      explanation: "An LSTM can consist of one or more layers. The defining feature of an LSTM is its gating mechanism, not the number of layers."
+    - content: "LSTMs use gating mechanisms (forget, input, and output gates) that explicitly control which information is retained or discarded across time steps"
       isCorrect: true
-      explanation: "In LSTM, each block receives and outputs a state, which is manipulated upon inside the block depending on input and previous state."
-  - content: "What is the main idea of attention?"
+      explanation: "Correct. LSTM gates solve the vanishing gradient problem found in simple RNNs by allowing the network to selectively retain or discard information in its cell state across many time steps."
+    - content: "LSTMs use a larger hidden state vector than simple RNNs"
+      isCorrect: false
+      explanation: "The hidden state size is a hyperparameter that can be set to any value for both simple RNNs and LSTMs. The key innovation of LSTMs is the gating mechanism, not the size of the hidden state."
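The gating idea in the correct answer can be sketched with a scalar LSTM step (hypothetical weights, not a trained model): a forget gate near 1 lets the cell state survive many steps, which is exactly what simple RNNs struggle to do.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, p):
    """One LSTM step with scalar state; p holds per-gate weights.

    The gates explicitly decide what to forget, what to write,
    and what to expose as the new hidden state.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])  # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])  # output gate
    c_new = f * c + i * math.tanh(p["wc"] * x + p["uc"] * h + p["bc"])
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hypothetical weights chosen only for illustration: a large forget
# bias keeps the forget gate close to 1 on zero input.
params = dict(wf=1.0, uf=0.0, bf=2.0, wi=1.0, ui=0.0, bi=0.0,
              wo=1.0, uo=0.0, bo=0.0, wc=1.0, uc=0.0, bc=0.0)

h, c = 0.0, 1.0
for x in [0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
print(c)  # the cell state decays only slowly: it is carried across steps
```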
-  - content: "What is the main idea of attention?"
+  - content: "What is the main advantage of using TF-IDF representation over a simple bag-of-words representation?"
     choices:
-    - content: "Attention assigns a weight coefficient to each word in the vocabulary to show how important it's"
+    - content: "TF-IDF captures the order of words in a sentence"
       isCorrect: false
-      explanation: "Not correct. Attention works inside each sentence, and reflects relative importance between words."
-    - content: "Attention is a network layer that uses attention matrix to see how much input states from each step affect the final result."
+      explanation: "Neither bag-of-words nor TF-IDF captures word order. Both represent documents as unordered collections of word weights."
+    - content: "TF-IDF gives higher weight to words that are more important for distinguishing documents, by down-weighting common words"
       isCorrect: true
-      explanation: "Correct. By looking at attention matrix we can visually estimate which words play more important role in different parts of the sentence."
-    - content: "Attention builds global correlation matrix between all words in vocabulary, showing their cooccurrence"
+      explanation: "Correct. TF-IDF reduces the weight of frequently occurring words (like 'the' and 'a') and increases the weight of words that are distinctive to specific documents."
+    - content: "TF-IDF uses neural networks to learn word importance"
+      isCorrect: false
+      explanation: "TF-IDF is a purely statistical method based on term frequency and document frequency. It doesn't involve any neural network training."
+    - content: "TF-IDF produces lower-dimensional vectors than bag-of-words"
       isCorrect: false
-      explanation: "This isn't correct, attention computer relative importance of words inside each sentence."
+      explanation: "TF-IDF vectors have the same dimensionality as bag-of-words vectors (one element per vocabulary term). The difference is that TF-IDF assigns floating-point weights instead of simple counts."
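The weighting described in the correct answer can be computed directly. A minimal sketch using the classic tf × idf formula (one common variant among several): a word that occurs in every document gets weight zero, while a word distinctive to one document gets a positive weight.

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def tf_idf(term, doc, corpus):
    """Classic tf-idf: term frequency scaled by inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)          # document frequency
    idf = math.log(len(corpus) / df)                  # df > 0 if term occurs
    return tf * idf

# "the" appears in every document, so its idf (and weight) is zero.
print(tf_idf("the", docs[0], docs))
# "mat" is distinctive to the first document, so it gets a positive weight.
print(tf_idf("mat", docs[0], docs))
```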
-In this module, we will explore different neural network architectures for dealing with natural language text. In recent years, **Natural Language Processing** (NLP) has experienced fast growth as a field, both because of improvements to the language model architectures and because they've been trained on increasingly large text corpora. As a result, their ability to "understand" text has vastly improved, and large pre-trained models such as BERT have become widely used.
+In this module, we explore different neural network architectures for dealing with natural language text. In recent years, **Natural Language Processing** (NLP) has experienced fast growth as a field, both because of improvements to the language model architectures and because they've been trained on increasingly large text corpora. As a result, their ability to "understand" text has vastly improved.
 
-We will focus on the fundamental aspects of representing NLP as tensors in TensorFlow, and on classical NLP architectures, such as using bag-of-words, embeddings and recurrent neural networks.
+We focus on the fundamental aspects of representing NLP as tensors in TensorFlow, and on classical NLP architectures, such as using bag-of-words, embeddings, and recurrent neural networks.
 
-## Natural Language Tasks
+## Natural language tasks
 
 There are several NLP tasks that we can solve using neural networks:
-* **Text Classification** is used when we need to classify a text fragment into one of several predefined classes. Examples include e-mail spam detection, news categorization, assigning a support request to a category, and more.
-* **Intent Classification** is one specific case of text classification, where we want to map an input utterance in the conversational AI system into one of the intents that represent the actual meaning of the phrase, or intent of the user.
-* **Sentiment Analysis** is a regression task, where we want to understand the degree of positivity of a given piece of text. We may want to label text in a dataset from most negative (-1) to most positive (+1), and train a model that will output a number representing the positivity of the input text.
-* **Named Entity Recognition** (NER) is the task of extracting entities from text, such as dates, addresses, people names, etc. Together with intent classification, NER is often used in dialog systems to extract parameters from the user's utterance.
-* A similar task of **Keyword Extraction** can be used to find the most meaningful words inside a text, which can then be used as tags.
-* **Text Summarization** extracts the most meaningful pieces of text, giving the user a compressed version of the original text.
+* **Text Classification** is used when we need to classify a text fragment into one of several predefined classes. Examples include e-mail spam detection, news categorization, assigning a support request to a category, and more.
+* **Intent Classification** is one specific case of text classification, where we want to map an input utterance in the conversational AI system into one of the intents that represent the actual meaning of the phrase, or intent of the user.
+* **Sentiment Analysis** is the task of understanding the degree of positivity of a given piece of text. It can be approached as a classification task (for example, labeling text as positive, negative, or neutral) or as a regression task, where we label text from most negative (-1) to most positive (+1) and train a model that outputs a number representing the positivity of the input text.
+* **Named Entity Recognition** (NER) is the task of extracting entities from text, such as dates, addresses, people names, etc. Together with intent classification, NER is often used in dialog systems to extract parameters from the user's utterance.
+* A similar task of **Keyword Extraction** can be used to find the most meaningful words inside a text, which can then be used as tags.
+* **Text Summarization** extracts the most meaningful pieces of text, giving the user a compressed version of the original text.
 * **Question Answering** is the task of extracting an answer from a piece of text. This model takes a text fragment and a question as input, and finds the exact place within the text that contains the answer. For example, the text "*John is a 22 year old student who loves to use Microsoft Learn*", and the question *How old is John* should provide us with the answer *22*.
 
-In this module, we will mostly focus on the **Text Classification** task. However, we will learn all the important concepts that we need to handle more difficult tasks in the future.
-
-## Learning objectives
-- Understand how text is processed for NLP tasks
-- Learn about Recurrent Neural Networks (RNNs) and Generative Neural Networks (GNNs)
-- Learn about Attention Mechanisms
-- Learn how to build text classification models
-
-## Prerequisites
-- Knowledge of Python
-- Basic understanding of machine learning
+In this module, we'll mostly focus on the **Text Classification** task. However, we'll learn all the important concepts that we need to handle more difficult tasks in the future.
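Since the module starts from bag-of-words representations before moving to embeddings and RNNs, a minimal sketch of that baseline may help (toy texts and naive whitespace tokenization, assumed for illustration):

```python
# Build a vocabulary and a bag-of-words count vector for each text.
texts = [
    "I love Microsoft Learn",
    "John loves to use Microsoft Learn",
]

# Naive tokenization: lowercase and split on whitespace, so "love"
# and "loves" are treated as different words.
vocab = sorted({w.lower() for t in texts for w in t.split()})

def bag_of_words(text):
    counts = [0] * len(vocab)
    for w in text.lower().split():
        counts[vocab.index(w)] += 1
    return counts

for t in texts:
    # One fixed-length vector per text; word order is lost.
    print(bag_of_words(t))
```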