learn-pr/tensorflow/intro-natural-language-processing-tensorflow/3-embeddings.yml (3 additions, 4 deletions)

@@ -6,9 +6,8 @@ metadata:
   description: Embeddings are a way to represent words using some vector representation that has nice semantic properties. We discuss different embeddings, and how using embeddings can improve classification accuracy.
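The "nice semantic properties" mentioned in the description can be sketched with cosine similarity: related words end up with nearby vectors. A minimal pure-Python illustration; the vectors below are made-up values for the sketch, not trained embeddings:

```python
import math

# Toy 4-dimensional "embeddings" (illustrative values, not trained weights).
# A real embedding layer learns vectors like these during training.
embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.7, 0.2, 0.9],
    "apple": [0.1, 0.2, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words are closer in embedding space.
print(cosine(embeddings["king"], embeddings["queen"]))  # high
print(cosine(embeddings["king"], embeddings["apple"]))  # low
```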
learn-pr/tensorflow/intro-natural-language-processing-tensorflow/4-recurrent-networks.yml (3 additions, 4 deletions)

@@ -6,9 +6,8 @@ metadata:
   description: While traditional fully connected networks don't allow us to capture word order, RNN is a mechanism that can capture patterns in sequences. We show how to use RNN for text classification, and discuss different RNN architectures, such as LSTM and GRU.
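A rough sketch of why a recurrent network can capture word order while a bag-of-words model cannot: a 1-dimensional Elman-style cell with hypothetical weights and "embeddings" (not the module's TensorFlow code). The same words in a different order produce a different final state:

```python
import math

def rnn_final_state(tokens, emb, w_in, w_rec):
    """Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0  # 1-dimensional hidden state, for illustration only
    for tok in tokens:
        h = math.tanh(w_in * emb[tok] + w_rec * h)
    return h

# Hypothetical 1-D "embeddings", chosen only for the sketch.
emb = {"not": -1.0, "good": 1.0}

a = rnn_final_state(["not", "good"], emb, w_in=0.8, w_rec=0.5)
b = rnn_final_state(["good", "not"], emb, w_in=0.8, w_rec=0.5)

# Same multiset of words, different order -> different final states,
# which a bag-of-words representation could never distinguish.
print(a, b)
```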
learn-pr/tensorflow/intro-natural-language-processing-tensorflow/6-knowledge-check.yml (27 additions, 19 deletions)

@@ -4,23 +4,22 @@ title: Module assessment
 metadata:
   title: Module assessment
   description: Check your knowledge
-  ms.date: 07/07/2021
+  ms.date: 03/29/2026
   author: Orin-Thomas
   ms.author: orthomas
   ms.topic: unit
-  ms.custom: team=nextgen
 module_assessment: true
 durationInMinutes: 5
 quiz:
   questions:
-  - content: "Suppose your text corpus contains 80,000 different words. Which of the below would you complete reducing the dimensionality of the input vector to neural classifier?"
+  - content: "Suppose your text corpus contains 80,000 different words. Which of the following would help reduce the dimensionality of the input vector to a neural classifier?"
     choices:
     - content: "Randomly select 10% of the words and ignore the rest."
       isCorrect: false
       explanation: "It's definitely not a good idea, especially because you risk omitting semantically important words"
     - content: "Use convolutional layer before fully connected classifier layer"
       isCorrect: false
-      explanation: "Convolutional layers don't reduce the dimensionality of input vectors"
+      explanation: "Convolutional layers extract spatial features from an already-encoded input, but they don't reduce the vocabulary dimension itself. An embedding layer is the standard approach to map sparse high-dimensional word representations into dense low-dimensional vectors."
     - content: "Use embedding layer before fully connected classifier layer"
      isCorrect: true
      explanation: "This is correct"
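To make the correct answer above concrete: an embedding layer is just a trainable lookup table, so an 80,000-word vocabulary can feed the classifier through, say, 64 dense inputs per word instead of 80,000. A minimal sketch; the 64-dimensional size and the random (untrained) values are assumptions for illustration:

```python
import random

VOCAB_SIZE = 80_000  # vocabulary size from the question above
EMBED_DIM = 64       # assumed embedding size; a common choice

random.seed(0)

# Conceptually a VOCAB_SIZE x EMBED_DIM table; rows are created lazily
# here just to keep the sketch lightweight.
_table = {}

def embed(word_index):
    """Return the dense vector for a word index (one table row).

    Equivalent to multiplying an 80,000-dim one-hot vector by the
    table, but implemented as an O(EMBED_DIM) row lookup.
    """
    assert 0 <= word_index < VOCAB_SIZE
    if word_index not in _table:
        _table[word_index] = [random.uniform(-0.05, 0.05)
                              for _ in range(EMBED_DIM)]
    return _table[word_index]

vec = embed(12345)
print(len(vec))  # 64 inputs to the classifier instead of 80,000
```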
@@ -45,32 +44,41 @@ quiz:
     choices:
     - content: "A network is applied for each input element and output from the previous application is passed to the next one"
       isCorrect: true
-      explanation: "This is correct."
+      explanation: "This is correct. The same network weights are applied at each time step, and the hidden state from the previous step is passed as input to the next, creating a recurrence."
     - content: "It's trained by a recurrent process"
       isCorrect: false
-      explanation: "Recurrent neural network is trained in the same manner as any other neural network"
-    - content: "It consists of layers which include other subnetworks"
+      explanation: "Recurrent neural networks are trained using backpropagation through time, but that is the training algorithm, not the reason they're called recurrent."
+    - content: "It consists of layers, which include other subnetworks"
       isCorrect: false
-      explanation: "While you can consider recurrent block to be a combination of two linear layers, it has nothing to do with recurrence"
+      explanation: "While you can consider a recurrent block to be a combination of two linear layers, nesting subnetworks has nothing to do with recurrence."
+    - content: "The network processes the entire input multiple times in repeated passes"
+      isCorrect: false
+      explanation: "An RNN processes the input sequence once, stepping through it one token at a time. The recurrence refers to passing state between steps, not revisiting the entire input."
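The correct choice above can be sketched as a loop that re-applies one cell with the same fixed weights, threading the state from each step into the next. The weights here are arbitrary illustrative values:

```python
def rnn_step(x, h, w_in=0.5, w_rec=0.9, bias=0.1):
    """One application of the SAME cell: new state from input and old state."""
    return max(0.0, w_in * x + w_rec * h + bias)  # ReLU for simplicity

def run(sequence):
    h = 0.0
    states = []
    for x in sequence:       # the one cell is re-applied per element...
        h = rnn_step(x, h)   # ...receiving the previous step's output
        states.append(h)
    return states

# Parameter count is fixed (w_in, w_rec, bias) no matter how long
# the sequence is; only the state is carried forward.
print(run([1.0, 0.0, 2.0]))
```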
   - content: "What is the main idea behind LSTM network architecture?"
     choices:
     - content: "Fixed number of LSTM blocks for the whole dataset"
       isCorrect: false
-      explanation: "Number of LSTM blocks depend on the sequence length in the minibatch"
+      explanation: "The number of LSTM blocks depends on the sequence length in the minibatch, not on the dataset as a whole."
     - content: "It contains many layers of recurrent neural networks"
       isCorrect: false
-      explanation: "LSTM can consist of one or more levels"
-    - content: "Explicit state management with forgetting and state triggering"
+      explanation: "An LSTM can consist of one or more layers. The defining feature of an LSTM is its gating mechanism, not the number of layers."
+    - content: "LSTMs use gating mechanisms (forget, input, and output gates) that explicitly control which information is retained or discarded across time steps"
       isCorrect: true
-      explanation: "In LSTM, each block receives and outputs a state, which is manipulated upon inside the block depending on input and previous state."
-  - content: "What is the main idea of attention?"
+      explanation: "Correct. LSTM gates solve the vanishing gradient problem found in simple RNNs by allowing the network to selectively retain or discard information in its cell state across many time steps."
+    - content: "LSTMs use a larger hidden state vector than simple RNNs"
+      isCorrect: false
+      explanation: "The hidden state size is a hyperparameter that can be set to any value for both simple RNNs and LSTMs. The key innovation of LSTMs is the gating mechanism, not the size of the hidden state."
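The gating idea in the correct answer can be sketched with a scalar LSTM step (hypothetical weights, not a trained model): a forget gate near 1 lets the cell state survive many steps, which is exactly what simple RNNs struggle to do.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, p):
    """One LSTM step with scalar state; p holds per-gate weights.

    The gates explicitly decide what to forget, what to write,
    and what to expose as the new hidden state.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])  # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])  # output gate
    c_new = f * c + i * math.tanh(p["wc"] * x + p["uc"] * h + p["bc"])
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hypothetical weights chosen only for illustration: a large forget
# bias keeps the forget gate close to 1 on zero input.
params = dict(wf=1.0, uf=0.0, bf=2.0, wi=1.0, ui=0.0, bi=0.0,
              wo=1.0, uo=0.0, bo=0.0, wc=1.0, uc=0.0, bc=0.0)

h, c = 0.0, 1.0
for x in [0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
print(c)  # the cell state decays only slowly: it is carried across steps
```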
-  - content: "What is the main idea of attention?"
+  - content: "What is the main advantage of using TF-IDF representation over a simple bag-of-words representation?"
     choices:
-    - content: "Attention assigns a weight coefficient to each word in the vocabulary to show how important it's"
+    - content: "TF-IDF captures the order of words in a sentence"
       isCorrect: false
-      explanation: "Not correct. Attention works inside each sentence, and reflects relative importance between words."
-    - content: "Attention is a network layer that uses attention matrix to see how much input states from each step affect the final result."
+      explanation: "Neither bag-of-words nor TF-IDF captures word order. Both represent documents as unordered collections of word weights."
+    - content: "TF-IDF gives higher weight to words that are more important for distinguishing documents, by down-weighting common words"
       isCorrect: true
-      explanation: "Correct. By looking at attention matrix we can visually estimate which words play more important role in different parts of the sentence."
-    - content: "Attention builds global correlation matrix between all words in vocabulary, showing their cooccurrence"
+      explanation: "Correct. TF-IDF reduces the weight of frequently occurring words (like 'the' and 'a') and increases the weight of words that are distinctive to specific documents."
+    - content: "TF-IDF uses neural networks to learn word importance"
+      isCorrect: false
+      explanation: "TF-IDF is a purely statistical method based on term frequency and document frequency. It doesn't involve any neural network training."
+    - content: "TF-IDF produces lower-dimensional vectors than bag-of-words"
       isCorrect: false
-      explanation: "This isn't correct, attention computer relative importance of words inside each sentence."
+      explanation: "TF-IDF vectors have the same dimensionality as bag-of-words vectors (one element per vocabulary term). The difference is that TF-IDF assigns floating-point weights instead of simple counts."
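The weighting described in the correct answer can be computed directly. A minimal sketch using the classic tf × idf formula (one common variant among several): a word that occurs in every document gets weight zero, while a word distinctive to one document gets a positive weight.

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def tf_idf(term, doc, corpus):
    """Classic tf-idf: term frequency scaled by inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)          # document frequency
    idf = math.log(len(corpus) / df)                  # df > 0 if term occurs
    return tf * idf

# "the" appears in every document, so its idf (and weight) is zero.
print(tf_idf("the", docs[0], docs))
# "mat" is distinctive to the first document, so it gets a positive weight.
print(tf_idf("mat", docs[0], docs))
```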
-In this module, we will explore different neural network architectures for dealing with natural language text. In recent years, **Natural Language Processing** (NLP) has experienced fast growth as a field, both because of improvements to the language model architectures and because they've been trained on increasingly large text corpora. As a result, their ability to "understand" text has vastly improved, and large pre-trained models such as BERT have become widely used.
+In this module, we explore different neural network architectures for dealing with natural language text. In recent years, **Natural Language Processing** (NLP) has experienced fast growth as a field, both because of improvements to the language model architectures and because they've been trained on increasingly large text corpora. As a result, their ability to "understand" text has vastly improved.
 
-We will focus on the fundamental aspects of representing NLP as tensors in TensorFlow, and on classical NLP architectures, such as using bag-of-words, embeddings and recurrent neural networks.
+We focus on the fundamental aspects of representing NLP as tensors in TensorFlow, and on classical NLP architectures, such as using bag-of-words, embeddings, and recurrent neural networks.
 
-## Natural Language Tasks
+## Natural language tasks
 
 There are several NLP tasks that we can solve using neural networks:
-* **Text Classification** is used when we need to classify a text fragment into one of several predefined classes. Examples include e-mail spam detection, news categorization, assigning a support request to a category, and more.
-* **Intent Classification** is one specific case of text classification, where we want to map an input utterance in the conversational AI system into one of the intents that represent the actual meaning of the phrase, or intent of the user.
-* **Sentiment Analysis** is a regression task, where we want to understand the degree of positivity of a given piece of text. We may want to label text in a dataset from most negative (-1) to most positive (+1), and train a model that will output a number representing the positivity of the input text.
-* **Named Entity Recognition** (NER) is the task of extracting entities from text, such as dates, addresses, people names, etc. Together with intent classification, NER is often used in dialog systems to extract parameters from the user's utterance.
-* A similar task of **Keyword Extraction** can be used to find the most meaningful words inside a text, which can then be used as tags.
-* **Text Summarization** extracts the most meaningful pieces of text, giving the user a compressed version of the original text.
+* **Text Classification** is used when we need to classify a text fragment into one of several predefined classes. Examples include e-mail spam detection, news categorization, assigning a support request to a category, and more.
+* **Intent Classification** is one specific case of text classification, where we want to map an input utterance in the conversational AI system into one of the intents that represent the actual meaning of the phrase, or intent of the user.
+* **Sentiment Analysis** is the task of understanding the degree of positivity of a given piece of text. It can be approached as a classification task (for example, labeling text as positive, negative, or neutral) or as a regression task, where we label text from most negative (-1) to most positive (+1) and train a model that outputs a number representing the positivity of the input text.
+* **Named Entity Recognition** (NER) is the task of extracting entities from text, such as dates, addresses, people names, etc. Together with intent classification, NER is often used in dialog systems to extract parameters from the user's utterance.
+* A similar task of **Keyword Extraction** can be used to find the most meaningful words inside a text, which can then be used as tags.
+* **Text Summarization** extracts the most meaningful pieces of text, giving the user a compressed version of the original text.
 * **Question Answering** is the task of extracting an answer from a piece of text. This model takes a text fragment and a question as input, and finds the exact place within the text that contains the answer. For example, the text "*John is a 22 year old student who loves to use Microsoft Learn*", and the question *How old is John* should provide us with the answer *22*.
 
-In this module, we will mostly focus on the **Text Classification** task. However, we will learn all the important concepts that we need to handle more difficult tasks in the future.
-
-## Learning objectives
-- Understand how text is processed for NLP tasks
-- Learn about Recurrent Neural Networks (RNNs) and Generative Neural Networks (GNNs)
-- Learn about Attention Mechanisms
-- Learn how to build text classification models
-
-## Prerequisites
-- Knowledge of Python
-- Basic understanding of machine learning
+In this module, we'll mostly focus on the **Text Classification** task. However, we'll learn all the important concepts that we need to handle more difficult tasks in the future.
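Since the module starts from bag-of-words representations before moving to embeddings and RNNs, a minimal sketch of that baseline may help (toy texts and naive whitespace tokenization, assumed for illustration):

```python
# Build a vocabulary and a bag-of-words count vector for each text.
texts = [
    "I love Microsoft Learn",
    "John loves to use Microsoft Learn",
]

# Naive tokenization: lowercase and split on whitespace, so "love"
# and "loves" are treated as different words.
vocab = sorted({w.lower() for t in texts for w in t.split()})

def bag_of_words(text):
    counts = [0] * len(vocab)
    for w in text.lower().split():
        counts[vocab.index(w)] += 1
    return counts

for t in texts:
    # One fixed-length vector per text; word order is lost.
    print(bag_of_words(t))
```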