Commit 165c172

Merge pull request #54079 from GraemeMalcolm/main
Updates to include language detection and PII
2 parents 86783f0 + 57a9f0c commit 165c172

2 files changed

Lines changed: 6 additions & 2 deletions

learn-pr/wwl-data-ai/get-started-ai-fundamentals/includes/5-natural-language-processing.md

Lines changed: 4 additions & 2 deletions
@@ -14,12 +14,13 @@
 
 Natural language processing (NLP) is a broad term that covers AI models and techniques for making sense of language. NLP is the foundation on which generative AI large language models (LLMs) are built.
 
-While many natural language processing scenarios are handled by generative AI models today, there are common text analysis use cases where simpler NLP language models can be more cost-effective.
+While many natural language processing scenarios are handled by generative AI models today, there are common text analysis use cases where specialist NLP tools are used to produce predictable results or apply custom rules.
 
 ![Diagram of text being analyzed for sentiment, keywords, and summarization.](../media/text-analysis.png)
 
+- *Language detection* - determining which language (or languages) a document is written in. Language detection is often the first step in a multi-stage text processing workflow.
 - *Text classification* - assigning document to a specific category; including *sentiment analysis* to determine whether a body of text is positive, negative, or neutral.
-- *Key-term extraction* and *entity detection* - identifying key words or phrases in a document, and finding mentions of entities like people, places, organizations.
+- *Key-term extraction* and *entity detection* - identifying key words or phrases in a document, and finding mentions of entities like people, places, and organizations. A particularly specialized form of entity detection is to detect and redact *personally identifiable information (PII)*; such as names, addresses, telephone numbers, and other private details.
 - *Summarization* - Reducing the volume of text while still encapsulating the main points.
 
 ## Text analysis scenarios
@@ -29,5 +30,6 @@ Common uses of NLP technologies for text analysis include:
 - Analyzing document or transcripts of calls and meetings to determine key subjects and identify specific mentions of people, places, organizations, products, or other entities.
 - Analyzing social media posts, product reviews, or articles to evaluate sentiment and opinion.
 - Implementing chatbots that can answer frequently asked questions or orchestrate predictable conversational dialogs that don't require the complexity of generative AI.
+- Redacting PII before sharing or analyzing data to comply with privacy policies and legislation.
 
 ::: zone-end

learn-pr/wwl-data-ai/introduction-language/includes/1-introduction.md

Lines changed: 2 additions & 0 deletions
@@ -10,8 +10,10 @@ Within artificial intelligence (AI), text analysis is a subset of natural langua
 
 Techniques to process and analyze text evolved over many years, from simple statistical calculations based on term-frequency to vector-based language models that encapsulate semantic meaning. Some common use cases for text analysis include:
 
+- **Language detection**: Determining the language (or languages) in which text is written - often as the first step in a multi-step text processing workflow.
 - **Key term extraction**: Identifying important words and phrases in text, to help determine the topics and themes it discusses.
 - **Entity detection**: Identifying named entities mentioned in text; for example, places, people, dates, and organizations.
+- **Personally identifiable information (PII) detection**: Identifying and redacting personal details in text, such as names, addresses, telephone numbers, financial account details, and other sensitive information.
 - **Text classification**: Categorizing text documents based on their contents. For example, filtering email as *spam* or *not spam*.
 - **Sentiment analysis**: A particular form of text classification that predicts the *sentiment* of text - for example, categorizing social media posts as *positive*, *neutral*, or *negative*.
 - **Text summarization**: Reducing the volume of text while retaining its salient points. For example, generating a short one-paragraph summary from a multi-page document.
