Commit 3029b41

Merge pull request #310609 from paulth1/de-identification-service-articles
[AQ] edit pass: De identification service articles
2 parents 4801e68 + 1fb7e47 commit 3029b41

4 files changed

Lines changed: 278 additions & 232 deletions

Lines changed: 54 additions & 65 deletions
Original file line number | Diff line number | Diff line change
@@ -1,121 +1,110 @@
11
---
2-
title: Overview of the De-identification service in Azure Health Data Services
3-
description: Learn how the De-identification service in Azure Health Data Services de-identifies clinical data, adhering to HIPAA compliance while retaining data relevance for research and analytics.
4-
author: kimiamavon-msft
2+
title: Overview of the De-identification Service in Azure Health Data Services
3+
description: Learn how the de-identification service in Azure Health Data Services de-identifies clinical data, adhering to HIPAA compliance while retaining data relevance for research and analytics.
4+
author: LeaKass
55
ms.service: azure-health-data-services
66
ms.subservice: deidentification-service
77
ms.topic: overview
88
ms.date: 10/24/2025
9-
ms.author: kimiamavon
9+
ms.author: leakassab
1010
---
1111

1212
# What is the de-identification service?
1313

14-
![Tag Redact and Surrogation operations.](tag-redact-surrogate-operations.png)
15-
14+
![Screenshot that shows Tag, Redact, and Surrogate operations.](tag-redact-surrogate-operations.png)
1615

1716
The de-identification service in Azure Health Data Services enables healthcare organizations to de-identify clinical data in [multiple languages](languages-supported.md) so that the resulting data retains its clinical relevance and distribution while also adhering to the:
18-
- Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule
19-
- unlinked pseudonymization principle under General Data Protection Regulation
2017

21-
The service uses state-of-the-art machine learning models to automatically extract, redact, or surrogate 27 entities (*including the HIPAA 18 Protected Health Information (PHI) identifiers*) from unstructured text such as clinical notes, transcripts, messages, or clinical trial studies.
18+
- Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy rule.
19+
- Unlinked pseudonymization principle under the General Data Protection Regulation (GDPR).
20+
21+
The service uses state-of-the-art machine learning models to automatically extract, redact, or surrogate 27 entities. These entities include the *18 HIPAA protected health information (PHI) identifiers*. The entities come from unstructured text such as clinical notes, transcripts, messages, or clinical trial studies.
2222

23-
## How do you benefit from de-identifying your data?
23+
## How do you benefit from de-identifying your data?
2424

25-
| As a | AHDS De-identification enables you to |
25+
| As a: | Health Data Services de-identification enables you to: |
2626
|-------------------------|----------------------------------------------------------------------------------------------------------|
27-
| Data Scientist | Use de-identified data to train robust machine learning models, build conversational agents, and conduct longitudinal studies. |
28-
| Data Analyst | Monitor trends, build dashboards, and analyze outcomes without compromising privacy. |
29-
| Data Engineer | Build and test dev environments using realistic, non-identifiable data for safer deployment. |
30-
| Customer Service Agent | Summarize support conversations and extract insights while maintaining patient confidentiality. |
31-
| Executive Leader (C-Suite) | Reduce risks of data exposure, enable secure data sharing, drive AI adoption responsibly, and ensure regulatory compliance. |
32-
| Regulatory & Compliance Officer | Ensure data handling aligns with HIPAA Safe Harbor and GDPR pseudonymization standards across multiple languages and geographies. |
27+
| Data scientist | Use de-identified data to train robust machine learning models, build conversational agents, and conduct longitudinal studies. |
28+
| Data analyst | Monitor trends, build dashboards, and analyze outcomes without compromising privacy. |
29+
| Data engineer | Build and test development environments by using realistic, nonidentifiable data for safer deployment. |
30+
| Customer service agent | Summarize support conversations and extract insights while maintaining patient confidentiality. |
31+
| Executive leader (C-suite) | Reduce risks of data exposure, enable secure data sharing, drive AI adoption responsibly, and ensure regulatory compliance. |
32+
| Regulatory and compliance officer | Ensure that data handling aligns with HIPAA Safe Harbor and GDPR pseudonymization standards across multiple languages and geographies. |
3333

34-
## Why is this service the right fit for your use case?
34+
## Why is this service the right fit for your use case?
3535

36-
The de-identification service unlocks the power of your data by automating three operations:
36+
The de-identification service unlocks the power of your data by automating three operations:
3737

38-
- **TAG** identifies and Tags PHI in your clinical text, specifying the entity types (i.e. Patient Name, Doctor Name, Age, etc.)
39-
- **REDACT** replaces the identified PHI in your clinical text with the entity types
40-
- **SURROGATE** replaces the identified PHI in your clinical text with realistic pseudonyms (names of people, organizations, hospitals) and randomizes number based PHI (dates and alphanumeric entities such as ID Numbers and more)
38+
- `TAG`: Identifies and tags PHI in your clinical text. It specifies entity types like patient name, doctor name, and age.
39+
- `REDACT`: Replaces the identified PHI in your clinical text with entity types.
40+
- `SURROGATE`: Replaces the identified PHI in your clinical text with realistic pseudonyms like names of people, organizations, and hospitals. It randomizes number-based PHI like dates and alphanumeric entities such as ID numbers.
4141

4242
> [!TIP]
43-
> **Surrogation**, or synthetic replacement, is a best practice for PHI protection. The service can replace PHI elements with plausible replacement values, resulting in data that is most representative of the source data. Surrogation strengthens privacy protections as any false-negative PHI values are hidden within a document.
43+
> *Surrogation*, or synthetic replacement, is a best practice for PHI protection. The service can replace PHI elements with plausible replacement values, which results in data that represents the source data most accurately. Surrogation strengthens privacy protections if any false-negative PHI values are hidden within a document.
4444
45-
### **Consistent replacement to preserve patient timelines**
46-
Consistent surrogation results enable organizations to retain relationships occurring in the underlying dataset, which is critical for research, analytics, and machine learning. By submitting data in the same batch, our service allows for consistent replacement across entities and preserves the relative temporal relationships between events.
45+
### Consistent replacement to preserve patient timelines
4746

48-
![Screenshot of consistent surrogation for English.](consistent-surrogation.png)
47+
Consistent surrogation results enable organizations to retain relationships that occur in the underlying dataset, which is critical for research, analytics, and machine learning. By submitting data in the same batch, Health Data Services allows for consistent replacement across entities and preserves the relative temporal relationships between events.
48+
49+
![Screenshot that shows consistent surrogation for English.](consistent-surrogation.png)
4950

5051
## De-identify clinical data securely and efficiently
5152

5253
The de-identification service offers many benefits, including:
5354

54-
- **Expanded PHI coverage:**
55-
The service expands beyond the 18 HIPAA Identifiers to provide stronger privacy protections and more fine-grained distinctions between entity types. It distinguishes between Doctor and Patient, and covers [27 PHI entities the service de-identifies](/rest/api/health-dataplane/deidentify-text/deidentify-text#phicategory).
56-
57-
- **PHI compliance**: The de-identification service is designed for protected health information (PHI). The service uses machine learning to identify PHI entities, including HIPAA’s 18 identifiers, using the “TAG” operation. The redaction and surrogation operations replace these identified PHI values with a tag of the entity type or a surrogate, or pseudonym. The service supports compliance requirements such as HIPAA and GDPR principles.
58-
59-
- **Security**: The de-identification service is a stateless service. Customer data stays within the customer’s tenant.
60-
61-
- **Role-based Access Control (RBAC)**: Azure role-based access control (RBAC) enables you to manage how your organization's data is processed, stored, and accessed. You determine who has access to de-identify datasets based on roles you define for your environment.
62-
63-
## Easy API Integration Into Your Workflow
55+
- **Expanded PHI coverage**: The service expands beyond the 18 HIPAA identifiers to provide stronger privacy protections and more fine-grained distinctions between entity types. It distinguishes between doctor and patient and covers [27 PHI entities that the service de-identifies](/rest/api/health-dataplane/deidentify-text/deidentify-text#phicategory).
56+
- **PHI compliance**: The de-identification service is designed for PHI. The service uses machine learning to identify PHI entities, including HIPAA's 18 identifiers, by using the `TAG` operation. The redaction and surrogation operations replace these identified PHI values with a tag of the entity type or a surrogate or pseudonym. The service supports compliance requirements such as HIPAA and GDPR principles.
57+
- **Security**: The de-identification service is a stateless service. Customer data stays within the customer's tenant.
58+
- **Role-based access control (RBAC)**: Azure RBAC enables you to manage how your organization's data is processed, stored, and accessed. You determine who has access to de-identify datasets based on roles that you define for your environment.
6459

65-
![API Integration Workflow](workflow.png)
60+
## Easy API integration into your workflow
6661

67-
Integrating Azure’s de-identification service into your environment is fast, flexible, and secure — built from the ground up to support health and life sciences workflows with minimal effort.
62+
![Screenshot that shows the API integration workflow.](workflow.png)
6863

69-
- **API-First Design:** Whether you need real-time de-identification or asynchronous batch processing from Azure Blob Storage, our REST API and SDKs provide easy integration points to fit your system.
64+
Integrating the Azure de-identification service into your environment is fast, flexible, and secure. The service is built to support health and life sciences workflows with minimal effort.
7065

71-
- **Quick Setup:** Deploy the service in minutes using Azure portal, ARM templates, Bicep, or CLI. You can be up and running quickly without complex configuration.
72-
73-
- **Secure Access:** Enable private endpoints using Azure Private Link to keep data traffic off the public internet.
74-
75-
- **Fully Managed Identity Support:** Use managed identities for secure, credential-free access to Azure Blob Storage.
76-
77-
- **Compliance-Ready:** The service operates within your Azure tenant and adheres with HIPAA.
66+
- **API-first design**: Determine whether you need real-time de-identification or asynchronous batch processing from Azure Blob Storage. The REST API and SDKs provide easy integration points to fit your system.
67+
- **Quick setup**: Deploy the service in minutes by using the Azure portal, Azure Resource Manager templates, Bicep, or the Azure CLI. You can be up and running quickly without complex configuration.
68+
- **Secure access**: Enable private endpoints by using Azure Private Link to keep data traffic off the public internet.
69+
- **Fully managed identity support**: Use managed identities for secure, credential-free access to Azure Blob Storage.
70+
- **Compliance-ready**: Operate the service within your Azure tenant and to adhere with HIPAA.
7871

7972
## Synchronous or asynchronous endpoints
8073

81-
The de-identification service offers two ways to interact with the REST API or Client library (Azure SDK).
74+
The de-identification service offers two ways to interact with the REST API or client library (Azure SDK):
8275

8376
- Directly submit raw unstructured text for analysis. The API output is returned in your application.
84-
- Submit a job to asynchronously endpoint process files in bulk from Azure Blob Storage using tag, redact, or surrogation with consistency within a job.
77+
- Submit a job for asynchronous endpoint processing of files in bulk from Blob Storage by using tag, redact, or surrogation with consistency within a job.
8578

8679
## Input requirements and service limits
8780

88-
The de-identification service is designed to receive unstructured text. To de-identify data stored in the FHIR® service, see [Export de-identified data](/azure/healthcare-apis/fhir/deidentified-export).
81+
The de-identification service is designed to receive unstructured text. To de-identify data stored in the Fast Healthcare Interoperability Resources service, see [Export de-identified data](/azure/healthcare-apis/fhir/deidentified-export).
82+
83+
The following service limits apply:
8984

90-
The following service limits are applicable:
9185
- Requests can't exceed 50 KB.
9286
- Jobs can process no more than 10,000 documents.
9387
- Each document processed by a job can't exceed 2 MB.
94-
- Requests are throttled if you exceed 1 MB per 5 seconds or 100 requests per 5 seconds.<sup>1</sup>
95-
96-
<sup>1</sup> If your use case requires higher throughput, please submit a support request for consideration.
88+
- Requests are throttled if you exceed 1 MB per 5 seconds or 100 requests per 5 seconds. If your use case requires higher throughput, submit a support request for consideration.
9789

9890
## Pricing
9991

100-
The de-identification service pricing is dependent on the amount of data de-identified by our service.
101-
You are charged per MB, for any of the three operations we offer, whether you are using the asynchronous or synchronous endpoint.
92+
The de-identification service pricing depends on the amount of data de-identified by Health Data Services. You're charged per MB for any of the three operations that are offered, whether you use the asynchronous or synchronous endpoint.
10293

103-
The cost per MB de-identified is displayed in the row "Unstructured De-identification" in the table "Transformation Operations" in the [Azure Pricing Page](https://azure.microsoft.com/pricing/details/health-data-services/?msockid=2982a916bc2461731022bd6cbdbd6053#pricing)
104-
105-
You also have a monthly allotment of 50 MB that enables you to try the product for free.
94+
The cost per MB de-identified appears in the **Unstructured De-identification** row in the **Transformation Operations (per MB)** table on the [Azure Health Data Services pricing webpage](https://azure.microsoft.com/pricing/details/health-data-services/?msockid=2982a916bc2461731022bd6cbdbd6053#pricing).
10695

107-
The [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) helps you estimate the cost based on your use case.
96+
You also have a monthly allotment of 50 MB so that you can try the product for free.
10897

109-
When you choose to store documents in Azure Blob Storage, you are charged based on Azure Storage pricing.
98+
The [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) helps you estimate the cost based on your use case.
11099

111-
## Responsible use of AI
100+
When you choose to store documents in Blob Storage, charges are based on Azure Storage pricing.
112101

113-
An AI system includes the technology, the people who use it, the people affected by it, and the environment where you deploy it. Read the transparency note for the de-identification service to learn about responsible AI use and deployment in your systems.
102+
## Responsible use of AI
114103

115-
## Next steps
104+
An AI system includes the technology, the people who use it, the people affected by it, and the environment where you deploy it. To learn about responsible AI use and deployment in your systems, read the Transparency Note for the de-identification service.
116105

117-
> [!div class="nextstepaction"]
118-
> [Quickstart: Deploy the de-identification service](quickstart.md)
106+
## Related content
119107

108+
- [Quickstart: Deploy the de-identification service](quickstart.md)
120109
- [Integration and responsible use](/legal/cognitive-services/language-service/guidance-integration-responsible-use?context=%2Fazure%2Fai-services%2Flanguage-service%2Fcontext%2Fcontext)
121110
- [Data, privacy, and security](/legal/cognitive-services/language-service/data-privacy?context=%2Fazure%2Fai-services%2Flanguage-service%2Fcontext%2Fcontext)
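
Reviewer note on the "Input requirements and service limits" section in the diff above: the limits it lists (50 KB per request, 10,000 documents per job, 2 MB per document) can be checked on the client before anything is submitted. The following is a minimal sketch and not part of the service's SDK. The function names `fits_sync_request` and `plan_jobs` are hypothetical, the 1,024-based reading of "KB" and "MB" is an assumption, and measuring only the text payload (not the full HTTP request body) is also an assumption.

```python
from typing import Iterable, List

# Limits quoted from the article text. Treating KB/MB as 1,024-based
# units is an assumption; the article does not say which base it uses.
MAX_REQUEST_BYTES = 50 * 1024        # "Requests can't exceed 50 KB."
MAX_DOCS_PER_JOB = 10_000            # "Jobs can process no more than 10,000 documents."
MAX_DOC_BYTES = 2 * 1024 * 1024      # "Each document ... can't exceed 2 MB."


def fits_sync_request(text: str) -> bool:
    """Return True if the UTF-8 encoded text is within the request size limit.

    Assumption: only the text payload is measured, not any JSON envelope.
    """
    return len(text.encode("utf-8")) <= MAX_REQUEST_BYTES


def plan_jobs(doc_sizes: Iterable[int],
              max_docs: int = MAX_DOCS_PER_JOB) -> List[List[int]]:
    """Group document indexes into jobs of at most max_docs documents each.

    doc_sizes holds the size of each document in bytes. A document over the
    per-document limit raises ValueError instead of being silently skipped.
    """
    jobs: List[List[int]] = []
    for index, size in enumerate(doc_sizes):
        if size > MAX_DOC_BYTES:
            raise ValueError(
                f"document {index} is {size} bytes, over the {MAX_DOC_BYTES}-byte limit"
            )
        # Start a new job when there is none yet or the current one is full.
        if not jobs or len(jobs[-1]) >= max_docs:
            jobs.append([])
        jobs[-1].append(index)
    return jobs
```

Text that fails `fits_sync_request` would be a candidate for the asynchronous job path the article describes, and `plan_jobs` shows how a larger corpus might be split so that no single job exceeds the document count limit. Note that, per the article, surrogation is consistent only within a job, so splitting a corpus across jobs may affect consistency.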
