microsoft-365/compliance/create-a-custom-sensitive-information-type-in-scc-powershell.md
Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ ms.author: chrfox
 author: chrfox
 manager: laurawi
 audience: Admin
-ms.article: article
+ms.topic: article
 ms.service: O365-seccomp
 ms.localizationpriority: medium
 ms.collection:
@@ -230,7 +230,7 @@ Sensitive information types can also use built-in functions to identify corrobor
 
 For example, an employee ID badge has a hire date on it, so this custom entity can use the built-in `Func_us_date` function to identify a date in the format that's commonly used in the US.
 
-For more information, see [What the DLP functions look for](what-the-dlp-functions-look-for.md).
+For more information, see [Sensitive information type functions](sit-functions.md).
 
 
 
@@ -320,7 +320,7 @@ In addition to confidenceLevel for each Pattern, the Entity has a recommendedCon
 
 ## Do you want to support other languages in the UI of the Compliance center? [LocalizedStrings element]
 
-If your compliance team uses the Microsoft 365 Compliance center to create polices policies in different locales and in different languages, you can provide localized versions of the name and description of your custom sensitive information type. When your compliance team uses Microsoft 365 in a language that you support, they'll see the localized name in the UI.
+If your compliance team uses the Microsoft 365 Compliance center to create policies in different locales and in different languages, you can provide localized versions of the name and description of your custom sensitive information type. When your compliance team uses Microsoft 365 in a language that you support, they'll see the localized name in the UI.
 
 
 
@@ -907,4 +907,4 @@ You can copy this markup, save it as an XSD file, and use it to validate your ru
 
 - [Learn about data loss prevention](dlp-learn-about-dlp.md)
 - [Sensitive information type entity definitions](sensitive-information-type-entity-definitions.md)
-- [What the DLP functions look for](what-the-dlp-functions-look-for.md)
+- [Sensitive information type functions](sit-functions.md)
-title: "Get started with custom sensitive information types"
+title: "Create a custom sensitive information types"
 f1.keywords:
 - NOCSH
 ms.author: chrfox
@@ -15,10 +15,10 @@ ms.collection:
 search.appverid:
 - MOE150
 - MET150
-description: "Learn how to create, modify, remove, and test custom sensitive information types for DLP in the Security & Compliance Center."
+description: "Learn how to create, modify, remove, and test custom sensitive information types in the Compliance Center."
 ms.custom: seo-marvel-apr2020
 ---
-# Get started with custom sensitive information types
+# Create custom sensitive information types in the Compliance center
 
 If the pre-configured sensitive information types don't meet your needs, you can create your own custom sensitive information types that you fully define or you can copy one of the pre-configured ones and modify it.
 
@@ -36,7 +36,7 @@ There are two ways to create a new sensitive information type:
 - [regular expressions](https://www.boost.org/doc/libs/1_68_0/libs/regex/doc/html/) - Microsoft 365 sensitive information types uses the Boost.RegEx 5.1.3 engine
 - keyword lists - you can create your own as you define your sensitive information type or choose from existing keyword lists
 - You must have Global admin or Compliance admin permissions to create, test, and deploy a custom sensitive information type through the UI. See [About admin roles](/office365/admin/add-users/about-admin-roles) in Office 365.
@@ -59,13 +59,13 @@ Use this procedure to create a new sensitive information type that you fully def
 
 4. Choose the default confidence level for the pattern. The values are **Low confidence**, **Medium confidence**, and **High confidence**.
 
-5. Choose and define **Primary element**. The primary element can be a **Regular expression** with an optional validator, a **Keyword list**, a **Keyword dictionary**, or one of the pre-configured **Functions**. For more information on DLP functions, see [What the DLP functions look for](what-the-dlp-functions-look-for.md). For more information on the date and the checksum validators, see [More information on regular expression validators](#more-information-on-regular-expression-validators).
+5. Choose and define **Primary element**. The primary element can be a **Regular expression** with an optional validator, a **Keyword list**, a **Keyword dictionary**, or one of the pre-configured **Functions**. For more information on DLP functions, see [Sensitive information type functions](sit-functions.md). For more information on the date and the checksum validators, see [Sensitive Information Type regular expression validators](sit-regex-validators-additional-checks.md#sensitive-information-type-regular-expression-validators).
 
 6. Fill in a value for **Character proximity**.
 
 7. (Optional) Add supporting elements if you have any. Supporting elements can be a regular expression with an optional validator, a keyword list, a keyword dictionary or one of the pre-defined functions. Supporting elements can have their own **Character proximity** configuration.
 
-8. (Optional) Add any [**additional checks**](#more-information-on-additional-checks) from the list of available checks.
+8. (Optional) Add any [**additional checks**](sit-regex-validators-additional-checks.md#sensitive-information-type-additional-checks) from the list of available checks.
 
 9. Choose **Create**.
 
@@ -84,6 +84,22 @@ Use this procedure to create a new sensitive information type that you fully def
 
 Use this procedure to create a new sensitive information type that is based on an existing sensitive information type.
 
+> [!NOTE]
+> These SITs can't be copied:
+> - Canada driver's license number
+> - EU driver's license number
+> - EU national identification number
+> - EU passport number
+> - EU social security number or equivalent identification
+> - EU tax identification number
+> - International classification of diseases (ICD-10-CM)
+> - International classification of diseases (ICD-9-CM)
+> - U.S. driver's license number
+
+You can also create custom sensitive information types by using PowerShell and Exact Data Match capabilities. To learn more about those methods, see:
+- [Create a custom sensitive information type in Security & Compliance Center PowerShell](create-a-custom-sensitive-information-type-in-scc-powershell.md)
+- [Learn about exact data match based sensitive information types](sit-learn-about-exact-data-match-based-sits.md#learn-about-exact-data-match-based-sensitive-information-types)
+
 1. In the Compliance Center, go to **Data classification**\>**Sensitive info types** and choose the sensitive information type that you want to copy.
 
 2. In the flyout, choose **Copy**.
@@ -98,11 +114,11 @@ Use this procedure to create a new sensitive information type that is based on a
 
 7. You can choose to edit or remove the existing patterns and add new ones. Choose the default confidence level for the new pattern. The values are **Low confidence**, **Medium confidence**, and **High confidence**.
 
-8. Choose and define **Primary element**. The primary element can be a **Regular expression**, a **Keyword list**, a **Keyword dictionary**, or one of the pre-configured **Functions**. See, [What the DLP functions look for](what-the-dlp-functions-look-for.md).
+8. Choose and define **Primary element**. The primary element can be a **Regular expression**, a **Keyword list**, a **Keyword dictionary**, or one of the pre-configured **Functions**. See, [Sensitive information type functions](sit-functions.md).
 
 9. Fill in a value for **Character proximity**.
 
-10. (Optional) If you have **Supporting elements** or any [**Additional checks**](#more-information-on-additional-checks) add them. If needed you can group your **Supporting elements**.
+10. (Optional) If you have **Supporting elements** or any [**additional checks**](sit-regex-validators-additional-checks.md#sensitive-information-type-additional-checks) add them. If needed you can group your **Supporting elements**.
 
 11. Choose **Create**.
 
@@ -163,111 +179,6 @@ For a scanned item to satisfy rule criteria, the number of unique instances of a
 
 For example, if you want the rule to trigger a match when at least 500 unique instances of a SIT are found in a single item, set the **min** value to `500` and the **max** value to `Any`.
 
-## Modify custom sensitive information types in the Compliance Center
-
-1. In the Compliance Center, go to **Data classification**\>**Sensitive info types** and choose the sensitive information type from the list that you want to modify choose **Edit**.
-
-2. You can add other patterns, with unique primary and supporting elements, confidence levels, character proximity, and [**additional checks**](#more-information-on-additional-checks) or edit/remove the existing ones.
-
-## Remove custom sensitive information types in the Compliance Center
-
-> [!NOTE]
-> You can only remove custom sensitive information types; you can't remove built-in sensitive information types.
-
-> [!IMPORTANT]
-> Before your remove a custom sensitive information type, verify that no DLP policies or Exchange mail flow rules (also known as transport rules) still reference the sensitive information type.
-
-1. In the Compliance Center, go to **Data classification**\>**Sensitive info types** and choose the sensitive information type from the list that you want to remove.
-
-2. In the fly-out that opens, choose **Delete**.
-
-> [!NOTE]
-> These SITs can't be copied:
-> - Canada driver's license number
-> - EU driver's license number
-> - EU national identification number
-> - EU passport number
-> - EU social security number or equivalent identification
-> - EU tax identification number
-> - International classification of diseases (ICD-10-CM)
-> - International classification of diseases (ICD-9-CM)
-> - U.S. driver's license number
-
-You can also create custom sensitive information types by using PowerShell and Exact Data Match capabilities. To learn more about those methods, see:
-- [Create a custom sensitive information type in Security & Compliance Center PowerShell](create-a-custom-sensitive-information-type-in-scc-powershell.md)
-- [Learn about exact data match based sensitive information types](sit-learn-about-exact-data-match-based-sits.md#learn-about-exact-data-match-based-sensitive-information-types)
-
-## More information on regular expression validators
-
-### Checksum validator
-
-If you need to run a checksum on a digit in a regular expression, you can use the *checksum validator*. For example, say you need to create a SIT for an eight digit license number where the last digit is a checksum digit that is validated using a mod 9 calculation. You've set up the checksum algorithm like this:
-1. Define the primary element with this regular expression:
-
-```console
-\d{8}
-```
-
-2. Then add the checksum validator.
-
-3. Add the weight values separated by commas, the position of the check digit and the Mod value. For more information on the Modulo operation, see [Modulo operation](https://en.wikipedia.org/wiki/Modulo_operation).
-
-> [!NOTE]
-> If the check digit is not part of the checksum calculation then use 0 as the weight for the check digit. For example, in the above case weight 8 will be equal to 0 if the check digit is not to be used for calculating the check digit. Modulo_operation).
-
-:::image type="content" alt-text="screenshot of configured checksum validator." source="../media/checksum-validator.png" lightbox="../media/checksum-validator.png":::
-
-### Date validator
-
-If a date value that is embedded in regular expression is part of a new pattern you are creating, you can use the *date validator* to test that it meets your criteria. For example, say you want to create a SIT for a nine digit employee identification number. The first six digits are the date of hire in DDMMYY format and the last three are randomly generated numbers. To validate that the first six digits are in the correct format.
-
-1. Define the primary element with this regular expression:
-
-```console
-\d{9}
-```
-
-2. Then add the date validator.
-
-3. Select the date format and the start offset. Since the date string is the first six digits, the offset is `0`.
-
-:::image type="content" alt-text="screenshot of configured date validator." source="../media/date-validator.png" lightbox="../media/date-validator.png":::
-
-### Functional processors as validators
-
-You can use function processors for some of the most commonly used SITs as validators. This allows you to define your own regular expression while ensuring they pass the additional checks required by the SIT. For example, Func_India_Aadhar will ensure that the custom regular expression defined by you passes the validation logic required for Indian Aadhar card. For more information on DLP functions that can be used as validators, see [What the DLP functions look for](what-the-dlp-functions-look-for.md#what-the-dlp-functions-look-for).
-
-### Luhn check validator
-
-You can use the Luhn check validator if you have a custom Sensitive information type that includes a regular expression which should pass the [Luhn algorithm](https://en.wikipedia.org/wiki/Luhn_algorithm).
-
-## More information on additional checks
-
-Here are the definitions and some examples for the available additional checks.
-
-**Exclude specific matches**: This check lets you define keywords to exclude when detecting matches for the pattern you are editing. For example, you might exclude test credit card numbers like '4111111111111111' so that they're not matched as a valid number.
-
-**Starts or doesn't start with characters**: This check lets you define the characters that the matched items must or must not start with. For example, if you want the pattern to detect only credit card numbers that start with 41, 42, or 43, select **Starts with** and add 41, 42, and 43 to the list, separated by commas.
-
-**Ends or doesn't end with characters**: This check lets you define the characters that the matched items must or must not end with. For example, if your Employee ID number cannot end with 0 or 1, select **Doesn't end with** and add 0 and 1 to the list, separated by commas.
-
-**Exclude duplicate characters**: This check lets you ignore matches in which all the digits are the same. For example, if the six digit employee ID number cannot have all the digits be the same, you can select **Exclude duplicate characters** to exclude 111111, 222222, 333333, 444444, 555555, 666666, 777777, 888888, 999999, and 000000 from the list of valid matches for the employee ID.
-
-**Include or exclude prefixes**: This check lets you define the keywords that must or must not be found immediately before the matching entity. Depending on your selection, entities will be matched or not matched if they're preceded by the prefixes you include here. For example, if you **Exclude** the prefix **GUID:**, any entity that's preceded by **GUID:** won't be considered a match.
-
-**Include or exclude suffixes** This check lets you define the keywords that must or must not be found immediately after the matching entity. Depending on your selection, entities will be matched or not matched if they're followed by the suffixes you include here. For example, if you **Exclude** the suffix **:GUID**, any text that's followed by **:GUID** won't be matched.
-
-
 > [!NOTE]
 > Microsoft 365 Information Protection supports double byte character set languages for:
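The context lines of the last hunk describe the **min**/**max** instance count. A rule matches when the number of unique SIT instances in a single item falls within the configured range, for example min `500` and max `Any`. The Python sketch below illustrates that counting logic. It rests on two assumptions that the source does not state: a "unique instance" is a distinct matched string, and `None` stands in for `Any`. `satisfies_instance_count` is a hypothetical name and is not part of any Microsoft 365 API.

```python
import re
from typing import Optional


def satisfies_instance_count(text: str, pattern: str,
                             min_count: int, max_count: Optional[int]) -> bool:
    """Hypothetical sketch of the min/max unique-instance rule.

    ASSUMPTIONS: a "unique instance" is a distinct matched string, and
    max_count=None represents the UI value "Any" (no upper bound).
    """
    # Collect each distinct match once, so a repeated value counts only once.
    unique_matches = set(re.findall(pattern, text))
    count = len(unique_matches)
    if count < min_count:
        return False
    return max_count is None or count <= max_count


if __name__ == "__main__":
    item = "IDs: 12345678, 12345678, 87654321"
    # Two unique 8-digit values: min=2/max=Any is satisfied, min=3 is not.
    print(satisfies_instance_count(item, r"\b\d{8}\b", 2, None))
    print(satisfies_instance_count(item, r"\b\d{8}\b", 3, None))
```

With the source's example, you would call it with `min_count=500` and `max_count=None`.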
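The *Checksum validator* text removed in this commit describes an eight-digit license number whose last digit is a check digit. The validator takes comma-separated weights, the position of the check digit, and a modulus (mod 9 in the source example), and uses weight `0` for a check digit that takes no part in the calculation. The source does not spell out the exact formula. The sketch below therefore *assumes* the common weighted-sum scheme, in which the check digit must equal the sum of `weight × digit` taken modulo the modulus. The weights `1,2,3,4,5,6,7,0` and the function name are hypothetical and were chosen only for illustration.

```python
from typing import Sequence


def checksum_valid(value: str, weights: Sequence[int],
                   check_position: int, modulus: int) -> bool:
    """Hypothetical weighted-modulus check.

    ASSUMPTION: the digit at `check_position` (1-based) must equal
    sum(weight_i * digit_i) % modulus. A weight of 0 keeps a digit,
    such as the check digit itself, out of the sum.
    """
    if len(value) != len(weights) or not all(c in "0123456789" for c in value):
        return False
    digits = [int(c) for c in value]
    total = sum(w * d for w, d in zip(weights, digits))
    return total % modulus == digits[check_position - 1]


# Eight digits, check digit in position 8, mod 9, weight 0 on the check digit.
WEIGHTS = [1, 2, 3, 4, 5, 6, 7, 0]
print(checksum_valid("12345675", WEIGHTS, check_position=8, modulus=9))
print(checksum_valid("12345674", WEIGHTS, check_position=8, modulus=9))
```

Here 1×1 + 2×2 + 3×3 + 4×4 + 5×5 + 6×6 + 7×7 = 140, and 140 mod 9 = 5. Under this assumed scheme, `12345675` passes and `12345674` fails.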
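The *Date validator* text removed in this commit checks a nine-digit employee ID. The first six digits of that ID are a hire date in DDMMYY format, read from start offset `0`. Below is a minimal Python sketch of the same idea. It assumes the validator only confirms that the six characters at the offset form a real calendar date. The source does not describe the service's own date rules, and `date_valid` is a hypothetical name.

```python
from datetime import datetime


def date_valid(value: str, start_offset: int = 0,
               date_format: str = "%d%m%y") -> bool:
    """Hypothetical sketch: True if the 6 digits at start_offset are a real DDMMYY date."""
    date_part = value[start_offset:start_offset + 6]
    # Require exactly six ASCII digits before handing the text to strptime.
    if len(date_part) != 6 or not all(c in "0123456789" for c in date_part):
        return False
    try:
        datetime.strptime(date_part, date_format)  # rejects dates such as 31 February
    except ValueError:
        return False
    return True


print(date_valid("150321987"))  # 15 March 2021 followed by the random part 987
print(date_valid("310221987"))  # 31 February does not exist
```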
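The *Luhn check validator* text removed in this commit says that the validator applies to a custom SIT whose regular expression should pass the Luhn algorithm. For reference, here is the standard Luhn (mod 10) check in Python. It is the textbook algorithm, not Microsoft's implementation, and `luhn_valid` is a hypothetical name. The test card number `4111111111111111`, which the removed *Exclude specific matches* text suggests excluding, passes this check. That result is consistent with the source's advice to exclude such a number explicitly.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn (mod 10) check over a string of ASCII digits."""
    if not number or not all(c in "0123456789" for c in number):
        return False
    total = 0
    # Walk right to left and double every second digit, starting with the
    # second-from-right. Subtract 9 from any doubled value above 9.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))
print(luhn_valid("4111111111111112"))
```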
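The *More information on additional checks* text removed in this commit defines several checks that run on top of a pattern match. The Python sketch below approximates the variants used in the source's examples for a regex match inside a text: exclude specific matches, starts with, doesn't end with, exclude duplicate characters, and exclude prefixes or suffixes. It is illustrative only. All function and parameter names are hypothetical. The sketch also assumes that whitespace may separate a prefix or suffix from the match, which the source does not specify.

```python
import re
from typing import Iterable, List, Optional


def passes_additional_checks(
    text: str,
    m: re.Match,
    *,
    excluded_values: Iterable[str] = (),
    starts_with: Optional[Iterable[str]] = None,
    not_ends_with: Iterable[str] = (),
    exclude_duplicate_chars: bool = False,
    excluded_prefixes: Iterable[str] = (),
    excluded_suffixes: Iterable[str] = (),
) -> bool:
    """Hypothetical approximation of the UI's additional checks for one match."""
    value = m.group(0)
    # Exclude specific matches, such as a known test card number.
    if value in set(excluded_values):
        return False
    # Starts with: keep only values that begin with one of the listed strings.
    if starts_with is not None and not value.startswith(tuple(starts_with)):
        return False
    # Doesn't end with: drop values that end with a listed string.
    if value.endswith(tuple(not_ends_with)):
        return False
    # Exclude duplicate characters: drop values such as 111111.
    if exclude_duplicate_chars and len(set(value)) == 1:
        return False
    # Excluded prefix/suffix next to the match (ignoring whitespace is an ASSUMPTION).
    before = text[:m.start()].rstrip()
    after = text[m.end():].lstrip()
    if before.endswith(tuple(excluded_prefixes)):
        return False
    if after.startswith(tuple(excluded_suffixes)):
        return False
    return True


def find_matches(text: str, pattern: str, **checks) -> List[str]:
    """Return the regex matches in text that pass every configured check."""
    return [m.group(0) for m in re.finditer(pattern, text)
            if passes_additional_checks(text, m, **checks)]


sample = "ID 413337, ID 411110, ID 222222, GUID: 431234, ID 429995"
print(find_matches(sample, r"\b\d{6}\b",
                   starts_with=("41", "42", "43"),
                   not_ends_with=("0", "1"),
                   excluded_prefixes=("GUID:",)))
```

In the sample, `411110` fails the "doesn't end with" check, `222222` fails "starts with", and `431234` follows the excluded prefix `GUID:`. Only `413337` and `429995` remain.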