The `llm-content-safety` policy enforces content safety checks on large language model (LLM) requests (prompts) or responses (completions) by sending them to the [Azure AI Content Safety](/azure/ai-services/content-safety/overview) service. When you enable the policy and Azure AI Content Safety detects malicious content, API Management blocks the request or response and returns a `403` error code.
> [!NOTE]
> The terms _category_ and _categories_ used in API Management are synonymous with _harm category_ and _harm categories_ in the Azure AI Content Safety service. For more information, see [Harm categories in Azure AI Content Safety](/azure/ai-services/content-safety/concepts/harm-categories).
Use the policy in scenarios such as the following:
* Block requests or responses that contain predefined categories of harmful content or hate speech.
* Apply custom blocklists to prevent specific content from being sent or received.
* Shield against prompts that match attack patterns.
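For instance, the first scenario can be sketched as a policy fragment like the following. This is a minimal sketch, not the full example shown later in this article: the backend name `content-safety-backend` is a placeholder, and the category and threshold values are illustrative.

```xml
<inbound>
    <!-- Block prompts that Azure AI Content Safety scores at severity 4 or higher
         for the Hate category; lower severities pass through to the backend LLM. -->
    <llm-content-safety backend-id="content-safety-backend">
        <categories output-type="EightSeverityLevels">
            <category name="Hate" threshold="4" />
        </categories>
    </llm-content-safety>
</inbound>
```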
| backend-id | Identifier (name) of the Azure AI Content Safety backend to route content-safety API calls to. Policy expressions are allowed. | Yes | N/A |
| shield-prompt | If set to `true`, check content for user attacks. Otherwise, skip this check. Policy expressions are allowed. | No |`false`|
| enforce-on-completions | If set to `true`, enforce content safety checks on chat completions for response validation. Otherwise, skip this check. When you set the policy in the outbound section, this attribute is ignored. Policy expressions are allowed. | No |`false`|
| window-size | The size of text windows in characters that the policy sends to Azure AI Content Safety for evaluation. If you don't specify a value, the entire content is sent as one window. Policy expressions are allowed. | No | N/A |
| window-overlap-size | The size of overlaps in characters between text windows when the content is split by using the `window-size` attribute. If you don't specify a value, windows don't overlap. Policy expressions are allowed. | No | N/A |
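Combining the attributes above, a hedged sketch of a policy that also validates completions, splitting long content into overlapping windows, might look like the following. The attribute values are illustrative and `content-safety-backend` is an assumed backend name.

```xml
<inbound>
    <!-- Check prompts for attacks and, via enforce-on-completions, also check
         the model's responses. Content is sent to Azure AI Content Safety in
         1000-character windows that overlap by 100 characters. -->
    <llm-content-safety backend-id="content-safety-backend"
                        shield-prompt="true"
                        enforce-on-completions="true"
                        window-size="1000"
                        window-overlap-size="100">
        <categories output-type="EightSeverityLevels">
            <category name="Violence" threshold="4" />
        </categories>
    </llm-content-safety>
</inbound>
```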
## Elements
| name | Specifies the name of this category. The attribute must have one of the following values: `Hate`, `SelfHarm`, `Sexual`, `Violence`. Policy expressions are allowed. | Yes | N/A |
| threshold | Specifies the threshold value for this category at which requests or responses are blocked. Requests or responses with content severities less than the threshold aren't blocked. The value must be between 0 (most restrictive) and 7 (least restrictive). Policy expressions are allowed. | Yes | N/A |
* Unless you specify `window-size`, the policy runs on a concatenation of all text content in a completion or chat completion request or response. If you specify `window-size`, the policy runs on windows of text content with the specified size and overlaps.
* If the request or response exceeds the character limit of Azure AI Content Safety, the policy returns a `403` error.
* You can use this policy multiple times per policy definition.
## Example
The following example, when configured in the inbound section, enforces content safety checks on LLM requests by using the Azure AI Content Safety service. The policy blocks requests that contain speech in the `Hate` or `Violence` category with a severity level of 4 or higher. In other words, the filter allows levels 0-3 to continue whereas levels 4-7 are blocked. Raising a category's threshold raises the tolerance and potentially decreases the number of blocked requests. Lowering the threshold lowers the tolerance and potentially increases the number of blocked requests. The `shield-prompt` attribute is set to `true` to check for adversarial attacks.
```xml
<policies>
    <inbound>
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
        </llm-content-safety>
    </inbound>
</policies>
```