Commit ca79f45

Merge pull request #313200 from dlepow/danlep-patch-637793
[APIM] Content-safety policy in outbound
2 parents d474359 + e175581 commit ca79f45

1 file changed

Lines changed: 19 additions & 16 deletions

articles/api-management/llm-content-safety-policy.md

@@ -8,7 +8,7 @@ ms.service: azure-api-management
 ms.collection: ce-skilling-ai-copilot
 ms.custom:
 ms.topic: reference
-ms.date: 09/03/2025
+ms.date: 03/23/2026
 ms.update-cycle: 180-days
 ms.author: danlep
 ---
@@ -17,16 +17,16 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-premium-dev-standard-basic-premiumv2-standardv2-basicv2](../../includes/api-management-availability-premium-dev-standard-basic-premiumv2-standardv2-basicv2.md)]
 
-The `llm-content-safety` policy enforces content safety checks on large language model (LLM) requests (prompts) by transmitting them to the [Azure AI Content Safety](/azure/ai-services/content-safety/overview) service before sending to the backend LLM API. When the policy is enabled, and Azure AI Content Safety detects malicious content, API Management blocks the request and returns a `403` error code.
+The `llm-content-safety` policy enforces content safety checks on large language model (LLM) requests (prompts) or responses (completions) by sending them to the [Azure AI Content Safety](/azure/ai-services/content-safety/overview) service. When you enable the policy and Azure AI Content Safety detects malicious content, API Management blocks the request or response and returns a `403` error code.
 
 > [!NOTE]
-> The terms _category_ and _categories_ used in API Management are synonymous with _harm category_ and _harm categories_ in the Azure AI Content Safety service. Details can be found on the [Harm categories in Azure AI Content Safety](/azure/ai-services/content-safety/concepts/harm-categories) page.
+> The terms _category_ and _categories_ used in API Management are synonymous with _harm category_ and _harm categories_ in the Azure AI Content Safety service. For more information, see [Harm categories in Azure AI Content Safety](/azure/ai-services/content-safety/concepts/harm-categories).
 
 Use the policy in scenarios such as the following:
 
-* Block requests that contain predefined categories of harmful content or hate speech
-* Apply custom blocklists to prevent specific content from being sent
-* Shield against prompts that match attack patterns
+* Block requests or responses that contain predefined categories of harmful content or hate speech.
+* Apply custom blocklists to prevent specific content from being sent or received.
+* Shield against prompts that match attack patterns.
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
@@ -41,7 +41,7 @@ Use the policy in scenarios such as the following:
 ## Policy statement
 
 ```xml
-<llm-content-safety backend-id="name of backend entity" shield-prompt="true | false" enforce-on-completions="true | false">
+<llm-content-safety backend-id="name of backend entity" shield-prompt="true | false" enforce-on-completions="true | false" window-size="integer" window-overlap-size="integer">
     <categories output-type="FourSeverityLevels | EightSeverityLevels">
         <category name="Hate | SelfHarm | Sexual | Violence" threshold="integer" />
         <!-- If there are multiple categories, add more category elements -->
@@ -60,8 +60,10 @@ Use the policy in scenarios such as the following:
 | Attribute | Description | Required | Default |
 | -------------- | ----------------------------------------------------------------------------------------------------- | -------- | ------- |
 | backend-id | Identifier (name) of the Azure AI Content Safety backend to route content-safety API calls to. Policy expressions are allowed. | Yes | N/A |
-| shield-prompt | If set to `true`, content is checked for user attacks. Otherwise, skip this check. Policy expressions are allowed. | No | `false` |
-| enforce-on-completions| If set to `true`, content safety checks are enforced on chat completions for response validation. Otherwise, skip this check. Policy expressions are allowed. | No | `false` |
+| shield-prompt | If set to `true`, check content for user attacks. Otherwise, skip this check. Policy expressions are allowed. | No | `false` |
+| enforce-on-completions| If set to `true` when you set the policy in the inbound section for content safety checks on requests, enforce content safety checks also on chat completions for response validation. When you set the policy in the outbound section for content safety checks on responses, this attribute is ignored. Policy expressions are allowed. | No | `false` |
+| window-size | The size of the text window in characters that the policy sends to Azure AI Content Safety for evaluation. Configurable only for responses; for requests, the default window size is always used. Policy expressions are allowed. | No | 10,000 characters (Azure AI Content Safety limit) |
+| window-overlap-size | The size of the overlap in characters between text windows when the content is split by using the `window-size` attribute. If you don't specify a value, windows don't overlap. Policy expressions are allowed. | No | N/A |
 
 
 ## Elements
@@ -83,24 +85,25 @@ Use the policy in scenarios such as the following:
 | Attribute | Description | Required | Default |
 | -------------- | ----------------------------------------------------------------------------------------------------- | -------- | ------- |
 | name | Specifies the name of this category. The attribute must have one of the following values: `Hate`, `SelfHarm`, `Sexual`, `Violence`. Policy expressions are allowed. | Yes | N/A |
-| threshold | Specifies the threshold value for this category at which request are blocked. Requests with content severities less than the threshold aren't blocked. The value must be between 0 (most restrictive) and 7 (least restrictive). Policy expressions are allowed. | Yes | N/A |
+| threshold | Specifies the threshold value for this category at which requests or responses are blocked. Requests with content severities less than the threshold aren't blocked. The value must be between 0 (most restrictive) and 7 (least restrictive). Policy expressions are allowed. | Yes | N/A |
 
 
 ## Usage
 
-- [**Policy sections:**](./api-management-howto-policies.md#understanding-policy-configuration) inbound
+- [**Policy sections:**](./api-management-howto-policies.md#understanding-policy-configuration) inbound, outbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API
 - [**Gateways:**](api-management-gateways-overview.md) classic, v2, consumption, self-hosted, workspace
 
 ### Usage notes
 
-* The policy runs on a concatenation of all text content in a completion or chat completion request.
-* If the request exceeds the character limit of Azure AI Content Safety, a `403` error is returned.
-* This policy can be used multiple times per policy definition.
+* Configure the policy in the inbound section to check requests and in the outbound section to check responses.
+* For streaming responses, the stream handler buffers events in a sliding window and, if a content safety violation is detected, stops forwarding further events to the client. A `403` error isn't returned in this case.
+* If the request or response exceeds the character limit of Azure AI Content Safety, the policy returns a `403` error.
+* You can use this policy multiple times per policy definition.
 
 ## Example
 
-The following example enforces content safety checks on LLM requests using the Azure AI Content Safety service. The policy blocks requests that contain speech in the `Hate` or `Violence` category with a severity level of 4 or higher. In other words, the filter allows levels 0-3 to continue whereas levels 4-7 are blocked. Raising a category's threshold raises the tolerance and potentially decreases the number of blocked requests. Lowering the threshold lowers the tolerance and potentially increases the number of blocked requests. The `shield-prompt` attribute is set to `true` to check for adversarial attacks.
+The following example, when configured in the inbound section, enforces content safety checks on LLM requests by using the Azure AI Content Safety service. The policy blocks requests that contain speech in the `Hate` or `Violence` category with a severity level of 4 or higher. In other words, the filter allows levels 0-3 to continue whereas levels 4-7 are blocked. Raising a category's threshold raises the tolerance and potentially decreases the number of blocked requests. Lowering the threshold lowers the tolerance and potentially increases the number of blocked requests. The `shield-prompt` attribute is set to `true` to check for adversarial attacks.
 
 ```xml
 <policies>
@@ -117,7 +120,7 @@ The following example enforces content safety checks on LLM requests using the A
 
 ## Related policies
 
-* [Content validation](api-management-policies.md#content-validation)
+* [Content validation](api-management-policies.md#content-validation) policies
 * [llm-token-limit](llm-token-limit-policy.md) policy
 * [llm-emit-token-metric](llm-emit-token-metric-policy.md) policy
 
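The updated usage notes say to configure the policy in the inbound section to check requests and in the outbound section to check responses. They also say the policy can be used multiple times per policy definition. The following minimal sketch shows one policy definition that does both. It isn't part of the commit. The backend name `content-safety-backend` is hypothetical. The `window-size` and `window-overlap-size` values are also hypothetical, chosen only to stay under the 10,000-character Azure AI Content Safety limit cited in the attribute table.

```xml
<policies>
    <inbound>
        <base />
        <!-- Check prompts before they reach the LLM backend, including prompt-attack detection -->
        <!-- "content-safety-backend" is a hypothetical backend name -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
        </llm-content-safety>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <!-- Check completions; window-size is configurable only for responses -->
        <!-- Hypothetical values: 5,000-character windows that overlap by 500 characters -->
        <llm-content-safety backend-id="content-safety-backend" window-size="5000" window-overlap-size="500">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
        </llm-content-safety>
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

The updated attribute table says `enforce-on-completions` is ignored in the outbound section. In this sketch, response checking therefore comes from the outbound instance, and the windowing attributes are set only there. Assume each window starts `window-size` minus `window-overlap-size` characters after the previous one. With these values, any phrase of up to 500 characters that straddles a window boundary then falls entirely within the next window.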
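The updated `enforce-on-completions` description also allows a single instance in the inbound section. That instance checks the request and then validates the chat completion. The following minimal sketch shows that variant. It isn't part of the commit, and the backend name `content-safety-backend` and the category thresholds are hypothetical. The diff doesn't say whether `window-size` applies to completions checked this way, so the sketch omits the windowing attributes.

```xml
<policies>
    <inbound>
        <base />
        <!-- One instance: check the prompt and, via enforce-on-completions, the chat completion -->
        <!-- "content-safety-backend" is a hypothetical backend name -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true" enforce-on-completions="true">
            <!-- Assumption: with FourSeverityLevels the service reports severities 0, 2, 4, or 6, -->
            <!-- so a threshold of 2 blocks severities 2, 4, and 6 -->
            <categories output-type="FourSeverityLevels">
                <category name="SelfHarm" threshold="2" />
                <category name="Sexual" threshold="2" />
            </categories>
        </llm-content-safety>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

Moving `enforce-on-completions="true"` to an instance in the outbound section would have no effect, because the updated attribute table says the attribute is ignored there.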