
Commit 55641a0

Switch Foundry Tools back to AI services when general services are intended
1 parent a9051ef commit 55641a0

3 files changed: 11 additions & 11 deletions


articles/api-management/backends.md

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ For other APIs, such as APIs from Azure services, you import an Azure resource w
 
 API Management also supports using other resources as an API backend, such as:
 * A [Service Fabric cluster](how-to-configure-service-fabric-backend.yml).
-* Foundry Tools.
+* AI services.
 * A custom service.
 
 For these backends, you can create a *backend entity* in API Management and reference it in your APIs.
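
For context beyond this diff: an API typically references a backend entity through policy. A minimal sketch, assuming a backend entity named `my-backend` already exists (the name is illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Route requests for this API to the configured backend entity -->
        <set-backend-service backend-id="my-backend" />
    </inbound>
</policies>
```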
@@ -51,7 +51,7 @@ You can configure and manage backend entities in the Azure portal, or by using A
 You can create a backend in the Azure portal, or by using Azure APIs or tools.
 
 > [!NOTE]
-> When you import certain APIs, such as APIs from Microsoft Foundry or other Foundry Tools, API Management automatically configures a backend entity.
+> When you import certain APIs, such as APIs from Microsoft Foundry or other AI services, API Management automatically configures a backend entity.
 
 To create a backend in the portal:
 
articles/api-management/genai-gateway-capabilities.md

Lines changed: 8 additions & 8 deletions
@@ -45,7 +45,7 @@ AI adoption in organizations involves several phases:
 
 As AI adoption matures, especially in larger enterprises, the AI gateway helps address key challenges. It helps you:
 
-* Authenticate and authorize access to Foundry Tools
+* Authenticate and authorize access to AI services
 * Load balance across multiple AI endpoints
 * Monitor and log AI interactions
 * Manage token usage and quotas across multiple applications
@@ -78,15 +78,15 @@ More information:
 
 One of the main resources in generative AI services is *tokens*. Microsoft Foundry and other providers assign quotas for your model deployments as tokens per minute (TPM). You distribute these tokens across your model consumers, such as different applications, developer teams, or departments within the company.
 
-If you have a single app connecting to an AI service backend, you can manage token consumption with a TPM limit that you set directly on the model deployment. However, when your application portfolio grows, you might have multiple apps calling single or multiple Azure AI Services endpoints. These endpoints can be pay-as-you-go or [Provisioned Throughput Units](/azure/ai-services/openai/concepts/provisioned-throughput) (PTU) instances. You need to make sure that one app doesn't use the whole TPM quota and block other apps from accessing the backends they need.
+If you have a single app connecting to an AI service backend, you can manage token consumption with a TPM limit that you set directly on the model deployment. However, when your application portfolio grows, you might have multiple apps calling single or multiple AI service endpoints. These endpoints can be pay-as-you-go or [Provisioned Throughput Units](/azure/ai-services/openai/concepts/provisioned-throughput) (PTU) instances. You need to make sure that one app doesn't use the whole TPM quota and block other apps from accessing the backends they need.
 
 ### Token rate limiting and quotas
 
-Configure a token limit policy on your LLM APIs to manage and enforce limits per API consumer based on the usage of Foundry Tool tokens. By using this policy, you can set a TPM limit or a token quota over a specified period, such as hourly, daily, weekly, monthly, or yearly.
+Configure a token limit policy on your LLM APIs to manage and enforce limits per API consumer based on the usage of AI service tokens. By using this policy, you can set a TPM limit or a token quota over a specified period, such as hourly, daily, weekly, monthly, or yearly.
 
 :::image type="content" source="media/genai-gateway-capabilities/token-rate-limiting.png" alt-text="Diagram of limiting Azure OpenAI Service tokens in API Management.":::
 
-This policy provides flexibility to assign token-based limits on any counter key, such as subscription key, originating IP address, or an arbitrary key defined through a policy expression. The policy also enables precalculation of prompt tokens on the Azure API Management side, minimizing unnecessary requests to the Foundry Tool backend if the prompt already exceeds the limit.
+This policy provides flexibility to assign token-based limits on any counter key, such as subscription key, originating IP address, or an arbitrary key defined through a policy expression. The policy also enables precalculation of prompt tokens on the Azure API Management side, minimizing unnecessary requests to the AI service backend if the prompt already exceeds the limit.
 
 The following basic example demonstrates how to set a TPM limit of 500 per subscription key:
 
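The snippet announced by that last line falls outside this diff's context window. For reference, a minimal sketch of such a policy, assuming the `llm-token-limit` policy and the attribute names from the API Management policy reference:

```xml
<policies>
    <inbound>
        <base />
        <!-- Cap each subscription at 500 tokens per minute; estimate prompt
             tokens up front so over-limit requests never reach the backend -->
        <llm-token-limit counter-key="@(context.Subscription.Id)"
                         tokens-per-minute="500"
                         estimate-prompt-tokens="true"
                         remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```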
@@ -102,7 +102,7 @@ More information:
 
 ### Semantic caching
 
-Semantic caching is a technique that improves the performance of LLM APIs by caching the results (completions) of previous prompts and reusing them by comparing the vector proximity of the prompt to prior requests. This technique reduces the number of calls made to the Foundry Tool backend, improves response times for end users, and can help reduce costs.
+Semantic caching is a technique that improves the performance of LLM APIs by caching the results (completions) of previous prompts and reusing them by comparing the vector proximity of the prompt to prior requests. This technique reduces the number of calls made to the AI service backend, improves response times for end users, and can help reduce costs.
 
 In API Management, enable semantic caching by using [Azure Managed Redis](/azure/redis/overview) or another external cache compatible with RediSearch and onboarded to Azure API Management. By using the Embeddings API, the [llm-semantic-cache-store](llm-semantic-cache-store-policy.md) and [llm-semantic-cache-lookup](llm-semantic-cache-lookup-policy.md) policies store and retrieve semantically similar prompt completions from the cache. This approach ensures completions reuse, resulting in reduced token consumption and improved response performance.
 
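For context beyond this diff, a minimal sketch pairing the two policies named above; the embeddings backend ID and threshold value are illustrative, and the exact attribute set is per the policy reference:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached completion when a prior prompt scores within the
             similarity threshold (threshold value here is illustrative) -->
        <llm-semantic-cache-lookup score-threshold="0.8"
                                   embeddings-backend-id="embeddings-backend"
                                   embeddings-backend-auth="system-assigned-identity" />
    </inbound>
    <outbound>
        <base />
        <!-- Store the completion for reuse, with a 120-second cache duration -->
        <llm-semantic-cache-store duration="120" />
    </outbound>
</policies>
```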
@@ -124,13 +124,13 @@ More information:
 * [Deploy an API Management instance in multiple regions](api-management-howto-deploy-multi-region.md)
 
 > [!NOTE]
-> While API Management can scale gateway capacity, you also need to scale and distribute traffic to your AI backends to accommodate increased load (see the [Resiliency](#resiliency) section). For example, to take advantage of geographical distribution of your system in a multiregion configuration, deploy backend Foundry Tools in the same regions as your API Management gateways.
+> While API Management can scale gateway capacity, you also need to scale and distribute traffic to your AI backends to accommodate increased load (see the [Resiliency](#resiliency) section). For example, to take advantage of geographical distribution of your system in a multiregion configuration, deploy backend AI services in the same regions as your API Management gateways.
 
 ## Security and safety
 
 An AI gateway secures and controls access to your AI APIs. By using the AI gateway, you can:
 
-* Use managed identities to authenticate to Foundry Tools, so you don't need API keys for authentication
+* Use managed identities to authenticate to AI services, so you don't need API keys for authentication
 * Configure OAuth authorization for AI apps and agents to access APIs or MCP servers by using API Management's credential manager
 * Apply policies to automatically moderate LLM prompts by using [Azure AI Content Safety](/azure/ai-services/content-safety/overview)
 
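For context beyond this diff: the managed-identity bullet in that list maps to the `authentication-managed-identity` policy. A minimal sketch; the resource URI shown is the token audience commonly used for Azure AI services:

```xml
<policies>
    <inbound>
        <base />
        <!-- Acquire a token with the gateway's managed identity instead of an API key -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
    </inbound>
</policies>
```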
@@ -146,7 +146,7 @@ More information:
 
 ## Resiliency
 
-One challenge when building intelligent applications is ensuring that the applications are resilient to backend failures and can handle high loads. By configuring your LLM endpoints with [backends](backends.md) in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to Foundry Tool backends if they're not responsive.
+One challenge when building intelligent applications is ensuring that the applications are resilient to backend failures and can handle high loads. By configuring your LLM endpoints with [backends](backends.md) in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to AI service backends if they're not responsive.
 
 ### Load balancer
 
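Load-balanced pools and circuit breaker rules are defined on the backend resource itself rather than in policy. As a complementary policy-level resilience measure (not part of this commit), a retry sketch that re-forwards a request when the backend throttles:

```xml
<backend>
    <!-- Retry up to 2 more times, 5 seconds apart, when the backend returns 429 -->
    <retry condition="@(context.Response.StatusCode == 429)" count="2" interval="5" first-fast-retry="true">
        <forward-request buffer-request-body="true" />
    </retry>
</backend>
```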
articles/api-management/openai-compatible-llm-api.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Learn more about managing AI APIs in API Management:
 
 ## Language model API types
 
-API Management supports two types of language model APIs for this scenario. Choose the option suitable for your model deployment. The option determines how clients call the API and how the API Management instance routes requests to the Foundry Tool.
+API Management supports two types of language model APIs for this scenario. Choose the option suitable for your model deployment. The option determines how clients call the API and how the API Management instance routes requests to the AI service.
 
 * **OpenAI-compatible** - Language model endpoints that are compatible with OpenAI's API. Examples include certain models exposed by inference providers such as [Hugging Face Text Generation Inference (TGI)](https://huggingface.co/docs/text-generation-inference/en/index) and [Google Gemini API](openai-compatible-google-gemini-api.md).
 