Commit 03ee052

Merge pull request #54442 from MicrosoftDocs/NEW-entra-ai-configure-workload-identities-v2
New entra ai configure workload identities
2 parents f771fcb + 635cda6 commit 03ee052

18 files changed

Lines changed: 477 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.entra-ai-configure-workload-identities.assign-least-privilege-access
3+
title: Assign least-privilege access
4+
metadata:
5+
title: Assign Least-Privilege Access
6+
description: "Right-size permissions across Azure RBAC, Microsoft Graph, and app roles for AI workload identities."
7+
ms.date: 04/27/2026
8+
author: riswinto
9+
ms.author: riswinto
10+
ms.topic: unit
11+
durationInMinutes: 8
12+
content: |
13+
[!include[](includes/assign-least-privilege-access.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.entra-ai-configure-workload-identities.configure-secure-authentication-ai-workloads
3+
title: Configure secure authentication for AI workloads
4+
metadata:
5+
title: Configure Secure Authentication for AI Workloads
6+
description: "Select and configure the correct credential type for an AI workload identity based on where the workload runs."
7+
ms.date: 04/27/2026
8+
author: riswinto
9+
ms.author: riswinto
10+
ms.topic: unit
11+
durationInMinutes: 7
12+
content: |
13+
[!include[](includes/configure-secure-authentication-ai-workloads.md)]
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
### YamlMime:ModuleUnit
2+
uid: learn.wwl.entra-ai-configure-workload-identities.identify-setup-risks-ai-workload-identities
3+
title: What makes identity setup decisions hard to fix
4+
metadata:
5+
title: What Makes Identity Setup Decisions Hard to Fix
6+
description: "Understand why credential, permission, ownership, and validation decisions made during workload identity setup are hard to change once the workload reaches production."
7+
ms.date: 04/27/2026
8+
author: riswinto
9+
ms.author: riswinto
10+
ms.topic: unit
11+
durationInMinutes: 5
12+
content: |
13+
[!include[](includes/identify-setup-risks-ai-workload-identities.md)]
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
The workload can authenticate. The next question is what it should be allowed to do. That answer comes from what the workload's task actually requires, mapped to the specific permission planes that control access to those resources.
2+
3+
## Identify which permission planes the workload needs
4+
5+
Start from the workload's task. List what it actually does: read secrets from Key Vault, call an AI services endpoint, query user profiles from Microsoft Graph, invoke an operation on a custom API. Each of those tasks maps to a different permission plane, and each plane has its own assignment mechanism, consent model, and scoping rules.
6+
7+
- **Azure RBAC**: The workload accesses Azure resources such as storage accounts, Key Vault, or AI services. Roles are assigned at a specific scope: resource, resource group, subscription, or management group.
8+
- **Microsoft Graph application permissions**: The workload reads or writes directory data or Microsoft 365 resources. These permissions are granted on the app registration and require admin consent.
9+
- **App roles**: The workload calls a custom application or API. The target application defines the available roles.
10+
11+
Most AI workloads touch more than one plane. A workload that reads secrets from Key Vault and queries user profiles needs both an Azure RBAC role assignment and a Graph application permission. Treating these planes as interchangeable leads to permissions assigned in the wrong place or at the wrong scope.
12+
13+
:::image type="content" source="../media/ai-workload-least-privilege-access.png" alt-text="Diagram of an AI workload with three granted permissions and one not granted, showing least-privilege access." border="false":::
14+
15+
## Right-size Azure RBAC role assignments
16+
17+
The principle is straightforward. Find the narrowest built-in role that covers what the workload actually does, and assign it at the resource level. In practice, "what the workload actually does" is often broader than expected when AI workloads span multiple services.
18+
19+
Assigning Contributor at the subscription level when the workload only needs to read from a single storage account grants write access to every resource in the subscription. The same over-privilege happens at the resource group level. An AI workload that only needs to read secrets from Key Vault and call an Azure OpenAI endpoint doesn't need write access to every resource in the group. Resource-level role assignments for those specific services cover the actual workload task.
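
One way to spot assignments that are broader than resource level is to classify each assignment's scope string. The following is a minimal sketch, not an official tool: it assumes the standard Azure scope formats (`/subscriptions/...`, `/subscriptions/.../resourceGroups/...`, and so on), and the function names, subscription ID, resource names, and sample assignments are hypothetical.

```python
import re

# Patterns for the four Azure RBAC scope levels, checked narrowest first.
SCOPE_PATTERNS = [
    ("resource", re.compile(
        r"^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/.+$", re.IGNORECASE)),
    ("resource group", re.compile(
        r"^/subscriptions/[^/]+/resourceGroups/[^/]+/?$", re.IGNORECASE)),
    ("subscription", re.compile(
        r"^/subscriptions/[^/]+/?$", re.IGNORECASE)),
    ("management group", re.compile(
        r"^/providers/Microsoft\.Management/managementGroups/[^/]+/?$", re.IGNORECASE)),
]


def classify_scope(scope: str) -> str:
    """Return the scope level for an Azure RBAC scope string."""
    for level, pattern in SCOPE_PATTERNS:
        if pattern.match(scope):
            return level
    return "unknown"


def broader_than_resource(assignments: list) -> list:
    """Return the assignments whose scope is wider than a single resource."""
    return [a for a in assignments if classify_scope(a["scope"]) != "resource"]


# Hypothetical assignments for one workload identity.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
assignments = [
    {"role": "Contributor", "scope": SUB},
    {"role": "Key Vault Secrets User",
     "scope": SUB + "/resourceGroups/rg-ai/providers/Microsoft.KeyVault/vaults/kv-ai"},
]

for a in broader_than_resource(assignments):
    print(f"Review: {a['role']} at {classify_scope(a['scope'])} scope")
```

A check like this only flags candidates for review. Whether a broader scope is justified still depends on what the workload actually does.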
20+
21+
For AI services specifically, the distinction between inference and management matters for role selection. A workload that sends prompts and receives completions needs only inference access. A workload that also deploys or deletes models needs management access. These are different roles with different blast radii. The specific role choices come into play when assigning access to Azure AI services, Key Vault, and storage.
22+
23+
Can you name every Azure resource your workload accesses and the specific operation it performs on each? If not, the RBAC assignments aren't ready yet.
24+
25+
## Scope Microsoft Graph application permissions to the workload's task
26+
27+
Graph application permissions apply tenant-wide, so each one should be justified by a specific workload task. Choosing the right permission matters because permissions that sound similar can grant very different levels of access.
28+
29+
A workload that reads user profiles for grounding data needs `User.Read.All`. `Directory.Read.All` would also work, but it grants access to groups, roles, and other directory objects the workload doesn't need. `Directory.ReadWrite.All` grants write access that most AI workloads should never have. The difference between "reads user profiles" and "reads and writes the entire directory" is one permission selection.
30+
31+
Every Graph permission should map to a specific workload operation. If a permission doesn't map to something the workload actually does, it shouldn't be on the app registration. For SharePoint access specifically, use the `Sites.Selected` permission to limit access to specific site collections rather than all sites in the tenant.
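
The mapping rule above can be checked mechanically. This is a minimal sketch that assumes you keep a simple record of which workload operation justifies each permission. The function name and the sample data are hypothetical, and the sketch doesn't call Microsoft Graph.

```python
def unjustified_permissions(granted: set, justifications: dict) -> list:
    """Return granted permissions that have no documented workload operation."""
    return sorted(
        p for p in granted
        if not justifications.get(p, "").strip()
    )


# Hypothetical data: what is on the app registration vs. what is documented.
granted = {"User.Read.All", "Directory.Read.All"}
justifications = {
    "User.Read.All": "Read user profiles for grounding data",
}

for permission in unjustified_permissions(granted, justifications):
    print(f"No documented task for {permission}; consider removing it")
```

In this example, `Directory.Read.All` is flagged because nothing the workload does has been recorded against it.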
32+
33+
## Evaluate app roles before assigning them
34+
35+
App roles apply when the workload calls a custom application or API. Unlike RBAC and Graph, the target application's developer defines the available roles. That means you need to understand what the roles actually grant before assigning them.
36+
37+
If the target application exposes a single broad role like "Application.FullAccess," that role might grant more operations than your workload needs. Ask the application owner whether narrower roles exist or can be created. Accepting a broad role because it's the only one available carries the same over-privilege risk as assigning Contributor at the subscription level.
38+
39+
App roles are scoped to the target application, so they don't grant access to Azure resources or Microsoft Graph. Each permission plane must be configured independently.
40+
41+
## Record why each permission exists
42+
43+
When someone reviews this identity six months from now, or during an incident, they need to understand what access was granted, why, and whether it's still justified. Without that record, excess permissions persist because no one can determine whether they're needed.
44+
45+
For each permission assignment, record:
46+
47+
- Which identity holds it
48+
- Which role or permission was assigned
49+
- The resource boundary or scope
50+
- The workload task it supports
51+
- When it was assigned
52+
53+
This record is what makes future access reviews possible without re-investigating every permission from scratch.
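
The five fields listed above can be captured in a small structured record. This is a minimal sketch of one possible shape. The class name, field names, and sample values are hypothetical, and any format that captures the same information works equally well.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class PermissionRecord:
    identity: str        # Which identity holds the permission
    permission: str      # Which role or permission was assigned
    scope: str           # The resource boundary or scope
    workload_task: str   # The workload task it supports
    assigned_on: date    # When it was assigned


def missing_fields(record: PermissionRecord) -> list:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


# Hypothetical record for a Key Vault role assignment.
record = PermissionRecord(
    identity="mi-inference-api",
    permission="Key Vault Secrets User",
    scope="kv-ai (single vault)",
    workload_task="",
    assigned_on=date(2026, 4, 27),
)

print(missing_fields(record))
```

A record with an empty `workload_task` is a signal that the permission hasn't been justified yet.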
Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
The first setup decision is how the workload authenticates. The answer depends on where the workload runs, because the hosting environment determines which credential types are available.
2+
3+
## Select the credential type based on where the workload runs
4+
5+
Use the workload's hosting model to determine how it should authenticate.
6+
7+
| Workload scenario | Preferred credential type | Why it fits |
8+
| --- | --- | --- |
9+
| Azure-hosted AI workload on a resource that supports managed identity | Managed identity | Azure manages the credential lifecycle. No secrets to store or rotate. |
10+
| Workload outside Azure, with a hosting environment that issues identity tokens | Federated identity credential | The application trusts an external token instead of storing a secret. |
11+
| Hosting environment can't issue tokens, but the workload needs production credentials | Certificate credential | Certificates require possession of the private key, making them stronger than client secrets. |
12+
| Local development or temporary testing only | Client secret | Simplest to configure for short-lived scenarios. Not recommended for production. |
13+
14+
The hosting environment determines the options. An inference API on App Service, a processing pipeline on Azure Functions, or a workload on Container Apps can use a managed identity directly. A model serving workload on a Kubernetes cluster outside Azure can use workload identity federation if the cluster has an OIDC issuer. A workload that can't use either option falls back to certificate credentials for production.
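
The selection order in the table can be expressed as a short decision function. This is an illustrative sketch only. The enum, function name, and parameters are hypothetical, and real decisions might involve constraints the table doesn't cover.

```python
from enum import Enum


class Credential(Enum):
    MANAGED_IDENTITY = "managed identity"
    FEDERATED = "federated identity credential"
    CERTIFICATE = "certificate credential"
    CLIENT_SECRET = "client secret"


def select_credential(
    supports_managed_identity: bool,
    issues_identity_tokens: bool,
    production: bool,
) -> Credential:
    """Pick the preferred credential type, checking the lowest-burden option first."""
    if supports_managed_identity:
        return Credential.MANAGED_IDENTITY
    if issues_identity_tokens:
        return Credential.FEDERATED
    if production:
        return Credential.CERTIFICATE
    # Only local development or temporary testing reaches this point.
    return Credential.CLIENT_SECRET


# An inference API on App Service.
print(select_credential(True, False, True).value)
# A Kubernetes cluster outside Azure with an OIDC issuer.
print(select_credential(False, True, True).value)
```

The order of the checks is the point: a client secret is returned only when no other option applies and the workload isn't headed for production.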
15+
16+
:::image type="content" source="../media/managed-identity-authentication-flow.png" alt-text="Diagram of managed identity flow from Azure workload through Microsoft Entra ID to Key Vault, Azure AI, and Storage." border="false":::
17+
18+
## Why the credential type matters more than getting authentication working
19+
20+
Each credential type carries a different operational cost:
21+
22+
- **Managed identity**: No rotation, no storage, no cleanup.
23+
- **Federated credential**: Maintain the trust relationship with the external identity provider.
24+
- **Certificate**: Secure the private key and rotate before expiration.
25+
- **Client secret**: Rotation, secure storage, and cleanup, plus the risk of exposure in code, logs, or configuration files.
26+
27+
The common shortcut is to use whichever credential gets the workload authenticating fastest. That's usually a client secret. But if the hosting environment supports managed identity, every day the workload runs with a secret is a day you're maintaining a credential that doesn't need to exist. Start from what the hosting environment makes possible, then pick the option with the lowest operational burden for the life of the workload.
28+
29+
## System-assigned vs. user-assigned managed identities
30+
31+
If managed identity is the right credential type, the next decision is whether to use system-assigned or user-assigned. System-assigned identities are tied to the lifecycle of the Azure resource, so when the resource is deleted, the identity is removed. User-assigned identities have an independent lifecycle and can be shared across related workloads.
32+
33+
For most single-workload scenarios, system-assigned is simpler. User-assigned identities make sense when multiple related workloads need the same identity, or when the identity needs to survive resource redeployment. Make the choice deliberately and document it.
34+
35+
## Key considerations for federation and certificates
36+
37+
When configuring workload identity federation, the issuer, subject, and audience claims must match the external token exactly. The matching is case-sensitive. If any value is wrong, token exchange fails silently from the workload's perspective.
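
Because the comparison is exact and case-sensitive, it can help to compare the configured values against a decoded external token before deployment. This is a minimal sketch that assumes you already have the token's claims as a dictionary. The function name, issuer URL, and subject values are hypothetical, and the sketch doesn't validate the token signature.

```python
def mismatched_claims(configured: dict, token_claims: dict) -> list:
    """Return the configured fields that don't exactly match the token's claims."""
    pairs = [("issuer", "iss"), ("subject", "sub"), ("audience", "aud")]
    mismatches = []
    for config_key, claim_key in pairs:
        expected = configured.get(config_key)
        actual = token_claims.get(claim_key)
        # A JWT "aud" claim can be a single string or a list of strings.
        if isinstance(actual, list):
            matched = expected in actual
        else:
            matched = expected == actual
        if not matched:
            mismatches.append(config_key)
    return mismatches


# Hypothetical values: a trailing slash and a case difference both break the match.
configured = {
    "issuer": "https://oidc.example.com/cluster-1",
    "subject": "system:serviceaccount:ml:model-server",
    "audience": "api://AzureADTokenExchange",
}
token_claims = {
    "iss": "https://oidc.example.com/cluster-1/",
    "sub": "system:serviceaccount:ML:model-server",
    "aud": "api://AzureADTokenExchange",
}

print(mismatched_claims(configured, token_claims))
```

The sketch deliberately avoids normalizing case or trailing slashes, because a match that succeeds only after normalization wouldn't succeed in the real token exchange.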
38+
39+
For certificate credentials, store the private key in Azure Key Vault or a hardware security module. Plan rotation before the certificate expires, and test the rotation process before you need it.
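
A rotation plan usually comes down to a date comparison: rotate once the certificate enters a lead-time window before expiration. This is a minimal sketch. The function name is hypothetical, and the 30-day lead time is an assumption for illustration rather than a recommended value.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    not_after: datetime,
    now: datetime,
    lead_time: timedelta = timedelta(days=30),  # Assumed window; choose your own.
) -> bool:
    """Return True once the certificate is inside the rotation window."""
    return now >= not_after - lead_time


# Hypothetical certificate expiring at the start of July 2026.
expires = datetime(2026, 7, 1, tzinfo=timezone.utc)

print(needs_rotation(expires, datetime(2026, 5, 1, tzinfo=timezone.utc)))
print(needs_rotation(expires, datetime(2026, 6, 15, tzinfo=timezone.utc)))
```

Passing `now` as a parameter instead of reading the clock inside the function makes the check easy to test, which fits the guidance to test rotation before you need it.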
40+
41+
## Confirm the credential model before configuring permissions
42+
43+
Before treating authentication as complete, confirm:
44+
45+
- Token acquisition succeeds from the workload's actual hosting environment, not just from a developer workstation.
46+
- The credential type matches the hosting scenario. If the workload runs on a resource that supports managed identity, it should be using managed identity, not a client secret.
47+
- No unnecessary secret remains in the design.
48+
- You can identify which object holds the credential (the managed identity, the app registration, or both).
49+
50+
A successful sign-in from the actual hosting environment confirms the credential model works as intended. If authentication fails at this point, permission configuration won't fix it, so resolve credential issues first. Once the identity can authenticate, the next decision is what it should be allowed to access.
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
If you've managed service principals or app registrations, you've seen these problems before. A secret that nobody remembers creating. A Contributor role assigned at a scope nobody can justify. An identity with no listed owner. These problems are common across all workload identities.
2+
3+
What changes with AI workloads is how many services a single identity touches. A workload that runs inference might authenticate to an Azure OpenAI endpoint, retrieve connection strings from Key Vault, pull grounding data from a storage account, and query user profiles from Microsoft Graph. That's four service boundaries behind one identity. If the credential is compromised or the permissions are too broad, the exposure spans every service in that chain, and the workload processes requests continuously without a human reviewing each call.
4+
5+
## Why setup decisions persist after production
6+
7+
Once an AI workload is running in production, every identity choice becomes the baseline that teams are reluctant to change:
8+
9+
- Switching a credential type requires coordinated updates to the workload, the app registration, and whatever automation deploys the workload.
10+
- Narrowing permissions risks breaking the workload if someone removes access the workload actually uses.
11+
- Assigning an owner requires finding someone who understands what the identity does and is willing to accept accountability for it.
12+
13+
Each change requires deliberate intervention that competes with other priorities. Setup decisions made in five minutes during initial configuration persist for the life of the workload.
14+
15+
## Why these risks don't surface on their own
16+
17+
A workload with a stored secret, broad permissions, no owner, and no validation functions identically to one that's set up correctly. Sign-in succeeds. API calls return data. Nothing in the workload's behavior signals that the identity configuration is wrong.
18+
19+
These configuration gaps surface only during incident response or an access review. At that point, someone needs a clear record of what was granted, why, and whether it's still justified. But the setup decisions have been in production for months, the person who made them might have moved on, and the documentation that would explain the reasoning was never created.
20+
21+
## The four decisions and how they compound
22+
23+
Every workload identity requires four setup decisions:
24+
25+
- **Credential type.** Secrets, certificates, managed identities, or federated credentials. The hosting environment constrains the options, but the choice sets the operational burden for the life of the workload.
26+
- **Permission scope.** Azure RBAC roles, Microsoft Graph application permissions, and app roles each have different scoping models. Broad defaults are easy to grant and difficult to narrow later.
27+
- **Role selection at each dependent service.** Key Vault, Azure AI services, storage accounts, and downstream APIs each have their own role definitions. Over-permissioned identities usually originate here, where someone picks a familiar role instead of the narrowest one that works.
28+
- **Pre-production validation.** Testing sign-in from the actual hosting environment, confirming permissions match the documented intent, and establishing a rotation plan. Without this step, gaps from the first three decisions persist undetected.
29+
30+
These decisions compound each other. A weak credential is worse when permissions are broad, and broad permissions are harder to fix when nobody validated them before production. That interaction is why each decision deserves deliberate attention.
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
AI workload identities are often set up quickly and left running in production without validation. A client secret gets created because it's the fastest option. Broad permissions get assigned because they work. No one is listed as the identity's owner.
2+
3+
Consider a workload that calls AI models, reads secrets from Key Vault, and accesses user profiles through Microsoft Graph. If that identity is compromised, the attacker inherits every excess permission the workload was granted.
4+
5+
These consequences are amplified because a single AI workload identity often spans multiple services: an Azure OpenAI endpoint, a Key Vault, a storage account, and Microsoft Graph. One misconfigured identity means excess access across every service in that chain, and the workload runs without the interactive sign-in controls that apply to human users. Choosing the right credential model, scoping permissions to the workload's actual task, and validating the configuration before production determine whether a workload identity is a manageable security surface or an unmonitored exposure.
6+
7+
## Learning objectives
8+
9+
In this module, you'll:
10+
11+
- Explain what makes identity setup decisions hard to fix after a workload reaches production.
12+
- Configure the right credential type for an AI workload based on its hosting environment.
13+
- Assign least-privilege permissions across Azure RBAC, Microsoft Graph, and app roles.
14+
- Select the correct roles for Key Vault, Azure AI services, storage, and downstream APIs.
15+
- Validate the identity configuration and plan credential maintenance before production.
