
Commit ba3a286

Merge pull request #54076 from ivorb/agents-bugfix

improve deployment info

2 parents 6263568 + 8e3d678, commit ba3a286

2 files changed: 81 additions & 179 deletions

File tree

learn-pr/wwl-data-ai/develop-ai-agents-azure-vs-code/8-test-deploy-integrate.yml

Lines changed: 1 addition & 1 deletion
@@ -9,6 +9,6 @@ metadata:
 ms.author: berryivor
 ms.topic: unit
 ai-usage: ai-generated
-durationInMinutes: 10
+durationInMinutes: 9
 content: |
 [!include[](includes/8-test-deploy-integrate.md)]
Lines changed: 80 additions & 178 deletions
@@ -1,234 +1,136 @@
-Testing, deploying, and integrating agents are critical steps in moving from development to production. Microsoft Foundry provides comprehensive capabilities for validating agent behavior, deploying to production environments, and connecting agents to your applications. These final steps transform prototypes into reliable automation that delivers business value.
+Testing, deploying, and publishing agents are critical steps in moving from development to production. Microsoft Foundry provides comprehensive capabilities for validating agent behavior, deploying to your Foundry project, and publishing agents as callable endpoints that external consumers and applications can use.
 
 ## Testing strategies for agents
 
-Thorough testing ensures your agents behave reliably across diverse scenarios before reaching users. Testing should cover expected interactions, edge cases, and error conditions.
-
-### Testing with integrated playgrounds
-
-Both the Foundry portal and Visual Studio Code extension provide playgrounds for interactive testing. These environments simulate real user interactions while providing visibility into agent decision-making.
+Thorough testing ensures your agents behave reliably across diverse scenarios before reaching users. Both the Foundry portal and Visual Studio Code extension provide playgrounds for interactive testing.
 
 **Using the playground effectively:**
 
-Start with **happy path testing** - Verify the agent handles common, expected requests correctly. Test typical user questions and workflows to confirm basic functionality works as intended.
-
-Move to **edge case testing** - Try ambiguous inputs, incomplete information, and unusual requests. Edge cases reveal how agents handle uncertainty and unexpected situations.
-
-Perform **boundary testing** - Test the limits of what your agent should and shouldn't do. Confirm the agent respects boundaries defined in its instructions.
-
-Conduct **multi-turn conversation testing** - Verify the agent maintains context across multiple exchanges. Test whether the agent remembers prior information and builds on previous responses appropriately.
-
-Execute **tool invocation testing** - When agents use tools, verify they call the right tools at the right times and incorporate results correctly.
-
-### Testing scenarios to validate
-
-For a customer service agent, test these scenarios:
-
-**Expected requests:**
-- "I need to schedule an appointment"
-- "What are your hours?"
-- "Can I reschedule my appointment?"
-
-**Out-of-scope requests:**
-- "What medication should I take?" (should decline and suggest consulting a provider)
-- "Can you access my medical records?" (should explain privacy boundaries)
-
-**Ambiguous inputs:**
-- "I need help" (should ask clarifying questions)
-- "appointment" (should gather more context)
-
-**Error conditions:**
-- Tool failures or timeouts
-- Requests requiring unavailable information
-- System errors during processing
-
-Recording test results helps you track improvements over time and ensures regressions don't reintroduce old issues.
-
-## Working with conversations
-
-Understanding how the Responses API manages conversations helps you design better agent experiences and troubleshoot issues effectively.
-
-### Conversation lifecycle
-
-**Conversation creation** - A new conversation starts when a user interacts with your agent. Each conversation maintains its own message history, separate from other users' interactions.
-
-**Message exchange** - As users send messages, the Responses API processes them with your agent's configuration and generates responses based on conversation context.
-
-**Context preservation** - Conversations preserve the full message history, enabling agents to reference earlier exchanges and maintain continuity.
+- **Happy path testing** - Verify the agent handles common, expected requests correctly.
+- **Edge case testing** - Try ambiguous inputs, incomplete information, and unusual requests to reveal how agents handle uncertainty.
+- **Boundary testing** - Confirm the agent respects boundaries defined in its instructions by testing out-of-scope requests.
+- **Multi-turn conversation testing** - Verify the agent maintains context across multiple exchanges and builds on previous responses.
+- **Tool invocation testing** - Verify agents call the right tools at the right times and incorporate results correctly.
 
-**Conversation completion** - Conversations can be explicitly ended or allowed to expire based on inactivity. Completed conversations preserve their history for review.
+Record test results to track improvements and catch regressions.
 
-### Managing conversations in production
+## Deploying agents to your project
 
-When deploying agents, consider conversation management strategies:
-
-**Session boundaries** - Decide when new conversations should start. Customer service agents might create new conversations for each support case, while productivity assistants might maintain longer conversations.
-
-**Context limits** - Conversations can grow large over extended interactions. Monitor conversation length and implement strategies for summarizing or archiving old context when needed.
-
-**Privacy and retention** - Define retention policies for conversation data. Determine how long message histories should be preserved and when they should be deleted.
-
-You can view and manage conversations through the Foundry portal or programmatically through the Responses API, providing visibility into how users interact with your deployed agents.
-
-## Deployment approaches
-
-Microsoft Foundry supports multiple deployment approaches to match different operational needs and team workflows.
+Microsoft Foundry supports deploying agents from the portal or Visual Studio Code. Deploying saves your agent configuration to your Foundry project so you can test and iterate.
 
 ### Deploying from the Foundry portal
 
-Portal deployment provides a visual, guided experience:
-
 1. Navigate to your agent in the Foundry portal
 1. Verify configuration and test results are satisfactory
-1. Select **Deploy** from the agent's page
-1. Confirm deployment settings
-1. Wait for deployment to complete
-
-Deployed agents appear in your project's resource list with active status indicators.
+1. Select **Save** from the agent's page
+1. Confirm version and deployment settings
 
 ### Deploying from Visual Studio Code
 
-VS Code deployment integrates with your development workflow:
-
-1. Open your agent in the Agent Designer
-1. Select **Update on Microsoft Foundry** to push your configuration changes
-1. For hosted agents, use the **Deploy Hosted Agents** option in the Tools section
-1. Wait for deployment confirmation
-1. Refresh the Resources view to see the updated agent
+1. Open your agent in the AI Toolkit
+1. Select **Save to Foundry** to push configuration changes
+1. For hosted agents, open the **+Build** menu in the developer tools and select **Deploy to Microsoft Foundry**
+1. Select your container configuration and confirm
 
-This streamlined process keeps you in your development environment, eliminating context switches during deployment.
+Both approaches keep your agent within your project workspace where team members can access and test it.
 
-### Deployment considerations
+## Publishing agents to an endpoint
 
-When deploying agents, consider:
+Publishing moves an agent from your project workspace into a managed Azure resource called an **Agent Application**. This step is what makes your agent externally callable through a stable endpoint.
 
-**Model availability** - Ensure your selected model deployment has sufficient capacity for expected load. Monitor usage and scale as needed.
+### What publishing creates
 
-**Tool dependencies** - Verify all tools your agent uses are properly configured. File Search requires vector stores with uploaded documents, API tools need valid credentials.
+When you publish an agent version, Foundry creates:
 
-**Instruction clarity** - Double-check instructions before deployment. Changes after deployment require redeployment and may affect user experiences.
+- **Agent Application** - An Azure resource with its own invocation URL, authentication policy, and Entra agent identity.
+- **Deployment** - A running instance of a specific agent version inside the application, with start/stop lifecycle management.
 
-**Testing validation** - Confirm comprehensive testing is complete. Deploying untested changes risks production issues.
+The key difference between deploying and publishing is scope. Deploying keeps the agent within your project. Publishing creates a dedicated endpoint that external consumers can call without needing access to your Foundry project.
 
-## Generating integration code
-
-Once deployed, agents need to connect to your applications. The Microsoft Foundry extension generates sample integration code that accelerates this process.
-
-### Code generation process
+### Publishing from the Foundry portal
 
-To generate integration code:
+1. In the portal, select the agent version you want to publish
+1. Select **Publish** to create the Agent Application and deployment
 
-1. Select your deployed agent in the Azure Resources view (VS Code)
-1. Select **Open Code File** from the available actions
-1. The extension presents structured options:
-   - **Choose your preferred SDK** - Select the SDK framework for your integration
-   - **Choose your language** - Select your programming language (Python, JavaScript, C#, etc.)
-   - **Choose your authentication method** - Select how your application authenticates (managed identity, service principal, interactive, etc.)
-1. The extension generates sample code showing how to:
-   - Authenticate with Microsoft Foundry
-   - Connect to your specific agent
-   - Send messages using the Responses API
-   - Process agent responses
+### Publishing from Visual Studio Code
 
-## Production integration patterns
+1. Open the Command Palette (**Ctrl+Shift+P**) and run **Microsoft Foundry: Deploy Hosted Agent** for hosted agents
+1. Select the target workspace and container configuration
+1. Confirm and deploy
 
-Different applications require different integration approaches. Common patterns include:
+After publishing, the agent appears in the **Hosted Agents (Preview)** section of the AI Toolkit extension tree view.
 
-### Web application integration
+### The Agent Application endpoint
 
-Integrate agents into web applications to provide AI-powered features:
-- Start conversations when users interact with your agent
-- Send user messages to the agent through the Responses API
-- Display agent responses in your UI
-- Maintain conversation context across user sessions
+Published agents expose a stable endpoint using the Responses API protocol:
 
-### API-driven workflows
+`https://<foundry-resource-name>.services.ai.azure.com/api/projects/<project-name>/applications/<app-name>/protocols/openai/responses`
 
-Use agents in backend workflows triggered by events or schedules:
-- Send structured data as messages using the Responses API
-- Process agent responses programmatically
-- Use agent outputs to drive next steps in workflows
+This URL stays the same even as you roll out new agent versions, so downstream consumers aren't disrupted by updates.
 
-### Chatbot implementations
+### Authentication and identity
 
-Build conversational interfaces powered by agents:
-- Map user sessions to agent conversations
-- Handle real-time message exchange through the Responses API
-- Implement typing indicators while agents process requests
-- Support rich media in responses
+Agent Applications use Microsoft Entra ID for authentication. Callers must have the **Azure AI User** role on the Agent Application resource. API key authentication isn't supported for Agent Applications.
 
-### Background automation
+> [!IMPORTANT]
+> When you publish an agent, it receives its own dedicated Entra identity, separate from the project's shared identity. Permissions don't transfer automatically. You must reassign RBAC roles to the new agent identity for any resources the agent accesses. If you skip this step, tool calls that work during development fail with authorization errors once the agent is published.
 
-Deploy agents for automated tasks running without user interaction:
-- Schedule agent executions for regular tasks
-- Feed data from systems into agents using the Responses API
-- Process agent outputs to update business systems
-- Monitor agent performance and results
-
-## Production considerations
+### Verifying the endpoint
 
-Successfully running agents in production requires attention to operational aspects:
+After publishing, verify the endpoint works:
 
-### Monitoring and observability
+1. Get an access token:
 
-**Track key metrics:**
-- Response times and latency
-- Tool invocation success rates
-- Error rates and failure patterns
-- Conversation volume and message counts
-- Model token consumption
+```azurecli
+az account get-access-token --resource https://ai.azure.com
+```
 
-These metrics help you identify performance issues and optimize agent behavior.
+1. Call the Agent Application endpoint:
 
-### Security and compliance
+```bash
+curl -X POST \
+  "https://<foundry-resource-name>.services.ai.azure.com/api/projects/<project-name>/applications/<app-name>/protocols/openai/responses?api-version=2025-11-15-preview" \
+  -H "Authorization: Bearer <access-token>" \
+  -H "Content-Type: application/json" \
+  -d '{"input":"Say hello"}'
+```
 
-**Implement security best practices:**
-- Use managed identities or service principals for authentication
-- Apply least-privilege access controls
-- Encrypt sensitive data in transit and at rest
-- Audit agent actions and conversations
-- Implement data retention policies compliant with regulations
+If you receive `403 Forbidden`, confirm the caller has the **Azure AI User** role on the Agent Application resource.
 
-### Cost management
+### Updating published agents
 
-**Monitor and optimize costs:**
-- Track token usage across agents and conversations
-- Set response length limits to control costs
-- Choose appropriate models balancing capability and cost
-- Implement rate limiting to prevent unexpected usage spikes
-- Manage conversation history retention to reduce storage costs
+To roll out a new agent version:
 
-### Performance optimization
+1. Make changes in your development environment and test thoroughly
+1. In the Foundry portal, select **Publish Updates** from the Agent playground
+1. The Agent Application routes 100% of traffic to the new version automatically
 
-**Optimize agent performance:**
-- Cache frequently requested information
-- Optimize instructions for clarity and conciseness
-- Remove unnecessary tools that add latency
-- Monitor model selection, as some models are faster than others
-- Implement timeout handling for long-running operations
+The endpoint URL remains unchanged, so existing integrations continue working.
 
-## Error handling and resilience
-
-Robust agent implementations handle errors gracefully:
-
-**Network failures** - Implement retry logic with exponential backoff when API calls fail due to transient network issues.
+## Generating integration code
 
-**Tool failures** - When tools timeout or error, ensure agents provide helpful fallback responses rather than failing silently.
+The Microsoft Foundry VS Code extension generates sample integration code to connect your application to a published agent:
 
-**Rate limiting** - Handle rate limit responses from Azure by implementing backoff strategies and queueing mechanisms.
+1. Select your deployed agent in the My Resources view
+1. Select **View Code**
+1. Choose your folder
+1. The extension generates code for authenticating, connecting, sending messages, and processing responses
 
-**Invalid inputs** - Validate user inputs before sending to agents, filtering malicious content or formatting issues.
+## Integration patterns
 
-## Updating production agents
+Common patterns for integrating published agents include:
 
-As requirements evolve, you'll need to update deployed agents:
+- **Web applications** - Send user messages to the Responses API endpoint and display responses in your UI. Store conversation history client-side for multi-turn interactions.
+- **API-driven workflows** - Call the agent endpoint from backend services triggered by events or schedules. Process responses programmatically to drive downstream actions.
+- **Chatbot interfaces** - Map user sessions to conversations. Handle real-time message exchange through the endpoint.
+- **Background automation** - Schedule agent calls for recurring tasks. Feed system data into agents and process outputs to update business systems.
 
-1. Make changes in your development environment
-1. Test thoroughly before deploying updates
-1. Deploy updates during low-traffic periods when possible
-1. Monitor for issues after deployment
-1. Have rollback plans if updates cause problems
+## Production considerations
 
-The agent ID remains constant across updates, so existing integrations continue working with updated behavior.
+Running agents in production requires attention to several operational areas:
 
-Testing, deploying, and integrating agents transforms development efforts into production value. By following systematic testing approaches, leveraging integrated deployment tools, and implementing robust integration patterns, you can confidently deliver AI agents that enhance your applications and automate workflows while maintaining enterprise-grade reliability and security.
+- **Monitoring** - Track response times, tool invocation success rates, error patterns, and token consumption using Application Insights integration.
+- **Security** - Use managed identities for authentication, apply least-privilege access, and define data retention policies.
+- **Cost management** - Monitor token usage, set response length limits, and implement rate limiting to prevent unexpected spikes.
+- **Error handling** - Implement retry logic with exponential backoff for transient failures. Handle rate limiting with backoff strategies. Validate inputs before sending to agents.
+- **Conversation management** - Agent Application endpoints currently support only the stateless Responses API. Store conversation history in your client for multi-turn experiences.
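The verification flow in the added content (fetch a token with `az account get-access-token`, then POST to the Agent Application endpoint) can be sketched in Python using only the standard library. This is an illustrative sketch: the resource, project, and application names are hypothetical placeholders, and only the URL pattern and `api-version` value come from the curl example in the commit.

```python
import json
import urllib.request

# Hypothetical placeholders -- substitute your Foundry resource, project, and app names.
RESOURCE = "my-foundry-resource"
PROJECT = "my-project"
APP = "my-agent-app"
API_VERSION = "2025-11-15-preview"  # api-version shown in the commit's curl example


def endpoint_url(resource: str, project: str, app: str, api_version: str) -> str:
    """Build the Agent Application Responses endpoint from the documented URL pattern."""
    return (
        f"https://{resource}.services.ai.azure.com"
        f"/api/projects/{project}/applications/{app}"
        f"/protocols/openai/responses?api-version={api_version}"
    )


def call_agent(token: str, text: str) -> dict:
    """POST a single input to the published agent and return the parsed JSON reply.

    `token` is the bearer token from `az account get-access-token`.
    A 403 response usually means the caller lacks the Azure AI User role.
    """
    request = urllib.request.Request(
        endpoint_url(RESOURCE, PROJECT, APP, API_VERSION),
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because the endpoint is stateless, a multi-turn client would accumulate prior turns into the `input` it sends, as the conversation-management bullet suggests.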

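The error-handling guidance in the new production-considerations list (retry with exponential backoff for transient failures, including rate limiting) could look like the following. This is a hedged sketch: `call_once`, the retryable status codes, and the delay parameters are assumptions for illustration, not anything specified by the commit.

```python
import time
from typing import Callable

# Assumed transient HTTP statuses, including 429 rate limiting.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base * 2^n per attempt, capped at `cap` seconds."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]


def call_with_retry(call_once: Callable[[], int], attempts: int = 4) -> int:
    """Invoke call_once (returning an HTTP status); retry transient statuses with backoff."""
    status = call_once()
    for delay in backoff_delays(attempts - 1):
        if status not in RETRYABLE:
            return status
        time.sleep(delay)  # back off before retrying
        status = call_once()
    return status
```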