Merge pull request #309103 from craigshoemaker/sre/memory-system

denrea · web-flow · commit f7893bc583a5 · 2025-12-19T09:34:01.000-08:00
[SRE Agent] New: Memory system
diff --git a/articles/sre-agent/documentation-connector.md b/articles/sre-agent/documentation-connector.md
@@ -0,0 +1,45 @@
+---
+title: Documentation Connector in Azure SRE Agent Preview
+description: Discover how the Azure SRE Agent documentation connector enables automated crawling, semantic search, and wide file format support for Azure DevOps repositories.
+author: craigshoemaker
+ms.author: cshoe
+ms.reviewer: cshoe
+ms.date: 12/18/2025
+ms.topic: article
+ms.service: azure-sre-agent
+---
+
+# Documentation connector in Azure SRE Agent preview
+
+The Azure SRE Agent documentation connector automatically crawls Azure DevOps repositories and indexes troubleshooting guides, runbooks, and other documentation so the agent can retrieve them during conversations.
+
+## Key features
+
+- **Automated crawling**: Runs every 24 hours without manual intervention
+
+- **Wide file format support**: Indexes `.md`, `.txt`, `.rst`, `.adoc`, `.html`, `.json`, `.yaml`, `.yml`, `.xml`, `.csv`, and more
+
+- **Azure DevOps integration**: Connects to Git repositories by using a managed identity
+
+- **Semantic search**: Documents are chunked, embedded, and indexed for AI-powered retrieval
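
This article lists file extensions but doesn't describe any other filtering rules. To get a rough preview of which files in a repository fall into the documented list, the following Python sketch filters paths by extension. `likely_indexed` is a hypothetical helper, not part of the connector. Because the list above ends with "and more", the connector's real set is larger, so a file this sketch skips might still be indexed.

```python
from pathlib import PurePosixPath

# Extensions named in this article. The article says "and more", so this is
# an illustrative subset of what the connector indexes, not the full list.
DOCUMENTED_EXTENSIONS = {
    ".md", ".txt", ".rst", ".adoc", ".html", ".json",
    ".yaml", ".yml", ".xml", ".csv",
}


def likely_indexed(paths: list[str]) -> list[str]:
    """Return the repository paths whose extension is in the documented set."""
    return [
        p for p in paths
        # Case-insensitive matching is an assumption of this sketch.
        if PurePosixPath(p).suffix.lower() in DOCUMENTED_EXTENSIONS
    ]


if __name__ == "__main__":
    repo_files = [
        "runbooks/redis-latency.md",
        "guides/deploy.adoc",
        "config/alerts.yaml",
        "src/app.py",
        "images/topology.png",
    ]
    # prints ['runbooks/redis-latency.md', 'guides/deploy.adoc', 'config/alerts.yaml']
    print(likely_indexed(repo_files))
```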
+
+## Prerequisites
+
+Before you set up a documentation connector, make sure you have:
+
+- An Azure DevOps repository that contains your documentation
+- A managed identity (user-assigned or system-assigned) configured for the agent
+- Read access to the repository granted to that managed identity
+
+## Setup
+
+1. In the portal, go to **Settings** > **Basics** and note the managed identity name.
+1. In Azure DevOps, add the managed identity as a user with **Basic** access level.
+1. Grant **Read** permission on the target repository.
+1. Go to **Settings** > **Connectors** and select **Add connector**.
+1. Select **Documentation connector**, enter the repository URL, and select the managed identity.
+1. The connector starts indexing right away.
+
+## Related content
+
+- [Memory system](./memory-system.md)
diff --git a/articles/sre-agent/media/memory-system/azure-sre-agent-memory-system-loop.png b/articles/sre-agent/media/memory-system/azure-sre-agent-memory-system-loop.png
diff --git a/articles/sre-agent/memory-system.md b/articles/sre-agent/memory-system.md
@@ -0,0 +1,308 @@
+---
+title: Memory System in SRE Agent Preview
+description: Use the SRE Agent memory system to build team knowledge that agents retrieve during incident handling, enabling context-aware responses that improve over time.
+author: craigshoemaker
+ms.author: cshoe
+ms.reviewer: cshoe
+ms.date: 12/18/2025
+ms.topic: article
+ms.service: azure-sre-agent
+ms.collection: ce-skilling-ai-copilot
+#customer intent: As an SRE team member, I want to understand how the memory system works so I can add knowledge that helps agents provide better responses during incident handling.
+---
+
+# Memory system in SRE Agent preview
+
+The SRE Agent memory system gives agents the knowledge they need to troubleshoot effectively. By adding runbooks, team standards, and service-specific context, you help agents provide better answers during incidents. The system learns from each session to improve over time.
+
+## Memory components
+
+The memory system consists of four complementary components:
+
+| Component | Purpose | Setup | Best for |
+|-----------|---------|-------|----------|
+| **User memories** | Quick chat commands for team knowledge | Instant (chat commands) | Team standards, service configurations, workflow patterns |
+| **Knowledge base** | Direct document uploads for runbooks | Quick (file upload) | Static runbooks, troubleshooting guides, internal documentation |
+| **Documentation connector** | Automated Azure DevOps synchronization | Configuration required | Living documentation, frequently updated guides |
+| **Session insights** | Agent-generated memories from sessions | Automatic | Learned troubleshooting patterns, past incident resolutions |
+
+### How agents retrieve memory
+
+During conversations, agents retrieve information from the memory sources through the `SearchMemory` tool.
+
+:::image type="content" source="media/memory-system/azure-sre-agent-memory-system-loop.png" alt-text="Diagram of the Azure SRE Agent memory system loop.":::
+
+<!--
+```mermaid
+flowchart TD
+    subgraph Trigger
+        A[User Question / Incident / Scheduled Task]
+    end
+    
+    subgraph Memory Sources
+        B[User Memories<br/>chat commands]
+        C[Knowledge Base<br/>documents]
+        D[Documentation Connector<br/>ADO repos]
+        E[Session Insights<br/>auto-generated]
+    end
+    
+    subgraph Retrieval
+        F[SearchMemory Tool]
+    end
+    
+    A -- > B & C & D & E
+    B & C & D & E -- > F
+    F -- > G[Agent Reasoning]
+    G -- > H[Relevant Context Retrieved]
+    H -- > I[Agent Response]
+```
+-->
+
+### Tool configuration
+
+The `SearchMemory` tool retrieves information from all memory components. It searches user memories, the knowledge base, session insights, and documentation connector content simultaneously.
+
+- **SRE Agent (default)**: `SearchMemory` is built in.
+- **Custom subagents**: Add the `SearchMemory` tool to the subagent's configuration.
+
+> [!IMPORTANT]
+> Don't store secrets, credentials, API keys, or sensitive data in any memory component. Memories are shared across your team and indexed for search.
+
+## Quick start
+
+Begin by establishing foundational knowledge with user memories, and then expand to document storage and automated synchronization as your needs grow.
+
+### 1. Start with user memories
+
+Use chat commands to save immediate team knowledge:
+
+```text
+#remember Team owns services: app-service-prod, redis-cache-prod, and sql-db-prod
+
+#remember For latency issues, check Redis cache health first
+
+#remember Production deployments happen Tuesdays at 2 PM PST
+```
+
+These facts are now available across all conversations.
+
+### 2. Upload key documents
+
+Add critical runbooks and guides to the knowledge base:
+
+1. Open your SRE Agent in the Azure portal.
+
+1. Go to **Settings** > **Knowledge base**.
+
+1. Select **Add file** or drag and drop files into the upload area.
+
+1. Upload `.md` or `.txt` files (up to 16 MB each).
+
+1. The system indexes files and makes them available for retrieval through `SearchMemory`.
+
+### 3. Review session insights
+
+After troubleshooting sessions, check **Settings** > **Session insights** to see what went well and where the agent needs more context. Use the insights to identify knowledge gaps and add targeted memories or documentation.
+
+### 4. Connect repositories (optional)
+
+For teams with existing documentation in Azure DevOps:
+
+1. Go to **Settings** > **Connectors**.
+
+1. Select **Add connector**, and then select **Documentation connector**.
+
+1. Enter your Azure DevOps repository URL and select a managed identity.
+
+    The connector starts indexing automatically.
+
+## User memories
+
+User memories let you save team facts, standards, and context that agents remember across all conversations. By using simple chat commands (`#remember`, `#forget`, `#retrieve`), you build a persistent, shared store of team knowledge that automatically enhances agent responses.
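
This article doesn't describe how the agent detects these commands. To illustrate the syntax, here's a hypothetical Python sketch that splits a chat message into a command name and its argument. `MemoryCommand` and `parse_memory_command` are illustrative names, and the rules it applies (the command must be the first word and match exactly, including case) are assumptions, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional

# The three chat commands described in this article.
COMMANDS = ("#remember", "#forget", "#retrieve")


@dataclass
class MemoryCommand:
    name: str      # "remember", "forget", or "retrieve"
    argument: str  # the text after the command


def parse_memory_command(message: str) -> Optional[MemoryCommand]:
    """Return the memory command in a chat message, or None for ordinary messages.

    Assumption: the command must be the first word and match exactly,
    so "#remembered" or "#Remember" is not treated as a command.
    """
    parts = message.strip().split(maxsplit=1)
    if not parts or parts[0] not in COMMANDS:
        return None
    argument = parts[1].strip() if len(parts) > 1 else ""
    return MemoryCommand(name=parts[0][1:], argument=argument)


if __name__ == "__main__":
    print(parse_memory_command("#remember Team owns app-service-prod in East US region"))
    print(parse_memory_command("Why is latency high on app-service-prod?"))
```

In this sketch, any message that doesn't start with one of the three commands returns `None`.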
+
+### Chat commands
+
+#### Save information by using `#remember`
+
+Save facts, standards, or context for future conversations.
+
+**Syntax:**
+
+```text
+#remember [content to save]
+```
+
+**Examples:**
+
+```text
+#remember Team owns app-service-prod in East US region
+#remember For app-service-prod latency issues, check Redis cache health first
+#remember Team uses Kusto for logs. Workspace is "myteam-prod-logs"
+```
+
+The content is embedded by using OpenAI, stored in Azure AI Search, and made available for automatic retrieval across all conversations. You see a confirmation: `✅ Agent Memory saved.`
+
+#### Remove memories by using `#forget`
+
+Delete previously saved memories by searching for them.
+
+**Syntax:**
+
+```text
+#forget [description of what to forget]
+```
+
+**Examples:**
+
+```text
+#forget NSG rules information
+#forget production environment location
+```
+
+The system searches your memories semantically for the best match, shows you the content, and deletes it. You see a confirmation: `✅ Agent Memory forgotten: [deleted content]`
+
+#### Query memories by using `#retrieve`
+
+Explicitly search and display saved memories without triggering agent reasoning.
+
+**Syntax:**
+
+```text
+#retrieve [search query]
+```
+
+**Examples:**
+
+```text
+#retrieve production environment
+#retrieve deployment process
+```
+
+The system searches your memories semantically and uses the top five matches to synthesize a response. It displays both the individual memories and the synthesized answer.
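
The ranking step behind `#retrieve` (score every memory against the query and keep the five best) can be sketched in Python. The real system uses OpenAI embeddings and Azure AI Search. This sketch swaps in plain word-count vectors with cosine similarity so it runs without external services, which means it only matches shared words, not meaning. The function names are hypothetical, and this isn't the agent's implementation.

```python
import math
import re
from collections import Counter

TOP_K = 5  # #retrieve uses the top five matches, per this article


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query: str, memories: list[str], k: int = TOP_K) -> list[str]:
    """Rank saved memories by similarity to the query and keep the best k."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] > 0]  # drop memories with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


if __name__ == "__main__":
    memories = [
        "Team owns app-service-prod in East US region",
        "For app-service-prod latency issues, check Redis cache health first",
        'Team uses Kusto for logs. Workspace is "myteam-prod-logs"',
        "Production deployments happen Tuesdays at 2 PM PST",
    ]
    # prints ['For app-service-prod latency issues, check Redis cache health first']
    print(top_matches("redis latency", memories))
```

In the described feature, the selected matches then feed a synthesized answer; this sketch stops at the ranking.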
+
+### Scope and storage
+
+- **Shared across the team**: All users of the SRE Agent can access saved memories.
+
+- **Persist across all conversations**: Save a memory once, and it stays available until you remove it with `#forget`.
+
+- **Automatically retrieved when relevant**: Agents search memories semantically during reasoning.
+
+## Knowledge base
+
+The knowledge base provides direct document upload capabilities for runbooks, troubleshooting guides, and internal documentation that agents can retrieve during conversations.
+
+### Supported file types and limits
+
+- **Formats**: `.md` (Markdown, recommended), `.txt` (plain text)
+- **Per file**: 16 MB maximum (Azure AI Search limit)
+- **Per request**: 100 MB total for all files in a single upload
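
To catch limit problems before you upload, you could check a batch against the stated limits locally. The following Python sketch is hypothetical. Its `check_upload` helper takes file names and sizes and reports anything outside the documented formats and size caps. The article doesn't say whether "MB" means 10^6 or 2^20 bytes, so the sketch assumes 2^20. It also assumes extensions are matched case-insensitively.

```python
from pathlib import PurePath

# Limits stated in this article. Assumption: 1 MB = 2**20 bytes.
MB = 1024 * 1024
MAX_FILE_BYTES = 16 * MB
MAX_REQUEST_BYTES = 100 * MB
ALLOWED_EXTENSIONS = {".md", ".txt"}


def check_upload(files: dict[str, int]) -> list[str]:
    """Return problems with a planned upload.

    `files` maps each file name to its size in bytes.
    """
    problems = []
    for name, size in files.items():
        # Case-insensitive extension matching is an assumption.
        if PurePath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported file type")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 16 MB")
    if sum(files.values()) > MAX_REQUEST_BYTES:
        problems.append("upload is larger than 100 MB in total; split it into smaller batches")
    return problems


if __name__ == "__main__":
    print(check_upload({"redis-runbook.md": 2 * MB, "topology.pdf": 500_000}))
```

An empty list means the batch is within the stated limits. The portal still validates files during upload.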
+
+### Upload documents
+
+1. Go to **Settings** > **Knowledge Base**.
+1. Select **Add file** or drag and drop files into the upload area.
+
+    The portal automatically validates, uploads, and indexes files.
+
+### Manage documents
+
+- **View**: Go to **Settings** > **Knowledge Base** to see all uploaded documents.
+
+- **Update**: To overwrite the previous version, upload a file with the same name.
+
+- **Delete**: Select documents and use the delete action. Changes take effect immediately.
+
+## Session insights
+
+As the agent handles your incidents, it learns. Session insights capture what worked, what didn't, and key learnings from each session. The agent automatically applies that knowledge to help with similar issues in the future.
+
+### Automatic improvement
+
+The agent learns from every session without any manual effort:
+
+- The agent handles an issue autonomously or works with you directly.
+- The agent captures symptoms, resolution steps, root cause, and pitfalls.
+- These insights become searchable memories.
+- Future sessions automatically retrieve relevant past insights.
+
+The result: the agent gets better over time, suggesting proven resolutions and avoiding known pitfalls.
+
+### Discover opportunities
+
+While session insights work automatically, reviewing them can surface valuable patterns you might want to act on.
+
+| Pattern you might discover | Potential action |
+|---------------------------|------------------|
+| Same issue keeps recurring | Fix the underlying code or configuration |
+| Agent lacks context about your service | Create a custom subagent with domain knowledge |
+| Troubleshooting steps aren't documented | Update or create a runbook |
+| Telemetry gaps made diagnosis harder | Improve logging or add metrics |
+| Alert triggered but wasn't actionable | Tune the alert or add runbook links |
+
+Think of session insights as a window into what the agent learns. You might find something worth acting on, or you might just let the agent handle any surfaced issues.
+
+### How it works
+
+Session insights create a continuous improvement loop: the agent captures symptoms, steps, root cause, and pitfalls from each session, and then retrieves relevant past insights when similar issues arise. This automatic cycle helps the agent resolve problems faster over time.
+
+<!--
+```mermaid
+flowchart TD
+    subgraph Loop["Automatic Learning Loop"]
+        A["Issue arises<br/>Incident, alert, or question"] -- > B["Agent captures insight<br/>symptoms, steps, root cause,<br/>pitfalls, learnings"]
+        B -- > C["Insight indexed<br/>Becomes searchable memory"]
+        C -- > D["Future sessions benefit<br/>Agent retrieves relevant insights"]
+        D -.- >|Similar issue arises| A
+    end
+    
+    Loop -- > E["Automatic: Agent improves over time"]
+    Loop -- > F["Optional: Review insights for<br/>code/telemetry/runbook opportunities"]
+```
+-->
+
+:::image type="content" source="media/memory-system/azure-sre-agent-memory-system-loop.png" alt-text="Diagram of Azure SRE Agent memory system loop.":::
+
+### What the agent captures
+
+The agent captures several kinds of information from each session to improve future troubleshooting.
+
+| Captured | How the agent uses it |
+|----------|----------------------|
+| **Symptoms observed** | Recognizes similar patterns in future problems |
+| **Steps that worked** | Suggests proven resolution paths |
+| **Root cause found** | Jumps to likely causes faster |
+| **Pitfalls encountered** | Avoids repeating mistakes |
+| **Context you provided** | Remembers facts about your environment |
+| **Resources involved** | Connects past problems on the same resources |
+
+### When insights are generated
+
+The system generates insights automatically after conversations finish, or you can request them on-demand.
+
+- **Automatically**: After conversations finish (runs periodically, approximately every 30 minutes)
+- **On-demand**: Select **Generate Session insights** in the chat footer for immediate results (about 30 seconds)
+
+### Browse insights
+
+Go to **Settings** > **Session insights** to see what the agent learned:
+
+- **Total count** in the header
+- **List of insights** with session title and timestamp
+- **Detail view** with expandable Timeline and Agent Performance sections
+- **Go to Thread** to revisit the original conversation
+
+> [!NOTE]
+> Periodically reviewing insights can surface recurring patterns worth addressing, but the agent benefits from these insights whether or not you review them.
+
+### Insight structure
+
+Each insight includes:
+
+- **Timeline**: Chronological milestones of the troubleshooting session (up to eight)
+- **Agent Performance**: What went well, areas for improvement, and key learnings
+- **Investigation quality score**: 1-5 rating for investigation completeness
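
This article lists the parts of an insight but doesn't publish a schema. As a hypothetical way to picture that structure, this Python sketch models an insight with dataclasses and enforces the two stated constraints: at most eight timeline milestones and a score from 1 to 5. The class and field names are illustrative, not the agent's actual data model.

```python
from dataclasses import dataclass, field

MAX_TIMELINE_MILESTONES = 8  # "up to eight", per this article


@dataclass
class AgentPerformance:
    # Hypothetical field names for the three parts this article lists.
    went_well: list[str] = field(default_factory=list)
    areas_for_improvement: list[str] = field(default_factory=list)
    key_learnings: list[str] = field(default_factory=list)


@dataclass
class SessionInsight:
    timeline: list[str]          # chronological milestones of the session
    performance: AgentPerformance
    investigation_quality: int   # 1-5 rating of investigation completeness

    def __post_init__(self) -> None:
        # Reject values outside the limits stated in this article.
        if len(self.timeline) > MAX_TIMELINE_MILESTONES:
            raise ValueError("timeline holds at most eight milestones")
        if not 1 <= self.investigation_quality <= 5:
            raise ValueError("investigation quality score must be from 1 to 5")


if __name__ == "__main__":
    insight = SessionInsight(
        timeline=["Alert fired", "Checked Redis cache health", "Restarted cache node"],
        performance=AgentPerformance(key_learnings=["Check Redis first for latency"]),
        investigation_quality=4,
    )
    print(insight.investigation_quality)
```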
+
+## Related content
+
+- [Documentation connector](./documentation-connector.md)
diff --git a/articles/sre-agent/toc.yml b/articles/sre-agent/toc.yml
@@ -19,6 +19,8 @@ items:
     items:
       - name: Connectors overview
         href: connectors.md
+      - name: Documentation connector
+        href: documentation-connector.md
       - name: Connect to custom MCP server
         href: custom-mcp-server.md
   - name: Build custom subagents
@@ -33,6 +35,8 @@ items:
     href: code-repository-connect.md
   - name: Scheduled tasks
     href: scheduled-tasks.md
+  - name: Knowledge retention
+    href: memory-system.md
   - name: Incident management
     items:
       - name: Overview