Commit 0f54ab7

Merge pull request #314796 from mukesh-dua/release-microsoft-discovery
Add overview of key scenarios and use cases for Microsoft Discovery in scientific R&D. Adding Observability articles to help with troubleshooting issues in environment
2 parents 76255ea + 8a49867 commit 0f54ab7

20 files changed

Lines changed: 2831 additions & 285 deletions

articles/microsoft-discovery/concept-network-security.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,16 +15,16 @@ Microsoft Discovery provides two layers of network security to protect your work
1515

1616
| Layer | What it protects | How it works |
1717
|-------|-----------------|--------------|
18-
| **Network hardening** | Managed resources databases, storage, AI services, and other backend services | Network Security Perimeters (NSP) and private endpoints of MRG resources restrict access to authorized Discovery components only |
18+
| **Network hardening** | Managed resources like databases, storage, AI services, and other backend services | Network Security Perimeters (NSP) and private endpoints of Managed Resource Groups (MRG) resources restrict access to authorized Discovery components only |
1919
| **Private endpoints** | Workspace and bookshelf data-plane APIs | Azure Private Link routes API traffic through the Azure backbone, eliminating public internet exposure |
2020

2121
Network hardening is enabled by default for all workspaces and bookshelves managed with the `2026-02-01-preview` API version and later. Private endpoints for data-plane access are optional and can be configured separately.
2222

2323
## Why network security matters
2424

25-
When you create a Microsoft Discovery workspace or bookshelf, the service provisions managed resources (databases, storage accounts, AI services) on your behalf. During Private Preview, these resources had public endpoints and data-plane API traffic traversed the public internet.
25+
When you create a Microsoft Discovery workspace or bookshelf, the service provisions managed resources (databases, storage accounts, AI services) on your behalf. During early Preview, these resources had public endpoints and data-plane API traffic traversed the public internet.
2626

27-
With network hardening enabled by default, all managed resources are now protected automatically. Enabling private endpoints for data-plane access provides additional security:
27+
With network hardening enabled by default, all managed resources are now protected automatically. Enabling private endpoints for data-plane access provides extra security:
2828

2929
- **Data protection** - All traffic stays on the Azure backbone network, never traversing the public internet.
3030
- **Compliance** - Meet regulatory requirements for network hardening and private connectivity.
@@ -33,15 +33,15 @@ With network hardening enabled by default, all managed resources are now protect
3333

3434
## Before and after comparison
3535

36-
### Before: Private Preview (without network hardening)
36+
### Before: Early Preview (without network hardening)
3737

3838
:::image type="content" source="media/concept-network-security/before-network-isolation.jpg" alt-text="Diagram showing deployment without network hardening where traffic flows over public internet." lightbox="media/concept-network-security/before-network-isolation.jpg":::
3939

4040
### After: Public Preview (with network hardening)
4141

4242
:::image type="content" source="media/concept-network-security/after-network-isolation.jpg" alt-text="Diagram showing network-hardened deployment with private endpoints where traffic stays on Azure backbone." lightbox="media/concept-network-security/after-network-isolation.jpg":::
4343

44-
| Aspect | Without network hardening (Private Preview) | With network hardening (default) |
44+
| Aspect | Without network hardening (Early Preview) | With network hardening (default) |
4545
|--------|----------------------------------------------|----------------------------------|
4646
| Managed resources | Public endpoints | Locked behind NSP + private endpoints |
4747
| Data-plane traffic | Public internet | Azure backbone through Private Link |
@@ -102,11 +102,11 @@ Discovery resources support autoapproval for private endpoints created within th
102102
- Each Discovery resource (workspace, bookshelf, supercomputer) requires its own unique, non-overlapping subnets. Subnets can't be shared across different Discovery resource instances.
103103
- The supercomputer's AKS API server has a public FQDN. Workload traffic stays within the virtual network, but the Kubernetes API server endpoint is publicly accessible. Private cluster support is planned for a future release.
104104
- Managed resources that don't support NSP are protected through virtual network injection or delegated subnets instead.
105-
- Network hardening is supported in these regions: **East US**, **East US 2**, **UK South**, and **Sweden Central**.
105+
- Network hardening is supported in these regions: **East US**, **UK South**, and **Sweden Central**.
106106

107107
## Next steps
108108

109-
- [Configure network security](how-to-configure-network-security.md) Assign roles, configure subnets, and create private endpoints.
110-
- [End-to-end network-hardened deployment](how-to-deploy-network-hardened-stack.md) Deploy a fully network-isolated Discovery stack.
109+
- [Configure network security](how-to-configure-network-security.md) - Assign roles, configure subnets, and create private endpoints.
110+
- [End-to-end network-hardened deployment](how-to-deploy-network-hardened-stack.md) - Deploy a fully network-isolated Discovery stack.
111111
- [What is Azure Private Link?](/azure/private-link/private-link-overview)
112112
- [What is a Network Security Perimeter?](/azure/private-link/network-security-perimeter-concepts)
Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
---
2+
title: Observability in Microsoft Discovery
3+
description: Learn about the observability capabilities available in Microsoft Discovery, including application logs stored in Managed Resource Group Log Analytics workspaces, Azure activity logs for control plane operations, and correlation ID tracing.
4+
author: mukesh-dua
5+
ms.author: mukeshdua
6+
ms.service: azure
7+
ms.topic: concept-article
8+
ms.date: 04/15/2026
9+
---
10+
11+
# Observability in Microsoft Discovery
12+
13+
Microsoft Discovery integrates with Azure Monitor to provide comprehensive observability across all platform resources. You can monitor and troubleshoot workspaces, supercomputers, and bookshelves by querying application logs in dedicated Log Analytics workspaces and reviewing activity logs for control plane operations.
14+
15+
## Log types
16+
17+
Microsoft Discovery surfaces three categories of logs:
18+
19+
| Log type | Source | Configuration | Purpose |
20+
|---|---|---|---|
21+
| **Application logs** | Log Analytics workspace in Managed Resource Group | Automatic (no setup required) | Trace agent execution, tool runs, indexing jobs, and query operations within the platform |
22+
| **Activity logs** | Azure Monitor | Automatic (available by default in Azure Monitor) | Audit and trace control plane operations on Discovery resources (create, update, delete) |
23+
| **Audit logs** | Azure Storage Account or Log Analytics workspace (customer-owned) | Customer-configurable via diagnostic settings | Archive or query platform and resource-level logs for compliance, security auditing, and long-term retention |
24+
25+
## Managed Resource Group and Log Analytics workspaces
26+
27+
When you create a Microsoft Discovery resource such as a workspace, supercomputer, or bookshelf, the service provisions a **Managed Resource Group (MRG)** alongside the resource in your subscription. The MRG contains the managed infrastructure required to operate the resource, including a dedicated **Log Analytics workspace** that automatically collects application logs.
28+
29+
Logs flow into the MRG Log Analytics workspace without any additional configuration. Each Discovery resource (a workspace, supercomputer, or bookshelf) has its own MRG and its own Log Analytics workspace, which keeps log data isolated and scoped to that resource.
30+
31+
> [!NOTE]
32+
> The MRG is managed by Microsoft. You can read data from its Log Analytics workspace, but you shouldn't modify or delete resources within the MRG.
33+
34+
## Application log tables by resource
35+
36+
The following tables describe the log tables available in each resource's MRG Log Analytics workspace.
37+
38+
### Workspace
39+
40+
| Table | Description |
41+
|---|---|
42+
| `DiscoveryLogs_CL` | Execution traces for agent invocations, tool calls, workflow steps, and error diagnostics for investigations run within the workspace |
43+
| `DiscoveryCogLoopLogs` | AI orchestration logs from the CogLoop engine, including Act and Cognition sub-loop activity, investigation progress, decisions, and error diagnostics |
44+
45+
### Supercomputer
46+
47+
| Table | Description |
48+
|---|---|
49+
| `KubePodInventory_CL` | Tracks pod creation, placement, and lifecycle state across the cluster |
50+
| `KubeEvents_CL` | Captures Kubernetes events such as scheduling failures, volume mount errors, and pod state transitions |
51+
| `Syslog_CL` | Records OS-level signals related to node health and underlying infrastructure issues |
52+
| `SystemContainerLogs_CL` | Logs from platform-managed controllers and orchestration services |
53+
| `DiscoveryToolLogs_CL` | Application logs from tool containers, including execution output and failures |
54+
| `DiscoveryBookshelfLogs_CL` | Logs from bookshelf indexing jobs running on the supercomputer |
55+
56+
### Bookshelf
57+
58+
| Table | Description |
59+
|---|---|
60+
| `DiscoveryLogs_CL` | Logs from the knowledgebase query container, including query execution traces and error diagnostics |
61+
| `AzureDiagnostics` | Diagnostic logs from the Azure AI Search service associated with the bookshelf, including indexing operations, query operations, and service-level diagnostics |
62+
63+
> [!NOTE]
64+
> Bookshelf indexing jobs run on the supercomputer, so indexing logs appear in the supercomputer's MRG under `DiscoveryBookshelfLogs_CL`, not the bookshelf's MRG.
65+
66+
The `AzureDiagnostics` table is populated by the Azure AI Search resource provisioned in the bookshelf's MRG.
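
As a quick reference, the table layout above can be captured in a small lookup. This is only an illustrative sketch: the table names come from the tables above, while the dictionary keys and the `find_mrgs_for_table` helper are hypothetical names introduced here, not part of any Discovery SDK.

```python
# Table names are copied from the tables above. The keys ("workspace", etc.)
# and the helper function are hypothetical, chosen only for illustration.
LOG_TABLES = {
    "workspace": ["DiscoveryLogs_CL", "DiscoveryCogLoopLogs"],
    "supercomputer": [
        "KubePodInventory_CL",
        "KubeEvents_CL",
        "Syslog_CL",
        "SystemContainerLogs_CL",
        "DiscoveryToolLogs_CL",
        "DiscoveryBookshelfLogs_CL",
    ],
    "bookshelf": ["DiscoveryLogs_CL", "AzureDiagnostics"],
}


def find_mrgs_for_table(table: str) -> list[str]:
    """Return the resource types whose MRG Log Analytics workspace holds `table`."""
    return sorted(
        resource_type
        for resource_type, tables in LOG_TABLES.items()
        if table in tables
    )
```

For example, looking up `DiscoveryBookshelfLogs_CL` returns only the supercomputer, which matches the note above about where bookshelf indexing logs land.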
67+
68+
## Activity logs
69+
70+
Azure Activity Logs record all **control plane operations** performed on Microsoft Discovery resources. These are write and delete operations initiated by users or automated processes through the Azure Resource Manager (ARM) API—for example, creating a workspace, updating a supercomputer configuration, or deleting a bookshelf.
71+
72+
Activity logs are available in Azure Monitor and are retained for **90 days** by default. You can extend retention by exporting activity logs to a Log Analytics workspace or storage account via diagnostic settings.
73+
74+
Common Discovery control plane operations visible in activity logs include:
75+
76+
| Operation | Description |
77+
|---|---|
78+
| `Microsoft.Discovery/workspaces/write` | Create or update a workspace |
79+
| `Microsoft.Discovery/workspaces/delete` | Delete a workspace |
80+
| `Microsoft.Discovery/supercomputers/write` | Create or update a supercomputer |
81+
| `Microsoft.Discovery/supercomputers/delete` | Delete a supercomputer |
82+
| `Microsoft.Discovery/bookshelves/write` | Create or update a bookshelf |
83+
| `Microsoft.Discovery/bookshelves/delete` | Delete a bookshelf |
84+
85+
For instructions on accessing activity logs, see [View activity logs for Microsoft Discovery resources](how-to-view-activity-logs.md).
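
When you scan exported activity log entries, it can help to split an operation name into its resource type and action. The following is a minimal sketch that assumes operation names follow the three-part `Microsoft.Discovery/<resourceType>/<action>` shape shown in the table above. The `parse_operation` helper is a hypothetical name, not an Azure API.

```python
# Hypothetical helper: splits operation names of the form shown in the table
# above. Only the two actions listed there (write, delete) are recognized.
ACTION_DESCRIPTIONS = {
    "write": "create or update",
    "delete": "delete",
}


def parse_operation(operation_name: str) -> tuple[str, str]:
    """Return (resource_type, action_description) for a Discovery operation."""
    parts = operation_name.split("/")
    if len(parts) != 3 or parts[0].lower() != "microsoft.discovery":
        raise ValueError(f"Not a Discovery control plane operation: {operation_name!r}")
    _, resource_type, action = parts
    if action not in ACTION_DESCRIPTIONS:
        raise ValueError(f"Unrecognized action: {action!r}")
    return resource_type, ACTION_DESCRIPTIONS[action]
```

Operations outside the table raise a `ValueError` rather than being guessed at, so unexpected entries surface instead of being silently misclassified.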
86+
87+
## Audit logs
88+
89+
Audit logs are customer-configurable logs that you export from Microsoft Discovery resources to an Azure Storage account or a Log Analytics workspace in your own subscription. Unlike application logs (which are automatically collected in the MRG) and activity logs (which are always available in Azure Monitor), audit logs must be explicitly enabled through Azure Monitor **diagnostic settings**.
90+
91+
You can choose to export all available log categories or audit logs only. Audit logs are well suited for:
92+
93+
- **Compliance requirements** - Retain resource-level platform logs for a defined period to satisfy regulatory or organizational policies.
94+
- **Security auditing** - Store logs in a destination you control to support security reviews and incident investigations.
95+
- **Long-term retention** - Extend log retention beyond the 90-day default of Azure Monitor activity logs.
96+
97+
Audit logging is supported on the following Discovery resource types:
98+
99+
| Resource type | Resource provider path |
100+
|---|---|
101+
| Workspace | `Microsoft.Discovery/workspaces` |
102+
| Bookshelf | `Microsoft.Discovery/bookshelves` |
103+
| Supercomputer | `Microsoft.Discovery/supercomputers` |
104+
105+
For step-by-step setup instructions, see [Enable audit logging for Microsoft Discovery resources](how-to-enable-audit-logging.md).
106+
107+
## Correlation IDs
108+
109+
When an operation is performed in Discovery Studio, such as running an investigation, the platform assigns a **correlation ID** that uniquely identifies the request. This ID is propagated across all system components involved in processing that request, and every related log entry in the workspace's `DiscoveryLogs_CL` table includes the correlation ID in the `CorrelationId` field.
110+
111+
Using a correlation ID, you can trace the complete end-to-end flow of a request: when it started, which agents and tools were invoked, any errors that occurred, and when it completed.
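
The tracing described above amounts to filtering `DiscoveryLogs_CL` on `CorrelationId` and sorting by time. The sketch below builds such a KQL query as a string. `build_trace_query` is a hypothetical helper, and the character check is an assumption made here to keep the ID safe inside a quoted KQL literal. `TimeGenerated` is the standard timestamp column in Log Analytics tables.

```python
import re

# Assumption: correlation IDs contain only letters, digits, and hyphens.
# This check keeps the value safe to embed in a quoted KQL string literal.
_SAFE_ID = re.compile(r"^[A-Za-z0-9-]+$")


def build_trace_query(correlation_id: str) -> str:
    """Build a KQL query that lists every log entry for one request, oldest first."""
    if not _SAFE_ID.match(correlation_id):
        raise ValueError(f"Unexpected characters in correlation ID: {correlation_id!r}")
    return (
        "DiscoveryLogs_CL\n"
        f'| where CorrelationId == "{correlation_id}"\n'
        "| order by TimeGenerated asc"
    )
```

You could paste the resulting query into the Logs blade of the workspace's MRG Log Analytics workspace, or pass it to whatever query client you already use.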
112+
113+
Correlation IDs are returned in the **`X-Ms-Correlation-Request-Id`** response header of API calls made by Discovery Studio. You can retrieve them from the browser's developer tools Network tab.
114+
115+
For a step-by-step walkthrough of how to locate and use a correlation ID, see [Query workspace logs](how-to-query-workspace-logs.md).
116+
117+
For control plane issues, check the activity logs for the correlation ID associated with the failed operation. Include this correlation ID when you open a support request with Microsoft Support to help expedite the investigation.
118+
119+
## Log ingestion latency
120+
121+
Logs are typically available in Log Analytics within a few seconds of the event. In rare cases, ingestion can be delayed by up to five minutes. When troubleshooting recent activity, if you don't see the expected log entries, wait a few minutes and rerun your query.
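
If you script your troubleshooting, one way to allow for ingestion delay is to rerun the query until rows appear or a timeout passes. The following is a sketch, not a prescribed approach: `wait_for_logs` and its parameters are hypothetical, `run_query` stands in for whatever function you use to run a Log Analytics query, and the 300-second default mirrors the five-minute upper bound mentioned above.

```python
import time
from typing import Callable


def wait_for_logs(
    run_query: Callable[[], list],
    timeout_s: float = 300.0,   # mirrors the five-minute worst case noted above
    interval_s: float = 15.0,   # assumed polling interval; tune as needed
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Rerun `run_query` until it returns rows or `timeout_s` elapses.

    Returns the rows from the first non-empty result, or an empty list
    if nothing arrived before the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        rows = run_query()
        if rows:
            return rows
        if clock() >= deadline:
            return []
        sleep(interval_s)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised in tests without real waiting.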
122+
123+
## Next steps
124+
125+
- [Access resource logs for Microsoft Discovery resources](how-to-access-resource-logs.md)
126+
- [Query workspace logs](how-to-query-workspace-logs.md)
127+
- [Query CogLoop logs](how-to-query-cognitive-loop-logs.md)
128+
- [Query supercomputer logs](how-to-query-supercomputer-logs.md)
129+
- [Query bookshelf knowledgebase query logs](how-to-query-bookshelf-query-logs.md)
130+
- [Query bookshelf indexing logs](how-to-query-bookshelf-indexing-logs.md)
131+
- [View activity logs for Microsoft Discovery resources](how-to-view-activity-logs.md)
132+
- [Enable audit logging for Microsoft Discovery resources](how-to-enable-audit-logging.md)

articles/microsoft-discovery/concept-quota-reservation.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -62,9 +62,9 @@ To learn more, see [Request units in Azure Cosmos DB](/azure/cosmos-db/request-u
6262

6363
### Cosmos DB account quota behavior
6464

65-
- There is no per-subscription quota limit on RU/s.
65+
- There's no per-subscription quota limit on RU/s.
6666
- Throughput availability is managed per Cosmos DB account.
67-
- Discovery Platform manage the Cosmos DB, which uses throughput within the default assignment range.
67+
- Discovery Platform manages the Cosmos DB account, which uses throughput within the default assignment range.
6868
- If there's a quota issue due to region-level restrictions (for example, a high-demand region), [raise a support ticket](/azure/cosmos-db/nosql/create-support-request-quota-increase) to request the appropriate extension.
6969

7070
For more information, see [Azure Cosmos DB service quotas](/azure/cosmos-db/concepts-limits?source=recommendations).
@@ -96,7 +96,7 @@ Azure OpenAI and Azure AI Foundry quotas are essential for Microsoft Discovery's
9696

9797
### Model TPM requirements summary
9898

99-
The following table shows the total TPM required per model for a workspace with a single Bookshelf instance. For each additional Bookshelf, add the corresponding per-instance Bookshelf TPM values. For detailed per-service breakdowns, see [Chat completion and text embedding model quotas](#chat-completion-and-text-embedding-model-quotas).
99+
The following table shows the total TPM required per model for a workspace with a single Bookshelf instance. For each additional Bookshelf, add the corresponding per-instance Bookshelf TPM values. For detailed per-service breakdowns, see [Chat completion and text embedding model quotas](#chat-completion-and-text-embedding-model-quotas).
100100

101101
| Model | Total minimum TPM | Total recommended TPM | Contributing services |
102102
|---|---|---|---|
@@ -183,7 +183,7 @@ Bookshelf deploys always-on infrastructure for domain specific knowledge search
183183

184184
#### Scaling behavior
185185

186-
- **Indexing (Text Embedding)**—Scales with dataset size. Large datasets across multiple Bookshelves might require millions of TPM. High embedding quota is generally easy to obtain.
186+
- **Indexing (Text Embedding)**—Scales with dataset size. Large datasets across multiple Bookshelves might require millions of TPM. High embedding quota is easy to obtain.
187187
- **Querying (GPT models)**—Independent of dataset size. Driven by concurrent users, search frequency, and relevance budget. Quota is shared at the subscription and region level across all Bookshelves.
188188
- **Fixed infrastructure**—Azure AI Search, Azure Container Apps dedicated profile, and Azure SQL DB are always-on resources created at Bookshelf deployment.
189189
- **Variable components**—Enrichment processing, embedding generation, and model inference scale with usage.
@@ -220,10 +220,10 @@ Before requesting quotas, verify regional availability:
220220

221221
```azurecli
222222
# Check VM quota availability by region
223-
az vm list-usage --location "eastus2" --query "[?contains(name.value, 'cores')]"
223+
az vm list-usage --location "swedencentral" --query "[?contains(name.value, 'cores')]"
224224
225225
# Check Azure OpenAI model availability
226-
az cognitiveservices model list --location "eastus2" --kind "OpenAI"
226+
az cognitiveservices model list --location "swedencentral" --kind "OpenAI"
227227
```
228228

229229
## Quota request best practices
