Commit ba88d80

author
Mukesh Dua
committed
Refactor CogLoop logs documentation: standardize bullet point formatting for consistency and clarity
1 parent 88bd779 commit ba88d80

1 file changed

Lines changed: 60 additions & 43 deletions

articles/microsoft-discovery/how-to-query-cognitive-loop-logs.md

@@ -14,10 +14,10 @@ ms.date: 04/15/2026
1414

1515
Microsoft Discovery CogLoop is the AI orchestration engine that drives investigation progress. Cognition Engine logs capture:
1616

17-
- **Instance lifecycle** Cognition Engine instance start, stop, and polling activity
18-
- **Reasoning decisions** Thinking module (fast/slow) and acting module tool selections
19-
- **Task management operations** Task execution, validation, status transitions, and agent assignments
20-
- **Error diagnostics** Serialization failures, Cosmos DB connectivity issues, loop errors, and tool call failures
17+
- **Instance lifecycle** - Cognition Engine instance start, stop, and polling activity
18+
- **Reasoning decisions** - Thinking module (fast/slow) and acting module tool selections
19+
- **Task management operations** - Task execution, validation, status transitions, and agent assignments
20+
- **Error diagnostics** - Serialization failures, Cosmos DB connectivity issues, loop errors, and tool call failures
2121

2222
It continuously runs two subloops, **Act** and **Cognition**, to plan and execute research tasks on your behalf. CogLoop logs are automatically stored in the `DiscoveryCogLoopLogs_CL` table in the Log Analytics workspace inside the workspace's Managed Resource Group (MRG).
2323

@@ -72,7 +72,7 @@ The `DiscoveryCogLoopLogs_CL` table includes the following key fields:
7272
| `InstanceId` | Cognition Engine instance identifier (format: `cog:<project>:<investigation>`) |
7373
| `ModuleName` | Reasoning module name (`Cognition` or `Act`) |
7474
| `ChosenTool` | The tool/function selected by the PickBest decision engine |
75-
| `ClassName` | Source class name (e.g., `CogLoopInstanceManager`, `CosmosDbService`) |
75+
| `ClassName` | Source class name (for example, `CogLoopInstanceManager`, `CosmosDbService`) |
7676
| `MethodName` | Source method name |
7777
| `Goal` | The reasoning prompt goal submitted to the PickBest engine |
7878
| `SleepTime` | Wait duration in seconds when the Cognition Engine decides to wait |
@@ -130,10 +130,10 @@ DiscoveryCogLoopLogs_CL
130130
Lists every Cognition Engine instance (investigation) managed by the service and classifies their activity into
131131
three operation types.
132132

133-
- **InstanceId** The investigation identifier (format: cog:`<project>:<investigation>`).
134-
- **StatusChecks** Routine polling checks. The service checks all known instances each cycle, so this count is typically similar across instances.
135-
- **Starts** How many times the instance was started. Instances with Starts > 0 were actively launched during the time window.
136-
- **Retrievals** How many times instance state was fetched for execution. Indicates the Cognition Engine actively engaged with this investigation.
133+
- **InstanceId** - The investigation identifier (format: `cog:<project>:<investigation>`).
134+
- **StatusChecks** - Routine polling checks. The service checks all known instances each cycle, so this count is typically similar across instances.
135+
- **Starts** - How many times the instance was started. Instances with Starts > 0 were actively launched during the time window.
136+
- **Retrievals** - How many times instance state was fetched for execution. Indicates the Cognition Engine actively engaged with this investigation.
137137

138138
```kql
139139
DiscoveryCogLoopLogs_CL
@@ -189,14 +189,11 @@ DiscoveryCogLoopLogs_CL
189189

190190
### Track Instance Startup
191191

192-
Retrieves the full chronological log trail for a specific Cognition Engine instance, showing every event from
193-
first appearance through its reasoning cycles. Replace `<your-instance-id>` with the target instance (e.g.,
194-
`cog:myproject:inv01-experiment-abc123`).
192+
Retrieves the full chronological log trail for a specific Cognition Engine instance, showing every event from first appearance through its reasoning cycles. Replace `<your-instance-id>` with the target instance (for example, `cog:myproject:inv01-experiment-abc123`).
195193

196-
- **TimeGenerated** — When the event occurred, sorted oldest-first to reconstruct the sequence of events.
197-
- **LogLevel** — Severity level, useful for spotting where errors or warnings interrupted the instance lifecycle.
198-
- **Message** — The log message content, showing startup steps, reasoning decisions, task operations, and any
199-
failures in order.
194+
- **TimeGenerated** - When the event occurred, sorted oldest-first to reconstruct the sequence of events.
195+
- **LogLevel** - Severity level, useful for spotting where errors or warnings interrupted the instance lifecycle.
196+
- **Message** - The log message content, showing startup steps, reasoning decisions, task operations, and any failures in order.
200197

201198
```kql
202199
DiscoveryCogLoopLogs_CL
@@ -252,11 +249,9 @@ DiscoveryCogLoopLogs_CL
252249

253250
If the Cognition Engine repeatedly selects `Cognition-Wait`, it can mean tasks are stalled, all work is already complete, or an internal error is preventing progress.
254251

255-
- **IdlePct** — Percentage of waits where nothing changed. Sustained 100%
256-
signals a stuck investigation.
257-
- **AvgSleepSec** — Short sleeps (30s) mean cognition expects progress soon;
258-
long sleeps (300s) mean it has stopped trying.
259-
- **SampleReason** — The LLM's own explanation
252+
- **IdlePct** - Percentage of waits where nothing changed. Sustained 100% signals a stuck investigation.
253+
- **AvgSleepSec** - Short sleeps (30s) mean cognition expects progress soon; long sleeps (300s) mean it has stopped trying.
254+
- **SampleReason** - The LLM's own explanation for choosing to wait.
260255

261256
```kql
262257
DiscoveryCogLoopLogs_CL
@@ -282,9 +277,9 @@ DiscoveryCogLoopLogs_CL
282277

283278
Trace the reasoning steps the Cognition Engine takes before acting. Slow thinking indicates complex deliberation; fast thinking indicates straightforward decisions.
284279

285-
- **TimeGenerated** When the thinking step completed.
286-
- **ThinkingType** `FastThinking` or `SlowThinking`, indicating the depth of reasoning applied.
287-
- **Message** The full thinking output, including the thought content and reasoning context.
280+
- **TimeGenerated** - When the thinking step completed.
281+
- **ThinkingType** - `FastThinking` or `SlowThinking`, indicating the depth of reasoning applied.
282+
- **Message** - The full thinking output, including the thought content and reasoning context.
288283

289284
```kql
290285
DiscoveryCogLoopLogs_CL
@@ -295,7 +290,7 @@ DiscoveryCogLoopLogs_CL
295290
| order by TimeGenerated desc
296291
```
297292

298-
> If `SlowThinking` entries dominate, the Cognition Engine is spending significant effort on complex decisions this may be expected for difficult investigations or could indicate unclear task definitions forcing repeated deep analysis.
293+
> If `SlowThinking` entries dominate, the Cognition Engine is spending significant effort on complex decisions. This may be expected for difficult investigations, or it could indicate unclear task definitions forcing repeated deep analysis.
299294
300295
## Task Management Operations
301296

@@ -312,7 +307,7 @@ DiscoveryCogLoopLogs_CL
312307
| order by TimeGenerated asc
313308
```
314309

315-
> This is the primary query for debugging a specific task it shows exactly how the Cognition Engine handled the task from creation to completion (or failure), making it easy to pinpoint where and why a task stalled or failed.
310+
> This is the primary query for debugging a specific task. It shows exactly how the Cognition Engine handled the task from creation to completion (or failure), making it easy to pinpoint where and why a task stalled or failed.
316311
317312
### View Task Validation Results
318313

@@ -330,7 +325,7 @@ DiscoveryCogLoopLogs_CL
330325
331326
### View TaskValidationAgent Lifecycle
332327

333-
Traces the full lifecycle of the TaskValidationAgent from provisioning and upsert through invocation and completion. Shows whether the validation agent was successfully created and is being used by the Cognition Engine.
328+
Traces the full lifecycle of the TaskValidationAgent from provisioning and upsert through invocation and completion. Shows whether the validation agent was successfully created and is being used by the Cognition Engine.
334329

335330
```kql
336331
DiscoveryCogLoopLogs_CL
@@ -340,7 +335,7 @@ DiscoveryCogLoopLogs_CL
340335
| order by TimeGenerated asc
341336
```
342337

343-
> If no entries appear, the TaskValidationAgent was never provisioned tasks will not be validated. If entries show errors during upsert or invocation, check that the required model deployment (e.g., `gpt-5-2`) is available in the workspace.
338+
> If no entries appear, the TaskValidationAgent was never provisioned, so tasks will not be validated. If entries show errors during upsert or invocation, check that the required model deployment (for example, `gpt-5-2`) is available in the workspace.
344339
345340
## Error Diagnostics
346341

@@ -360,9 +355,8 @@ DiscoveryCogLoopLogs_CL
360355

361356
Summarize errors by message to identify the most frequent failure modes.
362357

363-
- **ErrorMessage** — The first 80 characters of the error message, used as a grouping key to cluster similar
364-
errors together.
365-
- **ErrorCount** — How many times each error occurred. The highest counts point to the most impactful issue.
358+
- **ErrorMessage** - The first 80 characters of the error message, used as a grouping key to cluster similar errors together.
359+
- **ErrorCount** - How many times each error occurred. The highest counts point to the most impactful issue.
366360

367361
```kql
368362
DiscoveryCogLoopLogs_CL
@@ -372,7 +366,7 @@ DiscoveryCogLoopLogs_CL
372366
| order by ErrorCount desc
373367
```
374368

375-
> If one error type vastly outnumbers the rest, start your troubleshooting there it is likely the root cause. For example, a high count of `JsonException` serialization errors typically cascades into Cosmos DB health failures, polling cycle errors, and tool call failures downstream.
369+
> If one error type vastly outnumbers the rest, start your troubleshooting there; it is likely the root cause. For example, a high count of `JsonException` serialization errors typically cascades into Cosmos DB health failures, polling cycle errors, and tool call failures downstream.
376370
377371
### Detect Cosmos DB Connectivity Issues
378372

@@ -391,12 +385,9 @@ DiscoveryCogLoopLogs_CL
391385

392386
Isolates JSON serialization errors and groups them by message, including a sample stack trace for each. These errors typically prevent the Cognition Engine from loading instance state from Cosmos DB, blocking all reasoning activity.
393387

394-
- **ErrorMessage** — The first 80 characters of the error message, grouping related serialization failures
395-
together.
396-
- **Count** — How many times each serialization error occurred. High counts confirm this is a systemic issue
397-
rather than a one-off.
398-
- **SampleException** — A full exception with stack trace, showing the exact JSON path and property that failed
399-
to deserialize (e.g., `AuthorRole`).
388+
- **ErrorMessage** - The first 80 characters of the error message, grouping related serialization failures together.
389+
- **Count** - How many times each serialization error occurred. High counts confirm it's a systemic issue rather than a one-off.
390+
- **SampleException** - A full exception with stack trace, showing the exact JSON path and property that failed to deserialize (for example, `AuthorRole`).
400391

401392
```kql
402393
DiscoveryCogLoopLogs_CL
@@ -410,7 +401,7 @@ DiscoveryCogLoopLogs_CL
410401
| order by Count desc
411402
```
412403

413-
> Serialization errors are often the root cause behind cascading failures. When instance state cannot be deserialized, it triggers downstream errors: Cosmos DB health check failures, polling cycle errors, and tool call failures. Use the `SampleException` to identify the specific schema mismatch this typically happens after a service upgrade that changes model schemas.
404+
> Serialization errors are often the root cause behind cascading failures. When instance state cannot be deserialized, it triggers downstream errors: Cosmos DB health check failures, polling cycle errors, and tool call failures. Use the `SampleException` to identify the specific schema mismatch. This typically happens after a service upgrade that changes model schemas.
414405
415406
### Detect Polling Cycle Failures
416407

@@ -426,7 +417,7 @@ DiscoveryCogLoopLogs_CL
426417

427418
### Error Timeline for Incident Investigation
428419

429-
Correlate errors over time to identify when an incident started and whether it is ongoing.
420+
Correlate errors over time to identify when an incident started and whether it's ongoing.
430421

431422
```kql
432423
DiscoveryCogLoopLogs_CL
@@ -459,7 +450,7 @@ DiscoveryCogLoopLogs_CL
459450

460451
### Monitor Instance Retrieval Errors
461452

462-
Repeated failures in instance retrieval indicate the service cannot load investigation state from Cosmos DB.
453+
Repeated failures in instance retrieval indicate the service can't load investigation state from Cosmos DB.
463454

464455
```kql
465456
DiscoveryCogLoopLogs_CL
@@ -494,7 +485,7 @@ DiscoveryCogLoopLogs_CL
494485
| order by TimeGenerated desc
495486
```
496487

497-
The `reasoning` field explains exactly why CogLoop chose to wait or act, this is the most useful field for understanding investigation stalls.
488+
The `reasoning` field explains exactly why CogLoop chose to wait or act; it's the most useful field for understanding investigation stalls.
498489

499490
### Detect Act and Cognition subloop errors
500491

@@ -539,7 +530,7 @@ DiscoveryCogLoopLogs_CL
539530
540531
### Detect circuit breaker events
541532

542-
A circuit breaker opening means that repeated LLM API call failures have caused CogLoop to pause sending requests temporarily to prevent cascading failures.
533+
A circuit breaker opening means that repeated LLM API call failures caused CogLoop to pause sending requests temporarily to prevent cascading failures.
543534

544535
```kql
545536
DiscoveryCogLoopLogs_CL
@@ -570,6 +561,32 @@ DiscoveryCogLoopLogs_CL
570561
| CogLoop is waiting for a running tool task to complete | Check Supercomputer logs for the associated job. See [Query supercomputer logs](how-to-query-supercomputer-logs.md) |
571562
| Context window saturation reset failed | Contact your Discovery administrator |
572563

564+
### Serialization Errors Blocking All Instances
565+
566+
**Possible Causes:**
567+
568+
- A Cognition Engine instance has working memory state that can't be deserialized (for example, after a service upgrade that changes model schemas)
569+
- Corrupted instance data in Cosmos DB
570+
571+
**Resolution:**
572+
573+
1. Run the [Detect Serialization Errors](#detect-serialization-errors-jsonexception) query to confirm the error pattern
574+
2. Look at the `Exception` field for the specific JSON path and property that fails
575+
3. Escalate to the service team with the instance ID and exception details
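
The resolution steps above can be sketched as a single query that collects the details needed for escalation. This is a minimal sketch, not the article's own query: the column names come from the field descriptions in this article, while the 24-hour window, the `"Error"` value for `LogLevel`, and matching on the text `JsonException` are assumptions to adjust for your workspace.

```kql
// Sketch only: the time window, the "Error" level value, and the
// "JsonException" match text are assumptions, not confirmed by the article.
DiscoveryCogLoopLogs_CL
| where TimeGenerated > ago(24h)
| where LogLevel == "Error"
| where Exception has "JsonException" or Message has "JsonException"
// One row per affected instance, with a representative exception to share.
| summarize Count = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated),
            SampleException = take_any(Exception)
    by InstanceId
| order by Count desc
```

Grouping by `InstanceId` keeps the output small enough to paste into an escalation, and `FirstSeen` helps line the failures up against a recent service upgrade.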
576+
577+
### Tasks Not Being Validated
578+
579+
**Possible Causes:**
580+
581+
- TaskValidationAgent isn't deployed in the workspace
582+
- Model deployment required for validation (`gpt-5-2`) isn't available
583+
584+
**Resolution:**
585+
586+
1. Run the [View TaskValidationAgent Lifecycle](#view-taskvalidationagent-lifecycle) query to check if the agent was created
587+
2. Look for upsert failures or provisioning errors
588+
3. Verify the required model deployment exists in the workspace
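
As a quick first check before working through the steps above, a query along the following lines can show whether any TaskValidationAgent activity was logged at all. It is a hedged sketch: it assumes the agent name appears in `Message` or `ClassName`, and that errors are logged with a `LogLevel` of `"Error"`; neither is confirmed by this article.

```kql
// Sketch only: assumes "TaskValidationAgent" appears in Message or ClassName
// and that failures are logged with LogLevel == "Error".
DiscoveryCogLoopLogs_CL
| where TimeGenerated > ago(24h)
| where Message has "TaskValidationAgent" or ClassName has "TaskValidationAgent"
// Count total events and error events per source class and method.
| summarize Events = count(),
            Errors = countif(LogLevel == "Error"),
            LastSeen = max(TimeGenerated)
    by ClassName, MethodName
| order by Errors desc, Events desc
```

An empty result suggests the agent was never provisioned, while rows with a nonzero `Errors` count point to the class and method where upsert or invocation is failing.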
589+
573590
### Query timeout or slow performance
574591

575592
| Cause | Resolution |
