articles/microsoft-discovery/how-to-query-cognitive-loop-logs.md
Lines changed: 60 additions & 43 deletions
@@ -14,10 +14,10 @@ ms.date: 04/15/2026
14
14
15
15
Microsoft Discovery CogLoop is the AI orchestration engine that drives investigation progress. Cognition Engine logs capture:
16
16
17
-
-**Instance lifecycle**— Cognition Engine instance start, stop, and polling activity
18
-
-**Reasoning decisions**— Thinking module (fast/slow) and acting module tool selections
19
-
-**Task management operations**— Task execution, validation, status transitions, and agent assignments
20
-
-**Error diagnostics**— Serialization failures, Cosmos DB connectivity issues, loop errors, and tool call failures
17
+
- **Instance lifecycle** - Cognition Engine instance start, stop, and polling activity
18
+
- **Reasoning decisions** - Thinking module (fast/slow) and acting module tool selections
19
+
- **Task management operations** - Task execution, validation, status transitions, and agent assignments
20
+
- **Error diagnostics** - Serialization failures, Cosmos DB connectivity issues, loop errors, and tool call failures
21
21
22
22
It continuously runs two subloops, **Act** and **Cognition**, to plan and execute research tasks on your behalf. CogLoop logs are automatically stored in the `DiscoveryCogLoopLogs_CL` table in the Log Analytics workspace inside the workspace's Managed Resource Group (MRG).
23
23
@@ -72,7 +72,7 @@ The `DiscoveryCogLoopLogs_CL` table includes the following key fields:
|`ModuleName`| Reasoning module name (`Cognition` or `Act`) |
74
74
|`ChosenTool`| The tool/function selected by the PickBest decision engine |
75
-
|`ClassName`| Source class name (e.g., `CogLoopInstanceManager`, `CosmosDbService`) |
75
+
|`ClassName`| Source class name (for example, `CogLoopInstanceManager`, `CosmosDbService`) |
76
76
|`MethodName`| Source method name |
77
77
|`Goal`| The reasoning prompt goal submitted to the PickBest engine |
78
78
|`SleepTime`| Wait duration in seconds when the Cognition Engine decides to wait |
@@ -130,10 +130,10 @@ DiscoveryCogLoopLogs_CL
130
130
Lists every Cognition Engine instance (investigation) managed by the service and classifies their activity into
131
131
three operation types.
132
132
133
-
-**InstanceId**— The investigation identifier (format: cog:`<project>:<investigation>`).
134
-
-**StatusChecks**— Routine polling checks. The service checks all known instances each cycle, so this count is typically similar across instances.
135
-
-**Starts**— How many times the instance was started. Instances with Starts > 0 were actively launched during the time window.
136
-
-**Retrievals**— How many times instance state was fetched for execution. Indicates the Cognition Engine actively engaged with this investigation.
133
+
- **InstanceId** - The investigation identifier (format: `cog:<project>:<investigation>`).
134
+
- **StatusChecks** - Routine polling checks. The service checks all known instances each cycle, so this count is typically similar across instances.
135
+
- **Starts** - How many times the instance was started. Instances with Starts > 0 were actively launched during the time window.
136
+
- **Retrievals** - How many times instance state was fetched for execution. Indicates the Cognition Engine actively engaged with this investigation.
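
When query results are exported for further processing, the `InstanceId` value can be split back into its parts. The following is a minimal Python sketch, assuming the identifier always follows the `cog:<project>:<investigation>` format described above and that the project segment contains no colon. `InstanceRef` and `parse_instance_id` are hypothetical helper names, not part of Microsoft Discovery.

```python
from typing import NamedTuple


class InstanceRef(NamedTuple):
    """Parts of an InstanceId (hypothetical helper type)."""
    project: str
    investigation: str


def parse_instance_id(instance_id: str) -> InstanceRef:
    """Split 'cog:<project>:<investigation>' into its two parts.

    Assumption: the project segment never contains ':'; anything after
    the second ':' is treated as the investigation name.
    """
    prefix, sep, rest = instance_id.partition(":")
    if prefix != "cog" or not sep:
        raise ValueError(f"not a CogLoop instance id: {instance_id!r}")
    project, sep, investigation = rest.partition(":")
    if not sep or not project or not investigation:
        raise ValueError(f"missing project or investigation: {instance_id!r}")
    return InstanceRef(project, investigation)


ref = parse_instance_id("cog:myproject:inv01-experiment-abc123")
print(ref.project, ref.investigation)
```

Rejecting malformed values with `ValueError` is a design choice in this sketch so that unexpected identifiers surface early instead of being grouped silently.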
137
137
138
138
```kql
139
139
DiscoveryCogLoopLogs_CL
@@ -189,14 +189,11 @@ DiscoveryCogLoopLogs_CL
189
189
190
190
### Track Instance Startup
191
191
192
-
Retrieves the full chronological log trail for a specific Cognition Engine instance, showing every event from
193
-
first appearance through its reasoning cycles. Replace `<your-instance-id>` with the target instance (e.g.,
194
-
`cog:myproject:inv01-experiment-abc123`).
192
+
Retrieves the full chronological log trail for a specific Cognition Engine instance, showing every event from first appearance through its reasoning cycles. Replace `<your-instance-id>` with the target instance (for example, `cog:myproject:inv01-experiment-abc123`).
195
193
196
-
-**TimeGenerated** — When the event occurred, sorted oldest-first to reconstruct the sequence of events.
197
-
-**LogLevel** — Severity level, useful for spotting where errors or warnings interrupted the instance lifecycle.
198
-
-**Message** — The log message content, showing startup steps, reasoning decisions, task operations, and any
199
-
failures in order.
194
+
- **TimeGenerated** - When the event occurred, sorted oldest-first to reconstruct the sequence of events.
195
+
- **LogLevel** - Severity level, useful for spotting where errors or warnings interrupted the instance lifecycle.
196
+
- **Message** - The log message content, showing startup steps, reasoning decisions, task operations, and any failures in order.
200
197
201
198
```kql
202
199
DiscoveryCogLoopLogs_CL
@@ -252,11 +249,9 @@ DiscoveryCogLoopLogs_CL
252
249
253
250
If the Cognition Engine repeatedly selects `Cognition-Wait`, it can mean tasks are stalled, all work is already complete, or an internal error is preventing progress.
254
251
255
-
-**IdlePct** — Percentage of waits where nothing changed. Sustained 100%
256
-
signals a stuck investigation.
257
-
-**AvgSleepSec** — Short sleeps (30s) mean cognition expects progress soon;
258
-
long sleeps (300s) mean it has stopped trying.
259
-
-**SampleReason** — The LLM's own explanation
252
+
- **IdlePct** - Percentage of waits where nothing changed. Sustained 100% signals a stuck investigation.
253
+
- **AvgSleepSec** - Short sleeps (30s) mean cognition expects progress soon; long sleeps (300s) mean it has stopped trying.
254
+
- **SampleReason** - The LLM's own explanation of why it chose to wait.
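
To make the two numeric columns concrete, the sketch below recomputes them in Python from a list of exported wait events. It is an illustration under stated assumptions, not the service's or the query's actual logic: `WaitEvent`, `summarize_waits`, and the `state_changed` flag are hypothetical, and how "nothing changed" is detected in the real logs may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitEvent:
    sleep_seconds: float  # value of the SleepTime column for this wait
    state_changed: bool   # hypothetical flag: did anything change before the wait?


def summarize_waits(events: list[WaitEvent]) -> Optional[dict]:
    """Return IdlePct and AvgSleepSec for a batch of wait events."""
    if not events:
        return None  # no waits recorded, so neither metric is defined
    idle = sum(1 for e in events if not e.state_changed)
    return {
        "IdlePct": 100.0 * idle / len(events),
        "AvgSleepSec": sum(e.sleep_seconds for e in events) / len(events),
    }


waits = [
    WaitEvent(30, True),
    WaitEvent(300, False),
    WaitEvent(300, False),
    WaitEvent(300, False),
]
print(summarize_waits(waits))
```

In this sample, three of four waits were idle and most sleeps were long, which is the pattern the descriptions above associate with an investigation that is no longer making progress.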
260
255
261
256
```kql
262
257
DiscoveryCogLoopLogs_CL
@@ -282,9 +277,9 @@ DiscoveryCogLoopLogs_CL
282
277
283
278
Trace the reasoning steps the Cognition Engine takes before acting. Slow thinking indicates complex deliberation; fast thinking indicates straightforward decisions.
284
279
285
-
-**TimeGenerated**— When the thinking step completed.
286
-
-**ThinkingType**—`FastThinking` or `SlowThinking`, indicating the depth of reasoning applied.
287
-
-**Message**— The full thinking output, including the thought content and reasoning context.
280
+
- **TimeGenerated** - When the thinking step completed.
281
+
- **ThinkingType** - `FastThinking` or `SlowThinking`, indicating the depth of reasoning applied.
282
+
- **Message** - The full thinking output, including the thought content and reasoning context.
288
283
289
284
```kql
290
285
DiscoveryCogLoopLogs_CL
@@ -295,7 +290,7 @@ DiscoveryCogLoopLogs_CL
295
290
| order by TimeGenerated desc
296
291
```
297
292
298
-
> If `SlowThinking` entries dominate, the Cognition Engine is spending significant effort on complex decisions — this may be expected for difficult investigations or could indicate unclear task definitions forcing repeated deep analysis.
293
+
> If `SlowThinking` entries dominate, the Cognition Engine is spending significant effort on complex decisions. This may be expected for difficult investigations, or it could indicate unclear task definitions forcing repeated deep analysis.
299
294
300
295
## Task Management Operations
301
296
@@ -312,7 +307,7 @@ DiscoveryCogLoopLogs_CL
312
307
| order by TimeGenerated asc
313
308
```
314
309
315
-
> This is the primary query for debugging a specific task — it shows exactly how the Cognition Engine handled the task from creation to completion (or failure), making it easy to pinpoint where and why a task stalled or failed.
310
+
> This is the primary query for debugging a specific task. It shows exactly how the Cognition Engine handled the task from creation to completion (or failure), making it easy to pinpoint where and why a task stalled or failed.
316
311
317
312
### View Task Validation Results
318
313
@@ -330,7 +325,7 @@ DiscoveryCogLoopLogs_CL
330
325
331
326
### View TaskValidationAgent Lifecycle
332
327
333
-
Traces the full lifecycle of the TaskValidationAgent — from provisioning and upsert through invocation and completion. Shows whether the validation agent was successfully created and is being used by the Cognition Engine.
328
+
Traces the full lifecycle of the TaskValidationAgent, from provisioning and upsert through invocation and completion. Shows whether the validation agent was successfully created and is being used by the Cognition Engine.
334
329
335
330
```kql
336
331
DiscoveryCogLoopLogs_CL
@@ -340,7 +335,7 @@ DiscoveryCogLoopLogs_CL
340
335
| order by TimeGenerated asc
341
336
```
342
337
343
-
> If no entries appear, the TaskValidationAgent was never provisioned — tasks will not be validated. If entries show errors during upsert or invocation, check that the required model deployment (e.g., `gpt-5-2`) is available in the workspace.
338
+
> If no entries appear, the TaskValidationAgent was never provisioned, so tasks will not be validated. If entries show errors during upsert or invocation, check that the required model deployment (for example, `gpt-5-2`) is available in the workspace.
344
339
345
340
## Error Diagnostics
346
341
@@ -360,9 +355,8 @@ DiscoveryCogLoopLogs_CL
360
355
361
356
Summarize errors by message to identify the most frequent failure modes.
362
357
363
-
-**ErrorMessage** — The first 80 characters of the error message, used as a grouping key to cluster similar
364
-
errors together.
365
-
-**ErrorCount** — How many times each error occurred. The highest counts point to the most impactful issue.
358
+
- **ErrorMessage** - The first 80 characters of the error message, used as a grouping key to cluster similar errors together.
359
+
- **ErrorCount** - How many times each error occurred. The highest counts point to the most impactful issue.
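
The grouping idea described above, truncating each message to a fixed prefix and counting occurrences, can be sketched in a few lines of Python for exported log rows. This is an illustration of the technique only; `top_errors` is a hypothetical helper and the sample messages are invented.

```python
from collections import Counter


def top_errors(messages: list[str], prefix_len: int = 80) -> list[tuple[str, int]]:
    """Group messages by their first `prefix_len` characters.

    Returns (prefix, count) pairs, most frequent first.
    """
    counts = Counter(message[:prefix_len] for message in messages)
    return counts.most_common()


# Two long messages that differ only after character 80 share one group.
base = "JsonException: " + "a" * 100
sample = [base, base + " (different tail)", "Timeout"]

for prefix, count in top_errors(sample):
    print(count, prefix)
```

A fixed-length prefix is a coarse grouping key: messages that embed identifiers or timestamps near the start can still split into separate groups, so the counts are best read as a starting point for triage.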
366
360
367
361
```kql
368
362
DiscoveryCogLoopLogs_CL
@@ -372,7 +366,7 @@ DiscoveryCogLoopLogs_CL
372
366
| order by ErrorCount desc
373
367
```
374
368
375
-
> If one error type vastly outnumbers the rest, start your troubleshooting there — it is likely the root cause. For example, a high count of `JsonException` serialization errors typically cascades into Cosmos DB health failures, polling cycle errors, and tool call failures downstream.
369
+
> If one error type vastly outnumbers the rest, start your troubleshooting there. It is likely the root cause. For example, a high count of `JsonException` serialization errors typically cascades into Cosmos DB health failures, polling cycle errors, and tool call failures downstream.
376
370
377
371
### Detect Cosmos DB Connectivity Issues
378
372
@@ -391,12 +385,9 @@ DiscoveryCogLoopLogs_CL
391
385
392
386
Isolates JSON serialization errors and groups them by message, including a sample stack trace for each. These errors typically prevent the Cognition Engine from loading instance state from Cosmos DB, blocking all reasoning activity.
393
387
394
-
-**ErrorMessage** — The first 80 characters of the error message, grouping related serialization failures
395
-
together.
396
-
-**Count** — How many times each serialization error occurred. High counts confirm this is a systemic issue
397
-
rather than a one-off.
398
-
-**SampleException** — A full exception with stack trace, showing the exact JSON path and property that failed
399
-
to deserialize (e.g., `AuthorRole`).
388
+
- **ErrorMessage** - The first 80 characters of the error message, grouping related serialization failures together.
389
+
- **Count** - How many times each serialization error occurred. High counts confirm it's a systemic issue rather than a one-off.
390
+
- **SampleException** - A full exception with stack trace, showing the exact JSON path and property that failed to deserialize (for example, `AuthorRole`).
400
391
401
392
```kql
402
393
DiscoveryCogLoopLogs_CL
@@ -410,7 +401,7 @@ DiscoveryCogLoopLogs_CL
410
401
| order by Count desc
411
402
```
412
403
413
-
> Serialization errors are often the root cause behind cascading failures. When instance state cannot be deserialized, it triggers downstream errors: Cosmos DB health check failures, polling cycle errors, and tool call failures. Use the `SampleException` to identify the specific schema mismatch — this typically happens after a service upgrade that changes model schemas.
404
+
> Serialization errors are often the root cause behind cascading failures. When instance state cannot be deserialized, it triggers downstream errors: Cosmos DB health check failures, polling cycle errors, and tool call failures. Use the `SampleException` to identify the specific schema mismatch. This typically happens after a service upgrade that changes model schemas.
414
405
415
406
### Detect Polling Cycle Failures
416
407
@@ -426,7 +417,7 @@ DiscoveryCogLoopLogs_CL
426
417
427
418
### Error Timeline for Incident Investigation
428
419
429
-
Correlate errors over time to identify when an incident started and whether it is ongoing.
420
+
Correlate errors over time to identify when an incident started and whether it's ongoing.
430
421
431
422
```kql
432
423
DiscoveryCogLoopLogs_CL
@@ -459,7 +450,7 @@ DiscoveryCogLoopLogs_CL
459
450
460
451
### Monitor Instance Retrieval Errors
461
452
462
-
Repeated failures in instance retrieval indicate the service cannot load investigation state from Cosmos DB.
453
+
Repeated failures in instance retrieval indicate the service can't load investigation state from Cosmos DB.
463
454
464
455
```kql
465
456
DiscoveryCogLoopLogs_CL
@@ -494,7 +485,7 @@ DiscoveryCogLoopLogs_CL
494
485
| order by TimeGenerated desc
495
486
```
496
487
497
-
The `reasoning` field explains exactly why CogLoop chose to wait or act, this is the most useful field for understanding investigation stalls.
488
+
The `reasoning` field explains exactly why CogLoop chose to wait or act. It's the most useful field for understanding investigation stalls.
498
489
499
490
### Detect Act and Cognition subloop errors
500
491
@@ -539,7 +530,7 @@ DiscoveryCogLoopLogs_CL
539
530
540
531
### Detect circuit breaker events
541
532
542
-
A circuit breaker opening means that repeated LLM API call failures have caused CogLoop to pause sending requests temporarily to prevent cascading failures.
533
+
A circuit breaker opening means that repeated LLM API call failures caused CogLoop to pause sending requests temporarily to prevent cascading failures.
543
534
544
535
```kql
545
536
DiscoveryCogLoopLogs_CL
@@ -570,6 +561,32 @@ DiscoveryCogLoopLogs_CL
570
561
| CogLoop is waiting for a running tool task to complete | Check Supercomputer logs for the associated job. See [Query supercomputer logs](how-to-query-supercomputer-logs.md)|