
Commit 7f0ea1e

AB#5025: Convert Wiki TSG to LMC
1 parent 47168b0 commit 7f0ea1e

10 files changed

Lines changed: 353 additions & 0 deletions
@@ -0,0 +1,353 @@

---
title: Troubleshoot high data ingestion in Application Insights
description: Provides a step-by-step guide to troubleshoot high data ingestion scenarios and methods to reduce costs.
ms.date: 03/27/2025
ms.service: azure-monitor
ms.reviewer: jeanbisutti, toddfous, aaronmax, v-weizhu
ms.custom: sap:Application Insights
---

# Troubleshoot high data ingestion in Application Insights

This article helps you troubleshoot high data ingestion that occurs in Application Insights resources or Log Analytics workspaces.

## General troubleshooting steps

### Step 1: Identify resources presenting high data ingestion

In the Azure portal, navigate to cost analysis for your scope. For example: **Cost Management + Billing** > **Cost Management** > **Cost analysis**. This blade offers cost analysis views to chart costs per resource, as follows:

:::image type="content" source="media/troubleshoot-high-ingestion/cost-analysis.png" alt-text="A screenshot that shows the 'cost analysis' blade." border="false":::
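
If you prefer to start from a query, the standard `_IsBillable`, `_BilledSize`, and `_ResourceId` columns can rank the resources sending data to a Log Analytics workspace. The following is a minimal sketch, not part of the original walkthrough, that you can run against the workspace in question:

```Kusto
// Rank resources by billable ingested bytes over the last 7 days.
search *
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| summarize TotalBilledSize = sum(_BilledSize) by _ResourceId
| sort by TotalBilledSize desc
```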

### Step 2: Identify costly tables with high data ingestion

Once you've identified an Application Insights resource or a Log Analytics workspace, analyze the data and determine where the highest ingestion occurs. Consider the approach that best suits your scenario:

- Based on raw record count

  Use the following query to compare record counts across tables:

  ```Kusto
  search *
  | where timestamp > ago(7d)
  | summarize count() by $table
  | sort by count_ desc
  ```

  This query can help identify the *noisiest* tables. From there, you can refine your queries to narrow down the investigation.

- Based on consumed bytes

  Determine tables with the highest byte ingestion using the [format_bytes()](/kusto/query/format-bytes-function) scalar function:

  ```Kusto
  systemEvents
  | where timestamp > ago(7d)
  | where type == "Billing"
  | extend BillingTelemetryType = tostring(dimensions["BillingTelemetryType"])
  | extend BillingTelemetrySizeInBytes = todouble(measurements["BillingTelemetrySize"])
  | summarize TotalBillingTelemetrySize = sum(BillingTelemetrySizeInBytes) by BillingTelemetryType
  | extend BillingTelemetrySizeGB = format_bytes(TotalBillingTelemetrySize, 1, "GB")
  | sort by TotalBillingTelemetrySize desc
  | project-away TotalBillingTelemetrySize
  ```

  Similar to the record count query, this query can help identify the most active tables, allowing you to pinpoint specific tables for further investigation.

- Using Log Analytics Usage workbooks

  In the Azure portal, navigate to your Log Analytics workspace, select **Workbooks**, and then select **Usage** under **Log Analytics Workspace Insights**.

  :::image type="content" source="media/troubleshoot-high-ingestion/log-analytics-usage-workbook.png" alt-text="A screenshot that shows the Log Analytics workbook pane." border="false":::

  This workbook provides valuable insights, such as the percentage of data ingestion for each table and detailed ingestion statistics for each resource reporting to the same workspace. A query-based alternative is sketched after this list.
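
If you prefer a query over the workbook, a similar per-table breakdown is available from the standard `Usage` table, which records ingested volume in MB per data type. This is a minimal sketch, not part of the original walkthrough:

```Kusto
// Billable ingested volume per table over the last 7 days.
Usage
| where TimeGenerated > ago(7d)
| where IsBillable == true
| summarize IngestedVolumeMB = sum(Quantity) by DataType
| sort by IngestedVolumeMB desc
```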

### Step 3: Identify driving factors in high data ingestion

Once you've identified the tables with high data ingestion, take the table with the highest activity and identify the driving factors for that excess telemetry. The cause could be a specific application that generates more data than the others, an exception message that gets logged too frequently, or a new logger category that emits too much information.

Here are some sample queries you can use for this identification:

```Kusto
requests
| where timestamp > ago(7d)
| summarize count() by cloud_RoleInstance
| sort by count_ desc
```

```Kusto
requests
| where timestamp > ago(7d)
| summarize count() by operation_Name
| sort by count_ desc
```

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by cloud_RoleName
| sort by count_ desc
```

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc
```

```Kusto
traces
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc
```

```Kusto
exceptions
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc
```

You can try out different telemetry fields. For example, perhaps you first run the following query and see no evident culprit for the excess telemetry:

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by target
| sort by count_ desc
```

However, you can try another telemetry field instead of `target`, such as `type`. This might show more compelling results to help your investigation:

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc
```

In some scenarios, you might need to investigate a specific application or instance further. Use the following queries to identify noisy messages or exception types:

```Kusto
exceptions
| where timestamp > ago(7d)
| where cloud_RoleName == 'Specify a role name'
| summarize count() by type
| sort by count_ desc
```

```Kusto
exceptions
| where timestamp > ago(7d)
| where cloud_RoleInstance == 'Specify a role instance'
| summarize count() by type
| sort by count_ desc
```

### Step 4: Investigate evolution of ingestion over time

Examine the evolution of ingestion over time based on the driving factors identified previously. This way, you can determine whether this behavior has been consistent or whether changes occurred at a specific point. By analyzing data in this way, you can pinpoint when the change happened and gain a clearer understanding of the causes behind the high data ingestion. This insight is important for addressing the issue and implementing effective solutions.

In the following queries, the [bin()](/kusto/query/bin-function) Kusto Query Language (KQL) scalar function is used to segment data into 1-day intervals. This approach facilitates trend analysis, as you can see how data has changed or not changed over time.

```Kusto
dependencies
| where timestamp > ago(30d)
| summarize count() by bin(timestamp, 1d), operation_Name
| sort by timestamp desc
```
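
To visualize the trend instead of reading raw rows, you can append KQL's `render` operator when running the query in the Logs experience. A minimal variation of the query above:

```Kusto
// Chart daily dependency counts per operation over the last 30 days.
dependencies
| where timestamp > ago(30d)
| summarize count() by bin(timestamp, 1d), operation_Name
| render timechart
```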

Use the `min()` aggregation function to identify the earliest recorded timestamp for specific factors. This approach helps establish a baseline and offers insights into when events or changes first occurred.

```Kusto
dependencies
| where timestamp > ago(30d)
| where type == 'Specify dependency type being investigated'
| summarize min(timestamp) by type
| sort by min_timestamp desc
```

## Troubleshooting steps for specific scenarios

### Scenario 1: High ingestion in Log Analytics

1. Query all tables within a Log Analytics workspace:

    ```Kusto
    search *
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | summarize TotalBilledSize = sum(_BilledSize) by $table
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    The result shows which table is the biggest contributor to costs. Here's an example where it's `AppTraces`:

    :::image type="content" source="media/troubleshoot-high-ingestion/apptraces-table.png" alt-text="A screenshot that shows that the AppTraces table is the biggest contributor to costs.":::

2. Query the specific application driving the costs for traces:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | summarize TotalBilledSize = sum(_BilledSize) by AppRoleName
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    :::image type="content" source="media/troubleshoot-high-ingestion/application-driving-costs-for-traces.png" alt-text="A screenshot that shows the specific application driving the costs for traces.":::

3. Run the following query specific to that application and look further into the specific logger categories sending telemetry to the `AppTraces` table:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | where AppRoleName contains 'transformation'
    | extend LoggerCategory = Properties['Category']
    | summarize TotalBilledSize = sum(_BilledSize) by tostring(LoggerCategory)
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    The result shows two main categories responsible for the costs:

    :::image type="content" source="media/troubleshoot-high-ingestion/logger-categories-sending-telemetry-to-apptraces.png" alt-text="A screenshot that shows the specific logger categories sending telemetry to the AppTraces table.":::

### Scenario 2: High ingestion in Application Insights

To identify what specifically is driving the costs, follow these steps:

1. Query the telemetry across all tables and obtain a record count per table and SDK version:

    ```Kusto
    search *
    | where TimeGenerated > ago(7d)
    | summarize count() by $table, SDKVersion
    | sort by count_ desc
    ```

    Here's an example that shows Azure Functions generating lots of trace and exception telemetry:

    :::image type="content" source="media/troubleshoot-high-ingestion/table-sdkversion-count.png" alt-text="A screenshot that shows which table and SDK generate the most trace and exception telemetry.":::

2. Run the following query to identify the specific app generating more traces than the others:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where SDKVersion == 'azurefunctions: 4.34.2.22820'
    | summarize count() by AppRoleName
    | sort by count_ desc
    ```

    :::image type="content" source="media/troubleshoot-high-ingestion/app-generating-more-traces.png" alt-text="A screenshot that shows which app generates the most traces.":::

3. Refine the query to include that specific app and generate a count of records for each individual message:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where SDKVersion == 'azurefunctions: 4.34.2.22820'
    | where AppRoleName contains 'inbound'
    | summarize count() by Message
    | sort by count_ desc
    ```

    The result can show the specific message driving up ingestion costs. A follow-up trend query is sketched after these steps.

    :::image type="content" source="media/troubleshoot-high-ingestion/app-message-counts.png" alt-text="A screenshot that shows a count of records for each individual message.":::
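
To confirm when a noisy message started and how it trends, you can chart its daily volume. This is a minimal sketch; the message filter is a placeholder for whatever the previous query surfaces:

```Kusto
// Daily volume of one noisy message (placeholder filter).
AppTraces
| where TimeGenerated > ago(30d)
| where AppRoleName contains 'inbound'
| where Message startswith 'Specify the noisy message prefix'
| summarize count() by bin(TimeGenerated, 1d)
| sort by TimeGenerated asc
```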

### Scenario 3: Reach the daily cap unexpectedly

Assume you reached the daily cap unexpectedly on September 4. Use the following query to obtain a count of custom events and identify the earliest timestamp associated with each event:

```Kusto
customEvents
| where timestamp between (datetime(2024-08-25) .. 15d)
| summarize count(), min(timestamp) by name
```

This analysis reveals that certain events started being ingested on September 4 and quickly became noisy:

:::image type="content" source="media/troubleshoot-high-ingestion/custom-events.png" alt-text="A screenshot that shows a count of custom events.":::
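
From there, you can drill into any suspicious event to see its daily volume. This is a minimal follow-up sketch; the event name is a placeholder:

```Kusto
// Daily volume of one suspicious custom event (placeholder name).
customEvents
| where timestamp between (datetime(2024-08-25) .. 15d)
| where name == 'Specify the event name'
| summarize count() by bin(timestamp, 1d)
| sort by timestamp asc
```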

## Methods to reduce costs

After identifying the driving factors in the Azure Monitor tables that explain the unexpected data ingestion, reduce costs by using the following methods per your scenario:

### Update daily cap configuration

Adjust the daily cap to prevent excess telemetry ingestion.
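
To check whether and when a workspace has already hit its cap, one option is the `_LogOperation` audit table, which logs an Ingestion-category entry when data collection stops. This is a sketch based on the documented operation name; verify it against your workspace:

```Kusto
// Find recent occurrences of the daily cap stopping data collection.
_LogOperation
| where TimeGenerated > ago(31d)
| where Category == "Ingestion"
| where Operation == "Data collection stopped"
| project TimeGenerated, Detail
```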

### Switch table plans

Switch to another supported table plan for Application Insights. See [Table plans](/azure/azure-monitor/logs/data-platform-logs) and [Tables that support the basic table plan in Azure Monitor Logs](/azure/azure-monitor/logs/basic-logs-azure-tables).

### Use telemetry SDK features for the Java agent

#### Default recommended solution

The default recommended solution is using [sampling overrides](/azure/azure-monitor/app/java-standalone-sampling-overrides). A common use case is [suppressing collecting telemetry for health checks](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-health-checks). The Application Insights Java agent provides [two types of sampling](/azure/azure-monitor/app/java-standalone-config#sampling).
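
To illustrate the shape of a sampling override, the following applicationinsights.json sketch drops all request telemetry whose URL path matches a health probe. Treat the attribute key (`url.path`) and the path value as assumptions that depend on your agent version's semantic conventions; check the linked documentation for the exact keys:

```JSON
{
  "sampling": {
    "overrides": [
      {
        "telemetryType": "request",
        "attributes": [
          {
            "key": "url.path",
            "value": "/health",
            "matchType": "strict"
          }
        ],
        "percentage": 0
      }
    ]
  }
}
```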

#### Supplemental methods to sampling overrides

- Reduce cost from the `traces` table (**logs** and **Trace** on the Application Insights page):

  - [Reduce the telemetry log level](/azure/azure-monitor/app/java-standalone-config#autocollected-logging)
  - [Remove application (not frameworks/libs) logs with MDC attribute and sampling override](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-log)
  - Disable log instrumentation by updating the applicationinsights.json file:

    ```JSON
    {
      "instrumentation": {
        "logging": {
          "enabled": false
        }
      }
    }
    ```

- Reduce cost from the `dependencies` table:

  - [Suppress collecting telemetry for the Java method producing the dependency telemetry](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-a-java-method)
  - [Disable the instrumentation](/azure/azure-monitor/app/java-standalone-config#suppress-specific-autocollected-telemetry) producing the dependency telemetry data.

  If the dependency is a database call, you won't see the database on the application map. If you remove the dependency instrumentation of an HTTP call or a message (for example, a Kafka message), all the downstream telemetry data is dropped.

- Reduce cost from the `customMetrics` table:

  - [Increase the metrics interval](/azure/azure-monitor/app/java-standalone-config#metric-interval)
  - [Exclude a metric with a telemetry processor](/azure/azure-monitor/app/java-standalone-telemetry-processors#metric-filter)
  - [Increase the heartbeat interval](/azure/azure-monitor/app/java-standalone-config#heartbeat)

- Reduce OpenTelemetry attributes cost:

  OpenTelemetry attributes are added to the **customDimensions** column. They are represented as properties in Application Insights. You can remove attributes by using [an attribute telemetry processor](/azure/azure-monitor/app/java-standalone-telemetry-processors#attribute-processor). For more information, see [Telemetry processor examples - Delete](/azure/azure-monitor/app/java-standalone-telemetry-processors-examples#delete). A configuration sketch follows this list.
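
To illustrate the attribute processor, the following applicationinsights.json sketch deletes one attribute from all telemetry. The attribute key is a placeholder; the processor structure follows the linked documentation:

```JSON
{
  "preview": {
    "processors": [
      {
        "type": "attribute",
        "actions": [
          {
            "key": "attribute.to.remove",
            "action": "delete"
          }
        ]
      }
    ]
  }
}
```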

### Update application code (log levels and exceptions)

In some scenarios, updating the application code directly might help reduce the amount of telemetry generated and consumed by the Application Insights backend service.

## References

- [Azure Monitor Pricing](https://azure.microsoft.com/pricing/details/monitor/)
- [Change pricing tier for Log Analytics workspace](/azure/azure-monitor/logs/change-pricing-tier)
- [Table plans in Azure Monitor](/azure/azure-monitor/logs/data-platform-logs)
- [Azure Monitor cost and usage](/azure/azure-monitor/cost-usage)
- [Analyze usage in a Log Analytics workspace](/azure/azure-monitor/logs/analyze-usage)
- [Cost optimization in Azure Monitor](/azure/azure-monitor/fundamentals/best-practices-cost)

[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]
