---
title: Troubleshoot high data ingestion in Application Insights
description: Provides a step-by-step guide to troubleshoot high data ingestion scenarios and provides methods to reduce costs.
ms.date: 03/27/2025
ms.service: azure-monitor
ms.reviewer: jeanbisutti, toddfous, aaronmax, v-weizhu
ms.custom: sap:Application Insights
---
# Troubleshoot high data ingestion in Application Insights

This article helps you troubleshoot high data ingestion that occurs in Application Insights resources or Log Analytics workspaces.

## General troubleshooting steps

### Step 1: Identify resources presenting high data ingestion

In the Azure portal, navigate to cost analysis for your scope. For example, go to **Cost Management + Billing** > **Cost Management** > **Cost analysis**. This view lets you chart costs per resource, as follows:

:::image type="content" source="media/troubleshoot-high-ingestion/cost-analysis.png" alt-text="A screenshot that shows the 'Cost analysis' blade." border="false":::

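If you don't have access to Cost Management, a workspace-level query can provide a similar per-resource breakdown for the resources that send data to that workspace. The following is a minimal sketch that relies on the standard `_IsBillable`, `_BilledSize`, and `_ResourceId` columns of Log Analytics tables; adjust the time range to suit your investigation:

```Kusto
union *
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| summarize TotalBilledSize = sum(_BilledSize) by _ResourceId
| extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
| sort by TotalBilledSize desc
| project-away TotalBilledSize
```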

### Step 2: Identify costly tables with high data ingestion

Once you've identified an Application Insights resource or a Log Analytics workspace, analyze the data and determine where the highest ingestion occurs. Consider the approach that best suits your scenario:

- Based on raw record count

  Use the following query to compare record counts across tables:

  ```Kusto
  search *
  | where timestamp > ago(7d)
  | summarize count() by $table
  | sort by count_ desc
  ```

  This query can help identify the *noisiest* tables. From there, you can refine your queries to narrow down the investigation.

- Based on consumed bytes

  Determine the tables with the highest byte ingestion by using the [format_bytes()](/kusto/query/format-bytes-function) scalar function:

  ```Kusto
  systemEvents
  | where timestamp > ago(7d)
  | where type == "Billing"
  | extend BillingTelemetryType = tostring(dimensions["BillingTelemetryType"])
  | extend BillingTelemetrySizeInBytes = todouble(measurements["BillingTelemetrySize"])
  | summarize TotalBillingTelemetrySize = sum(BillingTelemetrySizeInBytes) by BillingTelemetryType
  | extend BillingTelemetrySizeGB = format_bytes(TotalBillingTelemetrySize, 1, "GB")
  | sort by TotalBillingTelemetrySize desc
  | project-away TotalBillingTelemetrySize
  ```

  Similar to the record count queries, these queries can help identify the most active tables, so you can pinpoint specific tables for further investigation.

- Using Log Analytics Usage workbooks

  In the Azure portal, navigate to your Log Analytics workspace, select **Workbooks**, and then select **Usage** under **Log Analytics Workspace Insights**.

  :::image type="content" source="media/troubleshoot-high-ingestion/log-analytics-usage-workbook.png" alt-text="A screenshot that shows the Log Analytics workbook pane." border="false":::

  This workbook provides valuable insights, such as the percentage of data ingestion for each table and detailed ingestion statistics for each resource reporting to the same workspace.

### Step 3: Identify driving factors in high data ingestion

Once you've identified the tables with high data ingestion, take the table with the highest activity and identify the driving factors for that excess telemetry. This could be a specific application that generates more data than the others, an exception message that gets logged too frequently, or a new logger category that emits too much information.

Here are some sample queries you can use for this identification:

```Kusto
requests
| where timestamp > ago(7d)
| summarize count() by cloud_RoleInstance
| sort by count_ desc
```

```Kusto
requests
| where timestamp > ago(7d)
| summarize count() by operation_Name
| sort by count_ desc
```

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by cloud_RoleName
| sort by count_ desc
```

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc
```

```Kusto
traces
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc
```

```Kusto
exceptions
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc
```

You can try out different telemetry fields. For example, perhaps you first run the following query and see no evident culprit for the excess telemetry:

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by target
| sort by count_ desc
```

However, you can try another telemetry field instead of `target`, such as `type`, which might show more compelling results to help your investigation:

```Kusto
dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc
```

In some scenarios, you might need to investigate a specific application or instance further. Use the following queries to identify noisy messages or exception types:

```Kusto
exceptions
| where timestamp > ago(7d)
| where cloud_RoleName == 'Specify a role name'
| summarize count() by type
| sort by count_ desc
```

```Kusto
exceptions
| where timestamp > ago(7d)
| where cloud_RoleInstance == 'Specify a role instance'
| summarize count() by type
| sort by count_ desc
```

### Step 4: Investigate evolution of ingestion over time

Examine the evolution of ingestion over time based on the driving factors identified previously. This helps you determine whether the behavior has been consistent or whether changes occurred at a specific point. By analyzing the data this way, you can pinpoint when the change happened and get a clearer understanding of the causes behind the high data ingestion. This insight is important for addressing the issue and implementing effective solutions.

In the following queries, the [bin()](/kusto/query/bin-function) Kusto Query Language (KQL) scalar function is used to segment data into one-day intervals. This approach facilitates trend analysis because you can see how the data has changed (or not changed) over time.

```Kusto
dependencies
| where timestamp > ago(30d)
| summarize count() by bin(timestamp, 1d), operation_Name
| sort by timestamp desc
```

Use the `min()` aggregation function to identify the earliest recorded timestamp for specific factors. This approach helps establish a baseline and offers insights into when events or changes first occurred.

```Kusto
dependencies
| where timestamp > ago(30d)
| where type == 'Specify dependency type being investigated'
| summarize min(timestamp) by type
| sort by min_timestamp desc
```

## Troubleshooting steps for specific scenarios

### Scenario 1: High ingestion in Log Analytics

1. Query all tables within a Log Analytics workspace:

    ```Kusto
    search *
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | summarize TotalBilledSize = sum(_BilledSize) by $table
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    This query shows which table is the biggest contributor to costs. Here's an example where `AppTraces` is the top contributor:

    :::image type="content" source="media/troubleshoot-high-ingestion/apptraces-table.png" alt-text="A screenshot that shows that the AppTraces table is the biggest contributor to costs.":::

2. Query the specific application driving the costs for traces:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | summarize TotalBilledSize = sum(_BilledSize) by AppRoleName
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    :::image type="content" source="media/troubleshoot-high-ingestion/application-driving-costs-for-traces.png" alt-text="A screenshot that shows the specific application driving the costs for traces.":::

3. Run the following query specific to that application and look further into the specific logger categories sending telemetry to the `AppTraces` table:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where _IsBillable == true
    | where AppRoleName contains 'transformation'
    | extend LoggerCategory = Properties['Category']
    | summarize TotalBilledSize = sum(_BilledSize) by tostring(LoggerCategory)
    | extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
    | sort by TotalBilledSize desc
    | project-away TotalBilledSize
    ```

    The result shows two main categories responsible for the costs:

    :::image type="content" source="media/troubleshoot-high-ingestion/logger-categories-sending-telemetry-to-apptraces.png" alt-text="A screenshot that shows the specific logger categories sending telemetry to the AppTraces table.":::

### Scenario 2: High ingestion in Application Insights

To identify what specifically is driving the costs, follow these steps:

1. Query the telemetry across all tables and obtain a record count per table and SDK version:

    ```Kusto
    search *
    | where TimeGenerated > ago(7d)
    | summarize count() by $table, SDKVersion
    | sort by count_ desc
    ```

    Here's an example that shows that Azure Functions generates a lot of trace and exception telemetry:

    :::image type="content" source="media/troubleshoot-high-ingestion/table-sdkversion-count.png" alt-text="A screenshot that shows which table and SDK version generate the most trace and exception telemetry.":::

2. Run the following query to identify the specific app that's generating more traces than the others:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where SDKVersion == 'azurefunctions: 4.34.2.22820'
    | summarize count() by AppRoleName
    | sort by count_ desc
    ```

    :::image type="content" source="media/troubleshoot-high-ingestion/app-generating-more-traces.png" alt-text="A screenshot that shows which app generates the most traces.":::

3. Refine the query to include that specific app and generate a count of records for each individual message:

    ```Kusto
    AppTraces
    | where TimeGenerated > ago(7d)
    | where SDKVersion == 'azurefunctions: 4.34.2.22820'
    | where AppRoleName contains 'inbound'
    | summarize count() by Message
    | sort by count_ desc
    ```

    The result can show the specific message driving up ingestion costs:

    :::image type="content" source="media/troubleshoot-high-ingestion/app-message-counts.png" alt-text="A screenshot that shows a count of records for each individual message.":::

### Scenario 3: Reaching the daily cap unexpectedly

Assume you reached the daily cap unexpectedly on September 4. Use the following query to obtain a count of custom events and identify the earliest timestamp associated with each event:

```Kusto
customEvents
| where timestamp between(datetime(8/25/2024) .. 15d)
| summarize count(), min(timestamp) by name
```

This analysis reveals that certain events started being ingested on September 4 and quickly became noisy:

:::image type="content" source="media/troubleshoot-high-ingestion/custom-events.png" alt-text="A screenshot that shows a count of custom events.":::

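To confirm how quickly a specific event ramped up, you can also bin its count by day. The event name in the following sketch is a placeholder; replace it with the event identified in the previous result:

```Kusto
customEvents
| where timestamp between(datetime(8/25/2024) .. 15d)
| where name == 'Specify the event name'
| summarize count() by bin(timestamp, 1d)
| sort by timestamp asc
```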

## Methods to reduce costs

After you identify the driving factors in the Azure Monitor tables that explain the unexpected data ingestion, reduce costs by using the following methods, depending on your scenario:

### Update daily cap configuration

Adjust the daily cap to prevent excess telemetry ingestion.
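
For workspace-based resources, you can check when the cap was hit before adjusting it. The following is a minimal sketch that assumes the standard `_LogOperation` function is available in your Log Analytics workspace; it surfaces ingestion operations such as data collection being stopped when the cap is reached:

```Kusto
_LogOperation
| where TimeGenerated > ago(31d)
| where Category == "Ingestion"
| where Operation has "Data collection"
| project TimeGenerated, Operation, Detail
```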

### Switch table plans

Switch to another supported table plan for Application Insights. See [Table plans](/azure/azure-monitor/logs/data-platform-logs) and [Tables that support the basic table plan in Azure Monitor Logs](/azure/azure-monitor/logs/basic-logs-azure-tables).

### Use telemetry SDK features for Java agent

#### Default recommended solution

The default recommended solution is to use [sampling overrides](/azure/azure-monitor/app/java-standalone-sampling-overrides). A common use case is [suppressing the collection of telemetry for health checks](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-health-checks). The Application Insights Java agent provides [two types of sampling](/azure/azure-monitor/app/java-standalone-config#sampling).

#### Supplemental methods to sampling overrides

- Reduce cost from the `traces` table (**Logs** and **Trace** on the Application Insights page):

  - [Reduce the telemetry log level](/azure/azure-monitor/app/java-standalone-config#autocollected-logging) (a query to check the severity-level distribution appears after this list)
  - [Remove application (not frameworks/libs) logs with MDC attribute and sampling override](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-log)
  - Disable log instrumentation by updating the applicationinsights.json file:

    ```JSON
    {
      "instrumentation": {
        "logging": {
          "enabled": false
        }
      }
    }
    ```

- Reduce cost from the `dependencies` table:

  - [Suppress collecting telemetry for the Java method producing the dependency telemetry](/azure/azure-monitor/app/java-standalone-sampling-overrides#suppress-collecting-telemetry-for-a-java-method)
  - [Disable the instrumentation](/azure/azure-monitor/app/java-standalone-config#suppress-specific-autocollected-telemetry) producing the dependency telemetry data.

    If the dependency is a database call, the database will no longer appear on the application map. If you remove the dependency instrumentation of an HTTP call or a message (for example, a Kafka message), all the downstream telemetry data is dropped.

- Reduce cost from the `customMetrics` table:

  - [Increase the metrics interval](/azure/azure-monitor/app/java-standalone-config#metric-interval)
  - [Exclude a metric with a telemetry processor](/azure/azure-monitor/app/java-standalone-telemetry-processors#metric-filter)
  - [Increase the heartbeat interval](/azure/azure-monitor/app/java-standalone-config#heartbeat)

- Reduce OpenTelemetry attributes cost:

  OpenTelemetry attributes are added to the **customDimensions** column. They are represented as properties in Application Insights. You can remove attributes by using [an attribute telemetry processor](/azure/azure-monitor/app/java-standalone-telemetry-processors#attribute-processor). For more information, see [Telemetry processor examples - Delete](/azure/azure-monitor/app/java-standalone-telemetry-processors-examples#delete).

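As noted in the `traces` bullet earlier, before lowering the log level it can help to check how trace volume is distributed across severity levels. The following is a minimal sketch against the classic Application Insights `traces` table:

```Kusto
traces
| where timestamp > ago(7d)
| summarize count() by severityLevel
| sort by count_ desc
```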

### Update application code (log levels and exceptions)

In some scenarios, updating the application code directly might help reduce the amount of telemetry being generated and consumed by the Application Insights backend service.

## References

- [Azure Monitor Pricing](https://azure.microsoft.com/pricing/details/monitor/)
- [Change pricing tier for Log Analytics workspace](/azure/azure-monitor/logs/change-pricing-tier)
- [Table plans in Azure Monitor](/azure/azure-monitor/logs/data-platform-logs)
- [Azure Monitor cost and usage](/azure/azure-monitor/cost-usage)
- [Analyze usage in a Log Analytics workspace](/azure/azure-monitor/logs/analyze-usage)
- [Cost optimization in Azure Monitor](/azure/azure-monitor/fundamentals/best-practices-cost)

[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]