Commit 7069643

Merge pull request #128210 from apoorvaMSFT/redis/server-load-small-sku-guidance
Azure Managed Redis: Add server load guidance for small SKUs
2 parents d5bfa4e + 32cfae6 commit 7069643

1 file changed: articles/redis/best-practices-server-load.md

Lines changed: 16 additions & 6 deletions
@@ -36,16 +36,26 @@ High memory usage on the server makes it more likely that the system needs to page
 
 Redis server is a single-threaded system. Long running commands can cause latency or timeouts on the client side because the server can't respond to any other requests while it's busy working on a long running command. For more information, see [Troubleshoot Azure Cache for Redis server-side issues](troubleshoot-server.md).
 
-## Monitor Server Load
+## Monitor Server Load and CPU
 
-Add monitoring on server load to ensure you get notifications when high server load occurs. Monitoring can help you understand your application constraints. Then, you can work proactively to mitigate issues. We recommend trying to keep server load under 80% to avoid negative performance effects. Sustained server load over 80% can lead to unplanned failovers.
-Currently, Azure Managed Redis exposes two metrics in **Insights** under **Monitoring** on the Resource menu on the left of the portal: **CPU** and **Server Load**. Understanding what is measured by each metric is important when monitoring server load.
+Add monitoring on server load and CPU to ensure you get notifications when either of them is high. Monitoring can help you understand your application constraints. Then, you can work proactively to mitigate issues. We recommend keeping server load under 80% to avoid negative performance effects. Sustained server load over 80% can lead to unplanned failovers.
+Currently, Azure Managed Redis exposes two metrics in **Insights** under **Monitoring** on the Resource menu on the left of the portal: **CPU** and **Server Load**. Understanding what each metric measures is important when monitoring them.
 
-The **CPU** metric indicates the CPU usage for the node that hosts the cache. The CPU metric also includes processes that aren't strictly Redis server processes. CPU includes background processes for anti-malware and others. As a result, the CPU metric can sometimes spike and might not be a perfect indicator of CPU usage for the Redis server.
+The **CPU** metric (also known as percentProcessorTime) indicates the CPU usage for the node that hosts the cache. The CPU metric also includes processes that aren't strictly Redis server processes, such as background processes for anti-malware and others. As a result, the CPU metric can sometimes spike and might not be a perfect indicator of CPU usage for the Redis server.
 
-The **Server Load** metric represents the load on the Redis Server alone. We recommend monitoring the **Server Load** metric instead of **CPU**.
+The **Server Load** metric reflects the Redis server's own assessment of overall load. It's similar to the CPU metric, but at a cluster level.
 
-When monitoring server load, we also recommend that you examine the max spikes of Server Load rather than average because even brief spikes can trigger failovers and command timeouts.
+### Recommendations for Smaller SKUs
+
+On Azure Managed Redis SKUs backed by 2-vCPU VMs (B0–B5, X3, and M10), percentage-based metrics like **Server Load** and **CPU** are inherently more sensitive. A single short-lived background thread can consume a significant percentage of total CPU, causing metrics to appear elevated even when the actual workload is light. As a result, these metrics can overestimate the actual load on small SKUs and might not indicate workload saturation.
+
+When reviewing metrics over longer time periods, such as several hours or days, we recommend:
+
+- Using **CPU** instead of **Server Load**, because it can be viewed at the instance level, adding more granularity.
+- Splitting by the instance ID of the virtual machines backing the Azure Managed Redis instance.
+- Using **Average** aggregation instead of **Maximum** for these longer time ranges.
+
+You can still use **Maximum** aggregation over short time windows to catch brief spikes or events (such as those that might cause timeouts or failovers), while relying on **Average** over longer windows for trend analysis on small SKUs, especially when using **CPU**.
 
 ## Test for increased server load after failover
 
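The aggregation guidance added in this commit (Average over long windows with a per-instance split, Maximum over short windows) can be sketched as a small helper that picks Azure Monitor query parameters. This is a minimal illustration, not an official API: the helper function, its 4-hour "longer window" threshold, and the `InstanceId` split name are assumptions; the metric name percentProcessorTime is the one mentioned in the article text.

```python
# Hypothetical helper illustrating the small-SKU guidance above.
# Assumptions (not from the source): the 4-hour cutoff for "longer"
# windows, and "InstanceId" as the dimension name for splitting.
def build_metric_query(window_hours: float) -> dict:
    """Return Azure Monitor query parameters for the CPU metric.

    Long windows (hours/days): Average aggregation, split by instance,
    so short-lived background spikes on 2-vCPU SKUs don't dominate.
    Short windows: Maximum aggregation to catch brief spikes.
    """
    long_window = window_hours >= 4  # assumed threshold for "longer"
    return {
        "metric": "percentProcessorTime",  # the CPU metric
        "aggregation": "Average" if long_window else "Maximum",
        "interval": "PT1H" if long_window else "PT1M",
        "split_by": "InstanceId" if long_window else None,
    }

# Trend analysis over 24 hours on a small SKU:
print(build_metric_query(24))
# Spike detection over the last 30 minutes:
print(build_metric_query(0.5))
```

The same two-mode approach applies regardless of which client (portal charts, `az monitor metrics list`, or an SDK) actually runs the query.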
