Skip to content

RDKB-64847: Observed spike in load and CPU due to cpu_telemetry2_0#382

Open
tabbas651 wants to merge 4 commits into
support/1.9from
topic/RDKB-64847
Open

RDKB-64847: Observed spike in load and CPU due to cpu_telemetry2_0#382
tabbas651 wants to merge 4 commits into
support/1.9from
topic/RDKB-64847

Conversation

@tabbas651

Copy link
Copy Markdown
Contributor

Reason for change: When DCA top/process marker sampling (processTopPattern) runs concurrently with telemetry HTTP uploads (curl), both CPU-intensive paths compete for resources causing WAN timeouts, FD exhaustion, and failed report uploads on resource-constrained devices. Introduced a CPU contention avoidance window that defers new curl handle acquisitions while DCA top pattern collection is active, preventing two CPU-intensive telemetry paths from running in parallel.
Test Procedure: performance testing
Risks: Medium
Priority: P0

Copilot AI review requested due to automatic review settings June 9, 2026 16:06
@tabbas651 tabbas651 requested a review from a team as a code owner June 9, 2026 16:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a “CPU contention avoidance window” to prevent telemetry HTTP upload work (curl pool acquisitions) from running concurrently with DCA top/process marker sampling, aiming to reduce CPU spikes and resulting WAN timeouts/FD pressure on resource-constrained devices.

Changes:

  • Added new public APIs to begin/end a sampling window in the HTTP connection pool interface.
  • Implemented sampling-window logic in the curl pool to defer new handle acquisitions while DCA sampling is active (with bounded waiting).
  • Integrated the sampling window into processTopPattern() so curl work is deferred during top/process collection.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
source/protocol/http/multicurlinterface.h Adds public begin/end sampling window APIs for coordinating with DCA sampling.
source/protocol/http/multicurlinterface.c Implements sampling-window flagging/drain logic and defers new curl handle acquisitions during the window.
source/dcautil/dca.c Wraps DCA top/process marker collection with begin/end sampling window calls.

Comment thread source/protocol/http/multicurlinterface.c Outdated
Comment thread source/protocol/http/multicurlinterface.c Outdated
Reason for change: When DCA top/process marker sampling (processTopPattern)
runs concurrently with telemetry HTTP uploads (curl), both CPU-intensive
paths compete for resources causing WAN timeouts, FD exhaustion, and failed
report uploads on resource-constrained devices. Introduced a CPU contention
avoidance window that defers new curl handle acquisitions while DCA top
pattern collection is active, preventing two CPU-intensive telemetry paths
from running in parallel.
Test Procedure: performance testing
Risks: Medium
Priority: P0
Signed-off-by: Thamim Razith Abbas Ali <[email protected]>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines +145 to +146
elapsed_ms = (unsigned int)((current_time.tv_sec - start_time.tv_sec) * 1000U);
elapsed_ms += (unsigned int)((current_time.tv_nsec - start_time.tv_nsec) / 1000000L);
Comment on lines 565 to 569
if (active_requests > 0)
{
active_requests--;
}
T2Info("Released curl handle = %d, active_requests = %d\n", idx, active_requests);
Comment thread source/dcautil/dca.c
Comment on lines 323 to +335
for (; var < vCount; ++var) // Loop of marker list starts here
{
if(!isSamplingActive)
{
/*
* Enter sampling window so new curl work is deferred while
* top/process markers are being collected.
*/
if(http_pool_begin_sampling_window(DCA_SAMPLING_CURL_WAIT_TIMEOUT_MS) == T2ERROR_SUCCESS)
{
isSamplingActive = true;
}
}

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Comment on lines 78 to 81
static bool pool_initialized = false;
static bool pool_shutting_down = false;
static bool sampling_window_active = false;

Comment on lines +118 to +120
pthread_mutex_lock(&pool_mutex);
sampling_window_active = true;
pthread_mutex_unlock(&pool_mutex);
Comment on lines +179 to +181
pthread_mutex_lock(&pool_mutex);
sampling_window_active = false;
pthread_mutex_unlock(&pool_mutex);
Comment thread source/protocol/http/multicurlinterface.c Outdated
Comment thread source/dcautil/dca.c
Comment on lines 265 to 269
T2Debug("%s ++in\n", __FUNCTION__);
bool isSamplingActive = false;
if(profileName == NULL || topMarkerList == NULL)
{
T2Error("Invalid arguments for %s\n", __FUNCTION__);

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread source/dcautil/dca.c
Comment on lines 323 to +335
for (; var < vCount; ++var) // Loop of marker list starts here
{
if(!isSamplingActive)
{
/*
* Enter sampling window so new curl work is deferred while
* top/process markers are being collected.
*/
if(http_pool_begin_sampling_window(DCA_SAMPLING_CURL_WAIT_TIMEOUT_MS) == T2ERROR_SUCCESS)
{
isSamplingActive = true;
}
}
Comment on lines +509 to +527
if (sampling_window_refcount > 0)
{
pthread_mutex_unlock(&pool_mutex);

if (clock_gettime(CLOCK_MONOTONIC, &current_time) != 0)
{
T2Error("clock_gettime failed for current_time: %s\n", strerror(errno));
return T2ERROR_FAILURE;
}

if ((current_time.tv_sec - start_time.tv_sec) >= POOL_ACQUIRE_TIMEOUT_SEC)
{
T2Error("Timeout waiting for sampling window to complete, treating as upload failure\n");
return T2ERROR_FAILURE;
}

usleep(POOL_ACQUIRE_RETRY_MS * 1000);
continue;
}
Comment on lines +107 to +121
T2ERROR http_pool_begin_sampling_window(unsigned int timeout_ms)
{
struct timespec start_time, current_time;
bool timeout_reached = false;

if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
{
T2Error("clock_gettime failed for start_time in %s: %s\n", __FUNCTION__, strerror(errno));
return T2ERROR_FAILURE;
}

pthread_mutex_lock(&pool_mutex);
sampling_window_refcount++;
pthread_mutex_unlock(&pool_mutex);

Copilot AI review requested due to automatic review settings June 9, 2026 18:33

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines +76 to +77
/* Maximum wait for in-flight requests to drain when DCA sampling starts. */
#define SAMPLE_WINDOW_DRAIN_RETRY_MS 100
Comment on lines +107 to +111
T2ERROR http_pool_begin_sampling_window(unsigned int timeout_ms)
{
struct timespec start_time, current_time;
bool timeout_reached = false;

Comment on lines 35 to 39
-I${top_srcdir}/source/utils \
-I${top_srcdir}/source/bulkdata \
-I${top_srcdir}/source/protocol/http \
-I${PKG_CONFIG_SYSROOT_DIR}$(includedir)/ccsp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants