Description
The log shows that the component was not started because the binary was missing; however, the state transition reports success.
Analysis results
This was caused by an incorrect check in Graph::queueHeadNodes(). In this specific case, a node is queued and fails before the function returns. The function then checks how many nodes are "in flight", sees 0, concludes that there was no work to do, and reports the state transition as an unconditional success. This is incorrect, since the transition failed.
Solution
The "nodes in flight" check has been replaced with a check on the number of executable nodes, and an additional error path has been added for the case where a node is not enqueued. See #262.
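The difference between the two checks can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions, not the actual change from #262: the Graph struct, its members, the QueueOutcome enum, and both free functions are hypothetical stand-ins, and the fast failure is simulated synchronously instead of on a worker thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none are taken from the real launch manager.
enum class QueueOutcome { kNothingToDo, kWorkQueued, kEnqueueFailed };

struct Graph {
    std::size_t nodesInFlight = 0;
    bool anyNodeFailed = false;
    bool queueAcceptsWork = true;  // set to false to simulate a rejected enqueue

    // Simulates a node that is picked up and finishes (here: fails fast when the
    // binary is missing) before the caller gets to inspect nodesInFlight.
    bool enqueue(bool binaryExists) {
        if (!queueAcceptsWork) return false;
        ++nodesInFlight;
        if (!binaryExists) anyNodeFailed = true;
        assert(nodesInFlight > 0);
        --nodesInFlight;  // node has already left the queue
        return true;
    }
};

// Faulty variant: infers "no work" from the in-flight count after queueing.
QueueOutcome queueHeadNodesOld(Graph& g, const std::vector<bool>& headBinaryExists) {
    for (bool binaryExists : headBinaryExists) g.enqueue(binaryExists);
    return g.nodesInFlight == 0 ? QueueOutcome::kNothingToDo
                                : QueueOutcome::kWorkQueued;
}

// Corrected variant: counts executable nodes and reports a failed enqueue.
QueueOutcome queueHeadNodesNew(Graph& g, const std::vector<bool>& headBinaryExists) {
    std::size_t executableNodes = 0;
    for (bool binaryExists : headBinaryExists) {
        ++executableNodes;
        if (!g.enqueue(binaryExists)) return QueueOutcome::kEnqueueFailed;
    }
    return executableNodes == 0 ? QueueOutcome::kNothingToDo
                                : QueueOutcome::kWorkQueued;
}
```

In this model, a single head node with a missing binary makes queueHeadNodesOld return kNothingToDo even though anyNodeFailed is set, while queueHeadNodesNew returns kWorkQueued, so the caller still has to evaluate the node result before declaring the transition successful.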
Error Occurrence Rate
Sporadic
How to reproduce
Run any integration test with a component binary path set incorrectly. In approximately 1 in 60 runs, the transition incorrectly reports success.
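Because the failure is sporadic, a single passing run says little. As a rough guide, and assuming the runs are independent and the 1-in-60 rate holds, the number of repetitions needed to see the bug at least once with a given confidence can be estimated as follows. The function name runsNeeded is hypothetical and not part of the test suite.

```cpp
#include <cassert>
#include <cmath>

// Smallest n such that 1 - (1 - p)^n >= confidence, where p is the
// per-run probability of hitting the bug (assumed independent between runs).
int runsNeeded(double p, double confidence) {
    assert(p > 0.0 && p < 1.0 && confidence > 0.0 && confidence < 1.0);
    return static_cast<int>(std::ceil(std::log(1.0 - confidence) / std::log(1.0 - p)));
}
```

With p = 1/60, this gives 42 runs for a 50% chance and 179 runs for a 95% chance of observing the incorrect success at least once.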
Supporting Information
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Start transition to MainPG/run_target_app_does_not_report_krunning_in_time for PG MainPG ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Graph::setState changes from kSuccess to kInTransition for PG 0 ( MainPG ) ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Stop Dependencies: 0 ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Stop Dependencies: 0 ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Stop Dependencies: 0 ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Start Dependencies: 0 ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Start Dependencies: 0 ]
[ Starting process 1 ( component_does_not_report_krunning_in_time ) from executable /tmp/tests/process_wrong_binary_failure/abc_complex_reporting_process ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] !!! -> 2026/6/24 9:2:48 LCLM LCLM ERROR: [ File does not exist or is not executable: /tmp/tests/process_wrong_binary_failure/abc_complex_reporting_process ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Graph::setState changes from kInTransition to kAborting for PG 0 ( MainPG ) ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ startProcess for MainPG process 1 ( component_does_not_report_krunning_in_time ) done ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Graph::setState changes from kAborting to kUndefinedState for PG 0 ( MainPG ) ]
[2026-06-24 09:02:48.794] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Control Client handler nudged ]
[2026-06-24 09:02:48.796] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Graph::setState changes from kUndefinedState to kUndefinedState for PG 0 ( MainPG ) ]
[2026-06-24 09:02:48.796] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM INFO: [ Completed the request for PG MainPG to State MainPG/run_target_app_does_not_report_krunning_in_time in 1 ms ]
[2026-06-24 09:02:48.796] [INFO] [launch_manager] 2026/6/24 9:2:48 LCLM LCLM DEBUG: [ Control Client handler nudged ]
Classification
Major
First Affected Release
0.7
Last Affected Release
0.7
Expected Fixed Release
0.8
Category