fix(telemetry): propagate span context in async generators and fix member agent input tracing #77
Open
Eliozhang wants to merge 1 commit into
…mber agent input tracing
- Use start_span + attach/detach instead of start_as_current_span in runners.py and _base_agent.py to properly propagate span context in async generators (CancelledError safe per PEP 492)
- Fix trace_agent to prefer override_messages over user_content when tracing member agents delegated by TeamAgent
Author: I have read the CLA Document and I hereby sign the CLA
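Problem
1. Span context is lost in async generators
The context manager returned by start_as_current_span is not guaranteed to run __aexit__ when an async generator is cancelled (a known behavior of Python async generators). As a result:
- context.detach() is never called, and the span context is lost
- Spans (agent_run, call_llm, execute_tool, etc.) cannot correctly resolve their parent span
Reproduction path: when TeamAgent calls a member agent, the member agent runs as an async generator, and the span context is lost on cancel.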
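The scenario above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the SDK's code: a contextvars.ContextVar stands in for the OpenTelemetry context, and attach, detach, and run_agent are hypothetical helpers named only for this example. It shows a token attached inside an async generator being detached in a finally block when the consumer closes the generator early.

```python
import asyncio
import contextvars

# Stand-in for the tracing context (assumption: NOT the OpenTelemetry API).
_current_span = contextvars.ContextVar("current_span", default=None)

events = []


def attach(span_name):
    """Hypothetical helper: make span_name current and return a reset token."""
    return _current_span.set(span_name)


def detach(token):
    """Hypothetical helper: restore whatever was current before attach()."""
    _current_span.reset(token)


async def run_agent():
    """Async generator that keeps a span current while it yields events."""
    token = attach("agent_run")
    try:
        events.append(("inside", _current_span.get()))
        yield "event-1"
        yield "event-2"
    finally:
        # Runs when the generator finishes, or when aclose() raises
        # GeneratorExit at the suspended yield.
        detach(token)
        events.append(("detached", _current_span.get()))


async def main():
    gen = run_agent()
    first = await gen.__anext__()  # generator is now suspended at the first yield
    await gen.aclose()             # consumer stops early
    return first, _current_span.get()


result = asyncio.run(main())
```

With the real OpenTelemetry API, a span created by start_span is not ended automatically, so the same finally block would presumably also need to call span.end().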
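2. Inaccurate trace input for member agents
The trace_agent function always records user_content as the agent input. When a member agent is delegated by TeamAgent, however, user_content is still the original message the user sent to the leader agent, not the override_messages that the leader agent forwarded to the member agent. The member agent's input in the trace therefore does not match what was actually executed.
Fix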
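The input selection that problem 2 calls for can be sketched roughly as follows. This is a minimal sketch under assumptions: the Part, Content, and InvocationContext classes and the resolve_trace_input helper are hypothetical stand-ins, not the SDK's actual types or the real trace_agent code. Only the field names user_content and override_messages come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    # Hypothetical message part; text is None for non-text parts.
    text: Optional[str] = None


@dataclass
class Content:
    # Hypothetical message made of parts.
    parts: List[Part] = field(default_factory=list)


@dataclass
class InvocationContext:
    # Hypothetical stand-in carrying only the two fields discussed here.
    user_content: Optional[Content] = None
    override_messages: Optional[List[Content]] = None


def _text_parts(content: Content) -> List[str]:
    """Collect the non-empty text parts of one message."""
    return [part.text for part in content.parts if part.text]


def resolve_trace_input(ctx: InvocationContext) -> str:
    """Prefer delegated override_messages; otherwise fall back to user_content."""
    if ctx.override_messages:
        texts = [t for msg in ctx.override_messages for t in _text_parts(msg)]
        return "\n".join(texts)
    if ctx.user_content is not None:
        return "\n".join(_text_parts(ctx.user_content))
    return ""


# The leader agent sees the user's message; the member agent sees the delegated one.
leader_input = Content(parts=[Part(text="Plan a trip to Kyoto")])
delegated = [Content(parts=[Part(text="Find flights to Osaka"), Part()])]

leader_ctx = InvocationContext(user_content=leader_input)
member_ctx = InvocationContext(user_content=leader_input, override_messages=delegated)

leader_trace_input = resolve_trace_input(leader_ctx)
member_trace_input = resolve_trace_input(member_ctx)
```

Here the member agent's traced input is the delegated text rather than the user's original message, while the leader agent's traced input is unchanged.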
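Fix 1: span context propagation (runners.py + agents/_base_agent.py)
Replace start_as_current_span with start_span + context_api.attach/detach:
- start_span creates the span without automatically making it current
- context_api.attach(set_span_in_context(span, current_ctx)) manually makes the span current and returns a token
- context_api.detach(token) is called in the finally clause of a try/finally
try/finally still runs under CancelledError (PEP 492), whereas a context manager's __aexit__ is not guaranteed to.
Fix 2: override_messages takes precedence (telemetry/_trace.py)
trace_agent checks invocation_context.override_messages first:
- If present, text parts are extracted from override_messages and used as the input
- Otherwise, the existing user_content logic applies
Changed files
- trpc_agent_sdk/runners.py: propagate span context with start_span + attach/detach
- trpc_agent_sdk/agents/_base_agent.py: propagate span context with start_span + attach/detach
- trpc_agent_sdk/telemetry/_trace.py: override_messages takes precedence over user_content
Tests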