
fix(telemetry): propagate span context in async generators and fix member agent input tracing #77

Open
Eliozhang wants to merge 1 commit into trpc-group:main from Eliozhang:fix/telemetry-span-context-propagation


Eliozhang commented Jun 8, 2026

Problem

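1. Span context is lost in async generators

The context manager returned by start_as_current_span is not guaranteed to have its __aexit__ run when an async generator is cancelled (a known behavior of Python async generators). As a result:

  • context.detach() is never called, so the span context is lost
  • Child spans (agent_run, call_llm, execute_tool, etc.) cannot resolve their parent span correctly
  • The trace is broken and the full call chain is not visible

Reproduction path: when TeamAgent invokes a member agent, the member agent runs as an async generator, and the span context is lost when that generator is cancelled.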

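2. Member agent trace input is inaccurate

trace_agent always records the agent input from user_content. When a member agent is delegated by TeamAgent, however, user_content still holds the original message the user sent to the leader agent, not the override_messages the leader forwarded to the member agent. As a result, the input recorded for the member agent in the trace does not match what it actually executed on.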

Fix

Fix 1: span context propagation in runners.py + agents/_base_agent.py

Replace start_as_current_span with start_span + context_api.attach/detach:

  • start_span creates the span but does not automatically make it current
  • context_api.attach(set_span_in_context(span, current_ctx)) makes the span current manually and returns a token
  • Wrap the body in try/finally and call context_api.detach(token) in the finally block
  • Key point: try/finally also runs on CancelledError (PEP 492), whereas the context manager's __aexit__ is not guaranteed to
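The Fix 1 pattern can be modelled with the standard library alone. In this sketch a `contextvars.ContextVar` plays the role of OpenTelemetry's runtime context, and `start_span`, `attach` and `detach` are hypothetical stand-ins whose comments name the real calls they imitate (`tracer.start_span`, `context_api.attach` with `trace.set_span_in_context`, `context_api.detach`). It also assumes the span is ended in the same finally block, since a span from `start_span` is not ended automatically. A leader task drives a member-agent async generator and is cancelled while the generator is suspended on an await.

```python
from __future__ import annotations

import asyncio
import contextvars
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an OpenTelemetry span."""
    name: str
    parent: Optional[Span] = None
    ended: bool = False

    def end(self) -> None:
        self.ended = True


# Models the runtime context slot that holds the current span.
_current_span: contextvars.ContextVar[Optional[Span]] = contextvars.ContextVar(
    "current_span", default=None
)


def start_span(name: str) -> Span:
    # Like tracer.start_span(): parent is the current span; the new span is NOT made current.
    return Span(name, parent=_current_span.get())


def attach(span: Span) -> contextvars.Token:
    # Like context_api.attach(trace.set_span_in_context(span, ...)): make span current, return a token.
    return _current_span.set(span)


def detach(token: contextvars.Token) -> None:
    # Like context_api.detach(token): restore whatever was current before attach().
    _current_span.reset(token)


async def run_member_agent(events: List[str]) -> AsyncIterator[str]:
    span = start_span("agent_run")
    token = attach(span)
    try:
        llm = start_span("call_llm")  # resolves agent_run as its parent
        events.append(f"{llm.name} parent={llm.parent.name}")
        yield "first event"
        await asyncio.Event().wait()  # never set: the cancellation is delivered here
        yield "unreachable"
    finally:
        # Runs when CancelledError is raised inside the body, as well as on normal exit.
        detach(token)
        span.end()
        events.append(f"finally ended={span.ended}")


async def main() -> Tuple[List[str], List[Optional[Span]]]:
    events: List[str] = []
    after: List[Optional[Span]] = []
    started = asyncio.Event()

    async def leader() -> None:
        try:
            async for _ in run_member_agent(events):
                started.set()
        finally:
            # Same task that drove the generator: the span should no longer be current.
            after.append(_current_span.get())

    task = asyncio.create_task(leader())
    await started.wait()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events, after


events, after = asyncio.run(main())
print(events)  # ['call_llm parent=agent_run', 'finally ended=True']
print(after)   # [None]
```

The sketch relies on one property worth keeping in mind: `ContextVar.reset` only accepts a token created in the current `Context`, so attach and detach must happen in the same task. Here the cancelled task itself runs the finally block, which is why the reset succeeds.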

Fix 2: prefer override_messages in telemetry/_trace.py

trace_agent now checks invocation_context.override_messages first:

  • If present, the text parts extracted from override_messages are recorded as the input
  • Otherwise it falls back to the existing user_content logic
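A sketch of the Fix 2 selection logic, under stated assumptions: `Part`, `Content` and `InvocationContext` are hypothetical stand-ins for the SDK's message and context types, `override_messages` is assumed to be a list of messages, and `resolve_agent_input` is a hypothetical helper rather than the actual body of `trace_agent`. Override messages win when present; otherwise the original `user_content` is used.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins; the real SDK types and field layouts may differ.
@dataclass
class Part:
    text: Optional[str] = None  # non-text parts (images, function calls) leave this None


@dataclass
class Content:
    parts: List[Part] = field(default_factory=list)


@dataclass
class InvocationContext:
    user_content: Optional[Content] = None
    override_messages: Optional[List[Content]] = None


def _text_of(contents: List[Content]) -> str:
    """Join the text parts of a list of messages, skipping non-text parts."""
    return "\n".join(
        part.text for content in contents for part in content.parts if part.text
    )


def resolve_agent_input(ctx: InvocationContext) -> str:
    """Pick the text to record as the agent's trace input."""
    if ctx.override_messages:
        # Member agent delegated by a TeamAgent: record what the leader forwarded.
        return _text_of(ctx.override_messages)
    if ctx.user_content is not None:
        # Fallback: the original user input (single-agent runs take this path).
        return _text_of([ctx.user_content])
    return ""


original = Content([Part("Plan a trip to Tokyo")])
forwarded = [Content([Part("Find three hotels near Shinjuku"), Part()])]

print(resolve_agent_input(InvocationContext(user_content=original)))
# Plan a trip to Tokyo
print(resolve_agent_input(InvocationContext(user_content=original, override_messages=forwarded)))
# Find three hotels near Shinjuku
```

An empty `override_messages` list is treated the same as a missing one here, so it falls back to `user_content`; whether the real `trace_agent` does the same is an assumption.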

Changed files

| File | Change |
| --- | --- |
| trpc_agent_sdk/runners.py | start_span + attach/detach to propagate span context |
| trpc_agent_sdk/agents/_base_agent.py | Same as above |
| trpc_agent_sdk/telemetry/_trace.py | override_messages takes precedence over user_content |

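Testing

  • Ran the TeamAgent + member agent scenario locally and confirmed the span chain is complete
  • Cancelled member agent execution and confirmed the span is closed correctly with no detach token error
  • The member agent's trace input is the content forwarded by the leader, not the original user input
  • Non-TeamAgent scenarios (a single agent run directly) behave as before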

…mber agent input tracing

- Use start_span + attach/detach instead of start_as_current_span in
  runners.py and _base_agent.py to properly propagate span context
  in async generators (CancelledError safe per PEP 492)
- Fix trace_agent to prefer override_messages over user_content when
  tracing member agents delegated by TeamAgent

github-actions Bot commented Jun 8, 2026

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

Eliozhang (Author)

I have read the CLA Document and I hereby sign the CLA
