Add tool-execution observability (0063) #178
Merged
The model requests tools in its completion (0076); the caller runs them in node-body code, which was invisible to observers. Add the with_tool_call instrumentation scope, a context manager (like with_active_prompt) that the caller wraps around a tool execution, plus the typed ToolCallEvent / ToolCallFailedEvent it dispatches at outcome time. On failure the scope re-raises, and the failure event carries error_type / error_message but deliberately no error_category. The OTel observer renders an openarmature.tool.call span (OA-namespace attributes, error.type on failure); the Langfuse observer renders a dedicated Tool observation (asType "tool"), which adds a tool() method to the client Protocol, the in-memory recorder, and the SDK adapter. Arguments and result are payload, gated by disable_provider_payload. Also harden both observers' payload serialization with default=str, so an opaque tool result that JSON can't encode renders via str() instead of crashing the observer, and back-date the Langfuse Tool observation (generalizing the back-dating helper to wrap LangfuseTool). Implements proposal 0063 (graph-engine 6, observability 5.5 / 8.4).
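The scope contract in the commit message above (dispatch at outcome time, re-raise on failure, no `error_category`) can be illustrated with a minimal, self-contained sketch. This is not OpenArmature's implementation: `tool_call_scope`, the `sink` callback, and the trimmed event fields are hypothetical stand-ins (the real events also carry the identity/scoping baseline and `call_id`, and the real scope dispatches to the observer stream).

```python
"""Hypothetical sketch of a with_tool_call-style scope; not OpenArmature's API."""
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class ToolCallEvent:  # simplified stand-in for the real event
    tool_name: str
    tool_call_id: str | None
    arguments: Any
    result: Any
    latency_ms: float


@dataclass
class ToolCallFailedEvent:  # simplified stand-in; deliberately no error_category
    tool_name: str
    tool_call_id: str | None
    arguments: Any
    error_type: str
    error_message: str
    latency_ms: float


class ToolCallScope:
    def __init__(self) -> None:
        # Defaults to None: a forgotten set_result and a tool that returns
        # None both produce a null result, so no sentinel is needed.
        self.result: Any = None

    def set_result(self, result: Any) -> None:
        self.result = result


@contextmanager
def tool_call_scope(
    sink: Callable[[object], None],
    tool_name: str,
    arguments: Any,
    tool_call_id: str | None = None,
) -> Iterator[ToolCallScope]:
    scope = ToolCallScope()
    start = time.perf_counter()
    try:
        yield scope
    except Exception as exc:
        latency_ms = (time.perf_counter() - start) * 1000
        sink(ToolCallFailedEvent(
            tool_name=tool_name, tool_call_id=tool_call_id, arguments=arguments,
            error_type=type(exc).__name__, error_message=str(exc),
            latency_ms=latency_ms,
        ))
        raise  # observe the failure, never swallow it
    latency_ms = (time.perf_counter() - start) * 1000
    sink(ToolCallEvent(
        tool_name=tool_name, tool_call_id=tool_call_id, arguments=arguments,
        result=scope.result, latency_ms=latency_ms,
    ))


# Usage: the node body runs the tool inside the scope and reports the outcome.
events: list[object] = []
with tool_call_scope(events.append, "add", {"a": 1, "b": 2}, tool_call_id="call_1") as scope:
    scope.set_result(1 + 2)
```

The generator-based context manager sees the body's exception at the `yield`, which is what lets it emit the failure event and then re-raise unchanged.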
Advance the spec pin v0.68.0 -> v0.69.0 across the four sync points (submodule, __spec_version__, pyproject, conformance manifest) and the smoke assertion; regenerate the bundled AGENTS.md. Wire conformance fixtures 092-098 through a tool-graph runner (calls_tool / calls_llm / update nodes) dispatching across the typed-event-collector, OTel span-tree, and Langfuse Tool-observation assertion shapes; teach the fixture-parser schema the calls_tool directive and the record state type, and defer-parse 092-095 (the typed-collector shape, like 050-056). Record proposal 0063 implemented, document the with_tool_call scope and the Tool observation, and add the CHANGELOG entry. Also reconcile a stale LangfuseClient method count and add the Tool observation to the Langfuse-mapping overview.
Pull request overview
Implements accepted proposal 0063 (spec v0.69.0) by adding a node-body with_tool_call instrumentation scope that dispatches typed tool-execution events, and rendering those events in the OTel and Langfuse observers.
Changes:
- Added `with_tool_call` + `ToolCallScope`, plus `ToolCallEvent` / `ToolCallFailedEvent` to the observer event union.
- Rendered tool execution as `openarmature.tool.call` (OTel span) and as a dedicated Langfuse `tool` observation; updated the Langfuse adapter/client protocol accordingly.
- Bumped the spec pin to v0.69.0 and added unit + conformance coverage for tool-execution observability.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/unit/test_tool_call.py | Unit tests for with_tool_call dispatch/re-raise behavior and edge cases. |
| tests/unit/test_observability_otel.py | Unit tests for OTel openarmature.tool.call span attributes, gating, failure mapping, serialization fallback. |
| tests/unit/test_observability_langfuse.py | Unit tests for Langfuse dedicated tool observation rendering and gating/serialization behavior. |
| tests/unit/test_observability_langfuse_adapter.py | Unit tests ensuring back-dated tool observations route via private OTel tracer path. |
| tests/test_smoke.py | Updated spec version assertion to 0.69.0. |
| tests/conformance/test_observability.py | Added runner + assertions for tool observability fixtures (092–098). |
| tests/conformance/test_fixture_parsing.py | Deferred parsing notes for tool typed-collector fixtures (092–095). |
| tests/conformance/harness/directives.py | Added calls_tool directive schema and mock tool spec. |
| tests/conformance/adapter.py | Added record state type mapping for tool fixtures. |
| src/openarmature/observability/tool_call.py | New implementation of with_tool_call scope and ToolCallScope. |
| src/openarmature/observability/otel/observer.py | OTel rendering for tool events; safer JSON serialization with default=str. |
| src/openarmature/observability/langfuse/observer.py | Langfuse rendering for tool events; safer JSON serialization with default=str. |
| src/openarmature/observability/langfuse/client.py | Extended protocol and in-memory client to support tool observations. |
| src/openarmature/observability/langfuse/adapter.py | Added adapter support for tool() and generalized back-dating helper. |
| src/openarmature/observability/correlation.py | Extended dispatch typing to include tool event variants. |
| `src/openarmature/observability/__init__.py` | Exported with_tool_call and ToolCallScope. |
| src/openarmature/graph/observer.py | Included tool events in observer union typing. |
| src/openarmature/graph/events.py | Added ToolCallEvent and ToolCallFailedEvent dataclasses and exports. |
| src/openarmature/AGENTS.md | Updated bundled agent doc spec version to v0.69.0. |
| `src/openarmature/__init__.py` | Updated `__spec_version__` to 0.69.0. |
| pyproject.toml | Updated [tool.openarmature].spec_version to 0.69.0. |
| docs/concepts/observability.md | Documented with_tool_call and backend renderings; updated Langfuse section wording. |
| conformance.toml | Updated spec pin to v0.69.0 and marked proposal 0063 implemented. |
| CHANGELOG.md | Added release notes entry for tool-execution observability; updated spec pin summary. |
- `with_tool_call`: drop the result sentinel. The scope result defaults to `None` and the event carries it directly; a forgotten `set_result` and a tool that returns `None` both emit a null result, which is correct, so the sentinel produced no distinguishable output.
- observability docs: name `tool_call_id` explicitly in the Langfuse Tool metadata sentence (the feature has both `tool_call_id` and `call_id`).
- conformance: `_assert_langfuse_observation_tree` consumes each matched observation, so two same-shape expected siblings can't both bind to one actual observation.
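The consume-on-match rule in the conformance bullet can be sketched as a first-fit matcher over one level of sibling observations. `match_siblings` and the dict-subset `subset_match` predicate are hypothetical names for illustration; the real `_assert_langfuse_observation_tree` walks whole trees and is not reproduced here.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence


def subset_match(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    """True when every expected key is present in actual with an equal value."""
    return all(key in actual and actual[key] == value for key, value in expected.items())


def match_siblings(
    expected: Sequence[dict[str, Any]],
    actual: Sequence[dict[str, Any]],
    matches: Callable[[dict[str, Any], dict[str, Any]], bool] = subset_match,
) -> bool:
    remaining = list(actual)
    for exp in expected:
        for index, obs in enumerate(remaining):
            if matches(exp, obs):
                # Consume the observation so a second, identical expected
                # sibling has to find its own actual observation.
                del remaining[index]
                break
        else:
            return False  # no unconsumed observation fits this expectation
    return True
```

Without the `del`, two identical expected `tool` siblings would both bind to a single actual observation and the assertion would pass incorrectly. A greedy first-fit pass can reject a valid assignment when one actual fits several different expectations, but it is sufficient for same-shape siblings.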
Implements accepted proposal 0063 (spec v0.69.0): makes a caller's tool execution observable. This is the execution-side complement to 0076 (the request side), joined by `tool_call_id`, and the last feature of the v0.15.0 cycle. The spec pin advances from v0.68.0 to v0.69.0 (0063 is the only proposal in the delta).

The primitive
A model requests tools in its completion; the caller executes them in node-body code (OpenArmature does not run, select, loop, or feed back tools), so that execution was invisible to the observer stream.
`with_tool_call` is a node-body instrumentation scope, a context manager modeled on `with_active_prompt`. You run the tool inside it and report the outcome with `scope.set_result(...)`. On a clean exit it dispatches a `ToolCallEvent`; if the tool raises, it dispatches a `ToolCallFailedEvent` and re-raises (it observes, it does not swallow). I chose the context-manager shape over an inline-wrapping helper to stay consistent with the existing `with_active_prompt` node-body primitives.

The events
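Because each tool event carries the `tool_call_id` of the request that produced it (or null for a standalone instrumented function), a consumer can join execution back to request. The sketch below shows that join under stated assumptions: `RequestedToolCall`, `ToolExecution`, and `join_by_tool_call_id` are hypothetical stand-ins for an `LlmCompletionEvent.output_tool_calls` entry, the two tool event variants, and consumer-side code; none of them are OpenArmature APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class RequestedToolCall:
    """Hypothetical stand-in for one output_tool_calls entry."""
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class ToolExecution:
    """Hypothetical stand-in for a ToolCallEvent / ToolCallFailedEvent."""
    tool_name: str
    tool_call_id: str | None
    failed: bool = False


def join_by_tool_call_id(
    requested: list[RequestedToolCall],
    executed: list[ToolExecution],
) -> tuple[list[tuple[RequestedToolCall, ToolExecution | None]], list[ToolExecution]]:
    # Index executions by the request id they answer. If the same id was
    # executed more than once, the later event wins.
    by_id = {e.tool_call_id: e for e in executed if e.tool_call_id is not None}
    # A request with no matching execution pairs with None.
    pairs = [(req, by_id.get(req.id)) for req in requested]
    # Standalone instrumented functions have no request to join to.
    standalone = [e for e in executed if e.tool_call_id is None]
    return pairs, standalone
```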
Two typed variants on the graph-engine observer union, mirroring the LLM completion/failure pairing. Both carry the identity/scoping baseline plus `tool_name`, `tool_call_id` (the link back to the requesting `LlmCompletionEvent.output_tool_calls` entry, or null for a standalone instrumented function), `arguments`, `latency_ms`, and `call_id`. `ToolCallEvent` adds `result`; `ToolCallFailedEvent` adds `error_type` / `error_message` and deliberately carries no `error_category` (tool code is arbitrary, with no closed failure taxonomy).

Rendering
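The OTel attribute set described in this section can be sketched as a plain function that builds the attribute dict, with payload gating applied. `tool_span_attributes` is a hypothetical helper that uses no OTel SDK calls; the attribute keys and span name come from this PR, while JSON-encoding `arguments` / `result` into string attributes and omitting a null `tool_call_id` (OTel attribute values cannot be None) are assumptions.

```python
from __future__ import annotations

import json
from typing import Any

SPAN_NAME = "openarmature.tool.call"  # note ".call", not ".complete"


def tool_span_attributes(
    tool_name: str,
    tool_call_id: str | None,
    arguments: Any,
    result: Any = None,
    error_type: str | None = None,
    disable_provider_payload: bool = False,
) -> dict[str, str]:
    attrs: dict[str, str] = {"openarmature.tool.name": tool_name}
    if tool_call_id is not None:  # attribute values may not be None
        attrs["openarmature.tool.call.id"] = tool_call_id
    if not disable_provider_payload:
        # arguments and result are payload: emitted only when not gated.
        attrs["openarmature.tool.call.arguments"] = json.dumps(arguments, default=str)
        if error_type is None:  # a failed call has no result
            attrs["openarmature.tool.call.result"] = json.dumps(result, default=str)
    if error_type is not None:
        attrs["error.type"] = error_type  # standard OTel error attribute
    return attrs
```

Keeping the OA-namespace keys in one place is what makes a later cutover to the mirrored `gen_ai.tool.*` surface a prefix swap.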
- OTel renders an `openarmature.tool.call` span (note `.call`, not `.complete`) parented under the calling node, with OA-namespace `openarmature.tool.{name,call.id,call.arguments,call.result}` attributes and the standard `error.type` on failure. The Development-status `gen_ai.tool.*` / `execute_tool` surface is mirrored, not emitted, in v1, so a future cutover is a prefix swap.
- Langfuse renders a `Tool` observation (`asType="tool"`, not a `Generation`) under the node's Span observation, with arguments / result as input / output, the tool name / call id in metadata, and ERROR level on failure. This is the first non-Span/Generation observation type in the Python implementation, so `tool()` was added to the client Protocol, the in-memory recorder, and the SDK adapter.
- `arguments` and `result` are payload, gated by `disable_provider_payload` (no new flag); `disable_llm_spans` does not gate the tool span.

Two fixes from the self-review
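Both fixes come down to a line or two each. The sketch below uses hypothetical helper names (`safe_dumps`, `backdated_start`); it assumes the back-dated start is derived as the end time minus `latency_ms`, and it does not show how the real helper passes that start time to the Langfuse SDK.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta
from typing import Any


def safe_dumps(payload: Any) -> str:
    # json calls default only for objects it cannot encode natively
    # (a Pydantic model, a datetime); str() renders them instead of the
    # observer raising TypeError and losing the span/observation.
    return json.dumps(payload, default=str)


def backdated_start(end_time: datetime, latency_ms: float) -> datetime:
    # Start the observation latency_ms before it ended, so its recorded
    # duration matches the measured tool latency.
    return end_time - timedelta(milliseconds=latency_ms)
```

Falling back to `str()` trades fidelity for robustness: an opaque result shows up as its string form rather than structured JSON, but the observation is still recorded.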
- Both observers' payload serialization now uses `default=str`, so an opaque tool result that JSON cannot natively encode (a Pydantic model, a datetime) renders via its `str()` instead of raising inside the observer and losing the whole span/observation.
- The Langfuse Tool observation is back-dated (the back-dating helper is generalized to wrap `LangfuseTool`), so the live observation's duration reflects the tool latency, matching the Generation path.

Scope
v1 ships the inline bracketing form only. The deferred start/complete split (a tool result landing in a later turn) is a spec MAY with no fixture, deferred until a consumer needs event-driven tool execution; inline-only is fully conformant.
Tests
- The fixture-parser schema learns the `calls_tool` directive and the `record` state type; 092-095 defer-parse like the 050-056 typed-collector fixtures.
- `mkdocs build --strict` is clean.