LunarCommand · chris-colinsky · Jun 22, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -16,10 +16,11 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 - **Inline-callable parallel branches and conditional `when`** (proposal 0075, pipeline-utilities §11, spec v0.66.0). `ParallelBranchesNode` gains two additive branch forms. A branch may now give its work as `call`, an inline async function over the parent state returning a parent-shaped partial update, instead of a compiled `subgraph` with its own state schema and `inputs` / `outputs` projection; the returned partial is the branch's contribution directly, merged via the parent reducer with no projection. This makes the primitive adoptable for the "M heterogeneous lightweight parallel calls over shared state, each independently failure-isolated" shape (hybrid recall, paired reads) that previously dropped to a hand-rolled gather, while reusing the existing concurrency, fail-fast cancellation, per-branch failure isolation, and reducer fan-in. A branch gives its work as exactly one of `subgraph` / `call`, and a callable branch declares no `inputs` / `outputs`, else a new compile-time `ParallelBranchesInvalidBranchSpec`; a node may mix the two forms freely. A branch (either form) may also carry an optional `when` predicate over the parent state, evaluated once at dispatch: a `False` result skips the branch entirely (no dispatch, contribution, observer events, or span), and an all-skipped node is a valid no-op distinct from the compile-time `ParallelBranchesNoBranches`. A callable branch is the unit of work, so it emits one `started` / `completed` observer pair keyed by `branch_name` (rendered as a single branch span); a skipped branch emits nothing. `ParallelBranchesInvalidBranchSpec` is exported from `openarmature.graph`. Conformance fixtures 073 (two callable branches merge to disjoint fields), 074 (conditional `when` skips / dispatches), and 075 (callable branch failure-isolation degrade) run in `test_pipeline_utilities`.
 - **Tool-call request observability on LLM spans** (proposal 0076, observability §5.5.1 / §5.5.10 / §5.5.5, spec v0.67.0). The tool calls a model requests in its completion now have an output-side home on the `openarmature.llm.complete` span, closing the gap where they surfaced only incidentally on the next turn's input history. *Which* tools were requested renders by default as three ungated identity projections (the class of `openarmature.llm.model`): `openarmature.llm.output.tool_calls.count`, `.names`, and `.ids`, with `.names` and `.ids` index-aligned in request order and `.count` equal to their length. The full request, arguments included, renders as the payload-gated `openarmature.llm.output.tool_calls`, a JSON `[{id, name, arguments}]` array reusing the input tool-call encoding, surfaced only with `disable_provider_payload=False`. The whole family is emitted only on a tool-calling completion; a completion that requests no tools emits none of it (absence, not `count = 0`). The typed `LlmCompletionEvent` gains an additive `output_tool_calls` field carrying the `ToolCall` records, the source the span attributes render from (in python the OTel span renders from the per-attempt `LlmRetryAttemptEvent`, which carries the field too). This is the request side; the tool-execution complement (a separate `openarmature.tool.call` span) is a later proposal, joined to this one by the `ToolCall.id`. A Langfuse request-side mapping is out of scope. Conformance fixtures 085 (two requested calls surface count / names / ids), 086 (no calls, family absent), and 087 (payload gating: identity survives payload-off while the full serialization is suppressed) run in `test_observability`.
 - **OTel GenAI metrics** (proposal 0067, observability §11, spec v0.68.0). The OTel observer can now emit the OpenTelemetry metrics signal alongside its spans: two histogram instruments over provider calls, opt in with `enable_metrics=True` (default off, independent of span emission). `openarmature.gen_ai.client.token.usage` records an LLM completion's input and output token counts (one observation each, tagged `openarmature.gen_ai.token.type`); `openarmature.gen_ai.client.operation.duration` records the call's wall-clock duration, once per attempt under call-level retry, including a failed attempt (which carries `error.type`). Both carry `openarmature.gen_ai.operation` (`"chat"`), `gen_ai.request.model`, and `gen_ai.system`, and use the spec's explicit bucket advisories. The `Meter` comes from the configured `MeterProvider` (injectable via `meter_provider=...`; the OTel global is the no-op fallback when none is set). The instrument names are OA-namespaced, mirroring the upstream `gen_ai.client.*` instruments (at Development status) so a future cutover is a mechanical prefix-strip; metrics target OTel only (no Langfuse mapping). They are a projection of the per-attempt event stream, so they record with spans disabled. `conformance.toml` records proposal 0067 `partial`: the LLM-call metrics (fixtures 088 / 090 / 091) are implemented, and the embedding-call metrics (fixture 089) are deferred until the embedding capability (proposal 0059) lands. The LLM fixtures run in `test_observability` via an in-memory `MetricReader` capture (the conformance-adapter §6.9 primitive).
+- **Tool-execution observability** (proposal 0063, graph-engine §6 + observability §5.5 / §8.4, spec v0.69.0). A model requests tools in its completion (the request side, proposal 0076); the caller executes them in node-body code, and that execution is now observable. `with_tool_call(tool_name, arguments, tool_call_id=...)` is a node-body instrumentation scope (a context manager like `with_active_prompt`, exported from `openarmature.observability`): you run the tool inside it and report the outcome with `scope.set_result(...)`. OpenArmature emits a typed `ToolCallEvent` when the scope exits cleanly, or a `ToolCallFailedEvent` (carrying `error_type` / `error_message`, deliberately with no `error_category`, since tool code has no closed error taxonomy) when the tool raises, after which it re-raises the exception: it observes; it does not run or select tools, loop, or swallow the exception. Both events carry the identity / scoping baseline plus `tool_name`, `tool_call_id` (the link back to the requesting `LlmCompletionEvent.output_tool_calls` entry, or `None` for a standalone instrumented function), `arguments`, `latency_ms`, and `call_id`; `ToolCallEvent` adds `result`. The OTel observer renders an `openarmature.tool.call` span parented under the calling node, with OA-namespace `openarmature.tool.{name,call.id,call.arguments,call.result}` attributes and the standard `error.type` on failure; the upstream `gen_ai.tool.*` / `execute_tool` surface (at Development status) is mirrored but not emitted in v1. The Langfuse observer renders a dedicated `Tool` observation (`asType="tool"`, not a `Generation`) under the node's Span observation, with the arguments / result as input / output and the tool name / call id in metadata, at ERROR level on failure. Arguments and result are payload, gated by `disable_provider_payload` (no new flag); `disable_llm_spans` does not gate the tool span. Conformance fixtures 092-098 run in `test_observability`.
 
 ### Changed
 
-- **Pinned spec advances v0.60.0 → v0.68.0** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above), v0.62.0 (proposal 0064, the Langfuse session/user population above), v0.63.0 (proposal 0072, the prompt cache control above), the v0.63.1 patch (pipeline-utilities coverage fixtures 070/071 for the already-implemented 0069 / 0070 behavior, no new proposal), and v0.64.0 (proposal 0073, GenAI semconv adoption reconciliation: OA retains `gen_ai.system` despite the upstream rename to `gen_ai.provider.name`; textual-only, with no emitted-attribute or fixture change, so the existing `gen_ai.*` fixtures stand as the retention regression), v0.65.0 (proposal 0074, the failure-isolation `catch` gate above), v0.66.0 (proposal 0075, the inline-callable parallel branches and conditional `when` above), the v0.66.1 patch (an observability §8 call-level-retry Langfuse-mapping clarification reconciling §8 with the per-attempt §5.5 spans: one terminal Generation per `complete()` call, not one per attempt, which the Langfuse observer already renders by driving the Generation from the terminal `LlmCompletionEvent` / `LlmFailedEvent` and skipping the per-attempt `LlmRetryAttemptEvent`; no behavior or fixture change), v0.67.0 (proposal 0076, the tool-call request observability above), and v0.68.0 (proposal 0067, the OTel GenAI metrics above). `conformance.toml` records 0061 / 0072 / 0074 / 0075 / 0076 `implemented`, 0064 `partial` (its `sessionId` half is dormant pending the sessions capability) and 0067 `partial` (its embedding-call metrics await the embedding capability), and 0073 `textual-only`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
+- **Pinned spec advances v0.60.0 → v0.69.0** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above), v0.62.0 (proposal 0064, the Langfuse session/user population above), v0.63.0 (proposal 0072, the prompt cache control above), the v0.63.1 patch (pipeline-utilities coverage fixtures 070/071 for the already-implemented 0069 / 0070 behavior, no new proposal), v0.64.0 (proposal 0073, GenAI semconv adoption reconciliation: OA retains `gen_ai.system` despite the upstream rename to `gen_ai.provider.name`; textual-only, with no emitted-attribute or fixture change, so the existing `gen_ai.*` fixtures stand as the retention regression), v0.65.0 (proposal 0074, the failure-isolation `catch` gate above), v0.66.0 (proposal 0075, the inline-callable parallel branches and conditional `when` above), the v0.66.1 patch (an observability §8 call-level-retry Langfuse-mapping clarification reconciling §8 with the per-attempt §5.5 spans: one terminal Generation per `complete()` call, not one per attempt, which the Langfuse observer already renders by driving the Generation from the terminal `LlmCompletionEvent` / `LlmFailedEvent` and skipping the per-attempt `LlmRetryAttemptEvent`; no behavior or fixture change), v0.67.0 (proposal 0076, the tool-call request observability above), v0.68.0 (proposal 0067, the OTel GenAI metrics above), and v0.69.0 (proposal 0063, the tool-execution observability above). `conformance.toml` records 0061 / 0063 / 0072 / 0074 / 0075 / 0076 `implemented`, 0064 `partial` (its `sessionId` half is dormant pending the sessions capability), 0067 `partial` (its embedding-call metrics await the embedding capability), and 0073 `textual-only`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
 
 ## [0.14.0] — 2026-06-17
 

diff --git a/conformance.toml b/conformance.toml
@@ -32,7 +32,7 @@
 
 [manifest]
 implementation = "openarmature-python"
-spec_pin = "v0.68.0"
+spec_pin = "v0.69.0"
 
 # Status values:
 #   implemented   — shipped behavior matches the proposal's contract
@@ -706,6 +706,14 @@ status = "implemented"
 since = "0.15.0"
 note = "The OTel observer synthesizes an openarmature.invocation span at the root of each detached trace (a detached subgraph + each detached fan-out instance), carrying the parent's SHARED invocation_id (detached mode is observer-side trace rendering, not a new run) and the detached unit's own entry_node; the detached subgraph / instance span nests under it. A raising detached subgraph surfaces ERROR + the category + an OTel exception event on BOTH the parent dispatch span and the detached invocation span. Observer-side only -- no graph-engine change; the Langfuse observer is unchanged (its Trace entity already plays the invocation-level-container role). Fixtures 008 (rewritten) and 058 (newly wired) run in test_observability."
 
+# Spec v0.69.0 (proposal 0063).  Tool-execution observability (graph-engine
+# §6 instrumentation scope + two typed events; observability §5.5.11 OTel tool
+# span + §8.4.6 Langfuse Tool observation).
+[proposals."0063"]
+status = "implemented"
+since = "0.15.0"
+note = "A node-body tool-call instrumentation scope (with_tool_call, a sync context manager modelled on with_active_prompt) the caller wraps a tool execution in; OA observes (does NOT run / select / loop / feed back). On result it dispatches ToolCallEvent; on raise it dispatches ToolCallFailedEvent and RE-RAISES (observe, don't swallow). The two typed §6 events carry identity/scoping + tool_name / tool_call_id (links back to LlmCompletionEvent.output_tool_calls, null for a standalone instrumented function) / arguments / latency_ms / call_id; ToolCallEvent adds result, ToolCallFailedEvent adds error_type + error_message and deliberately NO error_category (tool code has no closed §7 taxonomy). OTel: an openarmature.tool.call span (note .call, not .complete) parented under the calling node, OA-namespace openarmature.tool.{name,call.id,call.arguments,call.result} attrs + standard error.type on failure (ERROR status + exception event); the Development gen_ai.tool.* / execute_tool surface is mirrored, NOT emitted in v1. Langfuse: the dedicated Tool observation (asType=tool) -- python's first non-Span/Generation observation type -- input=arguments, output=result, tool_name/tool_call_id in metadata, ERROR level + error fields on failure. arguments/result are payload, gated by disable_provider_payload (no new flag); disable_llm_spans does not gate the tool span. v1 ships the inline bracketing form; the deferred start/complete split is a spec MAY, not yet needed. Fixtures 092-098 run in test_observability (092-095 typed-event-collector, 096/097 OTel span_tree, 098 Langfuse Tool observation)."
+
 # Spec v0.62.0 (proposal 0064).  Langfuse trace.sessionId / trace.userId
 # population (observability §8.4.1 / §8.10).
 [proposals."0064"]

diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md
@@ -871,6 +871,50 @@ The instrument names are OA-namespaced, mirroring the upstream
 cutover is a mechanical prefix-strip. Metrics target OTel only; there is
 no Langfuse mapping.
 
+### Tool-execution observability (`with_tool_call`)
+
+A model requests tools in its completion (the `output_tool_calls` above);
+the *caller* executes them in node-body code. OpenArmature does not run or
+choose tools, loop over them, or feed their results back to the model (that
+orchestration stays in your graph), but it can observe a tool execution you
+wrap in the `with_tool_call` instrumentation scope:
+
+```python
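+from openarmature.observability import with_tool_call
+
+# AgentState (your graph's state schema) and get_weather (your async tool)
+# are defined elsewhere in your project.
+async def run_tools(state: AgentState) -> dict:
+    args = {"city": "Paris"}
+    with with_tool_call("get_weather", args, tool_call_id="call_abc") as scope:
+        result = await get_weather(**args)  # the tool runs inside the scope
+        scope.set_result(result)  # reported result rides on the ToolCallEvent
+    return {"weather": result}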
+```
+
+`with_tool_call` is a context manager (like `with_active_prompt`): you run
+the tool inside it and report the outcome with `scope.set_result(...)`. On a
+clean exit it dispatches a `ToolCallEvent`; if the tool raises, it dispatches
+a `ToolCallFailedEvent` and re-raises (it observes but does not swallow, so
+your node body still sees the exception). `tool_call_id` links the execution
+back to the `output_tool_calls` entry that requested it; omit it for a
+standalone instrumented function, and the events carry `None`.
+
+The events render on both backends:
+
+- OTel: an `openarmature.tool.call` span parented under the calling node,
+  carrying `openarmature.tool.name`, `openarmature.tool.call.id`, and (when
+  payload is on) `openarmature.tool.call.arguments` /
+  `openarmature.tool.call.result`. A failure sets ERROR status, records an
+  exception event, and sets the standard `error.type` attribute.
+- Langfuse: a dedicated `Tool` observation (not a Generation) under the
+  node's Span observation, with the arguments / result as input / output and
+  the tool name and `tool_call_id` in metadata; a failure renders at ERROR
+  level.
+
+The arguments and result are payload, gated by `disable_provider_payload`
+exactly like the LLM payload attributes: payload is off by default, which
+keeps tool inputs and outputs out of traces, and surfaces only with
+`disable_provider_payload=False`. `disable_llm_spans` does not affect tool
+spans. The `openarmature.tool.*` attribute names mirror the upstream
+`gen_ai.tool.*` surface (at Development status), which OpenArmature does not
+emit in v1, so a future cutover is a prefix swap.
+
 ### Identifying the service: `Resource`
 
 Pass an `opentelemetry.sdk.resources.Resource` to set
@@ -1044,7 +1088,8 @@ appear dropped. Two workarounds:
 A second sibling observer maps the same `NodeEvent` stream onto
 Langfuse's native Trace + Observation data model: Traces at the
 top, Span observations for graph nodes, Generation observations for
-LLM calls. Use it instead of (or alongside) the OTel observer when
+LLM calls, and Tool observations for instrumented tool executions.
+Use it instead of (or alongside) the OTel observer when
 your trace UI is Langfuse and you want first-class Generation
 rendering without going through Langfuse's OTLP ingest.
 
@@ -1106,7 +1151,7 @@ for a runnable demo.
 
     Earlier SDK versions (v2.x, v3.x) are NOT supported. Projects on
     those versions either upgrade to v4 or supply their own adapter
-    matching the `LangfuseClient` Protocol's four methods.
+    matching the `LangfuseClient` Protocol.
 
     A runtime `isinstance(adapter, LangfuseClient)` check ships in
     the unit suite, so if a future v4 patch breaks the Protocol's

diff --git a/openarmature-spec b/openarmature-spec
diff --git a/pyproject.toml b/pyproject.toml
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
 openarmature = "openarmature.cli:main"
 
 [tool.openarmature]
-spec_version = "0.68.0"
+spec_version = "0.69.0"
 
 [dependency-groups]
 dev = [

diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md
@@ -1,6 +1,6 @@
 # OpenArmature — Agent documentation
 
-*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.68.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
+*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.69.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
 
 ## TL;DR
 
@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:
 
 ## Capability contracts
 
-_Sourced from openarmature-spec v0.68.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
+_Sourced from openarmature-spec v0.69.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
 
 ### Capability: `graph-engine`
 

diff --git a/src/openarmature/__init__.py b/src/openarmature/__init__.py
@@ -25,7 +25,7 @@
 """
 
 __version__ = "0.14.0"
-__spec_version__ = "0.68.0"
+__spec_version__ = "0.69.0"
 # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
 # package-registry name for this implementation. Surfaces on every
 # OTel invocation span as ``openarmature.implementation.name`` and on
+13 −0		CHANGELOG.md
+2 −3		README.md
+8 −0		docs/compatibility.md
+3 −3		docs/proposals.md
+60 −45		proposals/0063-tool-execution-observability.md
+118 −0		spec/graph-engine/spec.md
+24 −0		spec/observability/conformance/092-tool-call-event-dispatch.md
+60 −0		spec/observability/conformance/092-tool-call-event-dispatch.yaml
+27 −0		spec/observability/conformance/093-tool-call-failed-event-dispatch.md
+55 −0		spec/observability/conformance/093-tool-call-failed-event-dispatch.yaml
+20 −0		spec/observability/conformance/094-tool-call-event-mutual-exclusion.md
+70 −0		spec/observability/conformance/094-tool-call-event-mutual-exclusion.yaml
+26 −0		spec/observability/conformance/095-tool-call-id-links-to-llm-request.md
+93 −0		spec/observability/conformance/095-tool-call-id-links-to-llm-request.yaml
+23 −0		spec/observability/conformance/096-tool-call-payload-gating.md
+85 −0		spec/observability/conformance/096-tool-call-payload-gating.yaml
+24 −0		spec/observability/conformance/097-otel-tool-span-attributes.md
+56 −0		spec/observability/conformance/097-otel-tool-span-attributes.yaml
+25 −0		spec/observability/conformance/098-langfuse-tool-observation.md
+90 −0		spec/observability/conformance/098-langfuse-tool-observation.yaml
+102 −9		spec/observability/spec.md