LunarCommand · chris-colinsky · Jun 22, 2026 · Jun 22, 2026 · Jun 22, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -15,10 +15,11 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 - **Failure-isolation `catch` gate + cause-chain classification primitive** (proposal 0074, pipeline-utilities §6.3 / §6.4, spec v0.65.0). `FailureIsolationMiddleware` gains an optional `catch`: a set of error categories. An exception is caught only if the *derived category* of its cause chain (the outermost non-carrier link's category, resolved through the engine's `node_exception` carriers, the same value reported as `caught_exception.category`) is in the set. This closes a degrade-into-crash footgun: at a wrapping placement (subgraph, fan-out instance, branch) the engine wraps the originating failure in a carrier, so a `predicate` inspecting the surface exception sees only the carrier and misses it, whereas `catch` classifies through the carrier. `catch` composes with `predicate` as a conjunction; both default permissive (both unset stays catch-all), and a null derived category never matches a non-empty set. The carrier-skipping walk behind `catch` and `caught_exception` is promoted to a public primitive, `classify_cause_chain(exc) -> CaughtException` (the ordered `chain`, the derived `category`, and its `message` — the same record the event carries), exported from `openarmature.graph` for use in a custom `predicate`, a router, a metric, or a full-chain retry classifier. The default retry classifier stays deliberately single-level (it classifies at re-attempt granularity); this is now documented, with no behavior change. Conformance fixture 072 (catch matches through an instance-placement carrier and degrades; a non-matching catch propagates with no event). The optional native-exception-type `catch` form (spec MAY) is not shipped.
 - **Inline-callable parallel branches and conditional `when`** (proposal 0075, pipeline-utilities §11, spec v0.66.0). `ParallelBranchesNode` gains two additive branch forms. A branch may now give its work as `call`, an inline async function over the parent state returning a parent-shaped partial update, instead of a compiled `subgraph` with its own state schema and `inputs` / `outputs` projection; the returned partial is the branch's contribution directly, merged via the parent reducer with no projection. This makes the primitive adoptable for the "M heterogeneous lightweight parallel calls over shared state, each independently failure-isolated" shape (hybrid recall, paired reads) that previously dropped to a hand-rolled gather, while reusing the existing concurrency, fail-fast cancellation, per-branch failure isolation, and reducer fan-in. A branch gives its work as exactly one of `subgraph` / `call`, and a callable branch declares no `inputs` / `outputs`, else a new compile-time `ParallelBranchesInvalidBranchSpec`; a node may mix the two forms freely. A branch (either form) may also carry an optional `when` predicate over the parent state, evaluated once at dispatch: a `False` result skips the branch entirely (no dispatch, contribution, observer events, or span), and an all-skipped node is a valid no-op distinct from the compile-time `ParallelBranchesNoBranches`. A callable branch is the unit of work, so it emits one `started` / `completed` observer pair keyed by `branch_name` (rendered as a single branch span); a skipped branch emits nothing. `ParallelBranchesInvalidBranchSpec` is exported from `openarmature.graph`. Conformance fixtures 073 (two callable branches merge to disjoint fields), 074 (conditional `when` skips / dispatches), and 075 (callable branch failure-isolation degrade) run in `test_pipeline_utilities`.
 - **Tool-call request observability on LLM spans** (proposal 0076, observability §5.5.1 / §5.5.10 / §5.5.5, spec v0.67.0). The tool calls a model requests in its completion now have an output-side home on the `openarmature.llm.complete` span, closing the gap where they surfaced only incidentally on the next turn's input history. *Which* tools were requested renders by default as three ungated identity projections (the class of `openarmature.llm.model`): `openarmature.llm.output.tool_calls.count`, `.names`, and `.ids`, with `.names` and `.ids` index-aligned in request order and `.count` equal to their length. The full request, arguments included, renders as the payload-gated `openarmature.llm.output.tool_calls`, a JSON `[{id, name, arguments}]` array reusing the input tool-call encoding, surfaced only with `disable_provider_payload=False`. The whole family is emitted only on a tool-calling completion; a completion that requests no tools emits none of it (absence, not `count = 0`). The typed `LlmCompletionEvent` gains an additive `output_tool_calls` field carrying the `ToolCall` records, the source the span attributes render from (in python the OTel span renders from the per-attempt `LlmRetryAttemptEvent`, which carries the field too). This is the request side; the tool-execution complement (a separate `openarmature.tool.call` span) is a later proposal, joined to this one by the `ToolCall.id`. A Langfuse request-side mapping is out of scope. Conformance fixtures 085 (two requested calls surface count / names / ids), 086 (no calls, family absent), and 087 (payload gating: identity survives payload-off while the full serialization is suppressed) run in `test_observability`.
+- **OTel GenAI metrics** (proposal 0067, observability §11, spec v0.68.0). The OTel observer can now emit the OpenTelemetry metrics signal alongside its spans: two histogram instruments over provider calls, opt in with `enable_metrics=True` (default off, independent of span emission). `openarmature.gen_ai.client.token.usage` records an LLM completion's input and output token counts (one observation each, tagged `openarmature.gen_ai.token.type`); `openarmature.gen_ai.client.operation.duration` records the call's wall-clock duration, once per attempt under call-level retry, including a failed attempt (which carries `error.type`). Both carry `openarmature.gen_ai.operation` (`"chat"`), `gen_ai.request.model`, and `gen_ai.system`, and use the spec's explicit bucket advisories. The `Meter` comes from the configured `MeterProvider` (injectable via `meter_provider=...`; the OTel global is the no-op fallback when none is set). The instrument names are OA-namespaced, mirroring the upstream `gen_ai.client.*` instruments (at Development status) so a future cutover is a mechanical prefix-strip; metrics target OTel only (no Langfuse mapping). They are a projection of the per-attempt event stream, so they record with spans disabled. `conformance.toml` records proposal 0067 `partial`: the LLM-call metrics (fixtures 088 / 090 / 091) are implemented, and the embedding-call metrics (fixture 089) are deferred until the embedding capability (proposal 0059) lands. The LLM fixtures run in `test_observability` via an in-memory `MetricReader` capture (the conformance-adapter §6.9 primitive).
 
 ### Changed
 
-- **Pinned spec advances v0.60.0 → v0.67.0** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above), v0.62.0 (proposal 0064, the Langfuse session/user population above), v0.63.0 (proposal 0072, the prompt cache control above), the v0.63.1 patch (pipeline-utilities coverage fixtures 070/071 for the already-implemented 0069 / 0070 behavior, no new proposal), and v0.64.0 (proposal 0073, GenAI semconv adoption reconciliation: OA retains `gen_ai.system` despite the upstream rename to `gen_ai.provider.name`; textual-only, with no emitted-attribute or fixture change, so the existing `gen_ai.*` fixtures stand as the retention regression), v0.65.0 (proposal 0074, the failure-isolation `catch` gate above), v0.66.0 (proposal 0075, the inline-callable parallel branches and conditional `when` above), the v0.66.1 patch (an observability §8 call-level-retry Langfuse-mapping clarification reconciling §8 with the per-attempt §5.5 spans: one terminal Generation per `complete()` call, not one per attempt, which the Langfuse observer already renders by driving the Generation from the terminal `LlmCompletionEvent` / `LlmFailedEvent` and skipping the per-attempt `LlmRetryAttemptEvent`; no behavior or fixture change), and v0.67.0 (proposal 0076, the tool-call request observability above). `conformance.toml` records 0061 / 0072 / 0074 / 0075 / 0076 `implemented`, 0064 `partial` (its `sessionId` half is dormant pending the sessions capability), and 0073 `textual-only`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
+- **Pinned spec advances v0.60.0 → v0.68.0** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above), v0.62.0 (proposal 0064, the Langfuse session/user population above), v0.63.0 (proposal 0072, the prompt cache control above), the v0.63.1 patch (pipeline-utilities coverage fixtures 070/071 for the already-implemented 0069 / 0070 behavior, no new proposal), and v0.64.0 (proposal 0073, GenAI semconv adoption reconciliation: OA retains `gen_ai.system` despite the upstream rename to `gen_ai.provider.name`; textual-only, with no emitted-attribute or fixture change, so the existing `gen_ai.*` fixtures stand as the retention regression), v0.65.0 (proposal 0074, the failure-isolation `catch` gate above), v0.66.0 (proposal 0075, the inline-callable parallel branches and conditional `when` above), the v0.66.1 patch (an observability §8 call-level-retry Langfuse-mapping clarification reconciling §8 with the per-attempt §5.5 spans: one terminal Generation per `complete()` call, not one per attempt, which the Langfuse observer already renders by driving the Generation from the terminal `LlmCompletionEvent` / `LlmFailedEvent` and skipping the per-attempt `LlmRetryAttemptEvent`; no behavior or fixture change), v0.67.0 (proposal 0076, the tool-call request observability above), and v0.68.0 (proposal 0067, the OTel GenAI metrics above). `conformance.toml` records 0061 / 0072 / 0074 / 0075 / 0076 `implemented`, 0064 `partial` (its `sessionId` half is dormant pending the sessions capability) and 0067 `partial` (its embedding-call metrics await the embedding capability), and 0073 `textual-only`. Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
 
 ## [0.14.0] — 2026-06-17
 

diff --git a/conformance.toml b/conformance.toml
@@ -32,7 +32,7 @@
 
 [manifest]
 implementation = "openarmature-python"
-spec_pin = "v0.67.0"
+spec_pin = "v0.68.0"
 
 # Status values:
 #   implemented   — shipped behavior matches the proposal's contract
@@ -640,6 +640,13 @@ since = "0.14.0"
 status = "implemented"
 since = "0.14.0"
 
+# Spec v0.68.0 (proposal 0067).  OTel GenAI metrics (observability §11 new
+# Metrics section + conformance-adapter §6.9 metric-capture primitive).
+[proposals."0067"]
+status = "partial"
+since = "0.15.0"
+note = "OTel GenAI metrics (observability §11): an opt-in enable_metrics flag (default off, normative name) on the bundled OTelObserver, independent of span emission (§11.1). When on, two OA-namespaced histograms record per provider-call ATTEMPT from the python-internal LlmRetryAttemptEvent (the per-attempt LLM-span source since 0050): openarmature.gen_ai.client.token.usage ({token}; two observations -- input + output token counts from the response usage record, openarmature.gen_ai.token.type dim) and openarmature.gen_ai.client.operation.duration (s; once per attempt INCLUDING failed attempts, error.type dim on failure), both configured with the §11.2 explicit bucket advisories. Dimensions: openarmature.gen_ai.operation ('chat'), gen_ai.request.model + gen_ai.system (recognized-core, used directly), openarmature.gen_ai.token.type, error.type. The Meter comes from the configured MeterProvider (injectable; falls back to the OTel global, which is the no-op meter when none is set). PARTIAL: the embedding-call metrics (the §11 embedding path, fixture 089) are deferred -- the embedding capability (proposal 0059, observability §5.5.8 / §5.5.9) is unimplemented in python until v0.16.0, so there is no embedding event/provider to record from. The LLM path (fixtures 088 / 090 / 091) is implemented and wired via a private MeterProvider + InMemoryMetricReader (the §6.9 metric-capture primitive). No Langfuse change (metrics are OTel-only). Streaming / server / rerank metrics + the cutover to the upstream gen_ai.client.* instrument names are out of scope per the proposal."
+
 # Spec v0.57.0 (proposal 0068).  Failure-isolation event structured cause
 # chain (pipeline-utilities §6.3).  ``caught_exception`` gains a ``chain`` of
 # cause links (``{category, message, carrier}``, outermost->innermost), with

diff --git a/docs/concepts/observability.md b/docs/concepts/observability.md
@@ -828,6 +828,49 @@ form, so custom observers consuming
 cannot accidentally leak raw bytes regardless of how they're
 written.
 
+### GenAI metrics (`enable_metrics`)
+
+Spans answer "what happened on this one call"; metrics answer "what is
+the token throughput and latency across all calls". The OTel observer
+can emit two histogram instruments over provider calls. Opt in with
+`enable_metrics=True` (default off):
+
+```python
+observer = OTelObserver(
+    span_processor=SimpleSpanProcessor(exporter),
+    enable_metrics=True,
+)
+```
+
+When enabled, the observer obtains a `Meter` from the configured
+`MeterProvider`. Pass `meter_provider=...` to use a private one;
+otherwise it falls back to the OTel global, and recording is a silent
+no-op when no provider is configured. The two instruments:
+
+- `openarmature.gen_ai.client.token.usage` (unit `{token}`). Per LLM
+  completion it records two observations: the input-token count, tagged
+  `openarmature.gen_ai.token.type="input"`, and the output-token count,
+  tagged `"output"`, sourced from the response usage record. Recorded
+  only when the call returned usage.
+- `openarmature.gen_ai.client.operation.duration` (unit `s`). The
+  provider-call wall-clock duration, one observation per attempt. A
+  failed attempt records too, carrying `error.type`.
+
+Both carry `openarmature.gen_ai.operation` (`"chat"`),
+`gen_ai.request.model`, and `gen_ai.system`. Under call-level retry the
+duration instrument records once per attempt; the token instrument
+records only for attempts that returned usage.
+
+**Metrics are independent of spans.** `enable_metrics` is orthogonal to
+the `disable_llm_spans` / `disable_provider_payload` flags: you can
+record metrics with spans off, or emit spans with metrics off. Both draw
+from the same event stream.
+
+The instrument names are OA-namespaced, mirroring the upstream
+`gen_ai.client.*` instruments (still at Development status), so a future
+cutover is a mechanical prefix-strip. Metrics target OTel only; there is
+no Langfuse mapping.
+
 ### Identifying the service: `Resource`
 
 Pass an `opentelemetry.sdk.resources.Resource` to set

diff --git a/openarmature-spec b/openarmature-spec
diff --git a/pyproject.toml b/pyproject.toml
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
 openarmature = "openarmature.cli:main"
 
 [tool.openarmature]
-spec_version = "0.67.0"
+spec_version = "0.68.0"
 
 [dependency-groups]
 dev = [

diff --git a/src/openarmature/AGENTS.md b/src/openarmature/AGENTS.md
@@ -1,6 +1,6 @@
 # OpenArmature — Agent documentation
 
-*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.67.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
+*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.68.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
 
 ## TL;DR
 
@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:
 
 ## Capability contracts
 
-_Sourced from openarmature-spec v0.67.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
+_Sourced from openarmature-spec v0.68.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
 
 ### Capability: `graph-engine`
 

diff --git a/src/openarmature/__init__.py b/src/openarmature/__init__.py
@@ -25,7 +25,7 @@
 """
 
 __version__ = "0.14.0"
-__spec_version__ = "0.67.0"
+__spec_version__ = "0.68.0"
 # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
 # package-registry name for this implementation. Surfaces on every
 # OTel invocation span as ``openarmature.implementation.name`` and on
+16 −0		CHANGELOG.md
+2 −3		README.md
+8 −0		docs/compatibility.md
+2 −2		docs/proposals.md
+16 −13		proposals/0067-observability-genai-metrics.md
+26 −0		spec/conformance-adapter/spec.md
+28 −0		spec/observability/conformance/088-llm-metrics-token-and-duration.md
+62 −0		spec/observability/conformance/088-llm-metrics-token-and-duration.yaml
+24 −0		spec/observability/conformance/089-embedding-metrics-token-and-duration.md
+53 −0		spec/observability/conformance/089-embedding-metrics-token-and-duration.yaml
+26 −0		spec/observability/conformance/090-metrics-error-type-on-duration.md
+43 −0		spec/observability/conformance/090-metrics-error-type-on-duration.yaml
+21 −0		spec/observability/conformance/091-metrics-disabled-no-measurements.md
+38 −0		spec/observability/conformance/091-metrics-disabled-no-measurements.yaml
+117 −3		spec/observability/spec.md