Add opt-in OTel GenAI metrics (0067) #177

Merged
chris-colinsky merged 2 commits into main from feature/0067-genai-metrics
Jun 22, 2026

Conversation

@chris-colinsky

Member

Implements accepted proposal 0067 (spec v0.68.0): adds the OpenTelemetry metrics signal to the bundled OTel observer, opt-in via enable_metrics. The spec pin advances from v0.67.0 to v0.68.0 (0067 is the only proposal in the delta).

What changed

Two OA-namespaced histogram instruments over provider calls, recorded only when enable_metrics=True (default off):

  • openarmature.gen_ai.client.token.usage ({token}): per LLM completion, two observations, the input and output token counts (tagged openarmature.gen_ai.token.type), from the response usage record. Recorded only when the call returned usage.
  • openarmature.gen_ai.client.operation.duration (s): the provider-call wall-clock duration, once per attempt under call-level retry, including a failed attempt (which carries error.type).

Both carry openarmature.gen_ai.operation ("chat"), gen_ai.request.model, and gen_ai.system, with the spec's explicit bucket advisories. The Meter comes from the configured MeterProvider (injectable via meter_provider=..., falling back to the OTel global no-op when none is set). Metrics are independent of span emission: they record even with disable_llm_spans=True. Metrics target OTel only (no Langfuse mapping). The instrument names are OA-namespaced and mirror the upstream gen_ai.client.* instruments (still at Development status), so a future cutover is a mechanical prefix-strip.

Implementation note

The proposal sources metrics from the typed completion/failure events and requires duration "once per attempt". In this implementation the per-attempt event is the internal LlmRetryAttemptEvent (the LLM-span source since 0050), which already carries latency, usage, error category, model, and provider, so metrics record from it: one duration sample per attempt, token usage only when usage is present. The terminal events are not used (they would double-count). This is the same internal-event latitude the spec blessed for the 0050 per-attempt span surface.
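The recording rule in this note can be sketched without any OTel dependency. AttemptEvent below is a hypothetical stand-in for the internal per-attempt event (its field names are assumptions, not the real LlmRetryAttemptEvent shape), and observations_for is an illustrative helper that lists what would be recorded for each attempt.

```python
from dataclasses import dataclass
from typing import Optional

DURATION = "openarmature.gen_ai.client.operation.duration"
TOKEN_USAGE = "openarmature.gen_ai.client.token.usage"


@dataclass(frozen=True)
class AttemptEvent:
    """Hypothetical per-attempt event; field names are assumptions."""
    model: str
    provider: str
    latency_s: float
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    error_type: Optional[str] = None


def observations_for(event):
    """Return (instrument, value, attributes) tuples for one attempt."""
    base = {
        "openarmature.gen_ai.operation": "chat",
        "gen_ai.request.model": event.model,
        "gen_ai.system": event.provider,
    }
    # Duration: exactly one sample per attempt, failed attempts included.
    duration_attrs = dict(base)
    if event.error_type is not None:
        duration_attrs["error.type"] = event.error_type
    out = [(DURATION, event.latency_s, duration_attrs)]
    # Token usage: only when the attempt returned a usage record.
    if event.input_tokens is not None and event.output_tokens is not None:
        for token_type, count in (("input", event.input_tokens),
                                  ("output", event.output_tokens)):
            attrs = {**base, "openarmature.gen_ai.token.type": token_type}
            out.append((TOKEN_USAGE, count, attrs))
    return out


# A failed first attempt followed by a successful retry.
attempts = [
    AttemptEvent("example-model", "example-provider", 0.8, error_type="timeout"),
    AttemptEvent("example-model", "example-provider", 1.1,
                 input_tokens=120, output_tokens=30),
]
observations = [obs for attempt in attempts for obs in observations_for(attempt)]
```

Driving the metrics from the per-attempt events alone is what avoids the double count: a terminal event would add a second duration sample for the final attempt.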

Embedding metrics deferred

The proposal's embedding-call metrics (fixture 089) are deferred: the embedding capability (proposal 0059) is not yet implemented in Python and is planned for a later release, so there is no embedding event or provider to record from. conformance.toml records 0067 as partial on that basis. The LLM-call fixtures (088 / 090 / 091) are implemented and wired through a private MeterProvider plus an in-memory MetricReader (the conformance-adapter metric-capture primitive); 089 stays in the deferred set.

Tests

  • Unit: token + duration emission, error.type on failure, the disabled no-op, span-independence, and once-per-attempt-under-retry (asserting on histogram counts, since identical-dimension observations aggregate).
  • Conformance: 088 / 090 / 091 run; 089 skipped. The fixture-parser schema gained an expected.metrics field, its discriminator key, and a calls_embed node directive so 089 still round-trips.
  • Full suite green; ruff + pyright clean; mkdocs build --strict clean.

The OTel observer can now emit the metrics signal alongside its spans:
two histogram instruments over provider calls, gated by a new
enable_metrics flag (default off, independent of span emission). One
records an LLM completion's input and output token counts; the other
records the call duration, once per attempt under call-level retry and
including a failed attempt (which carries error.type). Both draw from
the per-attempt LlmRetryAttemptEvent, the LLM-span source, so metrics
record even with spans disabled. The Meter comes from the configured
MeterProvider (injectable; falls back to the OTel global no-op when
none is set).

Implements proposal 0067 (observability metrics), LLM path.
Advance the spec pin v0.67.0 -> v0.68.0 across the four sync points
(submodule, __spec_version__, pyproject, conformance manifest) and the
smoke assertion; regenerate the bundled AGENTS.md.

Wire conformance fixtures 088 / 090 / 091 through a new metrics driver
that captures observations via a private MeterProvider plus an
in-memory MetricReader (the conformance-adapter metric-capture
primitive); the embedding fixture 089 is deferred until the embedding
capability lands. Teach the fixture-parser schema the new shapes
(expected.metrics and the calls_embed node directive). Record proposal
0067 partial, document the enable_metrics flag, and add the CHANGELOG
entry.
Copilot AI review requested due to automatic review settings June 22, 2026 18:05

Copilot AI left a comment

Pull request overview

Adds opt-in OpenTelemetry metrics emission to the bundled OTelObserver per accepted spec proposal 0067 (spec v0.68.0), alongside the usual span emission, and updates the spec pin + conformance harness to validate the new fixtures.

Changes:

  • Add enable_metrics + meter_provider to OTelObserver, creating/recording two OA-namespaced GenAI histogram instruments from LlmRetryAttemptEvent.
  • Extend conformance + unit tests to capture/validate emitted metrics via a private MeterProvider + InMemoryMetricReader, and add fixture-schema support (expected.metrics, calls_embed for deferred 089).
  • Bump pinned spec version from 0.67.0 to 0.68.0 across runtime, pyproject, conformance manifest, docs, and changelog.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/openarmature/observability/otel/observer.py | Implements opt-in metrics instruments + per-attempt recording for duration/token usage. |
| tests/unit/test_observability_otel.py | Adds unit tests asserting metrics emission, disabling behavior, span-independence, and retry attempt counting. |
| tests/conformance/test_observability.py | Wires new metrics fixtures (088/090/091) and adds a metrics fixture driver/capture/assertion helpers. |
| tests/conformance/harness/expectations.py | Extends observability expected schema with metrics. |
| tests/conformance/harness/directives.py | Adds calls_embed directive shape so deferred embedding fixture 089 parses/round-trips. |
| docs/concepts/observability.md | Documents enable_metrics, instruments, dimensions, and meter-provider behavior. |
| tests/test_smoke.py | Updates spec-version assertion to 0.68.0. |
| pyproject.toml | Updates [tool.openarmature].spec_version to 0.68.0. |
| src/openarmature/__init__.py | Updates __spec_version__ to 0.68.0. |
| src/openarmature/AGENTS.md | Updates bundled agent-doc header to spec v0.68.0. |
| conformance.toml | Advances spec_pin and records proposal 0067 as partial with rationale. |
| CHANGELOG.md | Adds release note entry for OTel GenAI metrics and updates spec-pin summary. |

Comment thread src/openarmature/observability/otel/observer.py
@chris-colinsky chris-colinsky merged commit 4c7198f into main Jun 22, 2026
7 checks passed
@chris-colinsky chris-colinsky deleted the feature/0067-genai-metrics branch June 22, 2026 18:17