Skip to content

feat(otel): xtest tracing plugin + otdf-local Jaeger wiring (DSPX-3635)#549

Draft
dmihalcik-virtru wants to merge 1 commit into
mainfrom
DSPX-3635-otelo
Draft

feat(otel): xtest tracing plugin + otdf-local Jaeger wiring (DSPX-3635)#549
dmihalcik-virtru wants to merge 1 commit into
mainfrom
DSPX-3635-otelo

Conversation

@dmihalcik-virtru

Copy link
Copy Markdown
Member

DSPX-3635 — End-to-end OpenTelemetry tracing (test harness)

Part of the DSPX-3635 effort to thread a single distributed trace from the xtest
pytest harness → SDK CLI → platform/KAS, so a failing test links directly to one
Jaeger trace showing the full request chain.

Companion PR: opentdf/platform#3722 (Go CLI propagation + platform log correlation)

What's here

  • otdf-local:
    • up --tracing starts the existing Jaeger tracing docker-compose profile (UI :16686, OTLP :4317) — no new container.
    • Injects server.trace OTLP config into the generated platform and each KAS config (settings.trace_config_updates()), reusing the existing copy_yaml_with_updates seam.
    • env exports OTEL_EXPORTER_OTLP_ENDPOINT when the running config has tracing enabled.
  • xtest:
    • New fixtures/tracing.py — wraps each test in a pytest.test span (attrs: test.name/sdk/container), injects TRACEPARENT into the env the SDK CLI subprocess inherits, and prints Trace: <jaeger-url> on failure.
    • Enabled by --tracing or OTEL_EXPORTER_OTLP_ENDPOINT; a strict no-op otherwise (no OTel imports/cost).

Testing

  • ruff check / ruff format / pyright clean on otdf-local and xtest.
  • Plugin imports with real deps; --tracing registered on both pytest and otdf-local up.

How to run the loop

cd tests && uv run otdf-local up --tracing
cd xtest && OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 uv run pytest --sdks go test_tdfs.py -v --tracing

Expect a Trace: URL per test showing pytest.test → cli.encrypt/decrypt → platform Rewrap → KAS Rewrap, with matching trace_id in the platform/KAS logs.

Out of scope (follow-ups)

  • Java and JS SDK CLI propagation (needs the companion PR's pattern extended to those SDKs).

Draft: pairs with opentdf/platform#3722.

Add the client half of DSPX-3635 end-to-end tracing.

- otdf-local: `up --tracing` starts the Jaeger docker-compose profile and
  injects OTLP trace config into the generated platform and KAS configs;
  `env` exports OTEL_EXPORTER_OTLP_ENDPOINT when tracing is active.
- xtest: new fixtures/tracing.py wraps each test in a pytest.test span,
  propagates TRACEPARENT to the SDK CLI subprocess, and prints a Jaeger trace
  URL on failure. Enabled by --tracing or OTEL_EXPORTER_OTLP_ENDPOINT; a strict
  no-op otherwise.

Signed-off-by: Dave Mihalcik <[email protected]>
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 27fc4454-94e5-43b4-ae77-f8d05b21d458

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch DSPX-3635-otelo

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces opt-in end-to-end OpenTelemetry tracing support for the local test environment and integration tests, enabling trace propagation and Jaeger UI integration. The feedback highlights a critical issue where the Jaeger container can be left running after stopping services because the tracing profile is omitted during 'docker compose down'. Additionally, suggestions are provided to improve type safety in the tracing fixture by using 'Any' instead of 'object' to eliminate type ignores, and to expand test failure detection to cover the setup and teardown phases rather than just the main test execution.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +35 to +42
def _compose_cmd(self, *args: str) -> list[str]:
"""Build a `docker compose` command, opting into the `tracing` profile
(which includes the Jaeger all-in-one container) when tracing is enabled.
"""
cmd = ["docker", "compose", "-f", str(self._compose_file)]
if self.settings.tracing:
cmd += ["--profile", "tracing"]
return cmd + list(args)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When running otdf-local down, the CLI runs in a new process where self.settings.tracing is False (since --tracing is only an option on the up command). Consequently, _compose_cmd will not include the tracing profile, and the Jaeger container will be left running (orphaned) after down. We should ensure the tracing profile is included when stopping services (i.e., when "down" is in the arguments).

Suggested change
def _compose_cmd(self, *args: str) -> list[str]:
"""Build a `docker compose` command, opting into the `tracing` profile
(which includes the Jaeger all-in-one container) when tracing is enabled.
"""
cmd = ["docker", "compose", "-f", str(self._compose_file)]
if self.settings.tracing:
cmd += ["--profile", "tracing"]
return cmd + list(args)
def _compose_cmd(self, *args: str) -> list[str]:
"""Build a `docker compose` command, opting into the `tracing` profile
(which includes the Jaeger all-in-one container) when tracing is enabled.
"""
cmd = ["docker", "compose", "-f", str(self._compose_file)]
if self.settings.tracing or "down" in args:
cmd += ["--profile", "tracing"]
return cmd + list(args)

Comment thread xtest/fixtures/tracing.py
Comment on lines +15 to +18
import logging
import os
from collections.abc import Iterator
from dataclasses import dataclass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Import Any from typing to allow proper typing of the TracingSession dataclass fields without requiring module-level imports of OpenTelemetry packages.

Suggested change
import logging
import os
from collections.abc import Iterator
from dataclasses import dataclass
import logging
import os
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any

Comment thread xtest/fixtures/tracing.py
Comment on lines +34 to +35
tracer: object # opentelemetry.trace.Tracer
provider: object # opentelemetry.sdk.trace.TracerProvider

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Typing tracer and provider as object requires using # type: ignore[attr-defined] when calling methods on them. Typing them as Any avoids type-checking errors and allows removing the type ignores.

Suggested change
tracer: object # opentelemetry.trace.Tracer
provider: object # opentelemetry.sdk.trace.TracerProvider
tracer: Any # opentelemetry.trace.Tracer
provider: Any # opentelemetry.sdk.trace.TracerProvider

Comment thread xtest/fixtures/tracing.py
from opentelemetry.propagate import inject

tracer = _tracing.tracer
with tracer.start_as_current_span("pytest.test") as span: # type: ignore[attr-defined]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With tracer typed as Any, the # type: ignore[attr-defined] is no longer necessary and can be safely removed.

Suggested change
with tracer.start_as_current_span("pytest.test") as span: # type: ignore[attr-defined]
with tracer.start_as_current_span("pytest.test") as span:

Comment thread xtest/fixtures/tracing.py
Comment on lines +139 to +142
rep = getattr(request.node, "rep_call", None)
if rep is not None and rep.failed:
span.set_status(trace.Status(trace.StatusCode.ERROR))
print(f"\nTrace: {trace_url}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Currently, only rep_call is checked to determine if the test failed. However, integration tests can also fail during the setup or teardown phases (e.g., if a fixture fails to initialize or clean up). We should check all three phases (setup, call, teardown) to ensure the span status is correctly set to ERROR and the trace URL is printed for any failure.

Suggested change
rep = getattr(request.node, "rep_call", None)
if rep is not None and rep.failed:
span.set_status(trace.Status(trace.StatusCode.ERROR))
print(f"\nTrace: {trace_url}")
failed = False
for phase in ("setup", "call", "teardown"):
rep = getattr(request.node, f"rep_{phase}", None)
if rep is not None and rep.failed:
failed = True
break
if failed:
span.set_status(trace.Status(trace.StatusCode.ERROR))
print(f"\nTrace: {trace_url}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant