Add OpenTelemetry support for traces (and metrics) #194

Description

@stoopman

Summary

The Edge Proxy currently has no OpenTelemetry integration. For self-hosted deployments that run Edge Proxy as a critical-path component (flag evaluation in front of apps), this makes it hard to observe latency, errors, and upstream behaviour in a standard way.

The main Flagsmith API already supports OTel export; Edge Proxy would benefit from similar support so operators can monitor both tiers consistently.

Use case

Edge Proxy sits between application SDKs and the Flagsmith core API. Operators typically want visibility into:

  • HTTP request handling — latency, status codes, route-level spans (/api/v1/flags/, /api/v1/identities/, health endpoints)
  • Upstream polling — spans/timing for refreshes against the Flagsmith API (API_URL, poll frequency/timeouts)
  • Cache behaviour — cache hit/miss or refresh events (especially with ENDPOINT_CACHES configured)
  • Errors — failed upstream requests, timeouts, proxy errors

Today, observability is limited to structured logs (structlog / LOGGING config). Logs help, but they do not integrate with OTLP-based observability stacks.

Expected behaviour

Opt-in OTel export via standard environment variables, aligned with the main Flagsmith OTel approach:

OTEL_EXPORTER_OTLP_ENDPOINT
OTEL_EXPORTER_OTLP_PROTOCOL   # support grpc and http/protobuf
OTEL_SERVICE_NAME             # default: e.g. flagsmith-edge-proxy
OTEL_RESOURCE_ATTRIBUTES
OTEL_TRACES_EXPORTER          # otlp | none
OTEL_SDK_DISABLED

When OTel is disabled (endpoint unset or OTEL_SDK_DISABLED=true), there should be no runtime overhead. This is the same principle as in the main Flagsmith docs.
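
To make the opt-in rule concrete, here is a minimal sketch of the gating logic, using only the standard library. The function names `tracing_enabled` and `service_name` are hypothetical, and treating an unset endpoint as "disabled" follows the proposal in this issue rather than the OTel SDK default.

```python
import os
from typing import Mapping, Optional


def tracing_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether OTel trace export should be switched on (hypothetical helper)."""
    env = os.environ if env is None else env
    # OTEL_SDK_DISABLED=true (case-insensitive) turns everything off.
    if env.get("OTEL_SDK_DISABLED", "").strip().lower() == "true":
        return False
    # OTEL_TRACES_EXPORTER=none disables trace export specifically.
    if env.get("OTEL_TRACES_EXPORTER", "otlp").strip().lower() == "none":
        return False
    # Per this issue: no endpoint configured means no export.
    return bool(env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip())


def service_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OTEL_SERVICE_NAME, falling back to the default suggested above."""
    env = os.environ if env is None else env
    return env.get("OTEL_SERVICE_NAME") or "flagsmith-edge-proxy"
```

Checking this once at startup, before importing any OTel package, is one way to keep the disabled path free of overhead.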

Suggested scope

Minimum (high value):

  • FastAPI/uvicorn auto-instrumentation for incoming HTTP requests
  • Manual spans or instrumentation for upstream poll cycles
  • OTLP export over both gRPC and HTTP/protobuf
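
The minimum scope could look roughly like the sketch below. It is an illustration, not a proposed patch: it assumes the packages `opentelemetry-sdk`, `opentelemetry-instrumentation-fastapi`, and the two OTLP exporter packages are installed, and the names `configure_tracing`, `poll_span`, `edge_proxy.poll`, and `flagsmith.api_url` are hypothetical. The OTel imports are deferred so that nothing is loaded when export is off.

```python
import os
from contextlib import nullcontext


def configure_tracing(app):
    """Return a tracer if OTel export is enabled, else None (hypothetical helper)."""
    disabled = os.environ.get("OTEL_SDK_DISABLED", "").strip().lower() == "true"
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if disabled or not endpoint:
        return None  # OTel packages are never imported on this path

    # Lazy imports: only paid for when export is switched on.
    from opentelemetry import trace
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    if os.environ.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc") == "http/protobuf":
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    else:
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    # Assumption: an explicit service.name here wins over OTEL_RESOURCE_ATTRIBUTES.
    name = os.environ.get("OTEL_SERVICE_NAME", "flagsmith-edge-proxy")
    provider = TracerProvider(resource=Resource.create({"service.name": name}))
    # The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment itself.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
    # Auto-instrument incoming HTTP requests on the FastAPI app.
    FastAPIInstrumentor.instrument_app(app, tracer_provider=provider)
    return trace.get_tracer("edge_proxy")


def poll_span(tracer, api_url):
    """Context manager wrapping one upstream poll cycle; a no-op without a tracer."""
    if tracer is None:
        return nullcontext()
    return tracer.start_as_current_span(
        "edge_proxy.poll", attributes={"flagsmith.api_url": api_url}
    )
```

A poll loop would then wrap each refresh in `with poll_span(tracer, api_url): ...`, so failed or slow upstream requests show up as spans with the exception recorded.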

Nice to have:

  • Metrics (request rate, latency histograms, cache hit ratio, upstream poll duration)
  • Propagation of W3C traceparent through proxy requests where applicable
  • Docs update in Edge Proxy configuration reference

Current state

  • Stack: FastAPI, uvicorn, structlog (see pyproject.toml)
  • No OpenTelemetry dependencies or instrumentation in the repo today
  • Health endpoints exist (/proxy/health/liveness, /proxy/health/readiness) and should remain excludable from tracing if desired
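
Excluding the health endpoints might be done through the FastAPI instrumentation's excluded-URL list, which takes comma-separated regexes. The sketch below builds such a value and mimics the matching with the standard library. The helper names are hypothetical, and the assumption that patterns are applied as a regex search against the request URL should be checked against the instrumentation docs.

```python
import re
from typing import Iterable

HEALTH_PATHS = ("/proxy/health/liveness", "/proxy/health/readiness")


def excluded_urls_value(paths: Iterable[str] = HEALTH_PATHS) -> str:
    """Build a comma-separated regex list covering the given paths."""
    return ",".join(re.escape(p.lstrip("/")) for p in paths)


def is_excluded(url: str, value: str) -> bool:
    """Approximate the exclusion check: any pattern found anywhere in the URL."""
    patterns = [p.strip() for p in value.split(",") if p.strip()]
    return bool(patterns) and re.search("|".join(patterns), url) is not None
```

The resulting string could be passed as `FastAPIInstrumentor.instrument_app(app, excluded_urls=...)` or set through the `OTEL_PYTHON_FASTAPI_EXCLUDED_URLS` environment variable, leaving the choice to the operator.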

Environment

  • Edge Proxy version: 2.23.0 (Docker image)
  • Deployment: self-hosted, multiple environments
