Skip to content

fix(observability): idempotent metric registration (v26.06.97)#127

Merged
ancongui merged 1 commit into
mainfrom
fix/metrics-idempotent-registration
Jun 11, 2026
Merged

fix(observability): idempotent metric registration (v26.06.97)#127
ancongui merged 1 commit into
mainfrom
fix/metrics-idempotent-registration

Conversation

@ancongui

Copy link
Copy Markdown
Contributor

Problem

MetricsRegistry cached its Prometheus collectors in per-instance dicts
(self._counters / self._histograms / self._gauges created in __init__), but
prometheus_client registers every Counter/Histogram/Gauge on the
process-global default REGISTRY keyed by metric name.

So a second MetricsRegistry — e.g. a second create_app() in one pytest process —
re-created an already-registered collector and prometheus raised:

ValueError: Duplicated timeseries in CollectorRegistry: pyfly_db_query_duration_seconds...

Downstream flymgmt services worked around this with a per-repo conftest fixture that
reset the registry between apps. This release fixes it at the source so that workaround
can be retired.

Fix

Move the collector caches to module level (_COUNTERS / _HISTOGRAMS / _GAUGES),
making get-or-create idempotent across every MetricsRegistry in the process — matching
prometheus's one-collector-per-name model.

  • MetricsRegistry.__init__ keeps the _HAS_PROMETHEUS guard, drops the per-instance dicts.
  • counter() / histogram() / gauge() read+write the module-level caches.
  • Still uses the global default registry for /metrics exposition — production behavior
    unchanged; this only dedupes process-wide (no per-instance CollectorRegistry).
  • Module-level dicts are typed (dict[str, Counter] etc.) so mypy --strict stays clean.
  • Class docstring updated to state registration is process-global / idempotent.

Regression test

tests/observability/test_metrics_idempotent.py — two MetricsRegistry instances calling
.counter() / .histogram() / .gauge() with the same name do not raise and return the
same (is) collector object (the exact scenario that used to raise "Duplicated timeseries").

Two existing tests that poked the removed _counters dict now use the public .counter()
accessor (which returns the same cached collector).

Gates (all green locally)

  • ruff check src/ tests/ — pass
  • ruff format --check src/ tests/ — pass
  • mypy src/pyfly --strict — Success: no issues found in 673 source files
  • pytest tests/observability -q — 31 passed
  • pytest tests/container tests/context tests/observability -q — 377 passed

@ancongui ancongui merged commit 63f4ad9 into main Jun 11, 2026
6 checks passed
@ancongui ancongui deleted the fix/metrics-idempotent-registration branch June 11, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant