MetricGate

A tenant-aware quota and rate-limiting backend for SaaS APIs. Three independently deployable .NET services enforce plan limits in real time, persist usage events for billing and audit, and support multi-level tenant hierarchies (root, reseller, customer).

V1: Backend services with HTTP and Kafka APIs. OpenAPI/Scalar is the primary surface. V2 (in development): ASP.NET Core BFF + Angular/TypeScript admin frontend — see Scope V2 (Deutsch) and the V2 issues & milestones.

Core capabilities

Real-time enforcement — sub-10 ms decision API answering "is this API key allowed to make this call?"
Multi-strategy counters — fixed window for monthly quotas, token bucket (Redis Lua) for short-term rate limits
Tenant hierarchy — resellers manage sub-tenants; plan inheritance and constraint validation across the tree
Resource-based authorization — every operation scoped to the caller's tenant subtree
Cache invalidation under cascade — hierarchy changes invalidate plan resolutions, hierarchy caches, and authorization decisions across an entire sub-tree (tag-based via Redis sets)
Event-driven persistence — usage events flow through Kafka for at-least-once delivery to the Usage service
Plan lifecycle operations — mid-period plan migration, API key rotation with grace period, immediate revocation with cache eviction
Idempotent usage ingest — event-ID dedup window in the Usage service tolerates Kafka redelivery
Hot-path latency benchmarks — BenchmarkDotNet measures p50/p95/p100 for cache hit/miss and sustained throughput. Measured p95 on cache hit: 109 µs (about 91× below the 10 ms target); per-request latency at 50 concurrent requests: 4 µs (details in docs/benchmarks.md)
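
To make the idempotent-ingest capability concrete, here is a minimal sketch of an event-ID dedup window. It is written in Python for brevity rather than in the project's C#, and it keeps state in memory; the class name `DedupWindow`, the window length, and the in-memory storage are assumptions for illustration, not the Usage service's actual design (see ADR-008 for that).

```python
import time
from collections import OrderedDict


class DedupWindow:
    """Remembers event IDs for a bounded time so redelivered events are skipped.

    Hypothetical in-memory stand-in; a real service would likely persist this.
    """

    def __init__(self, window_seconds: float, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        # event_id -> first-seen timestamp, ordered oldest first
        self._seen = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Drop IDs whose age has reached the window length.
        while self._seen:
            event_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, event_id: str) -> bool:
        """Return True the first time an ID is seen inside the window, else False."""
        now = self.clock()
        self._evict_expired(now)
        if event_id in self._seen:
            return False  # duplicate delivery: skip processing
        self._seen[event_id] = now
        return True
```

A consumer would call `accept(event_id)` before persisting an event and skip the write when it returns `False`, which is what lets at-least-once delivery be tolerated without double counting.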

System architecture

Three services, three PostgreSQL databases, one Redis, one Redpanda cluster (Kafka API), one Keycloak realm. Orchestrated locally via Docker Compose.

                    ┌──────────────────────────┐
                    │    API Consumer (ext.)   │
                    └────────────┬─────────────┘
                                 │ API Key
                                 ▼
                    ┌──────────────────────────┐
                    │   Enforcement Service    │  ◄── Redis (counters, plan cache, sessions)
                    │   - Hot-path check API   │
                    │   - Token bucket / win.  │
                    └─────┬──────────────┬─────┘
                          │              │
                  HTTP    │              │  Kafka: usage.events
              (cache miss)│              │
                          ▼              ▼
                    ┌──────────────┐   ┌──────────────────────────┐
                    │    Plans     │   │     Usage Service        │
                    │   Service    │   │   - Event persistence    │
                    │              │   │   - Aggregation worker   │
                    └─────┬────────┘   │   - Reports API (admin)  │
                          │            └──────────────────────────┘
                          │
            Kafka: plans.changes (broadcast)
                          │
                          └─► Enforcement (cache invalidation)
                          └─► Usage       (denormalized lookups)

Services

Service	Responsibility	Profile
Plans	Tenant hierarchy, plans, plan assignments, API keys, lifecycle operations	Low traffic, write-rare, configuration-heavy
Enforcement	Hot-path check API, counter and rate-limit state in Redis	Latency-critical, read-heavy, horizontally scalable
Usage	Event persistence, aggregation, reports API	Write-heavy on ingest, query-heavy on reports, append-only

Each service owns its database. Synchronous communication only on the cache-miss path (Enforcement → Plans). Everything else flows asynchronously via Kafka. Each service follows Clean Architecture internally (Domain, Application, Infrastructure, API) — the domain logic in each service (hierarchy invariants, counter semantics, aggregation rules) justifies layered architecture over a slice-based approach.
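
The single synchronous hop (Enforcement → Plans on a cache miss) follows a read-through pattern. The sketch below shows that pattern in Python under stated assumptions: `PlanCache`, `fetch_plan`, and the TTL value are hypothetical names, and a plain dictionary stands in for Redis.

```python
import time
from typing import Callable, Dict, Tuple


class PlanCache:
    """Read-through cache: serve from memory, call the Plans service only on a miss."""

    def __init__(self, fetch_plan: Callable[[str], dict], ttl_seconds: float,
                 clock=time.monotonic):
        self.fetch_plan = fetch_plan      # hypothetical HTTP call to Plans
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, plan)

    def resolve(self, key_hash: str) -> dict:
        now = self.clock()
        entry = self._entries.get(key_hash)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit: no network call
        plan = self.fetch_plan(key_hash)  # cache miss: the only synchronous hop
        self._entries[key_hash] = (now + self.ttl_seconds, plan)
        return plan

    def invalidate(self, key_hash: str) -> None:
        """Drop an entry, e.g. in response to a plan-change event."""
        self._entries.pop(key_hash, None)
```

Keeping the synchronous call behind the cache means the hot path only pays for it when an entry is missing or expired.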

Authentication

Caller	Mechanism	Surface
External API consumer	API key in header	Enforcement check API
Tenant admin	OIDC (Keycloak) → JWT with refresh	Plans admin API, Usage reports API
Service-to-service	Internal JWT (or mTLS — see ADR-009)	Enforcement → Plans on cache miss

Policy-based: role gates (key rotation, reseller creation, plan assignment). Resource-based: custom IAuthorizationHandler resolves the caller's tenant subtree at request time and validates resource ownership. Rate limiting per tenant: the same machinery that enforces customer-facing limits also protects the admin APIs.
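
The resource-based check boils down to asking whether the resource's tenant sits inside the caller's subtree. A minimal Python sketch of that walk over an adjacency list follows; the tenant IDs, the `PARENTS` map, and the `MAX_DEPTH` value are illustrative assumptions, and the real check lives in a C# IAuthorizationHandler.

```python
from typing import Dict, Optional

# Hypothetical adjacency list: tenant ID -> parent tenant ID (None for the root).
PARENTS: Dict[str, Optional[str]] = {
    "root": None,
    "acme": "root",
    "beta": "acme",
    "other-reseller": "root",
}

MAX_DEPTH = 8  # assumed bound; the hierarchy is described as depth-bounded


def is_in_subtree(caller_tenant: str, resource_tenant: str,
                  parents: Dict[str, Optional[str]] = PARENTS) -> bool:
    """Walk up from the resource's tenant; allow if the caller's tenant is reached."""
    current: Optional[str] = resource_tenant
    for _ in range(MAX_DEPTH + 1):
        if current is None:
            return False          # walked past the root (or unknown tenant)
        if current == caller_tenant:
            return True           # resource tenant is the caller or a descendant
        current = parents.get(current)
    return False                  # depth bound exceeded: deny by default
```

With this map, `is_in_subtree("acme", "beta")` allows the reseller to act on its customer, while `is_in_subtree("beta", "acme")` denies the reverse direction.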

Caching strategy

Cache	Purpose	Invalidation
Plan resolution (Enforcement)	API key → tenant + effective plan + limits	TTL + Pub/Sub on `plans.changes`
Quota counters	Fixed-window `INCR` per tenant per period	TTL aligned to window expiry
Rate-limit buckets	Token bucket state per tenant	Atomic mutation via Lua script
Tenant hierarchy	Resolved parent chains	Tag-based cascade on `TenantHierarchyChanged`
Session store	Admin authentication	Standard session TTL
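
For the rate-limit buckets row, the arithmetic behind a token bucket can be sketched as below. This is plain Python and is not atomic; in MetricGate the mutation runs inside a Redis Lua script, whose contents are not shown here. Field names and parameters are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float           # maximum burst size
    refill_per_second: float  # sustained rate
    tokens: float             # current fill level
    updated_at: float         # timestamp of the last refill, in seconds

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then take `cost` tokens if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The refill and the decrement have to happen as one step per bucket, which is why the table lists atomic mutation via Lua for this state.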

The non-trivial case is hierarchy cascade: a single TenantHierarchyChanged event invalidates plan resolutions, hierarchy caches, and authorization decisions across an entire sub-tree. See ADR-003 for the tag-based eviction design.
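
One way to picture tag-based cascade eviction: every cached entry is registered under a tag for each ancestor tenant, so dropping one tag clears the whole subtree beneath that tenant. The Python sketch below uses in-memory sets in place of Redis sets; the class name, key names, and tag format are hypothetical and not taken from ADR-003.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """Cache whose entries can be evicted in bulk by tag."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: object, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)  # stands in for adding to a Redis set

    def get(self, key: str):
        return self._values.get(key)

    def invalidate_tag(self, tag: str) -> int:
        """Evict every entry registered under `tag`; return how many were removed."""
        keys = self._keys_by_tag.pop(tag, set())
        removed = 0
        for key in keys:
            if key in self._values:
                del self._values[key]
                removed += 1
        return removed


cache = TaggedCache()
beta_tags = ["tenant:root", "tenant:acme", "tenant:beta"]  # one tag per ancestor
cache.put("plan:beta", {"name": "Demo Plan"}, beta_tags)
cache.put("hierarchy:beta", ["beta", "acme", "root"], beta_tags)
cache.put("plan:other", {"name": "Other"}, ["tenant:root", "tenant:other"])
evicted = cache.invalidate_tag("tenant:acme")  # a change at Acme clears its subtree
```

After the last line, both Beta entries are gone while the unrelated `plan:other` entry survives, which is the cascade behaviour the paragraph above describes.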

Tech stack

Concern	Technology
Runtime	.NET 10
API	ASP.NET Core Minimal API
ORM	EF Core 10, Npgsql
Background jobs	.NET Worker Host (`BackgroundService`)
Message broker	Redpanda (Kafka API)
Cache / counters / sessions	Redis 8
Auth provider	Keycloak (OIDC)
Request dispatch	No mediator — application services injected directly (ADR-005)
Logging	`Microsoft.Extensions.Logging` (structured)
Tracing	OpenTelemetry → Jaeger
Testing	xUnit, Shouldly, NSubstitute, Testcontainers, BenchmarkDotNet
API docs	`Microsoft.AspNetCore.OpenApi` + Scalar UI (`/scalar`)
Containers	Docker, Docker Compose

Architecture decisions

#	Topic
ADR-001	Tenant hierarchy: adjacency list with bounded depth
ADR-002	Plan inheritance and constraint validation
ADR-003	Cache invalidation: TTL vs. Pub/Sub eviction vs. tag-based cascades
ADR-004	Counter strategy: fixed window plus token bucket
ADR-005	In-house mediator abstraction over MediatR
ADR-006	Service communication: sync HTTP+cache vs. event-driven vs. shared database
ADR-007	Kafka topic design and partitioning
ADR-008	Idempotency for usage events (at-least-once delivery)
ADR-009	Service-to-service authentication: mTLS vs. internal JWT

All ADRs are in docs/adrs/.
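
As a quick orientation for ADR-004, the fixed-window half of the counter strategy can be sketched as a counter keyed by tenant and calendar period. This Python version uses an in-memory `Counter` where the service uses Redis `INCR`; the key format and the monthly granularity shown are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def window_key(tenant_id: str, at: datetime) -> str:
    """Build one counter key per tenant per calendar month (UTC).

    `at` should be timezone-aware; naive datetimes are treated as local time.
    """
    return f"quota:{tenant_id}:{at.astimezone(timezone.utc).strftime('%Y-%m')}"


class FixedWindowQuota:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._counts: Counter = Counter()

    def record(self, tenant_id: str, at: datetime) -> bool:
        """Count one call and report whether the tenant is still within quota."""
        key = window_key(tenant_id, at)
        self._counts[key] += 1  # stands in for Redis INCR on the window key
        return self._counts[key] <= self.limit
```

Because the period is part of the key, a new month starts from zero without any reset job; in Redis the old key simply expires with its TTL.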

Out of scope (deliberately)

API gateway / reverse proxy behavior — Enforcement is a check service, not a proxy
Real billing integration — overage events are produced and persisted; no payment provider wired up
Multi-region Redis or Kafka replication
Schema registry for Kafka events — JSON, versioning by convention
Webhook notifications — reports are pulled
Full observability stack — OpenTelemetry to Jaeger only (Prometheus/Grafana already demonstrated in Ingestor)
Admin frontend — moved to V2, now in development (BFF + Angular/TypeScript); see Scope V2

Quick start

Prerequisites: .NET 10 SDK and Docker

cp .env.example .env          # required before the first up
docker compose up -d          # builds and starts the full stack (infra + all 3 services)

docker compose up -d builds each service image from its Dockerfile and starts the whole stack. Host ports are the .env defaults below. To run a single service from source against the Compose infrastructure instead, use dotnet run, e.g. dotnet run --project src/Plans/Plans.Api (it listens on :5000, matching the BFF's PlansClient:BaseUrl).

In Development, Plans.Api applies EF migrations and seeds a small realm-aligned tenant hierarchy (Root → Acme → Beta, with the fixed tenant_ids the seeded reseller-acme / customer-beta users carry) on startup — so a fresh stack is demonstrable without a manual dotnet ef database update or SQL seed. Production applies migrations as an explicit, reviewed deploy step and never seeds.

Surface	URL
Plans API docs (Scalar)	http://localhost:5000/scalar
Enforcement API docs (Scalar)	http://localhost:5001/scalar
Usage API docs (Scalar)	http://localhost:5002/scalar
Keycloak admin	http://localhost:8080
Jaeger UI	http://localhost:16686

A seeded demo realm provides one root admin, one reseller, and two customer tenants for end-to-end manual testing.

Demo API key (check path)

In Development, the Plans seed also provisions a demo plan assigned to the Beta customer tenant and a demo API key for it, so the Enforcement check path works out of the box (it lives in the Plans DB, not the Keycloak realm — API keys are a Plans domain concept). Only the key's SHA-256 hash is stored, as for any real key. Send it in the X-Api-Key header:

curl -X POST http://localhost:5001/check -H "X-Api-Key: demo-metricgate-key"
# → {"decision":"allow", ... ,"plan":{"name":"Demo Plan", ...}}

Dev-only. demo-metricgate-key is a known, seeded credential and is never created outside Development — production neither seeds nor ships it.
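
Storing only a hash means the service compares digests, never raw keys. A minimal Python sketch of that idea is below; the hex encoding, UTF-8 input, and function names are assumptions, since the README states only that a SHA-256 hash is stored.

```python
import hashlib
import hmac


def hash_api_key(raw_key: str) -> str:
    """Return the SHA-256 digest of the key as lowercase hex (assumed encoding)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def matches(presented_key: str, stored_hash: str) -> bool:
    """Compare the presented key's digest with the stored one in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


stored = hash_api_key("demo-metricgate-key")  # what a key store would hold
```

With this shape, a database leak exposes digests rather than usable credentials, and a lookup on the hot path is a single hash plus a comparison.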

Running the BFF + frontend (V2)

The V1 stack above runs without any of this. The steps here are only needed to exercise the V2 admin frontend and its login flow — the BFF, Keycloak OIDC, and the Angular client. They are one-time local setup for a reviewer; after the first run they are not repeated.

The BFF holds the OIDC tokens server-side and issues the browser an httpOnly session cookie (ADR-010). Two consequences of that design require local setup: the session cookies are Secure, so the BFF must be reached over HTTPS even in development; and the OIDC issuer must be identical as seen by the browser and by the BFF container, or token validation fails on an iss mismatch.
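
The issuer requirement can be checked by hand when debugging a failed login. The Python sketch below reads the `iss` claim from a token without verifying it and compares it with the issuer the BFF expects. It is a diagnostic aid only, not token validation, and the realm path in the usage note is a hypothetical example.

```python
import base64
import json


def unverified_issuer(jwt: str) -> str:
    """Read the `iss` claim from a JWT payload WITHOUT verifying the signature."""
    payload_segment = jwt.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["iss"]


def issuer_matches(jwt: str, expected_issuer: str) -> bool:
    """True when the token's issuer is exactly the string the validator expects."""
    return unverified_issuer(jwt) == expected_issuer
```

A token issued through `http://localhost:8080/realms/<realm>` will not match a validator configured for `http://keycloak:8080/realms/<realm>`, even though both addresses reach the same Keycloak. The hosts-file entry in step 2 exists to prevent exactly that mismatch.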

One-time setup:

#	Step	Why
1	Export and trust a dev cert the container can load (commands below)	The BFF serves HTTPS on `:7003`; without a trusted cert the browser rejects it and the `Secure` session cookies never persist
2	Add `127.0.0.1 keycloak` to your hosts file (`/etc/hosts`, or `C:\Windows\System32\drivers\etc\hosts`)	The browser and the BFF container must resolve Keycloak under the same name (`keycloak:8080`) so the token's `iss` matches what the BFF validates

Step 1 — export the dev cert to the mounted path, trust it, and make it readable by the container's non-root user:

dotnet dev-certs https -ep ~/.aspnet/https/aspnetapp.pfx -p devcert
dotnet dev-certs https --trust
chmod 644 ~/.aspnet/https/aspnetapp.pfx

In your .env, set BFF_HTTPS_CERT_PASSWORD=devcert (matching the password above) and BFF_PORT=7003 (it must match the realm's redirect origin — see .env.example).

Then:

docker compose --profile v2 up -d   # starts the V1 stack plus the BFF

Surface	URL
Admin frontend (via BFF)	https://localhost:7003
BFF health	https://localhost:7003/health

Verify the login round-trip: open https://localhost:7003, log in as the seeded reseller against Keycloak, and confirm you land back in the admin shell authenticated. If the browser returns to the login page instead, the session cookie did not survive — check that step 1's cert is trusted and that you reached the BFF over https, not http.

Two address worlds, kept separate. Host ports (:7003 BFF, :5000–:5002 services, :8080 Keycloak) are for the browser and for running a service from source via dotnet run. Service names (plans-api:8080, keycloak:8080) are for container-to-container traffic inside Compose. The BFF's external origin (:7003) governs cookie scope and the Keycloak redirect URIs; the Keycloak issuer (keycloak:8080) governs token validation. These are different addresses for different purposes — do not collapse them.

Testing

Project	Scope	Approach
`Tests.{Plans,Enforcement,Usage}.Unit`	Counter logic, token bucket, hierarchy resolution, authorization handlers	Pure unit tests, no I/O
`Tests.{Plans,Enforcement,Usage}.Integration`	Full enforcement path with cache hit/miss, plan change propagation, hierarchy cascade invalidation	Testcontainers (PostgreSQL, Redis, Redpanda)
`Tests.Architecture`	Service boundary rules, no cross-service code dependencies, Clean Architecture layer rules	NetArchTest
`Benchmarks.Enforcement`	Hot-path latency (cache hit / cache miss) and sustained throughput (1/10/50 concurrent)	BenchmarkDotNet — see docs/benchmarks.md

dotnet test MetricGate.slnx -c Release

CI/CD

MetricGate uses a two-stage pipeline split across GitHub Actions and Azure DevOps.

GitHub Actions runs on every push and pull request to main, as two parallel jobs that each build the solution once (no separate build job); a concurrency group cancels superseded in-flight runs:

Unit & architecture tests — Plans, Enforcement, Usage unit suites + architecture rules
Integration tests (Testcontainers) — Plans, Enforcement, Usage integration suites

Azure DevOps is triggered manually, through a GitHub Actions workflow_dispatch run, after the GitHub Actions jobs pass. It never runs on automatic pushes:

Docker build (Buildx, multi-stage SDK → ASP.NET runtime) for Plans and Enforcement, with a registry-backed layer cache
Push to GitHub Container Registry (ghcr.io/goldbarth/metricgate)
Deploy to Azure Container Apps (rg-goldbarth-dev) — requires manual approval gate

Push to main
    └─► GitHub Actions: unit & architecture tests  ┐ (parallel, each builds once)
                        integration tests          ┘

Manual dispatch (when ready to deploy)
    └─► GitHub Actions: trigger Azure DevOps
            └─► Azure DevOps: buildx (cache) → ghcr.io push → approval gate → container apps deploy

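Images are tagged with both a build-specific tag (Build.BuildId) for traceability and latest for convenience. In Azure DevOps, Build.BuildId is the numeric pipeline run ID, not the Git SHA; the commit SHA is exposed as Build.SourceVersion. Container Apps scale to zero replicas when idle (min-replicas: 0).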

Effort accounting

The scope and issue docs no longer carry calendar time estimates. AI-assisted development made them meaningless — a planned "two-week phase" routinely landed in a single focused session. This note records measured effort instead.

Scenario	Estimate	Basis
Without AI — mid-level dev (~2 yr), unaided	~350–550 h	~60 issues at a mid-level solo pace, plus learning overhead on the distributed-systems patterns (outbox, idempotency, invalidation cascades, recursive-CTE hierarchy, trace propagation) a dev ~2 years in researches rather than knows cold
With AI assistance (measured)	~38 h focused	commit-timestamp deltas across 10 working sessions
Speedup	~9–14×	raw build time and knowledge access — see note

How the measured number is derived: ~38 focused hours is the sum of first-to-last commit spans per working day (breaks included) — a proxy for active time, not tracked hours. Calendar span was ~4 weeks (2026-05-05 → 06-02), part-time: development was interleaved with job applications, company research, and coursework, so wall-clock weeks say nothing about effort.
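
The derivation can be reproduced with a few lines of Python. The function names are hypothetical, and any timestamps passed in are the caller's own data; only the 350, 550, and 38 hour figures come from the table above.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple


def focused_hours(commit_times: Iterable[datetime]) -> float:
    """Sum the first-to-last commit span of each calendar day, in hours."""
    by_day = defaultdict(list)
    for ts in commit_times:
        by_day[ts.date()].append(ts)
    total_seconds = sum(
        (max(day) - min(day)).total_seconds() for day in by_day.values()
    )
    return total_seconds / 3600


def speedup_range(unaided_low: float, unaided_high: float,
                  measured: float) -> Tuple[float, float]:
    """Ratio of the unaided estimate range to the measured effort."""
    return unaided_low / measured, unaided_high / measured


low, high = speedup_range(350, 550, 38)  # figures from the table above
```

Here `low` and `high` come out at roughly 9.2 and 14.5, which the table reports as ~9–14×. A day with a single commit contributes zero hours under this method, one reason the figure is a proxy rather than tracked time.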

The trade-off. The speedup is raw build time, not learning time. Generating low-level implementations quickly means less hands-on repetition of the wiring-level technical depth — the muscle you build by typing it yourself. The compensating effort shifts upward: more time goes into understanding and validating design patterns and high-level architecture — reading generated code critically, writing the ADRs, and owning the system-design decisions rather than the keystroke-level ones. Part of the gain is also knowledge access: the assistant surfaced idiomatic patterns at the point of need, so the honest reading is that some of the speedup is compressed learning time, not just compressed keystrokes. AI compresses the typing, not the thinking; the engineering value moved from low-level implementation toward architecture and review.

Full evaluation, per-session measurement, and trade-off rationale: Engineering Note — AI-Assisted Development.

Documentation

Scope V1 (English)
Scope V1 (Deutsch)
Scope V2 (English) — in development
Scope V2 (Deutsch) — in development
V2 Issues & Milestones
Architecture Decision Records
Benchmark results
Operational runbook
Engineering Note — AI-Assisted Development

License

MIT
