A production-style rate limiter in Go: four algorithms, atomic operations via Lua, an LRU decision cache, and a 3-shard Redis cluster behind a consistent hash ring. Pushes 18,769 req/s over HTTP at p95 16 ms on a MacBook Air. At sustained 2,000 req/s with 1,000 distinct keys, p50 is 343 µs and p99 stays under 2 ms.
Comes with a live React dashboard for load testing, key inspection, and per-algorithm comparison.
Companion repo: ratelimit-dashboard — the React dashboard that drives load tests against this backend.
Rate limiting sits at the front of every serious API — payment gateways, auth services, public APIs. Building one teaches:
- Distributed-systems primitives (consistent hashing, sharding)
- Atomicity under concurrency (race conditions, Redis Lua scripts)
- Cache design (LRU with TTL, hit-rate tradeoffs)
- Performance work (measure → hypothesize → tune → re-measure)
This isn't a tutorial reimplementation — it's the design I'd ship if I had to build a rate limiter for a real service tomorrow, scaled down to one laptop.
┌─────────────────────┐
│ React Dashboard │
│ (load tester, KPI, │
│ key inspector) │
└──────────┬──────────┘
│ HTTP
▼
┌─────────────────────┐
│ Go HTTP Server │
│ /check /metrics │
│ /inspect /load │
└──────────┬──────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ LRU Cache │ │ Hash Ring │ │ Lua Scripts│
│ (10k, TTL) │ │ (150 vnode)│ │ (atomic) │
└────────────┘ └─────┬──────┘ └────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ redis-0 │ │ redis-1 │ │ redis-2 │
│ :6382 │ │ :6380 │ │ :6381 │
└──────────┘ └──────────┘ └──────────┘
Request flow: dashboard → Go server → LRU cache check → hash ring picks shard → Lua script runs atomically on that Redis → result returned.
flowchart TB
subgraph Client
D[React Dashboard<br/>load tester · KPIs · inspector]
end
subgraph Server[Go HTTP Server :8080]
H[Request Handler]
L[LRU Decision Cache<br/>10k entries · 100ms TTL]
R[Hash Ring<br/>150 virtual nodes/shard]
S[Lua Scripts<br/>fixed · slog · swc · bucket]
end
subgraph Redis[Redis Cluster · 3 Shards]
R0[(redis-0<br/>:6382)]
R1[(redis-1<br/>:6380)]
R2[(redis-2<br/>:6381)]
end
D -->|POST /check| H
H --> L
L -->|miss| R
R --> R0
R --> R1
R --> R2
H -.->|Lua script via EVALSHA| S
S -.-> R0
S -.-> R1
S -.-> R2
Four algorithms, all atomic via Redis Lua scripts (single-threaded execution on the Redis side eliminates check-then-act races).
| Algorithm | Redis structure | Memory | Accuracy |
|---|---|---|---|
| Fixed Window | INCR + EXPIRE | O(1) | Coarse — boundary spike risk |
| Sliding Window Log | Sorted set per key | O(N) | Exact |
| Sliding Window Counter | Two integer buckets | O(1) | ~99% — weighted average |
| Token Bucket | Hash (tokens, last_us) | O(1) | Exact — handles bursts well |
Each lives in internal/limiter/<name>.lua (the algorithm) and
<name>.go (the wrapper that loads the script via redis.NewScript
and parses the result).
All measured on MacBook Air (Apple Silicon), localhost, with three Redis containers behind the hash ring.
| Metric | Value |
|---|---|
| Sustained RPS | 18,769 |
| Total requests | 844,619 |
| p95 latency | 15.93 ms |
| p90 latency | 13.55 ms |
| Errors | 0.00% |
| Metric | Value |
|---|---|
| p50 (median) | 343 µs |
| p95 | 724 µs |
| p99 | 1.93 ms |
| Errors | 0.00% |
Sub-millisecond p95 confirms the system has significant headroom at realistic production load.
Two config changes, same machine, back-to-back runs:
- Redis connection pool: 10 → 50 per shard (3 shards = 150 total)
- HTTP server:
IdleTimeout: 120s, tighter read/write, smaller header cap
| Metric | Before | After | Delta |
|---|---|---|---|
| RPS | 11,657 | 18,769 | +61% |
| p95 latency | 28.08 ms | 15.93 ms | -43% |
| avg latency | 13.34 ms | 8.42 ms | -37% |
The default 10-connection pool was the bottleneck. With 200 concurrent goroutines competing, requests queued waiting for a free connection.
The reason for consistent hashing: scaling the cluster shouldn't invalidate most state.
- Adding a 5th shard to a 4-shard cluster: 21.32% of keys remap (theoretical: 1/N = 20%)
- Removing 1 of 5 shards: 20.07% of keys remap (theoretical: 20%)
- Distribution across 5 shards / 10,000 keys: worst deviation 11.55% from mean
Full property tests in internal/hashring/ring_test.go.
Caches denials only (allows must hit Redis to increment the counter). TTL bounded by the rate-limit window.
- 93.5% cache hit rate measured against a single hot key
- Each cache hit saves a round trip to Redis (~200µs)
Requirements: Docker, Go 1.22+, Node 18+ (for the dashboard).
git clone https://github.com/sameer-sde/ratelimit
cd ratelimit
# Start three Redis containers
docker compose up -d
# Start the Go server
go run ./cmd/serverServer listens on http://localhost:8080. Try it:
curl -X POST http://localhost:8080/check \
-H 'Content-Type: application/json' \
-d '{"key":"user_42","limit":5,"window":60,"algorithm":"fixed"}'Run it twice and you'll see "allowed": true flip to "allowed": false
once you exceed the limit.
For the dashboard:
git clone https://github.com/sameer-sde/ratelimit-dashboard
cd ratelimit-dashboard
npm install
npm run devOpens on http://localhost:5180.
Decide if a request is allowed.
{
"key": "user_42",
"limit": 100,
"window": 60,
"algorithm": "fixed"
}Token bucket uses different parameters:
{
"key": "user_42",
"capacity": 100,
"refill": 10,
"algorithm": "bucket"
}Returns 200 OK (allowed) or 429 Too Many Requests (rate-limited),
with the responsible shard in the body.
| Endpoint | Purpose |
|---|---|
GET /health |
Liveness probe |
GET /metrics |
Total reqs, allows, denies, cache stats, per-algorithm totals |
GET /cluster/info |
Shard topology; ?key=X returns the shard owning X |
GET /inspect/<key> |
Raw Redis state for a key, across all four algorithms |
GET /cache/stats |
LRU cache hits / misses / size |
POST /load-test |
In-process load test (no HTTP overhead) |
ratelimit/
├── cmd/
│ ├── server/ # HTTP server + endpoints
│ └── ringdemo/ # CLI demo for the hash ring
├── internal/
│ ├── limiter/ # 4 algorithms, each .lua + .go pair
│ ├── cache/ # LRU with TTL
│ ├── hashring/ # Consistent hash ring + tests
│ └── rediscluster/ # Multi-Redis wrapper
├── benchmarks/ # k6 scripts + result files + RESULTS.md
└── docker-compose.yml # Three Redis containers
- Multi-region replication — currently single-region. AWS-style global rate limiting would need a CRDT or async replication layer.
- Backpressure when a shard is down — right now a dead shard silently fails requests routed to it; a real system would mark the shard down and either reject or fall back to a neighbor.
- gRPC alongside HTTP — for service-to-service callers, gRPC shaves ~30% of per-request overhead.
Built over ~6 weeks of evening and weekend work as a learning project. Influences: Stripe's rate limiter blog post, the Designing Data-Intensive Applications chapter on partitioning, and the Cloudflare engineering blog's posts on consistent hashing.