A production-grade Real-Time Bidding engine written in Rust, combining a thread-per-core io_uring fast-path capable of ~10M RPS with an embedded Claude AI agent that autonomously optimizes bid strategies every 5 minutes using live campaign performance data.
The engine is agentic in two senses: structurally (a background LLM agent with tools that read and mutate shared engine state) and economically (the agent's bid-shading decisions directly determine revenue: a 1% CPM improvement on $1M of monthly managed spend saves $10k/month).
- LLMs Used and What They Solve
- High-Level Design
- Low-Level Design
- Core Feature System Design
- Running Locally — See It in Action
- Production Deployment
- Commercial Economics in Production
Accessed via rig-core 0.35, the Rust-native LLM agent framework.
RTB bid optimization is a continuous online decision problem. The traditional approach is a static rule set: "bid floor × 1.10". This leaves money on the table in two ways simultaneously — overbidding in low-competition auctions (wasting CPM budget) and underbidding in high-value auctions (losing impressions you should have won).
Claude solves this by acting as a closed-loop controller over bid strategy parameters:
| Signal In | Claude Reasons About | Action Out |
|---|---|---|
| `participation_pct` | Am I entering enough auctions? | Adjust `margin_multiplier` |
| `budget_block_pct` | Is the campaign over-pacing? | Shade bids to slow spend |
| `hourly_spend_usd` vs daily budget | Will budget run out before end of day? | Pace aggressively or conservatively |
| SSP CPM ceilings by category | Which inventory is worth targeting? | Set `max_cpm_usd` per strategy |
| `agent_reasoning` from prior cycle | What did I try last time? | Avoid repeating failed strategies |
- Structured tool use: Claude reliably calls typed Rust tool functions (`set_bid_strategy`, `get_bid_performance`) with correct JSON schemas, enabling safe state mutation from natural language reasoning.
- Audit trail: Every strategy change includes a required `reasoning` field (≤512 chars) stored in `BidStrategyStore`, giving operators a human-readable explanation of every automated decision.
- Safety-aware reasoning: The PREAMBLE instructs Claude to respect hard bounds (`margin_multiplier ∈ [0.90, 1.50]`). Even if Claude hallucinates an extreme value, the tool implementation clamps it.
- Cost efficiency: At `claude-sonnet-4-6` pricing (~$3/MTok), a 500-token optimization cycle costs ~$0.002. At 288 cycles/day that is $0.58/day in LLM cost versus $100–$250/day in CPM savings from bid shading, a >170× daily ROI on inference cost.
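The clamping described in the safety bullet can be sketched as follows. This is a minimal illustration, not the engine's actual tool code: `sanitize_strategy`, the struct layout, and the fallback used for non-finite input are all assumptions; only the `[0.90, 1.50]` bounds and the 512-character cap come from the text above.

```rust
/// Hard bounds quoted in the text above.
const MARGIN_MIN: f64 = 0.90;
const MARGIN_MAX: f64 = 1.50;
/// Reasoning length cap quoted in the text above (counted in chars).
const REASONING_MAX_CHARS: usize = 512;

#[derive(Debug, Clone, PartialEq)]
struct BidStrategy {
    margin_multiplier: f64,
    max_cpm_usd: f64,
    reasoning: String,
}

/// Hypothetical sanitiser: clamps model-supplied values before they are stored.
/// Non-finite margins fall back to a neutral 1.0 (an assumption), because
/// `f64::clamp` would otherwise pass NaN straight through.
fn sanitize_strategy(margin: f64, max_cpm_usd: f64, reasoning: &str) -> BidStrategy {
    let margin_multiplier = if margin.is_finite() {
        margin.clamp(MARGIN_MIN, MARGIN_MAX)
    } else {
        1.0
    };
    // A negative or non-finite CPM cap is treated as "do not bid" (assumption).
    let max_cpm_usd = if max_cpm_usd.is_finite() {
        max_cpm_usd.max(0.0)
    } else {
        0.0
    };
    // Truncate on char boundaries so multi-byte UTF-8 is never split.
    let reasoning = reasoning.chars().take(REASONING_MAX_CHARS).collect();
    BidStrategy { margin_multiplier, max_cpm_usd, reasoning }
}
```

Doing the clamp inside the tool rather than relying on the prompt means an out-of-range value from the model can never reach shared state.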
┌─────────────────────────────────────────────────────────────────────────────────┐
│ Agentic RTB Engine Process │
├──────────────────────────────────┬──────────────────────────────────────────────┤
│ BID FAST-PATH │ INTELLIGENCE LAYER │
│ Port 8080 │ Port 8081 (AAMP API) │
│ │ │
│ ┌───────────────────────────┐ │ ┌────────────────────────────────────┐ │
│ │ Glommio Executor Pool │ │ │ RtbAgent (Claude) │ │
│ │ (Linux: N-1 cores) │ │ │ │ │
│ │ Tokio (macOS dev) │ │ │ Every 5 min: │ │
│ │ │ │ │ 1. build context from SharedState │ │
│ │ Per connection: │ │ │ 2. prompt Claude claude-sonnet-4-6 │ │
│ │ ① Parse HTTP/1.x │ │ │ 3. Claude calls tools: │ │
│ │ ② simd-json parse │ │ │ • get_bid_performance │ │
│ │ ③ Safety valve check │ │ │ • check_budget │ │
│ │ ④ Budget CAS (Redis or │ │ │ • get_ssp_inventory │ │
│ │ AtomicI64) │ │ │ • set_bid_strategy ◄──────────┼──┐ │
│ │ ⑤ Evaluate bid price │ │ │ 4. updated strategy stored │ │ │
│ │ ⑥ FlatBuffers response │ │ └────────────────────────────────────┘ │ │
│ │ ⑦ Write to socket │ │ │ │
│ └───────────┬───────────────┘ │ ┌─────────────────────────────────────┐ │ │
│ │ reads │ │ AAMP API (Axum) │ │ │
│ ▼ │ │ GET /health │ │ │
│ ┌───────────────────────────┐ │ │ GET /metrics (Prometheus) │ │ │
│ │ SharedState │◄──┼───│ POST /budget/set │ │ │
│ │ Arc<...> │ │ │ GET /budget/{id} │ │ │
│ │ │ │ │ POST /events/ingest │ │ │
│ │ • BudgetGuard │ │ │ GET /agent/strategy │ │ │
│ │ • BidStrategyStore ◄─────┼───┼───│ POST /agent/optimize (stub) │ │ │
│ │ • BidMetrics (Prometheus)│ │ │ GET /ssp/registry │ │ │
│ │ • SspRegistry │ │ │ POST /negotiate/deal (stub) │ │ │
│ │ • EventStore │ │ └─────────────────────────────────────┘ │ │
│ └───────────────────────────┘ │ │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Background Tasks │ │ │
│ │ │ • SSP Discovery Worker (hourly) │ │ │
│ │ │ • Daily Budget Reset (midnight UTC)│ │ │
│ │ │ • Hourly Velocity Reset │ │ │
│ │ │ • Strategy pub/sub subscriber │──┘ │
│ │ └─────────────────────────────────────┘ │
└──────────────────────────────────┴──────────────────────────────────────────────┘
│
┌────────▼────────┐
│ Redis │
│ (optional) │
│ │
│ budget:{id} │ ← atomic Lua CAS
│ strategy:{id} │ ← HSET + pub/sub
└─────────────────┘
On Linux, the process runs two runtimes that never share a thread:
| Runtime | Threads | Responsibilities |
|---|---|---|
| Glommio executor pool | N-1 CPU cores | Bid fast-path: accept → parse → evaluate → respond |
| Tokio (dedicated `std::thread`) | 2 | AAMP Axum API, RtbAgent, Discovery worker, Reset tasks |
All coordination between runtimes uses std::sync primitives inside Arc<SharedState> — no Tokio channels crossing the executor boundary.
On macOS, a single Tokio multi-thread runtime runs everything (development only, not for benchmarking).
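A minimal sketch of that coordination pattern, using only `std::sync` and plain OS threads. The `SharedState` fields here (`requests`, `margin`) are simplified stand-ins rather than the engine's real struct, and `run_demo` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Simplified stand-in for the engine's SharedState (fields are assumptions).
struct SharedState {
    requests: AtomicU64, // incremented by fast-path workers
    margin: RwLock<f64>, // written by the control-plane thread
}

/// Spawns one control-plane thread and `workers` fast-path threads that
/// communicate only through `Arc<SharedState>`; no channels are involved.
fn run_demo(workers: usize, per_worker: u64) -> (u64, f64) {
    let state = Arc::new(SharedState {
        requests: AtomicU64::new(0),
        margin: RwLock::new(1.10),
    });

    // Control plane: a single strategy update.
    let control = {
        let s = Arc::clone(&state);
        thread::spawn(move || {
            *s.margin.write().unwrap() = 1.08;
        })
    };

    // Fast path: read the current margin, count the request.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let s = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    let _margin = *s.margin.read().unwrap();
                    s.requests.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    control.join().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    let margin = *state.margin.read().unwrap();
    (state.requests.load(Ordering::Relaxed), margin)
}
```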
rtb-engine/
├── crates/
│ ├── fast-path/ # Binary: bid server (ports 8080 + 8081)
│ │ ├── bid_eval.rs # Core: evaluate() — hot path, async, ~50ns/request
│ │ ├── engine.rs # Linux: Glommio thread-per-core executor
│ │ ├── engine_dev.rs # macOS: Tokio fallback
│ │ ├── fbs_bid.rs # FlatBuffers BidResponse serialiser
│ │ ├── http_parser.rs # Zero-copy HTTP/1.x request parser (httparse)
│ │ ├── buffer_pool.rs # Thread-local slab allocator for output buffers
│ │ ├── config.rs # EngineConfig (listen addr, workers, ring depth)
│ │ └── xdp_loader.rs # eBPF XDP attach (Linux / Phase 3)
│ │
│ ├── agent/ # Library: rig-core Claude agent
│ │ ├── lib.rs # RtbAgent: run_optimization_cycle()
│ │ ├── tools.rs # 4 MCP-style tools (read/write SharedState)
│ │ └── context.rs # RAG context builder → compact prompt string
│ │
│ ├── aamp-protocol/ # Library: shared state + Axum API
│ │ ├── state.rs # SharedState, BudgetGuard, BidMetrics, EventStore
│ │ ├── api.rs # Axum router + all HTTP handlers
│ │ ├── discovery.rs # SSP discovery worker (reqwest, hourly)
│ │ └── lib.rs # run_api(), run_strategy_subscriber()
│ │
│ └── shared-types/ # Library: OpenRTB types, error kinds
│ └── bid.rs # BidRequest, BidResponse, Bid, SeatBid, Impression
│
├── crates/bench/ # Binary: constant-arrival-rate load generator (p99/p99.9 output)
├── xtask/ # Build automation (flatc, ebpf-build, docker-build, lint)
├── docker/
│ ├── Dockerfile.dev # Ubuntu 22.04 + Rust + io_uring deps (engine image)
│ ├── Dockerfile.bench # Multi-stage build: Rust bench binary, minimal runtime image
│ └── docker-compose.yml # CAP_IPC_LOCK + seccomp:unconfined + bench profile
└── BACKLOG.md # Prioritised feature backlog with implementation prompts
SSP ──POST /bid──► [TCP accept]
│
[httparse] zero-copy header parse, extracts body offset
│
[simd-json] in-place SIMD tokenisation → BidRequest struct
│
[bid_eval::evaluate()] ← async
│
┌──┴──────────────────────────────────┐
│ for each impression: │
│ 1. (floor × margin_multiplier) │
│ .min(strategy.max_cpm_usd) │
│ 2. is_price_safe() → safety valve │
│ 3. try_spend_redis().await │
│ → Redis Lua CAS (if configured) │
│ → AtomicI64 fallback │
│ 4. push SeatBid │
└──┬──────────────────────────────────┘
│
[BidMetrics::record()] Prometheus counter increment
│
[fbs_bid::serialize()] FlatBuffers into PooledBuf
│
[stream.write_all()] HTTP headers + binary body
│
▼
SSP receives FlatBuffers BidResponse
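The per-impression pricing step in the flow above can be sketched in a few lines. The formula `(floor × margin_multiplier).min(max_cpm_usd)` is taken from the diagram; the body of `is_price_safe` and the CPM-to-microdollar conversion are assumptions made for illustration, not the engine's actual rules.

```rust
/// Step 1 from the diagram: (floor × margin_multiplier).min(max_cpm_usd).
fn bid_price_cpm(floor_cpm: f64, margin_multiplier: f64, max_cpm_usd: f64) -> f64 {
    (floor_cpm * margin_multiplier).min(max_cpm_usd)
}

/// Hypothetical safety valve: reject non-finite, non-positive, or
/// over-limit prices. The real check may apply different rules.
fn is_price_safe(price_cpm: f64, max_single_bid_usd: f64) -> bool {
    price_cpm.is_finite() && price_cpm > 0.0 && price_cpm <= max_single_bid_usd
}

/// CPM is the price of 1000 impressions, so one impression costs
/// price / 1000 dollars, which equals price * 1000 microdollars.
/// Assumes the budget is charged per impression.
fn cpm_to_microdollars(price_cpm: f64) -> i64 {
    (price_cpm * 1_000.0).round() as i64
}
```

With a $1.50 floor, a 1.10 margin, and a $4.50 cap, the bid is about $1.65 CPM, or 1650 microdollars for the single impression.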
[background Tokio task]
│
▼
context::build(&state)
→ BUDGET STATUS: remaining, daily_limit, spent_today,
hourly_spend, daily_pace (elapsed-day extrapolation,
not hourly × 24 — more accurate under bursty traffic)
→ BID METRICS: participation%, budget-block%, requests
→ SSP INVENTORY: names, max CPM, categories
→ CURRENT STRATEGIES: margin, max_cpm, last reasoning
→ RECENT EVENTS: last 5 per campaign (newest first)
│
▼
claude-sonnet-4-6.prompt(context)
│
├── [tool call] get_bid_performance("default")
│ → BidPerformanceOutput { participation_pct, budget_block_pct,
│ hourly_spend_usd, daily_pace_usd,
│ recent_events }
│
├── [tool call] check_budget("default") [optional]
│
├── [tool call] get_ssp_inventory(category=None) [optional]
│
└── [tool call] set_bid_strategy("default", margin, max_cpm, reasoning)
→ reasoning: 2–3 sentences ≤280 chars (hard-truncated at sentence boundary)
→ clamps margin ∈ [0.90, 1.50]
→ BidStrategyStore.upsert_redis()
→ local RwLock write
→ Redis HSET strategy:default
→ Redis PUBLISH strategy:updates
→ all engine instances pick up new strategy
within one pub/sub delivery cycle (~1ms)
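Two details of the cycle above lend themselves to a short sketch: the elapsed-day pace extrapolation and the sentence-boundary truncation of `reasoning`. Both functions are hypothetical helpers written for illustration; the engine's own implementations may differ, and the 280-character limit is passed in as a parameter rather than hard-coded.

```rust
/// Projects end-of-day spend from spend so far and the fraction of the day
/// elapsed, instead of multiplying the last hour's spend by 24.
/// Returns None at exactly midnight, when no time has elapsed.
fn daily_pace_usd(spent_today_usd: f64, secs_since_midnight: u64) -> Option<f64> {
    if secs_since_midnight == 0 {
        return None;
    }
    Some(spent_today_usd * 86_400.0 / secs_since_midnight as f64)
}

/// Keeps at most `max_chars` characters, then cuts back to the last
/// sentence terminator if one exists inside that window.
fn truncate_at_sentence(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    let head: String = text.chars().take(max_chars).collect();
    // '.', '!' and '?' are single-byte, so slicing at idx + 1 is always valid.
    match head.rfind(|c: char| c == '.' || c == '!' || c == '?') {
        Some(idx) => head[..=idx].to_string(),
        None => head,
    }
}
```

For example, $2,500 spent by 06:00 UTC projects to a $10,000 day, regardless of how bursty the most recent hour was.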
The budget guard is the primary financial safety system. A bug that lets bids through without checking can drain a $10k daily budget in 0.6 seconds at 10M RPS.
Hot path (called ~10M times/sec):
┌─────────────────────────────────────────────────────┐
│ try_spend_redis(campaign_id, amount_microdollars) │
│ │
│ if redis pool configured: │
│ EVAL lua_cas_script, KEYS[budget:{id}], ARGV[amt]│
│ → atomic: check remaining >= amount, decrement │
│ → returns: 1=success, 0=exhausted, -1=not found │
│ → fallback to AtomicI64 on Redis error │
│ │
│ else (single-process / dev): │
│ RwLock.read() → clone Arc<AtomicBudgetEntry> │
│ loop: │
│ current = remaining.load(Acquire) │
│ if current < amount: return false │
│ compare_exchange(current, current-amount) │
│ on Ok: track hourly velocity, return true │
│ on Err: retry (another thread raced) │
└─────────────────────────────────────────────────────┘
Why not a Mutex?
Mutex serialises all campaigns globally.
AtomicI64 CAS per-campaign: 50 campaigns bid in parallel
with zero cross-campaign contention.
At 10M RPS: Mutex P99 ≈ 18ms vs CAS P99 ≈ 2ms.
3 CPU cores recovered per host.
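The in-process fallback branch can be sketched as a standard compare-and-swap loop on an `AtomicI64`. The type and method names below (`BudgetEntry`, `try_spend`) are illustrative assumptions, and hourly velocity tracking is omitted; the rejection of non-positive amounts is also an assumption, added so that a negative amount can never increase the budget.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Per-campaign budget in microdollars (illustrative stand-in).
struct BudgetEntry {
    remaining_micros: AtomicI64,
}

impl BudgetEntry {
    fn new(limit_micros: i64) -> Self {
        Self { remaining_micros: AtomicI64::new(limit_micros) }
    }

    /// Atomically deducts `amount_micros` if enough budget remains.
    /// Returns false when the budget is exhausted or the amount is invalid.
    fn try_spend(&self, amount_micros: i64) -> bool {
        if amount_micros <= 0 {
            return false;
        }
        let mut current = self.remaining_micros.load(Ordering::Acquire);
        loop {
            if current < amount_micros {
                return false; // exhausted: never goes negative
            }
            match self.remaining_micros.compare_exchange_weak(
                current,
                current - amount_micros,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                // Another thread raced us (or a spurious failure): retry
                // with the value that was actually observed.
                Err(observed) => current = observed,
            }
        }
    }
}
```

Because the check and the decrement happen in one atomic step, concurrent bidders can never collectively spend more than the remaining balance.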
BidResponse (Rust struct)
│
▼
FlatBufferBuilder (512B stack scratch)
build inside-out: Bid → SeatBid → BidResponse
push vtable slots by pre-computed offsets:
bid_vt::ID = 4 (field 0)
bid_vt::IMPID = 6 (field 1)
bid_vt::PRICE = 8 (field 2)
│
▼
fbb.finished_data() → &[u8]
│
▼
PooledBuf::acquire() ← thread-local slab, zero malloc
extend_from_slice(finished_data)
│
▼
stream.write_all(headers).await
stream.write_all(&body_buf).await
│ PooledBuf drops → returns to pool
▼
SSP receives ~90-byte binary payload
(vs ~130-byte JSON — 30% smaller, zero float formatting)
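The `PooledBuf` step can be approximated with a thread-local free list. This is a simplified sketch using only the standard library, not the engine's slab allocator: the 512-byte initial capacity and the unbounded pool size are assumptions, and allocation is only avoided once the pool has warmed up.

```rust
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};

thread_local! {
    /// Per-thread free list of reusable output buffers.
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// A buffer that returns itself to the thread-local pool when dropped.
struct PooledBuf(Vec<u8>);

impl PooledBuf {
    /// Reuses a pooled buffer if one exists, otherwise allocates once.
    fn acquire() -> Self {
        let buf = POOL
            .with(|p| p.borrow_mut().pop())
            .unwrap_or_else(|| Vec::with_capacity(512));
        PooledBuf(buf)
    }
}

impl Deref for PooledBuf {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.0
    }
}

impl DerefMut for PooledBuf {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        &mut self.0
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        // Take the allocation out, keep its capacity, discard its contents.
        let mut buf = std::mem::take(&mut self.0);
        buf.clear();
        // try_with avoids a panic if the thread-local is already torn down.
        let _ = POOL.try_with(|p| p.borrow_mut().push(buf));
    }
}
```

Keeping the pool thread-local matches the thread-per-core model: no lock is needed because a buffer never crosses cores.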
Agent writes strategy:
upsert_redis(campaign_id, BidStrategy)
│
├─► local RwLock write (instant, in-process)
│
└─► Redis pipeline (cold path, ~0.5ms):
HSET strategy:{id}
margin_multiplier "1.08"
max_cpm_usd "4.50"
agent_reasoning "win rate low, raising margin..."
PUBLISH strategy:updates "{campaign_id}"
All other engine processes (subscriber task):
← receives PUBLISH message
→ load_from_redis(): KEYS strategy:* → HMGET each key
→ upsert() into local BidStrategyStore
→ hot path reads new strategy within next RwLock.read()
latency: ~1-2ms end-to-end across fleet
Hot-path read (once per request, amortised across all impressions):
BidStrategyStore.get_bid_params(campaign_id)
= RwLock.read() → HashMap.get() → (f64, f64)
= (margin_multiplier, max_cpm_usd) in one lock acquisition (~50ns)
No Redis on hot path. Local read only.
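The local half of this design, a write on strategy update and a single read-lock acquisition on the hot path, might look roughly like the sketch below. Redis persistence and pub/sub are left out, and returning `Option` for an unknown campaign is an assumption; the real `get_bid_params` may fall back to defaults instead.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug)]
struct BidStrategy {
    margin_multiplier: f64,
    max_cpm_usd: f64,
    agent_reasoning: String,
}

/// In-process strategy store (illustrative stand-in for BidStrategyStore).
#[derive(Default)]
struct BidStrategyStore {
    inner: RwLock<HashMap<String, BidStrategy>>,
}

impl BidStrategyStore {
    /// Cold path: called by the agent or the pub/sub subscriber.
    fn upsert(&self, campaign_id: &str, strategy: BidStrategy) {
        self.inner
            .write()
            .unwrap()
            .insert(campaign_id.to_string(), strategy);
    }

    /// Hot path: one read lock, one map lookup, two floats copied out.
    fn get_bid_params(&self, campaign_id: &str) -> Option<(f64, f64)> {
        let guard = self.inner.read().unwrap();
        guard
            .get(campaign_id)
            .map(|s| (s.margin_multiplier, s.max_cpm_usd))
    }
}
```

Copying out only the two floats keeps the read lock held for as short a time as possible and avoids cloning the reasoning string on every request.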
Per engine instance:
BidMetrics backed by prometheus::Registry
rtb_requests_total{campaign_id} Counter
rtb_bids_submitted_total{campaign_id} Counter
rtb_blocked_safety_total{campaign_id} Counter
rtb_blocked_budget_total{campaign_id} Counter
Scraped by central Prometheus server:
GET http://{engine-host}:8081/metrics
→ text/plain 0.0.4 format
→ Prometheus aggregates across all instances
→ Agent reads fleet-wide totals via GetBidPerformanceTool
# Rust stable (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# wrk HTTP benchmarking tool
brew install wrk # macOS
apt install wrk # Ubuntu
# Optional: Redis (for multi-process budget coordination)
brew install redis

export ANTHROPIC_API_KEY=sk-ant-api03-...

The engine starts and runs without this key: the agent cycle skips silently (debug-level log only) and the bid fast-path continues operating normally.
git clone <repo>
cd rtb-engine
cargo run -p fast-path

Expected output:
{"timestamp":"...","level":"WARN","message":"Non-Linux platform: running Tokio dev fallback..."}
{"timestamp":"...","level":"INFO","message":"AAMP management API listening","addr":"0.0.0.0:8081"}
{"timestamp":"...","level":"INFO","message":"[DEV] Bid server listening via Tokio","addr":"0.0.0.0:8080"}

curl -s -X POST http://localhost:8081/budget/set \
-H "Content-Type: application/json" \
-d '{
"campaign_id": "default",
"daily_limit_usd": 10000.0,
"max_single_bid_usd": 5.0
}' | jq .

Expected:

{ "campaign_id": "default", "daily_limit_usd": 10000.0, "status": "set" }

curl -s -X POST http://localhost:8080 \
-H "Content-Type: application/json" \
-d '{
"id": "auction-001",
"imp": [
{"id": "imp-1", "bidfloor": 1.50, "bidfloorcur": "USD"},
{"id": "imp-2", "bidfloor": 0.80, "bidfloorcur": "USD"}
],
"tmax": 100
}'

The response is FlatBuffers binary. To verify it's non-empty:
curl -s -o /dev/null -w "%{http_code} size=%{size_download}\n" \
-X POST http://localhost:8080 \
-H "Content-Type: application/json" \
-d '{"id":"test","imp":[{"id":"i1","bidfloor":1.5}],"tmax":100}'
# 200 size=92

204 means no-bid (budget exhausted or all impressions blocked by safety valve).
Save this as wrk_bid.lua in the repo root (it is excluded from git — see .gitignore):
-- wrk_bid.lua
-- Simulates a continuous stream of OpenRTB bid requests.
-- ${AUCTION_PRICE} is an SSP macro — include it literally to test nurl handling.
math.randomseed(os.time())
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
local auction_ids = {}
for i = 1, 1000 do
auction_ids[i] = string.format("auction-%06d", i)
end
local floors = {0.50, 0.80, 1.00, 1.25, 1.50, 2.00, 2.50, 3.00}
request = function()
local id = auction_ids[math.random(1, 1000)]
local floor = floors[math.random(1, #floors)]
local body = string.format(
'{"id":"%s","imp":[{"id":"imp-1","bidfloor":%.2f,"bidfloorcur":"USD"}],"tmax":100}',
id, floor
)
return wrk.format(nil, "/", nil, body)
end

Run the benchmark:
# Warm up
wrk -t2 -c50 -d10s -s wrk_bid.lua http://localhost:8080
# Full load test — macOS Tokio dev (expect 50k–300k RPS depending on hardware)
wrk -t8 -c400 -d30s -s wrk_bid.lua http://localhost:8080

Expected output on macOS (Tokio dev mode, M-series chip):
Running 30s test @ http://localhost:8080
8 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.82ms 0.91ms 18.43ms 87.34%
Req/Sec 28.14k 3.21k 41.07k 72.00%
6,731,204 requests in 30.02s, 1.23GB read
Requests/sec: 224,228.13
Transfer/sec: 41.97MB
Note on 10M RPS: The macOS Tokio dev mode is not for production benchmarking. 10M RPS requires the Glommio io_uring thread-per-core path on Linux. See Docker benchmark below.
The agent fires every 300 seconds. To force an early run and watch it:
# Terminal 1: build once, then pipe the binary's output directly to jq.
# Do NOT use `cargo run | jq` — cargo writes compilation progress to stderr,
# which 2>&1 merges with the program's JSON logs and breaks jq on the first
# non-JSON line. Running the pre-built binary produces JSON-only output.
cargo build -p fast-path
# Clean view — message only. Use for normal monitoring.
ANTHROPIC_API_KEY=sk-ant-... \
./target/debug/fast-path 2>&1 | \
jq -r 'select(.fields.message != null) | "\(.timestamp[11:19]) [\(.level)] \(.fields.message)"'
# Diagnostic view — appends the error field when present.
# Use when you see warnings (e.g. "Agent cycle failed") and need to know why.
ANTHROPIC_API_KEY=sk-ant-... \
./target/debug/fast-path 2>&1 | \
jq -r 'select(.fields.message != null) |
"\(.timestamp[11:19]) [\(.level)] \(.fields.message)\(
if .fields.error != null then " — \(.fields.error)" else "" end
)"'
# Terminal 2: generate some load first (so the agent has metrics to reason about)
wrk -t4 -c200 -d60s -s wrk_bid.lua http://localhost:8080 &
# Terminal 3: check current strategy before agent runs
curl -s http://localhost:8081/agent/strategy | jq .
# Wait ~5 min, or restart engine with lower interval for demo:
# macOS: #[cfg(not(target_os = "linux"))] edit line ~139 in crates/fast-path/src/main.rs
# Linux: #[cfg(target_os = "linux")] edit line ~168 in crates/fast-path/src/main.rs
# Change: Duration::from_secs(300) → Duration::from_secs(30)

After the agent runs, you will see:
{"level":"DEBUG","message":"Running agent optimization cycle","context_len":487}
{"level":"INFO","message":"Agent updated bid strategy",
"campaign_id":"default","margin":1.08,"cpm_cap":4.5}
{"level":"INFO","message":"Agent optimization cycle complete","response_len":312}

# Verify strategy changed
curl -s http://localhost:8081/agent/strategy | jq .

[{
"campaign_id": "default",
"margin_multiplier": 1.08,
"max_cpm_usd": 4.50,
"agent_reasoning": "Participation rate is 94% with healthy budget ($9,847 remaining). Budget-block rate is 0.2%, indicating no pacing issues. Raising margin from 1.10 to 1.08 to test if we can win more high-value auctions while maintaining positive ROI. Will monitor budget-block rate next cycle.",
"updated_at_unix": 1747123456
}]

curl -s http://localhost:8081/metrics

# HELP rtb_requests_total Total bid requests seen
# TYPE rtb_requests_total counter
rtb_requests_total{campaign_id="default"} 6731204
# HELP rtb_bids_submitted_total Total bids submitted
# TYPE rtb_bids_submitted_total counter
rtb_bids_submitted_total{campaign_id="default"} 6710891
# HELP rtb_blocked_budget_total Requests blocked by budget guard
# TYPE rtb_blocked_budget_total counter
rtb_blocked_budget_total{campaign_id="default"} 142
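The percentages the agent reasons about can be derived from counters like these. The definitions below (bids submitted over requests, budget blocks over requests) are assumptions about how `participation_pct` and `budget_block_pct` are computed; the engine may define them differently.

```rust
/// A point-in-time copy of the Prometheus counters for one campaign.
struct CounterSnapshot {
    requests: u64,
    bids_submitted: u64,
    blocked_budget: u64,
}

/// Percentage helper that treats an empty denominator as 0% rather than NaN.
fn pct(part: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        part as f64 * 100.0 / total as f64
    }
}

impl CounterSnapshot {
    /// Assumed definition: share of requests that produced a bid.
    fn participation_pct(&self) -> f64 {
        pct(self.bids_submitted, self.requests)
    }

    /// Assumed definition: share of requests rejected by the budget guard.
    fn budget_block_pct(&self) -> f64 {
        pct(self.blocked_budget, self.requests)
    }
}
```

Applied to the sample scrape above, this gives a participation rate of roughly 99.7% and a budget-block rate of roughly 0.002%.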
Two containers run on the same rtb-net Docker bridge network: fast-path (the Glommio engine) and bench (the Rust load generator). Keeping them in separate containers prevents the load generator from competing for the same CPU cores as the engine — a single-container benchmark conflates client overhead with server latency.
┌─────────────────────────────────────────────────────────────┐
│ Docker host (your Mac) │
│ │
│ ┌─────────────────────┐ rtb-net ┌──────────────────┐ │
│ │ fast-path container │ ◄─────────► │ bench container │ │
│ │ (Glommio engine) │ internal │ (Rust client) │ │
│ │ port 8080 (bids) │ network │ --profile bench │ │
│ │ port 8081 (mgmt) │ ~0.1ms │ mark │ │
│ └─────────────────────┘ └──────────────────┘ │
│ │ 8080, 8081 │
│ ▼ exposed to host │
│ localhost (curl, browser) │
└─────────────────────────────────────────────────────────────┘
Prerequisites
- `jq` for parsing API responses: `brew install jq`
Step 1 — Export your Anthropic API key (optional — only needed for the AI agent; bid path works without it)
export ANTHROPIC_API_KEY=sk-ant-api03-...

If omitted, the agent silently skips optimization cycles. The bid fast-path is unaffected.
Step 2 — Verify the Docker daemon is running
docker info > /dev/null 2>&1 && echo "Docker daemon running" || echo "Docker daemon NOT running"

If not running, install and start it:
# Install Colima (lightweight Docker runtime for Mac) and the Docker CLI
brew install colima docker
# Install the Docker Compose CLI plugin
mkdir -p ~/.docker/cli-plugins
ln -sfn $(brew --prefix)/opt/docker-compose/bin/docker-compose ~/.docker/cli-plugins/docker-compose
# Start the Docker daemon
colima start

Then confirm it is ready before proceeding:

docker ps # should return a table header with no error

Colima stalled state: If `colima status` reports running but `docker ps` still gives `Cannot connect to the Docker daemon at unix:///.../.colima/default/docker.sock`, the socket forwarder between the Linux VM and the macOS host has died: the socket file exists on disk but nothing is listening on it. Fix:

colima stop && colima start

This restarts the VM and re-establishes socket forwarding. Run `docker ps` again to confirm.
Step 3 — Build and start the engine
docker compose -f docker/docker-compose.yml down -v && \
docker compose -f docker/docker-compose.yml up --build -d

`-v` removes the `fast_path_target` named volume so the fresh Linux binary replaces any stale macOS-compiled binary from the bind mount. `--build` re-compiles inside the container (Linux ARM64 / Glommio).

Why not `cargo build`? The source directory is bind-mounted into the container at `/workspace`. If you compile on macOS first, the host `target/` directory contains a macOS binary, and running it inside Linux gives `exec format error`. The named volume `fast_path_target` shadows `target/` so only the Linux binary built during `docker build` is used.

Why does the build use `[profile.docker]` and not `[profile.release]`? `release` uses `lto = "fat"`, which peaks at 6–8 GB RAM during linking. Docker Desktop's default memory limit (4 GB) causes the linker to be OOM-killed (SIGKILL). `[profile.docker]` inherits release settings but uses `lto = "thin"` (~1–2 GB peak), delivering ~90% of fat-LTO throughput within the container memory budget.
Step 4 — Confirm Glommio is running (not the macOS Tokio fallback)
docker compose -f docker/docker-compose.yml logs fast-path

| Log line | Expected |
|---|---|
| `Spawning Glommio executor pool (MaxSpread)` | Glommio active ✓ |
| `Shard bound and listening` | Engine accepting bids ✓ |
| `AAMP management API listening on 0.0.0.0:8081` | Management API up ✓ |
| `Discovery fetch failed for example-ssp.invalid` | Expected: placeholder SSP URL ✓ |
| `x-api-key header is required` | `ANTHROPIC_API_KEY` not set in shell; agent skips, bid path unaffected ✓ |
If you see [DEV] anywhere — stop. The macOS Tokio fallback is running, not Glommio. Check that security_opt: seccomp:unconfined is present in docker-compose.yml (Docker's default seccomp blocks io_uring_register).
Step 5 — Build the bench image (one-time, ~5 min)
docker compose -f docker/docker-compose.yml --profile benchmark build bench

The bench service uses `--profile benchmark` so it never starts accidentally alongside the engine. It runs only when explicitly invoked.
Step 6 — Seed the campaign budget
curl -s -X POST http://localhost:8081/budget/set \
-H "Content-Type: application/json" \
-d '{"campaign_id":"default","daily_limit_usd":10000000,"max_single_bid_usd":5.0}' | jq .

Why $10M? The default budget is $10,000. At 30k RPS with an average floor of $1.57 CPM, the engine spends ~$0.0017 per bid. $10k exhausts in ~47 seconds of a 60-second benchmark. For the last 13 seconds the engine returns HTTP 204 no-bid (budget guard) instead of HTTP 200 bid; these faster responses inflate throughput and deflate latency, making results unreliable. $10M gives ~100 hours of headroom.
Step 7 — Run the benchmark
docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench

Default: 30,000 req/s, 60 seconds, 300 connections. Override any parameter without rebuilding:
docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench \
--rate 50000 --duration 60 --connections 400

The bench container joins `rtb-net` and targets `http://fast-path:8080` (the engine's internal DNS name), bypassing the Docker port-forward hop.
Step 8 — Verify budget wasn't exhausted during the run
curl -s http://localhost:8081/budget/default | jq .

`remaining_usd` must be in the millions. Near zero means budget-block 204 responses contaminated your latency numbers; re-seed (Step 6) and re-run.
Step 9 — Read the output
── Results ──────────────────────────────────────────
Total requests: 1290929
Recorded: 1290551 (100.0% success)
Errors: 378 (0.03%)
Throughput: 21509 req/s
Latency (successful bids only):
p50: 14.18ms
p90: 33.66ms
p99: 50.53ms ← must be < 80ms
p99.9: 61.15ms ← must be < 95ms
p99.99: 70.46ms
max: 88.96ms
SLA (SSP timeout = 100ms):
p99 < 80ms: PASS
p99.9 < 95ms: PASS
error rate < 1%: PASS
─────────────────────────────────────────────────────
| Percentile | Target | Why |
|---|---|---|
| p99 | < 80ms | 20ms headroom for SSP network round-trip |
| p99.9 | < 95ms | catches tail spikes before SSP timeout |
| p99.99 | < 100ms | any bid above 100ms is a guaranteed loss |
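A sketch of how these targets could be checked from raw latency samples. It uses the nearest-rank percentile method, which is an assumption; the bench binary may use a histogram or a different interpolation. Percentiles are passed in basis points (9900 = p99) so the rank is computed in exact integer arithmetic.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
/// `basis_points` is the percentile in 1/100ths of a percent
/// (9900 = p99, 9990 = p99.9, 9999 = p99.99).
fn percentile(sorted: &[u64], basis_points: u64) -> Option<u64> {
    if sorted.is_empty() || basis_points == 0 || basis_points > 10_000 {
        return None;
    }
    let n = sorted.len() as u64;
    // Integer ceil(n * p): avoids floating-point rounding at the boundary.
    let rank = (n * basis_points + 9_999) / 10_000;
    Some(sorted[(rank - 1) as usize])
}

/// Applies the three percentile targets from the table to latencies
/// given in microseconds. Sorts the slice in place.
fn meets_sla(latencies_us: &mut [u64]) -> bool {
    latencies_us.sort_unstable();
    let check = |bp: u64, limit_us: u64| {
        percentile(latencies_us, bp).map_or(false, |v| v < limit_us)
    };
    check(9_900, 80_000) && check(9_990, 95_000) && check(9_999, 100_000)
}
```

An empty sample set fails the check rather than passing vacuously, which is the safer default for a benchmark gate.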
# Start the engine (no rebuild — reuses the image and named volume)
docker compose -f docker/docker-compose.yml up -d
# Seed budget (always — budget resets at midnight UTC, or when container restarts)
curl -s -X POST http://localhost:8081/budget/set \
-H "Content-Type: application/json" \
-d '{"campaign_id":"default","daily_limit_usd":10000000,"max_single_bid_usd":5.0}'
# Run the benchmark
docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench
# Verify budget after run
curl -s http://localhost:8081/budget/default | jq .

Only rebuild (`down -v` followed by `up --build`) when you change Rust source files or `Cargo.toml`.
Docker Desktop on Apple Silicon is not production hardware. Three factors inflate latency and suppress throughput relative to a real Linux server:
| Factor | Docker Desktop | Production (c6i.8xlarge) |
|---|---|---|
| CPU cores (Glommio workers) | 1 (Docker default) | 31 (32 cores minus 1 for OS) |
| CPU architecture | ARM64 (Apple Silicon VM) | x86_64 bare metal |
| LTO profile | `thin` (memory limit) | `fat` (+10–25% throughput) |
| Network path | Docker bridge (~0.1ms overhead) | Direct NIC (DPDK / SR-IOV) |
Rough production extrapolation from a Docker result: measured_rps × 31 cores × 1.15 (fat LTO). A Docker result of 21.5k RPS → estimated ~766k RPS per production node → ~13 nodes to reach 10M RPS. Treat this as an order-of-magnitude check only; the real number requires a bare-metal Linux benchmark.
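The extrapolation above is simple enough to write down directly. The function names are hypothetical, and the calculation inherits the text's assumption that throughput scales linearly with core count, which real hardware rarely delivers exactly.

```rust
/// measured_rps × cores × LTO gain, as in the rough formula above.
/// Assumes perfectly linear scaling across cores.
fn estimate_node_rps(measured_rps: f64, cores: u32, lto_gain: f64) -> f64 {
    measured_rps * cores as f64 * lto_gain
}

/// Fractional number of nodes needed to reach `target_rps`.
/// Round up when provisioning.
fn nodes_for_target(target_rps: f64, node_rps: f64) -> f64 {
    target_rps / node_rps
}
```

With the Docker result of 21.5k RPS, 31 workers, and a 1.15 LTO factor, the per-node estimate is about 766k RPS and the 10M RPS target works out to slightly more than 13 nodes, so 14 would be provisioned in practice.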
What Docker results do validate:
- The SLA shape is correct (p99/p99.9 percentiles pass/fail correctly)
- The budget guard and bid pipeline are stable under sustained load
- The error rate is within tolerance
- The Glommio engine starts cleanly and handles real io_uring on Linux
redis-server --daemonize yes
REDIS_URL=redis://127.0.0.1:6379 \
ANTHROPIC_API_KEY=sk-ant-... \
cargo run -p fast-path

With `REDIS_URL` set:
- Budget spend is coordinated via Lua CAS across all engine instances
- Bid strategies are written to Redis and propagated via pub/sub
- Daily budget reset resets both the local `AtomicI64` and the Redis key
# Simulate a budget alert event
curl -X POST http://localhost:8081/events/ingest \
-H "Content-Type: application/json" \
-d '{
"event_type": "budget_alert",
"campaign_id": "default",
"summary": "Spend rate $420/hr is 2.1x expected. Daily pace $10,080 exceeds $10k limit.",
"severity": "warn"
}'
# The agent will read this event on the next cycle via GetBidPerformanceTool
# and factor it into its bid-shading decision.

Honest ceiling: macOS `kqueue` tops out at ~500k–900k RPS on an M5 MacBook Air for this workload. The 10M RPS target requires Linux io_uring (Glommio path). These steps let you saturate the macOS path completely so you can observe the agent, budget guard, and Prometheus metrics all running at real stress.
sudo sysctl -w kern.maxfiles=1048576
sudo sysctl -w kern.maxfilesperproc=524288
sudo sysctl -w net.inet.tcp.msl=1000
sudo sysctl -w net.inet.ip.portrange.first=1024
ulimit -n 524288

cargo build --release -p fast-path
RUST_LOG=fast_path=warn ANTHROPIC_API_KEY=sk-ant-api03-... \
./target/release/fast-path

Release build removes debug assertions and enables LTO. `warn`-level logging eliminates the per-bid JSON log line that becomes the bottleneck above ~200k RPS.
curl -s -X POST http://localhost:8081/budget/set \
-H "Content-Type: application/json" \
-d '{
"campaign_id": "default",
"daily_limit_usd": 99999999.0,
"max_single_bid_usd": 500.0
}' | jq .

Without this, the budget guard exhausts the $100 default in under a second at peak RPS and every subsequent bid is blocked, so you would be benchmarking the rejection path, not the bid path.
Run each stage for 30 seconds and observe participation_pct and budget_block_pct from /metrics between stages.
# Stage 1 — warm up (expect ~200k–350k RPS)
wrk -t4 -c200 -d30s -s wrk_bid.lua http://localhost:8080
# Stage 2 — moderate pressure (expect ~350k–550k RPS)
wrk -t8 -c600 -d30s -s wrk_bid.lua http://localhost:8080
# Stage 3 — high pressure (expect ~500k–750k RPS)
wrk -t10 -c1200 -d30s -s wrk_bid.lua http://localhost:8080
# Stage 4 — saturate all P-cores (run 4 wrk processes in parallel)
for i in 1 2 3 4; do
wrk -t2 -c300 -d60s -s wrk_bid.lua http://localhost:8080 &
done
wait

| RPS Range | Symptom | Root Cause |
|---|---|---|
| < 200k | `participation_pct` drops, no errors | Budget guard Redis round-trip latency |
| 200k–500k | Latency p99 climbs > 5ms | kqueue syscall overhead per accepted connection |
| 500k–700k | `Too many open files` errors | `kern.maxfilesperproc` ceiling reached |
| 700k–900k | Throughput plateaus, CPU ~95% | Single Tokio thread-pool saturated |
| > 900k | wrk reports socket errors | OS TCP backlog exhausted (`net.core.somaxconn` equivalent) |
macOS kqueue : 1 syscall per event notification
Linux io_uring: 4096 bid requests batched per single io_uring_submit() syscall
→ 4096× fewer kernel crossings per unit work
→ Glommio thread-per-core pins each shard to 1 physical core
→ no Tokio work-stealing, no cross-core cache misses
The M5's 4 performance cores are fully capable of 10M RPS in theory (each core at 2.4GHz can retire ~2.4B simple instructions/sec). The constraint is syscall frequency, not compute. Glommio + io_uring on Linux removes that constraint.
| Component | Specification | Purpose |
|---|---|---|
| Bid server hosts | c6i.8xlarge (32 vCPU, 64GB RAM) | Glommio io_uring executor pool |
| Redis cluster | ElastiCache r7g.large (2-node) | Budget CAS + strategy pub/sub |
| Prometheus | m5.xlarge | Metrics aggregation across fleet |
| Agent hosts | Shared with API tier | Tokio runtime, infrequent LLM calls |
| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes (for agent) | Claude API key. Without it, agent skips cycles silently (debug log). Bid path unaffected. |
| `REDIS_URL` | Recommended for prod | e.g. `redis://redis-cluster.internal:6379`. Without it, budget is local-only. |
| `AAMP_PUBLIC_HOSTNAME` | For win notices | e.g. `engine-1.prod.example.com:8081`. Enables nurl/lurl in bids. |
| `RUST_LOG` | Optional | e.g. `fast_path=info,aamp_protocol=info`. JSON structured output. |
# fast-path-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fast-path
spec:
  replicas: 20 # 20 × 32 cores = 640 cores for bid evaluation
  selector:
    matchLabels:
      app: fast-path
  template:
    metadata:
      labels:
        app: fast-path
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8081"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: fast-path
          image: your-registry/rtb-engine:latest
          ports:
            - containerPort: 8080 # Bid fast-path
            - containerPort: 8081 # AAMP API + Prometheus
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: anthropic
                  key: api-key
            - name: REDIS_URL
              value: redis://redis-cluster.internal:6379
            - name: RUST_LOG
              value: fast_path=info,aamp_protocol=info
          resources:
            requests:
              cpu: "30" # Reserve 30 cores; 2 reserved for OS + agent
              memory: "32Gi"
            limits:
              cpu: "32"
              memory: "64Gi"
          securityContext:
            capabilities:
              add:
                - IPC_LOCK # io_uring locked-memory ring
                - NET_ADMIN # XDP filter attachment
---
apiVersion: v1
kind: Service
metadata:
  name: fast-path-bid
spec:
  type: LoadBalancer
  selector:
    app: fast-path
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP

# Multi-stage release build
docker build \
-f docker/Dockerfile.dev \
-t your-registry/rtb-engine:$(git rev-parse --short HEAD) \
.
docker push your-registry/rtb-engine:$(git rev-parse --short HEAD)

Before receiving traffic, seed budgets via the AAMP API. Automate this in your campaign management pipeline:
for CAMPAIGN in campaign-001 campaign-002 campaign-003; do
curl -X POST http://${ENGINE_HOST}:8081/budget/set \
-H "Content-Type: application/json" \
-d "{
\"campaign_id\": \"${CAMPAIGN}\",
\"daily_limit_usd\": 5000.0,
\"max_single_bid_usd\": 5.0
}"
done

# prometheus.yml: scrape all engine instances
scrape_configs:
  - job_name: rtb-engine
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Combine the pod IP from __address__ with the annotated port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2

# Key alerts
# Budget exhaustion rate spike → agent auto-shading triggered
# Participation rate drop → agent investigating low win rate
# Agent cycle failure → fallback to last known strategy

Horizontal scaling: Each engine process independently connects to Redis for budget CAS. Strategy changes by the agent on any instance propagate via pub/sub within ~1-2ms to all instances. No shared mutable state outside Redis.
Budget reset: The midnight UTC reset runs on every instance simultaneously, resetting both the local AtomicI64 and the Redis key. First writer wins; subsequent writes are idempotent (SET to the same daily_limit value).
Agent coordination: In a 20-instance fleet, 20 agent instances each run every 5 minutes, and each reads the same Redis strategy data. To avoid redundant LLM calls, designate one instance as the "agent-primary" via a Redis lock. Without the lock the fleet runs 20 × 288 = 5,760 cycles/day, which at ~$0.013 per cycle is roughly $75/day in LLM cost, 20× the single-instance figure.
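The "agent-primary" idea can be sketched as a lease: whichever instance holds an unexpired lease runs the cycle, and everyone else skips it. The block below is a minimal in-memory illustration, not the engine's code. `LeaseStore`, `should_run_cycle`, the `agent-primary` key name, and the 300-second TTL are all assumptions made for this example; in a real fleet the store would be Redis (for instance `SET key value NX PX <ms>`), and renewal would need to be atomic on the server side.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for a store with "set if absent, with expiry" semantics.
/// Hypothetical type for illustration only.
#[derive(Default)]
pub struct LeaseStore {
    /// key -> (owner id, expiry time)
    leases: HashMap<String, (String, Instant)>,
}

impl LeaseStore {
    /// Returns true if `owner` holds the lease for `key` after the call.
    /// An unexpired lease held by someone else blocks acquisition;
    /// the current holder may renew, and an expired lease can be taken over.
    pub fn try_acquire(&mut self, key: &str, owner: &str, ttl: Duration, now: Instant) -> bool {
        let held_by_other = matches!(
            self.leases.get(key),
            Some((holder, expires_at)) if *expires_at > now && holder.as_str() != owner
        );
        if held_by_other {
            return false;
        }
        self.leases
            .insert(key.to_string(), (owner.to_string(), now + ttl));
        true
    }
}

/// Decide whether this instance should run the optimization cycle.
/// The TTL matches the 5-minute cycle interval (an assumption of this sketch).
pub fn should_run_cycle(store: &mut LeaseStore, instance_id: &str, now: Instant) -> bool {
    store.try_acquire("agent-primary", instance_id, Duration::from_secs(300), now)
}
```

With this shape, a crashed primary stops renewing and another instance takes over once the lease expires, so the fleet loses at most about one cycle.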
XDP activation: On dedicated bare-metal or c6i instances with a known NIC name:
cargo xtask ebpf-build                          # compile the eBPF program first
AAMP_XDP_INTERFACE=eth0 cargo run -p fast-path  # or set the interface in EngineConfig

Graceful rolling deployments: The 100ms shutdown_timeout_ms window covers 99.9% of in-flight bid evaluations. In Kubernetes, set terminationGracePeriodSeconds: 5 to give the engine time to drain, deregister from the load balancer, and exit cleanly.
# Check entire workspace compiles (macOS, no Glommio)
cargo check --workspace
# Run all tests
cargo test --workspace
# Lint (clippy + fmt)
cargo xtask lint
# Build Linux Docker image
cargo xtask docker-build
# Run Linux container
docker compose -f docker/docker-compose.yml up
# Start with Redis (multi-process mode)
REDIS_URL=redis://localhost:6379 \
ANTHROPIC_API_KEY=sk-ant-... \
RUST_LOG=fast_path=debug,aamp_protocol=debug \
cargo run -p fast-path

Three numbers determine whether an agentic system is profitable in production: the rate at which it completes tasks successfully, what each completed task costs to deliver, and what a buyer is actually willing to pay for one. This section works through all three for this engine using the real cost structure in the codebase.
In this engine, "task" has two distinct meanings depending on which layer you look at.
Bid pipeline completion rate is the percentage of incoming auction requests that result in a submitted bid. The engine blocks a bid in two cases: the floor price exceeds the campaign's max_single_bid_usd safety valve, or the daily budget is exhausted (try_spend returns false). Every blocked bid is a lost auction entry — no revenue opportunity, same infra cost. At healthy budget and conservative safety settings, participation should sit above 90%.
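The two blocking conditions can be sketched with a lock-free budget counter. This is an illustration under stated assumptions, not the engine's implementation: `CampaignBudget`, `Blocked`, and `evaluate` are hypothetical names, and amounts are held in integer micro-dollars (the source only says the local budget is an AtomicI64 and that `try_spend` returns false when the budget is exhausted).

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Why a bid was blocked (hypothetical enum for this sketch).
#[derive(Debug, PartialEq)]
pub enum Blocked {
    FloorAboveMaxSingleBid,
    BudgetExhausted,
}

/// Campaign budget tracked in micro-dollars so the hot path stays integer-only.
pub struct CampaignBudget {
    remaining_micros: AtomicI64,
    max_single_bid_micros: i64,
}

impl CampaignBudget {
    pub fn new(daily_limit_usd: f64, max_single_bid_usd: f64) -> Self {
        Self {
            remaining_micros: AtomicI64::new((daily_limit_usd * 1e6).round() as i64),
            max_single_bid_micros: (max_single_bid_usd * 1e6).round() as i64,
        }
    }

    /// Atomically reserve `amount_micros`; returns false if that would overdraw.
    /// `fetch_update` retries the compare-and-swap until it succeeds or the
    /// closure declines by returning `None`.
    pub fn try_spend(&self, amount_micros: i64) -> bool {
        self.remaining_micros
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                if cur >= amount_micros {
                    Some(cur - amount_micros)
                } else {
                    None
                }
            })
            .is_ok()
    }

    /// Apply both checks in order: the safety valve first, then the budget.
    pub fn evaluate(&self, floor_micros: i64, bid_micros: i64) -> Result<i64, Blocked> {
        if floor_micros > self.max_single_bid_micros {
            return Err(Blocked::FloorAboveMaxSingleBid);
        }
        if !self.try_spend(bid_micros) {
            return Err(Blocked::BudgetExhausted);
        }
        Ok(bid_micros)
    }
}
```

Checking the safety valve before touching the atomic keeps rejected requests from contending on the budget counter. Participation is then the share of `Ok` results over all requests.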
Agent cycle completion rate is the percentage of 5-minute optimization cycles that successfully read live metrics and write an updated strategy. A failed cycle is not catastrophic, because the previous strategy stays in place, but over several consecutive failures the strategy goes stale while market conditions change. The engine runs 288 cycles per day. At ~$0.013 per cycle (see Lever 2), a 10% failure rate wastes ~$0.37/day on cycles that produce nothing and leaves the strategy stale for ~29 of those cycles.
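The failure arithmetic is easy to check. The helpers below are a small sketch with hypothetical names; they take the per-cycle cost as a parameter, and the usage note assumes the ~$0.013 cycle total from the token table.

```rust
/// One cycle every 5 minutes: 24 * 60 / 5 = 288 cycles per day.
pub const CYCLES_PER_DAY: f64 = 288.0;

/// Expected number of failed cycles per day at a given failure rate (0.0..=1.0).
pub fn failed_cycles_per_day(failure_rate: f64) -> f64 {
    CYCLES_PER_DAY * failure_rate
}

/// LLM spend per day on cycles that fail and therefore produce no strategy update.
pub fn wasted_usd_per_day(cost_per_cycle_usd: f64, failure_rate: f64) -> f64 {
    failed_cycles_per_day(failure_rate) * cost_per_cycle_usd
}
```

At a 10% failure rate this gives 28.8 failed cycles and, at $0.013 per cycle, about $0.37/day. The dollar cost is small; the staleness of the strategy is the real concern.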
Each cycle makes 4–5 Anthropic API calls (one per tool call round-trip). Using claude-sonnet-4-6 pricing at $3/MTok input and $15/MTok output:
| Token bucket | Tokens | Cost |
|---|---|---|
| System prompt + context | ~1,200 | $0.0036 |
| Tool schemas (4 tools) | ~400 | $0.0012 |
| Tool results (4 calls) | ~600 | $0.0018 |
| Model output (tool calls + reasoning ≤280 chars) | ~400 | $0.0060 |
| Cycle total | ~2,600 | ~$0.013 |
At 288 cycles/day: ~$3.74/day, ~$112/month in LLM spend.
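The token table reduces to one formula: input tokens at the input rate plus output tokens at the output rate. The sketch below reproduces it; the function and constant names are made up for this example, and the rates are the $3/MTok and $15/MTok figures quoted above.

```rust
/// Prices in USD per million tokens, as quoted in the text.
pub const INPUT_USD_PER_MTOK: f64 = 3.0;
pub const OUTPUT_USD_PER_MTOK: f64 = 15.0;
/// One cycle every 5 minutes.
pub const CYCLES_PER_DAY: f64 = 288.0;

/// Cost of one optimization cycle given its total input and output tokens.
pub fn cycle_cost_usd(input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 * INPUT_USD_PER_MTOK + output_tokens as f64 * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}

/// Daily LLM spend for a given per-cycle cost.
pub fn daily_cost_usd(cost_per_cycle_usd: f64) -> f64 {
    cost_per_cycle_usd * CYCLES_PER_DAY
}
```

With the table's buckets (1,200 + 400 + 600 = 2,200 input tokens, 400 output tokens), `cycle_cost_usd(2_200, 400)` comes to $0.0126, which the table rounds to ~$0.013. The $3.74/day figure uses the rounded value; the unrounded one gives about $3.63/day.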
The short reasoning format (≤280 chars enforced in SetBidStrategyTool) directly reduces output token cost versus an unconstrained reasoning field. A 1,500-token reasoning block at $15/MTok costs ~$0.022 in output tokens alone, about 3.7× the ~$0.006 output cost of a capped cycle.
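One way such a cap could be enforced is a validation step that runs before the strategy is written. The sketch below is an assumption about the shape of that check, not the code in SetBidStrategyTool: `validate_reasoning` and `MAX_REASONING_CHARS` are hypothetical names, and rejecting (rather than truncating) over-long or empty input is a design choice made for this example.

```rust
/// Maximum reasoning length in characters (the 280-char cap from the text).
pub const MAX_REASONING_CHARS: usize = 280;

/// Validate the agent's reasoning string before accepting a strategy change.
/// Returns the trimmed text on success, or a message the agent can act on.
pub fn validate_reasoning(reasoning: &str) -> Result<&str, String> {
    let trimmed = reasoning.trim();
    // Count Unicode scalar values, not bytes, so non-ASCII text is not penalized.
    let len = trimmed.chars().count();
    if len == 0 {
        return Err("reasoning must not be empty".to_string());
    }
    if len > MAX_REASONING_CHARS {
        return Err(format!(
            "reasoning is {len} chars; the limit is {MAX_REASONING_CHARS}"
        ));
    }
    Ok(trimmed)
}
```

Returning the error text to the model as the tool result lets it retry with a shorter explanation instead of silently losing part of the audit trail.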
A single c6i.8xlarge instance at AWS on-demand pricing ($1.536/hr) costs ~$1,106/month. At 10M RPS sustained, the per-bid infrastructure cost is effectively zero — fixed overhead amortized across billions of requests. The agent and AAMP API run on a dedicated 2-thread Tokio pool and consume less than 2% of available CPU.
With a 1-year reserved instance: ~$740/month.
| Item | On-demand | Reserved (1yr) |
|---|---|---|
| c6i.8xlarge | $1,106 | $740 |
| ElastiCache r7g.large (Redis, 2-node) | $210 | $140 |
| LLM (288 cycles/day) | $112 | $112 |
| Total | ~$1,430/month | ~$992/month |
The value the engine delivers is bid shading in first-price auctions. Since Google Ad Manager moved to first-price auctions in 2019 and most of the industry followed, the majority of programmatic inventory now clears at first price. In a first-price auction you pay exactly what you bid; there is no second-price safety net. Bidding floor × 1.10 (the default before the agent acts) systematically overpays.
The agent narrows the gap between your bid and the actual clearing price by continuously adjusting margin_multiplier based on observed pacing, budget health, and (once win/loss feedback is added — see backlog) actual clearing prices returned by SSPs.
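The bid computation that margin_multiplier feeds into can be sketched in a few lines. This is an illustration only: the source confirms the "floor × multiplier" default and the existence of a per-strategy max_cpm_usd, but the function name `shaded_bid_cpm` and the exact capping behaviour (skip when the floor is above the cap, otherwise clamp to the cap) are assumptions of this example.

```rust
/// Compute the CPM to bid for one auction, or `None` to skip it.
///
/// * `floor_cpm_usd`: the SSP's floor price
/// * `margin_multiplier`: the agent-tuned multiplier (1.10 is the static default)
/// * `max_cpm_usd`: the strategy's ceiling
pub fn shaded_bid_cpm(floor_cpm_usd: f64, margin_multiplier: f64, max_cpm_usd: f64) -> Option<f64> {
    // A floor above the ceiling cannot be met without breaking the cap.
    if floor_cpm_usd > max_cpm_usd {
        return None;
    }
    // Never bid below the floor, even if the multiplier is tuned under 1.0.
    let raw = floor_cpm_usd * margin_multiplier.max(1.0);
    Some(raw.min(max_cpm_usd))
}
```

Lowering the multiplier from 1.10 to 1.04 on a $2.00 floor moves the bid from $2.20 to $2.08. In a first-price auction that $0.12 per thousand impressions is saved outright whenever the lower bid still wins.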
What the engine saves today (without nurl/lurl win/loss feedback):
The agent currently optimizes budget pacing — preventing over-spend that forces emergency bid-shading, and preventing under-delivery when budget is healthy. This alone is worth approximately 3–8% CPM reduction versus a static multiplier that reacts to nothing. The primary mechanism is preventing the two most expensive failure modes: burning the daily budget in the first two hours (no bids for the remaining 22 hours) and leaving budget unspent at day end.
What the engine saves with win/loss feedback (backlog item 1 — nurl/lurl):
When SSPs call back with clearing prices, the agent gains the market signal it needs for genuine first-price bid shading. Industry benchmarks from DSPs with ML-based shading (The Trade Desk, Criteo, Xandr) show 15–25% CPM reduction versus naive fixed-multiplier bidding. That is the ceiling this architecture is designed to reach.
| Managed ad spend | 5% CPM saving | 15% CPM saving | Monthly engine cost | Break-even at |
|---|---|---|---|---|
| $10k/mo | $500 | $1,500 | $1,430 | 15% |
| $30k/mo | $1,500 | $4,500 | $1,430 | 5% |
| $100k/mo | $5,000 | $15,000 | $1,430 | <5% |
| $1M/mo | $50,000 | $150,000 | $1,430 | <1% |
| $10M/mo | $500,000 | $1,500,000 | $1,430* | <0.1% |
*Multi-instance fleet required above ~2–3M RPS sustained; cost scales linearly with instances, value scales linearly with managed spend. The ratio improves at scale.
The engine becomes net-positive at approximately $30k/month managed ad spend under today's pacing-only optimization. With win/loss feedback and full bid shading, break-even drops to roughly $10k/month. Below those thresholds, a managed DSP service is likely cheaper. Above them, the margin widens quickly.
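The break-even thresholds follow from dividing the monthly engine cost by the CPM saving rate. A small sketch with hypothetical function names, using the ~$1,430/month on-demand total from the cost table:

```rust
/// Managed monthly spend at which CPM savings equal the engine's monthly cost.
/// Returns `None` when the saving rate is not positive.
pub fn break_even_spend_usd(monthly_engine_cost_usd: f64, cpm_saving_rate: f64) -> Option<f64> {
    if cpm_saving_rate > 0.0 {
        Some(monthly_engine_cost_usd / cpm_saving_rate)
    } else {
        None
    }
}

/// Net monthly value: savings on managed spend minus the engine's cost.
pub fn net_monthly_value_usd(
    managed_spend_usd: f64,
    cpm_saving_rate: f64,
    monthly_engine_cost_usd: f64,
) -> f64 {
    managed_spend_usd * cpm_saving_rate - monthly_engine_cost_usd
}
```

At a 5% saving, `break_even_spend_usd(1430.0, 0.05)` is $28,600/month, and at 15% it is about $9,533/month. Those are the "approximately $30k" and "roughly $10k" thresholds quoted above.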
Independent DSPs and trading desks managing $1M–$500M in annual programmatic spend are the primary market. They currently pay for bid shading either via platform fees on managed DSPs (typically 8–15% of spend) or via in-house engineering teams maintaining static rule sets. This engine replaces the rule set with an agent-driven feedback loop at a fixed infrastructure cost.
Large advertisers that have brought programmatic in-house — CPG, retail, financial services — pay agency trading desk margins of 5–15% on managed spend to avoid building their own stack. At $5M/year managed spend, a 10% agency margin is $500k/year. The engine's all-in cost at that volume is under $20k/year.
Ad networks adding programmatic inventory to what was previously direct-sold. They need a bidding engine to participate in open auction without building from scratch. The AAMP API gives them a management surface; the agent handles strategy without a dedicated optimization engineer.
Ad tech infrastructure vendors building white-label DSP platforms. The engine is designed as an embeddable component — the aamp-protocol crate exposes the full state surface, and the agent is a background task that can be replaced or extended without touching the bid path.
What this engine is not suited for today: pure brand-awareness campaigns where impression count matters more than CPM efficiency, campaigns with sub-$10k/month managed spend (fixed costs dominate), and any buyer who needs guaranteed impression delivery (the engine bids but does not yet model win probability per SSP).
See BACKLOG.md for the prioritised feature backlog. Each task contains a self-contained implementation prompt.
The highest-priority unimplemented capability is win/loss feedback (Task 1 in the backlog): adding nurl/lurl to the Bid struct so SSPs can call back with auction clearing prices. Without this, the agent is optimizing against budget-pacing signals only. With it, the agent gains the market price signal needed for genuine first-price auction bid shading — the primary economic thesis of the project.