Agentic RTB Engine

A production-grade Real-Time Bidding engine written in Rust, combining a thread-per-core io_uring fast-path capable of ~10M RPS with an embedded Claude AI agent that autonomously optimizes bid strategies every 5 minutes using live campaign performance data.

The engine is agentic in two senses: structurally (a background LLM agent with tools that read and mutate shared engine state) and economically (the agent's bid-shading decisions directly determine revenue: a 1% CPM improvement on $1M of monthly managed spend saves $10k/month).

LLMs Used and What They Solve

Model: `claude-sonnet-4-6` (Anthropic)

Accessed via rig-core 0.35, the Rust-native LLM agent framework.

What the LLM Solves

RTB bid optimization is a continuous online decision problem. The traditional approach is a static rule set: "bid floor × 1.10". This leaves money on the table in two ways simultaneously — overbidding in low-competition auctions (wasting CPM budget) and underbidding in high-value auctions (losing impressions you should have won).

Claude solves this by acting as a closed-loop controller over bid strategy parameters:

Signal In	Claude Reasons About	Action Out
`participation_pct`	Am I entering enough auctions?	Adjust `margin_multiplier`
`budget_block_pct`	Is the campaign over-pacing?	Shade bids to slow spend
`hourly_spend_usd` vs daily budget	Will budget run out before end of day?	Pace aggressively or conservatively
SSP CPM ceilings by category	Which inventory is worth targeting?	Set `max_cpm_usd` per strategy
`agent_reasoning` from prior cycle	What did I try last time?	Avoid repeating failed strategies

Why Claude Specifically

Structured tool use: Claude reliably calls typed Rust tool functions (set_bid_strategy, get_bid_performance) with correct JSON schemas, enabling safe state mutation from natural language reasoning.
Audit trail: Every strategy change includes a required reasoning field (≤512 chars) stored in BidStrategyStore, giving operators a human-readable explanation of every automated decision.
Safety-aware reasoning: The PREAMBLE instructs Claude to respect hard bounds (margin_multiplier ∈ [0.90, 1.50]). Even if Claude hallucinates an extreme value, the tool implementation clamps it.
Cost efficiency: At claude-sonnet-4-6 pricing (~$3/MTok), a 500-token optimization cycle costs ~$0.002. At 288 cycles/day that is $0.58/day in LLM cost versus $100–$250/day in CPM savings from bid shading — a >170× daily ROI on inference cost.
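The clamping behaviour described under "Safety-aware reasoning" can be sketched roughly as follows. This is an illustration only, not the engine's code: the `BidStrategy` struct, the `apply_strategy` function, and the decision to reject non-finite values are assumptions; only the [0.90, 1.50] bounds come from the text above.

```rust
/// Hypothetical strategy record; field names follow this README's vocabulary.
#[derive(Debug, Clone, PartialEq)]
pub struct BidStrategy {
    pub margin_multiplier: f64,
    pub max_cpm_usd: f64,
    pub reasoning: String,
}

/// Hard bounds quoted in this README.
pub const MARGIN_MIN: f64 = 0.90;
pub const MARGIN_MAX: f64 = 1.50;

/// Build a strategy from model-supplied values, clamping the margin.
/// Non-finite inputs are rejected instead of clamped, because `f64::clamp`
/// returns NaN unchanged (the rejection policy is an assumption).
pub fn apply_strategy(
    margin: f64,
    max_cpm_usd: f64,
    reasoning: &str,
) -> Result<BidStrategy, String> {
    if !margin.is_finite() || !max_cpm_usd.is_finite() || max_cpm_usd <= 0.0 {
        return Err("margin and max_cpm_usd must be finite; max_cpm_usd must be > 0".to_string());
    }
    Ok(BidStrategy {
        margin_multiplier: margin.clamp(MARGIN_MIN, MARGIN_MAX),
        max_cpm_usd,
        reasoning: reasoning.to_string(),
    })
}
```

With this sketch, a hallucinated margin of 9.0 is stored as 1.50 and a margin of 0.1 as 0.90, while NaN is refused outright.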

High-Level Design

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           Agentic RTB Engine Process                             │
├──────────────────────────────────┬──────────────────────────────────────────────┤
│         BID FAST-PATH            │              INTELLIGENCE LAYER               │
│         Port 8080                │              Port 8081 (AAMP API)             │
│                                  │                                               │
│  ┌───────────────────────────┐   │   ┌────────────────────────────────────┐     │
│  │   Glommio Executor Pool   │   │   │          RtbAgent (Claude)          │     │
│  │   (Linux: N-1 cores)      │   │   │                                    │     │
│  │   Tokio (macOS dev)       │   │   │  Every 5 min:                      │     │
│  │                           │   │   │  1. build context from SharedState │     │
│  │  Per connection:          │   │   │  2. prompt Claude claude-sonnet-4-6 │     │
│  │  ① Parse HTTP/1.x        │   │   │  3. Claude calls tools:            │     │
│  │  ② simd-json parse       │   │   │     • get_bid_performance          │     │
│  │  ③ Safety valve check    │   │   │     • check_budget                 │     │
│  │  ④ Budget CAS (Redis or  │   │   │     • get_ssp_inventory            │     │
│  │     AtomicI64)           │   │   │     • set_bid_strategy  ◄──────────┼──┐  │
│  │  ⑤ Evaluate bid price    │   │   │  4. updated strategy stored        │  │  │
│  │  ⑥ FlatBuffers response  │   │   └────────────────────────────────────┘  │  │
│  │  ⑦ Write to socket       │   │                                            │  │
│  └───────────┬───────────────┘   │   ┌─────────────────────────────────────┐ │  │
│              │ reads             │   │  AAMP API (Axum)                    │ │  │
│              ▼                   │   │  GET  /health                        │ │  │
│  ┌───────────────────────────┐   │   │  GET  /metrics  (Prometheus)         │ │  │
│  │       SharedState         │◄──┼───│  POST /budget/set                    │ │  │
│  │       Arc<...>            │   │   │  GET  /budget/{id}                   │ │  │
│  │                           │   │   │  POST /events/ingest                 │ │  │
│  │  • BudgetGuard            │   │   │  GET  /agent/strategy                │ │  │
│  │  • BidStrategyStore ◄─────┼───┼───│  POST /agent/optimize  (stub)        │ │  │
│  │  • BidMetrics (Prometheus)│   │   │  GET  /ssp/registry                  │ │  │
│  │  • SspRegistry            │   │   │  POST /negotiate/deal  (stub)        │ │  │
│  │  • EventStore             │   │   └─────────────────────────────────────┘ │  │
│  └───────────────────────────┘   │                                            │  │
│                                  │   ┌─────────────────────────────────────┐  │  │
│                                  │   │  Background Tasks                   │  │  │
│                                  │   │  • SSP Discovery Worker (hourly)    │  │  │
│                                  │   │  • Daily Budget Reset (midnight UTC)│  │  │
│                                  │   │  • Hourly Velocity Reset            │  │  │
│                                  │   │  • Strategy pub/sub subscriber      │──┘  │
│                                  │   └─────────────────────────────────────┘     │
└──────────────────────────────────┴──────────────────────────────────────────────┘
                                        │
                               ┌────────▼────────┐
                               │      Redis       │
                               │  (optional)      │
                               │                  │
                               │  budget:{id}     │  ← atomic Lua CAS
                               │  strategy:{id}   │  ← HSET + pub/sub
                               └─────────────────┘

Two-Runtime Architecture (Linux)

On Linux, the process runs two runtimes that never share a thread:

Runtime	Threads	Responsibilities
Glommio executor pool	N-1 CPU cores	Bid fast-path: `accept` → parse → evaluate → respond
Tokio (dedicated `std::thread`)	2	AAMP Axum API, RtbAgent, Discovery worker, Reset tasks

All coordination between runtimes uses std::sync primitives inside Arc<SharedState> — no Tokio channels crossing the executor boundary.

On macOS, a single Tokio multi-thread runtime runs everything (development only, not for benchmarking).
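As a rough illustration of the "std::sync primitives inside an Arc" rule, the sketch below uses two plain OS threads as stand-ins for the Tokio control-plane thread and a Glommio executor thread. The trimmed-down `SharedState` and the `run_demo` function are hypothetical; the real struct holds more than these two fields.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical, trimmed-down stand-in for the engine's SharedState.
pub struct SharedState {
    pub remaining_microdollars: AtomicI64,
    pub margin_multiplier: RwLock<f64>,
}

/// Run a control-plane write followed by a fast-path read and spend.
/// The two threads communicate only through the Arc'd state; no channel
/// or runtime-specific primitive crosses the boundary.
pub fn run_demo() -> (i64, f64) {
    let state = Arc::new(SharedState {
        remaining_microdollars: AtomicI64::new(1_000_000),
        margin_multiplier: RwLock::new(1.10),
    });

    // Control plane: stands in for the thread hosting the agent.
    let control = {
        let state = Arc::clone(&state);
        thread::spawn(move || {
            *state.margin_multiplier.write().unwrap() = 1.08;
        })
    };
    // Joined before the reader starts so the demo is deterministic.
    control.join().unwrap();

    // Fast path: stands in for one executor thread.
    let fast = {
        let state = Arc::clone(&state);
        thread::spawn(move || {
            let margin = *state.margin_multiplier.read().unwrap();
            state.remaining_microdollars.fetch_sub(1_700, Ordering::AcqRel);
            margin
        })
    };
    let margin_seen = fast.join().unwrap();

    (state.remaining_microdollars.load(Ordering::Acquire), margin_seen)
}
```

Because nothing here depends on an async runtime, the same state can be touched from either side without one executor having to poll the other's futures.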

Low-Level Design

Workspace Crate Map

rtb-engine/
├── crates/
│   ├── fast-path/          # Binary: bid server (ports 8080 + 8081)
│   │   ├── bid_eval.rs     # Core: evaluate() — hot path, async, ~50ns/request
│   │   ├── engine.rs       # Linux: Glommio thread-per-core executor
│   │   ├── engine_dev.rs   # macOS: Tokio fallback
│   │   ├── fbs_bid.rs      # FlatBuffers BidResponse serialiser
│   │   ├── http_parser.rs  # Zero-copy HTTP/1.x request parser (httparse)
│   │   ├── buffer_pool.rs  # Thread-local slab allocator for output buffers
│   │   ├── config.rs       # EngineConfig (listen addr, workers, ring depth)
│   │   └── xdp_loader.rs   # eBPF XDP attach (Linux / Phase 3)
│   │
│   ├── agent/              # Library: rig-core Claude agent
│   │   ├── lib.rs          # RtbAgent: run_optimization_cycle()
│   │   ├── tools.rs        # 4 MCP-style tools (read/write SharedState)
│   │   └── context.rs      # RAG context builder → compact prompt string
│   │
│   ├── aamp-protocol/      # Library: shared state + Axum API
│   │   ├── state.rs        # SharedState, BudgetGuard, BidMetrics, EventStore
│   │   ├── api.rs          # Axum router + all HTTP handlers
│   │   ├── discovery.rs    # SSP discovery worker (reqwest, hourly)
│   │   └── lib.rs          # run_api(), run_strategy_subscriber()
│   │
│   └── shared-types/       # Library: OpenRTB types, error kinds
│       └── bid.rs          # BidRequest, BidResponse, Bid, SeatBid, Impression
│
├── crates/bench/           # Binary: constant-arrival-rate load generator (p99/p99.9 output)
├── xtask/                  # Build automation (flatc, ebpf-build, docker-build, lint)
├── docker/
│   ├── Dockerfile.dev      # Ubuntu 22.04 + Rust + io_uring deps (engine image)
│   ├── Dockerfile.bench    # Multi-stage build: Rust bench binary, minimal runtime image
│   └── docker-compose.yml  # CAP_IPC_LOCK + seccomp:unconfined + bench profile
└── BACKLOG.md              # Prioritised feature backlog with implementation prompts

Data Flow: Single Bid Request

SSP ──POST /bid──► [TCP accept]
                      │
                   [httparse]  zero-copy header parse, extracts body offset
                      │
                   [simd-json] in-place SIMD tokenisation → BidRequest struct
                      │
                   [bid_eval::evaluate()]  ← async
                      │
                   ┌──┴──────────────────────────────────┐
                   │  for each impression:                │
                   │  1. (floor × margin_multiplier)      │
                   │     .min(strategy.max_cpm_usd)       │
                   │  2. is_price_safe() → safety valve   │
                   │  3. try_spend_redis().await          │
                   │     → Redis Lua CAS (if configured)  │
                   │     → AtomicI64 fallback             │
                   │  4. push SeatBid                     │
                   └──┬──────────────────────────────────┘
                      │
                   [BidMetrics::record()]  Prometheus counter increment
                      │
                   [fbs_bid::serialize()]  FlatBuffers into PooledBuf
                      │
                   [stream.write_all()]  HTTP headers + binary body
                      │
                      ▼
                   SSP receives FlatBuffers BidResponse
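Steps 1 and 2 of the per-impression loop above can be sketched as two small functions. The pricing formula is taken from the diagram; the body of `is_price_safe` is a guess at what a safety valve might check (the real rule is not described in this README), and `max_single_bid_usd` is borrowed from the /budget/set payload.

```rust
/// Price one impression: floor × margin, capped at the strategy's CPM ceiling.
pub fn bid_price(floor_usd: f64, margin_multiplier: f64, max_cpm_usd: f64) -> f64 {
    (floor_usd * margin_multiplier).min(max_cpm_usd)
}

/// Hypothetical safety valve: reject non-finite, non-positive,
/// or over-limit prices before any budget is reserved.
pub fn is_price_safe(price_usd: f64, max_single_bid_usd: f64) -> bool {
    price_usd.is_finite() && price_usd > 0.0 && price_usd <= max_single_bid_usd
}
```

For example, a $1.50 floor with a 1.10 margin and a $4.50 cap prices at about $1.65, while a $5.00 floor under the same strategy is capped at $4.50.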

Agent Optimization Cycle (Every 5 Minutes)

[background Tokio task]
        │
        ▼
context::build(&state)
   → BUDGET STATUS: remaining, daily_limit, spent_today,
                   hourly_spend, daily_pace (elapsed-day extrapolation,
                   not hourly × 24 — more accurate under bursty traffic)
   → BID METRICS: participation%, budget-block%, requests
   → SSP INVENTORY: names, max CPM, categories
   → CURRENT STRATEGIES: margin, max_cpm, last reasoning
   → RECENT EVENTS: last 5 per campaign (newest first)
        │
        ▼
claude-sonnet-4-6.prompt(context)
        │
        ├── [tool call] get_bid_performance("default")
        │       → BidPerformanceOutput { participation_pct, budget_block_pct,
        │                                hourly_spend_usd, daily_pace_usd,
        │                                recent_events }
        │
        ├── [tool call] check_budget("default")  [optional]
        │
        ├── [tool call] get_ssp_inventory(category=None)  [optional]
        │
        └── [tool call] set_bid_strategy("default", margin, max_cpm, reasoning)
                → reasoning: 2–3 sentences ≤280 chars (hard-truncated at sentence boundary)
                → clamps margin ∈ [0.90, 1.50]
                → BidStrategyStore.upsert_redis()
                   → local RwLock write
                   → Redis HSET strategy:default
                   → Redis PUBLISH strategy:updates
                → all engine instances pick up new strategy
                   within one pub/sub delivery cycle (~1ms)
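The difference between the two pacing estimates mentioned in the context builder (elapsed-day extrapolation versus hourly × 24) can be shown with a small sketch. Function names and the "seconds since midnight UTC" parameter are assumptions made for illustration.

```rust
/// Project end-of-day spend from spend so far and the part of the UTC day
/// that has elapsed. Returns None at exactly midnight (nothing to extrapolate
/// from) or when the input is outside one day.
pub fn daily_pace_usd(spent_today_usd: f64, seconds_since_midnight: u32) -> Option<f64> {
    const SECONDS_PER_DAY: u32 = 86_400;
    if seconds_since_midnight == 0 || seconds_since_midnight > SECONDS_PER_DAY {
        return None;
    }
    Some(spent_today_usd * SECONDS_PER_DAY as f64 / seconds_since_midnight as f64)
}

/// The naive alternative: assume the last hour repeats all day.
pub fn naive_pace_usd(hourly_spend_usd: f64) -> f64 {
    hourly_spend_usd * 24.0
}
```

Six hours into the day with $2,500 spent, the extrapolation gives $10,000; a single bursty $420 hour would push the naive estimate to $10,080 regardless of how quiet the rest of the day was.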

Core Feature System Design

1. Budget Guard — Lock-Free AtomicI64 CAS

The budget guard is the primary financial safety system. A bug that lets bids through without checking can drain a $10k daily budget in 0.6 seconds at 10M RPS.
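The 0.6-second figure can be reproduced with one division. The sketch assumes a constant request rate, that every request results in a bid, and a spend of about $0.0017 per bid (the per-bid figure quoted in the Docker benchmark section); the function name is hypothetical.

```rust
/// Seconds until a budget is exhausted at a constant request rate,
/// assuming every request spends `spend_per_bid_usd`.
pub fn seconds_to_exhaust(budget_usd: f64, requests_per_sec: f64, spend_per_bid_usd: f64) -> f64 {
    budget_usd / (requests_per_sec * spend_per_bid_usd)
}
```

At 10M RPS the engine would spend about $17,000 per second, so `seconds_to_exhaust(10_000.0, 10_000_000.0, 0.0017)` comes out just under 0.6.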

Hot path (called ~10M times/sec):
┌─────────────────────────────────────────────────────┐
│  try_spend_redis(campaign_id, amount_microdollars)  │
│                                                     │
│  if redis pool configured:                          │
│    EVAL lua_cas_script, KEYS[budget:{id}], ARGV[amt]│
│    → atomic: check remaining >= amount, decrement   │
│    → returns: 1=success, 0=exhausted, -1=not found  │
│    → fallback to AtomicI64 on Redis error           │
│                                                     │
│  else (single-process / dev):                       │
│    RwLock.read() → clone Arc<AtomicBudgetEntry>     │
│    loop:                                            │
│      current = remaining.load(Acquire)              │
│      if current < amount: return false              │
│      compare_exchange(current, current-amount)      │
│      on Ok: track hourly velocity, return true      │
│      on Err: retry (another thread raced)           │
└─────────────────────────────────────────────────────┘

Why not a Mutex?
  Mutex serialises all campaigns globally.
  AtomicI64 CAS per-campaign: 50 campaigns bid in parallel
  with zero cross-campaign contention.
  At 10M RPS: Mutex P99 ≈ 18ms vs CAS P99 ≈ 2ms.
  3 CPU cores recovered per host.
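The single-process branch of the diagram above is a standard compare-and-swap retry loop. A minimal sketch follows, assuming amounts are stored as integer microdollars; `AtomicBudgetEntry` matches the name in the diagram, but its fields and the rejection of non-positive amounts are assumptions, and hourly velocity tracking is omitted.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical per-campaign budget entry; amounts are in microdollars.
pub struct AtomicBudgetEntry {
    pub remaining: AtomicI64,
}

impl AtomicBudgetEntry {
    pub fn new(limit_microdollars: i64) -> Self {
        Self { remaining: AtomicI64::new(limit_microdollars) }
    }

    /// Try to reserve `amount`. Returns false when the remaining budget
    /// cannot cover it; never drives the balance below zero.
    pub fn try_spend(&self, amount: i64) -> bool {
        if amount <= 0 {
            return false; // assumption: zero or negative spends are invalid
        }
        let mut current = self.remaining.load(Ordering::Acquire);
        loop {
            if current < amount {
                return false;
            }
            match self.remaining.compare_exchange_weak(
                current,
                current - amount,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return true,
                // Another thread changed the balance (or the weak CAS failed
                // spuriously); retry against the value actually observed.
                Err(observed) => current = observed,
            }
        }
    }
}
```

The check and the decrement happen against the same observed value, which is what prevents two racing threads from both spending the last dollar.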

2. FlatBuffers Response — Zero-Alloc Binary Serialisation

BidResponse (Rust struct)
        │
        ▼
FlatBufferBuilder (512B stack scratch)
  build inside-out: Bid → SeatBid → BidResponse
  push vtable slots by pre-computed offsets:
    bid_vt::ID    = 4    (field 0)
    bid_vt::IMPID = 6    (field 1)
    bid_vt::PRICE = 8    (field 2)
        │
        ▼
fbb.finished_data() → &[u8]
        │
        ▼
PooledBuf::acquire()  ← thread-local slab, zero malloc
  extend_from_slice(finished_data)
        │
        ▼
stream.write_all(headers).await
stream.write_all(&body_buf).await
        │ PooledBuf drops → returns to pool
        ▼
SSP receives ~90-byte binary payload
(vs ~130-byte JSON — 30% smaller, zero float formatting)
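The acquire/drop lifecycle of `PooledBuf` can be approximated with a thread-local free list. This is a simplified stand-in, not the slab allocator in buffer_pool.rs: it allocates on the first acquire per thread, never caps the pool size, and the 512-byte initial capacity and the `pooled_count` helper are assumptions.

```rust
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};

thread_local! {
    /// Per-thread free list of reusable buffers (no locking needed).
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// Hypothetical pooled buffer: hands its allocation back to the
/// thread-local pool when dropped.
pub struct PooledBuf {
    buf: Vec<u8>,
}

impl PooledBuf {
    /// Reuse an idle buffer if one exists, otherwise allocate a fresh one.
    pub fn acquire() -> Self {
        let buf = POOL
            .with(|pool| pool.borrow_mut().pop())
            .unwrap_or_else(|| Vec::with_capacity(512));
        PooledBuf { buf }
    }
}

impl Deref for PooledBuf {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.buf
    }
}

impl DerefMut for PooledBuf {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        &mut self.buf
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        let mut buf = std::mem::take(&mut self.buf);
        buf.clear(); // discard contents, keep the allocation
        // try_with avoids a panic if the thread-local is already torn down.
        let _ = POOL.try_with(|pool| pool.borrow_mut().push(buf));
    }
}

/// Number of idle buffers on the current thread (for inspection only).
pub fn pooled_count() -> usize {
    POOL.with(|pool| pool.borrow().len())
}
```

After the first request on a thread, later requests reuse the same allocation, which is the property the diagram relies on.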

3. BidStrategyStore — Multi-Process Propagation

Agent writes strategy:
  upsert_redis(campaign_id, BidStrategy)
       │
       ├─► local RwLock write (instant, in-process)
       │
       └─► Redis pipeline (cold path, ~0.5ms):
             HSET strategy:{id}
               margin_multiplier  "1.08"
               max_cpm_usd        "4.50"
               agent_reasoning    "win rate low, raising margin..."
             PUBLISH strategy:updates  "{campaign_id}"

All other engine processes (subscriber task):
  ← receives PUBLISH message
  → load_from_redis(): KEYS strategy:* → HMGET each key
  → upsert() into local BidStrategyStore
  → hot path reads new strategy within next RwLock.read()
  latency: ~1-2ms end-to-end across fleet

Hot-path read (once per request, amortised across all impressions):
  BidStrategyStore.get_bid_params(campaign_id)
  = RwLock.read() → HashMap.get() → (f64, f64)
  = (margin_multiplier, max_cpm_usd) in one lock acquisition (~50ns)
  No Redis on hot path. Local read only.
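The hot-path read described above amounts to one read lock and one map lookup. A minimal sketch, leaving out Redis, the reasoning string, and timestamps; returning `Option` for an unknown campaign is an assumption, since the default strategy is not specified here.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical in-process store mapping campaign id to
/// (margin_multiplier, max_cpm_usd).
#[derive(Default)]
pub struct BidStrategyStore {
    strategies: RwLock<HashMap<String, (f64, f64)>>,
}

impl BidStrategyStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Cold path: called by the agent tool or the pub/sub subscriber.
    pub fn upsert(&self, campaign_id: &str, margin_multiplier: f64, max_cpm_usd: f64) {
        self.strategies
            .write()
            .unwrap()
            .insert(campaign_id.to_string(), (margin_multiplier, max_cpm_usd));
    }

    /// Hot path: both parameters are copied out under a single read lock.
    pub fn get_bid_params(&self, campaign_id: &str) -> Option<(f64, f64)> {
        self.strategies.read().unwrap().get(campaign_id).copied()
    }
}
```

Copying the pair out under one lock means a request never sees a margin from one strategy paired with a cap from another.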

4. Prometheus Metrics — Fleet-Wide Observability

Per engine instance:
  BidMetrics backed by prometheus::Registry
    rtb_requests_total{campaign_id}       Counter
    rtb_bids_submitted_total{campaign_id} Counter
    rtb_blocked_safety_total{campaign_id} Counter
    rtb_blocked_budget_total{campaign_id} Counter

  Scraped by central Prometheus server:
    GET http://{engine-host}:8081/metrics
    → text/plain 0.0.4 format
    → Prometheus aggregates across all instances
    → Agent reads fleet-wide totals via GetBidPerformanceTool

Running Locally — See It in Action

Prerequisites

# Rust stable (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# wrk HTTP benchmarking tool, plus jq (used below to pretty-print API responses)
brew install wrk jq            # macOS
sudo apt install wrk jq        # Ubuntu

# Optional: Redis (for multi-process budget coordination)
brew install redis

Step 1 — Set your Anthropic API key

export ANTHROPIC_API_KEY=sk-ant-api03-...

The engine starts and runs without this key — the agent cycle skips silently (debug-level log only). The bid fast-path continues operating normally.

Step 2 — Build and start the engine (macOS / Tokio dev mode)

git clone <repo>
cd rtb-engine
cargo run -p fast-path

Expected output:

{"timestamp":"...","level":"WARN","message":"Non-Linux platform: running Tokio dev fallback..."}
{"timestamp":"...","level":"INFO","message":"AAMP management API listening","addr":"0.0.0.0:8081"}
{"timestamp":"...","level":"INFO","message":"[DEV] Bid server listening via Tokio","addr":"0.0.0.0:8080"}

Step 3 — Seed a campaign budget

curl -s -X POST http://localhost:8081/budget/set \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id":        "default",
    "daily_limit_usd":    10000.0,
    "max_single_bid_usd": 5.0
  }' | jq .

Expected:

{ "campaign_id": "default", "daily_limit_usd": 10000.0, "status": "set" }

Step 4 — Send a single bid request (smoke test)

curl -s -X POST http://localhost:8080 \
  -H "Content-Type: application/json" \
  -d '{
    "id": "auction-001",
    "imp": [
      {"id": "imp-1", "bidfloor": 1.50, "bidfloorcur": "USD"},
      {"id": "imp-2", "bidfloor": 0.80, "bidfloorcur": "USD"}
    ],
    "tmax": 100
  }'

The response is FlatBuffers binary. To verify it's non-empty:

curl -s -o /dev/null -w "%{http_code} size=%{size_download}\n" \
  -X POST http://localhost:8080 \
  -H "Content-Type: application/json" \
  -d '{"id":"test","imp":[{"id":"i1","bidfloor":1.5}],"tmax":100}'
# 200 size=92

204 means no-bid (budget exhausted or all impressions blocked by safety valve).

Step 5 — Load test with wrk

Save this as wrk_bid.lua in the repo root (it is excluded from git — see .gitignore):

-- wrk_bid.lua
-- Simulates a continuous stream of OpenRTB bid requests.
-- Each request picks a random auction id and bid floor from the tables below.
math.randomseed(os.time())

wrk.method  = "POST"
wrk.headers["Content-Type"] = "application/json"

local auction_ids = {}
for i = 1, 1000 do
    auction_ids[i] = string.format("auction-%06d", i)
end

local floors = {0.50, 0.80, 1.00, 1.25, 1.50, 2.00, 2.50, 3.00}

request = function()
    local id    = auction_ids[math.random(1, 1000)]
    local floor = floors[math.random(1, #floors)]
    local body  = string.format(
        '{"id":"%s","imp":[{"id":"imp-1","bidfloor":%.2f,"bidfloorcur":"USD"}],"tmax":100}',
        id, floor
    )
    return wrk.format(nil, "/", nil, body)
end

Run the benchmark:

# Warm up
wrk -t2 -c50 -d10s -s wrk_bid.lua http://localhost:8080

# Full load test — macOS Tokio dev (expect 50k–300k RPS depending on hardware)
wrk -t8 -c400 -d30s -s wrk_bid.lua http://localhost:8080

Expected output on macOS (Tokio dev mode, M-series chip):

Running 30s test @ http://localhost:8080
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.82ms    0.91ms  18.43ms  87.34%
    Req/Sec    28.14k     3.21k   41.07k   72.00%
  6,731,204 requests in 30.02s, 1.23GB read
Requests/sec: 224,228.13
Transfer/sec: 41.97MB

Note on 10M RPS: The macOS Tokio dev mode is not for production benchmarking. 10M RPS requires the Glommio io_uring thread-per-core path on Linux. See Docker benchmark below.

Step 6 — Watch the AI agent optimize in real time

The agent fires every 300 seconds. To force an early run and watch it:

# Terminal 1: build once, then pipe the binary's output directly to jq.
# Do NOT use `cargo run | jq` — cargo writes compilation progress to stderr,
# which 2>&1 merges with the program's JSON logs and breaks jq on the first
# non-JSON line. Running the pre-built binary produces JSON-only output.
cargo build -p fast-path

# Clean view — message only. Use for normal monitoring.
ANTHROPIC_API_KEY=sk-ant-... \
  ./target/debug/fast-path 2>&1 | \
  jq -r 'select(.fields.message != null) | "\(.timestamp[11:19]) [\(.level)] \(.fields.message)"'

# Diagnostic view — appends the error field when present.
# Use when you see warnings (e.g. "Agent cycle failed") and need to know why.
ANTHROPIC_API_KEY=sk-ant-... \
  ./target/debug/fast-path 2>&1 | \
  jq -r 'select(.fields.message != null) |
    "\(.timestamp[11:19]) [\(.level)] \(.fields.message)\(
      if .fields.error != null then " — \(.fields.error)" else "" end
    )"'

# Terminal 2: generate some load first (so the agent has metrics to reason about)
wrk -t4 -c200 -d60s -s wrk_bid.lua http://localhost:8080 &

# Terminal 3: check current strategy before agent runs
curl -s http://localhost:8081/agent/strategy | jq .

# Wait ~5 min, or restart engine with lower interval for demo:
# macOS: #[cfg(not(target_os = "linux"))] edit line ~139 in crates/fast-path/src/main.rs
# Linux: #[cfg(target_os = "linux")] edit line ~168 in crates/fast-path/src/main.rs
# Change: Duration::from_secs(300) → Duration::from_secs(30)

After the agent runs, you will see:

{"level":"DEBUG","message":"Running agent optimization cycle","context_len":487}
{"level":"INFO","message":"Agent updated bid strategy",
 "campaign_id":"default","margin":1.08,"cpm_cap":4.5}
{"level":"INFO","message":"Agent optimization cycle complete","response_len":312}

# Verify strategy changed
curl -s http://localhost:8081/agent/strategy | jq .

[{
  "campaign_id":       "default",
  "margin_multiplier": 1.08,
  "max_cpm_usd":       4.50,
  "agent_reasoning":   "Participation rate is 94% with healthy budget ($9,847 remaining). Budget-block rate is 0.2%, indicating no pacing issues. Lowering margin from 1.10 to 1.08 to test whether we can keep participation while paying less per impression. Will monitor budget-block rate next cycle.",
  "updated_at_unix":   1747123456
}]

Step 7 — View Prometheus metrics

curl -s http://localhost:8081/metrics

# HELP rtb_requests_total Total bid requests seen
# TYPE rtb_requests_total counter
rtb_requests_total{campaign_id="default"} 6731204
# HELP rtb_bids_submitted_total Total bids submitted
# TYPE rtb_bids_submitted_total counter
rtb_bids_submitted_total{campaign_id="default"} 6710891
# HELP rtb_blocked_budget_total Requests blocked by budget guard
# TYPE rtb_blocked_budget_total counter
rtb_blocked_budget_total{campaign_id="default"} 142
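Rates such as participation and budget-block percentage are not exported directly; they can be derived from the counters. The sketch below parses the text exposition format for a single campaign. It assumes participation is bids submitted divided by requests, which is an interpretation rather than something this README defines, and it reads only the first matching sample instead of summing across label sets.

```rust
/// Return the value of the first sample line for `metric`, e.g.
/// `rtb_requests_total{campaign_id="default"} 6731204`.
pub fn sample_value(exposition: &str, metric: &str) -> Option<f64> {
    exposition
        .lines()
        .filter(|line| !line.starts_with('#')) // skip HELP/TYPE comments
        .find(|line| {
            // Require '{' or ' ' after the name so that a metric sharing
            // this prefix is not matched by accident.
            line.strip_prefix(metric)
                .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
        })
        .and_then(|line| line.rsplit(' ').next())
        .and_then(|value| value.trim().parse::<f64>().ok())
}

/// `part` as a percentage of `total`; 0.0 when there is no traffic yet.
pub fn pct(part: f64, total: f64) -> f64 {
    if total <= 0.0 {
        0.0
    } else {
        part * 100.0 / total
    }
}
```

Fed the sample output above, this gives a participation rate of roughly 99.7% (6,710,891 of 6,731,204 requests).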

Benchmark on Linux via Docker

Two containers run on the same rtb-net Docker bridge network: fast-path (the Glommio engine) and bench (the Rust load generator). Keeping them in separate containers prevents the load generator from competing for the same CPU cores as the engine — a single-container benchmark conflates client overhead with server latency.

┌─────────────────────────────────────────────────────────────┐
│  Docker host (your Mac)                                      │
│                                                              │
│  ┌─────────────────────┐   rtb-net   ┌──────────────────┐  │
│  │  fast-path container │ ◄─────────► │  bench container │  │
│  │  (Glommio engine)    │  internal   │  (Rust client)   │  │
│  │  port 8080 (bids)    │  network    │  --profile       │  │
│  │  port 8081 (mgmt)    │  ~0.1ms     │  benchmark       │  │
│  └─────────────────────┘             └──────────────────┘  │
│         │ 8080, 8081                                        │
│         ▼ exposed to host                                    │
│    localhost (curl, browser)                                 │
└─────────────────────────────────────────────────────────────┘

First-time setup

Prerequisites

jq for parsing API responses: brew install jq

Step 1 — Export your Anthropic API key (optional — only needed for the AI agent; bid path works without it)

export ANTHROPIC_API_KEY=sk-ant-api03-...

If omitted, the agent silently skips optimization cycles. The bid fast-path is unaffected.

Step 2 — Verify the Docker daemon is running

docker info > /dev/null 2>&1 && echo "Docker daemon running" || echo "Docker daemon NOT running"

If not running, install and start it:

# Install Colima (lightweight Docker runtime for Mac) and the Docker CLI
brew install colima docker

# Install the Docker Compose CLI plugin
mkdir -p ~/.docker/cli-plugins
ln -sfn $(brew --prefix)/opt/docker-compose/bin/docker-compose ~/.docker/cli-plugins/docker-compose

# Start the Docker daemon
colima start

Then confirm it is ready before proceeding:

docker ps   # should return a table header with no error

Colima stalled state: If colima status reports running but docker ps still gives Cannot connect to the Docker daemon at unix:///.../.colima/default/docker.sock, the socket forwarder between the Linux VM and the macOS host has died — the socket file exists on disk but nothing is listening on it. Fix:
colima stop && colima start
This restarts the VM and re-establishes socket forwarding. Run docker ps again to confirm.

Step 3 — Build and start the engine

docker compose -f docker/docker-compose.yml down -v && \
docker compose -f docker/docker-compose.yml up --build -d

-v removes the fast_path_target named volume so the fresh Linux binary replaces any stale macOS-compiled binary from the bind mount. --build re-compiles inside the container (Linux ARM64 / Glommio).

Why not cargo build? The source directory is bind-mounted into the container at /workspace. If you compile on macOS first, the host target/ directory contains a macOS binary — running it inside Linux gives exec format error. The named volume fast_path_target shadows target/ so only the Linux binary built during docker build is used.

Why does the build use [profile.docker] and not [profile.release]? release uses lto = "fat" which peaks at 6–8 GB RAM during linking. Docker Desktop's default memory limit (4 GB) causes the linker to be OOM-killed (SIGKILL). [profile.docker] inherits release settings but uses lto = "thin" (~1–2 GB peak), delivering ~90% of fat-LTO throughput within the container memory budget.

Step 4 — Confirm Glommio is running (not the macOS Tokio fallback)

docker compose -f docker/docker-compose.yml logs fast-path

Log line	Expected
`Spawning Glommio executor pool (MaxSpread)`	Glommio active ✓
`Shard bound and listening`	Engine accepting bids ✓
`AAMP management API listening` on `0.0.0.0:8081`	Management API up ✓
`Discovery fetch failed` for `example-ssp.invalid`	Expected — placeholder SSP URL ✓
`x-api-key header is required`	ANTHROPIC_API_KEY not set in shell — agent skips, bid path unaffected ✓

If you see [DEV] anywhere — stop. The macOS Tokio fallback is running, not Glommio. Check that security_opt: seccomp:unconfined is present in docker-compose.yml (Docker's default seccomp blocks io_uring_register).

Step 5 — Build the bench image (one-time, ~5 min)

docker compose -f docker/docker-compose.yml --profile benchmark build bench

The bench service uses --profile benchmark so it never starts accidentally alongside the engine. It runs only when explicitly invoked.

Step 6 — Seed the campaign budget

curl -s -X POST http://localhost:8081/budget/set \
  -H "Content-Type: application/json" \
  -d '{"campaign_id":"default","daily_limit_usd":10000000,"max_single_bid_usd":5.0}' | jq .

Why $10M? The default budget is $10,000. At 30k RPS with an average floor of $1.57 CPM, the engine spends ~$0.0017 per bid, or roughly $51 per second, so $10k lasts only about three minutes. That covers a single 60-second run at the default rate, but not a higher --rate or back-to-back runs without re-seeding. Once the budget is gone, the engine returns HTTP 204 no-bid (budget guard) instead of HTTP 200 bid; these faster responses inflate throughput and deflate latency, making results unreliable. $10M gives roughly 54 hours of headroom at the default rate.

Step 7 — Run the benchmark

docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench

Default: 30,000 req/s, 60 seconds, 300 connections. Override any parameter without rebuilding:

docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench \
  --rate 50000 --duration 60 --connections 400

The bench container joins rtb-net and targets http://fast-path:8080 — the engine's internal DNS name — bypassing the Docker port-forward hop.

Step 8 — Verify budget wasn't exhausted during the run

curl -s http://localhost:8081/budget/default | jq .

remaining_usd must be in the millions. Near zero means budget-block 204 responses contaminated your latency numbers — re-seed (Step 6) and re-run.

Step 9 — Read the output

── Results ──────────────────────────────────────────
  Total requests: 1290929
  Recorded:       1290551 (100.0% success)
  Errors:         378 (0.03%)
  Throughput:     21509 req/s

  Latency (successful bids only):
    p50:    14.18ms
    p90:    33.66ms
    p99:    50.53ms   ← must be < 80ms
    p99.9:  61.15ms   ← must be < 95ms
    p99.99: 70.46ms
    max:    88.96ms

  SLA (SSP timeout = 100ms):
    p99   < 80ms:    PASS
    p99.9 < 95ms:    PASS
    error rate < 1%: PASS
─────────────────────────────────────────────────────

Percentile	Target	Why
p99	< 80ms	20ms headroom for SSP network round-trip
p99.9	< 95ms	catches tail spikes before SSP timeout
p99.99	< 100ms	any bid above 100ms is a guaranteed loss
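The pass/fail logic in the SLA block can be sketched as below. The thresholds are the ones printed in the results (p99 < 80 ms, p99.9 < 95 ms, error rate < 1%); the nearest-rank percentile method and all names are assumptions, and the bench binary may well use a histogram instead.

```rust
/// Nearest-rank percentile over an ascending-sorted slice; `p` in (0, 100].
pub fn percentile(sorted_ms: &[f64], p: f64) -> Option<f64> {
    if sorted_ms.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    Some(sorted_ms[rank.clamp(1, sorted_ms.len()) - 1])
}

/// Outcome of the three SLA checks printed by the benchmark.
#[derive(Debug, PartialEq)]
pub struct SlaReport {
    pub p99_ok: bool,
    pub p999_ok: bool,
    pub error_rate_ok: bool,
}

/// Apply the README's thresholds. A run with zero requests fails the
/// error-rate check (an assumption: no data is not a pass).
pub fn check_sla(p99_ms: f64, p999_ms: f64, errors: u64, total: u64) -> SlaReport {
    SlaReport {
        p99_ok: p99_ms < 80.0,
        p999_ok: p999_ms < 95.0,
        error_rate_ok: total > 0 && (errors as f64 / total as f64) < 0.01,
    }
}
```

Plugging in the sample run (p99 50.53 ms, p99.9 61.15 ms, 378 errors out of 1,290,929 requests) passes all three checks.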

Second time onwards (engine already built)

# Start the engine (no rebuild — reuses the image and named volume)
docker compose -f docker/docker-compose.yml up -d

# Seed budget (always — budget resets at midnight UTC, or when container restarts)
curl -s -X POST http://localhost:8081/budget/set \
  -H "Content-Type: application/json" \
  -d '{"campaign_id":"default","daily_limit_usd":10000000,"max_single_bid_usd":5.0}'

# Run the benchmark
docker compose -f docker/docker-compose.yml --profile benchmark run --rm bench

# Verify budget after run
curl -s http://localhost:8081/budget/default | jq .

Only rebuild (down -v, then up --build, as in Step 3) when you change Rust source files or Cargo.toml.

Interpreting Docker results vs. production

Docker Desktop on Apple Silicon is not production hardware. Four factors inflate latency and suppress throughput relative to a real Linux server:

Factor	Docker Desktop	Production (c6i.8xlarge)
CPU cores (Glommio workers)	1 (Docker default)	31 (32 cores minus 1 for OS)
CPU architecture	ARM64 (Apple Silicon VM)	x86_64 bare metal
LTO profile	`thin` (memory limit)	`fat` (+10–25% throughput)
Network path	Docker bridge (~0.1ms overhead)	Direct NIC (DPDK / SR-IOV)

Rough production extrapolation from a Docker result: measured_rps × 31 cores × 1.15 (fat LTO). A Docker result of 21.5k RPS → 21.5k × 31 × 1.15 ≈ 766k RPS per production node → 10M / 766k ≈ 13.05, so 14 nodes to reach 10M RPS. This assumes perfectly linear scaling across cores, which real hardware rarely delivers. Treat it as an order-of-magnitude check only; the real number requires a bare-metal Linux benchmark.
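The extrapolation is simple enough to keep as a helper next to your benchmark notes. The sketch below encodes exactly the formula above and inherits its linear-scaling assumption; the function names are made up for illustration.

```rust
/// Estimated per-node throughput: measured single-worker RPS scaled by
/// worker count and an LTO gain factor (1.15 in this README).
pub fn estimated_node_rps(measured_rps: f64, workers: u32, lto_gain: f64) -> f64 {
    measured_rps * workers as f64 * lto_gain
}

/// Fractional number of nodes needed to reach a target rate; round up
/// when provisioning.
pub fn nodes_for_target(target_rps: f64, node_rps: f64) -> f64 {
    target_rps / node_rps
}
```

For the sample run, `estimated_node_rps(21_500.0, 31, 1.15)` is about 766k, and `nodes_for_target(10_000_000.0, ...)` is a little over 13.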

What Docker results do validate:

The SLA shape is correct (p99/p99.9 percentiles pass/fail correctly)
The budget guard and bid pipeline are stable under sustained load
The error rate is within tolerance
The Glommio engine starts cleanly and handles real io_uring on Linux

Step 8 — Run with Redis (multi-process budget coordination)

redis-server --daemonize yes

REDIS_URL=redis://127.0.0.1:6379 \
ANTHROPIC_API_KEY=sk-ant-... \
cargo run -p fast-path

With REDIS_URL set:

Budget spend is coordinated via Lua CAS across all engine instances
Bid strategies are written to Redis and propagated via pub/sub
Daily budget reset resets both local AtomicI64 and the Redis key

Step 9 — Ingest operational events (agent feedback)

# Simulate a budget alert event
curl -X POST http://localhost:8081/events/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "event_type":  "budget_alert",
    "campaign_id": "default",
    "summary":     "Spend rate $420/hr is 2.1x expected. Daily pace $10,080 exceeds $10k limit.",
    "severity":    "warn"
  }'

# The agent will read this event on the next cycle via GetBidPerformanceTool
# and factor it into its bid-shading decision.

Extreme Threshold: macOS Performance Ceiling

Honest ceiling: macOS kqueue tops out at ~500k–900k RPS on an M5 MacBook Air for this workload. The 10M RPS target requires Linux io_uring (Glommio path). These steps let you saturate the macOS path completely so you can observe the agent, budget guard, and Prometheus metrics all running at real stress.

Step 1 — Tune macOS socket/file limits

sudo sysctl -w kern.maxfiles=1048576
sudo sysctl -w kern.maxfilesperproc=524288
sudo sysctl -w net.inet.tcp.msl=1000
sudo sysctl -w net.inet.ip.portrange.first=1024
ulimit -n 524288

Step 2 — Build release binary and run with minimal logging

cargo build --release -p fast-path
RUST_LOG=fast_path=warn ANTHROPIC_API_KEY=sk-ant-api03-... \
  ./target/release/fast-path

Release build removes debug assertions and enables LTO. warn-level logging eliminates the per-bid JSON log line that becomes the bottleneck above ~200k RPS.

Step 3 — Seed a large budget so the guard never blocks

curl -s -X POST http://localhost:8081/budget/set \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id":       "default",
    "daily_limit_usd":   99999999.0,
    "max_single_bid_usd": 500.0
  }' | jq .

Without this, the budget guard exhausts the $100 default in under a second at peak RPS and every subsequent bid is blocked — you'd be benchmarking the rejection path, not the bid path.
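A quick way to sanity-check the "under a second" claim is the sketch below. It assumes every request debits the budget by a fixed bid price, and the $1 CPM figure ($0.001 per bid) used in the example is an illustrative assumption, not a value taken from the engine.

```rust
/// Seconds until `budget_usd` is exhausted if every request debits `bid_usd`.
/// Assumes a constant request rate and a constant per-bid price.
fn seconds_to_exhaust(budget_usd: f64, rps: f64, bid_usd: f64) -> f64 {
    budget_usd / (rps * bid_usd)
}
```

Under those assumptions, `seconds_to_exhaust(100.0, 500_000.0, 0.001)` is about 0.2 seconds, while the seeded limit of 99,999,999 lasts roughly 200,000 seconds, far longer than any 30-second stage.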

Step 4 — Escalating load stages

Run each stage for 30 seconds and observe participation_pct and budget_block_pct from /metrics between stages.

# Stage 1 — warm up (expect ~200k–350k RPS)
wrk -t4 -c200 -d30s -s wrk_bid.lua http://localhost:8080

# Stage 2 — moderate pressure (expect ~350k–550k RPS)
wrk -t8 -c600 -d30s -s wrk_bid.lua http://localhost:8080

# Stage 3 — high pressure (expect ~500k–750k RPS)
wrk -t10 -c1200 -d30s -s wrk_bid.lua http://localhost:8080

# Stage 4 — saturate all P-cores (run 4 wrk processes in parallel)
for i in 1 2 3 4; do
  wrk -t2 -c300 -d60s -s wrk_bid.lua http://localhost:8080 &
done
wait

Bottleneck Table

RPS Range	Symptom	Root Cause
< 200k	`participation_pct` drops, no errors	Budget guard Redis round-trip latency
200k–500k	Latency p99 climbs > 5ms	kqueue syscall overhead per accepted connection
500k–700k	`Too many open files` errors	`kern.maxfilesperproc` ceiling reached
700k–900k	Throughput plateaus, CPU ~95%	Single Tokio thread-pool saturated
> 900k	wrk reports socket errors	OS TCP listen backlog exhausted (`kern.ipc.somaxconn`, the macOS counterpart of Linux `net.core.somaxconn`)

Why 10M RPS needs Linux

macOS kqueue  : 1 syscall per event notification
Linux io_uring: up to 4096 bid requests batched per single io_uring_submit() call
                → up to 4096× fewer kernel crossings per unit work
                → Glommio thread-per-core pins each shard to 1 physical core
                → no Tokio work-stealing, no cross-core cache misses

The M5's 4 performance cores are fully capable of 10M RPS in theory (each core at 2.4GHz can retire ~2.4B simple instructions/sec). The constraint is syscall frequency, not compute. Glommio + io_uring on Linux removes that constraint.
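The arithmetic behind that claim can be sketched as follows. The function names are hypothetical, and the model counts only submission-side kernel crossings with perfectly full batches, which is an idealization.

```rust
/// Kernel crossings per second when `batch` requests share one submit call.
/// Rounds up, since a partial batch still needs one call.
fn syscalls_per_sec(rps: u64, batch: u64) -> u64 {
    (rps + batch - 1) / batch
}

/// Clock cycles available per request on each core, assuming requests
/// are spread evenly across `cores`.
fn cycles_per_request(clock_hz: u64, cores: u64, rps: u64) -> u64 {
    clock_hz / (rps / cores)
}
```

At 10M RPS on 4 cores clocked at 2.4GHz, each core has about 960 cycles per request. With one kernel crossing per event that is 10M crossings per second; with batches of 4096 it drops to about 2,442.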

Production Deployment

Infrastructure Requirements

Component	Specification	Purpose
Bid server hosts	c6i.8xlarge (32 vCPU, 64GB RAM)	Glommio io_uring executor pool
Redis cluster	ElastiCache r7g.large (2-node)	Budget CAS + strategy pub/sub
Prometheus	m5.xlarge	Metrics aggregation across fleet
Agent hosts	Shared with API tier	Tokio runtime, infrequent LLM calls

Environment Variables

Variable	Required	Description
`ANTHROPIC_API_KEY`	Yes (for agent)	Claude API key. Without it, agent skips cycles silently (debug log). Bid path unaffected.
`REDIS_URL`	Recommended for prod	e.g. `redis://redis-cluster.internal:6379`. Without it, budget is local-only.
`AAMP_PUBLIC_HOSTNAME`	For win notices	e.g. `engine-1.prod.example.com:8081`. Enables `nurl`/`lurl` in bids.
`RUST_LOG`	Optional	e.g. `fast_path=info,aamp_protocol=info`. JSON structured output.

Kubernetes Deployment

# fast-path-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fast-path
spec:
  replicas: 20                      # 20 × 32 cores = 640 cores for bid evaluation
  selector:
    matchLabels:
      app: fast-path
  template:
    metadata:
      labels:
        app: fast-path
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port:   "8081"
        prometheus.io/path:   "/metrics"
    spec:
      containers:
        - name: fast-path
          image: your-registry/rtb-engine:latest
          ports:
            - containerPort: 8080   # Bid fast-path
            - containerPort: 8081   # AAMP API + Prometheus
          env:
            - name:  ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: anthropic
                  key:  api-key
            - name:  REDIS_URL
              value: redis://redis-cluster.internal:6379
            - name:  RUST_LOG
              value: fast_path=info,aamp_protocol=info
          resources:
            requests:
              cpu:    "30"          # Reserve 30 cores; 2 reserved for OS + agent
              memory: "32Gi"
            limits:
              cpu:    "32"
              memory: "64Gi"
          securityContext:
            capabilities:
              add:
                - IPC_LOCK            # io_uring locked-memory ring
                - NET_ADMIN           # XDP filter attachment
---
apiVersion: v1
kind: Service
metadata:
  name: fast-path-bid
spec:
  type:        LoadBalancer
  selector:
    app:       fast-path
  ports:
    - port:       80
      targetPort: 8080
      protocol:   TCP

Build the Production Image

# Multi-stage release build
docker build \
  -f docker/Dockerfile.dev \
  -t your-registry/rtb-engine:$(git rev-parse --short HEAD) \
  .

docker push your-registry/rtb-engine:$(git rev-parse --short HEAD)

Seed Campaign Budgets at Startup

Before receiving traffic, seed budgets via the AAMP API. Automate this in your campaign management pipeline:

for CAMPAIGN in campaign-001 campaign-002 campaign-003; do
  curl -X POST http://${ENGINE_HOST}:8081/budget/set \
    -H "Content-Type: application/json" \
    -d "{
      \"campaign_id\":         \"${CAMPAIGN}\",
      \"daily_limit_usd\":     5000.0,
      \"max_single_bid_usd\":  5.0
    }"
done

Observability Stack

# prometheus.yml — scrape all engine instances
scrape_configs:
  - job_name: rtb-engine
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
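      # Combine the pod address (dropping any existing port) with the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2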

# Key alerts
# Budget exhaustion rate spike → agent auto-shading triggered
# Participation rate drop → agent investigating low win rate
# Agent cycle failure → fallback to last known strategy

Scaling and Operational Notes

Horizontal scaling: Each engine process independently connects to Redis for budget CAS. Strategy changes by the agent on any instance propagate via pub/sub within ~1-2ms to all instances. No shared mutable state outside Redis.

Budget reset: The midnight UTC reset runs on every instance simultaneously, resetting both the local AtomicI64 and the Redis key. First writer wins; subsequent writes are idempotent (SET to the same daily_limit value).

Agent coordination: In a 20-instance fleet, 20 agent instances each run every 5 minutes. Each reads the same Redis strategy data. To avoid redundant LLM calls, designate one instance as the "agent-primary" via a Redis lock. Without that, 20 instances × 288 cycles/day × ~$0.013 per cycle comes to roughly $75/day in LLM cost (about $2,250/month), which is no longer negligible next to the infrastructure bill.
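One way to picture the "agent-primary" lock is a lease with a time-to-live. The sketch below is an in-memory stand-in, not the engine's code: `LeaseStore` and `try_acquire` are hypothetical names, and a real deployment would issue Redis `SET agent-primary <instance> NX PX <ttl_ms>` instead of touching a `HashMap`. Unlike a bare `SET NX`, this sketch lets the current holder renew its own lease, which keeps the example short.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a Redis lease keyed by name.
struct LeaseStore {
    // key -> (holder id, expiry timestamp in ms)
    leases: HashMap<String, (String, u64)>,
}

impl LeaseStore {
    fn new() -> Self {
        Self { leases: HashMap::new() }
    }

    /// Returns true if `holder` owns the lease after the call.
    /// The lease is granted when it is absent, expired, or already ours.
    fn try_acquire(&mut self, key: &str, holder: &str, now_ms: u64, ttl_ms: u64) -> bool {
        if let Some((current, expires_at)) = self.leases.get(key) {
            if *expires_at > now_ms && current.as_str() != holder {
                return false; // another instance holds a live lease
            }
        }
        self.leases
            .insert(key.to_string(), (holder.to_string(), now_ms + ttl_ms));
        true
    }
}
```

An instance would run its optimization cycle only when `try_acquire` returns true, with a TTL around the 5-minute cycle interval so a crashed primary is replaced on the next cycle.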

XDP activation: On dedicated bare-metal or c6i instances with a known NIC name:

cargo xtask ebpf-build                           # compile the eBPF program first
AAMP_XDP_INTERFACE=eth0 cargo run -p fast-path  # then run; or set the interface in EngineConfig

Graceful rolling deployments: The 100ms shutdown_timeout_ms window covers 99.9% of in-flight bid evaluations. In Kubernetes, set terminationGracePeriodSeconds: 5 to give the engine time to drain, deregister from the load balancer, and exit cleanly.

Development Reference

# Check entire workspace compiles (macOS, no Glommio)
cargo check --workspace

# Run all tests
cargo test --workspace

# Lint (clippy + fmt)
cargo xtask lint

# Build Linux Docker image
cargo xtask docker-build

# Run Linux container
docker compose -f docker/docker-compose.yml up

# Start with Redis (multi-process mode)
REDIS_URL=redis://localhost:6379 \
ANTHROPIC_API_KEY=sk-ant-... \
RUST_LOG=fast_path=debug,aamp_protocol=debug \
cargo run -p fast-path

Commercial Economics in Production

Three numbers determine whether an agentic system is profitable in production: the rate at which it completes tasks successfully, what each completed task costs to deliver, and what a buyer is actually willing to pay for one. This section works through all three for this engine using the real cost structure in the codebase.

Lever 1 — Task Completion Rate

In this engine, "task" has two distinct meanings depending on which layer you look at.

Bid pipeline completion rate is the percentage of incoming auction requests that result in a submitted bid. The engine blocks a bid in two cases: the floor price exceeds the campaign's max_single_bid_usd safety valve, or the daily budget is exhausted (try_spend returns false). Every blocked bid is a lost auction entry — no revenue opportunity, same infra cost. At healthy budget and conservative safety settings, participation should sit above 90%.
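The two blocking cases can be illustrated with a lock-free guard. This is a minimal sketch, not the engine's implementation: the `BudgetGuard` struct, its field names, and the use of micro-dollars are assumptions made for the example. Only `try_spend`, `max_single_bid_usd`, and the use of an `AtomicI64` come from the text.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical per-campaign guard; amounts are in micro-dollars to avoid floats.
struct BudgetGuard {
    remaining_micros: AtomicI64,
    max_single_bid_micros: i64,
}

impl BudgetGuard {
    fn new(daily_limit_micros: i64, max_single_bid_micros: i64) -> Self {
        Self {
            remaining_micros: AtomicI64::new(daily_limit_micros),
            max_single_bid_micros,
        }
    }

    /// Debits `price_micros` if both checks pass; returns false when blocked.
    fn try_spend(&self, price_micros: i64) -> bool {
        // Case 1: the price exceeds the per-bid safety valve.
        if price_micros > self.max_single_bid_micros {
            return false;
        }
        // Case 2: compare-and-swap loop that fails once the budget is exhausted.
        self.remaining_micros
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |remaining| {
                if remaining >= price_micros {
                    Some(remaining - price_micros)
                } else {
                    None
                }
            })
            .is_ok()
    }
}
```

Because the check and the debit happen in one atomic update, concurrent callers cannot overspend the remaining budget.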

Agent cycle completion rate is the percentage of 5-minute optimization cycles that successfully read live metrics and write an updated strategy. A failed cycle is not catastrophic, because the previous strategy stays in place, but over several consecutive failures the strategy drifts stale while market conditions change. The engine runs 288 cycles per day. At ~$0.013 per cycle (see Lever 2), a 10% failure rate wastes ~$0.37/day on failed cycles that produce nothing and leaves the strategy stale for ~29 of those cycles.

Lever 2 — Cost Per Task

LLM cost per optimization cycle

Each cycle makes 4–5 Anthropic API calls (one per tool call round-trip). Using claude-sonnet-4-6 pricing at $3/MTok input and $15/MTok output:

Token bucket	Tokens	Cost
System prompt + context	~1,200	$0.0036
Tool schemas (4 tools)	~400	$0.0012
Tool results (4 calls)	~600	$0.0018
Model output (tool calls + reasoning ≤280 chars)	~400	$0.0060
Cycle total	~2,600	~$0.013

At 288 cycles/day: ~$3.74/day, ~$112/month in LLM spend.
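The table's arithmetic can be reproduced with a short sketch. The function names are hypothetical; the prices and token counts are the ones quoted above, with the first three token buckets (1,200 + 400 + 600) treated as input and the last as output.

```rust
const INPUT_USD_PER_MTOK: f64 = 3.0;
const OUTPUT_USD_PER_MTOK: f64 = 15.0;

/// LLM cost of one optimization cycle in USD.
fn cycle_cost_usd(input_tokens: u32, output_tokens: u32) -> f64 {
    (f64::from(input_tokens) * INPUT_USD_PER_MTOK
        + f64::from(output_tokens) * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}

/// Number of cycles in a day for a given interval.
fn cycles_per_day(interval_minutes: u32) -> u32 {
    24 * 60 / interval_minutes
}
```

`cycle_cost_usd(2_200, 400)` comes to $0.0126, which the table rounds to ~$0.013. Multiplying the unrounded figure by 288 cycles gives about $3.63/day; the $3.74/day above uses the rounded per-cycle cost.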

The shorter reasoning format (≤280 chars enforced in SetBidStrategyTool) directly reduces output token cost versus an unconstrained reasoning field. Assuming roughly 4 characters per token, a 1,500-character reasoning block is ~375 output tokens, or ~$0.0056 at $15/MTok, versus ~70 tokens (~$0.001) at the 280-char cap: about 5× more.

Infrastructure cost per cycle

A single c6i.8xlarge instance at AWS on-demand pricing ($1.536/hr) costs ~$1,106/month. At 10M RPS sustained, the per-bid infrastructure cost is effectively zero — fixed overhead amortized across billions of requests. The agent and AAMP API run on a dedicated 2-thread Tokio pool and consume less than 2% of available CPU.

With a 1-year reserved instance: ~$740/month.

Fully loaded monthly cost (single instance)

Item	On-demand	Reserved (1yr)
c6i.8xlarge	$1,106	$740
ElastiCache r7g.large (Redis, 2-node)	$210	$140
LLM (288 cycles/day)	$112	$112
Total	~$1,430/month	~$992/month

Lever 3 — Monetisable Value Per Task

The value the engine delivers is bid shading in first-price auctions. Since Google's 2019 shift and the subsequent industry follow, the majority of programmatic inventory now clears in first-price auctions. In a first-price auction, you pay exactly what you bid — there is no second-price safety net. Bidding floor × 1.10 (the default before the agent acts) systematically overpays.

The agent narrows the gap between your bid and the actual clearing price by continuously adjusting margin_multiplier based on observed pacing, budget health, and (once win/loss feedback is added — see backlog) actual clearing prices returned by SSPs.
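As an illustration of how the two strategy parameters might combine, here is a minimal sketch. The `BidStrategy` struct and `shaded_bid_cpm` function are hypothetical; only the `margin_multiplier` and `max_cpm_usd` names come from the document, and the engine's actual pricing logic may differ.

```rust
/// Hypothetical strategy record holding the two parameters the agent tunes.
struct BidStrategy {
    margin_multiplier: f64,
    max_cpm_usd: f64,
}

/// Returns the CPM to bid, or None when even the floor exceeds the cap.
fn shaded_bid_cpm(floor_cpm_usd: f64, strategy: &BidStrategy) -> Option<f64> {
    if floor_cpm_usd > strategy.max_cpm_usd {
        return None; // cannot clear the floor without breaching the cap
    }
    // Bid a margin above the floor, but never above the configured ceiling.
    Some((floor_cpm_usd * strategy.margin_multiplier).min(strategy.max_cpm_usd))
}
```

With the default multiplier of 1.10 and a $5.00 cap, a $2.00 floor yields a $2.20 bid, a $4.80 floor is clamped to $5.00, and a $6.00 floor produces no bid. Lowering `margin_multiplier` toward 1.0 is what shading looks like in this model.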

What the engine saves today (without nurl/lurl win/loss feedback): The agent currently optimizes budget pacing — preventing over-spend that forces emergency bid-shading, and preventing under-delivery when budget is healthy. This alone is worth approximately 3–8% CPM reduction versus a static multiplier that reacts to nothing. The primary mechanism is preventing the two most expensive failure modes: burning the daily budget in the first two hours (no bids for the remaining 22 hours) and leaving budget unspent at day end.

What the engine saves with win/loss feedback (backlog item 1 — nurl/lurl): When SSPs call back with clearing prices, the agent gains the market signal it needs for genuine first-price bid shading. Industry benchmarks from DSPs with ML-based shading (The Trade Desk, Criteo, Xandr) show 15–25% CPM reduction versus naive fixed-multiplier bidding. That is the ceiling this architecture is designed to reach.

Break-Even Analysis

Managed ad spend	5% CPM saving	15% CPM saving	Monthly engine cost	Break-even at
$10k/mo	$500	$1,500	$1,430	15%
$30k/mo	$1,500	$4,500	$1,430	5%
$100k/mo	$5,000	$15,000	$1,430	~1.4%
$1M/mo	$50,000	$150,000	$1,430	~0.14%
$10M/mo	$500,000	$1,500,000	$1,430*	<0.1%

*Multi-instance fleet required above ~2–3M RPS sustained; cost scales linearly with instances, value scales linearly with managed spend. The ratio improves at scale.

The engine becomes net-positive at approximately $30k/month managed ad spend under today's pacing-only optimization. With win/loss feedback and full bid shading, break-even drops to roughly $10k/month. Below those thresholds, a managed DSP service is likely cheaper. Above them, the margin widens quickly.
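The break-even column is simply monthly cost divided by managed spend. A small sketch with hypothetical function names, using the on-demand total of $1,430/month:

```rust
/// CPM saving (in percent of managed spend) needed to cover the engine's cost.
fn break_even_saving_pct(monthly_cost_usd: f64, managed_spend_usd: f64) -> f64 {
    monthly_cost_usd / managed_spend_usd * 100.0
}

/// Net monthly result in USD at a given saving percentage.
fn net_monthly_usd(managed_spend_usd: f64, saving_pct: f64, monthly_cost_usd: f64) -> f64 {
    managed_spend_usd * saving_pct / 100.0 - monthly_cost_usd
}
```

At $30k/month the break-even saving is about 4.8%, so a 5% saving nets roughly $70/month. At $10k/month the same $70 margin requires a 15% saving.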

Who Buys This

Independent DSPs and trading desks managing $1M–$500M in annual programmatic spend are the primary market. They currently pay for bid shading either via platform fees on managed DSPs (typically 8–15% of spend) or via in-house engineering teams maintaining static rule sets. This engine replaces the rule set with an agent-driven feedback loop at a fixed infrastructure cost.

Large advertisers that have brought programmatic in-house — CPG, retail, financial services — pay agency trading desk margins of 5–15% on managed spend to avoid building their own stack. At $5M/year managed spend, a 10% agency margin is $500k/year. The engine's all-in cost at that volume is under $20k/year.

Ad networks adding programmatic inventory to what was previously direct-sold. They need a bidding engine to participate in open auction without building from scratch. The AAMP API gives them a management surface; the agent handles strategy without a dedicated optimization engineer.

Ad tech infrastructure vendors building white-label DSP platforms. The engine is designed as an embeddable component — the aamp-protocol crate exposes the full state surface, and the agent is a background task that can be replaced or extended without touching the bid path.

What this engine is not suited for today: pure brand-awareness campaigns where impression count matters more than CPM efficiency, campaigns with sub-$10k/month managed spend (fixed costs dominate), and any buyer who needs guaranteed impression delivery (the engine bids but does not yet model win probability per SSP).

See BACKLOG.md for the prioritised feature backlog. Each task contains a self-contained implementation prompt.

The highest-priority unimplemented capability is win/loss feedback (Task 1 in the backlog): adding nurl/lurl to the Bid struct so SSPs can call back with auction clearing prices. Without this, the agent is optimizing against budget-pacing signals only. With it, the agent gains the market price signal needed for genuine first-price auction bid shading — the primary economic thesis of the project.
