
feat: per-handler async dispatch (sync/async per route + group, all engines × all protocols)#302

Merged
FumingPower3925 merged 14 commits into main from feat/per-handler-async-v1.4.12
May 30, 2026

Conversation

@FumingPower3925
Contributor

Closes #300.

Per-handler async dispatch: choose sync vs async per route/group instead of the global Config.AsyncHandlers all-or-nothing switch.

API

  • Route.Async(opt ...bool) / RouteGroup.Async(opt ...bool) — most-specific-wins (route > group > server default); sub-groups inherit; Route.Use preserves the flag.
  • Config.AsyncHandlers stays as the server-level default (back-compat; additive API).
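The precedence rule above can be sketched in isolation. This is a minimal model, not the celeris implementation: the `tri` type, `fromOpt`, and `resolve` are hypothetical names introduced here, and the only behavior taken from the description is that `.Async()` opts in, `.Async(false)` opts out, and the most specific setting wins.

```go
package main

import "fmt"

// tri models an optional bool: unset means "inherit from the next level up".
// Hypothetical type, introduced only for this sketch.
type tri int

const (
	unset tri = iota
	syncMode
	asyncMode
)

// fromOpt mirrors the variadic shape of Async(opt ...bool):
// no argument means async, an explicit false means sync.
func fromOpt(opt ...bool) tri {
	if len(opt) == 0 || opt[0] {
		return asyncMode
	}
	return syncMode
}

// resolve applies most-specific-wins: route, then group, then server default.
func resolve(route, group tri, serverDefault bool) bool {
	for _, t := range []tri{route, group} {
		if t != unset {
			return t == asyncMode
		}
	}
	return serverDefault
}

func main() {
	fmt.Println(resolve(unset, unset, false))             // false: server default
	fmt.Println(resolve(unset, fromOpt(), false))         // true: group opts in
	fmt.Println(resolve(fromOpt(false), fromOpt(), true)) // false: route override wins
}
```

Resolving once at registration time, as the router commit describes, keeps this lookup off the request path.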

Engines

  • iouring + epoll, H1 — per-request: sync routes run inline on the worker (no handoff), async routes promote the conn to its dispatch goroutine (ErrAsyncDispatch handoff, mirrors ErrUpgradeH2C). Pure-sync servers keep the inline fast path untouched (regression guard); pure-async servers promote on the first request.
  • iouring + epoll, H2 — per-stream: canRunInline consults the route flag — sync → inline on the event loop, async → worker pool. HasAsyncRoutes() fast-out (zero cost on pure-sync H2).
  • std — net/http goroutine-per-request; the flag is a documented no-op.

Tests

  • 11 router/API unit tests; 2 H2 processor tests; 4 ProcessH1 inline-dispatch conn tests; 4 std integration tests; cross-engine linux integration test (iouring + epoll × H1 + h2c × sync/async-default + concurrent).
  • All local tests -race clean; linux engines cross-vet clean.

Validation

Two pristine 24-cell × 2-arch probatorium nightlies on the full matrix (every refapp × engine × arch), with two refapps migrated to exercise both async directions (observability sync-default + .Async(); static_swagger_proxy async-default + .Async(false)):

  • Baseline (API + server-profile H1 + H2 per-stream): run 26667024025 — all bug oracles 0.
  • Phase B (+ H1 per-request worker-inline): run 26669468270 — all bug oracles 0, tier3 clean, properties clean.

Notes

  • WS/SSE detach routes must not be marked .Async(false) (detach is async by construction — documented).
  • WithEngine drivers consult the server-level AsyncHandlers (not per-route overrides) for auto-async — documented.

Add the registration-time API + router plumbing for per-handler async
selection:
- Route.Async(opt ...bool) / RouteGroup.Async(opt ...bool) with
  most-specific-wins precedence (route > group > server default).
- node + staticEntry carry a resolved `async` bool; find/search thread
  it; addRouteWithAsync resolves it against the server default
  (Config.AsyncHandlers). 3-arg addRoute kept as a wrapper (no churn).
- router.routeAsync(method,path) params-free resolver + hasAsyncRoutes()
  + asyncRouteCount for the engine to decide whether async infra is
  needed at all.
- stream.AsyncRouteResolver interface, implemented by routerAdapter, so
  the H2 processor can pick inline vs pooled dispatch per stream.
- Sub-groups inherit parent async; Route.Use preserves the async flag.

Pure router/API layer — no engine behavior change yet. 11 unit tests
cover precedence, sub-group inheritance, param routes, count accounting,
unmatched paths, and the resolver interface.

- server: when any route opts into async dispatch, OR it into the
  resource Config handed to the engine so the async dispatch
  infrastructure is wired up (an async DB route never runs inline on
  the worker / blocks it). Public Server.AsyncHandlers() unchanged.
- H2 processor: canRunInline now consults the AsyncRouteResolver — a
  stream whose matched route is async is forced onto the worker pool
  instead of running inline on the event loop. Gated by a
  HasAsyncRoutes() fast-out so pure-sync H2 servers pay zero added cost.
  Query string stripped to match router path semantics.
- routerAdapter implements RouteAsync + HasAsyncRoutes.
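A rough sketch of the per-stream decision described above, under stated assumptions: the method signatures on `asyncRouteResolver` are guesses at the shape of stream.AsyncRouteResolver, `tableResolver` is a hypothetical stand-in for routerAdapter, and the real canRunInline has additional gates that are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// asyncRouteResolver mirrors the two methods the commit message names.
// The exact signatures are assumptions.
type asyncRouteResolver interface {
	RouteAsync(method, path string) bool
	HasAsyncRoutes() bool
}

// tableResolver is a hypothetical resolver keyed by "METHOD path".
type tableResolver map[string]bool

func (t tableResolver) RouteAsync(method, path string) bool {
	return t[method+" "+path]
}

func (t tableResolver) HasAsyncRoutes() bool {
	for _, async := range t {
		if async {
			return true
		}
	}
	return false
}

// canRunInline reports whether a stream may run on the event loop.
func canRunInline(r asyncRouteResolver, method, rawPath string) bool {
	if r == nil || !r.HasAsyncRoutes() {
		return true // fast-out: pure-sync servers skip the route lookup
	}
	// Strip the query string so the lookup matches router path semantics.
	path, _, _ := strings.Cut(rawPath, "?")
	return !r.RouteAsync(method, path)
}

func main() {
	r := tableResolver{"GET /db": true, "GET /health": false}
	fmt.Println(canRunInline(r, "GET", "/health"))           // true: sync route
	fmt.Println(canRunInline(r, "GET", "/db?id=7"))          // false: async route
	fmt.Println(canRunInline(tableResolver{}, "GET", "/db")) // true: fast-out
}
```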

H1 dispatch is currently per-server (any async route → dispatch path
for all conns on iouring/epoll); the per-request H1 worker-inline
optimization is phase B. std is goroutine-per-request (per-route flag
is a documented no-op). H2 gets true per-stream inline-vs-pool.

Tests: 2 processor-level tests (-race) verify sync→inline, async→pool,
and the no-async-routes fast-out.

End-to-end coverage on the std engine (cross-platform; runs locally +
CI): mixed route surface (sync, .Async(), .Async(false) override,
param+async, group async, group override) returns correct responses on
both sync-default and async-default servers; concurrent hammering with
no response cross-talk; keep-alive conn reused across sync+async routes.

std runs every request on a net/http goroutine so .Async() is a no-op
there — these prove the per-route API is wired end-to-end and harmless.
The dispatch differential is covered by the H2 processor tests
(per-stream) and the probatorium cluster matrix (iouring/epoll, linux).

test/integration/per_handler_async_test.go (//go:build linux): starts a
real celeris.Server on iouring AND epoll with a mixed sync/async route
surface and verifies correct responses over both HTTP/1.1 and h2c, under
both sync-default and async-default servers, plus a concurrent
both-protocols hammer asserting zero response cross-talk. Runs on linux
CI + the cluster; complements the std integration tests (darwin/CI) and
the H2 processor unit tests.

Clarify that Config.AsyncHandlers is the server-level default and that
Route.Async / RouteGroup.Async override it per handler (route > group >
default), that H/2 honors the override per-stream, and that WithEngine
drivers consult the server-level flag (not per-route overrides) for
auto-async — so set AsyncHandlers=true when combining WithEngine drivers
with per-route async.

…phase B)

Adds the protocol-layer mechanism for the H1 per-request worker-inline
optimization, INERT until an engine opts in:
- H1State.InlineMode + H1State.RouteAsync. When the engine runs
  ProcessH1 inline on the worker (InlineMode=true) and the parsed
  request's route is async, ProcessH1 stops before the handler, stashes
  the request (+ pipelined bytes) in state.buffer, and returns
  ErrAsyncDispatch (mirrors the ErrUpgradeH2C handoff). The engine then
  hands the conn to its dispatch goroutine to re-run with
  InlineMode=false.
- ErrAsyncDispatch sentinel.

Defaults (InlineMode=false, RouteAsync=nil) make this a no-op — current
sync + per-conn-async behavior is unchanged. 4 unit tests cover
sync-runs-inline, async-bails+stashes+re-runs, pipelined sync-then-async
split, and no-bail-without-InlineMode. Engine wiring (iouring/epoll
promotion) lands separately on top of a clean baseline nightly.

Wire the ErrAsyncDispatch groundwork into iouring + epoll so a server
that mixes sync and async H1 routes runs SYNC routes inline on the
worker / event loop (no goroutine handoff) and ASYNC routes on the
per-conn dispatch goroutine — per request, not per server.

Mechanism (both engines):
- A fresh async-mode HTTP1 conn is no longer sent straight to the
  dispatch goroutine. The async-dispatch block now gates on
  cs.asyncPromoted; unpromoted conns fall through to the inline path.
- The inline ProcessH1 runs with H1State.InlineMode=true (RouteAsync
  wired at initProtocol from the routerAdapter). On the first async
  route ProcessH1 stops before the handler and returns ErrAsyncDispatch
  with the request stashed in the H1 buffer.
- The engine promotes the conn (sticky) and hands the stashed bytes
  (TakeBufferedBytes) to the dispatch goroutine; every later recv goes
  straight to the dispatch path. A preceding pipelined sync response
  already in cs.writeBuf is flushed in order by the dispatch goroutine
  (single writer — no concurrent-SEND hazard).
- Inline ProcessH1 that leaves partial state (buffered headers /
  accumulating body) also promotes, so the partial-state parse paths
  never run inline.
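The sticky promotion described in the mechanism list can be modelled as a small state machine. This is an illustrative sketch only: `connState`, `engineMetrics`, and `handleRequest` are hypothetical reductions, and buffer handoff and write ordering are left out.

```go
package main

import "fmt"

// connState is a hypothetical reduction of the per-connection state.
type connState struct {
	asyncPromoted bool
	inlineRuns    int
	dispatchRuns  int
}

// engineMetrics holds a cumulative promotion counter for the sketch.
type engineMetrics struct {
	asyncPromotedConns int
}

// handleRequest runs a request inline until the first async route is seen;
// from then on every request on the connection takes the dispatch path.
func (cs *connState) handleRequest(routeIsAsync bool, m *engineMetrics) {
	if !cs.asyncPromoted && routeIsAsync {
		cs.asyncPromoted = true // sticky for the life of the connection
		m.asyncPromotedConns++
	}
	if cs.asyncPromoted {
		cs.dispatchRuns++
		return
	}
	cs.inlineRuns++
}

func main() {
	var m engineMetrics
	cs := &connState{}
	// sync, sync, async, sync, async on one keep-alive connection.
	for _, async := range []bool{false, false, true, false, true} {
		cs.handleRequest(async, &m)
	}
	fmt.Println(cs.inlineRuns, cs.dispatchRuns, m.asyncPromotedConns) // 2 3 1
}
```

The fourth request is a sync route but still runs on the dispatch path, because promotion is never undone.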

Pure-sync servers (no async routes) keep the existing inline fast path
untouched (regression guard); pure-async servers promote on the first
request (one-time cost). Slowloris on inline async-mode conns is
covered by the tightened checkTimeouts sweep (OnHeaderDeadlineArmed
stays nil in async mode to preserve SINGLE_ISSUER). Cross-vets clean
for linux; validated end-to-end by the cluster nightly.

Per-handler Async()/AsyncHandlers plus the doPrepare auto-enable based
on hasAsyncRoutes() replace every prior reason to flip this env var at
run time. The fallback path was only ever in epoll (iouring never read it),
no driver or refapp set it, and it predated the Config.AsyncHandlers
field. Removing it eliminates the last "env can silently change
dispatch mode" surface and aligns epoll with iouring.

Three small fixes from the multi-angle review of PR #302:

1. stream.go resetAndPool: reset CachedRouteAsync alongside the other
   CachedRoute* fields so a recycled stream can't surface a stale
   per-route async decision under any future cache-hit path.

2. iouring/epoll: drop the per-engine private routeAsyncResolver
   interface and consume the public stream.AsyncRouteResolver
   directly. The public interface already requires both RouteAsync
   AND HasAsyncRoutes; gating the cs.h1State.RouteAsync wiring on
   HasAsyncRoutes() means pure-sync servers running with
   Config.AsyncHandlers=true skip the per-recv resolver call. Also
   removes the doc-attachment hazard where the interface declaration
   was splitting runAsyncHandler's godoc preamble from its function.

3. group.go: bring RouteGroup.Async godoc to parity with Route.Async
   — explicit precedence link, the "registration order matters"
   note, and the WS/SSE detach warning.

All cross-vets + race tests clean.

Resolves the entire review punch-list inline so v1.4.12 ships with the
complete per-handler async story rather than spreading the work across
two milestones. Changes:

API (#303)
- Add Route.Sync() / RouteGroup.Sync() paired methods. Reads more
  naturally at the call site than .Async(false) and is grep-friendly.
- Lift the WS/SSE detach safety note to the FIRST sentence of every
  godoc so IDE hover-cards surface it.
- Keep .Async(opt ...bool) intact (back-compat / single source of
  truth for the toggle accounting).
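One way to read "single source of truth for the toggle accounting" is that Sync() delegates to Async(false), so only one method touches the async route count. The sketch below assumes exactly that. The `router` and `route` types and the `handle` helper are hypothetical, and routes are assumed to start out sync.

```go
package main

import "fmt"

// router and route are hypothetical reductions used only for this sketch.
type router struct {
	asyncRouteCount int
}

type route struct {
	r     *router
	path  string
	async bool
}

func (r *router) handle(path string) *route {
	return &route{r: r, path: path}
}

// hasAsyncRoutes lets an engine decide whether async infrastructure is needed.
func (r *router) hasAsyncRoutes() bool { return r.asyncRouteCount > 0 }

// Async is the only method that mutates the flag and the count.
func (rt *route) Async(opt ...bool) *route {
	want := len(opt) == 0 || opt[0]
	if want != rt.async {
		if want {
			rt.r.asyncRouteCount++
		} else {
			rt.r.asyncRouteCount--
		}
		rt.async = want
	}
	return rt
}

// Sync is sugar for Async(false), so the accounting lives in one place.
func (rt *route) Sync() *route { return rt.Async(false) }

func main() {
	r := &router{}
	r.handle("/db").Async()
	r.handle("/health").Async().Sync() // toggled on, then back off
	fmt.Println(r.asyncRouteCount, r.hasAsyncRoutes()) // 1 true
}
```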

Engine metrics (#306 G3)
- Extend engine.EngineMetrics with AsyncRoutes (static, snapshotted
  at engine construction from routerAdapter.AsyncRouteCount) and
  AsyncPromotedConns (cumulative inline → dispatch promotions).
- Wire iouring + epoll counters through their metrics struct; bump
  at both promotion sites (ErrAsyncDispatch + HasPendingData).
- Adaptive Metrics() aggregates: AsyncRoutes mirrors a sub-engine
  (identical post-construction), AsyncPromotedConns sums both.

Adaptive scaler (#306 G1)
- adaptiveScalerSource.ActiveConns now sums BOTH sub-engines' active
  connections, not just the currently-active one. During a switch the
  old sub-engine continues to serve in-flight conns until clients
  close — the scaler must count them so desired-worker doesn't
  undershoot real CPU load.
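The scaler change amounts to summing across both sub-engines instead of reading only the active one. A minimal sketch, assuming each sub-engine exposes an atomic connection counter (the `subEngine` and `scalerSource` types are hypothetical):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subEngine is a hypothetical stand-in for one of the two adaptive sub-engines.
type subEngine struct {
	activeConns atomic.Int64
}

// scalerSource reports load to the scaler across both sub-engines.
type scalerSource struct {
	engines [2]*subEngine
}

// ActiveConns sums both sub-engines so that connections still draining on
// the previously active engine keep counting toward the desired worker count.
func (s *scalerSource) ActiveConns() int64 {
	var total int64
	for _, e := range s.engines {
		total += e.activeConns.Load()
	}
	return total
}

func main() {
	draining, active := &subEngine{}, &subEngine{}
	draining.activeConns.Store(40) // in-flight conns from before the switch
	active.activeConns.Store(3)    // conns accepted after the switch
	src := &scalerSource{engines: [2]*subEngine{draining, active}}
	fmt.Println(src.ActiveConns()) // 43
}
```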

iouring inline-flush-before-promote (#307 L1)
- handleRecv ErrAsyncDispatch branch now calls flushSend(cs) before
  promoteConnToAsync when cs.writeBuf holds an inline-handled response
  from a preceding pipelined sync request. flushSend self-gates on
  cs.sending / cs.zcNotifPending so the "one SEND in-flight per FD"
  invariant (PR #36) is preserved — when a SEND is already pending,
  it's a no-op and the dispatch goroutine's later flush picks up the
  bytes intact, preserving order. SQ pressure is non-fatal.
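The self-gating behavior can be sketched without any io_uring calls. This is an assumption-laden model: `connState` is reduced to three fields, the `submit` callback stands in for queueing a SEND, and completion handling (which would clear `sending`) is omitted.

```go
package main

import "fmt"

// connState is a hypothetical reduction of the per-connection send state.
type connState struct {
	writeBuf       []byte
	sending        bool // a SEND is already in flight
	zcNotifPending bool // a zero-copy notification is still outstanding
}

// flushSend submits at most one SEND per connection. It reports whether it
// submitted; when it returns false the bytes stay in writeBuf for a later flush.
func flushSend(cs *connState, submit func([]byte)) bool {
	if cs.sending || cs.zcNotifPending || len(cs.writeBuf) == 0 {
		return false // no-op: preserves the one-SEND-in-flight invariant
	}
	cs.sending = true
	submit(cs.writeBuf)
	return true
}

func main() {
	var sent [][]byte
	submit := func(b []byte) { sent = append(sent, b) }

	cs := &connState{writeBuf: []byte("HTTP/1.1 200 OK\r\n\r\n")}
	fmt.Println(flushSend(cs, submit)) // true: first flush submits
	fmt.Println(flushSend(cs, submit)) // false: a SEND is already pending
	fmt.Println(len(sent))             // 1
}
```

Because the second call leaves writeBuf untouched, a later flush from the dispatch goroutine still sees the bytes in their original order.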

Tests (#304)
- internal/conn/async_dispatch_test.go: 4 new cases:
  * pipelined async→sync (inverse order)
  * body-bearing async POST (zero-copy body × InlineMode interaction)
  * InlineMode=true / RouteAsync=nil graceful degradation
  * HasPendingData partial-headers promotion path
- protocol/h2/stream/processor_async_test.go: 3 new cases:
  * canRunInline gate ordering (EndStream / continuation /
    per-route-async — pins all four corners of the matrix)
  * streamRouteAsync strips ?query before resolution
  * streamRouteAsync gracefully handles missing :path / :method
- test/integration/per_handler_async_test.go: 2 new cases:
  * Engine-level asyncPromoted stickiness via metrics (iouring +
    epoll): sync conn doesn't promote; first async hit promotes
    once; sticky for the conn's life across many pipelined reqs.
  * Single-conn keep-alive strict response ordering on iouring +
    epoll across a 12-request sync/async pattern.

Docs (#305)
- README: Configuration row for AsyncHandlers + a "Async Handlers
  (per-route)" subsection with the v1.4.11 → v1.4.12 migration note.
- doc.go: new # Async Handlers section between Route Groups and
  Middleware, with code examples, precedence, engine matrix, and
  the WS/SSE safety warning.
- example_test.go: ExampleRoute_Async + ExampleRouteGroup_Async.

Validated: full module race-tests pass; cross-vets clean for linux;
all 9 new unit/integration test cases green.

Multi-agent review of every section against the codebase. Changes:

Highlights
- Drop dead goceleris.dev/benchmarks link.
- Qualify "Zero hot-path allocations" → "on the H1/H2 fast paths".
- Replace the benchmarks bullet with a continuously-validated bullet
  pointing at probatorium (gives the badges actual context).

Features
- Add engine-integrated WebSocket bullet (UpgradeWebSocket + Hub).
- Add per-handler async dispatch bullet (Route.Async / Route.Sync).
- "always-on metrics" → "on-by-default (opt out via DisableMetrics)".
- MaxRequestBodySize claim corrected — the bridge uses a fixed 100 MB
  cap, NOT the user-configured value.
- Add a top-level TLS note (cleartext-only on native engines).

Middleware table
- compress: add deflate (4th encoding, not just zstd/brotli/gzip).
- swagger: add ReDoc (3rd renderer alongside Swagger UI + Scalar).
- store: clarify the package is in-memory only; Redis/Postgres/
  memcached adapters live under session/ratelimit subpackages
  (the README was geographically wrong).

Configuration / Async Handlers
- Bridge body cap claim: drop the MaxRequestBodySize tie-in
  (it's actually a compile-time 100 MB constant in bridge.go).
- Rewrite the "single-allocation handoff" sentence: TakeBufferedBytes
  does allocate; the *sentinel* is zero-alloc but the handoff itself
  involves a heap copy + goroutine spawn. Sticky semantics now spelled
  out explicitly.
- H2 dispatch terminology corrected to "shared H2 worker pool
  (runtime.GOMAXPROCS*4 goroutines)".
- Add adaptive compat note (both sub-engines honor the per-route flag).
- Document the new AsyncRoutes / AsyncPromotedConns metrics via
  Server.EngineInfo().Metrics.

Feature Matrix
- Multishot recv: was wrong on kernel floor (6.0+ → 5.19+) AND on
  default — it requires CELERIS_IOURING_MULTISHOT_RECV=1 opt-in due
  to an aarch64 6.6.10 regression. Cell now reflects both.

Benchmarks → new Continuous Validation section
- Drop the second dead goceleris.dev/benchmarks reference; point at
  goceleris/loadgen + probatorium publish-results workflow.
- Add a Continuous Validation section describing PR tier / nightly /
  weekend soak so the probatorium badges have prose context.

Project Structure
- 29 → 36 middleware packages (recounted from middleware/ subdirs).
- Add missing top-level dirs: cmd/, driver/, validation/.
- observe/ line: "Metrics collector" → "Collector" (the actual type).
- celeristest/ line: surface NewContextT and With* options.

Requirements
- Clarify "Direct runtime dependencies" + add kernel version notes.

Folds dependabot #301 into the v1.4.12 PR so the whole release lands as
one unit (rather than racing two PRs through the merge queue). The bump
also lets go mod tidy drop two newly-unused indirect deps (kr/text and
go.yaml.in/yaml/v2 — yaml stays as a transitive of the new common
version at v2.4.4).

@FumingPower3925 FumingPower3925 merged commit 6980671 into main May 30, 2026
7 checks passed
@FumingPower3925 FumingPower3925 deleted the feat/per-handler-async-v1.4.12 branch May 30, 2026 17:03

Development

Successfully merging this pull request may close these issues.

Per-handler async dispatch (sync/async per route + group, all engines × all protocols incl. H2)

1 participant