feat: per-handler async dispatch (sync/async per route + group, all engines × all protocols) #302
Merged
Add the registration-time API + router plumbing for per-handler async selection:
- Route.Async(opt ...bool) / RouteGroup.Async(opt ...bool) with most-specific-wins precedence (route > group > server default).
- node + staticEntry carry a resolved `async` bool; find/search thread it; addRouteWithAsync resolves it against the server default (Config.AsyncHandlers). 3-arg addRoute kept as a wrapper (no churn).
- router.routeAsync(method,path) params-free resolver + hasAsyncRoutes() + asyncRouteCount for the engine to decide whether async infra is needed at all.
- stream.AsyncRouteResolver interface, implemented by routerAdapter, so the H2 processor can pick inline vs pooled dispatch per stream.
- Sub-groups inherit parent async; Route.Use preserves the async flag.

Pure router/API layer: no engine behavior change yet. 11 unit tests cover precedence, sub-group inheritance, param routes, count accounting, unmatched paths, and the resolver interface.
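The most-specific-wins rule (route > group > server default) can be sketched as a tri-state resolution. This is only an illustration under stated assumptions, not the router's code: `triState`, `fromOpt`, and `resolveAsync` are hypothetical names, and the sketch assumes that `Async()` with no argument means "async" while `Async(false)` opts out.

```go
package main

import "fmt"

// triState is a hypothetical flag: unset means "inherit from the next level".
type triState int

const (
	unset triState = iota
	forceSync
	forceAsync
)

// fromOpt mirrors the variadic Async(opt ...bool) shape (assumed semantics):
// no argument or true selects async, an explicit false selects sync.
func fromOpt(opt ...bool) triState {
	if len(opt) == 0 || opt[0] {
		return forceAsync
	}
	return forceSync
}

// resolveAsync applies route > group > server default and returns the
// resolved bool that a route node would carry.
func resolveAsync(route, group triState, serverDefault bool) bool {
	for _, s := range []triState{route, group} {
		switch s {
		case forceAsync:
			return true
		case forceSync:
			return false
		}
	}
	return serverDefault
}

func main() {
	// No override anywhere: the server default decides.
	fmt.Println(resolveAsync(unset, unset, false))
	// Group opts in, route is silent: the group decides.
	fmt.Println(resolveAsync(unset, fromOpt(), false))
	// Route opts out inside an async group on an async-default server.
	fmt.Println(resolveAsync(fromOpt(false), fromOpt(), true))
}
```

Resolving once at registration time, as the commit describes, means the lookup path only reads a plain bool per matched route.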
- server: when any route opts into async dispatch, OR it into the resource Config handed to the engine so the async dispatch infrastructure is wired up (an async DB route never runs inline on the worker / blocks it). Public Server.AsyncHandlers() unchanged.
- H2 processor: canRunInline now consults the AsyncRouteResolver, so a stream whose matched route is async is forced onto the worker pool instead of running inline on the event loop. Gated by a HasAsyncRoutes() fast-out so pure-sync H2 servers pay zero added cost. Query string stripped to match router path semantics.
- routerAdapter implements RouteAsync + HasAsyncRoutes.

H1 dispatch is currently per-server (any async route → dispatch path for all conns on iouring/epoll); the per-request H1 worker-inline optimization is phase B. std is goroutine-per-request (per-route flag is a documented no-op). H2 gets true per-stream inline-vs-pool.

Tests: 2 processor-level tests (-race) verify sync→inline, async→pool, and the no-async-routes fast-out.
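A rough sketch of the per-route gate described here: a fast-out when no route is async, then a lookup on the path with its query string stripped. The interface name, method signatures, and `tableResolver` are assumptions for illustration, and the sketch models only the per-route part of the decision; any other conditions the real `canRunInline` checks are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// asyncRouteResolver is a hypothetical stand-in for the resolver interface;
// the method names follow the commit text, the signatures are assumed.
type asyncRouteResolver interface {
	RouteAsync(method, path string) bool
	HasAsyncRoutes() bool
}

// tableResolver is a toy resolver keyed by "METHOD path".
type tableResolver map[string]bool

func (t tableResolver) RouteAsync(method, path string) bool {
	return t[method+" "+path]
}

func (t tableResolver) HasAsyncRoutes() bool {
	for _, async := range t {
		if async {
			return true
		}
	}
	return false
}

// stripQuery drops everything from the first '?' so the lookup key matches
// the path the router registered.
func stripQuery(path string) string {
	if i := strings.IndexByte(path, '?'); i >= 0 {
		return path[:i]
	}
	return path
}

// canRunInline reports whether a stream may run on the event loop.
func canRunInline(r asyncRouteResolver, method, path string) bool {
	if r == nil || !r.HasAsyncRoutes() {
		return true // fast-out: pure-sync servers skip the per-stream lookup
	}
	return !r.RouteAsync(method, stripQuery(path))
}

func main() {
	r := tableResolver{"GET /ping": false, "GET /db": true}
	fmt.Println(canRunInline(r, "GET", "/ping"))
	fmt.Println(canRunInline(r, "GET", "/db?id=7"))
}
```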
End-to-end coverage on the std engine (cross-platform; runs locally + CI): mixed route surface (sync, .Async(), .Async(false) override, param+async, group async, group override) returns correct responses on both sync-default and async-default servers; concurrent hammering with no response cross-talk; keep-alive conn reused across sync+async routes. std runs every request on a net/http goroutine so .Async() is a no-op there — these prove the per-route API is wired end-to-end and harmless. The dispatch differential is covered by the H2 processor tests (per-stream) and the probatorium cluster matrix (iouring/epoll, linux).
test/integration/per_handler_async_test.go (//go:build linux): starts a real celeris.Server on iouring AND epoll with a mixed sync/async route surface and verifies correct responses over both HTTP/1.1 and h2c, under both sync-default and async-default servers, plus a concurrent both-protocols hammer asserting zero response cross-talk. Runs on linux CI + the cluster; complements the std integration tests (darwin/CI) and the H2 processor unit tests.
Clarify that Config.AsyncHandlers is the server-level default and that Route.Async / RouteGroup.Async override it per handler (route > group > default), that H/2 honors the override per-stream, and that WithEngine drivers consult the server-level flag (not per-route overrides) for auto-async — so set AsyncHandlers=true when combining WithEngine drivers with per-route async.
…phase B)

Adds the protocol-layer mechanism for the H1 per-request worker-inline optimization, INERT until an engine opts in:
- H1State.InlineMode + H1State.RouteAsync. When the engine runs ProcessH1 inline on the worker (InlineMode=true) and the parsed request's route is async, ProcessH1 stops before the handler, stashes the request (+ pipelined bytes) in state.buffer, and returns ErrAsyncDispatch (mirrors the ErrUpgradeH2C handoff). The engine then hands the conn to its dispatch goroutine to re-run with InlineMode=false.
- ErrAsyncDispatch sentinel.

Defaults (InlineMode=false, RouteAsync=nil) make this a no-op: current sync + per-conn-async behavior is unchanged. 4 unit tests cover sync-runs-inline, async-bails+stashes+re-runs, pipelined sync-then-async split, and no-bail-without-InlineMode.

Engine wiring (iouring/epoll promotion) lands separately on top of a clean baseline nightly.
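The bail-and-re-run handoff can be sketched with a sentinel error. This is a simplified model under loud assumptions: the lowercase names (`h1State`, `processH1`, `errAsyncDispatch`) are hypothetical, requests arrive already parsed, and the stash holds request values rather than raw bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// errAsyncDispatch is a hypothetical sentinel modelled on the one described above.
var errAsyncDispatch = errors.New("async dispatch required")

type request struct{ method, path string }

// h1State is a simplified stand-in: inlineMode and routeAsync mirror the two
// fields named in the commit, buffer holds the unhandled remainder.
type h1State struct {
	inlineMode bool
	routeAsync func(method, path string) bool
	buffer     []request
}

// processH1 handles requests in order. In inline mode it stops before the
// first async route, stashes that request and everything pipelined after it,
// and returns the sentinel so the caller can re-run elsewhere.
func processH1(st *h1State, reqs []request, handle func(request)) error {
	for i, rq := range reqs {
		if st.inlineMode && st.routeAsync != nil && st.routeAsync(rq.method, rq.path) {
			st.buffer = append(st.buffer, reqs[i:]...)
			return errAsyncDispatch
		}
		handle(rq)
	}
	return nil
}

func main() {
	st := &h1State{
		inlineMode: true,
		routeAsync: func(_, path string) bool { return path == "/db" },
	}
	var handled []string
	handle := func(r request) { handled = append(handled, r.path) }

	// Pipelined sync-then-async: /ping runs inline, /db triggers the bail.
	err := processH1(st, []request{{"GET", "/ping"}, {"GET", "/db"}}, handle)
	if errors.Is(err, errAsyncDispatch) {
		// Stand-in for the dispatch goroutine: take the stash, re-run non-inline.
		pending := st.buffer
		st.buffer = nil
		st.inlineMode = false
		err = processH1(st, pending, handle)
	}
	fmt.Println(handled, err)
}
```

With the zero-value defaults (`inlineMode` false, `routeAsync` nil) the bail branch is never taken, which matches the "no-op until an engine opts in" property stated in the commit.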
Wire the ErrAsyncDispatch groundwork into iouring + epoll so a server that mixes sync and async H1 routes runs SYNC routes inline on the worker / event loop (no goroutine handoff) and ASYNC routes on the per-conn dispatch goroutine: per request, not per server.

Mechanism (both engines):
- A fresh async-mode HTTP1 conn is no longer sent straight to the dispatch goroutine. The async-dispatch block now gates on cs.asyncPromoted; unpromoted conns fall through to the inline path.
- The inline ProcessH1 runs with H1State.InlineMode=true (RouteAsync wired at initProtocol from the routerAdapter). On the first async route ProcessH1 stops before the handler and returns ErrAsyncDispatch with the request stashed in the H1 buffer.
- The engine promotes the conn (sticky) and hands the stashed bytes (TakeBufferedBytes) to the dispatch goroutine; every later recv goes straight to the dispatch path. A preceding pipelined sync response already in cs.writeBuf is flushed in order by the dispatch goroutine (single writer, so no concurrent-SEND hazard).
- Inline ProcessH1 that leaves partial state (buffered headers / accumulating body) also promotes, so the partial-state parse paths never run inline.

Pure-sync servers (no async routes) keep the existing inline fast path untouched (regression guard); pure-async servers promote on the first request (one-time cost). Slowloris on inline async-mode conns is covered by the tightened checkTimeouts sweep (OnHeaderDeadlineArmed stays nil in async mode to preserve SINGLE_ISSUER).

Cross-vets clean for linux; validated end-to-end by the cluster nightly.
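The sticky promotion can be modelled as a one-way flag on the connection state. The sketch below is illustrative only: `connState`, `onRecv`, and the `promotions` counter are hypothetical, and the real engines additionally promote on partial parse state, which is not modelled here.

```go
package main

import "fmt"

// connState is a hypothetical per-connection record; asyncPromoted mirrors
// the flag named in the commit, promotions is added here for illustration.
type connState struct {
	asyncPromoted bool
	promotions    int
}

// onRecv reports where a request on this connection would run.
// Once promoted, the connection never returns to the inline path.
func (cs *connState) onRecv(routeIsAsync bool) string {
	if cs.asyncPromoted {
		return "dispatch"
	}
	if routeIsAsync {
		cs.asyncPromoted = true // sticky for the life of the connection
		cs.promotions++
		return "dispatch"
	}
	return "inline"
}

func main() {
	cs := &connState{}
	// sync, async, sync on one keep-alive connection
	for _, async := range []bool{false, true, false} {
		fmt.Println(cs.onRecv(async))
	}
	fmt.Println("promotions:", cs.promotions)
}
```

Keeping the flag one-way means a connection has a single writer after promotion, which is the property the commit relies on to avoid concurrent sends.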
Per-handler Async()/AsyncHandlers + the doPrepare auto-enable based on hasAsyncRoutes() replaces every prior reason to flip this env at run time. The fallback path was only ever in epoll (iouring never read it), no driver or refapp set it, and it predated the Config.AsyncHandlers field. Removing it eliminates the last "env can silently change dispatch mode" surface and aligns epoll with iouring.
Three small fixes from the multi-angle review of PR #302:
1. stream.go resetAndPool: reset CachedRouteAsync alongside the other CachedRoute* fields so a recycled stream can't surface a stale per-route async decision under any future cache-hit path.
2. iouring/epoll: drop the per-engine private routeAsyncResolver interface and consume the public stream.AsyncRouteResolver directly. The public interface already requires both RouteAsync AND HasAsyncRoutes; gating the cs.h1State.RouteAsync wiring on HasAsyncRoutes() means pure-sync servers running with Config.AsyncHandlers=true skip the per-recv resolver call. Also removes the doc-attachment hazard where the interface declaration was splitting runAsyncHandler's godoc preamble from its function.
3. group.go: bring RouteGroup.Async godoc to parity with Route.Async: explicit precedence link, the "registration order matters" note, and the WS/SSE detach warning.

All cross-vets + race tests clean.
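The first fix is an instance of a general pooling rule: every cached field must be cleared before an object goes back to the pool. A minimal sketch with `sync.Pool`, using a hypothetical `stream` type whose field names only loosely echo the ones mentioned above:

```go
package main

import (
	"fmt"
	"sync"
)

// stream is a hypothetical pooled object with cached routing decisions.
type stream struct {
	cachedRoute      string
	cachedRouteValid bool
	cachedRouteAsync bool
}

var streamPool = sync.Pool{New: func() any { return new(stream) }}

// reset clears every cached field. Forgetting one (for example
// cachedRouteAsync) would let a recycled stream carry a stale decision.
func (s *stream) reset() {
	s.cachedRoute = ""
	s.cachedRouteValid = false
	s.cachedRouteAsync = false
}

// resetAndPool clears the stream and returns it to the pool.
func resetAndPool(s *stream) {
	s.reset()
	streamPool.Put(s)
}

func main() {
	s := streamPool.Get().(*stream)
	s.cachedRoute, s.cachedRouteValid, s.cachedRouteAsync = "/db", true, true
	resetAndPool(s)

	// Whether Get returns the recycled stream or a fresh one, it is clean.
	next := streamPool.Get().(*stream)
	fmt.Println(next.cachedRouteAsync)
}
```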
This was referenced May 30, 2026
Resolves the entire review punch-list inline so v1.4.12 ships with the complete per-handler async story rather than spreading the work across two milestones. Changes:

API (#303)
- Add Route.Sync() / RouteGroup.Sync() paired methods. Reads more naturally at the call site than .Async(false) and is grep-friendly.
- Lift the WS/SSE detach safety note to the FIRST sentence of every godoc so IDE hover-cards surface it.
- Keep .Async(opt ...bool) intact (back-compat / single source of truth for the toggle accounting).

Engine metrics (#306 G3)
- Extend engine.EngineMetrics with AsyncRoutes (static, snapshotted at engine construction from routerAdapter.AsyncRouteCount) and AsyncPromotedConns (cumulative inline → dispatch promotions).
- Wire iouring + epoll counters through their metrics struct; bump at both promotion sites (ErrAsyncDispatch + HasPendingData).
- Adaptive Metrics() aggregates: AsyncRoutes mirrors a sub-engine (identical post-construction), AsyncPromotedConns sums both.

Adaptive scaler (#306 G1)
- adaptiveScalerSource.ActiveConns now sums BOTH sub-engines' active connections, not just the currently-active one. During a switch the old sub-engine continues to serve in-flight conns until clients close; the scaler must count them so desired-worker doesn't undershoot real CPU load.

iouring inline-flush-before-promote (#307 L1)
- handleRecv ErrAsyncDispatch branch now calls flushSend(cs) before promoteConnToAsync when cs.writeBuf holds an inline-handled response from a preceding pipelined sync request. flushSend self-gates on cs.sending / cs.zcNotifPending so the "one SEND in-flight per FD" invariant (PR #36) is preserved: when a SEND is already pending, it's a no-op and the dispatch goroutine's later flush picks up the bytes intact, preserving order. SQ pressure is non-fatal.
Tests (#304)
- internal/conn/async_dispatch_test.go: 4 new cases:
  * pipelined async→sync (inverse order)
  * body-bearing async POST (zero-copy body × InlineMode interaction)
  * InlineMode=true / RouteAsync=nil graceful degradation
  * HasPendingData partial-headers promotion path
- protocol/h2/stream/processor_async_test.go: 3 new cases:
  * canRunInline gate ordering (EndStream / continuation / per-route-async; pins all four corners of the matrix)
  * streamRouteAsync strips ?query before resolution
  * streamRouteAsync gracefully handles missing :path / :method
- test/integration/per_handler_async_test.go: 2 new cases:
  * Engine-level asyncPromoted stickiness via metrics (iouring + epoll): sync conn doesn't promote; first async hit promotes once; sticky for the conn's life across many pipelined reqs.
  * Single-conn keep-alive strict response ordering on iouring + epoll across a 12-request sync/async pattern.

Docs (#305)
- README: Configuration row for AsyncHandlers + a "Async Handlers (per-route)" subsection with the v1.4.11 → v1.4.12 migration note.
- doc.go: new # Async Handlers section between Route Groups and Middleware, with code examples, precedence, engine matrix, and the WS/SSE safety warning.
- example_test.go: ExampleRoute_Async + ExampleRouteGroup_Async.

Validated: full module race-tests pass; cross-vets clean for linux; all 9 new unit/integration test cases green.
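The adaptive aggregation rules in this commit (mirror the static route count, sum the cumulative promotions, sum active connections across both sub-engines) can be sketched as follows. The struct, its field types, and the `aggregate` helper are assumptions for illustration; only the AsyncRoutes / AsyncPromotedConns / ActiveConns names come from the commit text.

```go
package main

import "fmt"

// engineMetrics is a hypothetical snapshot of one sub-engine.
type engineMetrics struct {
	AsyncRoutes        uint64 // static, fixed at engine construction
	AsyncPromotedConns uint64 // cumulative inline → dispatch promotions
	ActiveConns        int64  // currently open connections
}

// aggregate combines two sub-engine snapshots the way the commit describes:
// AsyncRoutes is identical on both, so one side is mirrored; the other two
// are summed so conns still draining on the old sub-engine are counted.
func aggregate(a, b engineMetrics) engineMetrics {
	return engineMetrics{
		AsyncRoutes:        a.AsyncRoutes,
		AsyncPromotedConns: a.AsyncPromotedConns + b.AsyncPromotedConns,
		ActiveConns:        a.ActiveConns + b.ActiveConns,
	}
}

func main() {
	active := engineMetrics{AsyncRoutes: 3, AsyncPromotedConns: 10, ActiveConns: 40}
	draining := engineMetrics{AsyncRoutes: 3, AsyncPromotedConns: 5, ActiveConns: 8}
	fmt.Printf("%+v\n", aggregate(active, draining))
}
```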
Multi-agent review of every section against the codebase. Changes:

Highlights
- Drop dead goceleris.dev/benchmarks link.
- Qualify "Zero hot-path allocations" → "on the H1/H2 fast paths".
- Replace the benchmarks bullet with a continuously-validated bullet pointing at probatorium (gives the badges actual context).

Features
- Add engine-integrated WebSocket bullet (UpgradeWebSocket + Hub).
- Add per-handler async dispatch bullet (Route.Async / Route.Sync).
- "always-on metrics" → "on-by-default (opt out via DisableMetrics)".
- MaxRequestBodySize claim corrected: the bridge uses a fixed 100 MB cap, NOT the user-configured value.
- Add a top-level TLS note (cleartext-only on native engines).

Middleware table
- compress: add deflate (4th encoding, not just zstd/brotli/gzip).
- swagger: add ReDoc (3rd renderer alongside Swagger UI + Scalar).
- store: clarify the package is in-memory only; Redis/Postgres/memcached adapters live under session/ratelimit subpackages (the README was geographically wrong).

Configuration / Async Handlers
- Bridge body cap claim: drop the MaxRequestBodySize tie-in (it's actually a compile-time 100 MB constant in bridge.go).
- Rewrite the "single-allocation handoff" sentence: TakeBufferedBytes does allocate; the *sentinel* is zero-alloc but the handoff itself involves a heap copy + goroutine spawn. Sticky semantics now spelled out explicitly.
- H2 dispatch terminology corrected to "shared H2 worker pool (runtime.GOMAXPROCS*4 goroutines)".
- Add adaptive compat note (both sub-engines honor the per-route flag).
- Document the new AsyncRoutes / AsyncPromotedConns metrics via Server.EngineInfo().Metrics.

Feature Matrix
- Multishot recv: was wrong on kernel floor (6.0+ → 5.19+) AND on default; it requires CELERIS_IOURING_MULTISHOT_RECV=1 opt-in due to an aarch64 6.6.10 regression. Cell now reflects both.
Benchmarks → new Continuous Validation section
- Drop the second dead goceleris.dev/benchmarks reference; point at goceleris/loadgen + probatorium publish-results workflow.
- Add a Continuous Validation section describing PR tier / nightly / weekend soak so the probatorium badges have prose context.

Project Structure
- 29 → 36 middleware packages (recounted from middleware/ subdirs).
- Add missing top-level dirs: cmd/, driver/, validation/.
- observe/ line: "Metrics collector" → "Collector" (the actual type).
- celeristest/ line: surface NewContextT and With* options.

Requirements
- Clarify "Direct runtime dependencies" + add kernel version notes.
Folds dependabot #301 into the v1.4.12 PR so the whole release lands as one unit (rather than racing two PRs through the merge queue). The bump also lets go mod tidy drop two newly-unused indirect deps (kr/text and go.yaml.in/yaml/v2 — yaml stays as a transitive of the new common version at v2.4.4).
Closes #300.
Per-handler async dispatch: choose sync vs async per route/group instead of the global Config.AsyncHandlers all-or-nothing switch.

API
Route.Async(opt ...bool) / RouteGroup.Async(opt ...bool): most-specific-wins (route > group > server default); sub-groups inherit; Route.Use preserves the flag.
Config.AsyncHandlers stays as the server-level default (back-compat; additive API).

Engines
ErrAsyncDispatch handoff, mirrors ErrUpgradeH2C). Pure-sync servers keep the inline fast path untouched (regression guard); pure-async servers promote on the first request.
canRunInline consults the route flag: sync → inline on the event loop, async → worker pool. HasAsyncRoutes() fast-out (zero cost on pure-sync H2).

Tests
-race clean; linux engines cross-vet clean.

Validation
Two pristine 24-cell × 2-arch probatorium nightlies on the full matrix (every refapp × engine × arch), with two refapps migrated to exercise both async directions (observability sync-default + .Async(); static_swagger_proxy async-default + .Async(false)):

Notes
.Async(false) (detach is async by construction; documented).
AsyncHandlers (not per-route overrides) for auto-async; documented.