M4 epic: production hardening + E2E + load testing by brunota20 · Pull Request #20 · nullislabs/shepherd

brunota20 · 2026-06-25T17:51:00Z

M4 epic — production hardening, E2E, load testing

Builds on #18 (M3 epic). M4 takes the SDK + modules from M3 and hardens the runtime around them so a single Shepherd instance is operable as a production daemon: bounded resource use, supervised crash recovery, structured observability, and an end-to-end testnet harness.

Core deliverable

Area	What landed
Resource limits + non-exhaustive SDK enums	`crates/nexum-engine` enforces per-module fuel + memory budgets; SDK error enums made `#[non_exhaustive]` so SDK bumps don't silently drop arms in module code (COW-1029, COW-1036).
Auto-restart + graceful shutdown + poison-pill	`Supervisor` restarts crashed module instances behind a backoff; SIGTERM/SIGINT drain in-flight work; a module that crashes N times in a row is parked rather than restart-spammed (COW-1033, COW-1072, COW-1032).
WS reconnect + structured logging + Prometheus metrics	`runtime/event_loop.rs` rebuilds WS subscriptions after disconnect; `tracing` + JSON output across the engine; `/metrics` scrape endpoint with per-module counters (COW-1071, COW-1035, COW-1034).
Multi-chain isolation	Per-chain hosts + per-chain local-store namespacing; one chain's RPC failure does not bleed into another (COW-1073).
E2E testnet integration + deployment guide	`docs/operations/e2e-testnet-runbook.md` walks a full Sepolia round-trip; `docs/operations/deployment-guide.md` covers operator setup, env vars, log shipping, scrape config (COW-1064, COW-1030).
AppData resolver via orderbook	`shepherd-sdk::cow` resolves app-data digests through the orderbook resolver endpoint, removing the IPFS hard dependency on submit paths (COW-1074).
Orderbook error envelope forwarded	`HostError.data` now carries the orderbook's structured error envelope so module code can decode `OrderPostErrorKind` without re-parsing JSON (COW-1075).
EthFlow ExcessiveValidTo + TWAP calldata helper	`modules/ethflow-watcher` downgrades the known-benign `ExcessiveValidTo` drop to `Info` to stop polluting Warn metrics; `scripts/_twap_calldata.py` produces a fresh-`t0` TWAP fixture for the e2e harness (COW-1076, COW-1077).
Load-test harness + load-gen calibration	`tools/load-gen` + `tools/orderbook-mock` + `scripts/load-run.sh` drive baseline / medium / saturation runs against an Anvil fork; reports under `docs/operations/load-reports/` capture engine-side latency, error counts, dispatch-correctness against 5×5 / 20×20 / 50×50 watch×event grids (COW-1079, COW-1080).
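The restart and poison-pill behaviour in the Auto-restart row can be sketched as a small per-module state machine. This is a minimal illustration, not the `Supervisor`'s code. `RestartPolicy`, `CrashDecision`, the doubling backoff, and the thresholds in the tests are all assumptions, because the PR does not state the actual backoff curve or the value of N.

```rust
use std::time::Duration;

/// What the supervisor should do after a module instance crashes (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CrashDecision {
    /// Restart the module after waiting this long.
    RestartAfter(Duration),
    /// Too many consecutive crashes: park the module instead of restarting it.
    Park,
}

/// Hypothetical per-module restart state; not the engine's real type.
pub struct RestartPolicy {
    base: Duration,
    max: Duration,
    poison_after: u32,
    consecutive_crashes: u32,
}

impl RestartPolicy {
    pub fn new(base: Duration, max: Duration, poison_after: u32) -> Self {
        Self { base, max, poison_after, consecutive_crashes: 0 }
    }

    /// Record a crash and decide between a delayed restart and parking.
    pub fn on_crash(&mut self) -> CrashDecision {
        self.consecutive_crashes += 1;
        if self.consecutive_crashes >= self.poison_after {
            return CrashDecision::Park;
        }
        // Delay doubles per consecutive crash (base, 2*base, 4*base, ...), capped at `max`.
        let exp = self.consecutive_crashes - 1;
        let factor = 1u32.checked_shl(exp).unwrap_or(u32::MAX);
        let delay = self.base.checked_mul(factor).unwrap_or(self.max).min(self.max);
        CrashDecision::RestartAfter(delay)
    }

    /// A successful dispatch resets the streak, so only crashes in a row count.
    pub fn on_success(&mut self) {
        self.consecutive_crashes = 0;
    }
}
```

Resetting the streak on success is what makes the threshold mean "N crashes in a row" rather than N crashes over the instance's lifetime.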

Validation

cargo fmt --all -- --check clean.
cargo clippy --workspace --all-targets -- -D warnings clean.
cargo test --workspace --all-features — full suite green; load-test harness has its own tools/ test pass.
WASM matrix (wasm32-wasip2 --release) green for all modules in CI.
Live Sepolia smoke: e2e-testnet runbook walked end-to-end including a real eth_sendRawTransaction placement; load-reports under docs/operations/load-reports/ document engine behaviour at 5×5 / 20×20 / 50×50 grids.
Rustdoc -D warnings CI gate still clean after the additions.

Note on diff scope

Builds on the M3 epic (#18) and ultimately on M2 (#17) + your in-flight M1 PRs. Until those merge, the diff visible here includes their contents. Once #17 + #18 + the M1 PRs land, this rebases clean to M4-only against nullislabs:main. Each upstream PR is independent against nullislabs:main so you can merge in any order without forcing cross-branch rebases — the natural review/merge order is M2 → M3 → M4 → M5, but the dependency is logical (build-on-top) rather than git-mechanical.

To focus the M4 review, the M4-specific paths are:

crates/nexum-engine/src/{runtime,supervisor,host}/** (resource limits, supervisor restart, WS reconnect, multi-chain isolation, error envelope forwarding)
crates/nexum-engine/src/engine_config.rs (Prometheus + log config)
modules/ethflow-watcher/src/strategy.rs (ExcessiveValidTo downgrade)
modules/twap-monitor/src/strategy.rs (AppData resolver consumption)
crates/shepherd-sdk/src/cow/*.rs (AppData resolver, structured error envelope)
docs/operations/{e2e-testnet-runbook,deployment-guide}.md
docs/operations/load-reports/*.md
scripts/{e2e-onchain,load-run,load-bootstrap,load-teardown,_twap_calldata}.{sh,py}
tools/{load-gen,orderbook-mock}/**

Closes COW-1029, COW-1030, COW-1032, COW-1033, COW-1034, COW-1035, COW-1036, COW-1064, COW-1071, COW-1072, COW-1073, COW-1074, COW-1075, COW-1076, COW-1077, COW-1079, COW-1080.

Linear milestone: M4 - production hardening + E2E. Companions: #17 (M2), #18 (M3).

brunota20 · 2026-06-25T19:19:51Z

Heads-up: `bleu:dev/m4-base` (the head of this PR) was force-pushed today as part of a linearisation pass on our M2->M5 base stack. Old head was `ced9132b`; new head is `20e5df6`.

The branch is now a strict descendant of the rebased `dev/m3-base` (head of upstream PR #18). The pre-rebase M4 branch had diverged from the bleu mainline before the M3 epic's BLEU-851/854/855 host-trait port landed; rebasing onto the post-port M3 lifts M4 onto the macro-abstracted module shape that flowed through into M5. Result: the file-level diff vs the prior M4 head is non-trivial (modules now use `shepherd_sdk::bind_host_via_wit_bindgen!()` instead of hand-rolled per-module `WitBindgenHost` impls), but every per-commit intent is preserved.

Notable conflict resolutions during the rebase:

COW-1029 (`#[non_exhaustive]` on `HostErrorKind`/`LogLevel`): the wildcard `_ => Internal` / `_ => Info` fallbacks that COW-1029 added to each module's hand-rolled adapter moved into the SDK macro (`crates/shepherd-sdk/src/wit_bindgen_macro.rs`) — single source of truth, semantics unchanged.
COW-1074 (cow-swap appData resolve): the `cow_api_request` forwarder that COW-1074 added per-module also moved into the macro's `CowApiHost` impl.
balance-tracker M4 compliance diff was authored against the pre-port single-file module; the equivalent fixes are already in M3's BLEU-851 port (now part of the new M4 base), so the M4 commit's diff on that file collapsed to a no-op — content preserved via the inherited M3 port.
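A minimal sketch of the wildcard-fallback pattern the COW-1029 resolution moved into the SDK macro. It assumes the adapter maps the SDK's `#[non_exhaustive]` `HostErrorKind` onto a wire-level enum standing in for the wit-bindgen-generated type. Every variant name except `Internal`, along with `WireErrorKind` and `to_wire`, is hypothetical; the real variant list is not given in this PR. Across a crate boundary, `#[non_exhaustive]` makes the `_` arm mandatory, so a variant added in a later SDK release still compiles and lands on the fallback.

```rust
/// SDK-side error kind. Crates that depend on the SDK must match it with a `_` arm.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostErrorKind {
    NotFound,
    Unsupported,
    Internal,
    RateLimited, // imagine this variant arrived in a later SDK release
}

/// Stand-in for the wit-bindgen-generated error kind on the guest boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireErrorKind {
    NotFound,
    Unsupported,
    Internal,
}

pub fn to_wire(kind: HostErrorKind) -> WireErrorKind {
    match kind {
        HostErrorKind::NotFound => WireErrorKind::NotFound,
        HostErrorKind::Unsupported => WireErrorKind::Unsupported,
        // Anything without a dedicated wire variant (including future ones) degrades to Internal.
        _ => WireErrorKind::Internal,
    }
}
```

The `LogLevel` adapter follows the same shape, with `_ => Info` as its fallback.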

No content lost. Per-commit history preserved. Author identities preserved. Tests not re-run from this rebase — please flag if anything looks suspicious and we'll run a clean CI pass.

brunota20 · 2026-06-25T21:18:47Z

Fix-pass on the linearised stack: rebased dev/m4-base onto the new dev/m3-base and added a compile/doc-fix commit. Now green across all 4 gates.

New tip: 2eed4fe (was 20e5df6)
Rebase: 32 commits replayed onto ec90663 with zero conflicts.
Added commit fix(shepherd-sdk): add cow_api_request to chainlink StubHost + appData doc link:
- crates/shepherd-sdk/src/chain/chainlink.rs:150 was missing the new cow_api_request method on its local StubHost<Result<String, HostError>> impl after the macro forwarder landed (COW-1074 conflict resolution). E0046 fixed by adding a parallel unreachable!("not used in this test") body.
- crates/shepherd-sdk/src/cow/app_data.rs:19 had an unresolved intra-doc link [EMPTY_APP_DATA_JSON] (symbol used only as cowprotocol::EMPTY_APP_DATA_JSON inside the function). Fully-qualified the path.
- crates/shepherd-sdk/src/wit_bindgen_macro.rs fmt drift: cargo fmt collapses the cow_api::request(...).map_err(convert_err) chain to one line.
4 gates verified green at the new tip on a fresh detached worktree.

brunota20 · 2026-06-25T22:41:28Z

Audit-driven fix pass landed on dev/m4-base.

Before: 2eed4fe
After: 0b946ea

Fixes applied (3 commits, on top of an M2 + M3 rebase chain):

refactor(sdk): replace [u8; 32] with B256 across the resolve_app_data public surface. Rubric Major #3 (protocol-ID newtype rule). twap-monitor and ethflow-watcher call sites drop the .0 reach-through.
refactor(cow-orderbook): extract DEFAULT_CHAINS const. Single source of truth for both Default::default() and from_config(). Audit duplication finding #1.
chore(engine.e2e.toml): replace em-dash with ASCII hyphen. Rubric Major #6.
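A sketch of the DEFAULT_CHAINS extraction. It assumes chain IDs are plain `u64`s and that `from_config` falls back to the defaults when the config lists no chains. The `OrderbookPool` type, its field, that fallback rule, and the specific IDs (mainnet 1, Gnosis Chain 100, Sepolia 11155111) are illustrative assumptions, not the crate's actual list or API.

```rust
/// Single source of truth for the chains served when nothing is configured.
/// (Illustrative IDs only: mainnet, Gnosis Chain, Sepolia.)
pub const DEFAULT_CHAINS: &[u64] = &[1, 100, 11_155_111];

/// Hypothetical stand-in for the orderbook client pool.
#[derive(Debug, PartialEq, Eq)]
pub struct OrderbookPool {
    chains: Vec<u64>,
}

impl Default for OrderbookPool {
    fn default() -> Self {
        Self { chains: DEFAULT_CHAINS.to_vec() }
    }
}

impl OrderbookPool {
    /// Build from config; an empty list falls back to the same const `Default` uses.
    pub fn from_config(configured: &[u64]) -> Self {
        if configured.is_empty() {
            Self::default()
        } else {
            Self { chains: configured.to_vec() }
        }
    }

    pub fn serves(&self, chain_id: u64) -> bool {
        self.chains.contains(&chain_id)
    }
}
```

Because both constructors read the same const, adding a chain to the defaults cannot leave one path behind.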

Audit reference: bruno-brain/wiki/projects/shepherd-audits/milestone-rubric-grant-audit-2026-06-25.md
Gates green: fmt, clippy -D warnings, cargo test --workspace --all-features, RUSTDOCFLAGS=-D warnings cargo doc.

Adds a `[workspace.dependencies]` table to the root manifest consolidating every dep used by 2+ crates across the full nullis-shepherd stack (anyhow, thiserror, tokio, futures, serde, serde_json, tracing, tracing-subscriber, strum, alloy-*, cowprotocol, reqwest, wit-bindgen, clap). Per-crate manifests inherit with `dep.workspace = true`, and may add features per call site via `dep = { workspace = true, features = ["extra"] }`. Single-consumer deps (wasmtime, toml, redb, getrandom, url, hex, axum, rand, ...) stay per-crate. Adds `[workspace.lints]` with light-touch defaults: `dbg_macro` and `todo` denied via clippy, `unsafe_op_in_unsafe_fn` warned via rust. `unsafe_code = deny` cannot be applied workspace-wide because every wit-bindgen guest module emits an `unsafe extern "C"` shim. Also pre-declares `auto_impl` and `derive_more` in the workspace deps table so future `Arc<dyn Trait>` boundaries and newtype-heavy crates can opt in without touching the root manifest. The version-drift failure mode (cowprotocol pinned to `1.0.0-alpha` in nexum-engine but `1.0.0-alpha.3` in shepherd-sdk, flagged in the 2026-06-25 audit) is now impossible by construction: every consumer inherits the single workspace pin. Audit reference: milestone-rubric-grant-audit-2026-06-25.md, judgment calls 1 + 3.

Replaces the `std::env::args().skip(1)` walker with a `#[derive(clap::Parser)]` struct so the engine binary picks up `--help`, `--version`, proper argument validation, and structured error reporting for free. The positional surface is preserved one-for-one (`<wasm-path> [manifest-path]`); behaviour for callers that already pass two paths is identical. Help output now documents each argument inline rather than hiding the usage in an anyhow message that only fires on misuse. `clap.workspace = true` consumes the workspace dep added in the prior commit; no new direct version pin in this crate. Audit reference: milestone-rubric-grant-audit-2026-06-25.md, judgment call 2.

…irection A casual reader of `07-rpc-namespace-design.md` hitting the file top or the "Method Allowlisting" subsection could plausibly walk away believing the 0.2 runtime gates RPC methods on a read-only allowlist and intercepts signing methods to delegate them to the identity backend. The shipped host implementation does neither: `chain::request` forwards any method string through to the configured alloy provider. Adds an explicit `Status: Future direction (0.3+ target)` callout both at the file top and right above the "Method Allowlisting" subsection so the gap between design intent and shipped behaviour is visible without having to scroll the design narrative end-to-end. Audit reference: milestone-rubric-grant-audit-2026-06-25.md, judgment call 4.

Adds the dependencies the 0.2 host backends need:
- cowprotocol (1.0.0-alpha) for the cow-api submission path (OrderBookApi, OrderCreation, OrderUid, Chain).
- alloy-provider / -rpc-client / -transport-ws / -primitives (1.5) for the chain JSON-RPC dispatch. The reqwest feature on alloy-provider engages connect_http; the pubsub/ws features back eth_subscribe-class methods.
- redb (2) for local-store. Same crate cowprotocol's own watch-tower picked, so the dep tree does not bifurcate when both are used in the same workspace.
- reqwest (0.12, rustls-tls), direct, so the import survives any future cowprotocol feature rearrangement.
- tracing + tracing-subscriber (env-filter + fmt), replacing the 0.1 eprintln! debug log so the engine can drop into a structured log pipeline without re-instrumenting every host call.
- thiserror (2) for typed error enums in each backend.
- tempfile + wiremock as dev-deps for the host backend tests.

Adds engine.example.toml documenting the [engine] state_dir + per-chain RPC URLs the chain backend reads at boot; data/ is now ignored so a local run does not leave the redb file in tree.

Replaces the 0.2 Unsupported stubs with working backends. Each capability lives in its own host submodule so the trait impls in main.rs stay thin (dispatch + project the backend's typed error onto HostError).

cow_api::submit_order
- Parses the guest's bytes as JSON cowprotocol::OrderCreation.
- Dispatches via cowprotocol::OrderBookApi::post_order.
- Returns the assigned OrderUid as a 0x-prefixed hex string.

cow_api::request
- REST passthrough. The base URL is whichever URL the pool's OrderBookApi client carries, so OrderBookApi::new_with_base_url overrides (staging, wiremock) flow through transparently.
- Method/path validated host-side; orderbook 4xx/5xx bodies are surfaced verbatim so the guest can decode {errorType,description}.

chain::request
- Raw JSON-RPC dispatch over an alloy DynProvider opened from engine.toml at boot. WebSocket URLs engage pubsub (eth_subscribe); HTTP URLs use the HTTP transport. Params are passed as serde_json::RawValue so alloy does not re-encode.
- request-batch falls back to per-call dispatch (same shape as the earlier stub but now backed by real RPC).

local_store
- redb file under engine_config.engine.state_dir.
- Single shared table. Per-module namespacing is enforced host-side via a [len:u8][module_name][raw_key] prefix on every key. list_keys strips the prefix before returning to the guest.

logging
- Routes through tracing::event! tagged with module=<namespace>.
- Engine boot installs an EnvFilter-based subscriber; RUST_LOG overrides the engine.toml log_level.

identity / remote-store / messaging / http stay at Unsupported per the 0.2 roadmap (keystore / Swarm / Waku land in 0.3).

Tests (14, all green):
- cow_orderbook: pool default chains, unknown-chain typing, REST GET passthrough, relative-path resolution, unknown-method rejection, submit_order round-trip; the last three run under wiremock so the full HTTP path is exercised without hitting api.cow.fi.
- provider_pool: empty pool surfaces UnknownChain.
- local_store: roundtrip, namespace isolation, delete, list_keys prefix-stripping, empty-namespace rejection.

End-to-end against modules/example: example.wasm loads under the new wiring, logs init + on_event through the tracing pipeline.
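A minimal sketch of the `[len:u8][module_name][raw_key]` namespacing. It assumes module names are non-empty (mirroring the empty-namespace rejection test) and at most 255 bytes so the length fits in the `u8`. `namespaced_key` and `strip_namespace` are hypothetical helper names; the real backend does this around its redb reads and writes.

```rust
/// Prefix `raw_key` with `[len:u8][module_name]` so modules sharing one table
/// cannot read or overwrite each other's entries.
pub fn namespaced_key(module: &str, raw_key: &[u8]) -> Result<Vec<u8>, String> {
    let name = module.as_bytes();
    if name.is_empty() {
        return Err("empty namespace".to_string());
    }
    let len = u8::try_from(name.len())
        .map_err(|_| "namespace longer than 255 bytes".to_string())?;
    let mut key = Vec::with_capacity(1 + name.len() + raw_key.len());
    key.push(len);
    key.extend_from_slice(name);
    key.extend_from_slice(raw_key);
    Ok(key)
}

/// Inverse used when listing keys: return the guest-visible key if `stored`
/// belongs to `module`, otherwise `None`.
pub fn strip_namespace<'a>(module: &str, stored: &'a [u8]) -> Option<&'a [u8]> {
    let name = module.as_bytes();
    let (&len, rest) = stored.split_first()?;
    if usize::from(len) != name.len() || !rest.starts_with(name) {
        return None;
    }
    Some(&rest[name.len()..])
}
```

The length byte is what prevents ambiguity. Without it, module `ab` with key `c` and module `a` with key `bc` would both store `abc`.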

… death (BLEU-813-817)

…ME (BLEU-820)

…er-pool, supervisor (BLEU-821)

…interfaces (BLEU-819)

…ed_crate_dependencies, drop redundant map_err)

PR #9 specific:
- main: warn + return when block/log streams end (WebSocket dropped)
- supervisor: simplify dispatch_block by extracting chain_id before move
- supervisor: temp_local_store returns (TempDir, LocalStore) instead of leaking
- README: correct engine.toml chain syntax to [chains.<id>] with rpc_url

Rebased from PR #8:
- local_store_redb: table.range() instead of iter() for O(matching) keys
- provider_pool: dedupe method clone on the success path
- main: hex_encode writes into the pre-allocated buffer
- cow_orderbook: drop blank line nit
- manifest: collapse nested if and use ? operator (clippy)
- alloy_rpc_client / alloy_transport(_ws) imports as _ to satisfy unused_crate_dependencies.
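A sketch of a `0x`-prefixed hex encoder that allocates its output buffer once, at the exact final size, in the spirit of the `hex_encode` change above. This is the same string shape `cow_api::submit_order` returns for an OrderUid. The function body is an assumption, not the engine's code.

```rust
/// Encode `bytes` as a lowercase, `0x`-prefixed hex string.
/// The output is exactly `2 + 2 * bytes.len()` chars, so capacity is reserved once up front
/// and the pushes below never reallocate.
pub fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(2 + bytes.len() * 2);
    out.push_str("0x");
    for &b in bytes {
        out.push(DIGITS[usize::from(b >> 4)] as char); // high nibble
        out.push(DIGITS[usize::from(b & 0x0f)] as char); // low nibble
    }
    out
}
```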

Move the manifest.rs monolith into a directory module with four focused submodules (types, load, capabilities, error). Includes the Subscription enum and the four PR #9 tests for subscription parsing. Behaviour unchanged - pure code motion.

main.rs went from 739 lines of mixed bootstrap + 8 Host trait impls + CLI parser + event loop to ~125 lines of pure orchestration. New layout:
- bindings.rs: wasmtime::component::bindgen!() moved out so other modules can name the generated types.
- cli.rs: Cli struct + manual parser.
- host/state.rs: HostState + WasiView impl.
- host/error.rs: unimplemented / internal_error / hex_encode helpers.
- host/impls/{chain,cow_api,identity,local_store,remote_store,messaging,logging,clock,random,http,types}.rs: one Host trait impl per file.
- runtime/limits.rs: DEFAULT_FUEL_PER_EVENT + DEFAULT_MEMORY_LIMIT.
- runtime/event_loop.rs: open_block_streams, open_log_streams, run, wait_for_shutdown_signal, TaggedBlockStream, TaggedLogStream.

Adding a new capability is now a single new file under host/impls/ rather than a 60-80 line diff in main.rs.

local_store_redb.rs was 89% tests, cow_orderbook.rs was 60%, and supervisor.rs was 32% (205 lines absolute). Promote each to a directory module with the test suite living in a sibling tests.rs so impl-side diffs stop competing with test churn for attention.

Wires the engine config, runbook, and report template for the 4-6 h E2E run on Sepolia with all 5 modules dispatched simultaneously. This is the integration step between unit-test coverage (MockHost, per-module strategy tests) and the COW-1031 7-day soak; the soak validates stability, this validates correctness in a live dispatch context.

## What this PR ships (scaffold, not run)

- `engine.e2e.toml`: unified Sepolia config loading all 5 modules (twap-monitor + ethflow-watcher + price-alert + balance-tracker + stop-loss), separate `state_dir = ./data/e2e`, Prometheus `/metrics` enabled on 127.0.0.1:9100. Operator swaps in their Alchemy/Infura WS URL before launching the run.
- `justfile` targets `build-e2e` + `run-e2e`. `build-e2e` reuses `build-m2 + build-m3` so the 5 wasm artefacts are produced in one go; `run-e2e` boots the engine pointed at `engine.e2e.toml` (no `--pretty-logs`, so production-shape JSON logs are emitted, ready for jq mining).
- `docs/operations/e2e-testnet-runbook.md`: full operator runbook mirroring the M2 + M3 shape. Sections cover RPC selection, on-chain prep (test EOA + Safe + stop-loss pre-sign), boot sequence + expected log shape, the three on-chain triggers that satisfy the per-module terminal-state markers, metrics capture, red flags to watch, and report filing. Acceptance bar from COW-1064 reproduced verbatim.
- `docs/operations/e2e-reports/e2e-report.template.md`: empty report skeleton the operator copies to `e2e-report-YYYY-MM-DD.md` at the start of each run and fills in as the run progresses. Sections: run metadata, chain coverage, on-chain actions submitted, per-module terminal-state markers, error-count deltas from Prometheus, anomalies, acceptance checklist, sign-off.

## What this PR explicitly does NOT do

The 4-6 h run itself is operator-driven and cannot be exercised from CI:

1. Real Sepolia RPC keys (a rate-limited public node will not survive a multi-hour run with 4+ eth_call per block).
2. Funded test EOA + ComposableCoW Safe access to submit a real conditional order (twap-monitor's only path to a `submitted:` marker).
3. EthFlow swap from a real EOA on Sepolia (ethflow-watcher's only path to a `submitted:` marker).
4. `setPreSignature` + sell-token allowance from the stop-loss `owner` EOA (stop-loss's only path to a `submitted:` marker that is not a typed `TransferSimulationFailed` warn).
5. 4-6 h wall clock + metrics-start.txt / metrics-end.txt capture.

The runbook is unambiguous about what each step requires; the report template's section 8 is the gating sign-off for COW-1031 (7-day soak).

## Smoke-validation done before commit

Booted `engine.e2e.toml` end-to-end against live Sepolia for 60+ s (kill -INT-style early shutdown):

```
INFO supervisor ready modules=5 chains=1
INFO log subscription open module=twap-monitor chain_id=11155111
INFO block subscription open chain_id=11155111
INFO log subscription open module=ethflow-watcher chain_id=11155111
DEBUG dispatch ok module=twap-monitor block_number=11088259 latency_ms=1
WARN price-alert: TRIGGERED answer=168110190000 threshold=250000000000 (Below)
DEBUG dispatch ok module=balance-tracker block_number=11088259 latency_ms=271
WARN stop-loss retry on next block (0): orderbook error (TransferSimulationFailed): sell token cannot be transferred
DEBUG dispatch ok module=stop-loss block_number=11088259 latency_ms=1802
```

This proves: 5/5 modules init successfully, both log subscriptions + the block subscription open, the dispatch loop ticks against real Sepolia blocks, every module that has a block subscription dispatches on every block, and the real RPC + Chainlink decode + cow-api submit path is exercised within seconds. The remaining acceptance bar (terminal markers on twap-monitor + ethflow-watcher, 1500-block run, 0-ERROR supervisor log) only the operator can produce.

## Workspace impact

- No production-code changes (this is pure ops scaffolding).
- `cargo fmt --all --check` clean.
- `cargo build --target wasm32-wasip2 --release` produces all 5 module artefacts named exactly as `engine.e2e.toml` references them (twap_monitor.wasm, ethflow_watcher.wasm, price_alert.wasm, balance_tracker.wasm, stop_loss.wasm).

Linear: COW-1064. Tenth M4 issue landed; stacks on #43 (COW-1073).

Adds `docs/production.md`, the operator handbook the production-hardening milestone has been pointing at since M2. Sister doc to `docs/06-production-hardening.md`: the existing file is the architecture / design rationale (resource model, restart policy, RPC resilience, logging + metrics design); this new one is the concrete operator handbook (unit files, backup recipes, alert rules, runbook procedures). Cross-referenced both ways.

## Sections

1. **Pre-flight checklist** - every box you need ticked before the first start: release-mode binary, persistent state dir, metrics on loopback, paid RPC, Prometheus + log pipeline, on-call runbook reference.
2. **systemd unit** - full `/etc/systemd/system/shepherd.service` with: dedicated `shepherd` user, SIGINT for graceful shutdown (30 s timeout, covering the COW-1072 last-block persistence path), `NoNewPrivileges`/`ProtectSystem=strict`, 2 G memory cap (defence in depth on top of wasmtime's 64 MiB / module), restart-on-failure with 5 s backoff. Install recipe + journalctl tail snippet.
3. **Docker Compose** - interim Dockerfile (multi-stage Rust build + Debian slim runtime, non-root, EXPOSE 9100, tini PID 1) + compose stack with bundled Prometheus, host loopback port mapping only, `stop_signal: SIGINT`, `stop_grace_period: 30s`, /metrics-based healthcheck. Marked interim because the official Dockerfile is a separate tracking issue.
4. **redb backup** - three operationally-supported paths: cold backup (systemctl stop + cp + start; byte-identical on graceful shutdown), hot backup (SIGSTOP + cp + SIGCONT pattern, safe because the on-disk format is consistent at any commit boundary), restore + `Database::check_integrity()`. Honest about the redb 2.6 surface: no in-process snapshot API today; flagged the roadmap. Retention policy: 7 daily / 4 weekly / 12 monthly = 23 copies, ~2.25 GiB (2,300 MiB) on a 100 MiB store.
5. **Logs** - JSON shape on stdout, two-tier retention model (7 d hot full debug, 90 d cold INFO-only on S3 Glacier), Vector config sample for journald -> Loki/hot + journald -> S3/cold split. Sizing estimate based on the E2E run shape (5 modules × 1 dispatch/12 s ≈ 200 MiB/wk INFO+DEBUG combined).
6. **RPC selection** - provider plan recommendations (Alchemy/Infura/QuickNode tiers), capacity sizing per chain (1 block sub + N log subs + M `eth_call`/block where M grows with TWAP active orders), why public nodes are non-starters.
7. **Metrics + scraping** - complete metric surface table from `grep metrics::counter/histogram/gauge`: `shepherd_event_latency_seconds`, `shepherd_module_errors_total`, `shepherd_module_restarts_total`, `shepherd_module_poisoned`, `shepherd_chain_request_total`, `shepherd_cow_api_submit_total`, `shepherd_stream_reconnects_total`. Label set verified against the source. 15 s scrape interval recommended.
8. **Workload-class tuning** - light indexer / TWAP-style polling / multi-chain swarm classes with concrete fuel + memory numbers. Honest that the limits are compile-time constants today and per-module overrides via `[engine.limits]` are a 0.3 follow-up; the operator can change `runtime/limits.rs` constants and rebuild, or ensure load fits within defaults.
9. **Alert rules** - full `prometheus-rules.yml` covering the seven alerts that map to the metric surface:
   - `ShepherdModulePoisoned` (page): production module quarantined, needs operator action.
   - `ShepherdModuleTraps` (ticket): pre-poison signal.
   - `ShepherdRpcErrorRate` (ticket): > 5% RPC errors.
   - `ShepherdReconnectStorm` (ticket): WS flapping.
   - `ShepherdCowApiErrorRate` (ticket): > 20% orderbook errors over 15 min.
   - `ShepherdDispatchLatency` (ticket): p95 > 5 s sustained.
   - `ShepherdDown` (page): engine absent for 2 min.
10. **Operational runbook** - five common tasks: tail-per-module, reset poisoned module, add module to running deploy, inspect local-store, bump log level. Each task carries a concrete shell snippet.
11. **Pre-upgrade checklist** - CHANGELOG read, cold backup, stage binary, validate `supervisor ready modules=N chains=M`, swap binary, restart, watch 5 min.
12. **References** - links to the architecture doc, ADRs, runbooks, sister Linear issues.

## Drive-by fix

The COW-1064 e2e report template (committed in PR #44) had two metric-label mistakes I caught while writing the metric surface table in section 7:
- `result="ok|err"` -> `outcome="ok|err"` (the actual label name in `cow_api.rs` + `chain.rs`).
- `reason="trap"` -> `error_kind="trap"` (the actual label name in `supervisor.rs`).

Both labels appear in 4 places in the template (section 5 metric delta table + section 7 acceptance checklist). Fixed in-place rather than as a separate PR.

## Workspace impact

- No code changes.
- No new build dependencies.
- `cargo fmt --all --check` clean.

Linear: COW-1030. Eleventh M4 issue landed; stacks on #44 (COW-1064).
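The backup and log sizing figures in the handbook reduce to simple arithmetic, sketched here so they can be re-checked when the inputs change. This assumes every retained backup is a full copy of the store and that each module dispatches once per 12 s block. The function names are illustrative only.

```rust
/// Number of backup copies kept by a daily/weekly/monthly rotation.
pub fn retained_copies(daily: u64, weekly: u64, monthly: u64) -> u64 {
    daily + weekly + monthly
}

/// Backup footprint in MiB if every retained copy is a full store snapshot.
pub fn retention_mib(store_mib: u64, daily: u64, weekly: u64, monthly: u64) -> u64 {
    store_mib * retained_copies(daily, weekly, monthly)
}

/// Module dispatches per week at one dispatch per module per block.
pub fn dispatches_per_week(modules: u64, block_time_secs: u64) -> u64 {
    const SECS_PER_WEEK: u64 = 7 * 24 * 60 * 60; // 604,800 s
    modules * (SECS_PER_WEEK / block_time_secs)
}
```

2,300 MiB is about 2.25 GiB. Spreading the ~200 MiB/wk log estimate over 252,000 weekly dispatches works out to roughly 830 bytes of log per dispatch, which is a useful sanity check when log verbosity changes.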

…-1064 dry run

Reconfigures the M3 example modules' manifests to the pinned identities for the 2026-06-18 COW-1064 E2E dry run (Bruno's test EOA + Safe on Sepolia) and adds a `docs/operations/e2e-cow-1064-prep.md` companion to the runbook that captures every copy-paste-able value the operator needs to drive the on-chain side of the run without re-deriving any UID, address, or calldata.

## Module config pinning

`modules/examples/stop-loss/module.toml`:
- owner -> 0x7bF140727D27ea64b607E042f1225680B40ECa6A (test EOA)
- sell_token -> WETH9 Sepolia (was a mainnet KNC address, a bug that would have failed the orderbook accept regardless)
- buy_token -> COW Sepolia (verified on-chain: name="CoW Protocol Token", symbol="COW", decimals=18)
- sell_amount -> 0.005 WETH (fits 0.01 WETH wrap budget)
- buy_amount -> 20 COW (conservative quote)
- trigger_price -> $2000 (above the Sepolia Chainlink mocked answer ~$1681 so the strategy fires on the first block)

`modules/examples/balance-tracker/module.toml`:
- addresses -> EOA + Safe (was the hardhat default accounts)
- change_threshold -> 0.001 ETH (was 0.1; lower so the small E2E gas-side transfers show as Warn diffs)

## OrderUid pinning + regression test

`modules/examples/stop-loss/src/strategy.rs` gains `cow_1064_e2e_settings_yield_expected_uid`: an integration test that constructs `Settings` from the exact same constants as the new manifest and asserts the resulting `build_creation` UID against:

    0xc2b9cb4ea1ee5a86d8049ac09d8f494bf04cca0a68407285f31e2e6379800be8
    7bf140727d27ea64b607e042f1225680b40eca6a
    ffffffff

(orderDigest || owner || validTo per packOrderUidParams.) If anything drifts (manifest values, EIP-712 type-hash, domain separator), the test fires before the run starts, not during the run.

## Run-prep punch list

`docs/operations/e2e-cow-1064-prep.md` (~ 280 lines):

1. **Pinned identities table** - every address the runbook references (EOA, Safe, ComposableCoW, TWAP handler, EthFlow, GPv2Settlement, GPv2VaultRelayer, WETH, COW token, domain separator). All verified via `eth_getCode` on Sepolia before commit.
2. **Per-module config pinning** - stop-loss + balance-tracker effective values in table form.
3. **OrderUid decomposition** - orderDigest (32) + owner (20) + validTo (4) breakdown so an operator reading `setPreSignature` calldata can sanity-check the UID without redoing the EIP-712 math.
4. **Four on-chain actions** - each as a numbered step with the exact contract + function + arguments + Etherscan write-UI URL:
   - Action 1: wrap 0.01 ETH -> 0.01 WETH9 (optional, only for `submitted:` path; `backoff:` works without).
   - Action 2: setPreSignature + WETH allowance to GPv2VaultRelayer (optional, paired with action 1).
   - Action 3: TWAP create() via Safe TX Builder; the full 516-byte calldata pinned verbatim (selector 0x6bfae1ca + tuple-encoded ConditionalOrderParams + dispatch=true).
   - Action 4: EthFlow swap via cow-swap UI on Sepolia (UI-driven for the quote endpoint hit; calldata fallback link if UI flakes).
5. **Validation snippets** - `cast` invocations to check EOA + Safe balances, WETH balance, allowance, `preSignature(bytes)` lookup, and a `journalctl + jq` one-liner that tails per-module terminal markers in real time.
6. **Re-derivation recipes** - Python + `cargo test` commands to regenerate every pinned value if config drift ever forces a re-run with different identities.
7. **Per-run acceptance checklist** - 9 box-checks that double-pin section 7 of the e2e-report template, scoped to THIS specific run.

## Workspace impact

- `cargo test -p stop-loss --lib` -> 8 passed (was 7; +1 for the new pinning test).
- `cargo fmt --all --check` clean.
- No production-code changes outside the test module.

Linear: COW-1064. Twelfth M4 deliverable; stacks on #45.
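A sketch of the 56-byte OrderUid split used for the decomposition step above: orderDigest (32 bytes), then owner (20), then validTo (4, big-endian). `OrderUidParts` and `split_order_uid` are hypothetical names. The code only slices bytes; it does not recompute the EIP-712 digest.

```rust
/// The three fields packed into a CoW Protocol OrderUid.
#[derive(Debug, PartialEq, Eq)]
pub struct OrderUidParts {
    pub order_digest: [u8; 32],
    pub owner: [u8; 20],
    pub valid_to: u32,
}

/// Decode an optionally `0x`-prefixed hex string; `None` on odd length or bad digits.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    let s = s.strip_prefix("0x").unwrap_or(s);
    if s.len() % 2 != 0 {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(s.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Split `orderDigest (32) || owner (20) || validTo (4, big-endian)`.
pub fn split_order_uid(uid_hex: &str) -> Option<OrderUidParts> {
    let bytes = decode_hex(uid_hex)?;
    if bytes.len() != 56 {
        return None;
    }
    Some(OrderUidParts {
        order_digest: bytes[0..32].try_into().ok()?,
        owner: bytes[32..52].try_into().ok()?,
        valid_to: u32::from_be_bytes(bytes[52..56].try_into().ok()?),
    })
}
```

Fed the pinned UID with its three parts concatenated, this yields the test EOA as `owner` and `valid_to == u32::MAX` (the trailing `ffffffff`).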

Three-step automation that wraps the COW-1064 runbook + prep punch list into shell scripts. Operator workflow collapses to:

    cp scripts/env-template scripts/.env && $EDITOR scripts/.env
    scripts/e2e-run.sh
    scripts/e2e-onchain.sh
    ## … 4-6 h …
    scripts/e2e-finish.sh

Secrets stay on disk. `scripts/.env` is gitignored; the engine config with embedded RPC URL is rendered into a gitignored `engine.e2e.local.toml` at boot time; the PK is read from `scripts/.env` by `cast send` and never echoed.

## Files

- `scripts/env-template` (committed): every variable the scripts read, with comments. Operator copies it to `scripts/.env` and fills it in.
- `scripts/lib.sh`: shared bash helpers (`log/warn/die`, `load_env`, `render_engine_config`, `state_value`) plus the pinned address constants (EOA, Safe, ComposableCoW, TWAP handler, EthFlow, GPv2Settlement, GPv2VaultRelayer, WETH, COW, expected OrderUid).
- `scripts/e2e-run.sh`: renders the engine config, cleans data/e2e, builds 5 modules + engine in release, launches via nohup, waits ≤ 60 s for `supervisor ready modules=5 chains=1`, and snapshots `metrics-start-<ts>.txt`. Persists PID + log path + start-ISO into `scripts/.state`.
- `scripts/e2e-onchain.sh`: derives the EOA from `OPERATOR_PRIVATE_KEY`, asserts it matches the pinned test EOA, and asserts balance ≥ 0.02 ETH; then `cast send`s:
  1. `ComposableCoW.create()` with the 516-byte pinned TWAP calldata → `ConditionalOrderCreated` → twap-monitor `watch:`.
  2. `EthFlow.createOrder()` with the tuple built from the cow.fi `/api/v1/quote` response (via `_ethflow_quote.py`) → `OrderPlacement` → ethflow-watcher `submitted:`.

  If `RUN_OPTIONAL_PRESIGN=1`, it also runs WETH wrap + setPreSignature + GPv2VaultRelayer approval (for stop-loss on-chain settlement; the `submitted:{uid}` marker is produced regardless). Each tx hash is appended to `scripts/.state` as `TX_<KIND>=<hash>`.
- `scripts/_ethflow_quote.py`: small Python helper that POSTs to cow.fi Sepolia, gets feeAmount + quoteId + validTo + buyAmount, ABI-encodes the `EthFlowOrder.Data` tuple, and prints the calldata + msg.value for the shell script to consume.
- `scripts/e2e-finish.sh`: snapshots `metrics-end-<ts>.txt`, sends SIGINT, waits ≤ 30 s for `graceful shutdown complete` in the log (COW-1072 path), escalates to SIGKILL after 30 s, then invokes the report generator.
- `scripts/e2e-report-gen.sh`: parses the JSON-formatted engine log + metrics snapshots + state file into the e2e-report template's 9 sections. Auto-derives chain delta, per-module first marker, every `shepherd_*` counter / histogram delta, ERROR/trapped/poisoned tallies, and the per-row acceptance checklist (block-delta ≥ 1500, all-5-markers, zero traps, zero poison, zero ERROR, TWAP+EthFlow txs present). Writes `docs/operations/e2e-reports/e2e-report-<date>.md` ready for operator review.
- `scripts/README.md`: one-time setup, run sequence, troubleshooting table, re-run recipe.

## .gitignore additions

```
# engine config rendered with embedded RPC key
*.local.toml
# run-state cache (PIDs, paths, tx hashes)
scripts/.state
# operator secrets (redundant with the existing .env / .env.* rules but explicit)
scripts/.env
docs/operations/e2e-reports/engine-*.log
docs/operations/e2e-reports/metrics-*.txt
```

The auto-generated `e2e-report-<date>.md` is NOT gitignored. The operator reviews it and commits it manually with `git add -f`, because the report belongs in history and the raw log + metrics dumps don't.

## Why a separate render step

`engine.e2e.toml` is committed with a public-placeholder RPC URL (`wss://ethereum-sepolia-rpc.publicnode.com`). `e2e-run.sh` substitutes `RPC_URL_SEPOLIA` from `.env` into a local file `engine.e2e.local.toml` (gitignored via `*.local.toml`) and points the engine at the local file. This means:

- No secret ever lands in `git diff`.
- The committed config still boots cleanly (against the public endpoint) for anyone cloning the repo who doesn't have a paid RPC key.
- The render step is idempotent: re-running `e2e-run.sh` always overwrites the local file.

## Verification

- `bash -n` syntax check on all 5 shell scripts: clean.
- `python3 -c "ast.parse(...)"` on `_ethflow_quote.py`: clean.
- `render_engine_config` smoke: produced `engine.e2e.local.toml` with the `rpc_url` line correctly substituted; diff showed exactly one line changed.

Linear: COW-1064. Stacks on the existing PR #46.
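The render step is a single-key line substitution over the committed template. Below is a minimal Rust sketch of the same idea. It assumes the template carries one top-level `rpc_url = "..."` line. The real helper is bash in `scripts/lib.sh`, and this function only borrows its name for illustration.

Because the sketch always renders from the committed template, running it twice gives identical output, which is the idempotence property listed above. The returned change count corresponds to the "exactly one line changed" smoke check.

```rust
/// Hypothetical Rust mirror of the bash `render_engine_config` helper.
/// Rewrites every `rpc_url = ...` line with the operator's endpoint and copies
/// all other lines verbatim. Returns the rendered text plus the number of lines
/// whose content changed, so a caller can assert that exactly one line moved.
pub fn render_engine_config(template: &str, rpc_url: &str) -> (String, usize) {
    let mut out = String::with_capacity(template.len() + rpc_url.len());
    let mut changed = 0;
    for line in template.lines() {
        // The key is whatever precedes the first '=', ignoring surrounding spaces.
        let key = line.split('=').next().unwrap_or("").trim();
        if key == "rpc_url" {
            let rendered = format!("rpc_url = \"{rpc_url}\"");
            if rendered != line {
                changed += 1;
            }
            out.push_str(&rendered);
        } else {
            out.push_str(line);
        }
        out.push('\n');
    }
    (out, changed)
}
```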

`scripts/e2e-run.sh` was grepping for the pretty-printed `supervisor ready modules=5 chains=1` flat string. Without `--pretty-logs` (which production-shape JSON deliberately omits), the engine emits `{"message":"supervisor ready","modules":5,"chains":1,...}`. The grep therefore never matched, and the script died at the 60 s deadline even though the engine was already healthy and dispatching blocks. The nohup'd engine stayed alive detached; the wrapper just couldn't see it.

Fix: extended the grep to two JSON-field-order alternatives (`modules` before `chains` and vice versa), since the JSON serialiser does not guarantee field order across releases. Bumped the deadline to 90 s because cold-start of the wasm component compile + first RPC handshake on a paid endpoint can comfortably take 30-40 s on a fresh checkout.

Linear: COW-1064 (run-prep regression caught live during the 2026-06-18 dry run).
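Here is a Rust sketch of a field-order-agnostic match. Each expected field is searched for independently, so any ordering of the JSON object satisfies the check. It assumes compact JSON with no whitespace after `:`, as in the log line quoted above. The script itself stays a `grep`, and a real JSON parser would be sturdier.

```rust
/// True if `line` contains `"key":value` followed by ',' or '}', so that
/// `"modules":5` does not also match `"modules":50`.
fn has_number_field(line: &str, key: &str, value: u64) -> bool {
    let needle = format!("\"{key}\":{value}");
    line.match_indices(needle.as_str()).any(|(i, _)| {
        matches!(line[i + needle.len()..].chars().next(), Some(',') | Some('}'))
    })
}

/// Field-order-agnostic readiness check: every expected field is searched for
/// independently instead of matching one fixed ordering of the JSON object.
/// Assumes compact JSON (no whitespace after ':').
pub fn is_supervisor_ready(line: &str, modules: u64, chains: u64) -> bool {
    line.contains("\"message\":\"supervisor ready\"")
        && has_number_field(line, "modules", modules)
        && has_number_field(line, "chains", chains)
}
```

The trailing `,`/`}` check is there for numeric prefixes: without it, a count of 5 would also accept 50.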

macOS ships /bin/bash at version 3.2.57 due to GPLv3 licensing, and `${var,,}` lowercase expansion is bash 4+ only. The EOA-match check died with `bad substitution` on first invocation against the live Sepolia run. Routed both sides of the comparison through `tr '[:upper:]' '[:lower:]'`, which is POSIX-portable. Grepped the rest of scripts/ for other `${var,,}` / `${var^^}` constructs and found none, so this was the only impacted site. Linear: COW-1064 (run-prep regression caught live).

… (COW-1064)

Two fixes caught live during the 2026-06-18 dry run:

1. `_ethflow_quote.py` imports eth_abi + eth_utils + eth_hash. These are not in the Python stdlib, and the script was failing with `ModuleNotFoundError: eth_abi` after the TWAP tx had already landed on Sepolia. Added a pre-flight `python3 -c 'import eth_abi, eth_utils, eth_hash.auto'` at the top of e2e-onchain.sh that fails loudly with the exact `pip3 install` command the operator needs.
2. Re-running e2e-onchain.sh after a partial failure would do two wasteful things:
   - re-submit the TWAP create() (different nonce → new tx → same salt → ComposableCoW reverts);
   - re-fetch a new EthFlow quote (new feeAmount/quoteId → new tx → wastes ETH on duplicate orders).

   Added idempotency: each action is wrapped in `if existing="$(state_value TX_*)"; then skip`, so the script picks up exactly where it left off using the tx hashes already persisted in scripts/.state.

Acceptance: the dry run had TX_TWAP already in .state (manual recovery write); re-running now skips TWAP and only attempts EthFlow.

Linear: COW-1064 (run-prep regressions caught live).
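The skip-if-recorded rule can be sketched in Rust as follows. It assumes `scripts/.state` holds one `KEY=value` pair per line, as described earlier (`TX_<KIND>=<hash>`). `state_value` and `run_once` are hypothetical stand-ins for the bash helpers, and `TX_ETHFLOW` in the usage is an assumed key name.

```rust
/// Hypothetical Rust mirror of the bash `state_value` helper: returns the last
/// non-empty value recorded for `key` in a `KEY=value` state file.
pub fn state_value<'a>(state: &'a str, key: &str) -> Option<&'a str> {
    state
        .lines()
        .filter_map(|l| l.split_once('='))
        .filter(|(k, v)| k.trim() == key && !v.trim().is_empty())
        .map(|(_, v)| v.trim())
        .last()
}

/// Runs `send` only if no tx hash is recorded for `key`; on success appends
/// `KEY=hash` to the state so a re-run after a partial failure skips it.
pub fn run_once<F>(state: &mut String, key: &str, send: F) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String>,
{
    if let Some(existing) = state_value(state.as_str(), key) {
        // Already on-chain: re-submitting would revert (TWAP) or waste ETH (EthFlow).
        return Ok(existing.to_string());
    }
    let hash = send()?;
    state.push_str(&format!("{key}={hash}\n"));
    Ok(hash)
}
```

In this sketch a failed `send` leaves the state untouched, so the next invocation retries only the action that did not land.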

The CoW orderbook's `/quote` endpoint rejects the native-ETH sentinel `0xEeee…EEeE` with `InvalidNativeSellToken`. EthFlow orders are still quoted with the *wrapped* form (WETH9 Sepolia) as the sell side; the EthFlow contract itself does the wrap from msg.value on `createOrder`. Verified end-to-end: `python3 _ethflow_quote.py <EOA> 5000000000000000` returns a 292-byte calldata + VALUE_WEI on the live Sepolia orderbook (fee_amount ≈ 0.000308 ETH, buy_amount ≈ 0.192 COW, quote_id 1519204). Linear: COW-1064.

…ns (COW-1064)

tracing-subscriber's JSON formatter writes `message` / `module` / `block_number` / `target` at the top level of the event object (no nested `fields`); the parser was looking inside `fields` and finding nothing.

Added an `event_field(ev, key)` helper that checks both top-level and nested-fields shapes. Replaced the single substring list with a per-module pattern map, derived from the `host.log()` call sites inside `modules/*/src/strategy.rs`. Specifically:

- twap-monitor -> `"watch:"`, `"indexed watch:"`, `"poll watch:"`
- ethflow-watcher -> `"ethflow submitted"`, `"ethflow backoff"`, `"ethflow dropped"`, `"already submitted"`
- price-alert -> `"TRIGGERED"`
- balance-tracker -> `"changed +"`, `"changed -"` (per-block `"0x<addr> changed +N wei ..."` diff log)
- stop-loss -> `"TRIGGERED"`, `"retry on next block"`, `"stop-loss submitted"`, `"stop-loss dropped"`, `"already submitted"`, `"submitted:"`

Verified against the live 2026-06-18 dry run's engine log: all 5 modules surface ≥ 1 terminal marker.

Linear: COW-1064 (run-prep regression caught live during the T+12-min mark of the run).
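Below is a Rust sketch of the two parser changes. `event_field` checks the top level first and falls back to a nested `fields` object. A per-module table holds the marker substrings listed above.

Two things in the sketch are assumptions for illustration:

- modelling a decoded event as two string maps;
- the `module` field holding the module name exactly as spelled here.

The actual generator is a shell script.

```rust
use std::collections::HashMap;

/// A decoded tracing JSON event, simplified to string values (an assumption
/// for this sketch): top-level keys plus an optional nested `fields` object.
pub struct Event {
    pub top: HashMap<String, String>,
    pub fields: Option<HashMap<String, String>>,
}

/// Looks up `key` at the top level first, then inside the nested `fields` object.
pub fn event_field<'a>(ev: &'a Event, key: &str) -> Option<&'a str> {
    ev.top
        .get(key)
        .or_else(|| ev.fields.as_ref().and_then(|f| f.get(key)))
        .map(String::as_str)
}

/// Per-module terminal-marker substrings, copied from the list above.
const MARKERS: &[(&str, &[&str])] = &[
    ("twap-monitor", &["watch:", "indexed watch:", "poll watch:"]),
    ("ethflow-watcher", &["ethflow submitted", "ethflow backoff", "ethflow dropped", "already submitted"]),
    ("price-alert", &["TRIGGERED"]),
    ("balance-tracker", &["changed +", "changed -"]),
    ("stop-loss", &["TRIGGERED", "retry on next block", "stop-loss submitted", "stop-loss dropped", "already submitted", "submitted:"]),
];

/// True if the event's message contains a terminal marker for the module it names.
pub fn is_terminal_marker(ev: &Event) -> bool {
    match (event_field(ev, "module"), event_field(ev, "message")) {
        (Some(module), Some(message)) => MARKERS
            .iter()
            .filter(|(m, _)| *m == module)
            .any(|(_, patterns)| patterns.iter().any(|p| message.contains(*p))),
        _ => false,
    }
}
```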

…okup (COW-1074)

Closes the gap surfaced by the COW-1064 dry run (2026-06-18). TWAP orders created through the cow-swap UI sign with a non-empty `appData` hash pointing at a richer JSON document (partner-id, slippage settings, quote-id). twap-monitor hard-coded `EMPTY_APP_DATA_JSON` when assembling `OrderCreation`. Every submit therefore failed the client-side digest check with `invalid OrderCreation: app_data JSON digest does not match signed app_data hash` before reaching the orderbook, and the watch sat in a retry loop forever.

The WIT already exposes `cow-api::request(method, path, body)` as a generic REST passthrough. We surface that capability on the SDK trait, wrap it in a typed helper, and use the helper from the strategy. No new host imports, no WIT ABI change, no forced rebuild of unrelated modules.

Extended `CowApiHost` with:

```rust
fn cow_api_request(
    &self,
    chain_id: u64,
    method: &str,
    path: &str,
    body: Option<&str>,
) -> Result<String, HostError>;
```

404 responses surface as `HostError { code: 404, kind: Unavailable }`. Callers can therefore distinguish "orderbook does not have this resource" from genuine upstream failures without a new `HostErrorKind` variant. The existing enum is `non_exhaustive`, but adding a variant on the WIT side would still need an ABI bump.

`resolve_app_data(host, chain_id, hash) -> Result<String, HostError>`:

- Short-circuits `EMPTY_APP_DATA_HASH` (`keccak256("{}")`) to `EMPTY_APP_DATA_JSON` (`"{}"`); no host call needed.
- Otherwise GETs `/api/v1/app_data/{hex_hash}` and pulls the `fullAppData` field out of the orderbook's envelope shape (`{"fullAppData": "<JSON string>", ...}`).
- 5 unit tests pin the short-circuit, the success path, the unexpected-shape fall-through, the 404 propagation, and the hex encoder.

Extended `MockCowApi` with:

- `respond_to_request_for(method, path, result)`: per-key programmable response.
- `respond_to_request(result)`: catch-all default.
- `request_calls()`: records the (chain_id, method, path, body) tuple for every invocation.

The existing `respond` / `calls()` / `submit_order` surface is unchanged.

Each of these files gained the trivial 8-line forwarder to the generated `cow_api::request` binding:

- `modules/twap-monitor/src/lib.rs`
- `modules/ethflow-watcher/src/lib.rs`
- `modules/examples/price-alert/src/lib.rs`
- `modules/examples/stop-loss/src/lib.rs`

Example modules implement `CowApiHost` purely for the `Host` blanket-impl supertrait, even though some don't actively submit orders. The impl is extended symmetrically.

`build_order_creation` now takes the resolved `app_data_json` as an explicit parameter (it was hard-coded to `EMPTY_APP_DATA_JSON`). The resolution itself is lifted into the caller `submit_ready`, which calls `shepherd_sdk::cow::resolve_app_data` before assembling the `OrderCreation`. Two graceful-fallback branches:

- `err.code == 404` → log Warn "appData hash not mirrored on orderbook" and leave the watch in place. Operators can re-trigger by pinning the document via a future orderbook PUT or by re-creating the order with empty appData.
- Any other resolver error → log Warn "appData resolve failed" and leave the watch. The retry on the next block re-attempts the lookup.

Two new strategy tests:

- `poll_ready_resolves_non_empty_app_data_then_submits`: programs MockHost with a known JSON + its hash on the order, and asserts the full resolve → submit → `submitted:` marker flow.
- `poll_ready_skips_submit_when_app_data_hash_not_mirrored`: programs MockHost to 404, and asserts no submit attempt, no `submitted:` / `dropped:` markers, and a Warn log line.

Plus one updated test (`build_order_creation_accepts_matching_non_empty_app_data`) that pins the new "matching hash → JSON" success path directly on `build_order_creation`.

Validation:

- `cargo test --workspace` → 13 + 12 + 16 + 32 + 8 + 8 + 61 (engine) + 23 (twap-monitor) + 7 doctests + 1 integration = 181 tests passing (was 174; +5 SDK, +2 twap-monitor).
- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo fmt --all --check` clean.
- All 4 production module .wasm artefacts build cleanly with the new SDK trait.
- No WIT changes. Modules built against the prior SDK trait will fail to compile (the new method is required), but the WIT-generated wasm-side surface is bit-identical.
- No host-impl changes (`crates/nexum-engine/src/host/impls/cow_api.rs`). The host already implements `request` for the wit-bindgen binding.
- No metric surface drift. The orderbook lookup goes through the same `shepherd_cow_api_*` counters via the existing `request` path.

Linear: COW-1074. Stacks on the COW-1064 run-config branch (#46). Validated locally end-to-end via `cargo test --workspace`. Live validation against the running engine will happen on the next COW-1064 dry run (engine restart required to pick up the rebuilt modules).
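Here is a compact Rust sketch of the resolve-then-decide flow described above. `HostError`, `CowApi`, `full_app_data` and `on_resolved` are simplified, hypothetical stand-ins, not the SDK's types. The sketch differs from the SDK in three ways:

- The empty-hash constant is passed in rather than hard-coded.
- Hex encoding is done with `format!` so the block has no dependencies.
- The `fullAppData` extraction handles only a few escapes, where real code would parse JSON.

```rust
/// Minimal stand-in for the SDK's HostError (the real type also carries a kind and data).
#[derive(Debug, Clone, PartialEq)]
pub struct HostError {
    pub code: u16,
    pub message: String,
}

/// Stand-in for the `cow_api_request` method added to `CowApiHost`.
pub trait CowApi {
    fn cow_api_request(&self, chain_id: u64, method: &str, path: &str, body: Option<&str>)
        -> Result<String, HostError>;
}

/// Dependency-free hex for the sketch; the SDK itself uses alloy_primitives::hex.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Extracts the `fullAppData` string value from the orderbook envelope.
/// Only the \" \\ and \/ escapes are handled in this sketch.
fn full_app_data(envelope: &str) -> Option<String> {
    const KEY: &str = "\"fullAppData\"";
    let rest = &envelope[envelope.find(KEY)? + KEY.len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start().strip_prefix('"')?;
    let mut out = String::new();
    let mut chars = rest.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Some(out), // closing quote of the JSON string
            '\\' => match chars.next()? {
                e @ ('"' | '\\' | '/') => out.push(e),
                _ => return None, // escape not handled by this sketch
            },
            other => out.push(other),
        }
    }
    None // unterminated string
}

/// The empty hash short-circuits without a host call; anything else is looked up.
/// `empty_hash` stands in for EMPTY_APP_DATA_HASH (keccak256("{}")).
pub fn resolve_app_data<H: CowApi>(
    host: &H,
    chain_id: u64,
    hash: &[u8; 32],
    empty_hash: &[u8; 32],
) -> Result<String, HostError> {
    if hash == empty_hash {
        return Ok("{}".to_string());
    }
    let path = format!("/api/v1/app_data/0x{}", to_hex(hash));
    let envelope = host.cow_api_request(chain_id, "GET", &path, None)?; // 404 propagates as-is
    // Assumption: an unexpected envelope shape is reported as an error with code 0.
    full_app_data(&envelope)
        .ok_or(HostError { code: 0, message: "unexpected app_data envelope".into() })
}

/// What submit_ready does next, per the two graceful-fallback branches.
#[derive(Debug, PartialEq)]
pub enum Next {
    Submit(String),
    KeepWatch(&'static str),
}

pub fn on_resolved(result: Result<String, HostError>) -> Next {
    match result {
        Ok(json) => Next::Submit(json),
        Err(e) if e.code == 404 => Next::KeepWatch("appData hash not mirrored on orderbook"),
        Err(_) => Next::KeepWatch("appData resolve failed"),
    }
}
```

Both error branches leave the watch in place, so the next poll re-attempts the lookup.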

…-1074)

Symmetric extension of the twap-monitor fix in this PR. The ethflow-watcher strategy's `build_eth_flow_creation` hard-coded `EMPTY_APP_DATA_JSON` exactly like twap-monitor did. Consider any OrderPlacement event whose embedded `GPv2OrderData.appData` hash differs from `keccak256("{}")`, which is every cow-swap UI EthFlow swap. Such an event would hit "app_data JSON digest does not match signed app_data hash" and be silently skipped client-side.

The COW-1064 dry run didn't surface this for the EthFlow tx I fired via `scripts/e2e-onchain.sh`, because that script's helper sets `appData = EMPTY_APP_DATA_HASH`. A cow-swap UI EthFlow swap, which is the realistic production path, would.

## Changes

- `build_eth_flow_creation` now takes `app_data_json: String` alongside `chain_id` and `placement`. Docstring updated to reference COW-1074.
- `submit_placement` calls `shepherd_sdk::cow::resolve_app_data` before `build_eth_flow_creation`. On 404 it logs a Warn "ethflow submit skipped (sender=...): appData hash not mirrored on orderbook" and returns Ok (no marker written, no submit attempt).
- 6 test call sites updated to pass `cowprotocol::EMPTY_APP_DATA_JSON.to_string()` explicitly, preserving the existing assertions verbatim.
- 2 new integration tests, `placement_with_non_empty_app_data_resolves_then_submits` and `placement_skips_submit_when_app_data_hash_not_mirrored`, mirror the twap-monitor pair. They program MockHost with a synthetic appData JSON + hash, then assert two things:
  - the resolve → build → submit chain produces a `submitted:{uid}` marker;
  - a 404 produces a Warn-only skip.

## Workspace impact

- `cargo test -p ethflow-watcher` → 14 tests passing (was 12; +2 from this commit).
- `cargo test --workspace` → 183 tests passing total (was 181 after the twap-monitor commit; +2 ethflow-watcher).
- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo fmt --all --check` clean.

Linear: COW-1074 (extended scope: same gap in ethflow-watcher).

Captures the 2026-06-18 COW-1064 dry run + live in-flight validation of PR #47 (resolve_app_data fix).

## Acceptance summary

5 of 6 rows green. The only [ ] is `block delta ≥ 1500` (got 415), because the run was intentionally interrupted twice to validate PR #47. All three commits (pre-PR-47, PR-47-twap-monitor, PR-47-ethflow-watcher) ran against the same data/e2e local-store.

| Row | Result |
|---|---|
| block delta ≥ 1500 | [ ] (got 415; 3 engine restarts for PR #47 mid-run validation) |
| all 5 modules have a terminal marker | [x] |
| shepherd_module_errors_total{trap} == 0 | [x] |
| no module poisoned | [x] |
| 0 ERROR lines from nexum_engine | [x] |
| TWAP + EthFlow tx submitted | [x] |

## 4 anomalies filed in Linear, fully documented in §6

- COW-1074: twap-monitor + ethflow-watcher hardcoded EMPTY_APP_DATA_JSON. **Fixed in-run via PR #47**; live-validated for both modules (§6.5).
- COW-1075: SDK classify_api_error should map `DuplicatedOrder` -> `Drop` (stop-loss retry loop).
- COW-1076: ethflow on-chain `validTo=uint32::MAX` rejected by Sepolia orderbook (`ExcessiveValidTo`; upstream issue).
- COW-1077: scripts/e2e-onchain.sh TWAP `t0=0` produces permanently-finished order (caller-side encoding bug).

## Live PR #47 validation (§6.5: the key methodology note)

Three engine binaries exercised on the same redb local-store:

1. `5bcd47b` (pre-PR-47): surfaces the digest-mismatch client-side skip for both twap-monitor + ethflow-watcher on non-empty appData orders.
2. `acc9654` (PR #47 twap-monitor): existing cow-swap UI TWAP re-polls to Ready -> resolve_app_data resolves the JSON from `/api/v1/app_data/{hash}` -> submit reaches orderbook -> DuplicatedOrder (server-side reject only). Client-side digest check bypassed.
3. `cd68de0` (PR #47 ethflow-watcher): the flow was:
   1. new cow-swap UI EthFlow swap (`0x82da5ced...`) observed;
   2. appData = `0xe46e7d0c...` (NON-empty rich JSON: appCode="CoW Swap", slippageBips=857, smartSlippage=true);
   3. resolve_app_data calls the orderbook and extracts the JSON from the `fullAppData` field;
   4. build produces a matching-digest body;
   5. submit reaches the orderbook -> ExcessiveValidTo (server-side reject only, tracked separately in COW-1076).

The PR #47 fix is therefore live-validated end-to-end against the real Sepolia orderbook in **both** affected modules.

## What this report unblocks

COW-1031 (7-day soak) is technically unblocked. The engine + 5-module dispatch is proven correct under live conditions, and PR #47 closes the only blocking SDK gap for the soak's TWAP + EthFlow coverage. The remaining 3 follow-ups (COW-1075/1076/1077) are quality-of-output rather than correctness regressions and do not block the soak.

Operator sign-off pending in §8.

Linear: COW-1064 (closes).
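The acceptance table is six independent predicates over numbers the report generator already derives. Here is a Rust sketch using a hypothetical `RunSummary` struct. Thresholds and row labels come from the table above. Feeding in the 2026-06-18 values (block delta 415, all other rows clean) reproduces the 5-of-6 result.

```rust
/// Hypothetical summary of the values e2e-report-gen.sh derives from the
/// engine log, the metrics snapshots and the state file.
pub struct RunSummary {
    pub block_delta: u64,
    pub modules_with_marker: usize,
    pub trap_errors: u64,
    pub poisoned_modules: usize,
    pub engine_error_lines: u64,
    pub twap_tx: bool,
    pub ethflow_tx: bool,
}

/// Evaluates the six acceptance rows in report order.
pub fn acceptance_rows(s: &RunSummary) -> [(&'static str, bool); 6] {
    [
        ("block delta ≥ 1500", s.block_delta >= 1500),
        ("all 5 modules have a terminal marker", s.modules_with_marker == 5),
        ("shepherd_module_errors_total{trap} == 0", s.trap_errors == 0),
        ("no module poisoned", s.poisoned_modules == 0),
        ("0 ERROR lines from nexum_engine", s.engine_error_lines == 0),
        ("TWAP + EthFlow tx submitted", s.twap_tx && s.ethflow_tx),
    ]
}

/// Renders the rows as the markdown checklist table used in the report.
pub fn render_table(rows: &[(&str, bool)]) -> String {
    let mut out = String::from("| Row | Result |\n|---|---|\n");
    for (name, ok) in rows {
        let mark = if *ok { "[x]" } else { "[ ]" };
        out.push_str(&format!("| {name} | {mark} |\n"));
    }
    out
}
```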

…COW-1075)

`OrderBookPool::submit_order_json` returns `CowApiError::Orderbook(cowprotocol::Error::OrderbookApi { status, api })` for any 4xx with a typed `{"errorType": "...", ...}` body (see `cowprotocol::transport::HttpResponse::into_status_error`). The WIT adapter was dropping `api` on the floor (`data: None`). The guest's `shepherd_sdk::cow::classify_api_error` therefore always saw `None` and fell back to its safe-default `TryNextBlock`. Permanent rejections like `DuplicatedOrder`, `InvalidSignature`, or `ExcessiveValidTo` looped forever, masquerading as transient failures.

This is the root cause of the stop-loss infinite-retry behaviour observed in the 2026-06-18 COW-1064 dry run (e2e-report-2026-06-18.md §6.3): 76 retries of an already-submitted order in 170 blocks, because the host never let the guest see what the orderbook actually said.

The fix is in the WIT adapter (`crates/nexum-engine/src/host/impls/cow_api.rs`), not the SDK classifier. The classifier already handles `Unknown(_)` -> `Drop` correctly via its `Some(_) => Drop` branch; it just needed the envelope to dispatch on. Extracted the projection into a testable `orderbook_to_host_error` helper that:

- serialises `ApiError` into `HostError.data` as JSON when the variant is `OrderbookApi { status, api }` (the only variant carrying a structured payload),
- sets `code` to the HTTP status so guests can disambiguate 4xx vs 5xx,
- leaves `data: None` for other `cowprotocol::Error` variants (transport, serde, unexpected-status), since they have no envelope and `TryNextBlock` is the correct safe default for them.

Tests:

- `orderbook_to_host_error` unit tests cover the envelope-forward, the optional inner `data` round-trip, and the non-envelope `UnexpectedStatus` branch (3 cases).
- New wiremock integration test `submit_order_propagates_orderbook_envelope` confirms a 400 with `errorType: "DuplicatedOrder"` surfaces the `OrderbookApi` variant end-to-end through `OrderBookPool::submit_order_json`.

All 13 cow-api-adjacent tests pass; workspace tests untouched.
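Below is a sketch of the projection and of how a guest could dispatch on it. The enum and struct are simplified stand-ins for `cowprotocol::Error` and `HostError`. The classifier is reduced to one transient kind, `InsufficientFee`, which is the TryNextBlock case the load-test mock exercises; every other envelope drops. The real `classify_api_error` may treat more kinds as transient.

```rust
/// Simplified stand-in for the cowprotocol error variants named above.
#[derive(Debug)]
pub enum CowError {
    /// 4xx with a typed `{"errorType": ..., "description": ...}` body.
    OrderbookApi { status: u16, error_type: String, description: String },
    /// Non-2xx response without a recognisable envelope.
    UnexpectedStatus(u16),
    /// Connection-level failure; no HTTP status at all.
    Transport(String),
}

/// Simplified stand-in for the WIT-level HostError.
#[derive(Debug, PartialEq)]
pub struct HostError {
    pub code: u16,
    pub message: String,
    pub data: Option<String>,
}

/// Minimal JSON string escaping for the envelope fields.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Sketch of orderbook_to_host_error: only the envelope variant populates `data`.
pub fn orderbook_to_host_error(err: CowError) -> HostError {
    match err {
        CowError::OrderbookApi { status, error_type, description } => HostError {
            code: status, // HTTP status, so guests can tell 4xx from 5xx
            message: format!("orderbook rejected request: {error_type}"),
            data: Some(format!(
                "{{\"errorType\":\"{}\",\"description\":\"{}\"}}",
                json_escape(&error_type),
                json_escape(&description)
            )),
        },
        CowError::UnexpectedStatus(status) => HostError {
            code: status,
            message: format!("unexpected status {status}"),
            data: None,
        },
        // Assumption: transport failures carry code 0 in this sketch.
        CowError::Transport(msg) => HostError { code: 0, message: msg, data: None },
    }
}

#[derive(Debug, PartialEq)]
pub enum Retry {
    TryNextBlock,
    Drop,
}

/// Pulls `errorType` out of the compact envelope produced above.
fn error_type(data: &str) -> Option<&str> {
    const KEY: &str = "\"errorType\":\"";
    let start = data.find(KEY)? + KEY.len();
    let len = data[start..].find('"')?;
    Some(&data[start..start + len])
}

/// Simplified guest classifier: no envelope means a safe retry on the next block.
pub fn classify(err: &HostError) -> Retry {
    match err.data.as_deref().and_then(error_type) {
        None => Retry::TryNextBlock,
        Some("InsufficientFee") => Retry::TryNextBlock,
        Some(_) => Retry::Drop,
    }
}
```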

…1076)

EthFlow on-chain orders use `validTo = u32::MAX` by design (see `cowprotocol::eth_flow`). The Sepolia orderbook's max-validTo cap rejects this shape with `errorType = "ExcessiveValidTo"`, and after the COW-1075 host fix the strategy already classifies it correctly as Drop. The remaining gap was operator ergonomics. Every EthFlow placement on Sepolia produced a Warn-level "ethflow dropped" line, which would dominate a 7-day soak dashboard with non-anomalous traffic.

Change: in `apply_submit_retry`'s Drop arm, peek at the decoded ApiError. If the orderbook's `errorType == "ExcessiveValidTo"`, log at Info instead of Warn. All other Drop reasons (InvalidSignature, WrongOwner, etc.) keep Warn, so real anomalies still page the operator. Dispatch (write `dropped:{uid}`, clear stale `backoff:{uid}`) is unchanged.

Why not gate on more (e.g. inspect the order's validTo field)? The strategy already filters logs to EthFlow contract addresses, so ExcessiveValidTo from the orderbook for an EthFlow placement is unambiguously the documented constraint. Keeping the gate narrow avoids accidentally suppressing other-cause Warns.

Tests (3 new in `modules/ethflow-watcher/src/strategy.rs`):

- `submit_excessive_valid_to_logs_at_info_not_warn`: end-to-end through `on_logs`; confirms exactly one drop line at Info level and zero Warn drops for this case.
- `submit_other_permanent_error_still_logs_at_warn`: regression guard; InvalidSignature stays at Warn.
- `submit_drop_without_envelope_keeps_warn_level`: predicate-level unit test confirming `is_expected_excessive_valid_to` returns false when `HostError.data` is None (e.g. transport failure).

Docs: added a "Known upstream constraints on Sepolia" section to `docs/operations/e2e-testnet-runbook.md`. It documents:

- this gap;
- the post-fix operator-visible behaviour;
- the Prometheus signal (`shepherd_cow_api_submit_total{outcome="err"}` grows by the EthFlow placement count then stops);
- a pointer to COW-1076 for the upstream-confirmation status.

Soak impact: the COW-1031 7-day run on Sepolia will now show ExcessiveValidTo drops as Info-level traffic. The soak's "0 unexpected errors" acceptance bar is preserved because Warn-level drops only fire on real anomalies.

All 17 ethflow-watcher tests pass (+3 new); workspace tests untouched. clippy + fmt clean.
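The Info/Warn split reduces to a predicate on the forwarded envelope. This Rust sketch assumes `HostError.data` is the compact JSON string produced by the host adapter. The signature is simplified to take `Option<&str>`, and the field lookup is a minimal substring scan rather than a JSON parser.

```rust
#[derive(Debug, PartialEq)]
pub enum Level {
    Info,
    Warn,
}

/// Returns the string value of `key` in a flat JSON object (minimal scan; no escapes).
fn json_str<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{key}\"");
    let rest = &json[json.find(needle.as_str())? + needle.len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start().strip_prefix('"')?;
    Some(&rest[..rest.find('"')?])
}

/// Simplified signature: takes `HostError.data` directly; false when there is no envelope.
pub fn is_expected_excessive_valid_to(data: Option<&str>) -> bool {
    data.and_then(|d| json_str(d, "errorType")) == Some("ExcessiveValidTo")
}

/// Log level for a Drop: the documented Sepolia EthFlow constraint is Info, everything else Warn.
pub fn drop_log_level(data: Option<&str>) -> Level {
    if is_expected_excessive_valid_to(data) {
        Level::Info
    } else {
        Level::Warn
    }
}
```

The gate matches on a single `errorType` value. This mirrors the narrow-gate reasoning above, so any other permanent rejection still lands at Warn.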

The previous `e2e-onchain.sh` pinned a 516-byte hex blob with `t0 = 0` in the `ComposableCoW.create()` static-input tuple. The TWAP handler's `validateData` does NOT reject `t0 = 0`; it only checks `t0 >= type(uint32).max`. So the `create()` tx succeeded, but `TWAPOrderMathLib.calculateValidTo` then computed `part = (block.timestamp - 0) / t ≈ 3.3M`. That is far above the configured `n = 2`, so every `getTradeableOrderWithSignature` poll reverted with `AFTER_TWAP_FINISHED`. Surfaced in the COW-1064 dry run (2026-06-18 report §6.4): the supervisor logged the `0xc8fc2725...after twap finished` revert on every block.

Fix:

- New `scripts/_twap_calldata.py` encodes the calldata fresh on every invocation with `t0 = int(time.time()) - 60`. The 60 s backdate makes part 0 Ready as soon as the order is on-chain. The module docstring explicitly warns against re-hardcoding t0.
- `scripts/e2e-onchain.sh` Action 1 now shells out to the helper rather than carrying the hex inline, and validates that the output is hex-shaped before passing it to `cast send`.
- `docs/operations/e2e-cow-1064-prep.md` section 2.3 step 3 replaces the pinned blob with a `python3 scripts/_twap_calldata.py` recipe and a historical note pointing at COW-1077.
- `docs/operations/e2e-cow-1064-prep.md` section 4.2 recipe gets `import time` + `int(time.time()) - 60` for `t0`, so the re-derivation flow does not re-introduce the bug.
- `scripts/README.md` Action 1 description updated to mention the helper.

Constants in the helper (sell/buy tokens, amounts, n, t, salt) mirror the prep doc's section 4.2. Both must change in lockstep if the TWAP shape is retargeted.

Validation:

- `python3 scripts/_twap_calldata.py` produces 516-byte calldata (1034 hex chars including the `0x` prefix) starting with the correct selector `0x6bfae1ca`.
- The t0 word reflects the current epoch (verified against `0x00...006a3537b5` on the smoke run).
- `bash -n scripts/e2e-onchain.sh` passes.

No engine-side changes; this is a script-and-docs PR.
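The failure mode is plain integer arithmetic, so it can be reproduced in a few lines. This Rust sketch mirrors only the part computation described above, not the full `TWAPOrderMathLib`. It also includes a hypothetical `is_hex_calldata` check in the spirit of the script's hex-shape validation. The values of `t` and `n` used when exercising it are illustrative.

```rust
/// Outcome of the TWAP part lookup at a given timestamp.
#[derive(Debug, PartialEq)]
pub enum TwapPart {
    BeforeStart,
    Part(u64),
    AfterFinished,
}

/// Simplified mirror of the part arithmetic described above:
/// part = (now - t0) / t, and any part >= n means the TWAP is finished.
/// `t` (the part duration in seconds) must be non-zero.
pub fn twap_part(now: u64, t0: u64, t: u64, n: u64) -> TwapPart {
    if now < t0 {
        return TwapPart::BeforeStart;
    }
    let part = (now - t0) / t;
    if part >= n {
        TwapPart::AfterFinished
    } else {
        TwapPart::Part(part)
    }
}

/// t0 as the new helper chooses it: 60 s before now, so part 0 is live at once.
pub fn fresh_t0(now: u64) -> u64 {
    now.saturating_sub(60)
}

/// Hypothetical hex-shape check: 0x prefix, non-empty, even length, hex digits only.
pub fn is_hex_calldata(s: &str) -> bool {
    match s.strip_prefix("0x") {
        Some(hex) => !hex.is_empty() && hex.len() % 2 == 0 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}
```

With `t0 = 0` and a current timestamp around 1.78e9, any `t` of a few hundred seconds yields a part index in the millions. That is why every poll reverted.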

mfw78's review of PR #8 (nullislabs#8) flagged "we already pull alloy, so pulling hex via there is really not much of a deal". The PR #47 (COW-1074) commit acc9654 then introduced two new custom hex helpers that recreate the same antipattern at a different scope:

- `crates/shepherd-sdk/src/cow/app_data.rs::encode_hex`: 32-byte hash → `0x...`. Used by `resolve_app_data` to format the orderbook lookup path.
- `modules/twap-monitor/src/strategy.rs::hex_short`: 8-byte prefix → `0x...…`. Used to format `appData` hashes in INFO log lines.

Both crates already depend on `alloy-primitives` (sdk: 1.6, twap-monitor: 1.5), so the swap is a one-liner per call site:

- `encode_hex(b)` → `format!("0x{}", alloy_primitives::hex::encode(b))`
- `hex_short(b)` → `format!("0x{}…", alloy_primitives::hex::encode(&b[..8]))`

Both functions keep their old signature, so callers need no changes. That covers `resolve_app_data` in the SDK and every `host.log` line in the twap-monitor strategy. Comments on both helpers now explicitly reference mfw78's PR #8 guidance, so the next person tempted to hand-roll a `0123456789abcdef` table has a hook.

Validation:

- `cargo test -p shepherd-sdk -p twap-monitor`: 32 + 23 passed.
- `cargo clippy --all-targets -- -D warnings`: clean.
- `cargo fmt --check`: clean.
- zero em-dash drift.

Why this PR sits in a separate branch rather than amending PR #47: PR #47 is already In Review, and #48/#49/#50 stack on top of it. Amending would require force-pushing 4 branches. A small follow-up PR keeps each one bisectable and lets mfw78 review the alloy alignment in isolation.

Synthetic load test for shepherd's M4 stack. Distinct from:

- COW-1064 (real Sepolia E2E, correctness, 90 min, 5 modules)
- COW-1078 (backtest of 7d historical events, replay)
- COW-1031 (7-day soak, wall-clock stability)

This issue answers one question the others do not: how many events per block can the supervisor dispatch before something breaks? lgahdl's PR #9 review thread flagged sequential per-module dispatch as a potential bottleneck; this PR is how we measure it.

Components added:

1. `tools/orderbook-mock` (new crate, axum-based): HTTP server serving the two endpoints shepherd's cow-api host hits per submission.
   - `POST /api/v1/orders` returns a synthetic 56-byte OrderUid.
   - `GET /api/v1/app_data/{hash}` returns the empty appData document.
   - CLI knobs: `--port`, `--latency-ms`, `--error-rate`. The error rate alternates InsufficientFee / InvalidSignature to exercise both the TryNextBlock and Drop paths.
   - 3 unit tests cover the happy path, the empty appData path, and the error-rate envelope.
2. `tools/load-gen` (new crate, alloy-based): connects to Anvil and impersonates the pinned Sepolia test EOA via `anvil_impersonateAccount` + `anvil_setBalance`. On every new block it fires N `ComposableCoW.create(...)` + M `CoWSwapEthFlow.createOrder(...)` calls. Each create uses a fresh salt counter so submissions do not collide on the dedup check. 3 unit tests cover pinned address parsing, salt uniqueness, and calldata selector shape.
3. Engine config: `ChainConfig` gains an optional per-chain `orderbook_url`. `OrderBookPool::from_config` honours the override using `cowprotocol::OrderBookApi::new_with_base_url`; absent overrides fall back to the canonical api.cow.fi URLs. `main.rs` switches from `::default()` to `::from_config(&engine_cfg)`. This is useful long-term for staging/barn targets and immediately needed to point at the mock.
4. `engine.load.toml`:
   - chain 11155111 -> `ws://localhost:8545`;
   - cow base URL -> `http://localhost:9999`;
   - metrics on 127.0.0.1:9100;
   - `state_dir = ./data/load` (wiped per run).
5. Scripts:
   - `scripts/load-bootstrap.sh` brings up Anvil + orderbook-mock, tracks PIDs in /tmp/shepherd-load.pids, and exposes a teardown helper.
   - `scripts/load-teardown.sh`: idempotent cleanup.
   - `scripts/load-run.sh` orchestrates one scenario end-to-end, in order:
     1. bootstrap;
     2. build modules;
     3. start the engine;
     4. snapshot /metrics;
     5. run load-gen for `--duration-min`;
     6. snapshot /metrics again;
     7. tear down;
     8. drop a report skeleton at `docs/operations/load-reports/load-NxM-YYYY-MM-DD.md`.
6. `docs/operations/load-testnet-runbook.md`: operator runbook covering:
   - the three scenarios (baseline 5x5, medium 20x20, saturation 50x50);
   - expected acceptance bars;
   - what the test does NOT prove (WS reconnect / drift / real-orderbook fidelity);
   - troubleshooting.

Validation:

- `cargo test --workspace --exclude <wasm-only-modules>`: 196 passed.
- `cargo clippy --workspace --all-targets --tests -- -D warnings`: clean.
- `cargo fmt --all --check`: clean.
- `bash -n scripts/load-{bootstrap,run,teardown}.sh`: clean.
- Live orderbook-mock smoke: POST returns a valid 56-byte hex UID, GET returns `{"fullAppData":"{}"}`, and `/_stats` reflects counters.

Pending (not in this PR):

- Baseline 5x5 report against a real Anvil fork. This requires Bruno's `RPC_URL_SEPOLIA_HTTP` from `scripts/.env`; once that runs, the report lands in `docs/operations/load-reports/`.
- Metrics-delta auto-generation in `scripts/load-run.sh` (left as TBD in the script; `e2e-report-gen.sh` has the delta logic we can adapt).
- Saturation scenario: run after the baseline lands so the bottleneck has a clean baseline to compare against.
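The mock's `--error-rate` knob needs some rule for which requests fail. Below is one deterministic option, sketched in Rust: spread failures evenly and alternate the two errorTypes by failure count. The actual mock's selection rule may differ; this is an assumption.

The synthetic UID follows the 56-byte OrderUid layout (32-byte order digest, 20-byte owner, 4-byte validTo). Only the digest's last 8 bytes vary, with the request sequence number.

```rust
/// Hypothetical deterministic error injection for an orderbook mock. Over the
/// first n requests exactly floor(n * error_rate) fail, spread evenly, and the
/// failures alternate between a transient and a permanent errorType.
pub fn mock_outcome(seq: u64, error_rate: f64) -> Result<String, &'static str> {
    let failed_before = (seq as f64 * error_rate).floor() as u64;
    let failed_after = ((seq + 1) as f64 * error_rate).floor() as u64;
    if failed_after > failed_before {
        // Even-numbered failures are transient (TryNextBlock), odd ones permanent (Drop).
        Err(if failed_before % 2 == 0 { "InsufficientFee" } else { "InvalidSignature" })
    } else {
        Ok(synthetic_uid(seq))
    }
}

/// Synthetic 56-byte OrderUid (32-byte order digest, 20-byte owner, 4-byte
/// validTo), 0x-prefixed hex. Only the digest's last 8 bytes vary, with `seq`.
pub fn synthetic_uid(seq: u64) -> String {
    let mut uid = [0u8; 56];
    uid[24..32].copy_from_slice(&seq.to_be_bytes());
    let hex: String = uid.iter().map(|b| format!("{b:02x}")).collect();
    format!("0x{hex}")
}
```

Keeping the rule deterministic means two runs with the same `--error-rate` exercise the same TryNextBlock/Drop sequence, which makes metric deltas comparable across runs.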

…tion (COW-1079)

First COW-1079 run on a real Anvil fork of Sepolia. The engine-side acceptance bar is cleared with wide margin:

- Per-block dispatch latency p50/p95/p99 = 4/6/7 ms (bar was < 2 s).
- Zero traps, zero poisoned modules, zero shepherd_module_errors_total.
- EthFlow strategy submitted 1 OrderPlacement end-to-end through the mock orderbook in 10 ms; `submitted:{uid}` marker written cleanly.
- 63 Anvil blocks dispatched flawlessly.

The honest finding: load-gen's transactions get into Anvil's mempool (twap_ok=270, ethflow_ok=270 per the eth_sendTransaction response). However, only 5 ConditionalOrderCreated + 1 OrderPlacement events actually fired. The rest reverted at the contract level, because ComposableCoW.create + EthFlow.createOrder run preconditions the load-gen-crafted bodies don't pass. So this run stressed the engine with ~6 events over 60 s, not 5+5 per block.

The bar criterion that depends on the load-gen (events-per-block delivered) is the only one that doesn't pass. Filing a follow-up to calibrate the revert rate before re-running.

The report at `docs/operations/load-reports/load-5x5-2026-06-19.md` mirrors the COW-1064 e2e-report shape and signs off as "conditional pass": the engine meets the bar; load-gen needs work.

scripts/lib.sh exports REPORTS_DIR=e2e-reports/ unconditionally. load-run.sh used to set REPORTS_DIR=load-reports/ BEFORE sourcing load-bootstrap.sh (which transitively sources lib.sh), so the override was lost and the auto-generated skeleton ended up under e2e-reports/ next to the COW-1064 reports. Move the assignment after the source so the load-reports/ path wins, with a comment explaining the ordering trap. Drive-by: removed the misplaced e2e-reports/load-5x5-2026-06-19.md from the first run; the committed report at load-reports/load-5x5-2026-06-19.md (commit 59fe714) is the canonical copy.

COW-1079 baseline's 5/270 + 1/270 revert rate had two distinct root causes, both contract-side, neither shepherd's fault:

1. **Nonce race in burst submissions.** Anvil's `eth_sendTransaction` against an impersonated account auto-assigns a nonce when none is provided, but the assignment races with the caller's burst submission. Load-gen fired 5 TWAP + 5 EthFlow per block without waiting for individual receipts. Most txs landed in the mempool sharing the same nonce, and Anvil's miner included only one per block; the rest reverted as nonce-too-low.

   Fix: read the EOA's current nonce at boot, increment it locally per successful submission, and pin `tx.nonce` explicitly on every `TransactionRequest`. Lock-step with cargo build cache so the nonce counter never crosses async-boundary corruption.
2. **EthFlow OrderUid dedup on identical GPv2 OrderData.** The CoWSwapEthFlow contract dedups by the GPv2 `OrderUid`, which is keccak over (buyToken, receiver, sellAmount, buyAmount, appData, feeAmount, validTo, partiallyFillable, kind, sellTokenSource, buyTokenDestination). quoteId is NOT part of that hash. The prior load-gen varied only `quoteId` per call, so all 270 EthFlow submissions produced the same UID, and the contract rejected 269/270 as `OrderIsAlreadyOwned`.

   Fix: vary `sellAmount` by 1 wei per call (`BASE_SELL_AMOUNT + seq`) and pass that same value as `msg.value`, so the contract's `msg.value == order.sellAmount` invariant holds.

Re-ran baseline 5x5 after both fixes:

- 130/130 TWAP + 130/130 EthFlow delivered;
- 130 ConditionalOrderCreated + 130 OrderPlacement events on-chain;
- 130 cow_api submits OK to mock;
- 130 ethflow markers written;
- zero shepherd_module_errors_total.

Updated the baseline report at `docs/operations/load-reports/load-5x5-2026-06-19.md` from 'conditional pass' to 'full PASS' with the post-calibration numbers: TWAP block p99 = 49 ms, EthFlow log p99 = 11 ms, a 40x margin on the < 2 s bar.

Medium 20x20 and saturation 50x50 are now unblocked per the COW-1079 acceptance roadmap.
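Both fixes are small bookkeeping rules. This Rust sketch assumes a synchronous `send` closure stands in for the RPC call. The nonce is seeded once and advances only when a submission is accepted. Each EthFlow call derives `sellAmount` and `msg.value` from the same number. `NonceTracker` and `ethflow_amounts` are hypothetical names, and the base amount used to exercise them is illustrative.

```rust
/// Hypothetical per-EOA nonce tracker: seeded once from the chain at boot,
/// then advanced locally, and only when a submission is accepted.
pub struct NonceTracker {
    next: u64,
}

impl NonceTracker {
    pub fn new(onchain_nonce: u64) -> Self {
        Self { next: onchain_nonce }
    }

    /// Calls `send` with the nonce to pin on the transaction. `&mut self`
    /// serialises access, so two in-flight submissions cannot race for a nonce;
    /// a rejected submission leaves the nonce free for the next attempt.
    pub fn submit<T, E>(&mut self, send: impl FnOnce(u64) -> Result<T, E>) -> Result<T, E> {
        let result = send(self.next);
        if result.is_ok() {
            self.next += 1;
        }
        result
    }
}

/// EthFlow amounts for call `seq`: sellAmount moves by 1 wei per call so each
/// GPv2 OrderUid differs, and msg.value equals sellAmount as the contract requires.
pub fn ethflow_amounts(base_sell_amount: u128, seq: u128) -> (u128, u128) {
    let sell_amount = base_sell_amount + seq;
    (sell_amount, sell_amount) // (order.sellAmount, msg.value)
}
```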

…(COW-1079)

Closes the COW-1079 three-scenario sweep with the COW-1080 calibration in place. All three scenarios pass:

- baseline 5x5: 130/130 each, TWAP block p99 = 49 ms
- medium 20x20: 280/280 each, TWAP block p99 = 67 ms
- saturation 50x50: 300/300 each, TWAP block p99 = 78 ms

Latency growth across the watch-count range (130 -> 280 -> 300) is sub-linear: 49 -> 67 -> 78 ms. The lgahdl PR #9 concern about sequential per-module dispatch saturating under load is NOT surfaced at this scale. Zero shepherd_module_errors_total, zero traps, zero EthFlow submit errors across all three runs.

The unexpected finding from saturation: the engine did not saturate. The bottleneck is load-gen's sequential eth_sendTransaction submission. Each tx takes ~200 ms RTT, so 100 tx/iteration takes ~20 s, against Anvil's 1 s block time. To genuinely saturate the engine we would need one of:

- parallel load-gens against different impersonated EOAs;
- a sub-second block time;
- thousands of pre-seeded watches.

EthFlow log p99 stayed flat at ~9 ms across all three scenarios, because it is dominated by the cow-api submit roundtrip, not engine state. This confirms the submit path scales independently of the watch count.

The cold-start outlier (~500 ms on the first watch-heavy block) appears consistently across runs and is independent of the steady-state watch count. It is a one-shot first-block redb/eth_call warmup cost, NOT a saturation symptom.

What this proves:

- Shepherd M4 supervisor handles >= 300 concurrent watches + >= 138 block dispatch cycles in 2 min with p99 < 80 ms.
- cow-api submit path is steady at ~9 ms p99 regardless of watch count.
- Zero error/trap/poison across all three scenarios.

What it does NOT prove (and is not in scope here):

- Behaviour at 3000+ watches.
- WS reconnect resilience (COW-1031 soak).
- Multi-day memory drift (COW-1031).
- Real-orderbook 4xx variety (COW-1078 backtest).

COW-1079 ready to move to In Review.

…079) The single-EOA saturation 50x50 report identified the per-EOA nonce serialisation as the bottleneck before the engine had a chance to saturate. This commit removes that bottleneck: load-gen: - New --parallel N flag. Each worker impersonates a synthetic EOA (0x57...01..0a), gets its own WS connection + nonce stream, runs its own per-block submission loop. Total events per block scales linearly with N. - Disjoint salt space per worker via 96-bit prefix. - Disjoint EthFlow sellAmount space via a 10_000-wide per-worker window (the first attempt shifted by 96 bits, blowing past the 1M ETH funded balance with 7.9e28 wei sellAmounts; fixed). scripts/load-bootstrap.sh + scripts/load-run.sh: - Accept --block-time (passes to anvil) and --parallel (passes to load-gen). Defaults preserve historic behaviour: --block-time 1, --parallel 1. - Auto-report filename now includes scenario label (load-NxM-SCENARIO-date.md) so saturation-parallel does not overwrite the baseline 5x5 report. Saturation-parallel run (10 workers x 5 TWAP + 5 EthFlow per block, --block-time 0.5, 2 min): - load-gen: 895/895 TWAP + 895/895 EthFlow acks, 0 errors. - engine saw 381 ConditionalOrderCreated + 343 OrderPlacement events (43% / 38% delivery vs load-gen acks - Anvil + WS dropping under the heavier load). - shepherd_module_errors_total = 0, zero traps. - All 343 EthFlow submissions reached the mock orderbook 1:1. - TWAP block dispatch: histogram p50/p99 = 145 ms, max = 101 593 ms (101 s outlier on one block when 380+ watches polled against a stressed Anvil JSON-RPC). - Engine-log dispatch_block: n=586, p50=4ms, p95=46ms, p99=74ms, max=101 593 ms - same outlier. Saturation knee identified: 380+ active watches + 0.5s block-time + 10 concurrent WS subscribers produces a 101-second worst-case dispatch + 38-43% event delivery loss. 
Both symptoms point at the surrounding system (Anvil + WS transport), not at shepherd; the engine continues to scale sub-linearly with watch count and never produces a module error, trap, or panic under any tested configuration. For the 7-day COW-1031 soak, this implies the operator should use a paid Sepolia archive endpoint (Alchemy / drpc / QuickNode) rather than publicnode, OR accept event drops and rely on supervisor reconnect + eth_getLogs re-indexing. Documented in the new report at docs/operations/load-reports/load-50x50-parallel-2026-06-19.md.
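The per-worker partitioning described above can be sketched as follows. The commit does not spell out the exact byte layout or the base sellAmount, so the placement of the worker id inside the top 96 bits of the salt, `SELL_AMOUNT_BASE`, and the function names are all assumptions made for illustration. The sketch shows why a 10_000-wide window per worker stays within the funded balance while a 96-bit shift does not.

```rust
/// Width of each worker's EthFlow sellAmount window, per the commit message.
const SELL_WINDOW: u128 = 10_000;
/// Hypothetical base sellAmount in wei (0.001 ETH); not taken from load-gen.
const SELL_AMOUNT_BASE: u128 = 1_000_000_000_000_000;
/// 1M ETH in wei: the funded balance mentioned in the report.
const FUNDED_WEI: u128 = 1_000_000 * 1_000_000_000_000_000_000;

/// Hypothetical salt layout: worker id in the top 96 bits (bytes 0..12),
/// per-worker sequence number in the low 160 bits (bytes 12..32).
fn salt(worker: u32, seq: u64) -> [u8; 32] {
    let mut out = [0u8; 32];
    // Worker id, big-endian, right-aligned within the 12-byte prefix.
    out[8..12].copy_from_slice(&worker.to_be_bytes());
    // Sequence number, big-endian, right-aligned within the 20-byte suffix.
    out[24..32].copy_from_slice(&seq.to_be_bytes());
    out
}

/// Disjoint sellAmount per worker: worker `w` only ever uses values in
/// [BASE + w * 10_000, BASE + w * 10_000 + 9_999]. Amounts repeat within a
/// worker after 10_000 orders; uniqueness of the order itself comes from the salt.
fn sell_amount(worker: u32, seq: u64) -> u128 {
    SELL_AMOUNT_BASE + u128::from(worker) * SELL_WINDOW + u128::from(seq) % SELL_WINDOW
}
```

For the ten workers in the saturation-parallel run, the largest amount is `sell_amount(9, 9_999)`, which is only 99,999 wei above the base, whereas the rejected first attempt (`1u128 << 96`, about 7.9e28 wei) exceeds the 1e24 wei funded balance for any non-zero worker id.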

Squash of PR #66 - applies 5 blockers + 8 majors from M4 audit.

…a doc link

Rebase fallout from the M4 compliance pass:
- `chain/chainlink.rs` defines `StubHost<Result<String, HostError>>` and manually implements every `*Host` trait. When the M4 conflict resolution added the `cow_api_request` forwarder to the macro's `CowApiHost` impl, this local StubHost was missed, producing `E0046: not all trait items implemented`. Add a parallel `unreachable!("not used in this test")` body; the test never exercises the cow-api surface.
- `cow/app_data.rs`'s module-level doc referred to `EMPTY_APP_DATA_JSON` as an unqualified intra-doc link, but the symbol is only used as `cowprotocol::EMPTY_APP_DATA_JSON` inside the function body (there is no `use` at module scope). `RUSTDOCFLAGS=-D warnings` rejects the unresolved link. Qualify the path so it resolves while keeping the prose intent.
- `wit_bindgen_macro.rs` fmt drift: cargo fmt collapses the `shepherd::cow::cow_api::request(...).map_err(convert_err)` chain to a single line. Apply the canonical format.

Brings dev/m4-base back to fmt/clippy/test/doc green.
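The E0046 fix follows a common test-double pattern: a stub must implement every item of every trait it is used as, even the ones a given test never touches, and `unreachable!` documents that those paths are intentionally dead. The sketch below is a minimal, self-contained illustration; `EthHost`, `CowApiHost`, their method signatures, and the `HostError` shape are hypothetical stand-ins for the macro-generated `*Host` traits in the engine.

```rust
/// Hypothetical error type standing in for the engine's `HostError`.
#[derive(Debug, Clone, PartialEq)]
struct HostError(String);

/// Hypothetical host trait the test actually exercises.
trait EthHost {
    fn eth_call(&self, to: &str, calldata: &str) -> Result<String, HostError>;
}

/// Hypothetical host trait the test never exercises.
trait CowApiHost {
    fn cow_api_request(&self, method: &str, path: &str) -> Result<String, HostError>;
}

/// Test double that returns one canned response for every `eth_call`.
struct StubHost<R> {
    response: R,
}

impl EthHost for StubHost<Result<String, HostError>> {
    fn eth_call(&self, _to: &str, _calldata: &str) -> Result<String, HostError> {
        self.response.clone()
    }
}

impl CowApiHost for StubHost<Result<String, HostError>> {
    fn cow_api_request(&self, _method: &str, _path: &str) -> Result<String, HostError> {
        // Present only so the trait impl is complete (avoids E0046);
        // reaching this body means the test's assumptions changed.
        unreachable!("not used in this test")
    }
}
```

The panic message makes a future change visible: if a test starts routing through the cow-api surface, it fails loudly instead of silently returning a fake response.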

…face

Audit reference: milestone-rubric-grant-audit-2026-06-25.md, Major #3 (`[u8; 32]` for protocol hash across SDK public boundary). The rubric explicitly calls out: "Newtypes for protocol IDs (no raw `[u8; 32]` across module boundaries)." `B256` is already in `shepherd_sdk::prelude`, so the swap costs callers nothing: both twap-monitor and ethflow-watcher already held the appData as `B256` and reached through `.0` to satisfy the prior signature.

Changes:
- `resolve_app_data(host, chain_id, &B256)` (was `&[u8; 32]`)
- `encode_hex(&B256)` internal helper
- Doctest + 5 unit tests rewritten against `B256::from(bytes)` and `B256::from_slice(EMPTY_APP_DATA_HASH.as_slice())`. Coverage stays identical.
- Call sites in twap-monitor and ethflow-watcher drop the `.0` reach-through and pass `&order.appData` directly.

No public surface beyond `shepherd-sdk` consumes this function; the module crates elsewhere in the workspace are its only consumers, and both land in the same commit.
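The rubric's newtype rule can be illustrated without pulling in `alloy_primitives`. In the sketch below, `AppDataHash` is a hypothetical local stand-in for `B256` (the real type already provides conversions and hex formatting), and this `encode_hex` is an assumed shape for the internal helper: lower-case hex with a `0x` prefix. The point is that a function taking `&AppDataHash` cannot be handed an arbitrary 32-byte array by accident.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `B256`: a 32-byte digest behind a newtype, so a
/// raw `[u8; 32]` cannot be passed where an app-data hash is expected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AppDataHash([u8; 32]);

impl From<[u8; 32]> for AppDataHash {
    fn from(bytes: [u8; 32]) -> Self {
        Self(bytes)
    }
}

/// Lower-case, 0x-prefixed hex encoding of the digest (66 characters total).
fn encode_hex(hash: &AppDataHash) -> String {
    let mut out = String::with_capacity(2 + 64);
    out.push_str("0x");
    for byte in &hash.0 {
        write!(out, "{byte:02x}").expect("writing to a String cannot fail");
    }
    out
}
```

For example, `encode_hex(&AppDataHash::from([0u8; 32]))` yields `0x` followed by 64 zeros; call sites that already hold the newtype pass it by reference with no `.0` reach-through.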

Audit reference: milestone-rubric-grant-audit-2026-06-25.md, duplication finding "Canonical CoW chain set [Mainnet, Gnosis, Sepolia, ArbitrumOne, Base]", duplicated at `crates/nexum-engine/src/host/cow_orderbook.rs:39-43` and `:66-70`. `from_config` was added in the M4 multi-chain pass and reproduced the same 5-element array that `Default::default` already used, so adding a sixth chain meant touching both arrays in lock-step. Pull the list into a single `const DEFAULT_CHAINS: &[Chain]` so the single-source-of-truth property is structural. Also drops the redundant `use cowprotocol::OrderBookApi;` inside `from_config` (already in scope from the module-level `use cowprotocol::{Chain, OrderBookApi, ...}` line). Behaviour identical.
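A minimal sketch of the single-source-of-truth shape, assuming both constructors only need to iterate the chain list. `Chain` here is a hypothetical local enum mirroring the five variants named in the finding (the real one comes from `cowprotocol`), and `OrderbookClients` with its `(Chain, String)` pairs and placeholder URLs is invented for illustration; the real type holds `OrderBookApi` clients. Having `Default` delegate to `from_config` is one possible arrangement, not necessarily the one in the commit.

```rust
/// Hypothetical stand-in for `cowprotocol::Chain` (only the five canonical chains).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Chain {
    Mainnet,
    Gnosis,
    Sepolia,
    ArbitrumOne,
    Base,
}

/// Single source of truth for the canonical CoW chain set.
const DEFAULT_CHAINS: &[Chain] = &[
    Chain::Mainnet,
    Chain::Gnosis,
    Chain::Sepolia,
    Chain::ArbitrumOne,
    Chain::Base,
];

/// Hypothetical per-chain client registry: (chain, base URL) pairs.
#[derive(Debug)]
struct OrderbookClients {
    clients: Vec<(Chain, String)>,
}

impl OrderbookClients {
    /// Builds one client entry per canonical chain, with a caller-supplied URL.
    fn from_config(base_url_for: impl Fn(Chain) -> String) -> Self {
        let clients = DEFAULT_CHAINS.iter().map(|&c| (c, base_url_for(c))).collect();
        Self { clients }
    }
}

impl Default for OrderbookClients {
    /// Uses the same constant (via `from_config`), so the lists cannot drift.
    fn default() -> Self {
        Self::from_config(|c| format!("https://orderbook.example/{c:?}"))
    }
}
```

Adding a sixth chain then touches only `DEFAULT_CHAINS`; neither constructor carries its own copy of the list.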

Audit reference: milestone-rubric-grant-audit-2026-06-25.md, Major #6. The rubric forbids em-dashes in operator-facing config files; while .toml is technically a grey zone, the comment surfaces verbatim when operators `cat engine.e2e.toml` during e2e runbook execution.
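A check along the following lines could keep the rule enforced for config comments. It is a hypothetical helper, not something the repository ships: it scans text for U+2014 (written as an escape so the source itself stays ASCII) and returns 1-based line numbers, which a CI step could report and fail on.

```rust
/// U+2014 EM DASH, written as an escape so this source stays ASCII.
const EM_DASH: char = '\u{2014}';

/// Returns the 1-based line numbers of `text` that contain an em-dash.
fn em_dash_lines(text: &str) -> Vec<usize> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| line.contains(EM_DASH))
        .map(|(i, _)| i + 1)
        .collect()
}
```

Run against the contents of a file such as `engine.e2e.toml`, an empty result means the file is clean.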

brunota20 · 2026-06-26T00:04:49Z

Audit judgment-call pass complete on top of #20. New tip: aef680a8897a8213cb696953e3781e945a5973ac (was 0b946ea3...).

Changes layered on top by the bleu/nullis-shepherd audit pass (M4 layer):

JC1/JC2/JC3/JC5 cascade from M2 + M3: the workspace deps, clap CLI surface, and shepherd_sdk::address consolidation flow through unchanged. The Cargo.toml conflict on rebase (M4 introduced tracing-subscriber with the json feature for the COW-1035 JSON formatter contract) was resolved by adding json to the workspace tracing-subscriber feature list, so every crate inherits it via workspace = true.
JC2 propagation into M4 cli.rs: M4's --pretty-logs flag was originally added as a hand-rolled arg parser. Replaced with a #[arg(long = "pretty-logs")] pub pretty_logs: bool field on the clap-derive Cli struct (matches the JC2 migration). No behaviour change.

All 4 gates (fmt, clippy --workspace --all-targets --all-features -D warnings, test --workspace --all-features, RUSTDOCFLAGS="-D warnings" doc) are green on the new tip. No upstream commits were amended; the JC changes land as Cargo.toml + cli.rs conflict resolutions on the existing M4 commits.

PR head dev/m4-base updated directly; no separate alias branch needed.

brunota20 force-pushed the dev/m4-base branch from ced9132 to 20e5df6 on June 25, 2026 19:15

brunota20 force-pushed the dev/m4-base branch from 20e5df6 to 2eed4fe on June 25, 2026 21:17

brunota20 force-pushed the dev/m4-base branch from 2eed4fe to 0b946ea on June 25, 2026 22:32

brunota20 added 20 commits June 25, 2026 20:10

runtime: multi-module supervisor + block/log event loop

be7a3b1

feat(supervisor): apply ADR-0001/0003/0005/0016 and trap-based module death (BLEU-813-817)

9ebbeea

feat(supervisor): add fuel + memory limits per module store (BLEU-818)

473c95f

docs: rename nexum.toml -> module.toml in example, justfile, and README (BLEU-820)

ad3d798

test: fill host backend test gaps - manifest parsing, cow-api, provider-pool, supervisor (BLEU-821)

62c5811

test: E2E supervisor tests + fix wit_import_to_cap to skip type-only interfaces (BLEU-819)

881965d

style: apply rust-idiomatic rules (em-dashes, #[from] Orderbook, unused_crate_dependencies, drop redundant map_err)

7d1c0b6

chore(deps): patch cowprotocol to bleu/cow-rs main (post-alpha.3)

8c848dd

docs(adr): add 0001-0007 capturing engine and CoW architecture decisions

edbafca

docs(adr): unwrap hard-wrapped paragraphs to single line each

c21378e

docs(adr): revise CoW design and reorder ADRs (0001-0008)

e5010c4

brunota20 and others added 27 commits June 25, 2026 20:55

chore(rust-idiomatic): M4 compliance pass (blockers + majors) (#66)

bdd822d

Squash of PR #66 - applies 5 blockers + 8 majors from M4 audit.

brunota20 force-pushed the dev/m4-base branch from 0b946ea to aef680a on June 25, 2026 23:57

brunota20 mentioned this pull request Jun 26, 2026

M3 epic: SDK + examples + tutorial + QA validation #23

Draft


M4 epic: production hardening + E2E + load testing #20

brunota20 wants to merge 108 commits into nullislabs:main from bleu:dev/m4-base
