M3 epic: SDK + examples + tutorial + QA validation by brunota20 · Pull Request #18 · nullislabs/shepherd

brunota20 · 2026-06-18T13:00:13Z

M3 epic — consolidated for review

This PR aggregates the M3 deliverable. M3 ships the layered SDK + developer experience that lets a module author write strategy logic against &impl Host (testable without wasm) while the production wit-bindgen adapter ships as mechanical glue.

Core deliverable

Crate / module	What it adds
`crates/shepherd-sdk`	4 per-capability host traits (`ChainHost`, `LocalStoreHost`, `CowApiHost`, `LoggingHost`) + supertrait `Host`; SDK-side `HostError` mirroring the wit struct; helpers in `chain` (`eth_call_params`, `parse_eth_call_result`, `decode_revert_hex`) and `cow` (`PollOutcome`, `RetryAction`, `classify_api_error`, `gpv2_to_order_data`, `decode_revert`, `IConditionalOrder` sol! interface).
`crates/shepherd-sdk-test`	`MockHost` with per-trait mocks (`MockChain`, `MockLocalStore`, `MockCowApi`, `MockLogging`) — enables module unit tests that run as native Rust, no wasm toolchain.
`modules/examples/price-alert`	Chainlink oracle reader. Demonstrates `chain::request` + ABI decode + threshold logic.
`modules/examples/balance-tracker`	ERC-20 balance differ. Demonstrates raw `chain::request` + per-key `local-store` persistence.
`modules/examples/stop-loss`	Full M3 surface: oracle read + `OrderCreation` with `Signature::PreSign` + cow-api submit + typed retry classification.
`docs/tutorial-first-module.md`	Reads as a guided tour of the real stop-loss module instead of inlined snippets with `todo!()`.
Strategy / lib.rs split	M2 modules (twap-monitor, ethflow-watcher) refactored to consume the Host trait pattern + SDK helpers (ADR-0009).

Note on diff scope

Same caveat as the M2 epic: nullislabs:main is the pre-M1 baseline. Until your in-flight M1 PRs merge, this diff also includes M1 (#8/#9/#12/#15) + M2 (#17) + M3. Once M2 epic (#17) merges and your M1 PRs land, this rebases clean to M3-only.

To focus the M3 review, the M3-specific paths are:

crates/shepherd-sdk/ and crates/shepherd-sdk-test/
modules/examples/ (3 example modules)
modules/twap-monitor/src/strategy.rs and modules/ethflow-watcher/src/strategy.rs (refactor to consume SDK)
docs/05-sdk-design.md (M3 implementation status callout added at top)
docs/adr/0009-host-trait-surface.md (new)
docs/operations/m{2,3}-testnet-runbook.md + m3-edge-case-validation.md (new)
docs/qa-signoff-cow-1063.md (new, captures the QA pass)
Small CI hardening: .github/workflows/ci.yml matrix + rustdoc gate
Two supervisor bug fixes the M3 testnet wiring surfaced (see below)

Architectural review request

This is the surface you flagged for explicit review: "areas that touch on architecture (specifically the host module architecture) I would like input / review on."

ADR-0009 captures three coupled decisions:

Four per-capability traits + supertrait Host with blanket impl. Lets strategy code be <H: Host> generic; tests inject MockHost, production injects WitBindgenHost.
SDK-side HostError mirroring the wit struct field-for-field, bridged via per-module From impls. Keeps shepherd-sdk-test world-neutral so mocks compile without a wasm toolchain.
Per-module strategy.rs + lib.rs split: strategy is pure logic; lib.rs is the wit-bindgen + WitBindgenHost adapter + Guest impl.
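
As a rough illustration of the first decision, here is a minimal sketch of per-capability traits bundled by a supertrait with a blanket impl. The trait names come from this PR; the method signatures are trimmed and simplified, and `ToyHost`, the two-field `HostError`, and the `on_block` body are hypothetical stand-ins, not the SDK's actual definitions.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Simplified stand-in for the SDK-side error (the real struct mirrors the WIT fields).
#[derive(Debug, Clone, PartialEq)]
pub struct HostError {
    pub domain: String,
    pub message: String,
}

// One trait per capability. Signatures are simplified assumptions.
pub trait ChainHost {
    fn request(&self, chain_id: u64, method: &str, params: &str) -> Result<String, HostError>;
}
pub trait LocalStoreHost {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, HostError>;
    fn set(&self, key: &str, value: &[u8]) -> Result<(), HostError>;
}
pub trait CowApiHost {
    fn submit_order(&self, chain_id: u64, body: &[u8]) -> Result<String, HostError>;
}
pub trait LoggingHost {
    fn log(&self, message: &str);
}

/// Supertrait bundling all four capabilities.
pub trait Host: ChainHost + LocalStoreHost + CowApiHost + LoggingHost {}

/// Blanket impl: any type implementing the four traits is automatically a `Host`.
impl<T: ChainHost + LocalStoreHost + CowApiHost + LoggingHost> Host for T {}

/// Strategy code is generic over `Host`, so it never names wit-bindgen types.
pub fn on_block<H: Host>(host: &H, chain_id: u64) -> Result<(), HostError> {
    let block = host.request(chain_id, "eth_blockNumber", "[]")?;
    host.set("last_block", block.as_bytes())?;
    host.log(&format!("saw block {block}"));
    Ok(())
}

/// Hypothetical in-memory host used only for this sketch.
#[derive(Default)]
pub struct ToyHost {
    pub store: RefCell<HashMap<String, Vec<u8>>>,
    pub logs: RefCell<Vec<String>>,
}

impl ChainHost for ToyHost {
    fn request(&self, _chain_id: u64, _method: &str, _params: &str) -> Result<String, HostError> {
        // Always answers with a fixed JSON string; a real mock would be programmable.
        Ok("\"0x10\"".to_string())
    }
}
impl LocalStoreHost for ToyHost {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, HostError> {
        Ok(self.store.borrow().get(key).cloned())
    }
    fn set(&self, key: &str, value: &[u8]) -> Result<(), HostError> {
        self.store.borrow_mut().insert(key.to_string(), value.to_vec());
        Ok(())
    }
}
impl CowApiHost for ToyHost {
    fn submit_order(&self, _chain_id: u64, _body: &[u8]) -> Result<String, HostError> {
        Err(HostError { domain: "cow-api".into(), message: "not programmed".into() })
    }
}
impl LoggingHost for ToyHost {
    fn log(&self, message: &str) {
        self.logs.borrow_mut().push(message.to_string());
    }
}
```

Because `on_block` only asks for `H: Host`, the same function body can be driven by a test double natively and by the wit-bindgen adapter inside the component.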

The "Considered options" section explicitly rejects: single fat Host trait, proc-macro-now (deferred), re-exporting wit-bindgen HostError. The WitBindgenHost adapter is ~150 lines of mechanical glue per module — acknowledged duplication, candidate for a future #[nexum::module] proc macro described in docs/05-sdk-design.md.

docs/05-sdk-design.md carries a "Current implementation status (M3)" callout at the top distinguishing what shipped vs the forward-looking vision. Your call on whether to (a) trim the doc to M3 reality, or (b) keep the rest as roadmap.

Bugs surfaced + fixed during testnet wiring

Two M1-tail bugs the M3 testnet runbook exposed (live on Sepolia, both fixed in this epic):

runtime/event_loop.rs: select_all over an empty Vec yielded None immediately, tripping the "stream ended -> shut down" arm before any event flowed. Block-only manifests (all M3 example modules) bailed the engine. Fix: substitute stream::pending() when the Vec is empty. Regression test in supervisor::tests::run_does_not_bail_when_both_stream_kinds_are_empty.
Supervisor::load: init-failed modules stayed alive = true and received every block dispatch, wasting fuel on a no-op. Fix: flip alive = false when init returns Err. Boot log changes from count=N to loaded=N alive=M. Regression test in supervisor::tests::init_failure_marks_module_dead_and_excludes_from_dispatch.
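
The second fix can be sketched in isolation as follows. This is a simplified model rather than the engine's code: `Module`, the `inits` argument, and `boot_summary` are hypothetical names, and the guest's `init` call is represented by a ready-made `Result`.

```rust
/// Hypothetical module handle for this sketch.
pub struct Module {
    pub name: String,
    pub alive: bool,
    pub dispatched: u32,
}

pub struct Supervisor {
    pub modules: Vec<Module>,
}

impl Supervisor {
    /// Load modules; each tuple carries the module name and the outcome of its init.
    /// A module whose init returned `Err` is kept in the list but marked dead.
    pub fn load(inits: Vec<(&str, Result<(), String>)>) -> Self {
        let modules = inits
            .into_iter()
            .map(|(name, init)| Module {
                name: name.to_string(),
                alive: init.is_ok(),
                dispatched: 0,
            })
            .collect();
        Supervisor { modules }
    }

    /// Boot log line in the `loaded=N alive=M` shape described above.
    pub fn boot_summary(&self) -> String {
        let alive = self.modules.iter().filter(|m| m.alive).count();
        format!("loaded={} alive={}", self.modules.len(), alive)
    }

    /// Block dispatch skips dead modules, so no fuel is spent on them.
    pub fn dispatch_block(&mut self) {
        for module in self.modules.iter_mut().filter(|m| m.alive) {
            module.dispatched += 1;
        }
    }
}
```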

Both fixes are small and clearly scoped; happy to split them into separate PRs if you'd prefer.

Validation

Unit tests: 151 host tests + 6 doctests passing.
Supervisor integration tests: 5 module-specific + 2 regression tests cover the wit-bindgen + WitBindgenHost + supervisor dispatch path for every production module.
Live testnet (Sepolia): docs/operations/m3-testnet-runbook.md walks 3 modules end-to-end on Sepolia in ~10s wall clock. docs/operations/m3-edge-case-validation.md runs 5 error-path scenarios (bad RPC, bad oracle, capability mismatch, malformed config, cross-restart persistence) all passing.
cargo clippy --all-targets --workspace -- -D warnings clean.
cargo fmt --all --check clean.
cargo doc --workspace --no-deps -D warnings clean (CI gate added).
0 em-dashes in crates/, modules/, docs/ (rust-idiomatic rubric enforced).
WASM builds for all 5 modules under wasm32-wasip2 --release (CI matrix).

Considered but deferred

docs/05-sdk-design.md describes a richer SDK (#[nexum::module] proc macro, full alloy Provider via HostTransport, TypedState postcard helpers, Signer, typed Cow client with raw_request, nexum-sdk + shepherd-sdk crate split). M3 shipped the thinner Host-trait + helpers + MockHost surface; the rest is forward-looking work. The status callout in docs/05-sdk-design.md makes this explicit so the doc is not misread as API reference for code that does not exist yet.

Closes BLEU-825, BLEU-826, BLEU-827, BLEU-828, BLEU-829, BLEU-830, BLEU-831, BLEU-832, BLEU-833, BLEU-835, BLEU-836, BLEU-840, BLEU-841, BLEU-843, BLEU-844, BLEU-846, BLEU-847, BLEU-848, BLEU-851, BLEU-852, BLEU-854, BLEU-855, COW-1063, COW-1066, COW-1067, COW-1068, COW-1069, COW-1070.

Linear milestone: M3 - SDK + Developer Experience. Companion: #17 (M2).

Adds the dependencies the 0.2 host backends need:

- cowprotocol (1.0.0-alpha) for the cow-api submission path (OrderBookApi, OrderCreation, OrderUid, Chain).
- alloy-provider / -rpc-client / -transport-ws / -primitives (1.5) for the chain JSON-RPC dispatch. The reqwest feature on alloy-provider engages connect_http; the pubsub/ws features back eth_subscribe-class methods.
- redb (2) for local-store. Same crate cowprotocol's own watch-tower picked, so the dep tree does not bifurcate when both are used in the same workspace.
- reqwest (0.12, rustls-tls), declared directly so the import survives any future cowprotocol feature rearrangement.
- tracing + tracing-subscriber (env-filter + fmt), replacing the 0.1 eprintln! debug log so the engine can drop into a structured log pipeline without re-instrumenting every host call.
- thiserror (2) for typed error enums in each backend.
- tempfile + wiremock as dev-deps for the host backend tests.

Adds engine.example.toml documenting the [engine] state_dir + per-chain RPC URLs the chain backend reads at boot; data/ is now ignored so a local run does not leave the redb file in tree.

Replaces the 0.2 Unsupported stubs with working backends. Each capability lives in its own host submodule so the trait impls in main.rs stay thin (dispatch + project the backend's typed error onto HostError).

cow_api::submit_order
- Parses the guest's bytes as JSON cowprotocol::OrderCreation.
- Dispatches via cowprotocol::OrderBookApi::post_order.
- Returns the assigned OrderUid as a 0x-prefixed hex string.

cow_api::request
- REST passthrough. The base URL is whichever URL the pool's OrderBookApi client carries, so OrderBookApi::new_with_base_url overrides (staging, wiremock) flow through transparently.
- Method/path validated host-side; orderbook 4xx/5xx bodies are surfaced verbatim so the guest can decode {errorType,description}.

chain::request
- Raw JSON-RPC dispatch over an alloy DynProvider opened from engine.toml at boot. WebSocket URLs engage pubsub (eth_subscribe); HTTP URLs use the HTTP transport. Params are passed as serde_json::RawValue so alloy does not re-encode.
- request-batch falls back to per-call dispatch (same shape as the earlier stub but now backed by real RPC).

local_store
- redb file under engine_config.engine.state_dir.
- Single shared table. Per-module namespacing is enforced host-side via a [len:u8][module_name][raw_key] prefix on every key. list_keys strips the prefix before returning to the guest.

logging
- Routes through tracing::event! tagged with module=<namespace>.
- Engine boot installs an EnvFilter-based subscriber; RUST_LOG overrides the engine.toml log_level.

identity / remote-store / messaging / http stay at Unsupported per the 0.2 roadmap (keystore / Swarm / Waku land in 0.3).

Tests (14, all green):
- cow_orderbook: pool default chains, unknown-chain typing, REST GET passthrough, relative-path resolution, unknown-method rejection, submit_order round-trip; the last three run under wiremock so the full HTTP path is exercised without hitting api.cow.fi.
- provider_pool: empty pool surfaces UnknownChain.
- local_store: roundtrip, namespace isolation, delete, list_keys prefix-stripping, empty-namespace rejection.

End-to-end against modules/example: example.wasm loads under the new wiring, logs init + on_event through the tracing pipeline.
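
The local_store key layout `[len:u8][module_name][raw_key]` can be sketched as below. The function names are hypothetical and the rejection of module names longer than 255 bytes is an assumption that follows from the one-byte length field; only the layout itself and the empty-namespace rejection come from the commit message.

```rust
/// Build `[len:u8][module_name][raw_key]`.
/// Returns `None` for an empty namespace or (assumption) a name that does not fit in a u8 length.
pub fn namespaced_key(module: &str, raw_key: &[u8]) -> Option<Vec<u8>> {
    if module.is_empty() || module.len() > usize::from(u8::MAX) {
        return None;
    }
    let mut out = Vec::with_capacity(1 + module.len() + raw_key.len());
    out.push(module.len() as u8); // safe: length checked above
    out.extend_from_slice(module.as_bytes());
    out.extend_from_slice(raw_key);
    Some(out)
}

/// Inverse used when listing keys: returns the guest-visible key only if the
/// stored key belongs to `module`.
pub fn strip_namespace<'a>(module: &str, stored: &'a [u8]) -> Option<&'a [u8]> {
    let (&len, rest) = stored.split_first()?;
    if usize::from(len) != module.len() {
        return None;
    }
    rest.strip_prefix(module.as_bytes())
}
```

The length byte is what keeps namespaces from colliding: without it, module `a` with key `bc` and module `ab` with key `c` would both be stored as `abc`.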

… death (BLEU-813-817)

…ME (BLEU-820)

…er-pool, supervisor (BLEU-821)

…interfaces (BLEU-819)

…ed_crate_dependencies, drop redundant map_err)

PR #9 specific:
- main: warn + return when block/log streams end (WebSocket dropped)
- supervisor: simplify dispatch_block by extracting chain_id before move
- supervisor: temp_local_store returns (TempDir, LocalStore) instead of leaking
- README: correct engine.toml chain syntax to [chains.<id>] with rpc_url

Rebased from PR #8:
- local_store_redb: table.range() instead of iter() for O(matching) keys
- provider_pool: dedupe method clone on the success path
- main: hex_encode writes into the pre-allocated buffer
- cow_orderbook: drop blank line nit
- manifest: collapse nested if and use ? operator (clippy)
- alloy_rpc_client / alloy_transport(_ws) imports as _ to satisfy unused_crate_dependencies.
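
For the hex_encode item, a minimal sketch of encoding into a buffer sized once up front might look like this. The signature and the `0x` prefix are assumptions made for illustration; the engine's actual helper may differ.

```rust
/// Hex-encode `bytes` with a `0x` prefix (assumed), writing into a String
/// whose capacity is reserved once so the loop never reallocates.
pub fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(2 + bytes.len() * 2);
    out.push_str("0x");
    for &byte in bytes {
        out.push(DIGITS[usize::from(byte >> 4)] as char);
        out.push(DIGITS[usize::from(byte & 0x0f)] as char);
    }
    out
}
```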

Move the manifest.rs monolith into a directory module with four focused submodules (types, load, capabilities, error). Includes the Subscription enum and the four PR #9 tests for subscription parsing. Behaviour unchanged - pure code motion.

main.rs went from 739 lines of mixed bootstrap + 8 Host trait impls + CLI parser + event loop to ~125 lines of pure orchestration. New layout:

- bindings.rs: wasmtime::component::bindgen!() moved out so other modules can name the generated types.
- cli.rs: Cli struct + manual parser.
- host/state.rs: HostState + WasiView impl.
- host/error.rs: unimplemented / internal_error / hex_encode helpers.
- host/impls/{chain,cow_api,identity,local_store,remote_store,messaging,logging,clock,random,http,types}.rs: one Host trait impl per file.
- runtime/limits.rs: DEFAULT_FUEL_PER_EVENT + DEFAULT_MEMORY_LIMIT.
- runtime/event_loop.rs: open_block_streams, open_log_streams, run, wait_for_shutdown_signal, TaggedBlockStream, TaggedLogStream.

Adding a new capability is now a single new file under host/impls/ rather than a 60-80 line diff in main.rs.

local_store_redb.rs was 89% tests, cow_orderbook.rs was 60%, and supervisor.rs was 32% (205 lines absolute). Promote each to a directory module with the test suite living in a sibling tests.rs so impl-side diffs stop competing with test churn for attention.

Carries PR #8 (host backends) + PR #9 (supervisor) + cowprotocol patch. Open upstream: nullislabs#15.

Open upstream: nullislabs#12. Resolved .gitignore by taking the PR #12 additions (.agents/, .claude/, skills-lock.json) plus PR #15's data/. # Conflicts: # .gitignore

…LEU-823 in)

Per ADR-0001 (module.toml schema), authored for the two M2 modules:

twap-monitor / module.toml
- capabilities.required = ["logging", "local-store", "chain", "cow-api"], matching the Rust imports the BLEU-826/827/828 paths exercise.
- [[subscription]] log on Sepolia (chain_id 11155111) against ComposableCoW (0xfdaFc9d1902f4e0b84f65F49f244b32b31013b74) with topic-0 keccak256("ConditionalOrderCreated(address,(address,bytes32,bytes))") = 0x2cceac5555b0ca45a3744ced542f54b56ad2eb45e521962372eef212a2cbf361.
- [[subscription]] block on Sepolia for the BLEU-827 poll loop.

ethflow-watcher / module.toml
- Same capability set (chain reserved for a future eth_call, e.g. reading the EthFlow refund pointer, without churning the manifest).
- [[subscription]] log on Sepolia against CoWSwapEthFlow production (0xbA3cB449bD2B4ADddBc894D8697F5170800EAdeC) with topic-0 keccak256("OrderPlacement(address,(address,address,address,uint256,uint256,uint32,bytes32,uint256,bytes32,bool,bytes32,bytes32),(uint8,bytes),bytes)") = 0xcf5f9de2984132265203b5c335b25727702ca77262ff622e136baa7362bf1da9.

Both [capabilities.http].allow stay empty: all outbound HTTP flows through the cow-api capability, which routes via the host's pinned orderbook URL. The content hash field is the 0.2 placeholder all-zero sha256; 0.3 will validate it against the loaded component bytes.

Linear: BLEU-834. Ref ADR-0001.

brunota20 · 2026-06-24T12:40:12Z

Heads up: bleu:fix/supervisor-alive-on-init-err advanced from 9e76602 to 841a359 to include the BLEU-836 deployment-runbook commit that landed in the bleu-fork dev/m3-base via PR #17. The change is purely additive on top of the previous head (the M3 compliance pass) - no rebase, no rewriting.

Diff vs prior head: +1 commit (docs(deployment): operator runbook (BLEU-836)), markdown-only, under docs/deployment/.

This brings the M3 epic upstream PR in line with the current bleu-fork dev/m3-base state. From here, the M3 deliverables are: SDK + examples + tutorial + QA validation (original epic) + rust-idiomatic compliance + operator deployment runbook.

brunota20 · 2026-06-24T13:05:36Z

Linear issues delivered by this PR

This M3 epic delivers the following CoW project tickets (renamed from BLEU- prefix; same underlying tickets):

COW-1048 (BLEU-840): Extract shared helpers from twap-monitor + ethflow-watcher into shepherd-sdk
COW-1046 (BLEU-841): Mock host crate (shepherd-sdk-test)
COW-1047 (BLEU-843): Refactor twap-monitor + ethflow-watcher to consume shepherd-sdk
COW-1045 (BLEU-844): SDK API reference docs (rustdoc + landing)
COW-1043 (BLEU-846): Example module - price-alert (Chainlink oracle reader)
COW-1042 (BLEU-847): Example module - balance-tracker
COW-1041 (BLEU-848): Tutorial - "Build your first Shepherd module"
COW-1040 (BLEU-851): Refactor price-alert to Host trait + MockHost
COW-1039 (BLEU-852): Ship modules/examples/stop-loss + rework tutorial
COW-1038 (BLEU-854): Refactor twap-monitor to Host trait + MockHost tests
COW-1037 (BLEU-855): Refactor ethflow-watcher to Host trait + MockHost tests

All 11 tickets above have already been transitioned to Done in Linear (the original feature PRs landed via dev/m3-base advance + bleu epic #73; this upstream PR delivers the same work to nullislabs/shepherd:main).

brunota20 · 2026-06-24T13:27:25Z

Heads up on an M3 design call before you finish reviewing upstream PR #18. We shipped a host-trait seam (ADR-0009) instead of the proc macros the original grant text promised. The two aren't competing - host-trait is the layer macros sit on top of - and we landed the substrate first so the testing-framework deliverable would actually be useful.

What shipped

shepherd_sdk::host::Host supertrait (ChainHost + LocalStoreHost + CowApiHost + LoggingHost) + MockHost in shepherd-sdk-test. Each module splits into strategy.rs (generic over &impl Host) and lib.rs (wit-bindgen + WitBindgenHost adapter + Guest delegating to strategy).

Why trait-first

~190 unit tests exercise strategy logic via MockHost, no wasm32-wasip2 build in the inner loop. Macros-first would route every meaningful test through wit-bindgen + wasmtime.
Plain Rust at the boundary - normal diagnostics, full IDE, no macro-expansion rabbit holes.
The macros stay additive. #[nexum::module] / #[on_block] / #[on_logs] can land later as sugar emitting the WitBindgenHost adapter + Guest impl, without touching strategy code. Tracked as Linear "M5 - SDK Ergonomics: proc macro + Provider + Signer", framed as the deferred M5+ vision from docs/05-sdk-design.md.

Cost

~30-40 lines of mechanical boilerplate per module vs the ~5 lines a #[nexum::module] macro would emit. Same shape in all 5 production modules - grep WitBindgenHost to see it. Cheap to delete once the macro lands.

Where to look

ADR-0009 (docs/adr/0009-host-trait-surface.md): considered alternatives + explicit deferred-macro decision.
Trait surface: crates/shepherd-sdk/src/host.rs on PR #18 head (bleu:fix/supervisor-alive-on-init-err @ 841a359).
M3 epic squash on the bleu fork: bleu/nullis-shepherd#73 (on bleu:main as bc9a462).

Ask

If the trade-off looks right, M3 ships as-is and the macro layer becomes the next milestone (post-grant follow-up or scoped extension - your call). If you'd rather we add the macros now against M3's original text, flag it on PR #18 and we'll line up the work + scope conversation.

Three threads from the internal review mirror of upstream nullislabs/shepherd PR #17:

1. ethflow-watcher/module.toml capabilities: move `chain` from required to optional. The comment on the original manifest already said the module does not call `chain` today; declaring it as required widened the grant for a capability the module does not exercise. Optional keeps "future-proofing" (BLEU-855 can use it without manifest churn) without violating least-privilege.

2. ethflow-watcher/module.toml subscription comment: soften the "identical on every chain" claim. cow-rs::ETH_FLOW_PRODUCTION is identical across chains today, but unlike ComposableCoW's CREATE2 address, EthFlow has had multiple per-network and per-version deployments historically. Multi-chain config in M5 must re-check per `chain_id` instead of assuming the address carries. The address itself stays unchanged: 0xbA3cB449bD2B4ADddBc894D8697F5170800EAdeC is verified against the live Sepolia deployment (event firing observed in the COW-1064 dry-run on 2026-06-18 + cow-rs canonical constant + multiple load-test runs).

3. README.md module manifest example: the documented `address` field said `0xC92E8bdf79f0507f65a392b0ab4667716BFE0110` labeled "ComposableCoW", but that is the GPv2VaultRelayer (per scripts/lib.sh). ComposableCoW is `0xfdaFc9d1902f4e0b84f65F49f244b32b31013b74`. Fixed the address; expanded the comment to clarify it is the canonical CREATE2 address (same on every supported chain).

Stays on `feat/m2-module-manifests-bleu-834` as a stacked branch so upstream PR #17 + internal mirror PR #54 can see the fixes as a separate, atomic commit.

…ance) Filtered subset of the compliance applied in PRs #66/#67 of bleu/nullis-shepherd, restricted to files that exist on the M2 epic head. M3+ files (shepherd-sdk, examples, backtest, deploy artifacts) and M4-coupled hunks (ProviderError typed-source variants, JoinSet reconnect tasks, supervisor restart helper) are skipped — they land via their own upstream PRs. Brings M2 epic in line with the repo-wide rust rubric (typed errors, no anyhow in libs, em-dash sweep, #[non_exhaustive] on public error enums).

…841)

Two-part deliverable:

1. New `shepherd_sdk::host` module exposing the trait seam between strategy logic and the wit-bindgen shims a module generates per-cdylib:
   - `ChainHost`: request(chain_id, method, params)
   - `LocalStoreHost`: get / set / delete / list_keys
   - `CowApiHost`: submit_order(chain_id, body)
   - `LoggingHost`: log(level, message)
   - `Host`: supertrait bundling all four (blanket impl so callers only need the supertrait bound)

   The traits ride on a host-neutral `HostError` (same field shape as wit-bindgen's), with `HostErrorKind` and `LogLevel` mirroring the WIT enums verbatim. Modules bridge their own wit-bindgen `HostError` to the SDK's with a one-liner `From` impl on each side; the M3 tutorial (BLEU-848) documents the adapter pattern.

2. New `shepherd-sdk-test` crate (dev-only, host-only) supplying in-memory implementations for every trait + assertion helpers:
   - `MockHost { chain, store, cow_api, logging }`
   - `MockChain`: programmable `(method, params)` -> result map; records every call with `chain_id`, `method`, `params`.
   - `MockLocalStore`: HashMap-backed; `list_keys` does a prefix scan (sorted output for stable assertions).
   - `MockCowApi`: single programmable response shared across calls; records each submission's `chain_id` + body bytes; `last_body_as_json` helper for inline assertions.
   - `MockLogging`: buffers all lines with their level; `contains` / `count_at` helpers.

   Unconfigured calls return `HostErrorKind::Unsupported` so an unprogrammed test fails fast instead of silently passing on a default value.

Tests: 8 host tests on `shepherd-sdk-test` + 1 module-level doctest locking the recommended usage pattern. Workspace + wasm32-wasip2 check still clean.

Adoption is opt-in: existing M2 modules keep their pure-function tests for now. BLEU-848 (tutorial) will demonstrate the new strategy-takes-Host pattern with `MockHost` end-to-end.
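
To make the "programmable map, record every call, fail fast when unprogrammed" behaviour concrete, here is a minimal sketch of a chain mock in that style. It is not the `shepherd-sdk-test` source: the `respond` and `calls` method names, the `Call` record, and the two-field `HostError` are assumptions for illustration.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum HostErrorKind {
    Unsupported,
    Internal,
}

/// Simplified stand-in for the SDK error type.
#[derive(Debug, Clone, PartialEq)]
pub struct HostError {
    pub kind: HostErrorKind,
    pub message: String,
}

/// One recorded invocation of `request`.
#[derive(Debug, Clone, PartialEq)]
pub struct Call {
    pub chain_id: u64,
    pub method: String,
    pub params: String,
}

#[derive(Default)]
pub struct MockChain {
    responses: HashMap<(String, String), Result<String, HostError>>,
    calls: RefCell<Vec<Call>>,
}

impl MockChain {
    /// Program the result returned for an exact `(method, params)` pair.
    pub fn respond(&mut self, method: &str, params: &str, result: Result<String, HostError>) {
        self.responses.insert((method.to_string(), params.to_string()), result);
    }

    /// Record the call, then return the programmed result or `Unsupported`.
    pub fn request(&self, chain_id: u64, method: &str, params: &str) -> Result<String, HostError> {
        self.calls.borrow_mut().push(Call {
            chain_id,
            method: method.to_string(),
            params: params.to_string(),
        });
        self.responses
            .get(&(method.to_string(), params.to_string()))
            .cloned()
            .unwrap_or_else(|| {
                Err(HostError {
                    kind: HostErrorKind::Unsupported,
                    message: format!("no response programmed for {method}"),
                })
            })
    }

    /// Snapshot of every call seen so far, in order.
    pub fn calls(&self) -> Vec<Call> {
        self.calls.borrow().clone()
    }
}
```

Returning `Unsupported` rather than a default value means a test that forgot to program a response fails loudly at the first unexpected call.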

- Tightened the crate-root rustdoc on `shepherd-sdk/src/lib.rs`: switched the inline `[Type](path)` link form to top-of-file reference-style link definitions so the rustdoc target is unambiguous and the source stays readable.
- Removed the placeholder `pub mod store {}` (out-of-scope until a second strategy module needs the same key conventions).
- New `crates/shepherd-sdk/README.md` covering: quick tour table, host-free testing recipe with `shepherd-sdk-test`, the no-wit-bindgen-in-SDK rationale, layout map, and how to generate docs with the strict flags.
- New `docs/sdk.md` repo-level landing page that lists the four host capabilities the SDK mirrors and links into the rustdoc per module.

Gate: `cargo doc -p shepherd-sdk -p shepherd-sdk-test --no-deps` runs clean under `RUSTDOCFLAGS="-D warnings -D missing-docs"`. Every public item carries a doc comment; intra-doc links resolve. Tests + clippy unchanged.

New `modules/examples/price-alert/` - first canonical SDK example. A Shepherd module that polls a Chainlink AggregatorV3 price oracle on every block (throttled by `every_n_blocks`) and emits a Warn-level log when the answer crosses a config-supplied threshold.

Demonstrates the three load-bearing patterns of a Shepherd module:
- `chain::request` + ABI decode via `alloy_sol_types` (sol! interface AggregatorV3 declares `latestRoundData`, decode via `abi_decode_returns`).
- shepherd-sdk helpers (`chain::eth_call_params` + `chain::parse_eth_call_result`; the SDK's prelude is *not* used here because the module needs none of the CoW types).
- `[config]` driven behaviour parsed once in `init` and stored in `OnceLock<Settings>` for read-only access on every event.

Module-internal:
- `Settings` (renamed from `Config` to avoid clashing with the wit-bindgen-generated `Config` type alias for the `init` arg).
- `Direction { Above, Below }` deciding which side of the threshold fires.
- `scale_threshold(decimal, decimals)` hand-rolled because alloy does not ship a `Decimal::parse_units`-style helper; handles optional sign, missing decimal point, short / long fractional, rejects non-digit garbage. Locked by 5 unit tests.
- `classify(answer, threshold, direction)` pure 1-liner with 2 edge tests (at-or-above vs. at-or-below behaviour at the boundary).
- `parse_config(entries)` returns `Result<Settings, String>` with human-readable errors; 4 unit tests cover happy path, defaults, unknown direction, missing key.

module.toml:
- `capabilities = ["logging", "chain"]` (no local-store; no cow-api).
- `[[subscription]]` block on Sepolia (chain_id 11155111).
- `[config]` ships defaults pointing at the canonical Sepolia ETH/USD feed with a 2500.00 USD threshold + "below" direction.

11 host tests; clippy clean on host + wasm32-wasip2. .wasm is 206 KB optimised - comparable to the M2 modules (twap 305 KB, ethflow 275 KB) and dominated by alloy-sol-types + wit-bindgen runtime.
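
A hedged sketch of what `scale_threshold` and `classify` could look like follows. It is not the module's source: it uses `i128` instead of an alloy integer type, and it assumes that over-long fractional parts are truncated and that the boundary is inclusive on both sides (reading "at-or-above vs. at-or-below" literally).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    Above,
    Below,
}

/// Convert a decimal string such as "2500.00" into an integer scaled by 10^decimals.
/// Assumption: extra fractional digits beyond `decimals` are truncated, not rounded.
pub fn scale_threshold(decimal: &str, decimals: u32) -> Option<i128> {
    let s = decimal.trim();
    // Optional sign.
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    // A missing decimal point means an empty fractional part.
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    // Reject anything that is not an ASCII digit (also catches a second '.').
    if !int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let width = decimals as usize;
    let kept = &frac_part[..frac_part.len().min(width)];
    let mut digits = String::from(int_part);
    digits.push_str(kept);
    // Pad a short fraction with zeros up to `decimals` places.
    for _ in kept.len()..width {
        digits.push('0');
    }
    if digits.is_empty() {
        digits.push('0');
    }
    let magnitude: i128 = digits.parse().ok()?; // None on overflow
    Some(if negative { -magnitude } else { magnitude })
}

/// True when the oracle answer is on the firing side of the threshold.
/// Assumption: both directions are inclusive at the boundary.
pub fn classify(answer: i128, threshold: i128, direction: Direction) -> bool {
    match direction {
        Direction::Above => answer >= threshold,
        Direction::Below => answer <= threshold,
    }
}
```

With an 8-decimal feed, the shipped default of 2500.00 would scale to 250_000_000_000 under these assumptions.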

New `modules/examples/balance-tracker/` - second canonical SDK example. Subscribes to blocks, reads `eth_getBalance(addr)` for a configured address list, persists each reading under `balance:{addr}` in local-store, and emits a Warn-level log when the delta against the prior reading exceeds `change_threshold` wei.

Demonstrates:
- `chain::request` with a non-`eth_call` method (raw JSON-RPC with hand-built params), to balance the price-alert example's sol! / `eth_call` flow.
- `local-store` `get` / `set` per-key persistence with U256 LE serialisation as the wire format.
- The "diff against last seen" pattern reusable across indexer modules (transfer monitors, allowance trackers, …).

Module-internal:
- `Settings { addresses: Vec<Address>, change_threshold: U256 }` parsed from `[config]` once at `init` and stored in `OnceLock<Settings>`.
- `parse_balance_hex(json)`: strips JSON quotes and the `0x` prefix, decodes the remaining hex into a U256. Handles `"0x"` (zero balance), rejects unquoted / non-hex bodies.
- `parse_addresses(raw)`: comma-separated list with whitespace tolerance and empty-segment skipping; rejects empty lists.
- `abs_diff` + `parse_u256_le` + `u256_to_le_bytes`: pure utilities with edge-case coverage.

module.toml:
- `capabilities = ["logging", "chain", "local-store"]` (the superset that distinguishes this example from price-alert, which only needs chain + logging).
- `[[subscription]]` block on Sepolia (chain_id 11155111).
- `[config]` ships defaults pointing at two anvil-style EOAs and a 0.1 ETH change threshold.

13 host tests; clippy clean on host + wasm32-wasip2. `.wasm` is 99 KB optimised - about half of price-alert's 206 KB because it does not pull `alloy-sol-types` into the link tree (no ABI work; all decoding is hex/U256).
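
The "diff against last seen" flow can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: `u128` stands in for `U256`, and `parse_le` and `exceeds_threshold` are hypothetical names; treating the first reading as "no alert" is also an assumption.

```rust
use std::convert::TryFrom;

/// Parse a JSON-RPC balance result such as "\"0xde0b6b3a7640000\"".
/// `u128` is a stand-in for U256 in this sketch.
pub fn parse_balance_hex(json: &str) -> Option<u128> {
    // The body must be a quoted JSON string.
    let inner = json.trim().strip_prefix('"')?.strip_suffix('"')?;
    let hex = inner.strip_prefix("0x")?;
    if hex.is_empty() {
        return Some(0); // "0x" denotes a zero balance
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u128::from_str_radix(hex, 16).ok()
}

/// Decode a little-endian value previously written to local-store.
pub fn parse_le(bytes: &[u8]) -> Option<u128> {
    let arr = <[u8; 16]>::try_from(bytes).ok()?;
    Some(u128::from_le_bytes(arr))
}

/// True when the balance moved by more than `threshold` since the stored reading.
/// Assumption: with no prior reading there is nothing to diff, so no alert fires.
pub fn exceeds_threshold(previous: Option<&[u8]>, current: u128, threshold: u128) -> bool {
    match previous.and_then(parse_le) {
        Some(prev) => prev.abs_diff(current) > threshold,
        None => false,
    }
}
```

In use, the caller would read `balance:{addr}`, call `exceeds_threshold`, log if it returns true, then write `current.to_le_bytes()` back under the same key.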

End-to-end cold-start guide that takes an external developer from "I cloned the repo" to "I see my module's first event in the engine log" in under four hours.

Scenario: stop-loss order - combines every load-bearing pattern in the SDK (block subscription, chain::request + ABI decode, local-store dedup, cow_api::submit_order, host-free tests via MockHost). The tutorial walks through each pattern via the four worked examples already in the repo (price-alert, balance-tracker, twap-monitor, shepherd-sdk-test) and stitches them into the stop-loss module.

Sections + rough budgets:
0. Prerequisites (15m) - toolchain check; verify the example module runs.
1. Scaffold workspace (15m) - Cargo.toml template + workspace members entry.
2. Manifest (10m) - module.toml with the four capabilities + Sepolia [[subscription]] + [config] schema.
3. Strategy (60m)
   3a. Pure logic - on_block<H: Host>(...) using shepherd-sdk's chain helpers and AggregatorV3 sol! interface.
   3b. Guest adapter - wit_bindgen::generate! + the WitBindgenHost struct that bridges to shepherd_sdk::host (one-time boilerplate per module).
   3c. Unit tests - two MockHost tests: idle-above-trigger + triggers-and-dedups.
4. Build (5m) - cargo build --target wasm32-wasip2 --release + size sanity.
5. Run (10m) - engine.toml WS RPC for Sepolia + cargo run -p nexum-engine.
6. Where to go (10m) - production hardening + real order assembly (twap-monitor cross-ref) + multi-chain.

Pure docs change - no module added (the stop-loss in §3 is the reader's exercise; build_order_body deliberately ends in a `todo!` with a cross-reference to twap-monitor's canonical assembly path).

Worked artefacts referenced in the tutorial are the existing examples landed in #18 / #19 plus shepherd-sdk + shepherd-sdk-test.

Cross-links: docs/sdk.md (BLEU-844), docs/deployment.md (BLEU-836), ADR-0001 / 0006 / 0007.

Acceptance per the issue: the tutorial is reviewer-validatable. Time-budget callout at the end asks for a tag `docs/tutorial` if a section drags, so we tighten on feedback.

QA pass against the team's rust-idiomatic skill ahead of M4. All mandatory rules now hold; the cleanup is mostly mechanical with a handful of small typing improvements where the rule asked for one thiserror enum per error type.

Replaced every U+2014 with " - " across .rs / .toml / .md:
- 51 source-file occurrences
- 5 Cargo.toml comments
- 366 occurrences across docs/*.md (most in ADRs and the deployment / tutorial / sdk landings)

Grep gate: `grep -rn '—' crates/ modules/ docs/` returns 0.

Added to every crate root that previously lacked it:
- crates/shepherd-sdk/src/lib.rs
- crates/shepherd-sdk-test/src/lib.rs
- modules/{example,twap-monitor,ethflow-watcher}/src/lib.rs
- modules/examples/{price-alert,balance-tracker}/src/lib.rs

`crates/nexum-engine/src/main.rs` already had it.

- shepherd-sdk dropped `serde` (only `serde_json` is actually imported; cowprotocol re-exports carry their own serde derive transitively).
- balance-tracker dropped its direct `alloy-primitives` dep - now goes through `shepherd_sdk::prelude::{Address, U256, address}`. Tests adapt.

- `shepherd_sdk::host::HostError` gains `#[derive(thiserror::Error)]` + `#[error("{domain}: {message} (code={code}, kind={kind:?})")]`. Was a plain struct without Display. Added `thiserror = "2"` as a dep.
- `modules/twap-monitor::BuildError`: hand-rolled Display impl replaced with `#[derive(thiserror::Error)]` + per-variant `#[error(...)]` + `#[from] cowprotocol::Error`. The map_err at the call site collapses to `?`.
- `modules/ethflow-watcher::BuildError`: same conversion (4 variants, one of them `#[from]`). Both modules add `thiserror = "2"` as a direct dep.

- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo test --workspace`: 121 tests pass.
- nexum-engine 41, shepherd-sdk 27, shepherd-sdk-test 8 + 1 doctest, twap-monitor 13, ethflow-watcher 7, price-alert 11, balance-tracker 13.

- `#[non_exhaustive]` is *not* applied to public enums (`HostErrorKind`, `LogLevel`, `RetryAction`, `PollOutcome`). The first two mirror the WIT 0.2 enums (locked at the WIT contract layer); the last two are intentional 3- and 5-arm contracts with no expected growth. If a future kind shows up, the rule applies then.
- `parse_config` / `parse_settings` in the example modules return `Result<T, String>` rather than a typed enum. The rule's "no string-wrapping" applies to error variants that *wrap* an upstream `std::error::Error`; one-shot config parsers with bespoke per-field messages are pragmatic. The error surface is internal to the module's `init` and not part of the orderbook retry contract.

Validates the host-trait pattern from the M3 tutorial end-to-end on a real module. The price-alert example now matches the recipe the tutorial recommends:

    modules/examples/price-alert/
    ├── Cargo.toml        adds shepherd-sdk-test as dev-dep
    └── src/
        ├── lib.rs        wit_bindgen::generate! + WitBindgenHost
        │                 adapter + From conversions + Guest impl
        └── strategy.rs   pure logic against `&impl Host` +
                          parse_config + scale_threshold + tests

Strategy logic now takes `&impl shepherd_sdk::host::Host` and never calls `nexum::host::*` free functions directly. The wit-bindgen boilerplate (WitBindgenHost struct, ChainHost / LocalStoreHost / CowApiHost / LoggingHost impls, convert_err / sdk_err_into_wit / convert_level helpers) lives in lib.rs - mechanical and identical across modules, a future declarative macro in shepherd-sdk will elide it.

parse_config now returns `Result<Settings, shepherd_sdk::host::HostError>` instead of `Result<T, String>`. Carrying the SDK error through the strategy / adapter / Guest seam means the same domain / kind / code / message / data fields surface to the operator verbatim.

Tests: 16 (was 11) - all strategy tests now run against shepherd_sdk_test::MockHost rather than calling wit-bindgen directly. The 5 new ones lock the on_block behaviour end-to-end:
- idle when price is on the safe side of the trigger
- triggers below threshold (Direction::Below)
- triggers above threshold (Direction::Above)
- warns + continues on RPC timeout (no propagation into the supervisor)
- warns on undecodable oracle response
- respects `every_n_blocks` throttle

cargo clippy --all-targets --workspace -- -D warnings clean. .wasm 210 KB (was 206 KB; +4 KB for the adapter boilerplate, which deduplicates against shepherd-sdk so future modules add no extra cost).

Closes the loop opened by BLEU-848 (tutorial). The tutorial used to walk through a stop-loss scenario but left `build_order_body` as a `todo!()` cross-referencing twap-monitor. Now:

1. `modules/examples/stop-loss/` ships as a real workspace member, shaped the same way as the price-alert refactor (BLEU-851 / PR #22): pure logic in `strategy.rs` against `&impl Host`, wit-bindgen adapter + Guest impl in `lib.rs`.
2. The strategy is complete - reads a Chainlink oracle, builds an `OrderCreation` with `Signature::PreSign` (owner pre-signs via setPreSignature on-chain ahead of the trigger; module ships zero ECDSA), dedups via `submitted:{uid}`, persists `dropped:{uid}` on permanent submit errors.
3. Tests (7 total) cover the dispatch matrix end-to-end against `shepherd_sdk_test::MockHost`:
   - idle_when_price_above_trigger
   - triggers_and_submits_once_then_dedups
   - permanent_submit_error_marks_dropped (+ dedup on the next block)
   - transient_submit_error_leaves_state_unchanged
   - oracle_rpc_error_is_warn_and_continue
   - parse_config_round_trips_settings
   - parse_config_rejects_missing_owner
4. `docs/tutorial-first-module.md` rewritten as a guided tour instead of inlined snippets. The tutorial now reads the real `modules/examples/stop-loss/` source top-to-bottom and explains *why* each piece is shaped the way it is - sections on the wit-bindgen adapter, the `OrderCreation` assembly with PreSign, the dedup matrix, and the test recipe against MockHost. No more `todo!()`.

Numbers:
- `.wasm` 304 KB optimised (release build).
- 7 host tests passing; clippy clean on host + wasm32-wasip2.
- Tutorial is 449 lines (was 580 with the duplicated inline code); shorter because it points at real files instead of transcribing.

Stacks on PR #22 (price-alert host-trait refactor) so both modules land alongside the wit-bindgen adapter recipe the tutorial documents.
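The dedup matrix in item 2 (`submitted:{uid}` on success, `dropped:{uid}` on permanent errors, nothing on transient errors) can be sketched as follows. This is an illustration under assumptions: the store is modelled as a `HashMap`, and `SubmitResult`, `Action`, and `submit_once` are hypothetical names rather than the module's real types.

```rust
use std::collections::HashMap;

/// Hypothetical result of the orderbook submit call.
pub enum SubmitResult {
    Accepted,
    Transient,
    Permanent,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Submitted,
    SkippedDuplicate,
    Dropped,
    RetryNextBlock,
}

/// Decide what to do for `uid`, using the local store as the dedup ledger.
/// `submit` is only invoked when no marker exists for this uid.
pub fn submit_once(
    store: &mut HashMap<String, Vec<u8>>,
    uid: &str,
    submit: impl FnOnce() -> SubmitResult,
) -> Action {
    let submitted_key = format!("submitted:{uid}");
    let dropped_key = format!("dropped:{uid}");

    // Either marker means this order was already settled one way or another.
    if store.contains_key(&submitted_key) || store.contains_key(&dropped_key) {
        return Action::SkippedDuplicate;
    }

    match submit() {
        SubmitResult::Accepted => {
            store.insert(submitted_key, Vec::new());
            Action::Submitted
        }
        SubmitResult::Permanent => {
            store.insert(dropped_key, Vec::new());
            Action::Dropped
        }
        // Transient errors leave the store untouched so the next block retries.
        SubmitResult::Transient => Action::RetryNextBlock,
    }
}
```

Leaving the store untouched on transient errors is what lets a test like `transient_submit_error_leaves_state_unchanged` assert on state alone.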

Mirrors what BLEU-851 (price-alert) and BLEU-852 (stop-loss) did for the M3 example modules. Closes the parallel M2 gap. Before: the entire dispatch pipeline (indexer / poll / submit / retry / lifecycle) lived in `lib.rs` alongside the wit-bindgen glue, calling `chain::request`, `local_store::*`, `cow_api::submit_order`, and `logging::log` directly. The 13 existing tests covered only parsers and encoders - the state machine itself was unverified in unit. After: 1. `strategy.rs` (new) - pure logic against `shepherd_sdk::host::Host`. Defines `LogView<'a>` and `BlockInfo` so the strategy stays wit-independent; exposes `on_logs` / `on_block` entry points. 2. `lib.rs` (rewritten, 665 -> 165 lines) - wit-bindgen `generate!`, `WitBindgenHost` adapter implementing all four SDK host traits, `Guest` impl that destructures `types::Event` and delegates to `strategy`. 3. Tests against `shepherd_sdk_test::MockHost` (7 new) cover the dispatch matrix that was previously hand-verified only: - `index_records_new_watch_on_conditional_order_created` - `index_overwrites_in_place_on_redelivered_log` (re-org replay guard, BLEU-826 invariant) - `poll_skips_when_next_block_gate_is_in_future` - `poll_ready_submits_order_and_persists_submitted_uid` - `submit_transient_error_leaves_state_unchanged_for_next_block` - `submit_permanent_error_drops_watch` - `poll_dont_try_again_drops_watch_and_gates` (uses a real `OrderNotValid` selector via the SDK-exported sol! interface) 4. All 13 original pure tests preserved unchanged. Total: 20 tests (was 13). Numbers: - `.wasm` 313,926 bytes (release wasm32-wasip2). - 20 tests passing; clippy clean on host + wasm32-wasip2. - 0 em-dashes in the module tree. Stacks on PR #23 (BLEU-852) so reviewers can compare strategy / lib.rs split side-by-side with the price-alert and stop-loss references.

…855) Same shape as BLEU-854 (twap-monitor / PR #24). Closes the M2-side gap on ethflow-watcher. Before: `submit_placement`, `prior_outcome`, `apply_submit_retry`, and the `submitted:` / `dropped:` / `backoff:` bookkeeping called `local_store::*` and `cow_api::submit_order` directly, with all the state-machine bits unverified in unit (only 7 decoder / encoder tests). After: 1. `strategy.rs` (new) - pure logic against `shepherd_sdk::host::Host`. `LogView<'a>` keeps the strategy wit- independent; `on_logs` is the entry point. 2. `lib.rs` (rewritten, 427 -> 157 lines) - wit-bindgen `generate!`, `WitBindgenHost` adapter, `Guest` impl that destructures `types::Event::Logs` into `LogView`s and delegates to `strategy::on_logs`. 3. Tests against `shepherd_sdk_test::MockHost` (5 new) cover the dispatch + idempotency matrix: - `placement_log_submits_order_and_persists_submitted_uid` - `redelivered_placement_is_skipped_via_submitted_uid_dedup` (PR #10 / commit c5e4d7d regression guard) - `submit_transient_error_writes_backoff_marker_and_returns` - `submit_permanent_error_persists_dropped_uid_and_clears_backoff` - `eip1271_signature_shape_round_trips_through_submit_body` (decodes the JSON body MockCowApi received and asserts `signingScheme=eip1271`, signature blob verbatim, `from` = EthFlow contract) 4. All 7 original pure tests preserved unchanged. Total: 12 tests (was 7). Numbers: - `.wasm` 281,518 bytes (release wasm32-wasip2). - 12 tests passing; clippy clean on host + wasm32-wasip2. - 0 em-dashes in the module tree. Stacks on PR #24 (BLEU-854) so reviewers can compare both M2 strategy / lib.rs splits in one stack with the M3 examples.
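The `backoff:` / `dropped:` bookkeeping named in the two submit-error tests above can be modelled roughly like this. It is a sketch only: the real `apply_submit_retry` signature, and what the markers actually store, are not shown in this PR text, so the block-number payload and the `may_attempt` helper are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical two-way split of submit failures.
pub enum SubmitError {
    Transient,
    Permanent,
}

/// Update the markers for `uid` after a failed submit at `block`.
/// Assumption: the backoff marker stores the next block at which a retry is allowed.
pub fn apply_submit_retry(
    store: &mut HashMap<String, u64>,
    uid: &str,
    block: u64,
    err: SubmitError,
) {
    match err {
        SubmitError::Transient => {
            store.insert(format!("backoff:{uid}"), block + 1);
        }
        SubmitError::Permanent => {
            // A dropped order never retries, so its backoff marker is cleared.
            store.remove(&format!("backoff:{uid}"));
            store.insert(format!("dropped:{uid}"), block);
        }
    }
}

/// Returns true when `uid` may be submitted at `block`.
pub fn may_attempt(store: &HashMap<String, u64>, uid: &str, block: u64) -> bool {
    if store.contains_key(&format!("dropped:{uid}")) {
        return false;
    }
    match store.get(&format!("backoff:{uid}")) {
        Some(next_allowed) => block >= *next_allowed,
        None => true,
    }
}
```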

Pre-upstream QA pass against the M2 + M3 + M2-host-trait stacks. Two findings applied here as a single tip-level commit instead of rewriting each stacked PR (mfw78 prefers history preservation over amended PRs): 1. `cargo fmt --all` across the workspace. Bulk of the churn is in M1 `crates/nexum-engine/src/supervisor/tests.rs` (386 line diff, pre-existing drift); the rest is M2/M3 leaf modules my own recent PRs introduced. No semantic changes. 2. One em-dash slipped past the rust-idiomatic sweep in `modules/examples/price-alert/src/strategy.rs:4` (a module-level doc comment). Replaced with ASCII ` - `. Three em-dashes remain in `wit/**.wit` files, all in mfw78's M1 prose. Intentionally left alone - the rust-idiomatic skill is a Bleu-internal preference and should not rewrite his upstream authoring style. Tracked as a separate question for him in the QA sign-off report. QA matrix on this commit: - `cargo fmt --all --check`: clean - `cargo clippy --all-targets --workspace -- -D warnings`: clean - `cargo test --workspace`: 145 host tests + 1 doctest passing (twap 20, ethflow 12, balance 13, price 16, stop-loss 7, shepherd-sdk 27, shepherd-sdk-test 8, nexum-engine 41, doctest 1) - `cargo build --target wasm32-wasip2 --release -p <module>`: clean for all 5 modules. Sizes: twap-monitor 313,926 B ethflow-watcher 281,518 B stop-loss 311,290 B price-alert 215,080 B balance-tracker 101,518 B - Em-dashes in `crates/` + `modules/` + `docs/`: 0 - `warn(unused_crate_dependencies)` on every crate root: present (sdk, sdk-test, nexum-engine, twap, ethflow, price-alert, balance-tracker, stop-loss) Outstanding (deferred): - BLEU-853 / COW-1029: `#[non_exhaustive]` batch on SDK public enums (HostErrorKind, LogLevel, PollOutcome, RetryAction). Held until just before upstream cut so wit-bindgen stays bridge-able. - WIT-file em-dashes in upstream prose - ask mfw78.

Captures the result of the pre-upstream QA pass. Two non-blocking follow-ups surfaced for mfw78's call before the consolidated PR: 1. `docs/05-sdk-design.md` describes a 2-layer SDK with `nexum-sdk` + proc macros + alloy Provider + Signer that M3 did not ship. M3 actually delivered the thinner Host-trait + helpers + MockHost surface. Doc and code need to agree (either trim doc to M3 scope or expand M4/M5 to match doc). 2. No ADR captures the M3 Host trait + strategy/lib split decision. ADR-0009 candidate. Everything else is green: 145 tests + 1 doctest, clippy clean, 0 em-dashes in our code, all 5 modules build for wasm32-wasip2, warn(unused_crate_dependencies) on every crate root. The 3 WIT-file em-dashes are mfw78's M1 prose - left alone. Optional follow-ups (none gating): - balance-tracker host-trait refactor for shape consistency. - mfw78 PR description template adoption on existing PR bodies.

Addresses the two non-blocking architectural items surfaced in COW-1063's sign-off matrix before the consolidated upstream PR: (a) `docs/05-sdk-design.md` -> add a "Current implementation status (M3, 2026-06-17)" callout at the top with a per-feature table mapping every section to its actual state. The doc itself stays as the M5+ north-star (it's mfw78's design document); the callout tells readers what is shipped vs deferred so they don't read the proc-macro / Provider / Signer sections as API reference for code that exists today. Status table covers: ✅ shipped: shepherd-sdk, shepherd-sdk-test, 4-trait host surface + supertrait Host, HostError mirror, chain + cow helpers, MockHost, strategy/lib split recipe, block.timestamp in ms. ❌ deferred (M5): nexum-sdk crate split, #[nexum::module] / #[shepherd::module] proc macros, named event handlers, async fn dispatch, full alloy Provider via HostTransport, TypedState (postcard), Signer (identity), Cow typed client, MockIdentity / MockProvider / WasmTestHarness, cargo nexum CLI. (b) `docs/adr/0009-host-trait-surface.md` (new) -> captures the three coupled M3 architectural decisions: 1. Four per-capability traits (ChainHost, LocalStoreHost, CowApiHost, LoggingHost) + supertrait Host with a blanket impl. 2. SDK-side HostError mirroring the wit struct field-for-field, bridged via per-module one-liner From impls. World-neutral so shepherd-sdk-test compiles without wasm. 3. Per-module strategy.rs (pure, &impl Host) + lib.rs (wit-bindgen adapter) split, applied uniformly across price-alert, stop-loss, twap-monitor, ethflow-watcher. Considered alternatives section explicitly rejects: single fat Host trait, #[nexum::module] proc macro now (M5 work), re-exporting wit-bindgen HostError, strategy colocated with wit-bindgen adapter. Marks the COW-1029 / BLEU-853 #[non_exhaustive] batch as the follow-up that protects the field-equivalence assumption. 
Doc 05 and ADR-0009 cross-reference each other, so readers landing on either find the other. Both files are em-dash clean.
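Decision 2 of the ADR (an SDK-side `HostError` mirroring the wit struct, bridged by one-liner `From` impls) can be illustrated as below. The field names follow the domain / kind / code / message / data list quoted earlier in this PR; the field types, the `Other` kind, and the `wit` module (a stand-in for what `wit_bindgen::generate!` would emit) are assumptions.

```rust
/// SDK-side types: world-neutral, so they compile natively without wasm.
pub mod sdk {
    #[derive(Debug, Clone, PartialEq)]
    pub enum HostErrorKind {
        InvalidInput,
        Other, // hypothetical catch-all for this sketch
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct HostError {
        pub domain: String,
        pub kind: HostErrorKind,
        pub code: i32,
        pub message: String,
        pub data: Option<String>,
    }
}

/// Stand-in for the per-module types generated by wit-bindgen.
pub mod wit {
    #[derive(Debug, Clone, PartialEq)]
    pub enum HostErrorKind {
        InvalidInput,
        Other,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct HostError {
        pub domain: String,
        pub kind: HostErrorKind,
        pub code: i32,
        pub message: String,
        pub data: Option<String>,
    }
}

impl From<wit::HostErrorKind> for sdk::HostErrorKind {
    fn from(k: wit::HostErrorKind) -> Self {
        match k {
            wit::HostErrorKind::InvalidInput => Self::InvalidInput,
            wit::HostErrorKind::Other => Self::Other,
        }
    }
}

impl From<sdk::HostErrorKind> for wit::HostErrorKind {
    fn from(k: sdk::HostErrorKind) -> Self {
        match k {
            sdk::HostErrorKind::InvalidInput => Self::InvalidInput,
            sdk::HostErrorKind::Other => Self::Other,
        }
    }
}

// Field-for-field bridges: nothing is dropped or reinterpreted in either direction.
impl From<wit::HostError> for sdk::HostError {
    fn from(e: wit::HostError) -> Self {
        Self { domain: e.domain, kind: e.kind.into(), code: e.code, message: e.message, data: e.data }
    }
}

impl From<sdk::HostError> for wit::HostError {
    fn from(e: sdk::HostError) -> Self {
        Self { domain: e.domain, kind: e.kind.into(), code: e.code, message: e.message, data: e.data }
    }
}
```

The exhaustive `match` in the kind conversions is where the field-equivalence assumption bites: a new variant on either side is a compile error until both impls are updated.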

Previously `build-module` only compiled `-p example`, leaving the 5 production modules (twap-monitor, ethflow-watcher, price-alert, balance-tracker, stop-loss) without CI coverage on the wasm side. A wasm-build regression (broken cowprotocol feature flag, alloy version drift, no_std assumption broken) would ship to upstream review without CI catching it.

This converts the job to a `matrix.module` strategy listing all 6 modules (example kept for parity) and adds a tiny "report wasm size" step so reviewers can spot size regressions in the Actions log. `fail-fast: false` so one broken module does not mask others.

Verified locally:
- example builds clean
- twap-monitor builds clean
- ethflow-watcher builds clean
- price-alert builds clean
- balance-tracker builds clean
- stop-loss builds clean

Linear: COW-1066.

…nks (COW-1069)

Locks the rustdoc discipline BLEU-844 (COW-1045) introduced.

CI changes (.github/workflows/ci.yml):
- New `docs:` job runs `cargo doc --workspace --no-deps` with `RUSTDOCFLAGS="-D warnings"`. Any rustdoc warning (missing docs, broken intra-doc link, unresolved code reference) fails CI.

Source fixes surfaced by the new gate:
- `crates/nexum-engine/src/bindings.rs:8`: drop `[crate::host::impls]` intra-doc link; `impls` is `mod` (private) so rustdoc cannot resolve it. Keep the prose reference unquoted.
- `crates/nexum-engine/src/manifest/mod.rs:24`: `[load]` is ambiguous (sibling `fn load` + `mod load`). Disambiguate with `[mod@load]`.
- `crates/nexum-engine/src/manifest/types.rs:4`: same fix for `[super::load]` -> `[mod@super::load]`.

`#![warn(missing_docs)]` is already on `crates/shepherd-sdk/src/lib.rs` (line 80) and `crates/shepherd-sdk-test/src/lib.rs` (line 59), so the new CI step locks the existing baseline rather than introducing fresh churn.

Verified locally:
RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps -> clean

Linear: COW-1069. Stacks on COW-1066 (CI matrix).
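The `[load]` ambiguity arises because Rust lets a module and a function share one name (they live in different namespaces), while a bare intra-doc link does not say which one it means. A minimal reproduction, with a made-up module body and function rather than the real `manifest` code:

```rust
/// Manifest loading helpers (hypothetical stand-in for the real submodule).
pub mod load {
    /// Counts the non-empty lines in a manifest source string.
    pub fn entry_count(src: &str) -> usize {
        src.lines().filter(|line| !line.trim().is_empty()).count()
    }
}

/// Loads a manifest from a string and returns its entry count.
///
/// A bare `[load]` link would be ambiguous here, because a module and a
/// function share that name. The disambiguator prefixes pick one:
/// [`mod@load`] links to the module, [`fn@load`] links to this function.
pub fn load(src: &str) -> usize {
    // In path position `load::` resolves to the module, not this function.
    load::entry_count(src)
}
```

With `RUSTDOCFLAGS="-D warnings"`, the ambiguous form fails the docs job while the prefixed form resolves cleanly.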

…COW-1067) shepherd-sdk had 27 public items and 0 doctests, so renames or signature changes on the SDK surface broke silently. Adds runnable usage examples on the load-bearing public items. Doctests landed: chain::eth_call_params (encode JSON-RPC params) chain::parse_eth_call_result (decode hex result) chain::decode_revert_hex (OrderNotValid -> DontTryAgain) cow::classify_api_error (InsufficientFee -> TryNextBlock; InvalidSignature -> Drop; None -> TryNextBlock default) cow::gpv2_to_order_data (zero-receiver normalised to None) host::Host (strategy fn generic over &impl Host; hidden hand-rolled stub impl in the example so the doctest is self- contained and avoids the shepherd-sdk-test dev-dep cycle) `#![warn(missing_docs)]` already on the crate root; the new gate from COW-1069 (PR #28) enforces the rustdoc warning surface in CI. Verified locally: cargo test --doc -p shepherd-sdk -> 6 passed cargo test --workspace -> 145 host tests + 7 doctests passing cargo clippy --all-targets --workspace -> clean cargo fmt --all --check -> clean grep -rn '—' crates/shepherd-sdk/src/ -> 0 Linear: COW-1067. Stacks on COW-1066 + COW-1069.

…ules (COW-1068) Closes the M3 gap surfaced by the COW-1063 QA pass: every production module had strong MockHost coverage on its strategy logic, but none exercised the real wit-bindgen + WitBindgenHost adapter + supervisor dispatch path. Wit-bindgen / wasmtime / linker regressions could ship without any test catching them. Adds 5 integration tests in `crates/nexum-engine/src/supervisor/ tests.rs`, one per production module, modelled on the existing `e2e_supervisor_boots_example_module` shape: e2e_twap_monitor_block_dispatch e2e_ethflow_watcher_log_dispatch e2e_price_alert_block_dispatch e2e_balance_tracker_block_dispatch e2e_stop_loss_block_dispatch Each test: * Uses `module_wasm_or_skip(name)` so local runs without a fresh `cargo build --target wasm32-wasip2 --release -p <module>` are skipped rather than failing. * Boots the supervisor with the module's real `module.toml` (not a synthesised manifest), so capability declarations + subscription shapes are honest. * Dispatches a synthetic Block (block-subscribed modules) or Log (ethflow-watcher) on Sepolia chain id 11155111. * Asserts the supervisor delivered the event and the module stayed alive. Three shared helpers added next to the existing `example_wasm()` ones: module_wasm(name) / module_wasm_or_skip(name) production_module_toml(rel_path) boot_production_module(...) synthetic_sepolia_block() Asserts are intentionally minimal at this layer (dispatched == 1 / alive_count == 1). Stronger module-specific assertions (local-store keys for `submitted:{uid}`, etc.) require either hand-crafted ABI payloads or a real chain/orderbook stub - that work lives in COW-1064 (testnet integration). The MockHost coverage already exercises those state transitions per BLEU-851 / -852 / -854 / -855. 
Verified locally: cargo test -p nexum-engine -> 46 passed (was 41) cargo test --workspace -> 149 host tests + 6 doctests passing cargo clippy --all-targets --workspace -> clean cargo fmt --all --check -> clean grep -rn '—' crates/nexum-engine/src/supervisor/tests.rs -> 0 Linear: COW-1068. Stacks on COW-1066 + COW-1069 + COW-1067.
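The skip-instead-of-fail behaviour of `module_wasm_or_skip` can be approximated as below. The real helpers take only `name`; the extra `target_dir` parameter and the artifact layout (`<target_dir>/wasm32-wasip2/release/<name>.wasm`, with dashes in the crate name turned into underscores) are assumptions made to keep the sketch self-contained.

```rust
use std::path::{Path, PathBuf};

/// Build the expected path of a module's release wasm artifact.
/// Assumption: dashes in the crate name become underscores in the file name.
pub fn module_wasm(target_dir: &Path, name: &str) -> PathBuf {
    target_dir
        .join("wasm32-wasip2")
        .join("release")
        .join(format!("{}.wasm", name.replace('-', "_")))
}

/// Return the artifact path if it has been built, otherwise log and return
/// `None` so the calling test can bail out early instead of failing.
pub fn module_wasm_or_skip(target_dir: &Path, name: &str) -> Option<PathBuf> {
    let path = module_wasm(target_dir, name);
    if path.is_file() {
        Some(path)
    } else {
        eprintln!("skipping: {} not built", path.display());
        None
    }
}
```

A test body would typically start with `let Some(wasm) = module_wasm_or_skip(dir, "stop-loss") else { return; };`, which reports the test as passed without exercising it when the artifact is missing.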

… boot) Wires up the M2 milestone for actual testnet exercise on Sepolia. Closes the gap "M2 is fully tested in unit + integration but has never been run against a real chain". ## New files - `engine.m2.toml` - workspace-root engine config that boots `twap-monitor` + `ethflow-watcher` against Sepolia public WS. Separate `state_dir = "./data/m2"` so it never collides with the M1 example runbook. - `docs/operations/m2-testnet-runbook.md` - 200-line runbook with 6 sections: 0. Prerequisites (rustup target, just, Sepolia RPC, faucet) 1. Smoke run (passive, observe traffic on Sepolia) 2. Round-trip run (author a TWAP via Safe + Compose + an EthFlow swap via cow.fi, watch end-to-end submission) 3. Inspecting state after a run 4. What this run does NOT prove (and which issues cover that) 5. Troubleshooting matrix 6. References (engine_config schema, ADRs, PR range) - `justfile` recipes: build-m2: cargo build both M2 wasm modules run-m2: build-m2 + build-engine + cargo run engine ## Validated locally Booted `cargo run -p nexum-engine -- --engine-config engine.m2.toml` against Sepolia public WS. Observed in ~1s wall clock: - WS provider opened against ethereum-sepolia-rpc.publicnode.com - Both manifests parsed; both capability sets resolved (logging + local-store + chain + cow-api) - Both wasm components compiled - Both `init` succeeded - `supervisor up count=2`, `supervisor ready modules=2 chains=1` - All 3 subscriptions opened cleanly: block subscription chain_id=11155111 log subscription module=twap-monitor chain_id=11155111 log subscription module=ethflow-watcher chain_id=11155111 - Clean SIGTERM shutdown The actual observed log output is captured verbatim in the runbook section 1 so future operators know what "healthy" looks like. ## Scope - The smoke half (section 1) is passive: it validates boot + subscription health without producing traffic. Useful before every round-trip. 
- The round-trip half (section 2) requires a Sepolia Safe + test ETH + interaction with the Compose Safe app / cow.fi UI. Cannot be automated from CI (chain-side actions need a wallet). Operator works through the steps. - What this does NOT prove is explicit in section 4: throughput / soak (COW-1031), cross-module isolation under load (COW-1064), adversarial resource exhaustion (COW-1036), security review (COW-1065). ## Not addressed - Env-var substitution in engine.toml (e.g. `${SEPOLIA_RPC}`) is not wired in the engine today; runbook documents the workaround (edit URL inline). Filing as a follow-up is out of scope here - if needed, add as an M4 nice-to-have. - `ls-dump` CLI binary referenced in section 3 does not exist yet; section explicitly says "no ls-dump bin in 0.2; proper inspector is M4 scope" and falls back to re-booting the engine on the same state_dir to inspect rows via the dispatch logs. Linear: stacks on COW-1068. No new issue created - this is documentation work supporting the existing M2 milestone, not a new deliverable.

Surfaced wiring up `engine.m3.toml` for the M3 testnet runbook: all 3 M3 example modules (price-alert, balance-tracker, stop-loss) only declare `[[subscription]] kind = "block"`, leaving `log_streams` empty. `select_all` over an empty Vec yields `None` immediately, the `tokio::select!` arm fired, and the loop hit the "log stream ended - shutting down for restart" bail before any block flowed. The engine bailed within ~50 ms of `supervisor ready`.

Fix: replace each empty side with `futures::stream::pending()` so the corresponding select arm is never selected. The bail-on-None semantic still fires when a *non-empty* stream actually closes (real WebSocket drop), which is the original intent.

The bug was symmetric (log-only configs would also bail) but only the block-only path is exercised by an existing module config. M2 was unaffected because both modules subscribe to at least one log.

Regression test in `supervisor::tests::run_does_not_bail_when_both_stream_kinds_are_empty`: invokes `run` with two empty `Vec`s plus a 50 ms shutdown timer; asserts `run` blocks the full 50 ms instead of returning at 0 ms. The pre-fix binary returns in <5 ms.

Verified locally:
cargo test -p nexum-engine -> 47 passed (was 46)
just run-m3 -> 3 modules boot; first block dispatch fires all 3 strategy paths against live Sepolia (oracle read, balance polls, cow-api submit + retry classification)
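The shape of the bug and the fix can be modelled without an async runtime. The sketch below is a std-only toy, not the engine's event loop and not the `futures` crate: `Side` is a hypothetical type, a drained `VecDeque` stands in for a closed WebSocket stream, and `Side::Pending` plays the role of `futures::stream::pending()`.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Toy model of one side of the event loop (blocks or logs).
pub enum Side<T> {
    /// Merged live streams. With nothing left to yield this reports
    /// end-of-stream, which is also what an empty merge does at once.
    Merged(Vec<VecDeque<T>>),
    /// Never ready and never finished, so its select arm never fires.
    Pending,
}

impl<T> Side<T> {
    /// Post-fix construction: an empty side becomes `Pending` instead of
    /// an empty merge that would end immediately.
    pub fn from_streams(streams: Vec<VecDeque<T>>) -> Self {
        if streams.is_empty() {
            Side::Pending
        } else {
            Side::Merged(streams)
        }
    }

    pub fn poll_next(&mut self) -> Poll<Option<T>> {
        match self {
            Side::Pending => Poll::Pending,
            Side::Merged(streams) => {
                for stream in streams.iter_mut() {
                    if let Some(item) = stream.pop_front() {
                        return Poll::Ready(Some(item));
                    }
                }
                // Every underlying stream is closed: the loop should bail.
                Poll::Ready(None)
            }
        }
    }
}
```

In this model the pre-fix behaviour is `Side::Merged(vec![])`, which reports `Ready(None)` on the first poll, while `from_streams(vec![])` stays `Pending` and a non-empty side still reports `Ready(None)` once it actually drains.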

… 3-module E2E) Sister doc to `docs/operations/m2-testnet-runbook.md`. Same shape, different modules. Closes the gap "M3 is unit + integration tested but has never been exercised against a real chain", same as the M2 runbook closed for M2. ## New files - `engine.m3.toml` - workspace-root engine config that boots the 3 M3 example modules (price-alert + balance-tracker + stop-loss) against Sepolia public WS. Separate `state_dir = "./data/m3"` so it never collides with M1 / M2 runbook state. - `docs/operations/m3-testnet-runbook.md` - operator runbook mirroring the M2 one: prerequisites, smoke+active run (M3 is active by default since the example modules trigger on every block), optional pre-signature setup for real stop-loss settlement, state inspection, scope boundaries, troubleshooting, references. - `justfile` recipes: `build-m3` + `run-m3`. ## Validated locally A single Sepolia block dispatch (~10 s wall clock) drove all 3 M3 strategy paths through the live testnet: - **price-alert**: `chain::request eth_call` -> Chainlink AggregatorV3Interface -> ABI decode -> `TRIGGERED answer= 174553978080 threshold=250000000000 (Below)` (Sepolia ETH/USD feed reports $1745.54, below the $2500 default threshold). - **balance-tracker**: 2 `chain::request eth_getBalance` calls (one per configured address) - SDK chain helper + multi-key local-store path. - **stop-loss**: `eth_call` oracle -> `from_signed_order_data` `OrderCreation` with `Signature::PreSign` -> `cow-api::submit- order` bytes=561 -> orderbook returns typed `TransferSimulationFailed` -> `classify_api_error` tags as retriable -> `retry on next block`. Full submit path confirmed; the orderbook rejection is the typed-retry contract working as designed (the default config's `owner = 0x70997970...` does not hold the sell token on Sepolia, so simulation correctly fails). 
This validates everything the SDK BLEU-840 / BLEU-841 / BLEU-851 / -852 / -854 / -855 PR series builds: Host trait surface, chain helpers, cow helpers, MockHost recipe, strategy/lib split. The same code paths that pass 145 unit tests + 6 doctests + 5 supervisor integration tests now also work against live Sepolia. ## What this validates that the M2 runbook does not M2 only exercises the orderbook submit path indirectly (through the EthFlow watcher reacting to swap.cow.fi traffic, and only when app_data is empty - documented limitation). M3 stop-loss submits proactively on every poll, so the orderbook always sees a real `OrderCreation` body even if it rejects. The typed-retry SDK contract (`classify_api_error` mapping `TransferSimulationFailed` -> `RetryAction::TryNextBlock`) is exercised end-to-end with a real orderbook response, not a fixture. ## Stacks on - `fix(event_loop)` commit immediately preceding this one - the bug surfaced wiring up `engine.m3.toml` (block-only subscriptions bailed the engine pre-fix). - PR #31 (M2 runbook) - same operator-doc shape, same conventions.
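The typed-retry mapping exercised here can be sketched as a plain match. The four mappings named in this PR are `InsufficientFee` and `TransferSimulationFailed` -> `TryNextBlock`, `InvalidSignature` -> `Drop`, and `None` -> `TryNextBlock`; everything else below is an assumption, including the `Option<&str>` input, the variant order, and treating unknown error strings like `None`.

```rust
/// Three-arm retry contract (variant names taken from this PR's text).
#[derive(Debug, PartialEq)]
pub enum RetryAction {
    TryNextBlock,
    /// Reserved surface; nothing in this sketch produces it.
    Backoff,
    Drop,
}

/// Map an orderbook error type onto a retry decision.
/// The real `classify_api_error` signature is not shown in this PR text.
pub fn classify_api_error(error_type: Option<&str>) -> RetryAction {
    match error_type {
        // The order can never become valid: stop retrying.
        Some("InvalidSignature") => RetryAction::Drop,
        // Conditions that may clear on a later block.
        Some("InsufficientFee") | Some("TransferSimulationFailed") => RetryAction::TryNextBlock,
        // Missing or unrecognised types default to retrying (assumption for
        // the unrecognised case).
        _ => RetryAction::TryNextBlock,
    }
}
```

A retry-by-default policy keeps a module alive through orderbook hiccups, at the cost of unbounded retries unless a separate limit is applied.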

…pass

Closes the gap "M3 happy path is validated on testnet but error paths are not". Five mutations of `engine.m3.toml` / `module.toml::[config]` run against the live `just run-m3` boot; each captured observed output + verdict.

## Scenarios run

| # | Mutation | Observed | Verdict |
|---|---|---|---|
| 1.1 | engine.m3.toml: rpc_url = "wss://nonexistent.example.com" | `Error: connect chain 11155111: IO error: failed to lookup address information` + clean exit | ✅ structured + fail-fast |
| 1.2 | price-alert: oracle_address = 0x...01 (EOA, no code) | `WARN price-alert: latestRoundData decode failed: ABI decoding failed: buffer overrun while deserializing` + module alive | ✅ graceful + clear error |
| 1.3 | stop-loss: required = ["logging"] (dropped chain/local-store/cow-api) | `Error: load module ... capability violation in stop_loss.wasm component imports cow-api but it is not listed in [capabilities].required` | ✅ security boundary enforced |
| 1.4 | price-alert: threshold = "not-a-number" | `WARN init failed module=price-alert kind=HostErrorKind::InvalidInput "threshold: non-digit character in 'not-a-number'"` + other modules unaffected | ✅ with 1 minor observation (see below) |
| 1.5 | boot 1 (rm -rf data/m3) -> boot 2 (no rm) | both boots clean, redb file preserved | ✅ cross-restart persistence |

## Surfaced finding

Scenario 1.4 caught a minor supervisor behaviour: init-failed modules stay `alive=true` and continue to receive dispatches. Safe in practice because all M3 example modules guard with `SETTINGS.get().is_none() -> return Ok(())`, but wastes fuel + RPC requests per block on a no-op. Filed as a follow-up issue recommending `Supervisor::load` set `alive=false` (or skip the push into `self.modules`) when `Guest::init` returns `Err(HostError)`.

## Validates

- Engine error reporting: 5 distinct error paths each surface a typed error with clear domain + message. No silent failures, no panics, no infinite retry loops.
- M3 SDK contract: BLEU-814 (32-byte namespace), COW-1025 (capability enforcement), BLEU-851 / -852 / -854 / -855 (typed Settings parsing via HostError) all verified on live Sepolia, not just MockHost.
- Operator UX: every misconfiguration scenario produces output an operator can act on without reading source.

## Reproduce

Each mutation is one line. `git checkout` to restore between runs. The full diff per scenario is inline in the doc.

## Not in scope (M4 territory)

- Fuel exhaustion (COW-1036)
- Module trap during on_event + supervisor restart (COW-1033 / COW-1032)
- WS reconnect with backoff (current is bail + external restart; flagged in event_loop.rs as "0.3 fix")
- State-dump CLI for redb inspection (M4 nice-to-have)

## Follow-up issue

Filed separately: "Supervisor::load should mark module alive=false when init returns Err(HostError)". Linear MCP was unavailable at commit time; issue to be filed manually in COW project under M3 milestone.

…070)

Pre-fix behaviour: `Supervisor::load` pushed every module into `self.modules` with `alive = true`, even when `Guest::init` returned `Err(HostError)`. The supervisor logged `WARN init failed` but the dispatcher still routed every block / log to the dead module, where the M3 example strategies short-circuited via `SETTINGS.get().is_none() -> return Ok(())`. Safe but wasteful, and the `supervisor up count=N` log was misleading (counted the dead module as up).

Surfaced live on Sepolia by scenario 1.4 of `docs/operations/m3-edge-case-validation.md`: set `[config] threshold = "not-a-number"` in price-alert, observe init return InvalidInput, then watch the dispatcher hammer the dead module every block for 14s.

## Fix

`Supervisor::load` now captures the init result into `init_succeeded: bool` and sets `LoadedModule.alive = init_succeeded`. The boot log changes from `supervisor up count=N` to `supervisor up loaded=N alive=M` so the discrepancy is loud.

## Regression test

`supervisor::tests::init_failure_marks_module_dead_and_excludes_from_dispatch`:
- Synthesises a manifest matching real price-alert shape but with `threshold = "not-a-number"`.
- Boots the supervisor; asserts `module_count() == 1` (loaded) and `alive_count() == 0` (dead).
- Dispatches a synthetic Sepolia block; asserts `dispatched == 0` (the only "subscribed" module is dead, so the dispatch fast-path skips it).

## Live validation on Sepolia (rerun of scenario 1.4 with fix)

Before fix:
```
INFO supervisor up count=3 <-- includes dead module
```

After fix:
```
WARN init failed - module loaded but marked dead; dispatcher will skip it module=price-alert kind=HostErrorKind::InvalidInput
INFO supervisor up loaded=3 alive=2
```

## Docs update

`docs/operations/m3-edge-case-validation.md` scenario 1.4 verdict updated from "✅ with minor observation" to "✅; resolved in this PR series". The original observation block is replaced with a note pointing at the regression test + the new log line.

## Workspace state

- `cargo test --workspace` -> 151 host tests + 6 doctests passing (was 150 + 6; +1 from the new regression test).
- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo fmt --all --check` clean.
- 0 em-dashes in changed files.

Linear: COW-1070. Closes the only finding from PR #33.

## Considered alternatives

**Option B** (skip pushing the init-failed module into `self.modules` entirely) would have been cleaner but requires callers of `Supervisor::load_one` to handle the "module not added" case. Option A (this PR - flip alive=false) preserves the existing API surface; the dispatch fast-path already gates on `if !alive { continue; }` so the dispatched-event count drops to 0 without any caller-side change.

**Option C** (visibility only - rename the boot log) was rejected; it surfaces the discrepancy but does nothing about the per-block no-op fuel cost on the dead module.
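Option A can be reduced to a few lines. The sketch below is a simplified model, not the nexum-engine `Supervisor`: only `alive`, `init_succeeded`, `module_count`, `alive_count`, and the `if !alive { continue; }` gate come from the text above; the `load` signature taking an init closure and the `dispatch_block` name are assumptions.

```rust
pub struct LoadedModule {
    pub name: String,
    pub alive: bool,
}

#[derive(Default)]
pub struct Supervisor {
    modules: Vec<LoadedModule>,
}

impl Supervisor {
    /// Load a module; `init` stands in for the guest's `init` export.
    /// The module is always pushed (existing API surface is preserved),
    /// but it is only marked alive when init succeeded.
    pub fn load(&mut self, name: &str, init: impl FnOnce() -> Result<(), String>) {
        let init_succeeded = init().is_ok();
        self.modules.push(LoadedModule {
            name: name.to_string(),
            alive: init_succeeded,
        });
    }

    pub fn module_count(&self) -> usize {
        self.modules.len()
    }

    pub fn alive_count(&self) -> usize {
        self.modules.iter().filter(|m| m.alive).count()
    }

    /// Deliver one block to every live module; returns how many received it.
    pub fn dispatch_block(&self) -> usize {
        let mut dispatched = 0;
        for module in &self.modules {
            if !module.alive {
                continue; // dead modules cost no fuel and no RPC calls
            }
            dispatched += 1;
        }
        dispatched
    }
}
```

Keeping the dead module in the list is what lets the boot log report `loaded=N alive=M` as two distinct numbers.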

12 review threads addressed end-to-end. Net diff is -720 lines despite adding ~200 lines of new helpers + tests, because the WitBindgenHost adapter deduplication alone wipes ~400 lines.

Per-thread:

- #1 (balance-tracker architecture): refactored to match the M3 host-trait+adapter split the other 4 modules use. Created `strategy.rs` with `on_block(&impl Host, ...)`, moved check_one / fetch_balance / parse_balance_hex / parse_settings into it, converted parse_config to use SDK config helpers + typed HostError instead of String. Added 3 MockHost-driven tests covering first-seen-above-threshold, below-threshold-persist, and error-does-not-abort-loop.
- #2 + #3 (WitBindgenHost dedup): new `shepherd_sdk::bind_host_via_wit_bindgen!()` declarative macro. Single source of truth in `crates/shepherd-sdk/src/wit_bindgen_macro.rs`; the 4 trait impls + convert_err / sdk_err_into_wit / convert_level collapse to one macro invocation per module. Migrated all 5 modules (twap-monitor, ethflow-watcher, price-alert, stop-loss, balance-tracker). Each module's lib.rs lost ~80 lines.
- #4 (scale_decimal + config_get dup): new `shepherd_sdk::config` with `get_required`, `get_optional`, `scale_decimal`, and a typed `ConfigError` enum (host-neutral). price-alert + stop-loss consume the SDK helpers; their local duplicates were deleted. Module-level decimal-parsing tests removed (covered by 7 SDK tests + 4 proptest cases now).
- #5 (Chainlink dup): new `shepherd_sdk::chain::chainlink` with `read_latest_answer(host, chain_id, oracle, domain) -> Option<I256>`. Encapsulates the eth_call → parse → ABI decode flow + Warn logging. price-alert + stop-loss now call the helper; their local AggregatorV3 sol! definitions + read_oracle / on_block oracle plumbing were deleted. SDK ships with 3 StubHost tests covering happy path, host error, and garbage-hex.
- #6 (WIT world capability elision): added new "Capability enforcement vs. the WIT world" section to ADR-0009 documenting that price-alert + balance-tracker compile against the shepherd:cow/shepherd supertype but their manifests omit cow-api, and that boot success depends on wasm-tools' unused-import elision. Flagged as load-bearing; M5 macro hardening path documented.
- #7 (poll-time revert classification inert): filed COW-1082 for the host-side fix (forward structured eth_call error data into HostError.data; analogous to COW-1075 for orderbook).
- #8 (classify_api_error retry-default unbounded): filed COW-1083 for the rate-limit / max-retry follow-up on the backoff: marker.
- #9 (RetryAction::Backoff dead variant): no code change; replied to thread clarifying it is reserved API surface waiting on a richer upstream retry_hint shape (open question for mfw78).
- #10 (no proptest anywhere): added `proptest` to shepherd-sdk dev-dependencies. New `crates/shepherd-sdk/src/proptests.rs` with 6 properties covering eth_call_params/parse_eth_call_result round-trip, parse_eth_call_result rejection on unquoted input, config::scale_decimal round-trip + sign-preservation, U256 LE byte round-trip, and no-panic guards for decode_revert_hex + gpv2_to_order_data marker dispatch.
- #11 (ethflow chain capability least-privilege): moved `chain` from required to optional in `modules/ethflow-watcher/module.toml`, mirroring the M2 mirror fix already applied.
- #12 (ADR-0009 test-count census): dropped the "145 host tests (twap 20, ethflow 12, ...)" breakdown; kept the qualitative claim. CI is now the authoritative count.

Drive-by: alloy-sol-types moved from regular to dev-dependencies in price-alert and stop-loss now that the Chainlink ABI helper is inside shepherd-sdk and the modules only use sol! in their test helpers.

Validation:

- cargo test --workspace: every crate green; 5 modules + SDK + sdk-test + engine all pass. 8 host tests gained on balance-tracker; 6 proptest props gained on shepherd-sdk; 3 Chainlink helper tests gained.
- cargo clippy --workspace --all-targets -- -D warnings: clean.
- cargo fmt --check: clean.
- cargo build --target wasm32-wasip2 --release for all 5 modules: clean.
- Zero em-dashes in source code added.
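The host-trait + mock split described for thread #1 can be sketched roughly as follows. This is a minimal, std-only illustration, not the SDK's actual API: the single `Host` trait, its method names, the simplified `HostError`, and this `MockHost` are all hypothetical stand-ins (the real SDK splits capabilities across several traits, and the real `MockHost` lives in `shepherd-sdk-test`). The point is only that strategy logic written against `&impl Host` can be exercised as native Rust, including the "error does not abort the loop" case.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the SDK's typed host error.
#[derive(Debug, Clone, PartialEq)]
pub struct HostError {
    pub message: String,
}

/// Hypothetical single trait; the real SDK uses one trait per capability.
pub trait Host {
    fn fetch_balance(&self, account: &str) -> Result<u128, HostError>;
    fn store_get(&self, key: &str) -> Option<u128>;
    fn store_set(&self, key: &str, value: u128);
    fn warn(&self, message: &str);
}

/// Strategy logic that depends only on the trait, so it needs no wasm toolchain to test.
/// Returns one alert per account whose balance changed and sits at or above `threshold`.
pub fn on_block(host: &impl Host, accounts: &[&str], threshold: u128) -> Vec<String> {
    let mut alerts = Vec::new();
    for &account in accounts {
        let balance = match host.fetch_balance(account) {
            Ok(balance) => balance,
            Err(err) => {
                // A failing account is logged and skipped; the loop keeps going.
                host.warn(&format!("{account}: {}", err.message));
                continue;
            }
        };
        let previous = host.store_get(account);
        // Persist the latest value even when it is below the threshold.
        host.store_set(account, balance);
        if previous != Some(balance) && balance >= threshold {
            alerts.push(format!("{account} balance is now {balance}"));
        }
    }
    alerts
}

/// Hypothetical in-memory mock: scripted balances, a map-backed store, captured warnings.
#[derive(Default)]
pub struct MockHost {
    pub balances: HashMap<String, Result<u128, HostError>>,
    pub store: RefCell<HashMap<String, u128>>,
    pub warnings: RefCell<Vec<String>>,
}

impl Host for MockHost {
    fn fetch_balance(&self, account: &str) -> Result<u128, HostError> {
        self.balances.get(account).cloned().unwrap_or_else(|| {
            Err(HostError { message: "no mock balance configured".to_string() })
        })
    }

    fn store_get(&self, key: &str) -> Option<u128> {
        self.store.borrow().get(key).copied()
    }

    fn store_set(&self, key: &str, value: u128) {
        self.store.borrow_mut().insert(key.to_string(), value);
    }

    fn warn(&self, message: &str) {
        self.warnings.borrow_mut().push(message.to_string());
    }
}
```

In a module test, one would populate `MockHost::balances`, call `on_block`, and assert on the returned alerts, the persisted store entries, and the captured warnings.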

…ance)

Filtered subset of the compliance applied in PRs #66/#67 of bleu/nullis-shepherd, restricted to files that exist on the M3 epic head. M4/M5-only files (shepherd-backtest, baseline-latency tools, etc.) are skipped, and compliance hunks that depended on M4-introduced types/functions (reconnect tasks, JoinSet plumbing in event_loop, the M4-shape `ProviderError`/Rpc variant, the supervisor restart loop, env-var substitution in engine.toml) are skipped too - they only make sense once the underlying M4 code lands.

Brings the M3 epic in line with the repo-wide rust rubric in the cases that do transfer cleanly:

- crates/nexum-engine/src/manifest/error.rs: swap manual Display/Error impls for `thiserror::Error` derives, mark `ParseError` `#[non_exhaustive]`, carry source via `#[from]`.
- crates/nexum-engine/src/manifest/load.rs: drop the `.map_err(ParseError::Io/Toml)` call sites that the `#[from]` impls now cover, swap `eprintln!` lines for structured `tracing::{info,warn}`.
- crates/nexum-engine/src/manifest/mod.rs: doc tidy-up.
- crates/nexum-engine/src/host/mod.rs: tighten submodule visibility from `pub` to `pub(crate)` (no out-of-crate users).
- crates/nexum-engine/src/host/error.rs + host/impls/cow_api.rs: drop the bespoke `hex_encode` helper in favour of `alloy_primitives::hex::encode_prefixed`, already a dep on M3 and used elsewhere in `shepherd-sdk`.
- crates/nexum-engine/src/engine_config.rs: introduce a trimmed `EngineConfigError` (Io + Toml only - the M5 `Substitute` variant covers an env-var-substitution path that does not exist on M3) and return it from `load_or_default` instead of `anyhow::Result`. `main.rs`'s `?` still works thanks to `From<EngineConfigError> for anyhow::Error`.
- crates/shepherd-sdk/src/cow/error.rs: mark `RetryAction` `#[non_exhaustive]`.
- modules/{twap-monitor,examples/stop-loss,ethflow-watcher}/src/strategy.rs: add a default arm to each `match RetryAction { ... }` that now needs one, treating unknown future variants conservatively (retry on next block / leave watch in place) instead of silently dropping the watch on an SDK bump.

cargo fmt + cargo clippy --all-targets -D warnings + cargo test --workspace --all-features all green on the worktree.
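The `#[non_exhaustive]` + default-arm change above might look roughly like the sketch below. Only `RetryAction::Backoff` is named on this page; the other variants, the `WatchDecision` type, and the `decide` function are hypothetical and exist purely for illustration. The conservative choice is that an unrecognised variant keeps the watch in place rather than dropping it.

```rust
/// Sketch of an SDK-side enum. `Backoff` appears in the PR text;
/// `TryNextBlock` and `Drop` are assumed names for illustration only.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryAction {
    TryNextBlock,
    Backoff,
    Drop,
}

/// Hypothetical module-side outcome for a watched order.
#[derive(Debug, PartialEq, Eq)]
pub enum WatchDecision {
    Keep,
    Remove,
}

/// Maps a retry classification to what the module does with its watch.
pub fn decide(action: RetryAction) -> WatchDecision {
    // In a downstream crate the wildcard arm is mandatory for a
    // #[non_exhaustive] enum. Inside the defining crate (as in this
    // single-file sketch) the compiler sees every variant, so the arm is
    // flagged as unreachable; the allow keeps the sketch warning-free.
    #[allow(unreachable_patterns)]
    match action {
        RetryAction::Drop => WatchDecision::Remove,
        RetryAction::TryNextBlock | RetryAction::Backoff => WatchDecision::Keep,
        // Unknown future variants: retry later instead of silently dropping.
        _ => WatchDecision::Keep,
    }
}
```

With this shape, adding a variant in a later SDK release compiles in downstream modules without changes and falls into the keep-the-watch path until the module handles it explicitly.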

Squash of PR #17 - operator deployment runbook for BLEU-836.

brunota20 · 2026-06-25T19:18:26Z

Heads-up: the bleu-side branch backing this PR (`bleu:fix/supervisor-alive-on-init-err`) was force-pushed today as part of a linearisation pass on our M2->M5 base stack. Old head was `a0b0692f`; new head is `cf81d36`.

The linearisation moved this branch from being a sibling of `dev/m2-base` (both branched off the same M2-prep commit) to being a strict descendant of the post-cleanup `dev/m2-base` (= upstream PR #17 head, `2b11f913`). M2 is now a strict ancestor of M3, M3 of M4, and M4 of M5 — so atomic stacks can be reviewed in sequence without the prior parallel-merge tangle.

Diff against `main`: 1 line difference vs the prior head (a README ComposableCoW address corrected to the canonical CREATE2 address — this was already in M2 cleanup; missing on M3 only as a baseline-drift artefact). Existing review threads should re-anchor cleanly.

No content was lost in the rebase: per-commit history preserved, author identities preserved.

brunota20 · 2026-06-25T19:25:58Z

Restructured: the new PR opens with head=bleu:dev/m3-base for naming consistent with the M4/M5 epics (which also use dev/m{N}-base). Same content; mfw hasn't reviewed yet, so no review history is lost. See replacement PR.

brunota20 added 23 commits June 1, 2026 14:19

runtime: multi-module supervisor + block/log event loop

6f669c6

docs(adr): add 0001-0007 capturing engine and CoW architecture decisions

8af08bd

docs(adr): unwrap hard-wrapped paragraphs to single line each

3f1dbf8

docs(adr): revise CoW design and reorder ADRs (0001-0008)

7e05190

fix(docs): reviewed ADRs by bleu

67a0be7

fix(docs): revised ADRs and diagrams

e5579a3

feat(supervisor): apply ADR-0001/0003/0005/0016 and trap-based module death (BLEU-813-817)

ed48319

feat(supervisor): add fuel + memory limits per module store (BLEU-818)

1570036

docs: rename nexum.toml -> module.toml in example, justfile, and README (BLEU-820)

32a2198

test: fill host backend test gaps — manifest parsing, cow-api, provider-pool, supervisor (BLEU-821)

38ac8e3

test: E2E supervisor tests + fix wit_import_to_cap to skip type-only interfaces (BLEU-819)

fdd64e4

style: apply rust-idiomatic rules (em-dashes, #[from] Orderbook, unused_crate_dependencies, drop redundant map_err)

7131282

chore(deps): patch cowprotocol to bleu/cow-rs main (post-alpha.3)

81ee734

Merge PR #15 (feat/cowprotocol-bleu-main-v3) into dev/m2-base

7161382

Carries PR #8 (host backends) + PR #9 (supervisor) + cowprotocol patch. Open upstream: nullislabs#15.

Merge PR #12 (docs/adr-bundle) into dev/m2-base

1eca322

Open upstream: nullislabs#12. Resolved .gitignore by taking the PR #12 additions (.agents/, .claude/, skills-lock.json) plus PR #15's data/.

# Conflicts:
#	.gitignore

chore(deps): bump cowprotocol patch to bleu/cow-rs main (BLEU-822 + BLEU-823 in)

5a42878

brunota20 requested a review from mfw78 as a code owner June 18, 2026 13:00

jeffersonBastos mentioned this pull request Jun 19, 2026

[review-mirror] M3 epic (upstream #18): SDK + examples + tutorial + QA bleu/nullis-shepherd#55

Closed

brunota20 added 2 commits June 25, 2026 14:38

brunota20 and others added 25 commits June 25, 2026 15:43

docs(deployment): operator runbook (BLEU-836) (#17)

cf81d36

Squash of PR #17 - operator deployment runbook for BLEU-836.

brunota20 force-pushed the fix/supervisor-alive-on-init-err branch from a0b0692 to cf81d36 June 25, 2026 19:16

brunota20 closed this Jun 25, 2026

brunota20 mentioned this pull request Jun 25, 2026

[review-mirror] M3 epic (upstream nullislabs/shepherd#18) bleu/nullis-shepherd#86

Draft

brunota20 wants to merge 63 commits into nullislabs:main from bleu:fix/supervisor-alive-on-init-err
