6 changes: 4 additions & 2 deletions apps/supervisor/README.md
@@ -81,11 +81,13 @@ The KISS watcher script runs one deterministic supervisor invocation per child o
node scripts/ickb-supervisor-loop.mjs --max-runs 1 --stable-limit 2 --backoff-seconds 0 -- --scenario standard-cycle --max-cycles 1
```

Loop-owned options go before `--`; supervisor options go after `--`. If using `pnpm live:supervisor:loop`, keep loop-owned options before the first `--` so they are not passed through to the supervisor. The loop stops on supervisor nonzero exit, incident artifacts listed in `summary.json`, tx-creating outcomes or tx hashes for tx-creating outcomes, a new outcome after the first run, repeated no-progress signatures, or `--max-runs`.
Loop-owned options go before `--`; supervisor options go after `--`. If using `pnpm live:supervisor:loop`, keep loop-owned options before the first `--` so they are not passed through to the supervisor. The loop stops on supervisor nonzero exit, incident artifacts listed in `summary.json`, tx-creating outcomes or tx hashes for tx-creating outcomes, a new outcome after the first run, repeated no-progress signatures, or `--max-runs`. `-- --help` and `-- -h` are child help passthroughs: the delegated help is printed and the wrapper exits with the child status.
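A minimal sketch of the `--` split described above, assuming a hypothetical `splitLoopArgs` helper (the name and return shape are illustrative and not taken from `scripts/ickb-supervisor-loop.mjs`). Treating any `--help` or `-h` after the separator as a child help request is also an assumption.

```typescript
// Hypothetical helper: illustrates the documented "loop options before `--`,
// supervisor options after `--`" convention. Not the real script's code.
interface SplitArgs {
  loopArgs: string[];
  supervisorArgs: string[];
  childHelp: boolean;
}

function splitLoopArgs(argv: readonly string[]): SplitArgs {
  // Only the first `--` separates loop-owned options from supervisor options.
  const separator = argv.indexOf("--");
  const loopArgs = separator === -1 ? [...argv] : argv.slice(0, separator);
  const supervisorArgs = separator === -1 ? [] : argv.slice(separator + 1);
  // Assumption: help anywhere after the separator counts as a child help passthrough.
  const childHelp = supervisorArgs.includes("--help") || supervisorArgs.includes("-h");
  return { loopArgs, supervisorArgs, childHelp };
}

const split = splitLoopArgs([
  "--max-runs", "1", "--stable-limit", "2", "--backoff-seconds", "0",
  "--", "--scenario", "standard-cycle", "--max-cycles", "1",
]);
console.log(split.loopArgs.join(" "));
console.log(split.supervisorArgs.join(" "));
```

With the example invocation above, the first line printed holds the loop-owned options and the second holds what would be delegated to the supervisor.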

The external loop also has a loop-owned `--child-timeout-seconds` guard for the supervisor child process. Keep it long enough for the whole delegated supervisor run, including actor preflights and actor commands, not just one `--command-timeout-seconds` window. The dynamic loop defaults this guard to the supervisor-loop default so the supervisor keeps ownership of killing funded actor process groups on command timeout.

For continuous tester-bot matching, use `node scripts/ickb-supervisor-dynamic-loop.mjs` or `pnpm live:supervisor:dynamic-loop`. This remains outside `apps/supervisor`: it reads tester preflight balance summaries, chooses a currently fundable tester scenario, and delegates each bounded chunk to `scripts/ickb-supervisor-loop.mjs`. When `--target-outcome tester_fresh_order_skip` is passed through, supervisor auto-planning can choose `tester-fresh-skip-two-pass`; the dynamic loop itself only chooses fundable tester stimuli.
For continuous tester-bot matching, use `node scripts/ickb-supervisor-dynamic-loop.mjs` or `pnpm live:supervisor:dynamic-loop`. This remains outside `apps/supervisor`: it reads tester preflight balance summaries, chooses a currently fundable tester scenario, and delegates each bounded chunk to `scripts/ickb-supervisor-loop.mjs`. When `--target-outcome tester_fresh_order_skip` is passed through, supervisor auto-planning can choose `tester-fresh-skip-two-pass`; the dynamic loop itself only chooses fundable tester stimuli. The dynamic loop also treats `-- --help` and `-- -h` as child help passthroughs and exits with the delegated status.

Loop and dynamic-loop exit codes are operator-visible control flow: tx/new-outcome stops exit `0`, incidents exit `2`, `max_runs` and `stable_no_progress` inspection stops exit `3`, and child nonzero statuses are preserved.
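The exit-code contract above can be sketched as a small mapping. The `LoopStop` type and its `kind` labels are hypothetical; only `max_runs` and `stable_no_progress` appear verbatim in this README, and the real scripts may model stops differently.

```typescript
// Hypothetical stop-reason model; the labels other than `max_runs` and
// `stable_no_progress` are illustrative names, not identifiers from the scripts.
type LoopStop =
  | { kind: "tx_or_new_outcome" }
  | { kind: "incident" }
  | { kind: "max_runs" }
  | { kind: "stable_no_progress" }
  | { kind: "child_nonzero"; status: number };

function exitCodeFor(stop: LoopStop): number {
  switch (stop.kind) {
    case "tx_or_new_outcome":
      return 0; // tx/new-outcome stops are a successful stop
    case "incident":
      return 2; // incident artifacts need operator attention
    case "max_runs":
    case "stable_no_progress":
      return 3; // inspection stops
    case "child_nonzero":
      return stop.status; // child nonzero statuses are preserved
  }
}

console.log(exitCodeFor({ kind: "stable_no_progress" }));
```

An operator wrapper could branch on these codes, for example retrying only on `3` and paging on `2`.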

Dynamic loop sessions are live-validation artifacts, separate from production bot-only logs. They default to ignored `log/validation/dynamic-<time>-<pid>/`; override the root with `--log-root <path>` or pin one run with `--session-root <path>`. The session root must be exactly `<log-root>/validation/<session>`, stay under the resolved log root, avoid symlinked parents, and not already exist. The dynamic loop derives its chunk timeout from the delegated supervisor-loop child timeout, chunk run count, and chunk backoff unless `--chunk-timeout-seconds` is explicitly set high enough, so an outer chunk timeout does not kill the supervisor-loop process before it can enforce its child cleanup boundary.
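As a rough illustration of the session-root shape rule, the sketch below checks lexically that a path is exactly `<log-root>/validation/<session>`. `isValidSessionRoot` is a hypothetical name, and the check deliberately omits the symlinked-parent and must-not-exist rules, which would need filesystem calls.

```typescript
import * as path from "node:path";

// Hypothetical lexical check only. The documented rules about symlinked parents
// and pre-existing directories are NOT covered here.
function isValidSessionRoot(logRoot: string, sessionRoot: string): boolean {
  const root = path.resolve(logRoot);
  const session = path.resolve(sessionRoot);
  const relative = path.relative(path.join(root, "validation"), session);
  // Valid only when the session is exactly one segment below <log-root>/validation.
  return (
    relative !== "" &&
    relative !== ".." &&
    !relative.includes(path.sep) &&
    !path.isAbsolute(relative)
  );
}

console.log(isValidSessionRoot("log", "log/validation/dynamic-example"));
console.log(isValidSessionRoot("log", "log/validation/nested/session"));
```

Resolving both paths first means a `..` segment in `--session-root` cannot escape the log root without being caught by the relative-path check.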

175 changes: 175 additions & 0 deletions apps/supervisor/src/index.test.ts
@@ -1496,6 +1496,7 @@ describe("classification", () => {
outcome: "bot_match_committed",
terminal: false,
actions: { matchedOrders: 1, deposits: 0 },
txHashes: [txHash("44")],
});
});

@@ -1516,6 +1517,7 @@ describe("classification", () => {
expect(classifyActorResult("bot", commandResult("bot", stdout))).toMatchObject({
outcome: "terminal_chain_rejection",
terminal: true,
txHashes: [txHash("46")],
});
});

@@ -1726,6 +1728,87 @@ describe("classification", () => {
});
});

it("preserves accepted tx hashes in generic early classifications", () => {
expect(classifyActorResult("tester", {
...commandResult("tester", JSON.stringify({ txHash: txHash("50") })),
timedOut: true,
})).toMatchObject({
outcome: "command_timeout",
txHashes: [txHash("50")],
});
expect(classifyActorResult("bot", {
...commandResult("bot", JSON.stringify(botEvent("bot.transaction.committed", { txHash: txHash("51") }))),
spawnError: "ENOENT",
status: null,
})).toMatchObject({
outcome: "nonzero_exit",
txHashes: [txHash("51")],
});
expect(classifyActorResult("bot", {
...commandResult("bot", JSON.stringify(botEvent("bot.transaction.committed", { txHash: txHash("52") }))),
stdoutTruncated: true,
})).toMatchObject({
outcome: "malformed_evidence",
txHashes: [txHash("52")],
});
expect(classifyActorResult("tester", commandResult("tester", [
JSON.stringify({ txHash: txHash("53") }),
"{not-json}",
].join("\n")))).toMatchObject({
outcome: "malformed_evidence",
txHashes: [txHash("53")],
});
});

it("preserves accepted preflight tx hashes in generic early classifications", () => {
expect(classifyActorResult("preflight", {
...commandResult("preflight", JSON.stringify({ txHash: txHash("54"), bounded: true, maxIterations: 1 }, null, 2)),
timedOut: true,
})).toMatchObject({
outcome: "command_timeout",
txHashes: [txHash("54")],
});
});

it("does not preserve conflicted tx hashes in generic early classifications", () => {
expect(classifyActorResult("tester", {
...commandResult("tester", JSON.stringify({ txHash: txHash("56"), error: { txHash: txHash("57") } })),
timedOut: true,
})).toMatchObject({
outcome: "command_timeout",
txHashes: [],
});
});

it("rejects bot post-broadcast failures with mismatched tx hash evidence", () => {
const stdout = JSON.stringify(botEvent("bot.transaction.failed", {
outcome: "post_broadcast_unresolved",
txHash: txHash("58"),
error: { txHash: txHash("59") },
}));

expect(classifyActorResult("bot", commandResult("bot", stdout))).toMatchObject({
outcome: "malformed_evidence",
terminal: true,
reason: "bot post-broadcast transaction failure evidence contained mismatched tx hashes",
txHashes: [],
});
});

it("rejects tester skips with mismatched tx hash evidence", () => {
const stdout = JSON.stringify({
txHash: txHash("5a"),
skip: { reason: "fresh-matchable-order", txHash: txHash("5b") },
});

expect(classifyActorResult("tester", commandResult("tester", stdout))).toMatchObject({
outcome: "malformed_evidence",
terminal: true,
reason: "tester skip evidence contained mismatched tx hashes",
txHashes: [],
});
});

it("keeps tester confirmation timeouts classified by safety evidence despite exit code 2", () => {
const result = {
...commandResult("tester", JSON.stringify({
@@ -1747,6 +1830,49 @@ describe("classification", () => {
});
});

it("counts matching top-level and nested tester transaction failure tx hashes once", () => {
const result = {
...commandResult("tester", JSON.stringify({
txHash: txHash("ae"),
error: {
name: "TransactionConfirmationError",
message: "Transaction confirmation timed out",
txHash: txHash("ae"),
isTimeout: true,
},
})),
status: 2,
};

expect(classifyActorResult("tester", result)).toMatchObject({
outcome: "confirmation_timeout",
terminal: true,
txHashes: [txHash("ae")],
});
});

it("rejects mismatched top-level and nested tester transaction failure tx hashes", () => {
const result = {
...commandResult("tester", JSON.stringify({
txHash: txHash("ae"),
error: {
name: "TransactionConfirmationError",
message: "Transaction confirmation timed out",
txHash: txHash("af"),
isTimeout: true,
},
})),
status: 2,
};

expect(classifyActorResult("tester", result)).toMatchObject({
outcome: "malformed_evidence",
terminal: true,
reason: "tester transaction failure evidence contained mismatched tx hashes",
txHashes: [],
});
});

it("classifies serialized tester post-broadcast unresolved failures", () => {
const result = {
...commandResult("tester", JSON.stringify({
@@ -3015,6 +3141,55 @@ describe("deterministic incident handling", () => {
});
});

it("counts matching top-level and nested transaction hashes once in summaries", async () => {
const writes = new Map<string, string>();
const args = parseArgs([
"--bot-config", "config/bot-testnet.json",
"--tester-config", "config/tester-testnet.json",
"--out-dir", "logs/live-supervisor/dedup-tx-hash-summary-test",
"--scenario", "bot-only",
"--stop-after-tx-count", "1",
"--max-cycles", "1",
]);
const plan = resolvePlan(args, "/repo", { spawnSyncCommand: ignoredChecker(true) });

const exitCode = await supervise(args, plan, {
skipBuiltRuntimeCheck: true,
spawnCommand: ((_command: string, commandArgs: string[]) => isPreflightCommand(commandArgs) ? fakeSuccessfulPreflightChild() : fakeChild([
JSON.stringify(botEvent("bot.transaction.built", {
actions: { collectedOrders: 0, completedDeposits: 0, matchedOrders: 1, deposits: 0, withdrawalRequests: 0, withdrawals: 0 },
})),
JSON.stringify(botEvent("bot.transaction.committed", {
txHash: txHash("5c"),
error: { txHash: txHash("5c") },
})),
].join("\n"))) as never,
spawnSyncCommand: ignoredChecker(true) as never,
stat: missingStat,
mkdir: () => Promise.resolve(undefined),
appendFile: (path, text) => {
const key = pathToString(path);
writes.set(key, `${writes.get(key) ?? ""}${textToString(text)}`);
return Promise.resolve();
},
writeFile: (path, text) => {
writes.set(pathToString(path), textToString(text));
return Promise.resolve();
},
});

expect(exitCode).toBe(0);
const summary = jsonArtifact(writes, "/repo/logs/live-supervisor/dedup-tx-hash-summary-test/summary.json");
expect(summary).toMatchObject({
stopped: "stop_after_tx_count",
txCreatingTxHashCount: 1,
txCreatingOutcomeCount: 1,
});
expect(recordAt(summary["txHashesByOutcome"], "summary tx hashes")).toEqual({
bot_match_committed: [txHash("5c")],
});
});

});

function commandResult(actor: "bot" | "tester" | "preflight", stdout: string): CommandResult {