Fix run-integrity issues #25/#26/#27 from the Fable 5 live run #28

Merged
jkbennitt merged 1 commit into master from fix/run-integrity on Jun 10, 2026

@jkbennitt (Member)

Closes #25, closes #26, closes #27. All three were found via the Fable 5 live run's artifacts (results/fable5-live-N1).

#25 — threat_response unwinnable

  • first_draft_tick is now actually written: successful draft executions record per-threat response delay in loop ticks (_record_draft_response); threats appearing while colonists are already drafted record an instant (0-tick) response.
  • Null incident placeholders (enemy_count=0 && threat_level=0.0) no longer enter threats_seen — the live run's "threat" was a null /incidents entry, and DefenseCommander's refusal to draft was correct.
  • SCORING_VERSION 1.0 → 1.1: v1.0 scores with non-empty threats_seen are not comparable (with the null-incident artifact excluded, the Fable run's 0.754 recomputes to ~0.834).
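The threat-tracking changes above can be sketched as follows. This is a minimal illustration, not the project's code: `Incident`, `ThreatTracker`, `observe`, and `response_delay` are hypothetical names, and `record_draft_response` is only loosely modeled on the `_record_draft_response` helper named in this PR. It shows the three rules: null placeholders never enter `threats_seen`, threats that appear while colonists are already drafted get a 0-tick response, and a successful draft records the delay in loop ticks for every still-unanswered threat.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Incident:
    # Hypothetical shape of one /incidents entry.
    incident_id: str
    enemy_count: int
    threat_level: float


def is_null_placeholder(incident: Incident) -> bool:
    # Null entries report no enemies and no threat level.
    return incident.enemy_count == 0 and incident.threat_level == 0.0


@dataclass
class ThreatTracker:
    # threat id -> loop tick at which the threat was first observed
    threats_seen: dict[str, int] = field(default_factory=dict)
    # threat id -> response delay in loop ticks (0 = already drafted)
    response_delay: dict[str, int] = field(default_factory=dict)

    def observe(self, incidents: list[Incident], tick: int, drafted: bool) -> None:
        for inc in incidents:
            if is_null_placeholder(inc) or inc.incident_id in self.threats_seen:
                continue  # skip placeholders and threats already tracked
            self.threats_seen[inc.incident_id] = tick
            if drafted:
                # Colonists were already drafted: record an instant response.
                self.response_delay[inc.incident_id] = 0

    def record_draft_response(self, tick: int) -> None:
        # Call after a draft action executes successfully; only threats
        # without a recorded response get a delay.
        for threat_id, seen_tick in self.threats_seen.items():
            self.response_delay.setdefault(threat_id, tick - seen_tick)
```

With this sketch, a null entry alongside a real raid first seen at tick 3 leaves only the raid in `threats_seen`, and a successful draft at tick 5 records a 2-tick response for it.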

#26 — wandering shelter

First successful get_terrain_summary is pinned for the client lifetime. Terrain is static; re-deriving the colony center from live pawn positions made the verified sites chase the builders (10 blueprints, 10 sites, 0 shelters, mood 0.52→0.36).
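The pinning behavior can be sketched as a small caching wrapper. This is a minimal illustration under stated assumptions, not the project's client: `TerrainPinningClient` and the injected `fetch_terrain_summary` callable are hypothetical names, and the summary is assumed to be a plain dict. Only a successful (non-`None`) summary is pinned, so a failed first call is simply retried on the next tick.

```python
from __future__ import annotations

from typing import Any, Callable


class TerrainPinningClient:
    """Hypothetical wrapper that pins the first successful terrain summary."""

    def __init__(self, fetch_terrain_summary: Callable[[], dict[str, Any] | None]) -> None:
        self._fetch = fetch_terrain_summary
        self._pinned: dict[str, Any] | None = None

    def get_terrain_summary(self) -> dict[str, Any] | None:
        if self._pinned is not None:
            # Terrain is static, so reuse the pinned summary for the client lifetime.
            return self._pinned
        summary = self._fetch()
        if summary is not None:
            # Pin only on success; a failed fetch is retried on the next call.
            self._pinned = summary
        return summary
```

Because the colony center comes from the pinned summary rather than live pawn positions, later calls keep returning the same site even if the builders move.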

#27 — pawn-targeted writes 1/13

  • work_priority accepts the three shapes models emit ({"<WorkType>": n} documented, {"work_type": X, "priority": n}, {"work_priorities": {...}}) — previously the second shape posted literal work="work_type", priority="Research" to RIMAPI, which is what the "null-ref" actually was.
  • job_assign accepts the job alias and maps x/z to target_position; an absent job name now fails visibly (ActionOutcome error) instead of posting an empty JobDef.
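The work_priority normalization described above can be sketched as one function that maps all three shapes onto `{WorkType: priority}`. This is a hypothetical sketch, not the executor's code: `normalize_work_priority` and `ParamError` are illustrative names, pawn-identifying keys are assumed to be stripped before this step, and priorities are assumed to be integers. Anything that cannot be normalized raises instead of being posted verbatim.

```python
from __future__ import annotations

from typing import Any


class ParamError(ValueError):
    """Hypothetical: raised when params cannot be normalized."""


def normalize_work_priority(params: dict[str, Any]) -> dict[str, int]:
    """Map the three observed param shapes onto {WorkType: priority}."""
    raw: dict[Any, Any]
    if "work_priorities" in params:  # {"work_priorities": {...}}
        nested = params["work_priorities"]
        if not isinstance(nested, dict):
            raise ParamError("work_priorities must be an object")
        raw = nested
    elif "work_type" in params:  # {"work_type": X, "priority": n}
        raw = {params["work_type"]: params.get("priority")}
    else:  # documented shape: {"<WorkType>": n}
        raw = params
    out: dict[str, int] = {}
    for work, priority in raw.items():
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(work, str) or isinstance(priority, bool) or not isinstance(priority, int):
            raise ParamError(f"bad work priority {work!r}: {priority!r}")
        out[work] = priority
    if not out:
        raise ParamError("no work priorities given")
    return out
```

Under this sketch, `{"work_type": "Research", "priority": 2}` becomes `{"Research": 2}` rather than being posted as work="work_type", priority="Research".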

Observability

ActionOutcome + ACTION_EXEC events now carry action parameters (root-causing #27 required exhuming payloads from raw LLM text).
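A minimal sketch of what carrying parameters on the outcome and event might look like. The field names (`action`, `success`, `params`, `error`), the JSON event layout, and the `action_exec_event` helper are assumptions for illustration; only the names `ActionOutcome` and `ACTION_EXEC` come from this PR.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ActionOutcome:
    # Hypothetical shape: the outcome keeps the parameters it ran with.
    action: str
    success: bool
    params: dict[str, Any] = field(default_factory=dict)
    error: str | None = None


def action_exec_event(outcome: ActionOutcome, tick: int) -> str:
    # Serialize an ACTION_EXEC event; params travel with it, so a failure
    # can be diagnosed from the event alone.
    return json.dumps({"type": "ACTION_EXEC", "tick": tick, **asdict(outcome)})
```

With params on the event, a failure such as a stockpile_zone priority sent as a string is visible directly in the event log instead of only in raw LLM output.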

Verification

  • 440 tests pass (17 new across executor normalization, threat tracking, terrain pinning), ruff clean, mypy strict clean.
  • Live 2-tick preflight pending RimWorld being loaded — will report results on this PR before merge.

🤖 Generated with Claude Code

#25 — threat_response was unwinnable: first_draft_tick was never
written anywhere, so the metric zeroed permanently once any threat
registered, and null incident placeholders (enemy_count=0,
threat_level=0.0) counted as threats. Draft-action execution now
records per-threat response delays (instant if already drafted when
the threat appears), placeholders are filtered, and SCORING_VERSION
bumps to 1.1 — 1.0 scores with non-empty threats_seen are not
comparable.
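The comparability rule implied by the version bump can be sketched as a guard. This is a hypothetical helper, not part of the scoring module: `RunScore`, `comparable_with_current`, and storing versions as strings are assumptions. It encodes only what the message states: v1.0 scores are unaffected when no threat was registered, because the 1.1 fix only changes runs with a non-empty threats_seen.

```python
from __future__ import annotations

from dataclasses import dataclass

SCORING_VERSION = "1.1"  # assumed to be stored as a string


@dataclass(frozen=True)
class RunScore:
    scoring_version: str
    composite: float
    threats_seen: int  # number of threats registered during the run


def comparable_with_current(score: RunScore) -> bool:
    if score.scoring_version == SCORING_VERSION:
        return True
    # v1.0 only mis-scored threat_response once a threat was registered.
    return score.scoring_version == "1.0" and score.threats_seen == 0
```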

#26 — MAP_SUMMARY sites chased the builders: the colony center was
re-derived from live pawn positions every tick, invalidating the
previous shelter blueprint (10 sites in 10 ticks, none built). The
first successful terrain summary is now pinned for the client's
lifetime; terrain is static.

#27 — pawn-targeted writes were 1/13: work_priority params passed
verbatim posted garbage (work="work_type", priority="Research") and
job_assign defaulted a missing job_def to an empty string. Both
handlers now normalize the shapes models actually emit, and
unnormalizable params fail visibly through ActionOutcome instead of
posting junk.
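The job_assign normalization can be sketched in the same spirit. This is an illustrative sketch, not the handler itself: `normalize_job_assign` is a hypothetical name, the `(payload, error)` return stands in for an ActionOutcome error, and the nested `{"x": ..., "z": ...}` layout of `target_position` is an assumption. It accepts the `job` alias, maps x/z to `target_position`, and refuses to build a payload when no job name is present.

```python
from __future__ import annotations

from typing import Any


def normalize_job_assign(params: dict[str, Any]) -> tuple[dict[str, Any] | None, str | None]:
    """Return (payload, error); exactly one of the two is None."""
    job_def = params.get("job_def") or params.get("job")  # accept the "job" alias
    if not job_def:
        # Fail visibly instead of posting an empty JobDef.
        return None, "job_assign: missing job name"
    payload: dict[str, Any] = {k: v for k, v in params.items() if k not in ("job", "x", "z")}
    payload["job_def"] = job_def
    if "target_position" not in payload and "x" in params and "z" in params:
        payload["target_position"] = {"x": params["x"], "z": params["z"]}
    return payload, None
```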

Observability: ActionOutcome (and ACTION_EXEC events) now carry the
action parameters — debugging #27 required digging payloads out of
raw LLM output because events omitted them.

Also includes the launch-chart/quote-card generation script for the
Fable 5 run artifacts.

Co-Authored-By: Claude Fable 5 <[email protected]>
@jkbennitt (Member, Author)

Live 2-tick preflight complete (results/fable5-preflight5, Fable 5, seed 42). All five verification targets pass:

| Target | Before (live-N1) | After |
|---|---|---|
| threat_response with no hostiles | 0.0 (phantom null incident) | 1.000 |
| Pawn-targeted write success | 1/13 | 3/3 (both work_type and nested work_priorities shapes normalized; bed_rest also landing) |
| Shelter site across ticks | new site every tick | (133,138) both ticks; pin holds |
| ACTION_EXEC parameters | absent | present, and immediately useful: both remaining failures are fully diagnosable from the event alone |
| scoring_version | 1.0 | 1.1 |

Composite 0.859 over 2 ticks, 14/14 deliberations parsed, zero retries, $0 on subscription.

The params observability surfaced two new failures, both visible rather than silent and neither in scope here: stockpile_zone priority was sent as the string "Important", and a t1 growing_zone was rejected with "Invalid plant definition: Plant_Rice" even though t0's 'plant': 'rice' succeeded. The latter is a RIMAPI def-name quirk worth a small follow-up issue.

@jkbennitt merged commit 34cb377 into master on Jun 10, 2026
3 checks passed
@jkbennitt deleted the fix/run-integrity branch on June 10, 2026 04:13