
Strict action-plan schema + alias resolution in parse filter #24

Merged
jkbennitt merged 1 commit into master from fix/strict-action-schema on Jun 10, 2026

Conversation

@jkbennitt

Member

Why

Live 2-tick Fable 5 preflights (via the new claude-code provider) exposed silent measurement gaps: 14 successful LLM calls produced zero actions and zero error events. Two distinct holes:

  1. Schema violations parsed as empty plans. Fable returned rich, valid JSON in its own invented shape (threat_assessment/defensive_actions). data.get("actions", []) swallowed the missing key — no parse error, no correction retry, confidence 0.0, and the benchmark silently measured nothing.
  2. Alias-blind action filter. Once Fable did emit actions, every entry was dropped: the ALLOWED_ACTIONS check ran on raw names before alias resolution, so set_work_priority was rejected despite our own ACTION_TYPE_ALIASES mapping it to the allowed work_priority.
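Both holes can be reproduced with a minimal sketch, assuming a much-simplified parser. The function name `parse_plan_lenient`, the contents of the two tables, and the sample payloads are illustrative stand-ins rather than the repository's code; only the `set_work_priority` to `work_priority` alias and the `threat_assessment`/`defensive_actions` keys come from the description above.

```python
import json

# Illustrative stand-ins (assumption): the real tables live in the repository.
ALLOWED_ACTIONS = {"work_priority", "research_target"}
ACTION_TYPE_ALIASES = {"set_work_priority": "work_priority"}


def parse_plan_lenient(raw: str) -> list[dict]:
    """Sketch of the pre-fix behaviour: bad input disappears without an error."""
    data = json.loads(raw)
    # Hole 1: a missing "actions" key is indistinguishable from a no-op plan.
    actions = data.get("actions", [])
    # Hole 2: membership is tested on the raw name, so ACTION_TYPE_ALIASES
    # is never consulted and aliased names are dropped.
    return [a for a in actions if a.get("action_type") in ALLOWED_ACTIONS]


# Valid JSON in a model-invented shape, with no "actions" key.
invented_shape = json.dumps(
    {"threat_assessment": "raiders", "defensive_actions": ["draft"]}
)
# A well-formed plan that uses an alias instead of the canonical name.
aliased = json.dumps({"actions": [{"action_type": "set_work_priority"}]})

print(parse_plan_lenient(invented_shape))  # []
print(parse_plan_lenient(aliased))         # []
```

In both cases the caller receives an empty plan and nothing is raised, which matches the "zero actions and zero error events" symptom.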

What

  • Missing "actions" key → ActionPlanParseError carrying the exact expected schema (feeds the existing correction-retry prompt). Explicit actions: [] remains valid (deliberate no-op). Schema example extracted to _ACTION_PLAN_SCHEMA_EXAMPLE (single source of truth for system prompt, reminder, and error message).
  • Action filter accepts raw name or catalog alias; added create_growing_zone/create_stockpile_zone aliases observed live.
  • RESPONSE FORMAT reminder + the role's literal valid action_type list appended to every user prompt (recency position). A/B verified live: without it Fable-via-claude-p writes markdown reports; with it, compliant JSON.
  • _RAW_OUTPUT_CHARS 4096 → 16384 — Fable's completions are multi-KB and the verbatim transcripts are first-class artifacts.

Preflight evidence (live game, seed 42, 2 ticks each)

| | calls | actions proposed | executed | parse errors visible |
|---|---|---|---|---|
| before | 14 | 0 | 0 | 0 (silent) |
| strict only | 2 ok + 12 fail | 0 | 0 | 12 (visible) |
| + reminder + aliases | 14, zero retries | 33 (tick 0) | 11 total | 0 needed |

Bonus: tick 0's research_target failed ("ShieldBelt not available"), the DO-NOT-REPEAT broadcast fired, and tick 1's research_target succeeded — first live confirmation the PR #19 feedback loop changes behavior.

423 tests pass (4 new), ruff clean, mypy strict clean.

🤖 Generated with Claude Code

Live Fable 5 preflights exposed two silent measurement gaps:

- A schema-violating but valid JSON response (model-invented shape with
  no "actions" key) parsed into an empty plan with no error event — the
  correction retry never fired and a schema violation was
  indistinguishable from a deliberate no-op. Missing "actions" now
  raises ActionPlanParseError whose message carries the exact expected
  schema (the retry prompt embeds it), with the shape extracted to
  _ACTION_PLAN_SCHEMA_EXAMPLE as single source of truth.
- The per-action ALLOWED_ACTIONS filter checked raw action_type names
  before alias resolution, so set_work_priority was dropped even though
  ACTION_TYPE_ALIASES maps it to the allowed work_priority. The filter
  now accepts either the raw name or its catalog alias; added
  create_growing_zone / create_stockpile_zone aliases observed live.

Plus two prompt/observability changes informed by the same preflights:
a recency-positioned RESPONSE FORMAT reminder with the role's literal
valid action_type list at the end of every user prompt (Fable through
claude -p drifts into markdown reports when the schema sits ~8KB back
in the system prompt — A/B verified), and PROVIDER_CALL raw_output
capture bumped 4KB -> 16KB so frontier-length completions aren't
clipped.

Preflight results (2 ticks, Fable 5, live game): before — 0 actions,
silent; after — all 7 agents proposing (33 actions tick 0), 11 executed
across both ticks, zero parse retries, and the DO-NOT-REPEAT loop
visibly correcting a failed research_target between ticks.

Co-Authored-By: Claude Fable 5 <[email protected]>
@jkbennitt jkbennitt merged commit 5d361d0 into master Jun 10, 2026
3 checks passed
@jkbennitt jkbennitt deleted the fix/strict-action-schema branch June 10, 2026 01:17