Update assistants to Fable #525

Merged: josephjclark merged 4 commits into main from model-update-fable on Jun 11, 2026

Conversation

hanna-paasivirta commented Jun 11, 2026

Contributor

Short Description

Moves the main chat services (job_chat, workflow_chat, global_chat planner + doc_agent_chat prototype) from Sonnet to Claude Fable 5. Keeps RAG helpers, vocab_mapper, and the test judge on Sonnet.
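The split described above can be pictured as a small routing table. This is a minimal sketch, not the repository's code: the `claude-fable` alias, the `CLAUDE_FABLE` constant, and the `claude-fable-5` model ID come from this PR's description, while `SERVICE_MODELS`, `resolve_model`, the dotted service keys, and the `claude-sonnet` alias are hypothetical names used only for illustration.

```python
# Illustrative sketch only; the real models.py and service configs may differ.

CLAUDE_FABLE = "claude-fable-5"  # constant and model ID named in the PR description

# Alias table: only the "claude-fable" alias is taken from the PR.
MODEL_ALIASES = {"claude-fable": CLAUDE_FABLE}

# Hypothetical service-to-model map mirroring the split in the description.
# "claude-sonnet" is a placeholder alias, not a confirmed identifier.
SERVICE_MODELS = {
    "job_chat": "claude-fable",
    "workflow_chat": "claude-fable",
    "global_chat.planner": "claude-fable",
    "doc_agent_chat": "claude-fable",
    "job_chat.rag": "claude-sonnet",
    "vocab_mapper": "claude-sonnet",
    "test_judge": "claude-sonnet",
}


def resolve_model(name: str) -> str:
    """Return the full model ID for a known alias, or the name unchanged."""
    return MODEL_ALIASES.get(name, name)


def services_on(alias: str) -> list[str]:
    """List the services configured with the given alias, sorted by name."""
    return sorted(s for s, m in SERVICE_MODELS.items() if m == alias)


print(resolve_model(SERVICE_MODELS["job_chat"]))
print(services_on("claude-sonnet"))
```

Keeping the alias lookup separate from the per-service map means a later model change, or a rollback, would touch one entry per service rather than every call site.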

This will increase our model costs, possibly by 5x. The exact increase is difficult to estimate without running a lot of tests, because thinking behaviour (i.e. output token volume) will vary between models and calls.

I haven't tested the effects of this update extensively. I skimmed the acceptance tests for the three chat services and tested locally with Lightning to confirm that everything still works and doesn't seem slower. We'll be ready to roll back if there are complaints.

Fixes #524

Implementation details

  • models.py: add "claude-fable" alias and CLAUDE_FABLE constant for claude-fable-5
  • Point service configs at claude-fable: rag.yaml (model), gen_project_config.yaml, doc_agent_chat/config.yaml, global_chat/config.yaml (planner). Code fallback defaults updated to match.
  • Keep on Sonnet: job_chat RAG calls (llm_search_decision, llm_retrieval), vocab_mapper.
  • Triple max_tokens on Fable routes to absorb the new tokenizer (~30% more tokens): 16384 → 49152 (job_chat, workflow_chat, doc_agent_chat), 8192 → 24576 (planner).
  • Pass an explicit per-request timeout=httpx.Timeout(600.0, connect=5.0) on the four non-streaming messages.create calls (job_chat, workflow_chat, doc_agent_chat, planner). Required: the SDK rejects non-streaming requests with max_tokens > ~21k unless a timeout is given. Values match the SDK default, so no behaviour change.
  • Planner effort switched from high to medium.
  • Remove dead temperature config: unread keys in doc_agent_chat, workflow_chat, and planner configs, plus the planner's unused self.temperature. Live temperature=0 settings (RAG, vocab_mapper, router) are unchanged and stay on Sonnet/Haiku, which accept it.
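The token-budget arithmetic in the list above can be checked with a short sketch. The old limits and the "~21k" cutoff are taken from this PR's description; the cutoff is only approximate, and the names `OLD_MAX_TOKENS`, `APPROX_NONSTREAMING_CUTOFF`, `triple_budgets`, and `needs_explicit_timeout` are hypothetical helpers, not code from the repository or the SDK.

```python
# Illustrative sketch only; names and structure are assumptions.

# Previous max_tokens per route, as listed in the PR description.
OLD_MAX_TOKENS = {
    "job_chat": 16384,
    "workflow_chat": 16384,
    "doc_agent_chat": 16384,
    "planner": 8192,
}

# The PR gives the cutoff only as "~21k", so this value is approximate.
APPROX_NONSTREAMING_CUTOFF = 21_000


def triple_budgets(budgets: dict[str, int]) -> dict[str, int]:
    """Return a copy of the budgets with every limit multiplied by three."""
    return {route: limit * 3 for route, limit in budgets.items()}


def needs_explicit_timeout(
    max_tokens: int, cutoff: int = APPROX_NONSTREAMING_CUTOFF
) -> bool:
    """True when a non-streaming request exceeds the approximate cutoff."""
    return max_tokens > cutoff


NEW_MAX_TOKENS = triple_budgets(OLD_MAX_TOKENS)

for route, limit in NEW_MAX_TOKENS.items():
    flag = needs_explicit_timeout(limit)
    print(f"{route}: {OLD_MAX_TOKENS[route]} -> {limit}, explicit timeout: {flag}")
```

Under these assumptions, none of the old limits crossed the cutoff but all four tripled limits do, which is consistent with the PR passing `timeout=httpx.Timeout(600.0, connect=5.0)` on all four non-streaming calls.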

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

hanna-paasivirta commented Jun 11, 2026

Contributor Author

@josephjclark Does billing need to be adjusted for users?

hanna-paasivirta marked this pull request as ready for review June 11, 2026 13:16

josephjclark left a comment

Collaborator

Running the acceptance tests and it's certainly working!

It's very hard to tell whether the results are better or worse. I think I'll release and smoke test on staging.

We'll have to look back in a couple of weeks and assess whether the extra cost is worthwhile.

josephjclark merged commit e7f0dea into main Jun 11, 2026
2 checks passed
josephjclark deleted the model-update-fable branch June 11, 2026 14:59

Development

Successfully merging this pull request may close these issues.

Update assistants to use Fable