Add pluggable LLM API provider hook with tool calling by haileyok · Pull Request #337 · roostorg/osprey

haileyok · 2026-06-04T06:57:24Z

What

Adds a new, vendor-neutral LLM API provider plugin hook to Osprey, designed for tool calling from day one.

This is the first chunk of making LLM access pluggable in Osprey. It is deliberately upstream-only and adds a new public plugin hook, so I'd love interface feedback before this lands (opening as draft for that reason). Relates to the Jan 9 2026 working-group "pluginizing" discussion (#104).

Changes

osprey.worker.lib.llm — new BaseLLMProvider interface plus small vendor-neutral dataclasses: LLMMessage, ToolDefinition, ToolCall, ToolResult, LLMResponse, LLMUsage, CacheControl. Tool definitions go in; tool-call requests come back out; tool results are fed back in on the next chat() call. Streaming is intentionally out of scope (room left for a future chat_stream).
register_llm_provider hookspec (firstresult=True) + bootstrap_llm_provider(config) helper, mirroring the existing single-provider hooks (register_input_stream, register_execution_result_store). bootstrap_llm_provider returns None when no plugin registers one, so callers null-check.
Example provider in example_plugins (llm/anthropic_provider.py) that talks to the Anthropic Messages API directly, including tool_use request/response translation and optional prompt-cache control. Registered via register_llm_provider in example_plugins/src/register_plugins.py.
Tests: dataclass invariants + interface contract (osprey_worker/.../lib/llm/tests/test_base.py), and a full tool-call cycle against a fake Anthropic client — no SDK, API key, or network (example_plugins/src/llm/tests/test_anthropic_provider.py). example_plugins/src is added to pytest testpaths so example tests run.
Docs: hook menu table + an "LLM provider hook" section in docs/DEVELOPMENT.md.

Design notes / open questions

Tool calling shape. The dataclasses are a proposal; happy to refine names/shape. The key constraint was making tool calling first-class rather than bolted on later.
CacheControl generality. Included because a downstream consumer wants to keep Anthropic prompt-cache breakpoints (ephemeral + TTL). It's optional and vendor-neutral; if folks prefer, it could move into provider-specific **params instead.
firstresult=True. Only one LLM provider can register, matching register_input_stream. Reasonable for now; noted in the docstring.

Validation

Locally green: ruff check, ruff format --diff, mypy . (no issues), full pre-commit run --all-files (prettier skipped, as in CI), and fawltydeps --check-unused. The example provider tests pass directly; the lib/llm tests run under the docker integration harness like the rest of osprey_worker.

Introduce a vendor-neutral BaseLLMProvider interface (osprey.worker.lib.llm) designed for tool calling from day one: tool definitions in, tool-call requests out, tool results back in. Add a register_llm_provider pluggy hookspec (firstresult=True) and a bootstrap_llm_provider() helper, mirroring the existing single-provider hooks (register_input_stream, register_execution_result_store). Add an example provider in example_plugins that talks to the Anthropic Messages API directly, demonstrating the request/response and tool_use translation. The anthropic SDK is imported lazily and is not declared as a workspace dependency (it conflicts with the pinned typing-extensions), so it stays optional and CI needs no SDK, API key, or network. Includes unit tests (dataclass invariants + a full tool-call cycle against a fake Anthropic client) and docs for the new hook in docs/DEVELOPMENT.md.

coderabbitai · 2026-06-04T06:57:32Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4b80f64a-9deb-4a09-bee1-edc1fbddd533

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch llm-provider-hook

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

The RuntimeError referenced an example_plugins[llm] extra that no longer exists (anthropic isn't a declared dependency due to the typing-extensions conflict). Point users at the manual install instead.

- Preserve system-prompt cache_control: _build_system now emits the Anthropic block form (list of text blocks with cache_control) when any system part carries a breakpoint, instead of always flattening to a string that drops it. - Avoid falsy-or traps: model/max_tokens use 'is not None' so explicit '' / 0 pass through instead of silently becoming defaults (matches temperature). - Degrade gracefully on a non-dict tool_use 'input' instead of crashing. - Extract _cache_control_dict helper (de-dupes the message/system cache mapping) and note the extended-cache-ttl beta-header requirement for non-default ttl. - Clarify bootstrap_llm_provider docstring: registered != ready (SDK/creds may be validated lazily on first chat()). - Make the missing-SDK test deterministic via monkeypatch (sys.modules) so it no longer depends on anthropic being absent now that example_plugins is in CI testpaths; add regression tests for the three fixes above.

- Module docstring no longer references the removed example_plugins[llm] extra; points to manual 'uv pip install anthropic' and the pyproject NOTE. - base.py chat() docstring clarifies that to cache the system prompt you pass a role='system' message with cache_control (the plain system= arg can't).

- Make anthropic a real dependency of example_plugins by bumping the root typing-extensions pin (==4.6.3 -> >=4.14.0; resolves to 4.15.0). Removes the 'not declared / install manually' workaround (pyproject NOTE, mypy override comment, anthropic.* mypy entry, DEVELOPMENT.md manual-install note). - Use the standard OSPREY_ prefix for config keys (OSPREY_LLM_ANTHROPIC_{API_KEY,MODEL,MAX_TOKENS}). - Default to claude-sonnet-4-6 (verified current Sonnet 4.6 API id). - Don't register an LLM provider by default; an LLM provider is optional and may be unregistered (bootstrap_llm_provider now returns None here). - base.py chat(): raise NotImplementedError instead of a no-effect '...' (CodeQL 'statement has no effect'). - Drop 'mirroring other hooks' language from the hookspec docstring; drop a stray 'e.g.' in DEVELOPMENT.md. - Tests updated for OSPREY_ keys; drop the now-obsolete missing-SDK test.

@tool

Vendor-neutral sugar on top of the provider interface, in osprey.worker.lib.llm: - tools.py: ToolParameter + a ToolRegistry.@tool decorator that compiles to the existing ToolDefinition (JSON Schema). Handlers are plain sync callables (no imposed context object); dispatch(ToolCall) runs the handler and captures unknown-tool / bad-args / handler errors as ToolResult(is_error=True) to feed back to the model. registry.definitions() feeds provider.chat(tools=...). - loop.py: run_tool_loop drives the standard exchange (chat -> dispatch tool_calls -> append role='assistant' tool-use + role='tool' results -> repeat) until the model returns a final answer, or raises ToolLoopLimitExceeded past max_iterations. Input messages are not mutated. - Exported from llm/__init__.py; documented in DEVELOPMENT.md; unit tests for schema building, registration, dispatch (success/error/unknown), and the loop (tool cycle, no-op, limit, no-mutation).

- Move the LLM provider hook + tool-calling docs out of DEVELOPMENT.md into a dedicated docs/llm.md (linked from the hook table and added to SUMMARY.md). - Trim the register_plugins note to just state an LLM provider can be registered here, dropping the rationale.

- loop.py: don't dispatch tools on the final, limit-tripping iteration — side effects whose results can never be sent back to the model are no longer run before raising ToolLoopLimitExceeded. - tools.py: serialize the handler result inside dispatch's try/except so a non-serialisable/circular return becomes is_error instead of crashing the loop; document that error text reaches the model (no secrets) and the default=None sentinel. - typing-extensions: pin exact ==4.15.0 (matches repo convention + uv.lock) instead of an unbounded floor. - anthropic_provider: correct stale cache-ttl comment (1h TTL is GA, no beta header needed). - Tests: loop does not dispatch on the final iteration; dispatch captures an unserialisable result as is_error.

haileyok

Claude:3 Code review — two-reviewer pass (standard + adversarial), iterated to clean.

Verdict: No outstanding bugs. Both reviewers independently approved the final state; all Medium findings from the first pass are fixed with regression tests.

Findings & resolutions

Severity	Where	Finding	Resolution
Medium	`lib/llm/loop.py`	`run_tool_loop` dispatched tools on the final, limit-tripping iteration and discarded the results — running side effects that can never be sent back to the model	Fixed: break before dispatching on the last allowed iteration, then raise `ToolLoopLimitExceeded`; regression test added
Medium	`lib/llm/tools.py`	`dispatch` serialized the handler result outside the `try/except`, so a non-serialisable/circular return crashed `run_tool_loop` instead of becoming `is_error`	Fixed: serialize inside the `try`; regression test added
Medium	`pyproject.toml`	`typing-extensions` was changed to an unbounded floor (`>=4.14.0`) — a shared, monorepo-wide dep, inconsistent with the exact-pin convention	Pinned exact `==4.15.0` (matches `uv.lock`)
Low	`anthropic_provider.py`	Stale comment claimed the 1h cache TTL needs a beta header	Corrected — 1h TTL is GA on the Claude API (set via the `ttl` field, no header)
Low	`lib/llm/tools.py`	Handler exception text is echoed to the model (secret-leak vector); `default=None` sentinel undocumented	Documented both
Low (accepted)	`anthropic_provider.py`	Benign lazy-client init race under gevent (last-writer-wins)	Accepted as harmless
Low (accepted)	`example_plugins/pyproject.toml`	`anthropic>=0.40.0` floor vs exact-pin convention	Left as a floor: example-plugin dep, reproducibility comes from `uv.lock`

What's good (both reviewers)

Clean vendor-neutral interface; tests pin real contracts (exact Anthropic request shape, role mapping, full tool_use→tool_result cycle, usage remap, non-dict input degradation, system cache_control block form); correct firstresult=True/None bootstrap semantics; the tool loop doesn't mutate the caller's messages and terminates correctly; well-scoped dependency/mypy changes.

All local CI gates pass: ruff, ruff format, mypy, full pre-commit, fawltydeps, and the example-provider unit tests.

_{Reviewers ran on the general model (the configured gpt-5.5 reviewer route was failing auth at review time).}

github-code-quality Bot found potential problems Jun 4, 2026

View reviewed changes

Comment thread osprey_worker/src/osprey/worker/lib/llm/base.py Fixed

haileyok added 3 commits June 4, 2026 07:15

Cleanup: correct stale anthropic-missing error message

c84d1ba

The RuntimeError referenced an example_plugins[llm] extra that no longer exists (anthropic isn't a declared dependency due to the typing-extensions conflict). Point users at the manual install instead.

github-code-quality Bot found potential problems Jun 4, 2026

View reviewed changes

Comment thread osprey_worker/src/osprey/worker/lib/llm/base.py Fixed

haileyok commented Jun 4, 2026

View reviewed changes

haileyok added 2 commits June 4, 2026 18:32

haileyok commented Jun 4, 2026

View reviewed changes

Comment thread docs/DEVELOPMENT.md Outdated

haileyok commented Jun 4, 2026

View reviewed changes

Comment thread example_plugins/src/register_plugins.py Outdated

haileyok added 2 commits June 4, 2026 19:09

haileyok commented Jun 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add pluggable LLM API provider hook with tool calling#337

Add pluggable LLM API provider hook with tool calling#337
haileyok wants to merge 8 commits into
mainfrom
llm-provider-hook

haileyok commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

haileyok left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

haileyok commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Changes

Design notes / open questions

Validation

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

haileyok left a comment

Choose a reason for hiding this comment

Findings & resolutions

What's good (both reviewers)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

haileyok commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading