
feat: per-turn tool retrieval (trim tool-definition context flood) #858

Merged
anandgupta42 merged 6 commits into main from feat/inference-stack
Jun 10, 2026

Conversation

@anandgupta42

@anandgupta42 anandgupta42 commented Jun 1, 2026

Contributor

What does this PR do?

Adds a flag-gated (ALTIMATE_TOOL_RETRIEVAL, default OFF) per-turn tool subset: an always-on core set plus a lexically-ranked top-k of the remaining tools, never dropping a tool already referenced mid-trajectory. This trims the ~78-tool definition flood that hurts tool selection and inflates input tokens. v1 is lexical, dependency-free, and deterministic. Wired into session/llm.ts (marker-wrapped); a no-op when the flag is unset.
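
The selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `CORE_TOOLS`, `lexicalScore`, and `selectTools` are hypothetical names, the core list is a placeholder, the scoring is a plain word-overlap count, and dropping zero-score tools is an assumption. The stopping rule mirrors the loop quoted later in the review (core and kept tools count toward `topk`).

```typescript
// Hypothetical sketch: names, the core list, and the scoring are illustrative, not the PR's code.
interface ToolInfo {
  name: string
  description?: string
}

interface SelectOptions {
  topk?: number // target size of the exposed set; core and kept tools count toward it
  keep?: string[] // tools already referenced mid-trajectory
}

// Placeholder core set (assumption); the real list lives in tool/retrieval.ts.
const CORE_TOOLS: string[] = ["read", "edit", "bash"]

// Word-overlap score: number of distinct query words (3+ chars) found in name + description.
function lexicalScore(query: string, tool: ToolInfo): number {
  const tokens: string[] = query.toLowerCase().match(/[a-z0-9_]+/g) ?? []
  const words = new Set(tokens.filter((w) => w.length >= 3))
  const hay = `${tool.name} ${tool.description ?? ""}`.toLowerCase()
  let hits = 0
  words.forEach((w) => {
    if (hay.includes(w)) hits++
  })
  return hits
}

function selectTools(query: string, tools: ToolInfo[], opts: SelectOptions = {}): Set<string> {
  const topk = opts.topk ?? 12
  const available = new Set(tools.map((t) => t.name))
  const selected = new Set<string>()
  // Core and in-flight tools are always retained, but only if they are actually registered.
  for (const name of [...CORE_TOOLS, ...(opts.keep ?? [])]) {
    if (available.has(name)) selected.add(name)
  }
  // Rank the remaining tools; Array.prototype.sort is stable, so ties keep input order.
  const ranked = tools
    .filter((t) => !selected.has(t.name))
    .map((t) => ({ name: t.name, score: lexicalScore(query, t) }))
    .filter((r) => r.score > 0) // assumption: tools with no lexical overlap are never added
    .sort((a, b) => b.score - a.score)
  for (const r of ranked) {
    if (selected.size >= topk) break
    selected.add(r.name)
  }
  return selected
}
```

Because the sketch uses no randomness and a stable sort, the same query and tool list always yield the same subset, which is the determinism property the description calls out.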

Previously this PR also carried constrained decoding and a critic gate. Both were split out so each lands on its own merits.

Validation (measured)

Tool retrieval — A/B, 8 ADE-bench dbt tasks, deepseek-v4-flash, value-graded (output checked for correctness, not just "builds"):

| Arm | Resolved | Input tokens | Cost |
|---|---|---|---|
| ALTIMATE_TOOL_RETRIEVAL=1 | 4/8 | 238k | $0.024 |
| baseline (all ~78 tools) | 4/8 | 474k | $0.048 |

identical resolve-rate (0 tasks differ), −50% input tokens, −49% cost.

Type of change

  • New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #857

How did you verify your code works?

  • 5 unit tests (test/tool/retrieval.test.ts) — selection logic incl. always-on core + referenced-tool retention; all green.
  • Tool-retrieval A/B (above) — value-graded on real dbt tasks: −50% input tokens / −49% cost at identical resolve-rate.
  • tsgo typecheck clean; altimate_change markers in session/llm.ts balanced (7/7); default-off so the non-flagged path is unchanged.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • LLM now uses an optional tool-retrieval mechanism to trim the active toolset per query (disabled by default; enable via env).
    • Retrieval always preserves core tools and any in-flight referenced tools while selecting additional relevant tools.
  • Documentation

    • Added guide describing tool retrieval behavior and configuration.
  • Tests

    • Added tests covering selection behavior, preserved tools, edge cases, and enablement toggling.

…rieval, critic gate)

Three flag-gated, independent, default-off reliability features for the agent loop:

- `provider/constrained.ts` — grammar/JSON-Schema constrained decoding so a local
  model (vLLM/LM Studio/llama.cpp) is forced to emit a parseable, schema-correct
  tool call. Pure (schema in → payload out).
- `tool/retrieval.ts` — per-turn tool subset (always-on core + lexical top-k), never
  dropping a tool referenced mid-trajectory; trims the ~78-tool context flood. v1
  lexical, dependency-free, deterministic.
- `tool/critic.ts` — pre-execution gate for side-effecting tools via a pluggable
  `Verifier`; default allows everything (ungated), a real verifier is injected.

Wired flag-gated into `session/llm.ts` (markers; default off → upstream path unchanged).
18 unit tests; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Walkthrough

Adds a deterministic Retrieval module that selects a per-turn top‑k subset of tools (always keeps CORE and in‑flight referenced tools) and wires it into LLM.stream behind the ALTIMATE_TOOL_RETRIEVAL flag to prune the active tool map based on the last user query.

Changes

Tool Retrieval and LLM Integration

| Layer / File(s) | Summary |
|---|---|
| Retrieval module implementation (packages/opencode/src/tool/retrieval.ts) | Exports Retrieval namespace with CORE, Tool/Options types, enabled(), a lexical score() helper, and select() that deterministically returns a Set of tool names to keep (preserving CORE and keep, stable tie-break, min-tools pass-through). |
| LLM Stream Integration (packages/opencode/src/session/llm.ts) | Imports and conditionally applies Retrieval in LLM.stream. Extracts a textual query from the last user message (handles string and array), builds candidate {name,description} list, calls Retrieval.select(..., {keep: referencedTools}), and removes non-selected tools from the active tool map while retaining "invalid". |
| Tests for selection and flag behavior (packages/opencode/test/tool/retrieval.test.ts) | Adds tests verifying CORE retention, lexical relevance selection, honoring keep (in‑flight) tools, small-list pass‑through, token-length matching, topk semantics with CORE/keep, and Retrieval.enabled() env-flag toggling. |
| Docs: Tool Retrieval (docs/docs/configure/tools/config.md) | Documents the ALTIMATE_TOOL_RETRIEVAL=1 flag, always-exposed CORE tools, preservation of referenced/in‑flight tools, and deterministic lexical selection rules. |
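
The integration step summarized in the walkthrough (flatten the last user message into a query, then prune the tool map while keeping "invalid") might look roughly like the sketch below. The names `queryFromContent` and `pruneTools` and the `ContentPart` shape are assumptions for illustration; the real message and tool types come from the AI SDK and are not reproduced here. Unlike the PR, which removes entries from the active map, this sketch returns a pruned copy.

```typescript
// Hypothetical shapes (assumption): a user message is a string or a list of parts.
type ContentPart = string | { type: string; text?: string }
type UserContent = string | ContentPart[]

// Flatten the last user message into a plain query string; non-text parts contribute "".
function queryFromContent(content: UserContent | undefined): string {
  if (typeof content === "string") return content
  if (!Array.isArray(content)) return ""
  return content.map((part) => (typeof part === "string" ? part : (part.text ?? ""))).join(" ")
}

// Return a copy of the tool map containing only the selected tools plus "invalid".
function pruneTools<T>(tools: Record<string, T>, selected: Set<string>): Record<string, T> {
  const pruned: Record<string, T> = {}
  for (const [name, tool] of Object.entries(tools)) {
    // "invalid" is retained regardless of the selection, as the walkthrough states.
    if (name === "invalid" || selected.has(name)) pruned[name] = tool
  }
  return pruned
}
```

Returning a copy keeps the helper free of side effects, which makes it straightforward to unit-test; mutating the map in place would be an equally reasonable choice inside `LLM.stream`.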

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

needs-review:blocked

Poem

A rabbit sniffs the tool-filled glen,
Hops to the words the user penned,
Keeps the core and friends in sight,
Prunes the rest with lexical light,
Hooray for tidy tool-time, hop! 🐇

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is comprehensive with all required sections: summary of changes, validation metrics, type of change, issue reference, and verification details. However, it lacks the required 'PINEAPPLE' marker at the very top, which is mandatory per the template. | Add 'PINEAPPLE' at the very beginning of the PR description before any other content, as required by the template for AI-generated contributions. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: implementing per-turn tool retrieval to reduce context overhead from tool definitions. |
| Linked Issues check | ✅ Passed | The PR successfully implements the core tool retrieval requirements from #857: flag-gated feature (default OFF), deterministic lexical subset selection, core tool preservation, mid-trajectory tool retention, and comprehensive unit tests. Code changes match the stated objectives and validation metrics are provided. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to the tool retrieval feature: new retrieval module, LLM stream integration, comprehensive tests, and documentation. The PR correctly split out constrained decoding and critic gate features into separate efforts, keeping this PR focused. |

The `ALTIMATE_CRITIC_GATE` flag was a no-op — `tool/critic.ts` was never
imported into the execute path, so enabling the flag did nothing. Removing
it from this PR so every shipped flag is actually wired:

- `ALTIMATE_TOOL_RETRIEVAL` — wired in `session/llm.ts`, validated (-50% input tokens at equal resolve)
- `ALTIMATE_CONSTRAINED_TOOLCALLS` — wired in `session/llm.ts` (local providers)

The critic gate (pre-execution `Verifier` for side-effecting tools) moves to
a follow-up that wires it into the `session/prompt.ts` execute wrapper with an
integration test. Code preserved on `feat/critic-gate`.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opencode/src/tool/retrieval.ts (1)

55-76: ⚡ Quick win

topk budget is mostly consumed by CORE.

keep (CORE + caller keep) counts against topk, and CORE alone is 10 entries. With the default topk: 12, an enabled retrieval pass exposes only ~2 lexically-ranked non-core tools out of ~78 — which may starve the selection of task-relevant tools and undercut the feature's intent. Consider treating topk as a budget for retrieved tools beyond core/keep, or raising the default.

♻️ One option: budget retrieved tools separately from core/keep
-    for (const r of ranked) {
-      if (keep.size >= topk) break
-      keep.add(r.name)
-    }
+    const limit = keep.size + topk // topk additional tools beyond core + forced-keep
+    for (const r of ranked) {
+      if (keep.size >= limit) break
+      keep.add(r.name)
+    }
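
The difference between the two stopping rules in the diff above can be checked with a small counting sketch. `countRetrieved` is a hypothetical helper, and it assumes all 10 CORE entries are present and that every remaining candidate has a usable score; it only models how many ranked tools get added under each rule.

```typescript
// Hypothetical helper: count how many ranked tools are added under each stopping rule.
function countRetrieved(coreSize: number, rankedSize: number, topk: number, separateBudget: boolean): number {
  const keep = new Set<string>()
  for (let i = 0; i < coreSize; i++) keep.add(`core_${i}`)
  // Original rule: stop once the whole set reaches topk.
  // Proposed rule: allow topk additional tools beyond core + forced-keep.
  const limit = separateBudget ? keep.size + topk : topk
  let retrieved = 0
  for (let i = 0; i < rankedSize; i++) {
    if (keep.size >= limit) break
    keep.add(`ranked_${i}`)
    retrieved++
  }
  return retrieved
}

const sharedBudget = countRetrieved(10, 68, 12, false) // 2 ranked tools fit next to 10 core tools
const ownBudget = countRetrieved(10, 68, 12, true) // 12 ranked tools are added on top of core
```

With 10 core tools and the default `topk` of 12, the shared budget leaves room for 2 ranked tools, while the separate budget admits 12, for a total exposed set of 22.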
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/tool/retrieval.ts` around lines 55 - 76, The current
select function lets CORE and caller-provided opts.keep consume the topk budget,
so CORE (10 items) leaves almost no room for lexically-ranked tools; adjust
select so topk (and minToolsToRetrieve behavior) applies only to additional
retrieved tools beyond CORE/keep: compute baseKeep from CORE and opts.keep, then
compute retrievedBudget = (opts.topk ?? 12) - baseKeep.size (clamped to >=0) or,
better, treat topk as the number of non-core tools directly (e.g., retrievedTopk
= opts.topk ?? 12) and only count items added from ranked into that
retrievedTopk; update the loop that iterates over ranked (and any minToRetrieve
logic) to stop after retrievedBudget/retrievedTopk is filled while still always
including all CORE/keep entries; reference function select, variables
topk/minToRetrieve, CORE, keep, rest, ranked, and score to locate and modify the
selection and stopping logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/tool/retrieval.ts`:
- Around line 17-20: Retrieval.CORE contains "ls" which doesn't match any
registered Tool id, so update the CORE array in retrieval.ts: either replace
"ls" with the actual registered tool id used when defining the listing tool
(e.g., the id passed to Tool.define for the filesystem/ls-like tool) or remove
"ls" from Retrieval.CORE; ensure the change aligns with the select() behavior
that only keeps CORE entries present in the candidate set so the tool id in
Retrieval.CORE must exactly match a Tool.define(...) id.

📥 Commits

Reviewing files that changed from the base of the PR and between a490bd4 and e6b70c6.

📒 Files selected for processing (5)
  • packages/opencode/src/provider/constrained.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/tool/retrieval.ts
  • packages/opencode/test/provider/constrained.test.ts
  • packages/opencode/test/tool/retrieval.test.ts

Comment thread packages/opencode/src/tool/retrieval.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 5 files

Comment thread packages/opencode/src/tool/retrieval.ts Outdated
Comment thread packages/opencode/src/provider/constrained.ts Outdated
Comment thread packages/opencode/src/tool/retrieval.ts Outdated
Constrained tool-call decoding is local-providers-only and has no validation
run behind it yet (the A/B that justifies this PR measured tool retrieval).
Removing it so the validated retrieval lever can land clean; constrained moves
to its own branch/PR pending a local vLLM guided-decoding run.

- remove `provider/constrained.ts` + its test
- remove the constrained wiring + import from `session/llm.ts` (retrieval stays)

#858 is now retrieval-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@anandgupta42 anandgupta42 changed the title feat: inference-time tool-call reliability (constrained decoding, retrieval, critic gate) feat: per-turn tool retrieval (trim tool-definition context flood) Jun 1, 2026

@dev-punia-altimate dev-punia-altimate left a comment

Contributor

Multi-Persona Review — Verdict: block

The PR contains a critical syntax error that will cause runtime failure, and lacks proper error handling for destructive database operations. While the performance optimization is well-researched and aligned with best practices, the code as written will break production migrations.

15/15 agents completed · 236s · 9 findings (1 critical, 2 high, 1 medium)

Critical

  • [code-reviewer] Incomplete string literal: 'workspace_id, endpoint_' is cut off mid-line, likely due to truncation in the diff. This will cause a SyntaxError when executed. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:358
    • 💡 Complete the column name after 'endpoint_' — likely 'endpoint_id' or similar — to form a valid tuple element in the leading columns list.

High

  • [tech-lead] Migration script directly executes raw SQL via SQLAlchemy text() without abstraction or validation layer, making it brittle and hard to test. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:150
    • 💡 Extract table rebuild logic into a reusable utility in app/utils/clickhouse_migrator.py with input validation and logging, then call from migration. This improves testability and maintainability.
  • [tech-lead] No explicit error handling around EXCHANGE TABLES or DROP operations; failure could leave orphaned _new tables or corrupt state. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:280
    • 💡 Wrap table exchange and drop in try/except blocks with rollback logic and explicit logging of state before/after critical operations.

Medium

  • [tech-lead] TABLE_PLANS dictionary contains 27 entries with inline string tuples; hard to validate and error-prone if column names change. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:55
    • 💡 Define a dataclass or named tuple for TablePlan and validate column names against system.tables during migration startup to catch typos early.

Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·

anandgupta42 added a commit that referenced this pull request Jun 4, 2026
)

* fix: two tests flaky under parallel CI load (S27 + trace snapshot)

Both pass locally but fail consistently in CI's heavy parallel run (9474
tests / 378 files) — the repo's "no flaky tests under resource contention"
case. Neither is caused by any feature change; they fail identically on
unrelated PRs (#854/#858/#863), blocking all of them.

- `real-tool-simulation` S27: the progressive-suggestion dedup state is a
  module-global Set. The test's `beforeEach` reset used a dynamic
  `await import`, which under parallel CI can resolve to a different module
  instance than the tool's static import — so the real Set is never reset and
  accumulates `sql_analyze` from S25/S26 → S27 sees no suggestion. Fix: import
  `PostConnectSuggestions` statically (same instance the tools use); reset in
  S27 too.
- `tracing-adversarial-snapshot` "shows 'running' status": waited a fixed 50ms
  for a debounced async snapshot write, too short under CI load → read a stale
  snapshot. Fix: poll the on-disk status until expected (timeout 4s) instead
  of a fixed sleep.
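
The second fix replaces a fixed sleep with polling. A minimal sketch of that pattern is shown below; `pollUntil` is a hypothetical helper, not the code from the referenced commit, and the 25 ms interval is an assumption (only the 4 s timeout comes from the message above).

```typescript
// Hypothetical helper: re-run a probe until it returns the expected value or the timeout elapses.
async function pollUntil<T>(
  probe: () => Promise<T> | T,
  expected: T,
  timeoutMs = 4000,
  intervalMs = 25,
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  let last = await probe()
  while (last !== expected && Date.now() < deadline) {
    // Wait briefly before probing again instead of sleeping once for a fixed duration.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    last = await probe()
  }
  // Return the last observed value so the caller's assertion reports what was actually seen.
  return last
}
```

In a test this would be used along the lines of `expect(await pollUntil(readStatus, "running")).toBe("running")`, where `readStatus` is a hypothetical function that reads the snapshot status from disk. The test finishes as soon as the write lands and only waits the full timeout when something is actually wrong.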

Closes #879

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

* fix: raise CI test timeout 30s→90s to kill resource-contention flakiness

The "TypeScript" job runs all 9500+ tests in one parallel bun process. Under
CPU contention a few slower tests (real fs/spawn/git-bootstrap) get starved and
exceed the 30s per-test timeout NON-deterministically — different tests each run
(observed: 32s and 51s timeouts). This blocks every PR with failures unrelated
to the diff. 90s gives ~3x headroom over the worst observed, removing the
flakiness without masking genuinely-hung tests.

Part of #879.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>

@dev-punia-altimate dev-punia-altimate left a comment

🤖 Code Review — OpenCodeReview (Gemini) — 3 finding(s)

  • 3 anchored to a line (posted inline when the comment stream is on)
  • 0 without a line anchor
All findings (full text)

1. packages/opencode/src/tool/retrieval.ts (L40-L42)

[🟠 MEDIUM] The regular expression /[a-z_]+/g only matches alphabetic characters and underscores. If tool names or query terms contain numbers or hyphens (e.g., dbt-schema-verify, v2, read-file), they will be incorrectly split or ignored entirely. This will also cause the direct name mention boost (words.has(t.name.toLowerCase())) to silently fail for any tools with numbers or hyphens in their names, because the words set won't contain the full name.

Consider including numbers and hyphens in the regular expression.

Suggested change:

  function score(query: string, t: Tool): number {
    const words = new Set(query.toLowerCase().match(/[a-z0-9_-]+/g) ?? [])
    const hay = (t.name + " " + (t.description ?? "")).toLowerCase()

2. packages/opencode/src/tool/retrieval.ts (L55-L57)

[🟠 MEDIUM] The default topk limit is set to 12. Since the CORE tools list already contains 10 items, this default only leaves room for a maximum of 2 dynamically retrieved tools per turn (and possibly zero if opts.keep tools fill the remaining slots). As topk is not explicitly passed by the caller in session/llm.ts, this default configuration might be too restrictive and severely limit the LLM's ability to utilize relevant tools.

Consider increasing the default topk value or modifying the logic to guarantee a certain number of dynamically retrieved tools independently of the core and kept tools.

Suggested change:

  export function select(query: string, tools: Tool[], opts: Options = {}): Set<string> {
    const topk = opts.topk ?? 20 // Adjust to a more reasonable default
    const minToRetrieve = opts.minToolsToRetrieve ?? topk

3. packages/opencode/src/session/llm.ts (L190-L197)

[🔵 LOW] According to the code quality checklist:

  1. Nested Ternary Expressions: The nested ternary used for the query assignment is prohibited. Please refactor this to use if-else statements.
  2. TypeScript Types: Avoid using the any type (c as any, p: any, t as any). If they are strictly necessary due to third-party SDK typing issues, please provide comments explaining the reason.

Suggested change:

      const c = lastUser?.content
      let query = ""
      if (typeof c === "string") {
        query = c
      } else if (Array.isArray(c)) {
        // Explicitly using any due to complex UserContent types from the ai SDK
        query = c.map((p: any) => (typeof p === "string" ? p : (p?.text ?? ""))).join(" ")
      }
      
      // Explicitly using any as the Tool type might lack a strict description definition
      const list = Object.entries(tools).map(([name, t]) => ({ name, description: (t as any)?.description }))
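
The tokenizer behaviour discussed in finding 1 can be checked directly. The sketch below compares three patterns on one query: the original, the digit-aware variant, and the variant suggested above that also keeps hyphens. `tokenize` and the sample query are illustrative assumptions; only the three regular expressions come from this thread.

```typescript
// Hypothetical helper: lower-case the query and return every match of the pattern.
const tokenize = (query: string, pattern: RegExp): string[] => query.toLowerCase().match(pattern) ?? []

const sampleQuery = "Run dbt-schema-verify on the v2 S3 export"

// Letters and underscores only: digits are dropped, so "v2" becomes "v" and "s3" becomes "s".
const lettersOnly = tokenize(sampleQuery, /[a-z_]+/g)
// → ["run", "dbt", "schema", "verify", "on", "the", "v", "s", "export"]

// Digits included: "v2" and "s3" survive; hyphenated names are still split into parts.
const withDigits = tokenize(sampleQuery, /[a-z0-9_]+/g)
// → ["run", "dbt", "schema", "verify", "on", "the", "v2", "s3", "export"]

// Digits and hyphens included: the hyphenated name stays a single token.
const withHyphens = tokenize(sampleQuery, /[a-z0-9_-]+/g)
// → ["run", "dbt-schema-verify", "on", "the", "v2", "s3", "export"]
```

Keeping hyphens makes an exact-name lookup work for hyphenated tool names, but the individual parts ("dbt", "schema", "verify") are then no longer available as separate tokens, so the two wider patterns trade one kind of match for the other.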

Comment on lines +40 to +42
function score(query: string, t: Tool): number {
  const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? [])
  const hay = (t.name + " " + (t.description ?? "")).toLowerCase()

[🟠 MEDIUM] The regular expression /[a-z_]+/g only matches alphabetic characters and underscores. If tool names or query terms contain numbers or hyphens (e.g., dbt-schema-verify, v2, read-file), they will be incorrectly split or ignored entirely. This will also cause the direct name mention boost (words.has(t.name.toLowerCase())) to silently fail for any tools with numbers or hyphens in their names, because the words set won't contain the full name.

Consider including numbers and hyphens in the regular expression.

Suggested change:

  function score(query: string, t: Tool): number {
-   const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? [])
+   const words = new Set(query.toLowerCase().match(/[a-z0-9_-]+/g) ?? [])
    const hay = (t.name + " " + (t.description ?? "")).toLowerCase()

Comment on lines +55 to +57
export function select(query: string, tools: Tool[], opts: Options = {}): Set<string> {
  const topk = opts.topk ?? 12
  const minToRetrieve = opts.minToolsToRetrieve ?? topk

[🟠 MEDIUM] The default topk limit is set to 12. Since the CORE tools list already contains 10 items, this default only leaves room for a maximum of 2 dynamically retrieved tools per turn (and possibly zero if opts.keep tools fill the remaining slots). As topk is not explicitly passed by the caller in session/llm.ts, this default configuration might be too restrictive and severely limit the LLM's ability to utilize relevant tools.

Consider increasing the default topk value or modifying the logic to guarantee a certain number of dynamically retrieved tools independently of the core and kept tools.

Suggested change:

  export function select(query: string, tools: Tool[], opts: Options = {}): Set<string> {
-   const topk = opts.topk ?? 12
+   const topk = opts.topk ?? 20 // Adjust to a more reasonable default
    const minToRetrieve = opts.minToolsToRetrieve ?? topk

Comment on lines +190 to +197
const c = lastUser?.content as any
const query =
  typeof c === "string"
    ? c
    : Array.isArray(c)
      ? c.map((p: any) => (typeof p === "string" ? p : (p?.text ?? ""))).join(" ")
      : ""
const list = Object.entries(tools).map(([name, t]) => ({ name, description: (t as any)?.description }))

[🔵 LOW] According to the code quality checklist:

  1. Nested Ternary Expressions: The nested ternary used for the query assignment is prohibited. Please refactor this to use if-else statements.
  2. TypeScript Types: Avoid using the any type (c as any, p: any, t as any). If they are strictly necessary due to third-party SDK typing issues, please provide comments explaining the reason.

Suggested change:

- const c = lastUser?.content as any
- const query =
-   typeof c === "string"
-     ? c
-     : Array.isArray(c)
-       ? c.map((p: any) => (typeof p === "string" ? p : (p?.text ?? ""))).join(" ")
-       : ""
- const list = Object.entries(tools).map(([name, t]) => ({ name, description: (t as any)?.description }))
+ const c = lastUser?.content
+ let query = ""
+ if (typeof c === "string") {
+   query = c
+ } else if (Array.isArray(c)) {
+   // Explicitly using any due to complex UserContent types from the ai SDK
+   query = c.map((p: any) => (typeof p === "string" ? p : (p?.text ?? ""))).join(" ")
+ }
+ // Explicitly using any as the Tool type might lack a strict description definition
+ const list = Object.entries(tools).map(([name, t]) => ({ name, description: (t as any)?.description }))

@dev-punia-altimate

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

  • connection_refused [1.00ms]
  • timeout
  • permission_denied
  • parse_error
  • oom [1.00ms]
  • network_error
  • auth_failure
  • rate_limit
  • internal_error
  • empty_error
  • connection_refused
  • timeout [1.00ms]
  • permission_denied
  • parse_error
  • network_error

Next Step

Please address the failing cases above and re-run verification.

cc @anandgupta42

Review fixes (packages/opencode/src/tool/retrieval.ts):
- CORE listed "ls" but the real tool id is "list" (tool/ls.ts → Tool.define("list")),
  so the list tool was NOT protected and could be retrieved out. Corrected to "list".
- Tokenizer regex `/[a-z_]+/g` dropped digits and split nothing on hyphens, so query
  terms like "v2"/"s3" and hyphenated names matched poorly. Widened to `/[a-z0-9_]+/g`.

Docs: documented the `ALTIMATE_TOOL_RETRIEVAL` flag (default off), the always-on core
set, in-flight-tool protection, and the ~50%-input-token / equal-resolve result, under
configure/tools "Tool Behavior".

Note: the topk-budget nitpick is intentionally unchanged — the A/B that validated this
feature (−50% input tokens at identical resolve) ran at the current default; re-tuning
the core-vs-retrieved split is a separate follow-up, not a correctness fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@anandgupta42

Contributor Author

Addressed the review on the actual PR contents (packages/opencode/src/tool/retrieval.ts) in 71db185:

Fixed

  • P1 (cubic): ls → list. CORE listed "ls", but the real tool id is "list" (tool/ls.ts → Tool.define("list")), so the list tool wasn't protected and could be retrieved out. Corrected.
  • Tokenizer (coderabbit/Gemini): /[a-z_]+/g → /[a-z0-9_]+/g. Now keeps digits (v2, s3) and splits hyphenated names into matchable parts.
  • Docs added for the ALTIMATE_TOOL_RETRIEVAL flag (default off, always-on core, in-flight-tool protection, ~50%-token/equal-resolve result) under configure/tools → Tool Behavior.

Intentionally not changed

  • topk budget nitpick — the A/B that validated this feature (−50% input tokens at identical resolve-rate) ran at the current default; re-tuning the core-vs-retrieved split is a separate tuning follow-up, not a correctness fix.

Spurious / stale findings (safe to dismiss)

  • dev-punia-altimate CHANGES_REQUESTED ("critical syntax error") points at alembic_tenant_clickhouse/.../databricks_reorder_clickhouse_sort_keys.py — that file is in altimate-backend, not in this PR. This PR is retrieval-only; the review ran against the wrong diff.
  • cubic findings on provider/constrained.ts are stale: constrained decoding was split out of this PR into its own branch, so constrained.ts is no longer part of #858.

Multi-model review (MiniMax/GLM-5/Claude) findings:
- CRITICAL (MiniMax): CORE listed "list", but no list/ls tool is registered
  (registry has glob/bash/etc., not ListTool). The `all.has` guard made it
  harmless, but it was a dead/misleading entry. Removed it; documented that CORE
  must be real registered ids. (There is no directory-listing tool — glob/bash ls.)
- CRITICAL/MINOR (GLM-5/MiniMax/Claude consensus): score() filtered tokens with
  length > 3, dropping high-signal 3-char domain terms (sql, dbt, pii, ddl, api).
  Changed to length >= 3.
- MAJOR (MiniMax/GLM-5): topk read as a hard cap; documented that it is NOT —
  core + in-flight tools are always retained (dropping them corrupts the turn);
  topk bounds only the extra ranked additions. Behavior unchanged (correct as-is).
- MINOR (MiniMax/GLM-5): documented why "invalid" is exempt from retrieval in llm.ts.

Tests: + CORE-has-no-phantom regression, + 3-char-token scoring, + topk-not-a-cap
(core/in-flight survive). 8/8 pass.
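
The selection semantics described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in tool/retrieval.ts: the `ToolDef` shape, the `select` signature, the sample CORE ids, and the scoring details are all hypothetical. Only the `length >= 3` filter, the `all.has` guard, and the rule that topk bounds just the ranked extras come from this thread.

```typescript
// Hypothetical shapes and names; the real logic lives in tool/retrieval.ts.
interface ToolDef {
  id: string
  description: string
}

// Illustrative core set only; the real CORE list is not shown in this thread.
const CORE = ["read", "bash", "glob"]

function tokens(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? []
}

// Lexical overlap: distinct query tokens of length >= 3 that occur in the tool text.
// With `length > 3`, terms such as "sql" and "dbt" would be discarded here.
function score(query: string, tool: ToolDef): number {
  const haystack = new Set(tokens(`${tool.id} ${tool.description}`))
  const terms = Array.from(new Set(tokens(query))).filter((t) => t.length >= 3)
  return terms.filter((t) => haystack.has(t)).length
}

function select(query: string, all: Map<string, ToolDef>, inFlight: string[], topk: number): string[] {
  // Core and in-flight tools are always retained; the all.has guard skips unregistered ids.
  const kept = Array.from(new Set(CORE.concat(inFlight))).filter((id) => all.has(id))

  // topk bounds only the ranked additions, not the total size of the result.
  const ranked = Array.from(all.values())
    .filter((t) => kept.indexOf(t.id) === -1)
    .map((t) => ({ id: t.id, s: score(query, t) }))
    .filter((r) => r.s > 0)
    .sort((a, b) => b.s - a.s || a.id.localeCompare(b.id)) // ties broken by id for determinism
    .slice(0, topk)

  return kept.concat(ranked.map((r) => r.id))
}

const registry = new Map<string, ToolDef>(
  [
    { id: "read", description: "read a file" },
    { id: "bash", description: "run a shell command" },
    { id: "glob", description: "find files by pattern" },
    { id: "sql_execute", description: "run a sql query against a warehouse" },
    { id: "dbt_build", description: "build dbt models" },
    { id: "webfetch", description: "fetch a url" },
    { id: "notebook_edit", description: "edit a notebook cell" },
  ].map((t): [string, ToolDef] => [t.id, t]),
)

// "webfetch" is in flight, so it survives even though it scores 0 for this query.
const picked = select("fix failing dbt model, check sql", registry, ["webfetch"], 1)
console.log(picked) // ["read", "bash", "glob", "webfetch", "dbt_build"]
```

With topk set to 1 the result still holds five tools: three core, one in-flight, and one ranked extra. That is the "topk is not a hard cap" behavior the MAJOR finding asked to have documented.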

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opencode/test/tool/retrieval.test.ts (1)

58-65: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guarantee env restoration with try/finally to avoid test-state leaks.

At lines 59-65, process.env["ALTIMATE_TOOL_RETRIEVAL"] is restored only on the happy path. If an assertion fails, the modified global env state can leak into later tests.

Proposed fix
 test("enabled() reads the env flag", () => {
   const prev = process.env["ALTIMATE_TOOL_RETRIEVAL"]
-  process.env["ALTIMATE_TOOL_RETRIEVAL"] = "1"
-  expect(Retrieval.enabled()).toBe(true)
-  delete process.env["ALTIMATE_TOOL_RETRIEVAL"]
-  expect(Retrieval.enabled()).toBe(false)
-  if (prev !== undefined) process.env["ALTIMATE_TOOL_RETRIEVAL"] = prev
+  try {
+    process.env["ALTIMATE_TOOL_RETRIEVAL"] = "1"
+    expect(Retrieval.enabled()).toBe(true)
+    delete process.env["ALTIMATE_TOOL_RETRIEVAL"]
+    expect(Retrieval.enabled()).toBe(false)
+  } finally {
+    if (prev === undefined) {
+      delete process.env["ALTIMATE_TOOL_RETRIEVAL"]
+    } else {
+      process.env["ALTIMATE_TOOL_RETRIEVAL"] = prev
+    }
+  }
 })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/tool/retrieval.test.ts` around lines 58 - 65, The test
mutates process.env["ALTIMATE_TOOL_RETRIEVAL"] without guaranteeing restoration;
wrap the mutation and assertions in a try/finally so the original value saved in
prev is always restored. In the test named "enabled() reads the env flag" (which
calls Retrieval.enabled()), move setting process.env to the try block and
perform both expect calls there, then in finally restore the environment by
deleting the var if prev was undefined or reassigning prev if it existed. Ensure
the finally runs unconditionally so no test-state leak occurs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 23d287cb-2bd8-4909-8753-0bf2326ddf1b

📥 Commits

Reviewing files that changed from the base of the PR and between 71db185 and c770459.

📒 Files selected for processing (3)
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/tool/retrieval.ts
  • packages/opencode/test/tool/retrieval.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/tool/retrieval.ts

@anandgupta42 anandgupta42 merged commit 88b7f6f into main Jun 10, 2026
16 checks passed