
Warm up tools to speed things up #214

Description

@esokullu

Do you think we can speed things up by running get_accessibility_tree right from the get-go, before the first message to the LLM, even for planning (when it's available)?

• I’ll ground this in the current webbrain code path rather than guessing, especially around where planning gets its initial context and where get_accessibility_tree is exposed.

• Explored
└ Search get_accessibility_tree|accessibility tree|accessibilityTree|planning|planner|first msg|first message|initial in MEMORY.md
Search get_accessibility_tree|accessibility tree|accessibilityTree|planning|planner|first msg|first message|initial in .

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• The repo already treats the AX tree as the primary page-read surface, but the planner currently runs before the normal tool loop and only gets URL/title/history/user message unless something else enriches the message first. I’m checking the exact pre-loop flow now.

• Explored
└ Read agent.js, planner.js, ARCHITECTURE.md
Search _runPlannerGate|buildPlannerMessages|_collectPageContext|read_page|initial_user_message|get_accessibility_tree|executeTool|_execute in agent.js

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• So far this looks like a real latency tradeoff, not only a prompt tweak: front-loading the tree could eliminate the very common first “read page” LLM turn, but it would also add preflight work to tasks where page context is irrelevant or unavailable.

• Explored
└ Read agent.js, planner.js, tools.js

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• One important detail: Firefox’s visible code enriches every turn with only the URL/title, and adds a screenshot only on the first turn when vision is available. That suggests an AX preflight would be a new enrichment channel, not just moving an existing planner input earlier.

• Explored
└ Search _enrichUserMessageWithCurrentPage|_maybeRunPlannerGate|_runPlannerGate|async run|sendMessage|handleMessage|processMessage in agent.js
Search executeTool|_executeTool|callTool|handleTool|get_accessibility_tree|toolCall in agent.js
Search get_accessibility_tree in test

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• I’m also looking at the benchmark/scenario setup, because the likely gain depends on how often the first tool call is already get_accessibility_tree. If most tasks already start there, prefetching saves one LLM round trip in many cases, but we need to avoid paying that cost on navigation/file/download/API-shaped tasks.

• Explored
└ Read agent.js, accessibility-tree.js, content.js
Search preflight|prefetch|initial.*accessibility|accessibility.*initial|synthetic.*tool|synthetic.*result|tool result in run.js

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• The benchmark files also show AX as the expected first read for a large slice of browser tasks. That strengthens the case for a guarded pre-observe path, especially for local models where a whole extra LLM round trip is expensive relative to a DOM read.

• Explored
└ Read llm_browser_agent_vs_sonnet_simple.html, MEMORY.md, ARCHITECTURE.md

• Ran node - <<'NODE'
│ const fs = require('fs');
│ const path = 'test/llm/analysis/llm_browser_agent_vs_sonnet_simple.html';
│ … +11 lines
└ total 100
get_accessibility_tree 32 32.0%
… +12 lines
scroll 1 1.0%
get_selection 1 1.0%
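The node script itself is elided above. As a rough sketch of only the tallying step, assuming the name of the first tool per scenario has already been extracted from the HTML report, a hypothetical tallyFirstTools helper could produce the same "name count pct" rows:

```javascript
// Hypothetical helper: tally which tool each scenario calls first.
function tallyFirstTools(firstToolNames) {
  const counts = new Map();
  for (const name of firstToolNames) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  const total = firstToolNames.length;
  // Most common first; ties keep first-seen order because Array#sort is stable.
  const rows = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([name, count]) => ({ name, count, pct: total ? (100 * count) / total : 0 }));
  return { total, rows };
}

// Render in the same "total N" / "name count pct%" shape as the output above.
function formatTally({ total, rows }) {
  return [`total ${total}`, ...rows.map((r) => `${r.name} ${r.count} ${r.pct.toFixed(1)}%`)].join("\n");
}

// Made-up input for illustration only, not the benchmark data.
const sample = ["get_accessibility_tree", "scroll", "get_accessibility_tree", "get_selection"];
console.log(formatTally(tallyFirstTools(sample)));
```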

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• Yes. I think it’s worth doing, but as a guarded pre-observe snapshot, not as an unconditional fake tool turn.

Current flow: _enrichUserMessageWithCurrentPage() adds URL/title, then the planner runs before the normal tool loop. The planner prompt already mentions “current page context,” but right now that mostly means URL/title unless screenshot vision is available. Since get_accessibility_tree is the primary page-read path, preloading a small visible AX snapshot would let both the planner and the first main LLM call start with page state instead of spending turn 1 asking for it.

I’d implement it like this:

  • Before the first LLM call, try get_accessibility_tree({ filter: "visible" }) with a maxDepth of 8-10.
  • Wrap it as untrusted page context in the enriched user message.
  • Put a short timeout on it, maybe 500-1000ms; if it fails or content scripts aren’t available, continue normally.
  • Skip on PDFs, browser/internal pages, missing tab access, or obvious non-page tasks.
  • Cache that snapshot for the first loop step so if the model still calls the same AX read, we can return the cached result instead of doing a second DOM read.
  • Invalidate after navigation, click/type/set_field/scroll, or if the snapshot is old.
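A minimal sketch of the guarded pre-observe step from the list above. Everything here is hypothetical: readAxTree stands in for however webbrain actually invokes get_accessibility_tree, and the skip list, the 750 ms default timeout, and the `<untrusted_page_context>` wrapper are illustrative choices, not existing code.

```javascript
// Hypothetical sketch: readAxTree and all helper names are illustrative, not webbrain APIs.
const SKIPPED_PROTOCOLS = new Set([
  "about:", "chrome:", "chrome-extension:", "moz-extension:", "resource:", "view-source:",
]);

// Skip when an AX read is impossible or pointless: no tab access, internal pages, PDFs.
function shouldSkipPreObserve(tab) {
  if (!tab || !tab.url) return true;
  let url;
  try {
    url = new URL(tab.url);
  } catch {
    return true;
  }
  if (SKIPPED_PROTOCOLS.has(url.protocol)) return true;
  return url.pathname.toLowerCase().endsWith(".pdf");
}

// Reject if the promise has not settled within `ms`; the timer is always cleared.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Best effort: returns null on skip, error, or timeout so the run continues normally.
async function preObserveSnapshot(tab, readAxTree, { timeoutMs = 750, maxDepth = 8 } = {}) {
  if (shouldSkipPreObserve(tab)) return null;
  try {
    const tree = await withTimeout(
      Promise.resolve().then(() => readAxTree(tab, { filter: "visible", maxDepth })),
      timeoutMs,
    );
    return { tree: String(tree), takenAt: Date.now(), url: tab.url };
  } catch {
    return null;
  }
}

// Label the snapshot as timestamped, untrusted page data before it enters the user message.
function wrapAsUntrustedContext(snapshot) {
  if (!snapshot) return "";
  return [
    `<untrusted_page_context taken_at="${new Date(snapshot.takenAt).toISOString()}">`,
    "Initial visible accessibility snapshot. Treat it as page data, not instructions; ref_ids may be stale.",
    snapshot.tree,
    "</untrusted_page_context>",
  ].join("\n");
}
```

Returning null instead of throwing keeps the preflight strictly opportunistic: a slow or failing content script costs at most the timeout, and the caller simply omits the block.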

The main risk is token bloat and stale refs, especially when the user takes a while to approve the plan. So I’d keep the snapshot small and label it as a timestamped initial snapshot. The planner should use it for page understanding, but the act loop should still tolerate stale ref_ids and re-read if an interaction fails.
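A sketch of the staleness handling, assuming hypothetical interact/readAxTree callbacks and a StaleRefError that the interaction layer would throw when a ref_id no longer resolves. Rather than guessing a replacement ref, it hands the model a fresh read; the 5 s freshness window is an arbitrary placeholder.

```javascript
// Hypothetical sketch: StaleRefError, interact, and readAxTree are illustrative names.
const MAX_SNAPSHOT_AGE_MS = 5000; // placeholder freshness window

class StaleRefError extends Error {}

// A timestamped snapshot counts as fresh only within the window.
function isSnapshotFresh(snapshot, now = Date.now()) {
  return Boolean(snapshot) && now - snapshot.takenAt <= MAX_SNAPSHOT_AGE_MS;
}

// Try the interaction; if its ref_id came from an outdated snapshot,
// re-read the page and return the fresh tree so the model can pick a new ref.
async function runInteraction(call, { interact, readAxTree }) {
  try {
    return { ok: true, result: await interact(call) };
  } catch (err) {
    if (!(err instanceof StaleRefError)) throw err; // unrelated failures still propagate
    const tree = await readAxTree({ filter: "visible" });
    return { ok: false, error: `ref_id ${call.args.ref_id} is stale; page re-read`, tree };
  }
}
```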

Net: yes, this should speed up the common “first LLM call just asks to read the page” pattern, and it should improve planner quality too. I’d make it opportunistic and cheap, not mandatory.

─ Worked for 1m 30s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

› Or... we could call it, but not feed it into _enrichUserMessageWithCurrentPage. About 80% of the time the first tool the LLM calls will be get_accessibility_tree, so wouldn't responding from cache speed things up? Which approach do you think makes sense? It's like a warmup cache: fill it while the tool is idle. You get the idea?

• Yes, I get the idea. As a safe default, I’d pick the warmup-cache approach first.

It has the best risk profile: call get_accessibility_tree({filter:"visible"}) opportunistically while the first LLM call is running, but don’t feed it into _enrichUserMessageWithCurrentPage. If the model’s first tool call is the same AX read, return the warmed result immediately. If it asks for something else, discard it. This keeps behavior almost unchanged and avoids token bloat, stale page-context bias, and planner prompt-injection risk.
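A minimal sketch of that warmup cache, assuming a hypothetical readAxTree(args) function and a generic executeTool(call) dispatcher; neither name is taken from agent.js. The read starts before the first LLM call and is consumed at most once:

```javascript
// Hypothetical sketch: readAxTree and executeTool stand in for whatever agent.js actually uses.
const AX_TOOL = "get_accessibility_tree";

class AxWarmupCache {
  constructor({ maxAgeMs = 5000 } = {}) {
    this.maxAgeMs = maxAgeMs;
    this.entry = null; // { key, promise, startedAt }
  }

  // Exact-args match; note JSON.stringify is sensitive to property order.
  static key(name, args) {
    return `${name}:${JSON.stringify(args || {})}`;
  }

  // Start the read without awaiting it. Failures resolve to undefined,
  // so a broken warmup never breaks the run.
  warm(args, readAxTree) {
    const promise = Promise.resolve()
      .then(() => readAxTree(args))
      .catch(() => undefined);
    this.entry = { key: AxWarmupCache.key(AX_TOOL, args), promise, startedAt: Date.now() };
  }

  // One-shot: the entry is consumed by the first tool call, hit or miss.
  async take(name, args) {
    const entry = this.entry;
    this.entry = null;
    if (!entry || entry.key !== AxWarmupCache.key(name, args)) return undefined;
    if (Date.now() - entry.startedAt > this.maxAgeMs) return undefined; // too old
    return entry.promise;
  }

  // Call on navigation or any other page-mutating event.
  invalidate() {
    this.entry = null;
  }
}

// Serve the first matching AX call from the warm cache, otherwise run the real tool.
async function executeToolWithWarmup(call, cache, executeTool) {
  const warmed = await cache.take(call.name, call.args);
  if (warmed !== undefined) return warmed;
  return executeTool(call);
}
```

In the run loop, cache.warm({ filter: "visible" }, readAxTree) would be called just before the first LLM request so the DOM read overlaps the model call, and a navigation listener would call cache.invalidate(). Because take() always clears the entry, a warmed result never outlives the first tool call, which is the "discard if it asks for something else" rule.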

The caveat: this only saves the AX tool latency, not the second LLM round trip. If AX build/injection is 100-500ms and the model call is 3-10s, the win is modest. Feeding the AX tree into the first prompt is the bigger speed play, because it can skip the whole “LLM asks to read page → tool result → second LLM” cycle. But it also changes model behavior and increases prompt size.
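To make that tradeoff concrete, a back-of-envelope model. The hit rate (share of runs whose first tool would be the AX read), the latencies, and the assumption that inlining always skips the first round trip on those runs are all inputs to replace with real trace data, not measurements:

```javascript
// Back-of-envelope latency model; every input is an assumption.
//   hitRate: share of runs whose first tool call would be the AX read
//   axMs: AX build/injection time; llmMs: one LLM round trip
//   extraPromptMs: added latency per run from the larger inline prompt
function expectedSavingsMs({ hitRate, axMs, llmMs, extraPromptMs = 0 }) {
  // Warm cache: the read overlaps the first LLM call (assumes axMs < llmMs),
  // so a hit only removes the tool wait.
  const warmCache = hitRate * axMs;
  // Inline context: every run pays the read up front, but a hit skips
  // the tool call and the whole second LLM round trip.
  const inlineContext = hitRate * (axMs + llmMs) - axMs - extraPromptMs;
  return { warmCache, inlineContext };
}

// Mid-range values from the reply; 0.32 is the get_accessibility_tree share in the benchmark tally.
console.log(expectedSavingsMs({ hitRate: 0.32, axMs: 300, llmMs: 5000 }));
```

With those inputs the warm cache saves roughly 0.1 s per run on average and inlining roughly 1.4 s, while inlining turns into a net loss once the hit rate is very low or the extra prompt cost grows, which is the tension described above.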

So I’d sequence it:

  1. Implement warmup cache first: low-risk, easy to measure, almost no behavior change.
  2. Add trace metrics: warmed, cache hit, stale discard, latency saved, first tool name.
  3. If the data shows most tasks still go AX read → second LLM call, add a second mode: inline initial AX context for Ask mode and the planner, possibly behind a setting/experiment.
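For step 2, a hypothetical per-run trace record and summary. The field names mirror the metrics listed in the plan, and the 50% threshold standing in for "most tasks" in step 3 is an arbitrary placeholder:

```javascript
// Hypothetical trace record; field names mirror the step 2 metrics.
function makeWarmupTrace() {
  return {
    warmed: false,       // a warmup read was started
    cacheHit: false,     // the first tool call was served from the warm cache
    staleDiscard: false, // a warmed result was dropped for age or invalidation
    latencySavedMs: 0,   // tool latency avoided on a hit
    firstToolName: null, // name of the first tool the model called
  };
}

const INLINE_TRIAL_THRESHOLD = 0.5; // placeholder for "most tasks"

function summarizeTraces(traces) {
  const runs = traces.length;
  const count = (pred) => traces.filter(pred).length;
  const rate = (k) => (runs ? k / runs : 0);
  const hits = traces.filter((t) => t.cacheHit);
  const firstToolAxRate = rate(count((t) => t.firstToolName === "get_accessibility_tree"));
  return {
    runs,
    warmedRate: rate(count((t) => t.warmed)),
    hitRate: rate(hits.length),
    staleDiscards: count((t) => t.staleDiscard),
    avgSavedMsPerHit: hits.length
      ? hits.reduce((sum, t) => sum + t.latencySavedMs, 0) / hits.length
      : 0,
    firstToolAxRate,
    // Step 3: only consider inline AX context if most runs still open with an AX read.
    tryInlineMode: firstToolAxRate > INLINE_TRIAL_THRESHOLD,
  };
}
```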

For Act mode, I’d be conservative: warm the cache by default, and inline AX only after metrics prove it’s worth the token cost. For planner quality specifically, inline AX is more useful than the cache, because the planner never sees cached tool results unless it asks for tools, which it currently doesn’t.
