update yutori template to n1.5 by dprevoznik · Pull Request #159 · kernel/cli

dprevoznik · 2026-05-06T22:48:07Z

Summary

Updates the Yutori CUA template (TS + Python) from n1-latest to n1.5-latest, plus speed and request-size improvements.

This is more than a model-name swap — n1.5 changes several action signatures, adds new actions, and introduces a tool-set selector. The template is computer-use only (no Playwright page exposed to the model), so we explicitly exclude the DOM/Playwright "expanded" tools.

Loop / API changes

bump model id from n1-latest to n1.5-latest
send extra_body.tool_set: "browser_tools_core-20260403" to select the coordinate-based tool set (the default, but pinned for stability)
send extra_body.disable_tools: ["extract_elements", "find", "set_element_value", "execute_js"] as defense-in-depth — these are the expanded tools that need a Playwright page
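A minimal Python sketch of the request wiring listed above. The helper name `build_request_kwargs` is hypothetical, and it is an assumption that the loop calls an OpenAI-compatible chat completions endpoint through the `openai` Python SDK, whose `extra_body` keyword merges extra fields into the JSON request body.

```python
MODEL_ID = "n1.5-latest"
TOOL_SET = "browser_tools_core-20260403"
# Expanded tools that need a Playwright page; this template is computer-use only.
DISABLED_TOOLS = ["extract_elements", "find", "set_element_value", "execute_js"]


def build_request_kwargs(messages):
    """Build keyword arguments for a chat completion call (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "messages": messages,
        # Assumption: passed through the Python SDK's extra_body kwarg, so these
        # fields end up as top-level properties of the request body.
        "extra_body": {
            "tool_set": TOOL_SET,
            "disable_tools": list(DISABLED_TOOLS),
        },
    }
```

With a client pointed at the Yutori endpoint, the call would then look like `client.chat.completions.create(**build_request_kwargs(messages))`.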

Action / handler changes

rename hover → mouse_move
key_press: parameter key_comb → key
type: drop press_enter_after / clear_before_typing (n1.5 emits separate key_press actions instead)
new actions: middle_click, mouse_down, mouse_up, hold_key, go_forward
click actions: optional modifier parameter wired into Kernel's hold_keys for shift/ctrl-clicks
wait: optional duration parameter honored
hold_key: optional duration parameter honored via pressKey's duration
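The contract changes above can be summarized as data. The sketch below is illustrative only and not part of the template: `to_n15` is a hypothetical helper, and the `(name, args)` action shape is an assumption. It maps an n1-shaped action onto the n1.5 names and parameters listed above.

```python
# n1 -> n1.5 contract changes from the list above, expressed as lookup tables.
RENAMED_ACTIONS = {"hover": "mouse_move"}
RENAMED_PARAMS = {"key_press": {"key_comb": "key"}}
DROPPED_PARAMS = {"type": {"press_enter_after", "clear_before_typing"}}
NEW_ACTIONS = {"middle_click", "mouse_down", "mouse_up", "hold_key", "go_forward"}


def to_n15(name, args):
    """Translate an n1-style (name, args) pair to its n1.5 shape (hypothetical)."""
    renames = RENAMED_PARAMS.get(name, {})
    dropped = DROPPED_PARAMS.get(name, set())
    new_args = {
        renames.get(key, key): value
        for key, value in args.items()
        if key not in dropped
    }
    return RENAMED_ACTIONS.get(name, name), new_args
```

Note that dropping `press_enter_after` is not behavior-preserving on its own: per the list above, n1.5 emits a separate `key_press` action where n1 used the flag.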

Request-size trimming

Mirrors the official yutori-sdk-python payload.py algorithm to keep the per-request payload under the API ceiling on long sessions:

MAX_REQUEST_BYTES = 9_500_000, KEEP_RECENT_SCREENSHOTS = 6 (Yutori's published defaults)
size-triggered (not count-triggered) — only kicks in when the estimated request exceeds the threshold
two-phase strip: drop images outside the protected window first; if still too large, walk into the protected window but always preserve the latest screenshot
dual-list pattern: the caller's full conversation_messages is preserved (deep-copied before mutation); only the trimmed copy is sent to the API
when an image_url block is stripped from a tool result with no remaining text, a "Screenshot omitted to stay under request size limit." placeholder is inserted so the message stays valid

Latency

post-action SCREENSHOT_DELAY halved from 300ms → 150ms (the settle wait after click/type/scroll before grabbing the next screenshot)
ACTION_DELAY (the focus-settling wait inside goto_url's ctrl+L → type → Enter flow) left at 300ms — halving it risks silent input misrouting if the address bar hasn't taken focus yet

Files

pkg/create/templates.go — display name + description
pkg/templates/{typescript,python}/yutori/{loop,index|main,tools/computer,README}.* — implementation, trim helpers, types, docs

Test plan

make build and make test pass
scaffold the template (kernel create -t yutori -l {typescript,python}) and deploy end-to-end against n1.5-latest in the Default project
smoke test: navigate + interact on example.com (both TS and Python)
verify key_press Enter works on in-page search forms (DuckDuckGo, Google) and Wikipedia
verify scroll action works on results pages
verify goto_url (ctrl+L address-bar flow) still works post-delay changes
confirm trim helpers preserve the full caller history while shrinking only the request copy (verified with synthetic 20-step × 600KB history: 12.9MB → 9.2MB, last 6 screenshots byte-identical, tool_call_ids intact)

Note

Medium Risk
Updates the Yutori agent loop and action schema for both Python and TypeScript templates, changing the API payload and browser-control behaviors. Risk is moderate since it affects runtime automation flows (tool invocation, input timing, and message history trimming) but is isolated to scaffolding/templates.

Overview
Upgrades the Yutori computer-use templates (Python + TypeScript) from n1-latest to n1.5-latest, including updated request wiring (tool_set: browser_tools_core-20260403 and explicit disable_tools to keep the template computer-use only).

Updates the action schema/handlers to match n1.5: adds support for new actions (e.g. mouse_move, mouse_down/mouse_up, middle_click, hold_key, go_forward), changes key_press to use key, drops n1-only typing flags, and threads optional modifier into Kernel hold_keys for clicks/scrolls.

Adds request-size protection in both loops by deep-copying and trimming older screenshot parts when the serialized message payload exceeds ~9.5MB (keeping the most recent screenshots), and reduces post-action screenshot delay from 300ms to 150ms. Documentation and template display strings are updated to reflect n1.5 and the supported/disabled tool set.

Reviewed by Cursor Bugbot for commit 7bb49ec. Bugbot is set up for automated code reviews on this repo.

- bump model id from `n1-latest` to `n1.5-latest`
- send `extra_body.tool_set: browser_tools_core-20260403` to use the coordinate-based tool set
- send `extra_body.disable_tools` to explicitly exclude the DOM/Playwright tools (`extract_elements`, `find`, `set_element_value`, `execute_js`) since this template runs computer-use only
- rename `hover` to `mouse_move`
- rename `key_press` parameter `key_comb` to `key`
- drop `press_enter_after` and `clear_before_typing` from `type` (n1.5 emits separate `key_press` actions instead)
- add new actions: `middle_click`, `mouse_down`, `mouse_up`, `hold_key`, `go_forward`
- support optional `modifier` parameter on click actions via Kernel's `hold_keys`

Co-Authored-By: Claude Opus 4.7 <[email protected]>

n1.5's scroll action accepts an optional `modifier` (e.g., shift) that, in browsers, turns a vertical wheel event into a horizontal scroll. Plumb it into Kernel's `ComputerScrollParams.hold_keys` so the OS-level modifier+wheel event is dispatched correctly.
Co-Authored-By: Claude Opus 4.7 <[email protected]>

Co-Authored-By: Claude Opus 4.7 <[email protected]>

Mirror yutori-sdk-python's reference loop: deep-copy a request-only view of the messages and strip old image_url blocks once the JSON payload exceeds ~9.5 MB, while always preserving the most recent 6 screenshots and the very latest one. The caller's full history is left intact for the return value.
Co-Authored-By: Claude Opus 4.7 <[email protected]>

300ms after every action was conservative: at 50 iterations that is ~15s of pure wall-clock waiting before model calls. 150ms still gives the page enough time to settle for typical interactions while halving the per-step overhead.
Co-Authored-By: Claude Opus 4.7 <[email protected]>

key_press / hold_key compound keys already consulted MODIFIER_MAP, but the new `modifier` parameter on click and scroll actions passes a bare modifier name ("control", "meta", "command") that bypassed the lookup and went out as-is. Kernel's hold_keys wants "ctrl" and "super" — so ctrl-click and cmd-click silently dropped the modifier. Unify the per-part mapping into a single helper applied to both the compound and single-key paths in TS and Python.
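A sketch of the unified mapping described above, under stated assumptions: compound keys are joined with `+`, and the map holds only the three names the commit message mentions. `normalize_key` and `hold_keys_for` are hypothetical names, not the template's actual helpers.

```python
# Model-side modifier names mapped to what Kernel's hold_keys expects.
MODIFIER_MAP = {
    "control": "ctrl",
    "meta": "super",
    "command": "super",
}


def normalize_key(key):
    """Map each '+'-separated part through MODIFIER_MAP; other parts pass through."""
    parts = [part.strip() for part in key.split("+")]
    return "+".join(MODIFIER_MAP.get(part.lower(), part) for part in parts)


def hold_keys_for(modifier):
    """Turn an optional click/scroll modifier into a hold_keys list."""
    return [normalize_key(modifier)] if modifier else []
```

Because a bare modifier is just a compound key with one part, routing both paths through `normalize_key` means `"control"` and `"control+a"` cannot drift apart again.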

Yutori's reference impl (frontend-visualqa actions.py:469, 506) interprets the model-supplied `duration` argument on `wait` and `hold_key` as seconds: it is passed straight to asyncio.sleep on the wait path and clamped to 100s on the hold_key path. Our handlers were treating duration as milliseconds, so any model-supplied value was silently interpreted 1000× too short (`wait { duration: 2 }` slept 2ms instead of 2s; `hold_key { duration: 0.5 }` held 0.5ms instead of 500ms). Defaults were unaffected because they were pre-computed in ms. Convert seconds → ms before passing to Kernel's pressKey (and to setTimeout, which takes milliseconds), and use seconds directly for asyncio.sleep. Adds a `> 0` guard on hold_key duration, which also resolves the bugbot nit about negative values reaching the SDK.
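The unit fix can be sketched as a small conversion helper. `hold_duration_ms` is a hypothetical name. The 100-second clamp mirrors the reference implementation described above; whether the template applies the same clamp is an assumption.

```python
MAX_HOLD_SECONDS = 100  # clamp used by the reference implementation's hold_key path


def hold_duration_ms(duration, default_ms):
    """Convert a model-supplied duration in seconds to milliseconds.

    Falls back to default_ms (already in ms) when the value is missing,
    non-numeric, or not greater than zero.
    """
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if is_number and duration > 0:
        return int(round(min(duration, MAX_HOLD_SECONDS) * 1000))
    return default_ms
```

For example, `hold_duration_ms(0.5, 1000)` yields 500, while a negative or missing duration falls back to the millisecond default rather than reaching the SDK.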

Specific bench scores and per-token pricing will stale fast and aren't load-bearing for the template.

firetiger-agent · 2026-05-12T02:39:50Z

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR updates a Yutori template with model changes and action signature updates, but does not modify kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) in the codebase.

To monitor this PR anyway, reply with @firetiger monitor this.

The openai-node SDK does not have a Python-style `extra_body` kwarg — it serializes the body as-is. Passing `extra_body: {...}` as a body field made Yutori receive a literal `{"extra_body": ...}` and silently drop the tool_set pin and disable_tools defense-in-depth. Hoist both fields and apply them via a typed spread (mirrors the anthropic-computer-use loop pattern).

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d2affba.

masnwilliams

reviewed — n1.5 contract changes look clean and python/typescript halves stay in lockstep. trim helpers verified end-to-end on a synthetic 20-step × 600KB history (16MB → 8.8MB, last 6 screenshots byte-identical, caller list unchanged). one real bug worth fixing before ship:

Bugs

pkg/templates/python/yutori/tools/computer.py:259,280 and pkg/templates/typescript/yutori/tools/computer.ts:281,312 — duration for wait and hold_key is treated as milliseconds, but per yutori's reference impl (yutori-ai/frontend-visualqa/src/frontend_visualqa/actions.py lines 469, 506) it's seconds — passed straight to asyncio.sleep, clamped to 100 seconds. defaults are fine because they're pre-computed in ms (1000, 2000), but any model-supplied duration is interpreted 1000× too short (wait { duration: 2 } → 2ms instead of 2s, hold_key { duration: 0.5 } → 0.5ms instead of 500ms). need to multiply by 1000 before sleeping / before passing to kernel press_key.duration.

Nits

pkg/templates/python/yutori/loop.py:203 and pkg/templates/typescript/yutori/loop.ts:147 — placeholder text is "Screenshot omitted to stay under request size limit.", but PR description quotes "[screenshot omitted to fit request size]". wording mismatch only.
pkg/templates/python/yutori/README.md:6 and pkg/templates/typescript/yutori/README.md:6 — vendor benchmark + pricing numbers will stale fast. consider scoping with "at launch" or dropping.
pkg/templates/typescript/yutori/loop.ts:112 — @ts-expect-error on extra_body becomes a build failure the day the OpenAI SDK adds the field. @ts-ignore or a cast is more future-proof.

dprevoznik · 2026-05-13T01:42:35Z

@masnwilliams fixed the nits you had! still working

dprevoznik and others added 8 commits May 6, 2026 22:47

frame readme with n1.5 benchmarks and scope

8d3f065

Co-Authored-By: Claude Opus 4.7 <[email protected]>

docs(yutori): drop vendor benchmark + pricing numbers from README

72ea88e

Specific bench scores and per-token pricing will stale fast and aren't load-bearing for the template.

dprevoznik marked this pull request as ready for review May 12, 2026 02:39

dprevoznik requested a review from masnwilliams May 12, 2026 02:42

dprevoznik and others added 2 commits May 11, 2026 22:42

Merge branch 'main' into hypeship/yutori-n15-template

cdf4eeb

cursor Bot reviewed May 12, 2026

Comment thread pkg/templates/typescript/yutori/loop.ts Outdated

Comment thread pkg/templates/typescript/yutori/tools/computer.ts

cursor Bot reviewed May 12, 2026

Comment thread pkg/templates/python/yutori/tools/computer.py Outdated

masnwilliams requested changes May 12, 2026

Merge branch 'main' into hypeship/yutori-n15-template

7bb49ec

dprevoznik requested a review from masnwilliams May 13, 2026 01:42

dprevoznik wants to merge 11 commits into main from hypeship/yutori-n15-template


2 participants
