
update yutori template to n1.5#159

Open
dprevoznik wants to merge 11 commits into main from hypeship/yutori-n15-template

Conversation

@dprevoznik
Contributor

@dprevoznik dprevoznik commented May 6, 2026

Summary

Updates the Yutori CUA template (TS + Python) from n1-latest to n1.5-latest, plus speed and request-size improvements.

This is more than a model-name swap — n1.5 changes several action signatures, adds new actions, and introduces a tool-set selector. The template is computer-use only (no Playwright page exposed to the model), so we explicitly exclude the DOM/Playwright "expanded" tools.

Loop / API changes

  • bump model id from n1-latest to n1.5-latest
  • send extra_body.tool_set: "browser_tools_core-20260403" to select the coordinate-based tool set (the default, but pinned for stability)
  • send extra_body.disable_tools: ["extract_elements", "find", "set_element_value", "execute_js"] as defense-in-depth — these are the expanded tools that need a Playwright page
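
A minimal sketch of the request wiring described above, for the Python side. It assumes the loop calls an OpenAI-compatible client via `client.chat.completions.create(**kwargs)`; the helper name `build_request_kwargs` and the constant names are hypothetical and not taken from the template.

```python
from typing import Any

MODEL = "n1.5-latest"
TOOL_SET = "browser_tools_core-20260403"
# Expanded tools that need a Playwright page; this template never exposes one.
DISABLED_TOOLS = ["extract_elements", "find", "set_element_value", "execute_js"]


def build_request_kwargs(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create(**kwargs).

    Hypothetical helper: the real template may assemble the call differently.
    """
    return {
        "model": MODEL,
        "messages": messages,
        # The OpenAI Python SDK merges extra_body into the JSON request body,
        # so tool_set and disable_tools arrive as top-level request fields.
        "extra_body": {
            "tool_set": TOOL_SET,
            "disable_tools": list(DISABLED_TOOLS),
        },
    }
```

Pinning `tool_set` and also sending `disable_tools` is redundant by design: if the default tool set ever changes, the disable list should still keep the DOM/Playwright tools out of the model's reach.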

Action / handler changes

  • rename hover → mouse_move
  • key_press: parameter key_comb → key
  • type: drop press_enter_after / clear_before_typing (n1.5 emits separate key_press actions instead)
  • new actions: middle_click, mouse_down, mouse_up, hold_key, go_forward
  • click actions: optional modifier parameter wired into Kernel's hold_keys for shift/ctrl-clicks
  • wait: optional duration parameter honored
  • hold_key: optional duration parameter honored via pressKey's duration
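
To make the signature changes concrete, here is an illustrative shim that rewrites an n1-style action into its n1.5 equivalent(s). It is not part of this PR. The `action_type` field name and the flat dict shape are assumptions about how actions are represented, and `n1_to_n15` is a hypothetical name.

```python
from typing import Any

Action = dict[str, Any]


def n1_to_n15(action: Action) -> list[Action]:
    """Translate one n1-style action into the equivalent n1.5-style actions."""
    out = dict(action)  # shallow copy so the caller's dict is untouched
    kind = out.get("action_type")  # assumed field name

    if kind == "hover":
        out["action_type"] = "mouse_move"
        return [out]

    if kind == "key_press" and "key_comb" in out:
        out["key"] = out.pop("key_comb")
        return [out]

    if kind == "type":
        press_enter = bool(out.pop("press_enter_after", False))
        # n1.5's `type` has no clear_before_typing flag; this sketch drops it
        # rather than guessing at a replacement key sequence.
        out.pop("clear_before_typing", None)
        actions = [out]
        if press_enter:
            # n1.5 emits Enter as its own key_press action.
            actions.append({"action_type": "key_press", "key": "Enter"})
        return actions

    return [out]
```

For example, a `type` action with `press_enter_after` set becomes a plain `type` followed by a `key_press` with `key: "Enter"`, which matches how n1.5 is described as emitting them.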

Request-size trimming

Mirrors the official yutori-sdk-python payload.py algorithm to keep the per-request payload under the API ceiling on long sessions:

  • MAX_REQUEST_BYTES = 9_500_000, KEEP_RECENT_SCREENSHOTS = 6 (Yutori's published defaults)
  • size-triggered (not count-triggered) — only kicks in when the estimated request exceeds the threshold
  • two-phase strip: drop images outside the protected window first; if still too large, walk into the protected window but always preserve the latest screenshot
  • dual-list pattern: the caller's full conversation_messages is preserved (deep-copied before mutation); only the trimmed copy is sent to the API
  • when an image_url block is stripped from a tool result with no remaining text, a "Screenshot omitted to stay under request size limit." placeholder is inserted so the message stays valid
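
The trimming steps above can be sketched roughly as follows. This is a simplified illustration and not the code in `loop.py` or Yutori's `payload.py`: the function names are hypothetical, message content is assumed to be a list of OpenAI-style parts, and the payload is re-serialized after each strip for clarity rather than speed.

```python
import copy
import json
from typing import Any

MAX_REQUEST_BYTES = 9_500_000
KEEP_RECENT_SCREENSHOTS = 6
PLACEHOLDER = "Screenshot omitted to stay under request size limit."

Message = dict[str, Any]


def _size(messages: list[Message]) -> int:
    """Estimated request size: bytes of the JSON-serialized messages."""
    return len(json.dumps(messages).encode("utf-8"))


def _images(messages: list[Message]) -> list[tuple[Message, dict]]:
    """(message, part) pairs for every image_url part, oldest first."""
    return [
        (msg, part)
        for msg in messages
        if isinstance(msg.get("content"), list)
        for part in msg["content"]
        if isinstance(part, dict) and part.get("type") == "image_url"
    ]


def _strip(msg: Message, part: dict) -> None:
    """Remove one image part; add a placeholder if no text part remains."""
    msg["content"] = [p for p in msg["content"] if p is not part]
    if not any(isinstance(p, dict) and p.get("type") == "text" for p in msg["content"]):
        msg["content"].append({"type": "text", "text": PLACEHOLDER})


def trim_for_request(
    messages: list[Message],
    max_bytes: int = MAX_REQUEST_BYTES,
    keep_recent: int = KEEP_RECENT_SCREENSHOTS,
) -> list[Message]:
    """Return the list to send; the caller's list is never mutated."""
    if _size(messages) <= max_bytes:
        return messages  # size-triggered: under the limit, send as-is

    trimmed = copy.deepcopy(messages)
    images = _images(trimmed)
    cut = max(0, len(images) - keep_recent)
    phase_one = images[:cut]  # outside the protected window, oldest first
    phase_two = images[cut:]  # protected window, entered only if needed
    # The final [:-1] guarantees the latest screenshot is never stripped.
    for msg, part in (phase_one + phase_two)[:-1]:
        if _size(trimmed) <= max_bytes:
            break
        _strip(msg, part)
    return trimmed
```

The loop would keep appending to its own full history and pass only `trim_for_request(history)` to the API call, which is the dual-list pattern described above.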

Latency

  • post-action SCREENSHOT_DELAY halved from 300ms → 150ms (the settle wait after click/type/scroll before grabbing the next screenshot)
  • ACTION_DELAY (the focus-settling wait inside goto_url's ctrl+L → type → Enter flow) left at 300ms — halving it risks silent input misrouting if the address bar hasn't taken focus yet

Files

  • pkg/create/templates.go — display name + description
  • pkg/templates/{typescript,python}/yutori/{loop,index|main,tools/computer,README}.* — implementation, trim helpers, types, docs

Test plan

  • make build and make test pass
  • scaffold the template (kernel create -t yutori -l {typescript,python}) and deploy end-to-end against n1.5-latest in the Default project
  • smoke test: navigate + interact on example.com (both TS and Python)
  • verify key_press Enter works on in-page search forms (DuckDuckGo, Google) and Wikipedia
  • verify scroll action works on results pages
  • verify goto_url (ctrl+L address-bar flow) still works post-delay changes
  • confirm trim helpers preserve the full caller history while shrinking only the request copy (verified with synthetic 20-step × 600KB history: 12.9MB → 9.2MB, last 6 screenshots byte-identical, tool_call_ids intact)

Note

Medium Risk
Updates the Yutori agent loop and action schema for both Python and TypeScript templates, changing the API payload and browser-control behaviors. Risk is moderate since it affects runtime automation flows (tool invocation, input timing, and message history trimming) but is isolated to scaffolding/templates.

Overview
Upgrades the Yutori computer-use templates (Python + TypeScript) from n1-latest to n1.5-latest, including updated request wiring (tool_set: browser_tools_core-20260403 and explicit disable_tools to keep the template computer-use only).

Updates the action schema/handlers to match n1.5: adds support for new actions (e.g. mouse_move, mouse_down/mouse_up, middle_click, hold_key, go_forward), changes key_press to use key, drops n1-only typing flags, and threads optional modifier into Kernel hold_keys for clicks/scrolls.

Adds request-size protection in both loops by deep-copying and trimming older screenshot parts when the serialized message payload exceeds ~9.5MB (keeping the most recent screenshots), and reduces post-action screenshot delay from 300ms to 150ms. Documentation and template display strings are updated to reflect n1.5 and the supported/disabled tool set.

Reviewed by Cursor Bugbot for commit 7bb49ec.

dprevoznik and others added 8 commits May 6, 2026 22:47
- bump model id from `n1-latest` to `n1.5-latest`
- send `extra_body.tool_set: browser_tools_core-20260403` to use the
  coordinate-based tool set
- send `extra_body.disable_tools` to explicitly exclude the DOM/Playwright
  tools (`extract_elements`, `find`, `set_element_value`, `execute_js`)
  since this template runs computer-use only
- rename `hover` to `mouse_move`
- rename `key_press` parameter `key_comb` to `key`
- drop `press_enter_after` and `clear_before_typing` from `type` (n1.5
  emits separate `key_press` actions instead)
- add new actions: `middle_click`, `mouse_down`, `mouse_up`, `hold_key`,
  `go_forward`
- support optional `modifier` parameter on click actions via Kernel's
  `hold_keys`

Co-Authored-By: Claude Opus 4.7 <[email protected]>

n1.5's scroll action accepts an optional `modifier` (e.g., shift), which
browsers use to turn a vertical wheel event into a horizontal scroll. Plumb
it into Kernel's `ComputerScrollParams.hold_keys` so the OS-level
modifier+wheel event is dispatched correctly.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

Mirror yutori-sdk-python's reference loop: deep-copy a request-only
view of the messages and strip old image_url blocks once the JSON
payload exceeds ~9.5 MB, while always preserving the most recent
6 screenshots and the very latest one. The caller's full history
is left intact for the return value.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

300ms after every action was conservative — at 50 iterations that is
~15s of pure wall-clock waiting before model calls. 150ms still gives
the page enough time to settle for typical interactions while halving
the per-step overhead.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

key_press / hold_key compound keys already consulted MODIFIER_MAP, but
the new `modifier` parameter on click and scroll actions passes a bare
modifier name ("control", "meta", "command") that bypassed the lookup
and went out as-is. Kernel's hold_keys wants "ctrl" and "super" — so
ctrl-click and cmd-click silently dropped the modifier.

Unify the per-part mapping into a single helper applied to both the
compound and single-key paths in TS and Python.
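
A minimal sketch of such a unified helper, assuming keys arrive as strings like `"control"` or `"control+a"`. Only the `control`, `meta` and `command` mappings come from the commit message; the `cmd` entry and the function names are assumptions for illustration.

```python
# Model-facing modifier names mapped to the names Kernel's hold_keys expects.
MODIFIER_MAP = {
    "control": "ctrl",
    "meta": "super",
    "command": "super",
    "cmd": "super",  # assumed alias, not confirmed by the commit message
}


def normalize_key_part(part: str) -> str:
    """Map one key name; anything not in MODIFIER_MAP passes through as-is."""
    key = part.strip()
    return MODIFIER_MAP.get(key.lower(), key)


def normalize_key(key: str) -> str:
    """Apply the same per-part mapping to 'control+a' and to a bare 'control'."""
    return "+".join(normalize_key_part(p) for p in key.split("+"))
```

Routing both the compound-key path and the bare `modifier` parameter through one function is what prevents the two from drifting apart again.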
Yutori's reference impl (frontend-visualqa actions.py:469, 506)
interprets the model-supplied `duration` argument on `wait` and
`hold_key` as seconds — passed straight to asyncio.sleep on the wait
path, and clamped to 100s on the hold_key path. Our handlers were
treating duration as milliseconds, so any model-supplied value was
silently interpreted 1000× too short (`wait { duration: 2 }` slept 2ms
instead of 2s; `hold_key { duration: 0.5 }` held 0.5ms instead of
500ms). Defaults were unaffected because they were pre-computed in ms.

Convert seconds → ms before passing to Kernel's pressKey, and use
seconds directly for asyncio.sleep / setTimeout. Adds a `> 0` guard on
hold_key duration, which also resolves the bugbot nit about negative
values reaching the SDK.
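
The unit handling can be sketched like this. The 100-second clamp and the `> 0` guard come from the commit message above; the function names, and returning `None` to mean "fall back to the handler's default", are assumptions.

```python
from typing import Optional

MAX_HOLD_SECONDS = 100.0  # clamp cited from Yutori's reference implementation


def seconds_to_ms(seconds: float) -> int:
    """Convert a model-supplied duration in seconds to whole milliseconds."""
    return int(round(seconds * 1000))


def hold_key_duration_ms(duration: Optional[float]) -> Optional[int]:
    """Milliseconds for a key hold, or None when the default should apply.

    Missing, zero and negative durations are rejected so they never
    reach the SDK call.
    """
    if duration is None or duration <= 0:
        return None
    return seconds_to_ms(min(float(duration), MAX_HOLD_SECONDS))
```

With this, `hold_key { duration: 0.5 }` yields 500 ms, and an oversized value such as 500 is clamped to 100000 ms.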
Specific bench scores and per-token pricing will stale fast and aren't
load-bearing for the template.
@dprevoznik dprevoznik marked this pull request as ready for review May 12, 2026 02:39
@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR updates a Yutori template with model changes and action signature updates, but does not modify kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) in the codebase.

To monitor this PR anyway, reply with @firetiger monitor this.

@dprevoznik dprevoznik requested a review from masnwilliams May 12, 2026 02:42
dprevoznik and others added 2 commits May 11, 2026 22:42
The openai-node SDK does not have a Python-style `extra_body` kwarg —
it serializes the body as-is. Passing `extra_body: {...}` as a body
field made Yutori receive a literal `{"extra_body": ...}` and silently
drop the tool_set pin and disable_tools defense-in-depth.

Hoist both fields and apply them via a typed spread (mirrors the
anthropic-computer-use loop pattern).

Comment thread pkg/templates/typescript/yutori/loop.ts Outdated
Comment thread pkg/templates/typescript/yutori/tools/computer.ts

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d2affba.

Comment thread pkg/templates/python/yutori/tools/computer.py Outdated
Contributor

@masnwilliams masnwilliams left a comment


reviewed — n1.5 contract changes look clean and python/typescript halves stay in lockstep. trim helpers verified end-to-end on a synthetic 20-step × 600KB history (16MB → 8.8MB, last 6 screenshots byte-identical, caller list unchanged). one real bug worth fixing before ship:

Bugs

  • pkg/templates/python/yutori/tools/computer.py:259,280 and pkg/templates/typescript/yutori/tools/computer.ts:281,312: duration for wait and hold_key is treated as milliseconds, but per yutori's reference impl (yutori-ai/frontend-visualqa/src/frontend_visualqa/actions.py lines 469, 506) it's seconds: passed straight to asyncio.sleep, clamped to 100 seconds. defaults are fine because they're pre-computed in ms (1000, 2000), but any model-supplied duration is interpreted 1000× too short (wait { duration: 2 } → 2ms instead of 2s, hold_key { duration: 0.5 } → 0.5ms instead of 500ms). need to multiply by 1000 before sleeping / before passing to kernel press_key.duration.

Nits

  • pkg/templates/python/yutori/loop.py:203 and pkg/templates/typescript/yutori/loop.ts:147 — placeholder text is "Screenshot omitted to stay under request size limit.", but PR description quotes "[screenshot omitted to fit request size]". wording mismatch only.
  • pkg/templates/python/yutori/README.md:6 and pkg/templates/typescript/yutori/README.md:6 — vendor benchmark + pricing numbers will stale fast. consider scoping with "at launch" or dropping.
  • pkg/templates/typescript/yutori/loop.ts:112: @ts-expect-error on extra_body becomes a build failure the day the OpenAI SDK adds the field. @ts-ignore or a cast is more future-proof.

@dprevoznik dprevoznik requested a review from masnwilliams May 13, 2026 01:42
@dprevoznik
Contributor Author

@masnwilliams fixed the nits you had! still working
