feat: Sway backend + hybrid NL find + macro/OCR + Hermes skill by Stijnman · Pull Request #24 · agent-sh/computer-use-linux

Stijnman · 2026-06-08T08:17:15Z

Summary

Closes the highest-ROI gaps identified for making computer-use-linux the definitive production Linux desktop MCP:

Windowing

Sway/wlroots backend via swaymsg -t get_tree with SWAYSOCK discovery, container-id focus ([con_id=N] focus), and doctor probe registration (between Hyprland and i3).
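
A minimal sketch of how the focus criteria and socket lookup described above could be assembled. Only `swaymsg`, `SWAYSOCK`, and the `[con_id=N] focus` criteria come from the PR description; the function names `sway_focus_args` and `resolve_sway_socket` are hypothetical, and treating an empty `SWAYSOCK` as unset is an assumption.

```rust
/// Build the `swaymsg` argument that focuses a container by its Sway
/// container id (this is not an X11 window id).
/// Hypothetical helper, not taken from the PR.
fn sway_focus_args(con_id: i64) -> Vec<String> {
    vec![format!("[con_id={}] focus", con_id)]
}

/// Return a usable socket path from a SWAYSOCK value.
/// Assumption: an empty or whitespace-only value counts as unset.
fn resolve_sway_socket(swaysock: Option<&str>) -> Option<String> {
    swaysock
        .map(str::trim)
        .filter(|path| !path.is_empty())
        .map(str::to_owned)
}
```

A caller would typically pass `std::env::var("SWAYSOCK").ok().as_deref()` to `resolve_sway_socket` and hand the result of `sway_focus_args` to `std::process::Command::new("swaymsg")`.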

Agent ergonomics

find_element — natural-language element discovery returning @eN refs with confidence scoring
hybrid_strategy — accessibility-first vs coordinate-fallback recommendation (COMPUTER_USE_LINUX_HYBRID=1)
get_clipboard / set_clipboard — wl-clipboard / xclip / xsel
start_recording / stop_recording / replay_macro — JSON workflow capture + Hermes skill skeleton export
screenshot_debug — element bounding-box highlights + optional tesseract OCR
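
To make the `find_element` idea concrete, here is a rough sketch of one way to score a natural-language query and return `@eN` refs with a confidence. The PR does not describe its scoring method, so the word-overlap heuristic, the `Element` and `Match` types, and their field names are all assumptions for illustration.

```rust
/// Hypothetical flattened accessibility element; field names are assumptions.
struct Element {
    index: usize,
    role: String,
    name: String,
}

/// A candidate match: an `@eN` ref plus a confidence in [0.0, 1.0].
#[derive(Debug, PartialEq)]
struct Match {
    reference: String,
    confidence: f64,
}

/// Score each element by the fraction of query words found in its role or
/// name, and return matches sorted from most to least confident.
/// This heuristic is an assumption, not the PR's actual algorithm.
fn find_element(query: &str, elements: &[Element]) -> Vec<Match> {
    let words: Vec<String> = query
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    if words.is_empty() {
        return Vec::new();
    }
    let mut matches: Vec<Match> = elements
        .iter()
        .filter_map(|el| {
            let haystack = format!("{} {}", el.role, el.name).to_lowercase();
            let hits = words
                .iter()
                .filter(|w| haystack.contains(w.as_str()))
                .count();
            if hits == 0 {
                return None;
            }
            Some(Match {
                reference: format!("@e{}", el.index),
                confidence: hits as f64 / words.len() as f64,
            })
        })
        .collect();
    // Stable sort: elements with equal confidence keep their tree order.
    matches.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    matches
}
```

Returning every candidate with its score, rather than only the best one, leaves the decision about ambiguous matches to the calling agent.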

Hermes onboarding

Expanded skills/computer-use-linux/SKILL.md with the accessibility-first + hybrid decision tree, new tool table, and COMPUTER_USE_LINUX_HYBRID setup.

Test plan

cargo test — 110 unit tests pass (including new Sway parser + NL find_element tests)
computer-use-linux doctor on KDE/X11 session
Manual validation on Sway session with swaymsg available
Hermes MCP tool discovery with new tools enabled

Notes

Hybrid mode is opt-in via env var to preserve existing accessibility-first defaults.
OCR requires tesseract-ocr installed; fails gracefully when absent.
Macro replay returns steps for the host to execute (no silent auto-execution).
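
The opt-in behaviour from the first note could look roughly like the sketch below. The `Strategy` enum and both function names are hypothetical, and the rule that only the exact value `1` enables hybrid mode is an assumption; the PR only states that `COMPUTER_USE_LINUX_HYBRID=1` turns it on.

```rust
/// Hypothetical recommendation returned to the agent.
#[derive(Debug, PartialEq)]
enum Strategy {
    Accessibility,
    CoordinateFallback,
}

/// Assumption: hybrid mode is on only when the variable is exactly "1".
fn hybrid_enabled(value: Option<&str>) -> bool {
    value.map(str::trim) == Some("1")
}

/// Recommend coordinates only when hybrid mode is enabled and the
/// accessibility tree produced no usable match; otherwise stay
/// accessibility-first, which preserves the existing default.
fn recommend(hybrid: bool, accessible_match: bool) -> Strategy {
    if hybrid && !accessible_match {
        Strategy::CoordinateFallback
    } else {
        Strategy::Accessibility
    }
}
```

A caller would feed `std::env::var("COMPUTER_USE_LINUX_HYBRID").ok().as_deref()` into `hybrid_enabled`.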


Add high-ROI agent desktop capabilities on top of the existing AT-SPI foundation:

- Sway/wlroots window backend via swaymsg (list, focus, doctor probe)
- Natural-language find_element with @en refs and hybrid_strategy guidance
- Clipboard get/set tools (wl-clipboard, xclip, xsel)
- Macro record/replay with JSON export and Hermes skill skeleton
- screenshot_debug with element bounding-box highlights and optional OCR
- Expanded Hermes skill with accessibility-first + hybrid decision tree

Enable hybrid coordinate fallback with COMPUTER_USE_LINUX_HYBRID=1.

gemini-code-assist

Code Review

This pull request introduces several new features to the Linux computer-use agent, including a Sway windowing backend, clipboard management, natural-language element finding, a hybrid input strategy recommendation system, macro recording/replay, and visual debugging with OCR and bounding-box highlights. Feedback on these changes highlights a bug in Sway's X11 PID hydration where the internal container ID is incorrectly used instead of the X11 window ID, an incomplete macro recording implementation that fails to capture steps during mutating actions, and a rendering issue in visual debugging where off-screen bounding boxes draw misleading borders along screen edges.

gemini-code-assist · 2026-06-08T08:19:52Z

+fn hydrate_sway_window_pids(windows: &mut [WindowInfo]) {
+    for window in windows {
+        if window.pid.is_none() {
+            if let Some(client_type) = window.client_type.as_deref() {
+                if client_type == "x11" {
+                    window.pid = sway_x11_window_pid(window.window_id);
+                }
+            }
+        }
+    }
+}


In Sway, window_id in WindowInfo is set to the internal Sway container ID (self.id), not the X11 window ID (self.window). Passing window.window_id to sway_x11_window_pid causes xprop to fail because it expects a valid X11 window ID, not a Sway container ID.

To fix this, we should perform the PID hydration directly inside to_window_info where self.window is still available, and then remove hydrate_sway_window_pids entirely.

fn hydrate_sway_window_pids(_windows: &mut [WindowInfo]) {}

gemini-code-assist · 2026-06-08T08:19:52Z

+            wm_class,
+            pid: self.pid.and_then(|pid| u32::try_from(pid).ok()),
+            bounds,


Hydrate the PID directly here using self.window (the actual X11 window ID) when self.pid is not populated by Sway.

Suggested change

-            wm_class,
-            pid: self.pid.and_then(|pid| u32::try_from(pid).ok()),
-            bounds,
+            wm_class,
+            pid: self.pid
+                .and_then(|pid| u32::try_from(pid).ok())
+                .or_else(|| self.window.and_then(sway_x11_window_pid)),
+            bounds,
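
The fallback chain in this suggestion can be exercised in isolation. The sketch below is self-contained and uses a hypothetical `SwayNode` struct and an injected `lookup` closure in place of `sway_x11_window_pid`; the field types are assumptions based on the review text (`id` is the Sway container id, `window` the X11 window id).

```rust
use std::convert::TryFrom;

/// Hypothetical subset of a Sway tree node.
struct SwayNode {
    /// Sway container id, used for `[con_id=N] focus`.
    id: i64,
    /// X11 window id, present only for XWayland clients.
    window: Option<i64>,
    /// PID as reported by Sway, when available.
    pid: Option<i64>,
}

/// Prefer the PID Sway reports; otherwise look it up from the X11 window
/// id. The container id is never passed to the lookup.
fn resolve_pid(node: &SwayNode, lookup: impl Fn(i64) -> Option<u32>) -> Option<u32> {
    node.pid
        .and_then(|pid| u32::try_from(pid).ok())
        .or_else(|| node.window.and_then(lookup))
}
```

Injecting the lookup keeps the function testable without `xprop` or a running X server.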

gemini-code-assist · 2026-06-08T08:19:52Z

+    fn start_recording(
+        &self,
+        Parameters(params): Parameters<StartRecordingParams>,
+    ) -> Json<RecordingOutput> {
+        let message = self.macro_recorder.start(params.name.clone());


The macro_recorder is started and stopped via the start_recording and stop_recording tools, but self.macro_recorder.record_step is never actually called within any of the mutating tool implementations (such as click, type_text, press_key, drag, scroll, set_value, or perform_action).

As a result, any recorded macro will always contain 0 steps. Please ensure that record_step is called with the corresponding tool name and parameters inside each of these mutating tools to enable functional macro recording.
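
A minimal sketch of the wiring this comment asks for. The `MacroRecorder` below is a stand-in written for illustration, not the PR's type: the `Step` shape, the string-encoded params, and the free-function `click` handler are assumptions. Only the `record_step` name and the list of mutating tools come from the review.

```rust
use std::sync::Mutex;

/// One captured tool call; the shape is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    tool: String,
    params: String,
}

/// Stand-in recorder. `None` means no recording is active.
#[derive(Default)]
struct MacroRecorder {
    steps: Mutex<Option<Vec<Step>>>,
}

impl MacroRecorder {
    /// Begin a new recording, discarding any previous one.
    fn start(&self) {
        *self.steps.lock().unwrap() = Some(Vec::new());
    }

    /// Append a step only while a recording is active.
    fn record_step(&self, tool: &str, params: &str) {
        if let Some(steps) = self.steps.lock().unwrap().as_mut() {
            steps.push(Step {
                tool: tool.to_owned(),
                params: params.to_owned(),
            });
        }
    }

    /// End the recording and return the captured steps.
    fn stop(&self) -> Vec<Step> {
        self.steps.lock().unwrap().take().unwrap_or_default()
    }
}

/// Stand-in for a mutating tool handler such as `click`.
fn click(recorder: &MacroRecorder, x: i32, y: i32) {
    // ... perform the click here ...
    // The call the review says is missing from every mutating tool:
    recorder.record_step("click", &format!("{{\"x\":{},\"y\":{}}}", x, y));
}
```

Because `record_step` is a no-op when no recording is active, each mutating handler can call it unconditionally.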

gemini-code-assist · 2026-06-08T08:19:52Z

+fn draw_hollow_rect(image: &mut RgbaImage, x: i32, y: i32, width: u32, height: u32, color: Rgba<u8>) {
+    let image_width = image.width() as i32;
+    let image_height = image.height() as i32;
+    let left = x.max(0);
+    let top = y.max(0);
+    let right = (x + width as i32).min(image_width);
+    let bottom = (y + height as i32).min(image_height);
+    if left >= right || top >= bottom {
+        return;
+    }
+    for px in left..right {
+        if top < image_height {
+            image.put_pixel(px as u32, top as u32, color);
+        }
+        if bottom - 1 < image_height {
+            image.put_pixel(px as u32, (bottom - 1) as u32, color);
+        }
+    }
+    for py in top..bottom {
+        if left < image_width {
+            image.put_pixel(left as u32, py as u32, color);
+        }
+        if right - 1 < image_width {
+            image.put_pixel((right - 1) as u32, py as u32, color);
+        }
+    }
+}


The current implementation of draw_hollow_rect draws borders along the screen edges (left = x.max(0), top = y.max(0)) when a bounding box is partially off-screen. This creates misleading visual borders that do not correspond to the actual element boundaries.

We can implement proper clipping by only drawing the pixels of the rectangle's borders that actually fall within the image dimensions.

fn draw_hollow_rect(image: &mut RgbaImage, x: i32, y: i32, width: u32, height: u32, color: Rgba<u8>) {
    let iw = image.width() as i32;
    let ih = image.height() as i32;
    let x2 = x + width as i32 - 1;
    let y2 = y + height as i32 - 1;
    for px in x..=x2 {
        if px >= 0 && px < iw {
            if y >= 0 && y < ih {
                image.put_pixel(px as u32, y as u32, color);
            }
            if y2 >= 0 && y2 < ih {
                image.put_pixel(px as u32, y2 as u32, color);
            }
        }
    }
    for py in y..=y2 {
        if py >= 0 && py < ih {
            if x >= 0 && x < iw {
                image.put_pixel(x as u32, py as u32, color);
            }
            if x2 >= 0 && x2 < iw {
                image.put_pixel(x2 as u32, py as u32, color);
            }
        }
    }
}
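
The per-pixel clipping idea can be checked without the `image` crate. The sketch below swaps `RgbaImage` for a hypothetical boolean `Canvas` so it compiles with the standard library alone; the zero-size guard is an addition that is not part of the review suggestion.

```rust
/// Minimal stand-in for an image buffer: `true` marks a painted pixel.
struct Canvas {
    width: i32,
    height: i32,
    pixels: Vec<bool>,
}

impl Canvas {
    fn new(width: i32, height: i32) -> Self {
        Canvas {
            width,
            height,
            pixels: vec![false; (width * height) as usize],
        }
    }

    /// Paint a pixel, silently skipping coordinates outside the canvas.
    fn put_clipped(&mut self, x: i32, y: i32) {
        if x >= 0 && x < self.width && y >= 0 && y < self.height {
            self.pixels[(y * self.width + x) as usize] = true;
        }
    }

    /// Read back an in-bounds pixel.
    fn is_set(&self, x: i32, y: i32) -> bool {
        self.pixels[(y * self.width + x) as usize]
    }
}

/// Draw only the border pixels that fall inside the canvas, so an edge
/// that lies off-screen is never redrawn along the screen boundary.
fn draw_hollow_rect(canvas: &mut Canvas, x: i32, y: i32, width: i32, height: i32) {
    if width <= 0 || height <= 0 {
        return; // assumption: empty rectangles draw nothing
    }
    let (x2, y2) = (x + width - 1, y + height - 1);
    for px in x..=x2 {
        canvas.put_clipped(px, y); // top edge
        canvas.put_clipped(px, y2); // bottom edge
    }
    for py in y..=y2 {
        canvas.put_clipped(x, py); // left edge
        canvas.put_clipped(x2, py); // right edge
    }
}
```

For a 4x4 rectangle placed at (-2, -2) on a 4x4 canvas, only the visible parts of the bottom and right edges are painted and the corner at (0, 0) stays clear, whereas clamping `left` and `top` to 0 would paint it. For very large off-screen boxes, intersecting the loop ranges with the canvas first would avoid iterating over invisible pixels.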

avifenesh

This is an auto review done by revuto.

avifenesh · 2026-06-08T08:30:48Z

+        &self,
+        Parameters(params): Parameters<StartRecordingParams>,
+    ) -> Json<RecordingOutput> {
+        let message = self.macro_recorder.start(params.name.clone());


This is an auto review done by revuto.

start_recording flips the recorder on, but none of the mutating tool handlers call self.macro_recorder.record_step(...) before returning (a search for record_step only finds the method definition). As a result stop_recording will always report an empty steps array, so the new macro/replay feature advertised by these tools cannot capture any workflow.

avifenesh · 2026-06-08T16:05:39Z

@Stijnman Hi :)
Thanks for the PR!
Could you please ensure that CI passes and you address all the reviews so I can have a more focused scope to review?

Stijnman requested a review from avifenesh as a code owner June 8, 2026 08:17

Stijnman wants to merge 1 commit into agent-sh:main from Stijnman:feat/all-extra-functionalities-v1
