feat: add LM Studio backend, draggable + per-session-hideable gem icon by guberm · Pull Request #15 · kessler/gemma-gem

guberm · 2026-06-10T15:59:35Z

Summary

Adds two user-facing features and a pnpm build fix:

Local model via LM Studio (over a link) — run inference against a local OpenAI-compatible endpoint instead of in-browser WebGPU.
Movable gem icon — drag the floating icon anywhere; position persists.
Hide the gem icon for the current session on the current site — a temporary, per-hostname hide that clears on browser restart (distinct from the permanent "Disable on this site").

1. LM Studio / remote backend

A third model option, LM Studio (local), appears in the gear-icon settings. When selected, an endpoint panel lets you set the base URL (default http://localhost:1234/v1), an optional model name, optional API key, and a context limit.

The agent already builds the full Gemma-formatted prompt, so the remote backend POSTs it raw to {baseUrl}/completions with stream: true and parses the SSE deltas — the same special-token tool/thinking protocol applies as the WebGPU path. A Gemma model loaded in LM Studio is recommended for tool use.
Special-token parsing (thinking blocks, tool calls, visible text) was extracted from model-host.ts into a shared GemmaStreamFilter so both backends behave identically.
load() pings /models so you get immediate feedback if LM Studio's server isn't running.
Settings persist across sessions; switching back to a WebGPU Gemma model is one dropdown change.
Limitation: image/audio tools (screenshots) work only on the WebGPU backend, not over the LM Studio link.

New: offscreen/remote-model-host.ts, offscreen/stream-filter.ts. The offscreen agent loop now swaps between an in-browser host and the remote host via an activeHost reference; the background reads/persists the endpoint config and forwards it on model:load / model:switch.

2. Draggable icon

Press-and-drag the gem icon (pointer events, 4px threshold to distinguish a drag from a click, clamped to the viewport). Position is saved to storage.local and restored on every page.

3. Per-session hide

A Hide gem icon (this session) button hides the icon per-hostname using chrome.storage.session (survives reloads, clears on browser restart). While hidden, reopen the chat with the toggle shortcut (Alt+G), where the button flips to Show gem icon. Required granting content scripts session-storage access in the background.

Build fix (pnpm)

wxt.config.ts resolved onnxruntime-web via require.resolve('onnxruntime-web'), which only worked under npm's flat node_modules. Now it resolves relative to @huggingface/transformers (its parent), so the build works under pnpm's nested layout.

Files

shared/models.ts — lm-studio model + RemoteEndpointConfig type, isRemoteModel(), storage keys/defaults
shared/messages.ts — remote:config message; remoteConfig on model load/switch
offscreen/remote-model-host.ts, offscreen/stream-filter.ts — new
offscreen/model-host.ts, entrypoints/offscreen/main.ts — backend swapping + shared stream filter
background/message-router.ts, entrypoints/background.ts — config persistence + session-storage access
content/chat-overlay.ts, content/gem-icon.ts, entrypoints/content.ts — settings UI, drag, session-hide
wxt.config.ts — pnpm-compatible ORT resolve
README.md — docs for all three features

Testing

wxt build --mode development ✔ — extension builds into .output/chrome-mv3-dev.
Manual: load unpacked, start LM Studio's local server with a Gemma model, select LM Studio (local); drag the icon; hide/show it for the session.

Notes for reviewers

public/ort/ort-wasm-simd-threaded.asyncify.mjs changed because the pinned onnxruntime-web dev version copies a matching loader. Revert if you want to keep the previously committed loader.
pnpm compile still reports pre-existing errors (chrome, navigator.gpu, transformers strict-null) because the repo doesn't declare @types/chrome / @webgpu/types. These predate this PR and don't affect the build. Happy to add those devDependencies in a follow-up if desired.

## Summary Adds two user-facing features and a pnpm build fix: 1. **Local model via LM Studio (over a link)** — run inference against a local OpenAI-compatible endpoint instead of in-browser WebGPU. 2. **Movable gem icon** — drag the floating icon anywhere; position persists. 3. **Hide the gem icon for the current session on the current site** — a temporary, per-hostname hide that clears on browser restart (distinct from the permanent "Disable on this site"). ## 1. LM Studio / remote backend A third model option, **LM Studio (local)**, appears in the gear-icon settings. When selected, an endpoint panel lets you set the **base URL** (default `http://localhost:1234/v1`), an optional **model name**, optional **API key**, and a **context limit**. - The agent already builds the full Gemma-formatted prompt, so the remote backend POSTs it raw to `{baseUrl}/completions` with `stream: true` and parses the SSE deltas — the same special-token tool/thinking protocol applies as the WebGPU path. A Gemma model loaded in LM Studio is recommended for tool use. - Special-token parsing (thinking blocks, tool calls, visible text) was extracted from `model-host.ts` into a shared `GemmaStreamFilter` so both backends behave identically. - `load()` pings `/models` so you get immediate feedback if LM Studio's server isn't running. - Settings persist across sessions; switching back to a WebGPU Gemma model is one dropdown change. - **Limitation:** image/audio tools (screenshots) work only on the WebGPU backend, not over the LM Studio link. New: `offscreen/remote-model-host.ts`, `offscreen/stream-filter.ts`. The offscreen agent loop now swaps between an in-browser host and the remote host via an `activeHost` reference; the background reads/persists the endpoint config and forwards it on `model:load` / `model:switch`. ## 2. Draggable icon Press-and-drag the gem icon (pointer events, 4px threshold to distinguish a drag from a click, clamped to the viewport). Position is saved to `storage.local` and restored on every page. ## 3. Per-session hide A **Hide gem icon (this session)** button hides the icon per-hostname using `chrome.storage.session` (survives reloads, clears on browser restart). While hidden, reopen the chat with the toggle shortcut (Alt+G), where the button flips to **Show gem icon**. Required granting content scripts session-storage access in the background. ## Build fix (pnpm) `wxt.config.ts` resolved `onnxruntime-web` via `require.resolve('onnxruntime-web')`, which only worked under npm's flat `node_modules`. Now it resolves relative to `@huggingface/transformers` (its parent), so the build works under pnpm's nested layout. ## Files - `shared/models.ts` — `lm-studio` model + `RemoteEndpointConfig` type, `isRemoteModel()`, storage keys/defaults - `shared/messages.ts` — `remote:config` message; `remoteConfig` on model load/switch - `offscreen/remote-model-host.ts`, `offscreen/stream-filter.ts` — **new** - `offscreen/model-host.ts`, `entrypoints/offscreen/main.ts` — backend swapping + shared stream filter - `background/message-router.ts`, `entrypoints/background.ts` — config persistence + session-storage access - `content/chat-overlay.ts`, `content/gem-icon.ts`, `entrypoints/content.ts` — settings UI, drag, session-hide - `wxt.config.ts` — pnpm-compatible ORT resolve - `README.md` — docs for all three features ## Testing - `wxt build --mode development` ✔ — extension builds into `.output/chrome-mv3-dev`. - Manual: load unpacked, start LM Studio's local server with a Gemma model, select **LM Studio (local)**; drag the icon; hide/show it for the session. ## Notes for reviewers - `public/ort/ort-wasm-simd-threaded.asyncify.mjs` changed because the pinned `onnxruntime-web` dev version copies a matching loader. Revert if you want to keep the previously committed loader. - `pnpm compile` still reports pre-existing errors (`chrome`, `navigator.gpu`, transformers strict-null) because the repo doesn't declare `@types/chrome` / `@webgpu/types`. These predate this PR and don't affect the build. Happy to add those devDependencies in a follow-up if desired.

…status - Chat opens adjacent to the gem icon (above when room, below otherwise) and both move together as a linked unit - Chat window is draggable by its header; dragging either window or icon moves the other by the same delta; position saved via icon's storage key - fix: RemoteModelHost.getCurrentModelId() returned null during load(), causing model:status 'loading' to fall back to initialModelId (Gemma E2B) and show "Downloading Gemma 4 E2B…" instead of "Connecting to LM Studio…" — now tracks loadingModelId (same pattern as GemmaModelHost) - content.ts tracks currentModelId independently of initialModelId so the fallback in model:status is always the actual selected model

…dels - Chat window is resizable (CSS resize + ResizeObserver); size persists via storage.local and is restored across pages/sessions - Model name shown in header below "Gemma Gem" title, updates on model switch - Settings panel scrollable when taller than 55% of window height - moveTo/getSize use actual container dimensions (adapts after user resize) - LM Studio config: ⟳ button and debounced URL-change trigger fetch of /v1/models from the endpoint; results appear as clickable chips that fill the "Model name" field; fetches go through the background service worker (avoids CORS / mixed-content issues in the content script) - remote:fetch_models / remote:models_result message types added

…ON fetch - Replace CSS `resize: both` (bottom-right only) with 8 custom transparent handle divs (n/s/e/w + 4 corners); each correctly updates width, height, and left/top position, clamped to viewport and min/max dimensions - Add `ngrok-skip-browser-warning: 1` header to all remote fetches (authHeaders in RemoteModelHost + background fetch_models handler) so ngrok tunnels return JSON instead of the browser-warning HTML page

- Header: "+ New" button clears messages and resets agent history (onNewChat also resets stopped/modelReady/shownLoadingMessage so the next send behaves like a fresh session) - LM Studio config: ⟳ button moved next to "Model name" field (removed from URL row); fetched models populate a <datalist> connected to the text input — click ⟳, then type or select from the native browser dropdown; if only one model is loaded it auto-fills; fetch errors shown inline below the field - Removed old chips / models-section UI

guberm added 5 commits June 10, 2026 11:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add LM Studio backend, draggable + per-session-hideable gem icon#15

feat: add LM Studio backend, draggable + per-session-hideable gem icon#15
guberm wants to merge 5 commits into
kessler:mainfrom
guberm:main

guberm commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

guberm commented Jun 10, 2026

Summary

1. LM Studio / remote backend

2. Draggable icon

3. Per-session hide

Build fix (pnpm)

Files

Testing

Notes for reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant