feat: add LM Studio backend, draggable + per-session-hideable gem icon#15
Open
guberm wants to merge 5 commits into
## Summary
Adds three user-facing features and a pnpm build fix:
1. **Local model via LM Studio (over a link)** — run inference against a local OpenAI-compatible endpoint instead of in-browser WebGPU.
2. **Movable gem icon** — drag the floating icon anywhere; position persists.
3. **Hide the gem icon for the current session on the current site** — a temporary, per-hostname hide that clears on browser restart (distinct from the permanent "Disable on this site").
## 1. LM Studio / remote backend
A third model option, **LM Studio (local)**, appears in the gear-icon settings. When selected, an endpoint panel lets you set the **base URL** (default `http://localhost:1234/v1`), an optional **model name**, optional **API key**, and a **context limit**.
- The agent already builds the full Gemma-formatted prompt, so the remote backend POSTs it raw to `{baseUrl}/completions` with `stream: true` and parses the SSE deltas — the same special-token tool/thinking protocol applies as the WebGPU path. A Gemma model loaded in LM Studio is recommended for tool use.
- Special-token parsing (thinking blocks, tool calls, visible text) was extracted from `model-host.ts` into a shared `GemmaStreamFilter` so both backends behave identically.
- `load()` pings `/models` so you get immediate feedback if LM Studio's server isn't running.
- Settings persist across sessions; switching back to a WebGPU Gemma model is one dropdown change.
- **Limitation:** image/audio tools (screenshots) work only on the WebGPU backend, not over the LM Studio link.
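As a rough illustration of the streaming step described above, the sketch below buffers raw SSE text and extracts text deltas from complete `data:` lines. It assumes the usual OpenAI-compatible `/completions` chunk shape (`choices[0].text`, terminated by `data: [DONE]`); the class name `SseDeltaParser` is hypothetical and is not taken from this PR.

```typescript
// Hypothetical helper, not the PR's code: accumulates raw SSE text and
// returns the text deltas carried by complete "data:" lines.
interface CompletionChunk {
  choices?: Array<{ text?: string }>;
}

class SseDeltaParser {
  private buffer = "";
  done = false;

  /** Feed one decoded network chunk; returns the text deltas it completed. */
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it buffered.
    this.buffer = lines.pop() ?? "";
    const deltas: string[] = [];
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line.startsWith("data:")) continue; // skip blank lines and other SSE fields
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        break;
      }
      try {
        const parsed = JSON.parse(payload) as CompletionChunk;
        const text = parsed.choices?.[0]?.text;
        if (text) deltas.push(text);
      } catch {
        // Ignore a malformed payload instead of aborting the whole stream.
      }
    }
    return deltas;
  }
}

// Usage: a JSON payload split across two network chunks is still parsed once.
const parser = new SseDeltaParser();
const first = parser.push('data: {"choices":[{"text":"Hel"}]}\n\ndata: {"choi');
const second = parser.push('ces":[{"text":"lo"}]}\n\ndata: [DONE]\n\n');
```

In a design like the one described, the deltas would then be passed to the shared stream filter, so special-token handling stays identical for both backends.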
New: `offscreen/remote-model-host.ts`, `offscreen/stream-filter.ts`. The offscreen agent loop now swaps between an in-browser host and the remote host via an `activeHost` reference; the background reads/persists the endpoint config and forwards it on `model:load` / `model:switch`.
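The `activeHost` swap might look roughly like the sketch below. Only `isRemoteModel()`, `getCurrentModelId()`, and the `lm-studio` model ID are named in this PR; the `ModelHost` interface, `StubHost`, `HostSwitcher`, and the `gemma-e2b` ID are assumptions made for illustration, and loading is shown synchronously to keep the example short.

```typescript
// Hypothetical shapes: the PR's real host interfaces are not shown here.
interface ModelHost {
  load(modelId: string): void;
  getCurrentModelId(): string | null;
}

// Stand-in host that only records which model it was asked to load.
class StubHost implements ModelHost {
  private modelId: string | null = null;
  load(modelId: string): void {
    this.modelId = modelId;
  }
  getCurrentModelId(): string | null {
    return this.modelId;
  }
}

// Assumed implementation: treats "lm-studio" as the only remote model ID.
function isRemoteModel(modelId: string): boolean {
  return modelId === "lm-studio";
}

class HostSwitcher {
  private readonly local: ModelHost;
  private readonly remote: ModelHost;
  private activeHost: ModelHost;

  constructor(local: ModelHost, remote: ModelHost) {
    this.local = local;
    this.remote = remote;
    this.activeHost = local;
  }

  /** Point activeHost at the matching backend, then load the model on it. */
  switchTo(modelId: string): ModelHost {
    this.activeHost = isRemoteModel(modelId) ? this.remote : this.local;
    this.activeHost.load(modelId);
    return this.activeHost;
  }

  get current(): ModelHost {
    return this.activeHost;
  }
}

const localHost = new StubHost();
const remoteHost = new StubHost();
const switcher = new HostSwitcher(localHost, remoteHost);
switcher.switchTo("lm-studio");
const activeAfterRemote = switcher.current;
switcher.switchTo("gemma-e2b"); // hypothetical WebGPU model ID
const activeAfterLocal = switcher.current;
```

Keeping a single reference means the agent loop only ever calls `activeHost`, so it does not need to know which backend is in use.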
## 2. Draggable icon
Press-and-drag the gem icon (pointer events, 4px threshold to distinguish a drag from a click, clamped to the viewport). Position is saved to `storage.local` and restored on every page.
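The threshold and clamping logic can be sketched as two pure functions. This is a minimal illustration, not the PR's code: the function names are hypothetical, and measuring the 4px threshold as straight-line (Euclidean) distance is an assumption.

```typescript
interface Point { x: number; y: number; }
interface Size { width: number; height: number; }

const DRAG_THRESHOLD_PX = 4; // threshold value taken from the PR description

/** True once the pointer has moved far enough to count as a drag, not a click. */
function isDrag(start: Point, current: Point, threshold: number = DRAG_THRESHOLD_PX): boolean {
  // Assumption: distance is measured as the straight line from the press point.
  return Math.hypot(current.x - start.x, current.y - start.y) >= threshold;
}

/** Keep the icon's top-left corner inside the viewport on both axes. */
function clampToViewport(pos: Point, icon: Size, viewport: Size): Point {
  const maxX = Math.max(0, viewport.width - icon.width);
  const maxY = Math.max(0, viewport.height - icon.height);
  return {
    x: Math.min(Math.max(pos.x, 0), maxX),
    y: Math.min(Math.max(pos.y, 0), maxY),
  };
}

const smallMove = isDrag({ x: 0, y: 0 }, { x: 2, y: 2 });
const largeMove = isDrag({ x: 0, y: 0 }, { x: 3, y: 4 });
const clamped = clampToViewport(
  { x: -10, y: 900 },
  { width: 48, height: 48 },
  { width: 800, height: 600 },
);
```

A pointer handler would call `isDrag` on each `pointermove` and suppress the click once it returns true, then persist the clamped position on `pointerup`.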
## 3. Per-session hide
A **Hide gem icon (this session)** button hides the icon per-hostname using `chrome.storage.session` (survives reloads, clears on browser restart). While the icon is hidden, reopen the chat with the toggle shortcut (Alt+G); the button there flips to **Show gem icon**. This required the background script to grant content scripts access to session storage.
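One way to model the per-hostname state is a small map keyed by hostname, sketched below as pure functions. The key name `hiddenHosts` and both function names are hypothetical; the comments note how the map could be wired to `chrome.storage.session`, which is left out of the runnable part so the example does not need the extension runtime.

```typescript
// Hypothetical storage key: the PR's actual key name is not shown.
const HIDDEN_HOSTS_KEY = "hiddenHosts";

type HiddenHosts = Record<string, true>;

/** Return a copy of the map with `hostname` marked hidden or visible. */
function withHostHidden(hosts: HiddenHosts, hostname: string, hidden: boolean): HiddenHosts {
  const next: HiddenHosts = { ...hosts };
  if (hidden) {
    next[hostname] = true;
  } else {
    delete next[hostname];
  }
  return next;
}

function isHostHidden(hosts: HiddenHosts, hostname: string): boolean {
  return hosts[hostname] === true;
}

// In an extension, the background script would first open session storage
// to content scripts, which cannot read it by default:
//   chrome.storage.session.setAccessLevel({
//     accessLevel: "TRUSTED_AND_UNTRUSTED_CONTEXTS",
//   });
// A content script could then load and save the map under HIDDEN_HOSTS_KEY
// with chrome.storage.session.get(...) and chrome.storage.session.set(...).

const afterHide = withHostHidden({}, "example.com", true);
const afterShow = withHostHidden(afterHide, "example.com", false);
```

Because the map lives in session storage rather than `storage.local`, it is discarded when the browser restarts, which matches the temporary hide described above.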
## Build fix (pnpm)
`wxt.config.ts` resolved `onnxruntime-web` via `require.resolve('onnxruntime-web')`, which only worked under npm's flat `node_modules`. Now it resolves relative to `@huggingface/transformers` (its parent), so the build works under pnpm's nested layout.
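Resolving a dependency from its parent's location can be sketched with Node's `createRequire`. This is an illustration of the general technique under stated assumptions, not the code in `wxt.config.ts`; the helper name `resolveFromParent` is hypothetical.

```typescript
import { createRequire } from "node:module";

/**
 * Resolve `child` as seen from the installed location of `parent`.
 * Under pnpm, a transitive dependency is typically not reachable from the
 * project root, but it is reachable from the package that depends on it.
 */
function resolveFromParent(parent: string, child: string, fromFile: string): string {
  // Step 1: find the parent's entry file, starting from the given file.
  const rootRequire = createRequire(fromFile);
  const parentEntry = rootRequire.resolve(parent);
  // Step 2: resolve the child relative to the parent's own location.
  return createRequire(parentEntry).resolve(child);
}

// Example call (requires both packages to be installed):
//   resolveFromParent("@huggingface/transformers", "onnxruntime-web", import.meta.url);
```

The same call also works under npm's flat layout, since resolution from the parent's directory walks up to the shared `node_modules`.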
## Files
- `shared/models.ts` — `lm-studio` model + `RemoteEndpointConfig` type, `isRemoteModel()`, storage keys/defaults
- `shared/messages.ts` — `remote:config` message; `remoteConfig` on model load/switch
- `offscreen/remote-model-host.ts`, `offscreen/stream-filter.ts` — **new**
- `offscreen/model-host.ts`, `entrypoints/offscreen/main.ts` — backend swapping + shared stream filter
- `background/message-router.ts`, `entrypoints/background.ts` — config persistence + session-storage access
- `content/chat-overlay.ts`, `content/gem-icon.ts`, `entrypoints/content.ts` — settings UI, drag, session-hide
- `wxt.config.ts` — pnpm-compatible ORT resolve
- `README.md` — docs for all three features
## Testing
- `wxt build --mode development` ✔ — extension builds into `.output/chrome-mv3-dev`.
- Manual: load unpacked, start LM Studio's local server with a Gemma model, select **LM Studio (local)**; drag the icon; hide/show it for the session.
## Notes for reviewers
- `public/ort/ort-wasm-simd-threaded.asyncify.mjs` changed because the pinned `onnxruntime-web` dev version copies a matching loader. Revert if you want to keep the previously committed loader.
- `pnpm compile` still reports pre-existing errors (`chrome`, `navigator.gpu`, transformers strict-null) because the repo doesn't declare `@types/chrome` / `@webgpu/types`. These predate this PR and don't affect the build. Happy to add those devDependencies in a follow-up if desired.
…status
- Chat opens adjacent to the gem icon (above when there is room, below otherwise), and both move together as a linked unit
- Chat window is draggable by its header; dragging either the window or the icon moves the other by the same delta; position is saved via the icon's storage key
- fix: `RemoteModelHost.getCurrentModelId()` returned null during `load()`, causing `model:status` 'loading' to fall back to `initialModelId` (Gemma E2B) and show "Downloading Gemma 4 E2B…" instead of "Connecting to LM Studio…"; it now tracks `loadingModelId` (same pattern as `GemmaModelHost`)
- `content.ts` tracks `currentModelId` independently of `initialModelId` so the fallback in `model:status` is always the actual selected model
…dels
- Chat window is resizable (CSS resize + ResizeObserver); size persists via `storage.local` and is restored across pages/sessions
- Model name is shown in the header below the "Gemma Gem" title and updates on model switch
- Settings panel is scrollable when taller than 55% of window height
- `moveTo`/`getSize` use actual container dimensions (adapts after user resize)
- LM Studio config: ⟳ button and debounced URL change trigger a fetch of `/v1/models` from the endpoint; results appear as clickable chips that fill the "Model name" field; fetches go through the background service worker (avoids CORS / mixed-content issues in the content script)
- `remote:fetch_models` / `remote:models_result` message types added
…ON fetch
- Replace CSS `resize: both` (bottom-right only) with 8 custom transparent handle divs (n/s/e/w + 4 corners); each correctly updates width, height, and left/top position, clamped to viewport and min/max dimensions
- Add `ngrok-skip-browser-warning: 1` header to all remote fetches (`authHeaders` in `RemoteModelHost` + background `fetch_models` handler) so ngrok tunnels return JSON instead of the browser-warning HTML page
- Header: "+ New" button clears messages and resets agent history (`onNewChat` also resets `stopped`/`modelReady`/`shownLoadingMessage` so the next send behaves like a fresh session)
- LM Studio config: ⟳ button moved next to the "Model name" field (removed from the URL row); fetched models populate a `<datalist>` connected to the text input. Click ⟳, then type or select from the native browser dropdown; if only one model is loaded it auto-fills; fetch errors are shown inline below the field
- Removed old chips / models-section UI