Glean

Glean is a local-first knowledge engine with a Rust core. This repository is still early-stage; the current milestone focuses on a working Cargo workspace and an MCP-compatible stdio server (glean mcp).

Workspace layout

Crate            Role
packages/core    Indexing engine: LanceDB, SQLite shadow, GleanEngine, pipeline
packages/host    Host runtime: daemon loop, MCP router, config editor, status
apps/cli         Single glean binary: Clap, stdio MCP transport, glean daemon, glean index
apps/desktop     Tauri 2 + Vite/React desktop UI; read-only search via glean-host, indexing via sidecar glean daemon

Building

Prerequisites: Rust stable 1.91+ (rustup), cargo, and protoc (the Protocol Buffers compiler, required by LanceDB's Rust dependency chain). On macOS: brew install protobuf; on Debian/Ubuntu: sudo apt-get install protobuf-compiler.

Full workspace builds (cargo test --workspace, cargo clippy --workspace) also compile glean-desktop (Tauri). On Linux, install WebKitGTK / related dev packages so pkg-config finds glib-2.0 (see the apt-get install list in .github/workflows/rust.yml).

cargo build -p glean-cli --release

The binary is emitted as target/release/glean.

Desktop app (Tauri + Vite)

See apps/desktop/README.md. Releases: run pnpm tag on main, then git push origin main and git push origin vX.Y.Z. CI then builds the macOS Apple Silicon (.dmg), Windows NSIS (.exe), and standalone glean-* CLI binaries. See ops notes (local).

Quick start:

cargo build -p glean-cli          # sidecar binary at target/debug/glean
pnpm install
pnpm --filter @glean/desktop tauri dev

MCP (stdio)

Run the MCP server on stdin/stdout:

glean mcp

Manual stdin (debugging)

The server reads newline-delimited JSON: each line must be a full JSON-RPC 2.0 object (jsonrpc, method, and for requests an id). Typing plain text such as initialize is not valid JSON, so you will see Parse error (-32700) on stdout.
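A minimal Python sketch of that framing, assuming only the rule stated above (one complete JSON-RPC 2.0 object per line). The helper name request_line is hypothetical and not part of Glean:

```python
import json


def request_line(method, request_id=None, params=None):
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line.

    Hypothetical helper for illustration; it is not part of Glean.
    """
    message = {"jsonrpc": "2.0", "method": method}
    if request_id is not None:
        # Requests carry an id; notifications omit it.
        message["id"] = request_id
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the output stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


print(request_line("initialize", request_id=1, params={}), end="")
```

Serializing from a dictionary rather than typing JSON by hand avoids the Parse error (-32700) case described above.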

INFO lines from lance::dataset_events may appear on stderr while the process starts. They are normal and are not protocol traffic: the engine opens the Lance dataset before entering the read loop.

One-shot check (one request, then EOF):

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | ./target/release/glean mcp

Expect a single JSON line on stdout with a result payload (capabilities + serverInfo). Two requests on two lines:

printf '%s\n%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list"}' \
  | ./target/release/glean mcp
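The same two-request session can be driven from a script. The sketch below is an illustration under stated assumptions, not a Glean API: it presumes the binary path used above and that each response line carries the id of its request, as JSON-RPC 2.0 prescribes. The function names run_session and parse_responses are hypothetical.

```python
import json
import subprocess


def parse_responses(stdout_text):
    """Index the non-empty stdout lines by their JSON-RPC id."""
    responses = {}
    for line in stdout_text.splitlines():
        if line.strip():
            message = json.loads(line)
            responses[message.get("id")] = message
    return responses


def run_session(binary, requests):
    """Pipe JSON-RPC requests to `<binary> mcp` and return the parsed replies.

    stdin is closed after the last request (EOF), as in the shell examples.
    """
    payload = "".join(json.dumps(request) + "\n" for request in requests)
    completed = subprocess.run(
        [binary, "mcp"],
        input=payload,
        capture_output=True,  # stderr (tracing output) is kept apart from stdout
        text=True,
        check=False,
    )
    return parse_responses(completed.stdout)
```

Calling run_session("./target/release/glean", [...]) with the two request objects from the printf example should yield a dictionary keyed by 1 and 2; a reply containing an error member instead of result signals a protocol-level failure.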

Automated tests (recommended over manual stdio)

MCP behaviour is awkward to poke interactively because stdin needs JSON lines. Use the bundled tests instead:

  • Fast in-process tests (handle_json_line):
    cargo test -p glean-host mcp::router
  • Real subprocess + temp storage (CARGO_BIN_EXE_glean stdio framing):
    cargo test -p glean-cli --test mcp_subprocess

The router tests cover initialize, invalid JSON / plaintext lines, unknown methods, tools/list, and tools/call round-trips for search_semantic and search_files against indexed content.

Cursor (example)

Use command + args form when your client splits arguments:

  • Command: /absolute/path/to/target/release/glean
  • Args: mcp

Some UIs accept a single command string (/path/to/glean mcp); use whichever your editor supports.
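For clients configured through a JSON file, the command + args form might look like the output of this sketch. It assumes the mcpServers layout that Cursor's mcp.json uses; the server key "glean", the placeholder path, and the optional env entry are illustrative choices rather than anything this repository prescribes.

```python
import json


def mcp_server_config(binary_path, workspace_root=None):
    """Build an mcpServers entry that launches `glean mcp` over stdio."""
    server = {"command": binary_path, "args": ["mcp"]}
    if workspace_root is not None:
        # Optional: pin the workspace instead of relying on the client's cwd.
        server["env"] = {"GLEAN_WORKSPACE_ROOT": workspace_root}
    return {"mcpServers": {"glean": server}}


# Placeholder path; substitute the absolute path to your own build.
print(json.dumps(mcp_server_config("/absolute/path/to/target/release/glean"), indent=2))
```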

Implemented protocol surface (MVP)

This server speaks newline-delimited JSON-RPC 2.0 with MCP-shaped methods:

  • initialize
  • tools/list: search_semantic, search_files, read_file_context, get_recent_changes, get_graph_topology
  • tools/call:
      ◦ search_semantic: hybrid search with optional rerank; needs content_state=ready
      ◦ search_files: SQLite fs_entries; partial results during the initial walk; no embedder
      ◦ read_file_context
      ◦ get_recent_changes: discovered files by mtime_ns, not vector-gated
      ◦ get_graph_topology: Lite Graph after content ingest

MCP does not run the indexer; use glean daemon.
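A tools/call request wraps the tool name and its arguments in params, following the MCP convention of name plus arguments. The sketch below builds such a line; the argument key query is an assumption, since the real input schema is whatever tools/list reports for each tool.

```python
import json


def tools_call_line(request_id, tool_name, arguments):
    """Return one newline-terminated JSON-RPC line for an MCP tools/call request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# "query" is a guessed argument name; check the schema returned by tools/list.
print(tools_call_line(3, "search_files", {"query": "README"}), end="")
```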

Environment:

  • GLEAN_STORAGE_ROOT: global home — config.toml, cache/embedding/ (FastEmbed model.onnx + tokenizer downloads), cache/reranker/, logs/ (defaults to ~/.glean).
  • GLEAN_WORKSPACE_ROOT: workspace root for MCP / daemon (optional; defaults to cwd). Per-project index lives at <workspace>/.glean/ (metadata/index.db, vectors/ only — no config file there).
  • GLEAN_LOG: optional filter for tracing on stderr and rolling files (same syntax as tracing_subscriber::EnvFilter, e.g. info, glean_core=debug). If unset: MCP / glean status use info on both stderr and rolling files; glean daemon uses info for rolling files and warn on stderr. The glean binary does not read RUST_LOG.
  • Runtime TOML (optional): merged from $GLEAN_STORAGE_ROOT/config.toml only (defaults + global).
      ◦ [indexing].watch_interval is in seconds; 0 means the daemon runs the initial sync only.
      ◦ Nested Git repositories under a workspace are discovered but not auto-indexed (use Desktop Start indexing or append to metadata/index_requests.jsonl).
      ◦ See .docs/04-Ops-Security/local-storage-model.md for the multi-workspace index layout.
      ◦ glean config list / init / set operate on the global config; glean models pull rerank pre-downloads the BGE cache under global storage.
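The path rules in the list above can be summarized in a short sketch. It only restates the documented defaults (~/.glean for global storage, cwd for the workspace); resolve_layout and its dictionary keys are hypothetical names, not Glean's code, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_layout(env=None, cwd=None, home=None):
    """Resolve the global and per-workspace paths described above."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # Empty values are treated as unset here; that is an assumption of this sketch.
    storage = Path(env.get("GLEAN_STORAGE_ROOT") or home / ".glean")
    workspace = Path(env.get("GLEAN_WORKSPACE_ROOT") or cwd)
    index_dir = workspace / ".glean"
    return {
        "config": storage / "config.toml",  # global config only
        "embedding_cache": storage / "cache" / "embedding",
        "reranker_cache": storage / "cache" / "reranker",
        "logs": storage / "logs",
        "index_db": index_dir / "metadata" / "index.db",
        "vectors": index_dir / "vectors",
    }
```

Passing env, cwd, and home explicitly keeps the function easy to check without touching the real environment.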

Rolling logs also land under {GLEAN_STORAGE_ROOT}/logs/ (cli.yyyy-mm-dd / daemon.yyyy-mm-dd). Do not print diagnostics to stdout while running glean mcp. For quick inspection from a terminal, run glean logs (-n line count, --source cli|daemon|all); it does not install the tracing subscriber.
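To locate a given day's file without going through glean logs, the naming pattern above can be computed directly. This is a sketch: rolling_log_path is a hypothetical helper, and whether the date suffix follows local time or UTC is not stated here, so treat the day argument as an assumption to verify against your logs/ directory.

```python
from datetime import date
from pathlib import Path


def rolling_log_path(storage_root, source, day):
    """Return the expected log file for `source` ("cli" or "daemon") on `day`."""
    if source not in ("cli", "daemon"):
        raise ValueError("source must be 'cli' or 'daemon'")
    # date.isoformat() yields yyyy-mm-dd, matching the documented suffix.
    return Path(storage_root) / "logs" / f"{source}.{day.isoformat()}"


print(rolling_log_path(Path.home() / ".glean", "daemon", date.today()))
```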

Embedding model & rebuilding the vector index

Chunks are embedded with FastEmbed using AllMiniLM-L6-v2 (384-dimensional Float32 vectors) and stored in LanceDB document_chunks (see .docs/02-Developer-Guide/lancedb-schema.md). The ONNX artifacts download on first use into $GLEAN_STORAGE_ROOT/cache/embedding/ (not the process working directory — required for packaged desktop apps).
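As a rough sizing aid, the raw vector payload follows directly from the dimensions above: 384 Float32 values at 4 bytes each is 1536 bytes per chunk. The sketch below covers only that payload; chunk text, metadata, and any LanceDB index overhead are not included, and raw_vector_bytes is a hypothetical helper.

```python
EMBEDDING_DIM = 384  # AllMiniLM-L6-v2 output size, as stated above
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(chunk_count, dim=EMBEDDING_DIM):
    """Bytes needed for the raw Float32 vectors alone (no metadata, no index)."""
    return chunk_count * dim * BYTES_PER_FLOAT32


print(raw_vector_bytes(1))        # 1536
print(raw_vector_bytes(100_000))  # 153600000
```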

If you upgrade Glean and see a LanceDB schema mismatch error, stop any running Glean processes, delete <workspace>/.glean/vectors (or the whole .glean folder), then run glean daemon again to reindex that workspace.
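The cleanup step can be scripted. The following is a minimal sketch of the deletion only, assuming the per-workspace layout described earlier; reset_vectors is a hypothetical helper, and it neither stops running processes nor restarts glean daemon for you.

```python
import shutil
from pathlib import Path


def reset_vectors(workspace_root, whole_index=False):
    """Delete <workspace>/.glean/vectors, or all of .glean when whole_index is True.

    Returns the removed path, or None if there was nothing to delete.
    Stop any running glean processes before calling this.
    """
    index_dir = Path(workspace_root) / ".glean"
    target = index_dir if whole_index else index_dir / "vectors"
    if not target.is_dir():
        return None
    shutil.rmtree(target)
    return target
```

Deleting only vectors/ leaves metadata/index.db in place; pass whole_index=True to remove the entire .glean folder instead.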

Manual index: glean index enqueues a walk when the daemon is already running, or runs a one-shot drain when it is not (--wait polls until idle).

Verification loop

chmod +x scripts/verify_rust.sh
./scripts/verify_rust.sh

Optional: Cursor Hooks (local only)

This repo may ignore .cursor/ for open-source hygiene. If you want automatic verification after an Agent completes a turn, configure a user-level Cursor hook (for example on stop) to run scripts/verify_rust.sh.

Treat .github/workflows/rust.yml as the shared PR / main CI gate (fmt, clippy, and tests; it excludes glean-desktop on Linux). Desktop releases: run pnpm tag locally, then push the tag to trigger release-desktop.yml; see .docs/04-Ops-Security/desktop-release.md.

Contributors

  • Optional Cursor Hooks (e.g. on stop) pointing at scripts/verify_rust.sh are a local productivity aid. They are not required for correctness.
  • rust.yml: PR + non-release pushes to main; --exclude glean-desktop (faster than full workspace).
  • scripts/verify_rust.sh: local full workspace including desktop/Tauri sidecar prep.
  • Releases: download the .dmg, the Windows NSIS setup .exe, and the glean-* CLI binaries from GitHub Releases rather than relying only on the auto-generated Source code zip.

License

Licensed under the Apache License, Version 2.0. See LICENSE.

About

A local-first knowledge engine and RAG infrastructure powered by Rust, featuring MCP support, hybrid search (BM25 + Vector), and LanceDB integration
