codebase-index is a local-first codebase indexing tool that helps Claude Code,
Codex CLI, OpenCode, and other AI coding agents find relevant files, symbols, and
references without scanning an entire repository.
codebase-index is a private, offline retrieval layer for AI code search. It builds a SQLite index of your repository, extracts symbols with Tree-sitter, ranks matches with hybrid retrieval, and returns compact file:line ranges that an AI coding agent can read instead of opening broad file sets.
Use it when you want Cursor-like codebase awareness in terminal-based AI tools while keeping source code, snippets, and search metadata on your machine.
If you are opening this repository for the first time, follow this order:
If you only need the shortest path, run:
pip install "codebase-index @ git+https://github.com/denfry/[email protected]"
cd your-project
codebase-index init # prompts for Claude Code / Codex CLI / OpenCode
codebase-index index
codebase-index search "where is authentication implemented?"
1.2.1 is released. The current release includes repository discovery,
SQLite FTS5 storage, Tree-sitter symbols and references, hybrid ranking, graph
impact analysis, token-budgeted retrieval packets, optional local embeddings,
hooks/watch support, multi-CLI installation, MCP server support, and a tested
GitHub-only pipx install path.
The 1.2.1 release adds skill auto-update/rollback commands and version stamps
so installed skills stay in sync with the package automatically.
The 1.2.0 release added HTML graph export, auto-indexing search commands, and
updated skill resources.
See CHANGELOG.md and
docs/ROADMAP.md.
MCP is now available as a stdio server via codebase-index mcp --root <repo>.
It exposes healthcheck, search_code, find_symbol, find_refs,
impact_of, explain_code, and index_stats; see docs/MCP.md.
You: "Where is user authentication implemented?"
Agent: searches local index (symbols + FTS5 + graph)
reads only 3 ranked files instead of scanning 60
answers with citations: src/auth/AuthService.ts:12-148
For most users, install the package from the tagged GitHub release and run
init inside the repository you want to index:
pip install "codebase-index @ git+https://github.com/denfry/[email protected]"
cd your-project
codebase-index init # choose Claude Code, Codex CLI, OpenCode, or all
codebase-index index
In a non-interactive script, pass a target explicitly:
codebase-index init --target auto # install into detected AI CLIs
codebase-index init --target codex # write AGENTS.md + Codex resources
codebase-index init --target claude # write .claude/skills/codebase-index
codebase-index init --target opencode # write OpenCode command + agent files
One command in Claude Code:
/plugin marketplace add denfry/codebase-index
/plugin install codebase-index@codebase-index
Or just ask: "install the codebase-index plugin".
What happens on first run: when a session starts, a SessionStart hook
(scripts/bootstrap.sh / .ps1) creates a private Python virtual environment under
~/.claude/plugins/data/codebase-index-*/venv and installs the pinned
codebase-index package (from requirements.lock) into it — using uv if present,
otherwise python -m venv + pip. It reinstalls only when the lock file changes.
Nothing is installed globally; uninstalling the plugin removes the data directory.
Prerequisite: Python 3.11+ on your PATH. The first install needs network access to
fetch the package; later sessions are offline. The skill builds its index on
your first codebase question, so there is no manual index step.
Distribution note: the plugin bootstrap installs the pinned requirement from
requirements.lock. In 1.2.1, that lock points at the tagged GitHub release
instead of PyPI. You can override it with CBX_INSTALL_SPEC when testing a local
checkout or a different Git ref.
AI coding agents struggle with large repositories when they rely on broad file
reads, grep output, or user-provided context. codebase-index gives those agents
a ranked local retrieval packet before they read source files.
- Token waste — Scanning entire files or running broad grep/glob queries burns through the context window on irrelevant content.
- No symbol awareness — Standard search can't distinguish a function definition from a call, or a class from a variable.
- No ranking — Grep returns all matches with no relevance ordering. The agent must read everything.
- No context — Grep doesn't know which files are related or what to read next.
- Cloud dependency — External code indexing services send your proprietary code to remote servers.
Developers get Cursor-like codebase awareness in Claude Code, Codex CLI, and OpenCode without leaving the terminal or sending code to a remote indexing service.
codebase-index builds a local hybrid index that combines:
- Symbol search — Tree-sitter AST parsing extracts classes, functions, methods, and variables across the supported code-language set.
- Full-text search — SQLite FTS5 for fast lexical search across code chunks.
- Path search — File path matching for location-aware queries.
- Optional semantic search — Vector embeddings for similarity-based retrieval (opt-in, local by default).
- Dependency graph — Import, call, and reference edges for impact analysis and graph expansion.
- Token-budgeted output — Ranked retrieval packets with specific line ranges, not whole files.
The AI agent reads only the recommended files and line ranges, not the entire repository.
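As an illustration of how several retrieval channels might be combined into one ranking, here is a minimal sketch of weighted score fusion. The channel names, the weights, and the `fuse_scores` helper are assumptions made for this example; the actual ranking formula used by codebase-index is not described in this README.

```python
# Hypothetical channel weights; the real ranking weights are not documented here.
WEIGHTS = {"symbol": 0.5, "fts": 0.3, "path": 0.2}


def fuse_scores(channel_scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Combine per-channel scores (each assumed to be in [0, 1]) into one ranked list."""
    totals: dict[str, float] = {}
    for channel, scores in channel_scores.items():
        weight = WEIGHTS.get(channel, 0.0)  # unknown channels contribute nothing
        for path, score in scores.items():
            totals[path] = totals.get(path, 0.0) + weight * score
    # Highest fused score first; ties broken by path for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranked = fuse_scores({
    "symbol": {"src/auth/AuthService.ts": 1.0},
    "fts": {"src/auth/AuthService.ts": 0.8, "src/routes/auth.ts": 0.9},
    "path": {"src/routes/auth.ts": 1.0, "src/middleware/auth.ts": 1.0},
})
for path, score in ranked:
    print(f"{score:.2f}  {path}")
```

A file that matches in more than one channel accumulates score from each, which is why a symbol hit plus a full-text hit outranks a path-only match in this sketch.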
/codebase-index "where is user authentication implemented?"
Expected output:
Top matches:
┌──────┬──────────────────────────┬──────────────────────────┬───────┬──────────────────────────────┐
│ Rank │ Path │ Symbols │ Score │ Reason │
├──────┼──────────────────────────┼──────────────────────────┼───────┼──────────────────────────────┤
│ 1 │ src/auth/AuthService.ts │ AuthService, login │ 0.92 │ exact symbol match │
│ 2 │ src/routes/auth.ts │ loginHandler, logout │ 0.78 │ FTS match · 4 callers │
│ 3 │ src/middleware/auth.ts │ requireAuth │ 0.65 │ path match · FTS match │
└──────┴──────────────────────────┴──────────────────────────┴───────┴──────────────────────────────┘
Recommended reads:
1. src/auth/AuthService.ts:12-148
reason: matched AuthService, login(), validatePassword()
2. src/routes/auth.ts:20-91
reason: /login route calls AuthService.login()
3. src/middleware/auth.ts:5-42
reason: auth middleware validates sessions
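The recommended reads above use a `path:start-end` form. A consumer could turn such a citation into a bounded read along these lines. This is a sketch only: `parse_citation` and `read_range` are hypothetical helpers written for this example, not part of the codebase-index package.

```python
import re
from pathlib import Path

# Matches "some/path.ext:12-148"; the greedy path group tolerates colons in the path.
CITATION = re.compile(r"^(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_citation(citation: str) -> tuple[str, int, int]:
    """Split a 'path:start-end' citation into (path, start, end), 1-indexed inclusive."""
    match = CITATION.match(citation.strip())
    if match is None:
        raise ValueError(f"not a path:start-end citation: {citation!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in {citation!r}")
    return match["path"], start, end


def read_range(root: Path, citation: str) -> list[str]:
    """Return only the cited lines instead of the whole file."""
    path, start, end = parse_citation(citation)
    lines = (root / path).read_text(encoding="utf-8").splitlines()
    return lines[start - 1 : end]


print(parse_citation("src/auth/AuthService.ts:12-148"))
```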
If you are new to this repo, start with docs/QUICKSTART.md.
If you want all install options and troubleshooting, use docs/INSTALLATION.md.
Multi-CLI installer (Claude Code + Codex CLI + OpenCode): one command via
install.sh / install.ps1 — see docs/installer.md.
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/denfry/codebase-index/main/install.sh | sh
# Windows PowerShell
irm https://raw.githubusercontent.com/denfry/codebase-index/main/install.ps1 | iex
cd your-project
pip install "codebase-index @ git+https://github.com/denfry/[email protected]"
codebase-index init
codebase-index index
codebase-index requires Python 3.11 or newer.
If codebase-index init --target opencode fails with:
ModuleNotFoundError: No module named 'importlib.resources.abc'; 'importlib.resources' is not a package
the pipx environment was likely created with an older Python version. Reinstall codebase-index using Python 3.11+ explicitly:
pipx uninstall codebase-index
py -0p
pipx install --python "<path-to-python-3.11-or-newer>\python.exe" "git+https://github.com/denfry/[email protected]"
For example:
pipx install --python "C:\Users\you\AppData\Local\Programs\Python\Python312\python.exe" "git+https://github.com/denfry/[email protected]"
Then run initialization again:
codebase-index init --target opencode
codebase-index index
pipx install "git+https://github.com/denfry/[email protected]"
cd your-project
codebase-index init --target auto
codebase-index index
git clone https://github.com/denfry/codebase-index.git
cd codebase-index
pip install -e ".[dev]"
PyPI, uvx, Homebrew, signed release checksums, and SBOMs are important for a
tool that reads entire repositories, but they are not all verified as shipped in
1.2.1. Target install story:
uvx codebase-index init
pipx install codebase-index
brew install denfry/tap/codebase-index
codebase-index doctor
See docs/INSTALLATION.md for the full guide, including optional extras (embeddings, watch mode) and troubleshooting.
# Initialize the index for your project
codebase-index init
# Build the index
codebase-index index
# Search for something
codebase-index search "where is authentication implemented?"
# Look up a specific symbol
codebase-index symbol "AuthService"
# Find callers and references
codebase-index refs "AuthService.login"
# Analyze impact of a change
codebase-index impact "src/auth/AuthService.ts"
# View index statistics
codebase-index stats
# Run diagnostics
codebase-index doctor
Add --json to any command for machine-readable output.
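A script or agent wrapper could consume the machine-readable output roughly as follows. This sketch assumes that `codebase-index` is on your PATH and that `--json` may be appended after the query argument; the shape of the returned JSON is not documented in this README, so the result is returned as-is without assuming any field names. `build_command` and `run_json` are hypothetical helper names.

```python
import json
import subprocess


def build_command(subcommand: str, argument: str) -> list[str]:
    """Build an argv list such as: codebase-index search "<query>" --json."""
    # Assumption: the --json flag is accepted after the positional argument.
    return ["codebase-index", subcommand, argument, "--json"]


def run_json(subcommand: str, argument: str, cwd: str = "."):
    """Run the CLI inside the project directory and parse its stdout as JSON."""
    completed = subprocess.run(
        build_command(subcommand, argument),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


print(build_command("search", "where is authentication implemented?"))
```

Passing the command as a list avoids shell quoting problems when the query contains spaces or question marks.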
User question
↓
CLI instructions or skill
↓
Hybrid retrieval
├─ Path search
├─ Symbol search (Tree-sitter AST)
├─ SQLite FTS5 full-text search
├─ Optional embeddings (vector search)
└─ Graph expansion (callers, imports, references)
↓
Ranked retrieval packet
↓
Agent reads only the recommended line ranges
↓
Answer with precise file:line citations
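The "ranked retrieval packet" step in the flow above can be pictured as a greedy selection under a token budget. The sketch below is one possible reading, not the tool's actual algorithm: the `Candidate` type, the flat ten-tokens-per-line estimate, and `build_packet` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    start: int  # first line, 1-indexed
    end: int    # last line, inclusive
    score: float


def estimate_tokens(candidate: Candidate) -> int:
    """Hypothetical cost model: assume roughly 10 tokens per source line."""
    return (candidate.end - candidate.start + 1) * 10


def build_packet(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Take candidates in score order, skipping any that no longer fit the budget."""
    packet: list[Candidate] = []
    remaining = budget
    for candidate in sorted(candidates, key=lambda c: -c.score):
        cost = estimate_tokens(candidate)
        if cost <= remaining:
            packet.append(candidate)
            remaining -= cost
    return packet


packet = build_packet(
    [
        Candidate("src/auth/AuthService.ts", 12, 148, 0.92),
        Candidate("src/routes/auth.ts", 20, 91, 0.78),
        Candidate("src/middleware/auth.ts", 5, 42, 0.65),
    ],
    budget=2000,
)
for item in packet:
    print(f"{item.path}:{item.start}-{item.end}")
```

With a budget of 2000 estimated tokens, the top-ranked range is kept, the second no longer fits, and the smaller third range still does. A real implementation might instead trim ranges rather than drop them.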
- Local-first indexing — All data stays on your machine
- No network by default — Zero external API calls out of the box
- Respects ignore files — .gitignore, .claudeignore, .codeindexignore
- SQLite storage — Fast, reliable, single-file database
- FTS5 lexical search — Full-text search with code-aware tokenization
- Tree-sitter AST parsing — Tier-A symbol extraction for Python, JavaScript, TypeScript, Java, Go, Rust, C, C++, C#, Ruby, PHP, and Kotlin; Tier-B generic extraction for code languages with a loadable grammar such as Lua
- Symbol extraction — Classes, functions, methods, variables with line ranges
- Incremental indexing — Only changed files are re-indexed
- Token-budgeted output — Configurable max output size
- Secret redaction — Masks keys, tokens, and credentials in snippets
- Optional embeddings — Local or remote vector search (opt-in)
- Optional hooks/watch — Auto-update index after file edits
- Multi-CLI setup — Claude Code, Codex CLI, and OpenCode instructions
- MCP server — stdio MCP tools for search, symbols, refs, impact, explain, health, and stats
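To make the incremental indexing feature concrete, here is a minimal sketch of change detection by content hash. Whether codebase-index tracks changes with hashes, modification times, or something else is not stated here, so treat the approach and the `changed_files` helper as assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, known: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the tree under root with stored digests.

    Returns (changed_or_new, deleted) as sorted lists of POSIX-style relative paths.
    """
    current = {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }
    changed = sorted(rel for rel, digest in current.items() if known.get(rel) != digest)
    deleted = sorted(set(known) - set(current))
    return changed, deleted
```

Only the paths in the first list would need re-parsing, and the paths in the second list would be dropped from the index.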
codebase-index is designed with privacy as a first principle:
- No telemetry — No usage data, analytics, or crash reports are collected or transmitted.
- No external API calls by default — All indexing, storage, and search happen locally.
- Does not index sensitive files — .env, private keys, certificates, tokens, and credential files are excluded before parsing.
- Respects ignore files — .gitignore, .claudeignore, .codeindexignore, and .cursorignore are all honored.
- Index stored locally — SQLite database in .claude/cache/codebase-index/ (gitignored by default).
- Optional embeddings are local by default — External embedding APIs require explicit opt-in with warnings.
- Secret redaction — Snippets are scrubbed for AWS keys, private keys, JWTs, bearer tokens, and connection strings before output.
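Snippet redaction of the kind listed above is commonly done with pattern matching. The sketch below shows the idea for three of the listed secret types. The regular expressions and the `[REDACTED:...]` placeholder format are assumptions chosen for this example and are not the patterns shipped with codebase-index.

```python
import re

# Illustrative patterns only; a production list would be broader and more careful.
PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    # JWTs: three base64url segments, the first starting with "eyJ".
    ("jwt", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    # HTTP bearer credentials: "Bearer <token>".
    ("bearer_token", re.compile(r"\bbearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE)),
]


def redact(text: str) -> str:
    """Replace each matched secret with a labeled placeholder."""
    for name, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


print(redact('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
print(redact("Authorization: Bearer abc.def-123"))
```

Redacting at output time means a secret that slipped past the file-level exclusions still does not reach the agent's context.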
See docs/SECURITY_MODEL.md for the full security model and threat analysis.
There are three benchmark surfaces today:
- Public benchmark suite in tests/benchmark_public.py: reproducible multi-language fixture with Recall@1/3/5, MRR, nDCG, answer-correctness proxy, token economy, language breakdown, freshness latency, graph tasks, and scale counters.
- Smoke benchmark on sample_repo: validates the CLI is fast and stable on a tiny fixture, but it is not evidence of production retrieval quality.
- Honest benchmark on a real Java repository: tests/benchmark_honest.py compares codebase-index against a disciplined rg + read-window baseline on 10 realistic questions. Results are documented in tests/benchmark_honest_RESULTS.md.
Run the public suite:
python tests/benchmark_public.py --workdir .tmp-public-benchmark
Current honest benchmark headline:
| Metric | Result |
|---|---|
| Repo | 303 Java files, ~55k LOC |
| Retrieval quality | recall@3: 70% index vs 40% rg baseline |
| Token economy | ~13x fewer answer tokens than rg + 80-line windows |
| Verified language impact | Java symbols fixed from 0 to 3,543 symbols |
The public suite now has the metric framework. It still needs larger public or documented external repos for 10k/100k/1M LOC scale claims and deeper framework graph tasks. See docs/BENCHMARKS.md.
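For readers unfamiliar with the retrieval metrics named above, this sketch shows the textbook definitions of Recall@k and MRR. It is not taken from tests/benchmark_public.py, and the benchmark scripts may define or aggregate these metrics differently.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


runs = [
    (["a.java", "b.java", "c.java"], {"b.java"}),  # first hit at rank 2
    (["x.java", "y.java"], {"x.java"}),            # first hit at rank 1
]
print(mean_reciprocal_rank(runs))  # 0.75
```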
├── skill/ # Source instruction package (SKILL.md, scripts, examples)
├── skills/ # Plugin skill copy
├── src/codebase_index/ # Python package (CLI, indexer, retrieval, storage)
├── docs/ # Documentation (architecture, schema, security, FAQ)
├── examples/ # Sample queries, retrieval output, demo project
├── tests/ # Test suite with fixture repositories
├── bin/ # Plugin CLI wrappers (cbx, codebase-index)
├── scripts/ # Bootstrap scripts (bootstrap.sh, bootstrap.ps1)
├── hooks/ # Plugin hooks (hooks.json)
├── .claude-plugin/ # Plugin manifest + marketplace catalog
├── .github/ # Issue templates, CI workflows, PR template
├── README.md # This file
├── LICENSE # MIT License
├── CHANGELOG.md # Release history
├── CONTRIBUTING.md # Contributor guide
├── SECURITY.md # Security policy
├── ROADMAP.md # Development milestones
├── requirements.lock # Pinned install spec for bootstrap
└── pyproject.toml # Package configuration
Create .codeindex.json in your project root:
{
"index": {
"max_file_bytes": 1048576,
"chunk_size": 500,
"chunk_overlap": 50
},
"embeddings": {
"backend": "noop",
"allow_external": false
}
}
- .codeindexignore — Tool-specific ignore patterns (highest priority)
- .gitignore — Standard git ignore patterns
- .claudeignore — Claude-specific ignore patterns
.claude/cache/codebase-index/
├── index.sqlite # SQLite database with FTS5
└── config.json # Resolved configuration
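One way to picture how a "resolved configuration" might be produced is a deep merge of the project's .codeindex.json over built-in defaults. The sketch below assumes that behavior and reuses the values from the example file as defaults; neither the merge strategy nor these default values are confirmed by this README (only the "noop" embeddings backend is stated as a default), and `resolve_config` is a hypothetical helper.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Assumed defaults, copied from the example .codeindex.json above.
DEFAULTS: dict[str, Any] = {
    "index": {"max_file_bytes": 1048576, "chunk_size": 500, "chunk_overlap": 50},
    "embeddings": {"backend": "noop", "allow_external": False},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where override wins, merging nested dicts key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(project_root: Path) -> dict[str, Any]:
    """Load .codeindex.json if present and layer it over the defaults."""
    config_path = project_root / ".codeindex.json"
    if not config_path.exists():
        return copy.deepcopy(DEFAULTS)
    user_config = json.loads(config_path.read_text(encoding="utf-8"))
    return deep_merge(DEFAULTS, user_config)


print(deep_merge(DEFAULTS, {"index": {"chunk_size": 800}})["index"])
```

A partial user file then only needs to list the keys it changes; everything else falls back to the defaults.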
codebase-index init can install instructions for three AI coding CLIs:
| CLI | Files written by init | Best command |
|---|---|---|
| Claude Code | .claude/skills/codebase-index/ | codebase-index init --target claude |
| Codex CLI | AGENTS.md + .codex/skills/codebase-index/ | codebase-index init --target codex |
| OpenCode | .opencode/commands/ + .opencode/agents/ + resources | codebase-index init --target opencode |
Use codebase-index init --target auto to install into detected CLIs, or
codebase-index init --target all to write every supported integration.
The Claude Code skill is defined in skill/SKILL.md with
YAML frontmatter for automatic selection.
Example .claude/CLAUDE.md:
## Codebase Questions
Before answering any question about this project's code:
1. Use the codebase-index skill to search the local index first.
2. Read only the recommended line ranges — do not scan entire files.
3. Answer with file:line citations.
Configure automatic index updates in .codeindex.json:
{
"hooks": {
"post_tool_use": {
"enabled": true,
"events": ["Write", "Edit"],
"command": "codebase-index update --quiet"
}
}
}
See skill/examples/ for full examples.
No. codebase-index is not a replacement for Cursor or any IDE. It is a
local retrieval layer for terminal AI coding agents. You still use Claude Code,
Codex CLI, OpenCode, or another agent as your primary interface.
No. By default, codebase-index is completely local-first and offline. All indexing, storage, and search happen on your machine. External embeddings are opt-in only and require explicit configuration.
Yes. The default configuration disables embeddings entirely (backend = "noop"). Search uses SQLite FTS5, Tree-sitter symbol extraction, path matching, and graph expansion. Embeddings are an optional enhancement.
Yes. The index is incremental — only changed files are re-indexed. SQLite with FTS5 handles large datasets efficiently. Generated files, dependencies, and binaries are excluded automatically.
Grep returns all matches with no ranking, no symbol awareness, and no context about related files. codebase-index combines lexical search with symbol extraction and graph expansion to return ranked, contextual results with specific line ranges to read.
Yes. Run codebase-index mcp --root <repo> to expose the local index over stdio
MCP. See docs/MCP.md for tools and client config templates.
Yes. The CLI is agent-agnostic. Any agent that can run shell commands can use
codebase-index, and JSON output (--json) is parseable by other tools.
codebase-index clean
# Or manually: rm -rf .claude/cache/codebase-index/
codebase-index index
We welcome contributions! See CONTRIBUTING.md for the full guide.
Quick start:
git clone https://github.com/denfry/codebase-index.git
cd codebase-index
pip install -e ".[dev]"
pytest
ruff check src/ tests/
See ROADMAP.md for the full milestone plan.
| Milestone | Status | Description |
|---|---|---|
| M0 | ✅ Done | Repository packaging |
| M1 | ✅ Done | SQLite + FTS5 index |
| M2 | ✅ Done | Tree-sitter symbol extraction |
| M3 | ✅ Done | Hybrid retrieval |
| M4 | ✅ Done | Graph expansion |
| M5 | ✅ Done | Token-budgeted retrieval packets |
| M6 | ✅ Done | Optional local embeddings |
| M7 | ✅ Done | Claude Code Skill packaging |
| M7.5 | ✅ Done | One-command plugin install |
| M8 | ✅ Done | Hooks + watch mode |
| M9 | ✅ Done | Public release |