Merged
5 changes: 5 additions & 0 deletions .changeset/public-repo-sharing.md
@@ -0,0 +1,5 @@
---
"gitsema": minor
---

Add public repo sharing: persisted repos can now be flagged `public` (`gitsema repos visibility <repo-id> public|private`). A non-owner caller who indexes an existing public repo is auto-granted `read` access. Registering a brand-new public repo is gated by `auth.allowPublicAutoIndex`/`GITSEMA_PUBLIC_AUTO_INDEX`, and non-owner re-index triggers are limited by a per-user throttle (`auth.minReindexIntervalSeconds`/`GITSEMA_MIN_REINDEX_INTERVAL_SECONDS`).
31 changes: 29 additions & 2 deletions CLAUDE.md
@@ -27,6 +27,30 @@ When implementing a new feature or phase:

---

## Simplify passes

When running a `/simplify` pass, any finding that gets **skipped** (out of scope,
too large a refactor, judged not worth doing now) must be written up as an entry
in **`docs/feature-ideas.md`** rather than just mentioned in the chat summary —
otherwise the finding is lost once the session ends.

---

## PR babysitting in this environment

The `send_later` tool (claude-code-remote MCP server) is **not available** in
this environment. CI turning green also does **not** trigger a webhook event —
only CI failures, new review comments, and similar activity do. This means a
subscribed PR can sit at "CI passed" indefinitely with no event to notice it.

When babysitting/watching a PR here, compensate by running a short polling
wait (e.g. a backgrounded `sleep ~7m` via Bash `run_in_background`, repeated as
needed) and re-checking CI status (`pull_request_read` → `get_status` /
`get_check_runs`) after each wait, instead of relying on `send_later` or
webhook delivery alone.

---

## Releases & changesets

This repo uses [changesets](https://github.com/changesets/changesets) for versioning, `CHANGELOG.md` generation, and npm publishing (OIDC trusted publishing — no npm token).
@@ -460,7 +484,7 @@ gitsema index

**Pluggable storage backends (Phase 101–103):** all reads/writes go through async `MetadataStore` / `VectorStore` / `FtsStore` interfaces (`src/core/storage/types.ts`). The default `sqlite` backend wraps the schema below; `postgres` routes metadata + FTS through Postgres (pgvector for vectors), and `qdrant` uses Qdrant for vectors with Postgres for metadata/FTS. Select via `storage.*` config or `GITSEMA_STORAGE_*` env vars (see Configuration), inspect with `gitsema storage info`, and copy between backends with `gitsema storage migrate`.

**Schema overview (current schema v30):**
**Schema overview (current schema v31):**

| Table | Purpose |
|---|---|
@@ -496,6 +520,8 @@ gitsema index
| `repo_grants` | Per-user repo access grants (`read`/`write`/`owner`, optional branch-glob pattern); replaces the binary `repo_tokens` model for new deployments; added in v28 (Phase 123, multi-tenant-auth §5 Phase B) |
| `sso_identities` | Linked external OIDC/SSO identities (`provider` + `external_id` → `user_id`, unique per identity); added in v29 (Phase 124, multi-tenant-auth §5 Phase C) |
| `audit_log` | Identity/authorization audit trail — grant create/revoke, token create/revoke, login success/failure, org membership changes, repo org moves; no FK constraints (historical record outlives referenced rows); added in v30 (Phase 125, multi-tenant-auth §5 Phase D) |
| `repos.visibility` / `repos.owner_user_id` | Repo visibility flag (`private`/`public`) and first-claimer owner; added in v31 (Phase 126, public-repo-sharing) |
| `repo_grants.source` | Provenance of an auto-issued grant, e.g. `auto-public` for attach-as-reader grants; added in v31 (Phase 126, public-repo-sharing) |

**FTS5 note:** Blobs indexed before Phase 11 have no FTS5 content. `--hybrid` search only applies to blobs with FTS5 entries. `--include-content` in evolution dumps also depends on FTS5 content. Use `gitsema backfill-fts` to populate FTS5 content for older index entries.

@@ -519,7 +545,8 @@ gitsema index
- v27 → v28: Added `orgs`, `org_members`, `repo_grants` tables (+ indexes) and a `repos.org_id` column for org/grant authorization (Phase 123 / multi-tenant-auth §5 Phase B)
- v28 → v29: Added `sso_identities` table (+ indexes) for linked external OIDC/SSO identities (Phase 124 / multi-tenant-auth §5 Phase C)
- v29 → v30: Added `audit_log` table (+ indexes) for the identity/authorization audit trail (Phase 125 / multi-tenant-auth §5 Phase D)
- **Current version: 30**
- v30 → v31: Added `visibility` and `owner_user_id` columns (+ index) to `repos`, and a `source` column to `repo_grants`, for public repo sharing (Phase 126 / public-repo-sharing)
- **Current version: 31**

Schema changes require updating both `src/core/db/schema.ts` and the migration logic in `src/core/db/sqlite.ts`.

12 changes: 12 additions & 0 deletions README.md
@@ -124,6 +124,7 @@ All commands support a top-level `--verbose` flag (or `GITSEMA_VERBOSE=1`) for d
| `gitsema repos grants <repo-id>` | List grants on a repo (operator-only) |
| `gitsema repos revoke <repo-id> <username>` | Revoke a user's grants on a repo (operator-only) |
| `gitsema repos move-to-org <repo-id> <org>` | Move a repo to a different org; grants survive untouched (operator-only) |
| `gitsema repos visibility <repo-id> public\|private` | Set a persisted repo's visibility flag, gating attach-as-reader auto-grants (operator-only) |
| `gitsema auth sso link <provider> <external-id> <username>` | Link an external SSO/OIDC identity to an existing user; provider must be in `GITSEMA_SSO_PROVIDERS` (operator-only) |
| `gitsema auth sso unlink <provider> <external-id>` | Unlink an external identity (operator-only) |
| `gitsema auth sso list <username>` | List SSO identities linked to a user (operator-only) |
@@ -247,6 +248,17 @@ Manage persisted repos with `gitsema repos list-persisted` and
`gitsema repos remove <repoId> [--purge]`. See
[`docs/features.md`](docs/features.md#persistent-server-side-repo-storage) for details.

**Public repo sharing (Phases 126–127):** pass `visibility: 'public'` in the
`POST /api/v1/remote/index` request body to flag a repo as public (default
`'private'`). A non-owner authenticated caller indexing an existing public
repo is auto-granted `read` access; registering a *brand-new* public repo
requires `auth.allowPublicAutoIndex` / `GITSEMA_PUBLIC_AUTO_INDEX` (default
off) unless the caller is an operator. Non-owner re-index triggers on a
public repo are throttled to one per `auth.minReindexIntervalSeconds` /
`GITSEMA_MIN_REINDEX_INTERVAL_SECONDS` (default 300s, returns `429` +
`Retry-After`). See
[`docs/features.md`](docs/features.md#public-repo-sharing-phases-126127) for details.
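
A minimal client-side sketch of the request described above, assuming a `fetch`-capable runtime such as Node 18+. Only the endpoint path, the `visibility` field, and the `429` + `Retry-After` response come from the text; the `repoUrl` body field, the Bearer-token header, and all helper names are illustrative assumptions, not confirmed API.

```typescript
// Hypothetical request shape: only `visibility` is documented above.
// `repoUrl` is an assumed field name used purely for illustration.
interface RemoteIndexRequest {
  repoUrl: string
  visibility: 'public' | 'private'
}

// Build a request body; visibility defaults to 'private', matching the documented default.
function buildIndexRequest(
  repoUrl: string,
  visibility: 'public' | 'private' = 'private',
): RemoteIndexRequest {
  return { repoUrl, visibility }
}

// Parse a delta-seconds Retry-After value. Returns null when the header is
// absent or not a plain integer (HTTP-date values are not handled here).
function parseRetryAfterSeconds(header: string | null): number | null {
  if (header === null) return null
  const trimmed = header.trim()
  if (!/^\d+$/.test(trimmed)) return null
  return Number(trimmed)
}

// Send the index request and surface the throttle hint on a 429 response.
// The Authorization scheme is an assumption.
async function triggerIndex(
  baseUrl: string,
  token: string,
  body: RemoteIndexRequest,
): Promise<{ status: number; retryAfterSeconds: number | null }> {
  const res = await fetch(`${baseUrl}/api/v1/remote/index`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  })
  const retryAfterSeconds =
    res.status === 429 ? parseRetryAfterSeconds(res.headers.get('Retry-After')) : null
  return { status: res.status, retryAfterSeconds }
}
```

A caller that receives `status === 429` would typically wait `retryAfterSeconds` before triggering another re-index of the same repo.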

> **Deploying the server?** See the [deployment guide](docs/deploy.md) for Docker /
> docker-compose, systemd, the Postgres + Qdrant backends, key security, backups,
> and per-repo-size tuning. The repo also ships `docker-compose.yml` (Ollama
45 changes: 44 additions & 1 deletion docs/PLAN.md
@@ -4862,10 +4862,53 @@ that deprecated it, and its removal status.
| **126** | §5 Phase 1 | Visibility flag + attach-as-reader | `repos.visibility` (`'private'\|'public'`, default `'private'`) + `ownerUserId` columns; `gitsema repos visibility <repoId> public\|private` CLI (owner/superadmin only); registration-flow change in `src/server/routes/remote.ts` auto-issuing a `repo_grants` reader row when a second user attaches to an existing public repo's shared index. |
| **127** | §5 Phase 2 | First-index gate + refresh throttle | `auth.allowPublicAutoIndex`/`GITSEMA_PUBLIC_AUTO_INDEX` config gate (default `false`) restricting who may register a brand-new public-flagged repo; `auth.minReindexIntervalSeconds` per-`(user, repoId)` refresh throttle returning `429`/`Retry-After`. No hard dependency on Phase 126 beyond the `visibility` column existing. |

**Status:** not started — draft design, scheduled here per `/phase-plan`.
**Status:** ✅ complete *(completed vNEXT)*. Implemented in
`src/server/routes/remote.ts` (registration-flow gate/throttle/attach-as-reader
logic), `src/core/indexing/repoRegistry.ts` (`visibility`/`ownerUserId` columns,
`setRepoVisibility`, `isPublicAutoIndexAllowed`, `getMinReindexIntervalSeconds`),
and `src/cli/commands/repos.ts` (`gitsema repos visibility <repoId> public|private`).
Explicitly out of scope (per the design doc §6): cross-repo blob-level dedup
for forks with different URLs ("shape 2") remains undesigned and unscheduled.

Deviations from the design doc, discovered during implementation:
- **Two independent SQLite databases in a `gitsema tools serve` deployment.**
`getActiveSession()` (cwd-relative `.gitsema/index.db`, used by the entire
Phase 122-125 auth/orgs/grants system and by `authMiddleware`'s
`req.userId` resolution) and `getRegistrySession()`
(`${GITSEMA_DATA_DIR}/registry.db`, cwd-independent, used for persisted
repo clone/index-path bookkeeping since Phase 41) are two separate DB
files, each running the full schema with its own independent `users`/
`repos`/`repo_grants` tables and per-file FK enforcement. The design doc
speaks of "the `repos` table" and "the `users` table" as if unified; this
split predates Phase 126 and was not anticipated by the spec. Resolution:
`registry.db` keeps its original sole purpose (clone/index-path
bookkeeping) and never stores `ownerUserId`; the active DB becomes the
canonical store for `visibility`/`ownerUserId`/`repo_grants`, with
`runIndexJob` performing a dual-write to mirror the repo's `id`/`name`/
`url`/`normalizedUrl`/`clonePath`/`dbPath`/`visibility` row into both DBs
after each successful persisted index, and `ownerUserId` written only to
the active DB's copy. All visibility/ownership/grant reads in the route
handler resolve against the active DB's mirrored row, never `registry.db`'s.
This is a server-deployment-internal data-plumbing detail with no surfaced
CLI/API change; the broader question of whether the active session's
cwd-relative default is the right long-term identity-store location for
`gitsema tools serve` specifically (vs. tying it to `GITSEMA_DATA_DIR`) is
deferred — fixing it would be a breaking change to already-shipped Phase
122-125 behavior, well beyond this track's scope.
- Grant role naming uses the existing `'read'|'write'|'owner'` enum (Phase
123) rather than the design doc's `'reader'` wording — no new role was
introduced.
- "Superadmin" in the design doc's first-index gate is resolved via the
existing "operator" trust boundary (`req.userId === undefined` — local
CLI/global-key/no-auth-required callers), consistent with the Phase
122-125 precedent that operator-equivalent access is a stronger trust
tier than any network role, rather than introducing a new superadmin flag.
- Flipping a repo back to `private` does not auto-revoke previously
auto-issued `repo_grants` rows (per the design doc's own open question in
§7) — they remain until explicitly revoked via `gitsema repos revoke`.
- `repo_grants.source` (`'auto-public'` vs. manual) was added to distinguish
attach-as-reader auto-grants from explicit `gitsema repos grant` grants.
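
Because flipping a repo back to `private` leaves auto-issued grants in place, an operator may want to list them before revoking. The following is a minimal sketch over in-memory rows, not the project's actual query: the `GrantRow` shape and `findLingeringAutoGrants` are hypothetical, and how manual grants are represented in `source` (for example `null`) is an assumption.

```typescript
// Hypothetical in-memory view of a `repo_grants` row.
interface GrantRow {
  repoId: string
  userId: string
  role: 'read' | 'write' | 'owner'
  // 'auto-public' marks attach-as-reader grants; the value used for
  // manually issued grants is assumed here (null or another string).
  source: string | null
}

// Return the grants on a repo that were auto-issued via attach-as-reader.
function findLingeringAutoGrants(grants: GrantRow[], repoId: string): GrantRow[] {
  return grants.filter((g) => g.repoId === repoId && g.source === 'auto-public')
}
```

Each returned user could then be handled with `gitsema repos revoke <repo-id> <username>`, bearing in mind that the command is documented as revoking a user's grants on the repo, not only the auto-issued one.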

---

## Superadmin-Locked Model Set Track (Phases 128–130)
47 changes: 46 additions & 1 deletion docs/feature-ideas.md
@@ -2,7 +2,7 @@

This document tracks upcoming feature ideas that are **not yet in active development** (not in `PLAN.md`) and haven't been **fully designed** (no design file). It's a staging area for "what now?" questions and medium-term product direction.

**Last updated:** 2026-06-23 (added audit log coverage enforcement idea, found during the Phase 125 `/simplify` review; refined public-repo sharing's access-control half into `docs/public-repo-sharing-plan.md` and the superadmin-locked model set idea into `docs/locked-model-set-plan.md`; kept cross-repo blob dedup as an open idea)
**Last updated:** 2026-06-23 (added the public-repo-sharing throttle/policy-extraction idea, found during the Phase 126/127 `/simplify` review; added audit log coverage enforcement idea, found during the Phase 125 `/simplify` review; refined public-repo sharing's access-control half into `docs/public-repo-sharing-plan.md` and the superadmin-locked model set idea into `docs/locked-model-set-plan.md`; kept cross-repo blob dedup as an open idea)
**Audience:** Developers considering next phases; product planning

> **Note:** As of this update, the LSP/MCP remote-delegation foundation this
@@ -473,6 +473,51 @@ shapes (none chosen yet):

---

## Public Repo Sharing: Throttle/Rate-Limit Unification & Policy Extraction

### Problem
- Found during the `/simplify` review of Phase 126/127 (public-repo-sharing).
`checkAndRecordReindexThrottle()` (`src/server/routes/remote.ts`) is a
bespoke per-`(repoId, userId)` cooldown map, separate from the generic
abuse-prevention rate limiter in `src/server/middleware/rateLimiter.ts`
(`express-rate-limit`, keyed by Bearer token/IP, fixed window). They serve
different concerns today: the rate limiter is a global RPM cap, while the
throttle is a business-rule re-index cooldown. Still, having two independent
rate-limiting mechanisms in the same route module is worth revisiting if
a third throttle-shaped requirement shows up.
- The same review flagged that the 2c/2d/2e public-repo gate/throttle/grant
logic in the `POST /api/v1/remote/index` handler (first-index gate,
refresh throttle, attach-as-reader auto-grant) could be extracted into a
single `applyPublicRepoPolicy()` function for readability, but that was
judged too large a restructuring for a cleanup pass on already-shipped
code.

### Intended Behavior
No design committed yet. Two independent, optional follow-ups:
- If a third per-key throttle need appears, consider whether a shared
generic "keyed cooldown" utility (used by both `rateLimiter.ts` and
`checkAndRecordReindexThrottle`) is worth building, vs. keeping them
separate as distinct concerns.
- Extract the public-repo gate/throttle/grant sequence in
`src/server/routes/remote.ts` into a named `applyPublicRepoPolicy()` (or
similar) function once it grows another condition or gets touched again,
rather than as a standalone refactor now.
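
To make the first follow-up concrete, here is one possible shape for a generic keyed cooldown. It is a sketch only: the `KeyedCooldown` class, its `check` method, and the injectable clock are hypothetical and do not describe the existing `checkAndRecordReindexThrottle()` implementation.

```typescript
// Hypothetical generic keyed cooldown; not the project's existing throttle.
class KeyedCooldown {
  private lastAllowedAt = new Map<string, number>()
  private intervalMs: number
  private now: () => number

  // `now` is injectable so the cooldown can be tested without real waiting.
  constructor(intervalMs: number, now: () => number = () => Date.now()) {
    this.intervalMs = intervalMs
    this.now = now
  }

  // Returns 0 and records the attempt when the key is allowed; otherwise
  // returns the remaining wait in whole seconds (suitable for Retry-After).
  check(key: string): number {
    const t = this.now()
    const last = this.lastAllowedAt.get(key)
    if (last !== undefined && t - last < this.intervalMs) {
      return Math.ceil((this.intervalMs - (t - last)) / 1000)
    }
    this.lastAllowedAt.set(key, t)
    return 0
  }
}

// Build a collision-safe key for a (repoId, userId) pair.
function cooldownKey(repoId: string, userId: string): string {
  return JSON.stringify([repoId, userId])
}
```

Rejected attempts deliberately do not reset the timer here, so a caller retrying early is not pushed further back; whether that matches the shipped throttle is not stated in this document.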

### Design Gaps
- [ ] Whether a generic keyed-cooldown abstraction is worth the indirection
given only one current caller (`checkAndRecordReindexThrottle`).
- [ ] Where the line is for "policy extraction" — at what point does the
2c/2d/2e sequence justify its own function vs. staying inline.

### Effort Estimate
- Small either way — both are isolated, mechanical refactors with existing
test coverage to verify against.

### Prerequisites
- None — both are optional cleanups on already-shipped Phase 126/127 code.

---

## Related Issues & Documents

- **Parity tracking:** See `docs/parity.md` for tool availability across interfaces
40 changes: 40 additions & 0 deletions docs/features.md
@@ -369,6 +369,46 @@ audit events — the equivalent operator-only CLI-direct paths (`gitsema repos g
v1, since those paths already require local DB access, a stronger trust boundary than
the network surface this audit trail is primarily meant to cover.

### Public repo sharing (Phases 126–127)

Registration-flow extension layered on top of Phases 122-123's `repo_grants`
model, letting a repo owner opt their persisted repo into shared read access
without minting individual grants by hand. Three axes:
- **Visibility flag** — `repos.visibility` (`'private'` default, `'public'`),
set via `gitsema repos visibility <repoId> public|private` (operator-only —
no network auth boundary on this command). `repos.owner_user_id` records the
first user (or `null` for an operator/no-auth caller) whose registration
request created the repo; first-claimer semantics are preserved across
re-indexes — later registration requests never overwrite it.
- **Attach-as-reader auto-grant** — when an authenticated, non-owner caller
triggers `POST /api/v1/remote/index` against an *existing* `public` repo
they don't already have a grant on, a `read`-role `repo_grants` row is
auto-issued for them with `source: 'auto-public'` (distinguishing it from a
manually issued grant). A caller who already holds a higher role
(`write`/`owner`) is never downgraded.
- **Trigger rights** — registering a *brand-new* repo as `public` requires
`auth.allowPublicAutoIndex` / `GITSEMA_PUBLIC_AUTO_INDEX` (default `false`)
to be enabled, unless the caller is an operator (no `req.userId` — local
CLI/global-key/no-auth-required request, the same stronger-trust-tier
precedent established in Phases 122-125). Once a public repo exists,
non-owner re-index triggers are throttled to at most one per
`auth.minReindexIntervalSeconds` / `GITSEMA_MIN_REINDEX_INTERVAL_SECONDS`
(default 300s) per `(user, repo)` pair, returning `429` + `Retry-After`; the
repo's owner is never throttled.
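
The three axes above can be read as one decision over the caller, the repo, and the config. The sketch below is a pure-function illustration under stated assumptions, not the handler in `src/server/routes/remote.ts`: every type and function name is hypothetical, the ordering of the throttle check before the auto-grant is assumed, and treating operators as exempt from the throttle is also an assumption (the text only says the owner is never throttled).

```typescript
type Role = 'read' | 'write' | 'owner'
type Visibility = 'private' | 'public'

interface ExistingRepo {
  visibility: Visibility
  ownerUserId: string | null // null when an operator/no-auth caller created the repo
}

interface PolicyInput {
  userId: string | undefined // undefined models the operator trust tier
  requestedVisibility: Visibility
  existingRepo: ExistingRepo | null // null means a brand-new registration
  existingRole: Role | null // caller's current grant on the repo, if any
  allowPublicAutoIndex: boolean
  secondsSinceLastReindex: number | null // null if this caller never triggered this repo
  minReindexIntervalSeconds: number
}

type PolicyDecision =
  | { kind: 'reject-first-index' }
  | { kind: 'throttled'; retryAfterSeconds: number }
  | { kind: 'not-applicable' } // private repo: defer to the normal grant checks
  | { kind: 'proceed'; autoGrantRead: boolean }

function decidePublicRepoPolicy(input: PolicyInput): PolicyDecision {
  const isOperator = input.userId === undefined
  const repo = input.existingRepo

  // First-index gate: only brand-new repos flagged public are restricted.
  if (repo === null) {
    const blocked =
      input.requestedVisibility === 'public' && !isOperator && !input.allowPublicAutoIndex
    return blocked ? { kind: 'reject-first-index' } : { kind: 'proceed', autoGrantRead: false }
  }

  // Public-repo rules do not apply to private repos.
  if (repo.visibility !== 'public') return { kind: 'not-applicable' }

  // Owners are never throttled; operators are assumed exempt as well.
  const isOwner = !isOperator && repo.ownerUserId === input.userId
  if (isOperator || isOwner) return { kind: 'proceed', autoGrantRead: false }

  // Refresh throttle for non-owner callers.
  const since = input.secondsSinceLastReindex
  if (since !== null && since < input.minReindexIntervalSeconds) {
    return {
      kind: 'throttled',
      retryAfterSeconds: Math.ceil(input.minReindexIntervalSeconds - since),
    }
  }

  // Attach-as-reader: grant `read` only when the caller holds no grant yet,
  // so an existing `write`/`owner` role is never downgraded.
  return { kind: 'proceed', autoGrantRead: input.existingRole === null }
}
```

Keeping the decision free of I/O makes each branch easy to test in isolation; the caller would translate `throttled` into `429` + `Retry-After` and `autoGrantRead` into a `repo_grants` insert with `source: 'auto-public'`.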

**Implementation note — two independent databases.** A `gitsema tools serve`
deployment has two separate SQLite files: the cwd-relative active session
(`.gitsema/index.db`, the canonical store for the entire Phase 122-125 auth/
orgs/grants system, resolved by `authMiddleware`) and the registry session
(`${GITSEMA_DATA_DIR}/registry.db`, cwd-independent, tracking persisted-repo
clone/index paths since Phase 41). Both run the full schema with independent
per-file FK enforcement, so an `owner_user_id` valid in one is not
automatically valid in the other. `registry.db` keeps its original sole
purpose and never stores `owner_user_id`; the active DB is the canonical
store for `visibility`/`owner_user_id`/`repo_grants`, kept in sync by a
dual-write in `runIndexJob` after each successful persisted index. See
`docs/PLAN.md`'s Phase 126/127 entry for the full deviation note.
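
The dual-write can be pictured with two in-memory maps standing in for the two SQLite files. This is a sketch under assumptions, not the code in `runIndexJob`: `mirrorRepoRow` and the row interfaces are hypothetical, and only the mirrored field list, the "owner lives in the active DB only" rule, and first-claimer semantics are taken from the documentation.

```typescript
// Fields mirrored into both databases, per the deviation note.
interface RepoRow {
  id: string
  name: string
  url: string
  normalizedUrl: string
  clonePath: string
  dbPath: string
  visibility: 'private' | 'public'
}

// The active DB's copy additionally carries the owner.
interface ActiveRepoRow extends RepoRow {
  ownerUserId: string | null
}

// Mirror a repo row into both stores after a successful persisted index.
// `callerUserId` is undefined for an operator/no-auth caller.
function mirrorRepoRow(
  registry: Map<string, RepoRow>,
  active: Map<string, ActiveRepoRow>,
  row: RepoRow,
  callerUserId: string | undefined,
): void {
  // registry.db stand-in: bookkeeping fields only, never the owner.
  registry.set(row.id, { ...row })

  // Active DB stand-in: keep an already-recorded owner (first-claimer
  // semantics); otherwise record the caller, or null for an operator.
  const existing = active.get(row.id)
  const ownerUserId = existing ? existing.ownerUserId : callerUserId ?? null
  active.set(row.id, { ...row, ownerUserId })
}
```

In the real deployment each store enforces its own foreign keys, which is why the owner reference is written only where the matching `users` row is known to exist.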

### Persistent server-side repo storage

`POST /api/v1/remote/index` **persists** the clone + index by default (`persist: true`),
1 change: 1 addition & 0 deletions docs/parity.md
@@ -136,6 +136,7 @@ This table shows which tools/commands are available in which interface. A checkm
| `auth` (login/logout/whoami/token */create-user) | ✓ | — | — | — | — | ✓ | — | — |
| `orgs` (create/list/members */`users` create/list) | ✓ | — | — | — | — | ✓ | — | — |
| `repos grant/grants/revoke/move-to-org` | ✓ | — | — | — | — | ✓ | ✓ | — |
| `repos visibility` | ✓ | — | — | — | — | ✓ | ✓ | — |
| `auth sso link/unlink/list` | ✓ | — | — | — | — | ✓ | ✓ | — |
| `audit log` | ✓ | — | — | — | — | — | — | — |
| `quickstart` / `setup` | ✓ | — | — | — | — | ✓ | ✓ | — |
20 changes: 19 additions & 1 deletion src/cli/commands/repos.ts
@@ -2,7 +2,7 @@ import { Command } from 'commander'
import { createHash, randomBytes } from 'node:crypto'
import { rmSync } from 'node:fs'
import { getActiveSession, getRawDb } from '../../core/db/sqlite.js'
import { addRepo, listRepos, multiRepoSearch, getRegistrySession, getRepo, getRepoDir, removeRepo } from '../../core/indexing/repoRegistry.js'
import { addRepo, listRepos, multiRepoSearch, getRegistrySession, getRepo, getRepoDir, removeRepo, setRepoVisibility } from '../../core/indexing/repoRegistry.js'
import { buildProvider } from '../../core/embedding/providerFactory.js'
import { embedQuery } from '../../core/embedding/embedQuery.js'
import { parsePositiveInt } from '../../utils/parse.js'
@@ -263,6 +263,24 @@ export function reposCommand(): Command {
console.log(`Revoked ${revoked} grant(s) for '${username}' on repo '${repoId}'.`)
})

cmd
.command('visibility <repo-id> <state>')
.description('Set a repo\'s visibility to public or private (Phase 126) — operator-only, no network auth boundary')
.action((repoId: string, state: string) => {
if (state !== 'public' && state !== 'private') {
console.error("Error: state must be 'public' or 'private'")
process.exit(1)
}
const session = getRegistrySession()
const repo = getRepo(session, repoId)
if (!repo) {
console.error(`Error: repo '${repoId}' not found in registry`)
process.exit(1)
}
setRepoVisibility(session, repoId, state)
console.log(`Repo '${repoId}' visibility set to '${state}'.`)
})

cmd
.command('move-to-org <repo-id> <org>')
.description('Move a repo to a different org; existing grants survive the move (Phase 123)')