From 4359899c29832a3233aeec77d2472110c03b3ad5 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 12:11:01 +0000 Subject: [PATCH 1/4] Implement Phase 126-127: public repo sharing (visibility, attach-as-reader, first-index gate, refresh throttle) Adds repos.visibility/owner_user_id and repo_grants.source columns, a gitsema repos visibility CLI command, and registration-flow logic in POST /api/v1/remote/index for auto-granting read access on public repos, gated by auth.allowPublicAutoIndex and throttled by auth.minReindexIntervalSeconds. Fixes a registry-DB/active-DB FK mismatch discovered during testing: registry.db (clone/index-path bookkeeping) and the active session DB (auth/grants) are independent SQLite files with separate FK-enforced users tables, so ownerUserId is now mirrored only into the active DB rather than written to both. Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_01BYNogpqPb8oyTF3B5Ebu5p --- .changeset/public-repo-sharing.md | 5 + CLAUDE.md | 7 +- README.md | 12 + docs/PLAN.md | 45 +++- docs/features.md | 40 ++++ docs/parity.md | 1 + src/cli/commands/repos.ts | 20 +- src/core/auth/grants.ts | 23 +- src/core/config/configManager.ts | 6 + src/core/db/migrations/031_repo_visibility.ts | 32 +++ src/core/db/migrations/runner.ts | 2 + src/core/db/schema.ts | 9 +- src/core/db/sqlite.ts | 18 +- src/core/indexing/repoRegistry.ts | 76 +++++- src/server/routes/remote.ts | 114 ++++++++- tests/remoteIndexPersistence.test.ts | 221 ++++++++++++++++++ 16 files changed, 606 insertions(+), 25 deletions(-) create mode 100644 .changeset/public-repo-sharing.md create mode 100644 src/core/db/migrations/031_repo_visibility.ts diff --git a/.changeset/public-repo-sharing.md b/.changeset/public-repo-sharing.md new file mode 100644 index 0000000..162ff33 --- /dev/null +++ b/.changeset/public-repo-sharing.md @@ -0,0 +1,5 @@ +--- +"gitsema": minor +--- + +Add public repo sharing: persisted repos can now be flagged `public` (`gitsema repos visibility public|private`), auto-granting `read` access to non-owner callers who index an existing public repo, gated by a first-index allow-list (`auth.allowPublicAutoIndex`/`GITSEMA_PUBLIC_AUTO_INDEX`) and a per-user re-index throttle (`auth.minReindexIntervalSeconds`/`GITSEMA_MIN_REINDEX_INTERVAL_SECONDS`). diff --git a/CLAUDE.md b/CLAUDE.md index a530397..fb8752e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -460,7 +460,7 @@ gitsema index **Pluggable storage backends (Phase 101–103):** all reads/writes go through async `MetadataStore` / `VectorStore` / `FtsStore` interfaces (`src/core/storage/types.ts`). The default `sqlite` backend wraps the schema below; `postgres` routes metadata + FTS through Postgres (pgvector for vectors), and `qdrant` uses Qdrant for vectors with Postgres for metadata/FTS. Select via `storage.*` config or `GITSEMA_STORAGE_*` env vars (see Configuration), inspect with `gitsema storage info`, and copy between backends with `gitsema storage migrate`. -**Schema overview (current schema v30):** +**Schema overview (current schema v31):** | Table | Purpose | |---|---| @@ -496,6 +496,8 @@ gitsema index | `repo_grants` | Per-user repo access grants (`read`/`write`/`owner`, optional branch-glob pattern); replaces the binary `repo_tokens` model for new deployments; added in v28 (Phase 123, multi-tenant-auth §5 Phase B) | | `sso_identities` | Linked external OIDC/SSO identities (`provider` + `external_id` → `user_id`, unique per identity); added in v29 (Phase 124, multi-tenant-auth §5 Phase C) | | `audit_log` | Identity/authorization audit trail — grant create/revoke, token create/revoke, login success/failure, org membership changes, repo org moves; no FK constraints (historical record outlives referenced rows); added in v30 (Phase 125, multi-tenant-auth §5 Phase D) | +| `repos.visibility` / `repos.owner_user_id` | Repo visibility flag (`private`/`public`) and first-claimer owner; added in v31 (Phase 126, public-repo-sharing) | +| `repo_grants.source` | Provenance of an auto-issued grant, e.g. `auto-public` for attach-as-reader grants; added in v31 (Phase 126, public-repo-sharing) | **FTS5 note:** Blobs indexed before Phase 11 have no FTS5 content. `--hybrid` search only applies to blobs with FTS5 entries. `--include-content` in evolution dumps also depends on FTS5 content. Use `gitsema backfill-fts` to populate FTS5 content for older index entries. @@ -519,7 +521,8 @@ gitsema index - v27 → v28: Added `orgs`, `org_members`, `repo_grants` tables (+ indexes) and a `repos.org_id` column for org/grant authorization (Phase 123 / multi-tenant-auth §5 Phase B) - v28 → v29: Added `sso_identities` table (+ indexes) for linked external OIDC/SSO identities (Phase 124 / multi-tenant-auth §5 Phase C) - v29 → v30: Added `audit_log` table (+ indexes) for the identity/authorization audit trail (Phase 125 / multi-tenant-auth §5 Phase D) -- **Current version: 30** +- v30 → v31: Added `visibility` and `owner_user_id` columns (+ index) to `repos`, and a `source` column to `repo_grants`, for public repo sharing (Phase 126 / public-repo-sharing) +- **Current version: 31** Schema changes require updating both `src/core/db/schema.ts` and the migration logic in `src/core/db/sqlite.ts`. diff --git a/README.md b/README.md index 3f67943..810beb6 100644 --- a/README.md +++ b/README.md @@ -124,6 +124,7 @@ All commands support a top-level `--verbose` flag (or `GITSEMA_VERBOSE=1`) for d | `gitsema repos grants ` | List grants on a repo (operator-only) | | `gitsema repos revoke ` | Revoke a user's grants on a repo (operator-only) | | `gitsema repos move-to-org ` | Move a repo to a different org; grants survive untouched (operator-only) | +| `gitsema repos visibility public\|private` | Set a persisted repo's visibility flag, gating attach-as-reader auto-grants (operator-only) | | `gitsema auth sso link ` | Link an external SSO/OIDC identity to an existing user; provider must be in `GITSEMA_SSO_PROVIDERS` (operator-only) | | `gitsema auth sso unlink ` | Unlink an external identity (operator-only) | | `gitsema auth sso list ` | List SSO identities linked to a user (operator-only) | @@ -247,6 +248,17 @@ Manage persisted repos with `gitsema repos list-persisted` and `gitsema repos remove [--purge]`. See [`docs/features.md`](docs/features.md#persistent-server-side-repo-storage) for details. +**Public repo sharing (Phases 126–127):** pass `visibility: 'public'` in the +`POST /api/v1/remote/index` request body to flag a repo as public (default +`'private'`). A non-owner authenticated caller indexing an existing public +repo is auto-granted `read` access; registering a *brand-new* public repo +requires `auth.allowPublicAutoIndex` / `GITSEMA_PUBLIC_AUTO_INDEX` (default +off) unless the caller is an operator. Non-owner re-index triggers on a +public repo are throttled to one per `auth.minReindexIntervalSeconds` / +`GITSEMA_MIN_REINDEX_INTERVAL_SECONDS` (default 300s, returns `429` + +`Retry-After`). See +[`docs/features.md`](docs/features.md#public-repo-sharing-phases-126127) for details. + > **Deploying the server?** See the [deployment guide](docs/deploy.md) for Docker / > docker-compose, systemd, the Postgres + Qdrant backends, key security, backups, > and per-repo-size tuning. The repo also ships `docker-compose.yml` (Ollama diff --git a/docs/PLAN.md b/docs/PLAN.md index 7b1386f..fb42262 100644 --- a/docs/PLAN.md +++ b/docs/PLAN.md @@ -4862,10 +4862,53 @@ that deprecated it, and its removal status. | **126** | §5 Phase 1 | Visibility flag + attach-as-reader | `repos.visibility` (`'private'\|'public'`, default `'private'`) + `ownerUserId` columns; `gitsema repos visibility public\|private` CLI (owner/superadmin only); registration-flow change in `src/server/routes/remote.ts` auto-issuing a `repo_grants` reader row when a second user attaches to an existing public repo's shared index. | | **127** | §5 Phase 2 | First-index gate + refresh throttle | `auth.allowPublicAutoIndex`/`GITSEMA_PUBLIC_AUTO_INDEX` config gate (default `false`) restricting who may register a brand-new public-flagged repo; `auth.minReindexIntervalSeconds` per-`(user, repoId)` refresh throttle returning `429`/`Retry-After`. No hard dependency on Phase 126 beyond the `visibility` column existing. | -**Status:** not started — draft design, scheduled here per `/phase-plan`. +**Status:** ✅ complete *(completed vNEXT)*. Implemented in +`src/server/routes/remote.ts` (registration-flow gate/throttle/attach-as-reader +logic), `src/core/indexing/repoRegistry.ts` (`visibility`/`ownerUserId` columns, +`setRepoVisibility`, `isPublicAutoIndexAllowed`, `getMinReindexIntervalSeconds`), +and `src/cli/commands/repos.ts` (`gitsema repos visibility public|private`). Explicitly out of scope (per the design doc §6): cross-repo blob-level dedup for forks with different URLs ("shape 2") remains undesigned and unscheduled. +Deviations from the design doc, discovered during implementation: +- **Two independent SQLite databases in a `gitsema tools serve` deployment.** + `getActiveSession()` (cwd-relative `.gitsema/index.db`, used by the entire + Phase 122-125 auth/orgs/grants system and by `authMiddleware`'s + `req.userId` resolution) and `getRegistrySession()` + (`${GITSEMA_DATA_DIR}/registry.db`, cwd-independent, used for persisted + repo clone/index-path bookkeeping since Phase 41) are two separate DB + files, each running the full schema with its own independent `users`/ + `repos`/`repo_grants` tables and per-file FK enforcement. The design doc + speaks of "the `repos` table" and "the `users` table" as if unified; this + split predates Phase 126 and was not anticipated by the spec. Resolution: + `registry.db` keeps its original sole purpose (clone/index-path + bookkeeping) and never stores `ownerUserId`; the active DB becomes the + canonical store for `visibility`/`ownerUserId`/`repo_grants`, with + `runIndexJob` performing a dual-write to mirror the repo's `id`/`name`/ + `url`/`normalizedUrl`/`clonePath`/`dbPath`/`visibility` row into both DBs + after each successful persisted index, and `ownerUserId` written only to + the active DB's copy. All visibility/ownership/grant reads in the route + handler resolve against the active DB's mirrored row, never `registry.db`'s. + This is a server-deployment-internal data-plumbing detail with no surfaced + CLI/API change; the broader question of whether the active session's + cwd-relative default is the right long-term identity-store location for + `gitsema tools serve` specifically (vs. tying it to `GITSEMA_DATA_DIR`) is + deferred — fixing it would be a breaking change to already-shipped Phase + 122-125 behavior, well beyond this track's scope. +- Grant role naming uses the existing `'read'|'write'|'owner'` enum (Phase + 123) rather than the design doc's `'reader'` wording — no new role was + introduced. +- "Superadmin" in the design doc's first-index gate is resolved via the + existing "operator" trust boundary (`req.userId === undefined` — local + CLI/global-key/no-auth-required callers), consistent with the Phase + 122-125 precedent that operator-equivalent access is a stronger trust + tier than any network role, rather than introducing a new superadmin flag. +- Flipping a repo back to `private` does not auto-revoke previously + auto-issued `repo_grants` rows (per the design doc's own open question in + §7) — they remain until explicitly revoked via `gitsema repos revoke`. +- `repo_grants.source` (`'auto-public'` vs. manual) was added to distinguish + attach-as-reader auto-grants from explicit `gitsema repos grant` grants. + --- ## Superadmin-Locked Model Set Track (Phases 128–130) diff --git a/docs/features.md b/docs/features.md index 126e4ed..3d28579 100644 --- a/docs/features.md +++ b/docs/features.md @@ -369,6 +369,46 @@ audit events — the equivalent operator-only CLI-direct paths (`gitsema repos g v1, since those paths already require local DB access, a stronger trust boundary than the network surface this audit trail is primarily meant to cover. +### Public repo sharing (Phases 126–127) + +Registration-flow extension layered on top of Phases 122-123's `repo_grants` +model, letting a repo owner opt their persisted repo into shared read access +without minting individual grants by hand. Three axes: +- **Visibility flag** — `repos.visibility` (`'private'` default, `'public'`), + set via `gitsema repos visibility public|private` (operator-only — + no network auth boundary on this command). `repos.owner_user_id` records the + first user (or `null` for an operator/no-auth caller) whose registration + request created the repo; first-claimer semantics are preserved across + re-indexes — later registration requests never overwrite it. +- **Attach-as-reader auto-grant** — when an authenticated, non-owner caller + triggers `POST /api/v1/remote/index` against an *existing* `public` repo + they don't already have a grant on, a `read`-role `repo_grants` row is + auto-issued for them with `source: 'auto-public'` (distinguishing it from a + manually issued grant). A caller who already holds a higher role + (`write`/`owner`) is never downgraded. +- **Trigger rights** — registering a *brand-new* repo as `public` requires + `auth.allowPublicAutoIndex` / `GITSEMA_PUBLIC_AUTO_INDEX` (default `false`) + to be enabled, unless the caller is an operator (no `req.userId` — local + CLI/global-key/no-auth-required request, the same stronger-trust-tier + precedent established in Phases 122-125). Once a public repo exists, + non-owner re-index triggers are throttled to at most one per + `auth.minReindexIntervalSeconds` / `GITSEMA_MIN_REINDEX_INTERVAL_SECONDS` + (default 300s) per `(user, repo)` pair, returning `429` + `Retry-After`; the + repo's owner is never throttled. + +**Implementation note — two independent databases.** A `gitsema tools serve` +deployment has two separate SQLite files: the cwd-relative active session +(`.gitsema/index.db`, the canonical store for the entire Phase 122-125 auth/ +orgs/grants system, resolved by `authMiddleware`) and the registry session +(`${GITSEMA_DATA_DIR}/registry.db`, cwd-independent, tracking persisted-repo +clone/index paths since Phase 41). Both run the full schema with independent +per-file FK enforcement, so an `owner_user_id` valid in one is not +automatically valid in the other. `registry.db` keeps its original sole +purpose and never stores `owner_user_id`; the active DB is the canonical +store for `visibility`/`owner_user_id`/`repo_grants`, kept in sync by a +dual-write in `runIndexJob` after each successful persisted index. See +`docs/PLAN.md`'s Phase 126/127 entry for the full deviation note. + ### Persistent server-side repo storage `POST /api/v1/remote/index` **persists** the clone + index by default (`persist: true`), diff --git a/docs/parity.md b/docs/parity.md index a8cc18c..c2997e9 100644 --- a/docs/parity.md +++ b/docs/parity.md @@ -136,6 +136,7 @@ This table shows which tools/commands are available in which interface. A checkm | `auth` (login/logout/whoami/token */create-user) | ✓ | — | — | — | — | ✓ | — | — | | `orgs` (create/list/members */`users` create/list) | ✓ | — | — | — | — | ✓ | — | — | | `repos grant/grants/revoke/move-to-org` | ✓ | — | — | — | — | ✓ | ✓ | — | +| `repos visibility` | ✓ | — | — | — | — | ✓ | ✓ | — | | `auth sso link/unlink/list` | ✓ | — | — | — | — | ✓ | ✓ | — | | `audit log` | ✓ | — | — | — | — | — | — | — | | `quickstart` / `setup` | ✓ | — | — | — | — | ✓ | ✓ | — | diff --git a/src/cli/commands/repos.ts b/src/cli/commands/repos.ts index b46bcaf..ca2ee82 100644 --- a/src/cli/commands/repos.ts +++ b/src/cli/commands/repos.ts @@ -2,7 +2,7 @@ import { Command } from 'commander' import { createHash, randomBytes } from 'node:crypto' import { rmSync } from 'node:fs' import { getActiveSession, getRawDb } from '../../core/db/sqlite.js' -import { addRepo, listRepos, multiRepoSearch, getRegistrySession, getRepo, getRepoDir, removeRepo } from '../../core/indexing/repoRegistry.js' +import { addRepo, listRepos, multiRepoSearch, getRegistrySession, getRepo, getRepoDir, removeRepo, setRepoVisibility } from '../../core/indexing/repoRegistry.js' import { buildProvider } from '../../core/embedding/providerFactory.js' import { embedQuery } from '../../core/embedding/embedQuery.js' import { parsePositiveInt } from '../../utils/parse.js' @@ -263,6 +263,24 @@ export function reposCommand(): Command { console.log(`Revoked ${revoked} grant(s) for '${username}' on repo '${repoId}'.`) }) + cmd + .command('visibility ') + .description('Set a repo\'s visibility to public or private (Phase 126) — operator-only, no network auth boundary') + .action((repoId: string, state: string) => { + if (state !== 'public' && state !== 'private') { + console.error("Error: state must be 'public' or 'private'") + process.exit(1) + } + const session = getRegistrySession() + const repo = getRepo(session, repoId) + if (!repo) { + console.error(`Error: repo '${repoId}' not found in registry`) + process.exit(1) + } + setRepoVisibility(session, repoId, state) + console.log(`Repo '${repoId}' visibility set to '${state}'.`) + }) + cmd .command('move-to-org ') .description('Move a repo to a different org; existing grants survive the move (Phase 123)') diff --git a/src/core/auth/grants.ts b/src/core/auth/grants.ts index fd12de1..5fd9e3f 100644 --- a/src/core/auth/grants.ts +++ b/src/core/auth/grants.ts @@ -23,10 +23,14 @@ export interface Grant { branchPattern: string | null grantedBy: number createdAt: number + /** Provenance, e.g. 'auto-public' for attach-as-reader grants (Phase 126). NULL for manually-issued grants. */ + source: string | null } const ROLE_RANK: Record = { read: 1, write: 2, owner: 3 } +const GRANT_COLUMNS = 'id, user_id, repo_id, role, branch_pattern, granted_by, created_at, source' + function rowToGrant(row: { id: number user_id: number @@ -35,6 +39,7 @@ function rowToGrant(row: { branch_pattern: string | null granted_by: number created_at: number + source: string | null }): Grant { return { id: row.id, @@ -44,6 +49,7 @@ function rowToGrant(row: { branchPattern: row.branch_pattern, grantedBy: row.granted_by, createdAt: row.created_at, + source: row.source ?? null, } } @@ -53,10 +59,11 @@ function rowToGrant(row: { */ export function createGrant( rawDb: InstanceType, - opts: { userId: number; repoId: string; role: GrantRole; branchPattern?: string | null; grantedBy: number }, + opts: { userId: number; repoId: string; role: GrantRole; branchPattern?: string | null; grantedBy: number; source?: string | null }, ): Grant { const createdAt = Math.floor(Date.now() / 1000) const branchPattern = opts.branchPattern ?? null + const source = opts.source ?? null // SQLite's UNIQUE index treats every NULL branch_pattern as distinct, so // ON CONFLICT can't be used to dedupe all-branches grants — match manually. const existing = rawDb @@ -67,14 +74,14 @@ export function createGrant( } else { rawDb .prepare( - `INSERT INTO repo_grants (user_id, repo_id, role, branch_pattern, granted_by, created_at) - VALUES (?, ?, ?, ?, ?, ?)`, + `INSERT INTO repo_grants (user_id, repo_id, role, branch_pattern, granted_by, created_at, source) + VALUES (?, ?, ?, ?, ?, ?, ?)`, ) - .run(opts.userId, opts.repoId, opts.role, branchPattern, opts.grantedBy, createdAt) + .run(opts.userId, opts.repoId, opts.role, branchPattern, opts.grantedBy, createdAt, source) } const row = rawDb .prepare( - 'SELECT id, user_id, repo_id, role, branch_pattern, granted_by, created_at FROM repo_grants WHERE user_id = ? AND repo_id = ? AND branch_pattern IS ?', + `SELECT ${GRANT_COLUMNS} FROM repo_grants WHERE user_id = ? AND repo_id = ? AND branch_pattern IS ?`, ) .get(opts.userId, opts.repoId, branchPattern) as Parameters[0] return rowToGrant(row) @@ -90,7 +97,7 @@ export function revokeGrant(rawDb: InstanceType, userId: number export function listGrants(rawDb: InstanceType, repoId: string): Grant[] { const rows = rawDb .prepare( - 'SELECT id, user_id, repo_id, role, branch_pattern, granted_by, created_at FROM repo_grants WHERE repo_id = ? ORDER BY created_at ASC', + `SELECT ${GRANT_COLUMNS} FROM repo_grants WHERE repo_id = ? ORDER BY created_at ASC`, ) .all(repoId) as Array[0]> return rows.map(rowToGrant) @@ -100,7 +107,7 @@ export function listGrants(rawDb: InstanceType, repoId: string) export function listGrantsForUser(rawDb: InstanceType, userId: number): Grant[] { const rows = rawDb .prepare( - 'SELECT id, user_id, repo_id, role, branch_pattern, granted_by, created_at FROM repo_grants WHERE user_id = ? ORDER BY created_at ASC', + `SELECT ${GRANT_COLUMNS} FROM repo_grants WHERE user_id = ? ORDER BY created_at ASC`, ) .all(userId) as Array[0]> return rows.map(rowToGrant) @@ -120,7 +127,7 @@ export function resolveUserRepoAccess( branch?: string, ): GrantRole | undefined { const grants = rawDb - .prepare('SELECT id, user_id, repo_id, role, branch_pattern, granted_by, created_at FROM repo_grants WHERE user_id = ? AND repo_id = ?') + .prepare(`SELECT ${GRANT_COLUMNS} FROM repo_grants WHERE user_id = ? AND repo_id = ?`) .all(userId, repoId) as Array[0]> let best: GrantRole | undefined diff --git a/src/core/config/configManager.ts b/src/core/config/configManager.ts index c48561c..a659f01 100644 --- a/src/core/config/configManager.ts +++ b/src/core/config/configManager.ts @@ -67,6 +67,9 @@ export const ENV_KEY_MAP: Record = { 'auth.personalGroups': 'GITSEMA_PERSONAL_GROUPS', // Multi-tenant auth — SSO linking (Phase 124) 'auth.ssoProviders': 'GITSEMA_SSO_PROVIDERS', + // Public repo sharing (Phase 126) + 'auth.allowPublicAutoIndex': 'GITSEMA_PUBLIC_AUTO_INDEX', + 'auth.minReindexIntervalSeconds': 'GITSEMA_MIN_REINDEX_INTERVAL_SECONDS', } /** @@ -132,6 +135,9 @@ export const ALL_KEYS: ReadonlyArray = [ 'auth.personalGroups', // Multi-tenant auth — SSO linking (Phase 124) 'auth.ssoProviders', + // Public repo sharing (Phase 126) + 'auth.allowPublicAutoIndex', + 'auth.minReindexIntervalSeconds', ] // --------------------------------------------------------------------------- diff --git a/src/core/db/migrations/031_repo_visibility.ts b/src/core/db/migrations/031_repo_visibility.ts new file mode 100644 index 0000000..8cc9c6e --- /dev/null +++ b/src/core/db/migrations/031_repo_visibility.ts @@ -0,0 +1,32 @@ +import Database from 'better-sqlite3' +import type { Migration } from './runner.js' + +export const migration: Migration = { + version: 31, + description: 'Add visibility + owner_user_id columns to repos, and a source column to repo_grants (Phase 126 / public-repo-sharing §5 Phase 1)', + up(sqlite: InstanceType) { + const reposExists = sqlite + .prepare(`SELECT name FROM sqlite_master WHERE type='table' AND name='repos'`) + .get() as { name: string } | undefined + if (reposExists) { + const repoCols = sqlite.prepare(`PRAGMA table_info(repos)`).all() as Array<{ name: string }> + if (!repoCols.some((c) => c.name === 'visibility')) { + sqlite.exec(`ALTER TABLE repos ADD COLUMN visibility TEXT NOT NULL DEFAULT 'private'`) + } + if (!repoCols.some((c) => c.name === 'owner_user_id')) { + sqlite.exec(`ALTER TABLE repos ADD COLUMN owner_user_id INTEGER REFERENCES users(id)`) + } + sqlite.exec(`CREATE INDEX IF NOT EXISTS idx_repos_normalized_url_visibility ON repos(normalized_url, visibility);`) + } + + const grantsExists = sqlite + .prepare(`SELECT name FROM sqlite_master WHERE type='table' AND name='repo_grants'`) + .get() as { name: string } | undefined + if (grantsExists) { + const grantCols = sqlite.prepare(`PRAGMA table_info(repo_grants)`).all() as Array<{ name: string }> + if (!grantCols.some((c) => c.name === 'source')) { + sqlite.exec(`ALTER TABLE repo_grants ADD COLUMN source TEXT`) + } + } + }, +} diff --git a/src/core/db/migrations/runner.ts b/src/core/db/migrations/runner.ts index e32a22b..4326cd3 100644 --- a/src/core/db/migrations/runner.ts +++ b/src/core/db/migrations/runner.ts @@ -29,6 +29,7 @@ import { migration as m027 } from './027_auth_identity.js' import { migration as m028 } from './028_orgs_grants.js' import { migration as m029 } from './029_sso_identities.js' import { migration as m030 } from './030_audit_log.js' +import { migration as m031 } from './031_repo_visibility.js' export type Migration = { version: number @@ -67,6 +68,7 @@ export const migrations: Migration[] = [ m028, m029, m030, + m031, ] migrations.sort((a, b) => a.version - b.version) diff --git a/src/core/db/schema.ts b/src/core/db/schema.ts index 7985df9..d46ccbb 100644 --- a/src/core/db/schema.ts +++ b/src/core/db/schema.ts @@ -1,4 +1,4 @@ -import { sqliteTable, text, integer, blob, real, primaryKey, uniqueIndex } from 'drizzle-orm/sqlite-core' +import { sqliteTable, text, integer, blob, real, primaryKey, uniqueIndex, index } from 'drizzle-orm/sqlite-core' /** * Sub-file fragments produced by a chunker (function / fixed strategy). @@ -49,8 +49,13 @@ export const repos = sqliteTable('repos', { ephemeral: integer('ephemeral').notNull().default(0), /** Owning org (Phase 123 / multi-tenant-auth §5 Phase B); NULL = legacy unscoped repo (v28). */ orgId: integer('org_id').references(() => orgs.id), + /** 'private' (default) | 'public' — gates attach-as-reader auto-grants (Phase 126 / public-repo-sharing §5 Phase 1, v31). */ + visibility: text('visibility').notNull().default('private'), + /** The user whose registration request first created this repo (Phase 126, v31). NULL for repos created before this column existed. */ + ownerUserId: integer('owner_user_id').references(() => users.id), }, (table) => ({ uniqNormalizedUrl: uniqueIndex('idx_repos_normalized_url').on(table.normalizedUrl), + idxNormalizedUrlVisibility: index('idx_repos_normalized_url_visibility').on(table.normalizedUrl, table.visibility), })) export const savedQueries = sqliteTable('saved_queries', { @@ -495,6 +500,8 @@ export const repoGrants = sqliteTable('repo_grants', { branchPattern: text('branch_pattern'), grantedBy: integer('granted_by').notNull().references(() => users.id), createdAt: integer('created_at').notNull(), + /** Provenance of this grant, e.g. 'auto-public' for attach-as-reader grants auto-issued on registration (Phase 126, v31). NULL for manually-issued grants. */ + source: text('source'), }, (table) => ({ uniqGrant: uniqueIndex('idx_repo_grants_user_repo_branch').on(table.userId, table.repoId, table.branchPattern), })) diff --git a/src/core/db/sqlite.ts b/src/core/db/sqlite.ts index 7d15e62..97b4fe6 100644 --- a/src/core/db/sqlite.ts +++ b/src/core/db/sqlite.ts @@ -66,8 +66,11 @@ export interface DbSession { * (Phase 124 / multi-tenant-auth §5 Phase C) * 30 — Added audit_log table for the identity/authorization audit trail * (Phase 125 / multi-tenant-auth §5 Phase D) + * 31 — Added repos.visibility + repos.owner_user_id columns and a + * repo_grants.source column for the public/private repo-sharing layer + * (Phase 126 / public-repo-sharing §5 Phase 1) */ -export const CURRENT_SCHEMA_VERSION = 30 +export const CURRENT_SCHEMA_VERSION = 31 /** * Applies pending schema migrations and records the resulting version in the @@ -297,7 +300,8 @@ function initTables(sqlite: InstanceType): void { -- Multi-repo registry (Phase 41 / v14); db_path added in v15; -- normalized_url, clone_path, last_indexed_at, ephemeral added in v23 - -- (persistent server-side repo storage); org_id added in v28 (Phase 123) + -- (persistent server-side repo storage); org_id added in v28 (Phase 123); + -- visibility, owner_user_id added in v31 (Phase 126 / public-repo-sharing) CREATE TABLE IF NOT EXISTS repos ( id TEXT PRIMARY KEY, name TEXT NOT NULL, @@ -308,8 +312,11 @@ function initTables(sqlite: InstanceType): void { clone_path TEXT, last_indexed_at INTEGER, ephemeral INTEGER NOT NULL DEFAULT 0, - org_id INTEGER REFERENCES orgs(id) + org_id INTEGER REFERENCES orgs(id), + visibility TEXT NOT NULL DEFAULT 'private', + owner_user_id INTEGER REFERENCES users(id) ); + CREATE INDEX IF NOT EXISTS idx_repos_normalized_url_visibility ON repos(normalized_url, visibility); -- Per-repo access control tokens (review7 §4.1 / v21): token is stored as SHA-256 hash. -- token_prefix stores the first 8 chars of the original token for display/revoke lookup. @@ -428,6 +435,8 @@ function initTables(sqlite: InstanceType): void { ); CREATE INDEX IF NOT EXISTS idx_org_members_user ON org_members(user_id); + -- source added in v31 (Phase 126 / public-repo-sharing) — provenance for + -- auto-issued grants, e.g. 'auto-public' for attach-as-reader grants. CREATE TABLE IF NOT EXISTS repo_grants ( id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE, @@ -435,7 +444,8 @@ function initTables(sqlite: InstanceType): void { role TEXT NOT NULL, branch_pattern TEXT, granted_by INTEGER NOT NULL REFERENCES users(id), - created_at INTEGER NOT NULL + created_at INTEGER NOT NULL, + source TEXT ); CREATE UNIQUE INDEX IF NOT EXISTS idx_repo_grants_user_repo_branch ON repo_grants(user_id, repo_id, branch_pattern); diff --git a/src/core/indexing/repoRegistry.ts b/src/core/indexing/repoRegistry.ts index 2f20615..398af37 100644 --- a/src/core/indexing/repoRegistry.ts +++ b/src/core/indexing/repoRegistry.ts @@ -5,6 +5,9 @@ import { getActiveSession, openDatabaseAt, getOrOpenSessionAtPath, closeSessionA import { vectorSearch } from '../search/analysis/vectorSearch.js' import type { SearchResult, Embedding } from '../models/types.js' import { mergeSearchResults } from '../search/analysis/vectorSearch.js' +import { getConfigValue } from '../config/configManager.js' + +export type RepoVisibility = 'private' | 'public' export interface RepoEntry { id: string @@ -16,6 +19,10 @@ export interface RepoEntry { clonePath?: string | null lastIndexedAt?: number | null ephemeral?: boolean + /** 'private' (default) | 'public' — gates attach-as-reader auto-grants (Phase 126). */ + visibility?: RepoVisibility + /** The user whose registration request first created this repo (Phase 126). */ + ownerUserId?: number | null } export function addRepo(dbSession: ReturnType, id: string, name: string, url?: string | null, dbPath?: string | null): void { @@ -34,9 +41,11 @@ interface RepoRow { clone_path: string | null last_indexed_at: number | null ephemeral: number + visibility?: string | null + owner_user_id?: number | null } -const REPO_COLUMNS = 'id, name, url, db_path, added_at, normalized_url, clone_path, last_indexed_at, ephemeral' +const REPO_COLUMNS = 'id, name, url, db_path, added_at, normalized_url, clone_path, last_indexed_at, ephemeral, visibility, owner_user_id' function rowToRepoEntry(r: RepoRow): RepoEntry { return { @@ -49,6 +58,8 @@ function rowToRepoEntry(r: RepoRow): RepoEntry { clonePath: r.clone_path ?? undefined, lastIndexedAt: r.last_indexed_at ?? undefined, ephemeral: r.ephemeral === 1, + visibility: (r.visibility as RepoVisibility | null | undefined) ?? 'private', + ownerUserId: r.owner_user_id ?? undefined, } } @@ -231,13 +242,23 @@ export function registerPersistedRepo( clonePath: string dbPath: string ephemeral?: boolean + /** Only honored on first creation — see Phase 126 note below. */ + visibility?: RepoVisibility + /** Only honored on first creation — see Phase 126 note below. */ + ownerUserId?: number | null }, ): void { const { rawDb } = session const now = Math.floor(Date.now() / 1000) + // visibility/owner_user_id are deliberately omitted from the ON CONFLICT + // UPDATE clause below: this upsert runs on every (re-)index of an + // already-registered repo, and those two columns are first-claimer + // properties that must not be overwritten by a later touch (Phase 126 / + // public-repo-sharing §4.5 — ownership semantics stay exactly as today's + // existing dedup mechanism decides). rawDb.prepare(` - INSERT INTO repos (id, name, url, db_path, added_at, normalized_url, clone_path, last_indexed_at, ephemeral) - VALUES (?, ?, ?, ?, ?, ?, ?, NULL, ?) + INSERT INTO repos (id, name, url, db_path, added_at, normalized_url, clone_path, last_indexed_at, ephemeral, visibility, owner_user_id) + VALUES (?, ?, ?, ?, ?, ?, ?, NULL, ?, ?, ?) ON CONFLICT(id) DO UPDATE SET name = excluded.name, url = excluded.url, @@ -245,7 +266,54 @@ export function registerPersistedRepo( normalized_url = excluded.normalized_url, clone_path = excluded.clone_path, ephemeral = excluded.ephemeral - `).run(entry.id, entry.name, entry.url, entry.dbPath, now, entry.normalizedUrl, entry.clonePath, entry.ephemeral ? 1 : 0) + `).run( + entry.id, entry.name, entry.url, entry.dbPath, now, entry.normalizedUrl, entry.clonePath, entry.ephemeral ? 1 : 0, + entry.visibility ?? 'private', entry.ownerUserId ?? null, + ) +} + +/** + * Updates a repo's visibility flag. Only the repo's owner or a superadmin + * (in this codebase: anyone with local DB/CLI access — see + * `gitsema repos visibility`) may call this (Phase 126 / public-repo-sharing + * §4.1). Per the design doc's open revocation question (§7), flipping a + * repo back to private does not strip previously auto-issued `repo_grants` + * rows — they're left in place until explicitly revoked. + */ +export function setRepoVisibility( + session: ReturnType | ReturnType, + repoId: string, + visibility: RepoVisibility, +): void { + const { rawDb } = session + rawDb.prepare('UPDATE repos SET visibility = ? WHERE id = ?').run(visibility, repoId) +} + +/** + * Whether an unauthenticated/scoped caller may trigger first-time indexing + * of a brand-new repo registered as public (Phase 126 / public-repo-sharing + * §4.2). Default false; overridden by GITSEMA_PUBLIC_AUTO_INDEX or the + * `auth.allowPublicAutoIndex` config key. + */ +export function isPublicAutoIndexAllowed(cwd?: string): boolean { + const { value } = getConfigValue('auth.allowPublicAutoIndex', cwd) + if (value === undefined) return false + if (typeof value === 'boolean') return value + return String(value) === 'true' +} + +/** + * Minimum seconds between re-index triggers for the same public repo from + * non-owner callers (Phase 126 / public-repo-sharing §4.4 throttle). Default + * 300s — the design doc flags this as a placeholder guess, kept here so it's + * the single place to retune. Overridden by GITSEMA_MIN_REINDEX_INTERVAL_SECONDS + * or the `auth.minReindexIntervalSeconds` config key. + */ +export function getMinReindexIntervalSeconds(cwd?: string): number { + const { value } = getConfigValue('auth.minReindexIntervalSeconds', cwd) + if (value === undefined) return 300 + const n = typeof value === 'number' ? value : parseInt(String(value), 10) + return Number.isFinite(n) && n >= 0 ? n : 300 } /** Updates `last_indexed_at` to now for the given repo id. */ diff --git a/src/server/routes/remote.ts b/src/server/routes/remote.ts index 6c5c679..73e2855 100644 --- a/src/server/routes/remote.ts +++ b/src/server/routes/remote.ts @@ -35,7 +35,7 @@ import { getCloneSemaphore, sanitiseUrl, } from '../../core/git/cloneRepo.js' -import { getOrOpenLabeledDb, getOrOpenSessionAtPath, withDbSession } from '../../core/db/sqlite.js' +import { getActiveSession, getOrOpenLabeledDb, getOrOpenSessionAtPath, withDbSession } from '../../core/db/sqlite.js' import { getRegistrySession, getRepoClonePath, @@ -47,7 +47,11 @@ import { registerPersistedRepo, touchLastIndexed, withRepoLock, + isPublicAutoIndexAllowed, + getMinReindexIntervalSeconds, + type RepoVisibility, } from '../../core/indexing/repoRegistry.js' +import { createGrant, resolveUserRepoAccess } from '../../core/auth/grants.js' import { logger } from '../../utils/logger.js' // --------------------------------------------------------------------------- @@ -107,6 +111,12 @@ const RemoteIndexBodySchema = z.object({ persist: z.boolean().optional().default(true), /** Target a specific already-registered persisted repo explicitly. */ repoId: RepoIdSchema.optional(), + /** + * Visibility to register a brand-new persisted repo with. Only honored on + * first creation (Phase 126 / public-repo-sharing §4.1) — has no effect on + * an already-registered repo. Defaults to 'private'. + */ + visibility: z.enum(['private', 'public']).optional(), }).strict() // --------------------------------------------------------------------------- @@ -310,6 +320,31 @@ function scheduleJobCleanup(jobId: string): void { setTimeout(() => { _jobs.delete(jobId) }, JOB_TTL_MS).unref() } +// --------------------------------------------------------------------------- +// Refresh throttle (Phase 126 / public-repo-sharing §4.4) — limits how often +// a non-owner caller can trigger a re-index of an already-registered public +// repo. Keyed by `${repoId}:${userId}` so the repo's owner (and operator +// callers, who skip the check entirely) are never throttled. +// --------------------------------------------------------------------------- + +const _lastReindexTriggerAt = new Map() + +/** Returns remaining throttle seconds (0 if not throttled), and records this attempt if allowed. */ +function checkAndRecordReindexThrottle(repoId: string, userId: number, cwd?: string): number { + const key = `${repoId}:${userId}` + const minIntervalMs = getMinReindexIntervalSeconds(cwd) * 1000 + const now = Date.now() + const last = _lastReindexTriggerAt.get(key) + if (last !== undefined) { + const elapsed = now - last + if (elapsed < minIntervalMs) { + return Math.ceil((minIntervalMs - elapsed) / 1000) + } + } + _lastReindexTriggerAt.set(key, now) + return 0 +} + // --------------------------------------------------------------------------- // Core job runner // --------------------------------------------------------------------------- @@ -322,6 +357,9 @@ interface PersistentJobContext { normalizedUrl: string clonePath: string dbPath: string + /** Only meaningful on first creation — see registerPersistedRepo's ON CONFLICT note. */ + visibility: RepoVisibility + ownerUserId: number | null } async function runIndexJob( @@ -411,7 +449,7 @@ async function runIndexJob( if (persistent) { const registrySession = getRegistrySession() - registerPersistedRepo(registrySession, { + const repoRow = { id: persistent.repoId, name: persistent.name, url: persistent.url, @@ -419,8 +457,21 @@ async function runIndexJob( clonePath: persistent.clonePath, dbPath: persistent.dbPath, ephemeral: false, - }) + visibility: persistent.visibility, + } + // registry.db's own local `users` table is unrelated to whichever DB + // authMiddleware resolved req.userId against, so ownerUserId is + // deliberately omitted from this write — it would otherwise violate + // registry.db's own repos.owner_user_id FK (public-repo-sharing §4). + // registry.db remains the source of truth for clone/index paths only. + registerPersistedRepo(registrySession, repoRow) touchLastIndexed(registrySession, persistent.repoId) + // Mirror the repo row into the active (auth) DB too — the one + // authMiddleware resolves req.userId against, and the one + // orgs.ts's existing repo_grants endpoints already operate on + // (Phase 122-125) — including ownerUserId, which only this DB needs + // for the visibility/ownership/grant logic below (Phase 126). + registerPersistedRepo(getActiveSession(), { ...repoRow, ownerUserId: persistent.ownerUserId }) } logger.info( @@ -501,7 +552,7 @@ export function remoteRouter(options: RemoteRouterOptions): Router { return } - const { repoUrl, credentials, cloneDepth, indexOptions = {}, dbLabel, persist, repoId: requestedRepoId } = parsed.data + const { repoUrl, credentials, cloneDepth, indexOptions = {}, dbLabel, persist, repoId: requestedRepoId, visibility: requestedVisibility } = parsed.data // --- 2. SSRF guard ------------------------------------------------------- try { @@ -543,6 +594,59 @@ export function remoteRouter(options: RemoteRouterOptions): Router { } } + // Visibility/ownership are read from the active DB's mirrored repo row + // (getActiveSession()), not registrySession — registry.db is + // cwd-independent clone/index-path bookkeeping only and deliberately + // never stores ownerUserId (its own local `users` table is unrelated + // to whichever DB authMiddleware resolved req.userId against). The + // mirror row is written in runIndexJob after the first successful + // index of a repo (Phase 126 / public-repo-sharing §4). + const existingAuth = existing ? getRepo(getActiveSession(), existing.id) : null + + // --- 2c. First-index gate for brand-new public repos (Phase 126 §4.2) -- + // "Operator" callers (no req.userId — local CLI/global-key/no-auth- + // required requests) bypass the gate, mirroring the Phase 122-125 + // precedent that operator-equivalent access is a stronger trust tier + // than any network role. + const isOperator = req.userId === undefined + if (!existing && requestedVisibility === 'public' && !isOperator && !isPublicAutoIndexAllowed()) { + res.status(403).json({ + error: 'Registering new repos as public requires auth.allowPublicAutoIndex to be enabled', + }) + return + } + + // --- 2d. Refresh throttle for re-indexing an existing public repo + // by a non-owner caller (Phase 126 §4.4) ------------------------------- + if (existing && existingAuth?.visibility === 'public' && req.userId !== undefined && req.userId !== existingAuth.ownerUserId) { + const retryAfter = checkAndRecordReindexThrottle(existing.id, req.userId) + if (retryAfter > 0) { + res.setHeader('Retry-After', String(retryAfter)) + res.status(429).json({ error: 'Re-index triggered too recently for this repo', retryAfter }) + return + } + } + + // --- 2e. Attach-as-reader: auto-grant read access to a non-owner + // caller on an existing public repo they don't already have access to + // (Phase 126 §4.3 / public-repo-sharing — "auto-public" provenance) --- + const activeRawDb = getActiveSession().rawDb + if ( + existing && existingAuth?.visibility === 'public' && req.userId !== undefined && req.userId !== existingAuth.ownerUserId && + resolveUserRepoAccess(activeRawDb, req.userId, existing.id) === undefined + ) { + // Only issue the auto-grant when the user holds no applicable grant + // yet — createGrant() would otherwise overwrite a pre-existing + // higher-role (write/owner) all-branches grant with 'read'. + createGrant(activeRawDb, { + userId: req.userId, + repoId: existing.id, + role: 'read', + grantedBy: existingAuth.ownerUserId ?? req.userId, + source: 'auto-public', + }) + } + const repoId = existing?.id ?? deriveRepoId(normalizedUrl) persistent = { repoId, @@ -551,6 +655,8 @@ export function remoteRouter(options: RemoteRouterOptions): Router { normalizedUrl, clonePath: existing?.clonePath ?? getRepoClonePath(repoId), dbPath: existing?.dbPath ?? getRepoDbPath(repoId), + visibility: existingAuth?.visibility ?? requestedVisibility ?? 'private', + ownerUserId: existingAuth?.ownerUserId ?? req.userId ?? null, } } diff --git a/tests/remoteIndexPersistence.test.ts b/tests/remoteIndexPersistence.test.ts index 1f86007..b3d1aa0 100644 --- a/tests/remoteIndexPersistence.test.ts +++ b/tests/remoteIndexPersistence.test.ts @@ -48,6 +48,9 @@ vi.mock('../src/core/indexing/indexer.js', () => ({ import { createApp } from '../src/server/app.js' import type { EmbeddingProvider } from '../src/core/embedding/provider.js' import { getRegistrySession, closeRegistrySession, normalizeRepoUrl, deriveRepoId, registerPersistedRepo, getRepoClonePath, getRepoDbPath } from '../src/core/indexing/repoRegistry.js' +import { createUser, createSession } from '../src/core/auth/identity.js' +import { resolveUserRepoAccess, listGrants } from '../src/core/auth/grants.js' +import { getRawDb, getActiveSession } from '../src/core/db/sqlite.js' const mockProvider: EmbeddingProvider = { model: 'mock', @@ -198,3 +201,221 @@ describe('POST /api/v1/remote/index — token scoping', () => { expect(repoId).toBeTruthy() // sanity: repoId derivation didn't throw }) }) + +describe('POST /api/v1/remote/index — public repo sharing (Phase 126)', () => { + afterEach(() => { + delete process.env.GITSEMA_PUBLIC_AUTO_INDEX + }) + + it('rejects a regular user registering a brand-new public repo when the gate is off', async () => { + delete process.env.GITSEMA_PUBLIC_AUTO_INDEX + const rawDb = getRawDb() + const alice = createUser(rawDb, 'alice-gate-off', 'pw') + const token = createSession(rawDb, alice.id).token + + const res = await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${token}`) + .send({ repoUrl: 'https://github.com/example/gate-off-new-public.git', visibility: 'public' }) + .expect(403) + + expect(res.body.error).toMatch(/allowPublicAutoIndex/) + }) + + it('allows a regular user to register a brand-new public repo when the gate is on', async () => { + process.env.GITSEMA_PUBLIC_AUTO_INDEX = 'true' + const rawDb = getRawDb() + const alice = createUser(rawDb, 'alice-gate-on', 'pw') + const token = createSession(rawDb, alice.id).token + + const res = await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${token}`) + .send({ repoUrl: 'https://github.com/example/gate-on-new-public.git', visibility: 'public' }) + .expect(202) + + expect(res.body.repoId).toBeTruthy() + }) + + it('an operator (no req.userId) bypasses the first-index gate for a new public repo', async () => { + delete process.env.GITSEMA_PUBLIC_AUTO_INDEX + const res = await request(app) + .post('/api/v1/remote/index') + .send({ repoUrl: 'https://github.com/example/operator-new-public.git', visibility: 'public' }) + .expect(202) + + expect(res.body.repoId).toBeTruthy() + }) + + it('auto-grants read access (attach-as-reader) to a non-owner user on an existing public repo', async () => { + const rawDb = getRawDb() + const owner = createUser(rawDb, 'owner-attach', 'pw') + const repoUrl = 'https://github.com/example/public-attach-repo.git' + const normalizedUrl = normalizeRepoUrl(repoUrl) + const repoId = deriveRepoId(normalizedUrl) + registerPersistedRepo(getRegistrySession(), { + id: repoId, + name: 'public-attach-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + }) + registerPersistedRepo(getActiveSession(), { + id: repoId, + name: 'public-attach-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + ownerUserId: owner.id, + }) + + const reader = createUser(rawDb, 'reader-attach', 'pw') + const readerToken = createSession(rawDb, reader.id).token + + expect(resolveUserRepoAccess(rawDb, reader.id, repoId)).toBeUndefined() + + await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${readerToken}`) + .send({ repoUrl, repoId }) + .expect(202) + + expect(resolveUserRepoAccess(rawDb, reader.id, repoId)).toBe('read') + const grants = listGrants(rawDb, repoId) + const autoGrant = grants.find((g) => g.userId === reader.id) + expect(autoGrant?.source).toBe('auto-public') + }) + + it('does not downgrade an existing higher-role grant when attaching as reader', async () => { + const rawDb = getRawDb() + const owner = createUser(rawDb, 'owner-nodowngrade', 'pw') + const repoUrl = 'https://github.com/example/public-nodowngrade-repo.git' + const normalizedUrl = normalizeRepoUrl(repoUrl) + const repoId = deriveRepoId(normalizedUrl) + registerPersistedRepo(getRegistrySession(), { + id: repoId, + name: 'public-nodowngrade-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + }) + registerPersistedRepo(getActiveSession(), { + id: repoId, + name: 'public-nodowngrade-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + ownerUserId: owner.id, + }) + + const writer = createUser(rawDb, 'writer-nodowngrade', 'pw') + const writerToken = createSession(rawDb, writer.id).token + const { createGrant } = await import('../src/core/auth/grants.js') + createGrant(rawDb, { userId: writer.id, repoId, role: 'write', grantedBy: owner.id }) + + await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${writerToken}`) + .send({ repoUrl, repoId }) + .expect(202) + + expect(resolveUserRepoAccess(rawDb, writer.id, repoId)).toBe('write') + }) + + it('throttles a non-owner re-index trigger on a public repo within the configured interval', async () => { + process.env.GITSEMA_MIN_REINDEX_INTERVAL_SECONDS = '3600' + const rawDb = getRawDb() + const owner = createUser(rawDb, 'owner-throttle', 'pw') + const repoUrl = 'https://github.com/example/public-throttle-repo.git' + const normalizedUrl = normalizeRepoUrl(repoUrl) + const repoId = deriveRepoId(normalizedUrl) + registerPersistedRepo(getRegistrySession(), { + id: repoId, + name: 'public-throttle-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + }) + registerPersistedRepo(getActiveSession(), { + id: repoId, + name: 'public-throttle-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + ownerUserId: owner.id, + }) + + const caller = createUser(rawDb, 'caller-throttle', 'pw') + const callerToken = createSession(rawDb, caller.id).token + + await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${callerToken}`) + .send({ repoUrl, repoId }) + .expect(202) + + const res = await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${callerToken}`) + .send({ repoUrl, repoId }) + .expect(429) + + expect(res.headers['retry-after']).toBeTruthy() + delete process.env.GITSEMA_MIN_REINDEX_INTERVAL_SECONDS + }) + + it('does not throttle the repo owner re-indexing their own public repo', async () => { + process.env.GITSEMA_MIN_REINDEX_INTERVAL_SECONDS = '3600' + const rawDb = getRawDb() + const owner = createUser(rawDb, 'owner-no-throttle', 'pw') + const ownerToken = createSession(rawDb, owner.id).token + const repoUrl = 'https://github.com/example/public-owner-no-throttle-repo.git' + const normalizedUrl = normalizeRepoUrl(repoUrl) + const repoId = deriveRepoId(normalizedUrl) + registerPersistedRepo(getRegistrySession(), { + id: repoId, + name: 'public-owner-no-throttle-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + }) + registerPersistedRepo(getActiveSession(), { + id: repoId, + name: 'public-owner-no-throttle-repo', + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public', + ownerUserId: owner.id, + }) + + await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${ownerToken}`) + .send({ repoUrl, repoId }) + .expect(202) + + await request(app) + .post('/api/v1/remote/index') + .set('Authorization', `Bearer ${ownerToken}`) + .send({ repoUrl, repoId }) + .expect(202) + + delete process.env.GITSEMA_MIN_REINDEX_INTERVAL_SECONDS + }) +}) From d4364414cd0372091cd3567b45beada508e2f4cb Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 12:24:31 +0000 Subject: [PATCH 2/4] refactor: simplify public-repo-sharing throttle/grant logic and test fixtures Hoist the duplicated non-owner-on-public-repo guard into a single named boolean, add TTL cleanup for the reindex throttle map (matching the existing jobs map pattern), and extract repeated test fixture setup into a shared helper. Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_01BYNogpqPb8oyTF3B5Ebu5p --- src/server/routes/remote.ts | 23 ++++-- tests/remoteIndexPersistence.test.ts | 110 +++++++-------------------- 2 files changed, 42 insertions(+), 91 deletions(-) diff --git a/src/server/routes/remote.ts b/src/server/routes/remote.ts index 73e2855..b508bd3 100644 --- a/src/server/routes/remote.ts +++ b/src/server/routes/remote.ts @@ -342,6 +342,9 @@ function checkAndRecordReindexThrottle(repoId: string, userId: number, cwd?: str } } _lastReindexTriggerAt.set(key, now) + // Once the throttle window elapses the entry is dead weight — same TTL-cleanup + // pattern as the `_jobs` map above, sized to this call's own interval. + setTimeout(() => { _lastReindexTriggerAt.delete(key) }, minIntervalMs).unref() return 0 } @@ -616,10 +619,16 @@ export function remoteRouter(options: RemoteRouterOptions): Router { return } + // 2d and 2e share this precondition: an authenticated caller who is + // not the owner, acting on an already-registered public repo. + const isNonOwnerOnExistingPublicRepo = Boolean( + existing && existingAuth?.visibility === 'public' && req.userId !== undefined && req.userId !== existingAuth.ownerUserId, + ) + // --- 2d. Refresh throttle for re-indexing an existing public repo // by a non-owner caller (Phase 126 §4.4) ------------------------------- - if (existing && existingAuth?.visibility === 'public' && req.userId !== undefined && req.userId !== existingAuth.ownerUserId) { - const retryAfter = checkAndRecordReindexThrottle(existing.id, req.userId) + if (isNonOwnerOnExistingPublicRepo) { + const retryAfter = checkAndRecordReindexThrottle(existing!.id, req.userId!) if (retryAfter > 0) { res.setHeader('Retry-After', String(retryAfter)) res.status(429).json({ error: 'Re-index triggered too recently for this repo', retryAfter }) @@ -632,17 +641,17 @@ export function remoteRouter(options: RemoteRouterOptions): Router { // (Phase 126 §4.3 / public-repo-sharing — "auto-public" provenance) --- const activeRawDb = getActiveSession().rawDb if ( - existing && existingAuth?.visibility === 'public' && req.userId !== undefined && req.userId !== existingAuth.ownerUserId && - resolveUserRepoAccess(activeRawDb, req.userId, existing.id) === undefined + isNonOwnerOnExistingPublicRepo && + resolveUserRepoAccess(activeRawDb, req.userId!, existing!.id) === undefined ) { // Only issue the auto-grant when the user holds no applicable grant // yet — createGrant() would otherwise overwrite a pre-existing // higher-role (write/owner) all-branches grant with 'read'. createGrant(activeRawDb, { - userId: req.userId, - repoId: existing.id, + userId: req.userId!, + repoId: existing!.id, role: 'read', - grantedBy: existingAuth.ownerUserId ?? req.userId, + grantedBy: existingAuth!.ownerUserId ?? req.userId!, source: 'auto-public', }) } diff --git a/tests/remoteIndexPersistence.test.ts b/tests/remoteIndexPersistence.test.ts index b3d1aa0..d35e150 100644 --- a/tests/remoteIndexPersistence.test.ts +++ b/tests/remoteIndexPersistence.test.ts @@ -59,6 +59,28 @@ const mockProvider: EmbeddingProvider = { dimensions: 4, } +/** + * Registers an existing public repo in both the registry DB and the active DB, + * mirroring the dual-write the route handler performs (ownerUserId only lands + * in the active DB — see CLAUDE.md's two-database note). + */ +function registerPublicRepoFixture(repoUrl: string, name: string, ownerUserId: number) { + const normalizedUrl = normalizeRepoUrl(repoUrl) + const repoId = deriveRepoId(normalizedUrl) + const base = { + id: repoId, + name, + url: repoUrl, + normalizedUrl, + clonePath: getRepoClonePath(repoId), + dbPath: getRepoDbPath(repoId), + visibility: 'public' as const, + } + registerPersistedRepo(getRegistrySession(), base) + registerPersistedRepo(getActiveSession(), { ...base, ownerUserId }) + return repoId +} + let app: ReturnType beforeAll(() => { @@ -251,27 +273,7 @@ describe('POST /api/v1/remote/index — public repo sharing (Phase 126)', () => const rawDb = getRawDb() const owner = createUser(rawDb, 'owner-attach', 'pw') const repoUrl = 'https://github.com/example/public-attach-repo.git' - const normalizedUrl = normalizeRepoUrl(repoUrl) - const repoId = deriveRepoId(normalizedUrl) - registerPersistedRepo(getRegistrySession(), { - id: repoId, - name: 'public-attach-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - }) - registerPersistedRepo(getActiveSession(), { - id: repoId, - name: 'public-attach-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - ownerUserId: owner.id, - }) + const repoId = registerPublicRepoFixture(repoUrl, 'public-attach-repo', owner.id) const reader = createUser(rawDb, 'reader-attach', 'pw') const readerToken = createSession(rawDb, reader.id).token @@ -294,27 +296,7 @@ describe('POST /api/v1/remote/index — public repo sharing (Phase 126)', () => const rawDb = getRawDb() const owner = createUser(rawDb, 'owner-nodowngrade', 'pw') const repoUrl = 'https://github.com/example/public-nodowngrade-repo.git' - const normalizedUrl = normalizeRepoUrl(repoUrl) - const repoId = deriveRepoId(normalizedUrl) - registerPersistedRepo(getRegistrySession(), { - id: repoId, - name: 'public-nodowngrade-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - }) - registerPersistedRepo(getActiveSession(), { - id: repoId, - name: 'public-nodowngrade-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - ownerUserId: owner.id, - }) + const repoId = registerPublicRepoFixture(repoUrl, 'public-nodowngrade-repo', owner.id) const writer = createUser(rawDb, 'writer-nodowngrade', 'pw') const writerToken = createSession(rawDb, writer.id).token @@ -335,27 +317,7 @@ describe('POST /api/v1/remote/index — public repo sharing (Phase 126)', () => const rawDb = getRawDb() const owner = createUser(rawDb, 'owner-throttle', 'pw') const repoUrl = 'https://github.com/example/public-throttle-repo.git' - const normalizedUrl = normalizeRepoUrl(repoUrl) - const repoId = deriveRepoId(normalizedUrl) - registerPersistedRepo(getRegistrySession(), { - id: repoId, - name: 'public-throttle-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - }) - registerPersistedRepo(getActiveSession(), { - id: repoId, - name: 'public-throttle-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - ownerUserId: owner.id, - }) + const repoId = registerPublicRepoFixture(repoUrl, 'public-throttle-repo', owner.id) const caller = createUser(rawDb, 'caller-throttle', 'pw') const callerToken = createSession(rawDb, caller.id).token @@ -382,27 +344,7 @@ describe('POST /api/v1/remote/index — public repo sharing (Phase 126)', () => const owner = createUser(rawDb, 'owner-no-throttle', 'pw') const ownerToken = createSession(rawDb, owner.id).token const repoUrl = 'https://github.com/example/public-owner-no-throttle-repo.git' - const normalizedUrl = normalizeRepoUrl(repoUrl) - const repoId = deriveRepoId(normalizedUrl) - registerPersistedRepo(getRegistrySession(), { - id: repoId, - name: 'public-owner-no-throttle-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - }) - registerPersistedRepo(getActiveSession(), { - id: repoId, - name: 'public-owner-no-throttle-repo', - url: repoUrl, - normalizedUrl, - clonePath: getRepoClonePath(repoId), - dbPath: getRepoDbPath(repoId), - visibility: 'public', - ownerUserId: owner.id, - }) + const repoId = registerPublicRepoFixture(repoUrl, 'public-owner-no-throttle-repo', owner.id) await request(app) .post('/api/v1/remote/index') From 1dbccbbb30daa061c5e79ee9c7748de13d6dd4b5 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 12:48:52 +0000 Subject: [PATCH 3/4] docs: require simplify passes to log deferred findings in feature-ideas.md Adds a CLAUDE.md rule so skipped /simplify findings aren't lost to chat history, and backfills the two deferred findings from the Phase 126/127 public-repo-sharing simplify pass. Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_01BYNogpqPb8oyTF3B5Ebu5p --- CLAUDE.md | 9 +++++++++ docs/feature-ideas.md | 47 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index fb8752e..964c51e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -27,6 +27,15 @@ When implementing a new feature or phase: --- +## Simplify passes + +When running a `/simplify` pass, any finding that gets **skipped** (out of scope, +too large a refactor, judged not worth doing now) must be written up as an entry +in **`docs/feature-ideas.md`** rather than just mentioned in the chat summary — +otherwise the finding is lost once the session ends. + +--- + ## Releases & changesets This repo uses [changesets](https://github.com/changesets/changesets) for versioning, `CHANGELOG.md` generation, and npm publishing (OIDC trusted publishing — no npm token). diff --git a/docs/feature-ideas.md b/docs/feature-ideas.md index d042620..19a2f02 100644 --- a/docs/feature-ideas.md +++ b/docs/feature-ideas.md @@ -2,7 +2,7 @@ This document tracks upcoming feature ideas that are **not yet in active development** (not in `PLAN.md`) and haven't been **fully designed** (no design file). It's a staging area for "what now?" questions and medium-term product direction. -**Last updated:** 2026-06-23 (added audit log coverage enforcement idea, found during the Phase 125 `/simplify` review; refined public-repo sharing's access-control half into `docs/public-repo-sharing-plan.md` and the superadmin-locked model set idea into `docs/locked-model-set-plan.md`; kept cross-repo blob dedup as an open idea) +**Last updated:** 2026-06-23 (added the public-repo-sharing throttle/policy-extraction idea, found during the Phase 126/127 `/simplify` review; added audit log coverage enforcement idea, found during the Phase 125 `/simplify` review; refined public-repo sharing's access-control half into `docs/public-repo-sharing-plan.md` and the superadmin-locked model set idea into `docs/locked-model-set-plan.md`; kept cross-repo blob dedup as an open idea) **Audience:** Developers considering next phases; product planning > **Note:** As of this update, the LSP/MCP remote-delegation foundation this @@ -473,6 +473,51 @@ shapes (none chosen yet): --- +## Public Repo Sharing: Throttle/Rate-Limit Unification & Policy Extraction + +### Problem +- Found during the `/simplify` review of Phase 126/127 (public-repo-sharing). + `checkAndRecordReindexThrottle()` (`src/server/routes/remote.ts`) is a + bespoke per-`(repoId, userId)` cooldown map, separate from the generic + abuse-prevention rate limiter in `src/server/middleware/rateLimiter.ts` + (`express-rate-limit`, keyed by Bearer token/IP, fixed window). They serve + different concerns today — one is a global RPM cap, the other a + business-rule re-index cooldown — but having two independent + rate-limiting mechanisms in the same route module is worth revisiting if + a third throttle-shaped requirement shows up. +- The same review flagged that the 2c/2d/2e public-repo gate/throttle/grant + logic in the `POST /api/v1/remote/index` handler (first-index gate, + refresh throttle, attach-as-reader auto-grant) could be extracted into a + single `applyPublicRepoPolicy()` function for readability, but that was + judged too large a restructuring for a cleanup pass on already-shipped + code. + +### Intended Behavior +No design committed yet. Two independent, optional follow-ups: +- If a third per-key throttle need appears, consider whether a shared + generic "keyed cooldown" utility (used by both `rateLimiter.ts` and + `checkAndRecordReindexThrottle`) is worth building, vs. keeping them + separate as distinct concerns. +- Extract the public-repo gate/throttle/grant sequence in + `src/server/routes/remote.ts` into a named `applyPublicRepoPolicy()` (or + similar) function once it grows another condition or gets touched again, + rather than as a standalone refactor now. + +### Design Gaps +- [ ] Whether a generic keyed-cooldown abstraction is worth the indirection + given only one current caller (`checkAndRecordReindexThrottle`). +- [ ] Where the line is for "policy extraction" — at what point does the + 2c/2d/2e sequence justify its own function vs. staying inline. + +### Effort Estimate +- Small either way — both are isolated, mechanical refactors with existing + test coverage to verify against. + +### Prerequisites +- None — both are optional cleanups on already-shipped Phase 126/127 code. + +--- + ## Related Issues & Documents - **Parity tracking:** See `docs/parity.md` for tool availability across interfaces From fb8728b2241b25c6f25b8db4d65a45ebd1350a97 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 12:51:43 +0000 Subject: [PATCH 4/4] docs: note send_later/CI-webhook gaps for PR babysitting in this environment Records that this environment lacks send_later and that CI success doesn't trigger a webhook, so PR-watching sessions need a manual ~7-minute poll loop instead. Co-Authored-By: Claude Sonnet 4.6 Claude-Session: https://claude.ai/code/session_01BYNogpqPb8oyTF3B5Ebu5p --- CLAUDE.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 964c51e..7da8723 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -36,6 +36,21 @@ otherwise the finding is lost once the session ends. --- +## PR babysitting in this environment + +The `send_later` tool (claude-code-remote MCP server) is **not available** in +this environment. CI turning green also does **not** trigger a webhook event — +only CI failures, new review comments, and similar activity do. This means a +subscribed PR can sit at "CI passed" indefinitely with no event to notice it. + +When babysitting/watching a PR here, compensate by running a short polling +wait (e.g. a backgrounded `sleep ~7m` via Bash `run_in_background`, repeated as +needed) and re-checking CI status (`pull_request_read` → `get_status` / +`get_check_runs`) after each wait, instead of relying on `send_later` or +webhook delivery alone. + +--- + ## Releases & changesets This repo uses [changesets](https://github.com/changesets/changesets) for versioning, `CHANGELOG.md` generation, and npm publishing (OIDC trusted publishing — no npm token).