fix(attribution): normalize repo identity at every write boundary#10

Merged: iaj6 merged 2 commits into main from fix/repo-normalization on Jun 8, 2026
iaj6 (Owner) commented Jun 8, 2026

What

First of the two attribution root-cause fixes — moves repo bucketing from cleanup-after to correct-at-write.

A single git repo could be persisted under several strings: a remote URL (SSH/HTTPS, ±.git), a bare owner/repo slug, or a directory basename. Because GitHub slugs are case-insensitive but case-preserving, Acme/Repo and acme/repo fragmented into separate analytics buckets. The cleanup --remap-repo backfill mopped this up after the fact; this fixes it at the source.

Change

New normalizeRepo() in core → canonical lowercase owner/name (or a lowercased single basename), strictly idempotent. Applied at every repo write boundary:

| Boundary | File |
| --- | --- |
| getCurrentRepo() (covers wrap + hooks) | cli/git.ts |
| run start --repo / job submit --repo (operator-typed) | cli/commands/run.ts, job.ts |
| SDK inbound environment.repo | web/.../sdk/runs/route.ts |
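
To make the canonical form concrete, here is a minimal sketch of what such a normalizer could look like. It is not the code from this PR: the parsing rules (scp-style SSH detection, repeated `.git` stripping, keeping the last two path segments) are assumptions chosen to match the behavior described above, and idempotency is only claimed for the ordinary input shapes listed.

```typescript
// Hypothetical sketch of a repo-identity normalizer; not the PR's actual code.
function normalizeRepo(input: string): string {
  let s = input.trim();

  // scp-style SSH remote, e.g. git@github.com:Acme/Repo.git -> Acme/Repo.git
  const scp = /^[^@\s/]+@[^:\s/]+:(.+)$/.exec(s);
  if (scp) {
    s = scp[1];
  } else {
    // URL remote, e.g. https://github.com/Acme/Repo: drop scheme, userinfo, host.
    s = s.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]*/i, "");
  }

  // Split into path segments, ignoring leading, trailing, and doubled slashes.
  const parts = s.split("/").filter((p) => p !== "");
  if (parts.length === 0) return ""; // e.g. a host-only URL

  // Strip a trailing ".git" (repeated, so a second pass changes nothing).
  const last = parts.length - 1;
  parts[last] = parts[last].replace(/(\.git)+$/i, "");

  // Keep at most owner/name, lowercased.
  return parts.filter((p) => p !== "").slice(-2).join("/").toLowerCase();
}
```

Under these assumptions, an SSH remote, an HTTPS remote, and a mixed-case slug for the same repository all collapse to one string, and feeding the result back in returns it unchanged.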

The SDK route now also rejects a non-empty repo that normalizes to "" (e.g. a host-only URL) instead of persisting an empty bucket. Read-side symmetry: run list / job list --repo filters normalize their input too, so a mixed-case filter still matches.
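
The inbound check and the read-side filter described here can be sketched as two small helpers. The names `checkInboundRepo` and `matchesRepoFilter`, the result shape, and the error text are hypothetical, and the normalizer is passed in as a parameter so the sketch does not depend on the real `normalizeRepo()`. Treating a missing or blank repo as "not provided" is also an assumption; the PR only states that a non-empty value normalizing to "" is rejected.

```typescript
// Hypothetical helpers; names and shapes are illustrative, not from the PR.
type Normalize = (raw: string) => string;

type InboundRepo =
  | { ok: true; repo: string | undefined }
  | { ok: false; error: string };

// Write side: normalize a caller-supplied repo, rejecting values that
// were non-empty but carry no repository identity after normalization.
function checkInboundRepo(raw: string | undefined, normalize: Normalize): InboundRepo {
  if (raw === undefined || raw.trim() === "") {
    return { ok: true, repo: undefined }; // assumption: absent repo is allowed
  }
  const repo = normalize(raw);
  if (repo === "") {
    return { ok: false, error: `repo "${raw}" does not identify a repository` };
  }
  return { ok: true, repo };
}

// Read side: stored rows are already canonical, so only the filter
// typed by the operator needs normalizing before comparison.
function matchesRepoFilter(stored: string, filter: string, normalize: Normalize): boolean {
  return stored === normalize(filter);
}
```

Normalizing only the filter keeps the comparison a plain string equality against stored values, which is what makes a mixed-case `--repo` filter match normalized rows.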

The --remap-repo backfill is kept — still the only fix for historical rows.

Decision

Canonical form = lowercase owner/name (per owner's call). Tradeoff: repo names render lowercase. Matches the existing backfill's de-facto target.

Known limitation

normalizeRepo can't merge a bare basename (agentops, from a repo with no git remote) into iaj6/agentops — there's no owner to recover. Repos with a remote (the trial) get owner/name and dedupe fully.
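
The limitation is easy to see by counting buckets. The sketch below is illustrative only: `countBuckets` is a hypothetical helper, and a plain lowercasing function stands in for `normalizeRepo()`, which is enough for the slug and basename inputs used here.

```typescript
// Hypothetical helper: group raw repo strings by their normalized key.
function countBuckets(
  rawRepos: string[],
  normalize: (raw: string) => string,
): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const raw of rawRepos) {
    const key = normalize(raw);
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}

// Stand-in normalizer (assumption): lowercasing only.
const lower = (raw: string): string => raw.toLowerCase();

const buckets = countBuckets(
  ["Acme/Repo", "acme/repo", "agentops", "iaj6/agentops"],
  lower,
);
// The two Acme spellings share one bucket, but "agentops" and
// "iaj6/agentops" stay separate because the basename has no owner.
console.log([...buckets.keys()]);
```

Merging the bare basename into the owner-qualified bucket needs information that is not in the string itself, which is why it is left to the backfill rather than to normalization.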

Tests

normalizeRepo unit suite (URL/SSH/slug/basename/idempotency/dedup/empty), getCurrentRepo lowercasing, CLI run start/job submit write-path normalization, SDK inbound normalize + empty-repo rejection. 1241 pass, lint clean, build green.

Review

Adversarial 2-lens review (completeness/consumer-safety + correctness). It caught two real gaps, now fixed: the run start and job submit write paths were initially missed. It also confirmed that no consumer depends on the raw case-preserving value: --remap-repo seeds raw history; pr/link derive repo from gh, not environment.repo; and analytics only group by the stored string.

Part 1 of 2 (attribution root causes). PR A (NULL user_id) follows.

🤖 Generated with Claude Code

iaj6 and others added 2 commits June 8, 2026 15:08
Same git repo could be persisted under several strings — a remote URL
(SSH/HTTPS, ±.git), a bare owner/repo slug, or a directory basename — and
since GitHub slugs are case-insensitive but case-preserving, Acme/Repo and
acme/repo fragmented into separate analytics buckets. This is the
correct-at-write half of the attribution work; the cleanup --remap-repo
backfill stays for historical rows.

Adds normalizeRepo() in core (canonical lowercase owner/name, idempotent)
and applies it at ALL repo write boundaries:
- cli/git.ts getCurrentRepo() (covers wrap + hooks)
- cli run start --repo and job submit --repo (operator-typed values)
- web SDK runs route (caller-supplied environment.repo), which also now
  rejects a non-empty repo that normalizes to "" (e.g. a host-only URL)
  rather than persisting an empty bucket.

Read-side symmetry: run list / job list --repo filters normalize their
input too, so a mixed-case filter still matches normalized rows.

Tests: normalizeRepo unit suite (URL/SSH/slug/basename/idempotency/dedup),
getCurrentRepo lowercasing, CLI run-start/job-submit write-path
normalization, and SDK inbound normalize + empty-repo rejection. Full
suite 1241 pass, lint clean, build green.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The wrap command's auto-detected repo already normalizes via getCurrentRepo,
but the explicit `wrap --repo <value>` override bypassed it and persisted the
raw string — the same write-boundary gap fixed for run start / job submit.
Wrap the whole `opts.repo ?? getCurrentRepo()` expression in normalizeRepo
(idempotent, so the auto-detected path is unaffected).
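
The shape of this fix can be sketched as follows. `resolveWrapRepo` and the option type are hypothetical, and both `getCurrentRepo` and the normalizer are passed in as parameters (with an assumed `string` return type) so the example stands alone.

```typescript
// Hypothetical sketch of the wrap-command fix: normalize the whole
// fallback expression so the explicit override cannot bypass it.
function resolveWrapRepo(
  opts: { repo?: string },
  getCurrentRepo: () => string,
  normalize: (raw: string) => string,
): string {
  // `??` falls back only when opts.repo is null or undefined, so an
  // explicit --repo value always wins and is normalized like any other.
  return normalize(opts.repo ?? getCurrentRepo());
}
```

Because the auto-detected value is already canonical, normalizing it a second time is a no-op, so one call site covers both paths.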

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@iaj6 iaj6 merged commit 51c24e5 into main Jun 8, 2026
3 checks passed
@iaj6 iaj6 deleted the fix/repo-normalization branch June 8, 2026 20:36