gl clone: clean partial clone of repos with private subtrees#33
gl clone: clean partial clone of repos with private subtrees#33beardthelion wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds backend helpers and a new REST endpoint that expose per-caller withheld and reinclude path globs, and a ChangesBackend Withheld Paths Support
CLI Clone Command with Partial Clone Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/visibility.rs`:
- Around line 204-212: The handler currently returns withheld metadata without
verifying the caller can read the repository root; before calling
crate::visibility::withheld_globs or returning the JSON, perform a root-read
gate by calling the existing visibility_check (or equivalent) with path "/"
using the same auth, rules, record.is_public, and record.owner_did; if
visibility_check(..., "/") denies access, return the repo-not-found/unauthorized
response used elsewhere (same error path as other root-read failures) instead of
returning the withheld data. Ensure this check is placed after fetching rules
(list_visibility_rules) but before invoking withheld_globs and constructing the
Json response.
In `@crates/gitlawb-node/src/visibility.rs`:
- Around line 114-119: The current withheld_globs filtering uses a single
representative probe (glob_prefix) and calls visibility_check, which
misclassifies a denied parent when a more-specific child rule allows access;
update the logic in withheld_globs (or change its API contract) to detect
allow-overrides by checking for any more-specific descendant allow before
emitting the parent glob: either (A) when
visibility_check(glob_prefix(&r.path_glob)) == Decision::Deny, scan rules for
any child globs under r.path_glob that yield Decision::Allow (using
visibility_check on those child probes) and skip emitting the parent if any
exist, or (B) change the return structure to emit tuples like (denying_glob,
allowed_exceptions) so the caller can re-include allowed descendants; reference
visibility_check, glob_prefix, Decision::Deny, and the withheld_globs helper to
locate where to implement this.
In `@crates/gl/src/clone.rs`:
- Around line 133-144: The current parsing uses split_once('/') which accepts
inputs like "owner/name/extra"; update the parsing logic in clone.rs (the block
that computes owner and name from stripped) to reject any repo string containing
more than one slash: after trimming trailing slashes and calling
split_once('/'), validate that neither owner nor name contains another '/' (or
alternately check that the count of '/' is exactly one) and bail! with the same
error message if extra slashes are present so malformed inputs like
"owner/name/extra" fail fast instead of producing an invalid repo identifier.
- Around line 147-171: fetch_withheld currently swallows all errors and returns
an empty Vec, which allows clones to proceed when the withheld-paths endpoint
actually failed; change fetch_withheld to return a Result<Vec<String>,
anyhow::Error> (or your crate's error type) instead of Vec<String>, and
propagate any network/HTTP/json errors to the caller except for an explicit
“endpoint unsupported” HTTP status (e.g. 404 Not Found or 501 Not Implemented)
where you should return Ok(Vec::new()). Concretely: update fetch_withheld
signature, stop using unwrap_or_default on resp.json(), treat Err(resp) and
non-success statuses as Err unless status is 404/501 (in which case return
Ok([])), and update callers of fetch_withheld to handle the Result; refer to
load_keypair_from_dir, NodeClient::get_signed/get and the path
"/api/v1/repos/{owner}/{name}/withheld-paths" to locate the code to change.
- Around line 112-121: The current logic in clone.rs extracts the default branch
by parsing stdout from `git remote show origin` (variables `out`, `text`,
`head`) and does not check `out.status.success()`, which breaks for localized
output or non-zero exit; replace this with a refs-based lookup by running `git
symbolic-ref --short refs/remotes/origin/HEAD` (or equivalent) and check the
child process exit status (`out.status.success()`), then read and trim stdout to
produce the branch name, returning an error if the command fails or stdout is
empty instead of relying on localized `HEAD branch:` parsing.
- Around line 99-105: The loop in setup_partial_clone that converts
withheld_globs currently always strips a trailing "**" and emits only a
directory-exclude like "!{dir}/", which fails to exclude the exact path when the
upstream sent "/prefix" (no "/**"). Change the logic inside the for g in
withheld_globs loop: detect whether g originally ended with "**" (or "/**"); if
it did keep the existing behavior (emit "!{dir}/"), but if it did not then emit
two exclude lines for that dir—one for the exact path ("!{dir}") and one for the
subtree ("!{dir}/")—so the sparse-checkout encoding matches visibility_check
semantics; update/add a unit/integration test that calls setup_partial_clone (or
the surrounding code) with a withheld glob like "/docs/private" to assert both
the exact path and subtree are withheld.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 43094733-35b0-4310-8cae-ee86146445f9
📒 Files selected for processing (5)
crates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
| let (owner, name) = stripped | ||
| .trim_end_matches('/') | ||
| .split_once('/') | ||
| .context("repo must be <owner>/<name> or gitlawb://<owner>/<name>")?; | ||
| if owner.is_empty() || name.is_empty() { | ||
| bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>"); | ||
| } | ||
| Ok(( | ||
| format!("gitlawb://{owner}/{name}"), | ||
| owner.to_string(), | ||
| name.to_string(), | ||
| )) |
There was a problem hiding this comment.
Reject repo inputs with more than one slash.
split_once('/') accepts owner/name/extra, and that name then flows into /api/v1/repos/{owner}/{name}/withheld-paths, which silently degrades to [] on the fetch path. This should fail fast as malformed input instead of building an invalid repo identifier.
Suggested fix
if owner.is_empty() || name.is_empty() {
bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>");
}
+ if name.contains('/') {
+ bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>");
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let (owner, name) = stripped | |
| .trim_end_matches('/') | |
| .split_once('/') | |
| .context("repo must be <owner>/<name> or gitlawb://<owner>/<name>")?; | |
| if owner.is_empty() || name.is_empty() { | |
| bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>"); | |
| } | |
| Ok(( | |
| format!("gitlawb://{owner}/{name}"), | |
| owner.to_string(), | |
| name.to_string(), | |
| )) | |
| let (owner, name) = stripped | |
| .trim_end_matches('/') | |
| .split_once('/') | |
| .context("repo must be <owner>/<name> or gitlawb://<owner>/<name>")?; | |
| if owner.is_empty() || name.is_empty() { | |
| bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>"); | |
| } | |
| if name.contains('/') { | |
| bail!("repo must be <owner>/<name> or gitlawb://<owner>/<name>"); | |
| } | |
| Ok(( | |
| format!("gitlawb://{owner}/{name}"), | |
| owner.to_string(), | |
| name.to_string(), | |
| )) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gl/src/clone.rs` around lines 133 - 144, The current parsing uses
split_once('/') which accepts inputs like "owner/name/extra"; update the parsing
logic in clone.rs (the block that computes owner and name from stripped) to
reject any repo string containing more than one slash: after trimming trailing
slashes and calling split_once('/'), validate that neither owner nor name
contains another '/' (or alternately check that the count of '/' is exactly one)
and bail! with the same error message if extra slashes are present so malformed
inputs like "owner/name/extra" fail fast instead of producing an invalid repo
identifier.
Three fixes from the PR Gitlawb#33 review: - withheld_paths now applies the whole-repo "/" read gate (returns repo-not-found when the caller cannot read the root), matching the git read endpoints. Without it the endpoint disclosed a private repo's existence and path layout to unauthorized callers. The withheld_globs doc already assumed this gate existed; now it does. - A nested allow under a denied parent (e.g. "/secret/public/**" allowed, "/secret/**" denied) was over-withheld: the client sparse-excluded the whole parent and hid paths the caller may read. The endpoint now also returns a "reinclude" list (allowed globs strictly under a denied one) and gl clone re-includes them in the sparse spec after the excludes. - Wildcard-free globs like "/docs/private" match both the exact path and a subtree (per glob_matches), but the client only emitted the subtree exclude. sparse_patterns now emits both "/docs/private" and "/docs/private/". Verified the exclude-then-reinclude sparse ordering checks out cleanly with real git, plus unit tests for reincluded_globs, the nested re-include, the exact-path exclude, and sparse_patterns.
Pairs with #28 (subtree content withholding). That PR makes the node withhold private blob bytes from the served pack; the resulting pack is not closed under reachability, so a stock
git cloneis refused at fetch unless the user manually passes--filter. This adds the client-side piece so a non-reader gets a clean checkout with one command.What's here
GET /api/v1/repos/{owner}/{repo}/withheld-paths: returns only the path globs the (optionally authenticated) caller is denied. Not owner-gated and never exposes reader DIDs, unlikelist_visibility. Computed from the existingvisibility_checkvia a new purewithheld_globs.gl clone <gitlawb://owner/name | owner/name> [dir]: asks the node what's withheld for the caller, then clones as a promisor (git clone --filter=blob:none --no-checkout) and sparse-excludes those globs before checkout. Public files land, private paths are absent, the tree entries and SHAs stay visible, no error.A
connect-mode helper can't influence git's own clone logic, so the orchestration lives ingl clonerather than trying to make a baregit clone gitlawb://...work without flags (noted as a follow-up).Notes
withheld-pathsreturns[]andgl clonebehaves like a normal clone. It starts doing real work once feat(node): subtree content withholding (Phase 3) for #18 #28 lands.--filter=blob:noneis the exact client invocation feat(node): subtree content withholding (Phase 3) for #18 #28's node test already proved accepts the withholding pack.--no-conesparse mode is required for the negated!/dir/excludes.Test plan
cargo test -p gl -p gitlawb-node(new:withheld_globs,gl cloneorchestration against afile://bare, repo-arg parsing)/secretmode-B repo,gl cloneit and confirmpublic/present,secret/absent from the worktree, secret path + SHA still ingit ls-tree.Summary by CodeRabbit
New Features
Tests