feat(node): subtree content withholding (Phase 3) for #18#28
feat(node): subtree content withholding (Phase 3) for #18#28beardthelion wants to merge 12 commits into
Conversation
…al clone upload_pack_excluding emitted a v2 packfile section, but info_refs advertises v0, so real clients negotiated v0 and rejected the response with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape instead (NAK, then the pack via side-band-64k when offered). Add an end-to-end test that stands up info_refs + upload_pack_excluding and runs a real git partial clone, asserting the withheld blob's bytes never reach the client while its tree entry and SHA stay visible. A stock full clone cannot consume the pack (it is not closed under reachability, so fetch fails the connectivity check); a partial clone is required.
…tion choice Add a real-git test that partial-clones, pushes a new commit server-side, then fetches: the new object arrives and the withheld blob stays absent. This pins down that ignoring have/want negotiation (always sending a self-contained pack of all refs minus withheld, with NAK) is correct for both clone and fetch; the only cost is a fetch re-sends the full object set. Refactor the real-git tests onto a shared server harness and document the negotiation decision in code and in the plan's follow-ups.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR implements blob-level withholding for Git smart-HTTP ChangesSubtree Blob Withholding on Upload-Pack
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
crates/gitlawb-node/src/api/repos.rs (1)
398-411: ⚡ Quick winConsider fast-path when no subtree rules apply.
withheld_blob_oidsrunsgit ls-tree -rfor every ref on every upload-pack request, even for fully public repos with no visibility rules. This is unnecessary overhead when there are no subtree restrictions.A quick check before the tree walk could skip the work entirely:
+ // Fast path: if the repo is public and has no visibility rules, no blobs are withheld. + let withheld = if record.is_public && rules.is_empty() { + std::collections::HashSet::new() + } else { + visibility_pack::withheld_blob_oids( + &disk_path, + &rules, + record.is_public, + &record.owner_did, + caller, + ) + .map_err(|e| AppError::Git(e.to_string()))? + };Alternatively, push this optimization into
withheld_blob_oidsitself.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 398 - 411, Currently every upload-pack calls visibility_pack::withheld_blob_oids which does expensive git tree walks even when no subtree rules apply; short-circuit this by checking before the call (e.g., if record.is_public is true AND rules is empty/no subtree rules) and skip calling withheld_blob_oids, directly calling smart_http::upload_pack, or alternatively add an early-return fast-path inside withheld_blob_oids itself to return an empty Vec when rules indicate no filtering; update the code paths around withheld_blob_oids and the subsequent conditional that uses withheld.is_empty() (the smart_http::upload_pack vs upload_pack_excluding branch) so behavior is unchanged when filtering is needed.crates/gitlawb-node/src/git/smart_http.rs (1)
128-169: ⚖️ Poor tradeoffBlocking I/O in async context.
build_filtered_packuses synchronousstd::process::Commandbut is called from the asyncupload_pack_excluding. On large repositories,git rev-listandgit pack-objectscan take significant time, blocking the tokio worker thread.Consider wrapping in
tokio::task::spawn_blockingor switching totokio::process::Commandwith async I/O to avoid starving other tasks.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/smart_http.rs` around lines 128 - 169, build_filtered_pack performs blocking std::process::Command calls (git rev-list / git pack-objects) but is invoked from async upload_pack_excluding, which can block the tokio runtime; fix by moving the blocking work into a spawn_blocking closure or converting the function to async and using tokio::process::Command with async I/O. Specifically, either (A) leave build_filtered_pack as-is but call it via tokio::task::spawn_blocking from upload_pack_excluding and await the JoinHandle, or (B) change build_filtered_pack to async and replace std::process::Command usages with tokio::process::Command and async stdin/stdout handling so it no longer blocks the runtime; update return signatures and error propagation accordingly while keeping the same behavior for collecting keep oids and returning the packed Vec<u8>.crates/gitlawb-node/src/git/visibility_pack.rs (1)
25-31: ⚡ Quick winConsider adding a timeout to prevent indefinite blocking.
The synchronous
git ls-treecommand has no timeout. On a corrupted or malicious repository, this could block the async runtime thread indefinitely. Since this runs per-ref and is called from thegit_upload_packasync handler, a hung git process would stall the request.Consider wrapping with
std::process::Commandtimeout or switching to asynctokio::process::Commandwithtokio::time::timeout.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 25 - 31, The git ls-tree invocation (creating `listing` via std::process::Command with `refname` and `repo_path`) can block the async runtime; replace the synchronous call with an async spawn using tokio::process::Command and wrap the await in tokio::time::timeout (e.g., Duration::from_secs(5-10)). Specifically, change the code that builds and awaits `Command::output()` to use tokio::process::Command::new("git")...output().await inside tokio::time::timeout(...). If the timeout elapses, handle it by logging or treating it like a non-successful listing (skip this ref) and propagate/contextualize errors similarly to the existing .context("git ls-tree -r failed") semantics so callers like the async git_upload_pack handler are not blocked indefinitely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md`:
- Line 712: There is a stray fenced code block marker (```) in the markdown (the
fenced code block delimiter itself) with no language label which triggers
markdownlint MD040; fix it by either removing the stray fence or adding an
appropriate language specifier after the opening backticks (e.g., ```json,
```bash, etc.) so the fenced code block is valid and the linter warning is
resolved.
- Around line 102-108: Update the documented protocol framing from "v2 packfile
section" to the actual v0 framing used by the implementation: change the
description that mentions emitting `packfile\n`, `pkt_line`, and driving `git
upload-pack --stateless-rpc` with `GIT_PROTOCOL=version=2` to instead describe
the v0 sideband framing (sideband-64k with `0x01` for pack data, `0x02` for
progress if present, pack payload beginning with `PACK...`, pkt-line chunking
with `0x01` prefixes, and termination with the `0000` flush), and remove or
correct any references to v2-specific control lines or flags so the text matches
the current implementation's v0 contract.
---
Nitpick comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-411: Currently every upload-pack calls
visibility_pack::withheld_blob_oids which does expensive git tree walks even
when no subtree rules apply; short-circuit this by checking before the call
(e.g., if record.is_public is true AND rules is empty/no subtree rules) and skip
calling withheld_blob_oids, directly calling smart_http::upload_pack, or
alternatively add an early-return fast-path inside withheld_blob_oids itself to
return an empty Vec when rules indicate no filtering; update the code paths
around withheld_blob_oids and the subsequent conditional that uses
withheld.is_empty() (the smart_http::upload_pack vs upload_pack_excluding
branch) so behavior is unchanged when filtering is needed.
In `@crates/gitlawb-node/src/git/smart_http.rs`:
- Around line 128-169: build_filtered_pack performs blocking
std::process::Command calls (git rev-list / git pack-objects) but is invoked
from async upload_pack_excluding, which can block the tokio runtime; fix by
moving the blocking work into a spawn_blocking closure or converting the
function to async and using tokio::process::Command with async I/O.
Specifically, either (A) leave build_filtered_pack as-is but call it via
tokio::task::spawn_blocking from upload_pack_excluding and await the JoinHandle,
or (B) change build_filtered_pack to async and replace std::process::Command
usages with tokio::process::Command and async stdin/stdout handling so it no
longer blocks the runtime; update return signatures and error propagation
accordingly while keeping the same behavior for collecting keep oids and
returning the packed Vec<u8>.
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 25-31: The git ls-tree invocation (creating `listing` via
std::process::Command with `refname` and `repo_path`) can block the async
runtime; replace the synchronous call with an async spawn using
tokio::process::Command and wrap the await in tokio::time::timeout (e.g.,
Duration::from_secs(5-10)). Specifically, change the code that builds and awaits
`Command::output()` to use tokio::process::Command::new("git")...output().await
inside tokio::time::timeout(...). If the timeout elapses, handle it by logging
or treating it like a non-successful listing (skip this ref) and
propagate/contextualize errors similarly to the existing .context("git ls-tree
-r failed") semantics so callers like the async git_upload_pack handler are not
blocked indefinitely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5a58718e-8f0d-4eb9-ad32-a28944a9f957
📒 Files selected for processing (5)
crates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rsdocs/superpowers/plans/2026-06-05-phase3-subtree-content-withholding.md
Move the two blocking git shell-outs in the filtered upload-pack path off the async worker thread, matching the tokio::process / spawn_blocking usage already in this file: build_filtered_pack (rev-list + pack-objects) and withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large repo cannot stall the tokio runtime. Behavior is unchanged. Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2 packfile framing, which is the exact path that failed against a real client and was corrected to v0. The block now documents the shipped v0 contract. Drop a stray trailing code fence flagged by markdownlint (MD040). The speculative ls-tree timeout and the public/no-rules fast-path from the review are intentionally left out: the timeout guards against adversarial repos we do not yet host, and the fast-path is a micro-optimization not worth the extra branch right now.
kevincodex1
left a comment
There was a problem hiding this comment.
please remove the plans under doc. i think its not necessary to be committed
|
|
||
| Run: | ||
| ```bash | ||
| cd "$(mktemp -d)" && export FIX=$PWD |
There was a problem hiding this comment.
can we remove this docs? I think we should not commit this
kevincodex1 asked to keep the superpowers planning docs out of the repo. The Phase 3 plan was scaffolding for this change, not something the project needs to carry. Removing it leaves only the code and tests in the PR.
|
Good catch, removed it. |
Phase 1 (#25) made "private" a property of a path subtree and enforced whole-repo reads. This adds the piece Phase 1 deferred: a mode-B subtree rule now actually withholds that subtree's file content on the git read path, while the directory structure and blob SHAs stay visible.
How it works
withheld_blob_oidswalks each ref's tree and, reusing the existingvisibility_check, returns the blob OIDs a caller may not read. A blob is withheld only if it is denied at every path it appears at, so a blob that is also reachable through an allowed path is still sent.upload_pack_excludingbuilds the response pack from all reachable objects minus those blob OIDs (commits and trees are always included, so SHAs stay intact) and frames it as a protocol v0 upload-pack response, matching whatinfo_refsadvertises.git_upload_packbranches to the filtered serve only when the caller has at least one withheld blob. Public and fully authorized clones take the unchanged fast path and stay byte-identical to before.Client behavior worth knowing
A pack that omits a blob still referenced by a sent tree is not closed under reachability, so a stock full
git cloneis refused at fetch time with "remote did not send all necessary objects". A partial clone (git clone --filter=...) accepts it with the private blob absent, and the tree entry and SHA remain visible. This is a git constraint the server cannot override: the client has to opt into partial-clone semantics. A plaingit cloneworking without--filteris a separate client-side follow-up (git-remote-gitlawb).The security guarantee, that private bytes never enter the served pack, holds for every client and is what the tests assert.
Tests
withheld_blob_oidstruth table (owner / listed reader / non-reader / no rules) against real git repos.build_filtered_packexcludes the withheld OID and keeps the public one.gitpartial clone through the actualinfo_refs+upload_pack_excluding: the withheld blob's bytes never arrive, its tree entry and SHA remain.git fetchafter a partial clone: new objects arrive, the withheld blob stays absent.Full suite green, fmt and clippy clean.
Out of scope (follow-ups)
--filter.info/refsis intentionally not gated on subtree rules (refs expose commit tips only); noted in code.Refs #18.
Summary by CodeRabbit