Skip to content

fix(review-pr): handle large PRs by filtering non-reviewable files#47

Merged
cblecker merged 1 commit into
mainfrom
fix/pr-review-large-prs
Jun 17, 2026
Merged

fix(review-pr): handle large PRs by filtering non-reviewable files#47
cblecker merged 1 commit into
mainfrom
fix/pr-review-large-prs

Conversation

@cblecker

@cblecker cblecker commented Jun 17, 2026

Copy link
Copy Markdown
Owner

Summary

  • Split Phase 1 into a pipeline: discover filenames, categorize into vendor/generated/reviewable, then fetch merge-base patches only for reviewable files — prevents token limit failures on large PRs
  • Add git fetch to ensure merge-base matches GitHub's PR diff; narrow allowed-tools to read-only git and jq commands
  • Restore file status metadata in review context (gracefully optional)
  • Broaden generated-file patterns (*_generated.go, *.pb.go, *_string.go, bindata.go) and short-circuit when all files are excluded

Test plan

  • Run /review on a small PR (<10 files) — verify status shows in file summary and diff headings
  • Run /review on a large PR with vendor files — verify vendor files are excluded and excludedFileSummary appears in agent context
  • Run /review on a PR where local HEAD matches — verify isLocal path uses git diff with merge-base
  • Run /review on a PR where local HEAD does not match — verify not-isLocal path uses get_files + jq

Summary by CodeRabbit

  • Chores

    • Version bumped to 1.1.1
  • Documentation

    • Enhanced PR review skill instructions with detailed multi-phase review workflow
    • Added support for local and non-local PR review scenarios with improved file categorization and deduplication of existing review comments

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@cblecker, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 36 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7c73e7ed-cfde-438a-9e64-7e70ff725a8e

📥 Commits

Reviewing files that changed from the base of the PR and between d8c558b and cfe127d.

📒 Files selected for processing (3)
  • pr-review-toolkit/.claude-plugin/plugin.json
  • pr-review-toolkit/skills/review-pr/SKILL.md
  • pr-review-toolkit/skills/review-pr/review-pr.js
📝 Walkthrough

Walkthrough

The review-pr skill is expanded into a multi-phase procedure that detects whether the PR is checked out locally, collects and categorizes changed files (vendor/generated/reviewable), extracts merge-base patches via git diff or jq-filtered get_files responses, deduplicates findings against existing review comments, and passes a richer argument set to review-pr.js. The JS context builder is updated to match the new file shape and inject excluded-file notes. The plugin manifest is bumped to 1.1.1.

Changes

review-pr Skill Overhaul

Layer / File(s) Summary
Phase 1: isLocal detection, file collection, and categorization
pr-review-toolkit/skills/review-pr/SKILL.md
allowed-tools gains git fetch, git diff, and jq. Phase 1 parses the PR URL, compares head.sha with git rev-parse HEAD to set isLocal, collects changed filenames via local git diff --name-only or paginated get_files, and categorizes files into vendor/generated/reviewable while building excludedFileSummary.
Phase 1: merge-base patch extraction
pr-review-toolkit/skills/review-pr/SKILL.md
Adds patch extraction: locally a single git diff over all reviewable files parsed into { filename, patch, status }; non-locally a dynamically constructed jq filter over saved get_files pages producing the same shape.
Phase 2 args and Phase 3 deduplication
pr-review-toolkit/skills/review-pr/SKILL.md
Phase 2 Workflow invocation now passes headSha, isLocal, changedFiles (reviewable-only with patches), and excludedFileSummary. Phase 3 fetches existing review comments and deduplicates findings by file/approx-line/description similarity before user selection.
buildContext() updates
pr-review-toolkit/skills/review-pr/review-pr.js
fileSummary drops additions/deletions counts; diffSection header is reshaped to include optional status; excludedNote is conditionally built from config.excludedFileSummary and injected into the review context output.
Plugin version bump
pr-review-toolkit/.claude-plugin/plugin.json
version incremented from 1.1.0 to 1.1.1.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • cblecker/claude-plugins#45: Introduced the workflow-based pr-review-toolkit plugin; this PR's changes to SKILL.md and review-pr.js are direct follow-on modifications to that foundation.
  • cblecker/claude-plugins#46: Also modifies buildContext() in review-pr.js for the "Changed files"/diff review context representation, overlapping at the same code path changed here.

Poem

🐇 Hoppity-hop through the diff we go,
Local or remote, the patches flow,
Vendor files? Excluded with care,
Deduplication handled with flair,
A version bumped, the skill complete—
This bunny's review is oh so neat! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main objective: implementing file filtering to handle large PRs by excluding non-reviewable files, which directly aligns with the core changes across all modified files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/pr-review-large-prs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pr-review-toolkit/skills/review-pr/SKILL.md (2)

104-113: ⚡ Quick win

Clarify the dynamic jq filter construction for large reviewable sets.

Line 112 states "For large reviewable sets, build the jq filter from the reviewable filename list," but the mechanism is unspecified. For a reviewer or maintainer implementing this step, clarify the exact approach:

  • How is the list of filenames converted into a jq filter condition? (e.g., concatenate into a chain of or .filename == "...", use an array input, build a regex?)
  • Should the filter construction avoid exceeding any command-line length limits?

While this may be acceptable as an agent-level implementation detail, explicit guidance here would improve consistency and debuggability.

💡 Example clarification for jq filter construction
For large reviewable sets, construct a jq filter by:
1. Joining reviewable filenames with `" or "` and `.filename == "..."` 
   e.g., `.filename == "file1" or .filename == "file2" or ...`
2. Wrapping the filter as: `jq '[.[] | select(<filter>) | {filename, patch, status}]' <saved-response-file>`
3. If the filter exceeds command-line limits, process pages sequentially 
   or split the reviewable set into batches.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pr-review-toolkit/skills/review-pr/SKILL.md` around lines 104 - 113, The
documentation in Step 3 of the SKILL.md file (the section describing jq filter
construction for extracting file data) lacks concrete implementation details for
handling large reviewable sets. Add explicit guidance explaining how to build
the dynamic jq filter by constructing a select() condition that joins the
reviewable filenames with "or .filename ==" operators. Additionally, include
guidance on handling potential command-line length limits by either processing
the reviewable set in batches or handling pages sequentially. This will clarify
the mechanism referenced in the final sentence and provide clear, actionable
steps for maintainers implementing this review workflow.

132-137: ⚡ Quick win

Define quantitative thresholds for deduplication matching.

Phase 3 deduplication (lines 132–137) references matching by file, "approximate line," and "similar description," but these criteria lack precise definitions:

  • What constitutes an "approximate line" match? (e.g., ±5 lines, same line number only, ignore line numbers entirely?)
  • What metric defines "similar description"? (e.g., string similarity threshold, keyword matching, semantic distance?)

Without explicit thresholds, different implementations of this step may deduplicate inconsistently, potentially missing duplicates or false-matching unrelated comments.

💡 Example deduplication criteria definition
### Deduplication criteria
Two findings are considered duplicates if:
1. **File match**: Both reference the same file
2. **Line match**: Line numbers are within ±5 lines (or exact match preferred)
3. **Description match**: Description text shares > 70% character-level similarity 
   or contains the same capitalized keywords (case-insensitive match on 5+ char words)

Apply filters in order; all three must match to drop a finding as duplicate.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pr-review-toolkit/skills/review-pr/SKILL.md` around lines 132 - 137, The
Phase 3 deduplication section lacks precise definitions for "approximate line"
and "similar description" matching criteria. Define concrete quantitative
thresholds for each deduplication component: specify the line number tolerance
(e.g., ±5 lines or exact match), provide a measurable similarity metric for
descriptions (e.g., character-level similarity percentage or keyword matching
rules), and clearly state that all three criteria (file, line, description) must
match before a finding is dropped as a duplicate. Replace the vague language in
the get_review_comments and get_reviews sections with these explicit thresholds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pr-review-toolkit/skills/review-pr/SKILL.md`:
- Around line 104-113: The documentation in Step 3 of the SKILL.md file (the
section describing jq filter construction for extracting file data) lacks
concrete implementation details for handling large reviewable sets. Add explicit
guidance explaining how to build the dynamic jq filter by constructing a
select() condition that joins the reviewable filenames with "or .filename =="
operators. Additionally, include guidance on handling potential command-line
length limits by either processing the reviewable set in batches or handling
pages sequentially. This will clarify the mechanism referenced in the final
sentence and provide clear, actionable steps for maintainers implementing this
review workflow.
- Around line 132-137: The Phase 3 deduplication section lacks precise
definitions for "approximate line" and "similar description" matching criteria.
Define concrete quantitative thresholds for each deduplication component:
specify the line number tolerance (e.g., ±5 lines or exact match), provide a
measurable similarity metric for descriptions (e.g., character-level similarity
percentage or keyword matching rules), and clearly state that all three criteria
(file, line, description) must match before a finding is dropped as a duplicate.
Replace the vague language in the get_review_comments and get_reviews sections
with these explicit thresholds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 153c4568-3bfe-4dae-9a80-040af5046c9d

📥 Commits

Reviewing files that changed from the base of the PR and between 9543da2 and d8c558b.

📒 Files selected for processing (3)
  • pr-review-toolkit/.claude-plugin/plugin.json
  • pr-review-toolkit/skills/review-pr/SKILL.md
  • pr-review-toolkit/skills/review-pr/review-pr.js

@cblecker cblecker force-pushed the fix/pr-review-large-prs branch from d8c558b to bfe7e77 Compare June 17, 2026 22:49
…fore fetching patches

The review toolkit failed on PRs with hundreds of files because it
tried to pass full patch content for every file through the workflow
args, exceeding token limits.

Split Phase 1 into a pipeline: discover filenames, categorize into
vendor/generated/reviewable, then fetch merge-base patches only for
reviewable files. This keeps authoritative diffs in the agent context
(preventing hallucination) while dropping the volume that caused
failures.

Other changes:
- Add git fetch to ensure merge-base matches GitHub's PR diff
- Add excludedFileSummary so agents know what was filtered
- Narrow allowed-tools to read-only git and jq commands
- Forbid python for JSON extraction

Assisted-by: Claude:claude-opus-4-6
@cblecker cblecker force-pushed the fix/pr-review-large-prs branch from bfe7e77 to cfe127d Compare June 17, 2026 22:58
@cblecker cblecker merged commit baaf598 into main Jun 17, 2026
10 checks passed
@cblecker cblecker deleted the fix/pr-review-large-prs branch June 17, 2026 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant