fix(review-pr): handle large PRs by filtering non-reviewable files by cblecker · Pull Request #47 · cblecker/claude-plugins

cblecker · 2026-06-17T22:41:15Z

Summary

Split Phase 1 into a pipeline: discover filenames, categorize into vendor/generated/reviewable, then fetch merge-base patches only for reviewable files — prevents token limit failures on large PRs
Add git fetch to ensure merge-base matches GitHub's PR diff; narrow allowed-tools to read-only git and jq commands
Restore file status metadata in review context (gracefully optional)
Broaden generated-file patterns (*_generated.go, *.pb.go, *_string.go, bindata.go) and short-circuit when all files are excluded

Test plan

Run /review on a small PR (<10 files) — verify status shows in file summary and diff headings
Run /review on a large PR with vendor files — verify vendor files are excluded and excludedFileSummary appears in agent context
Run /review on a PR where local HEAD matches — verify isLocal path uses git diff with merge-base
Run /review on a PR where local HEAD does not match — verify not-isLocal path uses get_files + jq

Summary by CodeRabbit

Chores
- Version bumped to 1.1.1
Documentation
- Enhanced PR review skill instructions with detailed multi-phase review workflow
- Added support for local and non-local PR review scenarios with improved file categorization and deduplication of existing review comments

coderabbitai · 2026-06-17T22:41:27Z

Warning

Review limit reached

@cblecker, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 36 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7c73e7ed-cfde-438a-9e64-7e70ff725a8e

📥 Commits

Reviewing files that changed from the base of the PR and between d8c558b and cfe127d.

📒 Files selected for processing (3)

pr-review-toolkit/.claude-plugin/plugin.json
pr-review-toolkit/skills/review-pr/SKILL.md
pr-review-toolkit/skills/review-pr/review-pr.js

📝 Walkthrough

Walkthrough

The review-pr skill is expanded into a multi-phase procedure that detects whether the PR is checked out locally, collects and categorizes changed files (vendor/generated/reviewable), extracts merge-base patches via git diff or jq-filtered get_files responses, deduplicates findings against existing review comments, and passes a richer argument set to review-pr.js. The JS context builder is updated to match the new file shape and inject excluded-file notes. The plugin manifest is bumped to 1.1.1.

Changes

review-pr Skill Overhaul

Layer / File(s)	Summary
Phase 1: isLocal detection, file collection, and categorization `pr-review-toolkit/skills/review-pr/SKILL.md`	`allowed-tools` gains `git fetch`, `git diff`, and `jq`. Phase 1 parses the PR URL, compares `head.sha` with `git rev-parse HEAD` to set `isLocal`, collects changed filenames via local `git diff --name-only` or paginated `get_files`, and categorizes files into vendor/generated/reviewable while building `excludedFileSummary`.
Phase 1: merge-base patch extraction `pr-review-toolkit/skills/review-pr/SKILL.md`	Adds patch extraction: locally a single `git diff` over all reviewable files parsed into `{ filename, patch, status }`; non-locally a dynamically constructed `jq` filter over saved `get_files` pages producing the same shape.
Phase 2 args and Phase 3 deduplication `pr-review-toolkit/skills/review-pr/SKILL.md`	Phase 2 Workflow invocation now passes `headSha`, `isLocal`, `changedFiles` (reviewable-only with patches), and `excludedFileSummary`. Phase 3 fetches existing review comments and deduplicates findings by file/approx-line/description similarity before user selection.
`buildContext()` updates `pr-review-toolkit/skills/review-pr/review-pr.js`	`fileSummary` drops additions/deletions counts; `diffSection` header is reshaped to include optional status; `excludedNote` is conditionally built from `config.excludedFileSummary` and injected into the review context output.
Plugin version bump `pr-review-toolkit/.claude-plugin/plugin.json`	`version` incremented from `1.1.0` to `1.1.1`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

cblecker/claude-plugins#45: Introduced the workflow-based pr-review-toolkit plugin; this PR's changes to SKILL.md and review-pr.js are direct follow-on modifications to that foundation.
cblecker/claude-plugins#46: Also modifies buildContext() in review-pr.js for the "Changed files"/diff review context representation, overlapping at the same code path changed here.

Poem

🐇 Hoppity-hop through the diff we go,
Local or remote, the patches flow,
Vendor files? Excluded with care,
Deduplication handled with flair,
A version bumped, the skill complete—
This bunny's review is oh so neat! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main objective: implementing file filtering to handle large PRs by excluding non-reviewable files, which directly aligns with the core changes across all modified files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/pr-review-large-prs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

pr-review-toolkit/skills/review-pr/SKILL.md (2)
104-113: ⚡ Quick win

Clarify the dynamic jq filter construction for large reviewable sets.

Line 112 states "For large reviewable sets, build the jq filter from the reviewable filename list," but the mechanism is unspecified. For a reviewer or maintainer implementing this step, clarify the exact approach:

How is the list of filenames converted into a jq filter condition? (e.g., concatenate into a chain of or .filename == "...", use an array input, build a regex?)

Should the filter construction avoid exceeding any command-line length limits?

While this may be acceptable as an agent-level implementation detail, explicit guidance here would improve consistency and debuggability.
💡 Example clarification for jq filter construction
For large reviewable sets, construct a jq filter by:
1. Joining reviewable filenames with `" or "` and `.filename == "..."` 
   e.g., `.filename == "file1" or .filename == "file2" or ...`
2. Wrapping the filter as: `jq '[.[] | select(<filter>) | {filename, patch, status}]' <saved-response-file>`
3. If the filter exceeds command-line limits, process pages sequentially 
   or split the reviewable set into batches.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pr-review-toolkit/skills/review-pr/SKILL.md` around lines 104 - 113, The
documentation in Step 3 of the SKILL.md file (the section describing jq filter
construction for extracting file data) lacks concrete implementation details for
handling large reviewable sets. Add explicit guidance explaining how to build
the dynamic jq filter by constructing a select() condition that joins the
reviewable filenames with "or .filename ==" operators. Additionally, include
guidance on handling potential command-line length limits by either processing
the reviewable set in batches or handling pages sequentially. This will clarify
the mechanism referenced in the final sentence and provide clear, actionable
steps for maintainers implementing this review workflow.
132-137: ⚡ Quick win

Define quantitative thresholds for deduplication matching.

Phase 3 deduplication (lines 132–137) references matching by file, "approximate line," and "similar description," but these criteria lack precise definitions:

What constitutes an "approximate line" match? (e.g., ±5 lines, same line number only, ignore line numbers entirely?)

What metric defines "similar description"? (e.g., string similarity threshold, keyword matching, semantic distance?)

Without explicit thresholds, different implementations of this step may deduplicate inconsistently, potentially missing duplicates or false-matching unrelated comments.
💡 Example deduplication criteria definition
### Deduplication criteria
Two findings are considered duplicates if:
1. **File match**: Both reference the same file
2. **Line match**: Line numbers are within ±5 lines (or exact match preferred)
3. **Description match**: Description text shares > 70% character-level similarity 
   or contains the same capitalized keywords (case-insensitive match on 5+ char words)

Apply filters in order; all three must match to drop a finding as duplicate.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pr-review-toolkit/skills/review-pr/SKILL.md` around lines 132 - 137, The
Phase 3 deduplication section lacks precise definitions for "approximate line"
and "similar description" matching criteria. Define concrete quantitative
thresholds for each deduplication component: specify the line number tolerance
(e.g., ±5 lines or exact match), provide a measurable similarity metric for
descriptions (e.g., character-level similarity percentage or keyword matching
rules), and clearly state that all three criteria (file, line, description) must
match before a finding is dropped as a duplicate. Replace the vague language in
the get_review_comments and get_reviews sections with these explicit thresholds.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pr-review-toolkit/skills/review-pr/SKILL.md`:
- Around line 104-113: The documentation in Step 3 of the SKILL.md file (the
section describing jq filter construction for extracting file data) lacks
concrete implementation details for handling large reviewable sets. Add explicit
guidance explaining how to build the dynamic jq filter by constructing a
select() condition that joins the reviewable filenames with "or .filename =="
operators. Additionally, include guidance on handling potential command-line
length limits by either processing the reviewable set in batches or handling
pages sequentially. This will clarify the mechanism referenced in the final
sentence and provide clear, actionable steps for maintainers implementing this
review workflow.
- Around line 132-137: The Phase 3 deduplication section lacks precise
definitions for "approximate line" and "similar description" matching criteria.
Define concrete quantitative thresholds for each deduplication component:
specify the line number tolerance (e.g., ±5 lines or exact match), provide a
measurable similarity metric for descriptions (e.g., character-level similarity
percentage or keyword matching rules), and clearly state that all three criteria
(file, line, description) must match before a finding is dropped as a duplicate.
Replace the vague language in the get_review_comments and get_reviews sections
with these explicit thresholds.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 153c4568-3bfe-4dae-9a80-040af5046c9d

📥 Commits

Reviewing files that changed from the base of the PR and between 9543da2 and d8c558b.

📒 Files selected for processing (3)

pr-review-toolkit/.claude-plugin/plugin.json
pr-review-toolkit/skills/review-pr/SKILL.md
pr-review-toolkit/skills/review-pr/review-pr.js

…fore fetching patches The review toolkit failed on PRs with hundreds of files because it tried to pass full patch content for every file through the workflow args, exceeding token limits. Split Phase 1 into a pipeline: discover filenames, categorize into vendor/generated/reviewable, then fetch merge-base patches only for reviewable files. This keeps authoritative diffs in the agent context (preventing hallucination) while dropping the volume that caused failures. Other changes: - Add git fetch to ensure merge-base matches GitHub's PR diff - Add excludedFileSummary so agents know what was filtered - Narrow allowed-tools to read-only git and jq commands - Forbid python for JSON extraction Assisted-by: Claude:claude-opus-4-6

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

cblecker force-pushed the fix/pr-review-large-prs branch from d8c558b to bfe7e77 Compare June 17, 2026 22:49

cblecker force-pushed the fix/pr-review-large-prs branch from bfe7e77 to cfe127d Compare June 17, 2026 22:58

cblecker merged commit baaf598 into main Jun 17, 2026
10 checks passed

cblecker deleted the fix/pr-review-large-prs branch June 17, 2026 22:58

coderabbitai Bot mentioned this pull request Jun 17, 2026

refactor(review-pr): delegate diff collection to workflow agents #49

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(review-pr): handle large PRs by filtering non-reviewable files#47

fix(review-pr): handle large PRs by filtering non-reviewable files#47
cblecker merged 1 commit into
mainfrom
fix/pr-review-large-prs

cblecker commented Jun 17, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cblecker commented Jun 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cblecker commented Jun 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading