| name | Duplicate Code Detector |
|---|---|
| description | Identifies duplicate code patterns across the codebase and suggests refactoring opportunities |
| true | |
| permissions | |
| safe-outputs | |
| timeout-minutes | 15 |
Analyze code to identify duplicated patterns using semantic analysis. Report significant findings that require refactoring.
Detect and report code duplication by:
- Analyzing Recent Commits: Review changes in the latest commits
- Detecting Duplicated Code: Identify similar or duplicated code patterns using semantic analysis
- Reporting Findings: Create a detailed issue if significant duplication is detected (threshold: >10 lines or 3+ similar patterns)
Current context:
- Repository: ${{ github.repository }}
- Commit ID: ${{ github.event.head_commit.id }}
- Triggered by: @${{ github.actor }}
Identify and analyze modified files:
- Determine files changed in the recent commits using `git log` and `git diff`
- Focus on source code files (programming language files)
- Exclude test files from analysis (files matching patterns: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or located in directories named `test`, `tests`, `__tests__`, or `spec`)
- Exclude generated files and build artifacts
- Exclude workflow files from analysis (files under `.github/workflows/*`)
- Use code exploration tools to understand file structure
- Read modified file contents to examine changes
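The exclusion rules above can be sketched as a small path filter. This is a minimal illustration, not part of the workflow itself: the function name `is_excluded` and the constant names are hypothetical, and generated files and build artifacts are not covered because the text gives no patterns for them.

```python
import fnmatch
from pathlib import PurePosixPath

# File-name patterns and directory names taken from the exclusion rules above.
TEST_FILE_PATTERNS = ("*_test.*", "*.test.*", "*.spec.*", "test_*.*")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_excluded(path: str) -> bool:
    """Return True if a repository-relative path should be skipped."""
    p = PurePosixPath(path)
    # Test files identified by their file name.
    if any(fnmatch.fnmatchcase(p.name, pat) for pat in TEST_FILE_PATTERNS):
        return True
    # Files located anywhere under a test directory.
    if TEST_DIR_NAMES.intersection(p.parts[:-1]):
        return True
    # Workflow files under .github/workflows/.
    if p.parts[:2] == (".github", "workflows"):
        return True
    return False
```

A caller could apply this to the file list produced by `git diff --name-only` and keep only the paths for which it returns `False`.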
Apply analysis to find duplicates:
Pattern Search:
- Search for duplication indicators using grep and code search:
- Similar function signatures
- Repeated logic blocks
- Similar variable naming patterns
- Near-identical code blocks
- Look for functions with similar names across different files
- Identify structural similarities in code organization
Semantic Analysis:
- Compare code blocks for logical similarity beyond textual matching
- Identify different implementations of the same functionality
- Look for copy-paste patterns with minor variations
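One way to look past textual differences is to normalize identifiers and literals before comparing. The sketch below assumes the code under analysis is Python and uses the standard `tokenize` and `difflib` modules; the helper names `normalize` and `similarity` are hypothetical, and an agent may well use a different technique in practice.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry no logic and are ignored for comparison.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(source: str) -> list[str]:
    """Replace identifiers with ID and literals with LIT, keep everything else."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two snippets after normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

Two functions that differ only in variable and function names normalize to the same token sequence, so they are flagged even though a plain text diff would show every line as changed.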
Assess findings to identify true code duplication:
Duplication Types:
- Exact Duplication: Identical code blocks in multiple locations
- Structural Duplication: Same logic with minor variations (different variable names, etc.)
- Functional Duplication: Different implementations of the same functionality
- Copy-Paste Programming: Similar code blocks that could be extracted into shared utilities
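Exact duplication, the simplest of these types, can be found by hashing fixed-size windows of lines. The following is a rough sketch under stated assumptions: the function name `find_exact_duplicates`, the `files` mapping of path to contents, and the default window of 5 lines are all illustrative choices, not part of the workflow.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, str], window: int = 5):
    """Map the hash of each repeated block to its (path, start line) locations.

    Blank lines are ignored and leading/trailing whitespace is stripped, so
    re-indented copies still match.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        # Keep (line number, stripped text) for every non-blank line.
        lines = [
            (n, line.strip())
            for n, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(s for _, s in lines[i:i + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((path, lines[i][0]))
    # Only blocks seen more than once are duplicates.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Structural and functional duplication need a looser comparison than a hash, since a single renamed variable changes the digest.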
Assessment Criteria:
- Severity: Amount of duplicated code (lines of code, number of occurrences)
- Impact: Where duplication occurs (critical paths, frequently called code)
- Maintainability: How duplication affects code maintainability
- Refactoring Opportunity: Whether duplication can be easily refactored
Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run). Each pattern should get its own issue to enable focused remediation.
When to Create Issues:
- Only create issues if significant duplication is found (threshold: >10 lines of duplicated code OR 3+ instances of similar patterns)
- Create one issue per distinct duplication pattern - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Use the `create_issue` tool from safe-outputs MCP once for each pattern
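The reporting rules above amount to a filter followed by a top-3 cut. A minimal sketch follows, assuming a hypothetical `Pattern` record; ranking by duplicated lines times occurrences is an illustrative assumption, since the text does not define how "most significant" is measured.

```python
from dataclasses import dataclass

MIN_LINES = 10        # report when more than 10 lines are duplicated
MIN_OCCURRENCES = 3   # ... or when there are 3 or more similar instances
MAX_ISSUES = 3        # at most 3 issues per run


@dataclass
class Pattern:
    name: str
    duplicated_lines: int
    occurrences: int


def is_significant(p: Pattern) -> bool:
    """Apply the '>10 lines OR 3+ instances' threshold."""
    return p.duplicated_lines > MIN_LINES or p.occurrences >= MIN_OCCURRENCES


def select_for_issues(patterns: list[Pattern]) -> list[Pattern]:
    """Keep significant patterns and return the top MAX_ISSUES of them."""
    significant = [p for p in patterns if is_significant(p)]
    # Assumed ranking: total duplicated volume, largest first.
    significant.sort(key=lambda p: p.duplicated_lines * p.occurrences, reverse=True)
    return significant[:MAX_ISSUES]
```

Each pattern returned by `select_for_issues` would then correspond to exactly one `create_issue` call.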
Issue Contents for Each Pattern:
- Executive Summary: Brief description of this specific duplication pattern
- Duplication Details: Specific locations and code blocks for this pattern only
- Severity Assessment: Impact and maintainability concerns for this pattern
- Refactoring Recommendations: Suggested approaches to eliminate this pattern
- Code Examples: Concrete examples with file paths and line numbers for this pattern
Report these patterns:
- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components
Skip these patterns:
- Standard boilerplate code (imports, exports, package declarations)
- Test setup/teardown code (acceptable duplication in tests)
- All test files (files matching: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or in `test/`, `tests/`, `__tests__/`, `spec/` directories)
- All workflow files (files under `.github/workflows/*`)
- Configuration files with similar structure
- Language-specific patterns (constructors, getters/setters)
- Small code snippets (<5 lines) unless highly repetitive
- Generated code or vendored dependencies
Analysis scope:
- Primary Focus: Files changed in recent commits (excluding test files and workflow files)
- Secondary Analysis: Check for duplication with existing codebase
- Cross-Reference: Look for patterns across the repository
- Historical Context: Consider if duplication is new or existing
For each distinct duplication pattern found, create a separate issue using this structure:
# 🔍 Duplicate Code Detected: [Pattern Name]
*Analysis of commit ${{ github.event.head_commit.id }}*
**Assignee**: @copilot
## Summary
[Brief overview of this specific duplication pattern]
## Duplication Details
### Pattern: [Description]
- **Severity**: High/Medium/Low
- **Occurrences**: [Number of instances]
- **Locations**:
- `path/to/file1.ext` (lines X-Y)
- `path/to/file2.ext` (lines A-B)
- **Code Sample**:
````[language]
[Example of duplicated code]
````
## Impact Analysis
- **Maintainability**: [How this affects code maintenance]
- **Bug Risk**: [Potential for inconsistent fixes]
- **Code Bloat**: [Impact on codebase size]
## Refactoring Recommendations
- [Recommendation 1]
  - Extract common functionality to: `suggested/path/utility.ext`
  - Estimated effort: [hours/complexity]
  - Benefits: [specific improvements]
- [Recommendation 2]
[... additional recommendations ...]
## Implementation Checklist
- Review duplication findings
- Prioritize refactoring tasks
- Create refactoring plan
- Implement changes
- Update tests
- Verify no functionality broken
## Analysis Metadata
- **Analyzed Files**: [count]
- **Detection Method**: Semantic code analysis
- **Commit**: ${{ github.event.head_commit.id }}
- **Analysis Date**: [timestamp]
## Operational Guidelines
### Security
- Never execute untrusted code or commands
- Only use read-only analysis tools
- Do not modify files during analysis
### Efficiency
- Focus on recently changed files first
- Use semantic analysis for meaningful duplication, not superficial matches
- Stay within timeout limits (balance thoroughness with execution time)
### Accuracy
- Verify findings before reporting
- Distinguish between acceptable patterns and true duplication
- Consider language-specific idioms and best practices
- Provide specific, actionable recommendations
### Issue Creation
- Create **one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Only create issues if significant duplication is found
- Include sufficient detail for coding agents to understand and act on findings
- Provide concrete examples with file paths and line numbers
- Suggest practical refactoring approaches
- Assign issue to @copilot for automated remediation
- Use descriptive titles that clearly identify the specific pattern (e.g., "Duplicate Code: Error Handling Pattern in Parser Module")
**Objective**: Improve code quality by identifying and reporting meaningful code duplication that impacts maintainability. Focus on actionable findings that enable automated or manual refactoring.