Attribute duplicate-code messages to involved modules, not the last-checked one#10880

Draft
Pierre-Sassoulas wants to merge 5 commits into main from fix-2368

Conversation

@Pierre-Sassoulas
Member

Type of Changes

Type
🐛 Bug fix

Description

R0801 messages emitted in SimilaritiesChecker.close() were attributed
to whichever module happened to be checked last, because add_message()
without a node falls back to linter.current_name. Now the checker
saves a module-to-filepath mapping during process_module() and sets the
correct module context before each add_message() call.
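
A minimal sketch of the behavior described above, not the actual diff. ToyLinter and ToySimilaritiesChecker are hypothetical stand-ins for PyLinter and SimilaritiesChecker, and letting the checker set current_name in process_module() is a simplification made so the example stays self-contained.

```python
class ToyLinter:
    """Hypothetical stand-in for PyLinter; it only tracks the current module."""

    def __init__(self):
        self.current_name = None
        self.current_file = None
        self.messages = []

    def add_message(self, msgid):
        # With no node, the location falls back to the current module context.
        self.messages.append((msgid, self.current_name, self.current_file))


class ToySimilaritiesChecker:
    """Hypothetical stand-in for SimilaritiesChecker."""

    def __init__(self, linter):
        self.linter = linter
        self._module_filepaths = {}

    def process_module(self, name, filepath):
        # Simplification: the toy checker also updates the linter context.
        self.linter.current_name = name
        self.linter.current_file = filepath
        # Remember where each module lives so close() can restore it later.
        self._module_filepaths[name] = filepath

    def close(self, involved_modules):
        # Point the linter at an involved module before emitting the message.
        first_module = min(involved_modules)
        self.linter.current_name = first_module
        self.linter.current_file = self._module_filepaths.get(first_module)
        self.linter.add_message("R0801")


linter = ToyLinter()
checker = ToySimilaritiesChecker(linter)
for name, path in [("pkg.a", "pkg/a.py"), ("pkg.b", "pkg/b.py"), ("pkg.z", "pkg/z.py")]:
    checker.process_module(name, path)

# The duplication involves pkg.a and pkg.b; pkg.z was merely checked last.
checker.close({"pkg.a", "pkg.b"})
print(linter.messages)  # [('R0801', 'pkg.a', 'pkg/a.py')]
```

Without the two assignments in close(), the toy message would land on pkg.z, which is the symptom reported in #2368.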

Closes #2368

Co-Authored-By: Claude Opus 4.6 [email protected]

@Pierre-Sassoulas Pierre-Sassoulas added this to the 4.0.6 milestone Mar 2, 2026
@codecov

codecov Bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.04%. Comparing base (71caace) to head (8a02431).

@@           Coverage Diff           @@
##             main   #10880   +/-   ##
=======================================
  Coverage   96.04%   96.04%           
=======================================
  Files         177      177           
  Lines       19625    19633    +8     
=======================================
+ Hits        18848    18856    +8     
  Misses        777      777           
Files with missing lines Coverage Δ
pylint/checkers/base_checker.py 95.00% <ø> (ø)
pylint/checkers/symilar.py 96.33% <100.00%> (+0.04%) ⬆️
pylint/lint/pylinter.py 96.33% <100.00%> (+0.02%) ⬆️
pylint/testutils/unittest_linter.py 100.00% <ø> (ø)

@github-actions
Contributor

github-actions Bot commented Mar 2, 2026

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 6d8df7e

Comment thread pylint/checkers/symilar.py Outdated
total = sum(len(lineset) for lineset in self.linesets)
duplicated = 0
stats = self.linter.stats
original_name = self.linter.current_name
Collaborator

Shouldn't this be current_ instead of original_?

Member Author

This is the name of the first file. current would make more sense if we were taking the file from the for loop that follows, no?

Collaborator

Is it the first file? I thought it was the last file to be linted, and therefore the "current" file according to the PyLinter class.


# Attribute the message to the first involved module rather than
# the last-checked module which may be unrelated (see #2368).
first_module = min(c[0].name for c in couples)
Collaborator

What does the min do here?

Member Author

It makes sure we pick the module whose name sorts first alphabetically, so the result is deterministic. We only raise one message per set of couples.
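
A small illustration of that point, assuming couples is a set of tuples whose first element exposes a name attribute (which is what c[0].name in the snippet implies). ToyLineSet and the module names are made up for the example.

```python
from typing import NamedTuple


class ToyLineSet(NamedTuple):
    """Hypothetical stand-in for the object stored at index 0 of each couple."""

    name: str


# A set has no guaranteed iteration order, so "the first element" is not stable.
couples = {
    (ToyLineSet("pkg.zeta"), 10, 20),
    (ToyLineSet("pkg.alpha"), 3, 13),
}

# min() compares the names as strings, so the choice does not depend on
# how the set happens to be ordered in a given run.
first_module = min(c[0].name for c in couples)
print(first_module)  # pkg.alpha
```

Taking next(iter(couples)) instead would also yield an involved module, but which one could differ between runs.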

Collaborator

Perhaps add this in the comment?

# Attribute the message to the first involved module rather than
# the last-checked module which may be unrelated (see #2368).

Does not really convey this to me

Member Author

Comment thread pylint/checkers/symilar.py Outdated
first_module = min(c[0].name for c in couples)
self.linter.current_name = first_module
self.linter.current_file = self._module_filepaths.get(first_module)
self.add_message("R0801", args=(len(couples), "\n".join(msg)))
Collaborator

This really feels like we are misusing the linter here. Should emitting a message in a different module than the current one be an API we expose?

Member Author

We have a duplicate-code message to raise in another file, and cyclic-import behaves like this too. I'm not sure there's any other way.

Collaborator

That might still be a good enough reason to make this an actual (internal-only) API rather than "patching" the current file like this?

Member Author

How about adding optional params to add_message, as in self.add_message("R0801", args=(...), module="foo", filepath="foo.py")? I'm not sure we need to be protective of this API: this is a "normal use case", and being able to lint multiple files is a distinguishing feature of pylint.
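
A rough sketch of what such optional parameters could look like. The module and filepath keywords come from the call in the comment above; OverridableLinter and everything else here is hypothetical and is not the implementation from any pylint change.

```python
class OverridableLinter:
    """Hypothetical linter whose add_message accepts location overrides."""

    def __init__(self):
        self.current_name = None
        self.current_file = None
        self.reported = []

    def add_message(self, msgid, args=None, *, module=None, filepath=None):
        # Use the explicit override when given, else the current file context.
        self.reported.append(
            {
                "msgid": msgid,
                "args": args,
                "module": module if module is not None else self.current_name,
                "path": filepath if filepath is not None else self.current_file,
            }
        )


linter = OverridableLinter()
linter.current_name = "last_checked"
linter.current_file = "last_checked.py"

# The message is attributed to foo without touching the linter's own state.
linter.add_message("R0801", args=(2, "..."), module="foo", filepath="foo.py")
# A call without overrides keeps the existing fallback behavior.
linter.add_message("R0801", args=(2, "..."))
```

Compared with assigning to current_name and current_file before the call, keyword overrides leave the linter state unchanged, so nothing needs to be restored afterwards.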

Collaborator

Makes a lot of sense to me!

Member Author

Did it in #10894. We won't be able to backport this if we need a new API for a clean fix.

Refs #10880 — allows checkers to override the reported module name
and file path in message location, instead of relying on the current
file context or node.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pierre-Sassoulas and others added 4 commits March 7, 2026 20:36
…hecked one

R0801 messages emitted in SimilaritiesChecker.close() were attributed
to whichever module happened to be checked last, because add_message()
without a node falls back to linter.current_name. Now the checker
saves a module→filepath mapping during process_module() and sets the
correct module context before each add_message() call.

Closes #2368

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as draft March 7, 2026 19:42
@Pierre-Sassoulas Pierre-Sassoulas self-assigned this Mar 7, 2026
Pierre-Sassoulas added a commit that referenced this pull request Mar 7, 2026
Refs #10880 — allows checkers to override the reported module name
and file path in message location, instead of relying on the current
file context or node.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pierre-Sassoulas added a commit that referenced this pull request Mar 7, 2026
Covers the new `module` and `filepath` parameters that allow
overriding the reported message location.

Refs #10880

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Labels

Bug 🪲 duplicate-code Related to code duplication checker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

duplicate-code are always counted on the last module checked

2 participants