5 changes: 5 additions & 0 deletions doc/whatsnew/4/4.1/index.rst
@@ -12,6 +12,11 @@
Summary -- Release highlights
=============================

The duplicate-code checker and ``symilar`` received optimizations that
considerably improve performance and reduce memory use on larger
codebases. For example, analyzing pandas went from 20 min to 55 s, and
pylint is no longer OOM-killed when analyzing cpython.

The required ``astroid`` version is now 4.1.1. See the
`astroid changelog <https://pylint.readthedocs.io/projects/astroid/en/latest/changelog.html#what-s-new-in-astroid-4-1-0>`_
for additional fixes, features, and performance improvements applicable to pylint.
14 changes: 14 additions & 0 deletions doc/whatsnew/fragments/10881.performance
@@ -0,0 +1,14 @@
Sped up the ``duplicate-code`` checker. When run inside pylint, the
checker now reuses the already-parsed AST instead of re-parsing every
file, as it still has to do when launched via ``symilar``. It also uses a
rolling hash window with caching across file pairs. Additionally, the
hash-matching phase now switches algorithms at a threshold, which avoids
a quadratic blow-up that previously caused the checker to hang on files
with many repeated lines.

The speedup scales with codebase size, from 1.5x on small projects
(~10k lines) to 20x on large ones (500k+ lines). Memory usage also
drops by 12-27%. Codebases that previously hung or were OOM-killed can
now complete.

Refs #10881
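
For readers unfamiliar with the term, the "rolling hash window" mentioned in the fragment can be illustrated with a minimal sketch. This is not pylint's implementation: the names ``window_hashes`` and ``common_windows``, the constants ``BASE`` and ``MOD``, and the default window size are all hypothetical, chosen only to show the general technique of hashing every run of ``k`` consecutive lines in O(1) per step and matching windows between two files through a hash index.

```python
from collections import defaultdict

# Hypothetical constants for a polynomial rolling hash (not taken from pylint).
BASE = 1_000_003
MOD = (1 << 61) - 1


def window_hashes(lines, k):
    """Yield (start_index, hash) for every window of k consecutive lines."""
    if k <= 0 or len(lines) < k:
        return
    # Reduce each stripped line to an integer code once, up front.
    codes = [hash(line.strip()) % MOD for line in lines]
    top = pow(BASE, k - 1, MOD)  # weight of the line that leaves the window
    h = 0
    for code in codes[:k]:
        h = (h * BASE + code) % MOD
    yield 0, h
    for i in range(k, len(codes)):
        # Drop the oldest line, shift, and append the newest line.
        h = ((h - codes[i - k] * top) * BASE + codes[i]) % MOD
        yield i - k + 1, h


def common_windows(lines_a, lines_b, k=4):
    """Return (start_a, start_b) pairs where k consecutive lines are equal."""
    index = defaultdict(list)
    for start_a, h in window_hashes(lines_a, k):
        index[h].append(start_a)
    matches = []
    for start_b, h in window_hashes(lines_b, k):
        for start_a in index.get(h, ()):
            # Compare the actual lines to rule out hash collisions.
            chunk_a = [line.strip() for line in lines_a[start_a:start_a + k]]
            chunk_b = [line.strip() for line in lines_b[start_b:start_b + k]]
            if chunk_a == chunk_b:
                matches.append((start_a, start_b))
    return matches


file_a = ["x = 1", "y = 2", "z = 3", "print(x)", "print(y)"]
file_b = ["import os", "y = 2", "z = 3", "print(x)", "done()"]
print(common_windows(file_a, file_b, k=3))
```

In this sketch, a file with many repeated lines produces long lists under a single hash in ``index``, so the inner loop degrades toward quadratic behaviour; that is the kind of case a threshold-based algorithm switch is meant to guard against. The per-file window hashes also depend only on one file, so they could be computed once and reused for every pair that file takes part in.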