fix: prevent infinite loop in label propagation community detection #1576

Open
Saltasm wants to merge 1 commit into getzep:main from Saltasm:fix/label-propagation-infinite-loop

Saltasm commented Jun 11, 2026

Summary

label_propagation() in graphiti_core/utils/maintenance/community_operations.py can loop forever, pinning a CPU core and never returning from build_communities().

The algorithm uses synchronous label updates — each round's new_community_map is computed entirely from the previous round's community_map — inside while True with no iteration cap. For any pair of nodes connected by parallel RELATES_TO edges (edge_count > 1, which the projection query produces whenever duplicate edges exist between two entities), the plurality gate candidate_rank > 1 passes for both nodes simultaneously, so A adopts B's label while B adopts A's label, every round, forever. no_change is never true and the loop never exits.

Reproduced live (graphiti-core 0.29.1, same code on main): a 461-entity graph containing duplicate parallel edges caused build_communities() to spin at 100% CPU (pure-Python, no I/O) indefinitely.

Minimal reproduction against the current code:

from graphiti_core.utils.maintenance.community_operations import Neighbor, label_propagation

projection = {
    'a': [Neighbor(node_uuid='b', edge_count=2)],
    'b': [Neighbor(node_uuid='a', edge_count=2)],
}
label_propagation(projection)  # never returns
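
The swap can also be observed without graphiti installed. The following is a minimal sketch, not the library's code: it uses plain `(neighbor, edge_count)` tuples in place of `Neighbor`, the helper names `pick_label` and `synchronous_rounds` are hypothetical, and the number of rounds is fixed so the script terminates.

```python
from collections import defaultdict

# Hypothetical stand-in for the projection: node -> [(neighbor, edge_count)].
Projection = dict[str, list[tuple[str, int]]]


def pick_label(node: str, labels: dict[str, int], projection: Projection) -> int:
    """Plurality label among neighbors, using the gate described in this PR."""
    weights: dict[int, int] = defaultdict(int)
    for neighbor, edge_count in projection[node]:
        weights[labels[neighbor]] += edge_count
    # Highest summed edge count first; ties go to the larger label index.
    ranked = sorted(((w, label) for label, w in weights.items()), reverse=True)
    rank, candidate = ranked[0] if ranked else (0, -1)
    if candidate != -1 and rank > 1:
        return candidate
    return max(candidate, labels[node])


def synchronous_rounds(projection: Projection, rounds: int) -> list[dict[str, int]]:
    """Run a fixed number of synchronous rounds, recording labels after each."""
    labels = {node: i for i, node in enumerate(projection)}
    history = [dict(labels)]
    for _ in range(rounds):
        # Every node reads only the previous round's labels.
        labels = {node: pick_label(node, labels, projection) for node in projection}
        history.append(dict(labels))
    return history


projection = {"a": [("b", 2)], "b": [("a", 2)]}
history = synchronous_rounds(projection, rounds=4)
for labels in history:
    print(labels)
```

In this sketch the labels alternate between `{'a': 0, 'b': 1}` and `{'a': 1, 'b': 0}` on every round, so a "no change" check would never succeed.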

Fix

  1. Switch to in-place (asynchronous) label updates: nodes visited later in a round see the labels already assigned earlier in the same round. Asynchronous LPA converges on these symmetric configurations — the first node of the pair adopts its neighbor's label, and the neighbor then sees agreement instead of last round's stale label.
  2. Cap the loop at MAX_LABEL_PROPAGATION_ITERATIONS = 100 as a backstop so no configuration can hang the process.

Tie-break semantics are unchanged: plurality by summed edge count, ties broken toward the larger community index, and the existing candidate_rank > 1 gate versus max(candidate, current).
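
For illustration, the two changes can be sketched as below. This is a simplified approximation under stated assumptions, not the diff in this PR: the projection is modeled as plain `(neighbor, edge_count)` tuples rather than `Neighbor` objects, and `MAX_ROUNDS` and `async_label_propagation` are illustrative names.

```python
from collections import defaultdict

MAX_ROUNDS = 100  # illustrative name; plays the role of the cap described above


def async_label_propagation(
    projection: dict[str, list[tuple[str, int]]],
) -> list[list[str]]:
    """Label propagation with in-place updates and a bounded number of rounds."""
    labels = {node: i for i, node in enumerate(projection)}

    for _ in range(MAX_ROUNDS):
        changed = False
        for node, neighbors in projection.items():
            weights: dict[int, int] = defaultdict(int)
            for neighbor, edge_count in neighbors:
                # Reads the live map, so labels assigned earlier this round are visible.
                weights[labels[neighbor]] += edge_count
            # Plurality by summed edge count; ties go to the larger label index.
            ranked = sorted(((w, label) for label, w in weights.items()), reverse=True)
            rank, candidate = ranked[0] if ranked else (0, -1)
            if candidate != -1 and rank > 1:
                new_label = candidate
            else:
                new_label = max(candidate, labels[node])
            if new_label != labels[node]:
                labels[node] = new_label  # in-place update
                changed = True
        if not changed:
            break  # converged before hitting the cap

    clusters: dict[int, list[str]] = defaultdict(list)
    for node, label in labels.items():
        clusters[label].append(node)
    return list(clusters.values())


pair = {"a": [("b", 2)], "b": [("a", 2)]}
print(async_label_propagation(pair))
```

With the parallel-edge pair, `a` adopts `b`'s label first, `b` then sees a neighbor that already agrees with it, and the second round detects no change.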

Type of Change

  • Bug fix

Objective

N/A (bug fix).

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • All existing tests pass

New tests/utils/maintenance/test_community_operations.py:

  • Regression: a two-node pair with edge_count=2 must terminate and merge into one community. The test runs label_propagation under a daemon-thread timeout so a regression fails CI instead of hanging it. Verified to fail against the pre-fix implementation and pass with the fix.
  • Convergence/merge behavior for a single-edge pair and a triangle, isolated nodes, an empty projection, and disconnected components (two parallel-edge pairs forming separate communities).
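
A daemon-thread timeout guard of the kind described above can be sketched as follows. The helper name `run_with_timeout` and its default timeout are assumptions made for illustration; the test file in this PR may structure the guard differently.

```python
import threading


def run_with_timeout(func, *args, timeout=5.0):
    """Run func in a daemon thread; raise TimeoutError if it does not finish."""
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except BaseException as exc:  # hand worker errors back to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The thread cannot be killed, but as a daemon it will not block
        # interpreter exit, so the test run fails instead of hanging.
        raise TimeoutError(f"{func.__name__} did not finish within {timeout}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


# Usage: a call that returns promptly passes its result through.
print(run_with_timeout(sorted, [3, 1, 2]))
```

A regression test would wrap the call under test, for example `run_with_timeout(label_propagation, projection)`, and let the `TimeoutError` fail the test.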

pytest tests/utils/maintenance -k "not _int": 88 passed. ruff and pyright clean on the changed files.

Breaking Changes

  • This PR contains breaking changes

Cluster outputs can differ from the previous implementation in cases where the old code did terminate (any label propagation variant is iteration-order sensitive), but the algorithm, tie-breaks, and output shape are the same.

Checklist

  • Code follows project style guidelines (make lint passes)
  • Self-review completed
  • Documentation updated where necessary (inline comment explains the convergence requirement)
  • No secrets or sensitive information committed

Related Issues

N/A

label_propagation() used synchronous label updates (the next round's
community map was built solely from the previous round's) inside
`while True` with no iteration cap. For any pair of nodes connected by
parallel RELATES_TO edges (edge_count > 1), the plurality gate
(candidate_rank > 1) passes for both nodes, so each adopts the other's
label every round: the labels swap forever, no_change is never true, and
build_communities() spins at 100% CPU without returning. Reproduced live
on a 461-entity graph containing duplicate parallel edges.

Switch to in-place (asynchronous) label updates so nodes visited later
in a round see labels already assigned this round - asynchronous label
propagation converges on such configurations - and cap the loop at 100
iterations as a backstop. Tie-break semantics are unchanged: plurality
by summed edge count, ties to the larger community index, and the
candidate_rank > 1 gate versus max(candidate, current).

Adds unit tests covering the parallel-edge regression (run under a
timeout guard so a regression fails instead of hanging CI), basic
merge/convergence behavior, isolated nodes, and disconnected components.

Co-Authored-By: Claude Fable 5 <[email protected]>

zep-cla-assistant Bot commented Jun 11, 2026

Contributor

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Saltasm commented Jun 11, 2026

Author

I have read the CLA Document and I hereby sign the CLA on behalf of myself, e-mail: [email protected]

Saltasm commented Jun 11, 2026

Author

I have read the CLA Document and I hereby sign the CLA

zep-cla-assistant Bot added a commit that referenced this pull request Jun 11, 2026