fix: prevent infinite loop in label propagation community detection#1576
Open
Saltasm wants to merge 1 commit into
Open
fix: prevent infinite loop in label propagation community detection#1576Saltasm wants to merge 1 commit into
Saltasm wants to merge 1 commit into
Conversation
label_propagation() used synchronous label updates (the next round's community map was built solely from the previous round's) inside `while True` with no iteration cap. For any pair of nodes connected by parallel RELATES_TO edges (edge_count > 1), the plurality gate (candidate_rank > 1) passes for both nodes, so each adopts the other's label every round: the labels swap forever, no_change is never true, and build_communities() spins at 100% CPU without returning. Reproduced live on a 461-entity graph containing duplicate parallel edges. Switch to in-place (asynchronous) label updates so nodes visited later in a round see labels already assigned this round - asynchronous label propagation converges on such configurations - and cap the loop at 100 iterations as a backstop. Tie-break semantics are unchanged: plurality by summed edge count, ties to the larger community index, and the candidate_rank > 1 gate versus max(candidate, current). Adds unit tests covering the parallel-edge regression (run under a timeout guard so a regression fails instead of hanging CI), basic merge/convergence behavior, isolated nodes, and disconnected components. Co-Authored-By: Claude Fable 5 <[email protected]>
Contributor
|
All contributors have signed the CLA ✍️ ✅ |
Author
|
I have read the CLA Document and I hereby sign the CLA behalf on myself, e-mail: [email protected] |
Author
|
I have read the CLA Document and I hereby sign the CLA |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
label_propagation()ingraphiti_core/utils/maintenance/community_operations.pycan loop forever, pinning a CPU core and never returning frombuild_communities().The algorithm uses synchronous label updates — each round's
new_community_mapis computed entirely from the previous round'scommunity_map— insidewhile Truewith no iteration cap. For any pair of nodes connected by parallelRELATES_TOedges (edge_count > 1, which the projection query produces whenever duplicate edges exist between two entities), the plurality gatecandidate_rank > 1passes for both nodes simultaneously, so A adopts B's label while B adopts A's label, every round, forever.no_changeis never true and the loop never exits.Reproduced live (graphiti-core 0.29.1, same code on main): a 461-entity graph containing duplicate parallel edges caused
build_communities()to spin at 100% CPU (pure-Python, no I/O) indefinitely.Minimal reproduction against the current code:
Fix
MAX_LABEL_PROPAGATION_ITERATIONS = 100as a backstop so no configuration can hang the process.Tie-break semantics are unchanged: plurality by summed edge count, ties broken toward the larger community index, and the existing
candidate_rank > 1gate versusmax(candidate, current).Type of Change
Objective
N/A (bug fix).
Testing
New
tests/utils/maintenance/test_community_operations.py:edge_count=2must terminate and merge into one community. The test runslabel_propagationunder a daemon-thread timeout so a regression fails CI instead of hanging it. Verified to fail against the pre-fix implementation and pass with the fix.pytest tests/utils/maintenance -k "not _int": 88 passed.ruffandpyrightclean on the changed files.Breaking Changes
Cluster outputs can differ from the previous implementation in cases where the old code did terminate (any label propagation variant is iteration-order sensitive), but the algorithm, tie-breaks, and output shape are the same.
Checklist
make lintpasses)Related Issues
N/A