fix(search): parenthesize group_id OR-chain in multi-group fulltext queries#1581
Open
Saltasm wants to merge 1 commit into
…ueries
The Neo4j fulltext query builders join group filters with OR, then append the
query terms with AND, without parenthesizing the OR-chain:
    group_id:"a" OR group_id:"b" OR group_id:"c" AND (terms)
Lucene's classic query parser treats AND as binding the immediately adjacent
clauses, so this parses as:
    group_id:"a"   SHOULD
    group_id:"b"   SHOULD
    group_id:"c"   MUST   <- promoted by the following AND
    (terms)        MUST
A BooleanQuery with any MUST clause requires only the MUST clauses, so the query
collapses to (group_id:"c" AND terms): groups a and b become scoring-only and the
terms are effectively scoped to just the last group. A search over multiple groups
silently returns matches from only the last group_id passed (and zero rows when
that last group is empty), while the semantic/cosine leg masks it.
Fix: wrap the OR-chain in parens so the terms apply across all groups:
    (group_id:"a" OR group_id:"b" OR group_id:"c") AND (terms)
Applied to both Neo4j builders that have the issue:
- graphiti_core/search/search_utils.py (fulltext_query, neo4j branch)
- graphiti_core/driver/neo4j/operations/search_ops.py (_build_neo4j_fulltext_query)
Single-group queries are unaffected (a AND terms parses correctly); FalkorDB and
Kuzu builders already parenthesize / use different syntax.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Problem
The Neo4j fulltext query builders join group filters with OR, then append the query terms with AND, without parenthesizing the OR-chain:

    group_id:"a" OR group_id:"b" OR group_id:"c" AND (terms)

Lucene's classic query parser treats AND as promoting its adjacent clauses to MUST, so this parses as:

    group_id:"a"   SHOULD
    group_id:"b"   SHOULD
    group_id:"c"   MUST (promoted by the following AND)
    (terms)        MUST

A BooleanQuery with any MUST clause requires only the MUST clauses, so this effectively becomes (group_id:"c" AND terms): the leading groups become scoring-only and the query terms are scoped to just the last group_id. A multi-group fulltext search therefore silently returns matches from only the last group passed (and zero rows when that group is empty), while the semantic leg of hybrid search masks the regression.

Fix
Wrap the OR-chain in parentheses so the terms apply across all groups:

    (group_id:"a" OR group_id:"b" OR group_id:"c") AND (terms)
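As a rough illustration of the change, here is a minimal sketch of a builder that emits the parenthesized form. The name build_group_scoped_query and its signature are hypothetical and are not the code in either touched file; the sketch also assumes the terms are already sanitized for Lucene and that group ids contain no double quotes.

```python
from __future__ import annotations


def build_group_scoped_query(terms: str, group_ids: list[str] | None) -> str:
    """Hypothetical helper: scope already-sanitized Lucene terms to any of group_ids."""
    if not group_ids:
        # No group filter requested: search the terms on their own.
        return f"({terms})"

    # Join the per-group filters with OR ...
    group_filter = " OR ".join(f'group_id:"{g}"' for g in group_ids)

    # ... and parenthesize the whole chain, so the trailing AND applies to the
    # chain as a unit instead of promoting only its last member to MUST.
    return f"({group_filter}) AND ({terms})"


if __name__ == "__main__":
    print(build_group_scoped_query("report", ["a", "b", "c"]))
```

Parenthesizing unconditionally keeps the single-group case on the same code path; the extra parentheses around a lone group_id clause do not change how it parses.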
Applied to both Neo4j builders that have the issue:
- graphiti_core/search/search_utils.py: fulltext_query (neo4j branch)
- graphiti_core/driver/neo4j/operations/search_ops.py: _build_neo4j_fulltext_query

Single-group queries are unaffected (group_id:"a" AND (terms) already parses correctly). FalkorDB and Kuzu builders are unaffected: they already parenthesize / use different syntax.

Verification
With the parenthesized query, an identical multi-group search returns the same cross-group result set regardless of the order group_ids are passed; without it, results collapse to whichever group_id is passed last.
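The order dependence can be reproduced without a database using a toy model of the clause promotion described in this PR. This is only a sketch under stated assumptions: it does not call Lucene or Neo4j, every name in it (assign_occurs, search, unparenthesized, parenthesized, DOCS) is hypothetical, and it models only flat top-level clauses with a default OR operator, where AND makes both of its neighbours MUST.

```python
from __future__ import annotations

from typing import Callable

Doc = dict[str, str]
Pred = Callable[[Doc], bool]


def in_group(group_id: str) -> Pred:
    return lambda doc: doc["group_id"] == group_id


def has_term(term: str) -> Pred:
    return lambda doc: term in doc["text"].split()


def any_of(*preds: Pred) -> Pred:
    # Models a parenthesized OR-chain: one nested clause that matches
    # when any of its members matches.
    return lambda doc: any(p(doc) for p in preds)


def assign_occurs(clauses: list[tuple[str, Pred]]) -> list[tuple[str, Pred]]:
    """Toy model: each clause arrives with the conjunction that precedes it.

    An AND marks the new clause MUST and promotes the previous clause to MUST;
    anything else leaves the new clause SHOULD.
    """
    out: list[list] = []
    for conj, pred in clauses:
        if conj == "AND":
            if out:
                out[-1][0] = "MUST"
            out.append(["MUST", pred])
        else:
            out.append(["SHOULD", pred])
    return [(occur, pred) for occur, pred in out]


def search(clauses: list[tuple[str, Pred]], docs: list[Doc]) -> list[Doc]:
    occurs = assign_occurs(clauses)
    musts = [p for o, p in occurs if o == "MUST"]
    shoulds = [p for o, p in occurs if o == "SHOULD"]
    if musts:
        # With any MUST clause present, SHOULD clauses no longer gate matching.
        return [d for d in docs if all(p(d) for p in musts)]
    return [d for d in docs if any(p(d) for p in shoulds)]


def unparenthesized(group_ids: list[str], term: str) -> list[tuple[str, Pred]]:
    return [("OR", in_group(g)) for g in group_ids] + [("AND", has_term(term))]


def parenthesized(group_ids: list[str], term: str) -> list[tuple[str, Pred]]:
    grouped = any_of(*(in_group(g) for g in group_ids))
    return [("OR", grouped), ("AND", has_term(term))]


DOCS: list[Doc] = [
    {"group_id": "a", "text": "alpha report"},
    {"group_id": "b", "text": "beta report"},
    {"group_id": "c", "text": "gamma report"},
]


def groups_hit(clauses: list[tuple[str, Pred]]) -> list[str]:
    return sorted(d["group_id"] for d in search(clauses, DOCS))


if __name__ == "__main__":
    for order in (["a", "b", "c"], ["c", "b", "a"]):
        print(order, groups_hit(unparenthesized(order, "report")),
              groups_hit(parenthesized(order, "report")))
```

In this model the unparenthesized form only ever hits the group passed last, while the parenthesized form hits all three groups in either order. A check against a real Neo4j fulltext index would still be needed to confirm the behaviour end to end.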