perf(agtype): arena passthrough + binary-direct id extraction#2424
Open
jrgemignani wants to merge 1 commit intoapache:masterfrom
Open
perf(agtype): arena passthrough + binary-direct id extraction#2424jrgemignani wants to merge 1 commit intoapache:masterfrom
jrgemignani wants to merge 1 commit intoapache:masterfrom
Conversation
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and other sort/comparator-bound queries. Two complementary optimizations (A1 arena, A2 fast-id) plus the sanitizer issues uncovered while verifying the change. A1 — per-tuple agtype arena (Pattern B applied): Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset / agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns: disposable per-call arenas for one-shot tree builds, and a long-lived shared arena (file-static, parented under CacheMemoryContext) for hot inner loops that would otherwise pay AllocSetCreate cost per call. Apply to compare_agtype_scalar_containers: when the comparator deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy, route those allocations into a shared arena and reset (O(1)) at the end of the comparison instead of recursively pfree'ing the tree (O(N)). Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from this comparator; the arena collapses that walk to one MemoryContextReset. A2 — binary-direct id extraction: When compare_agtype_scalar_containers compares two VERTEX or two EDGE values, only the graphid 'id' determines order (compare_agtype_scalar_values ignores all other fields). Master IC4 spent 73.48% inclusive in ag_deserialize_composite called from this comparator, building the entire agtype_value tree (label, properties, nested values) just to read one int64. Add extract_composite_id_fast: walks the binary VERTEX/EDGE container directly and reads the int64 id at field index 0 (always 'id' due to length-sorted key ordering established by uniqueify_agtype_object — the same invariant PR apache#2302 leveraged for direct array indexing). Wired into compare_agtype_scalar_containers as a fast path before the existing arena+slow path. Falls through on cross-type comparisons (VERTEX vs EDGE), PATH, and malformed extended-type entries. ASAN/UBSan fixes (10 issues found while verifying A1+A2 under -fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing). The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c) was independently fixed upstream by PR apache#2423; this branch is rebased on top of that fix and no longer carries a duplicate change. - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE included the 4-byte varlena header but copy started past it, reading VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged 4 times in installcheck. - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving payload 4-byte- but not 8-byte-aligned). UB under strict-alignment rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites), agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c (write_graphid). - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at WAGT_END_ARRAY where the iterator does not populate *val. Replaced with explicit raw_scalar bool stack tracked across BEGIN/END pairs. Hash output unchanged. - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with if (len > 0). Performance (SF3 LDBC, paired same-session vs master): IC sum: 201,930 -> 157,129 ms = -22.19% IC4: 50,570 -> 7,615 ms = -84.94% IC5: 21,779 -> 20,646 ms = -5.20% IS, IU: within noise (<= +/-3% on sub-second queries) Tests: 34/34 installcheck on production PG18.3 34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors, zero heap-buffer-overflows; the original audit found 10 unique runtime issues plus 1 HBO with 4 instances, all addressed here or upstream) Co-authored-by: Claude <[email protected]> modified: src/backend/utils/adt/agtype.c modified: src/backend/utils/adt/agtype_ext.c modified: src/backend/utils/adt/agtype_raw.c modified: src/backend/utils/adt/agtype_util.c modified: src/include/utils/agtype.h
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and other sort/comparator-bound queries. Two complementary optimizations (A1 arena, A2 fast-id) plus the sanitizer issues uncovered while verifying the change.
A1 — per-tuple agtype arena (Pattern B applied):
Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
disposable per-call arenas for one-shot tree builds, and a long-lived
shared arena (file-static, parented under CacheMemoryContext) for hot
inner loops that would otherwise pay AllocSetCreate cost per call.
Apply to compare_agtype_scalar_containers: when the comparator
deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
route those allocations into a shared arena and reset (O(1)) at the
end of the comparison instead of recursively pfree'ing the tree (O(N)).
Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
this comparator; the arena collapses that walk to one MemoryContextReset.
A2 — binary-direct id extraction:
When compare_agtype_scalar_containers compares two VERTEX or two EDGE
values, only the graphid 'id' determines order (compare_agtype_scalar_values
ignores all other fields). Master IC4 spent 73.48% inclusive in
ag_deserialize_composite called from this comparator, building the entire
agtype_value tree (label, properties, nested values) just to read one int64.
Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
directly and reads the int64 id at field index 0 (always 'id' due to
length-sorted key ordering established by uniqueify_agtype_object — the
same invariant PR #2302 leveraged for direct array indexing).
Wired into compare_agtype_scalar_containers as a fast path before the
existing arena+slow path. Falls through on cross-type comparisons
(VERTEX vs EDGE), PATH, and malformed extended-type entries.
ASAN/UBSan fixes (10 issues found while verifying A1+A2 under -fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing). The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c) was independently fixed upstream by PR #2423; this branch is rebased on top of that fix and no longer carries a duplicate change.
Performance (SF3 LDBC, paired same-session vs master):
IC sum: 201,930 -> 157,129 ms = -22.19%
IC4: 50,570 -> 7,615 ms = -84.94%
IC5: 21,779 -> 20,646 ms = -5.20%
IS, IU: within noise (<= +/-3% on sub-second queries)
Tests:
34/34 installcheck on production PG18.3
34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
zero heap-buffer-overflows; the original audit found 10 unique
runtime issues plus 1 HBO with 4 instances, all addressed here
or upstream)
Co-authored-by: Claude [email protected]
modified: src/backend/utils/adt/agtype.c
modified: src/backend/utils/adt/agtype_ext.c
modified: src/backend/utils/adt/agtype_raw.c
modified: src/backend/utils/adt/agtype_util.c
modified: src/include/utils/agtype.h