Skip to content

Latest commit

 

History

History
236 lines (211 loc) · 18.4 KB

File metadata and controls

236 lines (211 loc) · 18.4 KB

SQL4Json Performance

What changed in 1.2.0: Memory-compaction work — the sealed JsonNumberValue / SqlNumber numeric variants with unboxed primitive storage, plus FlatRow Object[]-backed materialization for GROUP BY / HAVING / WINDOW / ORDER BY / JOIN / DISTINCT / SELECT and the engine pre-flatten path. The numbers below are the 1.2.0 release measurement, captured on 2026-04-30 against the reference machine. Re-run ./mvnw test -Plarge-tests -Dtest=ProfilingTest -Dprofiling.runs=3 on your hardware to reproduce.

How to read these numbers

Median of 3 runs; warmup excluded; wall-clock time captured via JMX. Single reference machine. Numbers will differ on other hardware — re-run locally to compare.

Single-threaded. Each query runs end-to-end on the calling thread — SQL4Json does not parallelize a single query internally. Multi-core scaling, if any, comes from running independent queries concurrently in your application.

Source: ProfilingTest.java. Regen: ./mvnw test -Plarge-tests -Dtest=ProfilingTest.

Reference environment

  • JVM: OpenJDK 64-Bit Server VM 21.0.1+12-LTS (Amazon.com Inc.)
  • OS: Windows 11 10.0 (amd64)
  • Cores: 32
  • Total RAM: 64,671 MB
  • Max heap (-Xmx): 8,192 MB
  • Initial heap (-Xms): 2,640 MB (approx, post-warmup)
  • Profiling runs: 3
  • Data seed: 20260428

Dataset

Synthetic. Generated by generate_json.py with --seed 20260428. Each record has up to 13 top-level fields (some randomly dropped) and up to depth-3 nesting (address.geo, metadata.audit, preferences.notifications).

Record counts per size:

Size Bytes Records
8 MB 8,389,001 12,239
16 MB 16,778,023 24,431
32 MB 33,554,658 48,831
64 MB 67,109,367 97,598
128 MB 134,218,195 195,334
256 MB 268,435,588 390,657
512 MB 536,871,390 781,553

Headline results

Median wall-clock time per query at four representative sizes. The Rows out column shows how many records each query returns at 512 MB — context for the time numbers.

Query 8 MB 32 MB 128 MB 512 MB Rows out (at 512 MB)
SELECT * FROM $r 68 183 821 3,392 781,553
SELECT id, is_active, score FROM $r 55 217 986 3,936 781,553
WHERE score >= 80 AND score <= 90 47 183 752 2,993 58,520
GROUP BY country (multi-aggregate) 69 270 1,276 5,460 9
ORDER BY score DESC LIMIT 50 47 179 728 2,883 50
users JOIN countries ON country=code 97 349 1,436 5,445 808,446
ROW_NUMBER() OVER (PARTITION BY country ORDER BY score DESC) 79 300 1,377 6,118 781,553
Filter + GROUP BY + HAVING + ORDER BY (kitchen sink) 51 198 815 3,304 9

All times in milliseconds.

Full sweep — wall time (ms, median of 3 runs)

A cell means the scenario was not run at that size. Only HEADLINE and CURATED-tier scenarios run at 128 / 256 / 512 MB; FULL-only scenarios are 8 / 16 / 32 / 64 MB.

Label 8 MB 16 MB 32 MB 64 MB 128 MB 256 MB 512 MB
p01 SELECT-ALL 68 93 183 391 821 1,680 3,392
p02 SELECT-ALL-LIMIT-100 0 0 0 0 0 0 1
p03 LIMIT-1000-OFFSET-5000 22 24 23 22
p04 PROJECT-TOP-LEVEL 55 109 217 469 986 1,959 3,936
p05 PROJECT-NESTED 66 130 258 562
p06 PROJECT-ALIAS 58 116 229 498
p07 GROUP-BY-CITY-COUNT 66 127 253 591 1,184 2,482 4,962
p08 GROUP-BY-COUNTRY-MULTI-AGG 69 135 270 652 1,276 2,669 5,460
p09 GROUP-BY-THEME-HAVING 69 134 266 602
p10 GROUP-BY-REVIEW-STATUS 68 131 261 610
p11 WHERE-BOOL-EQ 49 97 192 416
p12 WHERE-RANGE-SCORE 47 92 183 373 752 1,509 2,993
p13 WHERE-STRING-EQ 47 94 186 379
p14 WHERE-LIKE 54 106 211 407 821 1,634 3,243
p15 WHERE-DEEP-NESTED 50 99 196 393
p16 WHERE-OR-COMPOUND 48 96 191 386
p17 WHERE-IN 51 100 199 407
p18 WHERE-BETWEEN 47 92 184 371
p19 WHERE-IS-NULL 49 94 188 379
p20 WHERE-NOT-LIKE 58 115 229 471
p21 ORDER-BY-SCORE-ASC 69 135 279 639 1,401 2,855 6,229
p22 ORDER-BY-CITY-DESC 71 136 275 608
p23 ORDER-BY-MULTI 87 168 358 775
p24 ORDER-BY-SCORE-TOP50 47 91 179 367 728 1,494 2,883
p25 DISTINCT-CITY 48 94 187 383 753 1,541 3,011
p26 DISTINCT-COUNTRY-THEME 50 98 195 390
p27 FUNC-UPPER 47 47 46 47
p28 FUNC-CONCAT 47 47 47 47
p29 FUNC-LENGTH-FILTER 52 103 205 432
p30 FUNC-ROUND 45 45 45 45
p31 FUNC-ABS-FILTER 51 102 203 413
p32 FUNC-COALESCE 48 48 47 47
p33 FUNC-NESTED-ROUND-AVG 67 130 266 596
p34 FUNC-SUBSTRING 45 43 45 44
p35 FUNC-TRIM-UPPER 48 47 47 48
p36 COMPLEX-FILTER-GROUP-SORT 51 98 198 403 815 1,672 3,304
p37 COMPLEX-NESTED-GROUP-HAVING 64 126 255 564
p38 COMPLEX-MULTI-COND-PROJ 47 92 186 373
p39 SUBQUERY-SIMPLE 49 95 192 399
p40 SUBQUERY-GROUP-AFTER-FILTER 51 101 207 420
p41 PREPARED-GROUPBY 63 126 255 575
p42 PREPARED-REUSE 6 6 6 6
p43 ENGINE-NO-CACHE 64 127 254 601
p44 ENGINE-CACHE 63 125 251 568
p45 ENGINE-MIXED 45 90 181 364
p46 STRESS-FULL-SORT 64 134 291 636
p47 STRESS-HIGH-CARDINALITY-GROUP 81 151 310 728
p48 STRESS-DISTINCT-ID 52 106 216 478
p49 STRESS-WIDE-PROJECTION 113 218 499 482
p50 STRESS-COMPLEX-SUBQUERY 51 100 204 415 846 1,736 3,485
p51 JOIN-INNER 97 164 349 683 1,436 2,688 5,445
p52 JOIN-LEFT 92 168 354 687
p53 JOIN-INNER-FILTER 79 155 328 658
p54 JOIN-INNER-GROUPBY 80 158 331 660 1,351 2,659 5,489
p55 JOIN-INNER-ORDERBY-LIMIT 79 154 323 646
p56 JOIN-LEFT-ANTI 80 161 342 666
p57 WINDOW-ROW-NUMBER-GLOBAL 63 122 249 579
p58 WINDOW-ROW-NUMBER-PART 79 142 300 644 1,377 2,863 6,118
p59 WINDOW-RANK-PART 71 148 315 674
p60 WINDOW-DENSE-RANK-PART 71 148 315 675
p61 WINDOW-NTILE-4 62 126 276 622
p62 WINDOW-LAG 75 152 314 666
p63 WINDOW-LEAD 74 153 310 667
p64 WINDOW-AVG-OVER-PART 64 131 262 556 1,148 2,393 4,796
p65 FUNC-NESTED-LPAD-TRIM-COALESCE 50 49 49 49
p66 FUNC-YEAR-GROUPBY 112 201 482 925
p67 FUNC-DATE-DIFF-FILTER 71 115 229 487
p68 FUNC-CAST-INT 45 44 44 44

Full sweep — peak heap (MB, single-run measurement)

Label 8 MB 16 MB 32 MB 64 MB 128 MB 256 MB 512 MB
p01 SELECT-ALL 93 178 350 711 1,204 2,428 4,436
p02 SELECT-ALL-LIMIT-100 13 22 38 70 134 262 518
p03 LIMIT-1000-OFFSET-5000 53 58 74 106
p04 PROJECT-TOP-LEVEL 101 210 414 844 1,227 2,906 5,170
p05 PROJECT-NESTED 129 254 490 847
p06 PROJECT-ALIAS 109 218 430 874
p07 GROUP-BY-CITY-COUNT 129 254 498 895 1,511 3,342 5,642
p08 GROUP-BY-COUNTRY-MULTI-AGG 129 254 502 1,006 1,526 3,245 5,643
p09 GROUP-BY-THEME-HAVING 129 254 502 811
p10 GROUP-BY-REVIEW-STATUS 129 254 498 879
p11 WHERE-BOOL-EQ 97 194 382 571
p12 WHERE-RANGE-SCORE 93 186 366 608 693 966 1,962
p13 WHERE-STRING-EQ 97 190 374 716
p14 WHERE-LIKE 93 186 366 683 746 898 1,674
p15 WHERE-DEEP-NESTED 101 194 386 740
p16 WHERE-OR-COMPOUND 97 190 374 708
p17 WHERE-IN 101 198 390 774
p18 WHERE-BETWEEN 97 186 366 735
p19 WHERE-IS-NULL 97 194 378 745
p20 WHERE-NOT-LIKE 101 202 398 861
p21 ORDER-BY-SCORE-ASC 109 218 430 898 1,571 2,745 4,971
p22 ORDER-BY-CITY-DESC 113 222 434 855
p23 ORDER-BY-MULTI 121 242 486 854
p24 ORDER-BY-SCORE-TOP50 93 182 362 496 742 1,359 1,723
p25 DISTINCT-CITY 101 202 394 642 746 910 1,654
p26 DISTINCT-COUNTRY-THEME 105 210 410 682
p27 FUNC-UPPER 93 102 118 150
p28 FUNC-CONCAT 93 102 118 150
p29 FUNC-LENGTH-FILTER 101 198 394 866
p30 FUNC-ROUND 89 98 114 146
p31 FUNC-ABS-FILTER 101 202 398 715
p32 FUNC-COALESCE 93 102 118 150
p33 FUNC-NESTED-ROUND-AVG 129 254 502 905
p34 FUNC-SUBSTRING 89 102 114 146
p35 FUNC-TRIM-UPPER 93 102 118 150
p36 COMPLEX-FILTER-GROUP-SORT 97 194 382 556 872 1,188 2,813
p37 COMPLEX-NESTED-GROUP-HAVING 121 242 474 883
p38 COMPLEX-MULTI-COND-PROJ 93 186 366 565
p39 SUBQUERY-SIMPLE 93 186 370 769
p40 SUBQUERY-GROUP-AFTER-FILTER 101 198 390 745
p41 PREPARED-GROUPBY 129 254 498 1,001
p42 PREPARED-REUSE 21 30 46 78
p43 ENGINE-NO-CACHE 125 250 498 992
p44 ENGINE-CACHE 129 254 498 812
p45 ENGINE-MIXED 93 186 362 504
p46 STRESS-FULL-SORT 109 218 430 882
p47 STRESS-HIGH-CARDINALITY-GROUP 157 306 606 1,065
p48 STRESS-DISTINCT-ID 105 210 418 838
p49 STRESS-WIDE-PROJECTION 297 594 796 720
p50 STRESS-COMPLEX-SUBQUERY 101 198 390 678 916 1,449 3,457
p51 JOIN-INNER 121 238 474 891 1,568 2,872 5,120
p52 JOIN-LEFT 121 242 482 857
p53 JOIN-INNER-FILTER 117 226 450 853
p54 JOIN-INNER-GROUPBY 117 230 454 861 1,343 2,823 5,217
p55 JOIN-INNER-ORDERBY-LIMIT 117 226 446 856
p56 JOIN-LEFT-ANTI 117 230 454 858
p57 WINDOW-ROW-NUMBER-GLOBAL 101 202 398 815
p58 WINDOW-ROW-NUMBER-PART 113 226 450 858 1,390 3,008 5,214
p59 WINDOW-RANK-PART 113 226 454 863
p60 WINDOW-DENSE-RANK-PART 113 230 454 867
p61 WINDOW-NTILE-4 105 210 418 866
p62 WINDOW-LAG 117 230 454 860
p63 WINDOW-LEAD 117 230 454 868
p64 WINDOW-AVG-OVER-PART 113 218 430 868 1,275 2,979 4,859
p65 FUNC-NESTED-LPAD-TRIM-COALESCE 97 106 122 150
p66 FUNC-YEAR-GROUPBY 181 346 934 1,087
p67 FUNC-DATE-DIFF-FILTER 118 230 458 667
p68 FUNC-CAST-INT 94 98 114 146

Reproducing these numbers

  1. Generate the data files (one-time, ~minutes per size):

    cd src/test/resources/data-files
    for s in 8 16 32 64 128 256 512; do
      python generate_json.py -s "$s" --seed 20260428 -o "data_${s}mb.json"
    done
  2. Run the bench (~10 minutes on the reference machine):

    ./mvnw test -Plarge-tests -Dtest=ProfilingTest
  3. Open the fresh report at target/profiling/profiling-report-<timestamp>.md and compare its tables to the ones above. Wall-time medians should match within ±5 % on similar hardware.

Why these numbers will differ on your machine: absolute milliseconds depend on CPU clock speed, memory bandwidth, GC tuning, and concurrent load. Use the published numbers as a shape (which queries are fast, which scale linearly with input, which super-linearly) rather than as a hard target.