Query JSON data using SQL — filter, aggregate, sort, and project without a database.
SQL4Json lets you query in-memory JSON data with familiar SQL SELECT syntax. If you know SQL, you already know how to use it. No database, no schema, no setup. It's just ~500KB.
```sql
SELECT dept,
       COUNT(*) AS headcount,
       AVG(salary) AS avgSalary,
       MAX(salary) AS topSalary
FROM $r
WHERE active = true
  AND hire_date BETWEEN '2023-01-01' AND '2024-12-31'
GROUP BY dept
HAVING headcount > 2
ORDER BY avgSalary DESC LIMIT 10
```

- Requirements
- Installation
- Quick Start
- Command-line usage
- Performance
- API Reference
- SQL Syntax
- Window Functions
- Functions
- Type System
- Pipeline Stages
- NULL Handling
- Error Handling
- Security Defaults
- Thread Safety
- Limitations
- License
- Java 21 or above
Maven:

```xml
<dependency>
    <groupId>io.github.mnesimiyilmaz</groupId>
    <artifactId>sql4json</artifactId>
    <version>1.3.0</version>
</dependency>
```

Gradle:

```groovy
implementation 'io.github.mnesimiyilmaz:sql4json:1.3.0'
```

```java
import io.github.mnesimiyilmaz.sql4json.SQL4Json;

String json = """
        [
          {"name": "Alice", "age": 30, "dept": "Engineering"},
          {"name": "Bob", "age": 25, "dept": "Marketing"},
          {"name": "Carol", "age": 35, "dept": "Engineering"}
        ]
        """;

String result = SQL4Json.query(
        "SELECT name, age FROM $r WHERE age > 28 ORDER BY age DESC",
        json
);
// [{"name":"Carol","age":35},{"name":"Alice","age":30}]
```

A command-line entrypoint ships as a separate shaded jar published alongside the library under the `cli` classifier:
```xml
<dependency>
    <groupId>io.github.mnesimiyilmaz</groupId>
    <artifactId>sql4json</artifactId>
    <version>1.3.0</version>
    <classifier>cli</classifier>
</dependency>
```

```shell
java -jar sql4json-1.3.0-cli.jar -q "SELECT name FROM \$r WHERE age > 25" -f data.json
```

| Flag | Description |
|---|---|
| `-q, --query <sql\|@path>` | SQL to run; `@path` reads the SQL from a file. Required. |
| `-f, --file <path>` | JSON input file. Omit to read from stdin. |
| `-o, --output <path>` | Write result here. Omit for stdout. |
| `--data <name>=<path>` | Repeatable. Named source for JOIN queries. Excludes `-f`. |
| `-p, --param <name>=<json>` | Repeatable. Bind `:name` parameters. Value is a JSON literal. |
| `--pretty` | Pretty-print the JSON output (default: compact). |
| `-h, --help` | Print help and exit 0. |
| `-v, --version` | Print library version and exit 0. |
```shell
# Read from a file
java -jar sql4json-1.3.0-cli.jar \
  -q "SELECT * FROM \$r WHERE age > 25" -f data.json

# Pipe from stdin and pretty-print
cat data.json | java -jar sql4json-1.3.0-cli.jar \
  -q @query.sql --pretty

# JOIN across multiple sources
java -jar sql4json-1.3.0-cli.jar \
  -q @join.sql \
  --data users=users.json \
  --data orders=orders.json

# Parameter binding (named only — positional ? is not exposed by the CLI)
java -jar sql4json-1.3.0-cli.jar \
  -q "SELECT * FROM \$r WHERE id = :id AND tags @> :allowed" \
  -p id=42 \
  -p 'allowed=["admin","editor"]' \
  -f data.json
```

| Code | Meaning |
|---|---|
| 0 | Success, or `--help` / `--version` short-circuit. |
| 1 | Runtime failure (SQL4Json error, IO error). |
| 2 | Usage error (bad flags, missing required option). |
Set SQL4JSON_DEBUG=1 to attach the full stack trace to failure messages on stderr.
Numbers below are median wall-clock time across 3 runs on a single reference machine (OpenJDK 21.0.1+12-LTS / Windows 11 amd64 / 32 cores / 64 GB RAM / `-Xmx8g -Xms1g`) against synthetic data generated with `--seed 20260428`. Re-run `./mvnw test -Plarge-tests -Dtest=ProfilingTest` to reproduce on your hardware.
Single-threaded. Each query runs end-to-end on the calling thread — there is no internal parallelism. Multi-core scaling, if any, comes from running independent queries concurrently in your application.
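Since the engine never parallelizes internally, caller-side fan-out is the way to use extra cores. A minimal sketch of that pattern with a plain `ExecutorService` — the `runQuery` function parameter here is a stand-in for a real `SQL4Json.query(sql, json)` call, not a library API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

public class ParallelQueries {
    // Run one independent query per document on a shared worker pool.
    // runQuery stands in for SQL4Json.query(sql, json) — each call is
    // independent, so calls may safely run on any thread.
    public static List<String> queryAll(List<String> documents, String sql,
                                        BiFunction<String, String, String> runQuery) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Runtime.getRuntime().availableProcessors()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> runQuery.apply(sql, doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get()); // preserves submission order
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```
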
| Query | 8 MB | 32 MB | 128 MB | 512 MB | Rows out (at 512 MB) |
|---|---|---|---|---|---|
| `SELECT * FROM $r` | 68 | 183 | 821 | 3,392 | 781,553 |
| `SELECT id, is_active, score FROM $r` | 55 | 217 | 986 | 3,936 | 781,553 |
| `WHERE score >= 80 AND score <= 90` | 47 | 183 | 752 | 2,993 | 58,520 |
| `GROUP BY country` (multi-aggregate) | 69 | 270 | 1,276 | 5,460 | 9 |
| `ORDER BY score DESC LIMIT 50` | 47 | 179 | 728 | 2,883 | 50 |
| `users JOIN countries ON country=code` | 97 | 349 | 1,436 | 5,445 | 808,446 |
| `ROW_NUMBER() OVER (PARTITION BY country ORDER BY score DESC)` | 79 | 300 | 1,377 | 6,118 | 781,553 |
| Filter + GROUP BY + HAVING + ORDER BY (kitchen sink) | 51 | 198 | 815 | 3,304 | 9 |
All times in milliseconds. See docs/performance.md for the full sweep across all 7 sizes and ~50 scenarios, plus the reference environment, dataset details, and the regen recipe.
Parse and execute in a single call.
```java
// Returns JSON string
String result = SQL4Json.query("SELECT * FROM $r WHERE active = true", jsonString);

// Returns JsonValue (library's own JSON abstraction)
JsonValue value = SQL4Json.queryAsJsonValue("SELECT * FROM $r", jsonString);

// With custom limits
Sql4jsonSettings settings = Sql4jsonSettings.builder()
        .limits(l -> l.maxRowsPerQuery(10_000))
        .build();
String limited = SQL4Json.query(sql, jsonString, settings);
```

Parse once, execute many times against different data. Analogous to JDBC `PreparedStatement`.
```java
PreparedQuery query = SQL4Json.prepare("SELECT name FROM $r WHERE age > 25");
for (String json : dataList) {
    String result = query.execute(json); // no re-parsing
}
```

Bind JSON data once at build time and run multiple queries against it. Optionally cache query results with an LRU cache.
```java
// Minimal — no cache
SQL4JsonEngine engine = SQL4Json.engine()
        .data(jsonString)
        .build();
engine.query("SELECT * FROM $r WHERE active = true");
engine.query("SELECT dept, COUNT(*) AS cnt FROM $r GROUP BY dept");

// With default LRU cache (64 entries)
SQL4JsonEngine cached = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.queryResultCacheEnabled(true))
                .build())
        .data(jsonString)
        .build();
cached.query(sql); // cache miss — executes and caches result
cached.query(sql); // cache hit — returns cached result instantly

// With custom-size cache
SQL4JsonEngine sized = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.queryResultCacheEnabled(true).queryResultCacheSize(256))
                .build())
        .data(jsonString)
        .build();
```

The engine is designed to be long-lived (e.g., one per dataset). It holds no external resources and needs no cleanup.
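The default result cache is a 64-entry LRU. Its eviction policy can be sketched with an access-ordered `LinkedHashMap` — an illustration of the policy only, not the library's implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: a LinkedHashMap in access order evicts the
// least recently used entry once capacity is exceeded.
public class LruResultCache {
    private final Map<String, String> map;

    public LruResultCache(int capacity) {
        // accessOrder = true moves every read entry to the tail,
        // so the head is always the least recently used entry.
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public String get(String sql) { return map.get(sql); }
    public void put(String sql, String result) { map.put(sql, result); }
    public int size() { return map.size(); }
}
```
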
You can also plug in your own cache implementation via the QueryResultCache interface:
```java
SQL4JsonEngine engine = SQL4Json.engine()
        .settings(Sql4jsonSettings.builder()
                .cache(c -> c.customCache(myCaffeineCache))
                .build())
        .data(jsonString)
        .build();
```

For queries that JOIN across multiple JSON sources, bind named data sources:
```java
// Static API — one-off multi-source query
String result = SQL4Json.query(
        "SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id",
        Map.of("users", usersJson, "orders", ordersJson));

// Engine API — bind named sources for repeated queries
SQL4JsonEngine engine = SQL4Json.engine()
        .data("users", usersJson)
        .data("orders", ordersJson)
        .build();
String joined = engine.query(
        "SELECT u.name AS name, o.amount AS amount FROM users u JOIN orders o ON u.id = o.user_id");
```

Named sources and unnamed data can coexist in the same engine — unnamed for `$r` queries, named for JOIN queries.
SQL4Json ships with a built-in JSON parser and serializer (zero external dependencies). If you prefer to use your own
JSON library (Jackson, Gson, etc.), implement the JsonCodec interface:
```java
import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;

public class GsonJsonCodec implements JsonCodec {
    private final Gson gson = new Gson();

    @Override
    public JsonValue parse(String json) { /* parse with Gson, convert to JsonValue */ }

    @Override
    public String serialize(JsonValue value) { /* convert JsonValue, serialize with Gson */ }
}
```

```java
// Wrap the codec in settings and pass to any API entry point
Sql4jsonSettings settings = Sql4jsonSettings.builder()
        .codec(new GsonJsonCodec())
        .build();

String result = SQL4Json.query(sql, json, settings);
PreparedQuery q = SQL4Json.prepare(sql, settings);
SQL4JsonEngine engine = SQL4Json.engine().settings(settings).data(json).build();
```

Example: Jackson-based JsonCodec
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.*;
import io.github.mnesimiyilmaz.sql4json.json.*;
import io.github.mnesimiyilmaz.sql4json.types.JsonCodec;
import io.github.mnesimiyilmaz.sql4json.types.JsonValue;
import java.util.ArrayList;
import java.util.LinkedHashMap;

public class JacksonJsonCodec implements JsonCodec {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public JsonValue parse(String json) {
        try {
            return convertNode(mapper.readTree(json));
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse JSON", e);
        }
    }

    @Override
    public String serialize(JsonValue value) {
        try {
            return mapper.writeValueAsString(toJsonNode(value));
        } catch (Exception e) {
            throw new RuntimeException("Failed to serialize JSON", e);
        }
    }

    private JsonValue convertNode(JsonNode node) {
        return switch (node) {
            case ObjectNode obj -> {
                var map = new LinkedHashMap<String, JsonValue>();
                obj.fields().forEachRemaining(e -> map.put(e.getKey(), convertNode(e.getValue())));
                yield new JsonObjectValue(map);
            }
            case ArrayNode arr -> {
                var list = new ArrayList<JsonValue>(arr.size());
                arr.forEach(n -> list.add(convertNode(n)));
                yield new JsonArrayValue(list);
            }
            case TextNode t -> new JsonStringValue(t.asText());
            case NumericNode n -> {
                Number num = n.numberValue();
                yield (num instanceof java.math.BigDecimal bd)
                        ? new JsonDecimalValue(bd)
                        : (num instanceof Double || num instanceof Float)
                            ? new JsonDoubleValue(num.doubleValue())
                            : new JsonLongValue(num.longValue());
            }
            case BooleanNode b -> new JsonBooleanValue(b.asBoolean());
            case NullNode ignored -> JsonNullValue.INSTANCE;
            default -> JsonNullValue.INSTANCE;
        };
    }

    private JsonNode toJsonNode(JsonValue value) {
        return switch (value) {
            case JsonObjectValue(var map) -> {
                var obj = mapper.createObjectNode();
                map.forEach((k, v) -> obj.set(k, toJsonNode(v)));
                yield obj;
            }
            case JsonArrayValue(var list) -> {
                var arr = mapper.createArrayNode();
                list.forEach(v -> arr.add(toJsonNode(v)));
                yield arr;
            }
            case JsonStringValue(var s) -> new TextNode(s);
            case JsonLongValue(long l) -> mapper.getNodeFactory().numberNode(l);
            case JsonDoubleValue(double d) -> mapper.getNodeFactory().numberNode(d);
            case JsonDecimalValue(var bd) -> mapper.getNodeFactory().numberNode(bd);
            case JsonBooleanValue(var b) -> BooleanNode.valueOf(b);
            case JsonNullValue ignored -> NullNode.getInstance();
        };
    }
}
```

Usage:
```java
String result = SQL4Json.query(sql, json,
        Sql4jsonSettings.builder()
                .codec(new JacksonJsonCodec())
                .build());
```

This is an example implementation. Adapt `ObjectMapper` configuration, error handling, and security limits to your requirements.
Map query results directly to Java records or POJOs. Zero dependencies — built-in.
```java
// Record target
record Person(String name, int age, LocalDate birthDate) {}

List<Person> people = SQL4Json.queryAsList(
        "SELECT name, age, birthDate FROM $r", json, Person.class);

// POJO target
class Employee {
    private String name;
    private int age;
    public void setName(String n) { this.name = n; }
    public void setAge(int a) { this.age = a; }
    public String getName() { return name; }
    public int getAge() { return age; }
}

List<Employee> emps = SQL4Json.queryAsList(
        "SELECT name, age FROM $r", json, Employee.class);

// Single-row unwrap
Person alice = SQL4Json.queryAs(
        "SELECT * FROM $r WHERE name = 'Alice'", json, Person.class);

// JsonValue.as()
Person p = SQL4Json.queryAsJsonValue(sql, json).as(Person.class);
```

| Category | Types |
|---|---|
| Primitives / boxed | boolean, byte, short, int, long, float, double, char + wrappers |
| String | String, CharSequence |
| Big numbers | BigDecimal, BigInteger, Number |
| Time | LocalDate, LocalDateTime, Instant (ISO strings; Instant also accepts epoch millis) |
| Enums | Enum.valueOf — case-sensitive |
| Records | Via canonical constructor |
| POJOs | Public no-arg constructor + public setters (return type ignored; inherited setters included) |
| Collections | List<T>, Set<T>, Collection<T>, Map<String,V>, T[] |
| Passthrough | JsonValue (and subtypes), Object (natural Java value) |
```java
// IGNORE (default): missing fields → null / Optional.empty() / primitive default / empty collection
Person p = SQL4Json.queryAs(sql, json, Person.class);

// FAIL: any missing field → SQL4JsonMappingException with $.path.to.field
Sql4jsonSettings strict = Sql4jsonSettings.builder()
        .mapping(m -> m.missingFieldPolicy(MissingFieldPolicy.FAIL))
        .build();
List<Person> people = SQL4Json.queryAsList(sql, json, Person.class, strict);
```

Object mapping covers typical JSON → record/POJO needs. If you need full Jackson power (polymorphism, annotations, custom deserializers), use the existing codec integration path: call `SQL4Json.queryAsJsonValue(...)` to get the raw tree, serialize it back to a string via your preferred library, then deserialize with Jackson/Gson. See the "Custom JsonCodec" section above.
Bind values to a pre-parsed query via PreparedQuery (JDBC-style). Placeholders come in two flavors — cannot be mixed within a single query:
```java
// Named parameters
PreparedQuery q = SQL4Json.prepare(
        "SELECT * FROM $r WHERE age > :minAge AND dept = :dept");
String r = q.execute(json, BoundParameters.named()
        .bind("minAge", 25)
        .bind("dept", "Engineering"));

// Positional parameters
PreparedQuery q2 = SQL4Json.prepare(
        "SELECT * FROM $r WHERE age > ? AND salary BETWEEN ? AND ?");
String r2 = q2.execute(json, 25, 50_000, 150_000);

// IN expansion
PreparedQuery q3 = SQL4Json.prepare("SELECT * FROM $r WHERE id IN (?)");
String r3 = q3.execute(json, BoundParameters.of(List.of(1, 2, 3)));

// Dynamic pagination
PreparedQuery page = SQL4Json.prepare(
        "SELECT * FROM $r ORDER BY id LIMIT :n OFFSET :off");
String p = page.execute(json, BoundParameters.named().bind("n", 20).bind("off", 100));

// Combined with object mapping
List<Person> engineers = SQL4Json.prepare("SELECT * FROM $r WHERE dept = :d")
        .executeAsList(json, Person.class, BoundParameters.named().bind("d", "Engineering"));
```

Supported Java parameter types: String, all numeric primitives + wrappers, BigDecimal, BigInteger, Boolean, LocalDate, LocalDateTime, Instant, ZonedDateTime, OffsetDateTime, java.util.Date, null.
IN-list semantics:

- `IN (?)` + collection → expanded to N literals
- `IN (?)` + scalar → single-element list
- `IN (?)` + empty collection → zero-row predicate (SQL-standard behavior)
Null semantics: col = :x with a null bind produces col = NULL which is always false per SQL standard (zero rows). Use literal IS NULL / IS NOT NULL to test nullability — these cannot be parameterized.
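The comparison rule behind this can be sketched in plain Java: any comparison that touches a null operand yields false, and only an explicit null-test succeeds. This is an illustration of the documented semantics, not the library's evaluator:

```java
public class NullComparison {
    // Any comparison involving NULL is false, per the documented rule.
    public static boolean eq(Integer a, Integer b) {
        if (a == null || b == null) return false; // NULL = anything -> false
        return a.intValue() == b.intValue();
    }

    public static boolean gt(Integer a, Integer b) {
        if (a == null || b == null) return false; // NULL > anything -> false
        return a > b;
    }

    // IS NULL is the only test that can "see" a NULL.
    public static boolean isNull(Integer a) {
        return a == null;
    }
}
```
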
Scope: parameter binding is available on PreparedQuery and SQL4JsonEngine only; there are no SQL4Json.query(..., params) overloads. Use SQL4Json.prepare(sql).execute(json, params) for one-off parameterized queries.
Building IDE tooling? SQL4Json publishes a small read-only grammar catalog (`SQL4JsonGrammar.keywords()`, `functions()`, `tokenize(...)`) for syntax highlighters, completion popups, and static analysers — see docs/grammar-api.md.
SQL4Json supports a subset of SQL SELECT. All keywords are case-insensitive.
Select all fields, specific fields, or use aliases with dot-notation for structured output:
```sql
SELECT * FROM $r

SELECT name, age FROM $r

SELECT fullname,
       id AS userId,
       username AS user.username
FROM $r
```

Dotted aliases create nested JSON in the output:
```json
[
  {
    "fullname": "Alice",
    "userId": 1,
    "user": {
      "username": "alice"
    }
  }
]
```

Beyond field references, the SELECT list accepts literal values, function calls, and arbitrary expressions — each becomes a new column in the output:
```sql
-- string / number / boolean literals
SELECT 'hello' AS greeting FROM $r;
SELECT 42 AS answer, TRUE AS flag FROM $r;

-- functions with literal arguments
SELECT ROUND(3.14159, 2) AS pi FROM $r;

-- mix fields and literals via CONCAT
SELECT CONCAT(firstName, ' ', lastName) AS fullName FROM $r;

-- nested function calls
SELECT CONCAT(UPPER(name), '!') AS shouty FROM $r;
```

`$r` references the root JSON data. Use dot-notation to drill into nested structures:
```sql
-- Query the root array
SELECT * FROM $r

-- Query a nested array
SELECT * FROM $r.response.data.items
```

If the input is a single JSON object (not an array), it is automatically wrapped in an array.
Filter rows with comparison operators, logical connectives, and parenthesized groups:
```sql
SELECT *
FROM $r
WHERE age >= 18
  AND status IN ('active', 'pending')
  AND name LIKE 'A%'
  AND score BETWEEN 50 AND 100
  AND email IS NOT NULL
```

Operators:
| Operator | Example |
|---|---|
| `=`, `!=` | `status = 'active'` |
| `>`, `<`, `>=`, `<=` | `age >= 18` |
| `LIKE`, `NOT LIKE` | `name LIKE '%son'` (`%` = any chars, `_` = one char) |
| `IN`, `NOT IN` | `dept IN ('IT', 'HR')` |
| `BETWEEN`, `NOT BETWEEN` | `age BETWEEN 18 AND 65` |
| `IS NULL`, `IS NOT NULL` | `email IS NOT NULL` |
| `AND`, `OR` | `a > 1 AND (b = 2 OR c = 3)` |
AND has higher precedence than OR. Use parentheses to override.
Functions can be used on the left-hand side of any comparison, including nested calls:
WHERE LOWER(name) = 'alice'
WHERE CAST(age AS STRING) = '25'
WHERE TRIM(NULLIF(name, '')) = 'Alice'
WHERE LENGTH(TRIM(name)) > 5Test values inside JSON array fields without unrolling the array via FROM:
```sql
-- Scalar membership (case-sensitive equality)
SELECT id FROM $r WHERE tags CONTAINS 'admin'

-- Contains-all (PostgreSQL @>): tags includes every element of the right array
SELECT id FROM $r WHERE tags @> ARRAY['admin', 'editor']

-- Contained-by (PostgreSQL <@): every element of tags is in the right array
SELECT id FROM $r WHERE tags <@ ARRAY['admin', 'editor', 'viewer']

-- Overlap (PostgreSQL &&): tags shares at least one element with the right array
SELECT id FROM $r WHERE tags && ARRAY['blocked', 'flagged']

-- Structural equality with array literal (order-sensitive, length-sensitive)
SELECT id FROM $r WHERE tags = ARRAY['admin', 'editor']
SELECT id FROM $r WHERE tags != ARRAY['admin']
```

Parameter binding (matches JDBC / Hibernate / jOOQ patterns):
```java
// Whole-array bind via bare parameter — list becomes the array RHS
PreparedQuery q = SQL4Json.prepare("SELECT * FROM $r WHERE tags @> :tagList");
q.execute(json, BoundParameters.named().bind("tagList", List.of("admin", "editor")));

// Element-by-element via ARRAY[?,?]
PreparedQuery q2 = SQL4Json.prepare("SELECT * FROM $r WHERE tags @> ARRAY[?,?]");
q2.execute(json, BoundParameters.of("admin", "editor"));
```

Semantics:
- `<`, `>`, `<=`, `>=` against `ARRAY[…]` raise `SQL4JsonParseException` at parse time.
- `CONTAINS`, `@>`, `<@`, `&&`, `=`, `!=` return `false` when the LHS field is missing, JSON null, or not an array.
- A `NULL` element → that pairwise comparison is false (SQL-standard `col = NULL` semantics; use `IS NULL` for a null-test).
- `@>`, `<@`, `&&` are set semantics — duplicates do not affect the result.
- `=` / `!=` are order-sensitive structural equality.
- Empty array literal `ARRAY[]`: `@>` always true; `&&` always false; `<@` true iff LHS is empty; `=` matches only an empty array.
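The set-semantics operators behave like ordinary set relations. A plain-Java sketch of the same truth tables — illustrative only, not the library's implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArrayOps {
    // tags @> other : tags contains every element of other (set semantics)
    public static boolean containsAll(List<String> tags, List<String> other) {
        return new HashSet<>(tags).containsAll(other);
    }

    // tags <@ other : every element of tags is in other
    public static boolean containedBy(List<String> tags, List<String> other) {
        return new HashSet<>(other).containsAll(tags);
    }

    // tags && other : at least one shared element
    public static boolean overlaps(List<String> tags, List<String> other) {
        Set<String> s = new HashSet<>(tags);
        return other.stream().anyMatch(s::contains);
    }
}
```

Duplicates drop out naturally because both sides are treated as sets, matching the "set semantics" note above.
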
Aggregate data with GROUP BY and filter groups with HAVING:
```sql
SELECT dept,
       COUNT(*) AS headcount,
       SUM(salary) AS totalPay,
       ROUND(AVG(salary), 2) AS avgPay,
       MAX(salary) AS topPay,
       MIN(salary) AS lowPay
FROM $r
GROUP BY dept
HAVING headcount > 5
   AND avgPay > 50000
```

HAVING conditions must reference aliases defined in the SELECT clause.
All aggregate functions except COUNT(*) operate on non-null values only.
Aggregate functions accept nested expressions — for example, AVG(NULLIF(salary, 0)) excludes zeros from the average.
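That composition can be sketched in plain Java: map zeros to null first, then average only the non-null values, mirroring `AVG(NULLIF(salary, 0))`. These are illustrative helpers, not library code:

```java
import java.util.List;
import java.util.Objects;

public class NullAwareAvg {
    // NULLIF(a, b): null when a equals b, otherwise a.
    public static Integer nullIf(Integer a, int b) {
        return (a != null && a == b) ? null : a;
    }

    // AVG over non-null values only; null when every input is null.
    public static Double avg(List<Integer> values) {
        var nonNull = values.stream().filter(Objects::nonNull).toList();
        if (nonNull.isEmpty()) return null;
        return nonNull.stream().mapToInt(Integer::intValue).average().getAsDouble();
    }
}
```
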
Sort by one or more fields. Default direction is ASC. Expressions are supported:
```sql
SELECT * FROM $r
ORDER BY age DESC, name ASC

SELECT * FROM $r
ORDER BY LENGTH(TRIM(name)) ASC
```

For grouped results, use SELECT aliases:
```sql
SELECT dept, AVG(salary) AS avgPay
FROM $r
GROUP BY dept
ORDER BY avgPay DESC
```

Restrict result size and paginate:
```sql
-- First 10 results
SELECT *
FROM $r
ORDER BY created_at DESC LIMIT 10

-- Page 3 (20 items per page)
SELECT *
FROM $r
ORDER BY id LIMIT 20
OFFSET 40
```

Remove duplicate rows from results:
```sql
SELECT DISTINCT dept FROM $r

SELECT DISTINCT dept, status FROM $r ORDER BY dept
```

Use standard SQL FROM subqueries. The inner query executes first, and its result becomes the input for the outer query.
```sql
SELECT dept, COUNT(*) AS cnt
FROM (SELECT *
      FROM $r
      WHERE active = true)
GROUP BY dept
ORDER BY cnt DESC
```

SQL4Json supports both simple and searched CASE expressions, usable in SELECT, WHERE, ORDER BY, GROUP BY, and HAVING.
Simple CASE — compares an expression against values:
```sql
SELECT name,
       CASE dept
           WHEN 'eng' THEN 'Engineering'
           WHEN 'hr' THEN 'Human Resources'
           ELSE 'Other'
       END AS department
FROM $r
```

Searched CASE — evaluates boolean conditions:
```sql
SELECT name,
       CASE
           WHEN salary > 100000 THEN 'senior'
           WHEN salary > 60000 THEN 'mid'
           ELSE 'junior'
       END AS level
FROM $r
```

CASE expressions nest freely with functions and other CASE expressions:
```sql
-- Function inside CASE
SELECT CASE WHEN LENGTH(name) > 4 THEN 'long' ELSE 'short' END AS label FROM $r

-- CASE inside function
SELECT UPPER(CASE WHEN active THEN 'yes' ELSE 'no' END) AS flag FROM $r

-- Nested CASE
SELECT CASE WHEN dept = 'eng'
            THEN CASE WHEN salary > 100000 THEN 'senior eng' ELSE 'eng' END
            ELSE 'other'
       END AS category
FROM $r
```

If no WHEN clause matches and no ELSE is provided, the result is `NULL`. Simple CASE uses SQL NULL semantics: `CASE x WHEN NULL` never matches — use `CASE WHEN x IS NULL` instead.
Join across multiple JSON sources using INNER JOIN, LEFT JOIN, or RIGHT JOIN with equality-based ON conditions:
```sql
-- INNER JOIN (both spellings)
SELECT u.name AS name, o.amount AS amount
FROM users u JOIN orders o ON u.id = o.user_id

SELECT u.name AS name, o.amount AS amount
FROM users u INNER JOIN orders o ON u.id = o.user_id

-- LEFT JOIN — preserves all left-side rows, NULLs for unmatched right side
SELECT u.name AS name, o.amount AS amount
FROM users u LEFT JOIN orders o ON u.id = o.user_id

-- RIGHT JOIN — preserves all right-side rows, NULLs for unmatched left side
SELECT u.name AS name, o.amount AS amount
FROM users u RIGHT JOIN orders o ON u.id = o.user_id

-- Chained JOINs
SELECT u.name AS user_name, o.amount AS amount, p.name AS product
FROM users u
JOIN orders o ON u.id = o.user_id
LEFT JOIN products p ON o.product_id = p.id

-- Multi-column ON
SELECT * FROM a JOIN b ON a.x = b.x AND a.y = b.y

-- JOIN with all clauses
SELECT u.dept AS dept, COUNT(*) AS cnt, AVG(o.amount) AS avg_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.dept
HAVING cnt >= 5
ORDER BY avg_amount DESC
LIMIT 10
```

Table aliases are recommended. Without aliases, table names are used as prefixes (e.g., `users.name`).
ON conditions support only equality (`=`) combined with `AND`:

- `ON a.col = b.col` — single column
- `ON a.x = b.x AND a.y = b.y` — multiple columns
Output format: Use AS aliases for flat output keys. Without aliases, dotted column refs like u.name produce the key u.name in the output.
Data binding: JOIN queries require named data sources — see Multi-Source Queries in the API Reference.
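Equality-only ON conditions are exactly what a classic hash join needs: index one side by its key, then probe with the other side. A generic sketch under that assumption — illustrative, not SQL4Json internals:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class HashJoin {
    public record Pair<L, R>(L left, R right) {}

    // INNER equi-join: build a hash index on the right side's key,
    // then probe it once per left row.
    public static <L, R, K> List<Pair<L, R>> innerJoin(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey) {
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<Pair<L, R>> out = new ArrayList<>();
        for (L l : left) {
            for (R r : index.getOrDefault(leftKey.apply(l), List.of())) {
                out.add(new Pair<>(l, r)); // one output row per key match
            }
        }
        return out;
    }
}
```

A LEFT JOIN variant would emit a `Pair<>(l, null)` when the probe finds no match instead of skipping the row.
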
Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY. Use the OVER clause with optional PARTITION BY and ORDER BY:
```sql
-- Ranking within departments
SELECT name, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
       RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM $r

-- Running context: previous and next salary
SELECT name, salary,
       LAG(salary) OVER (ORDER BY salary ASC) AS prev_salary,
       LEAD(salary) OVER (ORDER BY salary ASC) AS next_salary
FROM $r

-- Aggregate windows — same value for every row in the partition
SELECT name, dept, salary,
       SUM(salary) OVER (PARTITION BY dept) AS dept_total,
       AVG(salary) OVER (PARTITION BY dept) AS dept_avg,
       COUNT(*) OVER (PARTITION BY dept) AS dept_count
FROM $r

-- NTILE — divide into buckets
SELECT name, salary,
       NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM $r

-- Combine with WHERE, ORDER BY, LIMIT
SELECT name, salary,
       RANK() OVER (ORDER BY salary DESC) AS rnk
FROM $r
WHERE dept = 'Engineering'
ORDER BY rnk
LIMIT 10
```

| Function | Description |
|---|---|
| `ROW_NUMBER()` | Sequential number (1, 2, 3, ...) per partition |
| `RANK()` | Rank with gaps on ties (1, 1, 3, ...) |
| `DENSE_RANK()` | Rank without gaps on ties (1, 1, 2, ...) |
| `NTILE(n)` | Divide rows into n roughly equal buckets |
| Function | Description |
|---|---|
| `LAG(col [, offset])` | Value from offset rows before (default 1), NULL at boundary |
| `LEAD(col [, offset])` | Value from offset rows after (default 1), NULL at boundary |
Aggregate functions (SUM, AVG, COUNT, MIN, MAX) can be used with OVER to compute values across the full partition without collapsing rows. COUNT(*) counts all rows; COUNT(col) counts non-null values.
```sql
SELECT name, dept, salary,
       SUM(salary) OVER (PARTITION BY dept) AS dept_total,
       COUNT(*) OVER () AS total_employees
FROM $r
```

- `OVER ()` — entire result set is one partition
- `OVER (PARTITION BY col1, col2)` — group rows into partitions
- `OVER (ORDER BY col ASC/DESC)` — order within partition (required for ranking/offset functions)
- `OVER (PARTITION BY dept ORDER BY salary DESC)` — both
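`ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)` can be mimicked in plain Java: group rows into partitions, sort each partition, then number sequentially. A sketch of the semantics, not the library's evaluator:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RowNumberSketch {
    public record Emp(String name, String dept, int salary) {}

    // Returns name -> rank within its dept, highest salary first (1-based).
    public static Map<String, Integer> deptRank(List<Emp> emps) {
        Map<String, Integer> rank = new HashMap<>();
        // PARTITION BY dept
        Map<String, List<Emp>> partitions =
                emps.stream().collect(Collectors.groupingBy(Emp::dept));
        for (List<Emp> partition : partitions.values()) {
            // ORDER BY salary DESC within the partition
            partition.sort(Comparator.comparingInt(Emp::salary).reversed());
            for (int i = 0; i < partition.size(); i++) {
                rank.put(partition.get(i).name(), i + 1); // ROW_NUMBER is 1-based
            }
        }
        return rank;
    }
}
```
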
Note: Window functions are only valid in SELECT. They cannot appear in WHERE, HAVING, or GROUP BY.
Note: All string functions auto-coerce non-string inputs to text via `toString` (matching `CONCAT`). For example, `LPAD(salary, 10, '0')` where `salary` is a number returns the zero-padded numeric string. Null inputs always return `NULL`.
| Function | Description | Example |
|---|---|---|
| `LOWER(str)` | Lowercase (optional locale: `LOWER(str, 'tr-TR')`) | `LOWER(name)` |
| `UPPER(str)` | Uppercase (optional locale) | `UPPER(name, 'en-US')` |
| `CONCAT(a, b, ...)` | Concatenate strings | `CONCAT(first, ' ', last)` |
| `SUBSTRING(str, start, len)` | Extract substring (1-based start, length optional) | `SUBSTRING(name, 1, 3)` |
| `TRIM(str)` | Remove leading/trailing whitespace | `TRIM(name)` |
| `LENGTH(str)` | String length | `LENGTH(name)` |
| `REPLACE(str, search, replacement)` | Replace all occurrences | `REPLACE(name, ' ', '_')` |
| `LEFT(str, n)` | First n characters | `LEFT(name, 3)` |
| `RIGHT(str, n)` | Last n characters | `RIGHT(name, 3)` |
| `LPAD(str, len, pad)` | Left-pad to length | `LPAD(id, 5, '0')` |
| `RPAD(str, len, pad)` | Right-pad to length | `RPAD(name, 20, '.')` |
| `REVERSE(str)` | Reverse string | `REVERSE(name)` |
| `POSITION(substr, str)` | Find position (1-based, 0 if not found) | `POSITION('a', name)` |
| Function | Description | Example |
|---|---|---|
| `ABS(n)` | Absolute value | `ABS(delta)` |
| `ROUND(n, decimals)` | Round (decimals optional, default 0) | `ROUND(price, 2)` |
| `CEIL(n)` | Ceiling (round up) | `CEIL(score)` |
| `FLOOR(n)` | Floor (round down) | `FLOOR(score)` |
| `MOD(a, b)` | Modulo | `MOD(id, 10)` |
| `POWER(base, exp)` | Exponentiation | `POWER(n, 2)` |
| `SQRT(n)` | Square root | `SQRT(area)` |
| `SIGN(n)` | Returns -1, 0, or 1 | `SIGN(balance)` |
| Function | Description | Example |
|---|---|---|
| `TO_DATE(str, pattern?)` | Parse date string (ISO format if no pattern) | `TO_DATE(d, 'yyyy-MM-dd')` |
| `NOW()` | Current date and time | `NOW()` |
| `YEAR(d)` | Extract year | `YEAR(created_at)` |
| `MONTH(d)` | Extract month (1-12) | `MONTH(created_at)` |
| `DAY(d)` | Extract day of month | `DAY(created_at)` |
| `HOUR(dt)` | Extract hour (0 for date-only) | `HOUR(timestamp)` |
| `MINUTE(dt)` | Extract minute | `MINUTE(timestamp)` |
| `SECOND(dt)` | Extract second | `SECOND(timestamp)` |
| `DATE_ADD(d, amount, unit)` | Add to date (units: DAYS, MONTHS, HOURS, etc.) | `DATE_ADD(d, 30, 'DAYS')` |
| `DATE_DIFF(d1, d2, unit)` | Difference between dates | `DATE_DIFF(end, start, 'DAYS')` |
TO_DATE without a pattern tries ISO datetime (yyyy-MM-dd'T'HH:mm:ss) first, then ISO date (yyyy-MM-dd).
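That fallback order can be approximated with `java.time`, whose default parsers use the same ISO formats. A sketch of the behaviour, not the library's parser:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.time.temporal.Temporal;

public class ToDateSketch {
    // Try ISO datetime (yyyy-MM-dd'T'HH:mm:ss) first, then ISO date (yyyy-MM-dd).
    public static Temporal toDate(String s) {
        try {
            return LocalDateTime.parse(s); // ISO_LOCAL_DATE_TIME by default
        } catch (DateTimeParseException e) {
            return LocalDate.parse(s);     // ISO_LOCAL_DATE by default
        }
    }
}
```
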
| Function | Description | Example |
|---|---|---|
| `CAST(expr AS type)` | Type conversion | `CAST(age AS STRING)` |
| `NULLIF(a, b)` | Returns `NULL` if a = b, otherwise a | `NULLIF(status, 'N/A')` |
| `COALESCE(a, b, ...)` | First non-null argument | `COALESCE(nickname, name)` |
CAST target types: STRING, NUMBER, INTEGER, DECIMAL, BOOLEAN, DATE, DATETIME
Used with GROUP BY. All operate on non-null values except COUNT(*).
| Function | Description |
|---|---|
| `COUNT(field)` | Count of non-null values |
| `COUNT(*)` | Count of all rows including nulls |
| `SUM(field)` | Sum of numeric values |
| `AVG(field)` | Average of numeric values |
| `MIN(field)` | Minimum value |
| `MAX(field)` | Maximum value |
Functions nest arbitrarily — in SELECT, WHERE, GROUP BY, ORDER BY, and HAVING:
```sql
-- Exclude zeros from average, then round
SELECT dept, ROUND(AVG(NULLIF(salary, 0)), 2) AS avg_salary
FROM $r
GROUP BY dept

-- Pad trimmed values
SELECT LPAD(TRIM(NULLIF(name, '')), 10, '*') AS padded
FROM $r

-- Nested functions in WHERE and ORDER BY
SELECT *
FROM $r
WHERE TRIM(NULLIF(name, '')) = 'Alice'
ORDER BY LENGTH(TRIM(name)) ASC
```

SQL4Json maps JSON values to a sealed type hierarchy for type-safe processing:
| JSON Type | SQL4Json Type | Notes |
|---|---|---|
| String | `SqlString` | |
| Number | `SqlNumber` (sealed: `SqlLong` / `SqlDouble` / `SqlDecimal`) | Sealed since 1.2.0 — primitive long/double stored unboxed; `BigDecimal` reserved for arbitrary-precision values. |
| Boolean | `SqlBoolean` | `true` / `false` literals |
| Null | `SqlNull` | Singleton |
| Date string | `SqlDate` | Via `TO_DATE()` or `CAST(x AS DATE)` |
| DateTime string | `SqlDateTime` | Via `TO_DATE()`, `NOW()`, or `CAST(x AS DATETIME)` |
The JsonValue sealed interface provides the library's own JSON abstraction (object / array / string / boolean / null
plus the sealed JsonNumberValue family JsonLongValue / JsonDoubleValue / JsonDecimalValue). The public API accepts
and returns String or JsonValue with zero external dependencies. You can plug in your own JSON library via the
JsonCodec interface.
SQL4Json executes queries through a staged pipeline. Each stage transforms the row stream and is either lazy (processes rows one at a time, O(1) memory) or materializing (buffers the entire input, O(n) memory).
```
WHERE → GROUP BY → HAVING → WINDOW → ORDER BY → LIMIT → SELECT → DISTINCT
(lazy)   (mat.)    (lazy)   (mat.)    (mat.)    (mat.)   (lazy)   (mat.)
```
| Stage | SQL Clause | Behavior | Row-count Enforced | Notes |
|---|---|---|---|---|
| WHERE | `WHERE` | Lazy | STREAMING | Filters rows as they stream through |
| GROUP BY | `GROUP BY` | Materializing | GROUP_BY | Collects all rows, groups by key, applies aggregates |
| HAVING | `HAVING` | Lazy | — | Filters groups after aggregation |
| WINDOW | `OVER (...)` | Materializing | WINDOW | Partitions, sorts, computes window function values |
| ORDER BY | `ORDER BY` | Materializing | ORDER_BY | Sorts result set; fused with LIMIT into a bounded heap (TopN optimization) when both are present |
| LIMIT | `LIMIT` / `OFFSET` | Materializing | — | Truncates result set |
| SELECT | `SELECT` | Lazy | — | Projects columns, evaluates expressions and aliases |
| DISTINCT | `DISTINCT` | Materializing | DISTINCT | Eliminates duplicate rows |
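The ORDER BY + LIMIT fusion is a standard bounded-heap (TopN) trick: keep only the best N candidates in a min-heap instead of sorting the whole input. A sketch of the idea — illustrative, not the library's implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Largest n values in descending order, using O(n) memory
    // instead of a full O(total) sort buffer.
    public static List<Integer> topN(Iterable<Integer> values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest of the current candidates
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder()); // final ORDER BY ... DESC
        return result;
    }
}
```
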
Exceeding maxRowsPerQuery at any materializing stage throws SQL4JsonExecutionException. Configure via:
```java
Sql4jsonSettings.builder()
        .limits(l -> l.maxRowsPerQuery(50_000))
        .build();
```

SQL4Json follows these rules for NULL values:
- Non-existent fields are treated as NULL
- Any comparison involving NULL returns `false` — matches standard SQL semantics where NULL represents an unknown value. `NULL = NULL`, `NULL != x`, `NULL > x`, etc. all evaluate to `false`.
- Use `IS NULL` / `IS NOT NULL` to test for NULL values
- Aggregate functions (`SUM`, `AVG`, `MIN`, `MAX`, `COUNT(field)`) ignore NULL values
- `COUNT(*)` counts all rows including those with NULL fields
- `COALESCE` returns the first non-null argument
SQL4Json uses a sealed exception hierarchy:
```
SQL4JsonException (sealed base)
+-- SQL4JsonParseException      -- syntax errors (includes line & column position)
+-- SQL4JsonExecutionException  -- runtime errors during query execution
```
```java
try {
    String result = SQL4Json.query(sql, json);
} catch (SQL4JsonParseException e) {
    // e.getLine() is 1-based, e.getCharPosition() is 0-based
    System.err.println("Syntax error at line " + e.getLine()
            + ", col " + e.getCharPosition() + ": " + e.getMessage());
} catch (SQL4JsonExecutionException e) {
    System.err.println("Execution error: " + e.getMessage());
}
```

SQL4Json enforces conservative limits by default to protect against denial-of-service and information-disclosure risks.
All limits are enforced by default via SQL4Json.query(sql, json); pass a custom Sql4jsonSettings to relax or tighten them.
| Setting | Default | Protects against | When to loosen |
|---|---|---|---|
| `security.maxLikeWildcards` | 16 | Soft ReDoS via `LIKE '%a%b%c%...'` patterns | Queries with many literal wildcards |
| `security.redactErrorDetails` | false | Leaking data source names, format strings, or user input into error messages (multi-tenant info disclosure) | Already off by default — set to true in multi-tenant deployments |
| `limits.maxSqlLength` | 65,536 (64 KiB) | Unbounded SQL input consuming parser memory | Generated SQL longer than 64 KiB |
| `limits.maxSubqueryDepth` | 16 | Recursion bomb via deeply nested FROM subqueries | Programmatically generated deep nestings |
| `limits.maxInListSize` | 1,024 | Parse-time amplification attacks via huge `IN (...)` lists | Bulk filter UIs that pass large allow-lists |
| `limits.maxRowsPerQuery` | 1,000,000 | Out-of-memory at materializing pipeline stages (GROUP BY, ORDER BY, WINDOW, JOIN, DISTINCT, and the final pipeline / streaming sink) | Batch queries over datasets larger than 1M rows |
| `cache.likePatternCacheSize` | 1,024 | Unbounded growth of compiled-LIKE-pattern cache (cache-poisoning DoS) | High-cardinality hot-pattern workloads |
| `cache.queryResultCacheEnabled` | false | Unbounded result memory on engines that don't need caching | When you want repeated identical queries to hit a result cache |
| `cache.queryResultCacheSize` | 64 | Cache oversize when enabled | Workloads with more than 64 hot queries |
| `codec.maxInputLength` | 10 MiB | Parser memory exhaustion via oversize JSON input | Known-large payloads |
| `codec.maxNestingDepth` | 64 | Stack/recursion blowup on deeply nested JSON | Legitimately deep JSON documents |
| `codec.maxStringLength` | 1 MiB | Memory exhaustion via a single huge string value | JSON contains embedded base64 blobs, etc. |
| `codec.maxNumberLength` | 64 | DoS via excessively long number literals | N/A — almost never needs to be loosened |
| `codec.maxPropertyNameLength` | 1,024 | DoS via excessively long object keys | N/A — almost never needs to be loosened |
| `codec.maxArrayElements` | 1,000,000 | Memory exhaustion via a single massive array | Datasets with >1M elements per JSON array |
| `codec.duplicateKeyPolicy` | REJECT | HTTP-Parameter-Pollution-style desync between systems that disagree on which duplicate wins (RFC 8259 ambiguity) | Legacy data sources that rely on last-wins or first-wins behaviour |
To override specific limits, build a custom Sql4jsonSettings and pass it to any entry point:
```java
Sql4jsonSettings strict = Sql4jsonSettings.builder()
        .limits(l -> l.maxRowsPerQuery(10_000).maxSqlLength(8 * 1024))
        .codec(new DefaultJsonCodec(
                DefaultJsonCodecSettings.builder()
                        .maxInputLength(50 * 1024 * 1024)
                        .build()))
        .build();
String result = SQL4Json.query(sql, json, strict);
```

For the Javadoc on each setting, see `Sql4jsonSettings`, `SecuritySettings`, `LimitsSettings`, `CacheSettings`, and `DefaultJsonCodecSettings`.
- `PreparedQuery` is immutable and safe to share across threads.
- `SQL4JsonEngine` binds immutable data and uses a synchronized cache — safe for concurrent access from multiple threads.
- Static `SQL4Json` methods are stateless and thread-safe.
- Each query execution creates its own scoped state (rows, interner, pipeline) — no cross-query interference.
- In-memory only. The entire JSON dataset must fit in the JVM heap.
- SELECT only. No `INSERT`, `UPDATE`, `DELETE`, `CREATE`, or DDL operations.
- No FULL OUTER JOIN or CROSS JOIN. Only INNER, LEFT, and RIGHT JOINs with equality ON conditions.
- No UNION / INTERSECT / EXCEPT. One query, one result set.
- No window frame clause. Window functions operate over the full partition. `ROWS BETWEEN N PRECEDING AND M FOLLOWING` is not supported.