05 Semantic Actions
Semantic actions are JSON command blocks that describe how to build TypedAST nodes from parsed syntax. This chapter covers all available commands and patterns.
Every grammar rule can have a semantic action:
rule_name = { pattern }
-> ResultType {
// JSON commands
}
The ResultType indicates what kind of AST node the rule produces:
- `TypedExpression` - Expressions (literals, operations, calls)
- `TypedStatement` - Statements (if, while, return)
- `TypedDeclaration` - Top-level declarations (functions, structs)
- `TypedBlock` - Statement blocks
- `TypedParameter` - Function parameters
- `TypedField` - Struct fields
- `TypedVariant` - Enum variants
- `Type` - Type expressions
- `List` - Collect multiple children
- `String` - Extract text
Commands can reference parsed children using $N syntax:
// Children are numbered by their position in the parse tree
// $1 = first non-silent child, $2 = second, etc.
binary_expr = { left ~ "+" ~ right }
-> TypedExpression {
"commands": [
{ "define": "binary", "args": { "left": "$1", "op": "+", "right": "$2" } }
]
}
Special references:
- `$result` - The result of previous commands (like `get_text`)
- `$1`, `$2`, ... - Child nodes by position
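To make the reference rules concrete, here is a minimal Python sketch of how `$N` and `$result` placeholders could be substituted into an `args` object. The `resolve` helper is hypothetical and not part of the actual builder; it also assumes that a reference to a missing child resolves to `None`.

```python
from typing import Any

def resolve(value: Any, children: list, result: Any = None) -> Any:
    """Substitute "$N" / "$result" placeholders (hypothetical helper)."""
    if isinstance(value, str):
        if value == "$result":
            return result
        if value.startswith("$") and value[1:].isdigit():
            index = int(value[1:]) - 1  # "$1" is the first child
            # Assumption: a missing child resolves to None instead of raising.
            return children[index] if 0 <= index < len(children) else None
        return value  # plain literal such as "+"
    if isinstance(value, dict):
        return {k: resolve(v, children, result) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, children, result) for v in value]
    return value  # booleans, numbers, None

resolved = resolve({"left": "$1", "op": "+", "right": "$2"}, children=["a", "b"])
print(resolved)  # {'left': 'a', 'op': '+', 'right': 'b'}
```

Nested objects are resolved recursively, so the same helper would also cover arguments that contain further `args` objects.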
The `get_text` command extracts the matched text as a string:
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
-> String {
"get_text": true
}
The `parse_int` command parses the extracted text as an integer:
integer_literal = @{ "-"? ~ ASCII_DIGIT+ }
-> TypedExpression {
"get_text": true,
"parse_int": true,
"define": "int_literal",
"args": { "value": "$result" }
}
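The way `$result` flows from one command to the next can be sketched in Python. This is an illustration only: `run_action` is a hypothetical name, nodes are modelled as plain tuples, and the fixed order `get_text`, `parse_int`, `define` is an assumption drawn from the example above.

```python
def run_action(text: str, action: dict):
    """Run get_text -> parse_int -> define, threading $result through (sketch)."""
    result = None
    if action.get("get_text"):
        result = text                      # matched source text
    if action.get("parse_int"):
        result = int(result)               # "-42" becomes -42
    if "define" in action:
        args = {
            key: (result if value == "$result" else value)
            for key, value in action.get("args", {}).items()
        }
        result = (action["define"], args)  # stand-in for an AST node
    return result

node = run_action("-42", {
    "get_text": True,
    "parse_int": True,
    "define": "int_literal",
    "args": {"value": "$result"},
})
print(node)  # ('int_literal', {'value': -42})
```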
The `get_child` command gets a specific child by zero-based index (unlike `$N` references, which start at 1):
// Get the first child (index 0)
expr = { inner_expr }
-> TypedExpression {
"get_child": { "index": 0 }
}
The `get_all_children` command collects all children into a list:
statements = { statement* }
-> List {
"get_all_children": true
}
The `define` command calls an AST builder method with arguments:
return_stmt = { "return" ~ expr? ~ ";" }
-> TypedStatement {
"commands": [
{ "define": "return_stmt", "args": { "value": "$1" } }
]
}
The define command is the primary way to create AST nodes. It calls a method on the TypedAstBuilder.
{ "define": "method_name", "args": { "arg1": "value1", "arg2": "$1" } }

| Method | Arguments | Creates |
|---|---|---|
| `int_literal` | `value` | Integer literal |
| `bool_literal` | `value` | Boolean literal |
| `string_literal` | `value` | String literal |
| `char_literal` | `value` | Character literal |
bool_literal = { "true" | "false" }
-> TypedExpression {
"get_text": true,
"define": "bool_literal",
"args": { "value": "$result" }
}
| Method | Arguments | Creates |
|---|---|---|
| `variable` | `name` | Variable reference |
| `field_access` | `object`, `field` | Field access (`obj.field`) |
| `index` | `object`, `index` | Index access (`arr[i]`) |
field_expr = { atom ~ "." ~ identifier }
-> TypedExpression {
"commands": [
{ "define": "field_access", "args": { "object": "$1", "field": "$2" } }
]
}
| Method | Arguments | Creates |
|---|---|---|
| `binary` | `op`, `left`, `right` | Binary operation |
| `unary` | `op`, `operand` | Unary operation |
| `call` | `callee`, `args` | Function call |
unary_expr = { unary_op ~ primary }
-> TypedExpression {
"commands": [
{ "define": "unary", "args": { "op": "$1", "operand": "$2" } }
]
}
| Method | Arguments | Creates |
|---|---|---|
| `let_stmt` | `name`, `init`, `is_const`, `type?` | Variable declaration |
| `return_stmt` | `value?` | Return statement |
| `if` | `condition`, `then_branch`, `else_branch?` | If statement |
| `while` | `condition`, `body` | While loop |
| `for` | `iterable`, `binding`, `body` | For loop |
| `expression_stmt` | `expr` | Expression statement |
| `assignment` | `target`, `value` | Assignment |
| `break` | | Break statement |
| `continue` | | Continue statement |
if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
-> TypedStatement {
"commands": [
{ "define": "if", "args": {
"condition": "$1",
"then_branch": "$2",
"else_branch": "$3"
}}
]
}
| Method | Arguments | Creates |
|---|---|---|
| `block` | `statements` | Statement block |
| `program` | `declarations` | Program root |
block = { "{" ~ statement* ~ "}" }
-> TypedBlock {
"get_all_children": true,
"define": "block",
"args": { "statements": "$result" }
}
| Method | Arguments | Creates |
|---|---|---|
| `function` | `name`, `params`, `return_type`, `body` | Function declaration |
| `struct` | `name`, `fields` | Struct declaration |
| `enum` | `name`, `variants` | Enum declaration |
| `param` | `name`, `type` | Function parameter |
| `field` | `name`, `type` | Struct field |
| `variant` | `name` | Enum variant |
fn_decl = { "fn" ~ identifier ~ "(" ~ fn_params ~ ")" ~ type_expr ~ block }
-> TypedDeclaration {
"commands": [
{ "define": "function", "args": {
"name": "$1",
"params": "$2",
"return_type": "$3",
"body": "$4"
}}
]
}
| Method | Arguments | Creates |
|---|---|---|
| `struct_init` | `type_name`, `fields` | Struct literal |
| `struct_field_init` | `name`, `value` | Field initializer |
| `array_literal` | `elements` | Array literal |
struct_init = { identifier ~ "{" ~ struct_init_fields? ~ "}" }
-> TypedExpression {
"commands": [
{ "define": "struct_init", "args": { "type_name": "$1", "fields": "$2" } }
]
}
| Method | Arguments | Creates |
|---|---|---|
| `primitive_type` | `name` | Primitive type (`i32`, `bool`, etc.) |
| `pointer_type` | `pointee` | Pointer type (`*T`) |
| `optional_type` | `inner` | Optional type (`?T`) |
| `array_type` | `size`, `element` | Array type (`[N]T`) |
primitive_type = { "i32" | "i64" | "bool" | "void" }
-> Type {
"get_text": true,
"define": "primitive_type",
"args": { "name": "$result" }
}
These commands create pattern nodes for switch expressions and pattern matching:
| Method | Arguments | Creates |
|---|---|---|
| `literal_pattern` | `value` | Match a literal value (int, string) |
| `wildcard_pattern` | | Match anything (`_` or `else`) |
| `range_pattern` | `start`, `end`, `inclusive` | Match a range (`1..10`) |
| `identifier_pattern` | `name` | Bind matched value to variable |
| `struct_pattern` | `name`, `fields` | Match struct with field patterns |
| `field_pattern` | `name`, `pattern?` | Match a struct field |
| `enum_pattern` | `name`, `variant`, `fields` | Match enum/tagged union variant |
| `array_pattern` | `elements` | Match array elements |
| `pointer_pattern` | `inner`, `mutable` | Match pointer dereference |
| `error_pattern` | `name` | Match error value (`error.OutOfMemory`) |
| `switch_expr` | `scrutinee`, `cases` | Switch expression |
| `switch_case` | `pattern`, `body` | Single switch case arm |
// Literal pattern: match exact value
switch_literal_pattern = { integer_literal }
-> TypedExpression {
"commands": [
{ "define": "literal_pattern", "args": { "value": "$1" } }
]
}
// Wildcard pattern: match anything
switch_wildcard_pattern = { "_" }
-> TypedExpression {
"commands": [
{ "define": "wildcard_pattern" }
]
}
// Range pattern: match value in range
switch_range_pattern = { integer_literal ~ ".." ~ integer_literal }
-> TypedExpression {
"commands": [
{ "define": "range_pattern", "args": {
"start": { "define": "literal_pattern", "args": { "value": "$1" } },
"end": { "define": "literal_pattern", "args": { "value": "$2" } },
"inclusive": false
}}
]
}
// Struct pattern: match struct fields
switch_struct_pattern = { identifier ~ "{" ~ struct_field_patterns? ~ "}" }
-> TypedExpression {
"commands": [
{ "define": "struct_pattern", "args": {
"name": { "text": "$1" },
"fields": "$2"
}}
]
}
// Tagged union pattern: .some, .none
switch_tagged_union_pattern = { "." ~ identifier }
-> TypedExpression {
"commands": [
{ "define": "enum_pattern", "args": {
"name": "",
"variant": { "text": "$1" },
"fields": []
}}
]
}
// Error pattern: error.OutOfMemory
switch_error_pattern = { "error" ~ "." ~ identifier }
-> TypedExpression {
"commands": [
{ "define": "error_pattern", "args": {
"name": { "text": "$1" }
}}
]
}
// Pointer pattern: *x
switch_pointer_pattern = { "*" ~ switch_pattern }
-> TypedExpression {
"commands": [
{ "define": "pointer_pattern", "args": {
"inner": "$1",
"mutable": false
}}
]
}
The `fold_binary` command builds left-associative binary expression trees from repetition patterns.
Given input 1 + 2 + 3, we want:
+
/ \
+ 3
/ \
1 2
Not:
+
/ \
1 +
/ \
2 3
addition = { term ~ ((add_op | sub_op) ~ term)* }
-> TypedExpression {
"fold_binary": { "operand": "term", "operator": "add_op|sub_op" }
}
Parameters:
- `operand`: Name of the operand rule
- `operator`: Operator rules (pipe-separated for multiple)
For input 1 + 2 - 3:
- Parse produces: `[term(1), add_op(+), term(2), sub_op(-), term(3)]`
- Fold starts with first term: `result = 1`
- Process pairs: `result = binary(+, result, 2)` → `(1 + 2)`
- Continue: `result = binary(-, result, 3)` → `((1 + 2) - 3)`
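The fold steps above can be sketched in Python. This is a hedged illustration, not the library's implementation: children are modelled as `(rule_name, value)` pairs, AST nodes as tuples, and the `fold_binary` function below only mirrors the command's `operand` and `operator` parameters.

```python
def fold_binary(children, operand, operator):
    """Left-fold operands with operators selected by rule name (sketch)."""
    operator_rules = set(operator.split("|"))   # "add_op|sub_op" -> two rule names
    operands = [value for rule, value in children if rule == operand]
    ops = [value for rule, value in children if rule in operator_rules]
    result = operands[0]                        # fold starts with the first operand
    for op, right in zip(ops, operands[1:]):
        result = ("binary", op, result, right)  # previous result becomes the left side
    return result

children = [("term", 1), ("add_op", "+"), ("term", 2), ("sub_op", "-"), ("term", 3)]
tree = fold_binary(children, operand="term", operator="add_op|sub_op")
print(tree)  # ('binary', '-', ('binary', '+', 1, 2), 3)
```

Because each new node takes the previous result as its left operand, the tree grows down the left side, which is what makes the fold left-associative.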
Handle different operators at the same precedence level:
comparison = { addition ~ ((eq_op | neq_op | lt_op | gt_op) ~ addition)* }
-> TypedExpression {
"fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lt_op|gt_op" }
}
Use the `commands` array to execute one or more commands in sequence:
typed_var_decl = { "const" ~ identifier ~ ":" ~ type_expr ~ "=" ~ expr ~ ";" }
-> TypedDeclaration {
"commands": [
{ "define": "let_stmt", "args": {
"name": "$1",
"type": "$2",
"init": "$3",
"is_const": true
}}
]
}
The `fold_left_ops` command handles left-associative binary expressions where operators are interleaved with operands in the child list.
While fold_binary expects named rules for operands and operators, fold_left_ops works with children in a flat pattern: [operand, operator, operand, operator, operand, ...]
// Multiplication with operators in child list
multiplicative_expr = { unary_expr ~ (multiplicative_op ~ unary_expr)* }
-> TypedExpression {
"get_all_children": true,
"fold_left_ops": true
}
multiplicative_op = { "*" | "/" | "%" }
-> String {
"get_text": true
}
For input 2 * 3 / 4:
- Parse produces: `[unary(2), "*", unary(3), "/", unary(4)]`
- Start with first operand: `result = 2`
- Process pairs: `result = binary(*, result, 3)` → `(2 * 3)`
- Continue: `result = binary(/, result, 4)` → `((2 * 3) / 4)`
- Requires `get_all_children: true` before `fold_left_ops`
- Operators should be extracted as strings (use `get_text: true`)
- Automatically unwraps nested single-element lists
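As a rough Python sketch of the behaviour described above, the following folds a flat `[operand, operator, operand, ...]` list and unwraps nested single-element lists first. The function names and the tuple node shape are assumptions made for illustration.

```python
def unwrap(node):
    """Peel single-element lists such as [[3]] down to 3."""
    while isinstance(node, list) and len(node) == 1:
        node = node[0]
    return node

def fold_left_ops(children):
    """Left-fold a flat operand/operator/operand list (sketch)."""
    items = [unwrap(child) for child in children]
    result = items[0]
    # Operators sit at odd positions; each is followed by its right operand.
    for i in range(1, len(items) - 1, 2):
        result = ("binary", items[i], result, items[i + 1])
    return result

tree = fold_left_ops([2, "*", [[3]], "/", 4])
print(tree)  # ('binary', '/', ('binary', '*', 2, 3), 4)
```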
The `apply_unary` command handles optional unary prefix operators.
When parsing unary expressions like -x or !flag, the unary operator is optional:
- `-42` has a unary prefix
- `42` has no unary prefix
Without apply_unary, you'd need to split into separate rules.
// Unary operators: -, !
unary_expr = { unary_op? ~ postfix_expr }
-> TypedExpression {
"get_all_children": true,
"apply_unary": true
}
unary_op = { "-" | "!" }
-> String {
"get_text": true
}
- Collects all children: `[operator?, operand]`
- If operator present: creates `unary(op, operand)`
- If no operator: passes through the operand unchanged
- Unwraps nested single-element lists from expression cascading
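The four points above amount to a short branch, sketched here in Python. The `apply_unary` function and tuple nodes are illustrative stand-ins under the assumption that the child list holds either one or two entries.

```python
def unwrap(node):
    """Peel single-element lists produced by expression cascading."""
    while isinstance(node, list) and len(node) == 1:
        node = node[0]
    return node

def apply_unary(children):
    """Wrap the operand in a unary node only when an operator is present (sketch)."""
    items = [unwrap(child) for child in children]
    if len(items) == 2:
        op, operand = items
        return ("unary", op, operand)
    return items[0]  # no operator: pass the operand through unchanged

negated = apply_unary(["-", 42])
plain = apply_unary([[42]])
print(negated)  # ('unary', '-', 42)
print(plain)    # 42
```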
The `fold_left` command provides custom left-folding behavior for special operators like the pipe operator.
// Pipe operator: x |> f(args) |> g()
// Transforms: a |> f(b) into f(a, b)
pipe_expr = { or_expr ~ ("|>" ~ pipe_call)* }
-> TypedExpression {
"get_all_children": true,
"fold_left": {
"op": "pipe",
"transform": "prepend_arg"
}
}
pipe_call = { identifier ~ "(" ~ call_args? ~ ")" }
-> TypedExpression {
"commands": [
{ "define": "pipe_target", "args": {
"callee": { "define": "variable", "args": { "name": "$1" } },
"args": "$2"
}}
]
}
- `op`: The operator name (`"pipe"`, `"||"`, `"&&"`, etc.)
- `transform` (optional): Special transformation to apply
  - `"prepend_arg"`: For pipe operator, prepends left-hand side to function args
For input data |> filter(pred) |> map(fn):
- Parse produces: `[data, pipe_call(filter, [pred]), pipe_call(map, [fn])]`
- Start with `result = data`
- Transform: `result = filter(result, pred)` (prepends `result` to the args)
- Transform: `result = map(result, fn)` (prepends the previous result, `filter(data, pred)`, to the args)
Result is equivalent to: map(filter(data, pred), fn)
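A small Python sketch of both modes, with and without the `prepend_arg` transform, may help. It assumes pipe targets arrive as `(callee, args)` pairs and models nodes as tuples; neither detail comes from the actual builder.

```python
def fold_left(children, op, transform=None):
    """Left-fold children, optionally rewriting pipe targets into calls (sketch)."""
    result = children[0]
    for right in children[1:]:
        if transform == "prepend_arg":
            callee, args = right  # assumed shape of a pipe target
            result = ("call", callee, [result] + list(args))
        else:
            result = ("binary", op, result, right)
    return result

pipeline = fold_left(
    ["data", ("filter", ["pred"]), ("map", ["fn"])],
    op="pipe",
    transform="prepend_arg",
)
print(pipeline)
# ('call', 'map', [('call', 'filter', ['data', 'pred']), 'fn'])

either = fold_left(["a", "b", "c"], op="||")
print(either)  # ('binary', '||', ('binary', '||', 'a', 'b'), 'c')
```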
For simple left-folding (like || and &&):
or_expr = { and_expr ~ ("||" ~ and_expr)* }
-> TypedExpression {
"get_all_children": true,
"fold_left": { "op": "||" }
}
The `fold_postfix` command handles postfix operations like function calls, array indexing, and member access.
Postfix expressions chain operations: obj.field[0](arg) needs to fold left-to-right into nested AST nodes.
// Postfix: function calls, indexing, member access
postfix_expr = { primary_expr ~ postfix_op* }
-> TypedExpression {
"get_all_children": true,
"fold_postfix": true
}
postfix_op = { call_op | index_op | member_op }
-> TypedExpression {
"get_child": { "index": 0 }
}
// Function call: f(args)
call_op = { "(" ~ call_args? ~ ")" }
-> TypedExpression {
"get_child": { "index": 0 },
"define": "call_args",
"args": { "args": "$result" }
}
// Indexing: x[i]
index_op = { "[" ~ expr ~ "]" }
-> TypedExpression {
"define": "index",
"args": { "index": "$1" }
}
// Member access: x.field
member_op = { "." ~ identifier }
-> TypedExpression {
"define": "member",
"args": { "field": "$1" }
}
For input arr[0].length:
- Parse produces: `[primary(arr), index_op(0), member_op(length)]`
- Start with `result = arr`
- Apply index: `result = index(result, 0)`
- Apply member: `result = field_access(result, "length")`
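The chaining above can be sketched in Python as a fold over `(marker, payload)` pairs. The marker names `call_args`, `index`, and `member` come from this chapter; the pair representation and the tuple nodes are assumptions for illustration.

```python
def fold_postfix(children):
    """Apply each postfix operation to the result built so far (sketch)."""
    result = children[0]  # the primary expression
    for marker, payload in children[1:]:
        if marker == "call_args":
            result = ("call", result, payload)
        elif marker == "index":
            result = ("index", result, payload)
        elif marker == "member":
            result = ("field_access", result, payload)
        else:
            raise ValueError(f"unknown postfix marker: {marker}")
    return result

tree = fold_postfix(["arr", ("index", 0), ("member", "length")])
print(tree)  # ('field_access', ('index', 'arr', 0), 'length')
```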
Each postfix operation is wrapped with a marker so fold_postfix knows how to apply it:
| Marker | Creates |
|---|---|
| `call_args` | Function call with args |
| `index` | Array/map index access |
| `member` | Field/property access |
When defining variables in dynamically-typed or type-inferred languages, use these special type markers:
let_stmt = { "let" ~ identifier ~ "=" ~ expr }
-> TypedStatement {
"commands": [
{ "define": "let_stmt", "args": {
"name": "$1",
"type": { "define": "infer_type" },
"init": "$2",
"is_const": false
}}
]
}
Defining `infer_type` returns a special null marker that tells the type checker to infer the type from the initializer expression.
These aliases also work for type inference:
- `"define": "auto"` - C++ style auto type
- `"define": "var"` - TypeScript/C# style var
When an optional child is absent, it produces None, and the builder accepts the missing value:
// expr? produces None if missing
return_stmt = { "return" ~ expr? ~ ";" }
-> TypedStatement {
"commands": [
{ "define": "return_stmt", "args": { "value": "$1" } }
]
}
For complex cases, use separate rules:
// Split into variants to avoid indexing issues
if_stmt = { if_else | if_only }
if_only = { "if" ~ "(" ~ expr ~ ")" ~ block }
-> TypedStatement {
"commands": [
{ "define": "if", "args": {
"condition": "$1",
"then_branch": "$2"
}}
]
}
if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
-> TypedStatement {
"commands": [
{ "define": "if", "args": {
"condition": "$1",
"then_branch": "$2",
"else_branch": "$3"
}}
]
}
Sometimes a rule just selects between alternatives without transforming:
// Just pass through the matched child
expr = { logical_or }
-> TypedExpression {
"get_child": { "index": 0 }
}
statement = { if_stmt | while_stmt | return_stmt | expr_stmt }
-> TypedStatement {
"get_child": { "index": 0 }
}
Here's a complete expression grammar with proper operator precedence:
expr = { logical_or }
-> TypedExpression { "get_child": { "index": 0 } }
logical_or = { logical_and ~ (or_op ~ logical_and)* }
-> TypedExpression {
"fold_binary": { "operand": "logical_and", "operator": "or_op" }
}
logical_and = { comparison ~ (and_op ~ comparison)* }
-> TypedExpression {
"fold_binary": { "operand": "comparison", "operator": "and_op" }
}
comparison = { addition ~ ((eq_op | neq_op | lt_op | gt_op) ~ addition)* }
-> TypedExpression {
"fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lt_op|gt_op" }
}
addition = { multiplication ~ ((add_op | sub_op) ~ multiplication)* }
-> TypedExpression {
"fold_binary": { "operand": "multiplication", "operator": "add_op|sub_op" }
}
multiplication = { unary ~ ((mul_op | div_op) ~ unary)* }
-> TypedExpression {
"fold_binary": { "operand": "unary", "operator": "mul_op|div_op" }
}
unary = { unary_with_op | primary }
-> TypedExpression { "get_child": { "index": 0 } }
unary_with_op = { unary_op ~ primary }
-> TypedExpression {
"commands": [
{ "define": "unary", "args": { "op": "$1", "operand": "$2" } }
]
}
primary = { integer | identifier_expr | paren_expr }
-> TypedExpression { "get_child": { "index": 0 } }
paren_expr = _{ "(" ~ expr ~ ")" }
integer = @{ ASCII_DIGIT+ }
-> TypedExpression {
"get_text": true,
"parse_int": true,
"define": "int_literal",
"args": { "value": "$result" }
}
identifier_expr = { identifier }
-> TypedExpression {
"get_text": true,
"define": "variable",
"args": { "name": "$result" }
}
// Operators
and_op = { "and" } -> String { "get_text": true }
or_op = { "or" } -> String { "get_text": true }
eq_op = { "==" } -> String { "get_text": true }
neq_op = { "!=" } -> String { "get_text": true }
lt_op = { "<" } -> String { "get_text": true }
gt_op = { ">" } -> String { "get_text": true }
add_op = { "+" } -> String { "get_text": true }
sub_op = { "-" } -> String { "get_text": true }
mul_op = { "*" } -> String { "get_text": true }
div_op = { "/" } -> String { "get_text": true }
unary_op = { "-" | "!" } -> String { "get_text": true }