05 Semantic Actions

github-actions[bot] edited this page Nov 27, 2025 · 4 revisions

Chapter 5: Semantic Actions

Semantic actions are JSON command blocks that describe how to build TypedAST nodes from parsed syntax. This chapter covers all available commands and patterns.

Basic Structure

Every grammar rule can have a semantic action:

rule_name = { pattern }
  -> ResultType {
      // JSON commands
  }

The ResultType indicates what kind of AST node the rule produces:

  • TypedExpression - Expressions (literals, operations, calls)
  • TypedStatement - Statements (if, while, return)
  • TypedDeclaration - Top-level declarations (functions, structs)
  • TypedBlock - Statement blocks
  • TypedParameter - Function parameters
  • TypedField - Struct fields
  • TypedVariant - Enum variants
  • Type - Type expressions
  • List - Collect multiple children
  • String - Extract text

Child References

Commands can reference parsed children using $N syntax. Quoted literals such as "+" do not produce children, so they are not counted:

// Children are numbered by their position in the parse tree
// $1 = first non-silent child, $2 = second, etc.

binary_expr = { left ~ "+" ~ right }
  -> TypedExpression {
      "commands": [
          { "define": "binary", "args": { "left": "$1", "op": "+", "right": "$2" } }
      ]
  }

Special references:

  • $result - The result of the previous command (for example, the text returned by get_text)
  • $1, $2, ... - Child nodes by position
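
The reference rules above can be pictured with a short Python sketch. The helpers resolve and resolve_args are hypothetical names introduced only for illustration and are not part of the actual implementation; the sketch assumes children arrive as a plain list and that $N counts from 1.

```python
import re


def resolve(value, children, result=None):
    """Resolve one argument value (hypothetical helper, for illustration only).

    "$result" -> the result of the previous command
    "$N"      -> the N-th child, counting from 1
    otherwise -> the literal value, unchanged
    """
    if isinstance(value, str):
        if value == "$result":
            return result
        match = re.fullmatch(r"\$(\d+)", value)
        if match:
            return children[int(match.group(1)) - 1]
    return value


def resolve_args(args, children, result=None):
    """Resolve every value in an "args" object."""
    return {name: resolve(value, children, result) for name, value in args.items()}


# binary_expr = { left ~ "+" ~ right }: the "+" literal yields no child.
children = ["<left node>", "<right node>"]
args = {"left": "$1", "op": "+", "right": "$2"}
print(resolve_args(args, children))
# {'left': '<left node>', 'op': '+', 'right': '<right node>'}
```

Plain strings such as "+" pass through untouched, which is why the operator can be written inline in args.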

Core Commands

get_text

Extracts the matched text as a string:

identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
  -> String {
      "get_text": true
  }

parse_int

Parses extracted text as an integer:

integer_literal = @{ "-"? ~ ASCII_DIGIT+ }
  -> TypedExpression {
      "get_text": true,
      "parse_int": true,
      "define": "int_literal",
      "args": { "value": "$result" }
  }
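
As a rough mental model, a flat action like the one above behaves like a small pipeline in which each step feeds $result to the next. The Python sketch below is an assumption about that behavior, not the real interpreter: run_action is a hypothetical name, the step order (get_text, then parse_int, then define) is assumed, and the builder call is replaced by a tuple.

```python
def run_action(action, matched_text):
    """Run a flat action such as the integer_literal one (illustrative only)."""
    result = None
    if action.get("get_text"):
        result = matched_text              # get_text: the matched source text
    if action.get("parse_int"):
        result = int(result)               # parse_int: text -> integer
    if "define" in action:
        args = {
            name: (result if value == "$result" else value)
            for name, value in action.get("args", {}).items()
        }
        result = (action["define"], args)  # stand-in for a builder call
    return result


action = {
    "get_text": True,
    "parse_int": True,
    "define": "int_literal",
    "args": {"value": "$result"},
}
print(run_action(action, "-42"))  # ('int_literal', {'value': -42})
```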

get_child

Gets a specific child by zero-based index (unlike $N references, which start at 1):

// Get the first child (index 0)
expr = { inner_expr }
  -> TypedExpression {
      "get_child": { "index": 0 }
  }

get_all_children

Collects all children into a list:

statements = { statement* }
  -> List {
      "get_all_children": true
  }

define

Calls an AST builder method with arguments:

return_stmt = { "return" ~ expr? ~ ";" }
  -> TypedStatement {
      "commands": [
          { "define": "return_stmt", "args": { "value": "$1" } }
      ]
  }

The define Command

The define command is the primary way to create AST nodes. It calls a method on the TypedAstBuilder.

Syntax

{ "define": "method_name", "args": { "arg1": "value1", "arg2": "$1" } }
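
One way to picture define is as a lookup by method name followed by a call with args as keyword arguments. The Python sketch below illustrates that idea only: SketchBuilder is a hypothetical stand-in for TypedAstBuilder (whose real Rust API is the subject of Chapter 7), and run_define is an invented helper.

```python
class SketchBuilder:
    """Hypothetical stand-in for TypedAstBuilder, returning plain dicts."""

    def int_literal(self, value):
        return {"kind": "int_literal", "value": value}

    def binary(self, op, left, right):
        return {"kind": "binary", "op": op, "left": left, "right": right}


def run_define(builder, command):
    """Look up the method named by "define" and call it with "args" as keywords."""
    method = getattr(builder, command["define"])
    return method(**command.get("args", {}))


builder = SketchBuilder()
one = run_define(builder, {"define": "int_literal", "args": {"value": 1}})
two = run_define(builder, {"define": "int_literal", "args": {"value": 2}})
node = run_define(
    builder,
    {"define": "binary", "args": {"op": "+", "left": one, "right": two}},
)
print(node["kind"], node["left"]["value"], node["right"]["value"])  # binary 1 2
```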

Available Methods

Literals

Method          Arguments   Creates
int_literal     value       Integer literal
bool_literal    value       Boolean literal
string_literal  value       String literal
char_literal    value       Character literal

bool_literal = { "true" | "false" }
  -> TypedExpression {
      "get_text": true,
      "define": "bool_literal",
      "args": { "value": "$result" }
  }

Variables and Access

Method        Arguments       Creates
variable      name            Variable reference
field_access  object, field   Field access (obj.field)
index         object, index   Index access (arr[i])

field_expr = { atom ~ "." ~ identifier }
  -> TypedExpression {
      "commands": [
          { "define": "field_access", "args": { "object": "$1", "field": "$2" } }
      ]
  }

Operations

Method  Arguments         Creates
binary  op, left, right   Binary operation
unary   op, operand       Unary operation
call    callee, args      Function call

unary_expr = { unary_op ~ primary }
  -> TypedExpression {
      "commands": [
          { "define": "unary", "args": { "op": "$1", "operand": "$2" } }
      ]
  }

Statements

Method           Arguments                              Creates
let_stmt         name, init, is_const, type?            Variable declaration
return_stmt      value?                                 Return statement
if               condition, then_branch, else_branch?   If statement
while            condition, body                        While loop
for              iterable, binding, body                For loop
expression_stmt  expr                                   Expression statement
assignment       target, value                          Assignment
break            (none)                                 Break statement
continue         (none)                                 Continue statement

if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2",
              "else_branch": "$3"
          }}
      ]
  }

Blocks and Programs

Method   Arguments      Creates
block    statements     Statement block
program  declarations   Program root

block = { "{" ~ statement* ~ "}" }
  -> TypedBlock {
      "get_all_children": true,
      "define": "block",
      "args": { "statements": "$result" }
  }

Declarations

Method    Arguments                         Creates
function  name, params, return_type, body   Function declaration
struct    name, fields                      Struct declaration
enum      name, variants                    Enum declaration
param     name, type                        Function parameter
field     name, type                        Struct field
variant   name                              Enum variant

fn_decl = { "fn" ~ identifier ~ "(" ~ fn_params ~ ")" ~ type_expr ~ block }
  -> TypedDeclaration {
      "commands": [
          { "define": "function", "args": {
              "name": "$1",
              "params": "$2",
              "return_type": "$3",
              "body": "$4"
          }}
      ]
  }

Structs and Enums

Method             Arguments           Creates
struct_init        type_name, fields   Struct literal
struct_field_init  name, value         Field initializer
array_literal      elements            Array literal

struct_init = { identifier ~ "{" ~ struct_init_fields? ~ "}" }
  -> TypedExpression {
      "commands": [
          { "define": "struct_init", "args": { "type_name": "$1", "fields": "$2" } }
      ]
  }

Types

Method          Arguments       Creates
primitive_type  name            Primitive type (i32, bool, etc.)
pointer_type    pointee         Pointer type (*T)
optional_type   inner           Optional type (?T)
array_type      size, element   Array type ([N]T)

primitive_type = { "i32" | "i64" | "bool" | "void" }
  -> Type {
      "get_text": true,
      "define": "primitive_type",
      "args": { "name": "$result" }
  }

Pattern Matching

These define methods create pattern nodes for switch expressions and pattern matching:

Method              Arguments               Creates
literal_pattern     value                   Match a literal value (int, string)
wildcard_pattern    (none)                  Match anything (_ or else)
range_pattern       start, end, inclusive   Match a range (1..10)
identifier_pattern  name                    Bind matched value to variable
struct_pattern      name, fields            Match struct with field patterns
field_pattern       name, pattern?          Match a struct field
enum_pattern        name, variant, fields   Match enum/tagged union variant
array_pattern       elements                Match array elements
pointer_pattern     inner, mutable          Match pointer dereference
error_pattern       name                    Match error value (error.OutOfMemory)
switch_expr         scrutinee, cases        Switch expression
switch_case         pattern, body           Single switch case arm

// Literal pattern: match exact value
switch_literal_pattern = { integer_literal }
  -> TypedExpression {
      "commands": [
          { "define": "literal_pattern", "args": { "value": "$1" } }
      ]
  }

// Wildcard pattern: match anything
switch_wildcard_pattern = { "_" }
  -> TypedExpression {
      "commands": [
          { "define": "wildcard_pattern" }
      ]
  }

// Range pattern: match value in range
switch_range_pattern = { integer_literal ~ ".." ~ integer_literal }
  -> TypedExpression {
      "commands": [
          { "define": "range_pattern", "args": {
              "start": { "define": "literal_pattern", "args": { "value": "$1" } },
              "end": { "define": "literal_pattern", "args": { "value": "$2" } },
              "inclusive": false
          }}
      ]
  }

// Struct pattern: match struct fields
switch_struct_pattern = { identifier ~ "{" ~ struct_field_patterns? ~ "}" }
  -> TypedExpression {
      "commands": [
          { "define": "struct_pattern", "args": {
              "name": { "text": "$1" },
              "fields": "$2"
          }}
      ]
  }

// Tagged union pattern: .some, .none
switch_tagged_union_pattern = { "." ~ identifier }
  -> TypedExpression {
      "commands": [
          { "define": "enum_pattern", "args": {
              "name": "",
              "variant": { "text": "$1" },
              "fields": []
          }}
      ]
  }

// Error pattern: error.OutOfMemory
switch_error_pattern = { "error" ~ "." ~ identifier }
  -> TypedExpression {
      "commands": [
          { "define": "error_pattern", "args": {
              "name": { "text": "$1" }
          }}
      ]
  }

// Pointer pattern: *x
switch_pointer_pattern = { "*" ~ switch_pattern }
  -> TypedExpression {
      "commands": [
          { "define": "pointer_pattern", "args": {
              "inner": "$1",
              "mutable": false
          }}
      ]
  }
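
The range_pattern example above places define commands inside args. A plausible reading is that nested commands are evaluated first and their results passed to the outer method. The Python sketch below models only that reading: evaluate is a hypothetical helper, builder calls are replaced by tuples, and the { "text": "$1" } wrapper used in other examples is not modelled.

```python
def evaluate(value, children):
    """Evaluate an args value that may itself be a {"define": ...} command."""
    if isinstance(value, dict) and "define" in value:
        args = {k: evaluate(v, children) for k, v in value.get("args", {}).items()}
        return (value["define"], args)       # stand-in for a builder call
    if isinstance(value, str) and value.startswith("$") and value[1:].isdigit():
        return children[int(value[1:]) - 1]  # "$N" -> N-th child (1-based)
    return value                             # literals pass through unchanged


command = {"define": "range_pattern", "args": {
    "start": {"define": "literal_pattern", "args": {"value": "$1"}},
    "end": {"define": "literal_pattern", "args": {"value": "$2"}},
    "inclusive": False,
}}

# Children for the input 1..10: the two integer_literal results.
pattern = evaluate(command, children=[1, 10])
print(pattern[0], pattern[1]["start"], pattern[1]["end"])
# range_pattern ('literal_pattern', {'value': 1}) ('literal_pattern', {'value': 10})
```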

The fold_binary Command

This special command builds left-associative binary expression trees from repetition patterns.

Problem It Solves

Given input 1 + 2 + 3, we want:

    +
   / \
  +   3
 / \
1   2

Not:

  +
 / \
1   +
   / \
  2   3
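
With + alone both trees evaluate to the same number, so the difference is easy to miss. Subtraction makes it visible, as this small Python check shows (the numbers are chosen purely for illustration):

```python
# Left-associative grouping, as in the first tree: (10 - 4) - 3
left = (10 - 4) - 3

# Right-associative grouping, as in the second tree: 10 - (4 - 3)
right = 10 - (4 - 3)

print(left, right)  # 3 9
```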

Usage

addition = { term ~ ((add_op | sub_op) ~ term)* }
  -> TypedExpression {
      "fold_binary": { "operand": "term", "operator": "add_op|sub_op" }
  }

Parameters:

  • operand: Name of the operand rule
  • operator: Operator rules (pipe-separated for multiple)

How It Works

For input 1 + 2 - 3:

  1. Parse produces: [term(1), add_op(+), term(2), sub_op(-), term(3)]
  2. Fold starts with first term: result = 1
  3. Process pairs: result = binary(+, result, 2), which is (1 + 2)
  4. Continue: result = binary(-, result, 3), which is ((1 + 2) - 3)
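
The steps above can be sketched in Python. This is an illustration of the folding idea, not the actual implementation: children are assumed to arrive as (rule_name, value) pairs, and a tuple stands in for the binary builder call.

```python
def fold_binary(children, operand, operator):
    """Fold [operand, op, operand, ...] into a left-nested tree.

    `children` is a list of (rule_name, value) pairs; `operator` is a
    pipe-separated string of rule names, as in the grammar action.
    """
    operator_rules = set(operator.split("|"))
    first_rule, result = children[0]
    if first_rule != operand:
        raise ValueError(f"expected {operand}, got {first_rule}")
    rest = children[1:]
    # Walk the remaining children as (operator, operand) pairs.
    for (op_rule, op), (rhs_rule, rhs) in zip(rest[0::2], rest[1::2]):
        if op_rule not in operator_rules or rhs_rule != operand:
            raise ValueError(f"unexpected pair {op_rule}, {rhs_rule}")
        result = ("binary", op, result, rhs)
    return result


children = [("term", 1), ("add_op", "+"), ("term", 2), ("sub_op", "-"), ("term", 3)]
tree = fold_binary(children, operand="term", operator="add_op|sub_op")
print(tree)  # ('binary', '-', ('binary', '+', 1, 2), 3)
```

When the repetition matches zero times, the loop body never runs and the single operand is returned unchanged, so in this sketch the rule acts as a passthrough.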

Multiple Operators

Handle different operators at the same precedence level:

comparison = { addition ~ ((eq_op | neq_op | lt_op | gt_op) ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lt_op|gt_op" }
  }

Command Sequences

Use the commands array to execute commands in sequence. The array may hold a single command, as in this example:

typed_var_decl = { "const" ~ identifier ~ ":" ~ type_expr ~ "=" ~ expr ~ ";" }
  -> TypedDeclaration {
      "commands": [
          { "define": "let_stmt", "args": {
              "name": "$1",
              "type": "$2",
              "init": "$3",
              "is_const": true
          }}
      ]
  }

Handling Optional Children

When an optional child is absent, its reference resolves to None, and the builder accepts the missing value:

// expr? produces None if missing
return_stmt = { "return" ~ expr? ~ ";" }
  -> TypedStatement {
      "commands": [
          { "define": "return_stmt", "args": { "value": "$1" } }
      ]
  }

For complex cases, split the rule into variants so that each variant has a fixed set of children and child positions never shift:

// Split into variants to avoid indexing issues
if_stmt = { if_else | if_only }

if_only = { "if" ~ "(" ~ expr ~ ")" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2"
          }}
      ]
  }

if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2",
              "else_branch": "$3"
          }}
      ]
  }
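
The indexing issue can be illustrated with a Python sketch. It assumes that positions are assigned only to the children actually present, and the let rule shown in the comment is hypothetical, invented to show an optional child in the middle of a rule; child is likewise an illustrative helper.

```python
def child(children, n):
    """Return the n-th child (1-based), or None when it is absent."""
    return children[n - 1] if 0 < n <= len(children) else None


# return_stmt = { "return" ~ expr? ~ ";" }
print(child(["<expr>"], 1))  # return x;  -> <expr>
print(child([], 1))          # return;    -> None

# Hypothetical rule with an optional child in the middle:
#   "let" ~ identifier ~ (":" ~ type_expr)? ~ "=" ~ expr
with_type = ["x", "i32", "<init>"]
without_type = ["x", "<init>"]

# "$2" names the type in one case and the initializer in the other.
print(child(with_type, 2), child(without_type, 2))  # i32 <init>
```

A trailing optional child, as in return_stmt, is harmless under this model. An optional child in the middle changes what later positions refer to, which is the case that splitting into variants avoids.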

Passthrough Rules

Sometimes a rule just selects between alternatives without transforming:

// Just pass through the matched child
expr = { logical_or }
  -> TypedExpression {
      "get_child": { "index": 0 }
  }

statement = { if_stmt | while_stmt | return_stmt | expr_stmt }
  -> TypedStatement {
      "get_child": { "index": 0 }
  }

Complete Example

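Here's a complete expression grammar with proper operator precedence. Each level uses the next-tighter level as its operand, and the identifier rule is the one defined under get_text above: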

expr = { logical_or }
  -> TypedExpression { "get_child": { "index": 0 } }

logical_or = { logical_and ~ (or_op ~ logical_and)* }
  -> TypedExpression {
      "fold_binary": { "operand": "logical_and", "operator": "or_op" }
  }

logical_and = { comparison ~ (and_op ~ comparison)* }
  -> TypedExpression {
      "fold_binary": { "operand": "comparison", "operator": "and_op" }
  }

comparison = { addition ~ ((eq_op | neq_op | lt_op | gt_op) ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lt_op|gt_op" }
  }

addition = { multiplication ~ ((add_op | sub_op) ~ multiplication)* }
  -> TypedExpression {
      "fold_binary": { "operand": "multiplication", "operator": "add_op|sub_op" }
  }

multiplication = { unary ~ ((mul_op | div_op) ~ unary)* }
  -> TypedExpression {
      "fold_binary": { "operand": "unary", "operator": "mul_op|div_op" }
  }

unary = { unary_with_op | primary }
  -> TypedExpression { "get_child": { "index": 0 } }

unary_with_op = { unary_op ~ primary }
  -> TypedExpression {
      "commands": [
          { "define": "unary", "args": { "op": "$1", "operand": "$2" } }
      ]
  }

primary = { integer | identifier_expr | paren_expr }
  -> TypedExpression { "get_child": { "index": 0 } }

paren_expr = _{ "(" ~ expr ~ ")" }

integer = @{ ASCII_DIGIT+ }
  -> TypedExpression {
      "get_text": true,
      "parse_int": true,
      "define": "int_literal",
      "args": { "value": "$result" }
  }

identifier_expr = { identifier }
  -> TypedExpression {
      "get_text": true,
      "define": "variable",
      "args": { "name": "$result" }
  }

// Operators
and_op = { "and" } -> String { "get_text": true }
or_op = { "or" } -> String { "get_text": true }
eq_op = { "==" } -> String { "get_text": true }
neq_op = { "!=" } -> String { "get_text": true }
lt_op = { "<" } -> String { "get_text": true }
gt_op = { ">" } -> String { "get_text": true }
add_op = { "+" } -> String { "get_text": true }
sub_op = { "-" } -> String { "get_text": true }
mul_op = { "*" } -> String { "get_text": true }
div_op = { "/" } -> String { "get_text": true }
unary_op = { "-" | "!" } -> String { "get_text": true }
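
To see why this layering yields the right precedence, the rule chain can be mimicked with a small hand-written parser. The Python sketch below is independent of the grammar engine and makes its own assumptions: the tokenizer, function names, and tuple-shaped nodes are invented for illustration, and each level folds to the left in the same way fold_binary is described.

```python
import re

# One entry per precedence level, loosest first, mirroring the rule chain
# logical_or -> logical_and -> comparison -> addition -> multiplication.
LEVELS = [
    {"or"},
    {"and"},
    {"==", "!=", "<", ">"},
    {"+", "-"},
    {"*", "/"},
]

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|==|!=|[-+*/<>!()])")


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_level(tokens, level=0):
    """operand ~ (operator ~ operand)*, folded to the left."""
    if level == len(LEVELS):
        return parse_unary(tokens)
    result = parse_level(tokens, level + 1)
    while tokens and tokens[0] in LEVELS[level]:
        op = tokens.pop(0)
        result = ("binary", op, result, parse_level(tokens, level + 1))
    return result


def parse_unary(tokens):
    # unary_with_op = { unary_op ~ primary }
    if tokens[0] in ("-", "!"):
        op = tokens.pop(0)
        return ("unary", op, parse_primary(tokens))
    return parse_primary(tokens)


def parse_primary(tokens):
    token = tokens.pop(0)
    if token == "(":
        inner = parse_level(tokens)
        tokens.pop(0)  # consume the closing ")"
        return inner
    if token.isdigit():
        return ("int_literal", int(token))
    return ("variable", token)


def parse(source):
    return parse_level(tokenize(source))


print(parse("1 + 2 * 3"))
# ('binary', '+', ('int_literal', 1), ('binary', '*', ('int_literal', 2), ('int_literal', 3)))
```

Because the addition level asks the multiplication level for each operand, 2 * 3 is grouped before + is applied, with no precedence table needed.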

Next Steps

  • Chapter 6: Understand the TypedAST structure these commands create
  • Chapter 7: Use the builder API directly in Rust
  • Chapter 8: See all commands used in a complete grammar
