08 Zig Example

Chapter 8: Complete Example - Zig Grammar

This chapter walks through the complete Zig grammar implementation (zig.zyn), explaining each section and the design decisions behind it.

Overview

The Zig grammar supports:

Functions with typed parameters
Structs and enums
Control flow (if, while, for)
Expressions with proper operator precedence
Type expressions (pointers, optionals, arrays)

File Structure

// 1. Language metadata
@language { ... }

// 2. Program structure
program = { ... }
declarations = { ... }
declaration = { ... }

// 3. Type declarations
struct_decl = { ... }
enum_decl = { ... }

// 4. Function declarations
fn_decl = { ... }

// 5. Statements
statement = { ... }
if_stmt = { ... }
while_stmt = { ... }
// ...

// 6. Expressions (by precedence)
expr = { ... }
logical_or = { ... }
// ... down to atoms

// 7. Literals and identifiers
integer_literal = { ... }
identifier = { ... }

// 8. Operators
add_op = { ... }
// ...

// 9. Whitespace/comments
WHITESPACE = { ... }
COMMENT = { ... }

Language Metadata

@language {
    name: "Zig",
    version: "0.11",
    file_extensions: [".zig"],
    entry_point: "main",
}

This metadata tells the compiler:

The language name for error messages
Which file extensions to recognize
Which function to execute for --run

Program Structure

The Entry Point

program = { SOI ~ declarations ~ EOI }
  -> TypedProgram {
      "get_child": { "index": 0 }
  }

This matches the entire file (SOI to EOI) and extracts the declarations.

Collecting Declarations

declarations = { declaration* }
  -> TypedProgram {
      "get_all_children": true,
      "define": "program",
      "args": { "declarations": "$result" }
  }

Key pattern: get_all_children collects all declaration matches into a list, then define: "program" creates the root node.

Declaration Dispatch

declaration = { struct_decl | enum_decl | fn_decl | const_decl | var_decl }
  -> TypedDeclaration {
      "get_child": { "index": 0 }
  }

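Order matters! PEG choice is ordered: the first alternative that matches wins, and later alternatives are never tried. struct_decl and enum_decl both begin with "const", as const_decl presumably does (it is not shown in this chapter), so the more specific rules are listed first.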

Struct Declarations

struct_decl = { "const" ~ identifier ~ "=" ~ "struct" ~ "{" ~ struct_fields? ~ "}" ~ ";" }
  -> TypedDeclaration {
      "commands": [
          { "define": "struct", "args": {
              "name": "$1",
              "fields": "$2"
          }}
      ]
  }

Example input:

const Point = struct {
    x: i32,
    y: i32,
};

Child mapping:

$1 = identifier → "Point"
$2 = struct_fields? → list of fields (or null)

Struct Fields

struct_fields = { struct_field ~ ("," ~ struct_field)* ~ ","? }
  -> List {
      "get_all_children": true
  }

struct_field = { identifier ~ ":" ~ type_expr }
  -> TypedField {
      "commands": [
          { "define": "field", "args": { "name": "$1", "type": "$2" } }
      ]
  }

Pattern: Optional trailing comma (","?) is common in modern languages.
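The shape of this list rule can be imitated outside the grammar. The following is a minimal Python sketch, not part of zig.zyn: parse_fields is a hypothetical helper that works on a pre-split token list, and each field is treated as one opaque token.

```python
def parse_fields(tokens):
    """Model of: struct_field ~ ("," ~ struct_field)* ~ ","?

    `tokens` is a pre-split list in which every field is a single token.
    Returns the list of fields, or raises ValueError on malformed input.
    """
    pos = 0
    if pos >= len(tokens) or tokens[pos] == ",":
        raise ValueError("expected a field")
    fields = [tokens[pos]]
    pos += 1
    # ("," ~ struct_field)* : only continue when a field follows the comma
    while pos + 1 < len(tokens) and tokens[pos] == "," and tokens[pos + 1] != ",":
        fields.append(tokens[pos + 1])
        pos += 2
    # ","? : at most one trailing comma
    if pos < len(tokens) and tokens[pos] == ",":
        pos += 1
    if pos != len(tokens):
        raise ValueError(f"unexpected token at position {pos}")
    return fields


print(parse_fields(["x: i32", ",", "y: i32"]))
print(parse_fields(["x: i32", ",", "y: i32", ","]))
```

Both calls yield the same two fields, which is the point of the trailing ","?: the comma after the last field is accepted but contributes nothing to the list.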

Enum Declarations

enum_decl = { "const" ~ identifier ~ "=" ~ "enum" ~ "{" ~ enum_variants? ~ "}" ~ ";" }
  -> TypedDeclaration {
      "commands": [
          { "define": "enum", "args": {
              "name": "$1",
              "variants": "$2"
          }}
      ]
  }

enum_variants = { enum_variant ~ ("," ~ enum_variant)* ~ ","? }
  -> List {
      "get_all_children": true
  }

enum_variant = { identifier }
  -> TypedVariant {
      "get_text": true,
      "define": "variant",
      "args": { "name": "$result" }
  }

Example:

const Color = enum {
    Red,
    Green,
    Blue,
};

The runtime assigns discriminant values (0, 1, 2) automatically.
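As an illustration only, the numbering can be pictured as a position lookup. This Python sketch assumes discriminants follow declaration order, which is consistent with the Color.Green test later in this chapter; assign_discriminants is a hypothetical name, not a Zyntax API.

```python
def assign_discriminants(variants):
    """Map each variant name to its position in the declaration."""
    return {name: index for index, name in enumerate(variants)}


color = assign_discriminants(["Red", "Green", "Blue"])
print(color)           # {'Red': 0, 'Green': 1, 'Blue': 2}
print(color["Green"])  # 1
```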

Function Declarations

The Split Pattern

fn_decl = { fn_decl_with_params | fn_decl_no_params }
  -> TypedDeclaration {
      "get_child": { "index": 0 }
  }

Why split? PEG doesn't produce placeholder children for missing optionals. With a single rule like:

// PROBLEMATIC
fn_decl = { "fn" ~ identifier ~ "(" ~ fn_params? ~ ")" ~ type_expr ~ block }

If params are missing, every later child shifts down by one: $2 would be type_expr instead of fn_params, and $3 would be block instead of type_expr. By splitting, each variant has predictable child indices.
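The index shift is easy to reproduce in a small model. The Python sketch below assumes, based on the examples in this chapter, that $n refers to the n-th named sub-rule that actually matched and that an out-of-range reference yields null; children and resolve are hypothetical helpers, not Zyntax functions.

```python
def children(rule_parts, matched):
    """Keep the named sub-rules that matched, in order.

    Literals such as "fn" or "(" never appear here because they
    produce no child.
    """
    return [name for name in rule_parts if name in matched]


def resolve(ref, kids):
    """Resolve "$n" (1-based) against a child list; out of range gives None."""
    index = int(ref[1:]) - 1
    return kids[index] if index < len(kids) else None


FN_DECL = ["identifier", "fn_params", "type_expr", "block"]

with_params = children(FN_DECL, {"identifier", "fn_params", "type_expr", "block"})
no_params = children(FN_DECL, {"identifier", "type_expr", "block"})

for ref in ("$2", "$3", "$4"):
    print(ref, resolve(ref, with_params), resolve(ref, no_params))
```

In this model the same reference names a different child depending on whether the optional matched, which is the ambiguity the split rules remove. A trailing optional is harmless by comparison: it can only turn the last reference into null.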

With Parameters

fn_decl_with_params = { "fn" ~ identifier ~ "(" ~ fn_params ~ ")" ~ type_expr ~ block }
  -> TypedDeclaration {
      "commands": [
          { "define": "function", "args": {
              "name": "$1",
              "params": "$2",
              "return_type": "$3",
              "body": "$4"
          }}
      ]
  }

Without Parameters

fn_decl_no_params = { "fn" ~ identifier ~ "(" ~ ")" ~ type_expr ~ block }
  -> TypedDeclaration {
      "commands": [
          { "define": "function", "args": {
              "name": "$1",
              "params": [],
              "return_type": "$2",
              "body": "$3"
          }}
      ]
  }

Note: "params": [] provides an empty list literal.

Parameters

fn_params = { fn_param ~ ("," ~ fn_param)* }
  -> List {
      "get_all_children": true
  }

fn_param = { identifier ~ ":" ~ type_expr }
  -> TypedParameter {
      "commands": [
          { "define": "param", "args": { "name": "$1", "type": "$2" } }
      ]
  }

Statements

Statement Dispatch

statement = { if_stmt | while_stmt | for_stmt | return_stmt | break_stmt |
              continue_stmt | local_const | local_var | assign_stmt | expr_stmt }
  -> TypedStatement {
      "get_child": { "index": 0 }
  }

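Order consideration: the keyword-led statements come before assign_stmt and expr_stmt, so a statement that starts with if is claimed by if_stmt before the expression rules are tried. An identifier such as if_something is not affected: if_stmt matches the "if" prefix, fails at the missing "(", and the choice moves on to the later alternatives.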

If Statement (Split Pattern Again)

if_stmt = { if_else | if_only }
  -> TypedStatement { "get_child": { "index": 0 } }

if_only = { "if" ~ "(" ~ expr ~ ")" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2"
          }}
      ]
  }

if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2",
              "else_branch": "$3"
          }}
      ]
  }

Important: if_else must come before if_only in the choice. if_only matches a prefix of every if/else statement, so if it were tried first it would always succeed and leave the "else" branch unconsumed.
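The effect of the ordering can be shown with a toy model of PEG ordered choice. This is a Python sketch under simplifying assumptions: input is a token list, "COND" and "BLOCK" stand in for expr and block, and match_seq and ordered_choice are hypothetical helpers.

```python
# "COND" and "BLOCK" abbreviate a parsed expr and block.
IF_ONLY = ["if", "(", "COND", ")", "BLOCK"]
IF_ELSE = IF_ONLY + ["else", "BLOCK"]


def match_seq(pattern, tokens, pos):
    """Return the position after `pattern` if it matches at `pos`, else None."""
    end = pos + len(pattern)
    return end if tokens[pos:end] == pattern else None


def ordered_choice(alternatives, tokens, pos):
    """PEG choice: commit to the first alternative that matches."""
    for name, pattern in alternatives:
        end = match_seq(pattern, tokens, pos)
        if end is not None:
            return name, end
    return None


source = ["if", "(", "COND", ")", "BLOCK", "else", "BLOCK"]

good = ordered_choice([("if_else", IF_ELSE), ("if_only", IF_ONLY)], source, 0)
bad = ordered_choice([("if_only", IF_ONLY), ("if_else", IF_ELSE)], source, 0)

print(good)  # ('if_else', 7): the whole statement is consumed
print(bad)   # ('if_only', 5): "else BLOCK" is left over
```

With the longer alternative first, a plain if without else still parses, because if_else fails and the choice falls through to if_only.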

While Loop

while_stmt = { "while" ~ "(" ~ expr ~ ")" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "while", "args": {
              "condition": "$1",
              "body": "$2"
          }}
      ]
  }

For Loop (Zig Style)

for_stmt = { "for" ~ "(" ~ expr ~ ")" ~ "|" ~ identifier ~ "|" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "for", "args": {
              "iterable": "$1",
              "binding": "$2",
              "body": "$3"
          }}
      ]
  }

Zig's for loop: for (slice) |item| { ... }

Return Statement

return_stmt = { "return" ~ expr? ~ ";" }
  -> TypedStatement {
      "commands": [
          { "define": "return_stmt", "args": { "value": "$1" } }
      ]
  }

$1 will be null if expr? doesn't match. No split is needed here because the optional is the last child, so no later index can shift.

Block

block = { "{" ~ statement* ~ "}" }
  -> TypedBlock {
      "get_all_children": true,
      "define": "block",
      "args": { "statements": "$result" }
  }

Expressions

The Precedence Chain

Operators are handled by a chain from lowest to highest precedence:

expr = { logical_or }
  -> TypedExpression { "get_child": { "index": 0 } }

// Lowest: OR
logical_or = { logical_and ~ (or_op ~ logical_and)* }
  -> TypedExpression {
      "fold_binary": { "operand": "logical_and", "operator": "or_op" }
  }

// AND
logical_and = { comparison ~ (and_op ~ comparison)* }
  -> TypedExpression {
      "fold_binary": { "operand": "comparison", "operator": "and_op" }
  }

// Comparison
comparison = { addition ~ ((eq_op | neq_op | lte_op | gte_op | lt_op | gt_op) ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lte_op|gte_op|lt_op|gt_op" }
  }

// Addition/Subtraction
addition = { multiplication ~ ((add_op | sub_op) ~ multiplication)* }
  -> TypedExpression {
      "fold_binary": { "operand": "multiplication", "operator": "add_op|sub_op" }
  }

// Multiplication/Division
multiplication = { unary ~ ((mul_op | div_op) ~ unary)* }
  -> TypedExpression {
      "fold_binary": { "operand": "unary", "operator": "mul_op|div_op" }
  }

// Unary (highest before atoms)
unary = { unary_with_op | primary }
  -> TypedExpression { "get_child": { "index": 0 } }

unary_with_op = { unary_op ~ primary }
  -> TypedExpression {
      "commands": [
          { "define": "unary", "args": { "op": "$1", "operand": "$2" } }
      ]
  }

The `fold_binary` Pattern

For 1 + 2 + 3:

Parse: [term(1), +, term(2), +, term(3)]
Start: result = 1
Fold: result = binary(+, 1, 2) → (1+2)
Fold: result = binary(+, (1+2), 3) → ((1+2)+3)

This creates left-associative trees automatically.
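The fold itself is only a few lines. The Python sketch below is an approximation of what fold_binary is described as doing, not the Zyntax implementation; the tuple shape ("binary", op, left, right) is an assumption made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def fold_binary(items):
    """Left-fold [operand, op, operand, op, operand, ...] into nested nodes."""
    result = items[0]
    for i in range(1, len(items), 2):
        op, rhs = items[i], items[i + 1]
        result = ("binary", op, result, rhs)
    return result


def evaluate(node):
    """Evaluate a folded tree of integers."""
    if isinstance(node, int):
        return node
    _, op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = fold_binary([10, "-", 3, "-", 2])
print(tree)            # ('binary', '-', ('binary', '-', 10, 3), 2)
print(evaluate(tree))  # 5
```

Subtraction makes the associativity visible: the left fold computes (10 - 3) - 2 = 5, whereas a right-nested tree would compute 10 - (3 - 2) = 9.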

Postfix Expressions

postfix_expr = { call_expr | field_expr | index_expr | atom }
  -> TypedExpression { "get_child": { "index": 0 } }

// Function call
call_expr = { atom ~ "(" ~ call_args? ~ ")" }
  -> TypedExpression {
      "commands": [
          { "define": "call", "args": { "callee": "$1", "args": "$2" } }
      ]
  }

// Field access
field_expr = { atom ~ "." ~ identifier }
  -> TypedExpression {
      "commands": [
          { "define": "field_access", "args": { "object": "$1", "field": "$2" } }
      ]
  }

// Index access
index_expr = { atom ~ "[" ~ expr ~ "]" }
  -> TypedExpression {
      "commands": [
          { "define": "index", "args": { "object": "$1", "index": "$2" } }
      ]
  }

Atoms (Highest Precedence)

atom = { try_expr | struct_init | array_literal | bool_literal |
         string_literal | integer_literal | identifier_expr | paren_expr }
  -> TypedExpression { "get_child": { "index": 0 } }

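Order matters: struct_init starts with an identifier, so it must come before identifier_expr. Otherwise identifier_expr would match the type name in Point{ .x = 10 } and leave the braces unparsed.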

Struct Initialization

struct_init = { identifier ~ "{" ~ struct_init_fields? ~ "}" }
  -> TypedExpression {
      "commands": [
          { "define": "struct_init", "args": { "type_name": "$1", "fields": "$2" } }
      ]
  }

struct_init_fields = { struct_init_field ~ ("," ~ struct_init_field)* ~ ","? }
  -> List { "get_all_children": true }

struct_init_field = { "." ~ identifier ~ "=" ~ expr }
  -> TypedExpression {
      "commands": [
          { "define": "struct_field_init", "args": { "name": "$1", "value": "$2" } }
      ]
  }

Example: Point{ .x = 10, .y = 20 }

Parenthesized Expressions

paren_expr = _{ "(" ~ expr ~ ")" }

Silent rule (_{ }) - matches but doesn't create a node. The inner expr passes through directly.

Type Expressions

type_expr = { pointer_type | optional_type | error_union_type | array_type |
              primitive_type | identifier }
  -> Type { "get_child": { "index": 0 } }

pointer_type = { "*" ~ "const"? ~ type_expr }
  -> Type {
      "commands": [
          { "define": "pointer_type", "args": { "pointee": "$1" } }
      ]
  }

optional_type = { "?" ~ type_expr }
  -> Type {
      "commands": [
          { "define": "optional_type", "args": { "inner": "$1" } }
      ]
  }

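// Caution: if the size is omitted ([]T), the reasoning in The Split Pattern
// implies $1 is the element type and $2 is null; a split rule would avoid this.
array_type = { "[" ~ integer_literal? ~ "]" ~ type_expr }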
  -> Type {
      "commands": [
          { "define": "array_type", "args": { "size": "$1", "element": "$2" } }
      ]
  }

primitive_type = { "i8" | "i16" | "i32" | "i64" | "u8" | "u16" | "u32" | "u64" |
                   "f32" | "f64" | "bool" | "void" }
  -> Type {
      "get_text": true,
      "define": "primitive_type",
      "args": { "name": "$result" }
  }
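The type rules are recursive: each prefix form wraps another type_expr. A rough Python sketch of that recursion follows. It operates directly on a string, skips error_union_type (not shown in this chapter), and the node shapes and the "named_type" label are assumptions made for illustration.

```python
import re

PRIMITIVES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64",
              "f32", "f64", "bool", "void"}


def parse_type(src):
    """Parse a type string into nested tuples, trying prefix forms first."""
    src = src.strip()
    if src.startswith("*"):
        rest = src[1:].lstrip()
        # "const"? : consume the keyword only when it stands alone
        if re.match(r"const(?![A-Za-z0-9_])", rest):
            rest = rest[len("const"):]
        return ("pointer_type", parse_type(rest))
    if src.startswith("?"):
        return ("optional_type", parse_type(src[1:]))
    array = re.match(r"\[\s*(\d*)\s*\]", src)
    if array:
        size = int(array.group(1)) if array.group(1) else None
        return ("array_type", size, parse_type(src[array.end():]))
    if src in PRIMITIVES:
        return ("primitive_type", src)
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", src):
        return ("named_type", src)
    raise ValueError(f"not a type: {src!r}")


print(parse_type("?*const [4]u8"))
print(parse_type("[]Point"))
```

Note that this sketch stores the array size in a fixed slot and uses None when it is absent, so the element type never changes position.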

Identifiers and Keywords

Keyword Protection

keyword = @{
    ("struct" | "enum" | "fn" | "const" | "var" | "if" | "else" | "while" | "for" |
     "return" | "break" | "continue" | "try" | "and" | "or" | "true" | "false" |
     "i8" | "i16" | "i32" | "i64" | "u8" | "u16" | "u32" | "u64" | "f32" | "f64" |
     "bool" | "void")
    ~ !(ASCII_ALPHANUMERIC | "_")
}

identifier = @{ !keyword ~ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }
  -> String { "get_text": true }

Key patterns:

~ !(ASCII_ALPHANUMERIC | "_") ensures "iffy" doesn't match as "if" + "fy"
!keyword prevents identifiers from being keywords
Both are atomic (@{ }), so no implicit whitespace is skipped inside them and each one matches as a single token
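The two lookaheads translate almost directly into regular-expression negative lookaheads. The Python sketch below is an analogy, not how Zyntax evaluates the grammar, and it lists only the non-type keywords to stay short.

```python
import re

KEYWORDS = ["struct", "enum", "fn", "const", "var", "if", "else", "while",
            "for", "return", "break", "continue", "try", "and", "or",
            "true", "false"]
WORDS = "|".join(KEYWORDS)

# keyword ~ !(ASCII_ALPHANUMERIC | "_")
keyword = "(?:" + WORDS + ")(?![A-Za-z0-9_])"
# !keyword ~ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")*
identifier = re.compile("(?!" + keyword + ")[A-Za-z_][A-Za-z0-9_]*")
# Same rule without the boundary check, for comparison.
naive = re.compile("(?!(?:" + WORDS + "))[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text, pattern=identifier):
    """True when the whole text is accepted as an identifier."""
    return pattern.fullmatch(text) is not None


for word in ("if", "iffy", "if_something", "format", "return"):
    print(word, is_identifier(word), is_identifier(word, naive))
```

Without the boundary check, any identifier that merely starts with a keyword (iffy, format, order) is rejected, which is why the keyword rule needs its trailing negative lookahead.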

Operators

Each operator is a separate rule for use with fold_binary:

// Longer operators must be tried first in the comparison choice ("<=" before "<")
lte_op = { "<=" } -> String { "get_text": true }
gte_op = { ">=" } -> String { "get_text": true }
eq_op = { "==" } -> String { "get_text": true }
neq_op = { "!=" } -> String { "get_text": true }
lt_op = { "<" } -> String { "get_text": true }
gt_op = { ">" } -> String { "get_text": true }

add_op = { "+" } -> String { "get_text": true }
sub_op = { "-" } -> String { "get_text": true }
mul_op = { "*" } -> String { "get_text": true }
div_op = { "/" } -> String { "get_text": true }

and_op = { "and" } -> String { "get_text": true }
or_op = { "or" } -> String { "get_text": true }

unary_op = { "-" | "!" } -> String { "get_text": true }

Whitespace and Comments

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }

Both are silent (_{ }) - they match but don't appear in the parse tree.
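What the parser skips between tokens can be approximated with one regular expression. This Python sketch mirrors the two rules above; skip is a hypothetical helper, and the real parser applies the skipping implicitly rather than through a function call.

```python
import re

# WHITESPACE = " " | "\t" | "\n" | "\r"
# COMMENT    = "//" ~ (!"\n" ~ ANY)* ~ "\n"?
SKIP = re.compile(r"(?:[ \t\n\r]|//[^\n]*\n?)*")


def skip(src, pos):
    """Return the index of the next significant character at or after `pos`."""
    return SKIP.match(src, pos).end()


src = "  // entry point\n  fn main() i32 {}"
start = skip(src, 0)
print(src[start:start + 2])  # fn
```

The optional "\n" at the end of COMMENT matters for a comment on the last line of a file, where no newline follows.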

Testing the Grammar

Simple Function

fn main() i32 {
    return 42;
}

zyntax compile --grammar zig.zyn --source test.zig --run
# Output: result: main() returned: 42

Struct with Field Access

const Point = struct {
    x: i32,
    y: i32,
};

fn main() i32 {
    const p = Point{ .x = 10, .y = 20 };
    return p.x;
}

# Returns: 10

Enum Variants

const Color = enum {
    Red,
    Green,
    Blue,
};

fn main() i32 {
    return Color.Green;
}

# Returns: 1 (Green's discriminant)

Arithmetic Expression

fn main() i32 {
    return 2 + 3 * 4;
}

# Returns: 14 (multiplication before addition)
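To see why the chain yields 14 rather than 20, the addition and multiplication levels can be imitated with a tiny recursive-descent evaluator. This is a Python sketch of the layering idea only: it handles integers, the four arithmetic operators, and parentheses, and none of its function names come from Zyntax.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def tokenize(src):
    return re.findall(r"\d+|[+\-*/()]", src)


def parse_chain(tokens, pos, operand, ops):
    """Model of: operand ~ (op ~ operand)*, folded left to right."""
    left, pos = operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ops:
        op = tokens[pos]
        right, pos = operand(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def atom(tokens, pos):
    if tokens[pos] == "(":
        node, pos = addition(tokens, pos + 1)
        return node, pos + 1  # step over ")"
    return int(tokens[pos]), pos + 1


def multiplication(tokens, pos):
    return parse_chain(tokens, pos, atom, ("*", "/"))


def addition(tokens, pos):
    # Each operand of "+" is a complete multiplication, so "*" binds tighter.
    return parse_chain(tokens, pos, multiplication, ("+", "-"))


def evaluate(node):
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree, _ = addition(tokenize("2 + 3 * 4"), 0)
print(tree)            # ('+', 2, ('*', 3, 4))
print(evaluate(tree))  # 14
```

Precedence falls out of the nesting: a lower-precedence rule never sees the individual operands of a higher-precedence one, only the finished subtree.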

Common Patterns Summary

| Pattern | Use Case |
|---|---|
| Split rules | Handle optional children with predictable indices |
| `fold_binary` | Left-associative binary operators |
| `get_all_children` | Collect repetitions into lists |
| Keyword protection | Prevent identifiers matching keywords |
| Silent rules | Grouping without AST nodes |
| Atomic rules | Token-level matching |

Next Steps

Chapter 9: Complete command and API reference
Try modifying the grammar to add new features!
