08 Zig Example

github-actions[bot] edited this page Nov 25, 2025 · 3 revisions

Chapter 8: Complete Example - Zig Grammar

This chapter walks through the complete Zig grammar implementation (zig.zyn), explaining each section and the design decisions behind it.

Overview

The Zig grammar supports:

  • Functions with typed parameters
  • Structs and enums
  • Control flow (if, while, for)
  • Expressions with proper operator precedence
  • Type expressions (pointers, optionals, arrays)

File Structure

// 1. Language metadata
@language { ... }

// 2. Program structure
program = { ... }
declarations = { ... }
declaration = { ... }

// 3. Type declarations
struct_decl = { ... }
enum_decl = { ... }

// 4. Function declarations
fn_decl = { ... }

// 5. Statements
statement = { ... }
if_stmt = { ... }
while_stmt = { ... }
// ...

// 6. Expressions (by precedence)
expr = { ... }
logical_or = { ... }
// ... down to atoms

// 7. Literals and identifiers
integer_literal = { ... }
identifier = { ... }

// 8. Operators
add_op = { ... }
// ...

// 9. Whitespace/comments
WHITESPACE = { ... }
COMMENT = { ... }

Language Metadata

@language {
    name: "Zig",
    version: "0.11",
    file_extensions: [".zig"],
    entry_point: "main",
}

This metadata tells the compiler:

  • The language name for error messages
  • Which file extensions to recognize
  • Which function to execute for --run

Program Structure

The Entry Point

program = { SOI ~ declarations ~ EOI }
  -> TypedProgram {
      "get_child": { "index": 0 }
  }

This matches the entire file, from SOI (start of input) to EOI (end of input), and extracts the declarations.

Collecting Declarations

declarations = { declaration* }
  -> TypedProgram {
      "get_all_children": true,
      "define": "program",
      "args": { "declarations": "$result" }
  }

Key pattern: get_all_children collects all declaration matches into a list (available as $result), then define: "program" creates the root node from that list.

Declaration Dispatch

declaration = { struct_decl | enum_decl | fn_decl | const_decl | var_decl }
  -> TypedDeclaration {
      "get_child": { "index": 0 }
  }

Order matters! PEG choice tries alternatives left to right and commits to the first one that matches, so more specific rules should come first if there's ambiguity.

Struct Declarations

struct_decl = { "const" ~ identifier ~ "=" ~ "struct" ~ "{" ~ struct_fields? ~ "}" ~ ";" }
  -> TypedDeclaration {
      "commands": [
          { "define": "struct", "args": {
              "name": "$1",
              "fields": "$2"
          }}
      ]
  }

Example input:

const Point = struct {
    x: i32,
    y: i32,
};

Child mapping (literal strings such as "const" and "=" are not counted as children):

  • $1 = identifier → "Point"
  • $2 = struct_fields? → list of fields (or null)

Struct Fields

struct_fields = { struct_field ~ ("," ~ struct_field)* ~ ","? }
  -> List {
      "get_all_children": true
  }

struct_field = { identifier ~ ":" ~ type_expr }
  -> TypedField {
      "commands": [
          { "define": "field", "args": { "name": "$1", "type": "$2" } }
      ]
  }

Pattern: Optional trailing comma (","?) is common in modern languages.

Enum Declarations

enum_decl = { "const" ~ identifier ~ "=" ~ "enum" ~ "{" ~ enum_variants? ~ "}" ~ ";" }
  -> TypedDeclaration {
      "commands": [
          { "define": "enum", "args": {
              "name": "$1",
              "variants": "$2"
          }}
      ]
  }

enum_variants = { enum_variant ~ ("," ~ enum_variant)* ~ ","? }
  -> List {
      "get_all_children": true
  }

enum_variant = { identifier }
  -> TypedVariant {
      "get_text": true,
      "define": "variant",
      "args": { "name": "$result" }
  }

Example:

const Color = enum {
    Red,
    Green,
    Blue,
};

The runtime assigns discriminant values (0, 1, 2) automatically.
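
The automatic numbering can be pictured as an enumeration in declaration order. This is a minimal Python sketch of that idea, not the runtime's own code; `assign_discriminants` is a hypothetical helper name used only for illustration.

```python
def assign_discriminants(variants):
    """Map each variant name to its position in declaration order."""
    return {name: index for index, name in enumerate(variants)}


color = assign_discriminants(["Red", "Green", "Blue"])
print(color)  # {'Red': 0, 'Green': 1, 'Blue': 2}
```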

Function Declarations

The Split Pattern

fn_decl = { fn_decl_with_params | fn_decl_no_params }
  -> TypedDeclaration {
      "get_child": { "index": 0 }
  }

Why split? PEG doesn't produce placeholder children for missing optionals. With a single rule like:

// PROBLEMATIC
fn_decl = { "fn" ~ identifier ~ "(" ~ fn_params? ~ ")" ~ type_expr ~ block }

If params are missing, every later child shifts down by one: $2 would be type_expr instead of fn_params, and $3 would be block instead of type_expr. By splitting, each variant has predictable child indices.
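
The index shift can be modelled in a few lines of Python. This is only a sketch of the behaviour described above: `matched_children` and `ref` are hypothetical helpers, and returning None for an out-of-range reference is an assumption based on how this chapter describes trailing optionals.

```python
def matched_children(parts):
    """Drop unmatched optionals: the parser leaves no placeholder for them."""
    return [part for part in parts if part is not None]


def ref(children, n):
    """Resolve a 1-based "$n" reference (assumption: out of range gives None)."""
    return children[n - 1] if 1 <= n <= len(children) else None


# fn add(a: i32, b: i32) i32 { ... }  -> fn_params? matched
with_params = matched_children(["identifier", "fn_params", "type_expr", "block"])
# fn main() i32 { ... }               -> fn_params? matched nothing
without_params = matched_children(["identifier", None, "type_expr", "block"])

print(ref(with_params, 2), ref(with_params, 3), ref(with_params, 4))
# fn_params type_expr block
print(ref(without_params, 2), ref(without_params, 3), ref(without_params, 4))
# type_expr block None
```

The same "$3" names a different child in each case, which is why a single rule with a fixed mapping cannot serve both.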

With Parameters

fn_decl_with_params = { "fn" ~ identifier ~ "(" ~ fn_params ~ ")" ~ type_expr ~ block }
  -> TypedDeclaration {
      "commands": [
          { "define": "function", "args": {
              "name": "$1",
              "params": "$2",
              "return_type": "$3",
              "body": "$4"
          }}
      ]
  }

Without Parameters

fn_decl_no_params = { "fn" ~ identifier ~ "(" ~ ")" ~ type_expr ~ block }
  -> TypedDeclaration {
      "commands": [
          { "define": "function", "args": {
              "name": "$1",
              "params": [],
              "return_type": "$2",
              "body": "$3"
          }}
      ]
  }

Note: "params": [] provides an empty list literal.

Parameters

fn_params = { fn_param ~ ("," ~ fn_param)* }
  -> List {
      "get_all_children": true
  }

fn_param = { identifier ~ ":" ~ type_expr }
  -> TypedParameter {
      "commands": [
          { "define": "param", "args": { "name": "$1", "type": "$2" } }
      ]
  }

Statements

Statement Dispatch

statement = { if_stmt | while_stmt | for_stmt | return_stmt | break_stmt |
              continue_stmt | local_const | local_var | assign_stmt | expr_stmt }
  -> TypedStatement {
      "get_child": { "index": 0 }
  }

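Order consideration: keyword-led statements such as if_stmt are listed before expr_stmt so the specific forms are tried first. An identifier like if_something is still parsed as an expression, because "if" followed by "_" fails the if_stmt rule and the parser falls through to the later alternatives.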

If Statement (Split Pattern Again)

if_stmt = { if_else | if_only }
  -> TypedStatement { "get_child": { "index": 0 } }

if_only = { "if" ~ "(" ~ expr ~ ")" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2"
          }}
      ]
  }

if_else = { "if" ~ "(" ~ expr ~ ")" ~ block ~ "else" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "if", "args": {
              "condition": "$1",
              "then_branch": "$2",
              "else_branch": "$3"
          }}
      ]
  }

Important: if_else must come before if_only in the choice. Otherwise if_only would always match first and the else branch would never be parsed as part of the statement.
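
The effect of the ordering can be reproduced with a toy model of PEG ordered choice. The Python below is a sketch over pre-tokenized input, not the parser's real implementation; `match_seq` and `ordered_choice` are hypothetical names.

```python
def match_seq(tokens, pattern):
    """Return the number of tokens consumed if tokens start with pattern, else None."""
    n = len(pattern)
    return n if tokens[:n] == pattern else None


IF_ONLY = ["if", "(", "expr", ")", "block"]
IF_ELSE = IF_ONLY + ["else", "block"]


def ordered_choice(tokens, alternatives):
    """PEG ordered choice: the first alternative that matches wins."""
    for name, pattern in alternatives:
        consumed = match_seq(tokens, pattern)
        if consumed is not None:
            return name, tokens[consumed:]
    return None, tokens


source = ["if", "(", "expr", ")", "block", "else", "block"]
good = ordered_choice(source, [("if_else", IF_ELSE), ("if_only", IF_ONLY)])
bad = ordered_choice(source, [("if_only", IF_ONLY), ("if_else", IF_ELSE)])
print(good)  # ('if_else', [])
print(bad)   # ('if_only', ['else', 'block'])
```

With the wrong order, the leftover "else" and block are left for the surrounding rule, which has no way to parse them.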

While Loop

while_stmt = { "while" ~ "(" ~ expr ~ ")" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "while", "args": {
              "condition": "$1",
              "body": "$2"
          }}
      ]
  }

For Loop (Zig Style)

for_stmt = { "for" ~ "(" ~ expr ~ ")" ~ "|" ~ identifier ~ "|" ~ block }
  -> TypedStatement {
      "commands": [
          { "define": "for", "args": {
              "iterable": "$1",
              "binding": "$2",
              "body": "$3"
          }}
      ]
  }

Zig's for loop: for (slice) |item| { ... }

Return Statement

return_stmt = { "return" ~ expr? ~ ";" }
  -> TypedStatement {
      "commands": [
          { "define": "return_stmt", "args": { "value": "$1" } }
      ]
  }

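$1 will be null if expr? doesn't match. Because the optional is the only child, no later child can shift into its position, so a single rule is safe here.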

Block

block = { "{" ~ statement* ~ "}" }
  -> TypedBlock {
      "get_all_children": true,
      "define": "block",
      "args": { "statements": "$result" }
  }

Expressions

The Precedence Chain

Operators are handled by a chain from lowest to highest precedence:

expr = { logical_or }
  -> TypedExpression { "get_child": { "index": 0 } }

// Lowest: OR
logical_or = { logical_and ~ (or_op ~ logical_and)* }
  -> TypedExpression {
      "fold_binary": { "operand": "logical_and", "operator": "or_op" }
  }

// AND
logical_and = { comparison ~ (and_op ~ comparison)* }
  -> TypedExpression {
      "fold_binary": { "operand": "comparison", "operator": "and_op" }
  }

// Comparison
comparison = { addition ~ ((eq_op | neq_op | lte_op | gte_op | lt_op | gt_op) ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "eq_op|neq_op|lte_op|gte_op|lt_op|gt_op" }
  }

// Addition/Subtraction
addition = { multiplication ~ ((add_op | sub_op) ~ multiplication)* }
  -> TypedExpression {
      "fold_binary": { "operand": "multiplication", "operator": "add_op|sub_op" }
  }

// Multiplication/Division
multiplication = { unary ~ ((mul_op | div_op) ~ unary)* }
  -> TypedExpression {
      "fold_binary": { "operand": "unary", "operator": "mul_op|div_op" }
  }

// Unary (highest before atoms)
unary = { unary_with_op | primary }
  -> TypedExpression { "get_child": { "index": 0 } }

unary_with_op = { unary_op ~ primary }
  -> TypedExpression {
      "commands": [
          { "define": "unary", "args": { "op": "$1", "operand": "$2" } }
      ]
  }

The fold_binary Pattern

For 1 + 2 + 3:

  1. Parse: [term(1), +, term(2), +, term(3)]
  2. Start: result = 1
  3. Fold: result = binary(+, 1, 2) → (1+2)
  4. Fold: result = binary(+, (1+2), 3) → ((1+2)+3)

This creates left-associative trees automatically.
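
The steps above amount to a left fold over an alternating list of operands and operators. This Python sketch assumes that flat list shape and uses tuples in place of real AST nodes; the function is an illustration, not the actual fold_binary implementation.

```python
def fold_binary(items):
    """Left-fold [operand, op, operand, op, operand, ...] into nested tuples."""
    result = items[0]
    for i in range(1, len(items), 2):
        op, rhs = items[i], items[i + 1]
        # The tree built so far becomes the left child of the new node.
        result = ("binary", op, result, rhs)
    return result


tree = fold_binary([1, "+", 2, "+", 3])
print(tree)  # ('binary', '+', ('binary', '+', 1, 2), 3)
```

A single operand with no operators passes through unchanged, so the same rule also covers expressions that contain no operator at that precedence level.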

Postfix Expressions

postfix_expr = { call_expr | field_expr | index_expr | atom }
  -> TypedExpression { "get_child": { "index": 0 } }

// Function call
call_expr = { atom ~ "(" ~ call_args? ~ ")" }
  -> TypedExpression {
      "commands": [
          { "define": "call", "args": { "callee": "$1", "args": "$2" } }
      ]
  }

// Field access
field_expr = { atom ~ "." ~ identifier }
  -> TypedExpression {
      "commands": [
          { "define": "field_access", "args": { "object": "$1", "field": "$2" } }
      ]
  }

// Index access
index_expr = { atom ~ "[" ~ expr ~ "]" }
  -> TypedExpression {
      "commands": [
          { "define": "index", "args": { "object": "$1", "index": "$2" } }
      ]
  }

Atoms (Highest Precedence)

atom = { try_expr | struct_init | array_literal | bool_literal |
         string_literal | integer_literal | identifier_expr | paren_expr }
  -> TypedExpression { "get_child": { "index": 0 } }

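Order matters: struct_init starts with an identifier, so it must come before identifier_expr. Otherwise identifier_expr would match the type name alone and leave the "{" unparsed.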

Struct Initialization

struct_init = { identifier ~ "{" ~ struct_init_fields? ~ "}" }
  -> TypedExpression {
      "commands": [
          { "define": "struct_init", "args": { "type_name": "$1", "fields": "$2" } }
      ]
  }

struct_init_fields = { struct_init_field ~ ("," ~ struct_init_field)* ~ ","? }
  -> List { "get_all_children": true }

struct_init_field = { "." ~ identifier ~ "=" ~ expr }
  -> TypedExpression {
      "commands": [
          { "define": "struct_field_init", "args": { "name": "$1", "value": "$2" } }
      ]
  }

Example: Point{ .x = 10, .y = 20 }

Parenthesized Expressions

paren_expr = _{ "(" ~ expr ~ ")" }

Silent rule (_{ }) - matches but doesn't create a node. The inner expr passes through directly.

Type Expressions

type_expr = { pointer_type | optional_type | error_union_type | array_type |
              primitive_type | identifier }
  -> Type { "get_child": { "index": 0 } }

pointer_type = { "*" ~ "const"? ~ type_expr }
  -> Type {
      "commands": [
          { "define": "pointer_type", "args": { "pointee": "$1" } }
      ]
  }

optional_type = { "?" ~ type_expr }
  -> Type {
      "commands": [
          { "define": "optional_type", "args": { "inner": "$1" } }
      ]
  }

array_type = { "[" ~ integer_literal? ~ "]" ~ type_expr }
  -> Type {
      "commands": [
          { "define": "array_type", "args": { "size": "$1", "element": "$2" } }
      ]
  }

primitive_type = { "i8" | "i16" | "i32" | "i64" | "u8" | "u16" | "u32" | "u64" |
                   "f32" | "f64" | "bool" | "void" }
  -> Type {
      "get_text": true,
      "define": "primitive_type",
      "args": { "name": "$result" }
  }

Identifiers and Keywords

Keyword Protection

keyword = @{
    ("struct" | "enum" | "fn" | "const" | "var" | "if" | "else" | "while" | "for" |
     "return" | "break" | "continue" | "try" | "and" | "or" | "true" | "false" |
     "i8" | "i16" | "i32" | "i64" | "u8" | "u16" | "u32" | "u64" | "f32" | "f64" |
     "bool" | "void")
    ~ !(ASCII_ALPHANUMERIC | "_")
}

identifier = @{ !keyword ~ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_")* }
  -> String { "get_text": true }

Key patterns:

  1. ~ !(ASCII_ALPHANUMERIC | "_") ensures "iffy" doesn't match as "if" + "fy"
  2. !keyword prevents identifiers from being keywords
  3. Both are atomic (@{ }) for proper token handling
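
The two lookaheads translate directly into regular-expression lookaheads, which makes the behaviour easy to test in isolation. The Python below is an approximation of the keyword and identifier rules above, assuming ASCII input; `is_identifier` is a hypothetical helper.

```python
import re

KEYWORDS = [
    "struct", "enum", "fn", "const", "var", "if", "else", "while", "for",
    "return", "break", "continue", "try", "and", "or", "true", "false",
    "i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64", "f32", "f64",
    "bool", "void",
]

# keyword: one of the reserved words, NOT followed by an identifier character.
KEYWORD = "(?:" + "|".join(KEYWORDS) + r")(?![A-Za-z0-9_])"
# identifier: not a keyword at this position, then the usual identifier shape.
IDENTIFIER = re.compile("(?!" + KEYWORD + ")[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """True if the whole text is a valid, non-keyword identifier."""
    return IDENTIFIER.fullmatch(text) is not None


for word in ["if", "iffy", "if_something", "i32", "i32x", "_tmp"]:
    print(word, is_identifier(word))
```

Only "if" and "i32" are rejected; the others merely start with a keyword and remain valid identifiers.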

Operators

Each operator is a separate rule for use with fold_binary:

// Longer operators (<=, >=) must be tried before their prefixes (<, >) in the comparison choice
lte_op = { "<=" } -> String { "get_text": true }
gte_op = { ">=" } -> String { "get_text": true }
eq_op = { "==" } -> String { "get_text": true }
neq_op = { "!=" } -> String { "get_text": true }
lt_op = { "<" } -> String { "get_text": true }
gt_op = { ">" } -> String { "get_text": true }

add_op = { "+" } -> String { "get_text": true }
sub_op = { "-" } -> String { "get_text": true }
mul_op = { "*" } -> String { "get_text": true }
div_op = { "/" } -> String { "get_text": true }

and_op = { "and" } -> String { "get_text": true }
or_op = { "or" } -> String { "get_text": true }

unary_op = { "-" | "!" } -> String { "get_text": true }
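
Why the comparison rule lists lte_op and gte_op ahead of lt_op and gt_op can be seen with a small first-match scan. This Python sketch is a simplified stand-in for the ordered choice in that rule; `first_match` is a hypothetical name.

```python
def first_match(text, candidates):
    """Return the first candidate that is a prefix of text, in list order."""
    for op in candidates:
        if text.startswith(op):
            return op
    return None


good = first_match("<= b", ["==", "!=", "<=", ">=", "<", ">"])
bad = first_match("<= b", ["<", ">", "<=", ">=", "==", "!="])
print(good)  # <=
print(bad)   # <
```

In the second ordering "<" wins, and the stray "=" then makes the rest of the expression fail to parse.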

Whitespace and Comments

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }

Both are silent (_{ }) - they match but don't appear in the parse tree.
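
What the parser does with these two rules between tokens can be sketched as a loop that skips whitespace and line comments. This Python function is an illustration of that behaviour under the definitions above, not the parser's actual code; `skip_trivia` is a hypothetical name.

```python
def skip_trivia(text, pos):
    """Advance pos past any run of whitespace and // line comments."""
    while pos < len(text):
        if text[pos] in " \t\n\r":
            pos += 1
        elif text.startswith("//", pos):
            end = text.find("\n", pos)
            # A comment on the last line may have no trailing newline.
            pos = len(text) if end == -1 else end + 1
        else:
            break
    return pos


src = "  // note\n\treturn 42;"
print(src[skip_trivia(src, 0):])  # return 42;
```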

Testing the Grammar

Simple Function

fn main() i32 {
    return 42;
}

zyntax compile --grammar zig.zyn --source test.zig --run
# Output: result: main() returned: 42

Struct with Field Access

const Point = struct {
    x: i32,
    y: i32,
};

fn main() i32 {
    const p = Point{ .x = 10, .y = 20 };
    return p.x;
}

# Returns: 10

Enum Variants

const Color = enum {
    Red,
    Green,
    Blue,
};

fn main() i32 {
    return Color.Green;
}

# Returns: 1 (Green's discriminant)

Arithmetic Expression

fn main() i32 {
    return 2 + 3 * 4;
}

# Returns: 14 (multiplication before addition)
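
The result follows from the shape of the precedence chain: addition is built from multiplication operands, so "3 * 4" is grouped before "+" is applied. The Python sketch below mirrors the addition and multiplication levels with hand-written functions. It evaluates integers directly instead of building AST nodes, and it omits division and unary operators, so it is an illustration of the layering and not a port of the grammar.

```python
import re


def tokenize(source):
    """Split into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*()]", source)


def parse_addition(tokens, pos):
    """addition = multiplication ((+|-) multiplication)*, folded left."""
    value, pos = parse_multiplication(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_multiplication(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos


def parse_multiplication(tokens, pos):
    """multiplication = atom (* atom)*, folded left."""
    value, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        rhs, pos = parse_atom(tokens, pos + 1)
        value = value * rhs
    return value, pos


def parse_atom(tokens, pos):
    """atom = integer | "(" addition ")"."""
    if tokens[pos] == "(":
        value, pos = parse_addition(tokens, pos + 1)
        return value, pos + 1  # step over the closing ")"
    return int(tokens[pos]), pos + 1


def evaluate(source):
    value, _ = parse_addition(tokenize(source), 0)
    return value


print(evaluate("2 + 3 * 4"))    # 14
print(evaluate("(2 + 3) * 4"))  # 20
```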

Common Patterns Summary

Pattern             Use Case
------------------  -------------------------------------------------
Split rules         Handle optional children with predictable indices
fold_binary         Left-associative binary operators
get_all_children    Collect repetitions into lists
Keyword protection  Prevent identifiers matching keywords
Silent rules        Grouping without AST nodes
Atomic rules        Token-level matching

Next Steps

  • Chapter 9: Complete command and API reference
  • Try modifying the grammar to add new features!
