Support Multiple Start Symbols (%start) with Type-Safe Context Wrappers and Duplicate Validation#80

Merged: ehwan merged 4 commits into main from multiple_start on Jun 22, 2026

Conversation

@ehwan ehwan commented Jun 22, 2026

Owner

Summary

This pull request adds support for multiple entry points (%start) in RustyLR grammars. A single generated parser can now begin from different start symbols (e.g., parsing expressions vs. parsing statements) without the entry points conflicting with one another.
It also adds validation that rejects duplicate %start declarations during the argument validation phase (ArgError).


Technical Details

1. Core Changes (rusty_lr_core)

  • Virtual Start Symbols: Added the VirtualStart(u32) variant to TerminalSymbol<Term> to represent synthetic/virtual terminals that differentiate entry points in the state machine transition table:
    \[
    S' \rightarrow V_0 \, S_0 \, \text{eof} \mid V_1 \, S_1 \, \text{eof} \mid \dots
    \]
  • TerminalClass Mapping: Added from_virtual_start to the TerminalClass trait for mapping virtual starts.
  • Branch-Aware Initialization: Added new_with_branch, with_default_userdata_and_branch, and with_capacity_and_branch constructors to both deterministic and non-deterministic (GLR) parser contexts.
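
As a rough illustration of the augmented rule above, the sketch below builds the symbol sequence the state machine would effectively see for one entry point: the virtual terminal V_i, the user's tokens, then eof. The enum is a simplified, hypothetical stand-in for TerminalSymbol<Term> (only the VirtualStart(u32) variant is taken from this PR), and augmented_input is an illustrative helper, not part of rusty_lr_core.

```rust
// Hypothetical, simplified stand-in for rusty_lr_core's TerminalSymbol<Term>.
// Only the VirtualStart(u32) variant name comes from the PR description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TerminalSymbol<Term> {
    /// An ordinary terminal supplied by the user.
    Term(Term),
    /// End of input.
    Eof,
    /// Synthetic terminal V_i that selects entry point i.
    VirtualStart(u32),
}

/// Illustrative helper (an assumption, not a library function): prepend the
/// virtual start terminal for `branch_idx` and append eof, mirroring
/// S' -> V_i S_i eof.
pub fn augmented_input<Term>(branch_idx: u32, tokens: Vec<Term>) -> Vec<TerminalSymbol<Term>> {
    let mut out = Vec::with_capacity(tokens.len() + 2);
    out.push(TerminalSymbol::VirtualStart(branch_idx));
    out.extend(tokens.into_iter().map(TerminalSymbol::Term));
    out.push(TerminalSymbol::Eof);
    out
}
```

Because each alternative of S' begins with a distinct V_i, the first shifted symbol alone decides which start rule the automaton can follow.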

2. Codegen and Parser Generator (rusty_lr_parser)

  • Grammar Augmentation: Reachability analysis, optimization, and augmented rule generation in grammar.rs have been updated to handle multiple start rules within a single grammar.
  • Start Type Enum: Emits a sum-type enum <Grammar>StartType matching the types of all defined start symbols.
  • Branch-Aware pop_start: Implements pop_start in the generated DataStack struct to inspect self.branch_idx and return the correct enum variant corresponding to the matched branch.
  • Type-Safe Context Wrappers: Automatically generates individual context wrappers (e.g. ExprContext, StmtContext) for each start symbol. These wrappers automatically inject the branch-specific virtual terminal on initialization and unwrap <Grammar>StartType to return the exact type of that start symbol inside accept and accept_all.
  • API Compatibility: Implemented compatibility traits (Deref, DerefMut, Clone, Default, Debug, Display) on generated context wrapper structs.
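
The relationship between the start-type enum, pop_start, and a per-symbol wrapper might look roughly like the sketch below. Everything here is a hand-written approximation under stated assumptions: CalcStartType, Context, ParseError, the set_*_value helpers, and the two-field "data stack" are hypothetical stand-ins, and the real generated code takes tokens and user data that are omitted here.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical sum type, standing in for the generated <Grammar>StartType.
#[derive(Debug, Clone, PartialEq)]
pub enum CalcStartType {
    Expr(i64),
    Stmt(String),
}

// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError(pub String);

// Hypothetical shared context; the real one drives an LR state machine.
pub struct Context {
    branch_idx: u32,
    expr_value: Option<i64>,
    stmt_value: Option<String>,
}

impl Context {
    pub fn new_with_branch(branch_idx: u32) -> Self {
        Context { branch_idx, expr_value: None, stmt_value: None }
    }

    // Stand-ins for "parsing finished and left a value on the data stack".
    pub fn set_expr_value(&mut self, value: i64) {
        self.expr_value = Some(value);
    }
    pub fn set_stmt_value(&mut self, value: String) {
        self.stmt_value = Some(value);
    }

    // Inspect branch_idx to decide which enum variant to produce.
    fn pop_start(&mut self) -> Option<CalcStartType> {
        match self.branch_idx {
            0 => self.expr_value.take().map(CalcStartType::Expr),
            1 => self.stmt_value.take().map(CalcStartType::Stmt),
            _ => None,
        }
    }

    pub fn accept(mut self) -> Result<CalcStartType, ParseError> {
        self.pop_start()
            .ok_or_else(|| ParseError("input did not reduce to the start symbol".to_string()))
    }
}

// Wrapper for the Expr entry point: fixes the branch and unwraps the enum.
pub struct ExprContext {
    inner: Context,
}

impl ExprContext {
    pub fn new() -> Self {
        ExprContext { inner: Context::new_with_branch(0) }
    }

    pub fn accept(self) -> Result<i64, ParseError> {
        match self.inner.accept() {
            Ok(CalcStartType::Expr(value)) => Ok(value),
            Ok(_) => unreachable!("ExprContext always selects branch 0"),
            Err(err) => Err(err),
        }
    }
}

// Deref/DerefMut let callers use the inner context's methods directly.
impl Deref for ExprContext {
    type Target = Context;
    fn deref(&self) -> &Context {
        &self.inner
    }
}

impl DerefMut for ExprContext {
    fn deref_mut(&mut self) -> &mut Context {
        &mut self.inner
    }
}
```

A StmtContext would mirror ExprContext with branch 1 and a String result, so callers never match on the enum themselves.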

3. Duplicate Start Validation (ArgError)

  • Argument Level Checks: Added duplicate detection inside Grammar::arg_check_error using ArgError::DuplicateStartSymbol.
  • Diagnostics Reporting: Added high-fidelity compiler diagnostics in rusty_lr_buildscript to point out the duplicate start symbol's exact location in the source grammar file.
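
A duplicate check of this kind can be sketched as a single pass over the declared start symbols. The function name, the (name, offset) input shape, and the fields on DuplicateStartSymbol are assumptions made for illustration; only the variant name ArgError::DuplicateStartSymbol comes from this PR.

```rust
use std::collections::HashMap;

// Hypothetical mirror of the error variant named in the PR; the field layout
// here is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum ArgError {
    DuplicateStartSymbol {
        name: String,
        /// Source offset of the first %start declaration of this symbol.
        first: usize,
        /// Source offset of the repeated declaration.
        duplicate: usize,
    },
}

/// Illustrative check: `starts` holds (symbol name, source offset) pairs in
/// declaration order. Returns the first duplicate found, if any.
pub fn check_start_symbols(starts: &[(&str, usize)]) -> Result<(), ArgError> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    for &(name, offset) in starts {
        if let Some(&first) = seen.get(name) {
            return Err(ArgError::DuplicateStartSymbol {
                name: name.to_string(),
                first,
                duplicate: offset,
            });
        }
        seen.insert(name, offset);
    }
    Ok(())
}
```

Keeping both offsets is what would let a diagnostic point at the repeated declaration and at the original one.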

@ehwan ehwan self-assigned this Jun 22, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for multiple start symbols in RustyLR, allowing multiple %start directives to generate individual context wrapper structs for each entry point. While this is a valuable feature, the code review identified critical issues that must be addressed. Specifically, the generated accept methods in both GLR and deterministic context wrappers incorrectly handle errors with _ => unreachable!(), causing panics on syntax errors instead of returning them. Additionally, with_capacity_and_branch fails to set the branch_idx on the data_stack, which will lead to incorrect parsing results or panics for any start symbol other than the first one.

Comment on lines +131 to +135
                            pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
                                match self.inner.accept() {
                                    Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
                                    _ => unreachable!(),
                                }

critical

In the generated GLR context wrapper, the accept method matches any non-successful result (including Err(ParseError)) with _ => unreachable!(). This will cause the parser to panic on any syntax/parse error instead of returning the error to the caller.

                            pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
                                match self.inner.accept() {
                                    Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
                                    Ok(_) => unreachable!(),
                                    Err(err) => Err(err),
                                }
                            }

Comment on lines +255 to +259
                            pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
                                match self.inner.accept() {
                                    Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
                                    _ => unreachable!(),
                                }

critical

In the generated deterministic context wrapper, the accept method matches any non-successful result (including Err(ParseError)) with _ => unreachable!(). This will cause the parser to panic on any syntax/parse error instead of returning the error to the caller.

                            pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
                                match self.inner.accept() {
                                    Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
                                    Ok(_) => unreachable!(),
                                    Err(err) => Err(err),
                                }
                            }
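
To make the difference concrete outside the code generator's quote template, here is a minimal standalone sketch of the two match shapes. StartType, ParseError, and both function names are hypothetical; the inner result is passed in directly rather than produced by a real parser.

```rust
// Hypothetical stand-ins for the generated start-type enum and error type.
#[derive(Debug, PartialEq)]
pub enum StartType {
    Expr(i64),
    Stmt(String),
}

#[derive(Debug, PartialEq)]
pub struct ParseError(pub &'static str);

/// Shape flagged in the review: the wildcard arm also catches Err(_),
/// so a parse error reaches unreachable!() and panics.
pub fn accept_wildcard(inner: Result<StartType, ParseError>) -> Result<i64, ParseError> {
    match inner {
        Ok(StartType::Expr(val)) => Ok(val),
        _ => unreachable!(),
    }
}

/// Suggested shape: only a mismatched Ok variant is treated as impossible;
/// errors are handed back to the caller.
pub fn accept_fixed(inner: Result<StartType, ParseError>) -> Result<i64, ParseError> {
    match inner {
        Ok(StartType::Expr(val)) => Ok(val),
        Ok(_) => unreachable!("wrapper always selects the Expr branch"),
        Err(err) => Err(err),
    }
}
```

Both functions agree on successful input; they differ only when the inner accept returns Err, where accept_wildcard panics and accept_fixed returns the error.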

Comment on lines +139 to +140
let mut ctx = Self::with_capacity(capacity, userdata);
let class = P::TermClass::from_virtual_start(branch_idx);

high

In with_capacity_and_branch, the branch_idx is not set on the data_stack. This will cause the parser to use the default branch index (0) when popping the start symbol upon acceptance, leading to incorrect parsing results or panics for any branch other than 0.

        let mut ctx = Self::with_capacity(capacity, userdata);
        ctx.data_stack.set_branch_idx(branch_idx);
        let class = P::TermClass::from_virtual_start(branch_idx);

@ehwan ehwan merged commit 2e97b79 into main Jun 22, 2026
1 check passed
@ehwan ehwan deleted the multiple_start branch June 22, 2026 08:20