Support Multiple Start Symbols (%start) with Type-Safe Context Wrappers and Duplicate Validation#80
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for multiple start symbols in RustyLR, allowing multiple %start directives to generate individual context wrapper structs for each entry point. While this is a valuable feature, the code review identified critical issues that must be addressed. Specifically, the generated accept methods in both GLR and deterministic context wrappers incorrectly handle errors with _ => unreachable!(), causing panics on syntax errors instead of returning them. Additionally, with_capacity_and_branch fails to set the branch_idx on the data_stack, which will lead to incorrect parsing results or panics for any start symbol other than the first one.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> { | ||
| match self.inner.accept() { | ||
| Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)), | ||
| _ => unreachable!(), | ||
| } |
There was a problem hiding this comment.
In the generated GLR context wrapper, the accept method matches any non-successful result (including Err(ParseError)) with _ => unreachable!(). This will cause the parser to panic on any syntax/parse error instead of returning the error to the caller.
pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
match self.inner.accept() {
Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
Ok(_) => unreachable!(),
Err(err) => Err(err),
}
}| pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> { | ||
| match self.inner.accept() { | ||
| Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)), | ||
| _ => unreachable!(), | ||
| } |
There was a problem hiding this comment.
In the generated deterministic context wrapper, the accept method matches any non-successful result (including Err(ParseError)) with _ => unreachable!(). This will cause the parser to panic on any syntax/parse error instead of returning the error to the caller.
pub fn accept(self) -> Result<(#s_ruletype, <#data_stack_typename as #module_prefix::parser::data_stack::DataStack>::UserData), #parse_error_typename> {
match self.inner.accept() {
Ok((#start_type_enum_name::#start_rule_ident(val), userdata)) => Ok((val, userdata)),
Ok(_) => unreachable!(),
Err(err) => Err(err),
}
}| let mut ctx = Self::with_capacity(capacity, userdata); | ||
| let class = P::TermClass::from_virtual_start(branch_idx); |
There was a problem hiding this comment.
In with_capacity_and_branch, the branch_idx is not set on the data_stack. This will cause the parser to use the default branch index (0) when popping the start symbol upon acceptance, leading to incorrect parsing results or panics for any branch other than 0.
let mut ctx = Self::with_capacity(capacity, userdata);
ctx.data_stack.set_branch_idx(branch_idx);
let class = P::TermClass::from_virtual_start(branch_idx);
Summary
This Pull Request introduces the capability to specify multiple entry points (
%start) in RustyLR grammars. This allows a single parser to support different start symbols (e.g., parsing expressions vs. parsing statements) conflict-free.Additionally, this PR adds validation to check for duplicate
%startsymbol declarations at the argument validation phase (ArgError).Technical Details
1. Core Changes (
rusty_lr_core)VirtualStart(u32)variant toTerminalSymbol<Term>to represent synthetic/virtual terminals that differentiate entry points in the state machine transition table:[
S' \rightarrow V_0 S_0 \text{ eof} \mid V_1 S_1 \text{ eof} \mid \dots
]
from_virtual_startto theTerminalClasstrait for mapping virtual starts.new_with_branch,with_default_userdata_and_branch, andwith_capacity_and_branchconstructors to both deterministic and non-deterministic (GLR) parser contexts.2. Codegen and Parser Generator (
rusty_lr_parser)grammar.rsto process multiple start rules concurrently.<Grammar>StartTypematching the types of all defined start symbols.pop_start: Implementspop_startin the generatedDataStackstruct to inspectself.branch_idxand return the correct enum variant corresponding to the matched branch.ExprContext,StmtContext) for each start symbol. These wrappers automatically inject the branch-specific virtual terminal on initialization and unwrap<Grammar>StartTypeto return the exact type of that start symbol insideacceptandaccept_all.Deref,DerefMut,Clone,Default,Debug,Display) on generated context wrapper structs.3. Duplicate Start Validation (
ArgError)Grammar::arg_check_errorusingArgError::DuplicateStartSymbol.rusty_lr_buildscriptto point out the duplicate start symbol's exact location in the source grammar file.