Add no-argument String.split whitespace tokenization #3677
Co-authored-by: Dhilan Shah <[email protected]>
The error
Calling String.split() without a separator currently fails arity validation instead of tokenizing whitespace:
Current diagnostic from engine/baml-compiler/src/thir/typecheck.rs:
The explicit-separator workaround also produces incorrect tokens for mixed whitespace:
It splits only on literal spaces, so tab, newline, and carriage-return characters stay attached to neighboring tokens, and repeated, leading, or trailing spaces produce empty fields.
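The standalone Rust sketch below makes the difference concrete. It contrasts str::split(' '), which stands in for the explicit-separator workaround, with str::split_whitespace(), which the fix uses for the no-argument form. The input string and the helper names split_on_space and split_on_whitespace are illustrative assumptions, not code from this PR.

```rust
/// Illustrative helper (hypothetical name): mimics the explicit-separator
/// workaround by splitting on a literal space only.
fn split_on_space(s: &str) -> Vec<&str> {
    s.split(' ').collect()
}

/// Illustrative helper (hypothetical name): whitespace tokenization via
/// str::split_whitespace. Any Unicode whitespace separates tokens, runs of
/// whitespace collapse, and no empty tokens are produced.
fn split_on_whitespace(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    // Illustrative input, not the PR's reproduction: leading, repeated, and
    // trailing spaces plus a tab and a CRLF line break.
    let s = "  alpha\tbeta  gamma\r\ndelta ";

    // Prints ["", "", "alpha\tbeta", "", "gamma\r\ndelta", ""]
    println!("{:?}", split_on_space(s));

    // Prints ["alpha", "beta", "gamma", "delta"]
    println!("{:?}", split_on_whitespace(s));
}
```

Because split_whitespace treats every Unicode White_Space character as a separator, tabs, newlines, and carriage returns are handled the same way as spaces.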
Root cause
- engine/baml-compiler/src/thir/typecheck.rs registered baml.String.split as a two-parameter native signature (receiver plus explicit separator), and method-call arity validation rejected receiver-only calls.
- engine/baml-vm/src/vm.rs enforced each native function's fixed arity before dispatch, so a receiver-only bytecode call to baml.String.split could not reach the native implementation.
- engine/baml-vm/src/native.rs::string_split always read args[1] as the delimiter and used Rust str::split.
- engine/baml-compiler/src/thir/interpret.rs::evaluate_method_call also required exactly one delimiter argument.
The fix
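Every root cause listed above assumes exactly two arguments: the receiver and a separator. The Rust sketch below illustrates the arity-aware branch that the fix describes. It assumes a hypothetical, simplified Value enum and a hypothetical check_split_arity helper; the actual types and signatures in engine/baml-vm/src/native.rs and vm.rs do not appear in this PR text and will differ.

```rust
/// Hypothetical, simplified VM value; the real engine's value type differs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    List(Vec<Value>),
}

/// Hypothetical arity rule: the receiver is always present and the separator
/// is optional. A fixed `args.len() == 2` check would reject `s.split()`.
fn check_split_arity(args: &[Value]) -> Result<(), String> {
    match args.len() {
        1 | 2 => Ok(()),
        n => Err(format!("baml.String.split expects 1 or 2 arguments, got {n}")),
    }
}

/// Sketch of a string_split-style native (not the PR's code): receiver-only
/// calls tokenize on whitespace; an explicit separator keeps str::split.
fn string_split(args: &[Value]) -> Result<Value, String> {
    check_split_arity(args)?;
    let receiver = match &args[0] {
        Value::Str(s) => s,
        other => return Err(format!("expected string receiver, got {other:?}")),
    };
    let parts: Vec<&str> = match args.get(1) {
        // No separator: collapse contiguous whitespace and omit empty tokens.
        None => receiver.split_whitespace().collect(),
        // Explicit separator: unchanged str::split behavior, empty fields kept.
        Some(Value::Str(sep)) => receiver.split(sep.as_str()).collect(),
        Some(other) => return Err(format!("expected string separator, got {other:?}")),
    };
    Ok(Value::List(
        parts.into_iter().map(|p| Value::Str(p.to_string())).collect(),
    ))
}

fn main() {
    let s = Value::Str("a  b\tc\n".to_string());

    // Prints Ok(List([Str("a"), Str("b"), Str("c")]))
    println!("{:?}", string_split(&[s.clone()]));

    // Prints Ok(List([Str("a"), Str(""), Str("b\tc\n")]))
    println!("{:?}", string_split(&[s, Value::Str(" ".to_string())]));
}
```

Keeping the separator branch on str::split means existing s.split(separator) callers still see empty fields; only the receiver-only form changes behavior.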
- Allow s.split() through THIR typechecking while preserving s.split(separator) typechecking.
- Allow the VM to dispatch baml.String.split with a single argument, only for the receiver-only no-arg form.
- Update string_split to use Rust split_whitespace() when called with only the receiver, which collapses contiguous whitespace and omits empty tokens; explicit separators continue to use str::split.
Verification
Passing commands:
The same reproduction now succeeds via the VM test added in engine/baml-vm/tests/strings.rs:
Expected and now-passing output:
Full language integration runner status:
This reached the TypeScript integration phase and then blocked on an interactive Infisical login prompt:
Running the TypeScript integration tests directly without Infisical also failed, due to missing or invalid provider credentials rather than this code change:
Issue Reference
Changes
Implemented no-argument String.split() for whitespace tokenization while preserving explicit-separator behavior.
Testing
Screenshots
Not applicable.
PR Checklist
Additional Notes
The full integration suite requires authenticated provider credentials or a valid Infisical session in this environment.