Add no-argument String.split whitespace tokenization by ATX24 · Pull Request #3677 · BoundaryML/baml

ATX24 · 2026-06-04T01:37:57Z

The error

Calling String.split() without a separator currently fails arity validation instead of tokenizing whitespace:

function ReverseWords(s: string) -> string {
  let tokens = s.split()
  let reversed = tokens | reverse()
  reversed | join(" ")
}

Current diagnostic from engine/baml-compiler/src/thir/typecheck.rs:

Function baml.String.split expects 2 arguments, got 0

The explicit-separator workaround also produces incorrect tokens for mixed whitespace:

function Tokens() -> string[] {
  let s = " \thello  \nworld\r\nBAML "
  s.split(" ")
}

It splits only on literal spaces, so tab, newline, and carriage-return characters stay attached to neighboring tokens, and repeated or leading/trailing spaces produce empty fields.
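To make the difference concrete outside BAML, here is a small Rust comparison of str::split(' ') and str::split_whitespace() on the same input string. It exercises only the Rust standard library. The helper names split_on_space and split_on_whitespace are illustrative and do not come from the BAML codebase.

```rust
/// Splits on a literal space, like `str::split(' ')`.
fn split_on_space(s: &str) -> Vec<&str> {
    s.split(' ').collect()
}

/// Splits on any Unicode whitespace, like `str::split_whitespace()`.
fn split_on_whitespace(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    let s = " \thello  \nworld\r\nBAML ";
    // Empty fields survive, and \t, \n, \r stay inside tokens:
    // ["", "\thello", "", "\nworld\r\nBAML", ""]
    println!("{:?}", split_on_space(s));
    // Runs of whitespace collapse and no empty tokens are produced:
    // ["hello", "world", "BAML"]
    println!("{:?}", split_on_whitespace(s));
}
```

The second result matches the output the PR expects from the no-argument s.split().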

Root cause

engine/baml-compiler/src/thir/typecheck.rs registered baml.String.split with a two-parameter native signature (receiver plus explicit separator), so method-call arity validation rejected receiver-only calls.
engine/baml-vm/src/vm.rs enforced each native function's fixed arity before dispatch, so a receiver-only bytecode call to baml.String.split could not reach the native implementation.
engine/baml-vm/src/native.rs::string_split always read args[1] as the delimiter and used Rust str::split.
engine/baml-compiler/src/thir/interpret.rs::evaluate_method_call also required exactly one delimiter argument.
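The VM-side failure can be pictured with a minimal sketch of a fixed-arity check. The NativeFn struct, check_fixed_arity, and the error text below are hypothetical, since this PR does not show the actual types in engine/baml-vm/src/vm.rs. The point is only that a receiver-only call carries one VM argument and therefore fails an exact match against an arity of 2.

```rust
/// Hypothetical native-function descriptor; the real VM types in
/// engine/baml-vm/src/vm.rs are not shown in this PR and may differ.
struct NativeFn {
    name: &'static str,
    /// Number of VM arguments, including the receiver.
    arity: usize,
}

/// Hypothetical fixed-arity check performed before dispatch: any call whose
/// argument count differs from `arity` is rejected.
fn check_fixed_arity(f: &NativeFn, got: usize) -> Result<(), String> {
    if got == f.arity {
        Ok(())
    } else {
        Err(format!("{} expects {} arguments, got {}", f.name, f.arity, got))
    }
}

fn main() {
    let split = NativeFn { name: "baml.String.split", arity: 2 };
    // s.split(" ") pushes receiver + separator: accepted.
    assert!(check_fixed_arity(&split, 2).is_ok());
    // s.split() pushes only the receiver, so it is rejected before the
    // native implementation runs. Prints:
    // Err("baml.String.split expects 2 arguments, got 1")
    println!("{:?}", check_fixed_arity(&split, 1));
}
```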

The fix

Allow s.split() through THIR typechecking while preserving s.split(separator) typechecking.
Allow the VM to dispatch baml.String.split with one VM argument only for the receiver-only no-arg form.
Update string_split to use Rust split_whitespace() when called with only the receiver, which collapses contiguous whitespace and omits empty tokens; explicit separators continue to use str::split.
Update the direct THIR interpreter path to support both zero and one split argument.
Add Rust tests for no-argument whitespace splitting and typechecking.
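A rough Rust sketch of the two VM-side changes follows, under stated assumptions: arity_ok and string_split are hypothetical stand-ins that model VM arguments as plain string slices, whereas the real native.rs implementation operates on the VM's own value types. The sketch matches on the argument slice so the receiver-only form goes to split_whitespace() and the receiver-plus-separator form keeps str::split. Per the PR, interpret.rs::evaluate_method_call gets the same zero-or-one-argument treatment, which this sketch does not model separately.

```rust
/// Hypothetical VM arity check mirroring the fix: baml.String.split may also
/// be dispatched with a single VM argument (the receiver) for the no-arg form,
/// while every other native keeps its exact fixed arity.
fn arity_ok(name: &str, fixed_arity: usize, got: usize) -> bool {
    got == fixed_arity || (name == "baml.String.split" && got == 1)
}

/// Hypothetical stand-in for native.rs::string_split. VM arguments are modeled
/// as string slices: args[0] is the receiver, args[1] (if present) the separator.
fn string_split(args: &[&str]) -> Result<Vec<String>, String> {
    match args {
        // s.split(): split on any whitespace, collapsing runs and omitting
        // empty tokens.
        [receiver] => Ok(receiver.split_whitespace().map(String::from).collect()),
        // s.split(sep): unchanged str::split semantics, empty fields included.
        [receiver, sep] => Ok(receiver.split(*sep).map(String::from).collect()),
        _ => Err(format!(
            "baml.String.split expects 1 or 2 arguments, got {}",
            args.len()
        )),
    }
}

fn main() {
    // Receiver-only dispatch passes the relaxed check.
    assert!(arity_ok("baml.String.split", 2, 1));

    // Prints Ok(["hello", "world", "BAML"])
    println!("{:?}", string_split(&[" \thello  \nworld\r\nBAML "]));
    // Prints Ok(["a", "", "b"])
    println!("{:?}", string_split(&["a,,b", ","]));
}
```

Keeping the explicit-separator arm identical to the old str::split call is what preserves existing s.split(sep) behavior in this sketch.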

Verification

Passing commands:

$ mise exec -- cargo test -p baml-vm string_split --test strings -- --nocapture && mise exec -- cargo test -p baml-compiler typecheck_string_split_without_separator --lib -- --nocapture
...
test result: ok

$ mise exec -- cargo test --lib
...
test result: ok

$ mise exec -- cargo test --features skip-integ-tests
...
test result: ok

The same reproduction now succeeds via the VM test added in engine/baml-vm/tests/strings.rs:

function main() -> string[] {
  let s = " \thello  \nworld\r\nBAML "
  s.split()
}

Expected and now-passing output:

["hello", "world", "BAML"]

Full language integration runner status:

$ mise exec -- ./run-tests.sh

This reached the TypeScript integration phase and then blocked on an interactive Infisical login prompt:

No valid login session found, triggering login flow
? Select your hosting option:

Running the TypeScript integration tests directly, without Infisical, also failed because of missing or invalid provider credentials, unrelated to this code change:

$ pnpm test -- --silent false --testTimeout 60000
...
LLM client 'GPT35' requires environment variable 'OPENAI_API_KEY' to be set but it is not
LLM client 'Sonnet' requires environment variable 'ANTHROPIC_API_KEY' to be set but it is not
LLM client 'Gemini' requires environment variable 'GOOGLE_API_KEY' to be set but it is not
Request failed with status code: 401 Unauthorized ... invalid_api_key
Test Suites: 34 failed, 8 passed, 42 total
Tests: 155 failed, 81 passed, 236 total

Issue Reference

This PR fixes/closes #[issue number]

Changes

Implemented no-argument String.split() for whitespace tokenization while preserving explicit separator behavior.

Testing

Unit tests added/updated
Manual testing performed through focused VM/typechecker tests
Tested in Cursor Cloud Linux environment

Screenshots

Not applicable.

PR Checklist

I have read and followed the contributing guidelines
My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings

Additional Notes

The full integration suite requires authenticated provider credentials or a valid Infisical session in this environment.

Co-authored-by: Dhilan Shah <[email protected]>

vercel · 2026-06-04T01:38:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
beps	Ready	Preview, Comment	Jun 4, 2026 1:59am
promptfiddle2	Ready	Preview, Comment	Jun 4, 2026 1:59am

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
promptfiddle	Skipped		Jun 4, 2026 1:59am

coderabbitai · 2026-06-04T01:38:05Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7993b926-e6d4-4b5e-9b5c-970aabd7dbc4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions · 2026-06-04T01:38:42Z

⏭️ Performance benchmarks were skipped

Perf benchmarks (CodSpeed) are opt-in on pull requests — they no longer run on every push. They always run automatically after merge to canary/main.

To run them on this PR, do any of the following, then push a commit (or re-run CI):

Add RUN_CODSPEED=1 to the PR description, or
Include run-perf or /perf in the PR title or any commit message.


Add whitespace String.split overload

0be234e

Co-authored-by: Dhilan Shah <[email protected]>

vercel Bot deployed to Preview – beps June 4, 2026 01:38

vercel Bot deployed to Preview – promptfiddle2 June 4, 2026 01:43

Avoid extra native global for split overload

220486b

Co-authored-by: Dhilan Shah <[email protected]>

vercel Bot temporarily deployed to Preview – promptfiddle June 4, 2026 01:52 (inactive)

vercel Bot deployed to Preview – beps June 4, 2026 01:53

vercel Bot deployed to Preview – promptfiddle2 June 4, 2026 01:59

ATX24 wants to merge 2 commits into canary from cursor/no-arg-string-split-8a66
