Skip to content

fix: escape multi-character separators before building the cleanup regex#146

Open
spokodev wants to merge 1 commit into
pid:masterfrom
spokodev:fix-multichar-separator-escape
Open

fix: escape multi-character separators before building the cleanup regex#146
spokodev wants to merge 1 commit into
pid:masterfrom
spokodev:fix-multichar-separator-escape

Conversation

@spokodev

Copy link
Copy Markdown

Problem

A multi-character separator (documented as a {string} option) is concatenated straight into a RegExp with only its first character escaped, so it either silently drops data or throws:

getSlug('foo bar baz', { separator: '..' }); // "foo"  (bar and baz dropped)
getSlug('foo bar baz', { separator: '++' }); // throws: Invalid regular expression: /\+++/g
getSlug('foo bar baz', { separator: '()' }); // throws: Invalid regular expression: /\()+/g

For '..' the cleanup regex becomes /\..+/ — the second . is an unescaped wildcard that greedily eats the rest of the string (data loss). For '++'/'()'/'**' the result is an invalid pattern that throws an unhandled exception. Single-character separators, and multi-char separators made only of non-metacharacters ('::', '__'), happen to work, which is why the issue went unnoticed.

Cause

result = result.replace(/\s+/g, separator)
    .replace(new RegExp('\\' + separator + '+', 'g'), separator)
    .replace(new RegExp('(^\\' + separator + '+|\\' + separator + '+$)', 'g'), '');

'\\' + separator escapes only the first character of the separator.

Fix

Escape the whole separator with the file's existing escapeChars helper (already used elsewhere) and group it so the full sequence is matched when collapsing duplicates and trimming the ends.

Verification

  • New test in test/test-separator.js for '..'/'++'/'()'; fails before, passes after.
  • Full suite: 114 passing, 0 regressions; single-character separators (-, _, *, .) verified unchanged.
  • Idempotence fuzz, patched vs current release, 150,000 random inputs × 12 separators: the fix removes all crashes and the data-loss cases and introduces 0 new non-idempotence (the residual non-idempotent cases — input that itself contains the separator characters — are pre-existing and identical to the current release).

A multi-character `separator` (a documented string option) was concatenated
straight into a `RegExp`, escaping only its first character:

  getSlug('foo bar baz', { separator: '..' }) // "foo"  (bar, baz dropped)
  getSlug('foo bar baz', { separator: '++' }) // threw "Invalid regular expression"
  getSlug('foo bar baz', { separator: '()' }) // threw "Invalid regular expression"

For '..' the second dot became an unescaped wildcard that swallowed the rest
of the string (data loss); for '++'/'()' the result was an invalid pattern
that threw. Escape the whole separator with the existing `escapeChars` helper
and group it so the full sequence is matched when collapsing duplicates and
trimming the ends. Single-character separators are unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant