Multibyte first character #18

Open
atrakowski wants to merge 2 commits into tenderlove:master from atrakowski:multibyte-first-letter

@atrakowski

Problem

Some names start with an umlaut or another non-ASCII (multibyte) character, e.g. Öykü Çelik. Currently their first letters are left lowercase, because the Regexp that upcases the first letter of each word is /\b\w/, and in Ruby \w is ASCII-only, so this is equivalent to /\b[a-zA-Z0-9_]/.

NameCase("Öykü Çelik", lazy: false) # => "öykü çelik" (expected "Öykü Çelik")
NameCase("Öykü Çelik".downcase) # => "öykü çelik" (expected "Öykü Çelik")
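
The root cause can be checked in isolation, without NameCase. The following is a minimal sketch; the constant and helper names are hypothetical and only serve to compare how the two character classes treat a first letter:

```ruby
# Illustrative constants; these names are not taken from the NameCase source.
ASCII_WORD   = /\w/        # ASCII-only in Ruby: [a-zA-Z0-9_]
UNICODE_WORD = /\p{Word}/  # Unicode-aware word character

# Hypothetical helper: does the first character of `name` match `pattern`?
def first_char_matches?(name, pattern)
  name[0].match?(pattern)
end

first_char_matches?("celik", ASCII_WORD)   # => true
first_char_matches?("çelik", ASCII_WORD)   # => false
first_char_matches?("çelik", UNICODE_WORD) # => true
```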

Fix

Use a Regexp based on Unicode properties, such as /\b\p{Word}/. With this change the existing tests still pass, and a new test covers a name starting with a multibyte character.
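
To illustrate the effect of swapping the pattern, here is a small stand-alone sketch. The helper `upcase_first_letters` and both constants are hypothetical; this is not NameCase's actual implementation, only the first-letter upcasing step under the assumption that the input is downcased first:

```ruby
# Hypothetical patterns for illustration only.
ASCII_FIRST   = /\b\w/         # first letter, ASCII word characters only
UNICODE_FIRST = /\b\p{Word}/   # first letter, any Unicode word character

# Hypothetical helper: downcase the name, then upcase every character
# that `pattern` identifies as the start of a word.
def upcase_first_letters(name, pattern)
  name.downcase.gsub(pattern) { |first| first.upcase }
end

upcase_first_letters("Öykü Çelik", ASCII_FIRST)   # => "öykü çelik"
upcase_first_letters("Öykü Çelik", UNICODE_FIRST) # => "Öykü Çelik"
upcase_first_letters("john smith", UNICODE_FIRST) # => "John Smith"
```

Plain ASCII names behave the same under both patterns, so the change should only affect names whose words begin with a non-ASCII letter.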

FYI: There are two commits. The first is the fix itself; in the second I've replaced all remaining occurrences of \w with \p{Word}. If the latter is too much of a change, I can easily undo it.

atrakowski force-pushed the multibyte-first-letter branch from 9f7c5af to 3201976 on April 13, 2026 08:46