fix: handle non-BMP UTF-16 characters in markdown formatting #63
📝 Walkthrough
The formatter now correctly handles non-BMP characters by tracking markdown positions in UTF-16 code units. A new infrastructure layer (BMP_MAX constant and _code_units_len helper) enables the parser to distinguish BMP characters (1 code unit) from non-BMP characters (2 code units), allowing accurate position advancement across LINK, HEADING, QUOTE, and character parsing paths.
Changes: UTF-16 Position Tracking
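The counting rule described in the walkthrough can be sketched roughly as follows. This is only an illustration: the PR names a `BMP_MAX` constant and a `_code_units_len` helper, but their actual definitions are not shown here, so the function name `code_units_len_sketch` and its signature are assumptions.

```python
# Highest code point in the Basic Multilingual Plane (U+FFFF).
BMP_MAX = 0xFFFF


def code_units_len_sketch(text: str) -> int:
    """Return the length of `text` in UTF-16 code units.

    Hypothetical stand-in for the PR's `_code_units_len` helper:
    BMP characters count as 1 unit, non-BMP characters (encoded as a
    surrogate pair in UTF-16) count as 2.
    """
    return sum(2 if ord(ch) > BMP_MAX else 1 for ch in text)


if __name__ == "__main__":
    # "😀" is U+1F600, which lies outside the BMP.
    print(code_units_len_sketch("abc"))
    print(code_units_len_sketch("a😀b"))
```

A parser that advances its position by this value instead of by `len(text)` keeps its offsets aligned with clients that index text in UTF-16 code units.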
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Description
Fixes incorrect formatting of Markdown links when the text contains non-BMP characters, such as emoji.
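To illustrate why links drift in such text, the sketch below compares a Python string index (counted in code points) with the corresponding UTF-16 code-unit offset. The helper name `utf16_offset` and the sample message are hypothetical and are not taken from the PR.

```python
def utf16_offset(text: str, index: int) -> int:
    """Convert a Python string index into a UTF-16 code-unit offset.

    Hypothetical helper for illustration: encodes the prefix as
    UTF-16 (little-endian, no BOM) and counts 2-byte code units.
    """
    return len(text[:index].encode("utf-16-le")) // 2


if __name__ == "__main__":
    message = "😀 see docs"           # the emoji is a non-BMP character
    start = message.index("docs")     # index in code points
    print(start, utf16_offset(message, start))
```

Each non-BMP character before the link adds one extra code unit, so a link offset computed from code points lands one position too early per preceding emoji.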
Type of change
Related tasks / Issues
Closes #62.
Testing
Summary by CodeRabbit
Release Notes