fix: handle non-BMP UTF-16 characters in markdown formatting by Arondy · Pull Request #63 · MaxApiTeam/PyMax

Arondy · 2026-06-11T18:54:55Z

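Description

Fixes incorrect positions of Markdown link entities when the text contains non-BMP characters such as emoji. These characters occupy two UTF-16 code units, so counting each one as a single position shifted every entity that followed.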

Type of change

Bug fix
New feature
Documentation improvement
Refactoring

Related tasks / Issue

Closes #62.

Testing

from pymax.formatting.markdown import Formatter

# Entity offsets are counted in UTF-16 code units; each emoji takes 2.
clean, entities = Formatter.format_markdown("🔥 [a](https://x.com) 👍 [b](https://y.com)")
assert entities[0].from_ == 3   # 🔥(2) + space(1)
assert entities[1].from_ == 8   # 🔥(2) + space(1) + a(1) + space(1) + 👍(2) + space(1)
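
The expected offsets 3 and 8 can be checked independently of PyMax. The sketch below is a standalone illustration rather than code from the PR: `utf16_offset` is a hypothetical helper, and the clean text `"🔥 a 👍 b"` is assumed to be what remains once the link markup is stripped.

```python
def utf16_offset(text: str, index: int) -> int:
    """Return the UTF-16 code-unit offset of the character at code-point index `index`.

    Hypothetical helper for illustration; not part of PyMax.
    """
    # Encoding the prefix as UTF-16LE yields 2 bytes per code unit (no BOM).
    return len(text[:index].encode("utf-16-le")) // 2


# Assumed clean text after the [label](url) markup has been removed.
clean = "🔥 a 👍 b"

# Python indexes strings by code point, so each emoji counts as 1 here...
a_index = clean.index("a")
b_index = clean.index("b")

# ...but as 2 once the prefix is measured in UTF-16 code units.
a_offset = utf16_offset(clean, a_index)
b_offset = utf16_offset(clean, b_index)
```

The gap between the code-point index and the UTF-16 offset grows by one for every non-BMP character that precedes the label, which matches the shift described in the PR.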

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved handling of emoji and extended Unicode characters in markdown formatting.

coderabbitai · 2026-06-11T18:55:07Z

📝 Walkthrough

The formatter now correctly handles non-BMP characters by tracking markdown positions in UTF-16 code units. A new infrastructure layer (BMP_MAX constant and _code_units_len helper) enables the parser to distinguish BMP characters (1 code unit) from non-BMP characters (2 code units), allowing accurate position advancement across LINK, HEADING, QUOTE, and character parsing paths.

Changes

UTF-16 Position Tracking

| Layer / File(s) | Summary |
| --- | --- |
| UTF-16 Infrastructure `src/pymax/formatting/markdown.py` | `Formatter` gains `BMP_MAX = 0xFFFF` constant and `_code_units_len()` helper to compute UTF-16 code-unit lengths by encoding text as UTF-16LE and dividing by 2. |
| UTF-16 Position Tracking in Parsing `src/pymax/formatting/markdown.py` | LINK label extraction, HEADING/QUOTE text loops, and default character advancement now increment `clean_pos` using UTF-16 code-unit sizing (2 for non-BMP, 1 for BMP) instead of fixed `+1` increments. |
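
The two techniques summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the names `BMP_MAX` and `clean_pos` come from the walkthrough, while `code_units_len`, `char_units`, and `advance` are hypothetical stand-ins whose bodies are guesses at the described behavior.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def code_units_len(text: str) -> int:
    """UTF-16 length of `text`: UTF-16LE byte count divided by 2.

    Hypothetical stand-in for the `_code_units_len()` helper named above.
    """
    return len(text.encode("utf-16-le")) // 2


def char_units(ch: str) -> int:
    """Code units for one character: 2 above the BMP (surrogate pair), else 1."""
    return 2 if ord(ch) > BMP_MAX else 1


def advance(text: str) -> int:
    """Walk `text` one character at a time, as a parser loop would,
    advancing the position by each character's UTF-16 size instead of +1."""
    clean_pos = 0
    for ch in text:
        clean_pos += char_units(ch)
    return clean_pos


sample = "🔥 a 👍 b"
per_char_total = advance(sample)         # incremental, per-character counting
whole_string_total = code_units_len(sample)  # one-shot counting via encoding
```

Both approaches should agree on any well-formed string. The per-character form suits loops that already iterate over characters, while the encoding form is convenient for measuring a whole substring, such as a link label, in one step.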

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

#62: The UTF-16 code-unit position tracking changes directly address the bug where non-BMP characters shifted LINK positions by fixing offset calculations throughout the markdown formatter.

Poem

🐰 In UTF-16's binary dance,
Where code units bloom and prance,
BMP bounds are now our guide,
Non-BMP marked with widened stride,
Positions tracked both true and bright! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: fixing handling of non-BMP UTF-16 characters in markdown formatting, which matches the core objective of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description follows the required template with all sections completed: description, change type, related issues, and testing example. |

fix: handle non-BMP UTF-16 characters in markdown formatting

22d7efa

Arondy wants to merge 1 commit into MaxApiTeam:main from Arondy:fix/non-bmp-utf16-positions
