Normalize NUL bytes to U+FFFD at parse entry #248

Merged
dereuromark merged 1 commit into master from fix/nul-normalization on Jun 16, 2026

@dereuromark
Contributor

A raw NUL (U+0000) must never reach rendered output. This PR replaces it with the U+FFFD replacement character at the parse entry (WHATWG-style normalization, for conformance across implementations), so a control byte cannot survive into the produced HTML.

What changed

  • BlockParser: at the parse entry, any NUL in the input is replaced with U+FFFD before line splitting.
  • SafeMode: the URL-scheme normalization now also strips the U+FFFD replacement character. A NUL-in-scheme evasion such as java\x00script: arrives as java\u{FFFD}script: after normalization; stripping U+FFFD keeps that evasion detected and blocked (empty href), preserving the existing SafeMode guarantee.
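
The parse-entry step from the first bullet can be sketched as follows. This is a minimal Python illustration, not the project's code: `normalize_input`, `split_lines` and `parse_entry` are hypothetical names, and treating CRLF, CR and LF as line endings is an assumption. What it shows is the ordering: NUL is replaced before any line splitting, so no later stage ever sees U+0000.

```python
import re

# U+FFFD REPLACEMENT CHARACTER, substituted for every raw NUL.
REPLACEMENT_CHAR = "\ufffd"


def normalize_input(source: str) -> str:
    """Hypothetical helper: replace every NUL (U+0000) with U+FFFD."""
    return source.replace("\x00", REPLACEMENT_CHAR)


def split_lines(source: str) -> list[str]:
    """Hypothetical helper: split on CRLF, CR or LF (assumed line endings)."""
    return re.split(r"\r\n|\r|\n", source)


def parse_entry(source: str) -> list[str]:
    # Normalize first, so line splitting and every later stage
    # only ever see U+FFFD, never a raw NUL.
    return split_lines(normalize_input(source))


if __name__ == "__main__":
    print(parse_entry("a\x00b\nc"))
```

Doing the replacement once at the entry point, rather than in each block or inline handler, means no downstream code path has to remember to handle U+0000.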

The dangerous-scheme detection regex itself was kept intact; the change only extends the set of stripped characters so the upstream NUL normalization does not open a new bypass.
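
The following sketch shows why widening the strip set is enough. It assumes SafeMode removes a set of characters from the URL and then tests the result against an unchanged dangerous-scheme regex. The pattern `DANGEROUS_SCHEME`, the "old" strip set (ASCII control characters and space) and the helpers `is_dangerous` and `safe_href` are all hypothetical, because the PR does not show the real regex or strip set.

```python
import re

# Assumed dangerous-scheme pattern; the real SafeMode regex is not shown in the PR.
DANGEROUS_SCHEME = re.compile(r"^(?:javascript|vbscript|data):", re.IGNORECASE)

# Assumed strip set before the change: ASCII control characters and space.
OLD_STRIP = re.compile(r"[\x00-\x20\x7f]")
# The change only widens the strip set to include U+FFFD; the regex above is untouched.
NEW_STRIP = re.compile(r"[\x00-\x20\x7f\ufffd]")


def is_dangerous(href: str, strip: re.Pattern) -> bool:
    """Remove the stripped characters, then test the scheme against DANGEROUS_SCHEME."""
    return DANGEROUS_SCHEME.match(strip.sub("", href)) is not None


def safe_href(href: str) -> str:
    """Hypothetical SafeMode filter: a blocked URL becomes an empty href."""
    return "" if is_dangerous(href, NEW_STRIP) else href


if __name__ == "__main__":
    # What `java\x00script:` looks like after parse-entry NUL normalization.
    evasion = "java\ufffdscript:alert(1)"
    print(is_dangerous(evasion, OLD_STRIP), is_dangerous(evasion, NEW_STRIP))
    print(repr(safe_href(evasion)))
```

With the old strip set, `java\u{FFFD}script:alert(1)` no longer matches the scheme pattern and would slip through. Adding U+FFFD to the strip set restores the match without touching the regex, which is the trade-off described above.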

Ported from carve-php commit ff40264.

A raw NUL (U+0000) must never reach rendered output. Replace it with the
U+FFFD replacement character at the parse entry (WHATWG-style, the behavior
agreed on across implementations), so a control byte cannot survive into HTML.

SafeMode now also strips U+FFFD from a URL scheme, so a `java\x00script:`
evasion - which arrives as `java\u{FFFD}script:` after normalization - is
still detected and blocked (empty href).

Ported from carve-php commit ff40264.

dereuromark added the bug label (Something isn't working) on Jun 16, 2026
dereuromark merged commit 72c4591 into master on Jun 16, 2026
4 checks passed
dereuromark deleted the fix/nul-normalization branch on June 16, 2026 12:21

codecov bot commented on Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.06%. Comparing base (306e362) to head (5c05fac).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #248   +/-   ##
=========================================
  Coverage     92.06%   92.06%           
- Complexity     3571     3572    +1     
=========================================
  Files           107      107           
  Lines         10118    10120    +2     
=========================================
+ Hits           9315     9317    +2     
  Misses          803      803           
