
HTML API: Introduce CSS class-list splitter.#10043

Open
dmsnell wants to merge 2 commits into WordPress:trunk from dmsnell:html-api/wp-split-class-names


@dmsnell
Member

@dmsnell dmsnell commented Sep 25, 2025

Trac ticket: Core-63694.

This patch exposes the class-name parsing inside WP_HTML_Tag_Processor->class_list() for static calls through a new static method, WP_HTML_Tag_Processor::parse_class_list(). The new method holds the internal CSS parsing code: it takes an HTML class attribute value and returns a Generator that iterates over the classes in that value. Class names are deduplicated according to the given document compatibility mode, which defaults to no-quirks mode.
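The splitting and deduplication rules described above can be sketched as a standalone generator. This is a minimal sketch, not the patch's code: the function name sketch_parse_class_list() and the boolean $is_quirks_mode parameter are hypothetical stand-ins for the Tag Processor's compatibility-mode handling, and the input is assumed to already have its character references decoded.

```php
<?php
/**
 * Hypothetical sketch: yields the unique class names in a decoded class attribute value.
 *
 * @param string $class_value    Class attribute value, character references already decoded.
 * @param bool   $is_quirks_mode Whether to compare class names ASCII case-insensitively.
 * @return Generator Yields each unique class name once, in source order.
 */
function sketch_parse_class_list( string $class_value, bool $is_quirks_mode = false ): Generator {
	// HTML splits class lists on ASCII whitespace: space, tab, LF, FF, and CR.
	$whitespace = " \t\n\f\r";
	$seen       = array();
	$at         = 0;
	$end        = strlen( $class_value );

	while ( $at < $end ) {
		// Skip any run of separators before the next class name.
		$at += strspn( $class_value, $whitespace, $at );
		if ( $at >= $end ) {
			return;
		}

		// Measure the class name up to the next separator or the end of the string.
		$length = strcspn( $class_value, $whitespace, $at );
		$name   = substr( $class_value, $at, $length );
		$at    += $length;

		// Null bytes become U+FFFD REPLACEMENT CHARACTER.
		$name = str_replace( "\x00", "\u{FFFD}", $name );

		// Quirks mode compares class names ASCII case-insensitively, so normalize them.
		if ( $is_quirks_mode ) {
			$name = strtolower( $name );
		}

		// Class lists are expected to be short, so a linear scan is cheap.
		if ( in_array( $name, $seen, true ) ) {
			continue;
		}

		$seen[] = $name;
		yield $name;
	}
}
```

In no-quirks mode "Wide" and "wide" are distinct classes; in quirks mode only the first, lowercased, survives. Note that strtolower() is ASCII-only as of PHP 8.2; on older versions it depends on the current locale.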

Design review requests

  • The name isn’t great. It splits the attribute value into class names, but it also “deduplicates” them according to the parsing rules for a class attribute in _no-quirks-mode_ HTML, and it decodes HTML character references. The input also _MUST_ represent a full class attribute value, because the parsing of a trailing character reference that is missing its semicolon or is otherwise incomplete depends on whether it falls at the end of the string.
    • wp_classname_walker()
    • wp_walk_class_attribute()
    • wp_unique_classnames()
  • Should it be more useful to people wanting to conditionally add class names? Something more akin to classnames() in JS? We could pass varargs which are string|false or an array of additional class names to add.
wp_classnames( 'wp-block-paragraph', array( 'display-wide' => $is_wide ) );
  • Update: this patch has changed from introducing a new function to exposing an internal method of the Tag Processor. With this change no new module needs to exist, and the method gets helpful namespacing by virtue of being a static method on the Tag Processor class.
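To make the classnames() idea concrete, here is a minimal sketch of how such a helper could combine unconditional and conditional class names. The name sketch_classnames() is hypothetical, and the argument handling is an assumption modeled on the JS library's behavior. Unlike the splitter, it neither splits nor decodes its inputs.

```php
<?php
/**
 * Hypothetical sketch of a classnames()-style helper; not an existing WordPress function.
 *
 * Accepts any mix of:
 *  - strings, which are added as-is;
 *  - false, null, and other non-string scalars, which are skipped;
 *  - arrays, where list entries (int key, string value) are added as-is and
 *    map entries ( 'class-name' => $condition ) are added when $condition is truthy.
 *
 * @param mixed ...$args Class names and conditions.
 * @return string Space-separated, deduplicated class names.
 */
function sketch_classnames( ...$args ): string {
	$names = array();

	foreach ( $args as $arg ) {
		if ( is_string( $arg ) ) {
			$names[] = $arg;
		} elseif ( is_array( $arg ) ) {
			foreach ( $arg as $key => $value ) {
				if ( is_int( $key ) && is_string( $value ) ) {
					// List entry: array( 'wp-block', 'is-layout-flow' ).
					$names[] = $value;
				} elseif ( is_string( $key ) && $value ) {
					// Conditional entry: array( 'display-wide' => $is_wide ).
					$names[] = $key;
				}
			}
		}
	}

	// Drop empty strings, then duplicates, keeping first-seen order.
	$names = array_filter(
		$names,
		function ( $name ) {
			return '' !== $name;
		}
	);

	return implode( ' ', array_unique( $names ) );
}
```

With $is_wide set to true, sketch_classnames( 'wp-block-paragraph', array( 'display-wide' => $is_wide ) ) returns "wp-block-paragraph display-wide". One limitation of this design: PHP casts numeric string keys such as '1' to integers, so the map form cannot express conditional numeric class names.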

Background

Many existing functions perform ad-hoc parsing of CSS class names, usually by splitting on a space character. However, there are issues with this approach:

  • There is no decoding of HTML character references, which is normative inside HTML attributes.
  • There is no handling of null bytes.
  • Class names can be split by more than just the space character.
  • There is no handling of duplicates, and while mostly benign, code forgetting to account for duplicates can lead to defects.
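Several of these pitfalls are easy to reproduce with the space-splitting approach. The attribute value below is an illustrative example, not taken from WordPress code.

```php
<?php
// A class attribute value as it might appear in HTML source.
$class_value = "wp-block\tis-wide  alignwide alignwide has&#32;text";

// The ad-hoc approach: split on a single space character.
$naive = explode( ' ', $class_value );

// $naive is array( "wp-block\tis-wide", '', 'alignwide', 'alignwide', 'has&#32;text' ):
// - the tab is not treated as a separator, so two classes stay glued together;
// - the double space produces an empty "class name";
// - the duplicate "alignwide" is kept;
// - "&#32;" (an encoded space) is not decoded, so "has" and "text" never separate.
```

Per HTML's rules, the same value holds five distinct classes: wp-block, is-wide, alignwide, has, and text.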

The new function handles these nuances so that developers can focus on reading CSS class names, adding new class names, and removing class names. It serves as a middle ground between legacy code interacting with CSS class names in isolation and code processing full HTML documents.

@github-actions

github-actions Bot commented Sep 25, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, westonruter, mukesh27, jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.


@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch from b1fa4c5 to e0a9bff September 25, 2025 21:08
Comment on lines +44 to +46
* @return Generator Use this in a foreach loop to iterate over the class names.
*/
function wp_split_class_names( $class_attribute_string ) {
Member


Suggested change
- * @return Generator Use this in a foreach loop to iterate over the class names.
- */
- function wp_split_class_names( $class_attribute_string ) {
+ * @return Generator<non-empty-string> Use this in a foreach loop to iterate over the class names.
+ */
+ function wp_split_class_names( $class_attribute_string ): Generator {

Naturally, the same should be applied to \WP_HTML_Tag_Processor::class_list().

Member


Something

function wp_split_class_names( $class_attribute_string ): ?Generator {

Member


Given that this function contains yield statements, I don't think it can ever return null. It will always return a Generator. See https://3v4l.org/YgDnb
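The point about yield can be checked directly: PHP treats any function whose body contains yield as a generator function, so a bare return; ends iteration instead of returning null. The function below is a hypothetical illustration, not the patch's code.

```php
<?php
// Any function whose body contains `yield` is a generator function: calling it
// always returns a Generator object, even if it returns before yielding.
function sketch_split_on_spaces( string $class_value ): Generator {
	if ( '' === $class_value ) {
		// Ends the generator early; the caller still receives an (empty) Generator.
		return;
	}

	foreach ( explode( ' ', $class_value ) as $class_name ) {
		yield $class_name;
	}
}

$empty = sketch_split_on_spaces( '' );
// $empty instanceof Generator is true, and iterating it produces no values.
```

This is why a nullable ?Generator return type adds nothing for such a function.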

@@ -0,0 +1,59 @@
<?php

Member


Suggested change

* @return Generator Use this in a foreach loop to iterate over the class names.
*/
function wp_split_class_names( $class_attribute_string ) {
if ( '' === $class_attribute_string ) {
Member


Suggested change
- if ( '' === $class_attribute_string ) {
+ if ( '' === $class_attribute_string || ! is_string( $class_attribute_string ) ) {

Member Author


added in 8185519

I’m really torn over these new functions and how to balance types. I don’t want to make permissive functions that hide problems, but I also don’t want sites to crash.

Maybe we should add a _doing_it_wrong() for the non-string case?

Member


I think types are good to use except in the filter callback context, where other plugins can return mixed values from other callbacks.

So I would go ahead and add the string type to this parameter; then there's no need for the type check. I don't think all APIs need to be crafted with kid gloves. Adding type checks for each and every function and _doing_it_wrong() for every parameter seems not ideal, as PHP has this built in with types, and otherwise there is PHPDoc for static analysis to alert developers that they're doing it wrong.

Member Author


Adding type checks for each and every function and _doing_it_wrong() for every parameter seems not ideal

With this I agree. My main problem is that the static type annotations have a poor mix of behaviors:

  • They will lead to crashes in production, since we know developers will likely test only with string values. They are less likely to test the cases that crash it, and may miss that they have type errors.
  • They are not strictly enforced, meaning the addition of type annotations and the removal of type checks hide invalid values deeper in the code. Even if this file enabled strict mode, PHP wouldn’t enforce the types unless the file containing the calling code enables strict mode as well.

I’ve just seen way too many crashes when applying type annotations to function arguments, and it makes me feel like it’s a bad fit here.
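The trade-off in this thread shows up in a small comparison. Both functions are hypothetical and split on ASCII whitespace only (no character-reference decoding). They illustrate PHP's default coercive typing mode, in which a string parameter declaration silently converts some wrong-typed values and rejects others with a TypeError.

```php
<?php
// Permissive style (like the is_string() check added in review): non-string
// input quietly produces no class names instead of an error.
function sketch_permissive_split( $class_value ): array {
	if ( ! is_string( $class_value ) || '' === $class_value ) {
		return array();
	}
	return preg_split( '/[ \t\n\f\r]+/', $class_value, -1, PREG_SPLIT_NO_EMPTY );
}

// Typed style: PHP checks the declaration at call time. Without
// declare( strict_types = 1 ) in the *calling* file, scalars are coerced.
function sketch_typed_split( string $class_value ): array {
	return preg_split( '/[ \t\n\f\r]+/', $class_value, -1, PREG_SPLIT_NO_EMPTY );
}

sketch_permissive_split( null ); // array()
sketch_typed_split( 42 );        // array( '42' ): the int was silently coerced
sketch_typed_split( false );     // array(): false was coerced to ''
// sketch_typed_split( null ) and sketch_typed_split( array() ) throw a TypeError.
```

The coerced calls are the "hidden invalid values" case; the TypeError calls are the production-crash case.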

}

// Get these from the HTML API to avoid ad-hoc parsing HTML or CSS class names.
$processor = new WP_HTML_Tag_Processor( '<wp-noop>' );
Member


Can WP_HTML_Tag_Processor or class_list() throw exceptions? If so, consider try/catch or input validation to avoid fatal errors.

Member


They should not throw exceptions.

@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch 5 times, most recently from e2b7b9b to a561a69 October 6, 2025 21:18
@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch 4 times, most recently from e6ff4e2 to cdb967d October 9, 2025 23:39
@westonruter
Member

  • The name isn’t great.

What about wp_parse_css_class_names()? I think this would be clearer. Mentioning “css” makes it clear you're not somehow talking about PHP class names. And “parse” implies it's not as simple as just splitting on whitespace tokens.

@westonruter
Member

  • Should it be more useful to people wanting to conditionally add class names? Something more akin to classnames() in JS? We could pass varargs which are string|false or an array of additional class names to add.

Seems cool, but do we have any use cases for this in core PHP? It would be nice to include some example call sites in the core codebase where this function is actually leveraged.

@dmsnell
Member Author

dmsnell commented Oct 10, 2025

What about wp_parse_css_class_names()?

I like this, though I still prefer split since it communicates the intent; parse feels like it promises more than the function performs. I am changing it to wp_split_css_class_list(). Something like wp_explode_css_class_names() would also work, at the cost of getting long.

Would love to continue stewing on the name. Overly short or overly long, it’s hard to find one that’s just right.

@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch 2 times, most recently from 82e9ae0 to 500bbb8 October 10, 2025 03:26
@dmsnell dmsnell changed the title from “HTML API: Introduce wp_split_class_names().” to “HTML API: Introduce CSS class-list splitter.” Oct 10, 2025
@dmsnell
Member Author

dmsnell commented Oct 10, 2025

I’ve turned this into a static method on the Tag Processor, but I instantly don’t like it because it lost the nuance of decoding HTML character references.

This is a conundrum, however, because existing code mixes decoded and non-decoded class names. For example, code will read the class attribute from an HTML string, but then add new raw class names to the list. While it’s unlikely that someone adds a class whose name should be &amp;, if they do, there’s a discrepancy between the existing classes and the new one: what should be escaped, and what unescaped?


I may revert the last commit. While it’s helpful that this function properly splits and deduplicates the class names, decoding the HTML character references was an important piece as well, and I think that’s harder to merge into the Tag Processor’s interface.
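The escaping ambiguity can be illustrated with plain PHP functions standing in for the HTML API. html_entity_decode() only approximates HTML's character-reference rules (it differs in edge cases such as references missing a semicolon), so this is a sketch of the problem, not of the Tag Processor's behavior.

```php
<?php
// The class attribute as written in the HTML source: the second class is "&".
$attribute_source = 'card &amp;';

// Reading the attribute decodes character references, as the HTML API does.
$decoded  = html_entity_decode( $attribute_source, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
$existing = preg_split( '/[ \t\n\f\r]+/', $decoded, -1, PREG_SPLIT_NO_EMPTY );
// $existing is array( 'card', '&' ).

// Code elsewhere adds a class name written in already-escaped form.
$added           = '&amp;';
$already_present = in_array( $added, $existing, true );
// $already_present is false: the same class in two representations.

// When writing the list back out, decoded names must be escaped exactly once.
$serialized = htmlspecialchars( implode( ' ', $existing ), ENT_QUOTES, 'UTF-8' );
// $serialized is 'card &amp;'. Escaping $added again would produce '&amp;amp;'.
```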

@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch from 500bbb8 to 81ef5e0 October 10, 2025 03:43
@dmsnell
Member Author

dmsnell commented Oct 10, 2025

@westonruter I tossed out some refactors in #10215. They highlight two things to me:

  • there needs to be more clarity around whether the inputs are HTML escaped or not.
  • the functions should return an array and not an iterator.

It also leads me to feel that a separate new function is best and that exposing the internals of the HTML API is a mistake. Perhaps there is room for two new functions:

  • wp_parse_html_class_attribute()
  • wp_split_decoded_class_list()

Names like these would more clearly communicate whether null bytes and character references will be transformed, or whether the class names are assumed to be the “raw”, unescaped class names built within source code.
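To show how the two proposed names might divide the work, here is a sketch. The bodies are assumptions (the thread only proposes the names), the sketch_ prefixes mark them as hypothetical, they return arrays as suggested above, and html_entity_decode() stands in for the HTML API's own decoder.

```php
<?php
/**
 * Hypothetical body for the proposed wp_parse_html_class_attribute():
 * input is a class attribute value as it appears in HTML source.
 */
function sketch_parse_html_class_attribute( string $attribute_value ): array {
	// Approximation: html_entity_decode() differs from HTML in some edge cases.
	$decoded = html_entity_decode( $attribute_value, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	// HTML replaces null bytes in attribute values with U+FFFD.
	$decoded = str_replace( "\x00", "\u{FFFD}", $decoded );
	return sketch_split_decoded_class_list( $decoded );
}

/**
 * Hypothetical body for the proposed wp_split_decoded_class_list():
 * input is a list of raw, unescaped class names built in source code,
 * so nothing is decoded or replaced.
 */
function sketch_split_decoded_class_list( string $class_list ): array {
	$names = preg_split( '/[ \t\n\f\r]+/', $class_list, -1, PREG_SPLIT_NO_EMPTY );
	// Deduplicate (no-quirks, case-sensitive) and reindex to a plain list.
	return array_values( array_unique( $names ) );
}
```

The same input string gives different results depending on which contract applies: "a &amp; a" parsed as HTML yields a and &, while split as raw names it yields a and &amp;.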

@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch 3 times, most recently from 858440d to 661d588 October 21, 2025 09:23
@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch from 661d588 to 9430d73 November 24, 2025 20:26
This patch introduces a new CSS helper module containing a new function,
`wp_split_class_names()`. This function relies on the HTML API to take an
HTML `class` attribute value and return a `Generator` that iterates over
the classes in that value.

@dmsnell dmsnell force-pushed the html-api/wp-split-class-names branch from 9430d73 to e5a3a62 December 18, 2025 03:52

Labels

None yet

Projects

None yet


4 participants