HTML API: Introduce CSS class-list splitter. by dmsnell · Pull Request #10043 · WordPress/wordpress-develop

dmsnell · 2025-09-25T18:49:05Z

This patch extracts WP_HTML_Tag_Processor->class_list() for static calls via a new static WP_HTML_Tag_Processor::parse_class_list() method. This new method contains the internal CSS parsing code to take an HTML class attribute value and return a Generator to iterate over the classes in that value. Class names are appropriately deduplicated according to the given document compatibility mode, whose default is no-quirks mode.
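To illustrate the compatibility-mode distinction, here is a minimal sketch of the deduplication step alone. `demo_unique_class_names()` and its `$is_quirks_mode` flag are hypothetical names, not the method in this patch. The sketch assumes only that quirks-mode documents match class names ASCII case-insensitively while no-quirks documents compare them exactly. Which spelling survives in quirks mode is a choice made by this sketch.

```php
<?php
/**
 * Hypothetical illustration only; not the implementation in this patch.
 *
 * @param string[] $names          Already-split, already-decoded class names.
 * @param bool     $is_quirks_mode Assumed flag standing in for the compatibility mode.
 * @return string[] Class names with duplicates removed, first occurrence kept.
 */
function demo_unique_class_names( array $names, bool $is_quirks_mode = false ): array {
	$seen   = array();
	$unique = array();

	foreach ( $names as $name ) {
		// Quirks mode: compare case-insensitively. No-quirks mode: compare exactly.
		$key = $is_quirks_mode ? strtolower( $name ) : $name;
		if ( isset( $seen[ $key ] ) ) {
			continue;
		}
		$seen[ $key ] = true;
		$unique[]     = $name;
	}

	return $unique;
}

$no_quirks = demo_unique_class_names( array( 'Wide', 'wide', 'Wide' ) );
$quirks    = demo_unique_class_names( array( 'Wide', 'wide', 'Wide' ), true );
```

In the default no-quirks mode `Wide` and `wide` are distinct classes, so both are kept; with the quirks flag set they collapse into one.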

Design review requests

The name isn’t great. It iterates over the class names, splits them, but also “deduplicates” them according to the parsing rules for a class attribute in _no-quirks-mode_ HTML, and it decodes HTML entities. It also _MUST_ represent a full class attribute because the parsing of trailing character references which are missing a semicolon or otherwise incomplete is dependent on whether they fall at the end of a string.
- wp_classname_walker()
- wp_walk_class_attribute()
- wp_unique_classnames()
Should it be more useful to people wanting to conditionally add class names? Something more akin to classnames() in JS? We could pass varargs which are string|false or an array of additional class names to add.

function wp_classnames( 'wp-block-paragraph', array( 'display-wide' => $is_wide ) ) { … };
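As a rough idea of what such a helper could look like, here is a sketch under stated assumptions. `demo_classnames()` is a hypothetical name and nothing like it exists in this patch. It accepts varargs that are strings, `false`, or arrays, where an array may list class names directly or map a class name to a condition.

```php
<?php
/**
 * Hypothetical sketch of a classnames()-style helper; not part of this patch.
 *
 * @param string|false|array ...$args Class names, false, or arrays of names/conditions.
 * @return string Space-separated list of the enabled class names, without duplicates.
 */
function demo_classnames( ...$args ): string {
	$names = array();

	foreach ( $args as $arg ) {
		if ( is_string( $arg ) && '' !== $arg ) {
			$names[ $arg ] = true;
		} elseif ( is_array( $arg ) ) {
			foreach ( $arg as $key => $value ) {
				if ( is_int( $key ) ) {
					// List-style entry: the value is the class name.
					if ( is_string( $value ) && '' !== $value ) {
						$names[ $value ] = true;
					}
				} elseif ( $value ) {
					// Map-style entry: the key is the class name, the value is the condition.
					$names[ $key ] = true;
				}
			}
		}
		// Anything else (for example false) is skipped.
	}

	return implode( ' ', array_keys( $names ) );
}

$is_wide = true;
$classes = demo_classnames(
	'wp-block-paragraph',
	array(
		'display-wide' => $is_wide,
		'is-hidden'    => false,
	),
	false
);
```

The sketch treats its inputs as raw, unescaped class names and does no HTML decoding.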

Update: This patch has changed from introducing a new function to exposing an internal method on the Tag Processor. By making this change no new module needs to exist, and the method receives its own kind of helpful namespacing by nature of being a static method on the Tag Processor class.

Background

Many existing functions perform ad-hoc parsing of CSS class names, usually by splitting on a space character. However, there are issues with this approach:

- There is no decoding of HTML character references, which is normative inside HTML attributes.
- There is no handling of null bytes.
- Class names can be split by more than just the space character.
- There is no handling of duplicates, and while mostly benign, code forgetting to account for duplicates can lead to defects.

The new function handles the nuances to let developers focus on reading CSS class names, adding new class names, and removing class names. This serves as a middle ground between legacy code interacting with CSS class names in isolation and code processing full HTML documents.
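The issues listed above can be seen in a small standalone comparison. This is only a sketch: `demo_split_class_attribute()` is a hypothetical helper, `html_entity_decode()` merely approximates HTML attribute decoding, and the whitespace set is assumed to be the HTML ASCII whitespace characters (tab, line feed, form feed, carriage return, space). It is not the HTML API code this patch exposes.

```php
<?php
/**
 * Hypothetical illustration only; an approximation of the steps, not the HTML API.
 *
 * @param string $raw_attribute_value Raw (still-escaped) class attribute value.
 * @return string[] Unique, decoded class names in document order.
 */
function demo_split_class_attribute( string $raw_attribute_value ): array {
	// Null bytes are replaced with U+FFFD, the replacement character.
	$value = str_replace( "\x00", "\u{FFFD}", $raw_attribute_value );

	// Decode character references (approximation of HTML attribute decoding).
	$value = html_entity_decode( $value, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	// Split on any run of ASCII whitespace, not only the space character.
	$names = preg_split( '/[ \t\n\f\r]+/', $value, -1, PREG_SPLIT_NO_EMPTY );

	// Drop duplicates while keeping the first occurrence of each name.
	return array_values( array_unique( $names ) );
}

$value = "alignwide\ttom&amp;jerry\nalignwide";

$naive = explode( ' ', $value );                   // One element: no spaces to split on.
$split = demo_split_class_attribute( $value );     // Two distinct, decoded class names.
```

The ad-hoc `explode()` treats the whole value as a single class because the separators here are a tab and a newline, while the sketch yields `alignwide` and `tom&jerry`.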

github-actions · 2025-09-25T19:41:23Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, westonruter, mukesh27, jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2025-09-25T19:51:27Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

westonruter · 2025-09-26T04:20:57Z

+ * @return Generator Use this in a foreach loop to iterate over the class names.
+ */
+function wp_split_class_names( $class_attribute_string ) {


Suggested change

- * @return Generator Use this in a foreach loop to iterate over the class names.
- */
-function wp_split_class_names( $class_attribute_string ) {
+ * @return Generator<non-empty-string> Use this in a foreach loop to iterate over the class names.
+ */
+function wp_split_class_names( $class_attribute_string ): Generator {

Naturally, the same should be applied to \WP_HTML_Tag_Processor::class_list()

Something

function wp_split_class_names( $class_attribute_string ): ?Generator {

Given that this function contains yield statements, I don't think it ever can return null. It will always return Generator. See https://3v4l.org/YgDnb
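The point about `yield` can be checked with a small standalone function. `demo_yield_class_names()` is a hypothetical stand-in, not the function under review; it only shows that a function body containing `yield` produces a `Generator` even when it returns before yielding anything.

```php
<?php
/**
 * Hypothetical stand-in for a generator-based splitter.
 *
 * @param string $class_attribute_string Space-separated class names.
 * @return Generator<non-empty-string>
 */
function demo_yield_class_names( string $class_attribute_string ): Generator {
	if ( '' === $class_attribute_string ) {
		// A bare return ends the generator early; the caller still receives a Generator.
		return;
	}

	foreach ( preg_split( '/[ \t\n\f\r]+/', $class_attribute_string, -1, PREG_SPLIT_NO_EMPTY ) as $name ) {
		yield $name;
	}
}

$empty = demo_yield_class_names( '' );
$names = iterator_to_array( demo_yield_class_names( 'a b' ), false );
```

Because the early return still yields a `Generator` object, a nullable `?Generator` return type would never actually produce `null` here.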

westonruter · 2025-09-26T04:21:17Z

@@ -0,0 +1,59 @@
+<?php
+


Suggested change

westonruter · 2025-09-26T04:23:40Z

+ * @return Generator Use this in a foreach loop to iterate over the class names.
+ */
+function wp_split_class_names( $class_attribute_string ) {
+	if ( '' === $class_attribute_string ) {


Suggested change

-	if ( '' === $class_attribute_string ) {
+	if ( '' === $class_attribute_string || ! is_string( $class_attribute_string ) ) {

added in 8185519

I’m really torn over these new functions and how to balance types. I don’t want to make permissive functions that hide problems, but I also don’t want sites to crash.

Maybe we should add a _doing_it_wrong() for the non-string case?

To me I think types are good to use except in the filter callback context, where other plugins can return mixed values from other callbacks.

So actually I think I would go ahead and add the string type to this param then there's no need for the type check. I don't think all APIs need be crafted with kid gloves. Adding type checks for each and every function and _doing_it_wrong() for every parameter seems not ideal, as PHP has this built-in with types, and otherwise there is phpdoc for static analysis to alert to the dev that they're doing it wrong.

> Adding type checks for each and every function and _doing_it_wrong() for every parameter seems not ideal

With this I agree. My main problem is that the static type annotations have a poor mix of behaviors:

- They will lead to crashes in production when we know developers will likely be testing with string values. They are less likely to test for the cases that crash it, and may miss that they have type errors.
- They are not enforced, meaning the addition of type annotations and the removal of type checks hides invalid values deeper in the code. Even if this code had strict mode enabled, PHP wouldn’t enforce the types unless the file containing the calling code enables strict mode as well.

I’ve just seen way too many crashes when applying type annotations to function arguments, and it makes me feel like it’s a bad fit here.
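The trade-off in this thread can be made concrete with two hypothetical variants of the same splitter, one relying on a `string` parameter type and one using a runtime check. Both function names are invented for illustration. The sketch relies on one PHP behavior: passing `null` to a non-nullable `string` parameter of a user-defined function throws a `TypeError`, with or without strict mode.

```php
<?php
// Hypothetical variant relying on a parameter type instead of a runtime check.
function demo_typed_split( string $class_attribute_string ): array {
	return preg_split( '/[ \t\n\f\r]+/', $class_attribute_string, -1, PREG_SPLIT_NO_EMPTY );
}

// Hypothetical variant that tolerates non-string input and returns no class names.
function demo_checked_split( $class_attribute_string ): array {
	if ( ! is_string( $class_attribute_string ) || '' === $class_attribute_string ) {
		return array();
	}
	return preg_split( '/[ \t\n\f\r]+/', $class_attribute_string, -1, PREG_SPLIT_NO_EMPTY );
}

// The typed variant throws when a caller passes null, e.g. from a filter callback.
$typed_threw = false;
try {
	demo_typed_split( null );
} catch ( TypeError $error ) {
	$typed_threw = true;
}

// The checked variant absorbs the bad value instead of crashing.
$checked = demo_checked_split( null );
```

The typed variant surfaces the bad call as an uncaught error unless the caller guards it, while the checked variant keeps the site running at the cost of hiding the mistake.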

mukeshpanchal27 · 2025-09-26T05:23:05Z

+	}
+
+	// Get these from the HTML API to avoid ad-hoc parsing HTML or CSS class names.
+	$processor = new WP_HTML_Tag_Processor( '<wp-noop>' );


Can WP_HTML_Tag_Processor or class_list() throw exceptions? If so, consider try/catch or input validation to avoid fatal errors.

They should not throw exceptions.

westonruter · 2025-10-10T00:39:53Z

> The name isn’t great.

What about wp_parse_css_class_names()? I think this would be more clear. Mentioning “css” makes it clear you're not talking about PHP class names somehow. And “parse” implies it's not as simple as just splitting on whitespace tokens.

westonruter · 2025-10-10T00:42:34Z

> Should it be more useful to people wanting to conditionally add class names? Something more akin to classnames() in JS? We could pass varargs which are string|false or an array of additional class names to add.

Seems cool, but do we have any use cases for this in core PHP? It would be nice to include some example implementations in the core codebase for this function to actually leverage it.

dmsnell · 2025-10-10T02:53:14Z

> What about wp_parse_css_class_names()?

I like this, though I still like split since it communicates the intent. parse here feels like it communicates more than it performs. I am changing it to wp_split_css_class_list() — maybe something like wp_explode_css_class_names() would also work, at the cost of getting long.

Would love to continue stewing on the name. Overly-short, overly-long, it’s hard to find one that’s just right.

dmsnell · 2025-10-10T03:34:10Z

I’ve turned this into a static method on the Tag Processor, but I instantly don’t like it because it lost the nuance of decoding HTML character references.

This is a conundrum, however, because existing code mixes decoded and non-decoded class names. For example, code will read the class attribute on an HTML string, but then add new raw class names to a list. While it’s unlikely that someone adds a class whose name should be &, if they do so, there’s a discrepancy between the existing classes and this new one — what should be escaped or unescaped?
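A small standalone sketch of that discrepancy, under assumptions: the attribute value and class name are made up, `html_entity_decode()` only approximates HTML attribute decoding, and splitting on a single space is used for brevity.

```php
<?php
// Raw attribute value as it appears in the HTML source (hypothetical example).
$attribute_value = 'tom&amp;jerry wide';

// A class name written directly in PHP source, i.e. raw and unescaped.
$new_class = 'tom&jerry';

// Class names read without decoding character references.
$undecoded = explode( ' ', $attribute_value );

// Class names read after decoding character references.
$decoded = explode( ' ', html_entity_decode( $attribute_value, ENT_QUOTES | ENT_HTML5, 'UTF-8' ) );

// Is the new class already present? The answer depends on which list is consulted.
$found_without_decoding = in_array( $new_class, $undecoded, true );
$found_with_decoding    = in_array( $new_class, $decoded, true );
```

Code that compares the raw name against the undecoded list would miss the existing class and add a duplicate, which is why it matters whether inputs are treated as escaped or unescaped.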

I may revert the last commit. While it’s helpful that this function properly splits and deduplicates the class names, decoding the HTML character references was an important piece as well, and I think that’s a bit harder to merge into the Tag Processor’s interface.

dmsnell · 2025-10-10T03:59:28Z

@westonruter I tossed out some refactors in #10215. They highlight two things to me:

- There needs to be more clarity around whether the inputs are HTML escaped or not.
- The functions should return an array and not an iterator.
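One practical reading of the array-versus-iterator point, sketched with a hypothetical generator: a `Generator` can be traversed only once and does not work with array functions, so callers end up converting it anyway.

```php
<?php
// Hypothetical generator standing in for an iterator-returning splitter.
function demo_class_name_generator(): Generator {
	yield 'alignwide';
	yield 'has-background';
}

$generator  = demo_class_name_generator();
$first_pass = array();
foreach ( $generator as $name ) {
	$first_pass[] = $name;
}

// A finished generator cannot be traversed again; PHP throws an Exception.
$second_pass_failed = false;
try {
	foreach ( $generator as $name ) {
		// Never reached.
	}
} catch ( Exception $e ) {
	$second_pass_failed = true;
}

// Array functions such as in_array() and count() need a real array.
$as_array = iterator_to_array( demo_class_name_generator(), false );
$has_wide = in_array( 'alignwide', $as_array, true );
$total    = count( $as_array );
```

Returning an array directly would spare callers the `iterator_to_array()` step whenever they need membership tests, counts, or repeated loops.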

It also leads me to feel like having a new separate function is best and exporting the internals of the HTML API is a mistake. Perhaps there is room for two new functions:

- wp_parse_html_class_attribute()
- wp_split_decoded_class_list()

Something like this to more clearly communicate whether things like null bytes and character references shall be transformed or whether it’s assumed that the class names are the “raw” and unescaped class names built within source code.

This patch introduces a new CSS helper module containing a new function, `wp_split_class_names()`. This function wraps some code to rely on the HTML API to take an HTML `class` attribute value and return a `Generator` to iterate over the classes in that value.

Many existing functions perform ad-hoc parsing of CSS class names, usually by splitting on a space character. However, there are issues with this approach:

- There is no decoding of HTML character references, which is normative inside HTML attributes.
- There is no handling of null bytes.
- Class names can be split by more than just the space character.
- There is no handling of duplicates, and while mostly benign, code forgetting to account for duplicates can lead to defects.

The new function handles the nuances to let developers focus on reading CSS class names, adding new class names, and removing class names. This serves as a middle ground between legacy code interacting with CSS class names in isolation and code processing full HTML documents.

dmsnell force-pushed the html-api/wp-split-class-names branch from b1fa4c5 to e0a9bff Compare September 25, 2025 21:08

westonruter reviewed Sep 26, 2025


mukeshpanchal27 reviewed Sep 26, 2025


dmsnell force-pushed the html-api/wp-split-class-names branch 5 times, most recently from e2b7b9b to a561a69 Compare October 6, 2025 21:18

dmsnell force-pushed the html-api/wp-split-class-names branch 4 times, most recently from e6ff4e2 to cdb967d Compare October 9, 2025 23:39

dmsnell force-pushed the html-api/wp-split-class-names branch 2 times, most recently from 82e9ae0 to 500bbb8 Compare October 10, 2025 03:26

dmsnell changed the title ~~HTML API: Introduce wp_split_class_names().~~ HTML API: Introduce CSS class-list splitter. Oct 10, 2025

dmsnell force-pushed the html-api/wp-split-class-names branch from 500bbb8 to 81ef5e0 Compare October 10, 2025 03:43

westonruter mentioned this pull request Oct 20, 2025

Make the filter links links in the posts list table filterable #9950

Open

dmsnell force-pushed the html-api/wp-split-class-names branch 3 times, most recently from 858440d to 661d588 Compare October 21, 2025 09:23

dmsnell force-pushed the html-api/wp-split-class-names branch from 661d588 to 9430d73 Compare November 24, 2025 20:26

dmsnell added 2 commits December 16, 2025 14:02

Reimagine as HTML API export.

e5a3a62

dmsnell force-pushed the html-api/wp-split-class-names branch from 9430d73 to e5a3a62 Compare December 18, 2025 03:52
