HTML API: Lower-case HTML tag names in get_qualified_tag_name(). #7332
dmsnell wants to merge 1 commit into WordPress:trunk
Force-pushed from e677f56 to 36776ce.
sirreal left a comment:
This seems reasonable. Maybe the PHPDoc could give a few examples of how this behaves or mention that tag names are lowercase to explain how it's different from get_tag().
|
@sirreal I don't feel strongly about this, but I do think that if we want to consider the change it'd be best to do before 6.7 is released, as after that it would be a backwards-compatibility break. I've expanded the docblock with a comparison to |
|
I've thought about this some more and I don't think we should make this change.
I'm not convinced this is the purpose of the method, although it depends what is meant by "printing and display." Primarily, the HTML API is for working with HTML input and HTML output. In HTML, the casing of tag names is irrelevant. There's no reason for svg tags to be "correctly" cased ( It doesn't seem more correct to me to lowercase HTML tag names and MathML tag names, and then use some mixed casing on SVG tag names when the element name differs.
Here's my take on this method. This method applies the rules from the specification on parsing foreign content:
This seems to adjust for a difference between an HTML tag name (case insensitive) and an SVG element name. This adjustment is important as a consideration of tree construction where HTML tokens are transformed into elements in a tree. At the moment, this roughly corresponds to
And to state the obvious, it should be trivial for consuming code to lowercase tag names if desired.
Definitely. |
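For readers following along, the foreign-content adjustment discussed above can be sketched in isolation. This is a minimal illustration, not the HTML API's implementation: the function name `sketch_adjust_svg_tag_name` is hypothetical, and the lookup holds only a handful of entries from the specification's SVG element table.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: adjusts a tag name the way the
 * HTML specification's foreign-content rules do for SVG elements.
 * Only a few entries from the specification's table are listed here.
 */
function sketch_adjust_svg_tag_name( string $tag_name ): string {
	$lower_tag_name = strtolower( $tag_name );

	// Partial lookup table: lower-case token name => SVG element name.
	$adjustments = array(
		'clippath'       => 'clipPath',
		'foreignobject'  => 'foreignObject',
		'lineargradient' => 'linearGradient',
		'radialgradient' => 'radialGradient',
		'textpath'       => 'textPath',
	);

	// Names without an entry are returned lower-cased. That fallback is an
	// assumption of this sketch, modeled on the `default` branch in the patch.
	return $adjustments[ $lower_tag_name ] ?? $lower_tag_name;
}

echo sketch_adjust_svg_tag_name( 'FOREIGNOBJECT' ), "\n"; // foreignObject
echo sketch_adjust_svg_tag_name( 'CIRCLE' ), "\n";        // circle
```

Only names whose SVG spelling contains an upper-case letter need an entry; every other name passes through the lower-case fallback.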
It's less about being correct and more about expectations. I think if you survey a bunch of people and ask them how they feel about turning
Here is a survey from my list of ~300k HTML pages.
My point in sharing these numbers isn't to say they dictate what we do; just noting that the overwhelming majority of HTML out there is using lower-case tag names and people have grown accustomed to them. In the case of normalization, this is the default behavior, which is why I care about it.
But also going back in time, the reason I remember for introducing these functions was just to ensure that the html5lib tests pass, which check against the adjusted foreign content tag names and attribute names. I don't feel these have a central important role in the spec compliance.
Your point is sound: it's trivial for calling code to lower-case-fold the tag names. Except, then they also have to remember to only do that for elements in the HTML namespace and not to do so for foreign elements. That leaves calling code calling this function and then immediately asking if it's an HTML element and then lower-casing.
    $tag_name = $this->get_qualified_tag_name();
    if ( 'html' === $this->get_namespace() ) {
        $tag_name = strtolower( $tag_name );
    }
Maybe I had a gut reaction since I was going to close this, but I think I'll leave it open at least a little longer to continue pondering. |
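The caller-side pattern described in this comment can be written as a standalone function to show why the namespace check matters. This is only a sketch: `sketch_display_tag_name` is a hypothetical name, and its two string arguments stand in for the values a caller would get from `get_qualified_tag_name()` and `get_namespace()`.

```php
<?php
/**
 * Hypothetical wrapper, not part of WordPress. $qualified_name and $namespace
 * stand in for the results of get_qualified_tag_name() and get_namespace().
 */
function sketch_display_tag_name( string $qualified_name, string $namespace ): string {
	// Fold case only in the HTML namespace; foreign names keep their adjusted casing.
	return 'html' === $namespace ? strtolower( $qualified_name ) : $qualified_name;
}

echo sketch_display_tag_name( 'DIV', 'html' ), "\n";          // div
echo sketch_display_tag_name( 'foreignObject', 'svg' ), "\n"; // foreignObject
```

An unconditional `strtolower()` would turn `foreignObject` into `foreignobject`, which is the mistake the namespace check is there to prevent.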
|
For general interest: here is the list of all-caps and mixed-case tag names from my survey. The list includes tag closers, and I didn't attempt to check if the closer casing matched the opening casing. Obvious HTML errors are evident, especially in the list of once-seen tags. The list |
|
@sirreal want to examine this again and consider it? I still don’t have strong feelings about it, but the more I do with constructive uses of the HTML API, the more I like having functions like this available for things like it’s mostly nice for the SVG/MathML elements requiring mixed case, but convenient to not have to export |
|
I've been going back and forth. I want to make a coherent decision here.
As implemented, this would print lower-case tag names for HTML and MATH elements, but for SVG tags it will print lower or camel case, e.g.
In trunk, the method roughly corresponds to Node.nodeName for elements (and Element.tagName). With this change, it corresponds to
I'm not opposed to this; however, I'm not sure what benefit there is to printing a few SVG tags in camel case. They're not treated any differently, and I suspect the vast majority of web developers would lower-case the SVG tags as well.
My big question is, why not always use |
This probably goes back to XML being case sensitive whereas HTML is not, but here we have something that’s mostly like XML being embedded within HTML. Also, the tag names in the XHTML and MathML namespace are all lowercase, so SVG remains unique. It makes me wonder if there’s relevance here with safe SVG handling, or export to XML.
None of this is particularly decisive, but I tried an experiment with SVG tag names and also with attribute names, thinking along the same lines. Below are the source and a render from Safari of proper and lower casing. We can see that when embedded it has no bearing, but when provided as an external document or when enclosed as a data URI it does impact the render.
I count 39 mixed-case tag names and 53 mixed-case attribute names. I wonder if it would be worth incorporating these. If we really don’t like them here we can put it all inside of
|
|
I now think that it could be appropriate here to only lower-case normative HTML elements. Custom elements likely should have their casing preserved. Though I don’t know what to do about unknown HTML elements that are not custom elements, like |
Force-pushed from 4ba41a8 to ae47481.
Wouldn't it be an HTML element? (Aren't custom elements also technically HTML elements?) It would be handled by the "any other start/end tag" rules; the start rule is to:
If we inspect in the browser, the element's
Unless, of course, it's being parsed in foreign content, in which case the unknown element seems to inherit the namespace. I think all the namespacing is handled correctly by the HTML Processor so I don't think unknown or custom elements should require special handling. |
    case 'attributename':
        return 'attributeName';

    // @todo Is this right?
Yes, the three that are flagged are correct:
https://html.spec.whatwg.org/multipage/parsing.html#adjust-svg-attributes
thank you for confirming!
    case 'solidcolor':
        return 'solidColor';

    default:
        return $lower_tag_name;
    }
    case 'textarea':
        return 'textArea';
I don't see these on the standard page, why are they added?
while I can’t find the reference now, I see this note in the SVG 2.0 spec
Added the solidcolor element and its two properties solid-color and solid-opacity, ported over from SVG Tiny 1.2. (Renamed 'solidColor' to 'solidcolor'.)
https://svgwg.org/svg2-draft/single-page.html#changes-pservers
also I’m not sure about textArea. I do remember being bothered by the discrepancies in the existing list and the new one.
here are some links for reference:
I want to review all this and be certain on what should happen before merging. thanks for the HTML API debugger link. that’s helpful.
okay it was resolved to add solidColor to SVG 2.0 but it wasn’t added, or it was renamed.
https://www.w3.org/TR/2013/WD-SVG2-20130409/pservers.html#SolidColors
aha! it was SVG 1.2 Tiny. I must have been examining the wrong spec document.
https://www.w3.org/TR/SVGMobile12/single-page.html
There is an element table there with the mystery elements. I feel much better now knowing where these slipped in.
Yes of course but I wasn’t talking about namespaces. I was contrasting the fact that we have distinct custom elements which behave differently than the set of tags defined as HTML elements. We could say that something like
So I’m not trying to delineate in which namespace these belong but whether we should be applying the rules for custom elements to them since they are indeed custom.
In writing this it seems strange to do anything other than lowercase them. I don’t know why I thought we should handle them differently. I’ll get back to this soon and fully review everything that was previously uncertain. |
Force-pushed from 53d1d4c to 2d598e6.
Force-pushed from 2d598e6 to f5d1c7d.
Since this method is meant for printing and display, a more expected return value would be the lower-case variant of a given HTML tag name. This patch changes the behavior accordingly. Follow-up to [58867]. See Core-61576.
Force-pushed from f5d1c7d to 5c0a1e7.
Trac ticket: Core-61576.
Since this method is meant for printing and display, a more expected return value would be the lower-case variant of a given HTML tag name.
This patch changes the behavior accordingly. No tests are impacted by this change.
Diff best viewed ignoring whitespace changes.
Follow-up to [58867].