RFC 237: Exclusions in WEB_FEATURES.yml files #237

Open
jugglinmike wants to merge 9 commits into web-platform-tests:main from bocoup:web-features-exclusions

Conversation

@jugglinmike

Contributor

@jugglinmike jugglinmike changed the title RFC 236: Exclusions in WEB_FEATURES.yml files RFC 237: Exclusions in WEB_FEATURES.yml files Mar 26, 2026
@jcscottiii

jcscottiii commented Apr 1, 2026

Contributor

Thanks @jugglinmike!

The goal of establishing symbolic relationships (Single Source of Truth) is fantastic and definitely the right direction for maintainability over time.

I notice the RFC leans toward a string-based micro-syntax (!#) because it keeps the YAML schema "flat" and simple for the parser. However, I believe this shifts the complexity onto the human author (or your favorite AI, haha).

I’d love for us to consider a hybrid schema that supports both simple strings and objects.

Example

We keep simple strings for standard paths (no visual tax), but allow objects when a rule needs metadata (like an exclusion).

features:
  - name: alerts
    files:
      - path: ./*
        exclude_ids:       # Standard YAML list, zero ambiguity
          - print
          - logging
  - name: print
    files:
      - ./print-*         # Simple file path can stay a simple string!

Compared to the original

features:
  - name: alerts
    files:
      - ./*
      - "!#print"
      - "!#logging"

One major benefit:

  • No Custom Parsing Logic: Any standard YAML parser reads exclude_ids as a list natively. We don't need custom regex to find where the feature name starts in !#feature-name. Something I sometimes regret doing with the **

@jugglinmike

Contributor Author

Hi @jcscottiii! Thanks for your feedback!

One novel aspect of your proposal is that it scopes exclusions to individual path patterns. While I think that could be tenable, it's not a capability that we've specifically felt a need for.

It sounds like we're aligned on prioritizing human authors/readers. To that end, I think "scoped" exclusions may make these rules more difficult to understand since they would effectively introduce a grouping operator that hasn't been motivated by experience. (Well, not our experience, anyway. I'd be happy to hear about any instances where you wanted it!)

How would you feel about expressing exclusions with a standalone object so that each list item could be either a string value or a dict with a single key (namely, exclude_ids)? For example:

 features:
   - name: alerts
     files:
-      - path: ./*
-        exclude_ids:       # Standard YAML list, zero ambiguity
+      - ./*
+      - exclude_ids:       # Standard YAML list, zero ambiguity
           - print
           - logging
   - name: print
     files:
       - ./print-*         # Simple file path can stay a simple string!

Anecdotally, I've only observed a small number (read: 1 to 3) of exclusions per feature entry, so a nested list like exclude_ids might be more structure than we truly need. I'm curious about simplifying it further to a string value for a key named exclude_id (despite the bit of repetition it adds to the running example):

 features:
   - name: alerts
     files:
-      - path: ./*
-        exclude_ids:       # Standard YAML list, zero ambiguity
-          - print
-          - logging
+      - ./*
+      - exclude_id: print   # Standard YAML string, zero ambiguity
+      - exclude_id: logging # Standard YAML string, zero ambiguity
   - name: print
     files:
       - ./print-*         # Simple file path can stay a simple string!

...but I don't feel this flattening would have a huge impact on ergonomics, so I could go either way!
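To make the intended semantics of `exclude_id` concrete, here is a minimal Python sketch of how such entries might be resolved. It is an illustration under assumptions, not WPT's implementation: `FEATURES`, `resolve`, and the `logging` entry with its `log-*` pattern are hypothetical, the data is assumed to be already parsed from YAML, `fnmatch` stands in for WPT's real pattern matching, and only the excluded feature's own inclusion patterns are consulted (exclusions are not followed transitively).

```python
from fnmatch import fnmatchcase

# Hypothetical, already-parsed equivalent of the running example.
FEATURES = [
    {"name": "alerts", "files": ["./*", {"exclude_id": "print"}, {"exclude_id": "logging"}]},
    {"name": "print", "files": ["./print-*"]},
    {"name": "logging", "files": ["./log-*"]},  # assumed; not in the original example
]


def strip_prefix(pattern):
    """Drop an optional leading "./" so patterns compare against bare file names."""
    return pattern[2:] if pattern.startswith("./") else pattern


def matches_any(filename, patterns):
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def resolve(features, filenames):
    """Map each feature name to the files it includes minus those of excluded features."""
    by_name = {feature["name"]: feature for feature in features}
    result = {}
    for feature in features:
        entries = feature["files"]
        includes = [strip_prefix(e) for e in entries if isinstance(e, str)]
        excluded_ids = [e["exclude_id"] for e in entries if isinstance(e, dict)]
        # Look up the patterns of each excluded feature instead of repeating them.
        excluded = [
            strip_prefix(pattern)
            for feature_id in excluded_ids
            for pattern in by_name[feature_id]["files"]
            if isinstance(pattern, str)
        ]
        result[feature["name"]] = [
            name for name in filenames
            if matches_any(name, includes) and not matches_any(name, excluded)
        ]
    return result
```

The point of the symbolic reference is visible in `resolve`: the `print-*` pattern is written once, under `print`, and `alerts` only names the feature it defers to.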

In any case (in your proposal and in my suggested amendments), the ./ prefix is not technically necessary. Are you suggesting it should become mandatory?

@jcscottiii

Contributor

@jgraham mentioned that this should be a set of rules to web-feature-ids. Something like this:

	file-1.html: css-flexbox
	file-2.html: css-grid
	*: css-multicol

I'll let @jgraham provide a more thorough review.

@jugglinmike

Contributor Author

A bit more context for the benefit of tomorrow's RFC triage call:

Since this RFC is about streamlining the metadata, here's an approximation of its impact. While the syntax is still under consideration, that patch shows the order of magnitude: "130 files changed, 275 insertions(+), 791 deletions(-)"

Until recently, WPT didn't formally document the WEB_FEATURES.yml files. We landed some documentation last week, so you can now find it online at https://web-platform-tests.org/writing-tests/out-of-band-metadata.html

Looking forward to discussing more tomorrow!

@jgraham

jgraham commented May 5, 2026

Contributor

@jgraham mentioned that this should be a set of rules to web-feature-ids

Yes.

Mapping feature ids to filename patterns makes sense if you are trying to work out "which files are part of this feature". You find the id you care about and then evaluate the given rules against all the files in the directory. That is indeed a common use case for this data, but critically it's a use case that's basically always automated.

On the other hand if you have one file and want to know what features it will correspond to you have to evaluate the full set of rules. That's a use case you have when adding a file when you want to know if it will be correctly labelled, or if a rule update is required. Typically that use case isn't automated.

So if we want to optimise for the latter, we should look to make it as easy as possible to figure out what features correspond to a given file. In that case having the rules be a list of patterns that could apply and stopping on the first that does apply seems likely to be much easier to work with e.g.

*-grid-*: [css-grid, css-multicol]
*-flex-*: [css-flexbox, css-multicol]
*: css-multicol

There is arguably a bug there that a file which has both -grid- and -flex- in it would only be labelled as css-grid and css-multicol. In theory that seems bad. In practice it seems likely to be fine to fix it by adding another rule like:

*-grid-flex-*: [css-grid, css-flexbox, css-multicol]

and just assuming that people can follow a naming convention rather than requiring a perfectly general syntax. In particular, I think that once you start making things additive you end up back at the starting point for this RFC which is "we need to invent an exclusion mechanism so that some rules don't apply in some cases" and then you're back at it being really hard for a human to correctly deduce the impact of the rules.
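For illustration, here is a minimal Python sketch of the "stop on the first pattern that applies" evaluation described above. It is not WPT's implementation: `RULES` and `features_for` are hypothetical names, `fnmatch` stands in for whatever glob semantics WPT would actually use, and the combined `*-grid-flex-*` rule is assumed to be listed first so that it takes precedence over the more general ones.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list mirroring the example above; order is significant.
RULES = [
    ("*-grid-flex-*", ["css-grid", "css-flexbox", "css-multicol"]),
    ("*-grid-*", ["css-grid", "css-multicol"]),
    ("*-flex-*", ["css-flexbox", "css-multicol"]),
    ("*", ["css-multicol"]),
]


def features_for(filename, rules=RULES):
    """Return the feature IDs of the first rule whose pattern matches."""
    for pattern, ids in rules:
        if fnmatchcase(filename, pattern):
            return ids
    # No rule applied, so the file is unlabelled.
    return []
```

Answering "what features does this file get?" only requires reading down the list until a pattern matches. The naming-convention caveat still applies: a file named `a-flex-grid-b.html` would fall through to the `*-grid-*` rule in this sketch.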

@jugglinmike

Contributor Author

we should look to make it as easy as possible to figure out what features correspond to a given file.

I believe that we can achieve that goal without any changes to the schema. Since all file-matching patterns are strictly ordered in the current design (via lists of lists), we can simply change the way the files are interpreted (namely by adding strict precedence). Just like in the flat structure that @jgraham sketched out above, this will obviate the need for the ! prefix, reducing total entry count.
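As a rough sketch of that reinterpretation, assuming the existing `name`/`files` schema has already been parsed from YAML: the names `FEATURES` and `feature_for` are hypothetical, `fnmatch` is a stand-in for WPT's actual pattern matching, and a file is assumed to belong only to the first entry that matches it.

```python
from fnmatch import fnmatchcase

# Hypothetical, already-parsed entries in the existing schema.
# With strict precedence, the more specific entry is listed first and
# the "alerts" entry no longer needs a "!print-*" negation.
FEATURES = [
    {"name": "print", "files": ["print-*"]},
    {"name": "alerts", "files": ["*"]},
]


def feature_for(filename, features):
    """Return the name of the first feature with a pattern matching filename."""
    for feature in features:
        for pattern in feature["files"]:
            if fnmatchcase(filename, pattern):
                return feature["name"]
    return None
```

Here `feature_for("print-a.html", FEATURES)` yields `"print"` even though `*` also matches, because evaluation stops at the first matching entry.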

This might be worth considering because the flat structure would introduce more repetition in entries that hold a lot of patterns. The pathological case is ./css/css-conditional/container-queries/WEB_FEATURES.yml with its 128 patterns for container-queries. We shouldn't design for the pathological case, though, so I've collected some stats to give a better sense for the number of file patterns typically used (taking care to ignore the !-prefixed entries).

Mean: 2.3628988642509463
Standard deviation: 5.748726896571267

xychart
    title "File patterns per web-feature entry"
    x-axis "# of file patterns" [1, 2, 3, 4, 5, 6, 7, 8, 9+]
    y-axis "# of web-features" 1 --> 869
    bar [869, 748, 78, 43, 27, 17, 12, 4, 51]

Grist for the mill, in any case!

Source code
#!/usr/bin/env python3

import os
import yaml

MAX_BUCKET = 9

def find(root):
    for dirpath, dirnames, filenames in os.walk(root):
        if 'WEB_FEATURES.yml' not in filenames:
            continue

        filename = os.path.join(dirpath, 'WEB_FEATURES.yml')
        with open(filename, 'r') as handle:
            yield (filename, yaml.safe_load(handle)['features'])

def int_list(ints):
    return ', '.join(map(str, ints))

def render(mean, standard_deviation, grouped):
    x_axis = int_list(range(1, MAX_BUCKET)) + f', {MAX_BUCKET}+'
    max_value = max(grouped.values())
    values = int_list(grouped.values())
    return f'''
Mean: {mean}  
Standard deviation: {standard_deviation}

```mermaid
xychart
    title "File patterns per web-feature entry"
    x-axis "# of file patterns" [{x_axis}]
    y-axis "# of web-features" 1 --> {max_value}
    bar [{values}]
```
    '''

def main(root):
    file_entry_counts = []
    grouped = {x: 0 for x in range(1, MAX_BUCKET + 1)}
    for filename, features in find(root):
        for feature in features:
            # Negation entries will be obviated by the new semantics
            count = len([x for x in feature['files'] if not x.startswith('!')])
            grouped[min(MAX_BUCKET, count)] += 1

            file_entry_counts.append(count)

    size = len(file_entry_counts)
    mean = sum(file_entry_counts) / size
    variance = sum([(count - mean) ** 2 for count in file_entry_counts]) / size
    standard_deviation = variance ** 0.5
    return render(mean, standard_deviation, grouped)

if __name__ == '__main__':
    print(main('.'))

@jugglinmike

Contributor Author

@jcscottiii @jgraham @gsnedders I've updated the proposal to reflect the consensus of the latest RFCs & Infrastructure meeting.

@jcscottiii jcscottiii left a comment

Contributor

This looks a lot better. I have some questions. Feel free to post your thoughts on them.

Comment thread rfcs/web_features_exclusions.md Outdated
Comment thread rfcs/web_features_exclusions.md
Comment thread rfcs/web_features_exclusions.md Outdated
Comment on lines +84 to +88
```yaml
features:
- foo.html: NULL
- print-*: print
- "*": alerts

Contributor

The old schema used explicit properties (name and files). Because of this, it was trivial to add new metadata fields in the future (e.g., bug_url, reason) without breaking the parser.

We could change it to be something like:

Suggested change

 features:
-- foo.html: NULL
-- print-*: print
-- "*": alerts
+- pattern: "print-*"
+  features: ["print"]
+- pattern: "*"
+  features: ["alerts"]
+- pattern: "foo.html"
+  features: []
+  reason: "Excluded due to flakiness" # Easily extensible in the future. And not just rely on yaml comments that don't make it into the metadata

Contributor Author

If we feel that future extensions are sufficiently likely (and if we think that another schema revision would be sufficiently disruptive), then I would vote for keeping the existing design as-is. As noted in this latest version of the RFC, the schema currently in use is expressive enough to implement the new semantics. And as you note, the schema currently in use also supports extension. In addition, it is the most concise out of all the designs we've considered thus far.

That said, I've read in @jgraham's feedback a general interest in promoting usability for contributors today. The latest proposal seems optimized for that. It might be easier to justify a design that favors future work if we had some indication about the likelihood/timeline for that work.

Contributor

If we wanted to extend it we could do it like

features:
  - print-*:
      features: ["print"]
      reason: "All the print reftests"

That's slightly less verbose in any case and would continue to allow the optimisation where the right hand side can just be a string or a list and that's interpreted as identical to an object with just a "features" key.
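A small sketch of how a parser might treat the three right-hand-side forms as equivalent, assuming the YAML has already been loaded into Python values. The function name `normalize` is hypothetical, and treating a null value as "no features" is an assumption based on the `foo.html: NULL` entry in the RFC excerpt.

```python
def normalize(value):
    """Expand a rule's right-hand side into an object with a "features" list."""
    if value is None:
        # Assumed meaning of a null value: the file maps to no features.
        return {"features": []}
    if isinstance(value, str):
        return {"features": [value]}
    if isinstance(value, list):
        return {"features": list(value)}
    if isinstance(value, dict):
        # Extended form: keep any extra metadata keys such as "reason".
        expanded = dict(value)
        expanded["features"] = list(value.get("features", []))
        return expanded
    raise TypeError(f"unsupported rule value: {value!r}")
```

With this shape, downstream code only ever sees the object form, so the shorthand costs authors nothing and adds a few lines to the parser.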

Contributor

I like @jgraham's version of this. Thoughts on adopting this way? @jugglinmike. I am in favor of this. For now, features would be the only key.

Contributor Author

@jgraham's suggestion is an extension to the syntax currently proposed by the RFC, not an alternative to it. I'm in favor of supporting an extension because it would mean we would only pay the price of overhead when we need it, and because I still don't have any insight into the likelihood/timeline of additional metadata.

Although making that syntax the only option would simplify the parser, it would still increase verbosity well beyond what our current needs require. I feel that the existing syntax strikes a much stronger balance between parser simplicity, maintainability, and concision.

In any case, I think all extant proposals which include a property name for the list of web-feature IDs should be revised. In the case of the existing syntax, the property name is "name". It should instead be "ids" both to reflect the plurality of the value and to align with the WebDX terminology (where a web-feature's "name" is distinct from its "ID"). The name "ids" is especially appealing in this latest alternative because it avoids using the same property for two distinct parts of the data structure.

It seems like we're close; here are the options that are on the table right now:

  1. keep the existing syntax (but replace "name" with "ids"):
    features:
      - ids: []
        files:
        - foo.html
      - ids: [print]
        files:
        - print-*
  2. use the abbreviated syntax with an optional extension:
    features:
    - foo.html: []
    - print-*:
        ids: [print]
  3. use the extended form of the abbreviated syntax only:
    features:
    - foo.html:
        ids: []
    - print-*:
        ids: [print]

My preference is option 2 followed by option 1 followed by option 3. What do you folks think?

@jcscottiii jcscottiii Jun 17, 2026

Contributor

We can go with option 2.

For background: Initially I was in favor of option 3 because I was looking ahead to increased automation (e.g., tooling like wpt-gen or LLMs). Consistent schemas (always objects) are generally easier for tools to write and modify programmatically without bugs.

Since these YAML files are internal to WPT infrastructure and changes here won't impact the final consumers of WEB_FEATURES_MANIFEST.json, I felt we had the flexibility to prioritize that tool-friendliness.

But after thinking about it, I agree with what you said that keeping the barrier to entry low for human contributors is really important (which option 2 does).

One thing I wanted to point out: I like that you mentioned changing name to ids. Maybe for option 2, we could do the same for the top-level key and change features to files or rules. Either is okay, but since the underlying list now essentially points to files or rules, either name makes more sense than features.

files: # or rules here.
  - foo.html: [print, alerts]
  - print-*:
      ids: [print]
      reason: "All the print reftests"

Contributor Author

Awesome!

Swapping out "features" makes a lot of sense to me! I like "rules" more than "files" because the values describe more than just files.

This conversation about extensibility is worth highlighting. I've tried to do that by framing it as a risk within the RFC itself: the proposal appears to be less extensible, but we've considered the direction that future additions could take. This will help guide our grandchildren when they find the time to add metadata.

Thanks @jcscottiii! Thanks @jgraham!

Comment thread rfcs/web_features_exclusions.md

@jcscottiii jcscottiii left a comment

Contributor

@jugglinmike Sorry it took so long on the follow up review. PTAL at the unresolved conversations.

deviation of approximately 5.7487. This reduces the benefit of optimizing for
the case of web-features with many associated file-matching patterns.

### Extendability

@jcscottiii jcscottiii Jun 18, 2026

Contributor

Thanks for adding this section! I get that this is a risk. But supporting both the shorthand and extended form out of the box will be good long term!
