RFC 237: Exclusions in WEB_FEATURES.yml files #237
Thanks @jugglinmike! The goal of establishing symbolic relationships (Single Source of Truth) is fantastic and definitely the right direction for maintainability over time.

I notice the RFC leans toward a string-based micro-syntax. I'd love for us to consider a hybrid schema that supports both simple strings and objects.

Example

We keep simple strings for standard paths (no visual tax), but allow objects when a rule needs metadata (like an exclusion).

```yaml
features:
- name: alerts
  files:
  - path: ./*
    exclude_ids: # Standard YAML list, zero ambiguity
    - print
    - logging
- name: print
  files:
  - ./print-* # Simple file path can stay a simple string!
```

Compared to the original:

```yaml
features:
- name: alerts
  files:
  - ./*
  - "!#print"
  - "!#logging"
```

One major benefit:
|
|
Hi @jcscottiii! Thanks for your feedback!

One novel aspect of your proposal is that it scopes exclusions to individual path patterns. While I think that could be tenable, it's not a capability that we've specifically felt a need for. It sounds like we're aligned on prioritizing human authors/readers. To that end, I think "scoped" exclusions may make these rules more difficult to understand since they would effectively introduce a grouping operator that hasn't been motivated by experience. (Well, not our experience, anyway. I'd be happy to hear about any instances where you wanted it!)

How would you feel about expressing exclusions with a standalone object so that each list item could be either a string value or a dict with a single key (namely, `exclude_ids`)?

```diff
 features:
 - name: alerts
   files:
-  - path: ./*
-    exclude_ids: # Standard YAML list, zero ambiguity
+  - ./*
+  - exclude_ids: # Standard YAML list, zero ambiguity
     - print
     - logging
 - name: print
   files:
   - ./print-* # Simple file path can stay a simple string!
```

Anecdotally, I've only observed a small number (read: 1 to 3) of exclusions per feature entry, so a nested list like

```diff
 features:
 - name: alerts
   files:
-  - path: ./*
-    exclude_ids: # Standard YAML list, zero ambiguity
-    - print
-    - logging
+  - ./*
+  - exclude_id: print # Standard YAML string, zero ambiguity
+  - exclude_id: logging # Standard YAML string, zero ambiguity
 - name: print
   files:
   - ./print-* # Simple file path can stay a simple string!
```

...but I don't feel this flattening would have a huge impact on ergonomics, so I could go either way!

In any case (in your proposal and in my suggested amendments), the
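For readers who want to see how the flattened `exclude_id` form might be evaluated, here is a minimal Python sketch. It is an illustration under assumptions, not WPT's implementation: it assumes shell-style globbing via `fnmatch`, it assumes an exclusion removes every file that the named feature matches, and the `logging` entry with its `log-*` pattern, the `files_for` helper, and the file names are all hypothetical.

```python
from fnmatch import fnmatchcase

# Parsed form of the flattened proposal. The "logging" entry and its
# "log-*" pattern are hypothetical; the thread only shows "alerts" and "print".
FEATURES = [
    {"name": "alerts",
     "files": ["./*", {"exclude_id": "print"}, {"exclude_id": "logging"}]},
    {"name": "print", "files": ["./print-*"]},
    {"name": "logging", "files": ["./log-*"]},
]


def files_for(name, features, all_files):
    """Return the files matched by a feature, minus files of excluded features."""
    entry = next(f for f in features if f["name"] == name)
    # Plain strings are patterns; drop the leading "./" so fnmatch can compare
    # against bare file names.
    patterns = [item[2:] if item.startswith("./") else item
                for item in entry["files"] if isinstance(item, str)]
    # Single-key dicts are exclusions that point at another feature by id.
    excluded = [item["exclude_id"]
                for item in entry["files"] if isinstance(item, dict)]
    matched = {f for f in all_files if any(fnmatchcase(f, p) for p in patterns)}
    for other in excluded:
        # Assumes exclusions never form a cycle; a real parser would need to check.
        matched -= files_for(other, features, all_files)
    return matched


FILES = ["dialog.html", "print-001.html", "log-basic.html"]
print(sorted(files_for("alerts", FEATURES, FILES)))
```

Note that answering "which features does this one file belong to?" with this model means resolving every referenced feature first, which is the readability cost discussed in this thread.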
|
A bit more context for the benefit of tomorrow's RFC triage call:

Since this RFC is about streamlining the metadata, here's an approximation of its impact. While the syntax is still under consideration, that patch shows the order of magnitude: "130 files changed, 275 insertions(+), 791 deletions(-)"

Until recently, WPT didn't formally document the

Looking forward to discussing more tomorrow!
|
Yes. Mapping feature ids to filename patterns makes sense if you are trying to work out "which files are part of this feature". You find the id you care about and then evaluate the given rules against all the files in the directory. That is indeed a common use case for this data, but critically it's a use case that's basically always automated.

On the other hand, if you have one file and want to know what features it will correspond to, you have to evaluate the full set of rules. That's a use case you have when adding a file, when you want to know if it will be correctly labelled or if a rule update is required. Typically that use case isn't automated.

So if we want to optimise for the latter, we should look to make it as easy as possible to figure out what features correspond to a given file. In that case, having the rules be a list of patterns that could apply, and stopping on the first that does apply, seems likely to be much easier to work with e.g.

There is arguably a bug there that a file which has both and just assuming that people can follow a naming convention rather than requiring a perfectly general syntax.

In particular, I think that once you start making things additive you end up back at the starting point for this RFC, which is "we need to invent an exclusion mechanism so that some rules don't apply in some cases", and then you're back at it being really hard for a human to correctly deduce the impact of the rules.
|
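As a rough illustration of the "stop on the first pattern that applies" idea, here is a minimal Python sketch. It assumes shell-style globbing via `fnmatch`, which may not be what WPT's tooling actually uses; `RULES` and `features_for` are hypothetical names, and the rule data borrows the `foo.html`, `print-*` and `*` patterns that appear later in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered rules: (filename pattern, feature ids). An empty list
# means "this file matches, but belongs to no feature".
RULES = [
    ("foo.html", []),
    ("print-*", ["print"]),
    ("*", ["alerts"]),
]


def features_for(filename, rules):
    """Return the feature ids of the first rule whose pattern matches."""
    for pattern, ids in rules:
        if fnmatchcase(filename, pattern):
            return ids  # First match wins; later rules are never consulted.
    return []  # No rule matched at all.


for name in ["foo.html", "print-001.html", "dialog.html"]:
    print(name, features_for(name, RULES))
```

A human can follow the same procedure by eye: read the list top to bottom and stop at the first line that fits the file name, with no exclusion mechanism to keep in mind.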
I believe that we can achieve that goal without any changes to the schema. Since all file-matching patterns are strictly ordered in the current design (via lists of lists), we can simply change the way the files are interpreted (namely by adding strict precedence). Just like in the flat structure that @jgraham sketched out above, this will obviate the need for the

This might be worth considering because the flat structure would introduce more repetition in entries that hold a lot of patterns. The pathological case is

Mean: 2.3628988642509463

```mermaid
xychart
    title "File patterns per web-feature entry"
    x-axis "# of file patterns" [1, 2, 3, 4, 5, 6, 7, 8, 9+]
    y-axis "# of web-features" 1 --> 869
    bar [869, 748, 78, 43, 27, 17, 12, 4, 51]
```

Grist for the mill, in any case!
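To make the shape of that distribution concrete, the following Python sketch totals the bar heights from the chart. It assumes each bar counts the web-feature entries that hold that many file patterns, with the last bar covering nine or more; the variable names are hypothetical.

```python
# Bar heights copied from the chart; position i is read as "entries with
# i + 1 patterns", and the final bucket covers nine or more.
counts = [869, 748, 78, 43, 27, 17, 12, 4, 51]

total_entries = sum(counts)
one_or_two = counts[0] + counts[1]
share = one_or_two / total_entries

print(f"{total_entries} entries, {share:.1%} with at most two patterns")
```

Under that reading, 1,617 of 1,849 entries (roughly 87%) hold one or two patterns, so extra repetition in a flat structure would mostly affect a small tail of entries.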
|
@jcscottiii @jgraham @gsnedders I've updated the proposal to reflect the consensus of the latest RFCs & Infrastructure meeting. |
jcscottiii left a comment
This looks a lot better. I have some questions. Feel free to post your thoughts on them.
```yaml
features:
- foo.html: NULL
- print-*: print
- "*": alerts
```
The old schema used explicit properties (name and files). Because of this, it was trivial to add new metadata fields in the future (e.g., bug_url, reason) without breaking the parser.
We could change it to be something like:
```yaml
features:
- pattern: "print-*"
  features: ["print"]
- pattern: "*"
  features: ["alerts"]
- pattern: "foo.html"
  features: []
  reason: "Excluded due to flakiness" # Easily extensible in the future. And not just rely on yaml comments that don't make it into the metadata
```
If we feel that future extensions are sufficiently likely (and if we think that another schema revision would be sufficiently disruptive), then I would vote for keeping the existing design as-is. As noted in this latest version of the RFC, the schema currently in use is expressive enough to implement the new semantics. And as you note, the schema currently in use also supports extension. In addition, it is the most concise out of all the designs we've considered thus far.
That said, I've read in @jgraham's feedback a general interest in promoting usability for contributors today. The latest proposal seems optimized for that. It might be easier to justify a design that favors future work if we had some indication about the likelihood/timeline for that work.
If we wanted to extend it we could do it like

```yaml
features:
- print-*:
    features: ["print"]
    reason: "All the print reftests"
```

That's slightly less verbose in any case and would continue to allow the optimisation where the right-hand side can just be a string or a list, which is interpreted as identical to an object with just a "features" key.
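The "string or list is shorthand for an object" idea could be handled by a small normalization step after YAML parsing. The Python sketch below is one possible shape, not the RFC's parser: `normalize` is a hypothetical helper, it operates on already-parsed values, and treating a YAML null as "no features" is an assumption drawn from the `foo.html: NULL` example.

```python
def normalize(value):
    """Expand a rule's right-hand side into an object with a "features" key."""
    if value is None:
        # Assumption: a YAML null means the file belongs to no feature.
        return {"features": []}
    if isinstance(value, str):
        return {"features": [value]}
    if isinstance(value, list):
        return {"features": list(value)}
    if isinstance(value, dict):
        # Already in extended form; keep extra keys such as "reason".
        expanded = dict(value)
        expanded["features"] = list(value.get("features", []))
        return expanded
    raise TypeError(f"unsupported rule value: {value!r}")


print(normalize("print"))
print(normalize({"features": ["print"], "reason": "All the print reftests"}))
```

With this step in place, the rest of a consumer only ever sees the extended form, so the shorthand costs the parser a few branches while keeping the common case terse for authors.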
I like @jgraham's version of this. @jugglinmike, thoughts on adopting it? I am in favor of this. For now, features would be the only key.
@jgraham's suggestion is an extension to the syntax currently proposed by the RFC, not an alternative to it. I'm in favor of supporting an extension because it would mean we would only pay the price of overhead when we need it, and because I still don't have any insight into the likelihood/timeline of additional metadata.
Although making that syntax the only option would simplify the parser, it would still increase verbosity well beyond what our current needs require. I feel that the existing syntax strikes a much stronger balance between parser simplicity, maintainability, and concision.
In any case, I think all extant proposals which include a property name for the list of web-feature IDs should be revised. In the case of the existing syntax, the property name is "name". It should instead be "ids" both to reflect the plurality of the value and to align with the WebDX terminology (where a web-feature's "name" is distinct from its "ID"). The name "ids" is especially appealing in this latest alternative because it avoids using the same property for two distinct parts of the data structure.
It seems like we're close; here are the options that are on the table right now:

1. keep the existing syntax (but replace "name" with "ids"):
   ```yaml
   features:
   - ids: []
     files:
     - foo.html
   - ids: [print]
     files:
     - print-*
   ```
2. use the abbreviated syntax with an optional extension:
   ```yaml
   features:
   - foo.html: []
   - print-*:
       ids: [print]
   ```
3. use the extended form of the abbreviated syntax only:
   ```yaml
   features:
   - foo.html:
       ids: []
   - print-*:
       ids: [print]
   ```

My preference is option 2 followed by option 1 followed by option 3. What do you folks think?
We can go with option 2.
For background: Initially I was in favor of option 3 because I was looking ahead to increased automation (e.g., tooling like wpt-gen or LLMs). Consistent schemas (always objects) are generally easier for tools to write and modify programmatically without bugs.
Since these YAML files are internal to WPT infrastructure and changes here won't impact the final consumers of WEB_FEATURES_MANIFEST.json, I felt we had the flexibility to prioritize that tool-friendliness.
But after thinking about it, I agree with what you said that keeping the barrier to entry low for human contributors is really important (which option 2 does).
One more thing I wanted to point out: I like that you suggested changing name to ids. For option 2, maybe we could do the same for the top-level key and change it to files or rules. Either is okay, but since the underlying list now essentially describes files or rules, either name makes more sense than features.

```yaml
files: # or rules here.
- foo.html: [print, alerts]
- print-*:
    ids: [print]
    reason: "All the print reftests"
```
Awesome!
Swapping out "features" makes a lot of sense to me! I like "rules" more than "files" because the values describe more than just files.
This conversation about extensibility is worth highlighting. I've tried to do that by framing it as a risk within the RFC itself: how the proposal appears to be less extensible, but how we've considered the direction that future additions could take. This will help guide our grandchildren when they find the time to add metadata.
Thanks @jcscottiii! Thanks @jgraham!
jcscottiii left a comment
@jugglinmike Sorry the follow-up review took so long. PTAL at the unresolved conversations.
> deviation of approximately 5.7487. This reduces the benefit of optimizing for
> the case of web-features with many associated file-matching patterns.
>
> ### Extendability
Thanks for adding this section! I get that this is a risk. But supporting both the shorthand and the extended form out of the box will be good in the long term!