registry: auto-generate 475 entity stubs from disclosure data #83
Merged
Closes the disclosure->entity coverage gap audited earlier: 486 disclosure entries (mostly CNAs) existed but only 11 had parallel entity entries. This commit ships 475 auto-generated entity stubs, one for each disclosure namespace that didn't already have an entity record.

Each stub carries:
- namespace, official_name, common_name, alternate_names, urls (website only; CNA-specific URLs like disclosure-policy stay on the disclosure record where they belong)
- wikidata, wikipedia (usually null in the source disclosure data; preserved as-is for opportunistic future population)
- notes pointing at the companion secid:disclosure/<ns>/cna for the vulnerability-reporting program
- status: "draft" with status_notes explicitly marking the stub as auto-generated and flagging that match_nodes are empty pending human research
- match_nodes: [] (the disclosure data doesn't tell us specific products or services to pattern-match; that's per-vendor research)

scripts/generate-entity-stubs-from-disclosure.py (new):
- Idempotent: re-running emits zero stubs if all disclosure entries already have entity coverage
- Inventories existing entity files by namespace (not just by path) so an entity record at a non-mirror path still skips creation
- --dry-run flag for preview
- Preserves disclosure-side directory structure under entity/

Validation:
- All 690 entity JSON files (475 new + 215 existing) parse cleanly
- All 475 new entries have the required schema fields and type: entity
- scripts/validate-subtypes.py passes (none of the stubs introduce new subtype values)

Future work (not in this PR):
- Per-vendor human research to populate match_nodes with product patterns (e.g., for Adobe, entries for Reader, Acrobat, Photoshop...)
- Reference-type coverage is separately uneven (89 of 486 disclosure entries have parallel reference entries); flagged but not addressed here, because reference content requires per-vendor judgement (which blog, which advisory page) that doesn't auto-generate cleanly

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
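As a rough illustration of the stub shape listed above, the following sketch builds one stub from one disclosure record. The output field names come from the commit message. The input layout is an assumption: in particular, the sketch treats `urls` as a list of objects with a `type` key, and the function name `build_entity_stub` and the wording of `notes` and `status_notes` are hypothetical rather than taken from the actual script.

```python
def build_entity_stub(disclosure: dict) -> dict:
    """Build a draft entity stub from one disclosure record.

    Assumption: `disclosure["urls"]` is a list of dicts with a "type" key.
    The real disclosure schema may differ.
    """
    ns = disclosure["namespace"]
    # Keep only the website URL; CNA-specific URLs stay on the disclosure record.
    website_urls = [u for u in disclosure.get("urls", []) if u.get("type") == "website"]
    return {
        "type": "entity",
        "namespace": ns,
        "official_name": disclosure.get("official_name"),
        "common_name": disclosure.get("common_name"),
        "alternate_names": disclosure.get("alternate_names", []),
        "urls": website_urls,
        # Usually null in the source data; preserved as-is.
        "wikidata": disclosure.get("wikidata"),
        "wikipedia": disclosure.get("wikipedia"),
        # Hypothetical wording; only the secid reference is from the PR.
        "notes": f"Vulnerability-reporting program: secid:disclosure/{ns}/cna",
        "status": "draft",
        "status_notes": "Auto-generated stub; match_nodes empty pending human research.",
        "match_nodes": [],
    }
```

Leaving `match_nodes` as an empty list rather than omitting the key keeps every stub schema-complete while making the missing research easy to find later.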
kurtseifried added a commit that referenced this pull request on May 28, 2026
- Update stale namespace counts (1,151 -> 1,768) in three places; table refreshed by scripts/update-counts.sh.
- Add five scripts missing from the Scripts section (generate-entity-stubs-from-disclosure, check-security-txt, scan-well-known, scan-mcp-endpoints, validate-subtypes).
- Document the auto-generated-entity-stub pattern introduced in PR #83 so future contributors know thin entity files are deliberate.
- Trim Repository Structure tree (45 -> 14 lines); collapse the exhaustive docs/ enumeration to one line pointing at the Document Map above, which already routes by question.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Closes the disclosure→entity coverage gap. 486 disclosure entries existed but only 11 had parallel entity entries — this PR ships 475 auto-generated entity stubs.
What
`scripts/generate-entity-stubs-from-disclosure.py` (new):
Bulk result: 475 new entity files created across the appropriate reverse-DNS directories.
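The idempotency and inventory-by-namespace behaviour described for the script can be sketched as follows. This is a minimal sketch under stated assumptions, not the script itself: it assumes each disclosure and entity record is a JSON file with a top-level `namespace` key, and the function names, the `dry_run` parameter, and the reduced stub body are all hypothetical.

```python
import json
from pathlib import Path


def existing_namespaces(entity_root: Path) -> set[str]:
    """Collect namespaces from file contents, so non-mirror paths still count."""
    found: set[str] = set()
    if not entity_root.is_dir():
        return found
    for path in entity_root.rglob("*.json"):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable files cannot claim coverage
        if isinstance(record, dict) and record.get("namespace"):
            found.add(record["namespace"])
    return found


def generate_stubs(disclosure_root: Path, entity_root: Path, dry_run: bool = False) -> list[Path]:
    """Return the stub paths that were (or, in a dry run, would be) created."""
    covered = existing_namespaces(entity_root)
    created: list[Path] = []
    for src in sorted(disclosure_root.rglob("*.json")):
        record = json.loads(src.read_text(encoding="utf-8"))
        ns = record.get("namespace")
        # Mirror the disclosure-side directory structure under entity/.
        dest = entity_root / src.relative_to(disclosure_root)
        if not ns or ns in covered or dest.exists():
            continue
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Reduced stub body for illustration only.
            stub = {"type": "entity", "namespace": ns, "status": "draft", "match_nodes": []}
            dest.write_text(json.dumps(stub, indent=2) + "\n", encoding="utf-8")
        covered.add(ns)
        created.append(dest)
    return created
```

Because coverage is keyed on the `namespace` value rather than the file path, a second run finds every namespace already covered and returns an empty list, which matches the zero-stub re-run behaviour described in the commit message.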
Test plan
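The validation reported in the commit message (every entity JSON file parses, and every stub has the required fields and `type: entity`) could be checked along the following lines. This is a sketch only: the `REQUIRED_FIELDS` tuple is an assumed subset of the schema, and `validate_entity_files` is a hypothetical helper, not one of the repository's scripts.

```python
import json
from pathlib import Path

# Assumed subset of the required schema fields; the real schema may list more.
REQUIRED_FIELDS = ("namespace", "official_name", "status", "match_nodes")


def validate_entity_files(entity_root: Path) -> list[str]:
    """Return one human-readable problem string per failed check."""
    problems: list[str] = []
    for path in sorted(entity_root.rglob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"{path}: top-level value is not an object")
            continue
        if record.get("type") != "entity":
            problems.append(f"{path}: type is not 'entity'")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append(f"{path}: missing fields {', '.join(missing)}")
    return problems
```

An empty return value means every file under the given directory passed all three checks.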
Notes on the result
🤖 Generated with Claude Code