registry: auto-generate 475 entity stubs from disclosure data by kurtseifried · Pull Request #83 · CloudSecurityAlliance/SecID

kurtseifried · 2026-05-23T02:41:07Z

Closes the disclosure→entity coverage gap. 486 disclosure entries existed but only 11 had parallel entity entries — this PR ships 475 auto-generated entity stubs.

What

`scripts/generate-entity-stubs-from-disclosure.py` (new):

- Reads each `registry/disclosure/.json`
- Skips if any entity entry already exists for that namespace
- Emits a parallel entity stub at the mirrored path with:
  - `namespace`, `official_name`, `common_name`, `alternate_names` copied
  - Website URL copied (CNA-specific URLs like disclosure-policy stay on the disclosure record)
  - `wikidata`, `wikipedia` preserved (usually null in source data)
  - `notes` cross-referencing `secid:disclosure//cna`
  - `status: "draft"` with `status_notes` flagging the auto-gen and empty match_nodes
  - `match_nodes: []` (the disclosure data doesn't tell us specific products)
- `--dry-run` flag for preview
- Idempotent: re-running emits zero stubs if all disclosure entries have entity coverage
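A minimal sketch of the per-record transformation listed above, in Python to match the script. It is not the actual contents of `generate-entity-stubs-from-disclosure.py`: the `urls` shape (a list of objects with a `type` key), the `"website"` type value, reusing the disclosure record's `schema_version`, and the exact `status_notes`/`notes` wording are all assumptions.

```python
from typing import Any

# Assumption: the URL category used for a vendor's main website in
# disclosure records. The real registry schema may name this differently.
WEBSITE_URL_TYPE = "website"


def build_entity_stub(disclosure: dict[str, Any]) -> dict[str, Any]:
    """Build a draft entity stub from one parsed disclosure record (sketch)."""
    ns = disclosure["namespace"]
    # Keep only the website URL; CNA-specific URLs (e.g. disclosure policy)
    # stay on the disclosure record.
    website_urls = [
        u for u in disclosure.get("urls", [])
        if u.get("type") == WEBSITE_URL_TYPE
    ]
    return {
        # Assumption: the stub reuses the disclosure record's schema_version.
        "schema_version": disclosure.get("schema_version"),
        "namespace": ns,
        "type": "entity",
        "status": "draft",
        "status_notes": (
            "Auto-generated from disclosure data; match_nodes are empty "
            "pending per-vendor research."
        ),
        "official_name": disclosure.get("official_name"),
        "common_name": disclosure.get("common_name"),
        "alternate_names": list(disclosure.get("alternate_names") or []),
        "urls": website_urls,
        # Usually null in the source data; copied as-is.
        "wikidata": disclosure.get("wikidata"),
        "wikipedia": disclosure.get("wikipedia"),
        # Cross-reference to the companion disclosure record.
        "notes": f"Vulnerability reporting program: secid:disclosure/{ns}/cna",
        # Disclosure data names no products, so nothing to pattern-match yet.
        "match_nodes": [],
    }
```

Keeping the function pure (record in, record out) leaves file walking, `--dry-run`, and writing to a separate step, which makes the field mapping easy to test on its own.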

Bulk result: 475 new entity files created across the appropriate reverse-DNS directories.

Test plan

- All 690 entity JSON files (475 new + 215 existing) parse cleanly
- All 475 new entries have required schema fields (`schema_version`, `namespace`, `type`, `status`, `official_name`) and `type: entity`
- `scripts/validate-subtypes.py` passes: none of the stubs introduce new subtype values (match_nodes are empty)
- After merge + auto-deploy: `curl 'https://secid.cloudsecurityalliance.org/api/v1/resolve?secid=secid:entity/adobe.com'` returns the new Adobe entity entry
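The first two test-plan checks can be sketched as a walk over the entity tree that parses each file and checks the required fields listed above plus `type: entity`. This is a hypothetical helper, not the repository's validator; it applies the field check to every file rather than only the 475 new ones.

```python
import json
from pathlib import Path
from typing import Any

REQUIRED_FIELDS = ("schema_version", "namespace", "type", "status", "official_name")


def check_entity(data: Any) -> list[str]:
    """Return the problems found in one parsed entity record (empty = OK)."""
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in data]
    if data.get("type") != "entity":
        problems.append(f"type is {data.get('type')!r}, expected 'entity'")
    return problems


def check_entity_tree(root: Path) -> dict[Path, list[str]]:
    """Parse every *.json under root and collect problems per file."""
    report: dict[Path, list[str]] = {}
    for path in sorted(root.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            report[path] = [f"parse error: {exc}"]
            continue
        problems = check_entity(data)
        if problems:
            report[path] = problems
    return report
```

Calling `check_entity_tree(Path("registry/entity"))` (the root path is assumed from the mirrored layout) and getting an empty dict back would correspond to the "parse cleanly" and "required fields" items passing.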

Notes on the result

The stubs are intentionally minimal. Future human research per vendor will populate `match_nodes` with product/service patterns. The stub just makes the entity citable by namespace.
The reference-type coverage gap remains. 397 disclosure entries still lack reference entries; that's a separate problem because "what's worth referencing" needs per-vendor judgement (which blog? which advisory archive? which whitepapers page?) that doesn't auto-generate cleanly.
Why `match_nodes: []` is OK: existing entity entries like `amazon.com` already use this pattern (namespace-level metadata only, no sub-resolution), and such an entry still resolves at `secid:entity/`.
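The coverage numbers here are set differences over namespaces: 486 disclosure namespaces minus the 11 with entity entries gives the 475 stubs, and 486 minus the 89 with reference entries (per the commit message) gives the 397 reference gaps. A small sketch of that audit, assuming coverage is counted by namespace as the generator's skip rule does; `coverage_report` is a hypothetical name, not a script in the repository.

```python
from collections.abc import Iterable


def coverage_report(
    disclosure_ns: Iterable[str], **other_types: Iterable[str]
) -> dict[str, tuple[int, int]]:
    """For each other registry type, return (covered, missing) counts
    relative to the set of disclosure namespaces."""
    disclosure = set(disclosure_ns)
    report: dict[str, tuple[int, int]] = {}
    for type_name, namespaces in other_types.items():
        # Namespaces outside the disclosure set are ignored entirely.
        covered = disclosure & set(namespaces)
        report[type_name] = (len(covered), len(disclosure - covered))
    return report
```

Run against the pre-PR registry, this would be expected to report `(11, 475)` for entity and `(89, 397)` for reference, matching the figures in this PR.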

🤖 Generated with Claude Code

Closes the disclosure->entity coverage gap audited earlier: 486 disclosure entries (mostly CNAs) existed but only 11 had parallel entity entries. This commit ships 475 auto-generated entity stubs, one for each disclosure namespace that didn't already have an entity record.

Each stub carries:

- namespace, official_name, common_name, alternate_names, urls (website only; CNA-specific URLs like disclosure-policy stay on the disclosure record where they belong)
- wikidata, wikipedia (usually null in the source disclosure data; preserved as-is for opportunistic future population)
- notes pointing at the companion secid:disclosure/<ns>/cna for the vulnerability-reporting program
- status: "draft" with status_notes explicitly marking the stub as auto-generated and flagging that match_nodes are empty pending human research
- match_nodes: [] (the disclosure data doesn't tell us specific products or services to pattern-match; that's per-vendor research)

scripts/generate-entity-stubs-from-disclosure.py (new):

- Idempotent: re-running emits zero stubs if all disclosure entries already have entity coverage
- Inventories existing entity files by namespace (not just by path) so an entity record at a non-mirror path still skips creation
- --dry-run flag for preview
- Preserves disclosure-side directory structure under entity/

Validation:

- All 690 entity JSON files (475 new + 215 existing) parse cleanly
- All 475 new entries have the required schema fields and type: entity
- scripts/validate-subtypes.py passes (none of the stubs introduce new subtype values)

Future work (not in this PR):

- Per-vendor human research to populate match_nodes with product patterns (e.g., for Adobe, entries for Reader, Acrobat, Photoshop...)
- Reference-type coverage is separately uneven (89 of 486 disclosure entries have parallel reference entries); flagged but not addressed here, because reference content requires per-vendor judgement (which blog, which advisory page) that doesn't auto-generate cleanly

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
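The two path-related points above (inventory by namespace, mirror the disclosure directory structure) can be sketched as a planning step that runs before any file is written. All names are hypothetical, and the reverse-DNS layout in the test data (`com/adobe.json`) is illustrative only, not the registry's actual file naming.

```python
import json
from pathlib import Path, PurePosixPath
from typing import Any


def load_tree(root: Path) -> dict[PurePosixPath, dict[str, Any]]:
    """Map each *.json path (relative to root) to its parsed record."""
    return {
        PurePosixPath(p.relative_to(root).as_posix()): json.loads(
            p.read_text(encoding="utf-8")
        )
        for p in root.rglob("*.json")
    }


def plan_stubs(
    disclosure: dict[PurePosixPath, dict[str, Any]],
    entity: dict[PurePosixPath, dict[str, Any]],
) -> list[PurePosixPath]:
    """Return relative paths of entity stubs to create.

    Each returned path mirrors its disclosure file, so the caller writes
    the stub to entity_root / path.
    """
    # Inventory by namespace, not by path: an entity record filed at a
    # non-mirror path still counts as coverage.
    covered = {rec.get("namespace") for rec in entity.values()}
    to_create: list[PurePosixPath] = []
    for rel_path in sorted(disclosure):
        ns = disclosure[rel_path].get("namespace")
        if ns in covered:
            continue
        to_create.append(rel_path)
        # Guard against two disclosure files sharing one namespace.
        covered.add(ns)
    return to_create
```

Because coverage comes from file contents rather than file paths, a second run after the stubs are written plans nothing, which is the idempotence property listed above; a `--dry-run` mode would print this plan instead of writing it.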

- Update stale namespace counts (1,151 -> 1,768) in three places; table refreshed by scripts/update-counts.sh.
- Add five scripts missing from the Scripts section (generate-entity-stubs-from-disclosure, check-security-txt, scan-well-known, scan-mcp-endpoints, validate-subtypes).
- Document the auto-generated-entity-stub pattern introduced in PR #83 so future contributors know thin entity files are deliberate.
- Trim Repository Structure tree (45 -> 14 lines); collapse the exhaustive docs/ enumeration to one line pointing at the Document Map above, which already routes by question.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

kurtseifried merged commit 892e241 into main May 23, 2026
1 check passed

kurtseifried deleted the feat/entity-stubs-from-disclosure branch May 23, 2026 02:41

This was referenced May 23, 2026

- Resolver returns empty results for namespace-level entity queries with empty match_nodes CloudSecurityAlliance/SecID-Service#11 (Open)
- CLAUDE.md: refresh counts, document new scripts, trim repo tree #84 (Merged)

kurtseifried merged 1 commit into main from feat/entity-stubs-from-disclosure
