Add 247 entity stubs from CSA public member list #87
Merged
New ingestion track parallel to the CNA disclosure-stub generator. Reads the CSA website member roster (entity_name + domain) and emits draft entity stubs for members with no existing entity record.

- scripts/generate-entity-from-csa-members.py: generator (domain cleanup for scraped junk, skip-by-namespace so existing/hand-curated files are never overwritten, deterministic output, --dry-run)
- 247 new registry/entity/**/*.json: status draft, empty match_nodes, same fidelity as the existing CNA stubs
- Generated untagged (no subtype:) to match the rest of the entity registry; the entity subtype catalog is still being designed. Provenance notes flag them as candidates for ["organization","company"] in a later sweep.
- CSA membership itself is intentionally NOT encoded: that is a relationship-layer fact, not an intrinsic entity property.

Skipped: 44 namespaces already present (untouched), 3 in-CSV duplicate domains, 10 rows with no usable domain (each needs a domain filled in by hand).

All 2015 registry files validate against the schema; the subtype check is clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
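The domain cleanup and skip rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/generate-entity-from-csa-members.py: the helper names (clean_domain, plan_rows, Plan), the exact set of zero-width characters, and the assumption that an entity's namespace equals its cleaned domain are all hypothetical. It also assumes the first occurrence of a domain wins and later repeats count as in-CSV duplicates.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple

# Zero-width / BOM characters that often survive copy-paste from web pages (assumed set).
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"), None)
_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://")
_HOSTNAME = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)+")


def clean_domain(raw: str) -> Optional[str]:
    """Normalize a scraped domain cell; return None if nothing usable remains."""
    s = raw.translate(_ZERO_WIDTH).strip().lower()
    tokens = s.split()
    if not tokens:
        return None
    # Keep the first token only: "huaweicloud.com http:" -> "huaweicloud.com".
    s = _SCHEME.sub("", tokens[0])
    # Drop any path and a trailing dot, then a leading "www.".
    s = s.split("/", 1)[0].rstrip(".")
    if s.startswith("www."):
        s = s[len("www."):]
    return s if _HOSTNAME.fullmatch(s) else None


@dataclass
class Plan:
    create: List[Tuple[str, str]] = field(default_factory=list)
    existing: List[str] = field(default_factory=list)
    duplicate: List[str] = field(default_factory=list)
    no_domain: List[str] = field(default_factory=list)


def plan_rows(rows: Iterable[Tuple[str, str]], existing_namespaces: Set[str]) -> Plan:
    """Classify (entity_name, domain) rows. Hypothetical: namespace == cleaned domain."""
    plan = Plan()
    seen: Set[str] = set()
    for name, raw in rows:
        domain = clean_domain(raw)
        if domain is None:
            plan.no_domain.append(name)  # needs a domain filled in by hand
        elif domain in seen:
            plan.duplicate.append(name)  # repeated domain within the CSV
        else:
            seen.add(domain)
            if domain in existing_namespaces:
                plan.existing.append(domain)  # existing/hand-curated file stays untouched
            else:
                plan.create.append((name, domain))
    return plan
```

Each bucket of Plan lines up with one of the skip counts reported above; only plan.create would go on to produce new stub files.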
What
Adds 247 new entity stub records generated from the CSA public member roster (csa-website-members-2026-06-08.csv), plus a reusable generator script.

This is a third entity-ingestion track alongside the existing CNA disclosure-stub generator and hand-curated entries. The CSA member list is a curated, security-relevant roster (overwhelmingly commercial vendors) that fills gaps the CNA-derived stubs miss: for example, members such as Abnormal AI, A-LIGN, and AVEVA had no entity record at all.
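Turning roster rows into stub files with deterministic output and a dry-run mode could look like the sketch below. Only status "draft", the empty match_nodes, and the entity_name/domain CSV columns come from this PR; the other stub fields, the registry/entity/<domain>.json path layout, and the helper names (read_roster, make_stub, render, stub_path, write_stubs) are hypothetical and do not reflect the actual registry schema. It assumes domains have already been cleaned and deduplicated.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Tuple

SOURCE_CSV = "csa-website-members-2026-06-08.csv"


def read_roster(csv_path) -> List[Tuple[str, str]]:
    """Read (entity_name, domain) pairs from the roster CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row["entity_name"], row["domain"]) for row in csv.DictReader(f)]


def make_stub(name: str, domain: str) -> Dict:
    """Build a draft stub; fields other than status/match_nodes are hypothetical."""
    return {
        "name": name,
        "domain": domain,
        "status": "draft",
        "match_nodes": [],  # no match patterns yet, so nothing to regex-check
        "provenance": {
            "source": SOURCE_CSV,
            "note": 'Untagged; candidate for ["organization","company"] in a later sweep.',
        },
    }


def render(stub: Dict) -> str:
    """Sorted keys, fixed indent, trailing newline: reruns produce byte-identical files."""
    return json.dumps(stub, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def stub_path(root: Path, domain: str) -> Path:
    # Hypothetical layout: <root>/registry/entity/<domain>.json
    return root / "registry" / "entity" / f"{domain}.json"


def write_stubs(root: Path, members: List[Tuple[str, str]], dry_run: bool) -> List[Path]:
    """Write one stub per member in domain order; return the paths written (or planned)."""
    planned = []
    for name, domain in sorted(members, key=lambda m: m[1]):
        path = stub_path(root, domain)
        if path.exists():
            continue  # never overwrite an existing or hand-curated file
        planned.append(path)
        if not dry_run:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(render(make_stub(name, domain)), encoding="utf-8")
    return planned
```

A --dry-run flag would map to dry_run=True here: the planned paths are reported, but nothing is written to disk.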
How
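scripts/generate-entity-from-csa-members.py (parallel to generate-entity-stubs-from-disclosure.py):
- domain cleanup for scraped junk (www., zero-width chars, trailing protocol junk like "huaweicloud.com http:")
- skip-by-namespace, so existing/hand-curated files are never overwritten
- deterministic output
- --dry-run

Design notes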
- Generated untagged (no subtype:) to match the current state of the entity registry. The entity subtype catalog (organization/product/service/model/dataset + org-nature tags) is still being designed; provenance notes flag these as prime candidates for ["organization","company"] in a later tagging sweep.
- Stubs are drafts with no match nodes (match_nodes: []), so there are no regex-safety considerations.

Validation
Both CI checks pass locally:
- validate-registry-schema.py → all 2015 files validate against the schema
- validate-subtypes.py → clean (new files add no subtypes)

Follow-ups (not in this PR)
🤖 Generated with Claude Code