Add OWASP AI resource importers #858
Screenshot of OWASP Top 10 2021 (already added):
Addition: This PR adds OWASP families with reference to issue #471: OWASP Top 10 2025 (https://owasp.org/Top10/2025/), OWASP API Security Top 10 2023 (https://owasp.org/API-Security/editions/2023/en/), OWASP Top 10 for LLM and GenAI Apps 2025 (https://genai.owasp.org/llmrisk/), and OWASP AI Security Verification Standard (https://github.com/OWASP/AISVS/tree/main/1.0/en/).
Screenshot of OWASP Top 10 2025:
Screenshot of OWASP API Security Top 10 2023:
Screenshot of OWASP Top 10 for LLM and GenAI Apps 2025:
Screenshot of OWASP AI Security Verification Standard (AISVS):
Requesting kind reviews and feedback for this feature from: @northdpole, @Pa04rth, @robvanderveer
Walkthrough
This PR adds parser implementations for four OWASP security standards (AISVS 1.0, API Top 10 2023, LLM Top 10 2025, Top 10 2025), each with bundled JSON data, a parser class that converts standards to CRE links, and unit tests. The cheatsheets parser is updated to use official OWASP URLs, and the CLI gains six import flags for the new standards.
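For orientation, here is a minimal sketch of what one bundled JSON entry and its conversion might look like. The field names (`section_id`, `section`, `hyperlink`, `cre_ids`) are taken from the parser sketch in the review comments on this PR. The `Standard` dataclass, the `load_entries` helper, and the sample values (including the placeholder CRE ID) are hypothetical stand-ins, not the project's `defs.Standard` or its real mapping data.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Standard:
    """Hypothetical stand-in for the project's defs.Standard."""

    name: str
    section_id: str
    section: str
    hyperlink: str
    cre_ids: list[str] = field(default_factory=list)


# Illustrative entry only; "000-000" is a placeholder, not a real CRE ID.
SAMPLE_JSON = """
[
  {
    "section_id": "LLM01",
    "section": "Prompt Injection",
    "hyperlink": "https://genai.owasp.org/llmrisk/",
    "cre_ids": ["000-000"]
  }
]
"""


def load_entries(raw: str, standard_name: str) -> list[Standard]:
    """Turn raw JSON text into Standard records, tolerating a missing cre_ids key."""
    return [
        Standard(
            name=standard_name,
            section_id=entry["section_id"],
            section=entry["section"],
            hyperlink=entry["hyperlink"],
            cre_ids=list(entry.get("cre_ids", [])),
        )
        for entry in json.loads(raw)
    ]


entries = load_entries(SAMPLE_JSON, "OWASP Top 10 for LLM and GenAI Apps 2025")
```

In the actual parsers, each ID in `cre_ids` would presumably be resolved against the CRE cache before a link is added; that lookup is left out here.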
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 4
🧹 Nitpick comments (3)
application/utils/external_project_parsers/parsers/owasp_aisvs.py (2)
13-45: ⚡ Quick win. Consider a shared base parser to remove duplication.
This parser is structurally identical to `owasp_llm_top10_2025.py` (and the API/Top10 2025 parsers in this stack), differing only in `name` and `data_file`. A small JSON-backed base class would collapse four copies of the load/convert/link loop into one and keep future OWASP datasets consistent.
♻️ Sketch of a shared base
```python
class _JsonStandardParser(ParserInterface):
    name: str
    data_file: Path

    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
        with self.data_file.open("r", encoding="utf-8") as handle:
            raw_entries = json.load(handle)
        entries = []
        for entry in raw_entries:
            standard = defs.Standard(
                name=self.name,
                sectionID=entry["section_id"],
                section=entry["section"],
                hyperlink=entry["hyperlink"],
            )
            for cre_id in entry.get("cre_ids", []):
                cres = cache.get_CREs(external_id=cre_id)
                if cres:
                    standard.add_link(
                        defs.Link(
                            ltype=defs.LinkTypes.LinkedTo,
                            document=cres[0].shallow_copy(),
                        )
                    )
            entries.append(standard)
        return ParseResult(
            results={self.name: entries},
            calculate_gap_analysis=False,
            calculate_embeddings=False,
        )
```
Each concrete parser then only declares `name` and `data_file`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_aisvs.py` around lines 13 - 45, The OwaspAisvs parser duplicates parsing logic present in other OWASP parsers (e.g., owasp_llm_top10_2025.py); refactor by extracting the JSON-driven loop into a small base class (suggested name _JsonStandardParser) that defines the parse method and expects subclasses to supply name and data_file attributes, then have OwaspAisvs inherit that base and only set name and data_file; ensure the base parse uses defs.Standard, defs.Link/LinkTypes, cache.get_CREs(external_id=...), standard.add_link(...), and returns ParseResult(... with calculate_gap_analysis=False and calculate_embeddings=False) so all concrete parsers collapse to just declaring those two attributes.
29-38: Fix the CRE lookup concern: `external_id` correctly resolves seeded CREs
`cache.get_CREs` supports `external_id` (it filters `CRE.external_id == external_id`), and `add_cre` persists `defs.CRE(id=...)` into `CRE.external_id` (it uses `external_id=cre.id`). So `cache.get_CREs(external_id=cre_id)` should retrieve the same CREs that were added with `id=...`, making the link creation at `owasp_aisvs.py` lines 29-38 consistent.
- Optional: reduce duplication across the near-identical OWASP parsers (likely via a shared base class/config since they differ mainly by name/data file).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_aisvs.py` around lines 29 - 38, The CRE lookup is correct: cache.get_CREs supports filtering by external_id and seeded CREs set CRE.external_id via add_cre (CRE.external_id = cre.id), so leave the loop in owasp_aisvs.py that calls cache.get_CREs(external_id=cre_id) and then standard.add_link(defs.Link(... ltype=defs.LinkTypes.LinkedTo, document=cres[0].shallow_copy())) intact; ensure you do not change the external_id parameter usage and, if desired, refactor duplicated logic across OWASP parsers into a shared helper or base class to reduce near-identical code (centralize the loop that calls cache.get_CREs, constructs defs.Link, and calls standard.add_link).
application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py (1)
13-47: 🏗️ Heavy lift. Consider extracting a shared base parser to remove duplication.
This `parse` body is identical to `owasp_top10_2025.py` (and will repeat across the LLM/AISVS/Kubernetes parsers in this stack). Only `name` and `data_file` differ. Consolidating into a single base avoids drift across ~8 near-identical files.
♻️ Suggested shared base (each parser then sets only `name`/`data_file`)
```python
# e.g. in a new module application/utils/external_project_parsers/parsers/_owasp_standard_base.py
class OwaspStandardJsonParser(ParserInterface):
    name: str
    data_file: Path

    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
        with self.data_file.open("r", encoding="utf-8") as handle:
            raw_entries = json.load(handle)
        entries = []
        for entry in raw_entries:
            standard = defs.Standard(
                name=self.name,
                sectionID=entry["section_id"],
                section=entry["section"],
                hyperlink=entry["hyperlink"],
            )
            for cre_id in entry.get("cre_ids", []):
                cres = cache.get_CREs(external_id=cre_id)
                if not cres:
                    continue
                standard.add_link(
                    defs.Link(
                        ltype=defs.LinkTypes.LinkedTo,
                        document=cres[0].shallow_copy(),
                    )
                )
            entries.append(standard)
        return ParseResult(
            results={self.name: entries},
            calculate_gap_analysis=False,
            calculate_embeddings=False,
        )
```
```python
class OwaspApiTop10_2023(OwaspStandardJsonParser):
    name = "OWASP API Security Top 10 2023"
    data_file = (
        Path(__file__).resolve().parent.parent / "data" / "owasp_api_top10_2023.json"
    )
```
Caveat: confirm the parser auto-discovery mechanism does not pick up/instantiate the abstract base itself (e.g. keep it out of the discovered module path or mark it non-instantiable).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py` around lines 13 - 47, The parse implementation in OwaspApiTop10_2023 is duplicated across many parsers; extract the shared logic into a single base class (e.g., OwaspStandardJsonParser) that defines parse(self, cache, ph) and uses self.name and self.data_file, then have OwaspApiTop10_2023 inherit from that base and only set name and data_file; ensure the new base is placed where the parser auto-discovery won’t instantiate it (or mark it abstract) and update other similar parser classes to subclass OwaspStandardJsonParser to remove duplication.
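One way to address the auto-discovery caveat is to make the shared base genuinely abstract and have discovery skip abstract classes. The sketch below assumes discovery walks subclasses of the parser interface, which may not match how this project actually discovers parsers. `ParserInterface`, `JsonStandardBase`, and `discover_parsers` are hypothetical stand-ins written for this example.

```python
import inspect
from abc import ABC, abstractmethod


class ParserInterface(ABC):
    """Hypothetical stand-in for the project's parser interface."""

    @abstractmethod
    def parse(self) -> str: ...


class JsonStandardBase(ParserInterface):
    """Shared base; it stays abstract because `name` is still abstract."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    def parse(self) -> str:
        return f"parsed {self.name}"


class OwaspAisvs(JsonStandardBase):
    # A plain class attribute overrides the abstract property,
    # which makes this subclass concrete.
    name = "OWASP AI Security Verification Standard (AISVS)"


def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def discover_parsers() -> list[type]:
    """Return only concrete parser classes, skipping abstract bases."""
    return [c for c in all_subclasses(ParserInterface) if not inspect.isabstract(c)]
```

With this shape, an attempt to instantiate `JsonStandardBase` directly raises `TypeError`, so a discovery loop that forgets the `inspect.isabstract` filter fails loudly rather than registering a nameless parser.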
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/tests/owasp_aisvs_parser_test.py`:
- Around line 51-53: The test uses an ambiguous loop variable "l" in the list
comprehensions (e.g., [l.document.id for l in entries[0].links]) which triggers
Ruff E741; rename "l" to "link" in those comprehensions (and the other
occurrence around the same test, e.g., the one at the second occurrence) to
improve readability and satisfy the linter, updating any matching references in
the same expressions or nearby assertions (e.g., change [l.document.id for l in
...] to [link.document.id for link in ...]).
In `@application/tests/owasp_api_top10_2023_parser_test.py`:
- Around line 39-43: The test uses an ambiguous loop variable name `l` in two
list comprehensions ([l.document.id for l in entries[0].links] and
[l.document.id for l in entries[-1].links]) which triggers Ruff E741; rename `l`
to a descriptive name such as `link` (or `entry_link`) in both comprehensions so
they read like [link.document.id for link in entries[0].links] and
[link.document.id for link in entries[-1].links], keeping the rest of the
assertions unchanged in owasp_api_top10_2023_parser_test.py.
In `@application/utils/external_project_parsers/data/owasp_aisvs_1_0.json`:
- Line 5: In the owasp_aisvs_1_0.json hyperlink entries, replace all GitHub
"/tree/" file URLs with the canonical "/blob/" path and correct the two broken
filenames so they point to the actual file names on GitHub: update the entries
that reference "0x10-C01-Training-Data-Governance.md" and
"0x10-C02-User-Input-Validation.md" to the correct filenames as found in the
OWASP AISVS repo, and ensure every object with a "hyperlink" property (including
the ones listed in the comment) uses the "/blob/" form consistently.
In `@cre.py`:
- Around line 170-199: The new CLI flags added in cre.py (--owasp_top10_2025_in,
--owasp_api_top10_2023_in, --owasp_kubernetes_top10_2022_in,
--owasp_kubernetes_top10_2025_in, --owasp_llm_top10_2025_in, --owasp_aisvs_in)
are not wired into cre_main.run and the two Kubernetes parser modules are
missing; update cre_main.run to read these flags from the parsed args and
dispatch to the corresponding import/parsing routines for the existing parsers
(owasp_top10_2025, owasp_api_top10_2023, owasp_llm_top10_2025, owasp_aisvs) by
invoking their parser/ingest functions, and for --owasp_kubernetes_top10_2022_in
and --owasp_kubernetes_top10_2025_in either add/implement the Kubernetes parser
modules under application/utils/external_project_parsers/parsers or guard
cre_main.run to skip/emit a warning when those flags are set but the parser
modules are absent so the CLI does not error.
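A rough sketch of the wiring described in the `cre.py` comment, assuming a simple flag-to-importer table. The flag names come from the comment above; the `IMPORTERS` table, the placeholder importer callables, and this `run` function are hypothetical and do not reflect the actual `cre_main.run`.

```python
import argparse
import logging

logger = logging.getLogger(__name__)

# Hypothetical dispatch table: flag name -> importer callable.
# None marks a flag whose parser module does not exist yet.
IMPORTERS = {
    "owasp_top10_2025_in": lambda: "owasp_top10_2025",
    "owasp_api_top10_2023_in": lambda: "owasp_api_top10_2023",
    "owasp_llm_top10_2025_in": lambda: "owasp_llm_top10_2025",
    "owasp_aisvs_in": lambda: "owasp_aisvs",
    "owasp_kubernetes_top10_2022_in": None,
    "owasp_kubernetes_top10_2025_in": None,
}


def build_parser() -> argparse.ArgumentParser:
    """Register one boolean flag per entry in the dispatch table."""
    parser = argparse.ArgumentParser()
    for flag in IMPORTERS:
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


def run(args: argparse.Namespace) -> list[str]:
    """Invoke the importer for each requested flag; warn and skip if it is absent."""
    imported = []
    for flag, importer in IMPORTERS.items():
        if not getattr(args, flag, False):
            continue
        if importer is None:
            logger.warning("--%s requested, but no parser module is available; skipping", flag)
            continue
        imported.append(importer())
    return imported
```

Keeping the flags and their handlers in one table means a flag cannot be registered without an explicit decision about what happens when it is passed.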
📒 Files selected for processing (15)
- application/tests/cheatsheets_parser_test.py
- application/tests/owasp_aisvs_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/tests/owasp_top10_2025_parser_test.py
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
- application/utils/external_project_parsers/data/owasp_api_top10_2023.json
- application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
- application/utils/external_project_parsers/data/owasp_top10_2025.json
- application/utils/external_project_parsers/parsers/cheatsheets_parser.py
- application/utils/external_project_parsers/parsers/owasp_aisvs.py
- application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
- application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_top10_2025.py
- cre.py
🧹 Nitpick comments (1)
application/cmd/cre_main.py (1)
927-934: ⚡ Quick win. Consider raising an error instead of silently skipping missing parsers.
The current implementation logs a warning but continues silently when users specify the Kubernetes Top 10 flags. This could confuse users who expect the import to succeed or receive an error.
Consider one of these approaches:
- Preferred: Don't add these CLI flags until the parser modules are ready
- Alternative: Raise a clear error (not just a warning) to inform users the feature is unavailable
💡 Proposed fix to raise an error
```diff
 if getattr(args, "owasp_kubernetes_top10_2022_in", False):
-    logger.warning(
-        "--owasp_kubernetes_top10_2022_in requested, but no Kubernetes 2022 parser module is present in this branch; skipping"
-    )
+    raise NotImplementedError(
+        "--owasp_kubernetes_top10_2022_in is not yet implemented. The Kubernetes 2022 parser module is not available in this branch."
+    )
 if getattr(args, "owasp_kubernetes_top10_2025_in", False):
-    logger.warning(
-        "--owasp_kubernetes_top10_2025_in requested, but no Kubernetes 2025 parser module is present in this branch; skipping"
-    )
+    raise NotImplementedError(
+        "--owasp_kubernetes_top10_2025_in is not yet implemented. The Kubernetes 2025 parser module is not available in this branch."
+    )
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/cmd/cre_main.py` around lines 927 - 934, The warning-only behavior for the Kubernetes Top 10 CLI flags (getattr(args, "owasp_kubernetes_top10_2022_in", False) and getattr(args, "owasp_kubernetes_top10_2025_in", False)) is confusing; either remove these CLI flags until parser modules exist or change the current checks to raise a clear error instead of calling logger.warning. Update the cre_main.py logic that currently calls logger.warning(...) to instead raise a descriptive exception (e.g., RuntimeError or argparse.ArgumentTypeError) when those flags are set, or remove the flag registration so users cannot pass --owasp_kubernetes_top10_2022_in / --owasp_kubernetes_top10_2025_in until the parsers are implemented.
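As a variation on the "raise a clear error" option, `argparse` can report the problem itself through `ArgumentParser.error`, which prints the usage line and exits with status 2. This is a swapped-in alternative to the `NotImplementedError` proposed above, not what the review asked for verbatim. `UNAVAILABLE_FLAGS` and `reject_unavailable` are hypothetical names for this sketch.

```python
import argparse

# Flags that are registered but whose parser modules are not present yet.
UNAVAILABLE_FLAGS = (
    "owasp_kubernetes_top10_2022_in",
    "owasp_kubernetes_top10_2025_in",
)


def build_parser() -> argparse.ArgumentParser:
    """Register the not-yet-implemented flags as plain boolean switches."""
    parser = argparse.ArgumentParser(prog="cre")
    for flag in UNAVAILABLE_FLAGS:
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


def reject_unavailable(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None:
    """Exit with a usage error if any unavailable flag was passed."""
    for flag in UNAVAILABLE_FLAGS:
        if getattr(args, flag, False):
            # parser.error() writes the message to stderr and raises SystemExit(2).
            parser.error(f"--{flag} is not yet implemented in this branch")
```

Compared with raising from deep inside the run logic, this surfaces the failure at argument-handling time with the same formatting users see for any other invalid invocation.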
📒 Files selected for processing (4)
- application/cmd/cre_main.py
- application/tests/owasp_aisvs_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
✅ Files skipped from review due to trivial changes (1)
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
🚧 Files skipped from review as they are similar to previous changes (2)
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_aisvs_parser_test.py
@coderabbitai review
✅ Actions performed
Review triggered.
@coderabbitai review
✅ Actions performed
Review triggered.
Summary
This PR adds AI-related OWASP resource importer support for issue #471.
It introduces the parser/data/test layer for:
This is the first upstream PR in the stacked #471 review series.
What changed
Why this is split out
The full #471 work is too large to review effectively as one PR.
This PR isolates one OWASP resource family so the parser/data model can be reviewed independently before the later Kubernetes, cheat sheet, backend analysis, and frontend changes.
Validation