
Commit 400b975

SARIF reporter
SARIF is a unified file format used to exchange information between static analysis tools (like pylint) and various types of formatters, meta-runners, broadcasters, alert systems, etc. This implementation is ad hoc and non-validating.

Spec v GitHub
-------------

It turns out GitHub both doesn't implement all of SARIF (which makes sense) and requires a number of properties which the spec considers optional.

The [official SARIF validator][] (linked to by both OASIS and GitHub) was used to validate the output of the reporter, ensuring that all the GitHub requirements it flags are fulfilled, and fixing *some* of the validator's pet issues. As of now the following issues are left unaddressed:

- Azure requires `run.automationDetails`; looking at the spec, I don't think it makes sense for the reporter to inject that, it's more up to the CI
- the validator wants a `run.versionControlProvenance`, same as above
- the validator wants rule names in PascalCase, lol
- the validator wants templated result messages, but without pylint providing the args as part of the `Message` that's a bit of a chore
- the validator wants `region` to include a snippet (the flagged content)
- the validator wants `physicalLocation` to have a `contextRegion` (most likely with a snippet)

On URIs
-------

The reporter makes use of URIs for artifacts (~files). Per ["guidance on the use of artifactLocation objects"][3.4.7], `uri` *should* capture the deterministic part of the artifact location and `uriBaseId` *should* capture the non-deterministic part. However, as far as I can tell, pylint has no requirement (and no clean way to require) consistent resolution roots: `path` is just relative to the cwd, and there is no requirement to have project-level files to use pylint. This makes the use of relative URIs dodgy, but absolute URIs are pretty much always broken for the purpose of *interchange*, so they're not really any better.
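The `uri` / `uriBaseId` split described above could look roughly like the following sketch. It assumes the caller already knows a root directory (exactly what pylint lacks, per the paragraph above) and uses a hypothetical base id `SRCROOT`; `artifact_location` and `original_uri_base_ids` are illustrative helpers, not part of this commit, whose reporter emits only the cwd-relative `path` with backslashes normalized.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import quote

# Hypothetical base id; SARIF lets the producer pick the identifier.
BASE_ID = "SRCROOT"


def artifact_location(path: str, root: Path) -> dict[str, str]:
    """Express `path` relative to `root`, tagged with the (hypothetical) base id."""
    # Raises ValueError if `path` is not under `root`.
    rel = Path(path).resolve().relative_to(root.resolve())
    # Forward slashes, percent-encoded (quote keeps "/" by default).
    return {"uri": quote(rel.as_posix()), "uriBaseId": BASE_ID}


def original_uri_base_ids(root: Path) -> dict[str, dict[str, str]]:
    """Run-level mapping documenting what BASE_ID stands for."""
    uri = root.resolve().as_uri()
    # Without a trailing slash, relative resolution would drop the last segment.
    if not uri.endswith("/"):
        uri += "/"
    return {BASE_ID: {"uri": uri}}
```

The trailing slash matters because RFC 3986 resolution of `pkg/mod.py` against `file:///repo` replaces the last path segment and yields `file:///pkg/mod.py`, whereas against `file:///repo/` it yields `file:///repo/pkg/mod.py`.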
As a side-note, GitHub [asserts][relative-uri-guidance]:

> While this [nb: `originalUriBaseIds`] is not required by GitHub for
> the code scanning results to be displayed correctly, it is required
> to produce a valid SARIF output when using relative URI references.

However, per [3.4.4][] this is incorrect: the `uriBaseId` can be resolved through end-user configuration, `originalUriBaseIds`, external information (e.g. environment variables), or heuristics.

It would be nice to document the "relative root" via `originalUriBaseIds` (which may be omitted for that purpose per [3.14.14][]), but per the above, claiming a consistent project root is dodgy. We *could* resolve known project files (e.g. pyproject.toml, tox.ini, etc.) in order to find a consistent root (project root, repo root, ...) and set / use that for relative URIs, but that's a lot of additional complexity which I'm not sure is warranted, at least for a first version.

Fixes #5493

[3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869
[3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872
[3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936
[relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers
[official SARIF validator]: https://sarifweb.azurewebsites.net/
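Resolving a consistent root from known project files, as floated in the commit message, might look like this minimal sketch. The marker list and the nearest-ancestor-wins rule are assumptions made for illustration; `find_project_root` is a hypothetical helper, not a pylint API.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical markers; which files count as a "project root" is a policy choice.
MARKERS = ("pyproject.toml", "setup.cfg", "tox.ini", ".git")


def find_project_root(start: Path, markers: tuple[str, ...] = MARKERS) -> Path | None:
    """Return the nearest ancestor of `start` (inclusive) containing a marker."""
    start = start.resolve()
    if start.is_file():
        start = start.parent
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    # No marker anywhere up the tree: the caller must fall back to cwd-relative URIs.
    return None
```

Even with such a helper, nearest-marker-wins can select a nested package's `pyproject.toml` in a monorepo rather than the repository root, which illustrates why the extra complexity may not be warranted for a first version.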
1 parent 7588243 commit 400b975

4 files changed

Lines changed: 300 additions & 5 deletions


pylint/lint/base_options.py

Lines changed: 2 additions & 1 deletion
@@ -104,7 +104,8 @@ def _make_linter_options(linter: PyLinter) -> Options:
                 "group": "Reports",
                 "help": "Set the output format. Available formats are: 'text', "
                 "'parseable', 'colorized', 'json2' (improved json format), 'json' "
-                "(old json format), msvs (visual studio) and 'github' (GitHub actions). "
+                "(old json format), msvs (visual studio), 'github' (GitHub actions), "
+                "and 'sarif'. "
                 "You can also give a reporter class, e.g. mypackage.mymodule."
                 "MyReporterClass.",
                 "kwargs": {"linter": linter},

pylint/reporters/json_reporter.py

Lines changed: 108 additions & 2 deletions
@@ -7,16 +7,22 @@
 from __future__ import annotations
 
 import json
-from typing import TYPE_CHECKING, TypedDict
+import sys
+from textwrap import shorten
+from typing import TYPE_CHECKING, TypedDict, TextIO
 
+from pylint.constants import MSG_TYPES
+
+import pylint
 from pylint.interfaces import CONFIDENCE_MAP, UNDEFINED
 from pylint.message import Message
 from pylint.reporters.base_reporter import BaseReporter
-from pylint.typing import MessageLocationTuple
+from pylint.typing import MessageLocationTuple, MessageTypesFullName
 
 if TYPE_CHECKING:
     from pylint.lint.pylinter import PyLinter
     from pylint.reporters.ureports.nodes import Section
+    from pylint.reporters import sarif_types
 
 # Since message-id is an invalid name we need to use the alternative syntax
 OldJsonExport = TypedDict(
@@ -196,6 +202,106 @@ def serialize_stats(self) -> dict[str, str | int | dict[str, int]]:
         }
 
 
+class SARIFReporter(BaseReporter):
+    name = "sarif"
+    extension = "sarif"
+    linter: PyLinter
+
+    def display_reports(self, layout: Section) -> None:
+        """Don't do anything in this reporter."""
+
+    def _display(self, layout: Section) -> None:
+        """Do nothing."""
+
+    def display_messages(self, layout: Section | None) -> None:
+        """Launch layouts display."""
+        output: sarif_types.Log = {
+            "version": "2.1.0",
+            "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
+            "runs": [{
+                "tool": {
+                    "driver": {
+                        "name": "pylint",
+                        "fullName": f"pylint {pylint.__version__}",
+                        "version": pylint.__version__,
+                        # should be versioned but not all versions are kept so...
+                        "informationUri": "https://pylint.readthedocs.io/",
+                        "rules": [
+                            {
+                                "id": m.msgid,
+                                "name": m.symbol,
+                                "deprecatedIds": [msgid for msgid, _ in m.old_names],
+                                "deprecatedNames": [name for _, name in m.old_names],
+                                # per 3.19.19 shortDescription should be a
+                                # single sentence which can't be guaranteed,
+                                # however github requires it...
+                                "shortDescription": {'text': m.description.split(".", 1)[0]},
+                                # github requires that this is less than 1024 characters
+                                "fullDescription": {'text': shorten(m.description, 1024, placeholder="...")},
+                                "help": {"text": m.format_help()},
+                                "helpUri": f"https://pylint.readthedocs.io/en/stable/user_guide/messages/{MSG_TYPES[m.msgid[0]]}/{m.symbol}.html"
+                                # handle_message only gets the formatted message,
+                                # so to use `messageStrings` we'd need to
+                                # convert the templating and extract the args
+                                # out of the msg
+                            }
+                            for checker in self.linter.get_checkers()
+                            for m in checker.messages
+                            if m.symbol in self.linter.stats.by_msg
+                        ]
+                    }
+                },
+                "results": [self.serialize(message) for message in self.messages],
+            }]
+        }
+        json.dump(output, self.out)
+
+    @staticmethod
+    def serialize(message: Message) -> sarif_types.Result:
+        region: sarif_types.Region = {
+            "startLine": message.line,
+            "startColumn": message.column + 1,
+            "endLine": message.end_line or message.line,
+            "endColumn": (message.end_column or message.column) + 1,
+        }
+
+        location: sarif_types.Location = {
+            "physicalLocation": {
+                "artifactLocation": {
+                    "uri": message.path.replace('\\', '/'),
+                },
+                "region": region,
+            },
+        }
+        if message.obj:
+            logical_location: sarif_types.LogicalLocation = {
+                "name": message.obj,
+                "fullyQualifiedName": f"{message.module}.{message.obj}",
+            }
+            location["logicalLocations"] = [logical_location]
+
+        return {
+            "ruleId": message.msg_id,
+            "message": {"text": message.msg},
+            "level": CATEGORY_MAP[message.category],
+            "locations": [location],
+            "partialFingerprints": {
+                # encoding the node path seems like it would be useful to dedup alerts?
+                "nodePath/v1": "",
+            }
+        }
+
+CATEGORY_MAP: dict[str, sarif_types.ResultLevel] = {
+    "convention": "note",
+    "refactor": "note",
+    "statement": "note",
+    "info": "note",
+    "warning": "warning",
+    "error": "error",
+    "fatal": "error",
+}
+
 def register(linter: PyLinter) -> None:
     linter.register_reporter(JSONReporter)
     linter.register_reporter(JSON2Reporter)
+    linter.register_reporter(SARIFReporter)
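The comment inside `display_messages` notes that using `messageStrings` would require converting pylint's templating and extracting the args from the formatted message. A minimal sketch of both steps follows, assuming the template is the message definition's `msg` (e.g. `"Line too long (%s/%s)"`, as seen in the test's expected `help` text) and positional %-conversions only. `to_sarif_template` and `extract_arguments` are hypothetical helpers; `%(name)s` mappings and literal `{`/`}` in templates are not handled, and recovery can be ambiguous when an argument itself contains the template's literal text.

```python
from __future__ import annotations

import re

# Positional %-conversions (%s, %r, %d, %.2f, ...) plus the literal "%%" escape.
_CONVERSION = re.compile(r"%(?:%|[-#0 +]*\d*(?:\.\d+)?[sdrifxXeEgGc])")


def to_sarif_template(template: str) -> str:
    """Rewrite a %-style template into SARIF-style {0}, {1}, ... placeholders."""
    index = 0

    def replace(match: re.Match[str]) -> str:
        nonlocal index
        if match.group(0) == "%%":
            return "%"  # escaped literal percent sign
        placeholder = "{" + str(index) + "}"
        index += 1
        return placeholder

    return _CONVERSION.sub(replace, template)


def extract_arguments(template: str, formatted: str) -> list[str] | None:
    """Recover the arguments that turned `template` into `formatted`.

    Returns None when `formatted` does not match the template at all.
    """
    pattern = []
    pos = 0
    for match in _CONVERSION.finditer(template):
        pattern.append(re.escape(template[pos : match.start()]))
        # Each conversion becomes a lazy capture group; "%%" matches one "%".
        pattern.append("%" if match.group(0) == "%%" else "(.*?)")
        pos = match.end()
    pattern.append(re.escape(template[pos:]))
    found = re.fullmatch("".join(pattern), formatted, flags=re.DOTALL)
    return list(found.groups()) if found else None
```

With these, a rule could carry `messageStrings: {"default": {"text": to_sarif_template(m.msg)}}` and a result could carry `{"id": "default", "arguments": extract_arguments(...)}`; the string-matching step is the fragile part, which is why having pylint pass the args through would be preferable.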

pylint/reporters/sarif_types.py

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+"""
+SARIF is a pretty sprawling and deeply nested format, which makes it quite
+verbose to express in a relatively classic type system like Python's.
+
+As such, this module provides the subset of the SARIF schema necessary for
+pylint's output, translated to Python types.
+"""
+from __future__ import annotations
+
+from typing import TypedDict, Literal
+
+
+class Run(TypedDict):
+    tool: Tool
+    # invocation parameters / environment for the tool
+    # invocation: list[Invocations]
+    results: list[Result]
+    # originalUriBaseIds: dict[str, ArtifactLocation]
+
+
+Log = TypedDict(
+    "Log",
+    {
+        "version": Literal["2.1.0"],
+        "$schema": Literal["https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json"],
+        "runs": list[Run],
+    }
+)
+
+
+class Tool(TypedDict):
+    driver: Driver
+
+
+class Driver(TypedDict):
+    name: Literal["pylint"]
+    # optional but azure wants it
+    fullName: str
+    version: str
+    informationUri: str # not required but validator wants it
+    rules: list[ReportingDescriptor]
+
+
+class ReportingDescriptorOpt(TypedDict, total=False):
+    deprecatedIds: list[str]
+    deprecatedNames: list[str]
+    messageStrings: dict[str, MessageString]
+
+
+class ReportingDescriptor(ReportingDescriptorOpt):
+    id: str
+    # optional but validator really wants it (then complains that it's not pascal cased)
+    name: str
+    # not required per spec but required by github
+    shortDescription: MessageString
+    fullDescription: MessageString
+    help: MessageString
+    helpUri: str
+
+
+class MarkdownMessageString(TypedDict, total=False):
+    markdown: str
+
+
+class MessageString(MarkdownMessageString):
+    text: str
+
+
+ResultLevel = Literal["none", "note", "warning", "error"]
+
+
+class ResultOpt(TypedDict, total=False):
+    ruleId: str
+    ruleIndex: int
+
+    level: ResultLevel
+
+
+class Result(ResultOpt):
+    message: Message
+    # not required per spec but required by github
+    locations: list[Location]
+    partialFingerprints: dict[str, str]
+
+
+class Message(TypedDict, total=False):
+    # needs to have either text or id but it's a PITA to type
+
+    #: plain text message string (can have markdown links but no other formatting)
+    text: str
+    #: formatted GFM text
+    markdown: str
+    #: rule id
+    id: str
+    #: arguments for templated rule messages
+    arguments: list[str]
+
+
+class Location(TypedDict, total=False):
+    physicalLocation: PhysicalLocation # actually required by github
+    logicalLocations: list[LogicalLocation]
+
+
+class PhysicalLocation(TypedDict):
+    artifactLocation: ArtifactLocation
+    # not required per spec, required by github
+    region: Region
+
+class ArtifactLocation(TypedDict, total=False):
+    uri: str
+    #: id of base URI for resolving relative `uri`
+    uriBaseId: str
+    description: Message
+
+
+class LogicalLocation(TypedDict, total=False):
+    name: str
+    fullyQualifiedName: str
+    #: schema is `str` with a bunch of *suggested* terms, of which this is a subset
+    kind: Literal['function', 'member', 'module', 'parameter', 'returnType', 'type', 'variable']
+
+
+class Region(TypedDict):
+    # none required per spec, all required by github
+    startLine: int
+    startColumn: int
+    endLine: int
+    endColumn: int
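`sarif_types` encodes "optional per spec" versus "required (by the spec or by GitHub)" by splitting an object into a `total=False` base holding the optional keys and a subclass holding the required ones (`ReportingDescriptorOpt` / `ReportingDescriptor`, `ResultOpt` / `Result`). The sketch below reproduces that pattern with simplified stand-in types, not the module's own, to show how the split is visible at runtime through `__required_keys__` / `__optional_keys__` (Python 3.9+). `missing_required` is a hypothetical helper; `typing.NotRequired` (Python 3.11+, PEP 655) would be an alternative spelling that avoids the extra base class.

```python
from __future__ import annotations

from typing import Literal, TypedDict

ResultLevel = Literal["none", "note", "warning", "error"]


class ResultOpt(TypedDict, total=False):
    # total=False: every key declared in this class may be omitted.
    ruleId: str
    level: ResultLevel


class Result(ResultOpt):
    # Default total=True applies only to keys declared here; inherited ones stay optional.
    message: dict[str, str]
    partialFingerprints: dict[str, str]


def missing_required(candidate: dict[str, object]) -> set[str]:
    """Return the required Result keys that `candidate` lacks."""
    return set(Result.__required_keys__) - set(candidate)
```

Type checkers enforce this statically; the runtime check is only useful for ad hoc sanity checks, since, as the commit message says, the reporter itself is non-validating.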

tests/reporters/unittest_json_reporter.py

Lines changed: 62 additions & 2 deletions
@@ -13,11 +13,11 @@
 
 import pytest
 
-from pylint import checkers
+from pylint import checkers, __version__
 from pylint.interfaces import HIGH, UNDEFINED
 from pylint.lint import PyLinter
 from pylint.message import Message
-from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter
+from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter, SARIFReporter
 from pylint.reporters.ureports.nodes import EvaluationSection
 from pylint.typing import MessageLocationTuple
 
@@ -263,3 +263,63 @@ def test_json2_result_with_broken_score() -> None:
     reporter.display_messages(None)
     report_result = json.loads(output.getvalue())
     assert "division by zero" in report_result["statistics"]["score"]
+
+def test_simple_sarif():
+    output = StringIO()
+    reporter = SARIFReporter(output)
+    linter = PyLinter(reporter=reporter)
+    checkers.initialize(linter)
+    linter.config.persistent = 0
+    linter.open()
+    linter.set_current_module("0123")
+    linter.add_message(
+        "line-too-long",
+        line=1,
+        args=(1, 2),
+        end_lineno=1,
+        end_col_offset=4
+    )
+    reporter.display_messages(None)
+    assert json.loads(output.getvalue()) == {
+        "version": "2.1.0",
+        "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
+        "runs": [{
+            "tool": {
+                "driver": {
+                    "name": "pylint",
+                    "fullName": f"pylint {__version__}",
+                    "version": __version__,
+                    "informationUri": "https://pylint.readthedocs.io/",
+                    "rules": [{
+                        "id": "C0301",
+                        "deprecatedIds": [],
+                        "name": "line-too-long",
+                        "deprecatedNames": [],
+                        "shortDescription": {"text": "Used when a line is longer than a given number of characters"},
+                        "fullDescription": {"text": "Used when a line is longer than a given number of characters."},
+                        "help": {"text": ":line-too-long (C0301): *Line too long (%s/%s)*\n Used when a line is longer than a given number of characters."},
+                        "helpUri": "https://pylint.readthedocs.io/en/stable/user_guide/messages/convention/line-too-long.html",
+                    }],
+                },
+            },
+            "results": [{
+                "ruleId": "C0301",
+                "message": {"text": "Line too long (1/2)"},
+                "level": "note",
+                "locations": [{
+                    "physicalLocation": {
+                        "artifactLocation": {
+                            "uri": "0123",
+                        },
+                        "region": {
+                            "startLine": 1,
+                            "startColumn": 1,
+                            "endLine": 1,
+                            "endColumn": 5,
+                        },
+                    },
+                }],
+                "partialFingerprints": {"nodePath/v1": ""},
+            }]
+        }]
+    }
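The expected `region` in `test_simple_sarif` follows from the arithmetic in `serialize`: pylint columns are 0-based with the end column pointing just past the flagged text, while SARIF columns are 1-based, so both shift by one (the expected `startColumn` of 1 corresponds to a pylint column of 0, and `end_col_offset=4` becomes `endColumn` 5). The standalone sketch below isolates that conversion; `to_sarif_region` is a hypothetical helper, and it deliberately tests `is None` where `serialize` uses `or`, so an end column of 0 is kept rather than replaced by the start column.

```python
from __future__ import annotations

from typing import TypedDict


class Region(TypedDict):
    startLine: int
    startColumn: int
    endLine: int
    endColumn: int


def to_sarif_region(
    line: int, column: int, end_line: int | None, end_column: int | None
) -> Region:
    """Shift pylint's 0-based columns to SARIF's 1-based columns.

    When no end position is known, the region collapses to its start point.
    """
    return {
        "startLine": line,
        "startColumn": column + 1,
        "endLine": line if end_line is None else end_line,
        "endColumn": (column if end_column is None else end_column) + 1,
    }
```

For the test's message, `to_sarif_region(1, 0, 1, 4)` yields the same `region` dict the test expects.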
