This repository was archived by the owner on Mar 31, 2026. It is now read-only.

PackageCertificateToCsv

This driver performs deep analysis of the certificates used in NuGet package signatures and their relationship to one or more NuGet packages.

CatalogScanDriverType enum value: PackageCertificateToCsv
Driver implementation: PackageCertificateToCsvDriver
Processing mode: process latest catalog leaf per package ID and version
Cursor dependencies: LoadPackageArchive (needs the full package signature)
Components using driver output:
    Many-to-many reference clean-up by CleanupOrphanRecordsService using CleanupOrphanCertificateRecordsAdapter
    Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data
Temporary storage config:
    Table Storage:
        CsvRecordTableName (name prefix): holds CSV records before they are added to a CSV blob
        TaskStateTableName (name prefix): tracks completion of CSV blob aggregation
Persistent storage config:
    Blob Storage:
        CertificateContainerName: contains CSVs for the Certificates table
        PackageCertificateContainerName: contains CSVs for the PackageCertificates table
    Table Storage:
        CertificateToPackageTableName: mapping from certificate to related packages
        PackageToCertificateTableName: mapping from package to related certificates
Output CSV tables:
    Certificates
    PackageCertificates

Algorithm

This driver is more complex than others because it maintains a set of many-to-many relationships between NuGet packages and certificates. A certificate (e.g. a CA) can be used by many packages. A package can contain many certificates (e.g. a certificate chain for timestamping or code signing). Because a package can be deleted while its certificates are still used by other packages, cleaning up certificate metadata is not straightforward. A certificate record is purged from the Certificates table only when all of the packages that use that certificate have been deleted. In other words, some reference counting is required. The distributed storage used by NuGet Insights complicates matters further: Azure Table Storage has no concept of foreign keys or referential integrity, so the application must do more of the bookkeeping itself. This is done in the ReferenceTracker.
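The reference counting described above can be illustrated with a small in-memory sketch. It is not the ReferenceTracker implementation: the class name, method name, and data shapes below are hypothetical, and plain dictionaries stand in for the two Table Storage mappings (package to certificates, certificate to packages).

```python
from collections import defaultdict


class ReferenceSketch:
    """Toy two-way index between packages and certificate fingerprints.

    Hypothetical stand-in for the two mapping tables; not the real ReferenceTracker.
    """

    def __init__(self):
        self.package_to_certs = defaultdict(set)
        self.cert_to_packages = defaultdict(set)

    def set_references(self, package, fingerprints):
        """Replace the certificates recorded for one package.

        Returns the fingerprints that are no longer referenced by any package.
        """
        new = set(fingerprints)
        old = self.package_to_certs.get(package, set())

        # Record the new edges in the reverse (certificate to package) index.
        for fp in new - old:
            self.cert_to_packages[fp].add(package)

        # Drop edges that disappeared and collect certificates left with no owner.
        orphans = set()
        for fp in old - new:
            self.cert_to_packages[fp].discard(package)
            if not self.cert_to_packages[fp]:
                del self.cert_to_packages[fp]
                orphans.add(fp)

        if new:
            self.package_to_certs[package] = new
        else:
            self.package_to_certs.pop(package, None)
        return orphans


tracker = ReferenceSketch()
tracker.set_references(("Foo", "1.0.0"), {"ca", "leaf-a"})
tracker.set_references(("Bar", "2.0.0"), {"ca", "leaf-b"})

# Deleting Foo 1.0.0 is modeled as clearing its references.
orphans_after_delete = tracker.set_references(("Foo", "1.0.0"), set())
```

In this sketch only "leaf-a" becomes an orphan when Foo 1.0.0 is deleted, because the shared "ca" certificate is still referenced by Bar 2.0.0.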

A batch of catalog leaves is passed to the driver. The leaves are grouped by package ID and processed one package ID group at a time. This is done because some of the many-to-many references are partitioned by package ID.
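A minimal sketch of the grouping step follows. The leaf shape (a dictionary with "id" and "version" keys) and the function name are assumptions made for illustration. Lowercasing the ID is also an assumption of this sketch, made because NuGet package IDs are case-insensitive.

```python
from collections import defaultdict


def group_by_package_id(leaves):
    """Group catalog leaves so each package ID is handled as one unit."""
    groups = defaultdict(list)
    for leaf in leaves:
        # Assumption: normalize casing so "Foo" and "foo" share one group.
        groups[leaf["id"].lower()].append(leaf)
    return dict(groups)


batch = [
    {"id": "Foo", "version": "1.0.0"},
    {"id": "Bar", "version": "2.0.0"},
    {"id": "foo", "version": "1.1.0"},
]
grouped = group_by_package_id(batch)
```

Each resulting group can then be written against a single partition, which is the motivation the text gives for grouping.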

For each package leaf, the package signature is read from Azure Table Storage (as stored by LoadPackageArchive). Both the repository signature and the author signature (if present) are read, and for each signature both the code signing and timestamp certificate chains are loaded. For each certificate found, a relationship to the package is recorded. The different certificate relationship types are represented by CertificateRelationshipTypes.
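One way to picture the recorded relationships is as a set of flags per certificate, since the same certificate can appear in more than one chain. The sketch below is illustrative only: the Relationship enum and its members are hypothetical and do not mirror the actual members of CertificateRelationshipTypes, and the input shape is assumed.

```python
from enum import Flag, auto


class Relationship(Flag):
    """Hypothetical flags; not the real CertificateRelationshipTypes members."""
    AUTHOR_CODE_SIGNING = auto()
    AUTHOR_TIMESTAMPING = auto()
    REPOSITORY_CODE_SIGNING = auto()
    REPOSITORY_TIMESTAMPING = auto()


FLAG_BY_CHAIN = {
    ("author", "code_signing"): Relationship.AUTHOR_CODE_SIGNING,
    ("author", "timestamp"): Relationship.AUTHOR_TIMESTAMPING,
    ("repository", "code_signing"): Relationship.REPOSITORY_CODE_SIGNING,
    ("repository", "timestamp"): Relationship.REPOSITORY_TIMESTAMPING,
}


def collect_relationships(signatures):
    """Map each certificate fingerprint to the combined ways the package uses it.

    `signatures` maps a signature kind to its chains, each a list of fingerprints.
    """
    result = {}
    for kind, chains in signatures.items():
        for chain_name, fingerprints in chains.items():
            flag = FLAG_BY_CHAIN[(kind, chain_name)]
            for fp in fingerprints:
                result[fp] = result.get(fp, Relationship(0)) | flag
    return result


package_signatures = {
    "author": {
        "code_signing": ["author-leaf", "root-ca"],
        "timestamp": ["tsa-leaf", "tsa-root"],
    },
    "repository": {
        "code_signing": ["repo-leaf", "root-ca"],
        "timestamp": ["tsa-leaf", "tsa-root"],
    },
}
relationships = collect_relationships(package_signatures)
```

Here "root-ca" ends up with both code signing flags because it sits in the author and repository chains, which is the kind of overlap a flags-style representation handles naturally.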

Each certificate is verified using online methods similar to those implemented in NuGet.org's validation pipeline (e.g. OnlineCertificateVerifier). In addition to certificate verification, various X.509 certificate fields (ASN.1 encoded) are parsed into more friendly object models. All of this information is written to a CSV record per certificate.

For each package-to-certificate relationship, another CSV record is created so that the way a certificate is used by each package can be discovered.
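The two kinds of output record can be sketched as follows: one row per certificate and one row per package-to-certificate relationship. The column names and values are hypothetical placeholders chosen for illustration and are not the actual schemas of the Certificates or PackageCertificates tables.

```python
import csv
import io


def to_csv(fieldnames, rows):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One record per certificate (hypothetical columns).
certificate_rows = [
    {"Fingerprint": "aa11", "Subject": "CN=Example Author"},
    {"Fingerprint": "bb22", "Subject": "CN=Example Root CA"},
]

# One record per package-to-certificate relationship (hypothetical columns).
package_certificate_rows = [
    {"Id": "Foo", "Version": "1.0.0", "Fingerprint": "aa11", "Relationship": "AuthorCodeSigning"},
    {"Id": "Foo", "Version": "1.0.0", "Fingerprint": "bb22", "Relationship": "AuthorCodeSigning"},
    {"Id": "Bar", "Version": "2.0.0", "Fingerprint": "bb22", "Relationship": "RepositoryCodeSigning"},
]

certificates_csv = to_csv(["Fingerprint", "Subject"], certificate_rows)
package_certificates_csv = to_csv(
    ["Id", "Version", "Fingerprint", "Relationship"], package_certificate_rows
)
```

Keeping the relationship rows separate from the certificate rows means a shared certificate such as "bb22" is described once but can be joined to every package that uses it.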