This driver performs deep analysis of the certificates used in NuGet package signatures and their relationship to one or more NuGet packages.

| | |
| --- | --- |
| CatalogScanDriverType enum value | PackageCertificateToCsv |
| Driver implementation | PackageCertificateToCsvDriver |
| Processing mode | process latest catalog leaf per package ID and version |
| Cursor dependencies | LoadPackageArchive: needs the full package signature |
| Components using driver output | Many-to-many reference clean-up by CleanupOrphanRecordsService using CleanupOrphanCertificateRecordsAdapter; Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data |
| Temporary storage config | Table Storage: CsvRecordTableName (name prefix): holds CSV records before they are added to a CSV blob; TaskStateTableName (name prefix): tracks completion of CSV blob aggregation |
| Persistent storage config | Blob Storage: CertificateContainerName: contains CSVs for the Certificates table; PackageCertificateContainerName: contains CSVs for the PackageCertificates table. Table Storage: CertificateToPackageTableName: mapping from certificate to related packages; PackageToCertificateTableName: mapping from package to related certificates |
| Output CSV tables | Certificates, PackageCertificates |

This driver is more complex than others because it maintains a set of many-to-many relationships between NuGet packages and certificates. A certificate (e.g. a CA) can be used by many packages. A package can contain many certificates (e.g. a certificate chain for timestamping or code signing). Because a package can be deleted while its certificates are still used by other packages, it's not straightforward to clean up certificate metadata. A certificate record is purged from the Certificates table only when all of the packages that use that certificate have been deleted. In other words, some reference counting is needed. The distributed storage used by NuGet Insights complicates matters further: Azure Table Storage has no concept of foreign keys or referential integrity, so the application must do this bookkeeping itself. This is done in the ReferenceTracker.
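
The reference counting idea can be illustrated with a minimal in-memory sketch. This is not the ReferenceTracker implementation; it is written in Python for brevity, and the `ReferenceIndex` class, its method names, and the use of fingerprints as certificate keys are all assumptions made for illustration. It keeps both directions of the mapping and reports which certificates become orphaned when a package's references change.

```python
from collections import defaultdict


class ReferenceIndex:
    """Hypothetical in-memory stand-in for the two mapping tables."""

    def __init__(self):
        self.package_to_certs = defaultdict(set)  # package -> certificate fingerprints
        self.cert_to_packages = defaultdict(set)  # certificate fingerprint -> packages

    def set_references(self, package, fingerprints):
        """Replace a package's certificate references; return orphaned certificates."""
        new = set(fingerprints)
        old = self.package_to_certs.pop(package, set())
        # Drop the reverse edges for references that no longer exist.
        for fp in old - new:
            self.cert_to_packages[fp].discard(package)
        # Add (or keep) the reverse edges for the current references.
        for fp in new:
            self.cert_to_packages[fp].add(package)
        if new:
            self.package_to_certs[package] = new
        # A certificate is orphaned once no package refers to it any more.
        orphans = {fp for fp in old - new if not self.cert_to_packages[fp]}
        for fp in orphans:
            del self.cert_to_packages[fp]
        return orphans

    def delete_package(self, package):
        """A deleted package references no certificates."""
        return self.set_references(package, ())
```

In this sketch, deleting one of two packages that share a CA certificate orphans only that package's leaf certificate; the shared CA is orphaned only after the second package is deleted as well.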
A batch of catalog leaves is passed to the driver. The leaves are grouped by package ID and processed one package ID group at a time. This is done because some of the many-to-many reference records are partitioned by package ID.
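
The grouping step could look roughly like the following Python sketch. The `Leaf` type and `group_by_package_id` function are hypothetical names, and lowercasing the ID to form the group key is an assumption (NuGet package IDs are case-insensitive, but the driver's actual normalization is not described here).

```python
from collections import defaultdict
from typing import NamedTuple


class Leaf(NamedTuple):
    """Hypothetical, simplified catalog leaf."""
    package_id: str
    version: str


def group_by_package_id(leaves):
    """Group leaves so that each group maps to a single package ID."""
    groups = defaultdict(list)
    for leaf in leaves:
        # Assumption: the group key is the lowercased package ID.
        groups[leaf.package_id.lower()].append(leaf)
    return dict(groups)
```

Each resulting group can then be processed against a single partition of the reference records, which keeps reads and writes for one package ID together.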
For each package leaf, the package signature is read from Azure Table Storage (as stored by LoadPackageArchive). Both the repository signature and the author signature (if present) are read, and both the code signing and timestamp certificate chains are loaded. For each certificate found, a relationship to the package is recorded. The different certificate relationship types are represented by CertificateRelationshipTypes.
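
Because the same certificate can appear in more than one chain (for example, a shared root), one way to model the relationship types is as combinable flags. The Python sketch below assumes that model; the member names are hypothetical and are not the real CertificateRelationshipTypes values, and certificates are identified by placeholder fingerprint strings.

```python
from enum import Flag, auto


class Relationship(Flag):
    # Hypothetical names; the real CertificateRelationshipTypes members may differ.
    AUTHOR_CODE_SIGNING = auto()
    AUTHOR_TIMESTAMPING = auto()
    REPOSITORY_CODE_SIGNING = auto()
    REPOSITORY_TIMESTAMPING = auto()


def collect_relationships(chains):
    """Merge per-chain certificates into one set of flags per certificate.

    `chains` maps a Relationship to the fingerprints found in that chain.
    """
    result = {}
    for relationship, fingerprints in chains.items():
        for fp in fingerprints:
            # OR the new role into whatever roles were already recorded.
            result[fp] = result.get(fp, Relationship(0)) | relationship
    return result
```

With this shape, a root certificate that anchors both the author and repository code signing chains ends up with both flags set on a single entry rather than two separate entries.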
Each certificate is verified using online methods similar to those implemented in NuGet.org's validation pipeline (e.g. OnlineCertificateVerifier). In addition to certificate verification, various X.509 certificate fields (ASN.1 encoded) are parsed into more friendly object models. All of this information is written to a CSV record per certificate.
For each package-to-certificate relationship, another CSV record is created so that the way each certificate is used by a package can be discovered.
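
As a rough illustration of the per-relationship output, the Python sketch below writes one CSV row per package and certificate pair. The column names, the `write_package_certificate_rows` function, and the relationship labels are assumptions for illustration and do not reflect the actual PackageCertificates schema.

```python
import csv
import io


def write_package_certificate_rows(package_id, version, relationships):
    """Write one row per certificate related to the package.

    `relationships` maps a certificate fingerprint to a relationship label.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names, not the real table schema.
    writer.writerow(["Id", "Version", "Fingerprint", "RelationshipTypes"])
    for fingerprint, label in sorted(relationships.items()):
        writer.writerow([package_id, version, fingerprint, label])
    return buffer.getvalue()
```

Keeping the relationship in its own record, separate from the per-certificate record, means certificate details are stored once while each usage by a package remains queryable.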