This repository was archived by the owner on Mar 31, 2026. It is now read-only.

PackageArchiveToCsv

This driver writes ZIP archive details to CSV for each .nupkg (NuGet package) on NuGet.org. It records archive-level information as well as ZIP entry (file) level information.

CatalogScanDriverType enum value: PackageArchiveToCsv
Driver implementation: PackageArchiveToCsvDriver
Processing mode: process latest catalog leaf per package ID and version
Cursor dependencies:
    PackageFileToCsv: provides .nupkg hash in table storage
    (transitive) LoadPackageArchive: needed by PackageFileToCsv
Components using driver output: Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data
Temporary storage config: Table Storage:
    CsvRecordTableName (name prefix): holds CSV records before they are added to a CSV blob
    TaskStateTableName (name prefix): tracks completion of CSV blob aggregation
Persistent storage config: Blob Storage:
    PackageArchiveContainerName: contains CSVs for the PackageArchives table
    PackageArchiveEntryContainerName: contains CSVs for the PackageArchiveEntries table
Output CSV tables:
    PackageArchiveEntries
    PackageArchives

Algorithm

For each catalog leaf passed to the driver, the ZIP central directory, size, and HTTP response headers are fetched from Azure Table Storage. These are populated by the LoadPackageArchive driver. Hashes of the whole ZIP are also read from table storage. These are populated by the PackageFileToCsv driver (because it needs the full ZIP content).

The ZIP central directory is enumerated. A single CSV record is produced for each .nupkg and one or more CSV records are created for each entry in the ZIP file.

Detailed ZIP information is included in the produced CSV records to aid in the debugging of esoteric ZIP archive issues.
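
The enumeration step described above can be illustrated with a minimal Python sketch. It is not the driver's implementation: the real driver is PackageArchiveToCsvDriver and reads a previously stored central directory from table storage, whereas this sketch assumes the full .nupkg bytes are in memory. The function names (zip_to_records, to_csv, build_sample_nupkg) and all column names are hypothetical and do not reflect the actual PackageArchives or PackageArchiveEntries schemas.

```python
import csv
import io
import zipfile


def zip_to_records(nupkg_bytes: bytes, package_id: str, version: str):
    """Return (archive_record, entry_records) for one .nupkg.

    Column names are illustrative assumptions, not the real CSV schema.
    """
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        # infolist() is built from the ZIP central directory, in stored order.
        infos = archive.infolist()

        # One archive-level record per .nupkg.
        archive_record = {
            "Id": package_id,
            "Version": version,
            "Size": len(nupkg_bytes),
            "EntryCount": len(infos),
            "Comment": archive.comment.decode("utf-8", errors="replace"),
        }

        # One entry-level record per central directory entry.
        entry_records = [
            {
                "Id": package_id,
                "Version": version,
                "SequenceNumber": index,
                "Path": info.filename,
                "UncompressedSize": info.file_size,
                "CompressedSize": info.compress_size,
                "Crc32": f"{info.CRC:08x}",
                "CompressionMethod": info.compress_type,
            }
            for index, info in enumerate(infos)
        ]
    return archive_record, entry_records


def to_csv(records: list) -> str:
    """Serialize a list of same-shaped dict records to CSV text."""
    if not records:
        return ""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return output.getvalue()


def build_sample_nupkg() -> bytes:
    """Build a tiny in-memory ZIP standing in for a .nupkg (sample data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("Example.Package.nuspec", "<package />")
        archive.writestr("lib/net8.0/Example.dll", b"\x00" * 64)
    return buffer.getvalue()


if __name__ == "__main__":
    archive_record, entry_records = zip_to_records(
        build_sample_nupkg(), "Example.Package", "1.0.0"
    )
    print(to_csv([archive_record]))
    print(to_csv(entry_records))
```

Keeping the archive-level and entry-level records in separate collections mirrors the two output tables, so each can be written to its own CSV.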