This driver reads data directly from the NuGet.org catalog and projects it into CSV. Data that is not directly available in the catalog should not be seen or processed by this driver.
| CatalogScanDriverType enum value | CatalogDataToCsv |
| --- | --- |
| Driver implementation | CatalogDataToCsvDriver |
| Processing mode | process latest catalog leaf per package ID and version |
| Cursor dependencies | V3 package content: blocks on this cursor to align with other drivers |
| Components using driver output | Kusto ingestion via KustoIngestionMessageProcessor, since this driver produces CSV data |
| Temporary storage config | Table Storage: CsvRecordTableName (name prefix) holds CSV records before they are added to a CSV blob; TaskStateTableName (name prefix) tracks completion of CSV blob aggregation |
| Persistent storage config | Blob Storage: CatalogLeafItemContainerName contains CSVs for the CatalogLeafItems table; PackageDeprecationContainerName contains CSVs for the PackageDeprecations table; PackageVulnerabilityContainerName contains CSVs for the PackageVulnerabilities table |
| Output CSV tables | CatalogLeafItems, PackageDeprecations, PackageVulnerabilities |
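The "latest catalog leaf per package ID and version" processing mode can be pictured with a minimal Python sketch. It is not the driver's implementation. The field names `nuget:id`, `nuget:version`, and `commitTimeStamp` follow the shape of catalog page items, while `latest_leaves`, the sample data, and the lowercase identity key are assumptions made for illustration.

```python
from datetime import datetime


def latest_leaves(items):
    """Return the newest catalog item for each (package ID, version) pair."""
    latest = {}
    for item in items:
        # Assumption: lowercasing is enough to compare identities here.
        # Real NuGet version normalization has more rules than this.
        key = (item["nuget:id"].lower(), item["nuget:version"].lower())
        stamp = datetime.fromisoformat(item["commitTimeStamp"])
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, item)
    return [item for _, item in latest.values()]


# Made-up sample items; timestamps are simplified so fromisoformat accepts them.
sample = [
    {"nuget:id": "Example.Package", "nuget:version": "1.0.0",
     "commitTimeStamp": "2021-01-01T00:00:00+00:00"},
    {"nuget:id": "example.package", "nuget:version": "1.0.0",
     "commitTimeStamp": "2021-06-01T00:00:00+00:00"},
    {"nuget:id": "Example.Package", "nuget:version": "2.0.0",
     "commitTimeStamp": "2021-03-01T00:00:00+00:00"},
]

for leaf in latest_leaves(sample):
    print(leaf["nuget:id"], leaf["nuget:version"], leaf["commitTimeStamp"])
```

In this sample, the two items for version 1.0.0 collapse into the one with the later commit timestamp, so only two leaves remain to be processed.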
This driver produces multiple views (or projections) of the catalog data.
- CatalogLeafItems is raw data pulled from a PackageDetails catalog leaf. It contains the historical data found in the catalog.
- PackageDeprecations is the latest deprecation data, nicely formatted.
- PackageVulnerabilities is the latest vulnerability data, nicely formatted.
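As a rough illustration of how one leaf can feed three views, the Python sketch below splits a simplified leaf dictionary into three kinds of rows. Everything in it is an assumption for illustration: the function `project_leaf`, the input keys, and the output column names are hypothetical and do not reflect the real CSV schemas.

```python
def project_leaf(leaf):
    """Split one package details leaf into rows for three hypothetical views."""
    identity = {"id": leaf["id"], "version": leaf["version"]}

    # One row per leaf: this view keeps history, so nothing is collapsed.
    leaf_item = dict(identity, commit_timestamp=leaf["commit_timestamp"])

    # At most one deprecation row per leaf.
    deprecation_rows = []
    deprecation = leaf.get("deprecation")
    if deprecation:
        deprecation_rows.append(dict(
            identity,
            reasons=";".join(deprecation.get("reasons", [])),
            message=deprecation.get("message", ""),
        ))

    # One vulnerability row per advisory listed on the leaf.
    vulnerability_rows = [
        dict(identity, advisory_url=v["advisory_url"], severity=v["severity"])
        for v in leaf.get("vulnerabilities", [])
    ]
    return leaf_item, deprecation_rows, vulnerability_rows


# Made-up leaf with one deprecation and one advisory.
sample_leaf = {
    "id": "Example.Package",
    "version": "1.0.0",
    "commit_timestamp": "2021-06-01T00:00:00+00:00",
    "deprecation": {"reasons": ["Legacy", "Other"], "message": "Use 2.0.0."},
    "vulnerabilities": [
        {"advisory_url": "https://example.test/advisory/1", "severity": "2"},
    ],
}

leaf_item, deprecations, vulnerabilities = project_leaf(sample_leaf)
```

The sketch only shows the per-leaf split. Keeping just the latest deprecation and vulnerability data per package would be a separate step.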
The driver reads the set of package leaf documents that fall within the commit timestamp bounds of the catalog scan, generates CSV record instances in memory, and appends them to a temporary CSV record table in Azure Table Storage. When all of the catalog leaves have been processed, batches of records are pulled from Table Storage into memory and merged into CSV blobs.
When all of the CSV blobs have been updated, the temporary table is deleted, leaving just the updated CSV blobs.
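The append, aggregate, and clean-up flow described above can be sketched in Python with in-memory stand-ins: a list plays the temporary record table and a dictionary plays blob storage. This is a simplified model under stated assumptions, not the driver's code. The bucketing by hashed package ID, the `BUCKET_COUNT` value, the column names, and the rule that a newer record replaces an older one with the same ID and version are all hypothetical.

```python
import csv
import hashlib
import io

FIELDS = ["id", "version", "commit_timestamp"]  # hypothetical columns
BUCKET_COUNT = 4  # hypothetical number of CSV blobs


def bucket_for(package_id):
    """Pick a stable bucket for a package ID (assumed bucketing scheme)."""
    digest = hashlib.sha256(package_id.lower().encode("utf-8")).digest()
    return digest[0] % BUCKET_COUNT


def append_records(temp_table, records):
    """Phase 1: append generated CSV records to the temporary table."""
    for record in records:
        temp_table.append((bucket_for(record["id"]), record))


def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def aggregate(temp_table, blobs):
    """Phase 2: merge temporary records into CSV blobs, then drop the table."""
    by_bucket = {}
    for bucket, record in temp_table:
        by_bucket.setdefault(bucket, []).append(record)
    for bucket, new_rows in by_bucket.items():
        merged = {}
        # Existing rows come first, so new rows with the same key replace them.
        for row in read_csv(blobs.get(bucket, "")) + new_rows:
            merged[(row["id"].lower(), row["version"].lower())] = row
        blobs[bucket] = write_csv(sorted(merged.values(), key=lambda r: (
            r["id"].lower(), r["version"].lower())))
    temp_table.clear()  # stands in for deleting the temporary table


temp_table, blobs = [], {}
append_records(temp_table, [
    {"id": "Example.Package", "version": "1.0.0", "commit_timestamp": "t1"},
])
aggregate(temp_table, blobs)
```

Writing to a temporary table first lets many workers append records independently, while each CSV blob is rewritten only once per scan during aggregation.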