This driver builds a lookup table of all existing package IDs and versions, including a "deleted" flag. The lookup table is meant to be as small as possible and uses compressed MessagePack serialization. Other components use this lookup table to determine whether a package ID or version ever existed on NuGet.org.
| | |
| --- | --- |
| CatalogScanDriverType enum value | BuildVersionSet |
| Driver implementation | BuildVersionSetDriver |
| Processing mode | process just the catalog page |
| Cursor dependencies | V3 package content: blocks on this cursor to align with other drivers |
| Components using driver output | DownloadsToCsvUpdater: ignore invalid download count data<br />OwnersToCsvUpdater: ignore invalid owner data<br />VerifiedPackagesToCsvUpdater: ignore invalid verified packages data |
| Temporary storage config | Table Storage:<br />VersionSetAggregateTableName (name prefix): store known package IDs and versions per catalog page |
| Persistent storage config | Blob Storage:<br />VersionSetContainerName: store the final MessagePack blob |
| Output CSV tables | none |
This driver reads catalog items from the catalog page. It does not need to process individual catalog leaf documents. For each catalog page, the driver appends batches to Azure Table Storage containing the package IDs, normalized package versions, and deleted state of each package.
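The per-page step can be sketched roughly as follows. This is an illustrative Python sketch, not the driver's actual implementation: `rows_from_page` and `normalize_version` are hypothetical names, the item fields are assumed to follow the NuGet V3 catalog page shape (`@type`, `nuget:id`, `nuget:version`, `commitTimeStamp`), and the normalization is a simplified stand-in for NuGet's version normalization rules.

```python
def normalize_version(version: str) -> str:
    """Simplified stand-in for NuGet version normalization."""
    version = version.split("+", 1)[0]            # drop build metadata
    release, sep, prerelease = version.partition("-")
    parts = [int(p) for p in release.split(".")]  # int() removes leading zeros
    parts += [0] * (3 - len(parts))               # pad to major.minor.patch
    if len(parts) == 4 and parts[3] == 0:
        parts = parts[:3]                         # omit a zero fourth part
    return ".".join(str(p) for p in parts) + sep + prerelease


def rows_from_page(page: dict) -> list[dict]:
    """Build one row per catalog item on the page, without fetching leaves."""
    rows = []
    for item in page["items"]:
        rows.append({
            "id": item["nuget:id"],
            "version": normalize_version(item["nuget:version"]),
            "deleted": item["@type"] == "nuget:PackageDelete",
            "commit_timestamp": item["commitTimeStamp"],
        })
    return rows
```

Everything the row needs is already present on the page item, which is why the leaf documents can be skipped.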
After all catalog pages in the catalog scan commit timestamp range are processed, all of the table storage batches are read into memory and deduplicated.
If the version set blob already exists and is older than the incoming data, it is also loaded. The new package information is merged into the existing data, and the version set blob is then updated in Azure Blob Storage.
If the version set blob does not yet exist, a new one is created with just the new package information.
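The deduplicate-and-merge step described above might look like the following Python sketch. It is not the driver's actual code: `merge_version_set` and the row fields are hypothetical, and it assumes that each row carries a commit timestamp, that the row with the latest timestamp wins, and that IDs and versions compare case-insensitively.

```python
from datetime import datetime, timezone
from typing import Optional


def merge_version_set(existing: Optional[dict], rows: list[dict]) -> dict:
    """Merge table rows into a version set: {id_key: {version_key: record}}."""
    # Start from the existing blob's data, or from nothing if no blob exists.
    merged = {k: dict(v) for k, v in (existing or {}).items()}
    for row in rows:
        versions = merged.setdefault(row["id"].lower(), {})
        key = row["version"].lower()
        current = versions.get(key)
        # Deduplicate: keep only the most recent state (assumed rule).
        if current is None or row["commit_timestamp"] > current["commit_timestamp"]:
            versions[key] = {
                "id": row["id"],
                "version": row["version"],
                "deleted": row["deleted"],
                "commit_timestamp": row["commit_timestamp"],
            }
    return merged


t1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2021, 6, 1, tzinfo=timezone.utc)
rows = [
    {"id": "Foo", "version": "1.0.0", "deleted": False, "commit_timestamp": t1},
    {"id": "foo", "version": "1.0.0", "deleted": True, "commit_timestamp": t2},
]
version_set = merge_version_set(None, rows)
print(version_set["foo"]["1.0.0"]["deleted"])  # True
```

Passing `None` as `existing` covers the case where no blob exists yet, so both branches go through the same merge path.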
After this is done, the temporary table is deleted, leaving just the updated version set blob.
The purpose of the version set blob is to provide the data needed to implement the IVersionSet interface.
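As a rough illustration of what a consumer gets from the version set, here is a Python sketch of a lookup wrapper. The class and method names (`VersionSet`, `did_id_ever_exist`, `did_version_ever_exist`, `is_deleted`) are hypothetical and are not the actual members of IVersionSet; the in-memory shape and the case-insensitive matching are also assumptions.

```python
class VersionSet:
    """Hypothetical read-only lookup over {id: {version: deleted}} data."""

    def __init__(self, data: dict):
        # Lower-case keys so lookups ignore casing (assumed behavior).
        self._data = {
            pkg_id.lower(): {v.lower(): deleted for v, deleted in versions.items()}
            for pkg_id, versions in data.items()
        }

    def did_id_ever_exist(self, pkg_id: str) -> bool:
        return pkg_id.lower() in self._data

    def did_version_ever_exist(self, pkg_id: str, version: str) -> bool:
        return version.lower() in self._data.get(pkg_id.lower(), {})

    def is_deleted(self, pkg_id: str, version: str) -> bool:
        # Raises KeyError for an ID or version that never existed.
        return self._data[pkg_id.lower()][version.lower()]


vs = VersionSet({"Example.Package": {"1.0.0": False, "1.1.0-beta": True}})
print(vs.did_id_ever_exist("example.package"))                # True
print(vs.did_version_ever_exist("Example.Package", "2.0.0"))  # False
print(vs.is_deleted("Example.Package", "1.1.0-BETA"))         # True
```

A consumer that needs to ignore invalid data could, under these assumptions, drop any record whose ID fails `did_id_ever_exist`.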