This repository was archived by the owner on Mar 31, 2026. It is now read-only.

LoadBucketedPackage

This driver records the latest catalog leaf document per package ID and version. The packages are partitioned by a bucket number (000 to 999). This allows other components to process some percentage of all packages by performing a range query on the bucket numbers. The primary purpose is to enable the timed reprocess system.

| | |
| --- | --- |
| CatalogScanDriverType enum value | LoadBucketedPackage |
| Driver implementation | Generic FindLatestLeafDriver with a BucketedPackageStorageFactory adapter |
| Processing mode | process just the catalog page |
| Cursor dependencies | V3 package content: blocks on this cursor to align with other drivers |
| Components using driver output | EnqueueCatalogLeafScanDriver: create leaf scan items for buckets of packages |
| Temporary storage config | none |
| Persistent storage config | Table Storage. BucketedPackageTableName: packages partitioned into 1000 buckets to allow processing some percentage of all packages |
| Output CSV tables | none |

Algorithm

Similar to LoadLatestPackageLeaf, this driver takes each catalog leaf item and maintains one Azure Table Storage entity per leaf item (i.e. per package version). In this case, the partition key of the entity is a bucket number (000 to 999), generated from a concatenation of the package ID and version. This is the same bucketing strategy used for the output CSV blobs generated by many other drivers. The row key of the entity is also a concatenation of the package ID and version.

Partitioning by bucket allows a range query on bucket numbers to return some subset of all packages. For example, querying all entities with partition key 042 returns roughly 0.1% (1 of 1000) of all packages. A query for entities with partition key greater than or equal to 023 and less than or equal to 042 covers 20 buckets, or roughly 2.0% (20 of 1000) of all packages.
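
The bucketing and range query described above can be sketched as follows. This is only an illustration: the hash function (SHA-256), the lowercasing, the `/` separator, and all function names are assumptions made for the example, not the driver's actual implementation. The `ge`/`le` operators are the standard OData comparison operators accepted by Azure Table Storage filters.

```python
import hashlib

BUCKET_COUNT = 1000  # buckets 000 to 999, as described above


def bucket_key(package_id: str, version: str) -> str:
    """Map a package ID and version to a zero-padded bucket key.

    Assumption: the real driver may use a different hash, casing,
    or separator. Only the 000-999 range comes from the document.
    """
    identity = f"{package_id}/{version}".lower()
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKET_COUNT
    return f"{bucket:03d}"


def partition_range_filter(first: int, last: int) -> str:
    """Build an OData filter selecting an inclusive range of buckets.

    Zero-padding makes the string comparison agree with numeric order.
    """
    return f"PartitionKey ge '{first:03d}' and PartitionKey le '{last:03d}'"


def buckets_covered(first: int, last: int) -> int:
    """Number of buckets in the inclusive range [first, last]."""
    return last - first + 1


if __name__ == "__main__":
    print(bucket_key("Newtonsoft.Json", "13.0.1"))
    print(partition_range_filter(23, 42))
    covered = buckets_covered(23, 42)
    print(f"{covered} of {BUCKET_COUNT} buckets = {covered / BUCKET_COUNT:.1%}")
```

Because the keys are fixed-width strings, a lexicographic range on the partition key selects exactly the intended numeric range of buckets.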

This allows other components to incrementally process all packages on NuGet.org in evenly sized, "bite size" chunks. Processing packages chronologically or lexicographically would split the work more unevenly.

The core logic of this driver is very simple: it only needs to map an ICatalogLeafItem to a BucketedPackage entity and then plug into the generic FindLatestLeafDriver logic, which is also used by several other drivers and by the catalog index scanning logic itself.
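
As a rough sketch of the "latest leaf per package ID and version" idea, the snippet below keeps only the most recent leaf item for each package identity. The `LeafItem` type, its field names, and the use of the commit timestamp as the ordering key are assumptions for illustration; they are not taken from the FindLatestLeafDriver source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LeafItem:
    """Hypothetical stand-in for a catalog leaf item."""
    package_id: str
    version: str
    commit_timestamp: datetime
    url: str


def latest_leaves(items: Iterable[LeafItem]) -> Dict[Tuple[str, str], LeafItem]:
    """Keep the leaf with the greatest commit timestamp per (ID, version)."""
    latest: Dict[Tuple[str, str], LeafItem] = {}
    for item in items:
        # Assumption: package identity is compared case-insensitively.
        key = (item.package_id.lower(), item.version.lower())
        current = latest.get(key)
        if current is None or item.commit_timestamp > current.commit_timestamp:
            latest[key] = item
    return latest


if __name__ == "__main__":
    items = [
        LeafItem("Example.Package", "1.0.0", datetime(2020, 1, 1), "leaf-a"),
        LeafItem("example.package", "1.0.0", datetime(2021, 6, 1), "leaf-b"),
        LeafItem("Example.Package", "2.0.0", datetime(2020, 3, 1), "leaf-c"),
    ]
    for key, leaf in latest_leaves(items).items():
        print(key, leaf.url)
```

In the driver itself, each surviving leaf would then be written as one table entity, with the bucket number as the partition key.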