Subsystem: Search 🔎
This job updates the Azure Search indexes used by the search service.
Catalog2AzureSearch uses the catalog resource to track package events, like uploads and deletes. It also uses the package metadata resource to fetch packages' metadata. Finally, it tracks the latest versions of packages using the version list resource.
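The Python sketch below shows how catalog items map to package events. It reads one catalog page and lists an event for each item. It assumes the page shape described in the NuGet V3 catalog documentation:

- nuget:PackageDetails items represent uploads and metadata changes, such as unlisting.
- nuget:PackageDelete items represent deletes.

The sample page, the package_events function, and the event names are hypothetical. None of them are taken from Catalog2AzureSearch.

```python
import json

# A trimmed catalog page in the documented NuGet V3 catalog shape.
# The URLs, package ID, versions, and timestamps are made up.
SAMPLE_PAGE = json.loads("""
{
  "items": [
    {"@id": "https://example.invalid/catalog0/data/leaf1.json",
     "@type": "nuget:PackageDetails",
     "commitTimeStamp": "2024-01-01T00:00:00Z",
     "nuget:id": "Contoso.Lib", "nuget:version": "1.0.0"},
    {"@id": "https://example.invalid/catalog0/data/leaf2.json",
     "@type": "nuget:PackageDelete",
     "commitTimeStamp": "2024-01-02T00:00:00Z",
     "nuget:id": "Contoso.Lib", "nuget:version": "0.9.0"}
  ]
}
""")

# Hypothetical event names for each catalog item type.
EVENT_BY_TYPE = {
    "nuget:PackageDetails": "upload_or_update",  # new package or metadata edit
    "nuget:PackageDelete": "delete",
}

def package_events(page):
    """List (package ID, version, event) for every item in a catalog page."""
    events = []
    for item in page["items"]:
        event = EVENT_BY_TYPE.get(item["@type"])
        if event is None:
            raise ValueError(f"unexpected catalog item type: {item['@type']}")
        events.append((item["nuget:id"], item["nuget:version"], event))
    return events

for event in package_events(SAMPLE_PAGE):
    print(event)
# ('Contoso.Lib', '1.0.0', 'upload_or_update')
# ('Contoso.Lib', '0.9.0', 'delete')
```

A single package ID can show up in several items on the same page. For this reason, the job works per package ID rather than per item.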
You can run this job using:
NuGet.Jobs.Catalog2AzureSearch.exe -Configuration path\to\your\settings.json
This job is a singleton. Only a single instance of the job should be running per Azure Search resource.
If you are on the nuget.org team, the easiest way to run the tool is to use the DEV environment resources:
- Install the certificate used to authenticate as our client AAD app registration into your CurrentUser certificate store.
- Clone our internal NuGetDeployment repository.
- Update your cloned copy of the DEV Catalog2AzureSearch appsettings.json file to authenticate using the certificate you installed:
{
...
"KeyVault_VaultName": "PLACEHOLDER",
"KeyVault_ClientId": "PLACEHOLDER",
"KeyVault_CertificateThumbprint": "PLACEHOLDER",
"KeyVault_ValidateCertificate": true,
"KeyVault_StoreName": "My",
"KeyVault_StoreLocation": "CurrentUser"
...
}
- Update the -Configuration CLI option to point to the DEV Azure Search settings: NuGetDeployment/src/Jobs/NuGet.Jobs.Cloud/Jobs/Catalog2AzureSearch/DEV/northcentralus/a/appsettings.json
As an alternative to using nuget.org's DEV resources, you can run this tool using your personal Azure resources.
First, run the Db2AzureSearch tool.
Once you've created your Azure resources, you can create your settings.json file. There are a few PLACEHOLDER values you will need to fill in yourself:
- The GalleryDb:ConnectionString setting is the connection string to your Gallery DB.
- The SearchServiceName setting is the name of your Azure Search resource. For example, use the name foo-bar for the Azure Search service with URL https://foo-bar.search.windows.net.
- The SearchServiceApiKey setting is an admin key that has write permissions to the Azure Search resource. Make sure the Azure Search resource you're connecting to has API keys enabled. Keys can be enabled alongside managed identity ("RBAC") access or with managed identity authentication disabled.
- The StorageConnectionString setting is the connection string to your Azure Blob Storage account.
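Before running the job, it can help to check two things:

- No PLACEHOLDER values remain.
- SearchServiceName holds only the service name, not the full URL.

The Python sketch below is a hypothetical helper, not part of NuGet.Jobs. It has two functions:

- find_placeholders reports leftover placeholders. It uses the same colon-separated key paths as GalleryDb:ConnectionString.
- service_name_from_url derives the service name from a *.search.windows.net URL.

```python
import json
from urllib.parse import urlparse

SEARCH_HOST_SUFFIX = ".search.windows.net"

def find_placeholders(node, path=""):
    """Return colon-separated paths of every value still set to "PLACEHOLDER"."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_placeholders(value, f"{path}:{key}" if path else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}:{index}"))
    elif node == "PLACEHOLDER":
        found.append(path)
    return found

def service_name_from_url(url):
    """Turn https://foo-bar.search.windows.net into foo-bar."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SEARCH_HOST_SUFFIX):
        raise ValueError(f"not an Azure Search URL: {url}")
    return host[: -len(SEARCH_HOST_SUFFIX)]

settings = json.loads(
    '{"GalleryDb": {"ConnectionString": "PLACEHOLDER"},'
    ' "Catalog2AzureSearch": {"SearchServiceName": "foo-bar"}}'
)
print(find_placeholders(settings))  # ['GalleryDb:ConnectionString']
print(service_name_from_url("https://foo-bar.search.windows.net"))  # foo-bar
```

The full settings.json template to fill in is: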
{
"GalleryDb": {
"ConnectionString": "PLACEHOLDER"
},
"Catalog2AzureSearch": {
"AzureSearchBatchSize": 1000,
"MaxConcurrentBatches": 4,
"MaxConcurrentVersionListWriters": 8,
"SearchServiceName": "PLACEHOLDER",
"SearchServiceApiKey": "PLACEHOLDER",
"SearchIndexName": "search-000",
"HijackIndexName": "hijack-000",
"StorageConnectionString": "PLACEHOLDER",
"StorageContainer": "v3-azuresearch-000",
"StoragePath": "",
"GalleryBaseUrl": "https://www.nuget.org/",
"FlatContainerBaseUrl": "https://api.nuget.org/",
"FlatContainerContainerName": "v3-flatcontainer",
"AllIconsInFlatContainer": false,
"Source": "https://api.nuget.org/v3/catalog0/index.json",
"HttpClientTimeout": "00:10:00",
"DependencyCursorUrls": [
"https://nugetgallery.blob.core.windows.net/v3-registration5-semver1/cursor.json"
],
"RegistrationsBaseUrl": "https://api.nuget.org/v3/registration5-gz-semver2/"
}
}
At a high level, here's how Catalog2AzureSearch works:
- Load its catalog cursor from Azure Blob Storage
- Fetch catalog leaves that are newer than the catalog cursor value
- For each package ID in the catalog leaves:
  - Fetch the version list resource for the package ID
  - Apply the package's catalog leaves to the version list resource to determine which search documents need to be updated. In some cases, use the Package Metadata resource to fetch additional package metadata and catalog leaves
  - Generate Azure Search actions to update the indexes
- Push all generated Azure Search index actions
- Save the catalog cursor to Azure Blob Storage
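The steps above can be sketched in Python as a single pass. The sketch assumes in-memory stand-ins: a dict for Azure Blob Storage, a list of leaves for the catalog, a dict of version-to-listed flags for the version list resource, and a list for the Azure Search indexes.

These parts are hypothetical simplifications:

- The CatalogLeaf type and the run_once function.
- The ("upsert", ...) and ("delete", ...) action tuples.
- The model of one search document per package ID.

The sketch also skips the Package Metadata lookups and the hijack index. It only orders plain numeric versions such as 1.2.3.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CatalogLeaf:
    """Hypothetical, trimmed-down catalog leaf."""
    package_id: str
    version: str  # assumed to be a plain numeric version such as "1.2.3"
    commit_timestamp: datetime
    deleted: bool = False
    listed: bool = True

def version_key(version):
    # Numeric ordering only; real NuGet versions also have prerelease labels.
    return tuple(int(part) for part in version.split("."))

def latest_listed(versions):
    """Highest listed version in a version list, or None if there is none."""
    listed = [v for v, is_listed in versions.items() if is_listed]
    return max(listed, key=version_key, default=None)

def run_once(storage, catalog, version_lists, pushed_actions):
    # Load the catalog cursor (a dict stands in for Azure Blob Storage).
    cursor = storage.get("cursor", datetime.min.replace(tzinfo=timezone.utc))

    # Fetch catalog leaves newer than the cursor, oldest first.
    new_leaves = sorted(
        (leaf for leaf in catalog if leaf.commit_timestamp > cursor),
        key=lambda leaf: leaf.commit_timestamp,
    )
    if not new_leaves:
        return

    # Group leaves by package ID; NuGet package IDs are case-insensitive.
    leaves_by_id = {}
    for leaf in new_leaves:
        leaves_by_id.setdefault(leaf.package_id.lower(), []).append(leaf)

    actions = []
    for package_id, leaves in leaves_by_id.items():
        # Fetch the version list for this ID (version -> listed flag).
        versions = version_lists.setdefault(package_id, {})
        before = latest_listed(versions)

        # Apply the package's leaves to the version list, in commit order.
        for leaf in leaves:
            if leaf.deleted:
                versions.pop(leaf.version, None)
            else:
                versions[leaf.version] = leaf.listed
        after = latest_listed(versions)

        # Generate actions for this ID's single (hypothetical) search document.
        if after is not None:
            actions.append(("upsert", package_id, after))
        elif before is not None:
            actions.append(("delete", package_id))

    # Push all actions first, then save the cursor.
    pushed_actions.extend(actions)
    storage["cursor"] = new_leaves[-1].commit_timestamp

utc = timezone.utc
catalog = [
    CatalogLeaf("Contoso.Lib", "1.0.0", datetime(2024, 1, 1, tzinfo=utc)),
    CatalogLeaf("Contoso.Lib", "2.0.0", datetime(2024, 1, 2, tzinfo=utc)),
    CatalogLeaf("Contoso.Lib", "2.0.0", datetime(2024, 1, 3, tzinfo=utc), deleted=True),
]
storage, version_lists, pushed = {}, {}, []
run_once(storage, catalog, version_lists, pushed)
print(pushed)  # [('upsert', 'contoso.lib', '1.0.0')]
```

In this sketch, the cursor advances only after the actions are pushed. A failure between the two steps would therefore make the next run reprocess the same leaves rather than skip them. This matches the step list above, which saves the cursor last.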