Subsystem: Search 🔎
This tool creates the resources needed to run the NuGet search service. These resources can be updated using the Catalog2AzureSearch and Auxiliary2AzureSearch jobs.
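Specifically, this tool creates:
- The Azure Search indexes (the search index and the hijack index)
- The search storage container, along with the search auxiliary files and the catalog cursor written to it
- The search version list resource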
You can run this job using:
```
NuGet.Jobs.Db2AzureSearch.exe -Configuration path\to\your\settings.json
```

The easiest way to run the tool if you are on the nuget.org team is to use the DEV environment resources:
- Install the certificate used to authenticate as our client AAD app registration into your `CurrentUser` certificate store.
- Clone our internal `NuGetDeployment` repository.
- Update your cloned copy of the DEV Db2AzureSearch `appsettings.json` file to authenticate using the certificate you installed:
    ```json
    {
        ...
        "KeyVault_VaultName": "PLACEHOLDER",
        "KeyVault_ClientId": "PLACEHOLDER",
        "KeyVault_CertificateThumbprint": "PLACEHOLDER",
        "KeyVault_ValidateCertificate": true,
        "KeyVault_StoreName": "My",
        "KeyVault_StoreLocation": "CurrentUser"
        ...
    }
    ```
- Update the `-Configuration` CLI option to point to the DEV Azure Search settings: `NuGetDeployment/src/Jobs/NuGet.Jobs.Cloud/Jobs/Db2AzureSearch/DEV/northcentralus/appsettings.json`
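If you would rather script the `appsettings.json` update than edit the file by hand, the following is a minimal sketch. `add_certificate_settings` is a hypothetical helper, not part of NuGet.Jobs or `NuGetDeployment`. It assumes the file is plain JSON without comments, and because it rewrites the file with `json.dumps`, the original formatting is not preserved. You supply the vault name, client ID, and thumbprint yourself.

```python
import json
from pathlib import Path


def add_certificate_settings(settings_path: Path, vault_name: str,
                             client_id: str, thumbprint: str) -> dict:
    """Hypothetical helper: merge the KeyVault_* certificate settings into a JSON settings file."""
    # Assumption: the file is strict JSON (no comments or trailing commas).
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.update({
        "KeyVault_VaultName": vault_name,
        "KeyVault_ClientId": client_id,
        # Remove any whitespace copied along with the thumbprint.
        "KeyVault_CertificateThumbprint": "".join(thumbprint.split()),
        "KeyVault_ValidateCertificate": True,
        # Look the certificate up in the CurrentUser\My store where you installed it.
        "KeyVault_StoreName": "My",
        "KeyVault_StoreLocation": "CurrentUser",
    })
    settings_path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings
```

For example, call `add_certificate_settings(Path("path/to/DEV/appsettings.json"), "<vault name>", "<client id>", "<thumbprint>")`. Existing keys in the file are kept; only the `KeyVault_*` entries are added or overwritten.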
As an alternative to using nuget.org's DEV resources, you can also run this tool using your personal Azure resources. You will need:
- Gallery DB. This can be initialized locally using the NuGetGallery project.
- Azure Search. You can create your own Azure Search resource using the Azure Portal.
- Azure Blob Storage. You can create your own Azure Blob Storage account using the Azure Portal.
In your Azure Blob Storage account, you will need to create a container named `ng-search-data` and upload the following files:
- `downloads.v1.json` with content `[]`
- `ExcludedPackages.v1.json` with content `[]`

You will also need to create a second container (if it does not already exist) named `content` and upload the following file:
- `flags.json` with content `{}`
If you are on the nuget.org team, you can copy these files from the PROD auxiliary files container.
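If you are creating these files yourself, a small script can generate them locally so that you can upload them in one pass. The sketch below is hypothetical. It writes each file under a folder named after its target container (`ng-search-data/` and `content/`), and you then upload each folder to the matching container with a tool of your choice, such as Azure Storage Explorer or `az storage blob upload-batch`.

```python
import json
from pathlib import Path

# Container name -> {blob name: initial JSON content}, matching the lists above.
AUXILIARY_FILES = {
    "ng-search-data": {
        "downloads.v1.json": [],
        "ExcludedPackages.v1.json": [],
    },
    "content": {
        "flags.json": {},
    },
}


def write_auxiliary_files(root: Path) -> list[Path]:
    """Write each empty auxiliary file to root/<container>/<blob> and return the paths written."""
    written = []
    for container, blobs in AUXILIARY_FILES.items():
        container_dir = root / container
        container_dir.mkdir(parents=True, exist_ok=True)
        for blob_name, content in blobs.items():
            path = container_dir / blob_name
            # json.dumps([]) is "[]" and json.dumps({}) is "{}".
            path.write_text(json.dumps(content), encoding="utf-8")
            written.append(path)
    return written
```

For example, `write_auxiliary_files(Path("aux-files"))` produces `aux-files/ng-search-data/downloads.v1.json`, `aux-files/ng-search-data/ExcludedPackages.v1.json`, and `aux-files/content/flags.json`.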
Once you've created your Azure resources, you can create your `settings.json` file. There are a few `PLACEHOLDER` values you will need to fill in yourself:
- The `GalleryDb:ConnectionString` setting is the connection string to your Gallery DB.
- The `SearchServiceName` setting is the name of your Azure Search resource. For example, use the name `foo-bar` for the Azure Search service with URL `https://foo-bar.search.windows.net`.
- The `SearchServiceApiKey` setting is an admin key that has write permissions to the Azure Search resource. Make sure the Azure Search resource you're connecting to has API key authentication enabled, either on its own or alongside role-based access control (RBAC) for managed identities.
- The `StorageConnectionString` and `AuxiliaryDataStorageConnectionString` settings are both the connection string to your Azure Blob Storage account.
- The `DownloadsV1JsonUrl` setting is the URL to the `downloads.v1.json` file above. Make sure the file can be downloaded from that URL without authentication.
- The `FeatureFlags:ConnectionString` setting is the connection string to your Azure Blob Storage account.
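After filling in these values, you can check that none were missed. The following is a hypothetical helper, not part of NuGet.Jobs. `find_placeholders` walks a parsed settings file and reports every value still equal to `PLACEHOLDER`, using the same colon-separated key paths (such as `GalleryDb:ConnectionString`) that the list above uses. `search_service_name` derives the `SearchServiceName` value from a service URL, following the `foo-bar` example. The sketch assumes `settings.json` is plain JSON without comments.

```python
import json
from urllib.parse import urlparse

PLACEHOLDER = "PLACEHOLDER"
SEARCH_HOST_SUFFIX = ".search.windows.net"


def find_placeholders(settings, path=""):
    """Return the colon-separated key paths of every value still set to PLACEHOLDER."""
    found = []
    if isinstance(settings, dict):
        for key, value in settings.items():
            child = f"{path}:{key}" if path else key
            found.extend(find_placeholders(value, child))
    elif isinstance(settings, list):
        for index, value in enumerate(settings):
            found.extend(find_placeholders(value, f"{path}:{index}"))
    elif settings == PLACEHOLDER:
        found.append(path)
    return found


def check_settings_file(file_path):
    """Load a JSON settings file and list the settings that still need real values."""
    with open(file_path, encoding="utf-8") as f:
        return find_placeholders(json.load(f))


def search_service_name(url):
    """Derive SearchServiceName from a URL such as https://foo-bar.search.windows.net."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SEARCH_HOST_SUFFIX):
        raise ValueError(f"not an Azure Search service URL: {url}")
    return host[: -len(SEARCH_HOST_SUFFIX)]
```

For example, `check_settings_file("settings.json")` returns an empty list once every `PLACEHOLDER` has been replaced, and `search_service_name("https://foo-bar.search.windows.net")` returns `"foo-bar"`.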
Use the following as a template for your `settings.json`:

```json
{
    "GalleryDb": {
        "ConnectionString": "PLACEHOLDER"
    },
    "Db2AzureSearch": {
        "AzureSearchBatchSize": 1000,
        "MaxConcurrentBatches": 4,
        "MaxConcurrentVersionListWriters": 8,
        "SearchServiceName": "PLACEHOLDER",
        "SearchServiceApiKey": "PLACEHOLDER",
        "SearchIndexName": "search-000",
        "HijackIndexName": "hijack-000",
        "StorageConnectionString": "PLACEHOLDER",
        "StorageContainer": "v3-azuresearch-000",
        "StoragePath": "",
        "GalleryBaseUrl": "https://www.nuget.org/",
        "AuxiliaryDataStorageConnectionString": "PLACEHOLDER",
        "AuxiliaryDataStorageContainer": "ng-search-data",
        "AuxiliaryDataStorageExcludedPackagesPath": "ExcludedPackages.v1.json",
        "DownloadsV1JsonUrl": "PLACEHOLDER",
        "FlatContainerBaseUrl": "https://api.nuget.org/",
        "FlatContainerContainerName": "v3-flatcontainer",
        "AllIconsInFlatContainer": false,
        "DatabaseBatchSize": 10000,
        "CatalogIndexUrl": "https://api.nuget.org/v3/catalog0/index.json",
        "EnablePopularityTransfers": true,
        "Scoring": {
            "FieldWeights": {
                "PackageId": 9,
                "TokenizedPackageId": 9,
                "Tags": 5
            },
            "DownloadScoreBoost": 30000,
            "PopularityTransfer": 0.99
        }
    },
    "FeatureFlags": {
        "ConnectionString": "PLACEHOLDER"
    }
}
```

For local development and fast iteration, you can run the job using package data from the NuGet.Insights Kusto tables.
You can use the following configuration as a starting point:
```json
{
    "Db2AzureSearch": {
        "AzureSearchBatchSize": 1000,
        "MaxConcurrentBatches": 4,
        "MaxConcurrentVersionListWriters": 8,
        "SearchServiceName": "<AZURE AI SEARCH RESOURCE NAME>",
        "SearchServiceUseDefaultCredential": true,
        "SearchIndexName": "search-001",
        "HijackIndexName": "hijack-001",
        "StorageConnectionString": "<AZURE STORAGE CONNECTION STRING>",
        "StorageContainer": "v3-azuresearch-001",
        "StoragePath": "",
        "GalleryBaseUrl": "https://www.nuget.org/",
        "FlatContainerBaseUrl": "https://api.nuget.org/",
        "FlatContainerContainerName": "v3-flatcontainer",
        "AllIconsInFlatContainer": false,
        "EnablePopularityTransfers": true,
        "Scoring": {
            "FieldWeights": {
                "PackageId": 9,
                "TokenizedPackageId": 9,
                "Tags": 5
            },
            "DownloadScoreBoost": 30000,
            "PopularityTransfer": 0.99
        },
        "Development": {
            "ReplaceContainersAndIndexes": true,
            "DisableVersionListWriters": false,
            "KustoConnectionString": "https://<KUSTO CLUSTER NAME>.kusto.windows.net",
            "KustoDatabaseName": "<KUSTO DATABASE NAME>",
            "KustoTableNameFormat": "Ni{0}",
            "KustoTopPackageCount": 100000,
            "KustoOnlyLatestPackages": true
        }
    },
    "FeatureFlags": {
        "ConnectionString": "<FEATURE FLAGS AZURE STORAGE CONNECTION STRING>"
    },
    "KeyVault_VaultName": "<KEY VAULT NAME, IF NEEDED>",
    "KeyVault_UseManagedIdentity": true
}
```

At a high level, here's how Db2AzureSearch works:
- Create the Azure Search indexes
- Create the Azure Blob storage container for the search auxiliary files
- Capture the catalog's cursor
- Load initial data from Gallery DB and statistics auxiliary files
- Process package metadata in batches
    - Load a chunk of packages from Gallery DB
    - Generate and upload documents to the Azure Search indexes
    - Update the search version list resource
- Write the search auxiliary files to search storage
- Write the catalog's cursor to search storage
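The ordering of these steps can be sketched as follows. This is a hypothetical Python outline, not the job's C# implementation. `run_db2azuresearch`, `chunks`, and the `step` callback are illustrative names. The sketch makes several simplifying assumptions: each package counts as a single document, `DatabaseBatchSize` and `AzureSearchBatchSize` map onto the outer and inner loops based on their names, and batches run one after another, although `MaxConcurrentBatches` suggests the real job uploads several at once.

```python
from typing import Callable, Iterator, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_db2azuresearch(
    package_ids: Sequence[str],
    database_batch_size: int,
    azure_search_batch_size: int,
    step: Callable[[str], None],
) -> None:
    """Hypothetical outline of the steps above; `step` stands in for the real work."""
    step("create Azure Search indexes")
    step("create auxiliary files container")
    # Captured before any package is read. Presumably this lets later
    # catalog processing pick up changes made while this job runs.
    step("capture catalog cursor")
    step("load initial data")
    for db_chunk in chunks(package_ids, database_batch_size):
        step(f"load {len(db_chunk)} packages from Gallery DB")
        # Simplification: one document per package.
        for batch in chunks(db_chunk, azure_search_batch_size):
            step(f"upload {len(batch)} documents")
        step(f"update version lists for {len(db_chunk)} packages")
    step("write search auxiliary files")
    # Written last, so the saved cursor only becomes visible after the data it covers.
    step("write catalog cursor")
```

For example, with the sample values `DatabaseBatchSize: 10000` and `AzureSearchBatchSize: 1000`, each database chunk in this sketch is split into ten upload batches. Passing `print` as `step` prints the sequence of steps.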