[SDKs] Self-heal catalog cache on BYOM cache-miss#760
Open
bmehta001 wants to merge 11 commits into
Open
Conversation
The v1 SDKs (js, python, cs, rust) cache the catalog in memory with a 6-hour TTL. When a user manually drops a model directory (BYOM) into the cache while the SDK is running, the SDK's in-memory map is stale: Core returns the BYOM id from `get_cached_models` IPC and `get_model_list`, but the SDK filters it out because its `modelIdToModelVariant` map does not yet know the id. Add a force-refresh path on cache-miss in the four user-facing entry points: - get_cached_models / get_loaded_models: when any id in the fresh list returned by Core does not resolve against the local map, force a catalog refresh and re-resolve only the unresolved ids. - get_model(alias) / get_model_variant(id): on miss, force a refresh and retry the lookup once. No new public APIs and no new IPC commands. Relies on the FLCore `GetModelsAsync` override that surfaces BYOM stubs. Validated end-to-end via the Python SDK against a real BYOM directory dropped into the cache at runtime: get_cached_models now returns the BYOM, get_model resolves the alias, and Core's view is consistent. Co-authored-by: Copilot <[email protected]>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Contributor
There was a problem hiding this comment.
Pull request overview
Adds “self-healing” catalog refresh behavior across the Rust, Python, JS, and C# SDKs so that when the on-disk BYOM cache changes while the SDK is running (and Core returns new IDs), the SDK refreshes its in-memory catalog on cache-miss and retries resolution.
Changes:
- Add a forced refresh + retry path to
get_model(alias)/get_model_variant(id)when the initial lookup misses. - Add forced refresh-on-miss when resolving IDs returned by Core for
get_cached_models/get_loaded_models, via new helper resolvers. - Add
force/force: trueparameters to the various “update models” caching methods to bypass the 6-hour TTL when self-healing.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sdk/rust/src/catalog.rs | Adds forced-refresh retry on alias/id miss and a helper to resolve cached/loaded model IDs. |
| sdk/python/src/catalog.py | Adds force refresh support and self-heal retry logic for lookups and cached/loaded model resolution. |
| sdk/js/src/catalog.ts | Adds force refresh support and self-heal retry logic plus ID resolution helper for cached/loaded APIs. |
| sdk/cs/src/Catalog.cs | Adds forced-refresh retry logic and shared ID-resolution helper; updates UpdateModels signature to support forced refresh. |
PR #760 review feedback: the previous resolve_model_ids implementation in all four SDKs (Rust, Python, JS, C#) split the input model_ids into `resolved` and `unresolved` lists in a single pass, then on cache-miss forced a refresh and appended the newly-resolvable IDs after the first batch. This reordered the output relative to the input model_ids list, which is a behavioral regression vs. pre-PR get_cached_models / get_loaded_models that returned variants in Core's ID order (minus unknowns). Switch all four SDKs to the same shape: 1. First pass: check whether any id is missing (short-circuit on first miss); this needs no scratch storage. 2. If anything is missing, force-refresh once. 3. Final pass: iterate model_ids in input order and collect variants present in the catalog (filter-map style). The single final pass preserves input ordering on both the warm path (no refresh) and the self-heal path (after refresh). The JS dedupe behavior via Set<IModel> is preserved. Side benefit: -32 net lines across the four files; the implementation now matches the structure the reviewer suggested. Co-authored-by: Copilot <[email protected]>
Addresses PR #760 review comments: - Python get_model/get_model_variant and C# GetModelImplAsync/GetModelVariantImplAsync now short-circuit on null/empty/whitespace input, mirroring the validation already present in Rust and JS. Without this, those entry points triggered the wasteful self-heal forced refresh just to return None on bad input. - Adds Python unit tests for self-heal on get_model and get_cached_models (mock _MockCoreInterop returns no models on warm fetch, then BYOM on forced refresh) and a no-refresh-on-empty-input test that warms the catalog and asserts the forced refresh is not triggered by '', ' ', or None. - Adds matching JS self-heal unit tests for getModel/getModelVariant and getCachedModels using the same mock pattern. Co-authored-by: Copilot <[email protected]>
Preserve wrapper identity and per-Model variant selection across forced catalog refreshes. The old clear-and-rebuild pattern destroyed Model and ModelVariant wrapper identity inside the catalog on every refresh and silently reverted any explicit select_variant() choice back to the first-cached default. Both became noticeable when the BYOM self-heal path made force=True refreshes fire much more often. * ModelVariant._refresh_info updates the cached ModelInfo snapshot in place so held wrappers keep surfacing fresh data (notably cached) without identity churn. * Model._refresh_variants swaps the variant list while preserving the currently selected variant if its id is still present, falling back to first-cached then first variant otherwise. * Catalog._update_models now drops stale ids, refreshes existing wrappers in place, and adds new ids — never tearing down references the caller already holds. Adds TestIncrementalRefresh with 6 tests pinning down identity preservation, selection survival, in-place info refresh, stale-id eviction, new-id insertion, and the selection-fallback rule. All 19 test_catalog.py tests pass. Co-authored-by: Copilot <[email protected]>
Mirrors the Python incremental-refresh contract: preserve IModel / ModelVariant identity and per-Model variant selection across forced catalog refreshes, where the old clear-and-rebuild pattern destroyed wrapper identity and silently reset selection on every refresh. * ModelVariant._refreshInfo updates the cached ModelInfo snapshot in place so held wrappers surface fresh data (notably `cached`) without identity churn. * Model._refreshVariants swaps the variant list while preserving the selected variant if its id is still present, falling back to first-cached then first variant otherwise. Also updates addVariant to read `variant.info.cached` (snapshot) instead of the `variant.isCached` getter (which does an IPC `get_cached_models` call per variant added) — brings JS in line with Python/C#/Rust and avoids an N-IPC perf cliff during catalog rebuild. * Catalog.updateModels rewritten to drop stale ids, refresh existing wrappers in place, and add new ones — never tearing down references the caller already holds. * New runRefreshExclusive helper fixes a force-promise-swallowing bug: when force=true arrives during an in-flight non-forced refresh, the forced refresh is now chained via .then() so it actually re-fetches. Previously a forced refresh raced with a non-forced one would silently no-op, defeating the BYOM self-heal path. Adds 7 tests in the `Incremental refresh` describe block covering identity preservation, selection survival, in-place info refresh, stale-id eviction, new-id insertion, the selection-fallback rule, and the force-swallow regression. All 18 Catalog Tests pass. Co-authored-by: Copilot <[email protected]>
Mirrors the Python and JS incremental-refresh contract: preserve IModel / ModelVariant identity and per-Model variant selection across forced catalog refreshes. The old clear-and-rebuild pattern destroyed wrapper identity inside the catalog on every refresh and silently reset SelectVariant() back to the first-cached default — both became noticeable when the BYOM self-heal path made force-true refreshes fire much more often. Addresses the pre-existing `// TODO: Do we need to clear this out, or can we just add new models?` comment in Catalog.UpdateModels. * ModelVariant.RefreshInfo updates the cached ModelInfo snapshot in place (Info and Version made privately settable); reference assignment on Info is atomic in CLR so concurrent readers observe either the old or new snapshot, never a torn intermediate. * Model.RefreshVariants swaps the backing variant list by reference (also atomic) while preserving the selected variant if its id is still present, falling back to first-cached then first variant. AddVariant also reworked to build-then-swap so concurrent readers iterating Variants never observe a torn collection mid-mutation. * Catalog.UpdateModels rewritten to drop stale ids, refresh existing wrappers in place, and add new ones — never tearing down references the caller already holds. Adds CatalogIncrementalRefreshTests with 6 TUnit tests covering identity preservation, selection survival, in-place info refresh, stale-id eviction, new-id insertion, and the selection-fallback rule. Force refreshes are triggered via InvalidateCache + ListModelsAsync (equivalent to UpdateModels(force: true) — bypasses the TTL gate, then re-runs the rebuild). All 6 new tests pass on net8.0; full suite 75/81 (6 pre-existing AudioClient whisper-not-on-disk failures unrelated). Build clean on netstandard2.0 + net8.0. Co-authored-by: Copilot <[email protected]>
Lighter Rust counterpart to the Python / JS / C# incremental-refresh work. The old apply_model_list path allocated fresh Arc<Model> on every refresh, churning wrapper identity in models_by_alias and variants_by_id and (for aliased models) silently resetting the AtomicUsize selection back to 0 — both became noticeable when the BYOM self-heal path made forced refreshes fire much more often. Rust's Arc<Model> + Vec<ModelVariant> ownership makes in-place ModelInfo refresh harder than the other SDKs, so we take a lighter approach instead: keep the build-fresh path, but before swapping under the data lock, replace any fresh Arc<Model> whose `(id, cached)` fingerprint matches the existing entry with the existing Arc. Since per-id metadata other than `cached` is treated as immutable by the catalog contract, fingerprint equality guarantees the reused Arc still surfaces the correct ModelInfo. The compare-and-swap is performed under a single data-lock acquisition so the merge is computed against the same old snapshot it replaces. * New alias_fingerprint helper: ordered `Vec<(id, cached)>` of every variant in the alias group. * New variant_fingerprint helper: `(id, cached)` of a single variant-Model. * apply_model_list: post-build pass swaps fresh Arc for old Arc on fingerprint match (both maps). Identity preservation: held `Arc<Model>` references survive refreshes unchanged in the common case (no variants added/removed/reordered, no cached flip), keeping any explicit select_variant choice intact via the inner AtomicUsize. cargo build, cargo clippy --all-targets -D warnings, cargo test --lib all clean (19/19 existing tests pass). The BYOM cache-rescan sample re-builds clean against the modified SDK. Co-authored-by: Copilot <[email protected]>
- Consolidate the incremental-refresh test suites in C#/JS/Python down to four focused tests each by merging redundant identity/info-propagation cases and eviction/insertion cases into single tests that cover both directions of the symmetric behavior. - Add CatalogSelfHealTests.cs for C# parity with the JS/Python self-heal tests (GetModelAsync, GetModelVariantAsync, GetCachedModelsAsync, empty-input short-circuit). - Modernize Model.cs to use C# 12 collection expressions ([]) in place of the AddVariant new-list-with-capacity / AddRange / Add sequence and the RefreshVariants ew List<IModel>(variants) copy. Apply the same [] modernization to test fixtures where target type is unambiguous. - Drop the historical "the old behavior of clear-and-rebuild..." trailer from the catalog refresh comment block in JS/Python/Rust to mirror the C# Catalog.cs hand-edit. Trim backwards-looking parentheticals, the stale Model.cs line reference, and the cross-SDK file/line citation from C# test docstrings. - Drop redundant System. prefixes in C# tests (System.Math.Min, System.Array.Empty) now that ImplicitUsings provides System. Co-authored-by: Copilot <[email protected]>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
sdk/js/src/catalog.ts:71
runRefreshExclusive()always overwritesthis.updatePromisewith a new promise. If multiple callers requestupdateModels(true)while a refresh is in-flight, each chained callback will callrunRefreshExclusive()after the in-flight refresh completes, potentially starting multiple concurrent refreshes and letting earlier.finally()handlers clearupdatePromisewhile a later refresh is still running. MakerunRefreshExclusive()idempotent by returning the existingupdatePromisewhen it’s already set.
private runRefreshExclusive(): Promise<void> {
this.updatePromise = this.fetchAndPopulateModels()
.finally(() => { this.updatePromise = undefined; });
return this.updatePromise;
}
RefreshVariants/_refreshVariants/_refresh_variants no longer rewrite the selected variant. Because Catalog reuses ModelVariant wrappers across refreshes for ids that survive, an explicit selectVariant() choice that survives the refresh is preserved without any extra work. The previous fallback (first cached, then variants[0]) only mattered when the selected variant was *removed* by a refresh, which is out of scope for the BYOM self-heal contract; that case is left as a known TODO. Tests: - Simplified selection-survives-refresh test to phase 1 only (selection survives a refresh that does not drop the variant). Wrapper identity makes this assertion trivially true now, but it still pins the user-facing contract. - Dropped the cached-fallback distinguishing test (fallback no longer exists). - Fixed pre-existing net462-incompatible ToHashSet() in AppliesAddsAndRemovesOnRefresh. Co-authored-by: Copilot <[email protected]>
RefreshVariants's per-variant alias check and RefreshInfo's id check are unreachable: their only caller (Catalog.UpdateModels) builds the inputs from the same alias/id it indexes into. Drop the dead defenses; keep the empty-list check in RefreshVariants since an empty list would silently produce a Model whose Variants[0] crashes downstream. Co-authored-by: Copilot <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The SDKs cache the catalog in memory with a 6-hour TTL. When a user manually drops a model directory (BYOM) into the cache while the SDK is running, the SDK''s in-memory map is stale: Core returns the BYOM id from
get_cached_modelsandget_model_listIPCs, but the SDK filters it out for getModel/getModelVariant because itsmodelIdToModelVariantmap is still stale.Self-heal on cache-miss
Force-refresh path in the four user-facing entry points:
get_cached_models/get_loaded_models: when any id in the fresh list returned by Core does not resolve against the local map, force a catalog refresh and re-resolve only the unresolved ids.get_model(alias)/get_model_variant(id): on miss, force a refresh and retry the lookup once.A
UpdateModels(ct)warm-up call was added to the C#get_model/get_model_variantpaths for consistency with the other SDKs (not strictly required for BYOM).Catalog refresh is now incremental
Previously every catalog refresh was clear-and-rebuild, which orphaned every
Model/ModelVariantwrapper a caller was holding. After this PR, refresh preserves wrapper identity for ids that survive:Catalog.UpdateModelsreuses existingModelVariantwrappers and callsRefreshInfo(info)to surface fresh metadata in place.Modelwrappers are reused per alias andRefreshVariantsswaps the backing variant list by reference (no torn reads). Evicted ids are dropped from the maps.apply_model_listreuses the sameArc<Model>across refreshes when the(id, cached)fingerprint is unchanged, so the innerAtomicUsizeselected-variant index survives.Explicit
SelectVariant()choices are preserved trivially because the wrapper identity is preserved.Limitations
If the previously-selected variant is removed by a refresh,
SelectedVariantkeeps pointing at the dropped wrapper and callers must explicitly re-select - pre-existing issue that would take ~30 lines per SDK to fix, but unlikely to happen.Tests
New
CatalogSelfHealTestsandCatalogIncrementalRefreshTests(+ JS / Python equivalents) cover: self-heal on cache miss, no refresh on empty/whitespace input, identity + info propagation across refreshes, selection survival, and add/remove transitions.Validated against a real BYOM directory dropped into the cache at runtime.
=================
More detailed description:
_cachedModels (disk) and _cachedInfo (Azure metadata) are always populated at the start
FLCore Changes:
RefreshLocalCachedModelsAsync(thread-safe function bc lock) inAzureModelCatalog.cs—refreshes the in-memory
_cachedModelsdictionary from disk.so we do not make a new trip to Azure Catalog or invalidate
_cachedInfohere."Model = null(we assume they do not match a model in Azure Catalog), so no Azure call.GetCurrentCachedModelsAndPaths(existing) to extractmodelId.In FLCore,
RefreshLocalCachedModelsAsyncis called byGetCachedModelsAsync,GetModelsAsync, andRemoveCachedModelAsync. It will be called byGetModelInfoAsynconly if we have not created a BYOM model stub for the catalog, but it will be cached after that.GetCachedModelsAsyncGetModelsAsyncRemoveCachedModelAsyncGetModelInfoAsyncGetModelInfoAsyncONLY callsRefreshLocalCachedModelsAsyncif we looked up a BYOM model by calling (get_model_path,load_model,unload_modelIPC handlers called by SDKs or OpenAI API and byGetChatClientAsyncon every chat completion), and we have not created a BYOM model stub for the catalog. Once we have created a BYOM model stub, it is cached for the rest of the session, so we only validateDirectory.Existsfor the BYOM model (1 syscall is much quicker than scanning whole directory) after that to ensure it has not been deleted. (If it has been deleted, we returnnullfor AzureFoundryLocalModel)HTTP Method changes:
GET /v1/modelsGetCachedModelsAsync()(always rescans disk after this PR) + then will only check if directory for BYOM model exists inGetModelInfoAsyncGET /openai/modelsGetCachedModelsAsync()(always rescans disk after this PR)POST /v1/chat/completions(and friends)GetModelInfoAsync(id_or_alias)— validatesDirectory.Exists, falls through to refresh + synth if neededGET /openai/load/{model}GetModelInfoAsynclookupGET /openai/unload/{model}GetModelInfoAsynclookupSDK changes:
catalog.GetModelAsync(alias)catalog.GetModelVariantAsync(id)catalog.GetCachedModelsAsync()catalog.GetLoadedModelsAsync()model.IsCachedAsync()* we may end up scanning twice if we have a. once if we have hit the TTL and would have anyway and b. because of this PR, after scanning, if we still can't find the model (likely bc of a typo), we will scan again due to force refresh after cache miss
** - we will end up scanning twice - once to get the model name (string) for GetCachedModels, another time because we call
UpdateModels(ct, force: true)to get the ModelInfoFlow for GetModelAsync
Flow for GetModelVariantAsync is similar; we just access a different map
Flow for GetCachedModelsAsync must scan the disk twice - once to return strings for model names from FLCore's GetCachedModelsAsync and once to fetch latest GetModelsAsync (which has Model information). GetLoadedModelsAsync is similar.
Get Loaded models