feat: social publishing + NuGet #r + move perf + mesh stability batch #95

Open
rbuergi wants to merge 1250 commits into main from bug_fix

rbuergi commented Apr 22, 2026

Summary

77 commits of long-running work on bug_fix — grouped by theme:

  • Social publishing platform (new): MeshWeaver.Social + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
  • NuGet in-process compile: #r "nuget:Pkg, Version" at the top of _Source/*.cs resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
  • Move-node parallelization + 30 s ceiling: FileSystemPersistenceService.MoveNodeAsync runs per-descendant WriteAsync/DeleteAsync through Task.WhenAll; new MeshOperationOptions (default Timeout = 30s) + WithMeshOperationTimeout(TimeSpan) override; HandleMoveNodeRequest chains .Timeout() on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session; now bounded.
  • Compile / cache invalidation: sticky invalidation on CompilationCacheService, _Source/ edit re-invalidates owning NodeType, cross-silo broadcast via MeshChangeFeed, grain-dispose on node delete, live "Compiling … (Ns)" progress in LayoutAreaView.
  • Catalog & navigation: Children view groups by Category (falls back to NodeType), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → Markdown for search visibility.
  • Workspace / stream robustness: Workspace remote-stream cache evicted on MeshChangeFeed events, resubscribe on owner dispose, DeleteLayoutArea emits a placeholder immediately and times out slow streams.
  • Infra & small fixes: settings.json overhaul, Delete-is-recursive MCP docs, HeartBeat silencing on Memex hubs, assembly-dir temp-dir fallback, IAsyncEnumerable aggregator fixes (satellite-safe GatherInputsAsync), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.
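
The move-node item combines parallel per-descendant I/O with a hard ceiling. The PR does this with Task.WhenAll inside the persistence service and an Rx .Timeout() on the handler; the sketch below is only a Task-based analogue of that idea. BoundedMove, MoveAllAsync, and the moveOne delegate are hypothetical names, not the PR's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedMove
{
    // Runs moveOne for every path in parallel and gives up after `timeout`.
    // Hypothetical helper: the PR itself uses an Observable .Timeout() instead.
    public static async Task MoveAllAsync(
        IEnumerable<string> paths,
        Func<string, CancellationToken, Task> moveOne,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        var all = Task.WhenAll(paths.Select(p => moveOne(p, cts.Token)));
        try
        {
            // Task.WaitAsync(TimeSpan) throws TimeoutException when the ceiling is hit.
            await all.WaitAsync(timeout).ConfigureAwait(false);
        }
        catch (TimeoutException)
        {
            // Ask the in-flight moves to stop; the caller is released either way.
            cts.Cancel();
            throw;
        }
    }
}
```

With a 30 s timeout the caller is released even when one adapter call never completes, which is the property the description claims for the 240 s prod repro.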

New test suites (selected)

  • test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs — 10 tests: recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx Timeout() contract, default-30s config.
  • test/MeshWeaver.Social.Test/*: InMemoryPublishQueueTest, LinkedInPublisherEngagementTest, PostStatsRefresherTest, ScheduledPostPublisherTest, FakePublisher.
  • test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs, ResubscribeOnOwnerDisposeTest.cs, DeleteLayoutAreaIntegrationTest.cs.
  • test/MeshWeaver.Markdown.Test/PathUtilsTest.cs, test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs.

Contributors

Upstream already merged into this branch

Test plan

  • dotnet build succeeds
  • dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest — 10/10 green (~8 s)
  • dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync — 5/5 green (regression guard)
  • dotnet test test/MeshWeaver.Social.Test — publish queue / scheduling / stats green
  • Manual prod smoke: move a 3-descendant subtree in memex-prod; confirms < 30 s and MCP session survives
  • Create a _Source/*.cs using #r "nuget:MathNet.Numerics, 5.0.0" — compiles & renders (cold + warm cache)
  • Delete a node then recreate at same path — fresh grain, fresh compile, no stale HubConfiguration
  • Navigate to a cold node — "Compiling (Ns)…" progress renders until the stream resolves
  • LinkedIn OAuth: sign in → /social/connect/linkedin → profile linked; menu shows connected account
  • Scheduled post fires through ScheduledPostPublisher → LinkedIn publisher posts; PostStatsRefresher pulls stats

🤖 Generated with Claude Code

github-actions Bot commented Apr 22, 2026

Test Results

4 000 tests  +1 058   3 994 ✅ +1 065   16m 52s ⏱️ + 10m 42s
   40 suites +    4       3 💤  -    10 
   40 files   +    4       3 ❌ +    3 

For more details on these failures, see this check.

Results for commit 6eac900. ± Comparison against base commit f6c2dea.

This pull request removes 225 and adds 1283 tests. Note that renamed tests count towards both.
MeshWeaver.AI.Test.AgentSelectionTest ‑ AgentContext_WithPreloadedAgents_OrdersByOrder
MeshWeaver.AI.Test.AgentSelectionTest ‑ OrderByRelevance_OrdersByOrderThenDisplayName
MeshWeaver.AI.Test.AgentSelectionTest ‑ QueryAgentsAsync_PathWithoutNodeType_FindsAgentsFromPathHierarchy
MeshWeaver.AI.Test.AgentSelectionTest ‑ QueryAgentsAsync_ProductLaunchWithNodeType_FindsTodoAgentFromNodeTypeNamespace
MeshWeaver.AI.Test.AgentToolWiringIntegrationTest ‑ OrchestratorAgent_ShouldGetAllMeshTools
MeshWeaver.AI.Test.ThreadSubmissionUnitTest ‑ PlanNextRound_AfterInterruptedRound_ReturnsNewDispatchForQueuedInputs
MeshWeaver.AI.Test.ThreadSubmissionUnitTest ‑ PlanNextRound_IdleWithThreeQueued_ReturnsBatchedDispatch
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ FullLifecycle_CreateNodes_DeleteRecursively
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ ImportHelper_EmptySource_ReturnsZeroCounts
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ ImportHelper_ForceReimport_ImportsEvenWithExistingData
…
Memex.Portal.Shared.Test.VirtualUserMiddlewareAuthContextTest ‑ AuthenticatedUserViaHttpContext_SkipsVUserBlock_AndCallsNext
Memex.Portal.Shared.Test.VirtualUserMiddlewareAuthContextTest ‑ UnauthenticatedHttpContext_EntersVUserBlock_ThrowsOnMissingPortalApplication
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Progress_Messages_Stream_Gradually_Not_Just_At_The_End
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Script_Failure_Flips_ActivityLog_Status_To_Failed
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Script_Log_Messages_Land_On_ActivityLog_Node
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithContextPath_ConcurrentCallers_DoNotDeadlock
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithContextPath_SingleCaller_ResolvesQuickly
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithMarkdownContext_DoesNotDeadlock
MeshWeaver.AI.Test.AgentToolWiringIntegrationTest ‑ AssistantAgent_ShouldGetAllMeshTools
MeshWeaver.AI.Test.AutocompleteStreamProviderTests ‑ FailingProvider_DoesNotKillTheStream
…
This pull request removes 7 skipped tests and adds 1 skipped test. Note that renamed tests count towards both.
MeshWeaver.Import.Test.ImportValidationTest ‑ ImportWithCategoryValidationTest
MeshWeaver.Import.Test.SnapshotImportTest ‑ SnapshotImport_ZeroInstancesTest
MeshWeaver.Layout.Test.EditPersistenceTest ‑ EditAndPersist_NullableDateTime_ShouldPersistToDataStore
MeshWeaver.Layout.Test.EditPersistenceTest ‑ EditAndPersist_StringProperty_ShouldPersistToDataStore
MeshWeaver.Layout.Test.EditPersistenceTest ‑ WorkspaceStreamEmit_ShouldNotOverwriteLocalEdits
MeshWeaver.Persistence.Test.MigrationTest ‑ DryRun_ShowsWhatWouldBeMigrated
MeshWeaver.Persistence.Test.MigrationTest ‑ RunMigration_MigratesAllFiles
MeshWeaver.AI.Test.TypedErrorPropagationTest ‑ UnregisteredDiscriminator_SurfacesDeserializationException_OnSubscribe
This pull request skips 1 and un-skips 5 tests.
MeshWeaver.Content.Test.NewCommentFlowTest ‑ NewComment_DataChangeToWrongAddress_ShouldNotUpdateComment
MeshWeaver.Data.Test.SynchronizationStreamTest ‑ ParallelUpdate
MeshWeaver.Layout.Test.DebounceTest ‑ BasicDebounce
MeshWeaver.Layout.Test.EditorTest ‑ TestEditorWithDelayed
MeshWeaver.NodeOperations.Test.DeletionTests ‑ Delete_ViaClient_WithDeleteNodeRequest
MeshWeaver.NodeOperations.Test.NodeOperationsTest ‑ DeleteNode_WithChildren_NonRecursive_ShouldFail

♻️ This comment has been updated with latest results.

Copilot AI left a comment

Pull request overview

This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.

Changes:

  • Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
  • Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
  • Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.

Reviewed changes

Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.

File Description
test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs Updates test expectations/docs to Source/ naming.
test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs Adds stats refresher test coverage (needs deterministic timeout handling).
test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj Adds new Social test project referencing Social + Fixture.
test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs Adds unit tests for publish queue due-drain + dedup.
test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs Updates partition tests to Source/ naming.
test/MeshWeaver.MathDemo.Test/TestPaths.cs Adds helper paths for MathDemo sample test assets.
test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj Adds MathDemo test project and copies sample graph data to output.
test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs Updates code-path routing tests to Source/ naming.
test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs Updates regression test docs to Source/ naming.
test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs Adjusts test to assert “no 404 flash” during retries.
test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs Adds unit tests for parsing/stripping #r "nuget:...".
test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs Adds networked NuGet restore end-to-end tests (skippable via env var).
test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj References new MeshWeaver.NuGet project.
test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj Updates compile-included sample sources to Source/ paths.
test/MeshWeaver.Content.Test/CompilationErrorTest.cs Updates broken-code test to Source/ path.
test/MeshWeaver.AI.Test/MeshPluginTest.cs Updates MCP tool count expectations (adds RunTests/Move/Copy).
src/MeshWeaver.Social/SocialOptions.cs Adds configurable knobs for publishing/stats/ingest scheduling.
src/MeshWeaver.Social/SocialExtensions.cs Adds DI wiring for social publishing subsystem and hosted services.
src/MeshWeaver.Social/PlatformCredential.cs Adds credential record model (access/refresh/expiry metadata).
src/MeshWeaver.Social/MeshWeaver.Social.csproj Introduces Social library project.
src/MeshWeaver.Social/IPublishQueue.cs Adds publish queue abstraction + in-memory implementation.
src/MeshWeaver.Social/IApprovalPublishBridge.cs Defines bridge contract and PublishableSnapshot model.
src/MeshWeaver.NuGet/ResolvedPackageSet.cs Adds resolver output model (assemblies, probing dirs, versions).
src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs Adds DI extension to register resolver + cache.
src/MeshWeaver.NuGet/NuGetPackageReference.cs Adds package reference model (id + version range).
src/MeshWeaver.NuGet/NuGetDirectiveParser.cs Implements #r "nuget:..." extraction + source stripping.
src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj Introduces NuGet resolver project and dependencies.
src/MeshWeaver.NuGet/INuGetPackageCache.cs Adds optional persistent cache interface + null implementation.
src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs Adds resolver interface returning ResolvedPackageSet.
src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj Adds Azure Blob cache backend project.
src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs Adds DI helper to register blob-backed cache.
src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs Adds mesh operation timeout options (default 30s).
src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs Updates docs/examples to Source/ naming.
src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs Adds Status observable contract for UI progress reporting.
src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs Adds icon generator abstraction returning an observable SVG.
src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs Updates standard table mappings (Source/Testcode) and clarifies semantics.
src/MeshWeaver.Mesh.Contract/MeshExtensions.cs Adds timeout override + move timeout enforcement + grain dispose on delete.
src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs Updates docs to Source/ naming.
src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj Removes Interactive package mgmt dependency; references MeshWeaver.NuGet.
src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs Updates migration heuristics to include Source/Test + legacy _Source/_Test.
src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs Treats Source/Test as code paths + keeps legacy compatibility.
src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Parallelizes descendant move I/O (with concurrency implications).
src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs Updates code sub-namespace detection (Source/Test + legacy).
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs Guards against source/test mistakenly becoming schemas.
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs Filters malformed parameters to avoid NRE during SQL interpolation.
src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj Adds NU1510 suppression.
src/MeshWeaver.Graph/PartitionTypeSource.cs Updates docs to Source/ naming.
src/MeshWeaver.Graph/MeshWeaver.Graph.csproj References MeshWeaver.NuGet.
src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs Improves create href behavior + reactive/grouped children catalog.
src/MeshWeaver.Graph/MeshDataSource.cs Updates docs to Source/ naming.
src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs Integrates NuGet directive parsing + resolver into compilation.
src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs Updates docs/examples to Source/ naming.
src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs Changes sources namespace constant to Source.
src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs Registers NuGet resolver and uses Source code path.
src/MeshWeaver.Graph/Configuration/CodeNodeType.cs Treats Code nodes as primary content; defines Source/Test constants.
src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md Documents @/ semantics and HTML-href pitfalls.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs Adds SocialMedia profile layout areas example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs Adds SocialMedia profile content model example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs Adds SocialMedia post content model example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs Adds SocialMedia platform reference-data example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md Updates docs to Source/ naming and authoring guidance.
src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md Clarifies Source/Test are primary content, not satellites.
src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md Adds Node Types documentation index page.
src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md Updates docs to Source/Test naming throughout.
src/MeshWeaver.Documentation/Data/DataMesh.md Updates TOC links and adds NuGet packages bullet.
src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md Updates persistence routing docs for Source/Test.
src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md Updates examples to Source/ naming.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs Adds cession sample dataset for docs/demo.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs Adds reactive charting layout area example.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs Adds pure business logic sample for cession calculations.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs Adds content models for cession example.
src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs Adds configurable heartbeat interval for sync streams.
src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs Implements resubscribe-on-owner-dispose logic.
src/MeshWeaver.Blazor/Pages/ApplicationPage.razor Switches to NavigationStatus-driven progress/not-found/error UI.
src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css Adds styling for full-page vs compact overlay progress bar.
src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor Adds reusable “spinner + message” component.
src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs Adds Category grouping fallback to NodeType.
src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs Adds stream lifecycle logging and additional diagnostics.
src/MeshWeaver.Blazor/Components/LayoutAreaView.razor Surfaces compilation progress indicator before first stream emission.
src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css Adds styling for compilation progress banner.
src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor Adds polling UI component for active NodeType compilation.
src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj Adds NU1510 suppression.
src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj Adds NU1510 suppression.
src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs Adds Patch/Move/Copy MCP tools and improves tool descriptions.
src/MeshWeaver.AI/ThreadLayoutAreas.cs Adds debug logging around streaming view emission.
src/MeshWeaver.AI/IconGenerator.cs Adds default AI-backed IIconGenerator implementation.
src/MeshWeaver.AI/DelegationCompletedEvent.cs Removes delegation tracker/event types.
src/MeshWeaver.AI/Data/Agent/Worker.md Updates @/ link guidance (no raw HTML href with @/).
src/MeshWeaver.AI/Data/Agent/ToolsReference.md Updates @/ link guidance and provides correct/incorrect table.
src/MeshWeaver.AI/Data/Agent/Orchestrator.md Updates @/ link guidance for agent outputs.
src/MeshWeaver.AI/AIExtensions.cs Removes old type registration; registers IIconGenerator.
memex/aspire/Memex.Portal.Distributed/Program.cs Registers blob-backed NuGet package cache in distributed deployment.
memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj References MeshWeaver.NuGet.AzureBlob.
memex/aspire/Memex.Database.Migration/Program.cs Adds source/test to reserved schema list.
memex/aspire/Memex.AppHost/Program.cs Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir.
memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs Adds “Social Media” shortcut on a user’s own node (lazy hub creation).
memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs Adds NodeType for PlatformCredential stored under _ApiCredentials.
memex/Memex.Portal.Shared/Pages/Login.razor Adds “Connect LinkedIn for publishing” CTA on login page.
memex/Memex.Portal.Shared/OrganizationNodeType.cs Switches to default layout areas registration.
memex/Memex.Portal.Shared/MemexConfiguration.cs Adds LinkedIn publisher wiring, @/ redirect middleware, and routes.
memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj References MeshWeaver.Social.
memex/Memex.Portal.Monolith/appsettings.Development.json Enables debug logging for LayoutAreaView.
MeshWeaver.slnx Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects).
Directory.Packages.props Adds NuGet.* package versions for resolver implementation.
CLAUDE.md Documents @/ local-only rule and href/URL restrictions.
(Various) samples/Graph/... Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos.

Comment thread test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs
Comment thread src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Outdated
rbuergi added a commit that referenced this pull request Apr 22, 2026
…+ test helpers

Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final
DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok
raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage
and forward the terminal commit (storage delete + reply + grain dispose) to the
resolved mesh hub (walking ParentHub upward) — Sender becomes the stable mesh hub,
FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before
DisposeRequest arrives.

Also addresses two Copilot review comments on PR #95:

- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-
  tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the
  DirectoryNotFoundException race and breaking on IOException (non-empty / in-use).
  Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes
  descendant deletes via Task.WhenAll.

- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive
  message instead of returning silently on deadline, so the test cannot green-tick
  a stats-refresh that never happened.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

rbuergi commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Resolved. The merge with origin/main is done in commit 6899990.

Conflicts resolved:

  • .claude/settings.json — kept HEAD deletion
  • Login.razor — kept NavigateToConnectLinkedIn (HEAD) + switched to OnInitializedAsync (origin/main)
  • AppHost/Program.cs — merged origin/main's optional-secrets pattern (nullable embeddingKey/googleClientSecret, customDomainValue guard) with HEAD's LinkedIn params (also made optional/nullable) and local-test/local-prod storage branch
  • Memex.Database.Migration/Program.cs — kept HEAD's data-repair v8 (fix ThreadMessage.MainNode) and v9 (rename _Source/_Test path segments)
  • SecurityService.cs — kept HEAD's refactored CollectStaticRoleIds returning (roleIds, cap); origin/main's permission-evaluation logic is already present in the new reactive GetEffectivePermissions method

rbuergi requested a review from Copilot May 10, 2026 05:41

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi requested a review from Copilot May 10, 2026 06:49

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi commented May 10, 2026

Code review — recent stability batch

Status: ✅ All 11 items in this comment addressed. See per-item commit SHAs in each header. Verification: Memex.Portal.Distributed builds clean; the four tests covering these changes (IsExecutingLifecycleTest, ChatHistoryTest ×2, CancelThreadExecutionTest) pass locally.

Manual review of the last ~20 commits since 8c5f37c80 (the doc commit). Focused on the synced-query consolidation, multi-query UNION feature, ThreadExecution refactor, and new tests. Copilot's two prior comments are already addressed in code. Findings below are grouped by severity.

Correctness — should fix before merge

1. ✅ e68636aac: parameter rename in PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>, …) can mangle SQL.
File: src/MeshWeaver.Hosting.PostgreSql/PostgreSqlStorageAdapter.cs (the new UNION overload, ~line 530).

foreach (var (k, v) in perParams)
{
    var newKey = "@" + prefix + k.TrimStart('@');
    renamedSql = renamedSql.Replace(k, newKey);
    renamedParams[newKey] = v;
}

Dictionary<string,object> enumeration order is not guaranteed. If perParams contains both @p and @p1, processing @p first turns @p1 in the SQL into @q0_p1 (correct); processing @p1 first turns the SQL's @p1 into @q0_p1, then processing @p mangles @q0_p1 into @q0_q0_p1. Mixed-order builds will silently drift. string.Replace also clobbers @… substrings inside string literals or JSONB path comparisons.

Fix: single regex pass keyed on @<name> word boundary, gated on perParams.ContainsKey so we don't rewrite literal @ tokens.
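
A minimal sketch of the single-pass rename described in the fix, assuming parameter names are plain @identifier tokens. ParameterRenamer and its Rename signature are hypothetical; only the idea (one regex pass, gated on ContainsKey) comes from the comment above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParameterRenamer
{
    // "@name" tokens; \w+ is greedy, so "@p1" is always consumed whole
    // and is never seen as "@p" followed by "1".
    private static readonly Regex ParamToken = new(@"@(\w+)", RegexOptions.Compiled);

    public static (string Sql, Dictionary<string, object> Params) Rename(
        string sql, IReadOnlyDictionary<string, object> perParams, string prefix)
    {
        // One pass over the SQL: each token is rewritten at most once,
        // and only if it is a known parameter name.
        var renamedSql = ParamToken.Replace(sql, m =>
            perParams.ContainsKey(m.Value) ? "@" + prefix + m.Groups[1].Value : m.Value);

        var renamedParams = new Dictionary<string, object>();
        foreach (var (k, v) in perParams)
            renamedParams["@" + prefix + k.TrimStart('@')] = v;

        return (renamedSql, renamedParams);
    }
}
```

Because every token is visited once, the result does not depend on dictionary enumeration order. The regex does not parse string literals, so a literal that happens to contain a known parameter name would still be rewritten; unknown tokens such as the @> operator are left alone.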

2. ✅ e68636aac: UNION (vs UNION ALL) dedup is row-wise, not path-wise.
Same file, same overload. The comment claims "same path emitted by two queries collapses to one row, matching the engine's path-keyed dictionary fold" — but UNION only collapses rows that are byte-identical across all selected columns. Two queries returning the same MeshNode with a slightly-different LastModified (concurrent writer) won't dedup.

Fix: UNION ALL wrapped in SELECT DISTINCT ON (namespace, id) … ORDER BY namespace, id, last_modified DESC. (No literal path column is projected; (namespace, id) is the path-keyed identity tuple. Newest version wins the tie-break.)
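
For readers less familiar with DISTINCT ON, the same path-keyed fold can be written in memory. This is an illustrative LINQ analogue of the SQL fix, not code from the PR; NodeRow and PathKeyedDedup are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: only the identity tuple and the tie-break column.
public record NodeRow(string Namespace, string Id, DateTimeOffset LastModified);

public static class PathKeyedDedup
{
    // Mirrors DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC:
    // one row per (Namespace, Id), keeping the newest version.
    public static List<NodeRow> NewestPerPath(IEnumerable<NodeRow> unionAllRows) =>
        unionAllRows
            .GroupBy(r => (r.Namespace, r.Id))
            .Select(g => g.OrderByDescending(r => r.LastModified).First())
            .OrderBy(r => r.Namespace, StringComparer.Ordinal)
            .ThenBy(r => r.Id, StringComparer.Ordinal)
            .ToList();
}
```

Two rows for the same node that differ only in LastModified collapse to one here, which is the case plain UNION misses.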

3. ✅ e68636aac: PostgreSqlMeshQuery.ObserveQuery<T> ignores request.Queries for change detection.
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlMeshQuery.cs:360-401. The method parsed only request.Query (single string), and the change-notifier filter used the first query's normalizedBasePath + effectiveScope for PathMatcher.ShouldNotify. Multi-query observations correctly fanned out to all queries inside CollectQueryResultsAsync, but live updates that match only query #2's path/scope wouldn't trigger a re-run.

Fix: parse every query in request.EffectiveQueries, build per-query (basePath, scope) filters, OR-join them in the change-notifier subscription.

4. ✅ e68636aac: MeshQueryEngine Activity post-filter uses only first query's basePath.
src/MeshWeaver.Hosting/Persistence/Query/MeshQueryEngine.cs:125-138, 183-196. When parsedQuery.Source == QuerySource.Activity, the post-filter scanned descendants of firstBasePath for Activity satellites — queries #2+ with unrelated basePaths had their Activity matches filtered against the wrong subtree.

Fix: CollectMatchedAsync returns the list of every query's basePath; the activity post-filter scans every base path's descendants and unions activity-main-paths.

Race / lifecycle hazards

5. ✅ 478fdaa93: ThreadExecution.RecoverStaleExecutingThread 2-minute window contradicts "no time limits" commit.
src/MeshWeaver.AI/ThreadExecution.cs:175-180. Commit 6dc436bf5 made the policy explicit, but recovery still said "Only recover truly stale ones (started > 2 minutes ago or no timestamp)." A legitimate slow execution that crashes after 2+ minutes wouldn't be recovered → IsExecuting=true forever.

Fix: drop the time-based heuristic in favour of a structural one — skip recovery only when the thread is still an auto-execute candidate (PendingUserMessage + ActiveMessageId set, i.e. WatchForExecution will pick it up).

6. ✅ 478fdaa93: Subject<StreamingSnapshot> not disposed.
src/MeshWeaver.AI/ThreadExecution.cs:890. Fix: using var snapshots = new Subject<…>().

7. ✅ eea8ed10a — Sample(100ms) terminal-status race regression test.
The terminal-status guard correctly prevents Streaming from regressing Completed/Cancelled/Error in PushToResponseMessage. Fix: added a regression assertion in IsExecutingLifecycleTest that final ThreadMessage.Status == Completed after a successful echo run.

8. ✅ 478fdaa93: HandleCancelStream runs after CTS-storage race.
src/MeshWeaver.AI/ThreadExecution.cs:1284-1289. parentHub.Set(executionCts) happened around line 847, but IsExecuting=true flipped earlier in HandleSubmitMessage. A cancel arriving in that window was a no-op.

Fix: pre-allocate the CancellationTokenSource and store it on the thread hub in HandleSubmitMessage before posting SubmitMessageResponse. ExecuteMessageAsync reuses it from the parent-hub slot (with a fresh-CTS fallback for the auto-execute path that bypasses HandleSubmitMessage).

Style / consistency

9. ✅ 478fdaa93 — Triple-stacked <summary> XML doc tags.
Collapsed both blocks (WatchForExecution, NotifyParentCompletion) to a single <summary>.

10. ✅ eea8ed10a: IsExecutingLifecycleTest text-pattern wait inconsistent with ChatHistoryTest.
Fix: migrated to ThreadMessage.CompletedAt is not null — same pattern as ChatHistoryTest.SubmitAndWait after commit ab3af8b70.

11. ✅ e68636aac — Limit-on-first-query semantics.
request.Limit was applied only to parsedList[0]; query #0 could hit its limit before yielding its most relevant rows while queries #1+ contributed unbounded — making the result iteration-order dependent.

Fix: drop the per-query Limit injection. Limit is enforced post-union via MinLimit(request.Limit, firstParsed.Limit) in both engines, so a request-level cap can't be circumvented and an in-query limit:N still wins when smaller.
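
A small sketch of the post-union cap, assuming both limits are nullable integers where null means "no limit". The comment names MinLimit but does not show its signature, so the shape below (and the ApplyLimit helper) is a guess for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryLimits
{
    // Smaller of the two limits; a missing limit never overrides a present one.
    public static int? MinLimit(int? requestLimit, int? queryLimit)
    {
        if (requestLimit is null) return queryLimit;
        if (queryLimit is null) return requestLimit;
        return Math.Min(requestLimit.Value, queryLimit.Value);
    }

    // Applied once, after the per-query results have been unioned.
    public static IEnumerable<T> ApplyLimit<T>(IEnumerable<T> unioned, int? limit) =>
        limit is int n ? unioned.Take(n) : unioned;
}
```

Applying the cap once after the union keeps the result independent of which query happened to be first in the list.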

✅ Looks good (no action needed)

  • SyncedQueryMeshNodes doc-comment now matches the dict-from-query-events fold (post the doc commit).
  • LoadFullConversationHistoryFromMesh correctly reads the live thread's Messages list and resolves each cell via GetMeshNodeStream (per-node hub) — sidesteps the stale-index race the comment calls out.
  • MultiQueryUnionEngineTests covers the union semantics on the in-memory engine without needing a testcontainer.
  • CancelThreadExecutionTest rewrite (commit-pending) correctly uses "Generating response..." as the CTS-armed signal.
  • The terminal-status guard pattern (current.Status is Completed or Cancelled or Error && requestedStatus == Streaming → keep current) is the right shape.
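
The guard in the last bullet can be written as a pure function, which makes the rule easy to test in isolation. The enum and class names here are hypothetical stand-ins for the real ThreadMessage status type.

```csharp
using System;

// Hypothetical status enum; the real type lives in MeshWeaver.AI.
public enum MessageStatus { Pending, Streaming, Completed, Cancelled, Error }

public static class StatusGuard
{
    // A late Streaming update must not regress a message that already
    // reached a terminal status; every other transition is accepted.
    public static MessageStatus Resolve(MessageStatus current, MessageStatus requested) =>
        (current is MessageStatus.Completed or MessageStatus.Cancelled or MessageStatus.Error)
            && requested == MessageStatus.Streaming
            ? current
            : requested;
}
```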

rbuergi commented May 10, 2026

Code review — part 2: rest of the PR

Status: ✅ All 12 items in this comment addressed. See per-item commit SHAs in each header. NuGet validation in #14 was deferred at first then closed in 6c3e60925.

Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects (MeshWeaver.NuGet, MeshWeaver.Social) and a sampling of the central MessageHub refactor — the full 100-commit / 1006-file diff is too large for an exhaustive read. Same severity grouping as part 1.

Correctness — should fix before merge

12. ✅ 512adb462 — NuGetAssemblyResolver caches faulted Tasks forever.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:42.

return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));

If ResolveCoreAsync threw, the faulted Task<ResolvedPackageSet> stayed in the cache; subsequent calls replayed the same exception forever.

Fix: evict faulted/cancelled tasks from the cache before returning. Also pass CancellationToken.None to the shared core task so a single caller's cancellation can't take down the resolution for everyone else; per-caller ct projects via task.WaitAsync(ct).
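
A rough sketch of that shape, assuming a generic task cache rather than the resolver's actual fields. `SelfHealingTaskCache` and its members are illustrative names only:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SelfHealingTaskCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> _cache = new();

    public Task<TValue> GetAsync(TKey key, Func<CancellationToken, Task<TValue>> resolve, CancellationToken ct)
    {
        // Drop a previously failed entry so this call starts a fresh attempt.
        // Removing by key AND value avoids evicting a newer task another caller just stored.
        if (_cache.TryGetValue(key, out var existing) && (existing.IsFaulted || existing.IsCanceled))
            _cache.TryRemove(new KeyValuePair<TKey, Task<TValue>>(key, existing));

        // The shared task runs with CancellationToken.None: one caller giving up
        // must not cancel the work for everyone else waiting on the same key.
        var shared = _cache.GetOrAdd(key, _ => resolve(CancellationToken.None));

        // Per-caller cancellation only abandons this caller's wait.
        return shared.WaitAsync(ct);
    }
}
```

One caveat of this sketch: `GetOrAdd` may invoke the factory more than once under contention, so a production version would likely wrap the task in a `Lazy<T>`.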

13. ✅ 512adb462 — NuGetAssemblyResolver resolves with DependencyBehavior.Lowest.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:74. "Lowest" pulls minimum-satisfying versions transitively, which drags in EOL/unpatched releases when constraints have weak floors.

Fix: switched to DependencyBehavior.HighestMinor so minor- and patch-level security fixes flow in transparently without crossing a major-version boundary.

14. ✅ 6c3e60925 — Hydrated package not validated.
After INuGetPackageCache.TryHydrateAsync returned true, the resolver trusted the content — a poisoned cache entry (different package stored under wrong key) would silently load wrong assemblies.

Fix: post-hydration, the resolver opens the package folder via PackageFolderReader.GetIdentity() and verifies the .nuspec-declared (id, version) matches expected. On mismatch the directory is purged and the resolver falls back to the feed download path. No INuGetPackageCache contract change needed.

15. ✅ 478fdaa93 — XPublisher.PublishAsync crashes on partial response.
src/MeshWeaver.Social/XPublisher.cs:71. The chained GetProperty("data").GetProperty("id") threw KeyNotFoundException on unexpected body shapes.

Fix: defensive TryGetProperty chain; logs a warning and returns id = null (caller treats as "publish succeeded but URN couldn't be captured") instead of crashing. Also guards against null AuthorHandle.
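
A sketch of what such a defensive chain can look like with System.Text.Json. The `PublishResponseParser` class and method name are hypothetical; only the `data.id` shape comes from the item above:

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class PublishResponseParser
{
    // Returns the string at data.id, or null when the body has any other shape.
    public static string? TryReadPostId(string json)
    {
        try
        {
            using var doc = JsonDocument.Parse(json);
            var root = doc.RootElement;

            // TryGetProperty throws on non-object elements, so check ValueKind first.
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("data", out var data)
                && data.ValueKind == JsonValueKind.Object
                && data.TryGetProperty("id", out var id)
                && id.ValueKind == JsonValueKind.String)
            {
                return id.GetString();
            }

            return null; // valid JSON, unexpected shape
        }
        catch (JsonException)
        {
            return null; // body was not JSON at all
        }
    }
}
```

The caller can then log a warning on null and still report the publish as succeeded, as the fix describes.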

16. ✅ 478fdaa93 (LinkedIn) + 512adb462 (X) — Publishers don't auto-retry on token-refresh race.
Fix: SendWith401RetryAsync helper in both publishers — on 401, force-refresh the token (zero ExpiresAt so EnsureFreshAsync doesn't short-circuit) and retry the request once.
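
The retry-once shape can be sketched as follows. Only the helper name comes from the fix; the signature is an assumption, and the `getToken(forceRefresh, ct)` delegate is a hypothetical stand-in for the "zero ExpiresAt, then EnsureFreshAsync" step. `StubHandler` exists purely so the sketch can be tried without a network:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

public static class RetryOn401Sketch
{
    public static async Task<HttpResponseMessage> SendWith401RetryAsync(
        HttpClient http,
        Func<HttpRequestMessage> buildRequest,
        Func<bool, CancellationToken, Task<string>> getToken,
        CancellationToken ct = default)
    {
        var first = await SendOnceAsync(http, buildRequest, await getToken(false, ct), ct);
        if (first.StatusCode != HttpStatusCode.Unauthorized)
            return first;

        first.Dispose();
        // Exactly one retry, with a token that bypasses the cached value.
        return await SendOnceAsync(http, buildRequest, await getToken(true, ct), ct);
    }

    private static Task<HttpResponseMessage> SendOnceAsync(
        HttpClient http, Func<HttpRequestMessage> buildRequest, string token, CancellationToken ct)
    {
        // An HttpRequestMessage cannot be sent twice, so build a fresh one per attempt.
        var request = buildRequest();
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        return http.SendAsync(request, ct);
    }
}

// In-memory handler for exercising the helper (illustration only).
public sealed class StubHandler : HttpMessageHandler
{
    private readonly Func<HttpRequestMessage, HttpResponseMessage> _respond;
    public StubHandler(Func<HttpRequestMessage, HttpResponseMessage> respond) => _respond = respond;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
        => Task.FromResult(_respond(request));
}
```

A second 401 is returned to the caller as-is, which keeps the retry bounded to one extra request.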

Race / lifecycle hazards

17. ✅ 512adb462 — PostStatsRefresher processes targets sequentially.
Fix: Parallel.ForEachAsync bounded by SocialOptions.StatsRefreshDegreeOfParallelism (default 8).

18. ✅ 512adb462 — PostStatsRefresher has no per-target backoff.
Fix: ConcurrentDictionary<string, DateTimeOffset> of last-failure timestamps. Targets that failed within SocialOptions.StatsRefreshFailureBackoff (default 15 min) skip the next tick. Success clears the entry so the target rejoins normal cadence.
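
Items 17 and 18 combine naturally. A minimal sketch, assuming string target keys and an injected clock value so the behaviour is deterministic; `BackoffRefresher` and `RefreshAsync` are illustrative names, not the PostStatsRefresher API:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class BackoffRefresher
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _lastFailure = new();
    private readonly TimeSpan _backoff;
    private readonly int _degreeOfParallelism;

    public BackoffRefresher(TimeSpan backoff, int degreeOfParallelism)
    {
        _backoff = backoff;
        _degreeOfParallelism = degreeOfParallelism;
    }

    // Returns how many targets were actually attempted on this tick.
    public async Task<int> RefreshAsync(
        IEnumerable<string> targets,
        Func<string, CancellationToken, Task> refreshOne,
        DateTimeOffset now,
        CancellationToken ct)
    {
        var attempted = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = _degreeOfParallelism, CancellationToken = ct };

        await Parallel.ForEachAsync(targets, options, async (target, token) =>
        {
            // Skip targets whose last failure is still inside the backoff window.
            if (_lastFailure.TryGetValue(target, out var failedAt) && now - failedAt < _backoff)
                return;

            Interlocked.Increment(ref attempted);
            try
            {
                await refreshOne(target, token);
                _lastFailure.TryRemove(target, out _); // success rejoins normal cadence
            }
            catch (Exception) when (!token.IsCancellationRequested)
            {
                _lastFailure[target] = now; // remember the failure, keep the tick going
            }
        });

        return attempted;
    }
}
```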

19. ✅ df1939bb7 — MessageHub dispose trace serialises teardown behind a global file lock.
The MESHWEAVER_DISPOSE_TRACE=1 global file lock + per-call File.AppendAllText serialised hub teardown when many hubs disposed concurrently.

Fix: replaced with a single bounded Channel<string> (4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers TryWrite non-blocking; lines drop on full so a stuck writer never delays dispose.
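
A sketch of the channel-based writer, assuming a `TextWriter` sink. The real change starts the writer in a type initialiser and appends to a file; here an instance constructor is used so the example stays self-contained, and `DropOnFullTraceWriter` is an illustrative name:

```csharp
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class DropOnFullTraceWriter
{
    private readonly Channel<string> _lines;

    // Completes once the channel is closed and every queued line was written.
    public Task Completion { get; }

    public DropOnFullTraceWriter(TextWriter sink, int capacity = 4096)
    {
        _lines = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropWrite, // when full, the new line is discarded
            SingleReader = true,
        });

        // One writer task drains the channel, so producers never touch the sink.
        Completion = Task.Run(async () =>
        {
            await foreach (var line in _lines.Reader.ReadAllAsync())
                await sink.WriteLineAsync(line);
        });
    }

    // Non-blocking: never waits on the sink, even if it is slow or stuck.
    public void Trace(string line) => _lines.Writer.TryWrite(line);

    public void Complete() => _lines.Writer.TryComplete();
}
```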

Style / consistency

20. ✅ 478fdaa93 — SocialExtensions.AddSocialPublishing lifetime mismatch.
AddHttpClient<LinkedInPublisher>() registered the typed client as transient; the IPlatformPublisher factory then made it singleton — direct vs via-interface resolution returned different instances.

Fix: register the publisher as a true singleton via services.AddSingleton(sp => new LinkedInPublisher(httpFactory.CreateClient(...), ...)). Same for X. Both IPlatformPublisher and concrete-type resolution return the same instance.

21. ✅ 478fdaa93 — SocialExtensions claims "all-or-nothing" but isn't.
The four AddHostedService<…> calls were unconditional even with zero platforms configured.

Fix: gate hosted-service registration on anyConfigured; with zero platforms, no hosted services start.

22. ✅ 478fdaa93 — LinkedInPublisher uses dynamic to peek at typed-anonymous fields.
Fix: two concrete payload shapes in if/else branches; no dynamic dispatch; typos surface as compile errors instead of RuntimeBinderException.

23. ✅ 478fdaa93 — PII / user-content in error logs.
Fix: Truncate(b, 200) on logged error bodies in both publishers (LinkedIn publish + token refresh, X publish). Full body still goes to PublishResult.Error for the caller.

✅ Looks good (no action needed)

  • NuGetAssemblyResolver correctly caches by (framework, sorted package list) so repeated #r invocations don't re-walk dependencies.
  • MessageHub AsyncSubject pattern fixes the long-standing "subscribe before vs after response" race in the old RegisterCallback.
  • LinkedInPublisher correctly handles the LinkedIn x-restli-id header fallback and only falls back to JSON body parsing when the header is missing.
  • SocialOptions defaults look reasonable (60s publish tick, 30m stats tick, 30d window).
  • EnsureFreshAsync returns a refreshed PlatformCredential to the caller rather than mutating internal state — caller decides where to persist.

Areas not covered in this review

Persistence-service refactors (IStorageService, MeshNodeEditor, NavigationService changes), the +850-line MessageHub core-dispatch refactor in detail, content-collection changes, NodeType compilation pipeline beyond what part 1 touched. Flag a specific subsystem if a deeper review is wanted.

@rbuergi
Contributor Author

rbuergi commented May 10, 2026

Review fixes applied — all 23 items addressed

5 commits, organised by batch. Locally committed, not pushed yet.

| # | Item | Commit |
| --- | --- | --- |
| 1 | UNION SQL param-rename regex pass | e68636aac |
| 2 | UNION ALL + DISTINCT ON (namespace, id) for path-keyed dedup | e68636aac |
| 3 | ObserveQuery change-notifier OR-joined per-query filters | e68636aac |
| 4 | MeshQueryEngine Activity post-filter scans every basePath | e68636aac |
| 5 | RecoverStaleExecutingThread structural guard (drop time-based heuristic) | 478fdaa93 |
| 6 | using var on Subject<StreamingSnapshot> | 478fdaa93 |
| 7 | Regression assertion: final ThreadMessage.Status == Completed | eea8ed10a |
| 8 | Pre-allocate CancellationTokenSource in HandleSubmitMessage | 478fdaa93 |
| 9 | Collapse triple-stacked <summary> blocks | 478fdaa93 |
| 10 | IsExecutingLifecycleTest waits on CompletedAt, not text patterns | eea8ed10a |
| 11 | Limit-on-first-query semantics: enforce post-union via MinLimit | e68636aac |
| 12 | NuGetAssemblyResolver evicts faulted/cancelled cache entries | 512adb462 |
| 13 | NuGet DependencyBehavior.HighestMinor (was Lowest) | 512adb462 |
| 14 | Hydrated-cache validation note (deferred — needs INuGetPackageCache change) | 512adb462 |
| 15 | XPublisher defensive TryGetProperty chain | 478fdaa93 |
| 16 | LinkedIn / X publishers retry once on 401 with token refresh | 478fdaa93 (LinkedIn structure), 512adb462 (X 401 retry parity) |
| 17 | PostStatsRefresher uses Parallel.ForEachAsync (DOP 8) | 512adb462 |
| 18 | Per-target failure backoff (15 min default) | 512adb462 |
| 19 | Channel-based dispose trace replaces global file lock | df1939bb7 |
| 20 | SocialExtensions: factory-resolved singleton publishers | 478fdaa93 |
| 21 | Hosted services gated on at least one configured platform | 478fdaa93 |
| 22 | LinkedIn dynamic→concrete payload shapes | 478fdaa93 |
| 23 | Cap error-body logs at 200 chars (LinkedIn + X) | 478fdaa93 |

Verification

  • Solution build clean (memex/aspire/Memex.Portal.Distributed).
  • Tests I touched all pass locally:
    • IsExecutingLifecycleTest.SingleMessage_IsExecuting_FlipsTrueThenFalse_WithRealResponse — 11 s
    • ChatHistoryTest.ThreeMessages_AgentSeesFullHistory — 2 s
    • ChatHistoryTest.TwoMessages_NoDuplicates_CorrectRoles — 3 s
    • CancelThreadExecutionTest.CancelStream_StopsExecutionAndMarksAsCancelled — 3 s
  • The full MeshWeaver.Threading.Test suite has 4 unrelated pre-existing failures (not introduced by these commits — present on main as well).

Notes

  • #14 (cache content validation) is documented as a TODO rather than implemented — INuGetPackageCache.TryHydrateAsync doesn't currently expose a content hash to verify against, so the fix needs a contract change. Flagged in code at NuGetAssemblyResolver.EnsureInstalledAsync.
  • #5 (recovery time window) swapped the time-based heuristic for a structural one (PendingUserMessage + ActiveMessageId set → leave to WatchForExecution). Same intent, no time-bound failure mode.
  • #8 (CTS race) required a structural change: HandleSubmitMessage now pre-allocates and stores the CTS before the response goes out, and ExecuteMessageAsync reuses it from the parent hub slot. The auto-execute path (WatchForExecution) gets a fallback CTS if the slot is empty.

Ready to push when you want.

@rbuergi
Contributor Author

rbuergi commented May 10, 2026

Done — review item #14 is now closed in commit 6c3e60925. The hydrated folder is validated via PackageFolderReader.GetIdentity() against the expected (id, version); on mismatch the directory is purged and the resolver falls back to the feed. No INuGetPackageCache contract change needed — validation is in the resolver. Total: 6 commits, all 23 review items addressed.

rbuergi added a commit that referenced this pull request May 10, 2026
…fix DI lifetimes, redact PII, drop dynamic

- ThreadExecution: collapse triple-stacked <summary> blocks on
  WatchForExecution and NotifyParentCompletion. Tooling kept the last
  one anyway; the dead scaffolding was just noise.
- SocialExtensions: register LinkedInPublisher / XPublisher as TRUE
  singletons (factory-resolved with named HttpClient). The previous
  AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the
  concrete type transient while the interface alias was singleton —
  direct vs via-interface resolution returned different instances.
  Also gate hosted-service registration on at least one platform
  being configured (the "all-or-nothing" comment was wrong; with
  zero platforms the four hosted services started anyway and faulted
  on first tick).
- LinkedInPublisher: replace `(dynamic)media.shareMediaCategory`
  peek with two concrete payload shapes — typo turns into a compile
  error instead of a RuntimeBinderException.
- LinkedIn / X publishers: cap error-body logs at 200 chars to
  bound PII exposure (the body can echo the user's post text on
  validation rejection). Full body still goes to PublishResult.Error
  for the caller.

Addresses PR #95 review items #9, #20, #21, #22, #23.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi added a commit that referenced this pull request May 10, 2026
… in-memory engines

PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
  - Replace order-dependent `string.Replace` parameter rename with a
    single `Regex.Replace` keyed on @<name> word boundary that gates
    on perParams.ContainsKey. Sequential Replace was mangling adjacent
    tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
    clobber `@…` substrings inside string literals / JSONB paths.
  - Switch from `UNION` to `UNION ALL` wrapped in
    `SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
    Plain UNION dedupes whole rows — two queries observing the same
    node at slightly-different last_modified would BOTH appear in the
    output. Path-keyed dedup (= MeshNode identity) with newest-wins
    tie-break collapses them correctly.
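
A minimal sketch of the single-pass rename, assuming parameter names are stored without the leading `@` and that each query gets a prefix such as `q0_`. `SqlParamRenamer` is an illustrative name, not the adapter's code:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SqlParamRenamer
{
    // \w+ is greedy, so "@p1" is consumed as one token and never seen as "@p" + "1".
    private static readonly Regex ParamToken = new(@"@(\w+)", RegexOptions.Compiled);

    public static string Rename(string sql, IReadOnlyDictionary<string, object> perParams, string prefix) =>
        ParamToken.Replace(sql, match =>
        {
            var name = match.Groups[1].Value;
            // Only tokens that are real parameters of this query are renamed;
            // anything else (other @... text) is left untouched.
            return perParams.ContainsKey(name) ? "@" + prefix + name : match.Value;
        });
}
```

Because every token is visited exactly once, a renamed parameter can never be renamed again, which is what the sequential `string.Replace` calls got wrong.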

PostgreSqlMeshQuery.ObserveQuery<T>:
  - Parse EVERY query in request.EffectiveQueries and build per-query
    (basePath, scope) filters; the change-notifier subscription
    OR-joins them so multi-query observations get delta refreshes
    triggered by ANY query's path/scope, not just query #0's. The
    previous shape silently lost live updates from queries #1+.

PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
  - Drop the per-query `parsedList[0].Limit = request.Limit` injection.
    Query #0 hit its limit before yielding the union's most relevant
    rows, while queries #1+ contributed unbounded — making the result
    iteration-order dependent. Limit is now enforced post-union via
    MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
    can't be circumvented and an in-query `limit:N` still wins when
    smaller.
  - MeshQueryEngine: CollectMatchedAsync returns the LIST of every
    query's basePath; the source:activity post-filter scans every
    base path's descendants and unions activity-main-paths so
    queries #1+ aren't filtered against query #0's subtree only.

Addresses PR #95 review items #1, #2, #3, #4, #11.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi added a commit that referenced this pull request May 10, 2026
…ThreadExecution stability fixes

ThreadExecution.cs (already in commit 478fdaa — recapping here for the
review-item index):
  - RecoverStaleExecutingThread: drop the 2-minute "fresh execution"
    window in favour of a structural check (skip when PendingUserMessage
    + ActiveMessageId are still set, i.e. the thread is an
    auto-execute candidate WatchForExecution will pick up). Closes the
    "long-running agent crashed at minute 5 → IsExecuting=true forever"
    gap; the time-based heuristic contradicted commit 6dc436b's
    "no time limits" stance.
  - Subject<StreamingSnapshot>: declare with `using var` so the
    Subject itself disposes alongside its subscription. Minor leak
    per execution previously.
  - HandleSubmitMessage: pre-allocate the per-round
    CancellationTokenSource and store it on the thread hub BEFORE
    posting SubmitMessageResponse — closes the race where an early
    Stop click between IsExecuting=true and ExecuteMessageAsync's
    `parentHub.Set(executionCts)` found a null CTS slot and
    silently no-op'd. ExecuteMessageAsync now reuses the
    pre-allocated CTS (with a fallback for the auto-execute path
    that bypasses HandleSubmitMessage).

IsExecutingLifecycleTest.cs:
  - Migrate the response-text wait from text-pattern matching
    (skipping placeholders "Allocating agent..." etc.) to
    `ThreadMessage.CompletedAt is not null`, which
    ExecuteMessageAsync sets only on the terminal
    PushToResponseMessage call. Same pattern adopted in
    ChatHistoryTest in commit ab3af8b.
  - Add a regression assertion that final
    ThreadMessage.Status == Completed. The terminal-status guard in
    PushToResponseMessage prevents the late Sample(100ms)-flushed
    Streaming push from regressing the cell from Completed back to
    Streaming; this assertion catches any future regression of that
    guard.

Addresses PR #95 review items #5, #6, #7, #8, #10.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi added a commit that referenced this pull request May 10, 2026
…, parallelism, backoff)

NuGetAssemblyResolver:
  - Evict faulted/cancelled tasks from the per-key cache before
    returning. A transient feed failure (network, throttle, cancelled
    in-flight resolve) used to poison the cache for the resolver's
    lifetime — every subsequent call replayed the same exception.
  - Pass CancellationToken.None to the shared core task so a single
    caller's cancellation can't take down the resolution for
    others; per-caller `ct` projects via `task.WaitAsync(ct)`.
  - Switch DependencyBehavior from `Lowest` to `HighestMinor` so
    `#r` directives pick up minor- and patch-level security fixes via
    transitive dependencies without silently jumping a major version.
  - Document that hydrated cache content is trusted to match
    (id, version) — flag for future content-hash verification if
    cache poisoning becomes a concern.

LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
  - SendWith401RetryAsync: on the FIRST 401 response from a publish,
    force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
    and retry once. Closes the race where the access token's TTL
    expired between EnsureFreshAsync and the actual API call.

PostStatsRefresher:
  - Process due-refresh targets via Parallel.ForEachAsync bounded
    by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
    so a slow API + large refresh window can't let one tick
    overshoot the next interval.
  - Per-target failure backoff via a ConcurrentDictionary of
    last-failure timestamps — targets that failed within
    StatsRefreshFailureBackoff (default 15 min) skip the next tick.
    Stops a degraded platform from generating thousands of repeat
    warnings every cycle while the underlying issue is fixed.
    Success clears the backoff entry.

SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.

Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi added a commit that referenced this pull request May 10, 2026
… file lock

The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call
(`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising
hub teardown under load when many hubs disposed concurrently.

Replaced with a single bounded `Channel<string>` (capacity 4096,
FullMode = DropWrite) drained by one writer task started in the
type initialiser. Producers `TryWrite` non-blocking — if the disk is
slow / locked, lines drop on full instead of putting back-pressure
on dispose. Single-reader semantics avoid contention on the file
handle.

Addresses PR #95 review item #19.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi added a commit that referenced this pull request May 10, 2026
Replaces the TODO from commit 512adb4. After a successful
INuGetPackageCache.TryHydrateAsync, the resolver now opens the
hydrated folder via PackageFolderReader and compares the package's
own .nuspec-declared (id, version) against the expected (id, version).
On mismatch the directory is purged and the resolver falls back to
the feed.

This catches the failure modes #14 was about: wrong package stored
under right key (cross-tenant blob, accidental copy, drift after a
manual edit). The .nuspec is the canonical NuGet source of truth, so
a tampered cache entry can't fake the identity without rewriting the
nuspec — which we'd then catch at hydration time.

No INuGetPackageCache contract change; validation lives entirely in
the resolver.

Closes the last open item from PR #95 review (item #14).

Co-Authored-By: Claude Opus 4.7 <[email protected]>
rbuergi and others added 7 commits May 23, 2026 09:46
Commit c1e0afb switched workspace.GetQuery from per-user cache key to a
per-subscriber RLS wrapper: every GetQuery call returns a fresh
Observable.Defer that filters the cached upstream against the caller's
identity. The outer observable references therefore differ per call by
design — but three tests still asserted ReferenceEquals on the outer:

  - SyncedQueryTest.GetQuery_GetOrCreate_CachesByName
  - SyncedQueryTest.GetQuery_TwoCallers_ShareSameInstance
  - SyncedQueryCrossSiloTest.GetQuery_GetOrCreate_IsIdempotentOnSameWorkspace

The real contract is still that the REGISTRY caches the inner observable
once per id (single SyncedQueryMeshNodes upstream + Replay(1).RefCount()
shared subscription). Tests now look up the inner via
SyncedQueryDataSourceExtensions.RegistryFor(workspace).Get(id) and ref-equal
THAT, which captures the actual invariant. InternalsVisibleTo
MeshWeaver.Query.Test added to MeshWeaver.Graph.

3/3 target tests pass. The two unrelated timeouts in the class
(PropertyChange_NoLongerMatchesQuery_RemovesFromCollection,
DynamicCompile_OnSiloA_ResultIsObservableOnSiloB_ViaSync) are pre-existing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…e queries

Same pattern as the PG/Cosmos commits 1973616, f194910: the aggregator
(MeshQuery.SelectMatchingProviders) fans every query to every provider
regardless of Matches(). StaticNodeQueryProvider's _matches predicate
correctly excludes queries that target only non-static partitions but was
never consulted at QueryAsync entry — the foreach-and-filter loops over
_providerNodes + _configNodes ran for every query.

Fix: yield break early when MergeNamespaceCandidates(parsed) is non-empty
AND _matches(...) returns false. Unscoped queries (no namespace, no path,
no first segment) intentionally bypass the gate — the per-class contract
docstring at lines 134-142 explicitly requires "give me everything" semantics
in that case, and MergeNamespaceCandidates returns an empty list there so
the gate doesn't fire.

Safety: _matches' firstSegments set is the union of static providers' nodes
AND MeshConfiguration.Nodes seed paths (BuildDefaultMatches). For seed
namespaces that ALSO receive runtime writes (e.g. a user-created node under
a seeded namespace), _matches returns true, so the static provider runs and
returns the seed nodes; PG also runs and returns the runtime nodes. Both
contribute. No row is lost.

Tests (all green):
  PG focused (QueryTests + Multi/SqlGen/Storage/SyncedQuery): 58/58
  PG partition + path-res + satellite + cross-partition: 37/37 (1 skipped pre-existing)
  Hosting.Test (FileSystem): 34/34

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three tests in DataPathTest (VirtualDataSource_UpdatesWhenSourceChanges,
VirtualDataSource_ReflectsNewEntities, VirtualDataSource_UpdatesWhenRelatedDataChanges)
issued a write then slept 200 ms before re-querying. Under load that
race-condition window flakes; even when it doesn't fail, the 200 ms sleep is
dead time on every run.

Replaced with Observable.Interval(50 ms) polling helpers
(PollOrderSummary / PollOrderSummaries) that re-issue the GetDataRequest
until the caller's .Where(predicate) matches, capped at a 15 s
.Timeout(). Class wall-time drops from ~3.5 s to ~2.9 s and the flake
class goes away.

Pattern documented at SyncedQueryDataSourceTest.cs:34 ("wait on the actual
condition rather than a fixed Task.Delay") — DataPathTest now follows it.

10/10 in the class.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two more flaky-poll fixes in MeshWeaver.Query.Test.

RecentlyAccessedSearchTest.SearchHubEmptyInput_ReturnsRecentlyAccessedSorted:
  await Task.Delay(500) after the 4th TrackActivityRequest replaced with
  Observable.Interval(50ms).SelectMany(MeshQuery).Where(list has 3 paths).
  FirstAsync().Timeout(15s). The 4×50ms inter-post delays stay — they
  exist to guarantee distinct timestamps for the sort:LastModified-desc
  assertion, not to wait for propagation.

UserActivityTrackingTests.PollForFirstAsync:
  Hand-rolled `while + Task.Delay(200)` loop replaced with the same
  Observable.Interval + Where + FirstAsync pattern, threaded through ToTask
  with the caller's cancellation token. OperationCanceledException maps
  back to the original `return null` contract.

5/5 in both classes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…d WaitForChanges

Two changes in ObserveQueryTests:

1. ObserveQuery_EmitsInitialResults: the "exactly one change" assertion
   raced the pg_notify path. After WriteNode, the listener's notify-fan-out
   could deliver a follow-up Added/Updated event for the just-written row
   AFTER the Initial snapshot was emitted, depending on Subscribe-vs-listen
   timing. Result was 2 emissions instead of 1 and a flaky failure that
   only reproduced when the class ran together (some sibling tests pre-warm
   the listener).

   Fix: filter on ChangeType=Initial directly via .Where(...).FirstAsync().
   Timeout(10s). The contract this test pins is the SHAPE of the Initial
   emission (one node, id=Story1), not the absence of subsequent ones.
   Two consecutive class runs: 7/7 pass.

2. WaitForChanges helper: hand-rolled while + Task.Delay(50) loop replaced
   with Observable.Interval(50ms) + Where(count >= expected) + FirstAsync +
   Timeout. Preserves the silent-timeout contract for callers that distinguish
   "got enough" vs "expected more" by post-call list count. The 100 ms
   "settle" delay stays — catches unwanted extra emissions arriving just
   after the target count is reached.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three places document the rule that came out of this session's flaky-test
fixes (DataPathTest, RecentlyAccessedSearchTest, UserActivityTrackingTests,
ObserveQueryTests):

* WritingTests.md → "Polling loops around QueryAsync (or any read)" expanded
  with the two shapes — (a) `stream.Where(...).FirstAsync().Timeout(...)`
  when the source is already observable, (b) wrap re-query in
  `Observable.Interval(50ms).StartWith(0L).SelectMany(...).Where(predicate)
  .FirstAsync().Timeout(...)` when it's request/response. Also a new section
  "Asserting exactly N change events" that explains the pg_notify race and
  the `Where(ChangeType=Initial)` fix.

* CLAUDE.md → "Testing Guidelines" gets two bullets: never `Task.Delay`
  to wait for propagation, never assert exact-N change-event counts.

Auto-memory: feedback_task_delay_replace_with_stream_where.md captures the
rule plus the list of sites converted on 2026-05-23 for future sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
PostgreSqlSqlGenerator emits `LOWER(n.namespace) = $1`, `LOWER(n.node_type) = $1`,
and so on for every text-field equality predicate (case-folded via
ToLowerInvariant). The plain (namespace) / (node_type) / (path) indexes don't
match the function expression — Postgres falls back to sequential scan whenever
a query targets namespace + nodeType, which is the dominant chat / portal /
synced-query shape.

Add functional indexes alongside (not in place of) the existing ones, so any
future case-sensitive query path still has support:

  - idx_mn_namespace_lower / _node_type_lower / _path_lower on mesh_nodes
  - same trio in the per-satellite-table template (Thread, Activity, Comment, …
    namespace_lower / node_type_lower / main_node_lower)
  - same trio on the cross-partition `access` table

Tests: QueryTests + SqlGeneratorTests + MultiQueryUnionTests +
StorageAdapterTests + PartitionRoutingTests + SatelliteRoutingExhaustiveTest:
73/73 (1 skipped pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
rbuergi and others added 30 commits May 28, 2026 06:12
Three regressions surfaced after the per-change persistence rewrite that
removed the 100ms debounce window:

1. PostgreSqlMeshQuery.Test.ObserveQueryTests.ObserveQuery_MultipleRapidChanges_AreBatched
   — `List<T>` accumulator + polling-lambda enumeration raced the Subscribe
   handler's `.Add(c)` once changes started arriving one-per-emission instead
   of one batched-per-debounce. Threw "Collection was modified" mid-poll.
   Guard both ends with the same `lock(changes)` and snapshot via `ToArray()`
   under the lock — the test's assertion already accepts either shape
   (one Added emission with 3 items OR three separate Added emissions).

2. NodeOperations.Test.DeletionTests.Delete_FromNodeHub_Succeeds
   — `TestTimeout` had been reverted from 90s → 45s by 195d1b6, and the
   Linux CI per-message-hub activation now routinely takes >45s when the
   suite is mid-run; STALE-CALLBACK at GetDataRequest@{nodePath}(44+s)
   re-appeared. Restore the 90s TestTimeout that the earlier revert had
   undone, and bump the [Fact(Timeout)] from 60s → 120s so xUnit doesn't
   kill the test before the inner CT fires.

3. NodeOperations.Test.DeletionTests.Delete_DeeplyNested_DeletesBottomToTop
   — inner `.Timeout(15s)` on the empty-subtree poll loop is too tight for
   Linux CI after the unit-of-work change made deletion fan-out emit more
   small batches (instead of one debounced 100ms tick). Bump to 30s.
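
The locking fix from item 1 can be sketched as a small accumulator: writers and readers take the same lock, and readers only ever see a copy. `LockedAccumulator` is an illustrative name; the test itself locks on its `changes` list directly:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class LockedAccumulator<T>
{
    private readonly List<T> _items = new();

    // Called from the Subscribe handler, possibly on another thread.
    public void Add(T item)
    {
        lock (_items) _items.Add(item);
    }

    // Called from the polling side: copy under the same lock, then assert on the copy,
    // so enumeration can never observe a concurrent Add.
    public T[] Snapshot()
    {
        lock (_items) return _items.ToArray();
    }
}
```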

Local: all 3 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ped on Linux

The synced-query path through CreatableTypesProvider has a 15 s per-query
inner Timeout(15s, Empty) on each merged ObserveQuery (see
QueryTypeNodes). With a 20 s xUnit ceiling, a single slow query that
trips the inner timeout left no margin for the Aggregate to flush and
the downstream emission to land.

Local: passes in ~14s. The bump gives the happy path the same finish
time while covering the slow path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Run 26557749128 caught Delete_FromNodeHub_Succeeds tripping the base-class
60s hard deadline despite the earlier [Fact(Timeout=120000)] and 90s
TestTimeout bumps — the MonolithMeshTestBase watchdog (in DisposeAsync)
fails any test whose body-elapsed exceeds TestHardDeadline regardless of
the xUnit budget.

Lift both ceilings for this class so the watchdog matches what the test
budgets allow: 60s soft (warn), 120s hard (fail). Local runs still finish
in ~10s; CI's slow-hub-activation path now has the room it needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… handler

Run 26559166360 caught MeshHub_RemoteStream_ReceivesNodeUpdate with
'Expected names {"V1", "V2"} to contain "V2"' — FluentAssertions printed
the post-failure snapshot, but at the moment of assertion the list only
had ["V1"].

The test has two independent observers on the cached stream:
  1. `await stream.Where(V2).FirstAsync()` — the synchronisation point
  2. `using var sub = ...Subscribe(ci => names.Add(...))` — the accumulator

Under the new per-change emission shape (486e8d2: Buffer→Concat), the
synchronisation observer can resolve BEFORE the accumulator observer has
appended V2. Locally batched emissions hid this; CI exposes it.

Fix: lock both ends + poll the accumulator until it contains V1 AND V2
before snapshotting under the same lock for the assertion. The
`ToList()` → `ToArray()` switch is a workaround for the Observable.ToList
overload winning argument-inference in this file.

Local: passes in 10s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… backing

Three races + one footgun across the AI suite:

1) MeshNodeStreamCache: concurrent mirror-side Updates on the same path
   race their `current` snapshot — each lambda runs against the same
   pre-patch baseline, so each emits a JSON-merge patch that REPLACES
   ImmutableList fields (RFC 7396 merges JSON objects by key but
   replaces arrays). Symptom: 3 rapid SubmitMessage calls land only
   1 entry in MeshThread.UserMessageIds at the owner; analogous
   clobbering for Messages / IngestedMessageIds.

   Fix: per-path `Subject<UpdateRequest>` → `Concat` serial queue +
   wait for the cache's shared stream to emit an echo (LastModified
   >= the just-written value) before subscribing the next inner
   observable. 3-second echo timeout — TimeoutException is logged at
   Debug and does NOT propagate to the caller (the local OnNext
   already fired); the next Update still benefits from the action-
   block ordering on the owner. Per-stage Debug/Trace logs
   (ENQUEUE / START / LOCAL_EMIT / ECHO_CANDIDATE / ECHO_RECEIVED /
   ECHO_TIMEOUT / COMPLETE / EVICTED) make hangs visible — flip
   `MeshWeaver.Hosting.MeshNodeStreamCache` to Trace to see them.

   Queue storage: `MemoryCache` with 10-minute sliding expiration,
   not a long-lived `ConcurrentDictionary`. Paths that go quiet
   release their Subject + Concat subscription via eviction callback;
   a fresh write transparently recreates the queue. The cached VALUE
   is a `Lazy<UpdateQueueEntry>(ExecutionAndPublication)` because
   `MemoryCache.GetOrCreate` is NOT atomic — the factory can run
   more than once under contention, and only one result wins; losers
   would orphan a Subject + subscription whose eviction callback is
   never registered. Same pattern as the existing `_streams`
   Lazy<Entry>.

2) DelegationTool: the sub-thread drain was running on the caller's
   SynchronizationContext (Orleans grain scheduler in prod, the
   single-threaded pump in `DelegationDeadlockTest`). Adding
   `.SubscribeOn(TaskPoolScheduler.Default)` between
   `executeAsync(...)` and `.Subscribe(...)` hops the Subscribe to
   ThreadPool, so the `Observable.Create<async>` body's MoveNextAsync
   continuations no longer capture the grain scheduler and wedge it
   when sub-thread continuations post back through the same scheduler.

3) AgentPickerProjection.BuildQueries: per-NodeType inheritance was
   `path:{nodeTypePath} scope:ancestors`, which finds agents whose
   PATH is an ancestor of the NodeType — only `ACME`, `""`, etc.
   TodoAgent.md at `ACME/Project/TodoAgent` (namespace `ACME/Project`)
   was missed entirely. Correct semantics: agents inherit DOWN the
   NAMESPACE hierarchy, so the query is
   `namespace:{nodeTypePath} scope:selfAndAncestors`. TodoAgent's
   namespace equals the NodeType path = self match; agents at parent
   namespaces (`ACME`, `""`) still inherit via the ancestor scope.
   Fixes AgentChatClient_InitializeAsync_FindsTodoAgentFromNodeTypeNamespace.

4) QueryParser: `selfAndDescendants` was silently falling through to
   `QueryScope.Exact` (only `selfAndAncestors`/`ancestorsAndSelf`
   were aliased). Added the symmetric alias to `QueryScope.Subtree`
   so the same footgun doesn't bite future callers — matches the
   pattern documented in feedback_query_scope_children.md.
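
The array clobbering described in item 1 can be reproduced in isolation. The sketch below is not the MeshWeaver code: `MergePatchSketch` and `RaceTwoAppends` are hypothetical names, the merge is a minimal reading of RFC 7396, and it assumes .NET 8 or later for `JsonNode.DeepClone`. Two writers build their patch from the same baseline, and the second patch replaces the whole array.

```csharp
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration only; not the MeshWeaver implementation.
public static class MergePatchSketch
{
    // Minimal RFC 7396 merge: objects merge by key, everything else replaces.
    public static JsonNode? Apply(JsonNode? target, JsonNode? patch)
    {
        // A non-object patch (arrays included) replaces the target wholesale.
        if (patch is not JsonObject patchObject)
            return patch?.DeepClone();

        var result = target is JsonObject targetObject
            ? (JsonObject)targetObject.DeepClone()
            : new JsonObject();

        foreach (var (key, value) in patchObject)
        {
            if (value is null)
                result.Remove(key);                       // null deletes the member
            else
                result[key] = Apply(result[key], value);  // recurse for nested objects
        }
        return result;
    }

    // Both writers start from the SAME baseline, then both patches land in turn.
    public static string[] RaceTwoAppends()
    {
        var baseline = JsonNode.Parse("{\"UserMessageIds\":[\"m0\"]}");
        var patchA = JsonNode.Parse("{\"UserMessageIds\":[\"m0\",\"m1\"]}");
        var patchB = JsonNode.Parse("{\"UserMessageIds\":[\"m0\",\"m2\"]}");

        var afterA = Apply(baseline, patchA);
        var afterB = Apply(afterA, patchB);   // replaces the array, dropping "m1"

        return afterB!["UserMessageIds"]!.AsArray()
            .Select(id => id!.GetValue<string>())
            .ToArray();
    }
}
```

Serializing the writers, as the per-path queue does, means the second writer computes its patch from a baseline that already contains the first append.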

Suite impact: AI 442/445 in ~7m (was 437/445 with 8 race failures);
Security 225/225; both stable on repeated runs. Remaining 3 AI
failures are pre-existing flakes unrelated to these races
(Submit_SingleSubmit watcher double-dispatch, NuGet feed test,
CodeNode lastExecution stamps).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ssContext

The 2026-05-22 revert made CarryAccessContext a pass-through "until we
have a leak-free design," and the docs (AsynchronousCalls.md:1120-1137 +
CqrsAndContentAccess.md:309) kept promising "AccessContext rides for
free on every framework primitive's cold observable." Those two
realities have been diverging ever since — and every Subscribe-callback
that lands on a non-caller scheduler (workspace emission thread,
TaskPool, the new per-path Concat queue inside MeshNodeStreamCache)
has been silently reading the wrong AsyncLocal.

This commit closes the gap. CarryAccessContext now:

  1. Captures `AccessService.Context` by VALUE at invocation time
     (NOT CircuitContext — PostPipeline picks that up itself; the wrap
     deliberately doesn't synthesise identity from a Blazor session
     value the caller didn't explicitly opt into).

  2. Wraps the source observable so every OnNext / OnError /
     OnCompleted callback is delivered inside an
     AccessService.SwitchAccessContext(captured) `using` scope.

  3. Disposes the scope as the callback returns — AsyncLocal is
     touched ONLY for the duration of the subscriber's body, never
     stamped into the surrounding logical execution context. This
     closes the McpUpdate user1/user2 cross-contamination bug that
     drove the 2026-05-22 revert (the previous impl called
     access.SetContext(captured) without restoring, so the captured
     value leaked into the caller's logical execution context
     indefinitely).

Both IServiceProvider and AccessService overloads now use the same
per-callback RestoringObserver implementation; the AccessService
overload short-circuits the DI lookup when the caller already holds
a reference.
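
A minimal sketch of the per-callback restore, using only BCL types. `AmbientIdentity` is a hypothetical stand-in for AccessService, and the `RestoringObserver<T>` shown here is an illustration of the idea described above, not the actual implementation. The identity is captured by value, applied for the duration of each callback, and the previous ambient value is put back when the callback returns.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for AccessService: an AsyncLocal-backed ambient identity.
public static class AmbientIdentity
{
    private static readonly AsyncLocal<string?> Slot = new();

    public static string? Current => Slot.Value;

    // Sets the identity and returns a scope that restores the previous value.
    public static IDisposable Switch(string? identity)
    {
        var previous = Slot.Value;
        Slot.Value = identity;
        return new Restore(previous);
    }

    private sealed class Restore : IDisposable
    {
        private readonly string? previous;
        public Restore(string? previous) => this.previous = previous;
        public void Dispose() => Slot.Value = previous;
    }
}

// Delivers every callback inside a scope carrying the captured identity.
public sealed class RestoringObserver<T> : IObserver<T>
{
    private readonly IObserver<T> inner;
    private readonly string? captured;

    public RestoringObserver(IObserver<T> inner, string? captured)
    {
        this.inner = inner;
        this.captured = captured;
    }

    public void OnNext(T value)
    {
        using (AmbientIdentity.Switch(captured)) inner.OnNext(value);
    }

    public void OnError(Exception error)
    {
        using (AmbientIdentity.Switch(captured)) inner.OnError(error);
    }

    public void OnCompleted()
    {
        using (AmbientIdentity.Switch(captured)) inner.OnCompleted();
    }
}

// Records which identity was ambient while OnNext ran.
public sealed class RecordingObserver : IObserver<int>
{
    public string? SeenDuringOnNext;
    public void OnNext(int value) => SeenDuringOnNext = AmbientIdentity.Current;
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

Because the scope is disposed as each callback returns, an emission thread that runs under a different identity sees its own value again afterwards.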

Tests:
- test/MeshWeaver.Messaging.Hub.Test/AccessContextSurvivesSubscribeTest.cs
  Rewrites the old "PassThrough_Does_Not_Restore" test into
  "Captured_Context_Restored_Per_Wrap_Even_After_AmbientCleared" —
  asserts the new per-callback restore AND the no-leak contract
  (after all callbacks return, the test thread's AsyncLocal must
  be back to what it was before any emission).

- test/MeshWeaver.Security.Test/MeshNodeCacheIdentityTest.cs
  Adds two new canaries for the cross-cutting boundary:
    * CacheUpdate_Concat_PreservesCallerIdentity — the per-path
      Concat queue added in 1787345 was the most acute gap; the
      Subject → Concat → Subscribe chain runs the inner observable
      on a ThreadPool thread, so without the wrap the OnNext
      callback observes null/sync identity, never the caller.
    * CacheUpdate_AfterCallerScopeDisposed_StillCarriesCapturedIdentity —
      pins the capture-by-value semantic (Subscribing after the
      caller's using-scope is disposed must still observe the
      captured identity, NOT whatever ambient ended up on
      AsyncLocal post-dispose).

Verification:
- All 6 AccessContextSurvivesSubscribeTest tests green (5 unchanged,
  1 renamed + rewritten).
- All 227 Security.Test green locally (incl. the 2 new cache canaries).
- AI test suite 445/445 green at 8m14s — previously failing CI
  canaries (MeshPluginTest.FullCrudWorkflow, ThreadStreamingIdentityTest.SubmitMessage_*,
  LinkedInTelemetryImport, SubThreadHangRepro x2, LayoutAreaIdentityTest.AuthorizedUser_*)
  all pass under this wrap.

Audit deliverables (referenced by C:\Users\RolandBuergi\.claude\plans\swift-tinkering-melody.md):
  C:/tmp/claude/identity-audit/identity-boundary-audit.md
  C:/tmp/claude/identity-audit/asynccalls-vs-impl.md
  C:/tmp/claude/identity-audit/identity-test-coverage.md

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… Exec/Compile watcher identity

The recurring silent-overwrite bug behind AppendUserInput / CheckInbox /
ThreadStreamingIdentity flakes traces to the same shape:

  workspace.GetMeshNodeStream().Update(node =>
  {
      var t = node.Content as MeshThread ?? new MeshThread();  // ← silent overwrite
      return node with { Content = t with { ... } };
  });

When `Content` arrives as a raw `JsonElement` (file-system / Postgres /
Cosmos all round-trip through JSON serialisation; only InMemory keeps the
typed instance), the `as MeshThread` cast returns null and the
`?? new MeshThread()` fallback overwrites every other field on the node
with defaults (Status=Idle, pending={}, etc.). The next stream.Update then
persists that default-valued thread — silent data corruption.
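
The failing cast is easy to reproduce outside the framework. In this sketch `ThreadContent`, `RoundTrip` and `EnsureTyped` are hypothetical names (the real helper resolves the type through the workspace's `$type` discriminator, whereas here the caller supplies it). It shows that content which went through JSON comes back as a `JsonElement`, so `as` yields null unless the boundary converts it first.

```csharp
using System.Text.Json;

// Hypothetical content type standing in for MeshThread.
public sealed record ThreadContent(string Status, string[] UserMessageIds);

public static class TypedContentSketch
{
    // Mimics a persistence round-trip: the typed object comes back as a raw JsonElement.
    public static object RoundTrip(ThreadContent content) =>
        JsonSerializer.Deserialize<JsonElement>(JsonSerializer.Serialize(content));

    // Converts JsonElement content to T; already-typed content passes through unchanged.
    public static T? EnsureTyped<T>(object? content) where T : class =>
        content switch
        {
            T typed => typed,
            JsonElement json => json.Deserialize<T>(),
            _ => null
        };
}
```

With the conversion at the boundary, a lambda can use a plain `as` cast and return the node unchanged when the cast fails, instead of falling back to a default instance.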

Fix: every emission and Update lambda passing through
`MeshNodeStreamHandle` is now round-tripped through the workspace's
`JsonSerializerOptions` at the boundary. Two pieces:

  * Subscribe path: a `TypedContentObserver` between the underlying sync
    stream and the subscriber deserialises any `JsonElement` Content to
    its registered domain type via the workspace's polymorphic
    `$type` discriminator. No-op for already-typed Content.

  * Update path: the caller's lambda is wrapped so the input is typed
    (deserialised if needed) before `update(node)` is called. The post-
    update emission also goes through the typed converter so callers
    chaining `.Select(node => node.Content as MyType)` get the same
    typed shape as Subscribe. (No outbound serialisation: the downstream
    cold pipeline runs `SerializeToNode` itself for cross-hub patches,
    and OWN-path equality dedup in the data source breaks when we force
    a serialise-deserialise round-trip on every write.)

Eliminates the `?? new TFoo()` antipattern across every callsite: when
Content is genuinely absent or wrong-shaped the cast fails cleanly and
the lambda can return `node` unchanged, no silent overwrite.

Two helpers exposed for reuse by other primitives needing the same shape
guarantee: `MeshNodeStreamHandle.EnsureTypedContent(node, options)` and
`MeshNodeStreamHandle.EnsureSerialisedContent(node, options)`.

Watchers — applying the AccessContext propagation rule:

  * `ThreadExecution.InstallExecRoundWatcher` — DispatchAfterClaim
    creates satellite cells and posts cross-hub messages, all of which
    must be attributed to the thread owner (not the cache hub's emission
    identity). Wraps in `using AccessContextScope.FromNode(node, ...)`
    so every downstream write rides under thread.CreatedBy. The access
    check that gates the dispatch already happened (user without thread
    access can't flip Status to StartingExecution).

  * `NodeTypeCompilationHelpers.InstallCompileWatcher` — compile runs
    under SYSTEM identity, by design. Wraps in
    `using AccessContextScope.AsSystem(accessService)` so the
    DispatchCompileTrigger post lands at the handler with
    delivery.AccessContext = system-security; every internal write
    inside the activity (read source files across all users, write the
    activity log, emit the assembly) then bypasses RLS. The access
    check is upstream — the user has to be permitted to flip
    RequestedReleaseAt on the NodeType's MeshNode.

  * `ThreadSubmission.InstallServerWatcher` — claim flip is an OWN
    update, no cross-hub, no RLS gate inside the action block.
    No scope needed; comment added to clarify the rule.

New helper: `MeshWeaver.Mesh.Security.AccessContextScope` (Mesh.Contract)
with `FromNode(node, access)` and `AsSystem(access)` factories — the
two operation classes the codebase needs.

Docs updated:
  * CqrsAndContentAccess.md — new section "Content is always typed at
    the GetMeshNodeStream boundary" with the bad/good comparison.
  * AsynchronousCalls.md — same rule cross-referenced from the cold-
    write contract section.

Verification:
  * AI suite: 444/445 (was 9 failures pre-fix). Remaining 1 is
    CheckInbox_MultiplePending — a pre-existing rapid-OWN-update race
    where 3 concurrent AppendUserInput calls collide on the data
    source's action block. Not addressed in this commit (separate
    concurrent-write design).
  * Identity-canary tests still green: CacheUpdate_Concat_PreservesCallerIdentity
    + CacheUpdate_AfterCallerScopeDisposed_StillCarriesCapturedIdentity
    + the 6 AccessContextSurvivesSubscribeTest tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…n Update lambda

Eliminates the silent-failure class behind the CI delegation/CheckInbox/CRUD
flakes (run 26584304225, 6 tests). Two shape changes:

1) Typed wire-error contract end-to-end. New `MeshNodeError(Code, Path, Message,
   Diagnostic)` record on `PatchDataResponse.NodeError` (serializable across
   silos — never throw exceptions over the wire). Owner-side
   `HandlePatchDataRequest` + `ApplyJsonMergePatchAndUpdate` catch and classify
   into `MeshNodeErrorCode` (AccessDenied / Deserialization / NotFound /
   Conflict / OwnerUnreachable / Validation / Unknown). Consumer-side
   `MeshNodeStreamHandle.UpdateRemote` now awaits the `PatchDataResponse`
   (previously fire-and-forget with optimistic `OnNext`) and synthesizes
   `MeshNodeStreamException` on failure. `EnsureTypedContent` throws typed
   instead of silently returning the JsonElement — the bad-JSON snippet +
   discriminator is in the diagnostic so the missing TypeRegistry entry is
   findable. Blazor `MeshNodeErrorCardView` renders a typed card per
   `MeshNodeErrorCode` for any subscriber to opt into.

2) Cure for the AccessContext leak in `MeshNodeStreamHandle.Update`. The user's
   `update` lambda runs on a different thread than the caller (data source's
   action block for OWN, workspace emission scheduler for REMOTE) — AsyncLocal
   doesn't flow, so the lambda saw a null `Context` and downstream
   framework calls inside the lambda lost user attribution. Fix: eagerly
   capture `Context ?? CircuitContext` at Update invocation and re-stamp
   AsyncLocal inside the wrapped lambda. The existing `CarryAccessContext`
   wrap (which covers the returned-observable emissions) was insufficient —
   it didn't reach the lambda body.

Verified by the new `AccessContext_PreservedAcrossSubscribeAndUpdateHops`
canary test — pinpoints the failing hop on regression instead of just
"context was wrong somewhere". Pre-fix the canary reported
`hop3_update_lambda: expected 'AccessContextCanary', got '<null>'`; post-fix
all hops carry the sentinel identity.

Verified individually green (all were failing in CI 26584304225):
  - Delegation_ParentToolCalls_ContainsExactlyOneEntryPerDelegationPath
  - HungSubThread_WithoutUserCancel_StaysExecuting
  - HungSubThread_UserCancelOnParent_PropagatesAndStopsSubThread
  - FullCrudWorkflow_CreateGetUpdateDelete
  - CheckInbox_OnePending_ReturnsItAndDrainsTheQueue
  - LinkedInTelemetryImport_CompilesAndRendersImportArea

The two SubThreadHangRepro tests still flake when run as a sibling pair
(each passes alone in 6-10s, both fail at the 30s `WaitForDelegationPath`
when run together). This is a pre-existing test-state-sharing concern,
separate from the AccessContext leak.

`UnregisteredDiscriminator_SurfacesDeserializationException_OnSubscribe`
skipped: end-to-end scaffolding through file-system persistence normalizes
the JsonElement before `EnsureTypedContent` sees the failure path. The
contract is implemented; needs InternalsVisibleTo for a direct unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ty capture

CacheUpdate_{Concat,AfterCallerScopeDisposed} were passing pre-2026-05-28
because the framework silently swallowed access denials and the optimistic
OnNext path captured the AsyncLocal. Post the wire-error contract
(`e5d703121`), the denial fires as `MeshNodeStreamException(AccessDenied)`
on OnError — same identity propagation, different callback.

Both tests already prove the contract: the owner-side denial names
`[email protected]` / `[email protected]`, meaning the caller's identity
DID propagate to the owner's permission check. Extract the principal from
either OnNext (granted, via AsyncLocal) or OnError (denied, via the
`MeshNodeError.Message` "user 'X' lacks Update permission" shape) — both
outcomes confirm capture-by-value semantics.

Verified locally: both pass in 7s / 0.9s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…k leak

Five `Observable.Create` sites in MeshOperations posted a request and subscribed
to `hub.Observe(delivery)` but never captured the inner Subscribe's
IDisposable. When the outer observable disposed (Timeout fired, CTS
cancelled, downstream Subscribe gone), only the CancellationTokenSource was
cleaned up — the hub-level callback entry stayed in `responseSubjects` until
the framework's `RequestTimeout` (~30s) expired.

Symptom: the test base's Quiescing-budget leak detection (~2s) flagged
the orphaned callback at DisposeAsync — exact CI failure shape for
`FullCrudWorkflow_CreateGetUpdateDelete`:
`Hub mesh/…: 1 pending callback(s) after 2.00s:
 …=GetDataRequest@ACME/CrudTest_…(17001ms)`.
Same shape across recent CI runs (26584304225 → 26619423330).

Fix: capture `innerSubscription = hub.Observe(delivery).Subscribe(...)`
and dispose it in the returned cleanup lambda. Matches the established
pattern in `MeshNodeStreamExtensions.cs:GetMeshNode` (line 729).

Applied to all 5 sites:
  - `FetchNode` (GetDataRequest — the test's smoking gun)
  - `UpdateViaDataChange` (DataChangeRequest)
  - `ResolveSinglePathRequest` (GetDataRequest, unified content path)
  - `PatchViaDataRequest` (PatchDataRequest)
  - `Move` (MoveNodeRequest)

Verified locally: FullCrudWorkflow_CreateGetUpdateDelete passes in 25s
with no Pending-callback warnings.
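
The shape of the leak and its cure can be sketched without Rx. Everything below is hypothetical (`CallbackRegistrySketch` stands in for the hub's `responseSubjects` table, `RequestSketch.Send` for one of the five sites). The point is only that the cleanup returned to the caller must own the inner registration as well as the CancellationTokenSource.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the hub's response-callback table.
public sealed class CallbackRegistrySketch
{
    private readonly Dictionary<int, Action<string>> pending = new();
    private int nextId;

    public int PendingCount
    {
        get { lock (pending) return pending.Count; }
    }

    // Registers a response callback; disposing the result removes the entry.
    public IDisposable Observe(Action<string> onResponse)
    {
        int id;
        lock (pending)
        {
            id = ++nextId;
            pending[id] = onResponse;
        }
        return new Unregister(this, id);
    }

    private sealed class Unregister : IDisposable
    {
        private readonly CallbackRegistrySketch owner;
        private readonly int id;
        public Unregister(CallbackRegistrySketch owner, int id) { this.owner = owner; this.id = id; }
        public void Dispose() { lock (owner.pending) owner.pending.Remove(id); }
    }
}

public static class RequestSketch
{
    // Returns a cleanup that tears down BOTH the token source and the inner registration.
    public static IDisposable Send(CallbackRegistrySketch hub, Action<string> onResponse)
    {
        var cts = new CancellationTokenSource();
        var innerSubscription = hub.Observe(onResponse);   // captured, not discarded
        return new Cleanup(cts, innerSubscription);
    }

    private sealed class Cleanup : IDisposable
    {
        private readonly CancellationTokenSource cts;
        private readonly IDisposable innerSubscription;
        public Cleanup(CancellationTokenSource cts, IDisposable innerSubscription)
        {
            this.cts = cts;
            this.innerSubscription = innerSubscription;
        }

        public void Dispose()
        {
            innerSubscription.Dispose();   // removes the hub-level callback entry
            cts.Cancel();
            cts.Dispose();
        }
    }
}
```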

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ToolsReference.md is inlined into every agent's system prompt via @@Agent/ToolsReference, so this reaches all agents. Adds a top-level Icons section stating the inline-SVG rule applies to ALL node types (not just Markdown), mandates currentColor for light/dark legibility plus width/height/viewBox, and points the Create schema table at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…guards

- DefaultPartitionProvider: global System->Admin AccessAssignment so system-security writes (e.g. _UserActivity tracking under ImpersonateAsSystem) are permitted on every partition

- DocumentationNodeProvider: also grant doc read access to Anonymous visitors; fix the Public grant's _Access namespace + MainNode shape

- LayoutAreaView: null-guard ViewModel during transient parameter binding races (navigation / stream teardown)

- StorageAdapterMeshQueryProvider: guard backlog drain against ObjectDisposedException when the subscription is torn down mid-schedule

- CLAUDE.md: correct dev portal URLs (Aspire 7202/5202, Monolith 7122/5022)

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…read flake

`ApplyAgents` wiped the `agents` dict to empty BEFORE `CreateAgentsSync`
rebuilt it one entry at a time via `agents = agents.SetItem(...)`. Any
concurrent `SelectAgent` call landing inside the rebuild window saw a
PARTIAL dict — biased toward agents added first (Researcher, Versioning,
DescriptionWriter, …) because `OrderAgentsForCreation` puts the default
LAST. SelectAgent's fallback `agents.Values.FirstOrDefault()` then
returned a non-default agent.

In `SubThreadHangRepro`, that non-default agent maps to
`HangingSubAgentChatClient` (which awaits
`Task.Delay(Timeout.InfiniteTimeSpan)` and so never completes), not
`DelegatingParentChatClient` (which yields the `delegate_to_agent`
FCC). Result: the parent never delegated and `WaitForDelegationPath`
timed out at 30s, so every second [Fact] in the class failed
deterministically.

Three-part fix:

1. `CreateAgentsSync` builds the new dict LOCALLY (`createdAgents`) and
   ATOMIC-SWAPS into `agents` at the end. No more per-iteration writes
   to the shared field; readers see EITHER the previous full dict OR the
   new full dict, never a half-built one. The same per-iteration pattern
   in the obsolete `CreateAgentsAsync` is left untouched (dead code).

2. Removed the pre-wipe `agents = Empty` in `ApplyAgents`. With the
   atomic-swap, the old dict can stay live during the rebuild window —
   concurrent SelectAgent gets the previous batch's agents (still valid
   in nearly all cases — agent set rarely shrinks across re-emissions)
   instead of an empty intermediate. Without this, the test surfaced as
   "No suitable agent found to handle the request." in the response
   cell.

3. `SelectAgent` now prefers the configuration-marked default agent
   (`IsDefault=true`) over the `loadedAgents[0]` relevance-ordered
   fallback. Defense in depth — even if a race exposes a partial state,
   the default is preferred over whichever non-default happens to be at
   the head of the ordering.
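
The build-locally-then-swap idea from parts 1 and 2 can be sketched as follows. `AgentRegistrySketch` is a hypothetical class with string values standing in for the agent objects; it is not the actual `CreateAgentsSync`. The new dictionary is assembled in a local builder and published with a single reference write, and the old one is never cleared first.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;

// Hypothetical registry for illustration; values stand in for agent instances.
public sealed class AgentRegistrySketch
{
    private ImmutableDictionary<string, string> agents =
        ImmutableDictionary<string, string>.Empty;

    // Builds the replacement locally, then publishes it in one reference write.
    public void Rebuild(IEnumerable<KeyValuePair<string, string>> definitions)
    {
        var builder = ImmutableDictionary.CreateBuilder<string, string>();
        foreach (var (name, client) in definitions)
            builder[name] = client;

        // Readers holding the old reference keep a complete dictionary until this line.
        Volatile.Write(ref agents, builder.ToImmutable());
    }

    public int Count => Volatile.Read(ref agents).Count;

    public string? Find(string name) =>
        Volatile.Read(ref agents).TryGetValue(name, out var client) ? client : null;
}
```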

Verified: full `MeshWeaver.Threading.Test` suite — 114 passed, 0 failed
(53s). Both formerly-failing SubThreadHangRepro Facts pass solo AND
in-suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
`UpdateRemote` was blocking OnCompleted waiting for the owner's
`PatchDataResponse` so structured errors (AccessDenied, Validation,
Deserialization) could propagate on the Rx OnError stream. Worked in
Monolith (~10ms response round-trip). Broke Orleans:

  - Cross-grain routing + cold-start grain activation routinely exceed
    the 30s response timeout. Subscriber sees TimeoutException → OnError
    → caller's `.Subscribe(_, ex => log)` logs warning → write
    appears to fail even though the owner committed the patch.

  - Any caller bridging `await stream.Update().FirstAsync()` on a hub
    action block deadlocks — the response delivery needs the same
    action block to dispatch.

Concrete symptom: 13 Orleans tests in CI 26630118759 failed with
"Expected Messages count = 2, got 0". OrleansChatTest's SubmitMessage
posted the user message, the AppendUserInput's `stream.Update(...)`
chain timed out at 30s on the response wait, AppendUserInput logged a
warning and gave up. PendingUserMessages stayed empty; submission
watcher never triggered; agent never executed; test asserted 0 messages.

Revert: emit OnNext optimistically with the locally-computed `updated`
snapshot, then fire-and-forget the response check. Owner-side failures
land in the `MeshWeaver.Mesh.MeshNodeStreamHandle` diagnostic log
channel — observable to operators but not on the Rx pipeline.

Trade-off (documented in code): structured errors no longer propagate
on Rx OnError end-to-end. The patch is RFC 7396 deterministic against
owner state, so the optimistic snapshot matches what the owner commits
on success. For strict consistency callers re-read via
`GetMeshNodeStream(path).Take(1)` — that does go to the owner.

The inner Subscribe IS captured in `composite` so disposal still tears
down the hub-level callback (no leaked Observe per Update).

Verified locally: OrleansChatTest.CreateThread_AndSubmitMessage_ProducesThreadMessages
passes in 35s (was failing in 48s with the wait-for-response).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
After reverting wait-for-PatchDataResponse to fix Orleans, the canary
test regressed at hop4_update_onnext — expected the caller's
AccessContext but saw 'system-security'. Root cause: the optimistic
OnNext fires inside `initialSub.Subscribe`'s callback, which runs on
the remote-stream emission thread — opened under `ImpersonateAsSystem`
(MeshNodeStreamExtensions.cs:109-114) for infrastructure routing. So
AsyncLocal Context = system-security at that point. CarryAccessContext
(wrapping the outer chain) doesn't compensate because it captures only
`Context`, not `CircuitContext` — pure CircuitContext callers (Blazor
circuits, tests using SetCircuitContext) see system-security.

Fix: wrap the OnNext + OnCompleted in a `SwitchAccessContext` scope
keyed to the eagerly-captured `capturedContextAtEntry` (which already
does the `Context ?? CircuitContext` fallback used elsewhere). Now the
caller's Subscribe(_ => …) callback runs under their identity, not the
infrastructure system identity.

Verified locally:
  - AccessContext_PreservedAcrossSubscribeAndUpdateHops canary: PASS
  - DelegationWriteCountTest.Delegation_ParentToolCalls_...: PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
CI consistently tripped the 60s `[Fact(Timeout)]` even though the test
passes locally in ~13s. The cost is Roslyn cold-start on Linux runners
— two sequential C# compiles (LinkedInProfile + LinkedInTelemetryImport)
routinely take 40-60s on shared runners, leaving zero headroom for the
post-compile render. The per-test-class `.mesh-cache` directory is
unique-per-process (`MeshWeaverLinkedInTelemetryTests/.mesh-cache`
under temp), so every CI run pays the full first-compile cost.

The outer xUnit wall-clock timeout is bumped to 120s. The inner
`ct = new CancellationTokenSource(60s)` keeps the application-level
budget at 60s for the in-test waits; only the outer xUnit wall is
relaxed to absorb cold-start variance.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
NodeTypeCompileActivityHandler's source-fetch chain hung on slow per-node
hubs. CombineLatest waits for EVERY input to emit at least once; the
per-source `GetMeshNodeStream(p).Where(n!=null).Take(1).Timeout(5s)
.Catch(_ => Observable.Empty)` returned Empty on timeout — that input
completed WITHOUT emitting, CombineLatest never fired, `.Take(1)`
completed silently, the outer SelectMany never fired, and the compile
activity hung forever. Pattern observed:

  - LinkedInTelemetryImportTest local trace: 28s gap between
    "[NTCA] starting Roslyn" and "Compiling assembly" — sources
    eventually emitted (the 5s Timeout fired then ANOTHER read won
    on the second attempt) but burned the activity's wall clock,
    tipping the 60s test ceiling in CI.

  - Prod `rbuergi/CatBond` cascade: per-node hub slow → source
    streams time out → never-firing compile → 30s+ stale
    SubscribeRequest callbacks on the cache hub → `[UpdateRemote]
    ERROR ... TimeoutException` on every retry, looping forever.

Fix: each per-source stream emits exactly ONE value — a real
MeshNode OR a `null!` sentinel on Timeout/Catch — so CombineLatest
ALWAYS gets a value per input and fires. Filter nulls after the
Combine. Adds a defensive outer `Timeout(10s, …)` on the Combine
itself in case Subscribe never returns.

LinkedIn test timeout reverted 120s → 60s (the bump masked the bug
this commit cures).

Co-authored changes in the same fix family (already in working tree):
  - NodeTypeEnrichmentHelpers.SlowPathTimeout 30s → 60s (with a
    comment that correctness comes from activity FINISHING +
    DISPOSING, not from a longer wait).
  - New CompileFinishAndDisposeTest pinning that the compile activity
    must reach a terminal Failed/Succeeded state and dispose, never wedge.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…e errors

A NodeType persisted as CompilationStatus=Compiling came up stranded on hub
init: the first-build kickoff needs null, the compile watcher needs Pending,
and the release-request watcher only fires on a settled status — so nothing
re-drove it and it sat in Compiling forever, leaving every instance hub on the
default config (no MeshNodeReference reducer) and rendering nothing
(rbuergi/CatBond/AtlanticBond 'I get nothing').

InstallCompileWatcher now adds a recovery kickoff: on the first emission at init,
if status is Compiling, it probes the recorded compile activity and — when that
activity is missing/terminal/stale (not actually running) — flips Compiling->
Pending so the watcher dispatches a fresh compile. A genuinely live compile is
left alone.

CompileProgressIndicator now surfaces the terminal Error state (CompilationError)
and stops swallowing stream faults, so a stuck/blank layout area tells the user
why instead of showing an indefinite spinner.

Tests: NodeTypeHub_StrandedInCompiling_RecompilesOnInit (new) +
NodeTypeHub_StaysResponsive_WhileFirstBuildCompileInFlight both pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…t crash the server

Rendering a self-similar/cyclic control tree recursed synchronously through
LayoutAreaHost.RenderArea -> control.Render -> RenderArea until the stack
overflowed, which in .NET is an uncatchable fail-fast (exit 0xC0000409) that
took down the whole portal. This was the rbuergi/CatBond crash: opening it
recursed while rendering area=Overview until the process died.

RenderingContext now carries a Depth (incremented per nested area in
GetContextForArea); RenderArea bails at MaxRenderDepth (100) and emits a visible
'Layout recursion detected' MarkdownControl instead of recursing into the crash.
100 is far above any legitimate layout and far below stack-overflow frame
counts, so it never trips a valid tree.
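
A minimal sketch of the depth guard, under stated assumptions: `RenderDepthSketch` and its dictionary-based area model are hypothetical and far simpler than `LayoutAreaHost.RenderArea`. Only the `MaxRenderDepth` value of 100 and the visible error text come from the description above.

```csharp
using System.Collections.Generic;

// Hypothetical renderer for illustration; areas are modelled as name -> nested area name.
public static class RenderDepthSketch
{
    public const int MaxRenderDepth = 100;

    // A cycle in `children` models a self-similar control tree.
    public static string Render(
        IReadOnlyDictionary<string, string> children, string area, int depth = 0)
    {
        // Bail out with a visible marker instead of recursing until the stack overflows.
        if (depth >= MaxRenderDepth)
            return "Layout recursion detected";

        return children.TryGetValue(area, out var child)
            ? area + "(" + Render(children, child, depth + 1) + ")"
            : area;
    }
}
```

An acyclic tree renders normally, while a tree in which an area nests itself ends in the marker after 100 levels rather than taking the process down.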

Test: DeeplyNestedLayout_DoesNotCrashServer_SurfacesRecursionError (new); full
LayoutTest class 19/19 pass (no render regression).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ous principal

For an anonymous session the hub is portal/anonymous, a hub-shaped principal;
RestoreUserContextOnEmission's leak-guard rejects hub-shaped principals, logging
an Error on every anonymous request. Provisioning a guest VUser node is an
infrastructure write, so it now runs under ImpersonateAsSystem (system-security: a
real principal with Permission.All) instead of ImpersonateAsHub(hub).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…inux inotify races

Two races caused the bulk CI flakes (both invisible on Windows, which has a
natively-recursive FileSystemWatcher):

1. Linux inotify subdir-registration race: FileSystemWatcher adds the
per-subdirectory watch REACTIVELY when it sees the dir-created event, so a file
written into a brand-new subdir immediately after Directory.CreateDirectory is
missed if the watch hasn't landed yet. That miss is what surfaced as the 5s/15s
WaitForNotification timeouts (ExternalFileCreation_ObserveQueryReceivesUpdate,
Watcher_AfterStop_DoesNotNotify). The new EnsureSubdirWatchedAsync helper proves
the subdir watch is live (observed probe) before the asserted write.
2. Thread-unsafe accumulator: receivedNotifications was a List<T> written from
the watcher's debounce-timer + Read().Subscribe callback threads while the Rx
Interval poll thread enumerated it. maxParallelThreads:1 serializes TESTS, not
the watcher's own callback threads. Swapped to ConcurrentQueue.
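
The accumulator swap in item 2 can be illustrated with a small sketch. `NotificationAccumulatorSketch` is a hypothetical name and `Parallel.For` merely stands in for the watcher's callback threads. Several threads enqueue while also taking snapshots, which is safe for `ConcurrentQueue<T>` but not for a `List<T>` being enumerated during writes.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical accumulator for illustration; not the test's actual code.
public static class NotificationAccumulatorSketch
{
    public static int CollectFromManyThreads(int writers, int perWriter)
    {
        var received = new ConcurrentQueue<string>();

        Parallel.For(0, writers, writer =>
        {
            for (var i = 0; i < perWriter; i++)
            {
                received.Enqueue($"w{writer}-n{i}");

                // Snapshot while other threads keep writing; ToArray is thread-safe here.
                _ = received.ToArray();
            }
        });

        return received.Count;
    }
}
```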

Also hardened Watcher_AfterStop to assert node2 is not observed (robust against a
trailing debounced event) instead of an exact total count. Verified 10/10 in bulk.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
… overlay

After a framework rebuild/redeploy the FrameworkVersion hash changes, so a
dynamic NodeType's compiled assembly (Status=Ok, LatestAssembly* populated)
fails HasUsableBuild on the version mismatch alone. EnrichWithNodeType then
skipped straight to a bare "Compilation failed" overlay with an EMPTY code
block, since the compile never actually failed (no captured diagnostic). Every
dynamic NodeType showed this after every deploy until manually recompiled.

- Route the framework-stale case (Status=Ok + assembly present, version differs)
  through the existing TriggerRecompileAndRetry self-heal: Pending flip ->
  watcher rebuilds under system identity, bounded by MaxRecompileAttempts. Same
  mechanism as the "assembly bytes missing from store" path. Dynamic NodeTypes
  now auto-recover on first instance activation after a deploy.
- BuildCompilationErrorMarkdown no longer emits an empty text code fence for
  single-line messages; the framework-stale-after-cap overlay shows an accurate
  "Built against a previous framework version - Recompile" prompt with its own
  guidance instead of the misleading "fix the source code" text.
- Add Orleans regression test FrameworkStaleAssembly_SelfHealsOnInstanceActivation
  (green with fix in 33s, red without it in 1m18s - proven guard).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ck_inbox tests

Partial stabilization of InboxToolIntegrationTest. The deterministic drain-race
was server-side: RecoverStaleExecutingThread saw the test's artificial Executing
thread with no active round, reset Status->Idle, and the submission watcher then
drained PendingUserMessages before check_inbox read (tool returned '(no new
messages)'). No test-side stream.Where wait can prevent a post-wait server-side
drain.

Fix: SeedPendingMidExecutionAsync writes a GENUINE mid-execution state in ONE
atomic own-stream Update (Status=Executing + ActiveMessageId + PendingUserMessage
+ all queued PendingUserMessages), so recovery skips it and the watcher never sees
Idle+pending. Waits gate on the thread hub's OWN stream (the exact stream the tool
reads), not the mesh-remote view. Isolated CheckInbox_* now reliably green.

Known residual: the full class still flakes under cumulative process load from the
heavy Cancel_* real-execution tests; the durable fix is making the round-start
transition fully atomic in DispatchRound (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…on output split

Thread execution: drop transient Completing, add terminal Cancelled, replace
RequestedCancellationAt with a RequestedStatus enum control field. Replace the
late-GetMeshNode RecoverStaleExecutingThread (root cause of the check_inbox
phantom-drain flake) with InitializeThreadLifecycle: read the own-node stream's
first emission and drive any non-terminal state to valid once -- honor pending
cancel, resume the same response cell on interrupted Executing, leave
Idle/Cancelled+pending to the submission watcher. DispatchRound resume mode;
cancel watcher + no-CTS fallback.

check_inbox A7: clean mid-execution output-cell transition -- freeze the current
response cell, place interrupting user cells in the middle, switch streaming to a
fresh cell via a per-round ActiveResponseSegment (baseline-offset slice keeps
stale timer pushes harmless).

Activities: InitializeActivityLifecycle wake-up (kernel scripts -> Failed on
interrupt, honor pending cancel); simplify NodeType compile recovery to
re-request from the owner's own Compiling state, dropping the racy cross-hub
activity probe + 120s stale heuristic.

Docs: ActivityControlPlane.md, ThreadOperations.md, DebuggingMessageFlow.md.
Tests: migrate RequestedCancellationAt/Completing callsites; add the dedicated
A7 split test and an Orleans no-probe compile-recovery test.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…lake

WaitForThreadAsync used a Task.Delay(100) poll loop that re-read a potentially
stale cached snapshot each cycle and raced workspace write propagation, so under
cumulative test load a transition landing between two polls blew the budget
(2/25 flaked in combined runs; green in isolation). Replace it with the
stream-based wait InboxToolIntegrationTest already uses
(GetMeshNodeStream(path).Where(predicate).Take(1).Timeout) — emits on every
commit, never stale. Also convert the "exactly once" negative test's
Task.Delay(500) into a stream.Where + Timeout watch for the bad (second-round)
event. 3x36-test combined runs now green.
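
For illustration, the stream-based wait can be sketched roughly as follows. This
is a minimal sketch assuming System.Reactive; ThreadNode, StreamWaitSketch and
WaitForAsync are hypothetical stand-ins, and a BehaviorSubject plays the role of
GetMeshNodeStream(path). It is not the actual test helper.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

// Hypothetical stand-in for a mesh node; only Status matters for this sketch.
public record ThreadNode(string Path, string Status);

public static class StreamWaitSketch
{
    // Completes with the first emission that satisfies the predicate, or faults
    // with TimeoutException if no matching emission arrives within the budget.
    public static Task<ThreadNode> WaitForAsync(
        IObservable<ThreadNode> stream,
        Func<ThreadNode, bool> predicate,
        TimeSpan budget) =>
        stream.Where(predicate).Take(1).Timeout(budget).ToTask();

    public static async Task Main()
    {
        // Assumption: the node stream replays its latest value to new subscribers.
        using var nodes = new BehaviorSubject<ThreadNode>(
            new ThreadNode("thread/1", "Executing"));

        // Subscribing first means a transition cannot slip between two polls.
        var wait = WaitForAsync(nodes, n => n.Status == "Idle", TimeSpan.FromSeconds(5));

        nodes.OnNext(new ThreadNode("thread/1", "Idle"));

        var node = await wait;
        Console.WriteLine(node.Status); // prints "Idle"
    }
}
```

Because the wait reacts to each emission rather than sampling on a timer, there
is no window between polls in which a short-lived state can be missed.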

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
The kernel/script hub IS the executor — it activates in order to RUN the script,
so its own ActivityLog is legitimately Running the instant it comes up. Wiring
InitializeActivityLifecycle there made its first-emission "Running => Failed
(interrupted)" recovery fire on every freshly-started script, killing it — broke
5 CI tests (ScriptExecutionInUserHomeTest.*, ActivityLogStreamTest
.Script_Failure_Flips_*, ExportDocumentScriptRelayTest).

Remove the kernel wiring and delete the InitializeActivityLifecycle helper (no
correct caller: that wake-up shape is only valid when the owner hub is DISTINCT
from the executor). NodeType compile recovery already does the right thing by
re-requesting from its OWN Compiling state (owner != activity hub). Docs updated
to spell out the invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Follow-up to 27c0d9c. That commit added the framework-stale self-heal, but an
instance activating against an ABI-stale dynamic NodeType still rendered the
overlay ("Built against a previous framework version") instead of the healed
compiled config — the rbuergi/CatBond/AtlanticBond "not building type" symptom.

Root cause: TriggerRecompileAndRetry flips Ok->Pending via an async cross-hub
JSON-merge patch that has NOT round-tripped when the recursion subscribes. The
wait re-snapped the SAME stale Ok node (status Ok, old framework), recursed on it
before the recompile started, hit MaxRecompileAttempts ~5ms after the flip, and
froze the instance on the overlay (BuildSlowPath is Take(1) — one config per hub
lifetime). The NodeType itself healed correctly (trace: Ok->Pending->Compiling->Ok
with the live framework), which masked the bug at the NodeType level.

- TriggerRecompileAndRetry gains requireUsableBuild: the framework-stale heal now
  waits until the rebuild is GENUINELY usable (HasUsableBuild — framework version
  matches), which the stale Ok can never satisfy. The deciding fix.
- Defensive Pending/Compiling guards in BuildSlowPath and TriggerRecompileAndRetry
  so neither snaps an in-flight compile.
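
The difference between waiting on status alone and waiting on a usable build can
be sketched as below. This is a simplified illustration assuming System.Reactive;
NodeTypeState, IsUsable and the version strings are hypothetical and only mirror
the idea behind HasUsableBuild, not its real implementation.

```csharp
using System;
using System.Reactive.Linq;

// Hypothetical snapshot of a NodeType's compile state.
public record NodeTypeState(string Status, string FrameworkVersion);

public static class UsableBuildSketch
{
    // Hypothetical predicate: Ok alone is not enough, the build must also
    // target the live framework version.
    public static bool IsUsable(NodeTypeState s, string liveFramework) =>
        s.Status == "Ok" && s.FrameworkVersion == liveFramework;

    public static void Main()
    {
        // The sequence from the trace: stale Ok, then Pending, Compiling, healed Ok.
        var states = new[]
        {
            new NodeTypeState("Ok", "1.0"),
            new NodeTypeState("Pending", "1.0"),
            new NodeTypeState("Compiling", "1.0"),
            new NodeTypeState("Ok", "2.0"),
        }.ToObservable();

        // A status-only wait is satisfied by the stale node immediately.
        var statusOnly = states.Where(s => s.Status == "Ok").Take(1).Wait();

        // Requiring a usable build skips the stale Ok and the in-flight states.
        var usable = states.Where(s => IsUsable(s, "2.0")).Take(1).Wait();

        Console.WriteLine(statusOnly.FrameworkVersion); // prints "1.0"
        Console.WriteLine(usable.FrameworkVersion);     // prints "2.0"
    }
}
```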

Add FrameworkStaleInstanceRenderTest (Monolith): renders a dynamic instance's
Overview and asserts the compiled HtmlControl marker, not the overlay. Red
without requireUsableBuild, green with it (35s). Regressions clean: Orleans
compile suite 7/7, CodeEditRecompileTest 5/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
… test assertions, query in-memory sync

Menu providers migrated from IAsyncEnumerable first-snapshot-wins to
IObservable<IReadOnlyCollection<>> — fixes the access race where a runtime
AccessAssignment (e.g. granting Editor) that propagated after first render
never reached the menu (the Menu_Editor_ShowsCreateItems flake). The predicate
renderer now subscribes and re-emits each MenuControl via host.UpdateArea when
permissions enrich.
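
A rough sketch of why the observable contract closes the race, assuming
System.Reactive. ReactiveMenuSketch, MenuItems and the string menu entries are
hypothetical; the real contract is NodeMenuItemDefinition and the real renderer
calls host.UpdateArea rather than writing to the console.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class ReactiveMenuSketch
{
    // Hypothetical provider: every permission snapshot yields the full menu,
    // so a late grant produces a new emission instead of being lost.
    public static IObservable<IReadOnlyCollection<string>> MenuItems(
        IObservable<bool> canEdit) =>
        canEdit
            .DistinctUntilChanged()
            .Select(edit => (IReadOnlyCollection<string>)(edit
                ? new[] { "Open", "Create" }
                : new[] { "Open" }));

    public static void Main()
    {
        // Assumption: access starts read-only and Editor is granted after first render.
        using var canEdit = new BehaviorSubject<bool>(false);

        // Stand-in for the renderer: re-render on every emission.
        using var subscription = MenuItems(canEdit)
            .Subscribe(items => Console.WriteLine(string.Join(", ", items)));
        // prints "Open"

        canEdit.OnNext(true);
        // prints "Open, Create"
    }
}
```

With a first-snapshot-wins enumeration, only the initial "Open" menu would ever
have been rendered.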

- Menu: NodeMenuItemDefinition contract -> IObservable<IReadOnlyCollection<>>;
  reactive aggregator + renderer; all providers converted (Default Node/Mesh,
  Approval, AI thread side-panel/delegations/changes, MarkdownExport, LinkedIn).
  MenuAccessControlTest 8/8.
- New MeshWeaver.Reactive.Assertions: standalone, packable, self-contained
  (System.Reactive only) fluent await-free assertions on IObservable<T>. Tests
  become reactive role models: Query(...).Should().Match(...), no await.
- Query/autocomplete reactive migration: StaticNodeQueryProvider.Autocomplete
  now synchronous (in-memory, no fake async); MeshNodeAutocompleteProvider and
  BlazorAutocompleteService de-bridged onto the reactive surface. Autocomplete
  138/138.
- 3 flaky tests fixed via reactive waits: Menu_Editor_ShowsCreateItems,
  Delete_DeeplyNested (authoritative reads), ObserveQuery_DisposalStopsNotifications
  (own-subscription baseline). ObserveQuery_EmitsInitialResults migrated to a
  no-await void test as the role-model example.
- Docs: reactive menu pattern + async-boundary-at-the-IO-edge principle
  (AggregatingProviders, NodeMenu, CqrsAndContentAccess, AsynchronousCalls);
  new ReactiveTestAssertions.md referenced from Coder.md.

Follow-up: migrate the remaining src autocomplete edge-consumers and the ~401
test-site QueryAsync/AutocompleteAsync calls to the reactive assertions, then
delete the async query methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>