Fix workflows_pkey duplicate INSERT on collab reconnect, and remove its root cause (#4830) #4841
Open
stuartc wants to merge 4 commits into
Conversation
A stale "new" rejoin after an in-place socket reconnect could hand a collaborative session a freshly-built workflow struct for an id that the first save had already persisted, routing the second save to INSERT and colliding on workflows_pkey. Reconcile by id in fetch_workflow/1: a built-state struct whose id already has a row is reloaded to its :loaded form (routes to UPDATE), while a genuine first save (no row) still INSERTs. Confined to fetch_workflow/1; the channel-side load_workflow/5 is intentionally left untouched. The #4829 ConstraintError rescue is kept as a backstop.
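The reconcile-by-id rule above can be sketched as follows. This is an illustrative TypeScript model only: the real code is Elixir/Ecto, and `WorkflowTable`, `saveWorkflow` and `reconcileById` are hypothetical names that do not appear in the PR. The `state` field stands in for Ecto's `__meta__.state`.

```typescript
// Illustrative model only; the real implementation is Elixir/Ecto.
// All names here are hypothetical.
type EctoState = "built" | "loaded"; // stands in for Ecto's __meta__.state

interface WorkflowRow {
  id: string;
  title: string;
}

interface WorkflowStruct extends WorkflowRow {
  state: EctoState;
}

class WorkflowTable {
  private rows: Record<string, WorkflowRow> = {};

  find(id: string): WorkflowRow | undefined {
    return this.rows[id];
  }

  // INSERT fails on an existing primary key, like the workflows_pkey constraint.
  insert(row: WorkflowRow): void {
    if (this.rows[row.id]) {
      throw new Error("duplicate key violates workflows_pkey");
    }
    this.rows[row.id] = row;
  }

  update(row: WorkflowRow): void {
    if (!this.rows[row.id]) {
      throw new Error("no row to update");
    }
    this.rows[row.id] = row;
  }
}

// Save routing: a built struct goes to INSERT, a loaded one to UPDATE.
function saveWorkflow(
  table: WorkflowTable,
  base: WorkflowStruct,
  title: string
): WorkflowStruct {
  const row: WorkflowRow = { id: base.id, title };
  if (base.state === "built") {
    table.insert(row);
  } else {
    table.update(row);
  }
  return { id: row.id, title: row.title, state: "loaded" };
}

// Reconcile by id: a built struct whose id already has a row is swapped for
// the loaded form; a genuine first save (no row) is left as built.
function reconcileById(table: WorkflowTable, wf: WorkflowStruct): WorkflowStruct {
  if (wf.state === "loaded") {
    return wf;
  }
  const existing = table.find(wf.id);
  return existing
    ? { id: existing.id, title: existing.title, state: "loaded" }
    : wf;
}
```

Calling `saveWorkflow` directly with a stale built struct reproduces the collision; passing it through `reconcileById` first routes the second save to UPDATE.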
Phase 2 (client) complement to the server-side reconcile-by-id fix. The channel-join action was read once from data-is-new-workflow and frozen in the provider's join params, so an in-place socket reconnect of an already-saved "new" workflow rejoined with stale action: "new". Make join params lazy and reactive to the SessionContext store's isNewWorkflow flag (flipped by clearIsNewWorkflow on save) via a ref bridged up from StoreProvider, since SessionProvider sits above the store. After the first successful save, a reconnect now rejoins as action: "edit". Narrows the trigger surface; the server fix remains the guarantee.
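A minimal sketch of the frozen versus lazy join params described above, without React. `createJoinParamsBridge` and `RecordingTransport` are hypothetical stand-ins for the ref bridge and the channel transport. Only `getJoinParams`, `clearIsNewWorkflow` and the `action` values come from the PR text, and the real wiring through SessionProvider and StoreProvider is not shown.

```typescript
type JoinAction = "new" | "edit";

interface JoinParams {
  action: JoinAction;
}

// Mutable ref in the style of React's useRef; assumed shape.
interface Ref<T> {
  current: T;
}

// Hypothetical bridge: the store flips the ref, the provider reads it lazily.
function createJoinParamsBridge(initialIsNew: boolean) {
  const isNewWorkflowRef: Ref<boolean> = { current: initialIsNew };
  return {
    // Evaluated at every join, so it always sees the current flag.
    getJoinParams: (): JoinParams => ({
      action: isNewWorkflowRef.current ? "new" : "edit",
    }),
    // Called after the first successful save.
    clearIsNewWorkflow: (): void => {
      isNewWorkflowRef.current = false;
    },
  };
}

// Hypothetical transport that records the params used for each (re)join.
class RecordingTransport {
  readonly joins: JoinParams[] = [];
  private readonly paramsFn: () => JoinParams;

  constructor(paramsFn: () => JoinParams) {
    this.paramsFn = paramsFn;
  }

  join(): void {
    this.joins.push(this.paramsFn());
  }
}

// Runs connect -> save -> reconnect and returns the actions that were sent.
function simulateReconnect(lazy: boolean): JoinAction[] {
  const bridge = createJoinParamsBridge(true);
  const frozen = bridge.getJoinParams(); // read once, as in the pre-fix wiring
  const transport = new RecordingTransport(
    lazy ? bridge.getJoinParams : () => frozen
  );
  transport.join(); // initial connect
  bridge.clearIsNewWorkflow(); // first save succeeded
  transport.join(); // in-place socket reconnect
  return transport.joins.map((p) => p.action);
}
```

With `lazy` set to `false` the reconnect repeats the stale `"new"` action; with `true` it rejoins as `"edit"`.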
Covers the full StoreProvider -> ref -> getJoinParams bridge by rendering the real SessionProvider/StoreProvider tree (only the y-phoenix-channel transport is faked, recording the params it is constructed with). Asserts an initial connect joins action: "new" and an in-place reconnect after clearIsNewWorkflow() rejoins action: "edit" — fails against the pre-fix frozen-prop wiring. Exports the previously module-private SocketContext as a test seam so the provider tree can be supplied a connected socket without a real socket.
Workflow resolution was implemented twice — once in the channel join
(load_workflow/5) and once in the session save/reset path
(fetch_workflow/1). The two could disagree on whether a given id mapped
to a :built or :loaded struct, which is the structural root cause behind
the workflows_pkey duplicate INSERT on collab reconnect.
Introduce Lightning.Collaboration.WorkflowResolver as the single
authority: given a workflow id + action it returns the canonical
%Workflow{} in the correct Ecto state, plus an explicit kind
(:new | :existing | :version) so callers never re-derive newness from
struct shape. Resolving an existing id under a :new action reconciles to
the loaded row (kind :existing), so the save routes to UPDATE rather
than a duplicate INSERT. Both callers now delegate here; fetch_workflow/1
and the bespoke NotLoaded surgery are deleted.
The two channel sites that previously inferred newness from
__meta__.state and the snapshot_version assign now read the resolver's
kind via socket.assigns.workflow_kind, removing the inference entirely.
Per-action auth ordering is preserved (new = auth-first; edit/version =
resolve-first). A "new" rejoin to a foreign-project id now returns the
wrong-project error rather than silently building a fresh struct — a
narrowing of the auth boundary confirmed by security review. The #4829
Ecto.ConstraintError rescue is left untouched as the backstop.
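As a rough illustration of the resolver contract described in this commit, here is a TypeScript model, not the Elixir module itself. `resolveWorkflow`, the `Resolution` shape and the `"wrong_project"` / `"not_found"` reason strings are assumptions made for the sketch. The `"version"` action is omitted because the text does not spell out its lookup rules, and authorisation is left to the caller, as in the PR.

```typescript
// Illustrative model of the resolver contract; all names are hypothetical.
type ResolvedKind = "new" | "existing" | "version";

interface StoredWorkflow {
  id: string;
  projectId: string;
}

interface ResolvedWorkflow {
  id: string;
  projectId: string | null;
  state: "built" | "loaded"; // stands in for Ecto's __meta__.state
}

type Resolution =
  | { tag: "ok"; workflow: ResolvedWorkflow; kind: ResolvedKind }
  | { tag: "error"; reason: "wrong_project" | "not_found" };

function resolveWorkflow(
  rows: Record<string, StoredWorkflow>,
  id: string,
  action: "new" | "edit",
  projectId?: string
): Resolution {
  const row = rows[id];

  if (row) {
    // Ownership is checked only when a project is supplied.
    if (projectId !== undefined && row.projectId !== projectId) {
      return { tag: "error", reason: "wrong_project" };
    }
    // An existing id always resolves to the loaded row, even under "new",
    // so the later save routes to UPDATE.
    return {
      tag: "ok",
      workflow: { id: row.id, projectId: row.projectId, state: "loaded" },
      kind: "existing",
    };
  }

  if (action === "new") {
    // Genuine first save: no row yet, so hand back a built struct.
    return {
      tag: "ok",
      workflow: { id, projectId: projectId ?? null, state: "built" },
      kind: "new",
    };
  }

  return { tag: "error", reason: "not_found" };
}
```

Callers read `kind` from the result instead of inspecting the struct's state, which is the role `socket.assigns.workflow_kind` plays in the channel.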
Security Review ✅
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #4841    +/-   ##
=======================================
+ Coverage   90.3%   90.4%   +0.1%
=======================================
  Files        444     445      +1
  Lines      22607   22639     +32
=======================================
+ Hits       20419   20468     +49
+ Misses      2188    2171     -17
Description
This fixes the workflows_pkey duplicate-key crash you'd sometimes hit when reconnecting to the collaborative editor after a save, and then goes after the thing that was actually causing it.

The root cause was that we were resolving "which workflow am I editing, and is it new or existing?" in two completely separate places: once when the channel joins (load_workflow/5) and again on the session save/reset path (fetch_workflow/1). Those two could quietly disagree about whether an id should map to a :built struct (→ INSERT) or a :loaded one (→ UPDATE). So a reconnect that still thought it was "new" would try to INSERT a row that already existed → boom.

So there are two parts here:

- Lightning.Collaboration.WorkflowResolver. Both callers now go through it, so they physically can't disagree any more. It also hands back an explicit kind (:new | :existing | :version) so the channel stops trying to guess "is this new?" from the shape of the struct. fetch_workflow/1 and the old NotLoaded poking are gone.

The nice property that closes the bug: a "new" join for an id that already has a row resolves to the existing :loaded row (kind: :existing), so the save routes to UPDATE. No more duplicate INSERT.

Closes #4830
Validation steps
- workflows_pkey duplicate-key error; now it should just save cleanly.
- mix test test/lightning/collaboration/ test/lightning_web/channels/workflow_channel_test.exs should be green (273 tests here).

Additional notes for the reviewer
A few things worth your eyes:

- A "new" action on an id that already has a row comes back as :existing. There's a resolver test pinning this specifically, since getting it wrong would bring the duplicate-INSERT straight back.
- "new" still authorises first, "edit"/version still resolve first. Authorisation stays in the channel; the resolver only does the project-ownership check, and only when it's handed a :project.
- A "new" rejoin for an id that belongs to another project now returns a wrong-project error instead of silently building a fresh struct. Felt like the safer behaviour.
- Kept the #4829 Ecto.ConstraintError rescue in workflows.ex as a backstop; didn't want to remove the belt while adding the braces.
- workflow_deleted again, same as before), with a test.

AI Usage
You can read more details in our Responsible AI Policy
Pre-submission checklist
- /review with Claude Code)
- :owner, :admin, :editor, :viewer)