Summary
Annotation runs (VEP, gnomAD, ClinVar) update live annotation links via two distinct transitions: value-replaces-value (a source returns a changed value, superseding the prior link) and value-replaced-by-nothing (a source returns no result for an allele that previously had a value). The first is handled today — the linker supersedes only on an actual value change. The second has no path at all: the prior live link is retained, an ABSENT/NO_RECORD event is appended, and a warning is logged. Because "current" state is read from the live link rather than derived from the event log, a genuine upstream retraction continues to surface the stale (but versioned) value with no signal it is gone.
This issue tracks treating both transitions as one annotation link lifecycle: adding the missing guarded retire-on-absence path, and bringing the existing supersession path under the same model and run-health gating so the two are consistent.
Problem
- Retire-on-absence (missing): When a run produces no result for an allele that currently has a live link, the linker keeps the prior link (e.g. VEP's "do not overwrite a held consequence with a null result" branch records UNCHANGED and continues). The recorded ABSENT event is effectively audit-only, so retracted values continue to be surfaced as current, pinned to an older source_version.
- Supersession (works, but ad hoc): Value-replaces-value is handled by the linker writing a new link only when the value changes. This appears correct today, but it is not gated on run health and is not expressed as part of a shared lifecycle, so the two transitions are reasoned about and implemented separately.
- The only formalized retirement path is supersede_with with __retire_cascade__ closing dependent allele links, which fires when a new mapping supersedes an old mapping. That is a different layer from the per-source annotation link transitions above.
Proposed behavior
Model annotation link transitions as a single lifecycle with a shared run-health gate:
- Supersession (value → new value): Keep current behavior of superseding only on an actual value change. Apply the same run-health gate so a degraded or partial run cannot supersede a good value with a wrong or incomplete one. Verify the prior link's valid_to closes with no gap before the new link's valid_from.
- Retirement (value → nothing): When a run produces no result for an allele with a live link, retire (close valid_to on) that live link so current state reflects the absence.
- Run-health gate (applies to both): Mutating transitions occur only when the run was healthy and complete. A degraded, partial, or errored run must neither retire nor supersede; it leaves prior live links untouched (current retain-on-absence behavior preserved for that case).
- Continue appending ABSENT/NO_RECORD and change events so every transition (value → new value → absence → re-link) is auditable.
- Self-healing is preserved: a later healthy run that finds a value for a retired allele creates a fresh live link via the existing ValidTime flow.
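As a minimal sketch of the lifecycle above, the decision for a single allele could be expressed as one pure function. The names `Transition`, `LiveLink`, and `decide_transition` are hypothetical and do not come from the existing linker. The issue does not say whether first-time link creation should also be gated, so this sketch leaves it ungated as an assumption. Event appending is omitted; it would happen in every branch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Transition(Enum):
    NONE = "none"            # leave the live link untouched
    CREATE = "create"        # no live link and a value was found (includes re-link)
    SUPERSEDE = "supersede"  # value -> new value
    RETIRE = "retire"        # value -> nothing


@dataclass(frozen=True)
class LiveLink:
    """Hypothetical stand-in for a live annotation link."""
    value: str
    source_version: str


def decide_transition(
    live: Optional[LiveLink],
    result: Optional[str],
    run_healthy: bool,
) -> Transition:
    """Pick the transition for one allele given this run's result."""
    if live is None:
        # Assumption: creation is not gated on run health in this sketch.
        return Transition.CREATE if result is not None else Transition.NONE
    if result == live.value:
        # Same value as before: nothing to mutate.
        return Transition.NONE
    if not run_healthy:
        # Degraded, partial, or errored run: neither retire nor supersede.
        return Transition.NONE
    return Transition.RETIRE if result is None else Transition.SUPERSEDE
```

Keeping the gate in one place means supersession and retirement cannot drift apart: both mutating outcomes are reachable only after the `run_healthy` check.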
Open product decision (resolve as part of this work)
Decide how a retired link surfaces to consumers, applied consistently across VEP, gnomAD, and ClinVar:
- Option A: Show nothing (no current annotation).
- Option B: Fall back to the most-recent non-absent versioned value (e.g. "last known: Ensembl 110").
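To make the two options concrete, here is a rough sketch of a read path that supports either one behind a single switch. The `Link` class, the `current_annotation` function, and the exact "last known" string format are illustrative assumptions, not the existing read path.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Link:
    """Hypothetical annotation link; valid_to is None while the link is live."""
    value: str
    source_version: str
    valid_from: datetime
    valid_to: Optional[datetime]


def current_annotation(
    links: Sequence[Link],
    fall_back_to_last_known: bool,
) -> Optional[str]:
    """Return what a consumer would see for one allele and one source."""
    live = [link for link in links if link.valid_to is None]
    if live:
        return live[0].value
    if not fall_back_to_last_known or not links:
        # Option A: a retired link surfaces as no current annotation.
        return None
    # Option B: surface the most recent retired value, marked as last known.
    last = max(links, key=lambda link: link.valid_from)
    return f"last known: {last.value} ({last.source_version})"
```

Whichever option is chosen, routing VEP, gnomAD, and ClinVar through one function like this is one way to keep the semantics consistent across sources.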
Acceptance criteria
- A healthy, complete run that returns no result for an allele with a live link retires that link and records the ABSENT/NO_RECORD event.
- A healthy, complete run that returns a changed value supersedes the prior link with no gap between valid_to and the new valid_from, and records a change event.
- A degraded or partial run that returns no result, or a changed value, does not retire or supersede; prior live links are retained.
- After retirement, the read path surfaces the chosen product semantics (nothing, or last-known versioned value) consistently for VEP, gnomAD, and ClinVar.
- A later healthy run that finds a value for a previously retired allele produces a new live link, restoring current state.
- The full sequence (value → new value → absence → retirement → re-link) is reconstructable from the event log.
- Tests cover: healthy-run supersession; healthy-run retirement on absence; degraded/partial-run leaves links untouched for both transitions; re-link after retirement restores current state.
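The no-gap criterion above can be illustrated with a small sketch in which both sides of a supersession use the same timestamp. The `Link`, `supersede`, and `retire` names here are hypothetical and stand in for the real ValidTime mixin and supersede_with, whose actual signatures are not shown in this issue.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Link:
    """Hypothetical link with a validity interval; valid_to None means live."""
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None


def supersede(prior: Link, new_value: str, at: datetime) -> Tuple[Link, Link]:
    """Close the prior link and open the new one at the same instant."""
    closed = replace(prior, valid_to=at)
    opened = Link(value=new_value, valid_from=at)
    return closed, opened


def retire(prior: Link, at: datetime) -> Link:
    """Close the prior link without opening a replacement."""
    return replace(prior, valid_to=at)
```

A test for the supersession criterion can then assert that the closed link's valid_to equals the new link's valid_from, and a retirement test can assert that no live link remains.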
Implementation notes
- The force flag must remain a skip-bypass / freshness knob only. Do not gate either transition on force; a forced run re-queries everything and is exactly when a source outage would cause maximal spurious retirement or supersession.
- Retire-on-absence is a value-replaced-by-nothing path with no current trigger. Add the hook where the per-allele "no result this run" branch presently records UNCHANGED and continues (the VEP linking logic and its gnomAD/ClinVar equivalents).
- Supersession largely exists already; the work there is consolidation — route it through the shared lifecycle and run-health gate rather than rebuilding it.
- A run-health / completeness signal is the gate for both transitions. If one does not exist at the job level, define it (e.g. run completed without source errors and processed its full allele set) before wiring either mutating path.
- Annotation links use the ValidTime mixin; both transitions should close valid_to consistent with how supersede_with closes superseded records.
- Longer term, deriving "current" from newest-event-per-subject would let ABSENT-as-newest suppress stale values non-destructively and reduce reliance on retirement; out of scope here but noted as the eventual direction.
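If no job-level health signal exists yet, the run-health note above could start from something as small as the following sketch. `RunStats` and `run_is_healthy` are hypothetical names, and the two conditions are taken directly from the example in the note (no source errors, full allele set processed). The `forced` field is included only to show that it is deliberately not consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Hypothetical per-run counters collected by the annotation job."""
    expected_alleles: int
    processed_alleles: int
    source_errors: int
    forced: bool = False  # skip-bypass / freshness knob; never part of the gate


def run_is_healthy(stats: RunStats) -> bool:
    """True only when the run had no source errors and covered every allele."""
    return (
        stats.source_errors == 0
        and stats.processed_alleles == stats.expected_alleles
    )
```

For example, `run_is_healthy(RunStats(10, 9, 0))` is false because one allele was not processed, and a forced run with a source error is equally unhealthy, so neither retirement nor supersession would proceed.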