Unify annotation link lifecycle: gated supersession and retire-on-absence #780

Description

@bencap

Summary

Annotation runs (VEP, gnomAD, ClinVar) update live annotation links via two distinct transitions: value-replaces-value (a source returns a changed value, superseding the prior link) and value-replaced-by-nothing (a source returns no result for an allele that previously had a value). The first is handled today: the linker supersedes only on an actual value change. The second has no path at all: the prior live link is retained, an ABSENT/NO_RECORD event is appended, and a warning is logged. Because "current" state is read from the live link rather than derived from the event log, a genuine upstream retraction continues to surface the stale (but versioned) value, with no signal that it has been withdrawn.

This issue tracks treating both transitions as one annotation link lifecycle: adding the missing guarded retire-on-absence path, and bringing the existing supersession path under the same model and run-health gating so the two are consistent.

Problem

  • Retire-on-absence (missing): When a run produces no result for an allele that currently has a live link, the linker keeps the prior link (e.g. VEP's "do not overwrite a held consequence with a null result" branch records UNCHANGED and continues). The recorded ABSENT event is effectively audit-only, so retracted values continue to be surfaced as current, pinned to an older source_version.
  • Supersession (works, but ad hoc): Value-replaces-value is handled by the linker writing a new link only when the value changes. This appears correct today, but it is not gated on run health and is not expressed as part of a shared lifecycle, so the two transitions are reasoned about and implemented separately.
  • Mapping-level retirement (different layer): The only formalized retirement path is supersede_with, with __retire_cascade__ closing dependent allele links. It fires when a new mapping supersedes an old mapping, which is a different layer from the per-source annotation link transitions above.

Proposed behavior

Model annotation link transitions as a single lifecycle with a shared run-health gate:

  • Supersession (value → new value): Keep the current behavior of superseding only on an actual value change. Apply the same run-health gate so a degraded or partial run cannot supersede a good value with a wrong or incomplete one. Verify that the prior link's valid_to is closed with no gap before the new link's valid_from.
  • Retirement (value → nothing): When a run produces no result for an allele with a live link, retire (close valid_to on) that live link so current state reflects the absence.
  • Run-health gate (applies to both): Mutating transitions occur only when the run was healthy and complete. A degraded, partial, or errored run must neither retire nor supersede; it leaves prior live links untouched (current retain-on-absence behavior preserved for that case).
  • Continue appending ABSENT/NO_RECORD and change events so every transition (value → new value → absence → re-link) is auditable.
  • Self-healing is preserved: a later healthy run that finds a value for a retired allele creates a fresh live link via the existing ValidTime flow.
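
The lifecycle above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the project's linker: `Transition`, `Link`, `decide`, and `apply_transition` are hypothetical names, `run_healthy` stands in for whatever run-health signal the job exposes, and the sketch assumes first-time link creation is not gated (the issue only specifies the gate for retirement and supersession).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Transition(Enum):
    NOOP = "noop"            # nothing to change, or the run is not trusted
    CREATE = "create"        # no live link, the run found a value
    SUPERSEDE = "supersede"  # live link, the run found a different value
    RETIRE = "retire"        # live link, a healthy run found nothing


@dataclass
class Link:
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the link is live


def decide(live: Optional[Link], result: Optional[str], run_healthy: bool) -> Transition:
    """Pick the transition for one allele and one source (hypothetical helper)."""
    if live is None:
        # Assumption: creating a first link is not gated on run health.
        return Transition.CREATE if result is not None else Transition.NOOP
    if result == live.value:
        return Transition.NOOP  # unchanged value, nothing to do
    if not run_healthy:
        return Transition.NOOP  # degraded/partial run: leave the live link alone
    return Transition.RETIRE if result is None else Transition.SUPERSEDE


def apply_transition(
    live: Optional[Link], result: Optional[str], run_healthy: bool, now: datetime
) -> Optional[Link]:
    """Apply the transition and return the link that is live afterwards."""
    transition = decide(live, result, run_healthy)
    if transition is Transition.NOOP:
        return live
    if live is not None:
        live.valid_to = now  # RETIRE and SUPERSEDE both close the prior link
    if transition is Transition.RETIRE:
        return None
    # CREATE or SUPERSEDE: the new link starts exactly where the old one ended.
    return Link(value=result, valid_from=now)


if __name__ == "__main__":
    t0 = datetime(2023, 1, 1, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    link = Link("missense_variant", t0)
    print(decide(link, None, run_healthy=False))  # Transition.NOOP
    print(decide(link, None, run_healthy=True))   # Transition.RETIRE
    print(apply_transition(link, "stop_gained", True, t1))
```

Keeping the gate inside one function means supersession and retirement cannot drift apart: both pass through the same `run_healthy` check, and both close `valid_to` at the same timestamp used for any new `valid_from`.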

Open product decision (resolve as part of this work)

Decide how a retired link surfaces to consumers, applied consistently across VEP, gnomAD, and ClinVar:

  • Option A: Show nothing (no current annotation).
  • Option B: Fall back to the most-recent non-absent versioned value (e.g. "last known: Ensembl 110").
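
To make the two options concrete, the read path could be sketched as one function with a switch. This is only an illustration of the decision, assuming a per-allele, per-source link history ordered oldest to newest; `LinkRecord`, `current_annotation`, and `fallback_to_last_known` are hypothetical names, not existing code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LinkRecord:
    value: str
    source_version: str
    live: bool  # True while valid_to is still open


def current_annotation(
    history: Sequence[LinkRecord], fallback_to_last_known: bool
) -> Optional[Tuple[str, str]]:
    """Return (value, provenance label), or None when nothing should be shown.

    `history` holds the links for one allele and one source, oldest first.
    """
    if not history:
        return None
    newest = history[-1]
    if newest.live:
        return newest.value, f"current: {newest.source_version}"
    if fallback_to_last_known:
        # Option B: surface the retired value, clearly marked as not current.
        return newest.value, f"last known: {newest.source_version}"
    # Option A: a retired link means no current annotation.
    return None


if __name__ == "__main__":
    retired = [LinkRecord("missense_variant", "Ensembl 110", live=False)]
    print(current_annotation(retired, fallback_to_last_known=False))
    print(current_annotation(retired, fallback_to_last_known=True))
```

Whichever option is chosen, routing VEP, gnomAD, and ClinVar through one such function is a simple way to keep the semantics consistent across sources.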

Acceptance criteria

  • A healthy, complete run that returns no result for an allele with a live link retires that link and records the ABSENT/NO_RECORD event.
  • A healthy, complete run that returns a changed value supersedes the prior link with no gap between valid_to and the new valid_from, and records a change event.
  • A degraded, partial, or errored run that returns no result or a changed value neither retires nor supersedes; prior live links are retained.
  • After retirement, the read path surfaces the chosen product semantics (nothing, or last-known versioned value) consistently for VEP, gnomAD, and ClinVar.
  • A later healthy run that finds a value for a previously retired allele produces a new live link, restoring current state.
  • The full sequence (value → new value → absence → retirement → re-link) is reconstructable from the event log.
  • Tests cover: healthy-run supersession; healthy-run retirement on absence; a degraded or partial run leaving links untouched for both transitions; re-link after retirement restoring current state.

Implementation notes

  • The force flag must remain a skip-bypass / freshness knob only. Do not gate either transition on force; a forced run re-queries everything and is exactly when a source outage would cause maximal spurious retirement or supersession.
  • Retire-on-absence is a value-replaced-by-nothing path with no current trigger. Add the hook where the per-allele "no result this run" branch presently records UNCHANGED and continues (the VEP linking logic and its gnomAD/ClinVar equivalents).
  • Supersession largely exists already; the work there is consolidation. Route it through the shared lifecycle and run-health gate rather than rebuilding it.
  • A run-health / completeness signal is the gate for both transitions. If one does not exist at the job level, define it (e.g. run completed without source errors and processed its full allele set) before wiring either mutating path.
  • Annotation links use the ValidTime mixin; both transitions should close valid_to consistent with how supersede_with closes superseded records.
  • Longer term, deriving "current" from newest-event-per-subject would let ABSENT-as-newest suppress stale values non-destructively and reduce reliance on retirement; out of scope here but noted as the eventual direction.
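
If no job-level run-health signal exists yet, one possible shape is a small summary object computed at the end of the run. The sketch below follows the example criteria given above (no source errors, full allele set processed); `RunSummary` and its field names are hypothetical, and the real job may need additional conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    expected_alleles: int   # alleles the job intended to query
    processed_alleles: int  # alleles the source answered for, "no record" included
    source_errors: int      # timeouts, error responses, parse failures, etc.
    completed: bool         # the job reached its normal end

    @property
    def is_healthy(self) -> bool:
        """True only when the run is complete and error-free.

        Deliberately has no `force` input: force stays a freshness knob
        and never influences whether mutating transitions are allowed.
        """
        return (
            self.completed
            and self.source_errors == 0
            and self.processed_alleles == self.expected_alleles
        )


if __name__ == "__main__":
    print(RunSummary(100, 100, 0, True).is_healthy)  # True
    print(RunSummary(100, 97, 0, True).is_healthy)   # False: partial run
    print(RunSummary(100, 100, 2, True).is_healthy)  # False: source errors
```

A single boolean keeps the gate easy to reason about; if finer distinctions turn out to matter (for example, tolerating a small error rate), they can be folded into `is_healthy` without touching the two mutating paths.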

Metadata

Labels

app: backend (Task implementation touches the backend)
type: enhancement (Enhancement to an existing feature)
