Investigate strict uniqueness for fct_vehicle_locations.key #5221

@chrisyamas

While building fct_tides_vehicle_locations (PR #5216), I leaned on the documented "almost unique" guarantee on fct_vehicle_locations.key and added a defensive QUALIFY ROW_NUMBER per microbatch. Residual cross-batch duplicates land at 0.0089% (8,538 of 96M rows over a 24-day window).

That satisfies the upstream unique_proportion at_least 0.999 test, but it falls short of the strict unique: true that TIDES specifies for the corresponding location_ping_id field. The TIDES output therefore inherits near-uniqueness rather than guaranteed uniqueness.
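
To make the gap concrete, here is a minimal Python sketch of why a per-microbatch dedup can leave cross-batch duplicates, and how a proportion-style check can pass while a strict uniqueness check fails. The row shape, the batch labels, and the helper names are hypothetical, and unique_proportion is assumed here to mean distinct keys divided by total rows; the real test may be defined differently.

```python
# Hypothetical rows: (microbatch_id, key). Real column names may differ.
rows = [
    ("batch_1", "a"),
    ("batch_1", "a"),  # duplicate inside the same microbatch
    ("batch_1", "b"),
    ("batch_2", "a"),  # same key arriving again in a later microbatch
    ("batch_2", "c"),
]


def dedup_per_batch(rows):
    """Keep the first row per (microbatch, key).

    This mimics a ROW_NUMBER-style dedup scoped to one microbatch:
    it cannot see keys that were already written by earlier batches.
    """
    seen = set()
    kept = []
    for batch, key in rows:
        if (batch, key) not in seen:
            seen.add((batch, key))
            kept.append((batch, key))
    return kept


def unique_proportion(keys):
    """Assumed definition: distinct keys divided by total rows."""
    return len(set(keys)) / len(keys)


def is_strictly_unique(keys):
    """Strict check: every key appears exactly once."""
    return len(set(keys)) == len(keys)


kept = dedup_per_batch(rows)
keys = [key for _, key in kept]

# The in-batch duplicate is gone, but key "a" still appears in both batches.
print(len(kept), unique_proportion(keys), is_strictly_unique(keys))

# Figures quoted in this issue (96M is a rounded row count).
residual_pct = 8_538 / 96_000_000 * 100
passes_threshold = (1 - 8_538 / 96_000_000) >= 0.999
print(round(residual_pct, 4), passes_threshold)
```

Under these assumptions the toy data fails the strict check after per-batch dedup, while the quoted production figures stay comfortably above a 0.999 threshold.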

Worth a separate conversation about whether to tighten the upstream key composition (e.g., adding dt or microbatch boundary, or regenerating via farm_fingerprint over the natural-key tuple) so downstream consumers can rely on strict uniqueness without each adding their own dedup layer.
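
As a starting point for that conversation, the sketch below illustrates in Python how adding a partition field such as dt to a hashed natural-key tuple separates rows that would otherwise collide. Everything here is an assumption for illustration: the field names vehicle_id and ts, the sample values, and the use of hashlib.sha256 as a stand-in for a warehouse hash such as farm_fingerprint. It does not reflect how fct_vehicle_locations.key is actually built.

```python
import hashlib


def fingerprint(*parts):
    """Deterministic hash of a tuple of values.

    Stand-in for a warehouse function such as farm_fingerprint;
    sha256 is used only because it ships with Python.
    """
    # Join with a separator that is unlikely to occur in the values,
    # so ("ab", "c") and ("a", "bc") do not hash to the same input.
    joined = "\x1f".join(str(part) for part in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Hypothetical rows: the same observation landing in two dt partitions.
rows = [
    {"vehicle_id": "v1", "ts": "2025-01-01T23:59:58", "dt": "2025-01-01"},
    {"vehicle_id": "v1", "ts": "2025-01-01T23:59:58", "dt": "2025-01-02"},
]

# Key over the natural-key tuple only: both rows get the same key.
keys_without_dt = [fingerprint(r["vehicle_id"], r["ts"]) for r in rows]

# Key that also includes dt: the two rows get different keys.
keys_with_dt = [fingerprint(r["vehicle_id"], r["ts"], r["dt"]) for r in rows]

print(len(set(keys_without_dt)), len(set(keys_with_dt)))
```

One trade-off to weigh: widening the key this way makes the key column unique, but the two rows still describe the same observation, so consumers that care about duplicate pings would still need to decide which row to keep.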

Filing this so we can discuss separately from the TIDES PR scope.
