While building fct_tides_vehicle_locations (PR #5216), I leaned on the documented "almost unique" guarantee on fct_vehicle_locations.key and added a defensive QUALIFY ROW_NUMBER per microbatch. Residual cross-batch duplicates land at 0.0089% (8,538 of 96M rows over a 24-day window).
That rate passes the upstream unique_proportion at_least 0.999 test, but it does not satisfy the strict unique: true constraint that TIDES places on the corresponding location_ping_id field. So the TIDES output inherits the same near-uniqueness rather than guaranteed uniqueness.
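To make the gap concrete, here is a minimal Python sketch of why a per-microbatch dedup still leaves cross-batch duplicates, and why the reported rate passes a 0.999 threshold while failing strict uniqueness. The toy rows and batch names are hypothetical. It also assumes unique proportion means distinct keys divided by total rows, that the 8,538 figure counts surplus rows, and that 96M is a rounded total.

```python
def dedup_within_batch(rows):
    """Keep the first row per (batch, key), mimicking a ROW_NUMBER() = 1
    filter partitioned by key inside each microbatch."""
    seen = set()
    kept = []
    for batch, key in rows:
        if (batch, key) not in seen:
            seen.add((batch, key))
            kept.append((batch, key))
    return kept


def unique_proportion(keys):
    """Distinct keys divided by total rows (assumed definition)."""
    keys = list(keys)
    return len(set(keys)) / len(keys)


# Hypothetical toy data: (microbatch, key)
rows = [
    ("batch_1", "a"),
    ("batch_1", "a"),  # same-batch duplicate: removed by the dedup
    ("batch_1", "b"),
    ("batch_2", "b"),  # cross-batch duplicate: survives the dedup
    ("batch_2", "c"),
]

kept = dedup_within_batch(rows)
keys = [key for _, key in kept]
is_strictly_unique = len(set(keys)) == len(keys)

# Figures reported in this issue (total row count is approximate).
duplicate_rows = 8_538
total_rows = 96_000_000
duplicate_rate = duplicate_rows / total_rows
reported_proportion = (total_rows - duplicate_rows) / total_rows

print(f"toy unique proportion: {unique_proportion(keys):.2f}")
print(f"toy strictly unique:   {is_strictly_unique}")
print(f"reported duplicate rate: {duplicate_rate:.4%}")
print(f"passes at_least 0.999:   {reported_proportion >= 0.999}")
print(f"passes unique: true:     {duplicate_rows == 0}")
```

Under these assumptions the reported table clears the proportion test comfortably and still fails a strict uniqueness check, which is the mismatch described above.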
It is worth a separate conversation about whether to tighten the upstream key composition (e.g., adding dt or the microbatch boundary to the key, or regenerating it via farm_fingerprint over the natural-key tuple) so downstream consumers can rely on strict uniqueness without each adding their own dedup layer.
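As a rough illustration of the key-composition idea, the sketch below hashes a tuple of fields into a surrogate key and shows how adding dt separates two rows that would otherwise collide. Everything here is an assumption for discussion: the field names (vehicle_id, location_timestamp) are hypothetical rather than the actual natural key, and SHA-256 from Python's hashlib stands in for farm_fingerprint.

```python
import hashlib


def surrogate_key(*parts):
    """Hash an ordered tuple of fields into a hex digest.
    A unit-separator delimiter keeps ("ab", "c") distinct from ("a", "bc")."""
    joined = "\x1f".join(str(part) for part in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical rows: the same ping observed under two dt partitions.
row_a = {
    "dt": "2024-01-01",
    "vehicle_id": "bus-12",
    "location_timestamp": "2024-01-01T23:59:58Z",
}
row_b = {
    "dt": "2024-01-02",
    "vehicle_id": "bus-12",
    "location_timestamp": "2024-01-01T23:59:58Z",
}

# Key built without dt: both rows collide.
key_without_dt_a = surrogate_key(row_a["vehicle_id"], row_a["location_timestamp"])
key_without_dt_b = surrogate_key(row_b["vehicle_id"], row_b["location_timestamp"])

# Key built with dt: the rows get distinct keys.
key_with_dt_a = surrogate_key(row_a["dt"], row_a["vehicle_id"], row_a["location_timestamp"])
key_with_dt_b = surrogate_key(row_b["dt"], row_b["vehicle_id"], row_b["location_timestamp"])

print("collide without dt:", key_without_dt_a == key_without_dt_b)
print("collide with dt:   ", key_with_dt_a == key_with_dt_b)
```

Note the trade-off this exposes: widening the key makes it strictly unique, but both copies of the ping remain in the table, so the discussion should also cover whether the duplicates themselves should be dropped upstream.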
Filing this so we can discuss separately from the TIDES PR scope.