Add fct_tides_vehicle_locations dbt model (closes #4837) #5216
chrisyamas wants to merge 9 commits into main
Conversation
Warehouse report: Failed to add ci-report to a comment. Review the ci-report in the Summary.

Impacted Exposures: No exposures are impacted by the changes in this PR.
Terraform plan in iac/cal-itp-data-infra-staging/airflow/us (Plan: 3 to add, 4 to change, 0 to destroy)

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-staging-composer-catalog will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-catalog" {
!~ content = (sensitive value)
!~ crc32c = "7vbSEg==" -> (known after apply)
!~ detect_md5hash = "gzQlzyAjYlTGiWPOSPmt/Q==" -> "different hash"
!~ generation = 1777921775322636 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/catalog.json"
!~ md5hash = "gzQlzyAjYlTGiWPOSPmt/Q==" -> (known after apply)
name = "data/warehouse/target/catalog.json"
# (16 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["dbt_project.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~ crc32c = "cIuoNQ==" -> (known after apply)
!~ detect_md5hash = "bsZgcfmK985tISFYJCt+qg==" -> "different hash"
!~ generation = 1777669801966208 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/dbt_project.yml"
!~ md5hash = "bsZgcfmK985tISFYJCt+qg==" -> (known after apply)
name = "data/warehouse/dbt_project.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["models/mart/tides/_mart_tides.yml"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/_mart_tides.yml"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/_mart_tides.yml"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-dags["models/mart/tides/fct_tides_vehicle_locations.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-dags["seeds/_seeds.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~ crc32c = "7/62ZA==" -> (known after apply)
!~ detect_md5hash = "auu3vnNdExPQiA88ThI9DA==" -> "different hash"
!~ generation = 1776453636837026 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/seeds/_seeds.yml"
!~ md5hash = "auu3vnNdExPQiA88ThI9DA==" -> (known after apply)
name = "data/warehouse/seeds/_seeds.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["seeds/tides_publication_keys.csv"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/seeds/tides_publication_keys.csv"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/seeds/tides_publication_keys.csv"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-manifest will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-manifest" {
!~ content = (sensitive value)
!~ crc32c = "ruSOBg==" -> (known after apply)
!~ detect_md5hash = "Mw4Cul2QM1zWeUWwGhMlmw==" -> "different hash"
!~ generation = 1777921776550660 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/manifest.json"
!~ md5hash = "Mw4Cul2QM1zWeUWwGhMlmw==" -> (known after apply)
name = "data/warehouse/target/manifest.json"
# (16 unchanged attributes hidden)
}
Plan: 3 to add, 4 to change, 0 to destroy.

📝 Plan generated in Deploy dbt #1822
Terraform plan in iac/cal-itp-data-infra/airflow/us (Plan: 3 to add, 2 to change, 0 to destroy)

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-composer-dags["dbt_project.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "cIuoNQ==" -> (known after apply)
!~ detect_md5hash = "bsZgcfmK985tISFYJCt+qg==" -> "different hash"
!~ generation = 1777669782489514 -> (known after apply)
id = "calitp-composer-data/warehouse/dbt_project.yml"
!~ md5hash = "bsZgcfmK985tISFYJCt+qg==" -> (known after apply)
name = "data/warehouse/dbt_project.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/tides/_mart_tides.yml"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/_mart_tides.yml"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/_mart_tides.yml"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/tides/fct_tides_vehicle_locations.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["seeds/_seeds.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "7/62ZA==" -> (known after apply)
!~ detect_md5hash = "auu3vnNdExPQiA88ThI9DA==" -> "different hash"
!~ generation = 1776457910260376 -> (known after apply)
id = "calitp-composer-data/warehouse/seeds/_seeds.yml"
!~ md5hash = "auu3vnNdExPQiA88ThI9DA==" -> (known after apply)
name = "data/warehouse/seeds/_seeds.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["seeds/tides_publication_keys.csv"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/seeds/tides_publication_keys.csv"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/seeds/tides_publication_keys.csv"
+ storage_class = (known after apply)
}
Plan: 3 to add, 2 to change, 0 to destroy.

📝 Plan generated in Deploy dbt #1822
> regional-subfeed fixed-route feeds. Closes #4837.

Remove "Closes #4837" from the model text.
@vevetron thanks, addressing your three comments above and the two follow-ups from the call (HAVING clauses on Slack, the in-code TIDES issue reference).
lauriemerrell left a comment:

A couple comments, think broad strokes look ok.
> DATETIME(vp.location_timestamp, vp.schedule_feed_timezone) AS event_timestamp,
> vp.trip_id AS trip_id_performed,
> -- trip_id_scheduled left NULL for MVP; deriving requires a reliable
not sure if this is true -- per GTFS spec, trip_id in VP should reference schedule unless the schedule_relationship is one of a few specific values.... so maybe this should be based on that?
also, this is in the intent of the trip_instance_key identifier (to allow joins of a specific trip across feed types), so can use that for lookup if desired
good catch, agreed. per the GTFS-RT spec the VP trip_id does reference the schedule whenever trip.schedule_relationship is one of the in-schedule values, so deriving trip_id_scheduled from that conditional is the right shape. trip_instance_key is also viable as the join key.
leaving this PR's trip_id_scheduled as NULL for MVP and filing a follow-up to wire up the conditional + the join. happy to bump that into scope here if you'd prefer it land together.
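For concreteness, one possible shape for that follow-up conditional. This is not code in this PR; the `schedule_relationship` column name and the exact set of in-schedule enum values are assumptions used only to illustrate the idea:

```sql
-- Hypothetical follow-up sketch: the VP trip_id references the static
-- schedule only for in-schedule relationship values, so gate on that.
SELECT
    vp.trip_id AS trip_id_performed,
    CASE
        WHEN vp.schedule_relationship IN ('SCHEDULED', 'CANCELED')
            THEN vp.trip_id  -- references the schedule per GTFS-RT spec
        ELSE NULL            -- ADDED / UNSCHEDULED / etc. do not
    END AS trip_id_scheduled
FROM {{ ref('fct_vehicle_locations') }} AS vp
```

Joining via `trip_instance_key` instead would achieve the same lookup through the cross-feed identifier mentioned above.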
> USING (gtfs_dataset_key)
> ),
> -- TIDES requires location_ping_id strictly unique; the upstream key is
noting that base64_url is part of key so if two results have same value they need to have same URL... so that won't actually be a substantive tie break
right, both ordering columns were degenerate at this grain since location_timestamp and base64_url are components of the upstream key. fixed in a fixup: pulled the dedup up to the source CTE and ordered by _extract_ts DESC (most-recently-extracted wins), which differs across the duplicates. trailing deduped CTE collapsed into the source. verified 0 dups on a sampled service_date in staging.
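A minimal sketch of the revised dedup described here, with the column names (`key`, `_extract_ts`) taken from the surrounding discussion; treat it as illustrative rather than the exact diff:

```sql
-- Dedup moved into the source CTE: keep the most-recently-extracted row
-- per upstream key, since components of the key can't break their own ties.
WITH source AS (
    SELECT *
    FROM {{ ref('fct_vehicle_locations') }}
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY key           -- the "almost unique" upstream key
        ORDER BY _extract_ts DESC  -- most-recently-extracted wins
    ) = 1
)
SELECT * FROM source
```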
> (https://tides-transit.org/main/). Sourced from `fct_vehicle_locations`
> and filtered via `dim_provider_gtfs_data` to public, customer-facing or
> regional-subfeed fixed-route feeds.
> meta:
don't think we use publish.product anywhere else, what is intent for that as meta? can/should we be defining a dbt exposure? that is our standard for published items. see for example https://github.com/cal-itp/data-infra/blob/main/warehouse/models/mart/gtfs_schedule_latest/_gtfs_schedule_latest.yml#L1270-L1314 and https://github.com/cal-itp/data-infra/blob/main/airflow/dags/publish_gtfs.py and https://github.com/cal-itp/data-infra/blob/main/airflow/plugins/operators/dbt_manifest_to_metadata_operator.py#L88-L90 for the GTFS --> CKAN publish flow
good call, exposure is the right shape. dropping the publish.product meta in this PR. adding a single california_tides exposure as part of PR 5220 (so both fct_tides_vehicle_locations and fct_tides_trips_performed ref()s resolve in the same checkout), modeled on the GTFS california_open_data block. owner / methodology fields filled in; meta.destinations now filled in via PR 5229 (publishing pipeline).
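For reference, a hedged sketch of what a `california_tides` exposure could look like, modeled loosely on the GTFS `california_open_data` block linked above; the maturity, description, and owner fields are placeholders, not the actual PR 5220 content:

```yaml
exposures:
  - name: california_tides
    type: application
    maturity: medium  # placeholder
    description: TIDES-conformant tables published from the Cal-ITP warehouse.
    depends_on:
      - ref('fct_tides_vehicle_locations')
      - ref('fct_tides_trips_performed')
    owner:
      name: Cal-ITP  # placeholder
```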
> FROM {{ ref('fct_vehicle_locations') }}
> ),
> -- dim_provider_gtfs_data fans out: a single vehicle_positions feed can be
I am a little confused by the logic here -- my instinct would be to either group by VP URL where it meets these criteria (public facing etc.) and array_agg the organization info so that all orgs can be used later in the publish process OR just select distinct on the non-org columns and ignore the organization parts.
Basically, not sure how ending up with a VP feed tagged with one of its organizations meets future needs -- if we need orgs, we should keep all of them and handle unnesting or whatever in the publish process. If we don't need all the orgs then let's just drop and publish under the VP URL and publish org related metadata separately.
yeah agreed the fan-out collapse is doing more than it should. going with your second framing: dropping organization_name / organization_ntd_id from both fact tables since orgs aren't part of the TIDES spec anyway.
the public_subfeed_agencies CTE shrinks to a public_subfeed_keys CTE that's just SELECT DISTINCT on dataset keys matching the public-customer-facing-or-regional-subfeed criterion, no QUALIFY needed. govcbus / SD MTS multi-org reality moves to publish-side metadata in PR 5229, separate from the TIDES tables themselves.
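A sketch of the slimmed-down CTE as described; the boolean flag name is an assumption standing in for the public/customer-facing/regional-subfeed criterion:

```sql
-- Replaces the org-collapsing CTE: distinct dataset keys only, no QUALIFY.
public_subfeed_keys AS (
    SELECT DISTINCT vehicle_positions_gtfs_dataset_key
    FROM {{ ref('dim_provider_gtfs_data') }}
    WHERE is_public_customer_facing_or_regional_subfeed  -- assumed flag name
)
```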
We were working with the City of Hermosa Beach, which is interested in studying on-time performance. I believe the thought was to narrow the initial output to just the agencies that traverse Hermosa Beach for development purposes only, to check on costs and implementation before expanding statewide.

Yeah, it is what I remember too. :)
thanks both, the scope has been narrowed! went ahead and implemented it as MVP this morning rather than wait for our sync, since the seed-based mechanism is small enough that landing it gives us something concrete to react to. scope is now three feeds via a new `tides_publication_keys` seed; the filter is an INNER JOIN on the seed inside both models. the PR description on this one is updated to match. PR 5220 (trips_performed) inherits the same seed; PRs 5229 and 5230 (the publishing pipeline + staging bucket, drafts up now) inherit the narrowing automatically since they consume the views. one open question worth flagging for our sync today is whether it would be good to formalize a
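The seed-based filter described here might look roughly like this; the seed name comes from the Terraform plan above and the join column from the quoted diff, but the surrounding CTE name is illustrative:

```sql
-- Narrow output to the MVP feeds listed in the seed.
SELECT vp.*
FROM filtered_locations AS vp
INNER JOIN {{ ref('tides_publication_keys') }} AS pub
    USING (gtfs_dataset_key)
```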
Warehouse report 📦

Checks/potential follow-ups: Checks indicate the following action items may be necessary.

New models 🌱: calitp_warehouse.mart.tides.fct_tides_vehicle_locations
Adds mart_gtfs.fct_tides_vehicle_locations, the first TIDES-conformant model in the Cal-ITP warehouse. Reshapes fct_vehicle_locations into the TIDES vehicle_locations schema and filters to public, customer-facing or regional-subfeed fixed-route GTFS-RT feeds via dim_provider_gtfs_data. The model produces the BigQuery table only. Per-agency parquet export (#4693), the CDN-fronted public bucket (#4700), and the file validator (#4839) are tracked separately. Validated in christopher_mart_gtfs sandbox over a 24-day window (2026-03-20 to 2026-04-30): 96M rows across 100 agencies, 99.991% unique location_ping_id, zero NULL on TIDES required-not-null fields, zero violations on TIDES bounds and enum constraints.
- Drop the file-level comment header on fct_tides_vehicle_locations.sql (matches the existing fct_vehicle_locations style).
- Trim CTE-level comments to keep WHY (agency fan-out, NULL-trip caveat, defensive dedup rationale) and drop WHAT comments that just restate the SQL below.
- Specify ASC on the second ORDER BY column to satisfy sqlfluff AM03.
- Use GROUP BY 1 instead of repeating the column name (matches existing Cal-ITP usage and avoids the AM06 alias-mismatch risk).
- Yml: tighten column descriptions, reuse anchor refs (*rt_service_date, *rt_vehicle_id, *rt_vp_stop_id, *gtfs_rt_dt, *base64_url, *gtfs_dataset_key_desc) where they apply, and wrap test where clauses in config: blocks to match the existing pattern on fct_vehicle_locations.
…_LOOKBACK_DAYS

The var was renamed in main by #5178 (Laurie's incremental-vs-microbatch docs cleanup) after this branch was started. Match the new name so dbt compile passes.
…ssue refs
Per Vivek's call feedback (and Slack follow-up about HAVING clauses):
- Move the model out of mart_gtfs into its own mart/tides folder. Add a
mart_tides schema in dbt_project.yml so it materializes in
<user>_mart_tides rather than <user>_mart_gtfs. The model is a TIDES
product, not a GTFS mart, and grouping the eventual peers
(trips_performed, stop_visits) under one folder keeps the boundary clear.
- Convert from incremental microbatch to view. The downstream consumer is
the per-agency export Airflow job querying once per cycle, not
interactive analytics, so paying compute on every read is fine and we
avoid carrying a materialized 96M-row copy.
- Drop partition_by, cluster_by, full_refresh, on_schema_change, event_time,
batch_size, begin, lookback config keys that don't apply to a view.
- Refactor public_subfeed_agencies CTE from
ANY_VALUE(... HAVING MIN organization_name) GROUP BY 1 to
QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY organization_name) = 1.
Matches the team pattern used elsewhere in the warehouse (every other
"pick one canonical row per group" site uses QUALIFY ROW_NUMBER).
- Add model-level meta: { publish.product: tides } in the new yml,
parallel to the existing publish.* / ckan.* dotted-namespace meta keys
used on CKAN-published models.
- Drop "Closes #4837" from the model description (belongs in the PR body,
not the warehouse).
- Replace the inline TIDES issue #252 reference with the full GitHub URL.
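As a before/after illustration of the QUALIFY refactor in this commit (column names assumed from the surrounding discussion, not the exact diff):

```sql
-- Before: ANY_VALUE ... HAVING MIN to pick the lex-smallest org per feed
SELECT
    vehicle_positions_gtfs_dataset_key,
    ANY_VALUE(organization_name HAVING MIN organization_name) AS organization_name
FROM {{ ref('dim_provider_gtfs_data') }}
GROUP BY 1

-- After: the warehouse-standard "one canonical row per group" pattern
SELECT
    vehicle_positions_gtfs_dataset_key,
    organization_name
FROM {{ ref('dim_provider_gtfs_data') }}
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY vehicle_positions_gtfs_dataset_key
    ORDER BY organization_name
) = 1
```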
…tive

The previous QUALIFY ordered by event_timestamp DESC, base64_url ASC. Both columns are degenerate at the location_ping_id grain: location_timestamp and base64_url are components of the upstream `key`, so they're constant across rows that share a key. Move the dedup up to the source CTE and order by `_extract_ts DESC` so the most-recently-extracted row wins. The trailing `deduped` CTE collapses into the source CTE.
…lause

Adds a `mart.tides: +enabled: true` line under data_tests in dbt_project.yml, matching the existing `mart.payments` re-enable pattern. The model has six column-level tests (not_null on location_ping_id / event_timestamp / vehicle_id, accepted_values on current_status / trip_type, and unique_proportion on location_ping_id); all six pass against staging.

The two accepted_values where clauses were `__rt_sampled__ AND <col> IS NOT NULL`. The rt_sampled_where_clause macro only substitutes on an exact `__rt_sampled__` match, so the compound form was emitting the literal token to BigQuery and failing. Trimmed to bare `__rt_sampled__`, matching the convention used everywhere else in the warehouse. accepted_values silently ignores NULLs already, so the IS NOT NULL filter was redundant.
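The where-clause trim in this commit could be sketched as the following yml fragment; the enum values themselves are elided placeholders, not the model's real list:

```yaml
columns:
  - name: current_status
    data_tests:
      - accepted_values:
          values: ["<TIDES enum values elided>"]  # placeholder
          config:
            # bare token so rt_sampled_where_clause substitutes it;
            # accepted_values already ignores NULLs
            where: "__rt_sampled__"
```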
…roduct

Orgs aren't part of the TIDES spec (vehicle_locations.schema.json defines no organization fields). The agency-collapse CTE shrinks to a SELECT DISTINCT on `vehicle_positions_gtfs_dataset_key`, no QUALIFY needed. Both `organization_name` and `organization_ntd_id` come out of the model and out of _mart_tides.yml.

The publish.product meta block is gone in favor of a real dbt exposure. The exposure itself lands on PR 5220 alongside fct_tides_trips_performed so both ref()s resolve in the same checkout. Per-agency / per-org metadata for the publish flow lives separately in PR 5229 (next sprint).
Force-pushed from d90e087 to 529d4e2.


Description
Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.
Resolves #4837
Adds `mart_gtfs.fct_tides_vehicle_locations`, the first TIDES-conformant model in the warehouse. Reshapes `fct_vehicle_locations` into the TIDES `vehicle_locations` schema and filters to public, customer-facing or regional-subfeed fixed-route GTFS-RT feeds via `dim_provider_gtfs_data`. The model produces the BigQuery table only; per-agency parquet export (#4693), CDN-fronted public bucket (#4700), and file validator (#4839) are tracked separately.

A few design decisions worth flagging:
- `fct_vehicle_locations` drops NULL `trip_id` rows upstream, so deadhead and layover pings are not in the export. TIDES doesn't require `trip_id_performed`, so a future change could source from `fct_vehicle_positions_messages` to keep them.
- `dim_provider_gtfs_data` records multiple organization rows per VP feed when a feed is shared across agencies (govcbus.com is shared by 7 cities; the SD MTS feed is shared with the airport). The model collapses to one canonical org per feed (lex-smallest org name) to prevent fan-out duplication. Per-agency demuxing can happen at the export step.
- `fct_vehicle_locations.key` is documented as "almost unique" upstream. The model adds a defensive `QUALIFY ROW_NUMBER` per microbatch; residual cross-batch dups are 0.0089% (8,538 of 96M), which fits the upstream `unique_proportion at_least 0.999` threshold but isn't strict TIDES `unique: true`. Open to tightening upstream if you'd rather.
- Hermosa Beach appears in `dim_provider_gtfs_data`, but its `vehicle_positions_gtfs_dataset_key` is NULL and `customer_facing` is FALSE, so Hermosa is not in this export. Worth confirming whether Hermosa is being onboarded or whether the seed-agency framing in "Build a process to convert GTFS-RT Vehicle Positions data > TIDES Vehicle Locations using Hermosa Beach data" #4837 was meant generically.

TIDES = Transit Integrated Data Exchange Specification, https://tides-transit.org/main/.
Type of change
How has this been tested?
Include commands/logs/screenshots as relevant.
If making changes to dbt models, make sure they were created or updated on Staging. Please run the commands `uv run dbt run -s CHANGED_MODEL --target staging` and `uv run dbt test -s CHANGED_MODEL --target staging`, then include the output in this section of the PR.

Ran `uv run dbt run -s +fct_tides_vehicle_locations --target staging` and `uv run dbt test -s fct_tides_vehicle_locations --target staging`.

Materialized in `cal-itp-data-infra-staging.christopher_mart_gtfs.fct_tides_vehicle_locations`. 24-day window from 2026-03-20 to 2026-04-30:

- `location_ping_id` 99.991% unique (8,538 dups; under team `unique_proportion at_least 0.999` threshold)
- zero NULLs on `location_ping_id`, `event_timestamp`, `vehicle_id`, `trip_id_performed`
- `current_status` enum mapping correct (no raw GTFS-RT values leak through)

Top agencies by ping count:
Post-merge follow-ups
Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.
Two follow-up PRs ready locally and waiting on this one:
- `chore/tides-validation-harness`: Frictionless validator under `validation/tides/` (closes part of "Build a validator for those TIDES files" #4839)
- `feat/tides-trips-performed`: second TIDES table sourced from `fct_observed_trips`

Will open follow-up issues for the upstream `fct_vehicle_locations.key` strict-uniqueness option and for the column-level RT test binding behavior (also affects the existing `fct_vehicle_locations`).