Add fct_tides_trips_performed dbt model #5220
chrisyamas wants to merge 4 commits into feat/tides-vehicle-locations
Warehouse report: Failed to add ci-report to a comment. Review the ci-report in the Summary.
Impacted Exposures: No exposures are impacted by the changes in this PR.
Terraform plan in iac/cal-itp-data-infra/airflow/us
Plan: 4 to add, 2 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-composer-dags["dbt_project.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "cIuoNQ==" -> (known after apply)
!~ detect_md5hash = "bsZgcfmK985tISFYJCt+qg==" -> "different hash"
!~ generation = 1777669782489514 -> (known after apply)
id = "calitp-composer-data/warehouse/dbt_project.yml"
!~ md5hash = "bsZgcfmK985tISFYJCt+qg==" -> (known after apply)
name = "data/warehouse/dbt_project.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/tides/_mart_tides.yml"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/_mart_tides.yml"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/_mart_tides.yml"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/tides/fct_tides_trips_performed.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_trips_performed.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_trips_performed.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["models/mart/tides/fct_tides_vehicle_locations.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-composer-dags["seeds/_seeds.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer-dags" {
!~ crc32c = "7/62ZA==" -> (known after apply)
!~ detect_md5hash = "auu3vnNdExPQiA88ThI9DA==" -> "different hash"
!~ generation = 1776457910260376 -> (known after apply)
id = "calitp-composer-data/warehouse/seeds/_seeds.yml"
!~ md5hash = "auu3vnNdExPQiA88ThI9DA==" -> (known after apply)
name = "data/warehouse/seeds/_seeds.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-composer-dags["seeds/tides_publication_keys.csv"] will be created
+ resource "google_storage_bucket_object" "calitp-composer-dags" {
+ bucket = "calitp-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/seeds/tides_publication_keys.csv"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/seeds/tides_publication_keys.csv"
+ storage_class = (known after apply)
}
Plan: 4 to add, 2 to change, 0 to destroy.
📝 Plan generated in Deploy dbt #1823
Terraform plan in iac/cal-itp-data-infra-staging/airflow/us
Plan: 4 to add, 4 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-staging-composer-catalog will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-catalog" {
!~ content = (sensitive value)
!~ crc32c = "7vbSEg==" -> (known after apply)
!~ detect_md5hash = "gzQlzyAjYlTGiWPOSPmt/Q==" -> "different hash"
!~ generation = 1777921775322636 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/catalog.json"
!~ md5hash = "gzQlzyAjYlTGiWPOSPmt/Q==" -> (known after apply)
name = "data/warehouse/target/catalog.json"
# (16 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["dbt_project.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~ crc32c = "cIuoNQ==" -> (known after apply)
!~ detect_md5hash = "bsZgcfmK985tISFYJCt+qg==" -> "different hash"
!~ generation = 1777669801966208 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/dbt_project.yml"
!~ md5hash = "bsZgcfmK985tISFYJCt+qg==" -> (known after apply)
name = "data/warehouse/dbt_project.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["models/mart/tides/_mart_tides.yml"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/_mart_tides.yml"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/_mart_tides.yml"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-dags["models/mart/tides/fct_tides_trips_performed.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_trips_performed.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_trips_performed.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-dags["models/mart/tides/fct_tides_vehicle_locations.sql"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/models/mart/tides/fct_tides_vehicle_locations.sql"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-dags["seeds/_seeds.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~ crc32c = "7/62ZA==" -> (known after apply)
!~ detect_md5hash = "auu3vnNdExPQiA88ThI9DA==" -> "different hash"
!~ generation = 1776453636837026 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/seeds/_seeds.yml"
!~ md5hash = "auu3vnNdExPQiA88ThI9DA==" -> (known after apply)
name = "data/warehouse/seeds/_seeds.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-dags["seeds/tides_publication_keys.csv"] will be created
+ resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+ bucket = "calitp-staging-composer"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ generation = (known after apply)
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ md5hexhash = (known after apply)
+ media_link = (known after apply)
+ name = "data/warehouse/seeds/tides_publication_keys.csv"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ source = "../../../../warehouse/seeds/tides_publication_keys.csv"
+ storage_class = (known after apply)
}
# google_storage_bucket_object.calitp-staging-composer-manifest will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-manifest" {
!~ content = (sensitive value)
!~ crc32c = "ruSOBg==" -> (known after apply)
!~ detect_md5hash = "Mw4Cul2QM1zWeUWwGhMlmw==" -> "different hash"
!~ generation = 1777921776550660 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/manifest.json"
!~ md5hash = "Mw4Cul2QM1zWeUWwGhMlmw==" -> (known after apply)
name = "data/warehouse/target/manifest.json"
# (16 unchanged attributes hidden)
}
Plan: 4 to add, 4 to change, 0 to destroy.
📝 Plan generated in Deploy dbt #1823
GROUP BY 1, 2
),
-- Same shared-feed agency collapse as fct_tides_vehicle_locations.
same comment as there re: org logic
addressed alongside L4 on PR 5216: dropping organization_name / organization_ntd_id from this table too. the public_subfeed_agencies CTE shrinks to a public_subfeed_keys SELECT DISTINCT, no QUALIFY. orgs aren't part of the TIDES spec; metadata moves to publish-side in PR 5229.
),
-- TIDES requires (service_date, trip_id_performed) unique. fct_observed_trips
-- can have multiple rows per PK when the same trip appears in multiple feeds;
This should only dedupe within a given feed (VP URL or VP GTFS dataset), don't dedupe across feeds. There is no reason to expect that trip ID performed should be unique across feeds and dropping like this will result in trips basically arbitrarily missing from some feeds just because they used the same ID as another feed. TIDES isn't really designed for this cross-agency use-case (all IDs within TIDES should only be assumed to be unique within a given feed, not across feeds/agencies) and honestly maybe this is something that needs to get surfaced at the TIDES Spec level -- should there be an agency or feed identifier in this table for cross-agency use cases?
right, the partition was too coarse. fixed: added vehicle_positions_gtfs_dataset_key to the partition so dedup happens within feed, not across feeds. trips that share trip_id_performed across different feeds now both survive. base64_url tie-breaker dropped from the ORDER BY (redundant once partitioned by feed).
re-materialized with the fix on an 8-day window (2026-04-23 to 2026-04-30): 590,253 rows after, 562,378 distinct (service_date, trip_id_performed) pairs under the old grain. ~27,875 trips that were getting collapsed across feeds now survive as their own rows.
agreed on your TIDES-spec point: cross-feed uniqueness is not something the spec guarantees, so the assumption "trip_id_performed is unique within feed + service_date only" is the right framing. worth a thread in TIDES-community channels separately; happy to take that on.
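For readers following the grain change, here is a minimal Python sketch of the idea, not the model's actual SQL. It assumes each row carries `service_date`, `trip_id_performed`, and `vehicle_positions_gtfs_dataset_key` (names taken from this thread); the function name `dedupe_within_feed` and the toy rows are hypothetical, and "first row wins" stands in for whatever ORDER BY the real model uses.

```python
from typing import Iterable


def dedupe_within_feed(rows: Iterable[dict]) -> list[dict]:
    """Keep one row per (service_date, trip_id_performed, feed key)."""
    kept: dict[tuple, dict] = {}
    for row in rows:
        key = (
            row["service_date"],
            row["trip_id_performed"],
            # Including the feed key means identical trip IDs in
            # different feeds are never collapsed into one row.
            row["vehicle_positions_gtfs_dataset_key"],
        )
        # Assumption: first row wins; the real tie-breaker is not shown here.
        kept.setdefault(key, row)
    return list(kept.values())


# Toy data (illustrative only): the same trip ID twice in feed_a, once in feed_b.
rows = [
    {"service_date": "2026-04-23", "trip_id_performed": "t1",
     "vehicle_positions_gtfs_dataset_key": "feed_a"},
    {"service_date": "2026-04-23", "trip_id_performed": "t1",
     "vehicle_positions_gtfs_dataset_key": "feed_a"},
    {"service_date": "2026-04-23", "trip_id_performed": "t1",
     "vehicle_positions_gtfs_dataset_key": "feed_b"},
]
deduped = dedupe_within_feed(rows)
```

With the feed key in the partition, the duplicate inside feed_a is dropped while the feed_b trip survives, which is the behavior the reviewer asked for.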
route metadata and to `fct_tides_vehicle_locations` for canonical
vehicle_id per trip. Filtered upstream to `appeared_in_vp = TRUE` so
every row has a derivable vehicle_id.
meta:
same comment I think this should be an exposure
added a single california_tides exposure on this PR (so both fct_tides_vehicle_locations and fct_tides_trips_performed refs resolve in the same checkout), modeled on the GTFS california_open_data exposure. publish.product meta block dropped from both yml entries. PR 5229 fills in meta.destinations for the public-bucket flow.
b21643b to 946a03c
d90e087 to 529d4e2
Adds mart_tides.fct_tides_trips_performed, the second TIDES-conformant model in the Cal-ITP warehouse. Sources from fct_observed_trips joined to fct_scheduled_trips for route metadata and to fct_tides_vehicle_locations for canonical vehicle_id per trip. Filtered to public, customer-facing or regional-subfeed fixed-route GTFS feeds via dim_provider_gtfs_data.
Stacked on feat/tides-vehicle-locations. Includes a relationships test asserting that every vehicle_id resolves to at least one row in fct_tides_vehicle_locations for the same service_date.
Materialized as a view, in the same mart/tides folder as fct_tides_vehicle_locations and tagged with the same publish.product meta key. The downstream consumer is a per-agency Airflow export running once per cycle, so paying compute on each read is cheaper than carrying a materialized copy.
Validated against christopher_mart_gtfs sandbox earlier as a table materialization, 8-day window: 553,084 rows, 0 PK duplicates, 0 NULL on TIDES required-not-null fields (service_date, trip_id_performed, vehicle_id), 99 agencies, schedule_relationship enum mapping correct (Scheduled/Canceled/Added/Duplicated). Re-running as a view in the new mart_tides dataset is a follow-up before merge.
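The condition that the relationships test asserts can be illustrated with a small Python sketch. This is not how dbt evaluates the test; it only shows the rule, assuming both tables expose `service_date` and `vehicle_id`. The function name `unresolved_vehicle_refs` and the sample rows are hypothetical.

```python
def unresolved_vehicle_refs(trips: list[dict], vehicle_locations: list[dict]) -> list[dict]:
    """Return trips whose (service_date, vehicle_id) has no vehicle-location row."""
    # Every (service_date, vehicle_id) pair that appears in vehicle locations.
    seen = {(r["service_date"], r["vehicle_id"]) for r in vehicle_locations}
    return [
        t for t in trips
        if (t["service_date"], t["vehicle_id"]) not in seen
    ]


# Toy data (illustrative only).
vehicle_locations = [
    {"service_date": "2026-04-23", "vehicle_id": "bus_1"},
    {"service_date": "2026-04-24", "vehicle_id": "bus_2"},
]
trips = [
    {"service_date": "2026-04-23", "trip_id_performed": "t1", "vehicle_id": "bus_1"},
    # bus_2 exists, but only on a different service_date, so this one fails.
    {"service_date": "2026-04-23", "trip_id_performed": "t2", "vehicle_id": "bus_2"},
]
failures = unresolved_vehicle_refs(trips, vehicle_locations)
```

The test passes only when the returned list is empty; matching on `service_date` as well as `vehicle_id` is what makes the second toy trip a failure.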
eef0ca0 to fcf8e48
Warehouse report 📦
Checks/potential follow-ups
Checks indicate the following action items may be necessary.
New models 🌱
calitp_warehouse.mart.tides.fct_tides_trips_performed
calitp_warehouse.mart.tides.fct_tides_vehicle_locations
DAG
Legend (in order of precedence)
Impacted Exposures
The following exposures are downstream of models changed in this PR:
Changed models


Description
Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.
Adds `mart_tides.fct_tides_trips_performed`, sourced from `fct_observed_trips` and joined to `fct_scheduled_trips` for route metadata and to `fct_tides_vehicle_locations` for canonical vehicle_id per trip. Filtered to public, customer-facing or regional-subfeed fixed-route GTFS feeds via `dim_provider_gtfs_data`.
Stacked on `feat/tides-vehicle-locations`. Lives alongside `fct_tides_vehicle_locations` in the new `mart/tides/` folder and carries the same `publish.product: tides` model meta. Includes a `relationships` test asserting that every `vehicle_id` here resolves to at least one row in `fct_tides_vehicle_locations`.
A few design decisions worth flagging:
- Filtered to `appeared_in_vp = TRUE` upstream so every row has a derivable vehicle_id. Trips that only appeared in trip_updates (no VP) are excluded.
- vehicle_id per trip is the most frequently observed value (`APPROX_TOP_COUNT(vehicle_id, 1)`). Vehicles that change mid-trip get the dominant one.
- `schedule_trip_start` / `schedule_trip_end` are cast to DATETIME using `feed_timezone` from `fct_scheduled_trips`. Falls back to `America/Los_Angeles` when the feed timezone is NULL; a defensible default for California, flag if you'd rather a different fallback.
- Materialized as a view, in the same folder as `fct_tides_vehicle_locations`. Earlier prototype materialized as a table to work around microbatch's auto-filter on event_time refs (the `vehicle_per_trip` CTE needs cross-`dt` access). Views don't have that filter, so the constraint goes away. The downstream consumer is the per-agency export Airflow job, so paying compute on each read is fine.
- Shared-feed agencies are collapsed with `QUALIFY ROW_NUMBER() OVER (PARTITION BY ... ORDER BY organization_name) = 1`, matching the pattern used elsewhere in the warehouse.
Type of change
How has this been tested?
Include commands/logs/screenshots as relevant.
If making changes to dbt models, make sure they were created or updated on Staging. Please run the commands `uv run dbt run -s CHANGED_MODEL --target staging` and `uv run dbt test -s CHANGED_MODEL --target staging`, then include the output in this section of the PR.
uv run dbt run -s +fct_tides_trips_performed --target staging
uv run dbt test -s fct_tides_trips_performed --target staging
Materialized in `cal-itp-data-infra-staging.christopher_mart_tides.fct_tides_trips_performed`. 8-day window from 2026-04-23 to 2026-04-30 (numbers from the earlier table-materialized run; re-running as a view in the new `mart_tides` dataset is on the post-merge follow-up list):
- 553,084 rows
- 0 duplicates on `(service_date, trip_id_performed)`
- 0 NULLs on `service_date`, `trip_id_performed`, `vehicle_id`
- `schedule_relationship` distribution: 385,845 Scheduled / 165,672 NULL / 1,398 Canceled / 163 Added / 6 Duplicated
Top agencies by trip count: LA Metro (105,082), SFMTA (67,062), San Diego International Airport (51,005), AC Transit (33,753), VTA (25,781).
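As a quick consistency check on the figures above, the `schedule_relationship` counts can be summed and compared with the 553,084-row total quoted for the earlier table-materialized run. The sketch below is plain arithmetic on the numbers already reported in this PR; the variable names are illustrative.

```python
# Counts as reported in this PR for the 8-day window; None stands for SQL NULL.
distribution = {
    "Scheduled": 385_845,
    None: 165_672,
    "Canceled": 1_398,
    "Added": 163,
    "Duplicated": 6,
}

reported_total = 553_084  # row count quoted for the same run

# The buckets should account for every row exactly once.
total = sum(distribution.values())
matches_reported = total == reported_total

# Share of rows with no schedule_relationship value.
null_share = distribution[None] / total
```

The buckets sum to the reported row count, so no rows fall outside the five listed categories.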
Post-merge follow-ups
Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.
No action required
Actions required (specified below)
Update the Frictionless validation harness to include trips_performed schema validation.
Open follow-up issue for `stop_visits` model and for `trip_start_stop_id` / `trip_end_stop_id` derivation (deferred past MVP, requires stop_times join).