
additional analysis for 1943 Data for GTFS-RT Implementation #1961

Merged
csuyat-dot merged 6 commits into main from rt_compare
Mar 13, 2026


@csuyat-dot

Issue:

Changes to initial notebook:

  • Received list of "missing agencies" to address. Added section to the bottom of the notebook explaining what happened to each of those agencies. ~7 agencies did not appear in the bridge_gtfs_analysis_name_x_ntd table so they never made it to the final merge.
  • Added a 2nd attempt at merging per Henry's instructions: merge the revenue vehicle list to the routes list, then merge to the "no gtfs-rt" list
  • Added NTD ID to the "no gtfs-rt" list for easier merging

Also made a brief write up:

My analysis consists of multiple tables, each containing multiple components.

  1. The Total Revenue Vehicles (rev vehicles) table
  2. The Total Routes (routes) table
  3. The “organizations_without_gtfs_rt_data…” (no_rt list) table

Regarding the rev vehicles table: the National Transit Database (NTD) provides a report of revenue vehicle inventory, see link: https://www.transit.dot.gov/ntd/data-product/2024-annual-database-revenue-vehicle-inventory. This dataset was filtered for California agencies, grouped by ntd_id and agency name, and aggregated to sum the total fleet vehicles, active fleet vehicles, and mode/TOS VOMS. The resulting rev vehicles table contains total revenue vehicles and VOMS per agency.
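The filter, group, and sum steps for the rev vehicles table can be sketched in pandas as follows. This is a minimal sketch, not the notebook's code: the column names (`state`, `agency`, `total_fleet_vehicles`, `active_fleet_vehicles`, `mode_tos_voms`), the helper `build_rev_vehicles`, and the sample rows are all hypothetical stand-ins for the real NTD file.

```python
import pandas as pd

# Hypothetical sample rows; the real NTD inventory file uses its own
# column names and has many more columns and rows.
inventory = pd.DataFrame({
    "state": ["CA", "CA", "CA", "NV"],
    "ntd_id": ["90001", "90001", "90002", "80001"],
    "agency": ["Agency A", "Agency A", "Agency B", "Agency C"],
    "total_fleet_vehicles": [10, 5, 8, 20],
    "active_fleet_vehicles": [9, 4, 7, 18],
    "mode_tos_voms": [8, 3, 6, 15],
})


def build_rev_vehicles(df: pd.DataFrame) -> pd.DataFrame:
    """Filter to California, then sum the vehicle counts per agency."""
    value_cols = ["total_fleet_vehicles", "active_fleet_vehicles", "mode_tos_voms"]
    california = df[df["state"] == "CA"]
    # One output row per (ntd_id, agency) pair, with each count column summed.
    return california.groupby(["ntd_id", "agency"], as_index=False)[value_cols].sum()


rev_vehicles = build_rev_vehicles(inventory)
print(rev_vehicles)
```

Keeping `ntd_id` as a string avoids losing leading zeros and keeps the key type consistent for the later joins.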

The routes table is built using a SQL query that joins 2 tables from the DDS data warehouse:

  1. mart_gtfs_rollup.fct_monthly_operator_summary (rollup table)
  2. mart_transit_database.bridge_gtfs_analysis_name_x_ntd (bridge table)

The rollup table contains the total count of routes per GTFS schedule dataset. This table was filtered for schedules after February 1, 2026. The bridge table provides a crosswalk from GTFS schedule dataset names to NTD IDs, effectively identifying the agency name for each schedule. However, it was discovered that not every schedule in the rollup table is included in the bridge table, meaning some schedules were dropped during the final merge. The resulting routes table contains the total routes per schedule per agency.
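The actual routes table comes from a SQL query against the warehouse; the pandas sketch below only illustrates the same join logic and one way to surface the schedules that have no match in the bridge table. The column names (`schedule_name`, `month_first_day`, `n_routes`, `agency`), the sample rows, and the choice to treat the date filter as "on or after February 1, 2026" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for mart_gtfs_rollup.fct_monthly_operator_summary.
rollup = pd.DataFrame({
    "schedule_name": ["Sched A", "Sched B", "Sched C"],
    "month_first_day": pd.to_datetime(["2026-02-01", "2026-02-01", "2026-01-01"]),
    "n_routes": [12, 4, 7],
})

# Hypothetical stand-in for mart_transit_database.bridge_gtfs_analysis_name_x_ntd.
bridge = pd.DataFrame({
    "schedule_name": ["Sched A", "Sched C"],
    "ntd_id": ["90001", "90003"],
    "agency": ["Agency A", "Agency C"],
})

# Assumption: the cutoff is inclusive of February 1, 2026.
recent = rollup[rollup["month_first_day"] >= pd.Timestamp("2026-02-01")]

# A left join with indicator=True keeps every schedule and records
# whether a bridge row was found for it.
merged = recent.merge(bridge, on="schedule_name", how="left", indicator=True)

# Schedules present in the rollup but absent from the bridge table.
unmatched = merged.loc[merged["_merge"] == "left_only", "schedule_name"].tolist()

# Equivalent of the inner join: only schedules with an NTD ID survive.
routes = merged[merged["_merge"] == "both"].drop(columns="_merge")

print("Schedules without a bridge match:", unmatched)
print(routes)
```

Listing the `left_only` rows before discarding them makes the dropped schedules visible instead of silently losing them in an inner join.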

The “no_rt list” table contains a list of transit agencies that do not have GTFS-RT. However, some agencies were marked YES or have PassiGO. An NTD ID column was added to make joining to the other tables easier.

First, the rev vehicles and routes tables were inner joined on NTD ID, resulting in 155 agencies with total revenue vehicles, VOMS, and total routes values. The resulting table was then joined to the no_rt table. Some agencies were dropped during the merges, so 2 scenarios emerged:

  1. Inner join: only NTD IDs that matched both lists were kept. This list contains 51 agencies.
  2. Outer join: all NTD IDs from both lists were kept, regardless of whether they matched. This list contains 162 agencies but can be used for further filtering.
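The difference between the two scenarios can be sketched with `DataFrame.merge`. The frames below are tiny hypothetical examples, not the real 155-agency table or no_rt list, and the names `combined`, `no_rt`, and `total_routes` are illustrative.

```python
import pandas as pd

# Hypothetical result of joining rev vehicles to routes on NTD ID.
combined = pd.DataFrame({
    "ntd_id": ["90001", "90002", "90003"],
    "total_routes": [12, 4, 7],
})

# Hypothetical no_rt list with its added NTD ID column.
no_rt = pd.DataFrame({
    "ntd_id": ["90002", "90003", "99999"],
    "organization_name": ["Agency B", "Agency C", "Agency Z"],
})

# Scenario 1: keep only NTD IDs present in both tables.
inner = combined.merge(no_rt, on="ntd_id", how="inner")

# Scenario 2: keep every NTD ID; the _merge column records which side
# each row came from, which supports further filtering.
outer = combined.merge(no_rt, on="ntd_id", how="outer", indicator=True)

# no_rt agencies that found no match in the combined table.
dropped = outer.loc[outer["_merge"] == "right_only", "ntd_id"].tolist()

print(len(inner), len(outer), dropped)
```

Filtering the outer result on `_merge == "right_only"` is one way to list no_rt agencies that did not make it into the inner join.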

The following agencies in the no_rt list were dropped from the final dataset because they did not appear in the bridge table:
ntd_id organization_name
90251 City of Baldwin Park
90260 City of Compton
90261 City of Covina
91008 Modoc Transportation Agency
99316 Chemehuevi Indian Tribe
99449 City of El Segundo
99451 City of San Fernando

@csuyat-dot csuyat-dot self-assigned this Mar 13, 2026
@github-actions

nbviewer URLs for impacted notebooks:

@csuyat-dot csuyat-dot merged commit 7ca2c2b into main Mar 13, 2026
3 checks passed
@csuyat-dot csuyat-dot deleted the rt_compare branch March 13, 2026 16:51