SunTransit

SunTransit continuously fetches live vehicle positions from multiple transit agencies, calculates delays for each stop, and stores this data for analysis. The system powers a dashboard that lets users track real-time vehicle locations and evaluate agency performance, similar to how FlightRadar24 visualizes flights. Designed for fault tolerance, scalability, and cloud agnosticism, SunTransit can be deployed seamlessly across any cloud platform.

Due to cost constraints, SunTransit is currently online only for fetching data from two transit agencies: Valley Metro and Massachusetts Bay Transportation Authority, as it is deployed locally to save on cloud costs.

System Walkthrough

Major Components

Kafka: Used as a message broker with one topic per transit agency. Each topic receives that agency's live vehicle positions.
Spark: Handles all data processing — one streaming job and two batch jobs — due to its scalability and distributed computing support.
Amazon S3: Stores delay records in Parquet format, partitioned by agency and date.
Redis: Holds live vehicle positions per agency with a 2-minute TTL. Used exclusively by the dashboard to render the live map.
PostgreSQL: Stores aggregated delay metrics in three tables: stops_mean_delay, route_mean_delay, and agency_mean_delay. Runs in Docker — no external database needed.

Jobs and Files

producer/producer.py Fetches GTFS-Realtime feeds from a transit agency and pushes vehicle positions to the respective Kafka topic.
spark-jobs/push_redis.py (Streaming job) Spark Structured Streaming job that reads vehicle positions from Kafka, deduplicates messages, and writes the latest position per vehicle to Redis with a 2-minute retention. Runs as a systemd service (suntransit-mbta-redis.service / suntransit-valley-metro-redis.service).
spark-jobs/batch/delay_calculator.py (Hourly batch job) Processes the last hour of Kafka messages, calculates stop-level arrival delay against the GTFS schedule, and writes results to S3 as Parquet. Triggered by batch_jobs.sh.
spark-jobs/analyze_daily_records.py (Daily batch job) Runs at 2 AM to process the previous day's S3 data. Computes mean delay per stop, route, and agency and writes aggregates to PostgreSQL. Triggered by analysis_job.sh.
spark-jobs/hourly_profile.py (Hourly profile job) Builds hour-of-day delay profiles from S3 data for use in delay prediction and trend analysis. Triggered by hourly_profile_job.sh.

Demo

Screenshots

Live Tracking
Stops + Hex Heatmap
Analytics Dashboard
Spark Master UI

Project Setup

Provisioning Cloud Resources

Only one external cloud resource is required:

S3 Bucket Used for storing Parquet files with delay data. Example: Amazon S3

PostgreSQL and Redis run as Docker containers — no external databases needed.

Creating `credentials.env`

Create a credentials.env file in the project root with your AWS credentials:

AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

Then update S3_BUCKET in spark-jobs/env/.env with your bucket name:

S3_BUCKET=s3a://<your-bucket-name>

Starting the Services

Start all services:
```
docker compose up -d --scale spark-worker=2
```
This starts Kafka, Zookeeper, Spark (master + 2 workers), Redis, PostgreSQL, and the web dashboard.
Initialize Kafka topics:
```
bash project-init.sh
```

Start the Kafka→Redis streaming jobs (systemd):

sudo systemctl enable --now suntransit-mbta-redis.service
sudo systemctl enable --now suntransit-valley-metro-redis.service

Schedule batch jobs via cron:

0 * * * * /path/to/batch_jobs.sh
0 * * * * /path/to/hourly_profile_job.sh
0 2 * * * /path/to/analysis_job.sh

Access the dashboard at http://localhost:8082.

Built with ❤️ by @rishitoshsingh

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
data		data
docs		docs
flask_app		flask_app
kafka		kafka
producer		producer
spark-image		spark-image
spark-jobs		spark-jobs
web-app		web-app
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile.python-dev		Dockerfile.python-dev
Dockerfile.spark-dev		Dockerfile.spark-dev
README.md		README.md
analysis_job.sh		analysis_job.sh
batch_jobs.sh		batch_jobs.sh
docker-compose.yml		docker-compose.yml
hourly_profile_job.sh		hourly_profile_job.sh
project-init.sh		project-init.sh
run_mbta_redis.sh		run_mbta_redis.sh
run_valley_metro_redis.sh		run_valley_metro_redis.sh
suntransit-mbta-redis.service		suntransit-mbta-redis.service
suntransit-valley-metro-redis.service		suntransit-valley-metro-redis.service

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SunTransit

System Walkthrough

Major Components

Jobs and Files

Demo

Screenshots

Project Setup

Provisioning Cloud Resources

Creating `credentials.env`

Starting the Services

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SunTransit

System Walkthrough

Major Components

Jobs and Files

Demo

Screenshots

Project Setup

Provisioning Cloud Resources

Creating credentials.env

Starting the Services

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Creating `credentials.env`

Packages