FlightWatch ✈

An open, free, self-hosting fare tracker for New Zealand ↔ Sri Lanka & India — the Christchurch ↔ Colombo and Auckland ↔ Colombo / Delhi / Mumbai corridors out of the box (and any route you add). It scrapes the full fare list several times a day, builds an intraday price history, forecasts where the fare is heading with a calibrated model, and shows a buy / wait decision with a confidence rate — plus deal scores, a predicted booking curve, a cheapest-day heatmap and free push alerts — all running for $0 on GitHub's free tier.

Data: Google Flights, scraped with real headless browsers (Playwright) — no API key, no token, no paid proxy. Shared Chromium and Firefox browsers run every itinerary concurrently (a lightweight context + rotating fingerprint/engine each — a genuine Chrome/Firefox mix, not just spoofed UA strings), harvesting every offer per scan to build a large open dataset
Coverage: a rolling ~3-month grid — every day a departure × every trip length 20–40 days (~1,900 itineraries, auto-generated relative to today) — sharded across the day's scan slots so the whole grid is covered daily without any single job blowing the CI limit, on top of a dense fixed set for deep intraday curves
Cadence: multiple scans/day; rows carry an hourly scan_slot so intraday moves accumulate instead of overwriting
Intelligence (100% offline): quantile gradient-boosting + split-conformal bands, seasonality features, a walk-forward backtest, deal scores, anomaly detection, best-time-to-book and templated natural-language summaries — no LLM, no API cost
Alerts: optional free Telegram / email pushes for new lows, price drops and BUY calls
Automation: GitHub Actions (free & unlimited for public repos) · Hosting: GitHub Pages (free)
Dataset: every observation is committed as plain CSV in data/ — open for anyone

Fares are informational. Always confirm the live price before booking.

Experimental — Singapore Airlines' own site. Besides Google Flights, an opt-in source (flightwatch/provider_sq.py) scrapes Singapore Airlines' public Best Fare Finder for SQ's own end-to-end fares — the genuine "SQ flying the whole trip" price. It is off by default (the scheduled scan only uses Google Flights, which stays rock-solid) because singaporeair.com sits behind heavy bot protection and needs selector tuning against a live page. Try it with python -m flightwatch sq-diag CHC CMB 2026-09-05 2026-09-26, which prints what it found and saves a screenshot + HTML to debug/.

Singapore Airlines on the dashboard. The command-center spotlights a configured carrier (featured_airline, default Singapore Airlines). Because SQ often flies these corridors as a connection (from Christchurch it's CHC→SIN→CMB with Air New Zealand), the dashboard distinguishes SQ operating one leg from SQ flying the whole trip on its own metal (e.g. AKL→SIN→CMB all on SQ) and shows the cheapest whole-trip SQ fare separately. The trip finder lets you filter for any single carrier — even one that only flies a leg of a connection.

Why scrape Google Flights? FlightWatch first used the Amadeus Self-Service API (dropped its free tier), then the Travelpayouts Data API (free, but a cache of Aviasales searches that simply has no data for a thin route like CHC↔CMB — it returned zero fares). Google Flights actually has the fares, and driving a real browser needs no key and stays free — so the tracker can be fully autonomous.

Heads up: scraping is inherently more fragile than an official API. Google can change its markup or rate-limit a runner. FlightWatch is built to degrade gracefully (it records no_results instead of crashing and saves debug screenshots), and the scraper is isolated in one file so it's easy to repair.

How it works

config.yaml ──► flightwatch/collect.py ──► data/flights_YYYY_MM.csv (append-only, intraday slots)
                     (async, shared browser)        │
                     flightwatch/predict.py ───►  model + backtest ┐
                     flightwatch/analytics.py ─►  offline AI layer ┤
                     flightwatch/dashboard.py ─►  docs/index.html  │ (GitHub Pages)
                     flightwatch/alerts.py ────►  Telegram / email ┘ (optional)

A GitHub Actions cron runs the collector several times a day — and again on every merge to main, so the published dashboard refreshes the moment a change lands. Shared Chromium + Firefox browsers scrape every itinerary concurrently (each context using a different fingerprint/engine to dodge soft-blocks), appends every fare found as its own CSV row stamped with an hourly scan_slot, retrains the forecast model, regenerates the dashboard, sends any fresh alerts, and commits everything back to the repo. Over weeks, the per-itinerary history becomes an intraday booking curve — which is what makes a real "should I book now?" prediction (and an honest backtest of it) possible.

Project layout

flightwatch/            # the Python package (importable, no path hacks)
  __init__.py           #   shared paths (repo root, data/, docs/, config.yaml)
  provider.py           #   fare source — Google Flights scraper, sync + async (swap providers here)
  collect.py            #   async shared-browser scan -> appends to data/
  storage.py            #   append-only monthly CSVs, intraday scan_slot de-duplication
  predict.py            #   forecast + buy/wait/watch with conformal bands, seasonality, backtest
  analytics.py          #   offline AI: deal scores, anomalies, heatmap, digest, narratives
  dashboard.py          #   renders docs/index.html
  alerts.py             #   optional free Telegram / email pushes
  __main__.py           #   CLI:  python -m flightwatch [collect|build|all|alert|backtest|diag]
config.yaml             # what to track + concurrency
data/                   # the open dataset (monthly CSVs)
docs/                   # the published dashboard (GitHub Pages)
.github/workflows/      # the scan cron
tests/                  # pytest suite (storage, model, analytics, alerts)

Everything runs through the package, so the provider lives in exactly one file (flightwatch/provider.py). To use a different fare source later, keep its search_flight_offers / search_flight_offers_async shape and the rest is unchanged.

Setup (about 10 minutes, all free — no API keys at all)

The scraper needs no token or secret, so setup is just GitHub settings.

1. Create the repo

Make a public GitHub repository (public = free Actions + Pages).
Upload these files (or git push them).

2. Allow the workflow to commit

Repo → Settings → Actions → General → Workflow permissions → select Read and write permissions → Save. (The workflow commits the daily data back.)

3. Turn on Pages

Repo → Settings → Pages → Source: Deploy from a branch → Branch: main, Folder: /docs → Save. Your dashboard will be at https://<your-username>.github.io/<repo-name>/.

4. Run it once

Repo → Actions tab → scan → Run workflow. The workflow installs a headless Chromium, scrapes each route, and commits the first data point to data/. From then on it runs automatically several times a day. If a run scrapes 0 fares, download the scrape-debug artifact from that run to see exactly what Google returned.

Run locally (optional)

pip install -r requirements.txt
python -m playwright install --with-deps chromium firefox   # one-time browser download (both engines)

python -m flightwatch collect    # one scrape -> appends to data/
python -m flightwatch build      # regenerate docs/index.html
python -m flightwatch all        # both at once
python -m flightwatch alert      # push fresh signals (dry-prints if no secrets set)
python -m flightwatch backtest   # how the engine's past calls actually fared
python -m flightwatch diag       # scrape one route verbosely + save debug screenshot
python -m flightwatch sq-diag    # EXPERIMENTAL: scrape Singapore Airlines' own site
open docs/index.html

Run the tests with pytest (they use synthetic data — no network or browser needed).

Set FLIGHTWATCH_HEADFUL=1 (the CI default, via xvfb-run) to drive non-headless browsers, which Google is less likely to soft-block than pure headless.

Configure what gets tracked

Edit config.yaml. The fixed itineraries: list is the dense core — daily depth on a few date-pairs that builds tight intraday curves:

itineraries:
  - {origin: CHC, destination: CMB, depart_date: 2026-09-01, return_date: 2026-09-22}
  - {origin: AKL, destination: CMB, depart_date: 2026-09-01, return_date: 2026-09-22}

For breadth, the auto_generate: block sweeps a rolling window of departures across the next ~3 months for each route — every day a departure, paired with every trip length in a range (e.g. 20–40 days). It's recomputed from today every run, so it always covers the next 3 months. Each route can override any grid setting (depart_step_days, horizon_days, min_days_out, trip_length_*), so a dense core corridor and a lighter new route can share one grid without the additions blowing the CI budget:

auto_generate:
  enabled: true
  horizon_days: 90          # ~3 months out (block default)
  min_days_out: 1
  depart_step_days: 1       # every day is a departure date (block default)
  trip_length_min: 20       # trip duration 20..40 days inclusive
  trip_length_max: 40
  trip_length_step: 1
  routes:
    - {origin: CHC, destination: CMB}                      # dense: every day
    - {origin: AKL, destination: CMB, depart_step_days: 4} # lighter sweep
    - {origin: AKL, destination: DEL, depart_step_days: 14}
    - {origin: AKL, destination: BOM, depart_step_days: 14}
  shard_across_slots: true  # see below
  shards: 4

That full grid is ~1,900 itineraries — far too many for one run. So the generated grid is sharded across the day's scan slots: each 6-hourly run scrapes one shard (~1/4, a few hundred itineraries, ~1–1.5 h), and across the day every date/length is covered exactly once. Sharding is stable per itinerary, so the same date/length lands in the same daily slot each day (a clean per-slot history). The fixed itineraries: are never sharded — they run every scan for tight intraday curves. Set shards to the number of daily cron slots. Prefer a denser-but-smaller grid? Just raise depart_step_days / trip_length_step or drop shard_across_slots.

Tune concurrency (how many bots scrape at once) / jitter_seconds here too. Change the scan frequency in .github/workflows/daily-scan.yml (the cron, in UTC — currently every 6 hours). Scheduled runs only fire on the default branch, so a new cadence takes effect once it's merged there.

The buy / wait decision (with confidence)

flightwatch/predict.py is the decision engine. While history is thin it uses a transparent heuristic (today's price vs the route's own min/median and days left) and honestly reports low confidence. Once there are enough cheapest-per-day observations (~120) it trains gradient-boosting models over the booking curve with:

Honest validation — error is time-series cross-validated (TimeSeriesSplit, forward-chaining), never a shuffled split that would leak the future and overstate skill.
Calibrated bands — a split-conformal interval whose width is learned from real held-out residuals, so an "80% band" actually covers ~80% of outcomes.
Seasonality features — cyclical day-of-week / month / week-of-year encodings, peak-season flags, lead-time buckets and a per-route price level.

Each recommendation exposes its full reasoning — signal (BUY / WAIT / WATCH), confidence (0–100%), the whole predicted forward curve with bands, the expected forecast low, expected savings from waiting and the drop probability.

Offline AI layer (flightwatch/analytics.py, no LLM or API) adds, on the dashboard:

Deal score (0–100) for each route's current cheapest fare
Predicted booking curve fan chart + best time to book
Cheapest day to fly heatmap (which departure/return weekday is cheapest)
What changed since the last scan, and anomaly drop/spike detection
Model accuracy — a walk-forward backtest of whether past BUY/WAIT calls paid off
A per-route natural-language summary generated purely from the numbers

…alongside the existing market insights and latest-offers tables.

Free push alerts (optional)

flightwatch/alerts.py turns the freshest signals into Telegram / email notifications, de-duplicated via data/alert_state.json. It is a no-op until you add secrets, so the project stays free and runnable by anyone:

Telegram — message @BotFather to create a bot, then set repo Settings → Secrets and variables → Actions secrets TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID.
Email (optional) — set SMTP_HOST, SMTP_USER, SMTP_PASS, ALERT_EMAIL_TO (e.g. a Gmail address + app password).

Alerts fire on a fresh all-time low, an unusual price drop, a new BUY call, or a "book now — window closing" nudge.

Troubleshooting: "it ran but scraped 0 fares"

Scraping Google Flights is the trade-off for being keyless and free. If a run records no_results, it's almost always one of:

Google soft-blocked the runner — datacenter IPs are fingerprinted. The workflow already runs a non-headless Chromium under xvfb (via FLIGHTWATCH_HEADFUL=1) to look more like a real browser, which is usually enough for a once-a-day, few-route scrape. A later daily run typically succeeds.
Google changed its markup — the extractor keys off the stable "… round trip total" ARIA label rather than CSS class names, but if Google reshapes the page the selectors in provider.py may need a tweak.

To see exactly what Google returned, scrape one route locally and inspect the saved screenshot + HTML:

python -m flightwatch diag
ls debug/        # diag-CHC-CMB.png / .html

Every failed CI scrape also uploads those files as the scrape-debug artifact on the run, so you can debug a remote run without reproducing it locally.

If Google blocking ever becomes persistent on GitHub's shared IPs, the most reliable free fix is to run the same python -m flightwatch all on any small always-on machine (a home server / Raspberry Pi) on a cron and push the commits — the code is identical; only the runner's IP changes.

Cost reality check

Piece	Service	Cost
Automation (several runs/day)	GitHub Actions (public repo, standard runner)	Free, unlimited
Dashboard hosting	GitHub Pages	Free
Data storage	CSV in the repo	Free
Fare data	Google Flights scrape (headless Chromium + Firefox)	Free, no key
Alerts	Telegram Bot API / SMTP	Free
Forecasts & AI features	scikit-learn, offline (no LLM)	Free

Each run takes a few Actions minutes (browser install + scraping the rolling ~3-month grid across Chromium + Firefox); several runs a day plus per-merge refreshes stay comfortably inside the free unlimited allowance for public repos.

License

Code: MIT. Data in data/: open — attribution appreciated (CC-BY).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FlightWatch ✈

How it works

Project layout

Setup (about 10 minutes, all free — no API keys at all)

1. Create the repo

2. Allow the workflow to commit

3. Turn on Pages

4. Run it once

Run locally (optional)

Configure what gets tracked

The buy / wait decision (with confidence)

Free push alerts (optional)

Troubleshooting: "it ran but scraped 0 fares"

Cost reality check

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 111 Commits
.github/workflows		.github/workflows
data		data
docs		docs
flightwatch		flightwatch
redesign		redesign
tests		tests
.gitignore		.gitignore
README.md		README.md
config.yaml		config.yaml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

FlightWatch ✈

How it works

Project layout

Setup (about 10 minutes, all free — no API keys at all)

1. Create the repo

2. Allow the workflow to commit

3. Turn on Pages

4. Run it once

Run locally (optional)

Configure what gets tracked

The buy / wait decision (with confidence)

Free push alerts (optional)

Troubleshooting: "it ran but scraped 0 fares"

Cost reality check

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages