Answers one question: when should your reps call which leads?
Loads your contact_attempts CSV, converts every timestamp to the lead's local timezone, computes connect and meeting rates per (day × hour) cell, masks cells with too few samples, and renders heatmaps segmented by industry and lead tier.
```
tierflow_heatmap/
│
├── data/
│   ├── contact_attempts_seed.csv        ← 20-row seed (schema reference)
│   └── contact_attempts_synthetic.csv   ← generated by generate_synthetic.py
│
├── outputs/                             ← all PNGs and CSVs land here (git-ignored)
│   ├── heatmap_connect_rate.png
│   ├── heatmap_connect_rate_saas.png
│   ├── heatmap_meeting_rate.png
│   ├── top_windows_connect_rate.csv
│   └── ...
│
├── notebooks/                           ← Jupyter notebooks go here (optional)
│
├── heatmap_pipeline.py                  ← main pipeline
├── generate_synthetic.py                ← synthetic data generator
├── requirements.txt
└── README.md
```
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2a. Run on the 20-row seed (heatmap will be mostly grey: not enough data yet)
python heatmap_pipeline.py

# 2b. Or generate synthetic data and run on that
python generate_synthetic.py --rows 3000
python heatmap_pipeline.py --csv data/contact_attempts_synthetic.csv --min-samples 5
```

Outputs land in `outputs/`. You get one overall heatmap plus one per industry.
```
python heatmap_pipeline.py [options]

  --csv           Path to your CSV file
                  default: data/contact_attempts_seed.csv
  --metric        What "best time" means
                  connect_rate → answered + meeting_booked (default)
                  meeting_rate → meeting_booked only
  --industry      Filter to one industry, e.g. --industry SaaS
                  Omit to get per-industry breakdown automatically
  --tier          Filter to one lead tier, e.g. --tier 1
                  Omit to include all tiers
  --min-samples   Minimum contact attempts before a cell is shown
                  Cells below this show as grey (default: 10)
  --no-annotate   Hide rate/count labels inside cells (cleaner for exporting)
```
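The options above map naturally onto Python's `argparse`. The following is a minimal sketch of how such a parser could be declared; it is not the actual code in `heatmap_pipeline.py`, and the function name `build_parser` is an assumption made for this example. Only the flag names and defaults come from the option list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="heatmap_pipeline.py")
    parser.add_argument("--csv", default="data/contact_attempts_seed.csv",
                        help="Path to your CSV file")
    parser.add_argument("--metric", choices=["connect_rate", "meeting_rate"],
                        default="connect_rate", help='What "best time" means')
    parser.add_argument("--industry", default=None,
                        help="Filter to one industry")
    parser.add_argument("--tier", type=int, default=None,
                        help="Filter to one lead tier")
    parser.add_argument("--min-samples", type=int, default=10,
                        help="Minimum contact attempts before a cell is shown")
    parser.add_argument("--no-annotate", action="store_true",
                        help="Hide rate/count labels inside cells")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--metric", "meeting_rate", "--industry", "SaaS", "--tier", "1", "--min-samples", "3"]
)
print(args.csv, args.metric, args.industry, args.tier, args.min_samples, args.no_annotate)
```

Leaving `--industry` and `--tier` at `None` gives the pipeline a simple way to tell "no filter requested" apart from a real value.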
```bash
# Tier 1 SaaS leads: what time gets meetings?
python heatmap_pipeline.py \
  --csv data/contact_attempts_synthetic.csv \
  --metric meeting_rate \
  --industry SaaS \
  --tier 1 \
  --min-samples 3

# Overall connect rate, strict confidence threshold
python heatmap_pipeline.py \
  --csv data/contact_attempts_synthetic.csv \
  --metric connect_rate \
  --min-samples 20
```

Your CSV must contain these columns. See `data/contact_attempts_seed.csv` for a filled example.
| Column | Type | Notes |
|---|---|---|
| `attempt_id` | int | Unique per row. Auto-increment or UUID. |
| `timestamp_utc` | datetime | Always UTC. Format: `YYYY-MM-DD HH:MM:SS` |
| `lead_id` | str | Foreign key to your leads table. |
| `rep_id` | str | Rep who made the attempt. |
| `rep_role` | str | `Warmer` or `Closer` |
| `contact_channel` | str | `call` / `email` / `sms` / `linkedin` / `whatsapp` |
| `industry` | str | Lead's industry vertical. Keep consistent casing. |
| `lead_tier` | int | Tier at time of attempt (1 = hottest). Snapshot: don't join live. |
| `lead_score` | int | Score at time of attempt (0–100). Snapshot. |
| `outcome` | str | `answered` / `no_reply` / `voicemail` / `meeting_booked` / `bounced` / `wrong_number` |
| `duration_seconds` | int | Call duration. 0 for no-answer / email. |
| `lead_timezone` | str | IANA timezone string, e.g. `America/New_York` |
| `notes` | str | Optional free-text. Leave blank if nothing to add. |
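A quick pre-flight check can catch schema problems before they surface as odd heatmaps. The sketch below uses only the standard library and checks three things from the table: that every column is present, that `outcome` is one of the listed values, and that `timestamp_utc` matches the documented format. The function name `validate_rows` and the sample rows are hypothetical; the pipeline's own validation may differ.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = [
    "attempt_id", "timestamp_utc", "lead_id", "rep_id", "rep_role",
    "contact_channel", "industry", "lead_tier", "lead_score",
    "outcome", "duration_seconds", "lead_timezone", "notes",
]
ALLOWED_OUTCOMES = {
    "answered", "no_reply", "voicemail", "meeting_booked", "bounced", "wrong_number",
}


def validate_rows(csv_file) -> list[str]:
    """Return human-readable problems; an empty list means the file passed."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["outcome"] not in ALLOWED_OUTCOMES:
            problems.append(f"line {line_no}: unknown outcome {row['outcome']!r}")
        try:
            datetime.strptime(row["timestamp_utc"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append(f"line {line_no}: bad timestamp_utc {row['timestamp_utc']!r}")
    return problems


# Made-up sample: the first data row is valid, the second has two problems.
sample = io.StringIO(
    "attempt_id,timestamp_utc,lead_id,rep_id,rep_role,contact_channel,industry,"
    "lead_tier,lead_score,outcome,duration_seconds,lead_timezone,notes\n"
    "1,2024-01-16 14:00:00,L-001,R-01,Warmer,call,SaaS,1,87,answered,95,America/New_York,\n"
    "2,16/01/2024 14:00,L-002,R-01,Warmer,call,SaaS,2,55,picked_up,0,Asia/Karachi,\n"
)
problems = validate_rows(sample)
for problem in problems:
    print(problem)
```

To check a real file, pass an open handle instead of the `StringIO` object, for example `open("data/contact_attempts_seed.csv", newline="")`.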
```
CSV
 └─ load & validate schema
     └─ localise timestamps (UTC → lead's local hour + day-of-week)
         └─ flag outcomes (connect_rate, meeting_rate)
             └─ pivot table: 7 rows (Mon–Sun) × 24 cols (00:00–23:00)
                 └─ mask cells below min_samples
                     └─ render heatmap PNG (+ per-industry breakdown)
                         └─ export top-5 windows CSV
```
The key step is timezone localisation. A call logged at 14:00 UTC lands at 19:00 in Karachi and at 09:00 in New York (10:00 while daylight saving time is in effect). The heatmap is built from the lead's local hour, not the rep's, because that is what determines whether someone picks up.
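As a rough illustration of that conversion, the sketch below uses Python's standard-library `zoneinfo` (Python 3.9+) to map a `timestamp_utc` string and a `lead_timezone` value onto a local day and hour. The helper name `to_local_cell` and the `DAY_NAMES` list are assumptions made for this example, not names taken from `heatmap_pipeline.py`, which may do the same thing with pandas.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def to_local_cell(timestamp_utc: str, lead_timezone: str) -> tuple[str, int]:
    """Return (day name, hour) in the lead's local time for a UTC timestamp string."""
    utc_dt = datetime.strptime(timestamp_utc, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    local_dt = utc_dt.astimezone(ZoneInfo(lead_timezone))
    return DAY_NAMES[local_dt.weekday()], local_dt.hour


print(to_local_cell("2024-01-16 14:00:00", "Asia/Karachi"))      # ('Tue', 19)
print(to_local_cell("2024-01-16 14:00:00", "America/New_York"))  # ('Tue', 9)
print(to_local_cell("2024-07-16 14:00:00", "America/New_York"))  # ('Tue', 10), daylight saving time
print(to_local_cell("2024-01-16 22:00:00", "Asia/Karachi"))      # ('Wed', 3), rolls into the next day
```

The July example shows why storing an IANA zone name matters: a fixed UTC offset would put summer calls in the wrong hour column. The last example shows that the day of the week can shift as well as the hour.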
- Warm cells (amber → red): high connect or meeting rate at that hour/day combo
- Dark cells: low rate
- Grey cells: fewer than `--min-samples` attempts; don't draw conclusions from these
- Annotations: `42%\n(17)` means a 42% rate from 17 attempts in that cell
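The grey cells come from masking: a cell's rate is computed, then hidden when the cell has too few attempts. Below is a minimal pandas sketch of that idea for `connect_rate`, assuming the localisation step has already added `local_day` and `local_hour` columns. Those two column names and the function name `rate_grid` are hypothetical; the real pipeline may structure this differently.

```python
import pandas as pd

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CONNECT_OUTCOMES = {"answered", "meeting_booked"}  # per the --metric connect_rate definition


def rate_grid(df: pd.DataFrame, min_samples: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (masked connect-rate grid, attempt-count grid), each 7 days x 24 hours."""
    flagged = df.assign(is_connect=df["outcome"].isin(CONNECT_OUTCOMES))
    grouped = flagged.groupby(["local_day", "local_hour"])["is_connect"]
    # Attempts per cell; cells with no rows at all count as 0.
    counts = grouped.size().unstack(fill_value=0).reindex(
        index=DAYS, columns=range(24), fill_value=0
    )
    # Share of attempts that connected; cells with no rows stay NaN.
    rates = grouped.mean().unstack().reindex(index=DAYS, columns=range(24))
    # Keep a rate only where the cell has enough attempts; everything else becomes NaN.
    return rates.where(counts >= min_samples), counts


# Tiny made-up dataset: Tue 11:00 has 4 attempts, Wed 16:00 has 2.
attempts = pd.DataFrame({
    "local_day": ["Tue"] * 4 + ["Wed"] * 2,
    "local_hour": [11] * 4 + [16] * 2,
    "outcome": ["answered", "meeting_booked", "no_reply", "voicemail", "answered", "no_reply"],
})
rates, counts = rate_grid(attempts, min_samples=3)
print(rates.loc["Tue", 11], counts.loc["Tue", 11])  # kept: 4 attempts meets the threshold
print(rates.loc["Wed", 16], counts.loc["Wed", 16])  # masked to NaN: only 2 attempts
```

Returning the counts alongside the masked rates keeps the sample size available for the in-cell annotations.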
The console also prints a top-5 windows table:
```
day   hour    connect_rate   n
Tue   11:00   0.647          17
Thu   09:00   0.647          17
Wed   16:00   0.565          23
```
The same table is saved to `outputs/top_windows_{metric}.csv`, ready to pipe into your dashboard.
| Stage | What you can do |
|---|---|
| 0–500 rows | Build and test the pipeline. Heatmap will be mostly grey. |
| 500–2,000 rows | Overall heatmap starts showing patterns. Per-industry is still thin. |
| 2,000–5,000 rows | Per-industry heatmaps become reliable. Tier filtering works. |
| 5,000+ rows | Full segmentation (industry × tier). Use --min-samples 20. |
Lower `--min-samples` to see more cells earlier, but treat them as directional hints, not gospel.
When you're ready to productise, swap the flat CSV read for a database query:
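```python
# Replace load_data() with something like:
import os

import pandas as pd
import sqlalchemy as sa

# os.environ[...] raises a clear KeyError if DATABASE_URL is not set.
engine = sa.create_engine(os.environ["DATABASE_URL"])

# NOW() - INTERVAL '90 days' is PostgreSQL syntax; adapt it for other databases.
query = sa.text("""
    SELECT attempt_id, timestamp_utc, lead_id, rep_id, rep_role,
           contact_channel, industry, lead_tier, lead_score,
           outcome, duration_seconds, lead_timezone
    FROM contact_attempts
    WHERE timestamp_utc >= NOW() - INTERVAL '90 days'
""")
df = pd.read_sql(query, engine, parse_dates=["timestamp_utc"])
```

The rest of the pipeline is unchanged.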
Once you have enough real data, natural extensions include:
- Confidence intervals: add Wilson score intervals per cell so dashboards can show error bars
- Regression model: use scikit-learn to learn which features (hour, day, tier, industry, score) drive connect rate, then score future call slots
- Rep-level heatmaps: same pipeline, filter by `rep_id`
- Decay weighting: weight recent attempts more heavily than old ones
- Scheduler integration: feed the top windows back into the routing engine to auto-suggest call times per lead
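For the confidence-interval extension, a Wilson score interval needs only the number of successes and attempts in a cell. The sketch below is one possible standalone helper using the standard formula with `z = 1.96` for roughly 95% coverage. The name `wilson_interval` and the choice to return `(0.0, 1.0)` for an empty cell are assumptions for illustration.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if attempts == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / attempts
    denom = 1 + z**2 / attempts
    centre = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return (centre - half_width, centre + half_width)


# 11 connects out of 17 attempts, i.e. a 0.647 rate.
low, high = wilson_interval(11, 17)
print(f"{low:.3f} to {high:.3f}")

# The same rate from ten times the data gives a much tighter interval.
low_big, high_big = wilson_interval(110, 170)
print(f"{low_big:.3f} to {high_big:.3f}")
```

Comparing the two printed ranges shows why a cell with 17 attempts should be read as a hint: the interval narrows considerably as attempts accumulate.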


