It started with a bad number. During my Data Quality Analyst internship at CMHC (Sept–Dec 2025), I watched a single bad record work its way through a live reporting pipeline before anyone caught it. That's the moment data governance stopped being an abstract discipline for me and became the thing I wanted to build a career on: not just cleaning data, but designing the rules, catalogs, and stewardship structures that catch problems like that before they reach a decision-maker.
Working hands-on with Informatica IDMC, Collibra Data Intelligence Cloud, and Databricks SQL inside a federal regulatory environment taught me that governance is equal parts technical and human: the SQL rules matter, but so does knowing exactly who owns a data element and who gets called when it breaks.
Since then, I've carried that same rigor into independent work on public datasets: full DQ rule catalogs, lineage diagrams, and stewardship RACI matrices, built with open-source tools that mirror what enterprise platforms do. Beyond governance, I use Python and SQL for time series forecasting and ML pipelines, because understanding how data gets used downstream makes me better at protecting it upstream.
- 🌍 Based in Toronto, Ontario · open to roles across Canada
- 💼 Open to Data Governance · Data Quality · Data Stewardship · Data Analyst roles
- 🧠 Preparing for CDMP Foundation — DAMA International
- ✉️ [email protected]
Solo project, designed and built end to end — the flagship project in this portfolio
Enterprise governance programs follow well-established patterns: metadata catalogs, lineage, stewardship, and DQ rule engines. I wanted to prove I could own that whole lifecycle myself, so I built a complete governance program on a real public Canadian housing dataset (10,800 records · 10 provinces · 2018–2023).
Governance deliverables:
| Component | Detail |
|---|---|
| Data Quality Rules | 15 SQL-based rules across 5 dimensions — completeness, validity, uniqueness, accuracy, consistency |
| DQ Score | 99.45% overall (Grade A) · 9 PASS · 6 WARN · 0 FAIL |
| Exception Management | 884 exceptions · 424 auto-remediated · 460 escalated with root cause analysis by province and dwelling type |
| Critical Data Elements | 6 CDEs with column-level data lineage across a 5-layer source-to-consumption pipeline |
| Metadata Catalog | Data dictionary · Business glossary · Sensitivity classifications · Stewardship RACI matrix |
| Data Contract | YAML-based producer-consumer agreement with tiered SLA thresholds and 15 mapped DQ rules |
| Regulatory Compliance | PIPEDA and OSFI B-20 applicability assessment |
| Dashboard | Streamlit Cloud — executive scorecard, exception explorer, live contract validator |
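To make the "SQL-based rules → PASS/WARN/FAIL → overall score" flow concrete, here is a minimal sketch using Python's built-in `sqlite3`. Everything in it is hypothetical: the `housing` table, its columns, the three rule IDs, and the 99% / 95% tier thresholds are invented for illustration and are not the project's actual catalog. Each rule is a query that returns violating rows, its pass rate is mapped onto a tier (similar in spirit to tiered SLA thresholds in a data contract), and the overall score is the share of (record, rule) checks that passed.

```python
import sqlite3

# Hypothetical rule catalog: each rule is a SQL query returning the rows that
# VIOLATE it, tagged with its DQ dimension. Table and column names are invented.
RULES = {
    "COMP_01": ("completeness", "SELECT id FROM housing WHERE province IS NULL"),
    "VAL_01": ("validity", "SELECT id FROM housing WHERE starts < 0"),
    "UNIQ_01": (
        "uniqueness",
        "SELECT id FROM housing WHERE id IN "
        "(SELECT id FROM housing GROUP BY id HAVING COUNT(*) > 1)",
    ),
}

# Hypothetical tiered thresholds on a rule's pass rate.
PASS_AT, WARN_AT = 0.99, 0.95


def status(pass_rate: float) -> str:
    """Map a rule's pass rate onto a PASS / WARN / FAIL tier."""
    if pass_rate >= PASS_AT:
        return "PASS"
    if pass_rate >= WARN_AT:
        return "WARN"
    return "FAIL"


def run_rules(conn: sqlite3.Connection):
    """Run every rule; return per-rule results and an overall check-level score."""
    total = conn.execute("SELECT COUNT(*) FROM housing").fetchone()[0]
    results, failed_checks = {}, 0
    for rule_id, (dimension, sql) in RULES.items():
        exceptions = len(conn.execute(sql).fetchall())
        failed_checks += exceptions
        results[rule_id] = (dimension, exceptions, status(1 - exceptions / total))
    # Overall score = share of (record, rule) checks that passed.
    overall = 1 - failed_checks / (total * len(RULES))
    return results, overall


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory table with one planted error per rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE housing (id INTEGER, province TEXT, starts INTEGER)")
    rows = [(i, "ON", 100 + i) for i in range(1, 101)]
    rows[0] = (1, None, 101)     # completeness: missing province
    rows[1] = (2, "QC", -5)      # validity: negative housing starts
    rows.append((3, "BC", 120))  # uniqueness: id 3 now appears twice
    conn.executemany("INSERT INTO housing VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    results, overall = run_rules(build_demo_db())
    for rule_id, (dimension, n, tier) in results.items():
        print(f"{rule_id:8} {dimension:13} exceptions={n} {tier}")
    print(f"overall: {overall:.2%}")
```

With the planted errors, COMP_01 and VAL_01 land in PASS, while UNIQ_01 (two rows share id 3) lands in WARN. If the 99.45% figure above is a check-level pass rate like this one (an assumption, not something the table states), it is consistent with the other numbers: 1 − 884 / (10,800 × 15) = 1 − 884 / 162,000 ≈ 99.45%.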
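The exception-management row splits exceptions into auto-remediated and escalated, with root cause analysis by province and dwelling type. A minimal sketch of that triage step follows, assuming one deterministic fix (mapping misspelled province names to canonical codes) and treating everything else as needing a steward. The rule names, field names, fix table, and sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup of known-bad province spellings -> canonical codes.
PROVINCE_FIXES = {"ont": "ON", "ontario": "ON", "que": "QC", "quebec": "QC"}

# Hypothetical sample of rule exceptions; field and rule names are invented.
SAMPLE_EXCEPTIONS = [
    {"rule": "VAL_PROVINCE", "province": " Ontario ", "dwelling_type": "Single"},
    {"rule": "VAL_PROVINCE", "province": "XX", "dwelling_type": "Row"},
    {"rule": "VAL_STARTS", "province": "QC", "dwelling_type": "Apartment"},
    {"rule": "VAL_STARTS", "province": "QC", "dwelling_type": "Apartment"},
]


def triage(exceptions):
    """Split exceptions into auto-remediated and escalated, and rank hotspots."""
    remediated, escalated = [], []
    for exc in exceptions:
        raw = (exc.get("province") or "").strip().lower()
        if exc["rule"] == "VAL_PROVINCE" and raw in PROVINCE_FIXES:
            # Deterministic fix exists: emit a corrected copy of the record.
            remediated.append({**exc, "province": PROVINCE_FIXES[raw]})
        else:
            # No safe automatic fix: route to a steward for root cause analysis.
            escalated.append(exc)
    # Count escalations per (province, dwelling type) to see where issues cluster.
    hotspots = Counter((e.get("province"), e.get("dwelling_type")) for e in escalated)
    return remediated, escalated, hotspots


if __name__ == "__main__":
    fixed, escalated, hotspots = triage(SAMPLE_EXCEPTIONS)
    print(len(fixed), "auto-remediated,", len(escalated), "escalated")
    print(hotspots.most_common())
```

The design choice worth noting is that only exceptions with an unambiguous, rule-specific fix are remediated automatically; anything else is escalated rather than guessed at, and the original records are never mutated.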
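A stewardship RACI matrix is only useful if it answers "who owns this element and who gets called when it breaks" without ambiguity. A common RACI convention is exactly one Accountable and at least one Responsible per item, and a matrix stored as data can be checked for that automatically. The sketch below assumes a simple dictionary layout; the CDE names and role titles are hypothetical, not the project's actual matrix.

```python
# Hypothetical stewardship RACI matrix: CDE -> {role: R/A/C/I}.
RACI = {
    "province_code": {"Data Owner": "A", "Data Steward": "R", "Analyst": "C"},
    "housing_starts": {"Data Owner": "A", "Data Steward": "R", "BI Team": "I"},
    "dwelling_type": {"Data Owner": "A", "Data Steward": "A"},
}


def validate_raci(matrix):
    """Return a list of problems; an empty list means every CDE is well-formed."""
    problems = []
    for cde, roles in matrix.items():
        letters = list(roles.values())
        accountable = letters.count("A")
        if accountable != 1:
            problems.append(f"{cde}: expected exactly one Accountable, found {accountable}")
        if "R" not in letters:
            problems.append(f"{cde}: no Responsible role assigned")
    return problems


if __name__ == "__main__":
    for problem in validate_raci(RACI):
        print(problem)
```

Here `dwelling_type` is deliberately malformed (two Accountable roles, no Responsible), so the check reports two problems for it and none for the other elements.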
End-to-end data pipeline with time series forecasting on Toronto Police open data
Applied data cleaning, exploratory analysis, and forecasting models to 315,362 Major Crime Indicator records (2014–2023).
| Component | Detail |
|---|---|
| Data Pipeline | 347K → 315K records · duplicates removed · divisions reconciled · external features merged |
| EDA | Crime trends by year, month, hour, day · outlier analysis · correlation heatmap |
| ARIMA Model | ARIMA(1,1,1) on log-transformed weekly data · RMSE: 0.097 (log scale) |
| LSTM Model | 50 units · sequence length 10 · training loss: 0.0069 |
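The pipeline and ARIMA rows imply a preparation step (remove duplicates, aggregate to weekly counts, log-transform) and an evaluation step (RMSE). Below is a minimal standard-library sketch of those two steps only. It assumes each record has a unique event identifier and an occurrence date; the field names `event_id` and `occurred` are hypothetical. Fitting the ARIMA(1,1,1) model itself would typically be done with a library such as statsmodels (its `ARIMA` class with `order=(1, 1, 1)`) and is not reproduced here.

```python
import math
from collections import Counter
from datetime import date


def weekly_log_counts(records):
    """Drop duplicate events, count events per ISO week, and log-transform."""
    seen, weekly = set(), Counter()
    for rec in records:
        if rec["event_id"] in seen:
            continue  # duplicate report of an event already counted
        seen.add(rec["event_id"])
        iso_year, iso_week, _ = rec["occurred"].isocalendar()
        weekly[(iso_year, iso_week)] += 1
    # Weeks with no events never appear here; a real series would need them
    # filled before log-transforming (log(0) is undefined).
    return {week: math.log(n) for week, n in sorted(weekly.items())}


def rmse(actual, predicted):
    """Root mean squared error; on a log series the result is in log units."""
    pairs = list(zip(actual, predicted))
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


if __name__ == "__main__":
    records = [
        {"event_id": "E1", "occurred": date(2023, 1, 2)},
        {"event_id": "E1", "occurred": date(2023, 1, 2)},  # duplicate row
        {"event_id": "E2", "occurred": date(2023, 1, 4)},
        {"event_id": "E3", "occurred": date(2023, 1, 10)},
    ]
    print(weekly_log_counts(records))
```

Because the errors are computed on the log series, an RMSE there is in log units rather than crime counts. If the 0.097 above was computed this way, a log-scale error of that size corresponds roughly to a 10% multiplicative error in weekly counts (e^0.097 ≈ 1.10).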
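"Sequence length 10" in the LSTM row refers to how the weekly series is cut into training examples. A minimal sketch, assuming a univariate, one-step-ahead setup (the previous 10 weeks predict the next week), is shown below. The function name is hypothetical. The 50 units are a property of the LSTM layer and are independent of this windowing. Keras-style LSTM layers expect 3-D input of shape (samples, timesteps, features), so these windows would be reshaped to (n, 10, 1) before training.

```python
def make_windows(series, seq_len=10):
    """Turn a 1-D series into (previous seq_len values, next value) training pairs."""
    if len(series) <= seq_len:
        raise ValueError("series must be longer than seq_len")
    n = len(series) - seq_len
    inputs = [list(series[i:i + seq_len]) for i in range(n)]
    targets = [series[i + seq_len] for i in range(n)]
    return inputs, targets


if __name__ == "__main__":
    X, y = make_windows(list(range(15)))
    print(len(X), X[0], y[0])  # 5 windows; first window is 0..9, target 10
```

For time series, the resulting pairs are usually split chronologically (earlier windows for training, later ones for testing) rather than shuffled, so the model is never evaluated on weeks that precede its training data.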
Data pipeline, ML classification, and Power BI analytics for bird conservation
Built the data collection pipeline and ML model for an AI-powered bird species monitoring platform.
| Component | Detail |
|---|---|
| Data Pipeline | Selenium scraper · 288,562 observations · 53 species · 170 countries |
| ML Classifier | EfficientNetB0 · 97.35% accuracy on 10 species · Grad-CAM interpretability |
| Power BI Dashboard | Global species presence map · migration trends · population analytics |
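Between the scraper and the species presence map sits a reshaping step: raw observation rows have to be collapsed into something a map visual can plot. One reasonable shape is a long-format table with one row per species and country plus an observation count. The sketch below assumes each scraped row carries `species` and `country` fields (hypothetical names) and skips incomplete rows instead of mapping them. The scraping itself is not shown.

```python
from collections import defaultdict


def presence_table(observations):
    """Collapse raw observation rows into species x country counts."""
    counts = defaultdict(int)
    for obs in observations:
        species = (obs.get("species") or "").strip()
        country = (obs.get("country") or "").strip()
        if not species or not country:
            continue  # skip incomplete scraped rows instead of mapping them
        counts[(species, country)] += 1
    # One row per (species, country), sorted for stable output.
    return [
        {"species": s, "country": c, "observations": n}
        for (s, c), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [
        {"species": "Barn Swallow", "country": "Canada"},
        {"species": "Barn Swallow", "country": "Canada"},
        {"species": "Barn Swallow", "country": "Mexico"},
        {"species": " ", "country": "Peru"},
    ]
    for row in presence_table(sample):
        print(row)
```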
Data Governance & Quality ← Core expertise
Programming & Data Tools
Machine Learning & AI
If you find any of my projects useful, please consider giving them a ⭐ — it means a lot!

