rkdhakal/README.md

Hi, I'm Ram Krishna Dhakal 👋

Data Quality & Governance Analyst | Canada

It started with a bad number. During my Data Quality Analyst internship at CMHC (Sept–Dec 2025), I watched a single bad record work its way through a live reporting pipeline before anyone caught it. That was the moment data governance stopped being an abstract discipline for me and became the thing I wanted to build a career on: not just cleaning data, but designing the rules, catalogs, and stewardship structures that catch problems like that before they reach a decision-maker.

Working hands-on with Informatica IDMC, Collibra Data Intelligence Cloud, and Databricks SQL in a federal regulatory environment taught me that governance is equal parts technical and human. The SQL rules matter, but so does knowing exactly who owns each data element and who gets called when it breaks.

Since then, I've applied the same rigor independently, on my own time and on public datasets: full DQ rule catalogs, lineage diagrams, and stewardship RACI matrices, built with open-source tools that mirror what enterprise platforms do. Beyond governance, I also apply Python and SQL to time series forecasting and ML pipelines, because understanding how data gets used downstream makes me better at protecting it upstream.

  • 🌍 Based in Toronto, Ontario · open to roles across Canada
  • 💼 Open to Data Governance · Data Quality · Data Stewardship · Data Analyst roles
  • 🧠 Preparing for CDMP Foundation — DAMA International
  • ✉️ [email protected]

🗂️ Featured Projects


Solo project, designed and built end to end — the flagship project in this portfolio

Enterprise governance programs follow well-established patterns: metadata catalogs, lineage, stewardship, and DQ rule engines. I wanted to prove I could own that whole lifecycle myself, so I built a complete program end to end on a real public Canadian housing dataset (10,800 records · 10 provinces · 2018–2023).

Governance deliverables:

| Component | Detail |
|---|---|
| Data Quality Rules | 15 SQL-based rules across 5 dimensions: completeness, validity, uniqueness, accuracy, consistency |
| DQ Score | 99.45% overall (Grade A) · 9 PASS · 6 WARN · 0 FAIL |
| Exception Management | 884 exceptions · 424 auto-remediated · 460 escalated with root cause analysis by province and dwelling type |
| Critical Data Elements | 6 CDEs with column-level data lineage across a 5-layer source-to-consumption pipeline |
| Metadata Catalog | Data dictionary · Business glossary · Sensitivity classifications · Stewardship RACI matrix |
| Data Contract | YAML-based producer-consumer agreement with tiered SLA thresholds and 15 mapped DQ rules |
| Regulatory Compliance | PIPEDA and OSFI B-20 applicability assessment |
| Dashboard | Streamlit Cloud: executive scorecard, exception explorer, live contract validator |
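To illustrate how SQL-based rules can roll up into a tiered PASS / WARN / FAIL scorecard, here is a minimal sketch using Python's built-in `sqlite3`. The table layout, the three rule queries, the rule IDs, the sample rows, and the PASS ≥ 99% / WARN ≥ 95% thresholds are all hypothetical; they are not taken from the project's actual rule catalog, SLA tiers, or scoring method. Each rule query counts the rows that fail it, and a rule's pass rate is one minus failed rows over total rows.

```python
import sqlite3

# Hypothetical tier thresholds, not the project's actual SLA values.
PASS_AT = 0.99  # pass rate at or above this -> PASS
WARN_AT = 0.95  # pass rate at or above this (but below PASS_AT) -> WARN

# Hypothetical rules: each query returns the number of rows that FAIL the rule.
RULES = {
    "DQ01_completeness_province": (
        "SELECT COUNT(*) FROM housing WHERE province IS NULL"
    ),
    "DQ02_validity_price_positive": (
        "SELECT COUNT(*) FROM housing WHERE price <= 0"
    ),
    # Extra copies of a record_id count as failures (n - 1 per duplicated id).
    "DQ03_uniqueness_record_id": (
        "SELECT COALESCE(SUM(n - 1), 0) FROM ("
        " SELECT COUNT(*) AS n FROM housing"
        " GROUP BY record_id HAVING COUNT(*) > 1)"
    ),
}


def tier(pass_rate):
    """Map a pass rate in [0, 1] to a PASS / WARN / FAIL status."""
    if pass_rate >= PASS_AT:
        return "PASS"
    if pass_rate >= WARN_AT:
        return "WARN"
    return "FAIL"


def scorecard(conn):
    """Run every rule; return per-rule (failed, pass_rate, status) and the mean pass rate."""
    total = conn.execute("SELECT COUNT(*) FROM housing").fetchone()[0]
    results = {}
    for rule_id, sql in RULES.items():
        failed = conn.execute(sql).fetchone()[0]
        rate = 1 - failed / total
        results[rule_id] = (failed, rate, tier(rate))
    overall = sum(rate for _, rate, _ in results.values()) / len(results)
    return results, overall


def build_sample_db():
    """Build a small in-memory table with a few deliberate defects (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE housing (record_id INTEGER, province TEXT, price REAL)")
    rows = [(i, "ON", 500_000.0) for i in range(192)]
    rows.append((0, "ON", 500_000.0))                       # duplicate record_id
    rows += [(192 + i, None, 500_000.0) for i in range(6)]  # missing province
    rows.append((198, "BC", -1.0))                          # non-positive price
    conn.executemany("INSERT INTO housing VALUES (?, ?, ?)", rows)
    return conn


results, overall = scorecard(build_sample_db())
for rule_id, (failed, rate, status) in results.items():
    print(f"{rule_id}: {failed} failed, {rate:.2%} -> {status}")
print(f"overall: {overall:.2%}")
```

With this sample data, the completeness rule lands in WARN (6 of 200 rows fail) while the validity and uniqueness rules pass. An unweighted mean of pass rates is only one possible overall score; a real program might instead weight rules by the criticality of the data elements they cover.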

🔗 Explore this project →


End-to-end data pipeline with time series forecasting on Toronto Police open data

Applied data cleaning, exploratory analysis, and forecasting models to 315,362 Major Crime Indicator records (2014–2023).

| Component | Detail |
|---|---|
| Data Pipeline | 347K → 315K records · duplicates removed · divisions reconciled · external features merged |
| EDA | Crime trends by year, month, hour, and day · outlier analysis · correlation heatmap |
| ARIMA Model | ARIMA(1,1,1) on log-transformed weekly data · RMSE: 0.097 |
| LSTM Model | 50 units · sequence length 10 · training loss: 0.0069 |
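The ARIMA and LSTM rows compress several preprocessing steps into a few words. Below is a minimal, dependency-free sketch of those steps, assuming incident dates as input, ISO-week bucketing, and a natural-log transform (none of which are confirmed details of the project's pipeline): aggregate to weekly counts, log-transform, take the first difference that the middle "1" in ARIMA(1,1,1) refers to, cut length-10 windows for a sequence model, and score forecasts with RMSE. The function names and sample dates are hypothetical, and model fitting itself (for example with statsmodels or TensorFlow) is left out.

```python
import math
from collections import Counter
from datetime import date, timedelta


def weekly_counts(dates):
    """Count incidents per ISO (year, week), in chronological order.

    Weeks with no incidents are simply absent here; a real pipeline would fill them with 0.
    """
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    return [counts[week] for week in sorted(counts)]


def log_transform(series):
    """Natural log of each count (assumes every count is positive)."""
    return [math.log(x) for x in series]


def difference(series):
    """First difference y[t] - y[t-1]: the transformation the d=1 term in ARIMA(1,1,1) applies."""
    return [b - a for a, b in zip(series, series[1:])]


def make_windows(series, seq_len=10):
    """Pair each run of seq_len consecutive values with the value that follows it."""
    return [(series[i:i + seq_len], series[i + seq_len])
            for i in range(len(series) - seq_len)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical input: one incident per day for 12 weeks starting Monday 2023-01-02.
dates = [date(2023, 1, 2) + timedelta(days=i) for i in range(84)]
log_weekly = log_transform(weekly_counts(dates))
windows = make_windows(log_weekly, seq_len=10)
print(len(log_weekly), len(windows))  # 12 weekly values -> 2 training windows
```

When forecasts are scored on log-transformed values, the RMSE is in log units, so it is not directly comparable to an error measured in raw weekly counts.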

🔗 Explore this project →


Data pipeline, ML classification, and Power BI analytics for bird conservation

Built the data collection pipeline and ML model for an AI-powered bird species monitoring platform.

| Component | Detail |
|---|---|
| Data Pipeline | Selenium scraper · 288,562 observations · 53 species · 170 countries |
| ML Classifier | EfficientNetB0 · 97.35% accuracy on 10 species · Grad-CAM interpretability |
| Power BI Dashboard | Global species presence map · migration trends · population analytics |
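For readers unfamiliar with Grad-CAM, the standard formulation is sketched below; which EfficientNetB0 layer this project used is not stated here. For a target class c with pre-softmax score y^c and feature maps A^k from a chosen convolutional layer (Z spatial positions per map), the gradients are global-average-pooled into per-channel weights, and the heatmap is the ReLU of the weighted sum of the feature maps:

```latex
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^{c} A^{k} \right)
```

The ReLU keeps only regions that increase the class score; the resulting coarse map is then upsampled to the input image size for overlay.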

🔗 Explore this project →


🛠️ Skills

Data Governance & Quality (core expertise)

Informatica · Collibra · DAMA · Data Quality · Metadata Management · Data Lineage · Data Contracts · Data Stewardship

Programming & Data Tools

Python · SQL · PySpark · Databricks · Power BI · Streamlit · Git

Machine Learning & AI

TensorFlow · scikit-learn · FastAPI · FAISS


🤝 Connect

LinkedIn · Email · GitHub


If you find any of my projects useful, please consider giving them a ⭐ — it means a lot!

Popular repositories

  1. Retail-Industry-Project

    Price-Optimization-For-Retail

    Python 1

  2. Project_PDDuCNN_Python

    Jupyter Notebook 3

  3. Plant-Disease-Frontend-

    Forked from electronicshackers/Plant-Disease-Frontend-

    Frontend For Final Year Project

    JavaScript

  4. PlantVillageAPI

    Forked from electronicshackers/PlantVillageAPI

    Simple Rest API For Plant Village

    JavaScript

  5. Ecoeye-bird-monitoring

    EcoEye bird monitoring - Selenium data pipeline, EfficientNetB0 species classifier (97.35% accuracy), and Power BI dashboard across 288,562 observations and 53 species.

    Jupyter Notebook

  6. canadian-housing-data-governance

    End-to-end Data Governance & Quality Framework on Canadian housing data | Metadata Catalog · Data Lineage · Stewardship · DQ Rules · Scorecard

    Python