shahsmit2121/sql-data-engineering-with-duckdb

🚀 Data Engineering & Analytics Projects (SQL + DuckDB)

This repository showcases two end-to-end projects focused on data engineering and data analytics using SQL.

The work demonstrates my ability to:

  • Design and build data pipelines
  • Model data using star schema architecture
  • Write efficient analytical SQL queries
  • Transform raw data into actionable insights


📌 Projects Overview

🔍 1. Exploratory Data Analysis (EDA): Job Market Analytics

📂 1_EDA

🔧 What I Did

  • Queried a data warehouse (star schema) to answer business questions
  • Built analytical queries to identify:
    • Most in-demand skills
    • Highest-paying skills
    • Optimal skills (balancing demand & salary)
  • Used multi-table joins across fact and dimension tables
  • Created derived metrics using aggregation and mathematical functions
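As a rough illustration of the "optimal skills" idea, here is a minimal sketch of a query that joins a fact table to a skills dimension through a bridge table and combines demand with median salary. The table names, column names, sample rows, and the scoring formula (log of demand times median salary) are all assumptions made for this example, not the repository's actual schema or metric. Run it in a fresh in-memory DuckDB session.

```sql
-- Hypothetical miniature star schema; every name and row here is illustrative.
CREATE TABLE skills_dim (skill_id INTEGER, skill_name VARCHAR);
CREATE TABLE job_postings_fact (job_id INTEGER, salary_year_avg DOUBLE);
CREATE TABLE skills_job_bridge (job_id INTEGER, skill_id INTEGER);

INSERT INTO skills_dim VALUES (1, 'sql'), (2, 'python'), (3, 'scala');
INSERT INTO job_postings_fact VALUES
    (101, 110000), (102, 125000), (103, 98000), (104, 150000);
INSERT INTO skills_job_bridge VALUES
    (101, 1), (101, 2), (102, 1), (103, 1), (103, 2), (104, 3);

-- Demand, pay, and a combined score per skill.
-- LN() dampens demand so that very common skills do not dominate purely on volume;
-- note that a skill with a single posting gets LN(1) = 0 under this assumed formula.
SELECT
    s.skill_name,
    COUNT(*)                                   AS demand_count,
    ROUND(MEDIAN(f.salary_year_avg), 0)        AS median_salary,
    ROUND(LN(COUNT(*)) * MEDIAN(f.salary_year_avg) / 1000, 2) AS optimal_score
FROM job_postings_fact AS f
JOIN skills_job_bridge AS b ON b.job_id = f.job_id
JOIN skills_dim        AS s ON s.skill_id = b.skill_id
WHERE f.salary_year_avg IS NOT NULL
GROUP BY s.skill_name
ORDER BY optimal_score DESC;
```

The bridge table is what lets one posting list several skills without duplicating salary data in the fact table. Any weighting between demand and salary is a judgment call, so the score should be read as one possible ranking rather than a definitive one.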

🧰 Tools & Concepts Used

  • SQL (joins, aggregations, filtering, grouping)
  • DuckDB (analytical query engine)
  • Star schema (fact + dimension + bridge tables)
  • Functions like COUNT(), MEDIAN(), LN(), ROUND()

📈 What I Learned

  • How to translate business questions into SQL queries
  • Writing efficient analytical queries on structured data
  • Understanding trade-offs between demand and compensation
  • Working with real-world messy datasets

🏗️ 2. Data Warehouse & Data Mart Build (ETL Pipeline)

📂 2_Mart_Build_DW

🔧 What I Did

  • Built an end-to-end ETL pipeline
  • Designed fact, dimension, and bridge tables
  • Created multiple data marts
  • Implemented incremental updates using MERGE
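An incremental update with MERGE can be sketched roughly as follows. This assumes a DuckDB version that supports `MERGE INTO` (1.4 or later, to my knowledge), and the staging and dimension tables are hypothetical stand-ins rather than the tables used in this repository.

```sql
-- Hypothetical target dimension and staging table with identical column layouts.
CREATE TABLE company_dim     (company_id INTEGER, company_name VARCHAR);
CREATE TABLE company_staging (company_id INTEGER, company_name VARCHAR);

INSERT INTO company_dim     VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO company_staging VALUES (2, 'Globex Corp'), (3, 'Initech');

-- Upsert: rows whose key already exists are updated, new keys are inserted.
-- The bare UPDATE / INSERT forms take their values from the source row,
-- which relies on both tables sharing the same columns.
MERGE INTO company_dim
    USING company_staging AS src
    ON (src.company_id = company_dim.company_id)
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED THEN INSERT;

-- Inspect the result of the merge.
SELECT * FROM company_dim ORDER BY company_id;
```

Compared with dropping and rebuilding the table on every run, a merge only touches rows that are new or changed, which is the usual motivation for incremental loads.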

🧰 Tools & Concepts Used

  • DuckDB
  • SQL (DDL + DML)
  • ETL pipeline design
  • Star schema & dimensional modeling
  • Incremental processing
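To make the modeling terms concrete, the sketch below shows one way a fact table, two dimensions, a bridge table, and a small data mart could be declared in DuckDB. All schema, table, and column names are assumptions chosen for illustration, and the tables are left empty; run it in a fresh in-memory session.

```sql
-- Dimensions: descriptive attributes, one row per entity (hypothetical names).
CREATE TABLE company_dim (company_id INTEGER PRIMARY KEY, company_name VARCHAR);
CREATE TABLE skills_dim  (skill_id   INTEGER PRIMARY KEY, skill_name   VARCHAR);

-- Fact: one row per job posting, with measures and foreign keys to dimensions.
CREATE TABLE job_postings_fact (
    job_id          INTEGER PRIMARY KEY,
    company_id      INTEGER REFERENCES company_dim (company_id),
    posted_date     DATE,
    salary_year_avg DOUBLE
);

-- Bridge: resolves the many-to-many relationship between postings and skills.
CREATE TABLE skills_job_bridge (
    job_id   INTEGER REFERENCES job_postings_fact (job_id),
    skill_id INTEGER REFERENCES skills_dim (skill_id),
    PRIMARY KEY (job_id, skill_id)
);

-- Data mart: a pre-aggregated table in its own schema for one analytical use case.
CREATE SCHEMA IF NOT EXISTS skills_mart;

CREATE OR REPLACE TABLE skills_mart.skill_demand_monthly AS
SELECT
    s.skill_name,
    date_trunc('month', f.posted_date) AS posting_month,
    COUNT(*)                           AS postings
FROM job_postings_fact AS f
JOIN skills_job_bridge AS b ON b.job_id = f.job_id
JOIN skills_dim        AS s ON s.skill_id = b.skill_id
GROUP BY ALL;
```

Keeping marts in a separate schema is one common convention for separating the core warehouse from downstream, use-case-specific tables; other layouts work equally well.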

📈 What I Learned

  • Designing production-style data pipelines
  • Importance of data modeling
  • Writing modular SQL
  • Handling incremental updates

🧱 Overall Tech Stack

  • 🐤 DuckDB
  • 🧮 SQL
  • ☁️ Google Cloud Storage
  • 🛠️ VS Code
  • 📦 Git & GitHub

📂 Repository Structure

.
├── 1_EDA/
├── 2_Mart_Build_DW/
├── Resources/Images
└── README.md

💡 Key Takeaways

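  • Built a full data workflow, from a star-schema warehouse and data marts through to analytical queries
  • Demonstrated both analytics (EDA on job market data) and engineering (an ETL pipeline with incremental updates)
  • Applied real-world practices such as dimensional modeling, modular SQL, and MERGE-based incremental loads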

🎯 Future Improvements

  • Add dashboards
  • Automate pipeline
  • Optimize queries

📬 Contact

Feel free to connect!

About

End-to-end Data Engineering & Analytics projects using SQL and DuckDB — featuring a Job Market EDA with star schema querying and a full ETL pipeline with dimensional modeling, data marts, and incremental updates.
