This project demonstrates comprehensive data cleaning and exploratory data analysis techniques using SQL on a world layoffs dataset. The project includes systematic data cleaning operations to prepare raw data for analysis, followed by in-depth EDA to uncover patterns, trends, and insights about global layoffs from 2020 to 2023.
The dataset is stored in worldlayoffs.db, a SQLite database containing global company layoffs data with fields like company name, location, industry, layoff counts, dates, and funding information.
- SQLite3 installed on your system
- Basic knowledge of SQL and data cleaning concepts
Clone the repository:
git clone https://github.com/mrithip/world-layoffs-data-cleaning-sql.git
cd world-layoffs-data-cleaning-sql
Ensure SQLite3 is installed:
sqlite3 --version
Run the data cleaning script:
sqlite3 worldlayoffs.db < datacleaning.sql
Run the EDA script to explore cleaned data patterns:
sqlite3 worldlayoffs.db < EDA.sql
View the comprehensive insights and visualizations in insights.md.
The datacleaning.sql script performs the following operations:
- Date Standardization: Converts layoff dates from MM/DD/YYYY format to ISO YYYY-MM-DD format using SQLite date functions
- Staging Table Creation: Creates backup tables to preserve the original data during the cleaning process
- Duplicate Removal: Identifies and removes duplicate records using ROW_NUMBER() window function and CTE
- Text Standardization: Trims whitespace from company and industry names for consistency
- Industry Normalization: Consolidates similar industry names (e.g., all Crypto variations to "Crypto")
- Country Standardization: Removes trailing punctuation from country names (e.g., "United States." to "United States")
- Missing Data Handling: Populates null industry values using self-joins to match companies with existing industry data
- Data Quality Filtering: Removes rows with no useful information (null layoff counts and percentages)
- Final Cleanup: Removes temporary columns and presents the clean dataset
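The core cleaning steps can be sketched in miniature with Python's built-in `sqlite3` module. This is an illustrative sketch against a toy table, not the actual datacleaning.sql: column names mirror the schema listed later in this README, but the real script's table and column names may differ.

```python
import sqlite3

# In-memory toy table standing in for the raw layoffs data (hypothetical names).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE layoffs_staging (
    company TEXT, industry TEXT, total_laid_off INTEGER, layoff_date TEXT
);
INSERT INTO layoffs_staging VALUES
    ('Acme ', 'Crypto Currency', 100, '03/15/2022'),
    ('Acme ', 'Crypto Currency', 100, '03/15/2022'),  -- exact duplicate
    ('Beta',  'Retail',          50,  '01/02/2023');
""")

con.executescript("""
-- Duplicate removal: ROW_NUMBER() over the identifying columns inside a CTE,
-- then delete every row numbered 2 or higher.
DELETE FROM layoffs_staging
WHERE rowid IN (
    WITH numbered AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY company, industry, total_laid_off, layoff_date
               ) AS rn
        FROM layoffs_staging
    )
    SELECT rid FROM numbered WHERE rn > 1
);
-- Date standardization: MM/DD/YYYY -> ISO YYYY-MM-DD via substr().
UPDATE layoffs_staging
SET layoff_date = substr(layoff_date, 7, 4) || '-' ||
                  substr(layoff_date, 1, 2) || '-' ||
                  substr(layoff_date, 4, 2);
-- Text standardization and industry normalization.
UPDATE layoffs_staging SET company = TRIM(company);
UPDATE layoffs_staging SET industry = 'Crypto' WHERE industry LIKE 'Crypto%';
""")

rows = con.execute(
    "SELECT company, industry, layoff_date FROM layoffs_staging ORDER BY company"
).fetchall()
print(rows)  # [('Acme', 'Crypto', '2022-03-15'), ('Beta', 'Retail', '2023-01-02')]
```

Window functions such as ROW_NUMBER() require SQLite 3.25 or later.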
The EDA.sql script analyzes the cleaned dataset to uncover patterns and insights about global layoffs. The analysis includes:
- Maximum layoffs in a single event (12,000 employees)
- Companies with 100% layoffs (complete shutdowns)
- Total layoffs by major companies (Amazon: 18,150, Google: 12,000, Meta: 11,000)
- Layoffs timeline: March 2020 - March 2023
- Yearly trends: 2022 peak (160,661 layoffs), followed by 2023 (125,677)
- Monthly patterns: January highest (92,037), November second (55,758)
- Rolling totals using window functions
- Top companies by total layoffs and funding raised
- Industry impact: Consumer (45,182), Retail (43,613), Other (36,289)
- Company funding vs layoffs paradox analysis
- Country breakdown: US (256,559), India (35,993), Netherlands (17,220)
- Stage-wise analysis: Post-IPO (204,132), Series C/D impact
- Year-over-year top 5 companies rankings
- Rolling total calculations using CTEs and window functions
- Dense ranking for comparative analysis across years
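The rolling-total and ranking techniques above can be sketched on a toy table. This is a hedged illustration of the general pattern (CTE + window function), not the exact queries in EDA.sql; the table and column names are assumptions based on the schema below.

```python
import sqlite3

# Toy cleaned table with a handful of rows (hypothetical data).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE layoffs_staging2 (company TEXT, layoff_date TEXT, total_laid_off INTEGER);
INSERT INTO layoffs_staging2 VALUES
    ('A', '2022-01-10', 100),
    ('B', '2022-02-05', 200),
    ('A', '2022-02-20', 50),
    ('B', '2023-01-15', 300);
""")

# Rolling monthly total: substr() pulls the YYYY-MM prefix, a CTE aggregates
# per month, and SUM() OVER an ordered window accumulates the running total.
rolling = con.execute("""
WITH monthly AS (
    SELECT substr(layoff_date, 1, 7) AS month,
           SUM(total_laid_off) AS laid_off
    FROM layoffs_staging2
    GROUP BY month
)
SELECT month, laid_off,
       SUM(laid_off) OVER (ORDER BY month) AS rolling_total
FROM monthly
ORDER BY month
""").fetchall()
print(rolling)  # [('2022-01', 100, 100), ('2022-02', 250, 350), ('2023-01', 300, 650)]

# Year-over-year company ranking with DENSE_RANK() partitioned by year.
ranked = con.execute("""
WITH by_year AS (
    SELECT company, substr(layoff_date, 1, 4) AS year,
           SUM(total_laid_off) AS laid_off
    FROM layoffs_staging2
    GROUP BY company, year
)
SELECT company, year, laid_off,
       DENSE_RANK() OVER (PARTITION BY year ORDER BY laid_off DESC) AS rnk
FROM by_year
ORDER BY year, rnk
""").fetchall()
print(ranked)  # [('B', '2022', 200, 1), ('A', '2022', 150, 2), ('B', '2023', 300, 1)]
```

EDA.sql additionally filters such rankings to the top 5 per year, which the same pattern supports by wrapping the ranked query in another CTE and selecting rows where the rank is at most 5.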
After cleaning, the main table layoffs_staging2 contains standardized columns including:
- company (trimmed and standardized)
- location
- industry (normalized and populated)
- total_laid_off
- percentage_laid_off
- layoff_date (ISO format)
- stage
- country (standardized)
- funds_raised_millions
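The column list corresponds to DDL along these lines. The types here are plausible guesses, not taken from worldlayoffs.db (a CSV import may store every column as TEXT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema matching the column list above; actual types may differ.
con.execute("""
CREATE TABLE layoffs_staging2 (
    company TEXT,
    location TEXT,
    industry TEXT,
    total_laid_off INTEGER,
    percentage_laid_off REAL,
    layoff_date TEXT,   -- ISO YYYY-MM-DD
    stage TEXT,
    country TEXT,
    funds_raised_millions REAL
)
""")
cols = [row[1] for row in con.execute("PRAGMA table_info(layoffs_staging2)")]
print(cols)
```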
- Date parsing and conversion
- Common Table Expressions (CTEs)
- Window functions (ROW_NUMBER)
- Self-joins for data population
- String manipulation functions
- NULL handling strategies
- Duplicate detection and removal
- Aggregate functions (SUM, MAX, COUNT)
- GROUP BY and ORDER BY clauses
- Substring extraction for date/time analysis
- Rolling calculations with window functions
- Dense ranking for comparative analysis
- Multi-level grouping and filtering
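Of these, the self-join for data population is the least common pattern, so here is a minimal sketch of the idea: a row with a NULL industry inherits the industry recorded on another row for the same company. The table and column names are assumptions mirroring the schema above, not the exact statement from datacleaning.sql.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE layoffs_staging2 (company TEXT, industry TEXT);
INSERT INTO layoffs_staging2 VALUES
    ('Acme', 'Retail'),
    ('Acme', NULL),   -- will be filled from the other Acme row
    ('Beta', NULL);   -- stays NULL: no other row to borrow from
""")

# Correlated-subquery form of the self-join fill: for each NULL-industry row,
# look up a non-NULL industry from another row with the same company.
con.execute("""
UPDATE layoffs_staging2 AS t1
SET industry = (
    SELECT t2.industry
    FROM layoffs_staging2 AS t2
    WHERE t2.company = t1.company AND t2.industry IS NOT NULL
)
WHERE t1.industry IS NULL
""")

rows = con.execute(
    "SELECT company, industry FROM layoffs_staging2 ORDER BY company"
).fetchall()
print(rows)  # [('Acme', 'Retail'), ('Acme', 'Retail'), ('Beta', None)]
```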
Feel free to submit issues and enhancement requests.