📊 Data Science & Preprocessing Practice Portfolio

Welcome to my Data Science and Analytics practice repository! This repository serves as a centralized hub for all my exploratory data analysis (EDA), data cleaning, text parsing, feature engineering, and preprocessing projects.

The main goal of this repository is to track my learning journey, build solid algorithmic thinking, and showcase clean, industry-standard data processing workflows.


🛠️ Tech Stack & Tools Used

  • Language: Python
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Plotly
  • Environment: Jupyter Notebook / VS Code

📂 Project Portfolio (Current Tracks)

🚗 1. CarDekho Dataset - Preprocessing & Text Parsing

  • File: car_project.ipynb / cardataset.csv
  • Description: A data cleaning and feature engineering project on 8,000+ car records. It parses numeric values out of unit-suffixed strings (CC, bhp, kmpl), imputes missing values with column medians, extracts brand categories, and engineers time-based features such as car_age. It also covers outlier handling techniques and categorical encoding (one-hot and ordinal mapping).
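
The parsing, imputation, and encoding steps described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the sample rows, the column names (name, year, mileage, engine, max_power, owner), and the reference year used for car_age are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not read from cardataset.csv.
df = pd.DataFrame({
    "name": ["Maruti Swift Dzire VDI", "Hyundai i20 Sportz", "Honda City VX"],
    "year": [2014, 2017, 2012],
    "mileage": ["23.4 kmpl", "18.6 kmpl", None],
    "engine": ["1248 CC", "1197 CC", "1497 CC"],
    "max_power": ["74 bhp", None, "116.3 bhp"],
    "owner": ["First Owner", "Second Owner", "First Owner"],
})

# Pull the leading number out of strings such as "1248 CC" or "74 bhp",
# then fill any gaps with the column median.
for col in ["mileage", "engine", "max_power"]:
    df[col] = df[col].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
    df[col] = df[col].fillna(df[col].median())

# Brand is assumed to be the first word of the listing name.
df["brand"] = df["name"].str.split().str[0]

# Time-based feature; the reference year is an arbitrary assumption.
REFERENCE_YEAR = 2024
df["car_age"] = REFERENCE_YEAR - df["year"]

# Ordinal mapping for a category with a natural order.
owner_order = {"First Owner": 1, "Second Owner": 2}
df["owner_rank"] = df["owner"].map(owner_order)

# One-hot encoding for a category without a natural order.
encoded = pd.get_dummies(df, columns=["brand"])
print(encoded.head())
```

Ordinal mapping suits ownership history because "First Owner" and "Second Owner" have a natural order, while brand has none and is better served by one-hot columns.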

🤖 2. Google Play Store Dataset Analysis

  • File: google_playstore.ipynb / google_play_store_dataset.csv
  • Description: Exploratory data analysis (EDA) focused on app store dynamics. Cleans raw install counts, ratings, app sizes, and prices so their distributions can be summarized. Includes estimating revenue for paid apps and identifying high-volume market segments.
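
As a rough sketch of the cleaning and revenue estimate described above, assuming the raw columns hold strings such as "10,000+", "$4.99", and "19M". The column names, the sample rows, and the size_to_mb helper are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Hypothetical sample rows; column names and value formats are assumptions.
apps = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["$4.99", "0", "$1.50"],
    "Size": ["19M", "Varies with device", "512k"],
})

# Strip thousands separators and the trailing "+" before converting to integers.
apps["Installs"] = apps["Installs"].str.replace(r"[,+]", "", regex=True).astype(int)

# Drop the currency symbol and convert to float.
apps["Price"] = apps["Price"].str.replace("$", "", regex=False).astype(float)


def size_to_mb(size: str) -> float:
    """Convert '19M' or '512k' to megabytes; anything else becomes NaN."""
    if size.endswith("M"):
        return float(size[:-1])
    if size.endswith("k"):
        return float(size[:-1]) / 1024
    return float("nan")


apps["Size_MB"] = apps["Size"].map(size_to_mb)

# Rough revenue estimate: install count times price (free apps yield 0).
apps["Est_Revenue"] = apps["Installs"] * apps["Price"]
print(apps[["App", "Installs", "Price", "Size_MB", "Est_Revenue"]])
```

If the install counts are bucketed values like "10,000+", the product of installs and price is better read as a lower bound than as an exact figure.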

🏒 3. Airbnb NYC 2019 Data Analysis

  • File: AB_NYC_2019.ipynb / AB_NYC_2019.csv
  • Description: Spatial and financial analysis of the New York City Airbnb housing market. Profiles neighborhood groups, investigates the right-skewed price distribution, maps price density across listing coordinates, and filters listings by availability.
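
One common way to examine a right-skewed price column is to compare its skewness before and after a log transform. The sketch below illustrates that idea alongside a per-group median and an availability filter. The sample rows and column names (neighbourhood_group, price, availability_365) are assumptions, and the log transform is a standard technique offered here for illustration, not necessarily the approach taken in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Queens", "Manhattan"],
    "price": [150, 225, 89, 60, 75, 2000],
    "availability_365": [365, 0, 120, 30, 200, 10],
})

# Positive skewness indicates a long right tail (a few very expensive listings).
price_skew = listings["price"].skew()

# log1p compresses the tail and stays defined for a price of 0.
listings["log_price"] = np.log1p(listings["price"])
log_skew = listings["log_price"].skew()

# The median is less sensitive to extreme prices than the mean.
median_by_group = listings.groupby("neighbourhood_group")["price"].median()

# Keep only listings that are bookable on at least one day of the year.
available = listings[listings["availability_365"] > 0]

print(f"skew raw={price_skew:.2f}, log={log_skew:.2f}")
print(median_by_group)
```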

🛒 4. Supermarket Sales Analysis

  • File: SuperMarketAnalysis.ipynb / SuperMarketAnalysis.csv
  • Description: A business analytics pipeline on a retail sales dataset. Sorts transactions chronologically, analyzes product lines by customer gender, computes gross income figures, and compares customer ratings across payment methods.
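
The sorting and aggregation steps above might look roughly like this in pandas. The sample rows, the column names, and the month/day/year date format are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names and date format are assumptions.
sales = pd.DataFrame({
    "Date": ["3/8/2019", "1/5/2019", "2/20/2019", "1/27/2019"],
    "Product line": ["Health and beauty", "Electronic accessories",
                     "Health and beauty", "Sports and travel"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card"],
    "gross income": [10.0, 5.0, 20.0, 15.0],
    "Rating": [9.0, 7.0, 8.0, 5.0],
})

# Parse the date strings so sorting is chronological rather than alphabetical.
sales["Date"] = pd.to_datetime(sales["Date"], format="%m/%d/%Y")
sales = sales.sort_values("Date").reset_index(drop=True)

# Count transactions per product line, split by gender.
line_by_gender = pd.crosstab(sales["Product line"], sales["Gender"])

# Total gross income per product line.
income_by_line = sales.groupby("Product line")["gross income"].sum()

# Average customer rating per payment method.
rating_by_payment = sales.groupby("Payment")["Rating"].mean()

print(line_by_gender)
print(income_by_line)
print(rating_by_payment)
```

Parsing the dates first matters: sorting the raw strings would place "1/27/2019" before "1/5/2019".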

🗂️ How to Run the Notebooks

To set up and run these projects locally, follow these steps:

  1. Clone this repository to your local directory:
     git clone https://github.com/code-with-ayyan/Data-Science-practice-projects.git
  2. Navigate into the repository folder:
     cd Data-Science-practice-projects/
  3. Launch Jupyter Notebook to review the source files:
     jupyter notebook

Maintained with consistency and passion for backend development and data engineering.
