Welcome to my Data Science and Analytics practice repository! This repository serves as a centralized hub for all my exploratory data analysis (EDA), data cleaning, text parsing, feature engineering, and preprocessing projects.
The main goal of this repository is to track my learning journey, build solid algorithmic thinking, and showcase clean, industry-standard data processing workflows.
- Language: Python
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Plotly
- Environment: Jupyter Notebook / VS Code
- File: `car_project.ipynb` / `cardataset.csv`
- Description: A data cleaning and feature engineering project on 8,000+ car records. It covers parsing numeric values out of alphanumeric strings (units such as `CC`, `bhp`, and `kmpl`), filling missing numeric values with column medians, extracting brand categories, and engineering time-based features such as `car_age`. It also explores outlier handling techniques and categorical encoding (one-hot and ordinal mapping).
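A minimal sketch of the cleaning steps described above, run on a few made-up rows rather than the real `cardataset.csv`. The column names (`name`, `year`, `mileage`, `engine`, `max_power`, `owner`, `fuel`), the reference year, and the owner ranking are assumptions for illustration and may not match the notebook.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not read from cardataset.csv.
cars = pd.DataFrame({
    "name": ["Maruti Swift Dzire VDI", "Hyundai i20 Sportz", "Honda City V"],
    "year": [2014, 2017, 2010],
    "mileage": ["23.4 kmpl", None, "17.0 kmpl"],
    "engine": ["1248 CC", "1197 CC", None],
    "max_power": ["74 bhp", "81.86 bhp", "118 bhp"],
    "owner": ["First Owner", "Second Owner", "First Owner"],
    "fuel": ["Diesel", "Petrol", "Petrol"],
})

for col in ["mileage", "engine", "max_power"]:
    # Pull the leading number out of strings such as "1248 CC" or "23.4 kmpl".
    cars[col] = cars[col].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
    # Fill missing values with the column median.
    cars[col] = cars[col].fillna(cars[col].median())

# Brand is taken to be the first word of the car name.
cars["brand"] = cars["name"].str.split().str[0]

# Age relative to a fixed reference year (an assumed value).
REFERENCE_YEAR = 2024
cars["car_age"] = REFERENCE_YEAR - cars["year"]

# Ordinal mapping for an ordered category, one-hot encoding for an unordered one.
owner_rank = {"First Owner": 1, "Second Owner": 2, "Third Owner": 3}
cars["owner_rank"] = cars["owner"].map(owner_rank)
cars = pd.get_dummies(cars, columns=["fuel"], prefix="fuel")
```

The median is used for imputation because it is less sensitive to extreme values than the mean.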
- File: `google_playstore.ipynb` / `google_play_store_dataset.csv`
- Description: Exploratory data analysis (EDA) focused on app store dynamics. Cleaned and processed raw install counts, ratings, app sizes, and prices to summarize their distributions. Includes estimating revenue for paid apps and identifying high-volume market segments.
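A rough sketch of the install and price cleaning described above, on made-up rows. The column names, the value formats (`"10,000+"`, `"$4.99"`), and the revenue estimate (installs times price) are assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Made-up sample rows; column names and value formats are assumed.
apps = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Category": ["GAME", "TOOLS", "GAME"],
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["$4.99", "0", "$0.99"],
})

# Strip "," and "+" from install counts, and "$" from prices.
apps["Installs"] = apps["Installs"].str.replace(r"[,+]", "", regex=True).astype(int)
apps["Price"] = apps["Price"].str.replace("$", "", regex=False).astype(float)

# Rough revenue estimate for paid apps: installs times price.
paid = apps[apps["Price"] > 0].copy()
paid["est_revenue"] = paid["Installs"] * paid["Price"]

# Total installs per category, largest first, to spot high-volume segments.
installs_by_category = (
    apps.groupby("Category")["Installs"].sum().sort_values(ascending=False)
)
```

Because install counts in this format are bucketed lower bounds, a figure computed this way is at best an order-of-magnitude estimate.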
- File: `AB_NYC_2019.ipynb` / `AB_NYC_2019.csv`
- Description: Spatial and financial analysis of the New York City Airbnb market. Focused on profiling neighborhood groups, investigating right-skewed price distributions, mapping price density across coordinates, and filtering by availability to draw insights about the market.
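A small sketch of how the right-skewed prices and neighborhood profiles mentioned above could be examined, using made-up rows. The column names (`neighbourhood_group`, `price`, `availability_365`) and the log transform are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names are assumed.
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "price": [150, 1200, 90, 110, 70],
    "availability_365": [365, 30, 0, 200, 120],
})

# Right skew: a few expensive listings pull the mean above the median.
mean_price = listings["price"].mean()
median_price = listings["price"].median()
skewness = listings["price"].skew()

# log1p compresses the long right tail, which helps when plotting.
listings["log_price"] = np.log1p(listings["price"])

# Profile each neighbourhood group; the median is robust to outliers.
profile = listings.groupby("neighbourhood_group")["price"].agg(["count", "median"])

# Keep only listings that are available at least one day a year.
available = listings[listings["availability_365"] > 0]
```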
- File: `SuperMarketAnalysis.ipynb` / `SuperMarketAnalysis.csv`
- Description: A business analytics pipeline for a retail dataset. Sorted records chronologically by date, analyzed product lines by customer gender, calculated gross income figures, and compared payment methods against aggregate customer ratings.
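A minimal sketch of the sorting and aggregation steps described above, on made-up rows. The column names and the `month/day/year` date format are assumptions and may differ from `SuperMarketAnalysis.csv`.

```python
import pandas as pd

# Made-up sample rows; column names and date format are assumed.
sales = pd.DataFrame({
    "Date": ["3/8/2019", "1/5/2019", "2/20/2019", "1/27/2019"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Product line": [
        "Health and beauty",
        "Electronic accessories",
        "Health and beauty",
        "Sports and travel",
    ],
    "gross income": [26.14, 3.82, 16.22, 23.29],
    "Payment": ["Ewallet", "Cash", "Cash", "Ewallet"],
    "Rating": [9.1, 9.6, 7.4, 8.4],
})

# Parse dates before sorting; raw strings would sort lexically, not by time.
sales["Date"] = pd.to_datetime(sales["Date"], format="%m/%d/%Y")
sales = sales.sort_values("Date").reset_index(drop=True)

# Number of transactions per product line and gender.
line_by_gender = pd.crosstab(sales["Product line"], sales["Gender"])

# Total gross income per product line.
income_by_line = sales.groupby("Product line")["gross income"].sum()

# Average customer rating per payment method.
rating_by_payment = sales.groupby("Payment")["Rating"].mean()
```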
To set up and run these projects locally, follow these steps:
- Clone this repository to your local directory:
  `git clone https://github.com/code-with-ayyan/Data-Science-practice-projects.git`
- Navigate into the repository folder:
  `cd Data-Science-practice-projects/`
- Launch Jupyter Notebook to review the source files:
  `jupyter notebook`

Maintained with consistency and passion for backend development and data engineering.