Data Engineering

A shared workspace for code, experiments, and data pipelines.

Projects

Project 1 — Reading CSV Files with Pandas — load a real aqueous-solubility dataset (AQSolDB) into a pandas DataFrame and explore it with shape, dtypes, summary stats, and filtering.
Project 2 — Summary Statistics & Outlier Detection — compute quartiles and the IQR and implement Tukey's outlier rule from scratch on the Palmer Penguins dataset, discovering why outliers only surface once you group by species.
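As a rough illustration of the idea behind Project 2, the sketch below applies Tukey's rule (flag values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR) to pooled data and then per group. It is not the project's actual code: the helper names are hypothetical, the quartiles use linear interpolation (an assumption; other quantile conventions give slightly different fences), and the numbers are made up rather than taken from Palmer Penguins.

```python
from collections import defaultdict


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    if not sorted_values:
        raise ValueError("quantile() needs at least one value")
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def tukey_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    ordered = sorted(values)
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def tukey_outliers_by_group(rows, k=1.5):
    """rows is an iterable of (group, value) pairs; returns {group: [outliers]}."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: tukey_outliers(vals, k) for group, vals in groups.items()}


# Made-up numbers for illustration only (NOT Palmer Penguins measurements).
rows = [
    ("small", 10), ("small", 11), ("small", 12), ("small", 13), ("small", 20),
    ("large", 28), ("large", 29), ("large", 30), ("large", 31), ("large", 32),
]

pooled = tukey_outliers([value for _, value in rows])
grouped = tukey_outliers_by_group(rows)
print(pooled)   # no value is flagged when both groups are pooled
print(grouped)  # 20 is flagged once "small" is examined on its own
```

Pooling two groups with different typical sizes widens the IQR, so a value that is unusual for its own group can sit comfortably inside the pooled fences.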

Prerequisites/Setup

1. Homebrew

Homebrew is the macOS package manager used to install everything below. If you don't have it, install it with:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Git

macOS ships git with Apple's Command Line Tools:

xcode-select --install     # installs git + compilers (skip if already present)
git --version              # verify

Optional: brew install git for a newer version than Apple's.

3. uv

uv manages the Python interpreter, the virtual environment, and packages — all in one fast tool.

brew install uv

Getting started

git clone git@github.com:SuperCowPowers/data_engineering.git
cd data_engineering
uv sync          # creates .venv, installs the right Python + all dependencies

uv sync reads pyproject.toml and .python-version, downloads Python 3.13 if you don't have it, and builds the environment. That's the whole setup.
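If you want to confirm the environment came out as expected, a small check along these lines may help. It is only a sketch: the function names are hypothetical, and it assumes .python-version holds a plain dotted version such as 3.13 (other pin formats would need extra parsing).

```python
import sys
from pathlib import Path


def parse_pin(text):
    """Turn '3.13' or '3.13.1' into a tuple of ints, e.g. (3, 13)."""
    return tuple(int(part) for part in text.strip().split("."))


def matches_pin(pin, version_info=sys.version_info):
    """True when the interpreter version starts with the pinned components."""
    return tuple(version_info[: len(pin)]) == pin


def in_virtualenv():
    """True when running inside a virtual environment such as .venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    pin_file = Path(".python-version")  # assumed to sit in the current directory
    if pin_file.exists():
        pin = parse_pin(pin_file.read_text())
        print("pinned:", pin, "matches:", matches_pin(pin))
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

Saved under any name in the repository root and launched with uv run python, it should report an interpreter path under .venv.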

Running code

uv run python path/to/script.py                # run a script

Prefer the classic workflow? Activate the env and use python directly:

source .venv/bin/activate
python path/to/script.py

Editor setup

Point your editor at the project's .venv so it uses the right interpreter and finds the installed packages.

PyCharm

Settings → Project → Python Interpreter → Add Interpreter → Add Local.
Choose Existing and select .venv/bin/python in the project. (PyCharm 2024.2+ also has a native uv option that does this for you.)

VS Code

Install the Python extension.
Command Palette (⌘⇧P) → Python: Select Interpreter → pick the one under .venv. VS Code usually auto-detects it on open.

Tests

uv run pytest            # run tests
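For orientation, a test that pytest would pick up might look like the sketch below. The file name tests/test_example.py and the iqr_bounds helper are hypothetical; in the real layout such a helper would more likely live under src/data_engineering/ and be imported, and it is defined inline here only to keep the example self-contained.

```python
# tests/test_example.py (hypothetical file name)


def iqr_bounds(q1, q3, k=1.5):
    """Hypothetical helper: Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def test_iqr_bounds_default_multiplier():
    # IQR is 10, so the fences sit 15 below Q1 and 15 above Q3.
    assert iqr_bounds(10.0, 20.0) == (-5.0, 35.0)


def test_iqr_bounds_custom_multiplier():
    # A wider multiplier pushes both fences outward.
    assert iqr_bounds(10.0, 20.0, k=3.0) == (-20.0, 50.0)
```

pytest collects functions whose names start with test_ from files named test_*.py, so no extra registration is needed.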

Contributing (pull request flow)

git checkout -b my-feature
# ... make changes, commit ...
git push -u origin my-feature

Then open a pull request on GitHub for review.

Layout

data_engineering/
├── pyproject.toml          # project, dependencies, tool config
├── .python-version         # pinned Python version
├── uv.lock                 # exact resolved versions (created by `uv sync`)
├── src/data_engineering/   # importable, shared code
├── tests/                  # pytest tests
├── project_1/              # reading CSVs with pandas
└── project_2/              # summary statistics & outlier detection
