  • Warsaw

jroznerski/README.md

Hi, I'm Jakub 👋

Data Scientist based in Poland 🇵🇱
Building production-grade ML systems — from statistical analysis to deployed APIs


🧠 About Me

I work across the full data science lifecycle — from exploratory analysis and statistical hypothesis testing to training ML models, building REST APIs, and shipping interactive dashboards.

My work focuses on:

  • Rigorous statistics — hypothesis testing with proper effect sizes, not just p-values
  • Production mindset — models that serve real predictions, not just notebooks
  • Clear communication — dashboards and docs that translate findings into decisions

🛠 Tech Stack

Languages & Core

Python · SQL · Bash

ML & Data Science

scikit-learn · XGBoost · pandas · NumPy · SHAP · SciPy

Deep Learning & NLP

TensorFlow · Keras · HuggingFace · PySpark

APIs & Deployment

FastAPI · Streamlit · Docker · Pydantic

Statistical Analysis

Hypothesis Testing · Effect Sizes


🚀 Featured Projects

ChurnGuard: end-to-end ML system predicting customer churn in real time

| Component | Details |
| --- | --- |
| Pipeline | Data ingestion → feature engineering → XGBoost (CV AUC 0.75) → SHAP |
| Hypothesis Testing | 5 statistical tests (χ², Welch t-test, Mann-Whitney U) with effect sizes |
| API | FastAPI REST: single & batch prediction, model metrics, live test results |
| Dashboard | Streamlit 4-page app: overview, hypothesis tests, live prediction, model performance |
| Infrastructure | Docker Compose, GitHub Actions CI, pytest (28 tests) |

Python · XGBoost · FastAPI · Streamlit · Docker
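
To make "tests with effect sizes" concrete, here is a minimal standard-library sketch of a Welch t statistic reported alongside Cohen's d. It is an illustration only, not the project's actual code: the function names and the sample values are hypothetical, and a real analysis would normally take the p-value from SciPy rather than stop at the statistic.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical monthly charges for churned vs. retained customers.
churned = [70.0, 85.0, 90.0, 95.0, 100.0]
retained = [50.0, 55.0, 60.0, 65.0, 70.0]

t_stat, df = welch_t(churned, retained)
d = cohens_d(churned, retained)
print(f"t = {t_stat:.2f}, df = {df:.1f}, Cohen's d = {d:.2f}")
```

Reporting d next to t separates "is there a difference" from "how large is it", which matters when a large sample makes a tiny difference significant.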

Production-style ML system predicting Warsaw apartment prices, refreshed weekly with live Otodom data

| Component | Details |
| --- | --- |
| Data | ~19 months of Polish real estate listings (Aug 2023 – Jun 2024) + weekly live scraping |
| Pipeline | Feature engineering → model training → automated weekly refresh |
| Scope | End-to-end: data collection, preprocessing, modelling, deployment |

Python · scikit-learn


📁 More Projects

📊 Statistical Analysis & Experimentation

| Project | Description | Stack |
| --- | --- | --- |
| ab_testing | Frequentist A/B test for an e-commerce page redesign: Z-test, Welch t-test, power analysis, effect sizes | Python · SciPy · Jupyter |
| esg_risk_ml_project | ESG risk classification using Random Forest | Python · scikit-learn |
| credit_score_classification | Credit score classification with precision/recall analysis | Python · scikit-learn |
| default_rate_calculation | Default rate vs. macroeconomic factors in a hypothetical Polish bank | Python · pandas |
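
For readers unfamiliar with the ab_testing workflow, the sketch below shows one common way a frequentist Z-test, an effect size, and a power analysis fit together for conversion rates. It uses only the Python standard library; the conversion counts, function names, and the choice of Cohen's h are assumptions made for illustration and may differ from what the repository does.

```python
import math
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled Z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - _norm.cdf(abs(z)))
    return z, p_value


def cohens_h(p_a, p_b):
    """Effect size for two proportions (arcsine transformation)."""
    return 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))


def sample_size_per_group(h, alpha=0.05, power=0.80):
    """Approximate per-group n needed to detect effect size h (two-sided)."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / abs(h)) ** 2)


# Hypothetical experiment: old page converts 200/2000, redesign 250/2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
h = cohens_h(200 / 2000, 250 / 2000)
n_required = sample_size_per_group(h)
print(f"z = {z:.2f}, p = {p_value:.4f}, h = {h:.3f}, n per group = {n_required}")
```

Running the power calculation before the experiment, with the smallest effect worth acting on, tells you how long the test needs to run.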

🧬 Deep Learning & Computer Vision

| Project | Description | Stack |
| --- | --- | --- |
| brest_cancer_prediction | Master's thesis: IDC detection from histopathology images with SVC, CNN & EfficientNetB0 | TensorFlow · Keras · scikit-learn |
| smokers_detection | Binary CNN classifier for smoker vs. non-smoker image detection | TensorFlow · Keras |
| reuters | Multi-class neural network classifier for Reuters news articles | TensorFlow · Keras |

🗣️ NLP & Text Classification

| Project | Description | Stack |
| --- | --- | --- |
| roberta_fake_job | Fake job posting detection using RoBERTa | HuggingFace · PyTorch |
| NLP_word_embeddings | Word embedding experiments with Word2Vec and GloVe | Python · Gensim |
| NLP_intro | Sentiment analysis on the Sentimental_Data dataset | Python · NLTK |

🔧 Regression & Classical ML

| Project | Description | Stack |
| --- | --- | --- |
| vehicles | Polynomial regression + Ridge regularization on a vehicles dataset (R²=0.93) | Python · scikit-learn |
| stroke_predict | Stroke likelihood prediction from patient health features | Python · scikit-learn |
| pointed-gun-at-person_model | ML classifier for Phoenix PD incidents involving drawn firearms | Python · scikit-learn |

⚙️ Data Engineering & APIs

| Project | Description | Stack |
| --- | --- | --- |
| vietnam_war_pyspark | PySpark analysis of Vietnam War bombing operations (1955–1975) | PySpark · Python |
| NIST_API | Script for downloading vulnerability reports from the NVD API | Python |
| Gmail_API | Gmail API client for authenticating and downloading email attachments | Python |
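
As a rough idea of what querying the NVD API involves, the sketch below builds a paged keyword query against the public CVE endpoint (version 2.0) using only the standard library. The helper names are hypothetical and the NIST_API script may be structured quite differently; the network call is defined but not executed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NVD CVE endpoint (API version 2.0).
NVD_CVE_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_cve_query(keyword, start_index=0, results_per_page=100):
    """Return the request URL for one page of keyword search results."""
    params = {
        "keywordSearch": keyword,
        "startIndex": start_index,
        "resultsPerPage": results_per_page,
    }
    return f"{NVD_CVE_URL}?{urlencode(params)}"


def fetch_cve_ids(keyword, results_per_page=100):
    """Download the first page of results and return the CVE identifiers.

    Hypothetical helper: it performs a live HTTP request, so it is only
    defined here and not called.
    """
    url = build_cve_query(keyword, results_per_page=results_per_page)
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return [item["cve"]["id"] for item in payload.get("vulnerabilities", [])]


url = build_cve_query("openssl", start_index=0, results_per_page=50)
print(url)
```

Increasing `start_index` by `results_per_page` on each request walks through larger result sets; unauthenticated requests are rate limited, so a real downloader would pause between pages.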


Always open to interesting data science problems and collaborations.

Pinned

  1. brest_cancer_prediction (Public)

    Master's thesis project predicting Invasive Ductal Carcinoma (IDC) from histopathology images using SVC, CNN, and EfficientNetB0 with EDA. It covers data preprocessing, model training, evaluation, …

  2. credit_score_classification (Public)

    This project builds a credit score classification model using classification algorithms. It includes data preprocessing, exploratory data analysis, and model evaluation with metrics like accuracy, …

  3. default_rate_calculation (Public)

    Investigating the relationship between default rate and macroeconomic factors in a hypothetical Polish bank.

    Jupyter Notebook

  4. property-pricing-pl (Public)

    Production-style machine learning system for predicting apartment prices in Warsaw. Trained on ~19 months of Polish real estate listings (Aug 2023 – Jun 2024) and refreshed weekly with live Otodom …

    Jupyter Notebook

  5. stroke_predict (Public)

    This repository houses a machine learning model designed to predict the likelihood of an individual experiencing a stroke. The model is trained on a dataset containing various health-related featur…

    Jupyter Notebook

  6. ChurnGuard (Public)

    Customer Churn Intelligence Platform — XGBoost ML pipeline, statistical hypothesis testing, FastAPI REST API, Streamlit dashboard

    Jupyter Notebook