Jakub Roznerski jroznerski

Hi, I'm Jakub 👋

Data Scientist based in Poland 🇵🇱
Building production-grade ML systems — from statistical analysis to deployed APIs

🧠 About Me

I work across the full data science lifecycle — from exploratory analysis and statistical hypothesis testing to training ML models, building REST APIs, and shipping interactive dashboards.

My work focuses on:

Rigorous statistics — hypothesis testing with proper effect sizes, not just p-values
Production mindset — models that serve real predictions, not just notebooks
Clear communication — dashboards and docs that translate findings into decisions
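
As a minimal sketch of what reporting an effect size next to a p-value can look like, the standard-library snippet below runs a two-proportion z-test and computes Cohen's h. The function name and the counts are hypothetical and are not taken from any project listed here. With very large samples, a tiny difference can be statistically significant while the effect size stays negligible.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions, plus Cohen's h."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Cohen's h: difference of arcsine-transformed proportions
    cohens_h = 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))
    return z, p_value, cohens_h


# Hypothetical counts: 5.00% vs 5.08% conversion, one million users per group
z, p_value, h = two_proportion_ztest(50_000, 1_000_000, 50_800, 1_000_000)
print(f"z = {z:.2f}, p = {p_value:.4f}, Cohen's h = {h:.4f}")
```

By Cohen's conventional thresholds, |h| below 0.2 is already a small effect, so a result like this is statistically detectable but may not matter in practice.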

🛠 Tech Stack

Languages & Core

ML & Data Science

Deep Learning & NLP

APIs & Deployment

Statistical Analysis

🚀 Featured Projects

🛡️ ChurnGuard — Customer Churn Intelligence Platform

End-to-end ML system predicting customer churn in real time

Component	Details
Pipeline	Data ingestion → feature engineering → XGBoost (CV AUC 0.75) → SHAP explanations
Hypothesis Testing	5 statistical tests (χ², Welch t-test, Mann-Whitney U) with effect sizes
API	FastAPI REST — single & batch prediction, model metrics, live test results
Dashboard	Streamlit 4-page app — overview, hypothesis tests, live prediction, model performance
Infrastructure	Docker Compose, GitHub Actions CI, pytest (28 tests)

🏠 Property Pricing PL — Warsaw Apartment Price Predictor

Production-style ML system predicting Warsaw apartment prices, refreshed weekly with live Otodom data

Component	Details
Data	~11 months of Polish real estate listings (Aug 2023 – Jun 2024) + weekly live scraping
Pipeline	Feature engineering → model training → automated weekly refresh
Scope	End-to-end: data collection, preprocessing, modelling, deployment

📁 More Projects

📊 Statistical Analysis & Experimentation

Project	Description	Stack
ab_testing	Frequentist A/B test for an e-commerce page redesign — Z-test, Welch t-test, power analysis, effect sizes	Python · SciPy · Jupyter
esg_risk_ml_project	ESG risk classification using Random Forest	Python · scikit-learn
credit_score_classification	Credit score classification with precision/recall analysis	Python · scikit-learn
default_rate_calculation	Default rate vs. macroeconomic factors in a hypothetical Polish bank	Python · pandas
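
For the power-analysis step in an A/B test such as ab_testing, a common normal-approximation formula gives the sample size per group needed to detect a given lift. This is a sketch under assumptions: the function name, baseline rate, and target rate are illustrative, and the repository itself may use a different method or library.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    # Sum of the two binomial variances (unpooled approximation)
    variance_sum = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical scenario: 10% baseline conversion, hoping to detect a lift to 12%
n_needed = sample_size_per_group(0.10, 0.12)
print(f"visitors needed per group: {n_needed}")
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum effect of interest is worth fixing before the experiment starts.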

🧬 Deep Learning & Computer Vision

Project	Description	Stack
brest_cancer_prediction	Master's thesis — IDC detection from histopathology images with SVC, CNN & EfficientNetB0	TensorFlow · Keras · scikit-learn
smokers_detection	Binary CNN classifier for smoker vs. non-smoker image detection	TensorFlow · Keras
reuters	Multi-class neural network classifier for Reuters news articles	TensorFlow · Keras

🗣️ NLP & Text Classification

Project	Description	Stack
roberta_fake_job	Fake job posting detection using RoBERTa	HuggingFace · PyTorch
NLP_word_embeddings	Word embedding experiments with Word2Vec and GloVe	Python · Gensim
NLP_intro	Sentiment analysis on the Sentimental_Data dataset	Python · NLTK

🔧 Regression & Classical ML

Project	Description	Stack
vehicles	Polynomial regression + Ridge regularization on a vehicles dataset (R²=0.93)	Python · scikit-learn
stroke_predict	Stroke likelihood prediction from patient health features	Python · scikit-learn
pointed-gun-at-person_model	ML classifier for Phoenix PD incidents involving drawn firearms	Python · scikit-learn

⚙️ Data Engineering & APIs

Project	Description	Stack
vietnam_war_pyspark	PySpark analysis of Vietnam War bombing operations (1955–1975)	PySpark · Python
NIST_API	Script for downloading vulnerability reports from the NVD API	Python
Gmail_API	Gmail API client for authenticating and downloading email attachments	Python

📊 GitHub Stats

Always open to interesting data science problems and collaborations.
