psalarc/README.md

Pablo Salar Carrera — Data Scientist & ML Engineer

M.S. Data Science, New Jersey Institute of Technology (December 2024)

B.A. Mathematics & Actuarial Science, Rider University (May 2023)

📍 Short Hills, NJ | LinkedIn | [email protected]


About

I build production-ready machine learning systems end-to-end: data ingestion pipelines, model training and rigorous evaluation, and deployment via REST APIs. My work spans supervised classification, reinforcement learning (DQN agents, plus dynamic programming with Value Iteration and Policy Iteration), and generative AI systems (RAG pipelines with LLM integration and vector search). My grounding in actuarial science and probability theory shapes how I approach model evaluation, uncertainty quantification, and experimental design.

Published Research: Machine Learning Algorithms for Diabetes Diagnosis Prediction — ACM Digital Library, IVSP 2024


Technical Skills

Languages: Python, SQL, R

ML & Deep Learning: PyTorch, TensorFlow, Keras, scikit-learn — supervised/unsupervised learning, deep neural networks, LSTM, DQN, Value Iteration, Policy Iteration

Generative AI & NLP: Retrieval-Augmented Generation (RAG), LangChain, LLM API integration (OpenAI, Anthropic), Hugging Face Transformers, vector databases (FAISS, Chroma), semantic search, embeddings, prompt engineering, text classification

Data & Analytics: Pandas, NumPy, Matplotlib, Seaborn, Tableau, association rule mining (Apriori, FP-Growth), clustering (K-Means), dimensionality reduction

MLOps & Deployment: Docker, MLflow (experiment tracking), FastAPI, AWS (S3, EC2), Git/GitHub, Jupyter, VS Code

Evaluation & Rigor: Cross-validation, hyperparameter tuning, AUC-ROC, precision/recall, F1, SHAP interpretability, ablation studies, A/B testing


Featured Projects

Clinical RAG QA System (Private — available on request)

A full Retrieval-Augmented Generation pipeline that answers clinical medical questions grounded in a curated knowledge base. Built with Python, LangChain, LLM APIs, and FAISS vector search. Implements chunking strategy optimization, embedding selection, and retrieval quality evaluation.
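
The chunk-embed-retrieve loop described above can be sketched in a few lines. This is only an illustration, not the project's code: it uses plain bag-of-words counts and a brute-force cosine ranking in place of a learned embedding model and a FAISS index, and the function names (`chunk`, `embed`, `retrieve`) and the sample notes are hypothetical.

```python
import math
from collections import Counter


def chunk(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    # Assumption: term counts stand in for a learned embedding model.
    return Counter(w.strip(".,;:?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical knowledge-base snippets, for illustration only.
notes = [
    "insulin dosing notes for type 1 diabetes",
    "blood pressure measurement protocol",
    "diabetes screening with fasting glucose",
]
print(retrieve("diabetes screening", notes, k=1))
```

In a real pipeline the retrieved chunks would be placed into the LLM prompt as grounding context; chunk size and overlap are the knobs that a chunking-strategy comparison would vary.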

DQN-ConnectX-Agent

Deep Q-Network agent trained to play ConnectX via PyTorch. Runs experiments across 8 architectural configurations (1–4 hidden layers, 64–512 units) to quantify the compute/performance trade-off. Key finding: shallow architectures match deeper networks at ~40% of the training cost. Stack: Python, PyTorch, Reinforcement Learning, DQN
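
For readers unfamiliar with DQN, the quantity every one of those network configurations is trained to regress is the one-step temporal-difference target. The sketch below shows that target in plain Python rather than PyTorch; the function name `td_targets`, the batch layout, and the default `gamma` are assumptions for illustration, not taken from the repository.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step DQN targets: r + gamma * max_a' Q(s', a').

    next_q_values holds, per transition, the Q-values of the next state
    (typically produced by a target network). Terminal transitions do not
    bootstrap, so their target is just the reward.
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


# Two made-up transitions: one mid-game, one terminal win.
print(td_targets([0.0, 1.0], [[0.5, 1.0], [0.2, 0.3]], [False, True], gamma=0.9))
```

The network's loss is then the distance between its predicted Q-value for the action taken and this target, which is why wider or deeper networks change the cost per update but not the learning rule itself.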

RL-Dynamic-Programming-GamblersProblem

Implements Value Iteration and Policy Iteration (from Sutton & Barto) to solve the Gambler's Problem. Analyzes the effect of the discount factor γ and the coin-flip probability p_h on the optimal policy. Stack: Python, NumPy, Dynamic Programming, MDP
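
A minimal sketch of Value Iteration on this problem, assuming the textbook setup (capital from 1 to 99, goal of 100, reward of +1 only on reaching the goal). It is written in plain Python rather than NumPy, and the function name, the convergence threshold `theta`, and the exclusion of a zero stake are illustrative choices rather than the repository's.

```python
def value_iteration(p_h=0.4, gamma=1.0, goal=100, theta=1e-9):
    """Value Iteration for the Gambler's Problem; returns (values, greedy policy)."""
    # V[0] and V[goal] are terminal and stay at 0; the +1 reward is paid
    # on the transition into the goal state.
    V = [0.0] * (goal + 1)

    def action_value(s, stake):
        win, lose = s + stake, s - stake
        reward = 1.0 if win == goal else 0.0
        return p_h * (reward + gamma * V[win]) + (1 - p_h) * gamma * V[lose]

    def stakes(s):
        # Assumption: a stake of 0 is excluded, since it never changes the state.
        return range(1, min(s, goal - s) + 1)

    while True:
        delta = 0.0
        for s in range(1, goal):
            best = max(action_value(s, a) for a in stakes(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            break

    policy = [0] * (goal + 1)
    for s in range(1, goal):
        # Round before the argmax so floating-point noise does not decide ties.
        policy[s] = max(stakes(s), key=lambda a: round(action_value(s, a), 6))
    return V, policy


V, policy = value_iteration(p_h=0.4, gamma=1.0)
print(round(V[50], 4), policy[50])
```

With p_h = 0.4 and γ = 1, the value at a capital of 50 should come out near 0.4, because staking everything there wins outright with probability p_h. Re-running with different `p_h` and `gamma` values is the kind of sweep the analysis above describes.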

Market-Basket-Analysis-Apriori-FPGrowth

Association rule mining on multi-retailer transaction data (Amazon, BestBuy, Nike, Supermarket). Surfaces high-confidence cross-sell rules; FP-Growth is benchmarked against Apriori for runtime and rule quality at scale. Stack: Python, mlxtend, scikit-learn, Pandas
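
Both Apriori and FP-Growth ultimately score candidate rules with the same three metrics: support, confidence, and lift. The sketch below computes them directly for a single rule; it does not implement either mining algorithm or use mlxtend, and the function name and the toy baskets are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))

    support = count_ac / n                                # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0   # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0  # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy baskets, not drawn from the project's datasets.
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"mouse"},
    {"laptop", "mouse"},
]
print(rule_metrics(baskets, {"laptop"}, {"mouse"}))
```

A lift below 1, as in this toy example, means the consequent is slightly less likely given the antecedent than overall, so a high-confidence rule is not automatically a useful cross-sell rule.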

Supervised-ML-Benchmark-Binary-Classification

Benchmarks Random Forest, LSTM, KNN, and Naïve Bayes on a binary classification task under identical conditions. Compares models on accuracy, F1, and AUC-ROC with cross-validation and confusion matrix analysis. Stack: Python, TensorFlow, scikit-learn, Pandas
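
The comparison metrics named above reduce to simple arithmetic on the confusion matrix and on score rankings. The following sketch computes them by hand to make that explicit; in practice scikit-learn provides equivalents, and the function names, labels, and scores here are invented for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels, hard predictions, and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2, 0.7, 0.1]
print(binary_metrics(y_true, y_pred), auc_roc(y_true, scores))
```

Evaluating every model with the same functions on the same cross-validation folds is what keeps a benchmark like this one comparable across model families.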

DiabetesPredictionProject

Benchmarks six ML classifiers (Ridge, Random Forest, Decision Tree, KNN, SVC, Naïve Bayes) across two diabetes datasets. Best results: Ridge Classifier at 83.12% accuracy (Pima, n=768); Random Forest and Decision Tree at 95% accuracy (Diabetes 2019, n=952). Published at ACM IVSP 2024. Stack: Python, scikit-learn, Pandas, NumPy, SHAP


Actively seeking ML engineering and applied AI roles. I bring production-systems thinking, rigorous statistical grounding, and hands-on experience with deep learning, reinforcement learning, and RAG architectures.

Pinned

  1. DiabetesPredictionProject (Public)

    Six ML classifiers benchmarked on two diabetes datasets. Best accuracy: Ridge Classifier 83.12% (Pima), Random Forest & Decision Tree 95% (Diabetes 2019). Published at ACM IVSP 2024.

    Python

  2. DQN-ConnectX-Agent (Public)

    Deep Q-Network (DQN) agent trained to play ConnectX via PyTorch. Experiments across hidden layer depth and size to analyze the trade-off between network complexity and training efficiency.

    Jupyter Notebook

  3. Market-Basket-Analysis-Apriori-FPGrowth (Public)

    Association rule mining on multi-retailer transaction data (Amazon, BestBuy, Nike, Supermarket) using Apriori and FP-Growth algorithms to surface cross-sell insights.

    Jupyter Notebook

  4. RL-Dynamic-Programming-GamblersProblem (Public)

    Value Iteration & Policy Iteration on the Gambler's Problem (Sutton & Barto). Analyzes how discount factor γ and coin-flip probability p_h affect optimal policy.

    Jupyter Notebook

  5. SalesPredictionProject_R (Public)

    Iterative OLS regression in R across 11 models — from univariate to polynomial and interaction-term specifications. Includes VIF multicollinearity diagnostics, ANOVA model comparison, and residual …

    R

  6. Supervised-ML-Benchmark-Binary-Classification (Public)

    Benchmarks Random Forest, LSTM, KNN, and Naive Bayes for binary classification. Compares models on accuracy, F1, and AUC-ROC using cross-validation.

    Jupyter Notebook