🎓 Data Analytics student at Northeastern University
🌎 Interested in environmental data, public health, clean energy, and sustainability
📊 I use data to identify performance gaps and turn them into actionable insights
- Environmental data analysis (BERDO, emissions, energy systems)
- Machine learning (Logistic Regression, Random Forest)
- Exploring GIS and spatial data
Analyzed 5,500+ Boston buildings to identify emissions patterns and non-compliance risks.
Key Outcomes:
- Identified 1,902 records missing valid Site EUI, highlighting data completeness challenges
- Flagged 1,003 high-priority buildings based on high energy intensity and property complexity
- Built an interactive Streamlit app that lets users look up any Boston address in the BERDO dataset and see the annual cost of non-compliance.
Impact: Insights directly informed workforce discussions around building performance, emissions reduction, and equitable decarbonization.
Tools: Python (pandas, matplotlib), Streamlit, Excel | Live app →
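
The screening steps in the outcomes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`site_eui`, `n_property_types`), the 75th-percentile cutoff, and the sample rows are all assumptions made up for the example.

```python
import pandas as pd


def flag_buildings(df, eui_col="site_eui", types_col="n_property_types",
                   eui_quantile=0.75, min_types=2):
    """Mark rows with unusable Site EUI and rows that look high priority.

    Column names and thresholds are hypothetical placeholders.
    """
    out = df.copy()
    # Coerce text such as "Not Available" to NaN so it counts as missing.
    eui = pd.to_numeric(out[eui_col], errors="coerce")
    # Treat missing or non-positive values as invalid Site EUI.
    out["missing_eui"] = eui.isna() | (eui <= 0)
    # Compute the intensity cutoff from valid records only.
    cutoff = eui[~out["missing_eui"]].quantile(eui_quantile)
    # High priority: valid EUI at or above the cutoff AND several property types.
    out["high_priority"] = (
        (~out["missing_eui"]) & (eui >= cutoff) & (out[types_col] >= min_types)
    )
    return out


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "address": ["A St", "B St", "C St", "D St", "E St", "F St"],
    "site_eui": [55.0, None, 210.0, "Not Available", 120.0, 0.0],
    "n_property_types": [1, 2, 3, 1, 2, 1],
})

flagged = flag_buildings(sample)
print(flagged[["address", "missing_eui", "high_priority"]])
```

Computing the cutoff only from valid records keeps missing or zero values from distorting the threshold.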
Built Logistic Regression and Random Forest models to predict cefepime resistance in E. coli.
Key Outcomes:
- Logistic Regression achieved 87% recall and 0.871 balanced accuracy on the validation set, outperforming Random Forest across both metrics
- Tuned models using nested cross-validation (5-fold outer, 3-fold inner GridSearchCV) to prevent data leakage during hyperparameter selection
- Feature coefficient analysis identified key genomic resistance drivers, supporting model interpretability
Impact: Supports faster clinical decision-making for antibiotic selection.
Tools: Python (scikit-learn, pandas), statistical analysis
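
For readers unfamiliar with the nested cross-validation setup mentioned above, here is a minimal scikit-learn sketch of a 5-fold outer loop wrapped around a 3-fold inner `GridSearchCV`. The synthetic data from `make_classification`, the scaling step, and the `C` grid are assumptions standing in for the real genomic features and search space, so the scores it prints say nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption; not the E. coli dataset).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# Scaling lives inside the pipeline so it is refit on each training fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: pick hyperparameters using only the outer training fold.
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="balanced_accuracy")

# Outer loop: score the tuned model on data the search never saw.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")

print("Balanced accuracy per outer fold:", outer_scores.round(3))
print("Mean balanced accuracy:", round(float(outer_scores.mean()), 3))
```

Because the grid search is rerun inside every outer fold, the outer scores estimate how the whole tuning procedure generalizes rather than how one chosen hyperparameter setting performed on data it was tuned against.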
Designed a manually operated dispensing device for cost-effective, field-deployable applications.
Key Outcomes:
- Reduced cost by ~68% vs. commercial alternatives
- Modular design prioritizes cleanability and durability
- CAD models and assembly documentation included
Impact: Enables resource-constrained teams to scale operations.
Tools: FreeCAD, Python, FMEA, Technical Documentation
- Python (pandas, scikit-learn, numpy)
- R (statistical analysis)
- SQL
- Tableau
- Excel
- Machine Learning
- Data Visualization
- Advanced geospatial analysis (QGIS, ArcGIS)
- Time-series forecasting for energy demand
- Climate impact modeling and scenario analysis
- Collaborations on environmental data projects and sustainability analytics.
- Internships in climate tech, renewable energy, or environmental consulting.
- Conversations about data-driven climate action.