This repository contains a Python-based ensemble machine learning model that predicts:
- Total energy consumption
- Average temperature
- Average humidity
- CO₂ levels
It uses:
- Random Forest
- XGBoost
- Linear Regression
- Install dependencies: pip install pandas numpy scikit-learn xgboost matplotlib
- Run the model
The model reads data, performs feature engineering, trains the ensemble, and outputs performance metrics and visual plots.
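The train-and-evaluate step can be sketched as follows. This is a minimal illustration, not the repository's actual script: it assumes the three models are combined by a simple unweighted average of their predictions, uses synthetic data in place of the real features, and forecasts a single target. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingRegressor as a stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

try:
    from xgboost import XGBRegressor
    booster = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    # Stand-in only, so the sketch still runs without xgboost.
    from sklearn.ensemble import GradientBoostingRegressor
    booster = GradientBoostingRegressor(random_state=0)

# Synthetic features and target (hypothetical; the real script reads the dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

# Chronological split: the last 20% of rows are held out for testing.
split = 320
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "xgboost": booster,
    "linear_regression": LinearRegression(),
}

# Fit each base model and collect its test predictions.
predictions = []
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions.append(np.asarray(model.predict(X_test), dtype=float))

# Assumed combination rule: unweighted mean of the base predictions.
pred = np.mean(predictions, axis=0)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE: {mae:.3f}  R2: {r2:.3f}")
```

Averaging is only one way to combine the models; weighted averaging or stacking would follow the same fit-then-combine pattern.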
The dataset is too large to host on GitHub.
You can download it from this Google Drive link.
This project uses a preprocessed version of the PLEIAData dataset, collected from smart buildings in the PHOENIX Horizon 2020 project. It contains high-resolution, time-aligned sensor readings from three residential buildings (Blocks A, B, and C) over a full year (2021).
Key Features
- Power Consumption
  - Real energy usage data from three blocks
  - Column: dif_cons_real
- Environmental Parameters
  - Indoor temperature (multiple sensors): temp_sensor_1, temp_sensor_2, temp_sensor_3
  - Relative humidity: hrmed
  - CO₂ concentration: V17 → renamed to CO2_ppm
  - Weather data: solar radiation, wind speed, ambient temperature
- Timestamp Alignment
  - All sources are resampled to hourly intervals
  - Merged on a common Date column
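The alignment step described above might look like the sketch below. It uses two small synthetic sources instead of the real files, and the aggregation choices (sum for consumption, mean for CO₂) and the inner join are assumptions rather than the repository's confirmed settings. The to_hourly helper is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two raw sources sampled at different rates.
power = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=12, freq="15min"),
    "dif_cons_real": np.arange(12, dtype=float),
})
co2 = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=6, freq="30min"),
    "V17": np.linspace(400.0, 450.0, 6),
})

def to_hourly(df, how):
    """Resample a frame with a Date column to hourly rows (hypothetical helper)."""
    return df.set_index("Date").resample("60min").agg(how).reset_index()

# Assumption: consumption is summed per hour, CO2 is averaged per hour.
power_hourly = to_hourly(power, "sum")
co2_hourly = to_hourly(co2.rename(columns={"V17": "CO2_ppm"}), "mean")

# Merge the hourly frames on the shared Date column.
merged = power_hourly.merge(co2_hourly, on="Date", how="inner")
print(merged)
```

An inner join keeps only hours present in every source; an outer join followed by interpolation is an alternative when sensors have gaps.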
Preprocessing Steps
- Merged 8+ data sources into one unified DataFrame
- Generated:
  - total_consumption = sum of Block A + B + C
  - average_temperature and average_humidity
  - Lag features at 1, 2, 6, 12, and 24-hour intervals
  - Rolling mean features over 6 and 12 hours
  - Time encodings: hour_sin, hour_cos, dayofweek, month
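As an illustration of the generated features, here is a minimal sketch on synthetic hourly data. The per-block column names (block_a, block_b, block_c), the add_features helper, and the derived column names are hypothetical. Shifting by one hour before the rolling mean is an assumption made here so that the current value does not leak into its own feature.

```python
import numpy as np
import pandas as pd

def add_features(df, target="total_consumption"):
    """Add lag, rolling-mean, and time-encoding columns to an hourly frame."""
    out = df.copy()
    for lag in (1, 2, 6, 12, 24):
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    for window in (6, 12):
        # Rolling mean over the previous `window` hours, excluding the current hour.
        out[f"{target}_rollmean_{window}"] = out[target].shift(1).rolling(window).mean()
    hour = out["Date"].dt.hour
    # Cyclical encoding so that hour 23 and hour 0 end up close together.
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["dayofweek"] = out["Date"].dt.dayofweek
    out["month"] = out["Date"].dt.month
    # Drop the leading rows whose lag features are undefined.
    return out.dropna().reset_index(drop=True)

# Three days of synthetic hourly readings for the three blocks.
blocks = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=72, freq="60min"),
    "block_a": 1.0,
    "block_b": 2.0,
    "block_c": np.arange(72, dtype=float),
})
blocks["total_consumption"] = blocks[["block_a", "block_b", "block_c"]].sum(axis=1)

features = add_features(blocks)
print(features.head())
```

The first 24 rows are dropped because the 24-hour lag has no value for them.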
Forecasting Targets
The model forecasts the following key environmental and energy metrics:
- total_consumption
- average_temperature
- average_humidity
- CO2_ppm
Dataset Citation
A. Martínez Ibarra, A. González-Vidal, and A. Skarmeta,
"PLEIAData: Consumption, HVAC, Temperature, Weather and Motion Sensor Data for Smart Buildings",
➡️ The dataset must be downloaded manually from the Google Drive link above and placed in a local folder.
# 1. Clone the repo
git clone https://github.com/Gayatri0116/Ensemble-TempHumidity_Forecastingmodel.git
cd Ensemble-TempHumidity_Forecastingmodel
# 2. (Optional) Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install required libraries
pip install pandas numpy matplotlib scikit-learn xgboost
# 4. Run the script
python ensembke.py