NEST - a Data-driven Building Model

Predictive building modelling to improve building operation.

⛶ Full screen

Based on this challenge:

See Readme above for details.

Watch video of our presentation:

This content is a preview from an external site.

04 - NEST, a Data-driven Building Model


Predictive building modelling to improve building operation.

The Goal

Energy consumption and CO2 emmissions of buildings could be reduced significantly by improving the controlling of the heating and cooling systems. This must be done predictively due to the thermal inertia. Hence, the control requires an accurate prediction model.

The Challenge

The Empa team proposed this challenge ( for the ENERGY & CLIMATE HACK 2021. Starting point was a research project on data predictive control. The proposed challenge was to improve the statistical modelling of the correlations between building controls and resulting room temperatures.

We have picked the data set for the SolAce unit (

The Idea

We train a machine-learning algorithm to maintain a target temperature. The system is set up as two stages: 1. Feature selection using the SULOV algorithm in Featurewiz. 2. Use genetic programming in TPOT to automatically explore thousands of models.


Select the model features. We use Featurewiz (an open-source python package) for automatically creating and selecting important features in the dataset that will create the best model with higher performance. Featurewiz uses the SULOV algorithm and Recursive XGBoost to reduce features to select the best features for the model. It also allows us to use advanced feature engineering strategies to create new features.

Featurewiz uses the SULOV (Searching for Uncorrelated List of Variables) algorithm. The algorithm works in the following steps. 1. First step: find all the pairs of highly correlated variables exceeding a correlation threshold (say absolute(0.8)). 2. Second step: find their Mutual Information Score to the target variable. Mutual Information Score is a non-parametric scoring method. So it's suitable for all kinds of variables and target. 3. Third step: take each pair of correlated variables, then knock off the one with the lower Mutual Information Score. 4. Final step: Collect the ones with the highest Information scores and least correlation with each other.


We use the TPOT (Tree-based Pipeline Optimization Tool) Python library for automated machine learning. TPOT uses a tree-based structure to represent a model pipeline for a predictive modeling problem, including data preparation and modeling algorithms, and model hyperparameters.

TPOT optimizes machine learning pipelines using genetic programming. It explores thousands of possible pipelines to find the best one for the data. Once TPOT is finished searching, it provides Pythons code for the best pipeline it found so we can tinker with the pipeline from there.

What we did

The model was trained using Jupyter notebooks on Google Collab on a subset of the data in order to get results in a reasonable time.

The prototype

Featurewiz selected 11 optimal features:


TPOT chose the XGBRegressor moddel as the best pipeline.

Resources/ Data

We have been using the SolAce Energy Demand and User Behaviour Data ( It contains 18 measurement points with a temporal resolution of 1 minute over the period of one year (July 2019 to July 2020).

The room temperature is a result of heat inflows and outflows of the unit. Heat inflows are - Space heating delivered to the unit. - Radiation that is coming through the windows. - Heat generated by the people present in the room. Heat outflows are - Heat that is transfered through the walls and windows to the outside world. - Space cooling delivered to the unit.

There are energy meters for the space heating and cooling delivered to the unit. We engineered the following features: - temproom = average of tempmeeting and tempoffice - radroom = irrad * (blindsheightF1 + blindsheightF2 + blindsheightF3 + blindsheightF4) during 11am and 5pm and 0 outside of these hours - praesroom = maximum of praesmeeting and praesoffice - tempdiff = tempamb - temproom

In addition, we have added the following time-based variables to help the algorithm deal with time-dependent phenomena: - dayofweek = 0 for Monday to 6 for Sunday - is_weekend = true for Saturday and Sunday - hour = hour of time variable

Next Steps

The is a number of next steps that should be taken from here - Due to time constraints, the current models were created with only a subset of the available data. With more time available, the training should be re-run with the entire data sets or with even larger data sets containing multiple years. - Specially constructured additional features can often improve a models performance. A few features have been created as part of this project. A possible next step should explore even more features. - In addition to the statistical analysis, the same data could be further analyzed through visualization. In particular, it would be interesting to investigate the temporal delays of the change in indoor temperature caused by energy input subject to the thermal inertia. The challenge submission suggested to look at H-scatterplots and cross-correlograms as tools to understand lagged correlations. - Finally, in order to establish trust in the results of the statistical model, model predictions should be validated through various analyses. The visualizations from the previous step could be adequate tools in order to understand how the predictions differ from actual measurements and, more importantly, how novel control algorithms reduce energy consumption while maintaining the same level of comfort.

Worked on the pitch

13.09.2021 05:26 ~ Maud

Readme updated

10.09.2021 15:05 ~ wolfram

lrwilt has joined!

01.09.2021 14:46

Event finished

01.09.2021 13:45

Worked on the pitch

01.09.2021 09:26 ~ wolfram

Readme updated

01.09.2021 09:20 ~ wolfram

Worked on the pitch

01.09.2021 09:20 ~ wolfram

Readme updated

01.09.2021 09:19 ~ wolfram

visualization code and structure of readme for project page (@WolframWilluhn)

Readme updated

01.09.2021 09:04 ~ wolfram

wolfram has joined!

01.09.2021 09:04

Challenge started.

01.09.2021 09:04 ~ wolfram

create new features in solacedataengineered.csv as documented (@WolframWilluhn)

system and device id that corresponds to NEST wiki added to solace metadata

source data

Initial commit

Event started

31.08.2021 09:00
Loading ...
All attendees, sponsors, partners, volunteers and staff at our hackathon are required to agree with the Hack Code of Conduct. Organisers will enforce this code throughout the event. We expect cooperation from all participants to ensure a safe environment for everybody. For more details on how the event is run, see the Guidelines on our wiki.

Creative Commons LicenceThe contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.

Energy & Climate Hack