This challenge was posted 2 years ago

Challenge Project

04 - NEST, a Data-driven Building Model

Predictive building modelling to improve building operation.


Projects based on this challenge:

NEST - a Data-driven Building Model
Energy & Climate Hack Project


Although buildings account for almost half of Switzerland's total final energy consumption, digitalisation in the building sector is not yet very advanced. At NEST, Empa's demonstrator, new technologies and algorithms are being developed and tested. For this purpose, NEST is monitored in detail by some 10,000 sensors. The aim of this challenge is to use the rich measurement data from the NEST to give buildings the opportunity to look a little into the future and thereby take an important step towards digitalisation and energy efficiency.

(Summary, translated from German)

NEST case study for "Decarbonising Cities": a low-cost, data-driven building model.

Although buildings cause almost half of Switzerland's total final energy consumption, digitalisation and its promises have so far barely arrived in this sector. At NEST, Empa's demonstrator, new technologies and algorithms are developed and tested. For this purpose, NEST is monitored by sensors down to the last detail. The aim of this challenge is to use the NEST measurement data to give buildings the ability to look a little into the future and thereby take an important step towards digitalisation and energy efficiency.


Buildings are responsible for 60% of the energy consumption and 40% of the CO2 emissions in Switzerland (overall energy statistics, SFOE). Through optimal operation, i.e. without structural changes, consumption can be reduced by an estimated 20%.

However, the optimal operation of a building is a methodological challenge. Because of the building's thermal inertia, control actions (e.g. the flow rate of the floor heating or the charging of a storage tank) must be carried out predictively. Other influencing variables, such as solar radiation, act on the indoor climate directly and must therefore be taken from forecasts at the time of the control intervention.

Building models are used to solve this methodological challenge algorithmically. Building models can either be created deterministically on the basis of physical laws (model predictive control) or they can be learned from measurement data. The NEST research building of Empa and EAWAG grants access to measurement data from several buildings with different types of uses. Measurement data for all relevant systems and building functions have been stored for several years with a temporal resolution of one minute. The data situation, for example for creating data-driven building models, is unique.


The overall objective is to investigate the dependencies between the current energy input, indoor climate and weather, and the indoor climate at a later point in time. The corresponding question is: Which energy input leads to which future indoor climate for a given initial situation (current indoor climate, weather)?

Participants can choose their own approach or they can use the following, more detailed objectives and questions as a guide.

1) Visualisation of the interrelationships:

  • Which energy input leads to which indoor temperature?
  • How delayed is the reaction of the indoor temperature to a change in the energy input (i.e. how great is the thermal inertia)?
  • What influence do boundary conditions such as the outside temperature or the current indoor climate have on the subsequent indoor climate?
  • Do the above relationships vary between different rooms?

Methodological note: H-scatterplots or cross-correlograms, among others, are suitable for the exploratory investigation of "lagged correlations".
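The lagged correlations mentioned in the methodological note can be explored with a simple cross-correlogram. The following is only a minimal sketch on synthetic data: the signal names, the 30-step delay and the helper `lagged_correlation` are assumptions made for illustration and are not part of the NEST data set.

```python
import numpy as np

def lagged_correlation(driver, response, max_lag):
    """Pearson correlation between driver[t] and response[t + lag], lag = 0..max_lag."""
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    correlations = []
    for lag in range(max_lag + 1):
        # Pair each driver sample with the response observed `lag` steps later.
        x = driver[: len(driver) - lag]
        y = response[lag:]
        correlations.append(np.corrcoef(x, y)[0, 1])
    return np.array(correlations)

# Synthetic example: the "room temperature" follows the "heating power"
# with a delay of 30 time steps (an assumed value), plus measurement noise.
rng = np.random.default_rng(0)
heating_power = rng.normal(size=2000)
delay = 30
room_temperature = np.concatenate([np.zeros(delay), heating_power[:-delay]])
room_temperature += 0.1 * rng.normal(size=2000)

correlogram = lagged_correlation(heating_power, room_temperature, max_lag=60)
best_lag = int(np.argmax(correlogram))  # lag with the strongest correlation
```

In real measurements the response to a heating change is smeared over many time steps, so the correlogram shows a broad hump rather than a single sharp peak; the position and width of that hump give a first impression of the thermal inertia.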

2) Statistical modelling of the correlations from 1):

  • Can these correlations be quantified with a statistical model?
  • Do the correlations become clearer if not only the current measurements are used as predictors, but also the last N historical measurements?
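One way to approach the second question is to compare a model that only sees the current measurement with one that also sees the last N measurements. The sketch below does this with ordinary least squares on synthetic data. The helper names, the choice of N = 5 and the moving-average "room" are assumptions for illustration, and the R² values are in-sample only.

```python
import numpy as np

def make_lagged_matrix(series, n_lags):
    """Rows hold [x[t], x[t-1], ..., x[t-n_lags]] for every t >= n_lags."""
    series = np.asarray(series, dtype=float)
    columns = [series[n_lags - k : len(series) - k] for k in range(n_lags + 1)]
    return np.column_stack(columns)

def fit_r2(features, target):
    """Ordinary least squares with intercept; returns the in-sample R^2."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return 1.0 - residuals.var() / target.var()

# Synthetic room: the temperature is a moving average of the last 6 heating inputs.
rng = np.random.default_rng(1)
heating = rng.normal(size=3000)
kernel = np.ones(6) / 6
temperature = np.convolve(heating, kernel, mode="full")[: len(heating)]
temperature += 0.05 * rng.normal(size=len(heating))

n_lags = 5  # number of historical measurements used as extra predictors
X_lagged = make_lagged_matrix(heating, n_lags)
y = temperature[n_lags:]

r2_current_only = fit_r2(X_lagged[:, :1], y)  # only x[t] as predictor
r2_with_history = fit_r2(X_lagged, y)         # x[t] ... x[t-5] as predictors
```

On this toy data the model with history explains far more variance than the model that uses only the current value, which is the effect the question asks about.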

3) Sensors are expensive. In the real situation, there are usually fewer sensors available:

  • Which sensors (predictors) have which significance in the model from 2)?
  • Can the model be reduced by individual sensors to save costs without the prediction of future room temperature suffering greatly?
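A simple way to probe these two questions is an ablation study: refit the model with one sensor removed at a time and look at how much the prediction error grows. The following sketch uses a linear model and synthetic data. The sensor names, the coefficients and the 70/30 chronological split are assumptions for illustration only.

```python
import numpy as np

def holdout_rmse(X, y, train_fraction=0.7):
    """Fit least squares on the first part of the data, report RMSE on the rest."""
    split = int(len(y) * train_fraction)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[:split], y[:split], rcond=None)
    errors = y[split:] - design[split:] @ coef
    return float(np.sqrt(np.mean(errors ** 2)))

def ablation_study(X, y, sensor_names):
    """RMSE increase observed when each sensor (column) is removed from the model."""
    baseline = holdout_rmse(X, y)
    increase = {}
    for i, name in enumerate(sensor_names):
        reduced = np.delete(X, i, axis=1)  # model without sensor i
        increase[name] = holdout_rmse(reduced, y) - baseline
    return baseline, increase

rng = np.random.default_rng(2)
sensor_names = ["heating_power", "outdoor_temp", "humidity"]  # hypothetical sensors
X = rng.normal(size=(1500, 3))
# In this toy data the target depends on the first two sensors only.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=1500)

baseline_rmse, rmse_increase = ablation_study(X, y, sensor_names)
```

A sensor whose removal barely changes the error is a candidate for being dropped. Combining the RMSE increase with the sensor costs from the provided description would turn this into a cost-benefit ranking.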

Data availability

The participants are provided with the data from the scientific study in an adjusted form, as CSV.

  • An overview of the measurement data and infrastructure at NEST is available at the following link: https://info.nestcollaboration.ch/wikipediapublic/.
  • The measurement data of the NEST research building can be accessed via REST API.
  • A detailed description of the sensors (incl. costs) is provided.
  • A Grafana dashboard provides visual access to all relevant measurement data.
  • Additional measurement data, such as outdoor temperature or solar radiation, can be obtained via publicly accessible interfaces.
  • Literature: Bünning, F., Huber, B., Heer, P., Aboudonia, A. and Lygeros, J., 2020. Experimental demonstration of data predictive control for energy optimisation and thermal comfort in buildings. Energy and Buildings, 211, p.109792.
  • More info: https://www.empa.ch/de/web/nest/

In principle, the above goals can be pursued for several units at NEST. However, it makes sense to start with the Urban Mining & Recycling (UMAR) unit. The measurement data from UMAR were recently used in a research project for a similar purpose (Bünning et al. 2020). Each unit is based on a specific thesis. In UMAR, this is that all resources needed to produce a building must be fully reusable, recyclable or compostable.


[Figure: Left: UMAR from the outside, with the two bedrooms on the left and right of the picture. Right: floor plan of the unit, with the two bedrooms at the top left and right and the living room in the middle.]

UMAR offers living space for two guest researchers or students. The unit has a living room with kitchen, two identical bedrooms and two bathrooms. Detailed measurements are available for the consumption of hot and cold water, electrical energy, and heating and cooling (details here). With this information, a statistical model can be created and trained that estimates the expected room temperature based on the current state (e.g. room temperature), a control measure (e.g. increasing the heating output) and the weather forecast for tomorrow.


04 - NEST, a Data-driven Building Model


Predictive building modelling to improve building operation.

The Goal

The energy consumption and CO2 emissions of buildings could be reduced significantly by improving the control of the heating and cooling systems. Because of the thermal inertia, this must be done predictively. Hence, the control requires an accurate prediction model.

The Challenge

The Empa team proposed this challenge (https://hack.opendata.ch/project/672) for the ENERGY & CLIMATE HACK 2021. The starting point was a research project on data predictive control. The proposed challenge was to improve the statistical modelling of the correlations between building controls and the resulting room temperatures.

We picked the data set for the SolAce unit (https://info.nestcollaboration.ch/wikipediapublic/building/solace/).

The Idea

We train a machine-learning algorithm to maintain a target temperature. The system is set up as two stages: 1. Feature selection using the SULOV algorithm in Featurewiz. 2. Use genetic programming in TPOT to automatically explore thousands of models.


Select the model features. We use Featurewiz (an open-source Python package) to automatically create and select the important features in the data set, with the aim of obtaining a better-performing model. Featurewiz uses the SULOV algorithm and recursive XGBoost to reduce the feature set to the best features for the model. It also supports advanced feature engineering strategies for creating new features.

Featurewiz uses the SULOV (Searching for Uncorrelated List of Variables) algorithm, which works in the following steps:

1. Find all pairs of highly correlated variables whose correlation exceeds a threshold (say, an absolute value of 0.8).
2. Find the Mutual Information Score of each of these variables with respect to the target variable. The Mutual Information Score is a non-parametric scoring method, so it is suitable for all kinds of variables and targets.
3. For each pair of correlated variables, knock off the one with the lower Mutual Information Score.
4. Collect the variables with the highest Mutual Information Scores and the least correlation with each other.
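The SULOV steps described above can be illustrated with a small stand-alone sketch. This is not Featurewiz's implementation: the mutual information is estimated here with a plain 2-D histogram, a removed variable is not excluded from later pair comparisons, and the function names, the toy variables and the bin count are assumptions for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information of two 1-D arrays (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))

def sulov_like_selection(features, target, names, corr_threshold=0.8):
    """Drop the weaker member of every highly correlated feature pair."""
    corr = np.abs(np.corrcoef(features, rowvar=False))
    mi = {n: mutual_information(features[:, i], target) for i, n in enumerate(names)}
    removed = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] > corr_threshold:
                # Knock off the variable with the lower mutual information score.
                weaker = names[i] if mi[names[i]] < mi[names[j]] else names[j]
                removed.add(weaker)
    return [n for n in names if n not in removed]

# Toy data: "temp_b_noisy" is a noisy copy of "temp_a"; "irradiance" is independent.
rng = np.random.default_rng(3)
temp_a = rng.normal(size=5000)
temp_b_noisy = temp_a + 0.6 * rng.normal(size=5000)
irradiance = rng.normal(size=5000)
target = temp_a + 0.5 * irradiance + 0.1 * rng.normal(size=5000)

names = ["temp_a", "temp_b_noisy", "irradiance"]
features = np.column_stack([temp_a, temp_b_noisy, irradiance])
selected = sulov_like_selection(features, target, names)
```

In this toy case the two temperature variables form the only highly correlated pair, and the noisy copy is dropped because it carries less information about the target.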


We use the TPOT (Tree-based Pipeline Optimization Tool) Python library for automated machine learning. TPOT uses a tree-based structure to represent a model pipeline for a predictive modeling problem, including data preparation and modeling algorithms, and model hyperparameters.

TPOT optimizes machine learning pipelines using genetic programming. It explores thousands of possible pipelines to find the best one for the data. Once TPOT has finished searching, it provides the Python code for the best pipeline it found, so we can tinker with the pipeline from there.
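A TPOT search of the kind described above might be set up roughly as follows. This sketch assumes the classic TPOT interface (`TPOTRegressor` with `fit`, `score` and `export`); newer TPOT releases may differ. The wrapper function, the small `generations` and `population_size` values and the export file name are illustrative assumptions, not the settings used in this project.

```python
def search_best_pipeline(X_train, y_train, X_test, y_test,
                         export_path="best_pipeline.py"):
    """Run a small TPOT search and export the winning pipeline as Python code."""
    # Imported lazily so this module can be loaded where TPOT is not installed.
    from tpot import TPOTRegressor

    tpot = TPOTRegressor(
        generations=5,        # number of evolutionary iterations (assumed value)
        population_size=20,   # pipelines kept in each generation (assumed value)
        random_state=42,      # make the search reproducible
        verbosity=2,          # print progress during the search
    )
    tpot.fit(X_train, y_train)
    score = tpot.score(X_test, y_test)  # score of the best pipeline on held-out data
    tpot.export(export_path)            # write the best pipeline as a Python script
    return score
```

The exported script contains an ordinary scikit-learn pipeline, which is what makes it possible to inspect and adjust the result by hand afterwards.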

What we did

The model was trained using Jupyter notebooks on Google Colab, on a subset of the data in order to get results in a reasonable time.

The prototype

Featurewiz selected 11 optimal features:


TPOT chose the XGBRegressor model as the best pipeline.

Resources/ Data

We have been using the SolAce Energy Demand and User Behaviour Data (https://figshare.com/articles/dataset/NEST_-_SolAce_Energy_Demand_and_User_Behaviour_Data/14376950). It contains 18 measurement points with a temporal resolution of 1 minute over the period of one year (July 2019 to July 2020).

The room temperature is the result of heat inflows to and outflows from the unit.

Heat inflows are:

  • Space heating delivered to the unit.
  • Radiation coming through the windows.
  • Heat generated by the people present in the room.

Heat outflows are:

  • Heat that is transferred through the walls and windows to the outside world.
  • Space cooling delivered to the unit.
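The inflows and outflows listed above can be summarised as a lumped heat balance for the room. The following is only a schematic single-zone formulation: the symbols (thermal capacity $C$, room temperature $T_{\text{room}}$ and the individual heat flows $\dot{Q}$) are introduced here for illustration and do not appear in the original project.

```latex
C \,\frac{\mathrm{d}T_{\text{room}}}{\mathrm{d}t}
  = \dot{Q}_{\text{heating}} + \dot{Q}_{\text{solar}} + \dot{Q}_{\text{people}}
  - \dot{Q}_{\text{envelope}} - \dot{Q}_{\text{cooling}}
```

The larger the thermal capacity $C$, the more slowly the room temperature responds to a change in any of the heat flows, which is the thermal inertia that makes predictive control necessary.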

There are energy meters for the space heating and cooling delivered to the unit. We engineered the following features:

  • temproom = average of tempmeeting and tempoffice
  • radroom = irrad * (blindsheightF1 + blindsheightF2 + blindsheightF3 + blindsheightF4) between 11am and 5pm, and 0 outside these hours
  • praesroom = maximum of praesmeeting and praesoffice
  • tempdiff = tempamb - temproom

In addition, we have added the following time-based variables to help the algorithm deal with time-dependent phenomena:

  • dayofweek = 0 for Monday to 6 for Sunday
  • is_weekend = true for Saturday and Sunday
  • hour = hour of time variable
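The engineered and time-based features described above could be computed per measurement row roughly as follows. This is a sketch under assumptions: the column names are taken as printed on this page (the actual CSV headers may differ), the 5pm boundary is treated as exclusive, and the function name and sample values are made up for illustration.

```python
from datetime import datetime

def engineer_features(row, timestamp):
    """Derive the room-level and time-based features for one measurement row."""
    temproom = (row["tempmeeting"] + row["tempoffice"]) / 2
    blinds = sum(row[f"blindsheightF{i}"] for i in range(1, 5))
    # Irradiation only counts between 11am and 5pm (end hour exclusive here).
    radroom = row["irrad"] * blinds if 11 <= timestamp.hour < 17 else 0.0
    return {
        "temproom": temproom,
        "radroom": radroom,
        "praesroom": max(row["praesmeeting"], row["praesoffice"]),
        "tempdiff": row["tempamb"] - temproom,
        "dayofweek": timestamp.weekday(),        # 0 = Monday ... 6 = Sunday
        "is_weekend": timestamp.weekday() >= 5,  # Saturday or Sunday
        "hour": timestamp.hour,
    }

# Made-up sample values for one minute of data on a Saturday at 12:30.
sample_row = {
    "tempmeeting": 21.0, "tempoffice": 23.0, "tempamb": 2.0, "irrad": 100.0,
    "blindsheightF1": 0.5, "blindsheightF2": 0.5,
    "blindsheightF3": 0.5, "blindsheightF4": 0.5,
    "praesmeeting": 0, "praesoffice": 1,
}
features = engineer_features(sample_row, datetime(2020, 1, 4, 12, 30))
```

For a full year of one-minute data, the same logic would more typically be applied column-wise to a data frame rather than row by row.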

Next Steps

There are a number of next steps that should be taken from here:

  • Due to time constraints, the current models were created with only a subset of the available data. With more time available, the training should be re-run with the entire data set or with even larger data sets spanning multiple years.
  • Specially constructed additional features can often improve a model's performance. A few features were created as part of this project. A possible next step is to explore even more features.
  • In addition to the statistical analysis, the same data could be further analyzed through visualization. In particular, it would be interesting to investigate the temporal delay with which the indoor temperature responds to energy input because of the thermal inertia. The challenge description suggested H-scatterplots and cross-correlograms as tools for understanding lagged correlations.
  • Finally, in order to establish trust in the results of the statistical model, model predictions should be validated through various analyses. The visualizations from the previous step could be suitable tools for understanding how the predictions differ from actual measurements and, more importantly, how novel control algorithms reduce energy consumption while maintaining the same level of comfort.

Contributed 3 years ago by nikki_bhler for Energy & Climate Hack

Creative Commons Licence: The contents of this website, unless otherwise stated, are licensed under a Creative Commons Attribution 4.0 International License.