Guest blog: The power of deep reinforcement learning to transform the built environment

Alexander Belyaev, Senior Data Scientist at Arloid Automation, explores what AI can offer the built environment in the age of climate emergency.

What can Artificial Intelligence (AI) offer the built environment in our age of climate emergency? At the heart of Deep Reinforcement Learning is an agent and an environment. Just as we are starting to learn that our actions within our environment have consequences on an immense, planetary scale, innovative AI is learning too – and faster than us.

By using Deep Reinforcement Learning to optimise the energy efficiency of HVAC (heating, ventilation and air conditioning) systems in the built environment, we can minimise the negative impact of our own actions without sacrificing user comfort. As businesses all over the world attempt to transition to Net Zero, this technology has a pivotal role to play.

But why is Deep Reinforcement Learning the best way to optimise HVAC performance?

What is Machine Learning?

When you have a task to complete, but don’t want to do it manually, Machine Learning (ML) enables you to teach a computer to do it instead. A type of AI, ML uses data and algorithms to imitate the way that humans learn, gradually honing its accuracy. Using a fixed set of data, the computer must go through a period of training before it can carry out the desired function.

ML can be further broken down into Supervised Learning and Unsupervised Learning. These may sound like opposites, but both approaches learn from a fixed dataset how to handle future tasks.

Supervised Learning works like this. Imagine you are trying to categorise pictures of cats and pictures of dogs. The model is trained on example pictures that have already been labelled ‘cat’ or ‘dog’, and learns which features go with which label. It can then assign new pictures of cats to one category and new pictures of dogs to the other.
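To make the idea concrete, here is a minimal sketch of supervised learning in Python. It is an illustration only: the two features (weight in kilograms, ear length in centimetres), the numbers and the function names are all made up, and a nearest-centroid rule stands in for the far larger models used on real pictures.

```python
# Toy supervised learning: a nearest-centroid classifier.
# The features (weight in kg, ear length in cm) and all values are invented.

def train(examples):
    """Average the feature vectors of each label to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# The labels are what make this "supervised": every example comes with an answer.
labelled_examples = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((25.0, 10.0), "dog"),
]

model = train(labelled_examples)
print(predict(model, (4.2, 6.4)))    # a small animal the model has not seen
print(predict(model, (27.0, 11.5)))  # a large animal the model has not seen
```

The important point is that the model only ever sees the fixed list of labelled examples; nothing it predicts changes that data.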

In Unsupervised Learning, labels are not used. Instead, the machine identifies patterns in the data, for example similar features amongst cat pictures and similar features amongst dog pictures. It won’t be able to name these groups as ‘cat’ and ‘dog’, but it will still be able to tell them apart.
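A rough sketch of the unsupervised case, again in Python, might look like this. It uses k-means clustering as one common example of pattern-finding; the data points are invented, and picking the starting centres at evenly spaced positions in the list is a simplification chosen to keep the sketch deterministic (real implementations usually start from random points).

```python
# Toy unsupervised learning: k-means clustering on unlabelled points.
# The points (weight in kg, ear length in cm) are invented for illustration.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=10):
    """Group points into k clusters and return the cluster index of each point."""
    # Simplification: take evenly spaced points as the starting centres.
    centres = [points[i * len(points) // k] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment

# No labels this time: the algorithm only sees the measurements.
unlabelled = [
    (4.0, 6.5), (3.5, 6.0), (5.0, 7.0),
    (20.0, 11.0), (30.0, 12.5), (25.0, 10.0),
]

groups = k_means(unlabelled, k=2)
print(groups)
```

The output is just a group number for each point. The algorithm separates the small animals from the large ones, but it has no way of knowing that one group should be called ‘cat’ and the other ‘dog’.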

What is Reinforcement Learning?

Reinforcement Learning is distinct from the others because it doesn’t work with fixed datasets, but with what are known as ‘environments’. It is solution-focused; the goal is not to classify data, but rather to operate well within this environment and solve a specific task. That’s why it proves invaluable when teaching a computer to optimise building energy efficiency, as we do at Arloid.

In Reinforcement Learning, the machine or agent gathers its data through the outcomes of its training, understanding how to behave correctly through trial and error. In this way, it maps environmental states to its own actions. Applied to HVAC systems, this enables the agent to learn the cause and effect between its heating and cooling actions and the resulting conditions, rather than just remembering programmed settings.

This idea can be understood by imagining a mouse trying to exit a labyrinth. The mouse has several options – it can go forward, left, right, or back. After analysing these options and predicting what might happen, the mouse will choose one of them and move. As a result, the mouse will find itself in a different position. The environment around the mouse has changed. In mathematics, this is modelled as a Markov decision process: a decision is made, and that decision changes the state of the environment.

Depending on where the mouse has found itself, it will be either closer to or further away from its goal of exiting the labyrinth; that is, it will either be rewarded or not. Essentially, this type of learning teaches the agent to take actions that result in the maximum reward.
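The mouse and the labyrinth can be sketched in a few lines of Python using tabular Q-learning, one of the simplest Reinforcement Learning algorithms. Everything here is an assumption made for illustration: the maze layout, the reward values (+1 for reaching the exit, a small penalty for every other move) and the learning settings. It is not the algorithm Arloid uses.

```python
import random

# Hypothetical labyrinth: S = start, E = exit, # = wall, . = free cell.
MAZE = ["S.#.",
        ".##.",
        "...E"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def find(symbol):
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)

START, EXIT = find("S"), find("E")

def step(state, action):
    """Move the mouse; walls and edges leave it where it is."""
    r, c = state[0] + action[0], state[1] + action[1]
    inside = 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])
    if not inside or MAZE[r][c] == "#":
        r, c = state
    next_state = (r, c)
    reward = 1.0 if next_state == EXIT else -0.01  # assumed reward scheme
    return next_state, reward, next_state == EXIT

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the length of each attempt
            values = q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))   # explore
            else:
                a = values.index(max(values))     # exploit what is known
            next_state, reward, done = step(state, ACTIONS[a])
            next_values = q.setdefault(next_state, [0.0] * len(ACTIONS))
            target = reward + (0.0 if done else gamma * max(next_values))
            values[a] += alpha * (target - values[a])
            state = next_state
            if done:
                break
    return q

def greedy_path(q, limit=20):
    """Follow the best-known action from the start until the exit."""
    state, path = START, [START]
    while state != EXIT and len(path) <= limit:
        values = q.get(state, [0.0] * len(ACTIONS))
        state, _, _ = step(state, ACTIONS[values.index(max(values))])
        path.append(state)
    return path

q_table = train()
print(greedy_path(q_table))
```

Nobody tells the mouse which way to go. It starts with no knowledge, bumps into walls, and gradually learns which action is worth most in each position because actions that lead towards the exit end up with higher values.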

Deep Reinforcement Learning Optimises HVAC Systems Faster

Deep Reinforcement Learning harnesses the power of deep neural networks to tackle problems too complicated for standard Reinforcement Learning. As a result, it has plenty to offer our built environment as we strive to reach Net Zero by 2050, optimising HVAC systems and building energy efficiency.

Fixed data methods like Supervised and Unsupervised Machine Learning can be used for HVAC optimisation, but they don’t deliver results as quickly or accurately. That’s because the relevant data about the building needs to be collected – requiring manual human intervention – and the agent will not take into account the fact that its actions are changing the environment and therefore the data.

By contrast, Deep Reinforcement Learning works within a dynamic environment, not independently of it. When it comes to HVAC systems, the agent will learn the optimal behaviour much more quickly – minimising energy consumption whilst maintaining the desired level of user comfort. This makes it highly scalable, and applicable to a wide variety of building types and sizes.

This isn’t programming a model and hoping it will still work in the real world. Deep Reinforcement Learning means the agent learns how to react to a very wide range of environmental states, so it is far less likely to be surprised by new real-world data. Whatever the temperature and conditions faced by the HVAC system, the AI will cool or heat the space as required to return the building to equilibrium.

Arloid AI and Deep Reinforcement Learning

At Arloid, we use Deep Reinforcement Learning to train our algorithm to optimise HVAC performance on behalf of our clients. That’s because it delivers results more quickly and provides greater security for businesses and organisations.

Instead of requiring extensive operational datasets, we create virtual environments that are essentially digital copies of a building, each known as a Digital Twin. Historical data from the building is not required here; instead, the simulations gather data as they run, saving time and ensuring the AI’s capabilities are closely tailored to the space in question. Working with as many as 20 digital copies at once, we simulate a full year of environmental conditions and stressors many thousands of times.

During this period, the agent will perform actions, such as heating or cooling the space, and the virtual building will react to them. Consequently, the agent will be able to see the relationship between its chosen action and the state of the Digital Twin’s environment, and how that is connected to its goal – optimum thermal comfort and energy efficiency. As a result, the agent can learn how to accurately manage the temperature of a building in as little as 30 to 60 days.
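The loop of action, reaction and reward described above can be illustrated with a deliberately tiny Python sketch. This is not Arloid's Digital Twin: the class name, the one-line thermal model, every number and the reward formula are assumptions made purely to show the shape of the interaction, and the simple rule-based controller is only a placeholder for a trained agent.

```python
from dataclasses import dataclass

@dataclass
class ToyBuildingTwin:
    """Hypothetical one-room building with a crude thermal model."""
    indoor_temp: float = 17.0    # degrees C, invented starting point
    setpoint: float = 21.0       # target comfort temperature
    leak_rate: float = 0.1       # share of the indoor/outdoor gap lost per step
    hvac_power: float = 2.0      # degrees C added or removed per step
    energy_weight: float = 0.2   # how strongly energy use is penalised

    def step(self, action, outdoor_temp):
        """Apply an action (-1 cool, 0 off, +1 heat); return (temperature, reward)."""
        drift = self.leak_rate * (outdoor_temp - self.indoor_temp)
        self.indoor_temp += drift + self.hvac_power * action
        discomfort = abs(self.indoor_temp - self.setpoint)
        # Assumed reward: less discomfort and less energy use are both better.
        reward = -(discomfort + self.energy_weight * abs(action))
        return self.indoor_temp, reward

def simple_controller(temp, setpoint, band=0.5):
    """Placeholder policy; a trained agent would take the place of this rule."""
    if temp < setpoint - band:
        return 1
    if temp > setpoint + band:
        return -1
    return 0

def run_episode(outdoor_temps):
    """Run one simulated period and return the final temperature and total reward."""
    twin = ToyBuildingTwin()
    total_reward = 0.0
    for outdoor in outdoor_temps:
        action = simple_controller(twin.indoor_temp, twin.setpoint)
        _, reward = twin.step(action, outdoor)
        total_reward += reward
    return twin.indoor_temp, total_reward

# A cold day: 24 steps with a constant outdoor temperature of 5 degrees C.
final_temp, total = run_episode([5.0] * 24)
print(round(final_temp, 2), round(total, 2))
```

In a Reinforcement Learning setup, the agent would replace the placeholder rule and adjust its behaviour over many simulated runs to increase the total reward, which here means staying close to the comfort setpoint while switching the HVAC system on as little as possible.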

This is the future of AI, and it’s already here.

techUK - Committed to Climate Action

Digital transformation is critical to the decarbonisation journey of organisations in every sector. Across supply chains and sectors, industries are converging with tech partners to find innovations that reduce carbon emissions and unlock efficiencies that drive down energy use. techUK focuses on the application of emerging technologies and data-driven decision making in traditional forms of infrastructure to deliver innovative environmental outcomes. For more information on our Climate, Environment and Sustainability Programme, please visit our Climate Action Hub and click ‘contact us’.