AUTHOR=Chen Bingqing , Cai Zicheng , Bergés Mario TITLE=Gnu-RL: A Practical and Scalable Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy JOURNAL=Frontiers in Built Environment VOLUME=6 YEAR=2020 URL=https://www.frontiersin.org/journals/built-environment/articles/10.3389/fbuil.2020.562239 DOI=10.3389/fbuil.2020.562239 ISSN=2297-3362 ABSTRACT=

We propose Gnu-RL: a novel approach that enables real-world deployment of reinforcement learning (RL) for building control and requires no prior information other than historical data from existing Heating Ventilation and Air Conditioning (HVAC) controllers. In contrast with existing RL agents, which are opaque to expert interrogation and need ample training to achieve reasonable control performance, Gnu-RL is much more attractive for real-world applications. Furthermore, Gnu-RL avoids the need to develop and calibrate simulation environments for each building or system under control, thus making it highly scalable. To achieve this, we bootstrap the Gnu-RL agent with domain knowledge and expert demonstration. Specifically, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, enabling it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL in both simulation studies and a real-world testbed. Firstly, we validated the Gnu-RL agent is indeed practical and scalable by applying it to three commercial reference buildings in simulation, and demonstrated the superiority of the Differentiable MPC policy compared to a generic neural network. In another simulation experiment, our approach saved 6.6% of the total energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Finally, Gnu-RL was deployed to control the airflow to a real-world conference room with a Variable-Air-Volume (VAV) system during a 3-weeks period. Our results show that Gnu-RL reduced cooling demand by 16.7% compared to the existing controller and tracked the temperature set-point better.