Interactive Reinforcement Learning for Adaptive Thermal Comfort
A. Korkusuz (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)
P. Rutgers – Mentor (Next Sense)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Designing and implementing effective thermal comfort management systems for buildings is a complex task because subjective preference parameters, shaped by human physiology, bias, and tendencies, must be taken into account. This research introduces a novel approach to simulating human interactions for managing thermal comfort. Stochastic simulated humans provide feedback in the form of thermostat interactions, and their thermal comfort is inferred by converting these interactions into rewards, termed human rewards. Control policies are trained with either the human reward or the Predicted Mean Vote (PMV) reward using the Proximal Policy Optimization (PPO) algorithm. It is shown that the learning process can be guided solely by human rewards. Experimental results assess the impact of this simulated human reward system on the adaptability of the reinforcement learning model in single-occupant scenarios, using the PMV reward case as ground truth for comparison. The policy trained with the PMV reward keeps PMV values within the [-0.2, 0.2] range, while the policy trained with the human reward achieves a range of [-0.6, 0.6]. By simulating human feedback as thermostat interactions, the proposed model is shown to capture a rough estimate of human thermal preference. This research paves the way for using simulated humans in interactive reinforcement learning (RL) based thermal comfort control.
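To illustrate the core idea of converting thermostat interactions into rewards, the sketch below models a stochastic simulated occupant who adjusts the thermostat when the perceived temperature drifts outside a comfort band, and a reward function that penalises any adjustment. All names, thresholds, and the noise model here are illustrative assumptions, not the thesis's actual implementation.

```python
import random

def simulated_human_feedback(room_temp_c, preferred_temp_c=22.0,
                             tolerance_c=1.5, noise_sd=0.3):
    """Stochastic simulated occupant (illustrative sketch): interacts with
    the thermostat when the perceived temperature leaves a comfort band.

    Returns -1 (turns thermostat down), +1 (turns it up), or 0 (no action).
    """
    # Gaussian noise stands in for physiological variability in perception.
    perceived = room_temp_c + random.gauss(0.0, noise_sd)
    if perceived > preferred_temp_c + tolerance_c:
        return -1  # too warm -> occupant lowers the setpoint
    if perceived < preferred_temp_c - tolerance_c:
        return +1  # too cold -> occupant raises the setpoint
    return 0       # comfortable -> no interaction

def human_reward(interaction):
    """Convert a thermostat interaction into a scalar 'human reward':
    any adjustment signals discomfort and is penalised; no interaction
    is treated as implicit approval of the current state."""
    return 0.0 if interaction == 0 else -1.0
```

In an RL loop, an agent such as PPO would receive `human_reward(...)` at each step instead of a PMV-based reward, so the policy learns comfort purely from (simulated) occupant behaviour.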