Interactive Reinforcement Learning for Adaptive Thermal Comfort
A. Korkusuz (TU Delft - Electrical Engineering, Mathematics and Computer Science)
L. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)
P. Rutgers – Mentor (Next Sense)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Designing and implementing effective thermal comfort management systems for buildings is a complex task because subjective preference parameters, shaped by human physiology, bias, and tendencies, must be taken into account. This research introduces a novel approach to simulating human interactions for managing thermal comfort. Stochastic simulated humans provide feedback in the form of thermostat interactions, and their thermal comfort is inferred by converting these interactions into rewards, termed human rewards. Control policies are trained with either the human reward or the Predicted Mean Vote (PMV) reward using the Proximal Policy Optimization (PPO) algorithm. It is shown that the learning process can be guided solely by human rewards. Experimental results assess the impact of this simulated human reward system on the adaptability of the reinforcement learning model in single-occupant scenarios, using the PMV reward case as ground truth for comparison. The policy trained with the PMV reward keeps PMV values within the [-0.2, 0.2] range, while the policy trained with the human reward achieves a range of [-0.6, 0.6]. By simulating human feedback as thermostat interactions, the proposed model is shown to capture a rough estimate of human thermal preference. This research paves the way for using simulated humans in interactive reinforcement learning (RL) based thermal comfort control.
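To illustrate the core idea of converting thermostat interactions into rewards, the sketch below models a stochastic simulated occupant who adjusts the thermostat when the perceived temperature drifts outside a comfort band, and a reward function that penalises any adjustment. All names, thresholds, and the noise model here are illustrative assumptions, not the thesis's actual implementation.

```python
import random

def simulated_human_feedback(room_temp_c, preferred_temp_c=22.0,
                             tolerance_c=1.5, noise_sd=0.3):
    """Stochastic simulated occupant (illustrative sketch): interacts with
    the thermostat when the perceived temperature leaves a comfort band.

    Returns -1 (turns thermostat down), +1 (turns it up), or 0 (no action).
    """
    # Gaussian noise stands in for physiological variability in perception.
    perceived = room_temp_c + random.gauss(0.0, noise_sd)
    if perceived > preferred_temp_c + tolerance_c:
        return -1  # too warm -> occupant lowers the setpoint
    if perceived < preferred_temp_c - tolerance_c:
        return +1  # too cold -> occupant raises the setpoint
    return 0       # comfortable -> no interaction

def human_reward(interaction):
    """Convert a thermostat interaction into a scalar 'human reward':
    any adjustment signals discomfort and is penalised; no interaction
    is treated as implicit approval of the current state."""
    return 0.0 if interaction == 0 else -1.0
```

In an RL loop, an agent such as PPO would receive `human_reward(...)` at each step instead of a PMV-based reward, so the policy learns comfort purely from (simulated) occupant behaviour.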