Reinforcement Learning based Energy Management System for Smart Buildings

Title: Reinforcement Learning based Energy Management System for Smart Buildings
Author: van den Bovenkamp, Nick (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Vergara Barrios, P.P. (mentor)
Degree granting institution: Delft University of Technology
Date: 2022-03-14

Abstract:
Smart buildings, which combine photovoltaic (PV) generation, controllable electricity consumption, and a battery energy storage system, are expected to play a crucial role in balancing supply and demand in future power systems. The energy management systems (EMS) in these smart buildings control the operation of these components. Achieving an optimal dynamic control strategy remains challenging due to the stochastic nature of PV generation, electricity consumption patterns, and market prices. This research therefore develops an EMS that minimizes day-ahead electricity costs using reinforcement learning (RL) with linear function approximation. The proposed Q-learning with tile coding (QLTC) EMS is compared against the solutions of a deterministic mixed-integer linear programming (MILP) model, which is used to validate whether the proposed approach reaches good-quality solutions. Furthermore, the QLTC's generalization capabilities are evaluated, a feature missing in the literature. A case study on an industrial manufacturing company in the Netherlands, using historical electricity consumption, PV generation, and wholesale electricity prices, is carried out to examine the QLTC EMS's performance. The results show that the QLTC's returns converge consistently to the MILP's (negative) electricity costs, indicating that the QLTC reaches a good-quality control policy. The EMS effectively shifts its power consumption to favorable price moments during one week of operation, in which the electricity costs achieved by the QLTC come within 99% of the MILP's electricity costs. Furthermore, the results demonstrate that the QLTC approach can deploy a decent control policy without having encountered the exact day of data, by generalizing from previously learned control policies. On average, it deploys a control policy that reaches 80% of the MILP's optimum on a test week after being trained on 85 days of data.

To reference this document use: http://resolver.tudelft.nl/uuid:b44027f9-acf7-443b-8523-0b2283539952
Part of collection: Student theses
Document type: master thesis
Rights: © 2022 Nick van den Bovenkamp
Files: PDF Thesis_Report_Nick_van_de ... enkamp.pdf (21.64 MB)
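The abstract describes Q-learning with tile coding (QLTC), i.e. Q-learning on top of a linear function approximator built from overlapping tilings of the state space. The sketch below illustrates that general technique only; it is not the thesis' implementation. The state (hour of day, battery state of charge), the three actions (charge / idle / discharge), and all hyperparameters are illustrative assumptions, and the reward and environment dynamics from the thesis are not reproduced.

```python
import numpy as np

class TileCoder:
    """Maps a continuous state scaled to [0, 1]^d onto several offset grid tilings."""
    def __init__(self, n_tilings=8, tiles_per_dim=10, dims=2, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.dims = dims
        # Each tiling is shifted by a random fraction of one tile width.
        self.offsets = rng.uniform(0, 1.0 / tiles_per_dim, (n_tilings, dims))
        self.features_per_tiling = tiles_per_dim ** dims
        self.n_features = n_tilings * self.features_per_tiling

    def active_features(self, state):
        """Return one active (binary) feature index per tiling."""
        idx = []
        for t in range(self.n_tilings):
            coords = np.floor((np.asarray(state) + self.offsets[t]) * self.tiles_per_dim)
            coords = np.clip(coords, 0, self.tiles_per_dim - 1).astype(int)
            flat = np.ravel_multi_index(coords, (self.tiles_per_dim,) * self.dims)
            idx.append(t * self.features_per_tiling + flat)
        return np.array(idx)

class QLTCAgent:
    """Q-learning with a linear Q-function over tile-coded features."""
    def __init__(self, coder, n_actions=3, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.coder = coder
        self.n_actions = n_actions
        self.alpha = alpha / coder.n_tilings   # step size split over tilings
        self.gamma = gamma
        self.epsilon = epsilon
        self.w = np.zeros((n_actions, coder.n_features))

    def q(self, features, action):
        # Q(s, a) is the sum of the weights of the active tiles for that action.
        return self.w[action, features].sum()

    def act(self, state):
        # Epsilon-greedy action selection.
        features = self.coder.active_features(state)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax([self.q(features, a) for a in range(self.n_actions)]))

    def update(self, state, action, reward, next_state, done):
        # Semi-gradient Q-learning update; the gradient is 1 on the active tiles.
        features = self.coder.active_features(state)
        target = reward
        if not done:
            next_features = self.coder.active_features(next_state)
            target += self.gamma * max(self.q(next_features, a) for a in range(self.n_actions))
        td_error = target - self.q(features, action)
        self.w[action, features] += self.alpha * td_error
```

In such a setup the reward at each step would typically be the negative electricity cost of the chosen (dis)charge action at the current market price, so that maximizing return corresponds to minimizing day-ahead electricity costs, as the abstract describes.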