MARL-iDR: Multi-Agent Reinforcement Learning for Incentive-based Residential Demand Response

Author: van Tilburg, Jasper (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Cavalcante Siebert, L. (mentor); Cremer, Jochen (mentor)
Degree granting institution: Delft University of Technology
Date: 2021-11-30

Abstract:
Distribution System Operators (DSOs) are responsible for preventing grid congestion while accounting for growing demand and the intermittent nature of renewable energy resources. Incentive-based demand response programs promise real-time flexibility to relieve grid congestion. To include residential consumers in these programs, aggregators can financially incentivize participants to reduce their energy demand and make the aggregated energy reduction available to DSOs. A key challenge for aggregators is to coordinate the heterogeneous preferences of multiple participants while preserving their privacy. This thesis proposes MARL-iDR, a decentralized Multi-Agent Reinforcement Learning approach to an incentive-based demand response program. The approach respects participants' privacy and preferences and makes decisions in real time when deployed. The aggregator and each participant are controlled by Deep Reinforcement Learning agents that learn to maximize their reward. The aggregator agent learns a policy that dispatches suitable incentives to participants based on total energy demand and a target reduction, while minimizing financial costs. Each participant agent learns to respond to these incentives by reducing consumption to a fraction of its original demand: it curtails or shifts requested household appliances according to the selected consumption reduction using a novel Disjunctively Constrained Knapsack Problem optimization, while minimizing residents' dissatisfaction.
A case study with real-world electricity data from 25 households demonstrates the approach's capability to induce demand-side flexibility. The approach is compared to the case without demand response and to a centralized myopic baseline. A 9% reduction of the Peak-to-Average Ratio (PAR) was achieved compared to the original PAR without demand response.

Subjects: Reinforcement Learning (RL); Demand Response; Multi-Agent Systems
To reference this document use: http://resolver.tudelft.nl/uuid:4080723c-1e96-4a03-a40c-d7b74254e9d0
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Jasper van Tilburg
Files: Master_Thesis_JaspervanTilburg.pdf (PDF, 5.57 MB)
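The Peak-to-Average Ratio reported in the abstract is the standard metric: peak demand divided by average demand over the horizon. A minimal sketch, with illustrative (non-thesis) data:

```python
def peak_to_average_ratio(demand):
    """PAR = peak demand divided by average demand over the horizon."""
    return max(demand) / (sum(demand) / len(demand))

# Hypothetical hourly aggregate demand in kW, for illustration only.
original = [30, 32, 35, 60, 58, 40, 33, 31]
par = peak_to_average_ratio(original)
# A 9% reduction, as reported in the case study, would mean the
# post-demand-response PAR is roughly 0.91 times this value.
```

Lowering PAR means the load profile is flatter, which is precisely the demand-side flexibility the aggregator tries to induce with its incentives.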