Search results | TU Delft Repositories

Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Cheng, Ji (author), Xue, Bo (author), Jiaxiang, Y. (author), Zhang, Qingfu (author)

The Multi-objective Stochastic Linear Bandit (MOSLB) plays a critical role in the sequential decision-making paradigm; however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect...

journal article 2024

Conflict Resolution at High Traffic Densities with Reinforcement Learning

Ribeiro, M.J. (author)

Increasing delays and congestion reported in many aviation sectors indicate that the current centralised operational model is rapidly approaching saturation levels. The Air Traffic Control (ATC) system is not expected to keep pace with the ever-increasing demand for air transportation. Its capacity is still limited by the available controllers, and...

doctoral thesis 2023

CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration

Yang, Q. (author), Spaan, M.T.J. (author)

Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration will inevitably bring more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe efficient exploration when the task is unknown. In this paper, we propose a...

conference paper 2023

Persuading to Prepare for Quitting Smoking with a Virtual Coach: Using States and User Characteristics to Predict Behavior

Albers, N. (author), Neerincx, M.A. (author), Brinkman, W.P. (author)

Despite their prevalence in eHealth applications for behavior change, persuasive messages tend to have small effects on behavior. Conditions or states (e.g., confidence, knowledge, motivation) and characteristics (e.g., gender, age, personality) of persuadees are two promising components for more effective algorithms for choosing persuasive...

conference paper 2023

Policy Analysis of Safe Vertical Manoeuvring using Reinforcement Learning: Identifying when to Act and when to stay Idle

Groot, D.J. (author), Ribeiro, M.J. (author), Ellerbroek, Joost (author), Hoekstra, J.M. (author)

The number of unmanned aircraft operating in the airspace is expected to grow exponentially during the next decades. This will likely lead to traffic densities that are higher than those currently observed in civil and general aviation, and might require both a different airspace structure compared to conventional aviation, as well as different...

conference paper 2023

qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation

Van Der Linde, Stan (author), De Kok, Willem (author), Bontekoe, Tariq (author), Feld, S. (author)

Compiling a quantum circuit for specific quantum hardware is a challenging task. Moreover, current quantum computers have severe hardware limitations. To make the most of the limited resources, the compilation process should be optimized. To improve current methods, Reinforcement Learning (RL), a technique in which an agent interacts...

conference paper 2023

MARL-iDR: Multi-Agent Reinforcement Learning for Incentive-Based Residential Demand Response

van Tilburg, Jasper (author), Cavalcante Siebert, L. (author), Cremer, Jochen (author)

This paper presents a decentralized Multi-Agent Reinforcement Learning (MARL) approach to an incentive-based Demand Response (DR) program, which aims to maintain the capacity limits of the electricity grid and prevent grid congestion by financially incentivizing residential consumers to reduce their energy consumption. The proposed approach...

conference paper 2023

Improved DQN-Based Computation Offloading Algorithm in MEC Environment

Zhao, Zheyu (author), Cheng, H. (author), Xu, Xiaohua (author)

Massive numbers of terminal users have created an explosive need for data residing at the edge of the overall network. Multiple Mobile Edge Computing (MEC) servers are built in or near base stations to meet this need. However, the optimal distribution of these servers to multiple users in real time is still a problem. Reinforcement Learning (RL) as a framework to solve...

conference paper 2023

Interaction-Aware Motion Planning in Crowded Dynamic Environments

Ferreira de Brito, B.F. (author)

Autonomous robots will profoundly impact our society, making our roads safer, reducing labor costs and carbon dioxide (CO2) emissions, and improving our life quality. However, to make that happen, robots need to navigate among humans, which is extremely difficult. Firstly, humans do not explicitly communicate their intentions and use intuition...

doctoral thesis 2022

Models and heuristics for hard routing and knapsack problems

Pierotti, J. (author)

One of the world’s biggest challenges is that living beings have to share a limited amount of resources. As people of science, we strive to find innovative ways to better use these resources, to reach and positively affect more and more people. In the field of optimization, we aim at finding an optimal allocation of limited sets of resources to...

doctoral thesis 2022

Optimal dispatch of PV inverters in unbalanced distribution systems using Reinforcement Learning

Vergara Barrios, P.P. (author), Salazar, Mauricio (author), Giraldo, Juan S. (author), Palensky, P. (author)

In this paper, a Reinforcement Learning (RL)-based approach to optimally dispatch PV inverters in unbalanced distribution systems is presented. The proposed approach exploits a decentralized architecture in which PV inverters are operated by agents that perform all computational processes locally; while communicating with a central agent to...

journal article 2022

Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

conference paper 2022

Lateral and Vertical Air Traffic Control Under Uncertainty Using Reinforcement Learning

Badea, C. (author), Groot, D.J. (author), Morfin Veytia, A. (author), Ribeiro, M.J. (author), Dalmau, Ramon (author), Ellerbroek, Joost (author), Hoekstra, J.M. (author)

Air traffic demand has increased at an unprecedented rate in the last decade (albeit interrupted by the COVID pandemic), but capacity has not increased at the same rate. Higher levels of automation and the implementation of decision-support tools for air traffic controllers could help increase capacity and catch up with demand. The air traffic...

conference paper 2022

Robust Event-Driven Interactions in Cooperative Multi-agent Learning

Jarne Ornia, D. (author), Mazo, M. (author)

We present an approach to safely reduce the communication required between agents in a Multi-Agent Reinforcement Learning system by exploiting the inherent robustness of the underlying Markov Decision Process. We compute robustness certificate functions (off-line), that give agents a conservative indication of how far their state measurements...

conference paper 2022

Event-Based Communication in Distributed Q-Learning

Jarne Ornia, D. (author), Mazo, M. (author)

We present an approach to reduce the communication of information needed in a Distributed Q-Learning system, inspired by Event Triggered Control (ETC) techniques. We consider a baseline scenario of a Distributed Q-Learning problem on a Markov Decision Process (MDP). Following an event-based approach, N agents sharing a value function explore the...

conference paper 2022

Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Tang, Shi Yuan (author), Oliehoek, F.A. (author), Irissappane, Athirai A. (author), Zhang, Jie (author)

The Cross-Entropy Method (CEM) is a gradient-free direct policy search method, which has greater stability and is insensitive to hyperparameter tuning. CEM bears similarity to population-based evolutionary methods, but, rather than using a population, it uses a distribution over candidate solutions (policies in our case). Usually, a natural...

conference paper 2021

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

Li, Guangliang (author), Whiteson, Shimon (author), Dibeklioğlu, Hamdi (author), Hung, H.S. (author)

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this paper, we investigate the potential of agent learning from trainers’ facial expressions via...

conference paper 2021

Reinforcement learning for hyperparameter tuning in deep learning-based side-channel analysis

Rijsdijk, J. (author), Wu, L. (author), Perin, G. (author), Picek, S. (author)

Deep learning represents a powerful set of techniques for profiling side-channel analysis. The results in the last few years show that neural network architectures like multilayer perceptron and convolutional neural networks give strong attack performance where it is possible to break targets protected with various countermeasures....

journal article 2021

General-Sum Multi-Agent Continuous Inverse Optimal Control

Muench, C. (author), Oliehoek, F.A. (author), Gavrila, D. (author)

Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine...

journal article 2021

Transient non-stationarity and generalisation in deep reinforcement learning

Igl, Maximilian (author), Farquhar, Gregory (author), Luketina, Jelena (author), Böhmer, J.W. (author), Whiteson, Shimon (author)

Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually...

conference paper 2021
