K.P. Tuyls | TU Delft Repository

A comparative study of bug algorithms for robot navigation

Journal article (2019) - K. N. McGuire, G. C.H.E. de Croon, K. Tuyls

This paper presents a literature survey and a comparative study of Bug Algorithms, with the goal of investigating their potential for robotic navigation. At first sight, these methods seem to provide an efficient navigation paradigm, ideal for implementations on tiny robots with limited resources. Closer inspection, however, shows that many of these Bug Algorithms assume perfect global position estimate of the robot which in GPS-denied environments implies considerable expenses of computation and memory — relying on accurate Simultaneous Localization And Mapping (SLAM) or Visual Odometry (VO) methods. We compare a selection of Bug Algorithms in a simulated robot and environment where they endure different types noise and failure-cases of their on-board sensors. From the simulation results, we conclude that the implemented Bug Algorithms’ performances are sensitive to many types of sensor-noise, which was most noticeable for odometry-drift. This raises the question if Bug Algorithms are suitable for real-world, on-board, robotic navigation as is. Variations that use multiple sensors to keep track of their progress towards the goal, were more adept in completing their task in the presence of sensor-failures. This shows that Bug Algorithms must spread their risk, by relying on the readings of multiple sensors, to be suitable for real-world deployment. ...

Experience selection in deep reinforcement learning for control

Journal article (2018) - Tim De Bruin, Jens Kober, Karl Tuyls, Robert Babuška

Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy. ...

Integrating state representation learning into deep reinforcement learning

Journal article (2018) - Tim de Bruin, Jens Kober, Karl Tuyls, Robert Babuska

Most deep reinforcement learning techniques are unsuitable for robotics, as they require too much interaction time to learn useful, general control policies. This problem can be largely attributed to the fact that a state representation needs to be learned as a part of learning control policies, which can only be done through fitting expected returns based on observed rewards. While the reward function provides information on the desirability of the state of the world, it does not necessarily provide information on how to distill a good, general representation of that state from the sensory observations. State representation learning objectives can be used to help learn such a representation. While many of these objectives have been proposed, they are typically not directly combined with reinforcement learning algorithms. We investigate several methods for integrating state representation learning into reinforcement learning. In these methods, the state representation learning objectives help regularize the state representation during the reinforcement learning, and the reinforcement learning itself is viewed as a crucial state representation learning objective and allowed to help shape the representation. Using autonomous racing tests in the TORCS simulator, we show how the integrated methods quickly learn policies that generalize to new environments much better than deep reinforcement learning without state representation learning. ...

Improved deep reinforcement learning for robotics through distribution-based experience retention

Conference paper (2016) - Tim de Bruin, Jens Kober, Karl Tuyls, Robert Babuska

Recent years have seen a growing interest in the use of deep neural networks as function approximators in reinforcement learning. In this paper, an experience replay method is proposed that ensures that the distribution of the experiences used for training is between that of the policy and a uniform distribution. Through experiments on a magnetic manipulation task it is shown that the method reduces the need for sustained exhaustive exploration during learning. This makes it attractive in scenarios where sustained exploration is in-feasible or undesirable, such as for physical systems like robots and for life long learning. The method is also shown to improve the generalization performance of the trained policy, which can make it attractive for transfer learning. Finally, for small experience databases the method performs favorably when compared to the recently proposed alternative of using the temporal difference error to determine the experience sample distribution, which makes it an attractive option for robots with limited memory capacity. ...

Off-policy experience retention for deep actor-critic learning

Conference paper (2016) - Tim de Bruin, Jens Kober, K.P. Tuyls, Robert Babuska

When a limited number of experiences is kept in memory to train a reinforcement learning agent, the criterion that determines which experiences are retained can have a strong impact on the learning performance. In this paper, we argue that for actor critic learning in domains with significant momentum, it is important to retain experiences with off-policy actions when the amount of exploration is reduced over time. This claim is supported by simulation experiments with a pendulum swing-up problem and a magnetic manipulation task. Additionally, we compare our strategy to database overwriting policies based on obtaining experiences spread out over the state-action space, and also to using the temporal difference error as a proxy for the value of experiences. ...

Bayesian inference in dynamic domains using Logical OR gates

Conference paper (2016) - R. Claessens, A. de Waal, P. de Villiers, Ate Penders, G. Pavlin, Karl Tuyls

The range of applications that require processing of temporally and spatially distributed sensory data is expanding. Common challenges in domains with these characteristics are sound reasoning about uncertain phenomena and coping with the dynamic nature of processes that influence these phenomena. To address these challenges we propose the use of causal Bayesian Networks for probabilistic reasoning and introduce the Logical OR gate in order to combine them with dynamic processes estimated by arbitrary Markov processes. To illustrate the genericness of the proposed approach, we apply it in a wildlife protection use case. Furthermore we show that the resulting model supports modularization of computations, which allows for efficient decentralized processing. ...

The importance of experience replay database composition in deep reinforcement learning

Conference paper (2015) - Tim de Bruin, Jens Kober, K.P. Tuyls, Robert Babuska

Recent years have seen a growing interest in the use of deep neural networks as function approximators in reinforcement learning. This paper investigates the potential of the Deep Deterministic Policy Gradient method for a robot control problem both in simulation and in a real setup. The importance of the size and composition of the experience replay database is investigated and some requirements on the distribution over the state-action space of the experiences in the database are identified. Of particular interest is the importance of negative experiences that are not close to an optimal policy. It is shown how training with samples that are insufficiently spread over the state-action space can cause the method to fail, and how maintaining the distribution over the state-action space of the samples in the experience database can greatly benefit learning. ...