J. Smit
Please note: This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
Know what it does not know
Improving Offline Deep Reinforcement Learning with Uncertainty Estimation
Offline reinforcement learning, or learning from a fixed data set, is an attractive alternative to online reinforcement learning. It promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional reinforcement learning that makes it difficult to apply to real-world problems. However, when offline reinforcement learning is naïvely applied to a fixed data set, the resulting policy may perform poorly in the real environment. This happens because the expected return is overestimated for state-action pairs that are not sufficiently covered in the data set. Therefore, offline reinforcement learning agents must know what they do not know, allowing them to avoid these overestimated state-action pairs and their potentially erroneous outcomes. A promising way to instill this ability in offline reinforcement learning agents is the pessimism principle, which states that agents should select actions that maximize an uncertainty-based lower bound of the expected return. This pessimism principle has drastically improved the performance of offline reinforcement learning methods in the tabular and linear function approximation domains. However, in deep reinforcement learning, uncertainty estimation is highly non-trivial, and the development of effective uncertainty-based pessimistic algorithms remains an open question. That is why, in this thesis, we explore various existing deep learning-based uncertainty estimation techniques with the aim of combining them with existing deep reinforcement learning methods to create an uncertainty-aware offline deep reinforcement learning algorithm. This research has resulted in two novel offline deep reinforcement learning methods built on Double Deep Q-Learning and Soft Actor-Critic. We applied these methods to various benchmarks and experiments to demonstrate their interesting and unique properties. In some situations, they even beat the current state-of-the-art results on these benchmarks.
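The pessimism principle described in the abstract can be illustrated with a small sketch. This is not the thesis's method: it assumes, purely for illustration, that uncertainty is measured as the spread across an ensemble of return estimates, and that the lower bound is the ensemble mean minus `beta` times the standard deviation. The function name `pessimistic_action`, the parameter `beta`, and the example numbers are all hypothetical.

```python
from statistics import mean, pstdev


def pessimistic_action(q_estimates, beta=1.0):
    """Pick the action with the highest lower bound on expected return.

    q_estimates[k][a] is the return that (hypothetical) model k predicts
    for action a in the current state. The lower bound used here,
    mean - beta * std over the ensemble, is an illustrative assumption.
    """
    n_actions = len(q_estimates[0])
    lower_bounds = []
    for a in range(n_actions):
        values = [model[a] for model in q_estimates]
        lower_bounds.append(mean(values) - beta * pstdev(values))
    best = max(range(n_actions), key=lambda a: lower_bounds[a])
    return best, lower_bounds


# Made-up estimates from three models for two actions.
# Action 1 has the higher average, but the models disagree strongly,
# as could happen for a state-action pair that is poorly covered in the data.
q_estimates = [
    [1.0, 3.0],
    [1.1, 0.0],
    [0.9, 3.0],
]

action, bounds = pessimistic_action(q_estimates, beta=1.0)
greedy_action, _ = pessimistic_action(q_estimates, beta=0.0)
print("pessimistic choice:", action)
print("greedy choice:", greedy_action)
```

With `beta=0.0` the rule reduces to picking the highest average estimate, so the two choices can differ whenever the ensemble disagrees strongly about an action.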
Bachelor thesis (2019)
Jordi Smit, Matthijs van Niekerk, Robin Oosterbaan, Daniël van Gelder, Stephan Tromer, K. F. Chan, Asterios Katsifodimos, Otto Visser
Scenwise is a business working on innovative and sophisticated solutions in the domain of traffic management. Leveraging data science and IT systems, Scenwise delivers products to institutions to facilitate efficient traffic management. To manage the highly complex road network, traffic managers need to use and analyze data collected all across the network in order to support decision makers in managing it. However, there is often a mismatch in expertise between traffic management experts and decision makers. Traffic management experts use highly technical visualization techniques that require significant background knowledge of the traffic management domain. In addition, these visualization techniques are spread out over a multitude of systems that do not work together. To bridge the knowledge gap, a product is needed that allows experts to extract and visualize relevant data using their traffic domain knowledge while providing intuitive visualizations that are clear to both experts and non-experts. The ultimate goal of this product is to facilitate efficient traffic management and so improve the lives of commuters by contributing to a better-organized infrastructure. Our project group has designed and implemented a product for Scenwise that offers this solution. A web-based application has been created that retrieves and stores traffic data. The product is able to traverse the road network and provide helpful insights into the traffic network’s state at either the present moment or moments in the past. The application is able to provide dynamic traffic contour plots, draw fundamental diagrams, show live traffic intensity over the entire Dutch road network, and provide information related to traffic events such as accidents and matrix sign states. The product does all of this while providing a seamless and intuitive user interface. The system was designed and implemented over a span of ten weeks by a group of five students. A Scrum methodology was adopted, and through careful discussion with the client and a continuous feedback loop, a product was delivered that fits both the client's needs and the wider product vision that has been defined.