Smart Start

A Directed and Persistent Exploration Framework for Reinforcement Learning


Abstract

An important problem in reinforcement learning is the exploration-exploitation dilemma. Especially in environments with sparse or misleading rewards, it has proven difficult to construct a good exploration strategy. Good exploration strategies have been devised for discrete domains, but they are often nontrivial to apply to more complex domains with continuous states and/or actions.
In this work, a novel persistent and directed exploration framework, called Smart Start, is developed. Usually, a reinforcement learning agent executes its learned policy with some exploration strategy from the start until the end of an episode, which we call "normal" learning. The idea of Smart Start is to split a reinforcement learning episode into two parts: the Smart Start phase and the "normal" learning phase. The initial Smart Start phase guides the agent to the region in which the agent expects to learn the most. This region is constructed from previous experiences, and the guidance is done using a model-based planning or trajectory optimization method. Once the agent arrives at the region, it continues its "normal" reinforcement learning. This approach leaves the performance of the underlying reinforcement learning algorithm unchanged, but augments it with persistent and directed exploration.
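To make the episode split concrete, the following is a minimal Python sketch of one episode under Smart Start. All interfaces (env, agent, model.plan, visit_counts) are hypothetical stand-ins rather than the thesis' actual code, and the smart-start region is crudely approximated here by the least-visited state; the thesis constructs it from previous experiences.

```python
import random
from collections import defaultdict

def run_episode_with_smart_start(env, agent, model, visit_counts,
                                 smart_start_prob=0.5, max_steps=200):
    """One episode split into a Smart Start phase and a "normal" phase.

    Hypothetical interfaces (not the thesis code):
      env.reset() -> state, env.step(a) -> (state, reward, done)
      agent.act(s) -> action, agent.update(s, a, r, s2)
      model.plan(start, goal) -> list of actions leading to goal
    visit_counts is a defaultdict(int) over previously seen states.
    """
    state = env.reset()
    steps = 0

    # Smart Start phase: guide the agent toward a promising region,
    # approximated here by the least-visited known state.
    if visit_counts and random.random() < smart_start_prob:
        goal = min(visit_counts, key=visit_counts.get)
        for action in model.plan(state, goal):
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state)
            visit_counts[next_state] += 1
            state, steps = next_state, steps + 1
            if done or steps >= max_steps:
                return

    # "Normal" learning phase: the agent's own exploration strategy.
    while steps < max_steps:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)
        visit_counts[next_state] += 1
        state, steps = next_state, steps + 1
        if done:
            return
```

Because the agent keeps updating during the Smart Start phase, the guided transitions are not wasted: they feed the same learning algorithm that runs in the "normal" phase.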
The Smart Start framework was evaluated with three reinforcement learning algorithms: a simple model-based reinforcement learning algorithm (MBRL), R-max, and Q-Learning with epsilon-greedy, Boltzmann, and UCB1 exploration. The evaluation was done on four discrete gridworld environments: three with sparse rewards and one with misleading rewards. We show that the performance of Q-Learning with Smart Start is comparable to that of R-max, which performs near-optimally in the chosen scenarios. The MBRL algorithm with Smart Start even outperforms R-max in some of the problems. We conclude that Smart Start is an effective exploration framework that can be combined with any reinforcement learning algorithm.
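For reference, the three exploration strategies paired with Q-Learning in the evaluation can be sketched as follows. This is the generic textbook formulation of each rule in Python, not the evaluation code; the names Q and counts and the hyperparameter defaults are illustrative.

```python
import math
import random

def select_action(Q, s, n_actions, counts, strategy,
                  eps=0.1, tau=1.0, c=2.0):
    """Textbook action selection for the three strategies named above.
    Q[(s, a)] holds value estimates; counts[(s, a)] holds visit counts."""
    actions = range(n_actions)
    if strategy == "eps-greedy":
        # With probability eps, explore uniformly; otherwise act greedily.
        if random.random() < eps:
            return random.randrange(n_actions)
        return max(actions, key=lambda a: Q[(s, a)])
    if strategy == "boltzmann":
        # Sample proportionally to exp(Q / tau): softmax exploration.
        prefs = [math.exp(Q[(s, a)] / tau) for a in actions]
        return random.choices(list(actions), weights=prefs)[0]
    if strategy == "ucb1":
        # Greedy on Q plus an optimistic bonus for rarely tried actions.
        n_s = sum(counts[(s, a)] for a in actions) + 1
        return max(actions, key=lambda a: Q[(s, a)]
                   + c * math.sqrt(math.log(n_s) / (counts[(s, a)] + 1)))
    raise ValueError(f"unknown strategy: {strategy}")
```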
