Reinforcement Learning of Visual Features

Title: Reinforcement Learning of Visual Features
Author: Blom, W.B.
Contributor: Jonker, P.P. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: Biomechanical Engineering
Date: 2016-05-27

Abstract
The digital environment contains an ever-increasing number of smart programs, programs that also get smarter every day. They help us filter spam e-mail and they adjust to show us personalized advertisements. These smart programs observe people and serve (other) people. A robot can be seen as a program with a body: make the program smart enough and it can help us in the real world too. The smartest programs learn from observations to become better at what they do.

Reinforcement Learning (RL) is a type of learning that has been successfully applied to a variety of learning tasks. RL is learning from experience, in the form of sensory changes and rewards. A robot that uses RL tries to optimize the actions it takes so as to achieve the maximum reward. Most RL algorithms do not scale well to large sensory inputs, and images are very large inputs because each pixel is an input. Therefore, algorithms have been created to compress the visual information into abstract representations (visual features). Neural Q-learning is such a method: it combines the RL algorithm of Q-learning with Artificial Neural Networks (ANNs). ANNs are networks of neurons that each perform a small, adjustable calculation. The network can transform the input into more abstract or useful information, and it can learn by adjusting and optimizing these calculations until the network produces the desired transformation. An ANN is therefore a good way to find complex transformations that make visual data more abstract and more compressed.

In this thesis, Deep Q-learning is tested on a more difficult task combined with a higher world complexity. In the original paper it was tested on ATARI 2600 games and achieved good results.
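The Q-learning rule the abstract refers to can be sketched in its tabular form as below; Deep/Neural Q-learning replaces the table with a neural network that maps an image observation to action values. This is an illustrative sketch under standard Q-learning assumptions, not code from the thesis; all names and parameter values are hypothetical.

```python
# Minimal sketch of the tabular Q-learning update (illustrative only).
# Neural Q-learning replaces the Q table with an ANN over image inputs.
from collections import defaultdict

ALPHA = 0.1  # learning rate
GAMMA = 0.9  # discount factor; governs how strongly delayed rewards count

Q = defaultdict(float)  # maps (state, action) -> estimated return

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return Q[(state, action)]

# Example: a single update on a zero-initialized table.
value = q_update(state=0, action="right", reward=1.0, next_state=1,
                 actions=["left", "right"])
print(value)  # 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

The discount factor GAMMA is the lever most relevant to the thesis finding: with long-delayed rewards, the reinforcement signal reaching early states is heavily discounted, which is one reason large visual inputs and delayed rewards combine badly.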
In this thesis, Deep Q-learning is tested on a transportation task in a 3D simulation where the learner only has a relatively large first-person-perspective image from the robot it controls. The results show that the complexity of the visual information and the relatively long-delayed reinforcements cause an initialization-noise to reinforcement-signal ratio such that the learner was unable to converge the neural network toward beneficial behavior. What was learned was forgotten faster than the learner could replay the useful experiences. It can be concluded that simply scaling up the environment complexity with the Neural Q-learning algorithm is not possible: the learning algorithm needs an extension that makes it better able to handle long-delayed rewards with large visual inputs.

Subject: reinforcement learning; visual features; neural networks; unsupervised learning; convolutional neural networks
To reference this document use: http://resolver.tudelft.nl/uuid:2402256c-8713-4737-9c47-b97a43558705
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2016 Blom, W.B.
Files: PDF ThesisWBBlom_V1.9.pdf (3.69 MB)