Segmented Active Reward Learning

Abstract

The use of robotic systems outside the range of tasks currently common in industry requires the development of novel intelligent control methods. In this thesis we aim to improve on a recent machine learning method known as active reward learning. This method is able to teach a robotic system a task using human expert ratings of demonstrated robot trajectories. Current implementations of this method use information collected from complete trajectories without regard for time-specific features. This work incorporates time segmentation as a new feature in two extensions of the active reward learning framework. In the first extension, demonstrations are still rated over entire trajectories, leaving extraction of the important time segments to the learning algorithm. In the second extension, we allow the expert to rate trajectory segments directly. The two resulting algorithms are tested using a robot simulator. It is shown that these new methods are able to learn simple end-effector tasks using reasonable numbers of queries and rollouts.