Handling the unknown

Towards on-the-job object recognition

Abstract

As robots become a more integral part of our daily lives, it is important to ensure that they operate safely and efficiently. A large part of perceiving the environment is done through robot vision. Research in computer vision and machine learning has led to great improvements over the past decades, and robots are able to outperform humans on certain tasks. However, these tasks are often evaluated under a closed set condition, which makes translation to real world applications challenging, as the real world is an open set condition. The open set condition implies that only incomplete knowledge of the world is available at training time, and it is important for a robot, or agent, to be aware of this limitation. In vision tasks this is defined as open world recognition, which allows an agent to detect and incrementally learn unknown objects. The key contributions of this report are an autonomous data collection protocol for synthetic data creation, the open world algorithm Learning to Accept Image Classes (L2AIC), and an on-the-job recognition approach that combines open world recognition with autonomous data collection. L2AIC is a deep meta-learning model that classifies objects by comparing them to its memory in an n-way k-shot manner; new classes can be incrementally added to the memory without retraining the model. The autonomous data collection protocol consists of two steps: first, a 3D model of an unknown object is reconstructed with an RGB-D camera; second, a synthetic dataset is created from this 3D model and added to the memory of L2AIC. Results show that the on-the-job recognition approach successfully learns to recognize a single unknown object using the L2AIC model with the small-fc architecture and a ResNet152 encoder, loaded with weights pretrained on the ImageNet dataset. No additional fine-tuning is required; in fact, fine-tuning has an adverse effect on performance. Using the autonomous data collection protocol, two datasets were created that vary the distance from the camera to the object. The synthetic dataset containing close-up images achieves the best performance, and L2AIC performs similarly with this close-up synthetic dataset as with a dataset of actual images of the object. This means that, for a single object, it is currently not efficient to create a synthetic dataset; a more extensive study is required to compare the performance of both datasets as the number of encountered objects increases. Finally, the on-the-job approach is capable of recognizing other instances of the same object class using only the dataset of a single instance, which shows that the on-the-job recognition approach generalizes well. Future work could focus on improving this generalization with, for example, the help of Generative Adversarial Networks (GANs).
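To make the memory-based, incremental classification idea concrete, the sketch below illustrates the general pattern of comparing an encoded query against stored class prototypes and rejecting low-similarity inputs as unknown. It is a minimal, hypothetical illustration rather than the actual L2AIC implementation: the MemoryClassifier class, the prototype averaging, the cosine-similarity scoring, and the rejection threshold are all assumptions made for this example, and a toy stand-in encoder replaces the ImageNet-pretrained ResNet152.

```python
# Hypothetical sketch of memory-based, open-world n-way k-shot classification.
# Not the author's L2AIC code; names and mechanics are illustrative only.
import numpy as np

class MemoryClassifier:
    def __init__(self, encoder, reject_threshold=0.8):
        self.encoder = encoder            # stand-in for a frozen, pretrained encoder
        self.reject_threshold = reject_threshold
        self.memory = {}                  # class name -> normalized prototype embedding

    def add_class(self, name, support_images):
        """Incrementally add a class from k support images, without retraining."""
        embeddings = np.stack([self.encoder(img) for img in support_images])
        prototype = embeddings.mean(axis=0)
        self.memory[name] = prototype / np.linalg.norm(prototype)

    def classify(self, image):
        """Return the best-matching known class, or 'unknown' if nothing is close."""
        if not self.memory:
            return "unknown", 0.0
        query = self.encoder(image)
        query = query / np.linalg.norm(query)
        scores = {name: float(query @ proto) for name, proto in self.memory.items()}
        best = max(scores, key=scores.get)
        if scores[best] < self.reject_threshold:
            return "unknown", scores[best]
        return best, scores[best]

# Toy usage: feature vectors stand in for images, identity stands in for the encoder.
rng = np.random.default_rng(0)
clf = MemoryClassifier(encoder=lambda img: img)
base = rng.normal(size=64)
mug_support = [base + 0.1 * rng.normal(size=64) for _ in range(5)]   # k = 5 shots
clf.add_class("mug", mug_support)
print(clf.classify(base + 0.1 * rng.normal(size=64)))   # high similarity -> ("mug", ...)
print(clf.classify(rng.normal(size=64)))                 # low similarity -> ("unknown", ...)
```

Under this (assumed) formulation, adding a newly encountered object only requires encoding its support images, real or synthetically rendered, and storing a prototype, which is what allows new classes to be learned on the job without retraining the model.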