Cooperative Visual Object Learning


Abstract

Much attention has recently been focused on the possible benefits of cooperation between machines and humans. Taking the best from each and combining them can produce results that exceed what either partner achieves on its own. A common belief is that the key to good cooperation is excellent communication. Important aspects of communication are self-evaluation processes: in humans, these processes improve the quality of communication between people. We therefore believe that employing self-evaluation processes in machines advances human-machine communication and the quality of cooperation. Accordingly, this thesis explores communication strategies between machines and humans. More precisely, it examines possibilities for improving communication through an exploration of self-evaluation processes of classifiers.

Firstly, we introduce a baseline framework, an interactive visual category learning architecture called Tubby, developed at the Honda Research Institute in Germany. For simplicity, we consider in a first step the classification of objects rather than the more difficult task of learning multiple categories per object. We then introduce the theoretical foundations used in the thesis: the background on classification, neural networks, outlier detection, and the assessment of classifiers is explained in depth. We outline the critical importance of self-evaluation in classification and propose two self-evaluation measures, which are incorporated into a testing and a training strategy. The first measure captures the confidence in predictions during classification and is used within the proposed testing strategy. The second measure denotes the quality of each training sample with respect to the generalization performance of the classifier and is used within the proposed training strategy. The quality of a training sample essentially represents how different the current training sample is from all previously acquired training samples of the object. The confidence and quality measures are communicated to the user of the system through a graph: depending on whether the system is in the testing or the training phase, the value of the corresponding measure is provided to the user. Two ways of deriving each of the two measures are presented, so we effectively offer two testing and two training strategies. Furthermore, we consider each proposed strategy separately as well as combinations of them. We then evaluate all considered cases against a baseline strategy. The evaluations are performed for different dimensionalities of the feature space and different numbers of training and testing objects. We also present an offline simulation of the interaction between Tubby and the user; its results provide additional insight into the working mechanisms of the proposed strategies and measures.
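To make the two kinds of measures concrete, the following is a minimal illustrative sketch, not the thesis's actual formulas: it assumes confidence is the margin between the two highest class scores, and sample quality is the distance of a new training sample to its nearest previously stored sample of the same object (high when the sample is novel).

```python
import math

def prediction_confidence(class_scores):
    """Assumed confidence measure: margin between the two best class scores."""
    s = sorted(class_scores, reverse=True)
    return s[0] - s[1]

def sample_quality(new_sample, stored_samples):
    """Assumed quality measure: distance to the nearest stored sample.

    A large value means the new sample differs strongly from everything
    seen so far for this object, i.e. it is likely to aid generalization.
    """
    if not stored_samples:
        return math.inf  # the first sample of an object is maximally novel
    return min(math.dist(new_sample, s) for s in stored_samples)

print(prediction_confidence([0.75, 0.25, 0.0]))               # 0.5
print(sample_quality([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]]))   # 1.0
```

Both quantities are scalars, so they can be displayed to the user directly, e.g. as the graph mentioned above.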

The proposed strategies improve on the baseline performance. The absolute improvement in average classification accuracy varies between 1% and 25%, depending on the dimensionality of the feature space and the number of training and testing objects. The best results are achieved when the proposed training and testing strategies are combined. The biggest improvement is observed when many objects are involved in the learning process (≈ 100) and the dimensionality of the feature space is high (≥ 10D). In the actual application setting this is the most realistic case: a large number of objects and a high-dimensional feature space.