Dissimilarity-based ensembles for multiple instance learning

Journal article (2016)

Authors

V.V. Cheplygina (Pattern Recognition and Bioinformatics)

D.M.J. Tax (Pattern Recognition and Bioinformatics)

M. Loog (Pattern Recognition and Bioinformatics)

Research Group

Pattern Recognition and Bioinformatics (TU Delft)

Combining classifiers, Dissimilarity representation, Multiple instance learning (MIL), Random subspace method (RSM)

To reference this document use:

http://resolver.tudelft.nl/uuid:1629ec10-47ad-4ba7-8242-0527c26c46fe

Published Date

2016

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Intelligent Systems

Abstract

In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper, we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation, determined by the number of training bags, whereas the second results in a relatively high-dimensional representation, determined by the total number of instances in the training set. An advantage of the second representation, however, is that the informativeness of the prototype instances can be inferred. We propose a third, intermediate approach that links the two and combines their strengths. Our classifier is inspired by a random subspace ensemble: it considers subspaces of the dissimilarity space, each defined by a subset of instances that serve as prototypes. We provide insight into the structure of some popular multiple instance problems and show state-of-the-art performance on these data sets.
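To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the bag-to-instance dissimilarity is the minimum Euclidean distance from any instance in the bag to the prototype instance, and it uses a simple nearest-mean classifier as a stand-in base learner. All function and class names (`bag_instance_dissimilarity`, `NearestMean`, `fit_subspace_ensemble`, `predict_ensemble`, `make_bag`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def bag_instance_dissimilarity(bags, prototypes):
    """D[i, j] = minimum Euclidean distance from any instance of bag i
    to prototype instance j (an assumed choice of dissimilarity)."""
    D = np.empty((len(bags), len(prototypes)))
    for i, bag in enumerate(bags):
        # Pairwise differences, shape (n_instances_in_bag, n_prototypes, n_features)
        diff = bag[:, None, :] - prototypes[None, :, :]
        D[i] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=0)
    return D

class NearestMean:
    """Stand-in base classifier for binary labels 0 and 1."""
    def fit(self, X, y):
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def decision(self, X):
        # Positive score: closer to the class-1 mean than to the class-0 mean
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return d[:, 0] - d[:, 1]

def fit_subspace_ensemble(D_train, y, n_subspaces, subspace_size, rng):
    """Train one base classifier per random subset of prototype instances,
    i.e. per random subspace of the dissimilarity space."""
    ensemble = []
    for _ in range(n_subspaces):
        idx = rng.choice(D_train.shape[1], size=subspace_size, replace=False)
        ensemble.append((idx, NearestMean().fit(D_train[:, idx], y)))
    return ensemble

def predict_ensemble(ensemble, D):
    """Average the base classifiers' scores and threshold at zero."""
    scores = np.mean([clf.decision(D[:, idx]) for idx, clf in ensemble], axis=0)
    return (scores > 0).astype(int)

# Toy data (hypothetical): positive bags contain one "concept" instance
# near (6, 6); all other instances are standard-normal background.
rng = np.random.default_rng(0)

def make_bag(positive, n_instances=5):
    bag = rng.normal(0.0, 1.0, size=(n_instances, 2))
    if positive:
        bag[0] = rng.normal(6.0, 0.3, size=2)
    return bag

labels = np.array([0, 1] * 10)
bags = [make_bag(bool(y)) for y in labels]
prototypes = np.vstack(bags)  # every training instance is a candidate prototype

D_train = bag_instance_dissimilarity(bags, prototypes)
ensemble = fit_subspace_ensemble(D_train, labels, n_subspaces=25,
                                 subspace_size=10, rng=rng)
train_accuracy = (predict_ensemble(ensemble, D_train) == labels).mean()
print(D_train.shape, train_accuracy)
```

In this sketch, using all columns of `D_train` at once corresponds to the high-dimensional instance-based representation, while each random column subset gives a lower-dimensional view; the subset size and the number of subsets are free parameters here, not values taken from the paper.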