ELM Decision Branching

Improving streaming clustering with decision branching

Master thesis (2021)

Authors

J. Moree Electrical Engineering, Mathematics and Computer Science

Contributors

S.E. Verwer Cyber Security - (mentor)

R.L. Lagendijk Cyber Security - (graduation committee member)

Joana Gonçalves Pattern Recognition and Bioinformatics - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science, Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:69a32e76-2a55-466a-8d42-f867a02fe158

More Info

expand_more

Published Date

30-06-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

Clustering is a commonly used method in data analysis. It is a complex problem that can be very time consuming, especially when clustering large datasets with many features. Most clustering algorithms scale exponentially in time when increasing the dataset size, making it infeasible to use them for large datasets. Streaming algorithms do not have this problem as they will scale linearly. Evolving Local Means (ELM) is a streaming clustering algorithm based on Mean Shift. The ELM algorithm iterates over all data samples in a single iteration, updating an internal structure of clusters with each sample being added. Similar to Mean Shift, ELM requires only one parameter to be provided by the user: the initial radius of a cluster. In ELM, each sample can either be added to the nearest cluster or turned into a new cluster, depending on the distance to the nearest cluster. Clusters that come too close to each other will be merged to form a single cluster. There is one issue with this approach: the decisions are based on an intermediate state that is constantly evolving. This might cause incorrect clustering results when ELM merges clusters together prematurely. This research introduces ELM Decision Branching (ELM-DB) which extends ELM with decision branching. ELM-DB aims to reduce the chance of a premature merge by postponing important decisions. Instead of making a decision directly, it will branch and continue all possible options until the decision can be made. ELM-DB is able to improve clustering performance with only a small increase in runtime and produce consistent results for a larger range of radius settings than ELM.

Files

ELM_Decision_Branching_J._More... (.pdf)

(.pdf | 3.31 Mb)