Cold start is coming: How to approximate the optimal set of initial prototypes for clustering sequence data online

Bachelor Thesis (2021)
Author(s)

S. Fucarev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Nadeem – Mentor (TU Delft - Cyber Security)

Sicco Verwer – Mentor (TU Delft - Cyber Security)

Gosia Migut – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Silviu Fucarev
More Info
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clustering is a classic topic in both academia and industry, and it is one of the most popular unsupervised classification techniques. It is fast and flexible: it can accommodate any kind of data for which a suitable similarity metric can be found. SeqClu is an online, prototype-based k-medoids clustering algorithm designed to handle large quantities of sequence data. Our main focus is the role that initialization plays in the performance of SeqClu. In this paper we show that greedy heuristics perform significantly better than k-medoids heuristics. We also show that greedy heuristics can be combined to achieve potentially better accuracy, provided that a proper metric is chosen for selecting among the initialization results.
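
As a rough illustration of what a greedy initialization heuristic can look like, the sketch below picks k initial prototypes from a batch of sequences by farthest-first traversal under edit distance. It is not taken from the thesis: the function names `edit_distance` and `greedy_farthest_first`, the choice of Levenshtein distance, and the farthest-first rule are all assumptions made for illustration, and SeqClu itself may use different heuristics and a different similarity measure.

```python
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program.

    Assumption: the sequences are strings and edit distance is an
    acceptable dissimilarity; the thesis may use another measure.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def greedy_farthest_first(
    items: Sequence[str],
    k: int,
    dist: Callable[[str, str], int] = edit_distance,
) -> List[int]:
    """Return indices of k initial prototypes (hypothetical helper).

    Greedy rule assumed here: start from the item with the smallest
    total distance to all others, then repeatedly add the item that is
    farthest from its nearest already-chosen prototype.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")

    # First prototype: the single best medoid of the whole batch.
    totals = [sum(dist(x, y) for y in items) for x in items]
    first = min(range(len(items)), key=totals.__getitem__)
    chosen = [first]

    # nearest[i] = distance from items[i] to its closest chosen prototype.
    nearest = [dist(x, items[first]) for x in items]
    while len(chosen) < k:
        nxt = max(range(len(items)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, x in enumerate(items):
            nearest[i] = min(nearest[i], dist(x, items[nxt]))
    return chosen


if __name__ == "__main__":
    batch = ["aaaa", "aaab", "bbbb", "bbba", "cccc"]
    prototypes = greedy_farthest_first(batch, k=3)
    print([batch[i] for i in prototypes])
```

Each greedy step only needs the distances to the most recently added prototype, so a sketch like this costs one pass over the batch per prototype after the first. Ties are broken by position in the batch, which keeps the result deterministic.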

Files

Thesis.pdf
(pdf | 2 Mb)
License info not available