Cold start is coming: How to approximate the optimal set of initial prototypes for clustering sequence data online

Bachelor Thesis (2021)

Author(s)

S. Fucarev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Nadeem – Mentor (TU Delft - Cyber Security)

Sicco Verwer – Mentor (TU Delft - Cyber Security)

Gosia Migut – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Clustering algorithms, Greedy heuristic, K-medoids, Online clustering algorithms

To reference this document use:

https://resolver.tudelft.nl/uuid:59e50492-e027-4f04-9d86-f8c659851cc6

More Info

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clustering is a classic topic in both academia and industry, and it is one of the most popular unsupervised classification techniques. It is fast and flexible: it can accommodate any kind of data for which a suitable similarity metric can be found. SeqClu is an online, prototype-based k-medoids clustering algorithm designed to handle large quantities of sequence data. Our main focus is the role that initialization plays in the performance of SeqClu. In this paper we show that greedy heuristics perform significantly better than k-medoids heuristics. We also show that greedy heuristics can be combined to achieve potentially better accuracy, provided that a suitable metric is chosen for selecting among the initialization results.
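This page does not describe the heuristics the thesis evaluates, so the following is only a generic sketch of the idea in the last sentence of the abstract: produce several candidate sets of initial prototypes and keep the one that scores best under a chosen metric. The farthest-first greedy rule, the toy distance `hamming_like`, the total-cost metric, and every function name here are assumptions made for illustration. They are not taken from SeqClu or from the thesis.

```python
from typing import Callable, List, Sequence

Distance = Callable[[Sequence, Sequence], float]


def hamming_like(a: Sequence, b: Sequence) -> float:
    """Toy sequence distance (assumption): mismatches over the shared
    prefix plus the difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return float(mismatches + abs(len(a) - len(b)))


def greedy_farthest_first(data: List[Sequence], k: int, dist: Distance) -> List[int]:
    """Generic greedy initialization (assumption, not the thesis's heuristic).

    Starts from index 0, then repeatedly adds the sequence that is
    farthest from its nearest already-chosen prototype.
    Assumes 1 <= k <= len(data).
    """
    chosen = [0]
    nearest = [dist(x, data[0]) for x in data]
    while len(chosen) < k:
        nxt = max(range(len(data)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(data[i], data[nxt]))
                   for i in range(len(data))]
    return chosen


def total_cost(data: List[Sequence], prototypes: List[int], dist: Distance) -> float:
    """Selection metric (assumption): sum of each sequence's distance
    to its nearest prototype. Lower is better."""
    return sum(min(dist(x, data[p]) for p in prototypes) for x in data)


def best_initialization(data: List[Sequence],
                        candidates: List[List[int]],
                        dist: Distance) -> List[int]:
    """Keep the candidate prototype set with the lowest total cost."""
    return min(candidates, key=lambda protos: total_cost(data, protos, dist))


# Usage: compare a greedy candidate against a naive "first k" candidate.
sequences = ["aaaa", "aaab", "bbbb", "bbba", "cccc"]
greedy = greedy_farthest_first(sequences, 3, hamming_like)
naive = [0, 1, 2]
winner = best_initialization(sequences, [naive, greedy], hamming_like)
```

On this toy input the greedy candidate covers all three groups of sequences and therefore has the lower total cost, so it is the one selected. A different selection metric could rank the candidates differently, which is why the choice of metric matters.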

Files

Thesis.pdf (pdf | 2 MB)

License info not available