SeqClu-PV: An extension of online K-medoids to efficiently cluster sequences real-time

Bachelor Thesis (2021)
Author(s)

R.E.C. te Wierik (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Azqa Nadeem – Mentor (TU Delft - Cyber Security)

Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)

M.A. Migut – Coach (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Ruben te Wierik
More Info
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

The link to the GitHub repository containing the code and research results generated during the research project.

https://github.com/rtewierik/seqclupv

The link to the publicly available Python package released via the Python Package Index (PyPI).

https://pypi.org/project/seqclupv
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Real-time sequence clustering is the problem of clustering an infinite stream of sequences in real time using limited memory. One suggested approach is SeqClu, a variant of the k-medoids algorithm that represents each cluster by its p most representative sequences, called prototypes, so that a high-quality cluster representation can be maintained over time with little memory. However, this algorithm is computationally expensive because it performs many distance computations using Dynamic Time Warping (DTW), a costly distance measure for sequences that has been shown to be robust to noise and delays. Therefore, this paper proposes SeqClu-PV, an extension of SeqClu characterised by a decision-making mechanism for updating prototypes that improves the balance between the number of distance computations and the cost incurred by incorrect clustering, and it evaluates the performance of this extension.
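To illustrate why DTW dominates the cost, the following is a minimal sketch of the classic O(n·m) dynamic-programming DTW distance, together with a prototype-based cluster assignment in the spirit of the description above. The names `dtw_distance` and `assign_to_cluster` are hypothetical and are not taken from the seqclupv package; the absolute-difference local cost and the "lowest average DTW distance to the p prototypes" assignment rule are assumptions chosen for illustration, not necessarily the exact rule used by SeqClu or SeqClu-PV.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance; the local cost |a_i - b_j| is an assumption."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned to an earlier b element
                cost[i][j - 1],      # b[j-1] also aligned to an earlier a element
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def assign_to_cluster(
    seq: Sequence[float], clusters: List[List[Sequence[float]]]
) -> Tuple[int, float]:
    """Hypothetical rule: choose the cluster whose prototypes have the
    lowest average DTW distance to seq. Costs p DTW calls per cluster."""
    best_idx, best_score = -1, math.inf
    for idx, prototypes in enumerate(clusters):
        score = sum(dtw_distance(seq, p) for p in prototypes) / len(prototypes)
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` returns `0.0`, because DTW can align the repeated `2` to a single element; this tolerance to local delays is what makes DTW attractive for sequences. With k clusters of p prototypes each, assigning one incoming sequence in this sketch needs k·p DTW computations, each costing O(n·m), which is the kind of expense a mechanism that decides when to update prototypes aims to reduce.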

Files

Research_paper_final_3.pdf
(pdf | 0.819 Mb)
License info not available