SeqClu-PV: An extension of online K-medoids to efficiently cluster sequences real-time
R.E.C. te Wierik (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Azqa Nadeem – Mentor (TU Delft - Cyber Security)
Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)
M.A. Migut – Coach (TU Delft - Computer Science & Engineering-Teaching Team)
The link to the GitHub repository containing the code and research results generated during the research project.
https://github.com/rtewierik/seqclupv
The link to the publicly available Python package released via the Python Package Index (PyPI).
https://pypi.org/project/seqclupv
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Real-time sequence clustering is the problem of clustering an infinite stream of sequences in real time with limited memory. The suggested approach is SeqClu, a variant of the k-medoids algorithm that represents each cluster by its p most representative sequences, called prototypes. This keeps the representation of a cluster high in quality and small in memory over time. However, the computational cost of SeqClu is considerable because it requires many distance computations using Dynamic Time Warping (DTW), a computationally expensive distance measure for sequences that has been shown to be robust to noise and delays. Therefore, this paper proposes an extension of SeqClu called SeqClu-PV, characterised by a decision-making mechanism for updating prototypes that improves the balance between the number of distance computations and the cost incurred due to incorrect clustering, and reviews the performance of this extension.
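To make the cost argument concrete, the following is a minimal sketch of a textbook DTW distance and of assigning a sequence to the cluster whose prototypes are nearest on average. It is not taken from the seqclupv package: the function names `dtw_distance` and `assign_to_cluster`, the absolute-difference local cost, and the average-distance assignment rule are all assumptions made for illustration. Each DTW call fills a table of size len(a) x len(b), so comparing one incoming sequence against p prototypes in each of k clusters costs k * p such tables.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW between two numeric sequences.

    Assumption: the local cost is the absolute difference between elements.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def assign_to_cluster(sequence, prototypes_by_cluster):
    """Hypothetical assignment rule: pick the cluster whose prototypes
    have the lowest average DTW distance to the sequence."""
    best_label, best_score = None, inf
    for label, prototypes in prototypes_by_cluster.items():
        score = sum(dtw_distance(sequence, p) for p in prototypes) / len(prototypes)
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch a delayed copy of a sequence, such as [1, 2, 2, 3] against [1, 2, 3], has distance zero, which illustrates the robustness to delays mentioned above. Because every assignment or prototype update repeats these quadratic computations, reducing the number of DTW calls is where savings can be expected.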