Similarity metrics for binary cell clustering

How close can we get to the state of the art?

Bachelor Thesis (2023)
Author(s)

B.P. Golik (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Marcel J. T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

G.A. Bouland – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

BHM Gerritsen – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Bartek Golik
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Analysing single-cell RNA sequencing data is becoming increasingly laborious as data sets grow. Recent findings suggest that these data sets can be binarized without losing much information, which in turn should allow for memory- and time-efficient storage and computation. Numerous analysis techniques require cell clustering as a preliminary step, so the performance of the binary representation needs to be evaluated in that context. In this work we compare binary clustering results with the state of the art, focusing on the choice of similarity metric and its impact on the intermediate steps of the procedure (i.e. similarity matrices and kNN graphs). The method was evaluated on single-cell transcriptomic data sets, using a combination of R and C++ as the evaluation framework. We found that some of the similarity metrics operating on continuous input can possibly be reproduced with similarity metrics operating on binary input.
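The intermediate steps named in the abstract (binarization, pairwise similarity, kNN graph) can be illustrated with a minimal C++ sketch. It is not taken from the thesis: the zero threshold, the choice of the Jaccard index as the binary similarity metric, and the function names `binarize`, `jaccard` and `nearest_neighbours` are all assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Binarize one cell's expression vector: a gene is "on" if its count exceeds
// `threshold`. The default threshold of zero is an assumption of this sketch.
std::vector<bool> binarize(const std::vector<double>& counts, double threshold = 0.0) {
    std::vector<bool> bits(counts.size());
    for (std::size_t g = 0; g < counts.size(); ++g) {
        bits[g] = counts[g] > threshold;
    }
    return bits;
}

// Jaccard index between two binarized cells: |A and B| / |A or B|.
// Returns 0.0 by convention when neither cell expresses any gene.
double jaccard(const std::vector<bool>& a, const std::vector<bool>& b) {
    assert(a.size() == b.size());
    std::size_t both = 0, either = 0;
    for (std::size_t g = 0; g < a.size(); ++g) {
        both += (a[g] && b[g]) ? 1 : 0;
        either += (a[g] || b[g]) ? 1 : 0;
    }
    return either == 0 ? 0.0 : static_cast<double>(both) / static_cast<double>(either);
}

// Indices of the k cells most similar to `query` (one row of a kNN graph).
// The order among cells with equal similarity is unspecified.
std::vector<std::size_t> nearest_neighbours(const std::vector<std::vector<bool>>& cells,
                                            std::size_t query, std::size_t k) {
    std::vector<std::pair<double, std::size_t>> scored;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (i != query) {
            scored.emplace_back(jaccard(cells[query], cells[i]), i);
        }
    }
    k = std::min(k, scored.size());
    // Bring the k highest-similarity entries to the front, in descending order.
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const std::pair<double, std::size_t>& x,
                         const std::pair<double, std::size_t>& y) {
                          return x.first > y.first;
                      });
    std::vector<std::size_t> neighbours;
    for (std::size_t j = 0; j < k; ++j) {
        neighbours.push_back(scored[j].second);
    }
    return neighbours;
}
```

Calling `nearest_neighbours(cells, i, k)` for every cell `i` would yield the adjacency lists of a kNN graph, which a graph-based clustering step could then consume. A bit-packed representation would be more memory-efficient than `std::vector<bool>` loops, but is left out here to keep the sketch short.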

Files

CSE3000_Final_Paper.pdf
(pdf | 2.15 Mb)
License info not available