Memory usage analysis of binary clustering algorithm
What is the gain in peak memory usage of the binary clustering algorithm compared to current state-of-the-art clustering methods?
P. Verigo (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G.A. Bouland – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Marcel J. T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
BHM Gerritsen – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The rapid growth of single-cell RNA-seq datasets poses significant performance challenges for evaluating the data and extracting information from it. We investigate an alternative input data format based on binarization, with a main focus on peak memory usage. We describe the design and implementation of the solution in depth, emphasizing the strategies used to minimize memory usage, and we analyze and validate memory usage patterns and asymptotic behavior using memory profiling tools. Our findings suggest that, on large modern datasets, the reduction in memory usage is attributable only to the binarized data format itself and not to the way the workflow interacts with that format: the memory behavior of the workflow was found to be independent of the input format.
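To illustrate why a binarized input format by itself can lower peak memory, the following minimal Python sketch compares a toy count matrix stored as 32-bit integers with the same matrix reduced to one bit per entry (expressed or not expressed). It is not the implementation described in this work: the helper names (`make_counts`, `binarize_packed`, `peak_bytes`), the matrix size, and the density are hypothetical, and the standard-library `tracemalloc` module stands in for the memory profiling tools mentioned above.

```python
import random
import tracemalloc
from array import array


def make_counts(n_cells, n_genes, density=0.1, seed=0):
    """Build a toy row-major count matrix (hypothetical data, mostly zeros)."""
    rng = random.Random(seed)
    return array(
        "I",
        (rng.randint(1, 50) if rng.random() < density else 0
         for _ in range(n_cells * n_genes)),
    )


def binarize_packed(counts):
    """Keep only 'count > 0' and pack the flags into one bit per entry."""
    packed = bytearray((len(counts) + 7) // 8)
    for i, c in enumerate(counts):
        if c > 0:
            packed[i >> 3] |= 1 << (i & 7)
    return packed


def peak_bytes(fn, *args):
    """Return fn(*args) and the peak traced allocation (bytes) during the call."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Assumed toy dimensions: 200 cells x 500 genes.
counts, peak_counts = peak_bytes(make_counts, 200, 500)
bits, peak_bits = peak_bytes(binarize_packed, counts)
print(f"counts: {peak_counts} B peak, packed bits: {peak_bits} B peak")
```

In this sketch the saving comes entirely from the storage format (one bit instead of a multi-byte integer per entry), which mirrors the abstract's conclusion that the gain is attributable to the binarized format rather than to the downstream workflow.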