Feasibility of Spectral Clustering in Imaging Mass Spectrometry
B.A. Khan (TU Delft - Mechanical Engineering)
Raf Van de Plas – Mentor (TU Delft - Team Raf Van de Plas)
P.L. Delacour – Mentor (TU Delft - Team Raf Van de Plas)
G. de Albuquerque Gleizer – Graduation committee member (TU Delft - Team Gabriel Gleizer)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Imaging Mass Spectrometry (IMS) collects spatial and chemical information of a sample, generating high-dimensional datasets that present challenges in exploratory data analysis due to their substantial size. Spectral clustering is a promising unsupervised learning approach for IMS applications, employing graph-based strategies to identify patterns without assumptions about cluster geometry.
Whereas many clustering algorithms assume a particular cluster geometry, spectral clustering constructs a similarity graph and performs an eigendecomposition of the graph Laplacian, which allows it to recover non-convex clusters of arbitrary shape. This can lead to new or improved segmentations being discovered in IMS data. Furthermore, two recent studies support the argument that spectral clustering might be optimal for IMS data.
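The pipeline described above (similarity graph, Laplacian eigendecomposition, clustering of the embedding) can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the cosine affinity, the symmetric normalized Laplacian, the row normalization of the embedding, the small `kmeans` helper, and the synthetic two-profile "spectra" are all assumptions chosen for illustration.

```python
import numpy as np

def cosine_affinity(X):
    """Pairwise cosine similarity, clipped to [0, 1], with a zero diagonal."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = np.clip(Xn @ Xn.T, 0.0, 1.0)
    np.fill_diagonal(W, 0.0)
    return W

def spectral_embedding(W, k):
    """Row-normalized eigenvectors of the k smallest eigenvalues of
    the symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((U[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Hypothetical data: 80 "pixels" with 50 spectral bins each, drawn from
# two spectral profiles plus non-negative noise.
rng = np.random.default_rng(0)
profile_a = np.r_[np.ones(25), np.zeros(25)]
profile_b = np.r_[np.zeros(25), np.ones(25)]
X = np.vstack([np.tile(profile_a, (40, 1)), np.tile(profile_b, (40, 1))])
X = X + np.abs(rng.normal(0.0, 0.2, X.shape))

W = cosine_affinity(X)
labels = kmeans(spectral_embedding(W, k=2), k=2)
```

In this toy case the two groups are trivially separable; the sketch is only meant to make the three steps concrete. Note that `W` is a dense pixel-by-pixel matrix, so this direct form does not scale to full IMS datasets.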
Despite these advantages, spectral clustering faces implementation barriers, primarily due to its computational complexity and memory requirements. The limited number of applications of spectral clustering to IMS data can predominantly be attributed to these limitations.
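To give a sense of scale for the memory barrier, the following back-of-the-envelope sketch computes the storage needed for a dense pairwise similarity matrix, which grows quadratically with the number of pixels. The pixel count and the 8-byte (float64) entry size are hypothetical values for illustration, not figures from this thesis.

```python
def dense_affinity_gib(n_pixels, bytes_per_entry=8):
    """Memory in GiB for a dense n_pixels x n_pixels similarity matrix."""
    return n_pixels ** 2 * bytes_per_entry / 2 ** 30

# Hypothetical dataset size; float64 entries assumed.
print(f"{dense_affinity_gib(100_000):.1f} GiB")  # prints "74.5 GiB"
```

A dense eigendecomposition of such a matrix additionally scales cubically in the number of pixels, which is why sparse graphs or approximations are typically needed at this size.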
This thesis investigates the feasibility of spectral clustering for analyzing high-dimensional IMS data, with a focus on performance under noise, computational scalability, and preservation of biological segmentation. Performance is assessed using internal and external validation metrics, as well as a comparison with variations of k-means clustering. Additionally, a memory-constrained algorithm was developed to address the scalability issue caused by the memory complexity.
The results show that, on a synthetic dataset with increasing noise, spectral clustering outperforms k-means when both methods use the cosine metric. On a real-world subset of an IMS dataset of a mouse pup, containing the brain, the results of k-means and spectral clustering were highly comparable. On the complete dataset, the memory-constrained version of spectral clustering was less promising because of its dependence on initial seeding: k-means obtained similar or better clustering results with lower time and memory complexity.