Imaging mass spectrometry (IMS) yields large, high-dimensional data sets, commonly exceeding 100,000 pixels, each of which reports a mass spectrum of 200,000 or more intensity values. Reducing the dimensionality and size of IMS data is often necessary to enable downstream analysis, and matrix-factorization-based approaches are commonly used for this purpose. However, the model underlying most of these techniques, which decomposes measurements into the sum of a low-rank term (presumed signal) and a small entry-wise residual term (presumed noise), is often not optimal for IMS. For example, spatially or spectrally sparse signals are common in IMS data, yet they can heavily distort the low-rank approximation. We therefore propose capturing the structure of IMS data using low-rank models that, in addition to a dense residual, allow sparse variation to be captured separately. We implement two such methods, principal component pursuit (PCP) and stable principal component pursuit (SPCP), apply them to IMS data, and compare them with a classical factorization method, principal component analysis (PCA). We investigate their dimensionality- and noise-reduction performance on MALDI Q-TOF IMS measurements of human cornea and retina tissue, chosen because the human eye is a complex organ with many small, tightly packed tissue substructures that are spatially sparse. Our results suggest that, if parameters are set adequately, PCP and SPCP enable stronger dimensionality reduction and higher compression of IMS data than PCA, while also reducing signal overestimation.
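As a minimal sketch of the low-rank-plus-sparse model, assume the IMS data are arranged as a pixels × m/z matrix M. PCP seeks a low-rank L and a sparse S that minimize ‖L‖_* + λ‖S‖_1 subject to L + S = M. SPCP relaxes the equality to ‖M − L − S‖_F ≤ δ, which leaves room for a dense residual. The NumPy code below solves only the PCP problem, using an inexact augmented Lagrangian method that is standard in the robust PCA literature. It is not necessarily the solver used in this work. The function names (`pcp`, `svt`, `shrink`), the default λ = 1/√max(m, n), and the penalty schedule are illustrative assumptions.

```python
import numpy as np

# Hypothetical helpers; the solver choice (inexact ALM) is an assumption,
# not necessarily the method used in the study.


def shrink(X, tau):
    """Entry-wise soft thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Principal component pursuit via an inexact augmented Lagrangian method.

    Approximately solves  min ||L||_* + lam * ||S||_1  subject to  L + S = M.
    Returns the low-rank part L and the sparse part S.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default in the PCP literature
    norm_two = np.linalg.norm(M, 2)  # largest singular value of M
    norm_fro = np.linalg.norm(M, "fro")
    # Dual variable initialization and increasing penalty schedule (assumed values).
    Y = M / max(norm_two, np.abs(M).max() / lam)
    mu = 1.25 / norm_two
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)        # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)     # sparse update
        R = M - L - S                            # residual of the constraint L + S = M
        Y = Y + mu * R                           # dual ascent step
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") / norm_fro < tol:
            break
    return L, S
```

In this sketch, rows of `L` would play the role of denoised low-rank spectra, while `S` collects sparse, localized signals that a PCA truncation would otherwise fold into its components. For matrices of real IMS size, a truncated or randomized SVD would typically replace the full SVD in `svt` to keep the cost manageable.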