Extension of Maximum Autocorrelation Factorization
With application to imaging mass spectrometry data
Abstract
Multivariate images are built up by measuring multiple features or variables simultaneously while recording each measurement's location. An example of such images is Imaging Mass Spectrometry (IMS) data. IMS is a technique for recording the mass-over-charge ratio of molecules in (biological) samples while also recording the molecules' spatial location. The high dimensionality of multivariate images (e.g. many recorded features per pixel) often makes direct human interpretation infeasible and computational analysis impractical. For this reason, unsupervised, data-driven factorization is often applied before any exploration of the data in order to reduce its dimensionality. However, one of the more promising factorization methods for multivariate images, Maximum Autocorrelation Factorization (MAF), still depends on some input from the user. Unlike most factorization methods, which focus solely on the spectral content of the observations, MAF also uses the spatial structure of the input data to generate matrix factors that aim to capture both spatial and spectral patterns in the data. The MAF factors summarize the content of the data matrix and are ranked by spatial autocorrelation, i.e. by how smoothly they vary in space. The rationale for applying MAF in a bioimaging context is that naturally occurring signals tend to form larger, more uniform areas and therefore change more slowly in space than noisy, non-biological measurement patterns. Because MAF factors are ordered by autocorrelation, noisy measurements (with low autocorrelation) tend to be demoted in the order of factors, effectively separating them from biological data components (with positive autocorrelation). The goal of this thesis is to build upon the MAF algorithm and remove the current need for user input, resulting in an extension of MAF. This novel factorization method is named Extended Maximum Autocorrelation Factorization (EMAF).
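For readers unfamiliar with MAF, the following is a minimal sketch of the classic single-shift formulation, assuming NumPy and an image cube of shape (rows, cols, features). It solves the generalized eigenproblem Σ_Δ w = λ Σ w, where Σ is the feature covariance and Σ_Δ is the covariance of differences between pixels separated by a spatial lag; under a stationarity assumption the autocorrelation of each factor is ρ = 1 − λ/2. The function name `maf`, its `shift` parameter (assumed here to be the kind of user input the abstract refers to) and the Cholesky-whitening solver are illustrative choices, not the thesis's implementation, and EMAF itself is not reproduced here.

```python
import numpy as np


def maf(cube, shift=(0, 1)):
    """Classic MAF of an image cube with shape (rows, cols, features).

    `shift` = (dy, dx) is a hypothetical user-chosen spatial lag (non-negative,
    not both zero). Returns (weights, autocorrelations, factor_images), ordered
    from highest to lowest spatial autocorrelation.
    """
    rows, cols, p = cube.shape
    dy, dx = shift
    if dy < 0 or dx < 0 or (dy == 0 and dx == 0):
        raise ValueError("shift must be non-negative and non-zero")

    # Centered data matrix: one row per pixel, one column per feature.
    X = cube.reshape(-1, p)
    X = X - X.mean(axis=0)

    # Differences between pixel pairs separated by the lag (dy, dx).
    a = cube[: rows - dy, : cols - dx].reshape(-1, p)
    b = cube[dy:, dx:].reshape(-1, p)
    diff = a - b

    sigma = np.cov(X, rowvar=False)           # feature covariance
    sigma_delta = np.cov(diff, rowvar=False)  # covariance of the difference image

    # Solve sigma_delta w = lam * sigma w by whitening with sigma = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(sigma))
    lam, V = np.linalg.eigh(L_inv @ sigma_delta @ L_inv.T)  # ascending lam
    W = L_inv.T @ V  # back-transform so that W^T sigma W = I

    # Small lam means slowly varying factors: rho = 1 - lam / 2.
    rho = 1.0 - lam / 2.0
    factors = (X @ W).reshape(rows, cols, p)
    return W, rho, factors
```

Because `eigh` returns eigenvalues in ascending order, the factors come out sorted from most to least spatially autocorrelated, so the last factors tend to be the noise-dominated ones. Since both covariance matrices transform as A Σ Aᵀ under an invertible feature transform A, the autocorrelations `rho` are unchanged by such a transform, which matches the invariance to linear transformations mentioned in the abstract.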
Like MAF, EMAF is invariant to linear transformations, uses both the spatial and the spectral information of the dataset to determine its factors, and produces factors that are uncorrelated at all distances under certain conditions. The EMAF algorithm is fully unsupervised and produces factors ranked by spatial autocorrelation. Unlike MAF, EMAF does not unnecessarily favor spatial artifacts oriented in one particular direction over those in other directions. The exact formulation of EMAF, derivations of these properties, and practical examples of EMAF are presented in this thesis. Preliminary experiments show that EMAF returns factors of improved quality compared to MAF.