Interactive manipulation of t-SNE embeddings
G.E. Bos (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Mark van de Ruit – Mentor (TU Delft - Computer Graphics and Visualisation)
E Eisemann – Mentor (TU Delft - Computer Graphics and Visualisation)
Tom Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Low-dimensional datasets, in which each datapoint has at most three attributes, are straightforward to visualize with common visualization idioms such as scatterplots. To visualize high-dimensional datasets with potentially thousands of attributes, their dimensionality must first be reduced. t-SNE is a widely used, state-of-the-art algorithm for non-linear dimensionality reduction. It embeds high-dimensional data in two or three dimensions. It does so by using gradient descent to iteratively adjust the low-dimensional embedding, minimizing the discrepancy (the Kullback-Leibler divergence) between two probability distributions: one over pairwise similarities of the high-dimensional datapoints, and one over pairwise similarities of their low-dimensional counterparts. However, it is impossible to preserve every high-dimensional neighbourhood in two or three dimensions. The resulting clustering is therefore only approximate: it is meant to give an intuition of the dataset's high-dimensional distribution, not a faithful reproduction of it. We propose a number of tools that allow a user to interactively impose constraints based on prior knowledge or assumptions about the dataset, and to manipulate the underlying probability model. The aim is to make an embedding easier to interpret and so to give better insight into the data's high-dimensional structure. We demonstrate the effectiveness and limitations of these tools with examples and case studies.
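To make the optimization described in the abstract concrete, here is a minimal sketch of the standard t-SNE objective. It is not the thesis's implementation. The high-dimensional joint probabilities p_ij come from a Gaussian kernel, and the low-dimensional q_ij come from a Student-t kernel. Plain gradient descent then moves the embedding points to reduce KL(P || Q). Several simplifications are assumptions made for brevity. A single fixed bandwidth `sigma` replaces t-SNE's per-point perplexity calibration. There is no momentum or early exaggeration. A small learning rate suits this toy dataset. All function names (`joint_p`, `joint_q`, `tsne`, and so on) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_p(X, sigma=1.0):
    """High-dimensional joint probabilities p_ij.

    Assumption: one fixed Gaussian bandwidth `sigma` for all points,
    instead of the perplexity-based search used by real t-SNE.
    """
    n = len(X)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])  # conditional p_{j|i}
    # Symmetrize so that all p_ij sum to 1: p_ij = (p_{j|i} + p_{i|j}) / (2n).
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def joint_q(Y):
    """Low-dimensional joint probabilities q_ij from a Student-t kernel.

    Returns (Q, W), where W holds the unnormalized kernel values
    (1 + ||y_i - y_j||^2)^-1, which the gradient reuses.
    """
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, W))
    return [[v / total for v in row] for row in W], W


def kl_divergence(P, Q):
    """KL(P || Q), the cost that t-SNE minimizes (diagonal terms are zero)."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)


def gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    Q, W = joint_q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
                for d in range(dim):
                    grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad


def tsne(X, dims=2, iters=200, lr=1.0, sigma=1.0, seed=0):
    """Plain gradient descent on KL(P || Q); returns the embedding and cost history."""
    rng = random.Random(seed)
    P = joint_p(X, sigma)
    # Start from a tiny random embedding, as t-SNE usually does.
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(dims)] for _ in X]
    costs = []
    for _ in range(iters):
        costs.append(kl_divergence(P, joint_q(Y)[0]))
        g = gradient(P, Y)
        Y = [[y - lr * gd for y, gd in zip(yi, gi)] for yi, gi in zip(Y, g)]
    costs.append(kl_divergence(P, joint_q(Y)[0]))
    return Y, costs
```

For example, `tsne([[0, 0, 0], [1, 0, 0], [0, 1, 0], [6, 6, 6], [7, 6, 6], [6, 7, 6]])` should return a 2-D embedding in which the two groups of three points end up well apart. Its cost history should also end lower than it started. The interactive tools proposed in this work act on exactly these ingredients, namely the probability model (P) and the embedding being optimized (Y). How the thesis modifies them is not shown here.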