Interactive manipulation of t-SNE embeddings


Abstract

Low-dimensional datasets, in which each datapoint has no more than three attributes, are straightforward to visualize with common visualization idioms such as scatterplots. To visualize high-dimensional datasets with potentially thousands of attributes, their dimensionality must first be reduced. t-SNE is a widely used, state-of-the-art algorithm for non-linear dimensionality reduction. It embeds high-dimensional data in two or three dimensions by expressing pairwise datapoint similarities as probability distributions, one in the high-dimensional space and one in the embedding, and then iteratively adjusting the embedding with gradient descent to minimize the discrepancy (the Kullback-Leibler divergence) between the two. However, since it is impossible to preserve all high-dimensional neighbourhoods in two or three dimensions, the resulting clustering is only an approximation, meant to offer an intuition of the dataset's high-dimensional distribution rather than a faithful representation of it. Our work proposes a number of tools that allow a user to interactively impose constraints based on prior knowledge or assumptions about the dataset, and to manipulate the underlying probability model, with the aim of making an embedding more interpretable and giving a better intuition of the data's high-dimensional structure. We demonstrate the effectiveness and limitations of these tools with examples and case studies.
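To make the optimization described above more concrete, the following is a minimal NumPy sketch of the core t-SNE loop, written under simplifying assumptions. It uses a single fixed Gaussian bandwidth `sigma` instead of the per-point bandwidths that standard t-SNE calibrates from a perplexity. It also runs plain gradient descent with a heuristic step size, leaving out the momentum and early exaggeration used in practice. All function names are hypothetical. The optional `P` argument and the `pull_together` helper are a hypothetical illustration of what editing the probability model could look like; they are not the tools proposed in this work.

```python
import numpy as np

def high_dim_affinities(X, sigma=1.0):
    """Symmetric joint probabilities P from a Gaussian kernel.
    Assumption: one fixed bandwidth `sigma` for every point (no perplexity search)."""
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared distances
    cond = np.exp(-D / (2 * sigma**2))
    np.fill_diagonal(cond, 0.0)                  # a point is not its own neighbour
    cond /= cond.sum(axis=1, keepdims=True)      # conditional p_{j|i}, rows sum to 1
    n = X.shape[0]
    return (cond + cond.T) / (2 * n)             # symmetrised p_ij, sums to 1

def low_dim_affinities(Y):
    """Joint probabilities Q from a Student-t kernel with one degree of freedom."""
    sq = np.sum(Y**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + D)
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num                  # Q and the unnormalised kernel

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the cost that t-SNE minimises."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

def tsne(X, dims=2, sigma=1.0, lr=None, iters=1000, seed=0, P=None):
    """Plain gradient descent on KL(P || Q); returns the embedding and final cost.
    A caller-supplied (e.g. user-edited) P replaces the computed one."""
    n = X.shape[0]
    if P is None:
        P = high_dim_affinities(X, sigma)
    if lr is None:
        lr = n / 8.0                             # heuristic step size (assumption)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, dims))   # small random initial layout
    for _ in range(iters):
        Q, num = low_dim_affinities(Y)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
        W = (P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        Y -= lr * grad
    Q, _ = low_dim_affinities(Y)
    return Y, kl_divergence(P, Q)

def pull_together(P, i, j, weight=0.05):
    """Hypothetical edit of the probability model: add extra similarity mass
    between points i and j so the optimiser pulls them closer, then renormalise."""
    P = P.copy()
    P[i, j] += weight / 2
    P[j, i] += weight / 2
    return P / P.sum()                           # total was 1 + weight
```

The optimizer only ever sees `P`, so any change to it, such as the one `pull_together` makes, changes which neighbourhoods the embedding tries to preserve. An edited matrix can be passed back in as `tsne(X, P=pull_together(P, 0, 39))`.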
