Interactive manipulation of t-SNE embeddings

Master Thesis (2024)
Author(s)

G.E. Bos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mark van de Ruit – Mentor (TU Delft - Computer Graphics and Visualisation)

E. Eisemann – Mentor (TU Delft - Computer Graphics and Visualisation)

Tom Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-11-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science | Software Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Low-dimensional datasets, in which each datapoint has no more than three attributes, are straightforward to visualize with common visualization idioms such as scatterplots. To visualize high-dimensional datasets with potentially thousands of attributes, their dimensionality must first be reduced. t-SNE is a widely used, state-of-the-art algorithm for non-linear dimensionality reduction. It embeds the high-dimensional data in two or three dimensions by iteratively adjusting the low-dimensional embedding with gradient descent, minimizing the discrepancy between the probability distributions of the high-dimensional and low-dimensional datapoint similarities. However, since it is impossible to capture all neighbourhoods over all high-dimensional axes in two or three dimensions, the resulting clustering is merely virtual and is only meant to offer an intuition of the dataset's high-dimensional distribution. Our work proposes a number of tools that allow a user to interactively impose constraints based on prior knowledge or assumptions about the dataset and to manipulate the underlying probability model, with the aim of enhancing an embedding's interpretability and giving a better intuition of the data's high-dimensional structure. We demonstrate the effectiveness and limitations of these tools with examples and case studies.
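
The optimisation loop summarised in the abstract can be sketched roughly as follows. This is a minimal illustration of standard t-SNE only, not the thesis implementation and not its interactive tools. The function names, the fixed Gaussian bandwidth `sigma` (standard t-SNE calibrates it per point from a perplexity target), the learning rate, and the use of plain gradient descent without momentum or early exaggeration are all simplifying assumptions made for this example.

```python
import numpy as np


def high_dim_affinities(X, sigma=1.0):
    """Joint similarities p_ij from Gaussian kernels.

    Assumption: one fixed bandwidth for all points; standard t-SNE
    instead tunes a per-point bandwidth to match a target perplexity.
    """
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)   # conditional p_{j|i}
    return (P + P.T) / (2.0 * len(X))   # symmetrised joint p_ij, sums to 1


def low_dim_affinities(Y):
    """Joint similarities q_ij from a Student-t kernel (one degree of freedom)."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + d2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the discrepancy that t-SNE minimises."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


def tsne_sketch(X, n_iter=500, lr=2.0, seed=0):
    """Plain gradient descent on KL(P || Q); returns embedding and loss history."""
    rng = np.random.default_rng(seed)
    P = high_dim_affinities(X)
    Y = rng.normal(scale=1e-2, size=(len(X), 2))  # small random initial embedding
    history = []
    for _ in range(n_iter):
        Q, W = low_dim_affinities(Y)
        history.append(kl_divergence(P, Q))
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        G = (P - Q) * W
        grad = 4.0 * (G.sum(axis=1)[:, None] * Y - G @ Y)
        Y = Y - lr * grad
    return Y, history
```

Calling `tsne_sketch(X)` on a small array `X` of shape `(n_points, n_attributes)` returns a two-dimensional embedding together with the KL divergence recorded at each iteration. Any manipulation of the probability model of the kind the abstract mentions would presumably act on `P` or on the gradient, but how the thesis does so is not stated here.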

Files

License info not available