Interactive manipulation of t-SNE embeddings

Master Thesis (2024)

Author(s)

G.E. Bos (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Mark van de Ruit – Mentor (TU Delft - Computer Graphics and Visualisation)

E Eisemann – Mentor (TU Delft - Computer Graphics and Visualisation)

Tom Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

High-Dimensional Data Analysis, Data visualization, Dimensionality reduction, t-SNE

To reference this document use:

https://resolver.tudelft.nl/uuid:722fde94-89bf-400f-a486-980ee8001df3

Publication Year

2024

Language

English

Graduation Date

25-11-2024

Awarding Institution

Delft University of Technology

Programme

Computer Science | Software Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Low-dimensional datasets, in which each datapoint has no more than three attributes, are straightforward to visualize with common visualization idioms such as scatterplots. To visualize high-dimensional datasets with potentially thousands of attributes, their dimensionality must first be reduced. t-SNE is a widely used, state-of-the-art algorithm for non-linear dimensionality reduction. It embeds high-dimensional data in two or three dimensions by using gradient descent to minimize the discrepancy between the probability distributions of the high-dimensional and low-dimensional datapoint similarities, iteratively adjusting the low-dimensional embedding. However, since it is impossible to capture all neighbourhoods over all high-dimensional axes in two or three dimensions, the resulting clustering is merely virtual and is only meant to offer an intuition of the dataset's high-dimensional distribution. Our work proposes a number of tools that allow a user to interactively impose constraints based on prior knowledge or assumptions about the dataset, and to manipulate the underlying probability model, with the aim of making an embedding more interpretable and giving a better intuition about the data's high-dimensional structure. We demonstrate the effectiveness and limitations of these tools with examples and case studies.
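
As a rough illustration of the gradient-descent procedure described in the abstract, the following is a minimal sketch of plain t-SNE in Python with NumPy. It is not the implementation from the thesis and includes none of the proposed interactive tools. The function names, the learning rate, and the single fixed Gaussian bandwidth `sigma` are assumptions made for brevity; standard t-SNE instead chooses a per-point bandwidth from a perplexity setting and usually adds momentum and early exaggeration. The "discrepancy" is taken here to be the Kullback-Leibler divergence, which is the usual t-SNE objective.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    s = np.sum(X * X, axis=1)
    D = s[:, None] + s[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)

def high_dim_similarities(X, sigma=1.0):
    """Symmetric joint probabilities P over pairs of high-dimensional points.

    Assumption: one fixed bandwidth `sigma` for all points, instead of the
    perplexity-based per-point bandwidth used by standard t-SNE.
    """
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)      # conditional p_{j|i}
    P = (P + P.T) / (2.0 * len(X))         # symmetrized joint p_{ij}
    return np.maximum(P, 1e-12)            # avoid log(0) later

def low_dim_similarities(Y):
    """Student-t joint probabilities Q and the unnormalized kernel W."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    Q = np.maximum(W / W.sum(), 1e-12)
    return Q, W

def kl_divergence(P, Q):
    """KL(P || Q), the discrepancy minimized by gradient descent."""
    return float(np.sum(P * np.log(P / Q)))

def tsne_sketch(X, n_iter=300, learning_rate=1.0, seed=0):
    """Plain gradient descent on KL(P || Q); returns embedding and loss history."""
    rng = np.random.default_rng(seed)
    P = high_dim_similarities(X)
    Y = rng.normal(scale=1e-2, size=(len(X), 2))   # small random 2-D start
    history = []
    for _ in range(n_iter):
        Q, W = low_dim_similarities(Y)
        history.append(kl_divergence(P, Q))
        # grad_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        M = (P - Q) * W
        grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
        Y -= learning_rate * grad                  # adjust the embedding
    return Y, history
```

Calling `tsne_sketch(X)` on an array `X` of shape `(n_points, n_attributes)` returns a two-dimensional embedding together with the divergence recorded at each iteration. In this simplified picture, manipulating the underlying probability model would correspond to changing `P` before or during the descent, but how the thesis does this is not specified here.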

Files

Interactive_manipulation_of_t-... (pdf)

(pdf | 59.3 Mb)

License info not available