Sample-Based t-SNE Embeddings

How different Sampling Strategies influence the Quality of Low-Dimensional Embeddings

Bachelor Thesis (2025)

Author(s)

E.L. Ketterer (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Martin Skrodzki – Mentor (TU Delft - Computer Graphics and Visualisation)

K Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Data visualisation, Sampling, t-SNE, High-dimensional data

To reference this document use:

https://resolver.tudelft.nl/uuid:43d9439f-e5e4-4bac-b55d-10f59eeeaa59

Publication Year

2025

Language

English

Graduation Date

27-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Data visualisation is an important area of research: as the amount of data keeps growing, we need ways to present it that give an intuition for the trends and patterns within it. This is particularly challenging for high-dimensional data, which we cannot perceive directly. A common approach is to use dimensionality-reduction techniques to map the high-dimensional data into a lower-dimensional space, which can then be visualised. One such technique is t-distributed Stochastic Neighbour Embedding (t-SNE), which produces good visualisations but suffers from long runtimes. This paper explores the effect of computing t-SNE embeddings from a sample of the data instead of the full dataset, which reduces the runtime of the algorithm and hence provides visualisations faster. We show, both visually and numerically, that uniform random sampling and Poisson disk sampling can yield much faster runtimes while producing embeddings that are similar to, or even more meaningful than, the embedding of the entire dataset.
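To make the two sampling strategies named in the abstract concrete, the following is a minimal sketch in standard-library Python. It is not taken from the thesis: the function names, the toy data, the radius value and the greedy "dart throwing" variant of Poisson disk sampling are all assumptions chosen for illustration, and the thesis may use a different construction.

```python
import math
import random


def uniform_sample(points, k, seed=0):
    """Pick k points uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def poisson_disk_sample(points, radius, seed=0):
    """Greedy dart throwing (an assumed variant, not the thesis's method):
    visit the points in random order and keep a point only if it lies at
    least `radius` away from every point kept so far."""
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


# Hypothetical toy data: 500 Gaussian points in 10 dimensions.
data_rng = random.Random(42)
data = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(10)) for _ in range(500)]

uniform_subset = uniform_sample(data, k=100)
disk_subset = poisson_disk_sample(data, radius=3.0)
# Either subset would then be passed to a t-SNE implementation in place of `data`.
```

Uniform sampling fixes the sample size directly, whereas in this sketch the size of the Poisson disk sample is controlled only indirectly through the radius: a larger radius keeps fewer, more evenly spread points. The pairwise distance check is quadratic in the number of kept points, so a real implementation would likely need a spatial index.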

Files

Final_Research_Paper_Em_Ketter... (pdf)

(pdf | 3.89 Mb)

License info not available