High-Dimensional Data Visualization via Sampling-Based Approaches

Measurement of structural similarity between different embeddings as a way of predicting a suitable perplexity

Bachelor Thesis (2025)
Author(s)

R. Chiriac (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Martin Skrodzki – Mentor (TU Delft - Computer Graphics and Visualisation)

K. Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dimensionality reduction techniques such as t-SNE are widely used to visualize high-dimensional data and play a crucial role in practical tasks such as biological data exploration, anomaly detection, and clustering of large datasets. However, their output depends heavily on hyperparameters such as perplexity and on the sampling strategy used. This paper investigates whether the structural similarity between embeddings of a sample and embeddings of the full dataset can be measured using Procrustes analysis. It provides a reproducible framework that quantifies the difference between visualizations produced by sampling-based t-SNE. These insights give users a way to create t-SNE visualizations without exhaustive experimentation (for example, generating every candidate visualization), making t-SNE more accessible and reliable.
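
As a rough illustration of the kind of comparison the abstract describes, the following Python sketch computes a Procrustes disparity between two 2D point sets whose rows refer to the same data points. It is not taken from the thesis: the function name procrustes_disparity, the use of NumPy, and the synthetic point sets are assumptions made for this example, and no t-SNE run is involved.

```python
import numpy as np


def procrustes_disparity(a, b):
    """Procrustes disparity between two point sets with matched rows.

    Hypothetical helper for illustration. Both sets are centered and scaled
    to unit Frobenius norm, then optimally rotated/reflected and rescaled
    onto each other. The result lies in [0, 1]; 0 means the same shape.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both embeddings must have the same shape")

    # Remove translation by centering each point set at the origin.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)

    # Remove scale by normalizing to unit Frobenius norm.
    norm_a = np.linalg.norm(a0)
    norm_b = np.linalg.norm(b0)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("each embedding needs at least two distinct points")
    a0 /= norm_a
    b0 /= norm_b

    # The singular values of a0^T b0 determine the best orthogonal alignment;
    # after optimal rotation and rescaling the residual is 1 - (sum s)^2.
    s = np.linalg.svd(a0.T @ b0, compute_uv=False)
    return max(0.0, float(1.0 - s.sum() ** 2))


# Synthetic stand-ins for two embeddings of the same points (assumed data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 2))

theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

# Same layout, but rotated, scaled, and shifted.
moved = 3.0 * reference @ rotation.T + np.array([5.0, -2.0])
# Same transformation with additional random distortion.
noisy = moved + rng.normal(scale=1.0, size=moved.shape)

d_same = procrustes_disparity(reference, moved)
d_noisy = procrustes_disparity(reference, noisy)
print("disparity, same structure:     ", d_same)
print("disparity, distorted structure:", d_noisy)
```

A disparity near 0 indicates that two layouts agree up to translation, uniform scaling, rotation, or reflection, while larger values indicate greater structural difference. In a sampling setting, one plausible use would be to compare the embedding of a sample against the rows of the full embedding that correspond to the same points.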
