High-Dimensional Data Visualization via Sampling-Based Approaches

Effect of Perplexity at different levels of Sampling-Based Approach

Bachelor Thesis (2025)

Author(s)

M.A. Bhatti (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Martin Skrodzki – Mentor (TU Delft - Computer Graphics and Visualisation)

K. Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:e66bcd35-f202-4d2e-a685-147463a1f397

Publication Year

2025

Language

English

Graduation Date

26-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Visualizing high-dimensional data is a key challenge in modern data analysis. t-distributed Stochastic Neighbor Embedding (t-SNE) is a popular nonlinear dimensionality reduction technique that maps such data into a low-dimensional embedding while preserving local relationships. A critical hyperparameter in t-SNE is perplexity. Choosing an appropriate perplexity value for a particular use case is non-trivial, especially for large datasets, where repeated t-SNE computations become computationally prohibitive. To mitigate this, the sampling-based approach runs t-SNE twice: first on a downsampled subset of the data and then on the full dataset. This introduces two perplexity parameters: the sample perplexity for the first run and the full perplexity for the second run.

In this work, we systematically investigate how different combinations of sample perplexity and full perplexity affect the quality of the final t-SNE embedding. Our findings show that sample perplexity predominantly determines the global layout of the embedding, while full perplexity influences local refinement. We also compare our approach with other strategies for choosing perplexity values and find that, while some of them preserve structural details better, they provide less flexibility.
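The two-run procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not say how the first run feeds into the second, so the code assumes that the sample embedding is used to initialize the full run, with each non-sampled point placed near its nearest sampled neighbor. The function name `two_stage_tsne`, the sample size, the jitter, the rescaling of the initialization, and the synthetic data are all hypothetical choices; only scikit-learn's `TSNE` and `NearestNeighbors` are real library classes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors


def two_stage_tsne(X, sample_size, sample_perplexity, full_perplexity, seed=0):
    """Hypothetical helper: t-SNE on a random sample, then on the full data."""
    rng = np.random.default_rng(seed)
    sample_idx = rng.choice(X.shape[0], size=sample_size, replace=False)

    # First run: embed only the downsampled subset, using the sample perplexity.
    Y_sample = TSNE(
        n_components=2, perplexity=sample_perplexity, init="pca", random_state=seed
    ).fit_transform(X[sample_idx])

    # ASSUMPTION (not stated in the abstract): start every point at the
    # embedded position of its nearest sampled neighbor, plus a small jitter.
    nn = NearestNeighbors(n_neighbors=1).fit(X[sample_idx])
    _, nearest = nn.kneighbors(X)
    init = Y_sample[nearest[:, 0]].astype(np.float64)
    init += rng.normal(scale=1e-2 * init.std(), size=init.shape)
    init[sample_idx] = Y_sample  # sampled points keep their exact positions

    # ASSUMPTION: shrink the initialization to a small scale, mirroring the
    # rescaling scikit-learn applies to its own PCA initialization.
    init = init / init[:, 0].std() * 1e-4

    # Second run: embed the full dataset, using the full perplexity.
    Y_full = TSNE(
        n_components=2, perplexity=full_perplexity, init=init, random_state=seed
    ).fit_transform(X)
    return Y_full, sample_idx


# Synthetic stand-in data; the thesis's datasets are not used here.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)
Y_full, sample_idx = two_stage_tsne(
    X, sample_size=60, sample_perplexity=10, full_perplexity=30
)
print(Y_full.shape)
```

Both perplexity values must stay below the number of points in their respective run, which is why the sample perplexity here is chosen well under the sample size of 60.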

Files

Final_paper_3.pdf

(pdf | 5.61 Mb)

License info not available