Sample-Based t-SNE Embeddings
How different Sampling Strategies influence the Quality of Low-Dimensional Embeddings
E.L. Ketterer (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Martin Skrodzki – Mentor (TU Delft - Computer Graphics and Visualisation)
K. Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)
C. Lofi – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Data visualisation is an important area of research: as the amount of data keeps increasing, we have to find ways of showcasing this data to provide an intuition for trends and patterns within it. This can be a particular challenge for high-dimensional data, since we cannot perceive it directly. A common approach is to use dimensionality-reduction techniques to bring the high-dimensional data into lower dimensions, which can then be visualised. One such technique is t-distributed Stochastic Neighbour Embedding (t-SNE), which produces good visualisations but suffers from long runtimes. This paper explores the effect of using sampled data instead of the full dataset to produce t-SNE embeddings, reducing the runtime of the algorithm and hence providing visualisations faster. We show, both visually and numerically, that uniform random sampling and Poisson disk sampling can result in much shorter runtimes while producing embeddings that are similar to, or even more meaningful than, the embedding of the entire dataset.
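To make the two sampling strategies named in the abstract concrete, the following is a minimal sketch in Python using only the standard library. It is not the thesis implementation: the function names, the toy Gaussian data, the radius value, and the greedy dart-throwing variant of Poisson disk sampling are all assumptions chosen for illustration.

```python
import math
import random


def uniform_sample(points, k, seed=0):
    """Return indices of k points chosen uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(len(points)), k)


def poisson_disk_sample(points, radius, seed=0):
    """Greedy dart throwing (an assumed variant, not necessarily the thesis method).

    Visit the points in random order and keep a point only if it lies at
    least `radius` away from every point kept so far.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    kept = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
    return kept


# Hypothetical toy dataset: 500 points in 10 dimensions.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(500)]

uniform_idx = uniform_sample(data, k=50)
poisson_idx = poisson_disk_sample(data, radius=3.0)

# Only the selected rows would then be handed to a t-SNE implementation.
uniform_subset = [data[i] for i in uniform_idx]
poisson_subset = [data[i] for i in poisson_idx]
```

Either subset could then be embedded with any t-SNE implementation, for example `sklearn.manifold.TSNE(n_components=2).fit_transform(subset)`, which is expected to run faster than embedding all points because fewer pairwise similarities have to be computed. Uniform sampling preserves the density of the original data on average, whereas the minimum-distance rule of Poisson disk sampling spreads the sample more evenly and thins out dense regions.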