Exploring the computational feasibility limits of perplexity in t-SNE for scenarios of limited working memory
D. Netzov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
K. Hildebrandt – Mentor (TU Delft - Computer Graphics and Visualisation)
C. Lofi – Graduation committee member (TU Delft - Web Information Systems)
Martin Skrodzki – Mentor (TU Delft - Computer Graphics and Visualisation)
Abstract
Modern data analysis often involves working with large multidimensional datasets. Visualizing this kind of data helps leverage human intuition and pattern recognition to reveal hidden relationships. t-SNE is a widely used tool for creating such visualizations. Despite its popularity, it suffers from hard-to-tune parameters for which no heuristic guarantees the best results. Because the datasets researchers work with are large, the algorithm can exceed the available memory, leading to slowdowns and crashes. This paper investigates how memory usage depends on the tunable perplexity parameter and on the size of the data. For the popular openTSNE implementation of t-SNE, it provides a reliable way for researchers to predict memory consumption before running the algorithm. In addition, a modification that reduces the peak memory usage of the implementation is presented. Together, these contributions improve the reliability and efficiency of t-SNE pipelines in memory-constrained environments.
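To give a feel for why perplexity and data size both drive memory use, the following is a rough back-of-envelope sketch, not the prediction method developed in this work. It assumes that the affinities are stored as a sparse CSR matrix, that each point keeps about 3 × perplexity nearest neighbours (a common convention in nearest-neighbour t-SNE implementations), and that symmetrisation at most doubles the number of stored entries. The function name `estimate_affinity_bytes` and its parameters are hypothetical.

```python
def estimate_affinity_bytes(n_samples, perplexity, value_bytes=8, index_bytes=4):
    """Rough upper bound on the size of a sparse CSR affinity matrix, in bytes.

    Illustrative only: ignores the input data, the k-NN index, the embedding,
    and any temporary arrays, all of which add to the real peak memory.
    """
    # Assumption: k = 3 * perplexity neighbours per point, capped at n - 1.
    k = min(n_samples - 1, int(3 * perplexity))
    # Assumption: symmetrising the matrix at most doubles the stored entries,
    # and there can never be more entries than off-diagonal cells.
    nnz = min(2 * n_samples * k, n_samples * (n_samples - 1))
    data = nnz * value_bytes             # affinity values
    indices = nnz * index_bytes          # column index of each value
    indptr = (n_samples + 1) * index_bytes  # row pointers
    return data + indices + indptr


if __name__ == "__main__":
    for perplexity in (30, 60, 120):
        size_mib = estimate_affinity_bytes(100_000, perplexity) / 2**20
        print(f"perplexity={perplexity}: about {size_mib:.0f} MiB")
```

Under these assumptions the estimate grows linearly in both the number of points and the perplexity until the neighbour count is capped by the dataset size.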