Modern data analysis often involves working with large multidimensional datasets. Visualizing such data leverages human intuition and pattern recognition to reveal hidden relationships, and t-SNE is a widely used tool for creating these visualizations. Despite its popularity, it suffers from hard-to-tune parameters with no heuristic that guarantees good results. Moreover, given the size of the data researchers work with, the algorithm can exceed the available memory, leading to slowdowns and crashes. This paper investigates how memory usage depends on the tunable perplexity parameter and the size of the input data, and provides a reliable way for researchers to predict the memory consumption of the popular openTSNE implementation of t-SNE before running the algorithm. In addition, a modification that reduces the implementation's peak memory usage is presented. Together, these contributions improve the reliability and efficiency of t-SNE pipelines in memory-constrained environments.