Data visualisation is an important area of research: as the amount of data keeps growing, we need ways of presenting it that give an intuition for the trends and patterns within it. This is a particular challenge for high-dimensional data, which we cannot perceive directly. A common approach is to use dimensionality-reduction techniques to map the high-dimensional data into a lower-dimensional space, which can then be visualised. One such technique is t-distributed Stochastic Neighbour Embedding (t-SNE), which produces good visualisations but suffers from long runtimes. This paper explores the effect of computing t-SNE embeddings from a sample of the data instead of the full dataset, which reduces the runtime of the algorithm and hence produces visualisations faster. We show, both visually and numerically, that uniform random sampling and Poisson disk sampling can yield much shorter runtimes while producing embeddings that are similar to, or even more meaningful than, the embedding of the entire dataset.
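To make the two sampling strategies concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the implementation used in this paper: the function names `uniform_sample` and `poisson_disk_sample`, the toy data, the sample size and the radius are all hypothetical, and the Poisson disk variant shown is a naive dart-throwing scheme over the existing data points, which may differ from the algorithm actually used. The t-SNE step itself is omitted; the selected rows would simply be passed to whichever t-SNE implementation is at hand.

```python
import math
import random


def uniform_sample(n_points, k, seed=0):
    """Return k distinct indices drawn uniformly at random from range(n_points)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), k))


def poisson_disk_sample(points, radius, seed=0):
    """Naive dart throwing (an assumed variant): visit the points in random
    order and keep a point only if it lies at least `radius` away from every
    point kept so far."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    kept = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy data: 200 points in 5 dimensions (hypothetical, for illustration only).
rng = random.Random(42)
data = [[rng.random() for _ in range(5)] for _ in range(200)]

uniform_idx = uniform_sample(len(data), k=50)
poisson_idx = poisson_disk_sample(data, radius=0.5)

# Only the selected rows would be handed to a t-SNE implementation.
uniform_subset = [data[i] for i in uniform_idx]
poisson_subset = [data[i] for i in poisson_idx]
```

Uniform sampling fixes the sample size in advance and preserves the density of the original data on average, whereas the Poisson disk scheme fixes a minimum spacing instead, so the number of points kept depends on the radius and on how the data is distributed. The pairwise distance check above is quadratic in the worst case; a spatial index would be needed for it to pay off on large datasets.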