Accelerating t-SNE for hyperbolic embeddings

More Info
expand_more

Abstract

t-distributed stochastic neighbor embedding (t-SNE) is a dimensionality reduction technique able to embed high-dimensional data into 2 or 3 dimensions for the purpose of visualization, and has proven to be useful in various applications, such as single-cell analysis. Normally t-SNE embeds into Euclidean space, but recent work shows that embedding into hyperbolic space can lead to higher quality embeddings. Most notably on data containing hierarchical structures, as seen in single-cell measurements. As such it would be interesting to generalize computation of t-SNE embeddings to embeddings in hyperbolic space. A missing tool for this is the approximation of the objective and the gradient. This is needed as the cost for evaluating the objective and gradient is quadratic in the number of data points to be embedded. In practice such approximations are already used for Euclidean embeddings. In this thesis we introduce the first approximation scheme for hyperbolic t-SNE embeddings. To achieve this we generalize a Barnes-Hut approximation scheme to hyperbolic space. A major difficulty for this is to transfer the use of a quadtree in the Euclidean plane to hyperbolic space. We propose the usage of a polar quadtree in the Poincaré disk and show that our solution yields substantial gain in run time, in particular when embedding larger data sets, with the resulting embeddings being similar in quantitative and qualitative aspects.