A mapping from a high-dimensional space to a low-dimensional space.
p_{j|i} = p(x_j \mid x_i): the probability of a path from x_i to x_j
q_{j|i} = q(y_j \mid y_i): the probability of a path from y_i to y_j
i.e., how likely x_i is to pick x_j as its neighbor.
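As a sketch, the neighbor-picking probabilities can be computed with a Gaussian kernel centered at each x_i (the kernel t-SNE uses in the high-dimensional space); `conditional_probs` is a hypothetical helper name, not a library function:

```python
import numpy as np

def conditional_probs(X, sigma):
    """p_{j|i}: probability that x_i picks x_j as its neighbor,
    using a Gaussian kernel with one bandwidth sigma_i per point."""
    # pairwise squared Euclidean distances ||x_i - x_j||^2
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                 # a point never picks itself
    return P / P.sum(axis=1, keepdims=True)  # each row sums to 1
```

Each row i of the result is the distribution p_{·|i}; q_{j|i} is defined analogously on the low-dimensional points y.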
van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
In information theory, perplexity measures how well a probability distribution predicts a sample; a low perplexity indicates a precise prediction.
\mathrm{perp}(p_i) = 2^{H(p_i)}, where H(p_i) = -\sum_j p_{j|i} \log_2 p_{j|i}
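The formula above is a direct computation; a minimal sketch (the helper name `perplexity` is my own):

```python
import numpy as np

def perplexity(p):
    """perp(p_i) = 2^{H(p_i)}, with entropy H in bits.
    Zero-probability entries contribute 0 to the entropy."""
    p = p[p > 0]
    H = -np.sum(p * np.log2(p))
    return 2.0 ** H
```

For a uniform distribution over k outcomes the entropy is log2(k), so the perplexity is exactly k, which is why perplexity is often read as an "effective number of neighbors."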
Perplexity: how smooth the Gaussian distribution centered at x_i is; it acts as a smooth measure of the effective number of neighbors.
Perplexity \uparrow \Rightarrow H(p_i) \uparrow \Rightarrow \sigma_i \uparrow
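The monotone relation above (larger \sigma_i gives higher entropy, hence higher perplexity) is what lets t-SNE find each \sigma_i by binary search against a user-chosen target perplexity. A sketch, assuming squared Euclidean distances as input (`sigma_for_perplexity` is an illustrative name):

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target, tol=1e-5, max_iter=100):
    """Binary-search sigma_i so that perp(p_i) matches the target.
    sq_dists_i: squared distances from x_i to every other point.
    Bisection is valid because perplexity increases with sigma."""
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = (lo + hi) / 2.0
        p = np.exp(-sq_dists_i / (2.0 * sigma ** 2))
        p /= p.sum()
        perp = 2.0 ** (-np.sum(p * np.log2(p + 1e-12)))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma   # distribution too smooth: shrink sigma
        else:
            lo = sigma   # distribution too peaked: grow sigma
    return sigma
```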
p_{j|i}, p_{i|j} \rightarrow p_{i,j}:
\large p_{i,j} = \frac{1}{2n} (p_{j|i} + p_{i|j})
The objective is then to make the distributions p and q match as closely as possible.
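The symmetrization is a one-liner; since each of the n conditional rows sums to 1, dividing by 2n makes the joint distribution sum to 1 (`symmetrize` is an illustrative name):

```python
import numpy as np

def symmetrize(P_cond):
    """p_{i,j} = (p_{j|i} + p_{i|j}) / (2n).
    If every row of P_cond sums to 1, the result sums to 1 overall."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)
```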
Mcinnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. doi: 10.21105/joss.00861
where \large p_{j|i} = \exp\left(-\frac{\|x_i - x_j\| - \rho_i}{\sigma_i}\right), with \rho_i the distance from x_i to its nearest neighbor (unlike t-SNE, UMAP does not square the distance).
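A sketch of these directed edge weights for a single point i (`umap_weights` is my own name for the helper); because \rho_i is subtracted, the nearest neighbor always receives weight exactly 1, which keeps every point connected to its local neighborhood:

```python
import numpy as np

def umap_weights(dists_i, sigma_i):
    """UMAP-style directed edge weights from point i.
    dists_i: (unsquared) distances from x_i to its neighbors.
    rho_i is the distance to the nearest neighbor, so the closest
    neighbor always gets weight exp(0) = 1."""
    rho_i = dists_i.min()
    return np.exp(-(dists_i - rho_i) / sigma_i)
```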
For fast convergence, a normalized Laplacian is constructed from p_{i,j}. Its eigenvectors are used to initialize the low-dimensional embedding, which is then refined by gradient descent.
Kobak, D., & Linderman, G. C. (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. doi: 10.1101/2019.12.19.877522
Normalized Laplacian: L = I - D^{-\frac{1}{2}}AD^{-\frac{1}{2}}
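The spectral initialization described above can be sketched as follows, using the affinities as the adjacency A and taking the eigenvectors with the smallest nonzero eigenvalues (a minimal Laplacian-eigenmaps sketch; `spectral_init` is an illustrative name, and a dense eigendecomposition is used instead of the sparse solvers a real implementation would need):

```python
import numpy as np

def spectral_init(P, dim=2):
    """Initialize the embedding from eigenvectors of the normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2} built on affinities P."""
    A = (P + P.T) / 2.0                      # ensure symmetry
    d = A.sum(axis=1)                        # degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)           # eigenvalues ascending
    # skip the trivial eigenvector at eigenvalue ~0
    return vecs[:, 1:dim + 1]
```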
Construct a graph from the features, then transform the features according to the geometry of the graph:
graph = f(X) \\ Y = g(X, graph)
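A common concrete choice for f is a k-nearest-neighbor graph, as in UMAP's first stage; this is one illustrative instance, not the only option (`knn_graph` is my own name):

```python
import numpy as np

def knn_graph(X, k=2):
    """graph = f(X): symmetrized k-nearest-neighbor adjacency matrix."""
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)             # exclude self from neighbors
    A = np.zeros_like(sq)
    nn = np.argsort(sq, axis=1)[:, :k]       # indices of k nearest points
    for i, idx in enumerate(nn):
        A[i, idx] = 1.0
    return np.maximum(A, A.T)                # undirected graph
```

The second stage Y = g(X, graph) then embeds the points while respecting this graph's geometry, e.g. via the spectral initialization and gradient descent described above.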