A mapping from a high-dimensional space to a low-dimensional space.
$\large p_{j|i} = p(x_j \mid x_i)$: the conditional probability that $x_i$ would pick $x_j$ as its neighbor in the high-dimensional space.
$\large q_{j|i} = q(y_j \mid y_i)$: the corresponding probability that $y_i$ would pick $y_j$ as its neighbor in the low-dimensional map.
van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
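In the t-SNE paper, $p_{j|i}$ is a Gaussian kernel centered at $x_i$, normalized over all other points: $p_{j|i} = \exp(-\|x_i - x_j\|^2 / 2\sigma_i^2) / \sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)$. A minimal NumPy sketch (function name is mine; `sigma` holds the per-point bandwidths):

```python
import numpy as np

def conditional_probs(X, sigma):
    """p_{j|i}: probability that x_i picks x_j as its neighbor.

    X: (n, d) data matrix; sigma: (n,) per-point Gaussian bandwidths.
    """
    # squared Euclidean distances ||x_i - x_j||^2
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-D / (2 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                  # p_{i|i} = 0 by convention
    return P / P.sum(axis=1, keepdims=True)   # normalize each row to sum to 1
```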
In information theory, perplexity measures how well a probability distribution predicts a sample; a low perplexity indicates a precise prediction.
$$ \mathrm{Perp}(p_i) = 2^{H(p_i)} $$
where $H(p_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$ is the Shannon entropy of the distribution over neighbors of $x_i$.
Perplexity: how smooth (wide) the Gaussian around each $x_i$ is; it can be read as the effective number of neighbors.
$Perplexity \uparrow \Rightarrow H(p_i) \uparrow \Rightarrow \sigma_i \uparrow$
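Since $H(p_i)$, and hence the perplexity, increases monotonically with $\sigma_i$, each $\sigma_i$ can be found by binary search against a user-chosen target perplexity. A sketch (helper names are mine):

```python
import numpy as np

def perplexity(p_row):
    """Perp(p_i) = 2^{H(p_i)} with entropy in bits."""
    p = p_row[p_row > 0]
    H = -np.sum(p * np.log2(p))
    return 2.0 ** H

def find_sigma(dists_i, target_perp, tol=1e-5, n_iter=100):
    """Binary-search sigma_i so the row perplexity matches target_perp.

    dists_i: squared distances from x_i to all other points.
    """
    lo, hi = 1e-10, 1e10
    for _ in range(n_iter):
        sigma = (lo + hi) / 2
        p = np.exp(-dists_i / (2 * sigma ** 2))
        p /= p.sum()
        if perplexity(p) > target_perp:
            hi = sigma   # too smooth: shrink sigma
        else:
            lo = sigma   # too peaked: grow sigma
        if hi - lo < tol:
            break
    return (lo + hi) / 2
```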
$p_{j|i}, p_{i|j} \rightarrow p_{i,j}$:
$$ \large p_{i,j} = \frac{1}{2n} (p_{j|i} + p_{i|j}) $$
The embedding is then optimized so that $q$ matches $p$ as closely as possible, by minimizing the KL divergence between the two distributions.
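The symmetrization itself is a one-liner; because each conditional row sums to 1, the resulting joint matrix sums to 1 over all pairs:

```python
import numpy as np

def symmetrize(P_cond):
    """p_{i,j} = (p_{j|i} + p_{i|j}) / (2n), for row-stochastic P_cond."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2 * n)
```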
Mcinnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. doi: 10.21105/joss.00861
where $\large p_{j|i} = \exp\left(-\frac{\|x_i - x_j\| - \rho_i}{\sigma_i}\right)$ and $\rho_i$ is the distance from $x_i$ to its nearest neighbor.
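A dense sketch of these affinities (real UMAP restricts each row to the $k$ nearest neighbors and calibrates each $\sigma_i$ so the affinities sum to roughly $\log_2 k$; both refinements are omitted here, and `sigma` is passed in directly):

```python
import numpy as np

def umap_affinities(X, sigma):
    """p_{j|i} = exp(-(||x_i - x_j|| - rho_i) / sigma_i).

    rho_i is the distance from x_i to its nearest neighbor, so the
    nearest neighbor always receives affinity exactly 1.
    """
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(D, np.inf)    # exclude self; exp(-inf) = 0 below
    rho = D.min(axis=1)            # nearest-neighbor distance per point
    return np.exp(-(D - rho[:, None]) / sigma[:, None])
```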
For fast convergence, a normalized Laplacian is constructed from $p_{i,j}$. Its eigenvectors are used to initialize the low-dimensional embedding, which is then refined by gradient descent.
Kobak, D., & Linderman, G. C. (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. doi: 10.1101/2019.12.19.877522
Normalized Laplacian: $L = I - D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$
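A sketch of this spectral initialization with dense NumPy, assuming a symmetric affinity matrix $A$ with positive degrees (function name is mine):

```python
import numpy as np

def spectral_init(A, dim=2):
    """Initialize an embedding from the normalized Laplacian of A.

    A: symmetric (n, n) affinity matrix with zero diagonal.
    Returns eigenvectors of L = I - D^{-1/2} A D^{-1/2} for the
    `dim` smallest nonzero eigenvalues.
    """
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    # skip the trivial eigenvector (eigenvalue ~ 0)
    return vecs[:, 1:dim + 1]
```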
Transform features on the geometry of a given graph:
$$ Y = f(X, graph) $$
Construct the graph from the features, then transform them on its geometry:
$$ graph = f(X) \\ Y = g(X, graph) $$
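In the second paradigm, $graph = f(X)$ is typically a $k$-nearest-neighbor graph built from the feature matrix. A minimal sketch:

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a symmetric binary kNN adjacency matrix from features X."""
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D, np.inf)            # no self-loops
    nbrs = np.argsort(D, axis=1)[:, :k]    # k nearest neighbors per point
    A = np.zeros_like(D)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)              # symmetrize
```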