A mapping from high dimensional space to low dimensional space.
$\large p_{j|i} = p(x_j | x_i)$: the probability of a path from $x_i$ to $x_j$
$\large q_{j|i} = q(y_j | y_i)$: the probability of a path from $y_i$ to $y_j$
How likely $x_i$ would pick $x_j$ as its neighbor.
van der Maaten, L. & Hinton, G. (2008). Visualizing Data using t-SNE . Journal of Machine Learning Research, 9, 2579--2605.
In information theory, to measure how well a probability distribution predicts a sample. A low perplexity indicates the prediction is precise.
$$ perp(p) = 2^{H(p_i)} $$where $H(p_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$
$Perplexity \uparrow \Rightarrow H(p_i) \uparrow \Rightarrow \sigma_i \uparrow$
$p_{j|i}, p_{i|j} \rightarrow p_{i,j}$:
$$ \large p_{i,j} = \frac{1}{2n} (p_{j|i} + p_{i|j}) $$To make $p$ and $q$ as likely as possible.
Mcinnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. doi: 10.21105/joss.00861
where $\large p_{j|i} = exp(-\frac{||x_i - x_j||^2 - \rho_i}{\sigma_i})$
For fast convergence, normalized Laplacian is constructed from $p_{i,j}$. Eigenvectors of normalized Laplacian is used as initialization of $q_{i,j}$ and further gradient descent method is used.
A new algorithm, called uniform manifold approximation and projection (UMAP) has been recently published and is claimed to preserve as much of the local and more of the global data structure than t-SNE, with a shorter run time.
Becht, E., McInnes, L., Healy, J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38–44 (2019) doi:10.1038/nbt.4314
Kobak, D., & Linderman, G. C. (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. doi: 10.1101/2019.12.19.877522
Both visualization and trajectory inference aim to approximate the data manifold. They construct kNN graph to approximate the data manifold.
Data visualization: data points separate with each other which is better.
Trajectory: data points form a pattern which is better.