A mapping from high dimensional space to low dimensional space.
pj|i=p(xj|xi): the probability of a path from xi to xj
qj|i=q(yj|yi): the probability of a path from yi to yj
How likely xi would pick xj as its neighbor.
van der Maaten, L. & Hinton, G. (2008). Visualizing Data using t-SNE . Journal of Machine Learning Research, 9, 2579--2605.
In information theory, to measure how well a probability distribution predicts a sample. A low perplexity indicates the prediction is precise.
perp(p)=2H(pi)where H(pi)=−∑jpj|ilog2pj|i
Perplexity↑⇒H(pi)↑⇒σi↑
pj|i,pi|j→pi,j:
pi,j=12n(pj|i+pi|j)To make p and q as likely as possible.
Mcinnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. doi: 10.21105/joss.00861
For fast convergence, normalized Laplacian is constructed from pi,j. Eigenvectors of normalized Laplacian is used as initialization of qi,j and further gradient descent method is used.
A new algorithm, called uniform manifold approximation and projection (UMAP) has been recently published and is claimed to preserve as much of the local and more of the global data structure than t-SNE, with a shorter run time.
Becht, E., McInnes, L., Healy, J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38–44 (2019) doi:10.1038/nbt.4314
Kobak, D., & Linderman, G. C. (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. doi: 10.1101/2019.12.19.877522
Both visualization and trajectory inference aim to approximate the data manifold. They construct kNN graph to approximate the data manifold.
Data visualization: data points separate with each other which is better.
Trajectory: data points form a pattern which is better.