Dimensionality Reduction Algorithms and Visualization in Bioinformatics

Comparison of t-SNE and UMAP

Yueh-Hua Tu 杜岳華

2020.4.10

Outline

  • Brief introduction to dimensionality reduction
  • Types of dimensionality reduction
  • T-distributed stochastic neighbor embedding (t-SNE)
  • Uniform manifold approximation and projection (UMAP)
  • Comparison of the two models
  • Dimensionality reduction via graphs

Biological Data

High-dimensional biological data

High-dimensional data form a manifold

Example of dimensionality reduction

Dimensionality reduction brings information loss

Dimensionality Reduction

A mapping from a high-dimensional space to a low-dimensional space.

Types of Dimensionality Reduction

  • Linear method
  • Non-linear method
    • Preserve global data structure
    • Preserve local data structure

Global structure vs. local structure

Before we start...

Notation

$\large p_{j|i} = p(x_j | x_i)$: the probability of an edge from $x_i$ to $x_j$ in the original space

$\large q_{j|i} = q(y_j | y_i)$: the probability of an edge from $y_i$ to $y_j$ in the reconstructed space

Nonlinear Dimensionality Reduction

To preserve distances or neighbors?

T-distributed Stochastic Neighbor Embedding (t-SNE)

$$ p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2 \sigma_i^2)}{\sum_{k \ne i} \exp(-||x_i - x_k||^2 / 2 \sigma_i^2)} $$

How likely $x_i$ is to pick $x_j$ as its neighbor.

van der Maaten, L. & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Stochastic neighbor embedding (SNE)

Similarity in the (high-dimensional) original space

$$ \large p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2 \sigma_i^2)}{\sum_{k \ne i} \exp(-||x_i - x_k||^2 / 2 \sigma_i^2)} $$

Similarity in the (low-dimensional) reconstructed space

$$ \large q_{j|i} = \frac{\exp(-||y_i - y_j||^2)}{\sum_{k \ne i} \exp(-||y_i - y_k||^2)} $$
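A minimal NumPy sketch of these two kernels (function names are hypothetical, not from the paper):

```python
import numpy as np

def conditional_p(X, sigmas):
    """p_{j|i}: Gaussian similarities in the original space, row-normalized."""
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # ||x_i - x_j||^2
    P = np.exp(-D / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                 # a point never picks itself
    return P / P.sum(axis=1, keepdims=True)

def conditional_q(Y):
    """q_{j|i}: unit-variance Gaussian similarities in the embedding."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = np.exp(-D)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum(axis=1, keepdims=True)
```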

Distance to probability

SNE

Loss function

$$ \large \arg\min_y D_{KL}(p || q) = \sum_i D_{KL}(p_i || q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} $$

Minimize the dissimilarity between the distributions $p$ and $q$

Optimization: gradient descent
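The SNE gradient has the closed form $\frac{\partial C}{\partial y_i} = 2 \sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$. A minimal sketch of plain gradient descent (the paper additionally uses momentum and jitter; `conditional_q` is the hypothetical helper sketched above):

```python
import numpy as np

def sne_gradient(P, Q, Y):
    # dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) * (y_i - y_j)
    M = P - Q + P.T - Q.T
    return 2.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

def sne_embed(P, n_components=2, lr=10.0, n_iter=1000, seed=0):
    """Minimize KL(p || q) by gradient descent from a small random start."""
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((P.shape[0], n_components))
    for _ in range(n_iter):
        Y -= lr * sne_gradient(P, conditional_q(Y), Y)
    return Y
```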

The crowding problem

Use a t-distribution for the repulsive force

T-distributed Stochastic Neighbor Embedding (t-SNE)

Similarity in high dimension

$$ \large p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2 \sigma_i^2)}{\sum_{k \ne i} \exp(-||x_i - x_k||^2 / 2 \sigma_i^2)} $$

Similarity in low dimension

$$ \large q_{j|i} = \frac{(1 + ||y_i - y_j||^2)^{-1}}{\sum_{k \ne i} (1 + ||y_i - y_k||^2)^{-1}} $$

How to decide the variance of the Gaussian distribution?

Perplexity

In information theory, perplexity measures how well a probability distribution predicts a sample. A low perplexity indicates a precise prediction.

$$ perp(p_i) = 2^{H(p_i)} $$

where $H(p_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$

Perplexity controls how smooth the Gaussian distribution is

$\text{Perplexity} \uparrow \Rightarrow H(p_i) \uparrow \Rightarrow \sigma_i \uparrow$

  • $\sigma_i$:
    • large: points get crowded
    • small: components break into segments
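In practice each $\sigma_i$ is found by binary search so that $perp(p_i)$ matches the user-chosen perplexity. A minimal sketch (function name hypothetical):

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, perplexity, n_steps=50, tol=1e-5):
    """Binary-search sigma_i so that 2^H(p_i) hits the target perplexity.
    sq_dists_i: squared distances from x_i to every other point."""
    lo, hi = 1e-10, 1e10
    target = np.log2(perplexity)                  # match entropy in bits
    for _ in range(n_steps):
        sigma = np.sqrt(lo * hi)                  # midpoint in log space
        p = np.exp(-sq_dists_i / (2.0 * sigma ** 2))
        p /= p.sum()
        H = -np.sum(p * np.log2(p + 1e-12))       # H(p_i)
        if abs(H - target) < tol:
            break
        lo, hi = (lo, sigma) if H > target else (sigma, hi)
    return sigma
```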

Making it symmetric

$p_{j|i}, p_{i|j} \rightarrow p_{i,j}$:

$$ \large p_{i,j} = \frac{1}{2n} (p_{j|i} + p_{i|j}) $$

This ensures that each data point contributes significantly to the loss:

$$ \sum_j p_{i,j} > \frac{1}{2n} $$
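In matrix form, with the conditional matrix `P` from the sketch above (rows indexed by $i$), the symmetrization is a one-liner:

```python
import numpy as np

# P[i, j] = p_{j|i}; rows sum to 1, diagonal is 0
n = P.shape[0]
P_joint = (P + P.T) / (2.0 * n)         # p_{i,j} = (p_{j|i} + p_{i|j}) / (2n)
assert np.isclose(P_joint.sum(), 1.0)   # one distribution over all pairs
```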

t-SNE

Loss function

$$ \large D_{KL}(p || q) = \sum_i \sum_j p_{i,j} \log \frac{p_{i,j}}{q_{i,j}} $$

To make $q$ match $p$ as closely as possible.
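In practice one rarely implements this by hand. A minimal usage sketch with scikit-learn on placeholder data:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(500, 50))  # placeholder for real data

# perplexity drives the sigma_i search above; PCA init helps global structure
Y = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(X)
print(Y.shape)  # (500, 2)
```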

Uniform Manifold Approximation and Projection (UMAP)

UMAP

Mcinnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. doi: 10.21105/joss.00861

UMAP

$\epsilon$-neighborhood

The graph breaks into pieces

If the data is uniformly distributed on the manifold, the cover will be good.

Assumption: Data is uniformly distributed on the manifold

Use the fuzzy membership $p_{j|i}$

Assumption: manifold is locally connected

UMAP

Similarity in original space

$$ \large p_{j|i} = \exp\left(-\frac{\max(0, ||x_i - x_j|| - \rho_i)}{\sigma_i}\right) $$

where $\rho_i$ is the distance from $x_i$ to its nearest neighbor.

Directed, locally connected graph

Make it symmetric

Probabilistic fuzzy union

$$ \large p_{i,j} = p_{j|i} + p_{i|j} - p_{j|i} p_{i|j} $$
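A minimal sketch of these two steps (function name hypothetical; the $\max(0, \cdot)$ clamp follows the UMAP implementation):

```python
import numpy as np

def umap_memberships(dists, rho, sigma):
    """dists: n x n pairwise distances; rho[i]: distance from x_i to its
    nearest neighbor; sigma[i]: per-point scale. Returns undirected weights."""
    P = np.exp(-np.maximum(dists - rho[:, None], 0.0) / sigma[:, None])
    np.fill_diagonal(P, 0.0)
    # probabilistic fuzzy union: p_{i,j} = p_{j|i} + p_{i|j} - p_{j|i} p_{i|j}
    return P + P.T - P * P.T
```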

k-nearest neighbor graph construction

Given $k$, search for $\sigma_i$ s.t.

$$ \large \log_2 k = \sum_j p_{j|i} $$

where $\large p_{j|i} = \exp\left(-\frac{\max(0, ||x_i - x_j|| - \rho_i)}{\sigma_i}\right)$ and $j$ ranges over the $k$ nearest neighbors of $x_i$.

Undirected locally connected graph
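As with perplexity, $\sigma_i$ can be found by binary search, now calibrated to $\sum_j p_{j|i} = \log_2 k$ over the $k$ nearest neighbors. A minimal sketch (function name hypothetical):

```python
import numpy as np

def umap_sigma(knn_dists_i, rho_i, k, n_steps=50, tol=1e-5):
    """Binary-search sigma_i so that sum_j p_{j|i} = log2(k).
    knn_dists_i: distances from x_i to its k nearest neighbors."""
    lo, hi = 1e-10, 1e10
    target = np.log2(k)
    for _ in range(n_steps):
        sigma = np.sqrt(lo * hi)
        s = np.exp(-np.maximum(knn_dists_i - rho_i, 0.0) / sigma).sum()
        if abs(s - target) < tol:
            break
        lo, hi = (lo, sigma) if s > target else (sigma, hi)
    return sigma
```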

Reconstruction in low-dimensional space

t distribution with $\nu = 1$ (recovered when $a = b = 1$)

$$ \large q_{i,j} = (1 + a||y_i - y_j||^{2b})^{-1} $$
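$a$ and $b$ are not set by hand: umap-learn (its `find_ab_params` helper) fits them by least squares so that this curve approximates an exponential controlled by `min_dist`. A sketch of that fit, assuming the default target curve:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_ab(min_dist=0.1, spread=1.0):
    """Fit a, b so that (1 + a d^(2b))^(-1) approximates the target curve:
    1 for d < min_dist, exp(-(d - min_dist) / spread) beyond it."""
    d = np.linspace(0.0, 3.0 * spread, 300)
    target = np.where(d < min_dist, 1.0, np.exp(-(d - min_dist) / spread))
    (a, b), _ = curve_fit(lambda d, a, b: 1.0 / (1.0 + a * d ** (2 * b)), d, target)
    return a, b

print(fit_ab())  # roughly a ≈ 1.6, b ≈ 0.9 for the defaults
```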

UMAP

Binary cross entropy loss

$$ \large H(p, q) = -\sum_i \left[ p_i \log q_i + (1 - p_i) \log (1 - q_i) \right] $$
$$ = \underbrace{H(p)}_{\text{constant}} + D_{KL}(p || q) $$

Optimization: gradient descent
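A minimal sketch of this loss on the fuzzy membership matrices ($\epsilon$ added for numerical stability; function name hypothetical):

```python
import numpy as np

def fuzzy_bce(P, Q, eps=1e-12):
    """Binary cross entropy between edge membership strengths P and Q."""
    return -np.sum(P * np.log(Q + eps) + (1.0 - P) * np.log(1.0 - Q + eps))
```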

Initialization

Eigenvectors of the normalized Laplacian for initialization

For fast convergence, the normalized Laplacian is constructed from $p_{i,j}$. Its eigenvectors are used to initialize the low-dimensional embedding, which is then refined by gradient descent.

$$ \large Lu = \lambda u $$

The kNN graph provides an underlying backbone of the data points, which supplies global structure (connectivity) to the dimensionality reduction.
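A minimal sketch of this initialization, assuming a dense symmetric membership matrix (UMAP itself works on sparse matrices):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def spectral_init(P_sym, n_components=2):
    """Embed with eigenvectors of L = I - D^{-1/2} P D^{-1/2} built from p_{i,j}."""
    d = P_sym.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(P_sym.shape[0]) - D_inv_sqrt @ P_sym @ D_inv_sqrt
    _, vecs = eigsh(L, k=n_components + 1, which="SM")  # smallest eigenpairs
    return vecs[:, 1:]              # drop the trivial constant eigenvector
```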

What's new in UMAP?

UMAP introduces a baseline distance $\rho_i$ defined by the nearest neighbor, which puts the data on a locally uniform scale and makes the graph locally connected.

UMAP uses eigenvectors of the normalized Laplacian as initialization.
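The reference implementation, umap-learn, exposes these choices as `n_neighbors` (the $k$ above) and `min_dist` (the $a, b$ fit). A minimal usage sketch on placeholder data:

```python
import numpy as np
import umap  # pip install umap-learn

X = np.random.default_rng(0).normal(size=(500, 50))  # placeholder for real data

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X)
print(embedding.shape)  # (500, 2)
```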

Comparison of the two models

Dimensionality reduction via graphs

How to decide the topology of the graph?

Discrete approach (which generates a sparse graph)

  1. $\epsilon$-neighborhood
    • consider vertices within $\epsilon$ as neighbors
  2. k-nearest neighbors
    • consider the k nearest vertices as neighbors

Continuous approach (which generates a dense graph)

  1. distance
    • $\Large d(x_i, x_j) = ||x_i - x_j||^2$ as pairwise distance
  2. probability derived from distance (could be asymmetric)
    • $\Large p_{ij} = e^{-d(x_i, x_j)/\sigma}$ as pairwise probability

But... we still have to define the distance metric a priori.
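A minimal sketch of the four constructions with scikit-learn and NumPy (placeholder data; with a single global $\sigma$ the continuous graph is symmetric, per-point $\sigma_i$ would make it asymmetric):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X = np.random.default_rng(0).normal(size=(200, 10))  # placeholder for real data

# discrete approaches: sparse 0/1 adjacency matrices
A_eps = radius_neighbors_graph(X, radius=1.0, mode="connectivity")
A_knn = kneighbors_graph(X, n_neighbors=15, mode="connectivity")

# continuous approaches: dense matrices
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise distances
P = np.exp(-D / 1.0)                                       # sigma = 1
```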

Application of UMAP in single-cell analysis

A new algorithm, called uniform manifold approximation and projection (UMAP), has recently been published; it is claimed to preserve as much of the local and more of the global data structure than t-SNE, with a shorter run time.

Becht, E., McInnes, L., Healy, J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38–44 (2019) doi:10.1038/nbt.4314


UMAP does not preserve global structure any better than t-SNE

Kobak, D., & Linderman, G. C. (2019). UMAP does not preserve global structure any better than t-SNE when using the same initialization. bioRxiv. doi:10.1101/2019.12.19.877522


Pseudotemporal ordering/Trajectory inference

Diffusion map

Comparison

About kNN graph

Both visualization and trajectory inference aim to approximate the data manifold, and both construct a kNN graph to do so.

Different purposes

Data visualization: the result is better when data points separate clearly from each other.

Trajectory inference: the result is better when data points form a continuous pattern.

Thank you for your attention