Attention mechanism

杜岳華

2019.3.30

About me

  • Founder of the Julia Taiwan community
  • Regular member and lecturer in the AI Tech community
  • Lecturer, "Machine Learning Theory and Practice", ITRI (Industrial Technology Research Institute)
  • Author of 《Julia 程式設計》 (Julia Programming)


  • Specialties: systems biology, computational biology, machine learning
  • Master's thesis: Identification of cell state using super-enhancer RNA


  • M.S., Institute of Biomedical Informatics, National Yang-Ming University
  • B.S., National Cheng Kung University, double major in Medical Laboratory Science and Biotechnology and in Computer Science and Information Engineering

Outline

  • Problems with RNNs
  • The Seq2Seq encoder-decoder architecture
  • Problems the attention model solves
  • Attention types
  • Applications of attention
    • Translation
    • Summarization
    • Image caption
  • Transformer

Problems with RNNs

The Seq2Seq encoder-decoder architecture

Problems the attention model solves

How to solve the problem?

Attention mechanism

Attention types

  • Global/local attention
  • Hard/soft attention
  • Self-attention

Global/local attention

Hard/soft attention

Soft attention

  • Alignment weights are learned so that the model attends over all of the input
  • $0 \le w \le 1$
  • Pro: the model is smooth and differentiable, so it can be trained end-to-end with backpropagation
  • Con: computation becomes expensive when the input is large (see the sketch below)
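
A minimal soft-attention sketch in NumPy, assuming dot-product alignment scores; the names (`soft_attention`, `keys`, `values`) are illustrative, not from the slides:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention(query, keys, values):
    """Attend over every position with differentiable weights in [0, 1]."""
    scores = keys @ query              # one alignment score per position
    weights = softmax(scores)          # 0 <= w <= 1, sums to 1
    context = weights @ values         # weighted sum over all positions
    return context, weights

keys = np.random.randn(5, 8)           # 5 positions, dimension 8
values = np.random.randn(5, 8)
query = np.random.randn(8)
context, weights = soft_attention(query, keys, values)
```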

Hard attention

  • Selects only part of the input to attend to at each step
  • weights are 0 or 1
  • Pro: less computation at inference time
  • Con: the model is non-differentiable, so it is usually trained with sampling-based estimators such as REINFORCE (see the sketch below)
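
A matching hard-attention sketch under the same toy setup: instead of a weighted sum, a single position is sampled and used exclusively, which is exactly what makes the model non-differentiable:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hard_attention(query, keys, values, rng=np.random.default_rng(0)):
    """Pick exactly one position (weight 0 or 1); the discrete choice breaks gradients."""
    scores = keys @ query
    probs = softmax(scores)               # distribution over positions
    i = rng.choice(len(values), p=probs)  # stochastic selection of a single position
    return values[i], i                   # attend to one value only
```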

Self-attention

Alignment (compatibility function)

query: $q_{j}$, key: $k_i$

Location-based

$$ \alpha^i_j = \mathrm{softmax}(W_{\alpha} q_j) $$

Content-based

$$ \mathrm{score}(q_j, k_i) = \mathrm{cosine}(q_j, k_i) $$

i.e. the cosine similarity between the query and the key.

Additive

$$ \mathrm{score}(q_j, k_i) = v_{\alpha}^T \tanh(W_{\alpha} [q_j; k_i]) $$

General

$$ \mathrm{score}(q_j, k_i) = q_j^T W_{\alpha} k_i $$

Dot-product

$$ \mathrm{score}(q_j, k_i) = q_j^T k_i $$

Scaled dot-product

$$ \mathrm{score}(q_j, k_i) = \frac{q_j^T k_i}{\sqrt{d_k}} $$

where $d_k$ is the dimension of the keys.
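
The score functions above translate almost directly into code. A minimal NumPy sketch, with `W_a`, `v_a`, and `W_g` as stand-ins for parameters that would be learned (random initialization here is only a placeholder):

```python
import numpy as np

d = 8
W_a = np.random.randn(d, 2 * d)   # additive: acts on the concatenation [q; k]
v_a = np.random.randn(d)
W_g = np.random.randn(d, d)       # general: bilinear form between q and k

def score_additive(q, k):
    return v_a @ np.tanh(W_a @ np.concatenate([q, k]))

def score_general(q, k):
    return q @ W_g @ k

def score_dot(q, k):
    return q @ k

def score_scaled_dot(q, k):
    return (q @ k) / np.sqrt(len(k))   # scale by the square root of the key dimension
```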

Scaled dot-product attention
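
In matrix form this is the attention used in the Transformer, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$. A minimal NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n_queries, d_v)

Q = np.random.randn(4, 8)     # 4 queries of dimension 8
K = np.random.randn(6, 8)     # 6 keys of dimension 8
V = np.random.randn(6, 16)    # 6 values of dimension 16
out = scaled_dot_product_attention(Q, K, V)   # shape (4, 16)
```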

Applications of attention

Translation

Summarization

Image caption

Transformer

The era of the Transformer

Why self-attention?

  • At least as efficient as recurrent or convolutional layers: a constant number of sequential operations per layer and shorter paths between any two positions, so it parallelizes far better (see the sketch below)
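
A minimal sketch of single-head self-attention, assuming queries, keys, and values are all linear projections of the same sequence $X$ (the projection matrices `W_q`, `W_k`, `W_v` would be learned; random values here are placeholders). Every position attends to every other position in one matrix product, with no sequential recurrence to unroll:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, d = 6, 8                                  # sequence length, model dimension
X = np.random.randn(n, d)                    # the same sequence supplies Q, K, and V
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
weights = softmax(Q @ K.T / np.sqrt(d))      # (n, n): each position attends to all positions
out = weights @ V                            # (n, d), computed in parallel, unlike an RNN
```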

Integration of RNN/CNN into Transformer

Thank you for your attention.

References

Papers