Attention mechanism
¶
杜岳華
¶
2019.3.30
¶
About me
¶
Founder of the Julia Taiwan community
Regular member and speaker of the AI Tech community
Lecturer, "Machine Learning Theory and Practice", ITRI
Author of《Julia 程式設計》(Julia Programming)
Specialties: systems biology, computational biology, machine learning
Master's thesis: Identification of cell state using super-enhancer RNA
M.S., Institute of Biomedical Informatics, National Yang-Ming University
B.S., Medical Laboratory Science and Biotechnology (double major with Computer Science and Information Engineering), National Cheng Kung University
Outline
¶
Problems with RNNs
The Seq2Seq encoder-decoder architecture
The problem the attention model solves
Attention types
Applications of attention
Translation
Summarization
Image caption
Transformer
Problems with RNNs
¶
The Seq2Seq encoder-decoder architecture
¶
The problem the attention model solves
¶
How to solve the problem?
¶
Attention mechanism
¶
Attention types
¶
Global/local attention
Hard/soft attention
Self-attention
Global/local attention
¶
Hard/soft attention
¶
Soft attention
¶
Alignment weights are learned to attend over all of the input (sketch below)
$0 \le w \le 1$
Pro: the model is smooth and differentiable
Con: computation is expensive when the input is large
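Below is a minimal NumPy sketch of soft attention (the names `H`, `q` and the sizes are illustrative assumptions, not from the slides): every source position receives a softmax weight in $[0, 1]$ and the whole computation stays differentiable.

```python
import numpy as np

def soft_attention(H, q):
    """H: (T, d) encoder states, q: (d,) decoder query."""
    scores = H @ q                         # one score per source position, shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax: every weight lies in [0, 1]
    context = weights @ H                  # weighted sum over ALL positions, shape (d,)
    return context, weights

H, q = np.random.randn(6, 4), np.random.randn(4)
context, weights = soft_attention(H, q)
print(weights.sum())                       # 1.0 -- all positions contribute
```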
Hard attention
¶
Selects a part of the input to attend to at each step (sketch below)
Weights are 0 or 1
Pro: less computation at inference time
Con: the model is non-differentiable
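For contrast, a hedged sketch of hard attention: instead of a weighted average, a single position is sampled (so the weights are effectively 0 or 1). This is cheaper at inference time but not differentiable, so training typically relies on sampling-based estimators such as REINFORCE (as in Xu 2015). Names are illustrative.

```python
import numpy as np

def hard_attention(H, q, rng=None):
    """Sample one source position instead of averaging over all of them."""
    rng = rng or np.random.default_rng(0)
    scores = H @ q
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    i = rng.choice(len(H), p=probs)        # stochastic, non-differentiable choice
    return H[i], i                         # the context is a single encoder state

H, q = np.random.randn(6, 4), np.random.randn(4)
context, i = hard_attention(H, q)
print(i, context.shape)
```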
Self-attention
¶
Alignment (compatibility function)
¶
query: $q_{j}$, key: $k_i$
Location-based
¶
$$ \alpha^i_j = softmax(W_{\alpha} q_j) $$
Content-based
¶
$$ score(q_j, k_i) = cos(q_j, k_i) $$
Additive
¶
$$ score(q_j, k_i) = v^T_{\alpha} tanh(W_{\alpha} [q_j; k_i]) $$
Alignment (compatibility function)
¶
General
¶
$$ score(q_j, k_i) = q_j^T W_{\alpha} k_i $$
Dot-product
¶
$$ score(q_j, k_i) = q_j^T k_i $$
Scaled dot-product
¶
$$ score(q_j, k_i) = \frac{q_j^T k_i}{\sqrt{n}} $$
where $n$ is the dimensionality of the keys
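The scoring functions above, written out as a NumPy sketch; the parameter shapes chosen for $W_{\alpha}$, $v_{\alpha}$ and the general-score matrix are illustrative assumptions.

```python
import numpy as np

d = 4
q, k = np.random.randn(d), np.random.randn(d)
W_a = np.random.randn(d, 2 * d)   # additive-score parameters (shapes are assumptions)
v_a = np.random.randn(d)
W_g = np.random.randn(d, d)       # general-score parameter

additive   = v_a @ np.tanh(W_a @ np.concatenate([q, k]))       # Bahdanau 2014
general    = q @ W_g @ k                                        # Luong 2015
dot        = q @ k                                              # Luong 2015
scaled_dot = (q @ k) / np.sqrt(d)                               # Vaswani 2017
content    = (q @ k) / (np.linalg.norm(q) * np.linalg.norm(k))  # cosine similarity
```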
Scaled dot-product attention
¶
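A minimal sketch of scaled dot-product attention in matrix form, following Vaswani et al. (2017): Q, K, V are matrices of queries, keys, and values. The self-attention usage at the end, where Q = K = V come from the same sequence, is how the Transformer applies it; variable names are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

X = np.random.randn(5, 8)                            # self-attention: Q = K = V = X
print(scaled_dot_product_attention(X, X, X).shape)   # (5, 8)
```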
Applications of attention
¶
Summarization: Rush 2015
Translation: Bahdanau 2014, Luong 2015
Image caption: Xu 2015
...
Translation
¶
Summarization
¶
Image caption
¶
Transformer
¶
The era of the Transformer
¶
Why self-attention?
¶
At least as efficient as RNN/CNN
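From the complexity comparison in "Attention Is All You Need" (Vaswani et al., 2017): a self-attention layer costs O(n²·d) per layer versus O(n·d²) for a recurrent layer, needs only O(1) sequential operations (so it parallelizes across positions) versus O(n) for an RNN, and connects any two positions with a path of length O(1) versus O(n) for an RNN; in the common case where the sequence length n is smaller than the representation size d, it is also cheaper per layer.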
Integrating RNN/CNN into the Transformer
¶
Thank you for your attention.
¶
References
¶
Attention? Attention!
放棄幻想,全面擁抱 Transformer:自然語言處理三大特徵抽取器(CNN/RNN/TF)比較 (Abandon illusions and fully embrace the Transformer: a comparison of the three major NLP feature extractors (CNN/RNN/Transformer))
Papers
¶
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Sequence to Sequence Learning with Neural Networks
Neural Machine Translation by Jointly Learning to Align and Translate
A Neural Attention Model for Abstractive Sentence Summarization
Effective Approaches to Attention-based Neural Machine Translation
Reasoning about Entailment with Neural Attention
Attention Is All You Need
Pay Less Attention with Lightweight and Dynamic Convolutions
How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention