Transfer learning

杜岳華

2019.4.20

Transfer learning

Data not directly related to the task.

  • Domain transfer: Different data, similar task
  • Task transfer: Similar data, different task

A large amount of data not directly related to the task is applied to help a specific task that has only a small amount of task-specific data.

Domain transfer

$$ \text{(large) Source data }(x^s, y^s) \rightarrow \text{(small) Target data }(x^t, y^t) $$

Neural network

Fine-tuning

  • Conservative training
  • Layer transfer
  • Task transfer

Conservative training


Naive fine-tuning on a small target dataset fails easily.

Ways to avoid training failure:

  • Add regularization on the output, so that predictions stay close to the original model's.
  • Constrain the model parameters to stay close to the original model's parameters (see the sketch below).
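
A minimal sketch of the second idea, assuming PyTorch; `source_model`, `lam`, and the usage names in the comments are illustrative. The fine-tuned model is penalized for drifting away from the pre-trained parameters.

```python
import torch

def conservative_loss(target_model, source_model, task_loss, lam=1e-3):
    """Task loss plus an L2 penalty that keeps the fine-tuned parameters
    close to the original (source) model's parameters."""
    penalty = 0.0
    for p_t, p_s in zip(target_model.parameters(), source_model.parameters()):
        penalty = penalty + ((p_t - p_s.detach()) ** 2).sum()
    return task_loss + lam * penalty

# Usage sketch: keep a frozen copy of the pre-trained model as the reference.
# source_model = copy.deepcopy(pretrained_model)
# loss = conservative_loss(model, source_model, criterion(model(x), y))
```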

Layer transfer


Which layers can be transferred?

It depends...

Speech

$$ \text{voice} \Rightarrow \text{frequency} \Rightarrow \text{timbre (音色)} \Rightarrow \text{articulation (發音)} \Rightarrow \text{word} $$

Transfer the later layers.

Image

$$ \text{pixel} \Rightarrow \text{edge} \Rightarrow \text{shape} \Rightarrow \text{pattern} \Rightarrow \text{object} \Rightarrow \text{instance} $$

Transfer the earlier layers.
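
A minimal layer-transfer sketch for the image case, assuming PyTorch/torchvision and an illustrative `num_target_classes`: the earlier layers (edges, shapes, patterns) are frozen, and only a new classifier head is trained on the small target dataset.

```python
import torch.nn as nn
from torchvision import models

num_target_classes = 10                      # illustrative target task size

# Network pre-trained on the source domain (ImageNet).
model = models.resnet18(pretrained=True)

# Freeze the earlier layers, which capture generic edges / shapes / patterns.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer and train only it on the target data.
model.fc = nn.Linear(model.fc.in_features, num_target_classes)
```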

Task transfer

Fully Convolutional Networks for Semantic Segmentation

Multitask learning

Multitask learning helps to learn a good representation shared across multiple tasks.

Multitask learning

  • Feature-based multitask learning
  • Parameter-based multitask learning
  • Instance-based multitask learning (few works)

Feature-based multitask learning

Different tasks share identical or similar feature representation.

  • Feature transformation approach
  • Feature selection approach
  • Deep learning approach

Parameter-based multitask learning

Put task relatedness into model learning via regularization on the model parameters.

  • Low-rank approach
  • Task-clustering approach
  • Task-relation learning approach
  • Dirty approach
  • Multi-level approach

Hard parameter sharing

Soft parameter sharing
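
A minimal sketch of hard parameter sharing, assuming PyTorch; the layer sizes and task heads are illustrative. In soft parameter sharing, each task would instead keep its own trunk, with a regularizer tying the corresponding parameters together.

```python
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one output head per task."""
    def __init__(self, in_dim=64, hidden=128, n_classes_a=10, n_classes_b=5):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head_a = nn.Linear(hidden, n_classes_a)   # task-specific head A
        self.head_b = nn.Linear(hidden, n_classes_b)   # task-specific head B

    def forward(self, x):
        h = self.shared(x)                             # common representation
        return self.head_a(h), self.head_b(h)
```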

Multimodal learning (not transfer learning)

Integrate multiple data modalities, which provide richer information for model prediction.
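
A minimal late-fusion sketch, assuming PyTorch and pre-computed image and text embeddings; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Fuse an image embedding and a text embedding into one prediction."""
    def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_classes=10):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)
        self.txt_enc = nn.Linear(txt_dim, hidden)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, img_feat, txt_feat):
        h = torch.cat([torch.relu(self.img_enc(img_feat)),
                       torch.relu(self.txt_enc(txt_feat))], dim=-1)
        return self.classifier(h)
```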


Domain-adversarial learning


  • The feature extractor tries to remove domain-specific properties from the learned features.
  • The domain classifier predicts which domain each example comes from.
$$ \mathcal{L} = (\text{loss of label classifier}) - (\text{loss of domain classifier}) $$
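
The subtraction in the loss is usually implemented with a gradient reversal layer. A minimal sketch, assuming PyTorch; `extractor`, `label_clf`, and `domain_clf` in the usage comments are illustrative names.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the
    backward pass, so the feature extractor maximizes the domain loss."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Usage sketch:
# features    = extractor(x)
# label_loss  = criterion(label_clf(features), y)
# domain_loss = criterion(domain_clf(GradReverse.apply(features, 1.0)), d)
# (label_loss + domain_loss).backward()   # reversed gradients reach the extractor
```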


Zero-shot learning

Train on a dataset of seen classes, while retaining the ability to classify unseen instances.

  • Images are mapped into a semantic space.
  • The classifier assigns test images to classes it has seen.
  • Novelty detection


Novelty detection

  • Novelty variable $V$
  • Seen image classifier: softmax classifier
  • Unseen classifier: Gaussian classifier

Seen image classifier

  • Gives the probability over the known classes.

Unseen classifier

  • Estimates the likelihood of the mapped image under each known semantic word vector $w_y$.
  • The likelihood is modeled as a Gaussian distribution $\mathcal{N}(w_y, \Sigma_y)$.
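
A minimal sketch of the two-branch decision, assuming numpy/scipy; the threshold and all inputs are illustrative. The image is first mapped to a point $f$ in the semantic space.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(f, seen_vecs, seen_covs, unseen_vecs, softmax_probs, threshold):
    """f: image mapped into the semantic space.
    seen_vecs / seen_covs: the (w_y, Sigma_y) Gaussian per seen class.
    softmax_probs: probabilities from the seen-class softmax classifier."""
    # Novelty detection: likelihood of f under each seen-class Gaussian.
    likelihoods = [multivariate_normal.pdf(f, mean=w, cov=S)
                   for w, S in zip(seen_vecs, seen_covs)]
    if max(likelihoods) > threshold:                 # looks like a seen class
        return "seen", int(np.argmax(softmax_probs))
    # Otherwise assign the nearest unseen-class word vector.
    dists = np.linalg.norm(np.asarray(unseen_vecs) - f, axis=1)
    return "unseen", int(np.argmin(dists))
```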

Self-taught learning


Train a supervised task with the help of unlabeled data.

Obtain bases $b$ via sparse coding on the unlabeled data

$$ \begin{align} \mathop{\arg\min}_{a,b} & \ \ \sum_i \left( \left\lVert x_u^{(i)} - \sum_j a_j^{(i)}b_j \right\rVert ^2 + \beta \left\lVert a^{(i)} \right\rVert_1 \right) \\ \text{subject to} & \ \ \left\lVert b_j \right\rVert_2 \le 1 \end{align} $$

Compute features with labeled data

$$ \hat{a}(x_l^{(i)}) = \mathop{\arg\min}_{a^{(i)}} \left\lVert x_l^{(i)} - \sum_j a_j^{(i)}b_j \right\rVert ^2 + \beta \left\lVert a^{(i)} \right\rVert_1 $$

Train the supervised learner with $(\hat{a}(x_l^{(i)}), y^{(i)})$.
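
A minimal sketch of the two stages using scikit-learn's sparse coding; the synthetic arrays only stand in for the real unlabeled and labeled data, and the choice of `LogisticRegression` as the supervised learner is illustrative.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_unlabeled = rng.randn(200, 30)            # stand-in for the unlabeled data x_u
X_labeled = rng.randn(40, 30)               # stand-in for the labeled data x_l
y_labeled = rng.randint(0, 2, size=40)

# 1. Learn the bases b by sparse coding on the unlabeled data.
dico = DictionaryLearning(n_components=16, alpha=1.0,
                          transform_algorithm='lasso_lars', random_state=0)
dico.fit(X_unlabeled)

# 2. Compute the sparse activations a_hat(x_l) for the labeled data.
A_labeled = dico.transform(X_labeled)

# 3. Train the supervised learner on (a_hat(x_l), y).
clf = LogisticRegression(max_iter=1000).fit(A_labeled, y_labeled)
```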


Self-taught clustering

A large amount of unlabeled auxiliary data helps cluster a small amount of unlabeled target data.

  • Auxiliary data help uncover a better data representation for the target data
  • Target data $X$, auxiliary data $Y$
  • Common feature space $Z$

Self-taught clustering

Information theoretic co-clustering

Loss function

$$ \min I(X, Z) - I(\tilde{X}, \tilde{Z}) $$

Mutual information

$$ I(X, Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} $$

Self-taught clustering

Loss function

$$ \mathcal{J} = I(X, Z) - I(\tilde{X}, \tilde{Z}) + \lambda [I(Y, Z) - I(\tilde{Y}, \tilde{Z})] $$
  • $\tilde{X}$: clusters of $X$
  • $\tilde{Y}$: clusters of $Y$
  • $\tilde{Z}$: clusters of $Z$
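
A minimal numpy sketch that evaluates this objective for given cluster assignments (the actual algorithm alternately updates $\tilde{X}$, $\tilde{Y}$, $\tilde{Z}$ to decrease it); the joint distribution tables and assignment arrays are illustrative inputs.

```python
import numpy as np

def mutual_info(p_xy):
    """I(X, Z) computed from a joint distribution table p(x, z)."""
    px = p_xy.sum(axis=1, keepdims=True)
    pz = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (px @ pz)[nz])).sum())

def aggregate(p_xy, row_clusters, col_clusters):
    """Joint distribution of the clustered variables, e.g. p(x~, z~)."""
    row_clusters = np.asarray(row_clusters)
    col_clusters = np.asarray(col_clusters)
    q = np.zeros((row_clusters.max() + 1, col_clusters.max() + 1))
    for i, ci in enumerate(row_clusters):
        for j, cj in enumerate(col_clusters):
            q[ci, cj] += p_xy[i, j]
    return q

def objective(p_xz, p_yz, x_clust, y_clust, z_clust, lam=1.0):
    """J = [I(X,Z) - I(X~,Z~)] + lam * [I(Y,Z) - I(Y~,Z~)]"""
    j_x = mutual_info(p_xz) - mutual_info(aggregate(p_xz, x_clust, z_clust))
    j_y = mutual_info(p_yz) - mutual_info(aggregate(p_yz, y_clust, z_clust))
    return j_x + lam * j_y
```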

Thank you for your attention.