The Deterministic Information Bottleneck
After the Information Bottleneck was introduced, it drew plenty of enthusiastic acclaim, but also critiques pointing out the method's shortcomings and misinterpretations. Later work then zeroed in on the question of determinism.
```mermaid
graph LR
    Y <--> X
    X --> T
```
$T$ can be viewed as:
- soft sufficient statistics (for statistics)
- lossy compression (for signal processing)
- maximally informative clustering (for machine learning)
IB
$$
\min_{q(t|x)} \mathcal{L}_{\mathrm{IB}}[q(t|x)] = I(T; X) - \beta I(T; Y), \qquad \beta > 0
$$
$I(T; X)$: compression
$I(T; Y)$: relevance
Markov constraint: $T \leftarrow X \leftrightarrow Y$
$$
\begin{aligned}
q(t|x) &= \frac{q(t)}{Z(x, \beta)} \exp\!\big(-\beta\, D_{KL}\big[p(y|x) \,\|\, q(y|t)\big]\big) \\
q(t) &= \sum_x p(x)\, q(t|x) \\
q(y|t) &= \frac{1}{q(t)} \sum_x p(y|x)\, q(t|x)\, p(x)
\end{aligned}
$$
The $I(T; X)$ term is inherited from channel coding / rate-distortion theory: it measures the rate of the channel $X \to T$, not the cost of representing $T$ itself.
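These three equations are solved by alternating updates, Blahut-Arimoto style. Below is a minimal NumPy sketch of that iteration, assuming the joint $p(x, y)$ is given as a small discrete array; the function name `ib_update` and its parameters are illustrative, not from the paper.

```python
import numpy as np

def ib_update(p_xy, n_t=4, beta=5.0, n_iter=200, seed=0, eps=1e-12):
    """Alternate the three IB self-consistent equations on a discrete p(x, y).

    p_xy : (n_x, n_y) joint distribution, entries sum to 1.
    Returns the encoder q(t|x), marginal q(t), and decoder q(y|t).
    """
    rng = np.random.default_rng(seed)
    n_x, _ = p_xy.shape
    p_x = p_xy.sum(axis=1)                        # p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)     # p(y|x)

    # Random soft initialization of the encoder q(t|x).
    q_t_given_x = rng.random((n_x, n_t))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # q(t) = sum_x p(x) q(t|x)
        q_t = p_x @ q_t_given_x
        # q(y|t) = (1/q(t)) sum_x p(y|x) q(t|x) p(x)
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x
        q_y_given_t /= q_t[:, None] + eps

        # D_KL[p(y|x) || q(y|t)] for every (x, t) pair.
        kl = (p_y_given_x * np.log(p_y_given_x + eps)).sum(axis=1)[:, None] \
             - p_y_given_x @ np.log(q_y_given_t + eps).T

        # q(t|x) ∝ q(t) exp(-beta * KL); normalize in log space for stability.
        logits = np.log(q_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        q_t_given_x = np.exp(logits)
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    return q_t_given_x, q_t, q_y_given_t
```

Larger $\beta$ sharpens the softmax in the last step toward a hard assignment, which foreshadows the DIB limit below.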
DIB
$$
\min_{q(t|x)} \mathcal{L}_{\mathrm{DIB}}[q(t|x)] = H(T) - \beta I(T; Y)
$$
$H(T)$: penalizes the coding cost of the representation itself (the bits needed to store $T$), rather than the rate of the channel $X \to T$
$I(T; Y)$: the relevance term, unchanged from IB
Unlike $\mathcal{L}_{\mathrm{IB}}$, the optimal encoder here turns out to be deterministic (see below).
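Concretely, the optimal DIB encoder collapses the exponential weighting of the IB encoder into an argmax (this is the solution derived in the DIB paper, up to notation):

$$
q(t|x) = \delta\big(t - t^*(x)\big), \qquad
t^*(x) = \arg\max_t \Big[ \log q(t) - \beta\, D_{KL}\big[p(y|x) \,\|\, q(y|t)\big] \Big]
$$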
$$
\mathcal{L}_{\mathrm{IB}} - \mathcal{L}_{\mathrm{DIB}} = I(T; X) - H(T) = -H(T|X)
$$
Since $I(T; X) = H(T) - H(T|X)$, minimizing $\mathcal{L}_{\mathrm{IB}}$ rewards a large $H(T|X)$: the IB objective implicitly encourages a stochastic (noisy) encoder, and $\mathcal{L}_{\mathrm{DIB}}$ simply removes that reward.
Generalized IB
$$
\mathcal{L}_{\alpha}[q(t|x)] = H(T) - \alpha H(T|X) - \beta I(T; Y)
$$
$\alpha = 1 \Rightarrow \mathcal{L}_{\mathrm{IB}}$: stochastic encoder $\rightarrow$ soft clustering
$\alpha = 0 \Rightarrow \mathcal{L}_{\mathrm{DIB}}$: deterministic encoder $\rightarrow$ hard clustering (sketched below)
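A hard-clustering counterpart of the `ib_update` sketch above, under the same assumptions (discrete $p(x, y)$ as a NumPy array; the name `dib_update` and its parameters are mine): the softmax update is replaced by the argmax assignment $t^*(x)$.

```python
import numpy as np

def dib_update(p_xy, n_t=4, beta=5.0, n_iter=200, seed=0, eps=1e-12):
    """DIB-style hard clustering: each x is assigned to the single
    cluster t*(x) maximizing log q(t) - beta * D_KL[p(y|x) || q(y|t)]."""
    rng = np.random.default_rng(seed)
    n_x, _ = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / (p_x[:, None] + eps)

    # Random hard initialization: one cluster index per x.
    assign = rng.integers(0, n_t, size=n_x)

    for _ in range(n_iter):
        q_t_given_x = np.eye(n_t)[assign]         # one-hot (deterministic) encoder
        q_t = p_x @ q_t_given_x                   # q(t)
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x
        q_y_given_t /= q_t[:, None] + eps         # q(y|t)

        kl = (p_y_given_x * np.log(p_y_given_x + eps)).sum(axis=1)[:, None] \
             - p_y_given_x @ np.log(q_y_given_t + eps).T

        # argmax instead of softmax: the alpha = 0 (hard clustering) limit.
        assign = np.argmax(np.log(q_t + eps)[None, :] - beta * kl, axis=1)

    return assign, q_t, q_y_given_t
```

Once a cluster empties, its $q(t)$ drops to zero and it is never re-selected, so the number of used clusters can only shrink during the iteration, consistent with DIB's preference for compact codebooks.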