Introduction to graph neural networks

Yueh-Hua Tu 杜岳華

2019.04.28

Outline

  • Brief history and some graph applications
  • Some definitions about graphs
  • Taxonomy of GNN
  • Tasks
  • Graph convolution networks
  • Graph attention networks
  • Graph autoencoder
  • Graph generative networks
  • Graph spatial-temporal networks
  • Graph network (GN) block

Brief history

1998 LeNet - LeCun

2003 Language model - Bengio

2005 GNN - Gori, Monfardini, Scarselli

2006 RBM - Hinton

2009 GNN ("The Graph Neural Network Model") - Scarselli, Gori, Monfardini

2012 AlexNet - Hinton

2013 Word2Vec - Google

2015 Deep learning - Hinton

Graph applications

  • Social network
  • Biological network (gene regulatory network, protein-protein interaction network)
  • Computer graphics
  • Traffic network
  • Knowledge graph

Internet

Traffic network

Global flight network

Social network

Biological pathway

KEGG

Protein-protein interaction network


Schizophrenia PPI network


Human gene coexpression network

Computer graphics

Knowledge graph

NLP and knowledge graph

PageRank

Network science

Definition

$$ G = (V, E) $$

$$ V = \{1, 2, 3, 4, 5, 6, 7, 8, 9\} $$

$$ E = \{(1, 2), (2, 3), (2, 7), (3, 4), (4, 7), (5, 6), (6, 7), (7, 8), (7, 9)\} $$

Adjacency matrix

$$ A = \begin{bmatrix} 0& 1& 0& 0& 0& 0& 0& 0& 0 \\ 1& 0& 1& 0& 0& 0& 1& 0& 0 \\ 0& 1& 0& 1& 0& 0& 0& 0& 0 \\ 0& 0& 1& 0& 0& 0& 1& 0& 0 \\ 0& 0& 0& 0& 0& 1& 0& 0& 0 \\ 0& 0& 0& 0& 1& 0& 1& 0& 0 \\ 0& 1& 0& 1& 0& 1& 0& 1& 1 \\ 0& 0& 0& 0& 0& 0& 1& 0& 0 \\ 0& 0& 0& 0& 0& 0& 1& 0& 0 \end{bmatrix} $$

Degree

$$ \deg(v_i) = \text{the number of edges incident to } v_i $$

e.g. $\deg(7) = 5$ in the example graph.

Degree matrix

$$ D = \begin{bmatrix} 1& 0& 0& 0& 0& 0& 0& 0& 0 \\ 0& 3& 0& 0& 0& 0& 0& 0& 0 \\ 0& 0& 2& 0& 0& 0& 0& 0& 0 \\ 0& 0& 0& 2& 0& 0& 0& 0& 0 \\ 0& 0& 0& 0& 1& 0& 0& 0& 0 \\ 0& 0& 0& 0& 0& 2& 0& 0& 0 \\ 0& 0& 0& 0& 0& 0& 5& 0& 0 \\ 0& 0& 0& 0& 0& 0& 0& 1& 0 \\ 0& 0& 0& 0& 0& 0& 0& 0& 1 \end{bmatrix} $$

Laplacian matrix

$$ L = D - A $$

For the example graph:

$$ L = \begin{bmatrix} 1& -1& 0& 0& 0& 0& 0& 0& 0 \\ -1& 3& -1& 0& 0& 0& -1& 0& 0 \\ 0& -1& 2& -1& 0& 0& 0& 0& 0 \\ 0& 0& -1& 2& 0& 0& -1& 0& 0 \\ 0& 0& 0& 0& 1& -1& 0& 0& 0 \\ 0& 0& 0& 0& -1& 2& -1& 0& 0 \\ 0& -1& 0& -1& 0& -1& 5& -1& -1 \\ 0& 0& 0& 0& 0& 0& -1& 1& 0 \\ 0& 0& 0& 0& 0& 0& -1& 0& 1 \end{bmatrix} $$

Symmetric normalized Laplacian matrix

$$ L^{\text{sym}} = D^{-\frac{1}{2}}LD^{-\frac{1}{2}} = I - D^{-\frac{1}{2}}AD^{-\frac{1}{2}} $$
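These matrices are easy to build directly. A minimal NumPy sketch for the example graph above (variable names are my own, not from any particular library):

```python
import numpy as np

# Edge list of the 9-node example graph (1-based node ids).
edges = [(1, 2), (2, 3), (2, 7), (3, 4), (4, 7),
         (5, 6), (6, 7), (7, 8), (7, 9)]
n = 9

# Adjacency matrix: symmetric because the graph is undirected.
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1

# Degree matrix: node degrees on the diagonal.
D = np.diag(A.sum(axis=1))

# Laplacian and symmetric normalized Laplacian.
L = D - A
d_inv_sqrt = 1 / np.sqrt(A.sum(axis=1))
L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

print(D[6, 6])  # degree of node 7: 5.0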

Graph signals and labels

$A$: adjacency matrix, $X$: input node features, $l$: node/edge labels

GNN and network embedding

Taxonomy of GNN

  • Graph convolution networks
  • Graph attention networks
  • Graph autoencoders
  • Graph generative networks
  • Graph spatial-temporal networks

Tasks

Node-level tasks

  • node regression, node classification
  • graph convolution layers produce each node's latent representation

Edge-level tasks

  • link prediction, edge classification
  • an additional function takes two nodes' latent representations (from graph convolution layers) as input to predict the link or edge label

Graph-level (global-level) tasks

  • graph classification
  • graph pooling summarizes node representations into a reduced, graph-level representation

Semi-supervised learning for node classification

Supervised learning for graph classification

Unsupervised learning for graph embedding

Convolution on graphs

Graph convolution networks

Spectral-based GCN

Spatial-based GCN

Spectral-based GCN

Single-node layer

$$ x_j^{t+1} = \sigma \Big( \sum_i U \Theta_{i,j}^t U^T x_i^t \Big) $$

$x_i^t$: the $i$-th input channel at layer $t$, $\Theta_{i,j}^t$: a diagonal matrix of learnable spectral filter parameters, so the fixed graph operator is replaced by a learned one $U \Theta_{i,j}^t U^T$.

Spectral CNN

$$ L = I - D^{-\frac{1}{2}}AD^{-\frac{1}{2}} = U \Lambda U^T $$

Graph Fourier transform

$$ \mathcal{F}(x) = U^Tx $$

The graph convolution of an input signal $x$ and a filter $g \in \mathbb{R}^N$ is defined as

$$ x *_G g = \mathcal{F}^{-1}(\mathcal{F}(x) \odot \mathcal{F}(g)) = U(U^Tx \odot U^Tg) = U g_{\theta} U^T x, \quad g_{\theta} = \text{diag}(U^Tg) $$
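Continuing with the same example graph, a small NumPy sketch of the graph Fourier transform and spectral convolution; the random signal and filter are placeholders:

```python
import numpy as np

# Rebuild the example graph's normalized Laplacian (see earlier sketch).
n = 9
edges = [(1, 2), (2, 3), (2, 7), (3, 4), (4, 7),
         (5, 6), (6, 7), (7, 8), (7, 9)]
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1
d_inv_sqrt = 1 / np.sqrt(A.sum(axis=1))
L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Eigendecomposition L_sym = U diag(lam) U^T; eigh since L_sym is symmetric.
lam, U = np.linalg.eigh(L_sym)

rng = np.random.default_rng(0)
x = rng.standard_normal(n)   # a graph signal: one value per node
g = rng.standard_normal(n)   # a filter, also defined on the nodes

# Convolution theorem on the graph: transform, multiply, inverse transform.
y = U @ ((U.T @ x) * (U.T @ g))

# Equivalent filter form: U g_theta U^T x with g_theta = diag(U^T g).
y2 = U @ np.diag(U.T @ g) @ U.T @ x
assert np.allclose(y, y2)
```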

Chebyshev spectral CNN (ChebNet)

For efficiency, ChebNet parameterizes the filter as a truncated Chebyshev polynomial expansion.

$$ g_{\theta} = \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}), \quad \tilde{\Lambda} = \frac{2\Lambda}{\lambda_{max}} - I $$
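A sketch of how the Chebyshev recurrence $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$ avoids the eigendecomposition: the filter is applied through repeated multiplications by the rescaled Laplacian. The function name, `theta`, and the default `lam_max = 2` (the upper bound of the normalized Laplacian's spectrum) are my choices; at least two coefficients are assumed:

```python
import numpy as np

def cheb_conv(L_sym, x, theta, lam_max=2.0):
    """ChebNet filtering: sum_k theta[k] * T_k(L_tilde) @ x, where
    L_tilde = 2 L / lam_max - I rescales eigenvalues into [-1, 1]."""
    n = L_sym.shape[0]
    L_tilde = 2.0 * L_sym / lam_max - np.eye(n)
    t_prev, t_curr = x, L_tilde @ x            # T_0(L)x and T_1(L)x
    out = theta[0] * t_prev + theta[1] * t_curr
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k = 2 L_tilde T_{k-1} - T_{k-2}.
        t_prev, t_curr = t_curr, 2.0 * L_tilde @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out
```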

First-order ChebNet

First-order approximation of ChebNet

$$ x *_G g_{\theta} = \theta (I + D^{-\frac{1}{2}}AD^{-\frac{1}{2}})x \\ = \theta \tilde{A} x $$
$$ X^{t+1} = \tilde{A} X^t \Theta $$
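A minimal sketch of the resulting propagation rule. Note it uses the renormalization trick from the GCN paper, $\tilde{A} = \tilde{D}^{-\frac{1}{2}}(A + I)\tilde{D}^{-\frac{1}{2}}$, rather than $I + D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$ verbatim; the function and variable names are illustrative:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One first-order graph convolution sigma(A_tilde @ X @ W).
    Uses the renormalization trick A_tilde = D^{-1/2} (A + I) D^{-1/2},
    which improves numerical stability over I + D^{-1/2} A D^{-1/2}."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_tilde = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_tilde @ X @ W, 0.0)    # sigma = ReLU
```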

Spatial-based GCN

Recurrent-based GCN (spatial)

Recurrent-based methods apply the same graph convolution layer repeatedly to update hidden representations.

Composition-based GCN (spatial)

Composition-based methods stack different graph convolution layers to update hidden representations.

Recurrent-based GCN

Graph neural network (GNN)

$$ x_i(t+1) = f(l_i, l_{co[i]}, x_{ne[i]}(t), l_{ne[i]}) \\ o_i(t) = g(x_i(t), l_i) $$

$x_i$: state of node $i$, $l_i$: label of node $i$, $l_{co[i]}$: labels of its incident edges, $x_{ne[i]}$, $l_{ne[i]}$: states and labels of its neighbors

Composition-based GCN

Message passing neural networks (MPNN)


Message passing phase

$$ m_v^{t+1} = \sum_{u \in N(v)} M_t(h_v^t, h_u^t, e_{uv}) \\ h_v^{t+1} = U_t(h_v^t, m_v^{t+1}) $$

$h_v^t$: vertex features, $e_{uv}$: edge features, $N(v)$: neighbors of vertex $v$

Readout phase

$$ \hat{y} = R(\{h_v^T \mid v \in G\}) $$
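A toy sketch of one message-passing step and a sum readout. `M` and `U` stand in for the learned message and update functions (e.g. small MLPs); edge features $e_{uv}$ are omitted for brevity and node ids are assumed 0-based:

```python
import numpy as np

def mpnn_step(h, edges, M, U):
    """One message-passing step. M(h_v, h_u) computes a message and
    U(h_v, m_v) updates the state; edge features are omitted here."""
    n = len(h)
    m = [np.zeros_like(h[0]) for _ in range(n)]
    for u, v in edges:             # undirected: messages flow both ways
        m[v] += M(h[v], h[u])
        m[u] += M(h[u], h[v])
    return [U(h[i], m[i]) for i in range(n)]

def readout(h):
    """Readout phase: a permutation-invariant function of all node states."""
    return np.sum(h, axis=0)
```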

Graph pooling module

mean/max/sum pooling

Reduces the number of vertices of a graph.
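For global pooling, each variant is a one-liner; `H` below is a placeholder node-feature matrix:

```python
import numpy as np

H = np.random.randn(9, 16)   # node representations: 9 nodes, 16 features

h_mean = H.mean(axis=0)   # mean pooling
h_max = H.max(axis=0)     # max pooling
h_sum = H.sum(axis=0)     # sum pooling; each yields one graph-level vector
```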

Graph attention networks

Graph attention network (GAT)

$$ h_i^t = \sigma \Big( \sum_{j \in \mathcal{N}(i)} \alpha (h_i^{t-1}, h_j^{t-1}) W^{t-1} h_j^{t-1} \Big) $$

Multi-head attention

$$ h_i^t = \Vert_{k=1}^K \sigma \Big( \sum_{j \in \mathcal{N}(i)} \alpha_k (h_i^{t-1}, h_j^{t-1}) W_k^{t-1} h_j^{t-1} \Big) $$
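A single-head sketch of the attention computation (a multi-head layer would concatenate $K$ copies of this). The dense masking and the choice of ReLU for $\sigma$ are simplifications of mine:

```python
import numpy as np

def gat_layer(A, H, W, a, slope=0.2):
    """Single-head graph attention: e_ij = LeakyReLU(a^T [W h_i || W h_j]),
    normalized with a softmax over the neighborhood of node i."""
    HW = H @ W                                   # projected features (n, f')
    n, f = HW.shape
    # Split a into source/target halves: a^T [x || y] = a_s^T x + a_t^T y.
    e = (HW @ a[:f])[:, None] + (HW @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    mask = A + np.eye(n)                         # attend to neighbors and self
    e = np.where(mask > 0, e, -np.inf)
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)   # row-wise softmax
    return np.maximum(att @ HW, 0.0)             # sigma = ReLU
```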


Graph autoencoder

Graph autoencoder (GAE)

$$ Z = GCN(X, A) = \tilde{A} \ \text{ReLU}(\tilde{A}XW_0) \ W_1 \\ \hat{A} = \sigma (ZZ^T) $$

Variational graph autoencoder (VGAE)

$$ \mu = GCN_{\mu}(X, A) = \tilde{A} \ \text{ReLU}(\tilde{A}XW_0) \ W_{\mu} \\ \log \sigma = GCN_{\sigma}(X, A) = \tilde{A} \ \text{ReLU}(\tilde{A}XW_0) \ W_{\sigma} \\ q(Z \mid X, A) = \prod_{i=1}^N \mathcal{N}(z_i \mid \mu_i, diag(\sigma_i^2)) \\ p(A \mid Z) = \sigma (ZZ^T) $$
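A sketch of the VGAE forward pass under the equations above; `A_tilde` is the normalized adjacency and the weight matrices are assumed given:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vgae_forward(A_tilde, X, W0, W_mu, W_sigma, rng):
    """VGAE forward pass: shared first GCN layer, separate second layers
    for mu and log(sigma); decode edge probabilities as sigmoid(Z Z^T)."""
    hidden = np.maximum(A_tilde @ X @ W0, 0.0)            # ReLU GCN layer
    mu = A_tilde @ hidden @ W_mu
    log_sigma = A_tilde @ hidden @ W_sigma
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    Z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    return Z, sigmoid(Z @ Z.T)            # latent codes, reconstructed edges
```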

Graph generative networks

MolGAN

An implicit generative model (GAN) for small molecular graphs.

Graph spatial-temporal networks

Diffusion convolutional recurrent neural network (DCRNN)


Diffusion convolution + sequence-to-sequence architecture + scheduled sampling.
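A sketch of the diffusion convolution at the core of DCRNN, assuming a directed adjacency matrix in which every node has at least one in- and out-edge; the seq2seq and scheduled-sampling parts are omitted:

```python
import numpy as np

def diffusion_conv(A, X, theta_fwd, theta_bwd):
    """Bidirectional diffusion convolution on a directed graph:
    sum_k theta_fwd[k] (D_O^{-1} A)^k X + theta_bwd[k] (D_I^{-1} A^T)^k X."""
    P_fwd = A / A.sum(axis=1, keepdims=True)       # out-degree normalized
    P_bwd = A.T / A.T.sum(axis=1, keepdims=True)   # in-degree normalized
    out = np.zeros_like(X)
    Zf, Zb = X.copy(), X.copy()                    # k = 0 terms: X itself
    for tf, tb in zip(theta_fwd, theta_bwd):
        out = out + tf * Zf + tb * Zb
        Zf, Zb = P_fwd @ Zf, P_bwd @ Zb            # one more diffusion step
    return out
```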


Relational inductive bias

Graph as model of relations

Graph network (GN) block

Graph network

Update of graph network

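From the cited "Relational inductive biases" paper, the GN block update proceeds in three stages: per-edge updates, per-node updates using aggregated incident edges, and a global update.

$$ e_k' = \phi^e(e_k, v_{r_k}, v_{s_k}, u) $$

$$ v_i' = \phi^v(\bar{e}_i', v_i, u), \quad \bar{e}_i' = \rho^{e \to v}(\{e_k' : r_k = i\}) $$

$$ u' = \phi^u(\bar{e}', \bar{v}', u) $$

$\phi^e, \phi^v, \phi^u$: update functions (e.g. MLPs), $\rho$: permutation-invariant aggregations (e.g. sum), $r_k, s_k$: receiver and sender of edge $k$.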

Thank you for your attention

Reference papers

  • A Comprehensive Survey on Graph Neural Networks
  • Relational inductive biases, deep learning, and graph networks
  • Geometric deep learning: going beyond Euclidean data
  • The Graph Neural Network Model
  • Variational Graph Auto-Encoders
  • Neural Message Passing for Quantum Chemistry
  • Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
  • Graph Attention Networks
  • MolGAN: An implicit generative model for small molecular graphs