Table of Contents
- Non-local Neural Networks for Video Classification
- Hierarchical Graph Representation Learning with Differentiable Pooling
- MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
- Position-aware Graph Neural Networks
- GeniePath: Graph Neural Networks with Adaptive Receptive Paths
Non-local Neural Networks for Video Classification
CVPR 2018
Motivated by the non-local means operation; instead of adding more layers, the authors choose to use a non-local mechanism.
Following the non-local means operation, they define a generic non-local operation in a deep network as:

$$ y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j) \, g(x_j) $$

- $i$ is the index of an output position (in space, time, or spacetime) whose response is to be computed, and $j$ is the index that enumerates all possible positions.
- $f(x_i, x_j)$ is a pairwise kernel function, which is the key of this work:
  - Gaussian: $f(x_i, x_j) = e^{x_i^T x_j}$
  - Embedded Gaussian: $f(x_i, x_j) = e^{\theta(x_i)^T \phi(x_j)}$, here $\theta(x_i) = W_\theta x_i$, $\phi(x_j) = W_\phi x_j$
  - Dot product: $f(x_i, x_j) = \theta(x_i)^T \phi(x_j)$
  - Concatenation: $f(x_i, x_j) = \mathrm{ReLU}(w_f^T [\theta(x_i), \phi(x_j)])$
- $g(x_j) = W_g x_j$ is a linear embedding, and $C(x)$ is the normalization factor.
The pairwise computation of a non-local block is lightweight when it is used in high-level, sub-sampled feature maps.[?]
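The embedded-Gaussian variant above can be sketched in a few lines of NumPy. This is a minimal sketch under my own naming (`nonlocal_block` is not from the paper), and it omits the residual connection and final $W_z$ projection that the paper wraps around this operation:

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation on an (N, C) feature map,
    where N flattens all space/time positions. Sketch only."""
    theta = x @ w_theta            # theta(x_i) = W_theta x_i, shape (N, C')
    phi = x @ w_phi                # phi(x_j)   = W_phi x_j,   shape (N, C')
    g = x @ w_g                    # g(x_j)     = W_g x_j,     shape (N, C')
    scores = theta @ phi.T         # pre-exponential pairwise similarities
    # Row-wise softmax: exp(theta_i . phi_j) normalized by C(x) = sum_j f(x_i, x_j)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ g                # y_i = (1/C(x)) sum_j f(x_i, x_j) g(x_j)
```

With zero embeddings the attention becomes uniform, so each output position reduces to the mean of $g$ over all positions, which illustrates the "global receptive field" intuition.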
Hierarchical Graph Representation Learning with Differentiable Pooling
NIPS 2018
Abstract
DIFFPOOL, a differentiable graph pooling module that can generate hierarchical representations of graphs
GNN (Kipf's GCN): $H^{(k)} = \mathrm{ReLU}(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k-1)} W^{(k-1)})$, where $\tilde{A} = A + I$ and $\tilde{D}$ is its degree matrix.
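A minimal NumPy sketch of this propagation rule (the function name is mine, not from the paper; a dense adjacency is assumed for simplicity):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One Kipf & Welling GCN propagation step:
    ReLU(D^{-1/2} (A + I) D^{-1/2} X W).
    A: (n, n) adjacency, X: (n, d) features, W: (d, d') weights."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)                 # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)       # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
```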
here we focus on Proposed Method
3.1 Preliminaries
Stacking GNNs and pooling layers.
Formally, given $Z = \mathrm{GNN}(A, X) \in \mathbb{R}^{n \times d}$, the output of a GNN module, and a graph adjacency matrix $A \in \mathbb{R}^{n \times n}$, we seek to define a strategy to output a new coarsened graph containing $m < n$ nodes, with weighted adjacency matrix $A' \in \mathbb{R}^{m \times m}$ and node embeddings $Z' \in \mathbb{R}^{m \times d}$.
Thus, their goal is to learn how to cluster or pool together nodes using the output of a GNN, so that they can use this coarsened graph as input to another GNN layer.
What makes designing such a pooling layer for GNNs especially challenging—compared to the usual graph coarsening task—is that our goal is not to simply cluster the nodes in one graph, but to provide a general recipe to hierarchically pool nodes across a broad set of input graphs. That is, we need our model to learn a pooling strategy that will generalize across graphs with different nodes, edges, and that can adapt to the various graph structures during inference.
3.2 Differentiable Pooling via Learned Assignments
They address the above challenges by learning a cluster assignment matrix over the nodes using the output of a GNN model.
The key intuition is that we stack L GNN modules and learn to assign nodes to clusters at layer l in an end-to-end fashion, using embeddings generated from a GNN at layer l − 1.
Thus, we are using GNNs to both extract node embeddings that are useful for graph classification, as well to extract node embeddings that are useful for hierarchical pooling.
Pooling with an assignment matrix
Suppose that the assignment matrix $S^{(l)}$ has already been computed; the coarsened graph is then

$$ X^{(l+1)} = {S^{(l)}}^T Z^{(l)} \in \mathbb{R}^{n_{l+1} \times d}, \qquad A^{(l+1)} = {S^{(l)}}^T A^{(l)} S^{(l)} \in \mathbb{R}^{n_{l+1} \times n_{l+1}} $$

- $S^{(l)} \in \mathbb{R}^{n_l \times n_{l+1}}$ is the assignment matrix
- $Z^{(l)} \in \mathbb{R}^{n_l \times d}$ is the embedding matrix
- $A^{(l)} \in \mathbb{R}^{n_l \times n_l}$ is the adjacency matrix
- $X^{(l+1)}$ is the new matrix of node features for the coarsened graph
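The coarsening step is just two matrix products; a minimal NumPy sketch (names are mine):

```python
import numpy as np

def diffpool(S, Z, A):
    """DIFFPOOL coarsening given a (soft) assignment matrix.
    S: (n_l, n_{l+1}) row-stochastic assignments,
    Z: (n_l, d) node embeddings, A: (n_l, n_l) adjacency.
    Returns coarsened node features and coarsened adjacency."""
    X_next = S.T @ Z       # X^{(l+1)} = S^T Z, shape (n_{l+1}, d)
    A_next = S.T @ A @ S   # A^{(l+1)} = S^T A S, shape (n_{l+1}, n_{l+1})
    return X_next, A_next
```

With a hard (one-hot) assignment this reduces to classical graph coarsening: coarsened features are cluster sums, and the coarsened adjacency counts edges within and between clusters.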
Learning the assignment matrix.
how DIFFPOOL generates the assignment matrix $S^{(l)}$ and embedding matrix $Z^{(l)}$:

$$ Z^{(l)} = \mathrm{GNN}_{l,\mathrm{embed}}(A^{(l)}, X^{(l)}), \qquad S^{(l)} = \mathrm{softmax}\left(\mathrm{GNN}_{l,\mathrm{pool}}(A^{(l)}, X^{(l)})\right) $$

They generate these two matrices using two separate GNNs that are both applied to the input cluster node features $X^{(l)}$ and coarsened adjacency matrix $A^{(l)}$.
- the softmax is applied row-wise
- The output dimension of $\mathrm{GNN}_{l,\mathrm{pool}}$ corresponds to $n_{l+1}$, a pre-defined maximum number of clusters in layer $l$, which is a hyperparameter of the model.
3.3 Auxiliary Link Prediction Objective and Entropy Regularization
- Link prediction: $L_{\mathrm{LP}} = \lVert A^{(l)} - S^{(l)} {S^{(l)}}^T \rVert_F$, where $\lVert \cdot \rVert_F$ is the Frobenius norm; entropy regularization: $L_E = \frac{1}{n} \sum_{i=1}^{n} H(S_i)$, where $H$ denotes the entropy function and $S_i$ is the $i$-th row of $S^{(l)}$.
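Both auxiliary terms are short to compute; a hedged NumPy sketch (the `eps` constant is my addition for numerical stability, and the function name is mine):

```python
import numpy as np

def diffpool_aux_losses(A, S, eps=1e-12):
    """Auxiliary objectives from DIFFPOOL:
    - link-prediction loss ||A - S S^T||_F (nearby nodes should be
      pooled together),
    - mean per-row entropy of S (assignments should be near one-hot)."""
    lp = np.linalg.norm(A - S @ S.T, ord="fro")
    ent = -(S * np.log(S + eps)).sum(axis=1).mean()
    return lp, ent
```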
Experiment
baseline methods:
GNN-based methods:
- GRAPHSAGE
- STRUCTURE2VEC: combines a latent variable model with GNNs. It uses global mean pooling.
- Edge-conditioned filters in CNN for graphs (ECC) incorporates edge information into the GCN model and performs pooling using a graph coarsening algorithm
- PATCHYSAN defines a receptive field for each node using a canonical node ordering, and applies convolutions on linear sequences of node embeddings.
- SET2SET replaces the global mean-pooling in traditional GNN architectures with the aggregation used in SET2SET. Set2Set aggregation has been shown to perform better than mean pooling in previous work [15]. They use GRAPHSAGE as the base GNN model.
- SORTPOOL applies a GNN architecture and then performs a single layer of soft pooling followed by 1D convolution on sorted node embeddings.
kernel-based
- GRAPHLET
- SHORTEST-PATH
- WEISFEILER-LEHMAN kernel (WL)
- WEISFEILER-LEHMAN OPTIMAL ASSIGNMENT KERNEL (WL-OA)
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
ICML 2019
Abstract
MixHop requires no additional memory or computational complexity
In addition, they propose a sparsity regularization that allows visualizing how the network prioritizes neighborhood information across different graph datasets. Their analysis of the learned architectures reveals that neighborhood mixing varies per dataset.
Intro
Defferrard et al. (2016) and Kipf & Welling (2017) propose GC approximations that are computationally efficient (linear complexity in the number of edges) and can be applied in inductive settings, where the test graphs are not observed during training.
Proposed Architecture
interested in higher-order message passing, where nodes receive latent representations not only from their immediate (first-degree) neighbors but also from neighbors further away
the analysis starts with the Delta Operator, a subtraction operation between node features collected from different distances, which in practice is realized by concatenating different GCN outputs.
Complexity: there is no need to calculate $\hat{A}^j$ explicitly; they calculate $\hat{A}^j H$ with right-to-left multiplication.
Representational capability: they prove a vanilla GCN cannot represent the two-hop Delta Operator, while their model can.
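The right-to-left trick means each power is obtained with one matrix-vector-style product per step, never by forming the dense power $\hat{A}^j$. A minimal NumPy sketch of one MixHop-style layer (the function name and the choice of ReLU as the nonlinearity are my assumptions):

```python
import numpy as np

def mixhop_layer(A_hat, H, Ws):
    """One MixHop-style layer: concatenate sigma(A_hat^j H W_j) over
    adjacency powers j = 0 .. len(Ws)-1. The term A_hat^j H is built
    incrementally (right-to-left), so A_hat^j is never materialized."""
    cols = []
    P = H                                  # A_hat^0 H
    for W in Ws:
        cols.append(np.maximum(P @ W, 0.0))  # sigma(A_hat^j H W_j)
        P = A_hat @ P                        # advance to the next power
    return np.concatenate(cols, axis=1)      # column-wise concatenation
```

For a sparse $\hat{A}$ this keeps the cost linear in the number of edges per power, matching the complexity claim above.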
General Neighborhood Mixing:
definition 2: General layer-wise Neighborhood Mixing
Learning GC Architectures
Output layer: MixHop uniquely mixes features from different sets of information. The output layer selects columns into sets of a given size and computes a combination over each set.
learning adjacency power architectures:
- one per adjacency power used in the model
- different sizes may be more appropriate for different tasks and datasets; they are interested in learning how to size them automatically
- for vanilla GCNs, such a search is inexpensive (here I do not understand how vanilla GCNs search the size automatically)
Experiments
It seems to be a trend that the questions the experiments should answer are stated explicitly in the Experiments section.
- they use synthetic dataset (in which the homophily is decreased) to evaluate the model (which better captures delta operator)
- real world experiment
- visualization
Position-aware Graph Neural Networks
ICML 2019
Abstract
existing Graph Neural Network (GNN) architectures have limited power in capturing the position/location of a given node with respect to all other nodes of the graph.
They propose Position-aware Graph Neural Networks (P-GNNs):
- samples sets of anchor nodes
- computes the distance of a given target node to each anchor-set
- learns a non-linear distance-weighted aggregation scheme over the anchor-sets
Introduction
However, the key limitation of existing GNN architectures is that they fail to capture the position/location of the node within the broader context of the graph structure.
It provides an example: two nodes that are far apart can end up with the same embedding after a GCN (if node features are ignored).
Existing work has spotted this weakness:
- introduce one-hot feature
- deepen the GCN
Their key observation: node position can be captured by a low-distortion embedding that quantifies the distance between a given node and a set of anchor nodes.
method:
- P-GNN first samples multiple anchor-sets in each forward pass
- then learns a non-linear aggregation scheme that combines node feature information from each anchor-set and weighs it by the distance between the node and the anchor-set.
Besides, the Bourgain theorem (Bourgain, 1985) guarantees that only $O(\log^2 n)$ anchor-sets are needed to preserve the distances in the original graph with low distortion.
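The distance-to-anchor-set computation can be sketched with plain BFS shortest paths. This is a simplification: P-GNN actually uses a q-hop truncated distance with the transform $1/(d+1)$ and resamples anchor-sets each forward pass; the names below are mine:

```python
import numpy as np
from collections import deque

def anchor_set_distances(adj, anchor_sets):
    """For each node, distance to each anchor-set, defined as the
    minimum shortest-path distance to any node in that set.
    adj: list of neighbor lists; anchor_sets: list of node-id sets."""
    n = len(adj)

    def bfs(src):
        # Unweighted single-source shortest paths from src.
        dist = [np.inf] * n
        dist[src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[v] == np.inf:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    D = np.array([bfs(s) for s in range(n)])  # all-pairs shortest paths
    return np.array([[D[v, list(S)].min() for S in anchor_sets]
                     for v in range(n)])
```

Each row of the result is the k-dimensional distance vector used (after the paper's transform and aggregation) to make node embeddings position-aware.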
Implementation:
- In settings where node attributes are not available, P-GNN’s computation of the k dimensional distance vector is inductive across different node orderings and different graphs.
- When node attributes are available, a node’s embedding is further enriched by aggregating information from all anchor-sets, weighted by the k dimensional distance vector.
Further, for large graphs, they proposed P-GNN-Fast.
Preliminaries
“We call node embeddings to be position-aware, if the embedding of two nodes can be used to (approximately) recover their shortest path distance in the network.”
they give two definitions:
- position-aware
- structure-aware
Most GCNs are structure-aware methods.
Proposition 1: there exists a mapping from structure-aware embeddings to position-aware embeddings if and only if no pair of nodes has isomorphic local q-hop neighbourhood graphs (proved in the appendix).
Proposed Approach
We design P-GNNs such that each node embedding dimension corresponds to messages computed with respect to one anchor-set, which makes the computed node embeddings position-aware (see Figure 2)
Experiment
- link prediction
- pair-wise node classification
GeniePath: Graph Neural Networks with Adaptive Receptive Paths
AAAI 2019, Le Song