2019.11.6 note

DeepGCNs: Making GCNs Go as Deep as CNNs

Graph Convolutional Networks (GCNs) offer an alternative to CNNs that accepts non-Euclidean data as input to a neural network. While GCNs already achieve encouraging results, they have so far been limited to shallow architectures of 2 to 4 layers because of vanishing gradients during training. The authors transfer concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to train very deep GCNs successfully, and they demonstrate the benefit of deep GCNs, with as many as 112 layers, experimentally across various datasets and tasks.

For example, EdgeConv GCN:

(figure omitted: EdgeConv GCN layer)

The residual version is:

(figure omitted: residual EdgeConv layer)
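A minimal sketch of the residual idea, with a simple mean aggregator standing in for EdgeConv (the paper's EdgeConv uses max-aggregation over learned edge features; shapes and the ReLU/mean choices here are illustrative assumptions):

```python
import numpy as np

def gcn_layer(h, adj, w):
    """Plain GCN-style layer: aggregate neighbor features, then transform.
    h: (num_nodes, d) node features, adj: (num_nodes, num_nodes) adjacency,
    w: (d, d) weights. A stand-in for F(G_l, W_l)."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8  # avoid divide-by-zero for isolated nodes
    agg = (adj @ h) / deg                        # mean over neighbors
    return np.maximum(agg @ w, 0)                # ReLU

def res_gcn_layer(h, adj, w):
    """Residual version: G_{l+1} = F(G_l, W_l) + G_l. The identity path keeps
    gradients flowing, which is what lets the stack go very deep."""
    return gcn_layer(h, adj, w) + h

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))                      # 5 nodes, 8 features each
adj = (rng.random((5, 5)) < 0.4).astype(float)   # random adjacency
w = rng.normal(size=(8, 8)) * 0.1
out = res_gcn_layer(h, adj, w)
print(out.shape)  # (5, 8)
```

Note that with zero weights the residual layer reduces exactly to the identity, which is why deep stacks of such layers remain trainable.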

RETHINKING DATA AUGMENTATION: SELF-SUPERVISION AND SELF-DISTILLATION

In supervised settings, a common practice for data augmentation is to assign the same label to all augmented samples of the same source. However, if the augmentation introduces a large distributional discrepancy among them (e.g., rotations), forcing label invariance may be too hard a task and often hurts performance. To tackle this, the authors propose a simple yet effective idea: learn the joint distribution of the original labels and the self-supervised (augmentation) labels of the augmented samples. The joint learning framework is easier to train, and it enables aggregated inference, which combines the predictions from the different augmented samples to improve performance. Further, to speed up the aggregation at inference time, they also propose a self-distillation-style knowledge transfer technique that distills the knowledge of the augmentations into the model itself.
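A toy sketch of the joint-label idea for rotation augmentation (the class/rotation counts, label encoding, and the logit-averaging aggregation below are my own illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

NUM_CLASSES, NUM_ROTATIONS = 10, 4  # e.g., 10 classes x {0, 90, 180, 270} degrees

def joint_label(class_label, rotation_id):
    """Map a (class, rotation) pair to a single joint label in [0, K*M)."""
    return class_label * NUM_ROTATIONS + rotation_id

def aggregate_inference(joint_logits):
    """joint_logits: (M, K*M) logits, one row per augmented copy, where row r
    was produced from the input rotated by rotation r. For each class, average
    the logit of the matching (class, rotation) joint label across copies."""
    scores = np.zeros(NUM_CLASSES)
    for r in range(NUM_ROTATIONS):
        for c in range(NUM_CLASSES):
            scores[c] += joint_logits[r, joint_label(c, r)]
    return scores / NUM_ROTATIONS

rng = np.random.default_rng(0)
logits = rng.normal(size=(NUM_ROTATIONS, NUM_CLASSES * NUM_ROTATIONS))
scores = aggregate_inference(logits)
print(scores.shape)  # (10,)
```

The self-distillation step in the paper then trains a single forward pass to mimic these aggregated scores, so the M-copy inference cost is paid only during training.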

(figure omitted)

SOFTMAX IS NOT AN ARTIFICIAL TRICK: AN INFORMATION-THEORETIC VIEW OF SOFTMAX IN NEURAL NETWORKS

Despite the great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over the predicted classes, this normalised exponential transformation can still seem artificial, and a theoretical framework that incorporates softmax as an intrinsic component has been lacking. The authors view neural networks that embed softmax from an information-theoretic perspective. Under this view, log-softmax can be derived naturally and mathematically as an inherent component of a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. They show that training deterministic neural networks by maximising log-softmax is equivalent to enlarging this conditional mutual information, i.e., feeding label information into the network outputs. They also generalise the information-theoretic perspective to neural networks with stochasticity and derive upper and lower bounds on log-softmax. In theory, this view offers a rationale for embedding softmax in neural networks; in practice, they demonstrate a computer-vision example of how to employ the view to filter out targeted objects in images.
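The log-softmax quantity the paper analyses can be written down directly; a minimal numerically stable version (the shift by max(z) is standard practice, not specific to this paper):

```python
import numpy as np

def log_softmax(z):
    """log softmax(z)_i = z_i - log(sum_j exp(z_j)).
    Subtracting max(z) first avoids overflow without changing the result,
    since softmax is invariant to adding a constant to all logits."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

z = np.array([2.0, 1.0, 0.1])
ls = log_softmax(z)
print(np.exp(ls).sum())  # the implied probabilities sum to 1
# Maximising log_softmax(z)[y] for the true label y is exactly minimising
# the usual cross-entropy loss; the paper's claim is that this maximisation
# also enlarges the conditional mutual information between outputs and labels.
```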

STABILIZING TRANSFORMERS FOR REINFORCEMENT LEARNING

They stabilize transformers for reinforcement learning and propose the Gated Transformer-XL (GTrXL) for RL tasks. They show that GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement, yet substantially more expressive architectural alternative to the multi-layer LSTMs ubiquitously used for RL agents in partially observable environments.
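The key change in GTrXL is replacing each residual connection x + sublayer(x) with a gating layer; the paper's best variant is a GRU-style gate. A rough numpy sketch of such a gate (the weight shapes, initialisation, and exact parameterisation below are illustrative assumptions, not the paper's precise formulation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_gate(x, y, params, bias=2.0):
    """GRU-style gate g(x, y) used in place of the residual sum x + y.
    x: sublayer input (the skip path), y: sublayer output.
    params: dict of (d, d) matrices Wr, Ur, Wz, Uz, Wg, Ug (hypothetical names).
    The positive bias pushes the update gate z toward 0 at initialisation,
    so the layer starts out close to the identity, which is what stabilises
    early RL training."""
    r = sigmoid(y @ params["Wr"] + x @ params["Ur"])          # reset gate
    z = sigmoid(y @ params["Wz"] + x @ params["Uz"] - bias)   # update gate
    h = np.tanh(y @ params["Wg"] + (r * x) @ params["Ug"])    # candidate
    return (1.0 - z) * x + z * h

rng = np.random.default_rng(0)
d = 6
params = {k: rng.normal(size=(d, d)) * 0.1
          for k in ["Wr", "Ur", "Wz", "Uz", "Wg", "Ug"]}
x = rng.normal(size=(3, d))  # e.g., 3 tokens of width 6
y = rng.normal(size=(3, d))  # sublayer output for the same tokens
g = gru_gate(x, y, params)
print(g.shape)  # (3, 6)
```

As the bias grows, z goes to 0 and the gate output collapses to the skip path x, which is the identity-at-initialisation behaviour described above.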
