A Summary of English Expressions from GCN Papers

Days spent at home with the cat, reading and writing papers — such good days.

I often joke that my English writing reads like an old lady's foot-binding cloth: long and smelly (i.e. rambling and tedious).

This post mainly records some English expressions from GCN papers, collected so I can study them slowly.

For each one I give the paper title plus some small notes.

------------------------------------------------------- a dividing line, after which I put on my serious face ---------------------------------------------------------

 

Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks 

1. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. 

Note the use of over and exploit.

2. GCN has a multi-layer architecture, with each layer encoding and updating the representation of nodes in the graph using features of immediate neighbors. 

Note the use of multi-layer, and the with construction.

This sentence pattern is often needed when describing a multi-layer GCN.

3. Furthermore, following the idea of self-looping in Kipf and Welling (2017), each word is manually set adjacent to itself, i.e. the diagonal values of A are all ones. 

Following the idea of …

the diagonal values of A are all ones — a matrix A whose diagonal entries are all 1.

set adjacent to itself — adding a self-loop (a tiny sketch below).
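A minimal NumPy sketch of that step (the adjacency matrix here is a toy example of mine, for illustration only):

```python
import numpy as np

# Toy adjacency matrix A for a 4-word sentence, built from its dependency tree.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# "Each word is manually set adjacent to itself":
# add self-loops so the diagonal values of A are all ones.
A = A + np.eye(A.shape[0])
```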

4. Experimental results have indicated that GCN brings benefit to the overall performance by leveraging both syntactical information and long-range word dependencies. 

Bring benefit to

Leverage can be translated as "to make use of".

5. While attention-based models are promising, they are insufficient to capture syntactical dependencies between context words and the aspect within a sentence. 

This describes the shortcoming of attention-based models: they cannot adequately capture the syntactic dependencies within a sentence. At bottom this still comes from words being far apart from each other — although that is not entirely fair to say, since self-attention attends over every word in the sentence and can probably recover some of the information lost over long distances.

While here means "although".

SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS 

1. Our contributions are two-fold. Firstly, we introduce a simple and well-behaved layer-wise propagation rule for neural network models which operate directly on graphs and show how it can be motivated from a first-order approximation of spectral graph convolutions (Hammond et al., 2011). Secondly, we demonstrate how this form of a graph-based neural network model can be used for fast and scalable semi-supervised classification of nodes in a graph. Experiments on a number of datasets demonstrate that our model compares favorably both in classification accuracy and efficiency (measured in wall-clock time) against state-of-the-art methods for semi-supervised learning. 

This is how the classic GCN is described.

In essence, GCN is a localized first-order approximation of spectral graph convolution. Another characteristic of GCN is that it scales linearly with the number of edges in the graph. Overall, GCN can be used to encode local graph structure together with node features.
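For reference, the layer-wise propagation rule from Kipf and Welling (2017), as I remember it (Ã is the adjacency matrix with self-loops added and D̃ its degree matrix):

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right),
\qquad \tilde{A} = A + I_N, \qquad \tilde{D}_{ii} = \sum\nolimits_j \tilde{A}_{ij}
```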

2. Semantic role labeling (SRL) can be informally described as the task of discovering who did what to whom

When defining or formalizing a task, I used to write is formalized as ... or is defined as a ... problem.

You can also use is described as the task of ... — described as the task of doing such-and-such.

GRAPH ATTENTION NETWORKS 

1. In its most general formulation, the model allows every node to attend on every other node, dropping all structural information. We inject the graph structure into the mechanism by performing masked attention—we only compute eij for nodes j ∈ Ni, where Ni is some neighborhood of node i in the graph. 

This introduces the two settings of GAT's attention. One extreme is to let every node consider the influence of every node in the graph, which throws away the structural information.

The other only considers the nodes within the neighborhood of node i (a small sketch follows after these notes).

Note the expressions:

every node to attend on every other node — conveys the sense of nodes attending to one another.

Drop all structural information — note especially the use of drop; the more ordinary choices here would be ignore or lose.

inject sth into sth by sth — injecting some mechanism or structure into something by some means.

the phrase masked attention

Neighborhood of node i in the graph
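A tiny NumPy sketch of the masked-attention idea (the scoring function and all shapes here are simplified assumptions of mine, not the paper's exact formulation): e_ij is only kept for j ∈ Ni, and everything else is masked out before the softmax.

```python
import numpy as np

def masked_attention_layer(H, W, a, adj):
    """H: (N, F) node features; W: (F, Fp) shared projection;
    a: (2*Fp,) attention vector; adj: (N, N), adj[i, j] = 1 iff j is in N_i."""
    Wh = H @ W
    N = Wh.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([Wh[i], Wh[j]])
            e[i, j] = s if s > 0 else 0.2 * s        # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                   # masked attention: drop non-neighbors
    e = e - e.max(axis=1, keepdims=True)             # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)        # softmax over each neighborhood
    return alpha @ Wh                                # attend over neighbors' features
```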

2. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion).

The which here refers to the stacked layers.

nodes are able to attend over their neighborhoods' features.

specifying different weights to different nodes 

The use of without.

3. However, many interesting tasks involve data that can not be represented in a grid-like structure and that instead lies in an irregular domain. 

4. This is the case of 3D meshes, social networks, telecommunication networks, biological networks or brain connectomes. Such data can usually be represented in the form of graphs. 

Note the expressions.

This passage is commonly used to contrast two kinds of structure: grid-like structures, which CNNs can handle, and irregular domains such as social networks and telecommunication networks.

5. The idea is to compute the hidden representations of each node in the graph, by attending over its neighbors, following a self-attention strategy. 

Note the expressions:

By attending over its neighbors

Following a self-attention strategy

Attention Guided Graph Convolutional Networks for Relation Extraction 

1. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question. 

Note the expressions.

Using how to do sth as the subject.

The use of while — here while means "at the same time as".

However, how to make effective use of the relevant information while ignoring the irrelevant information in the dependency trees remains a challenging research question.

remains a challenging research question — remain works well here; compared with is, it conveys that this is not merely a problem but one that has been left open.

2. Intuitively, we develop a “soft pruning” strategy that transforms the original dependency tree into a fully connected edge-weighted graph. 

Note the expressions:

Intuitively

develop a ... strategy that ...

3. With the help of dense connections, we are able to train the AGGCN model with a large depth, allowing rich local and non-local dependency information to be captured. 

This passage describes what dense connections do for the network. It is saying the usual thing — dense connections let you train a deeper network and lower the risk of overfitting — but with the help of is put to very good use here.

With the help of

train the model with a large depth — this sounds far classier than deeper network.

local and non-local dependency information

The subject of allow is the model, which also reads as more objective.

the model allows sth to be done.

allowing rich local and non-local dependency information to be captured — many rewrites can be spun off from this pattern.

GCNs are neural networks that operate directly on graph structures 

The use of operate.

A general description of GCNs.

4. Opposite direction of a dependency arc is also included, which means Aij = 1 and Aji = 1 if there is an edge going from node i to node j, otherwise Aij = 0 and Aji = 0. 
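A sketch of that construction (the dependency arcs here are a toy example of mine):

```python
import numpy as np

# Toy dependency arcs (edge going from node i to node j); illustrative only.
arcs = [(1, 0), (1, 2), (3, 1)]
n = 4

A = np.zeros((n, n))
for i, j in arcs:
    A[i, j] = 1   # the dependency arc itself
    A[j, i] = 1   # its opposite direction is also included
```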

5. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task 

Sth can be understood as a ….approach that

how to selectively attend to the relevant sub-structures useful for the task — note attend here.

6. Instead of using rule-based pruning, we develop a “soft pruning” strategy in the attention guided layer, which assigns weights to all edges. These weights can be learned by the model in an end-to-end fashion. 

I simply find this written both simply and clearly; sometimes I feel my own writing reads like an old granny's foot-binding cloth, long and smelly.
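Going back to the "soft pruning" strategy above, a rough sketch of how I read the idea (self-attention scores reused as soft edge weights; the paper's multi-head attention-guided layer is more involved than this):

```python
import numpy as np

def soft_pruned_adjacency(H, Wq, Wk):
    """Replace the 0/1 dependency-tree adjacency with a fully connected,
    edge-weighted graph whose weights are learned end-to-end.
    H: (N, d) node representations; Wq, Wk: (d, d) learned projections."""
    Q, K = H @ Wq, H @ Wk
    scores = Q @ K.T / np.sqrt(H.shape[1])             # pairwise scores for all edges
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    A_soft = np.exp(scores)
    return A_soft / A_soft.sum(axis=1, keepdims=True)  # each row is a soft edge distribution
```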

Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks 

1. When an aspect term is separated away from its sentiment phrase, it is hard to find the associated sentiment words in a sequence. 

This describes why we want to bring syntax into many NLP tasks, here aspect-level sentiment classification.

Our models are usually built on sequential input, and in a sequence some words sit very far from the key information; yet once the text is converted into a syntax tree the two may be directly related. That is why syntactic dependencies are introduced: they can, to some extent, reduce the information loss caused by long-distance dependencies.

2. Unlike these previous methods, our approach represents a sentence as a dependency graph instead of a word sequence. 

Note the expression.

Essentially, the text is converted from a word-sequence structure into a dependency graph.

our approach represents A as a B instead of C

We represent A with B rather than with C.

3. We employ a multi-layer graph attention network to propagate sentiment features from important syntax neighbourhood words to the aspect target. 

Note the expressions.

I really like this use of propagate.

employ sth to do sth

Propagate the ... from the important syntax neighbourhood words to the aspect target

This sentence vividly expresses how information spreads over a graph structure: the neighbours' information is aggregated along the edges of the graph.

Graph Convolution over Pruned Dependency Trees Improves Relation Extraction 

1. To resolve these issues, we further apply a Contextualized GCN (C-GCN) model, where the input word vectors are first fed into a bi-directional long short-term memory (LSTM) network to generate contextualized representations, which are then used as h(0) in the original model. 

This explains the C-GCN, which is easy to understand: insert a Bi-LSTM layer (sometimes called a contextualized layer) between the word embeddings and the GCN layers, i.e. first run the word embeddings through the Bi-LSTM and then feed the result into the GCN, so that propagation is performed over contextualized features.
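A minimal PyTorch-style sketch of that pipeline (module sizes and the simple degree normalization are my own simplifications, not the authors' code):

```python
import torch
import torch.nn as nn

class CGCNSketch(nn.Module):
    """Contextualized GCN: word embeddings -> Bi-LSTM -> GCN layers."""

    def __init__(self, emb_dim, hidden_dim, num_gcn_layers):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.gcn_layers = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_gcn_layers))

    def forward(self, word_embs, adj):
        # word_embs: (batch, seq_len, emb_dim); adj: (batch, seq_len, seq_len)
        h, _ = self.lstm(word_embs)                 # contextualized states serve as h^(0)
        degree = adj.sum(dim=2, keepdim=True).clamp(min=1)
        for layer in self.gcn_layers:
            h = torch.relu(layer(adj.bmm(h) / degree))  # propagate over contextualized features
        return h
```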

We note that this relation extraction model is conceptually similar to graph kernel-based models (Zelenko et al., 2003), in that it aims to utilize local dependency tree patterns to inform relation classification. 

The use of in that.

is conceptually similar to — conceptually similar to something.

Intuitively, during each graph convolution, each node gathers and summarizes information from its neighboring nodes in the graph. 

Describes the way GCN updates node representations: gathers and summarizes information from its neighboring nodes in the graph.

However, naively applying the graph convolution operation in Equation (1) could lead to node representations with drastically different magnitudes, since the degree of a token varies a lot. This could bias our sentence representation towards favoring high-degree nodes regardless of the information carried in the node (see details in Section 2.2). 

This passage explains why the adjacency matrix is normalized in GCNs. We use the adjacency matrix to represent the graph structure, but because node degrees vary a lot, using the raw adjacency matrix A directly biases the representation towards favouring high-degree nodes (a small formula after these notes).

Degree of a token varies a lot

bias sth towards doing sth — to tilt it in favour of doing something.

  • I was not too sure about this towards doing usage at first; towards is a preposition here, so it is followed by the -ing form (towards favoring high-degree nodes).

regardless of the information carried in the node — regardless of, no matter what.
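As I understand it, the fix is to normalize each node's aggregation by its degree, roughly as below (my own shorthand: Ã is the adjacency matrix with self-loops and d_i the degree of token i; the paper's exact equation may differ in detail):

```latex
h_i^{(l)} = \sigma\!\left( \frac{1}{d_i} \sum_{j=1}^{n} \tilde{A}_{ij}\, W^{(l)} h_j^{(l-1)} + b^{(l)} \right),
\qquad d_i = \sum_{j=1}^{n} \tilde{A}_{ij}
```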

Densely Connected Convolutional Networks 

DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. 

This lists the advantages of dense connections: they alleviate the vanishing-gradient problem, strengthen feature propagation, and substantially reduce the number of parameters.

Compelling — striking, eye-catching.

Vanishing-gradient problem — the problem of gradients vanishing.

feature propagation

Encourage feature reuse — feature reuse; can this be understood as each layer's features remaining available to be selected by later layers, and thus being preserved?

substantially — greatly; essentially; on the whole.

 

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network. 

As the network gets deeper, information about the input or the gradient can vanish or get "washed out" as it passes through many layers.

 

  • they create short paths from early layers to later layers — this sentence goes straight to the point: it is how the vanishing and "wash out" problem is addressed.

Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers. 

This refers to the methods designed to ease the problems that come with deeper networks; dense connections, ResNets and Highway Networks all try to alleviate them.

network topology

share a key characteristic

The following sentence describes how densely connected layers work.

To ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. 

Note the expression:

each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

I think this sentence is written especially well; it pins down dense connectivity:

each layer obtains extra input from all preceding layers and passes its own feature maps on to all subsequent layers (a small sketch below).
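A minimal PyTorch-style sketch of that connectivity pattern (layer sizes are placeholders; real DenseNets add batch norm, bottleneck layers and transition layers, which I leave out):

```python
import torch
import torch.nn as nn

class DenseBlockSketch(nn.Module):
    """Each layer takes the concatenation of all preceding feature maps as input
    and passes its own feature maps on to all subsequent layers."""

    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                      kernel_size=3, padding=1)
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = torch.relu(layer(torch.cat(features, dim=1)))  # concat along channels
            features.append(out)                                 # reused by all later layers
        return torch.cat(features, dim=1)
```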

  • The following sentence describes the difference between DenseNets and ResNets.

Concatenating feature-maps learned by different layers increases variation in the input of subsequent layers and improves efficiency. This constitutes a major difference between DenseNets and ResNets 

constitutes a major difference between A and B — forms the main difference between A and B.

increases variation

There are other notable network architecture innovations which have yielded competitive results. 

yielded competitive results — produced results that are competitive.

  • A brief aside involving ResNet.

We empirically demonstrate DenseNet’s effectiveness on several benchmark datasets and compare with state-of-the- art architectures, especially with ResNet and its variants. 

Note the expressions.

empirically

Demonstrate A's effectiveness on **** (datasets) and compare with ***** (models or methods), especially with **** (models or methods)

DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation 

Emotion recognition in conversation (ERC) has received much attention, lately, from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources. 

Honestly, I find this expression a little odd.

The current state-of-the-art model in emotion recognition in conversation is (Majumder et al., 2019), where authors introduced a party state and global state based recurrent model for modelling the emotional dynamics. 

Note this sentence pattern for introducing the current state-of-the-art model in a field.

Thus, there is always a major interplay between the inter-speaker dependency and self- dependency with respect to the emotional dynamics in the conversation

With respect to — regarding, as for.

there is always a major interplay between A and B with respect to C

With respect to C, there is always a major interplay between A and B.

We also represent ui ∈ RDm as the feature representation of the utterance, obtained using the feature extraction process described below. 

The use of obtained.

The use of using.

In theory, RNNs like long short-term memory (LSTM) and GRU should propagate long-term contextual information. However, in practice it is not always the case (Bradbury et al., 2017). 

Points out that in theory LSTMs should be able to propagate long-term contextual information, but in practice this is not always the case.

In theory — theoretically.

in practice it is not always the case — it is not always so in practice.

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling 

For example, one can observe that many arcs in the syntactic dependency graph (shown in black below the sentence in Figure 1) are mirrored in the semantic dependency graph. 

Note the expression. The point is that the semantic representation is closely tied to the syntactic one: many arcs in the syntactic dependency tree, i.e. the dependency relations, are mirrored in the semantic dependency graph (what exactly is a semantic dependency graph?). The phrasing suggests that syntactic information can be used to uncover semantic relations.

A is mirrored in B — A is reflected in B.

We believe that one of the reasons for this radical choice is the lack of simple and effective methods for incorporating syntactic information into sequential neural networks (namely, at the level of words). 

Incorporate syntactic information into sequential neural networks.

Points out that existing methods lack simple and effective ways of incorporating syntactic information into sequential neural networks.

at the level of words

Lack of

sequential neural networks

One layer GCN encodes only information about immediate neighbors and K layers are needed to encode K-order neighborhoods (i.e., information about nodes at most K hops away). 

This says a single GCN layer encodes only the immediate (first-order) neighbors; to encode K-th order neighbors, K GCN layers are needed.

Note the expressions:

immediate neighbors

 encode K-order neighborhoods

  • This contrasts with recurrent and recursive neural networks which, at least in theory, can capture statistical dependencies across unbounded paths in a tree or in a sequence.
  • I did not fully get this construction at first; the point is that a K-layer GCN has a receptive field bounded by K hops, whereas recurrent and recursive networks can in principle carry information along arbitrarily long paths.

Interestingly, again unlike recursive neural networks, GCNs do not constrain the graph to be a tree. 

Many earlier approaches obtain syntactic or lexical information with recursive neural networks, but GCNs do not force the graph they operate on to be a tree.

We believe that there are many applications in NLP, where GCN-based encoders of sentences or even documents can be used to incorporate knowledge about linguistic structures (e.g., representations of syntax, semantics or discourse). 

Because this paper was the first to apply GCNs to an NLP task — before it no NLP task had used GCNs — the authors here envision GCN-based encoders for sentences or even documents, used to incorporate knowledge about linguistic structures (e.g., representations of syntax, semantics or discourse).

As in standard convolutional networks (LeCun et al., 2001), by stacking GCN layers one can incorporate higher degree neighborhoods

Note the expression:

incorporate higher degree neighborhoods

 Our simplification captures the intuition that information should be propagated differently along edges depending whether this is a head-to-dependent or dependent-to-head edge (i.e., along or opposite the corresponding syntactic arc) and whether it is a self-loop.

This is the best expression I have seen for describing information propagating along the edges of a graph. The point is that the edges in the graph are not all alike; here they fall into three types: head → dep (the dependency relations parsed from the syntactic tree), dep → head (the reverse of head → dep), and self-loops (a way of preserving a node's own information during propagation).

The paper holds that how information propagates along an edge should depend on the edge type, learning different weights for different edge types when aggregating neighbors — hence label-specific parameters (a rough sketch follows after these notes).

Note the expressions:

captures the intuition that + clause — starting from the intuition that ...

be propagated differently along edges — propagated in different ways along the edges

depending on whether this is A or B, and whether it is C
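A rough sketch of what that means in code (direction-specific weight matrices only; the paper also has per-label parameters and edge gates, which I leave out):

```python
import numpy as np

def syntactic_gcn_layer(H, arcs, W_along, W_opposite, W_self):
    """H: (N, d) node representations; arcs: list of (head, dependent) indices.
    Information is propagated differently along head->dependent edges,
    dependent->head edges, and self-loops."""
    N = H.shape[0]
    H_new = np.zeros((N, W_self.shape[1]))
    for v in range(N):
        msg = H[v] @ W_self                        # self-loop
        for head, dep in arcs:
            if dep == v:
                msg += H[head] @ W_along           # along the syntactic arc (head -> dependent)
            if head == v:
                msg += H[dep] @ W_opposite         # opposite the arc (dependent -> head)
        H_new[v] = np.maximum(msg, 0)              # ReLU
    return H_new
```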

The inability of GCNs to capture dependencies between nodes far away from each other in the graph may seem like a serious problem, especially in the context of SRL: paths between predicates and arguments often include many dependency arcs.

This points out the inability of GCNs to capture dependencies between two nodes that are far apart, and notes that this is all the more serious in SRL, because paths between predicates and arguments usually contain many dependency arcs.

However, when graph convolution is performed on top of LSTM states (i.e., LSTM states serve as input x_v = h_v^(1) to GCN) rather than static word embeddings, GCN may not need to capture more than a couple of hops.

Responding to the GCN issue above, this says that if the GCN is built on top of an LSTM — i.e. the LSTM states, rather than static word embeddings, serve as the GCN's input — then the GCN may not need more than a couple of hops to capture what it needs.

Note the expressions:

The inability of *** (model) to do sth — the model's shortcoming in doing sth.

nodes far away from each other in the graph

Paths between A and B often include many dependency arcs.

 a couple of hops.

The classifier predicts semantic roles of words given the predicate while relying on word representations provided by GCN

How should the while here be understood?

The classifier predicts the semantic roles of words for the given predicate, while relying on the word representations provided by the GCN.

This suggests that extra GCN layers are effective but largely redundant with respect to what LSTMs already capture. 

Through experiments, the paper shows that extra GCN layers are effective but largely redundant with respect to what the LSTM already captures — a sentence worth savouring.

 

 
