Knowledge Graph_Relation Extraction_Paper Notes (II)

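This post introduces the EMNLP 2018 paper Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning. Before reading it, it helps to know the earlier history of knowledge-graph relation extraction and how neural networks are used for dependency parsing.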

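I. Problem Description

This paper tackles relation extraction for knowledge graphs. It makes three contributions:

1. Sub-Tree Parse (STP), which removes inner-sentence noise and also shortens the sentence.

2. Entity-wise attention, which helps the model focus on the important parts of a sentence.

3. Transfer learning: the model is pre-trained on entity type classification and then transferred to relation classification, which makes it more robust.

Don't worry if this is unclear for now; each point is explained below.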

II. What Is Inner-Sentence Noise

Consider this sentence:

[It is no accident that the main event will feature the junior welterweight champion miguel cotto, a puerto rican, against Paul Malignaggi, an Italian American from Brooklyn.]

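Only the highlighted part at the end, "Paul Malignaggi, an Italian American from Brooklyn", is needed to tell that Paul Malignaggi was born in Brooklyn, i.e. the /people/person/place_of_birth relation. Every other word is inner-sentence noise: it adds nothing to this prediction.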

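III. Why Transfer Learning from Entity Type Classification Helps

Consider this sentence:

[Alfred Kahn, the Cornell-University economist who led the fight to deregulate airplanes.]

Without knowing that Alfred Kahn is a person and that Cornell-University is an organization, it is hard to predict the relation between them.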

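IV. Model Architecture

As the architecture figure shows, the sentence is first processed by STP and its words are converted into embeddings, which a bidirectional GRU turns into hidden states. Entity-wise attention combined with hierarchical attention (word-level attention plus sentence-level attention) then compresses all sentences containing one entity pair into a single vector. Passing this vector through a fully connected layer and a softmax yields either an entity type classification or a relation classification.

1. Sub-Tree Parse (STP)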

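First build the dependency parse tree of the sentence and find the nearest common ancestor of the two entities that is not one of the entities themselves. The subtree rooted at that ancestor is extracted, and its words and their positions become the model input. A neat trick!

For example, take the sentence in the figure above: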

[In 1990, he lives in Shanghai, China.]

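The entities are Shanghai and China. In the figure their common ancestor is "in", so the highlighted subtree "in Shanghai, China" is extracted, and the words and positions of these three words are turned into embeddings and fed into the bidirectional GRU.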
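The STP rule above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's code: the parse is a hypothetical `heads` list (the head index of each token, `-1` for the root) written by hand for the example sentence instead of produced by a real parser, and each entity is assumed to be a single token. The function names are made up for illustration.

```python
from typing import List


def ancestors(i: int, heads: List[int]) -> List[int]:
    """Path from token i up to the root: [i, head(i), head(head(i)), ...]."""
    path = [i]
    while heads[i] != -1:  # -1 marks the root
        i = heads[i]
        path.append(i)
    return path


def stp_root(e1: int, e2: int, heads: List[int]) -> int:
    """Nearest common ancestor of e1 and e2 that is not one of the entities."""
    common = set(ancestors(e2, heads))
    for node in ancestors(e1, heads):  # walks upward from e1
        if node in common and node not in (e1, e2):
            return node
    raise ValueError("no common ancestor other than the entities themselves")


def sub_tree_parse(tokens: List[str], heads: List[int], e1: int, e2: int) -> List[str]:
    """Tokens of the subtree rooted at stp_root, kept in sentence order."""
    root = stp_root(e1, e2, heads)
    return [tok for i, tok in enumerate(tokens) if root in ancestors(i, heads)]


# "In 1990, he lives in Shanghai, China." with a hand-written (assumed) parse
tokens = ["In", "1990", ",", "he", "lives", "in", "Shanghai", ",", "China", "."]
heads = [4, 0, 4, 4, -1, 4, 5, 6, 6, 4]
print(sub_tree_parse(tokens, heads, e1=6, e2=8))  # → ['in', 'Shanghai', ',', 'China']
```

In this assumed parse "China" hangs off "Shanghai", so the plain lowest common ancestor would be the entity "Shanghai" itself; skipping the entities moves one step up to "in". The comma stays in the subtree as a token; the three words mentioned in the text are the non-punctuation ones.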

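This works better than the Shortest Dependency Path (SDP). Because Shanghai and China are directly connected in the parse tree, the shortest path between them is just "Shanghai, China". The word "in", which is the most important word for predicting this relation, is dropped by SDP but kept by STP.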
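For contrast, here is a sketch of SDP on the same hand-written (assumed) parse. The shortest path between two nodes of a tree runs from one entity up to their lowest common ancestor and back down to the other. Since "Shanghai" is itself the lowest common ancestor of "China", the path holds only the two entities. The `heads` list and function names are again hypothetical.

```python
from typing import List


def ancestors(i: int, heads: List[int]) -> List[int]:
    """Path from token i up to the root; heads[root] == -1."""
    path = [i]
    while heads[i] != -1:
        i = heads[i]
        path.append(i)
    return path


def shortest_dependency_path(e1: int, e2: int, heads: List[int]) -> List[int]:
    """Token indices on the tree path e1 -> lowest common ancestor -> e2."""
    a1, a2 = ancestors(e1, heads), ancestors(e2, heads)
    lca = next(n for n in a1 if n in a2)  # may be one of the entities
    up = a1[: a1.index(lca) + 1]          # e1 ... lca
    down = a2[: a2.index(lca)]            # e2 ... the node just below lca
    return up + down[::-1]


tokens = ["In", "1990", ",", "he", "lives", "in", "Shanghai", ",", "China", "."]
heads = [4, 0, 4, 4, -1, 4, 5, 6, 6, 4]  # same hand-written (assumed) parse
path = [tokens[i] for i in shortest_dependency_path(6, 8, heads)]
print(path)  # → ['Shanghai', 'China']
```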

2. Word and Position Embedding

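All sentences that contain the same entity pair form a bag. The word embedding of the j-th word of the i-th sentence in a bag is k-dimensional, written $x_{ij}^{w} \in \mathbb{R}^{k}$; the embeddings of its distances to the two entities are l-dimensional, written $x_{ij}^{p1} \in \mathbb{R}^{l}$ and $x_{ij}^{p2} \in \mathbb{R}^{l}$. Concatenating the three gives the word's input to the next layer: $x_{ij} = \left[x_{ij}^{w}; x_{ij}^{p1}; x_{ij}^{p2}\right] \in \mathbb{R}^{k+2l}$.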
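A toy sketch of building $x_{ij}$ for one sentence. Assumptions of this sketch, not the paper: single-token entities, tiny random lookup tables standing in for trained embeddings, separate tables for the two distances, and relative distances clipped to a hypothetical `MAX_DIST`. Python list concatenation plays the role of $[\,\cdot\,;\,\cdot\,;\,\cdot\,]$.

```python
import random
from typing import Dict, List

K, L = 4, 2     # word-embedding size k and position-embedding size l (toy values)
MAX_DIST = 5    # hypothetical clipping range for relative distances

rng = random.Random(0)


def rand_vec(n: int) -> List[float]:
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


sentence = ["in", "Shanghai", ",", "China"]
# Stand-in lookup tables; a real model would learn these.
word_table: Dict[str, List[float]] = {w: rand_vec(K) for w in sorted(set(sentence))}
pos1_table = {d: rand_vec(L) for d in range(-MAX_DIST, MAX_DIST + 1)}  # distance to entity 1
pos2_table = {d: rand_vec(L) for d in range(-MAX_DIST, MAX_DIST + 1)}  # distance to entity 2


def clip(d: int) -> int:
    return max(-MAX_DIST, min(MAX_DIST, d))


def word_inputs(tokens: List[str], head_pos: int, tail_pos: int) -> List[List[float]]:
    """x_ij = [x^w ; x^p1 ; x^p2] for every word j of one sentence."""
    xs = []
    for j, w in enumerate(tokens):
        x_w = word_table[w]
        x_p1 = pos1_table[clip(j - head_pos)]
        x_p2 = pos2_table[clip(j - tail_pos)]
        xs.append(x_w + x_p1 + x_p2)  # list concatenation, length k + 2l
    return xs


X = word_inputs(sentence, head_pos=1, tail_pos=3)
print(len(X), len(X[0]))  # → 4 8
```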

3. Entity-wise Attention

Let $h_{it}$ be the hidden state output by the GRU at the t-th word of the i-th sentence. Entity-wise attention assigns each word a weight: 1 if the word is one of the two entities, 0 otherwise.

$\alpha_{it}^{e} = \begin{cases} 1 & t \in \{\text{head}, \text{tail}\} \\ 0 & \text{otherwise} \end{cases}$
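Entity-wise attention has no learned parameters, so a sketch is a one-liner. It assumes each entity occupies exactly one token position; `head_pos` and `tail_pos` are hypothetical names.

```python
from typing import List


def entity_wise_attention(length: int, head_pos: int, tail_pos: int) -> List[float]:
    """alpha^e_t = 1 if word t is the head or tail entity, else 0."""
    return [1.0 if t in (head_pos, tail_pos) else 0.0 for t in range(length)]


# "in Shanghai , China" with Shanghai at position 1 and China at position 3
print(entity_wise_attention(4, head_pos=1, tail_pos=3))  # → [0.0, 1.0, 0.0, 1.0]
```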

4. Word-level Attention

Word-level attention also assigns a weight to each word:

$\alpha _ { i t } ^ { w } = \frac { \exp \left( h _ { i t } A ^ { w } r ^ { w } \right) } { \sum _ { t = 1 } ^ { T } \exp \left( h _ { i t } A ^ { w } r ^ { w } \right) }$

where $A^{w}$ and $r^{w}$ are learned parameters.
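A pure-Python sketch of the word-level attention formula: each hidden state $h_{it}$ (a row vector) is scored as $h_{it} A^{w} r^{w}$, and the scores are normalised with a softmax over the $T$ words. The toy `H`, `A`, and `r` values are made up; in the model they would be the GRU outputs and the learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, r: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, r)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def word_level_attention(H: Matrix, A: Matrix, r: Vector) -> Vector:
    """alpha^w_t = softmax over t of h_t A r, where h_t is a row vector."""
    Ar = matvec(A, r)  # A r is shared by all words, so compute it once
    return softmax([dot(h, Ar) for h in H])


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy hidden states for 3 words
A = [[1.0, 0.0], [0.0, 1.0]]              # toy A^w (identity)
r = [1.0, 0.0]                            # toy r^w
print([round(a, 3) for a in word_level_attention(H, A, r)])  # → [0.422, 0.155, 0.422]
```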

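Each word's final weight is the sum of its entity-wise weight and its word-level weight, and the weighted sum of the hidden states is the context vector $S_i$ of the i-th sentence. Word-level attention alone would under-weight the entities; entity-wise attention simply adds 1 to the weights of the two entity words. Entity-wise attention alone is not enough either, because the other words also carry information.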

$S _ { i } = \sum _ { t = 1 } ^ { T } \left( \alpha _ { i t } ^ { w } + \alpha _ { i t } ^ { e } \right) h _ { i t }$
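The combination step can be sketched with made-up hidden states and weights. Because the entity words get an extra weight of 1, the combined weights no longer sum to 1; the formula above does not renormalise them, and neither does this sketch.

```python
from typing import List


def sentence_vector(H: List[List[float]], alpha_w: List[float], alpha_e: List[float]) -> List[float]:
    """S_i = sum_t (alpha^w_t + alpha^e_t) * h_t."""
    S = [0.0] * len(H[0])
    for h, aw, ae in zip(H, alpha_w, alpha_e):
        weight = aw + ae  # combined weight of word t
        for k, value in enumerate(h):
            S[k] += weight * value
    return S


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy hidden states h_t
alpha_w = [0.5, 0.25, 0.25]               # toy word-level weights (sum to 1)
alpha_e = [1.0, 0.0, 1.0]                 # words 0 and 2 are the two entities
print(sentence_vector(H, alpha_w, alpha_e))  # → [2.75, 1.5]
```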

5. Sentence-level Attention

Sentence-level attention assigns a weight to each sentence:

$\alpha _ { i } ^ { s } = \frac { \exp \left( S _ { i } A ^ { s } r ^ { s } \right) } { \sum _ { i } \exp \left( S _ { i } A ^ { s } r ^ { s } \right) }$

where $A^{s}$ and $r^{s}$ are learned parameters.

The weighted sum of the sentence context vectors is the context vector of the whole bag:

$S = \sum _ { i } \alpha _ { i } ^ { s } S _ { i }$
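Sentence-level attention has the same shape as word-level attention, applied to the sentence vectors $S_i$ of a bag instead of to words. A pure-Python sketch with made-up values for two sentence vectors and for $A^{s}$, $r^{s}$:

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, r: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, r)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def bag_vector(S_list: List[Vector], A: Matrix, r: Vector) -> Tuple[Vector, Vector]:
    """Return (S, alpha^s): attention-weighted sum of sentence vectors, and the weights."""
    Ar = matvec(A, r)
    alpha = softmax([dot(S_i, Ar) for S_i in S_list])
    d = len(S_list[0])
    S = [sum(a * S_i[k] for a, S_i in zip(alpha, S_list)) for k in range(d)]
    return S, alpha


S_list = [[1.0, 0.0], [0.0, 1.0]]  # toy sentence vectors S_1, S_2 of one bag
A = [[1.0, 0.0], [0.0, 1.0]]       # toy A^s
r = [1.0, 0.0]                     # toy r^s
S, alpha = bag_vector(S_list, A, r)
print([round(a, 3) for a in alpha])  # → [0.731, 0.269]
```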

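6. Fully Connected Layer + Softmax

$\hat{p}^{i} = \operatorname{softmax}\left(W_{i} S^{i} + b_{i}\right), \quad i \in \{\text{head}, \text{tail}, r\}$

Here $S^{i}$ is the bag vector $S$ computed with task $i$'s own attention parameters, so each of the three tasks (head entity type, tail entity type, relation) gets its own output distribution $\hat{p}^{i}$.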
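A sketch of the three output layers. Because the attention parameters are task-specific, each task in effect has its own bag vector; the sketch takes these as given. The numbers of entity types and relations and all values are hypothetical, and zero weights are used so the output (a uniform distribution) is easy to check.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def linear(W: Matrix, x: Vector, b: Vector) -> Vector:
    """W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


# Hypothetical sizes: 3 entity types for head and tail, 4 relations; bag vectors of size 2.
sizes = {"head": 3, "tail": 3, "r": 4}
layers: Dict[str, Tuple[Matrix, Vector]] = {
    task: ([[0.0, 0.0] for _ in range(n)], [0.0] * n) for task, n in sizes.items()
}
bag_vectors = {"head": [0.2, -0.1], "tail": [0.5, 0.3], "r": [1.0, 0.4]}  # toy per-task bag vectors

p_hat = {task: softmax(linear(W, bag_vectors[task], b)) for task, (W, b) in layers.items()}
print(p_hat["r"])  # → [0.25, 0.25, 0.25, 0.25] because all weights are zero
```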

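7. Transfer Learning

We first learn the shared parameters on head-entity type classification and tail-entity type classification, then use them to initialize the shared parameters of the relation classification task. Which parameters are shared, and which are task-specific?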

$\theta = \theta _ { 0 } \cup \theta _ { h e a d } \cup \theta _ { \text { tail } } \cup \theta _ { r }$

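Here $\theta$ is the set of all parameters of all tasks, $\theta_0$ the shared parameters, $\theta_{head}$ the parameters specific to head-entity (entity 1) type classification, $\theta_{tail}$ those specific to tail-entity (entity 2) type classification, and $\theta_{r}$ those specific to relation classification.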

$\begin{array} { r } { \theta _ { i } = \left\{ A _ { i } ^ { w } , r _ { i } ^ { w } , A _ { i } ^ { s } , r _ { i } ^ { s } , W _ { i } , b _ { i } \right\} } \\ { i \in \{ h e a d , \text { tail, } r \} } \end{array}$

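The task-specific parameters are therefore those of the attention layers and the fully connected layer; essentially only the GRU and everything before it (the word and position embeddings) are shared.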
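The parameter split and the two-stage schedule can be sketched with plain dictionaries. Everything here is a stand-in: names such as `gru` and `A_w` are hypothetical labels, each "parameter" is a single random float instead of a tensor, and the actual optimisation steps are omitted.

```python
import copy
import random
from typing import Dict

# Hypothetical parameter names; each "parameter" is one float standing in for a tensor.
SHARED = ["word_emb", "pos1_emb", "pos2_emb", "gru"]    # theta_0
TASK_SPECIFIC = ["A_w", "r_w", "A_s", "r_s", "W", "b"]  # theta_i, i in {head, tail, r}


def init_shared(seed: int) -> Dict[str, float]:
    rng = random.Random(seed)
    return {name: rng.random() for name in SHARED}


def init_task(task: str, seed: int) -> Dict[str, float]:
    rng = random.Random(seed)
    return {f"{task}/{name}": rng.random() for name in TASK_SPECIFIC}


# Stage 1: entity-type pre-training (optimisation of J_e omitted).
theta_0 = init_shared(seed=0)
theta_head = init_task("head", seed=1)
theta_tail = init_task("tail", seed=2)

# Stage 2: relation classification starts from the pre-trained shared parameters
# plus freshly initialised relation-specific ones; theta_head/theta_tail are not reused.
theta_0_for_r = copy.deepcopy(theta_0)
theta_r = init_task("r", seed=3)
# (optimisation of J_r over theta_0_for_r and theta_r omitted)

print(sorted(theta_0_for_r) == sorted(SHARED), len(theta_r))  # → True 6
```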

The pre-training objective (head-entity and tail-entity type classification) is

$\begin{array} { l } { J _ { e } \left( \theta _ { 0 } , \theta _ { h e a d } , \theta _ { t a i l } \right) = \beta \left\| \theta _ { 0 } \right\| ^ { 2 } } \\ { + \sum _ { t } \left( - \frac { 1 } { z _ { t } } \lambda _ { t } \sum _ { i = 1 } ^ { z _ { t } } y _ { i } ^ { t } \log \left( \hat { p } _ { i } ^ { t } \right) + \beta \left\| \theta _ { t } \right\| ^ { 2 } \right) } \\ { t \in \{ \text {head} , \text { tail } \} } \end{array}$

The objective for relation classification is

$\begin{aligned} J _ { r } \left( \theta _ { 0 } , \theta _ { r } \right) & = - \frac { 1 } { z _ { r } } \sum _ { i = 1 } ^ { z _ { r } } y _ { i } \log \left( \hat { p } _ { i } \right) \\ & + \beta \left( \left\| \theta _ { 0 } \right\| ^ { 2 } + \left\| \theta _ { r } \right\| ^ { 2 } \right) \end{aligned}$
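Both objectives can be written directly in Python. This sketch assumes one-hot label rows `y_i`, predicted probability rows `p_i`, and parameters flattened into plain lists of floats; `lam` holds the task weights $\lambda_t$ and `beta` the L2 coefficient $\beta$. Terms with $y = 0$ are skipped so that a zero probability on a wrong class does not make `math.log` fail.

```python
import math
from typing import Dict, List

Vector = List[float]


def cross_entropy(Y: List[Vector], P: List[Vector]) -> float:
    """-(1/z) * sum_i y_i . log(p_hat_i) over z instances with one-hot rows y_i."""
    z = len(Y)
    total = 0.0
    for y_i, p_i in zip(Y, P):
        total += sum(y * math.log(p) for y, p in zip(y_i, p_i) if y)  # skip y = 0 terms
    return -total / z


def sq_norm(params: Vector) -> float:
    return sum(v * v for v in params)


def J_e(theta_0: Vector, theta_t: Dict[str, Vector], Y, P, lam: Dict[str, float], beta: float) -> float:
    """Pre-training objective summed over t in {head, tail}."""
    loss = beta * sq_norm(theta_0)
    for t in ("head", "tail"):
        loss += lam[t] * cross_entropy(Y[t], P[t]) + beta * sq_norm(theta_t[t])
    return loss


def J_r(theta_0: Vector, theta_r: Vector, Y, P, beta: float) -> float:
    """Relation-classification objective."""
    return cross_entropy(Y, P) + beta * (sq_norm(theta_0) + sq_norm(theta_r))


Y = [[1, 0]]      # one instance whose true class is 0
P = [[0.5, 0.5]]  # predicted probabilities
print(round(J_r([0.0], [0.0], Y, P, beta=0.01), 4))  # → 0.6931
```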

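V. Experimental Results

Because Freebase provides entity types for the entity pairs, the NYT dataset alone is enough to set up the transfer learning.

1. STP is effective.

2. Entity-wise attention is effective, but it cannot be used alone; word-level attention is still needed, because words other than the entities also carry information.

3. Transfer learning is effective.

4. Comparison with other relation extraction methods

In the One, Two, and All settings (using one sentence, two sentences, or all sentences from each bag), the proposed method performs best.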
