For personal reference

methods of KGE & logical rules

1. Injecting Logical Background Knowledge into Embeddings for Relation Extraction

2015, NAACL, Tim Rocktäschel

pdf:https://rockt.github.io/pdf/rocktaschel2015injecting.pdf
github:https://github.com/uclnlp/low-rank-logic

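The paper addresses open-domain relation extraction.

Model

The overall model framework is as follows:

- Matrix factorization
Concretely, a binary matrix of size $|P|\times |\mathcal{R}|$ is used, where each row is a constant pair and each column is a predicate. Factorizing it yields two embedding matrices, $|P|\times k$ and $k \times |\mathcal{R}|$, which hold the constant-pair embeddings $v_{e_i,e_j}$ and the predicate embeddings $v_{r_m}$ respectively. For what the matrix cells contain, see Riedel 2013 (worth a skim):

Given the embeddings $V$, the probability of a set of known facts $w$ is defined as below, where $\pi_m^{e_i,e_j}= \sigma (v_{r_m} \cdot v_{e_i,e_j})$ and $v_{(\cdot)}$ denotes the corresponding embedding. The embeddings are learned by maximizing this likelihood.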

$p(\mathbf{w} | \mathbf{V})=\prod_{r_{m}\left(e_{i}, e_{j}\right) \in w} \pi_{m}^{e_{i}, e_{j}} \prod_{r_{m}\left(e_{i}, e_{j}\right) \notin w}\left(1-\pi_{m}^{e_{i}, e_{j}}\right)$
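
To make the likelihood above concrete, here is a minimal pure-Python sketch. The toy pairs, predicates, and embedding values are hypothetical, and the helper names (`sigmoid`, `dot`, `log_likelihood`) are mine, not from the paper's code. It works in log space, so the product over cells becomes a sum.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_likelihood(pair_emb, pred_emb, facts):
    """log p(w | V): observed cells add log(pi), unobserved cells add log(1 - pi)."""
    total = 0.0
    for pair, v_pair in pair_emb.items():
        for pred, v_pred in pred_emb.items():
            pi = sigmoid(dot(v_pred, v_pair))
            if (pred, pair) in facts:
                total += math.log(pi)
            else:
                total += math.log(1.0 - pi)
    return total

# Hypothetical toy data: two constant pairs, two predicates, k = 2.
pair_emb = {("Paris", "France"): [1.0, 0.5], ("Rome", "Japan"): [-1.0, 0.2]}
pred_emb = {"capital-of": [2.0, 0.0], "located-in": [1.5, 0.5]}
facts = {("capital-of", ("Paris", "France")), ("located-in", ("Paris", "France"))}
wrong_facts = {("capital-of", ("Rome", "Japan")), ("located-in", ("Rome", "Japan"))}

ll = log_likelihood(pair_emb, pred_emb, facts)
ll_wrong = log_likelihood(pair_emb, pred_emb, wrong_facts)
print(ll, ll_wrong)
```

With these embeddings the set of facts that agrees with the dot products gets the higher log-likelihood, which is what maximizing the formula pushes towards.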

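- Injecting the rules, in two ways

Pre-Factorization Inference: apply the rules already extracted to generate new triples, add the new triples to the data, derive rules again from the extended data, generate triples again, and repeat until no new triples are produced.
Joint model: the overall training objective is:
$\min _{\mathbf{v}} \sum_{\mathcal{F} \in \tilde{\mathbf{z}}} \mathcal{L}([\mathcal{F}])$

For facts, $[\mathcal{F}]$ is the marginal probability under $p(\mathbf{w} | \mathbf{V})$, with $\mathcal{L}([\mathcal{F}]):=-\log ([\mathcal{F}])$.
For logic formulae, the product t-norm is followed, i.e. the marginal probability $[\mathcal{A} \wedge \mathcal{B}]$ is computed as $[\mathcal{A}][\mathcal{B}]$; the other connectives are as follows:

The target is KGC: "predict hidden knowledge-base relations from observed natural-language relations."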

2. KALE: Jointly Embedding Knowledge Graphs and Logical Rules

Shu Guo, Quan Wang

EMNLP 2016:https://www.aclweb.org/anthology/D16-1019.pdf

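Wang (2015) and Wei (2015) use KGE and rules for KGC in a pipeline fashion. Rocktäschel (2015) uses a joint model that injects first-order logic rules into the embedding process, but because it targets relation extraction it builds embeddings for entity pairs rather than for individual entities, so it cannot handle single entities.

This paper introduces KALE (entity and relation Embeddings by jointly modeling Knowledge And Logic).

Logical rules have been studied extensively in knowledge acquisition and inference, usually on top of Markov logic networks (Richardson and Domingos, 2006; Bröcheler et al., 2010; Pujara et al., 2013; Beltagy and Mooney, 2014). Recently there has been growing interest in combining logical rules with embedding models.

Model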

KG embedding

$I\left(e_{i}, r_{k}, e_{j}\right)=1-\frac{1}{3 \sqrt{d}}\left\|\mathbf{e}_{i}+\mathbf{r}_{k}-\mathbf{e}_{j}\right\|_{1}$
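The score lies in $[0,1]$ and indicates the truth value of that triple. The bound follows from the norm constraints used in training: $\|\mathbf{x}\|_1 \leq \sqrt{d}\,\|\mathbf{x}\|_2 \leq \sqrt{d}$ for each of the three vectors, so the $L_1$ norm of their combination is at most $3\sqrt{d}$.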

rule modeling

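t-norm fuzzy logics are used, which define the truth value of a complex formula as a composition of the truth values of its constituents. Following Bröcheler's definition, the paper adopts the product t-norm, under which the logical connectives are defined as: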
$\begin{aligned} I\left(f_{1} \wedge f_{2}\right) &=I\left(f_{1}\right) \cdot I\left(f_{2}\right) \\ I\left(f_{1} \vee f_{2}\right) &=I\left(f_{1}\right)+I\left(f_{2}\right)-I\left(f_{1}\right) \cdot I\left(f_{2}\right) \\ I\left(\neg f_{1}\right) &=1-I\left(f_{1}\right) \end{aligned}$

From these it follows that:
$\begin{array}{l}{I\left(\neg f_{1} \wedge f_{2}\right)=I\left(f_{2}\right)-I\left(f_{1}\right) \cdot I\left(f_{2}\right)} \\ {I\left(f_{1} \Rightarrow f_{2}\right)=I\left(f_{1}\right) \cdot I\left(f_{2}\right)-I\left(f_{1}\right)+1}\end{array}$

For $f \triangleq\left(e_{m}, r_{s}, e_{n}\right) \Rightarrow\left(e_{m}, r_{t}, e_{n}\right)$ we have:

$\begin{aligned} I(f) &=I\left(e_{m}, r_{s}, e_{n}\right) \cdot I\left(e_{m}, r_{t}, e_{n}\right) \\ &-I\left(e_{m}, r_{s}, e_{n}\right)+1 \end{aligned}$
For $f \triangleq\left(e_{\ell}, r_{s_{1}}, e_{m}\right) \wedge\left(e_{m}, r_{s_{2}}, e_{n}\right) \Rightarrow\left(e_{\ell}, r_{t}, e_{n}\right)$ we have:
$\begin{aligned} I(f) &=I\left(e_{\ell}, r_{s_{1}}, e_{m}\right) \cdot I\left(e_{m}, r_{s_{2}}, e_{n}\right) \cdot I\left(e_{\ell}, r_{t}, e_{n}\right) \\ &-I\left(e_{\ell}, r_{s_{1}}, e_{m}\right) \cdot I\left(e_{m}, r_{s_{2}}, e_{n}\right)+1 \end{aligned}$

The larger the truth values are, the better the ground rules are satisfied.
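
The truth-value formulas above can be checked with a few lines of Python. This is only a sketch: the two-dimensional embeddings are made-up values, and the function names (`triple_truth`, `t_and`, `t_or`, `t_not`, `t_implies`) are my own labels for the formulas in these notes.

```python
import math

def triple_truth(e_i, r_k, e_j):
    """I(e_i, r_k, e_j) = 1 - ||e_i + r_k - e_j||_1 / (3 * sqrt(d))."""
    d = len(e_i)
    l1 = sum(abs(a + b - c) for a, b, c in zip(e_i, r_k, e_j))
    return 1.0 - l1 / (3.0 * math.sqrt(d))

# Product t-norm connectives.
def t_and(a, b):
    return a * b

def t_or(a, b):
    return a + b - a * b

def t_not(a):
    return 1.0 - a

def t_implies(a, b):
    # I(f1 => f2) = I(f1) * I(f2) - I(f1) + 1, i.e. I(not f1 or f2)
    return a * b - a + 1.0

# Hypothetical embeddings (d = 2).
paris, france = [0.1, 0.2], [0.4, 0.3]
capital_of, located_in = [0.3, 0.1], [0.2, 0.1]

premise = triple_truth(paris, capital_of, france)     # close to 1
conclusion = triple_truth(paris, located_in, france)  # a bit below 1
rule_truth = t_implies(premise, conclusion)
print(premise, conclusion, rule_truth)
```

A true premise with a false conclusion gives truth value 0, and a false premise makes the rule vacuously true, which matches the classical reading of implication at the endpoints.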

Joint training

The training set has two parts: 1) the triples in the KG, and 2) the ground rules. The loss function is defined as:
$\begin{array}{c}{\min _{\{\mathbf{e}\},\{\mathbf{r}\}} \sum_{f^{+} \in \mathcal{F}} \sum_{f^{-} \in \mathcal{N}_{f^{+}}}\left[\gamma-I\left(f^{+}\right)+I\left(f^{-}\right)\right]_{+}} \\ {\text {s.t. }\|\mathbf{e}\|_{2} \leq 1, \forall e \in \mathcal{E} ;\|\mathbf{r}\|_{2} \leq 1, \forall r \in \mathcal{R}}\end{array}$

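Here $f$ ranges over both parts, and negative samples are generated differently for each. For KG triples, the head or tail entity is randomly replaced; for ground rules, the relation is randomly replaced. For example, from (Paris, Capital-Of, France) ⇒ (Paris, Located-In, France), a possible negative sample is (Paris, Capital-Of, France) ⇒ (Paris, Has-Spouse, France).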
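
A minimal sketch of the margin-based loss term and the two kinds of negative sampling described above. The helper names are hypothetical. For rules I assume the relation of the conclusion is the one replaced, as in the Paris example, and I additionally exclude the premise relation so the corrupted rule does not become trivially true; that last choice is my assumption, not something stated in these notes.

```python
import random

def hinge_loss(gamma, pos_truth, neg_truth):
    """One term of the objective: [gamma - I(f+) + I(f-)]_+ ."""
    return max(0.0, gamma - pos_truth + neg_truth)

def corrupt_triple(triple, entities, rng):
    """Replace the head or the tail entity at random."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))

def corrupt_rule(rule, relations, rng):
    """Replace the relation in the conclusion of a ground rule (assumption)."""
    premise, (h, r, t) = rule
    candidates = [x for x in relations if x != r and x != premise[1]]
    return (premise, (h, rng.choice(candidates), t))

rng = random.Random(0)
entities = ["Paris", "France", "Berlin", "Germany"]
relations = ["Capital-Of", "Located-In", "Has-Spouse"]

pos_triple = ("Paris", "Capital-Of", "France")
pos_rule = (("Paris", "Capital-Of", "France"), ("Paris", "Located-In", "France"))

neg_triple = corrupt_triple(pos_triple, entities, rng)
neg_rule = corrupt_rule(pos_rule, relations, rng)
print(neg_triple)
print(neg_rule)
```

The truth values fed into `hinge_loss` would come from the triple and rule scoring functions; the loss is zero once the positive example beats the negative one by at least the margin.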

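Experiments

Link prediction and triple classification are evaluated on WordNet and on FB.

(The logical rules here are obtained either manually or by automatic extraction.) Rule construction: first train TransE, then score candidate rules with formulas (2) and (3), rank them, and manually select from the top. This yields 14 rules on WN and 47 rules on FB.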

3. Logic Rules Powered Knowledge Graph Embedding

arXiv 2019: https://arxiv.org/pdf/1903.03772.pdf

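The paper lists these shortcomings of earlier work:
1) Most KG embedding models do not make full use of logical rules. (How is this demonstrated?)
2) The rules used are selected manually.
3) Rules are encoded as truth values, which leads to a one-to-many mapping. (I did not find further explanation in the paper.)
4) The algebraic operations for logical symbols are inconsistent between triples and rules. (This feels essentially similar to KALE?)

Contributions:
1) A rule-enhanced method that can be integrated with any translation-based embedding method.
2) An automatic way to mine rules (three types) and assign them confidences (roughly, the proportion of facts among all the triples a rule generates).
3) Triples and logical rules are transformed into the same first-order logical space (a fact is written as $h(r)\Rightarrow t$).
4) A significant improvement on filtered Hits@1.

Method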

RULE EXTRACTION

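Three types of rules are targeted:

inference rule: $\forall h, t:\left(h, r_{1}, t\right) \Rightarrow\left(h, r_{2}, t\right)$, e.g. $(\text{Washington}, \text{isCapitalof}, \text{USA}) \Rightarrow (\text{Washington}, \text{isLocatedin}, \text{USA})$
transitivity rule
antisymmetry rule: $\forall h, t:\left(h, r_{1}, t\right) \Leftrightarrow\left(t, r_{2}, h\right)$

The overall extraction pipeline is shown in the figure:

Rule Sample Extraction. Read this together with the figure above.
Rule Candidate Extraction
Note that for the inference rule the paper explains in detail how to tell concept from instance, mainly by using Probase to get the concept of an entity, as in the table below: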
$\begin{array}{ll}\hline \text { Relation } & {\text { Concept-Instance Pairs }} \\ \hline \text { location.country } & {(\text { location-country })} \\ {\text { people. profession }} & {(\text { people-profession })} \\ {\text { music. songwriter }} & {(\text { music-songwriter })} \\ {\text { sports.boxer }} & {(\text { sports-boxer })} \\ {\text { book. magazine }} & {(\text { book-magazine })} \\ \hline\end{array}$
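The table shows that country is an instance of location (the Relation column corresponds to the concept of an entity mentioned above).
Relations in FB are hierarchical, e.g. /location/country/capital and /location/location/contains. To decide which of the two is the concept and which is the instance in an inference rule, the concept of an entity described above can be used.
Score Calculation
In KALE the top-ranked rules are filtered manually, which is hard to apply to a large-scale KG, so this paper proposes selecting rules automatically from the candidate pool. The score is computed as follows. $GetNewtriples()$ generates new triples: applying the inference candidate rule $r_1 \Rightarrow r_2$ to $(h,r_1,t)$ yields the new triple $(h,r_2,t)$.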
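
The scoring idea can be sketched as follows. This is an assumption-laden illustration: these notes only say the confidence is roughly the share of rule-generated triples that are already facts in the KG, so the exact formula in the paper may differ. `get_new_triples` mirrors the $GetNewtriples()$ step for inference rules only; the toy KG is made up.

```python
def get_new_triples(kg, r1, r2):
    """Apply the candidate inference rule r1 => r2 to every (h, r1, t) in the KG."""
    return {(h, r2, t) for (h, r, t) in kg if r == r1}

def rule_score(kg, r1, r2):
    """Assumed score: fraction of generated triples that already exist in the KG."""
    generated = get_new_triples(kg, r1, r2)
    if not generated:
        return 0.0
    return len(generated & kg) / len(generated)

# Hypothetical toy KG.
kg = {
    ("Washington", "isCapitalof", "USA"),
    ("Washington", "isLocatedin", "USA"),
    ("Paris", "isCapitalof", "France"),
    ("Paris", "isLocatedin", "France"),
    ("Berlin", "isCapitalof", "Germany"),
    ("Berlin", "isLocatedin", "Germany"),
    ("Lyon", "isLocatedin", "France"),
}

forward = rule_score(kg, "isCapitalof", "isLocatedin")
backward = rule_score(kg, "isLocatedin", "isCapitalof")
print(forward, backward)
```

On this toy KG the rule isCapitalof ⇒ isLocatedin scores higher than its reverse, because Lyon is located in France without being its capital. Candidates could then be kept or dropped by thresholding the score.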

RULE-ENHANCED KNOWLEDGE GRAPH EMBEDDING

Rule-Enhanced TransE Model
$s_{1}(h, r, t)=\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|_{l_{1 / 2}}$
The table below gives the mathematical expression for each logical form:
$\begin{array}{ll}\hline \text { First-order logic } & {\text { Mathematical expression }} \\ \hline r(h) & {\mathbf{r}+\mathbf{h}} \\ {a \Rightarrow b} & {\mathbf{a}-\mathbf{b}} \\ {h \in C} & {\mathbf{h} \cdot \mathbf{C}(\mathbf{C} \text { is a matrix })} \\ {a \wedge b} & {\mathbf{a} \otimes \mathbf{b}} \\ {a \Leftrightarrow b} & {(\mathbf{a}-\mathbf{b}) \otimes(\mathbf{a}-\mathbf{b})} \\ \hline\end{array}$
Using these correspondences, the three rule types are transformed as follows:
Inference rule
$s_{2}(f)=\left\|\left(\mathbf{h}_{m} \cdot \mathbf{C}\right) \otimes\left(\mathbf{h}_{m}+\mathbf{r}_{1}-\mathbf{t}_{n}\right)-\left(\mathbf{h}_{m}+\mathbf{r}_{2}-\mathbf{t}_{n}\right)\right\|_{l_{1 / 2}}$
Transitivity rule
$\begin{aligned} s_{3}(f)=& \|\left[\left(\mathbf{e}_{l}+\mathbf{r}_{1}-\mathbf{e}_{m}\right) \otimes\left(\mathbf{e}_{m}+\mathbf{r}_{2}-\mathbf{e}_{n}\right)\right] \\ &-\left(\mathbf{e}_{l}+\mathbf{r}_{3}-\mathbf{e}_{n}\right) \|_{l_{1 / 2}} \end{aligned}$
Antisymmetry rule
$\begin{aligned} s_{4}(f) &=\left\|\left(\mathrm{TR}_{f}-\mathrm{TR}_{b}\right) \otimes\left(\mathrm{TR}_{b}-\mathrm{TR}_{f}\right)\right\|_{l_{1 / 2}} \\ \mathrm{TR}_{f} &=\mathrm{h}_{m}+\mathrm{r}_{1}-\mathrm{t}_{n}, \quad \mathrm{TR}_{b}=\mathrm{t}_{n}+\mathrm{r}_{2}-\mathrm{h}_{m} \end{aligned}$
Rule-Enhanced TransH Model: same procedure as above; see the paper for details.
Rule-Enhanced TransR Model: same procedure as above; see the paper for details.
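
As a rough illustration of the TransE-based scores, the sketch below computes $s_1$ and the transitivity score $s_3$. Two assumptions are mine and not confirmed by these notes: $\otimes$ is read as the element-wise product, and the $l_{1/2}$ norm is instantiated as the $L_1$ norm. The embedding values are made up so that every triple holds exactly.

```python
def l1(v):
    return sum(abs(x) for x in v)

def residual(h, r, t):
    """Translation residual h + r - t."""
    return [a + b - c for a, b, c in zip(h, r, t)]

def s1(h, r, t):
    """Triple score ||h + r - t||, here with the L1 norm (assumption)."""
    return l1(residual(h, r, t))

def s3(e_l, r1, e_m, r2, e_n, r3):
    """Transitivity score, reading the tensor symbol as element-wise product (assumption)."""
    body = [a * b for a, b in zip(residual(e_l, r1, e_m), residual(e_m, r2, e_n))]
    head = residual(e_l, r3, e_n)
    return l1([a - b for a, b in zip(body, head)])

# Hypothetical embeddings where e_l + r1 = e_m, e_m + r2 = e_n, e_l + r3 = e_n.
e_l, e_m, e_n = [0.0, 0.5], [0.25, 0.75], [0.75, 0.5]
r1, r2, r3 = [0.25, 0.25], [0.5, -0.25], [0.75, 0.0]

print(s1(e_l, r1, e_m))                          # 0.0 for a perfectly fitted triple
print(s3(e_l, r1, e_m, r2, e_n, r3))             # 0.0 when body and head both hold
print(s3(e_l, r1, e_m, r2, e_n, [0.0, 0.0]))     # grows when the head triple is violated
```

Lower scores mean a better fit here, in line with the distance-style score $s_1$.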

GLOBAL OBJECTIVE FUNCTION

$I_n(f)$ acts as a gate, as shown in the figure below

Experiments

Datasets

FB166, FB15K, WN18
$\begin{array}{cccccc}\hline \text { Dataset } & {\# \mathrm{E}} & {\# \mathrm{R}} & {\# \text { Train }} & {\# \text { Valid }} & {\# \text { Test }} \\ \hline \text { FB15K } & {14,951} & {1,345} & {483,142} & {50,000} & {59,071} \\ {\mathrm{FB} 166} & {9,658} & {166} & {100,289} & {10,457} & {12,327} \\ {\mathrm{WN} 18} & {40,943} & {18} & {141,442} & {5,000} & {5,000} \\ \hline\end{array}$

Metrics

MR, mean rank; lower is better.
$M R=\frac{1}{2 \# \mathcal{K}_{t}} \sum_{i=1}^{\# \mathcal{K}_{t}}\left(r a n k_{i h}+r a n k_{i t}\right)$
MRR, mean reciprocal rank; higher is better.
$M R R=\frac{1}{2 \# \mathcal{K}_{t}} \sum_{i=1}^{\# \mathcal{K}_{t}}\left(\frac{1}{r a n k_{i h}}+\frac{1}{r a n k_{i t}}\right)$
Hits@n
$\text {Hits} @ n=\frac{1}{2 \# \mathcal{K}_{t}} \sum_{i=1}^{\# \mathcal{K}_{t}}\left(I_{n}\left(r a n k_{i h}\right)+I_{n}\left(r a n k_{i t}\right)\right)$
$I_{n}\left(r a n k_{i}\right)=\left\{\begin{array}{ll}{1} & {\text { if } r a n k_{i} \leq n} \\ {0} & {\text { otherwise }}\end{array}\right.$
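
The three metrics translate directly into code. In this sketch each test triple contributes a pair (rank when predicting the head, rank when predicting the tail); the rank values are made up and the function names are my own.

```python
def mean_rank(rank_pairs):
    """MR: average of head and tail ranks over all test triples."""
    return sum(rh + rt for rh, rt in rank_pairs) / (2 * len(rank_pairs))

def mean_reciprocal_rank(rank_pairs):
    """MRR: average of 1/rank over head and tail predictions."""
    return sum(1.0 / rh + 1.0 / rt for rh, rt in rank_pairs) / (2 * len(rank_pairs))

def hits_at(rank_pairs, n):
    """Hits@n: share of head and tail predictions ranked within the top n."""
    hits = sum((rh <= n) + (rt <= n) for rh, rt in rank_pairs)
    return hits / (2 * len(rank_pairs))

# Hypothetical (head rank, tail rank) pairs for three test triples.
rank_pairs = [(1, 2), (4, 1), (10, 5)]

mr = mean_rank(rank_pairs)
mrr = mean_reciprocal_rank(rank_pairs)
print(mr, mrr, hits_at(rank_pairs, 1), hits_at(rank_pairs, 3), hits_at(rank_pairs, 10))
```

In the filtered setting mentioned earlier, the ranks would be computed after removing other known true triples from the candidate list; the formulas themselves stay the same.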

Link Prediction

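Results on FB15K are shown below; see the paper for the other datasets. TransE is trained on the original training set; TransE (Pre) adds the new triples derived from rules to the training set; TransE (Rule) is the paper's approach of transforming both triples and rules into the logical rule space for joint training. Whichever translation-based model is used as the base, the paper's method performs best.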

Triple Classification

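The triple classification task is to decide whether a given triple $(h,r,t)$ is correct.
KALE is based on TransE. With TransE as the base model, accuracy on the three datasets improves by about 1% to 2%.

A rough comparison of 1, 2, and 3

1 Injecting ...: framed as relation extraction rather than the KGC task that is common today, though in effect it leans toward KGC. It works by matrix factorization and injects rules in two ways: adding the triples produced by the rules to the training data, or jointly, adding the rules in t-norm form to the factorization. It learns representations of entity pairs and relations. The rules are generated in advance.
2 KALE: inspired by the above, it adds rules to the TransE model, again through a t-norm. The joint training loss has the same form as TransE's, and negative samples are also generated for rules. It learns representations of entities and relations. The rules are generated in advance.
3 Logic Rules Powered ...: rules are not selected manually; they are selected by measuring the proportion of rule-generated facts that are true facts in the raw KG. The method is proposed as pluggable into any translation-based KGE method. It learns representations of entities and relations.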

4. DistMult: EMBEDDING ENTITIES AND RELATIONS FOR LEARNING AND INFERENCE IN KNOWLEDGE BASES

ICLR 2015, Bishan Yang

5. IterE: Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning

2019 IW3C2：paper

methods of rule mining

1. Neural LP: Differentiable learning of logical rules for knowledge base reasoning

NIPS 2017: https://papers.nips.cc/paper/6826-differentiable-learning-of-logical-rules-for-knowledge-base-reasoning.pdf
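For details, see the notes by:

Zhang Wen (張文)
or NYSDY

The paper studies probabilistic first-order logical rules for knowledge base reasoning. The difficulty is that both the parameters in a continuous space and the structure in a discrete space must be learned. To this end it proposes an end-to-end differentiable model, Neural Logic Programming (Neural LP). The method builds on TensorLog, which compiles inference tasks into sequences of differentiable operations.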

2. DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs

NeurIPS 2019:https://papers.nips.cc/paper/9669-drum-end-to-end-differentiable-rule-mining-on-knowledge-graphs.pdf

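The paper opens by noting that, although inductive link prediction is important, most prior work focuses on transductive link prediction, which cannot handle unseen entities, and many of those models are black boxes that cannot be interpreted. It therefore proposes DRUM, a scalable and differentiable approach for mining logical rules.

Some questions:

What does differentiable mean? Separable? A: It means representing the discrete rules in a form that can be differentiated and computed.

What do inductive and deductive mean? A: Inductive means generalizing from many examples, from the specific to the abstract; deductive goes from the abstract to the specific. (In the link prediction setting, the contrast is with transductive prediction, where every test entity was already seen during training.)

"simultaneously learn rule structures as well as appropriate scores is crucial": what does rule structure refer to here? A: It is the rule itself (rules have structure); scores refers to evaluation measures such as PCA confidence.

Method

Entities in the entity set $\varepsilon$ are represented as one-hot vectors $\{v_1,v_2,\ldots,v_n\}$, where $n$ is the number of entities, and $A_{B_r}\in\mathbb{R}^{n \times n}$ is the adjacency matrix of relation $B_r$.

The discrete problem is turned into a differentiable one. In the original discrete problem, the existence of a path from $x$ to $y$, $B_1(x,z_1) \wedge B_2(z_1,z_2) \wedge ... \wedge B_T(z_{T-1},y)$, is equivalent to $v_x^T \cdot A_{B_1} \cdot A_{B_2} \cdots A_{B_T} \cdot v_y$ being a positive scalar, and this scalar equals the number of length-$T$ paths from $x$ to $y$ that follow the relations $B_{r_i}$. Finding logical rules for relation $H$ then amounts to learning parameters $\alpha$ that maximize $O_H(\alpha)$: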

$O_H(\alpha)=\sum_{(x,H,y)\in KG}v_x^T\omega_H(\alpha)v_y \tag {3.1}$

$\omega_H(\alpha) = \sum_s\alpha_s \prod_{k\in p_s}A_{B_k}\tag {3.2}$

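Here $s$ indexes all possible rules from $x$ to $y$, $p_s$ is the sequence of relations in rule $s$, and $\alpha_s$ is the confidence of the rule. For rules of length $T$, each of the $T$ positions can take any of $|\mathcal{R}|$ relations, so the number of $\alpha$ parameters is $\mathcal{O}(|\mathcal{R}|^T)$. To reduce the parameter count, $\omega_H(\alpha)$ is rewritten as the formula below, which amounts to saying that relation $B_k$ appearing at position $i$ of a rule has confidence $\alpha_{i,k}$; the parameter count drops to $\mathcal{O}(T|\mathcal{R}|)$.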

$\Omega_H(\alpha) =\prod_{i=1}^T \sum_{k=1}^{|\mathcal{R}|}\alpha_{i,k} A_{B_k}\tag {3.3}$
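
A small numeric check of the two ideas so far: a product of adjacency matrices counts paths, and the factored form (3.3) is the same as summing over all $|\mathcal{R}|^T$ rule bodies with confidence $\prod_i \alpha_{i,k_i}$. The graph, the $\alpha$ values, and the helper names are hypothetical; I assume the convention $A[x][y]=1$ when $B(x,y)$ holds, so that $v_x^T M v_y = M[x][y]$.

```python
from itertools import product

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def weighted_sum(weights, mats):
    n = len(mats[0])
    return [[sum(w * M[i][j] for w, M in zip(weights, mats))
             for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def adjacency(n, edges):
    A = [[0.0] * n for _ in range(n)]
    for x, y in edges:
        A[x][y] = 1.0
    return A

n = 4
A1 = adjacency(n, [(0, 1), (0, 2)])   # B1(0,1), B1(0,2)
A2 = adjacency(n, [(1, 3), (2, 3)])   # B2(1,3), B2(2,3)
rels = [A1, A2]

# Number of paths 0 -> 3 that follow B1 then B2.
path_count = matmul(A1, A2)[0][3]

# alpha[i][k]: confidence of relation k at position i (T = 2, |R| = 2).
alpha = [[0.5, 0.25], [0.125, 0.75]]

# Factored form (3.3): product over positions of weighted sums.
omega = identity(n)
for row in alpha:
    omega = matmul(omega, weighted_sum(row, rels))

# Expanded form: one term per rule body (k_1, ..., k_T).
expanded = [[0.0] * n for _ in range(n)]
for body in product(range(len(rels)), repeat=len(alpha)):
    conf, M = 1.0, identity(n)
    for i, k in enumerate(body):
        conf *= alpha[i][k]
        M = matmul(M, rels[k])
    expanded = weighted_sum([1.0, conf], [expanded, M])

print(path_count, omega[0][3], expanded[0][3])
```

The expansion makes the cost of the factorization visible: the confidence of each rule body is forced to be a product of per-position weights rather than a free parameter.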

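However, the rewritten formula can only learn rules of length exactly $T$. To fix this, a new relation $B_0$ is defined whose adjacency matrix is the identity, $A_{B_0}=I_n$. $B_0$ can appear at any position of a length-$T$ rule, any number of times; it does not change the value of the product, yet it allows shorter rules to be represented as well.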

$\Omega_H^I(\alpha) =\prod_{i=1}^T (\sum_{k=0}^{|\mathcal{R}|}\alpha_{i,k} A_{B_k}) \tag {3.4}$

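$\Omega_H^I(\alpha)$ can thus express rules of length at most $T$ with only $T(|\mathcal{R}|+1)$ parameters. Although $\Omega_H^I(\alpha)$ covers all such rules, it is still constrained in learning correct rule confidences: the paper proves that this constraint inevitably leads to mining incorrect rules with high confidences. (The paper gives the full proof.)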
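
The effect of the identity relation $B_0$ can be seen in a few lines. In this sketch (hypothetical graph and weights, helper names of my own) index 0 of each weight row belongs to $B_0$; putting all the weight of one position on $B_0$ collapses a length-2 product into a length-1 rule.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def weighted_sum(weights, mats):
    n = len(mats[0])
    return [[sum(w * M[i][j] for w, M in zip(weights, mats))
             for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def omega_identity(alpha, rels):
    """Eq. (3.4): like (3.3), but with B_0 (identity matrix) added at index 0."""
    n = len(rels[0])
    mats = [identity(n)] + rels
    out = identity(n)
    for row in alpha:
        out = matmul(out, weighted_sum(row, mats))
    return out

# Hypothetical 3-entity graph: B1(0,1) and B2(1,2).
A1 = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
A2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
rels = [A1, A2]

# T = 2. Columns: B_0, B_1, B_2.
alpha_len1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # B_0 then B_1: the length-1 rule B_1
alpha_len2 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # B_1 then B_2: a length-2 rule

short_rule = omega_identity(alpha_len1, rels)
long_rule = omega_identity(alpha_len2, rels)
num_params = len(alpha_len1) * (len(rels) + 1)    # T * (|R| + 1)
print(short_rule == A1, long_rule == matmul(A1, A2), num_params)
```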

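Recall that the number of rules of length $T$ is at most $(|\mathcal{R}|+1)^T$; their confidences can be viewed as a $T$-dimensional tensor whose axes each have size $|\mathcal{R}|+1$. Each entry of the tensor is the confidence of the rule with body $B_{r_1},B_{r_2},\cdots,B_{r_T}$; this is called the confidence value tensor. The paper proves that the confidences given by $\Omega_H^I(\alpha)$ in $(3.4)$ are a rank-one estimation of the confidence value tensor.

Since low-rank approximation (not just rank one) is a popular approach to tensor approximation, it is used to generalize $\Omega_H^I(\alpha)$, and $(3.4)$ becomes the following. (What does $\alpha_{j,i,k}$ mean? Is this a conversion to a low-rank form for computation?)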

$\Omega_H^L(\alpha,L) = \sum_{j=1}^L\{\prod_{i=1}^T (\sum_{k=0}^{|\mathcal{R}|}\alpha_{j,i,k} A_{B_k})\} \tag {3.5}$

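Note that $\Omega_H^L$ now has $LT(|\mathcal{R}|+1)$ parameters, and that is for a single relation as the rule head. Learning rules for all relations takes $LT(|\mathcal{R}|+1) \cdot |\mathcal{R}|$ parameters, i.e. $\mathcal{O}( |\mathcal{R}|^2)$, which is still very large.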
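
To get a feel for the parameter counts discussed in this section, here is a short arithmetic sketch. It plugs in the 1,345 relations of FB15K from the dataset table earlier in these notes; the rule length $T=3$ and rank $L=3$ are values I picked for illustration, not settings reported by the paper.

```python
num_relations = 1345   # |R| for FB15K, from the dataset table above
T = 3                  # maximum rule length (hypothetical choice)
L = 3                  # rank of the approximation (hypothetical choice)

naive = num_relations ** T                            # one alpha per rule body, Eq. (3.2)
factored = T * num_relations                          # Eq. (3.3)
with_identity = T * (num_relations + 1)               # Eq. (3.4)
low_rank_one_head = L * T * (num_relations + 1)       # Eq. (3.5), one head relation
low_rank_all_heads = low_rank_one_head * num_relations

print(f"naive:              {naive:,}")
print(f"factored:           {factored:,}")
print(f"with identity:      {with_identity:,}")
print(f"low rank, one head: {low_rank_one_head:,}")
print(f"low rank, all:      {low_rank_all_heads:,}")
```

Even with small $T$ and $L$, the count over all head relations grows quadratically in $|\mathcal{R}|$, which is the motivation given next for sharing parameters.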

A more important issue is that the rules learned by optimizing $\Omega_H^L$ are independent of one another: learning one rule does not help in learning another. An RNN is introduced to address this. (Is it that sharing parameters through the RNN makes them no longer independent?)
$\begin{array}{l}{\mathbf{h}_{i}^{(j)}, \mathbf{h}_{T-i+1}^{\prime(j)}=\mathbf{BiRNN}_{j}\left(\mathbf{e}_{H}, \mathbf{h}_{i-1}^{(j)}, \mathbf{h}_{T-i}^{\prime(j)}\right)} \\ {\left[a_{j, i, 1}, \cdots, a_{j, i,|\mathcal{R}|+1}\right]=f_{\theta}\left(\left[\mathbf{h}_{i}^{(j)}, \mathbf{h}_{T-i+1}^{\prime(j)}\right]\right)}\end{array}\tag{3.6}$
$f_{\theta}$ is a fully connected layer, and the hidden states $h$ and $h'$ are zero-initialized.

Experiments

1) Statistical Relation Learning

Datasets:
$\begin{array}{cccc}\hline & {\#\text { Triplets }} & {\# \text { Relations }} & {\# \text { Entities }} \\ \hline \text { Family } & {28356} & {12} & {3007} \\ {\text { UMLS }} & {5960} & {46} & {135} \\ {\text { Kinship }} & {9587} & {25} & {104} \\ \hline\end{array}$

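Three datasets are used: 1) Family contains bloodline relationships between individuals of multiple families; 2) the Unified Medical Language System (UMLS) consists of biomedical concepts (e.g. drug and disease names) and relations between them (e.g. diagnosis and treatment); 3) Kinship contains kinship relations among members of an indigenous tribe in central Australia.

Results:

2) Knowledge Graph Completion: link prediction task

Datasets:

Results 1:

Results 2:
Table 5 gives the results in the inductive setting. In inductive link prediction the entities in the test set and the training set are disjoint, so a new entity has no embedding, and embedding-based methods degrade sharply.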

3) Quality and Interpretability of the Rules

Human evaluation on the Family dataset. Rules in red are incorrect.

joshuwang0810
