[Paper Notes] On How to Perform a Gold Standard Based Evaluation of Ontology Learning

Original

2020-06-17 08:22

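Overview

The paper is from ISWC 2006, a CCF rank-B conference. This is my first time reading a paper from this venue.
The paper argues that, in current ontology learning, evaluation of the concept hierarchy is still badly lacking. Its main contribution is therefore a new taxonomic measure (taxonomic refers to the hierarchy structure).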

Introduction

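There are currently three main ways to evaluate:

1 Validation in an application (presumably through a downstream application)

2 Validation by experts

3 Validation against a predefined gold standard (this is of course our choice!)

This paper focuses on evaluating the concept hierarchy.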

Related Work

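My note: I am also reading the related work because, since I could not find the evaluation I wanted in hierarchical clustering, I expect to find it in taxonomy construction evaluation. I have not studied this kind of evaluation yet, so it deserves close attention.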

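Evaluation of the lexical ontology

An ontology describes concepts (items) and the relations between them.
It is typically used for inference. For example, take the entities $X, Y, Z$:
$X$ is the teacher of $Y$ and $Z$.
This gives two triples, from which a further relation can be derived:
$Y$ is a classmate of $Z$.
An ontology is essentially item-level information; everything on a wiki can be seen as ontology content.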
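
The teacher/classmate inference above can be sketched in a few lines of Python. This is only an illustration: the predicate names `teacher_of` and `classmate_of` and the rule itself are assumptions made up for the example, not something taken from the paper.

```python
# Two triples (subject, predicate, object): X teaches Y and X teaches Z.
triples = {("X", "teacher_of", "Y"), ("X", "teacher_of", "Z")}


def infer_classmates(facts):
    """Hypothetical rule: two different students of the same teacher are classmates."""
    inferred = set()
    for teacher_1, pred_1, student_1 in facts:
        for teacher_2, pred_2, student_2 in facts:
            same_teacher = pred_1 == pred_2 == "teacher_of" and teacher_1 == teacher_2
            if same_teacher and student_1 != student_2:
                inferred.add((student_1, "classmate_of", student_2))
    return inferred


print(sorted(infer_classmates(triples)))
# [('Y', 'classmate_of', 'Z'), ('Z', 'classmate_of', 'Y')]
```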

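The comparison is usually binary, i.e. the gold standard is compared with the learned ontology or taxonomy.
For the ontology (lexical) comparison, the measures are:

Term Precision, Recall

Lexical Precision, Recall

simply Precision, Recall

String matching, which uses edit distance to evaluate two taxonomies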
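
As a rough illustration of the edit-distance idea behind string matching, here is a minimal Levenshtein distance in Python. It is a generic textbook sketch, not the exact matching procedure of any cited paper; how the distance is normalised or thresholded when comparing concept labels is left open here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("vehicle", "vehicles"))  # 1
print(levenshtein("kitten", "sitting"))    # 3
```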

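My note: comparing the ontology (lexical) level is very simple, so I will not dig into it here. Comparing two sets of words cannot get very complicated, and it is not the focus of this paper.
The focus is the taxonomy, i.e. the evaluation of the structure.

Comparing concept hierarchies is more complex than comparing the lexical level

local measures compare the positions of concepts in the two hierarchies

global measures average the local measure results over all concept pairs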

$Taxonomic$ $Overlap$
Papers to read: [6], [7], [8]
[6] Ontology Learning for the Semantic Web, 2002
[7] Measuring Similarity between Ontologies, EKAW 2002
[8] Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis, JAIR 2005

$Augmented$ $Precision$ $and$ $Recall$

$Learning Accuracy$ (LA) [10] compares distances in the tree: the path length to the root and the path length to the most specific common abstraction of the two concepts.

$Balanced Distance Metric$ $BDM$ [9]

$OntoRand$ index [11] is a symmetric measure that compares the hierarchies as clustering results (and requires both concept hierarchies to contain the same set of instances)

based on common ancestors

based on the distance between concepts in the tree.
Papers to read: [9], [10], [11]
[9] Metrics for Evaluation of Ontology-based Information Extraction, EON Workshop 2006
[10] Towards Text Knowledge Engineering, AAAI 1998
[11] Methods for Ontology Evaluation, Knowledge Web 2004

Criteria for Good Evaluation Measures

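This section describes the criteria the authors consider most important for an evaluation metric in ontology learning.
In other words, these are the metrics for judging evaluation metrics.

1 Evaluation of the lexical level and evaluation of the concept hierarchy should be possible independently of each other

2 The effect of an error should be proportional. For example, a mistake at the root is worse than a mistake at a leaf
My note: this makes sense, because it gives the evaluation metric a notion of importance. Nodes near the root matter more.

3 A gradual increase in error should lead to a gradual decrease in the measured value. If a slight error made the score drop sharply, small errors would be hard to tell apart from serious ones

Many of the measures in the related work above fail to satisfy these criteria...

The authors then propose their own measure and say it satisfies all of them.
Fine...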

Comparing Learned Ontologies with Gold Standards

Definition 1.
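$O := (C, root, \le_C)$ is a core ontology, where $C$ is a set of concept identifiers, $root$ is the root node, and the partial order $\le_C$ on $C$ defines the taxonomy. These three elements make up the core ontology learning problem.

$Ref$ is the reference, i.e. the gold standard hierarchy
$Comp$ is the computed hierarchy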

Precision & Recall

$P$, $R$ and $F1$ are simple enough that I will not go over them here.

Lexical Precision & Recall

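$O_C$ is the core ontology of the computed result.
$O_R$ is the core ontology of the gold standard.
$LP(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_C|}$
$LR(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_R|}$

Lexical precision and recall ignore structure. They only check whether the concepts in the learned ontology are correct and whether all gold standard concepts are covered. I will not go into more detail here.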
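
A minimal Python sketch of the two formulas above, treating $C_C$ and $C_R$ as plain sets of concept names. The example concepts are made up for illustration, and returning 0.0 for an empty set is my own convention, not something the notes specify.

```python
def lexical_precision(computed: set, reference: set) -> float:
    """LP = |C_C ∩ C_R| / |C_C|."""
    return len(computed & reference) / len(computed) if computed else 0.0


def lexical_recall(computed: set, reference: set) -> float:
    """LR = |C_C ∩ C_R| / |C_R|."""
    return len(computed & reference) / len(reference) if reference else 0.0


# Hypothetical concept sets: the learned ontology misses "cat" and "vehicle"
# and contains one concept ("bike") that is not in the gold standard.
computed_concepts = {"root", "animal", "dog", "bike"}
reference_concepts = {"root", "animal", "dog", "cat", "vehicle"}

print(lexical_precision(computed_concepts, reference_concepts))  # 0.75
print(lexical_recall(computed_concepts, reference_concepts))     # 0.6
```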

Taxonomic Precision & Recall

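Only taxonomic precision is worked out here. Recall and F1 follow easily, so I skip them.

How similar two concepts are is determined by their characteristics, i.e. where they sit in the hierarchy.
That position is captured by a characteristic extract $ce$, a set of objects that characterizes the concept. The more objects two extracts share, the closer the two positions.

The comparison of two concepts from two hierarchies is then as follows:
local taxonomic precision $tp_{ce}$ of concepts $c_1 \in O_C$ and $c_2 \in O_R$

$tp_{ce}(c_1, c_2, O_C, O_R):= \frac{|ce(c_1, O_C) \cap ce(c_2,O_R)|}{|ce(c_1, O_C)|}$
The characteristic extract is a very important building block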

Reference [7] characterizes a concept by its semantic cotopy, i.e. all of its super- and sub-concepts (including the concept itself, since $\le$ is reflexive).
Given a concept $c \in C$:
$sc(c, O):= \{c_i \mid c_i \in C \land (c_i \le c \lor c \le c_i)\}$
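
The semantic cotopy and the local taxonomic precision built on it can be sketched as follows. The sketch assumes the taxonomy is a tree stored as a child-to-parent dict (a general partial order would need a different representation), and both example hierarchies are made up for illustration.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, found by walking up the parent links."""
    found = set()
    while parent[c] is not None:
        c = parent[c]
        found.add(c)
    return found


def semantic_cotopy(c, parent):
    """sc(c, O): c itself plus all of its super- and sub-concepts."""
    descendants = {d for d in parent if c in ancestors(d, parent)}
    return {c} | ancestors(c, parent) | descendants


def local_tp(c1, c2, parent_c, parent_r):
    """tp_sc(c1, c2, O_C, O_R), using sc as the characteristic extract."""
    extract_c = semantic_cotopy(c1, parent_c)
    extract_r = semantic_cotopy(c2, parent_r)
    return len(extract_c & extract_r) / len(extract_c)


# Hypothetical gold standard O_R (child -> parent, the root maps to None).
reference = {"root": None, "animal": "root", "dog": "animal", "cat": "animal",
             "vehicle": "root", "car": "vehicle"}
# Hypothetical learned hierarchy O_C: "car" is misplaced under "animal".
computed = {"root": None, "animal": "root", "dog": "animal", "car": "animal"}

print(sorted(semantic_cotopy("animal", computed)))  # ['animal', 'car', 'dog', 'root']
print(local_tp("animal", "animal", computed, reference))  # 0.75
print(local_tp("car", "car", computed, reference))  # 2 of car's 3 cotopy members match
```

Note how the single misplaced concept lowers the score of both `car` and its wrong parent `animal`.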

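This looks unreasonable at first glance: if one concept is missing, every one of its superconcepts is affected, so the same error is penalised repeatedly.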

Improved version: common semantic cotopy $csc$

$csc(c, O_1, O_2) := \{c_i \mid c_i \in C_1 \cap C_2 \land (c_i <_1 c \lor c <_1 c_i)\}$

The biggest improvement here is that only concepts present in both ontologies are compared.
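
A small sketch of the common semantic cotopy, again assuming a tree stored as a child-to-parent dict. The example hierarchies are hypothetical. Because the order in the definition is strict, the concept itself is not part of its own $csc$.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, found by walking up the parent links."""
    found = set()
    while parent[c] is not None:
        c = parent[c]
        found.add(c)
    return found


def common_semantic_cotopy(c, parent_1, parent_2):
    """csc(c, O_1, O_2): strict super-/sub-concepts of c in O_1 that also exist in O_2."""
    shared = set(parent_1) & set(parent_2)
    descendants = {d for d in parent_1 if c in ancestors(d, parent_1)}
    return (ancestors(c, parent_1) | descendants) & shared


# Hypothetical gold standard and a learned hierarchy that only lacks "cat".
reference = {"root": None, "animal": "root", "dog": "animal", "cat": "animal"}
computed = {"root": None, "animal": "root", "dog": "animal"}

# "cat" is dropped from the reference-side extract because O_C does not contain it,
# so a purely lexical miss no longer lowers the taxonomic score of "animal".
print(sorted(common_semantic_cotopy("animal", reference, computed)))  # ['dog', 'root']
print(sorted(common_semantic_cotopy("animal", computed, reference)))  # ['dog', 'root']
```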

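The final score is an average

I suggest using the sc version. The final score is computed as follows: a learned concept (cluster) that does not exist in the gold standard hierarchy scores 0, which pulls down the final average.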

$T P_{s c}\left(\mathcal{O}_{C}, \mathcal{O}_{R}\right):=\frac{1}{\left|\mathcal{C}_{C}\right|} \sum_{c \in \mathcal{C}_{C}}\left\{\begin{array}{ll} {t p_{s c}\left(c, c, \mathcal{O}_{C}, \mathcal{O}_{R}\right)} & {\text { if } c \in \mathcal{C}_{R}} \\ {0} & {\text { if } c \notin \mathcal{C}_{R}} \end{array}\right.$

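This part is the computation of Taxonomic Precision. I will not write out the formula for Taxonomic Recall, which is just as simple.
F1 is twice their product divided by their sum: $TF = \frac{2 \cdot TP \cdot TR}{TP + TR}$.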
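
The global scores can be sketched as below. Two assumptions are mine, since the notes do not write them out: the taxonomy is a tree stored as a child-to-parent dict, and taxonomic recall is computed as precision with the roles of the two ontologies swapped. The example hierarchies are hypothetical.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, found by walking up the parent links."""
    found = set()
    while parent[c] is not None:
        c = parent[c]
        found.add(c)
    return found


def semantic_cotopy(c, parent):
    """sc(c, O): c itself plus all of its super- and sub-concepts."""
    descendants = {d for d in parent if c in ancestors(d, parent)}
    return {c} | ancestors(c, parent) | descendants


def taxonomic_precision(parent_c, parent_r):
    """TP_sc: average local tp over C_C; concepts missing from C_R contribute 0."""
    total = 0.0
    for c in parent_c:
        if c in parent_r:
            sc_c = semantic_cotopy(c, parent_c)
            total += len(sc_c & semantic_cotopy(c, parent_r)) / len(sc_c)
    return total / len(parent_c)


def taxonomic_recall(parent_c, parent_r):
    """Assumption: recall is precision with the two ontologies swapped."""
    return taxonomic_precision(parent_r, parent_c)


def taxonomic_f1(parent_c, parent_r):
    """Harmonic mean of taxonomic precision and recall."""
    tp = taxonomic_precision(parent_c, parent_r)
    tr = taxonomic_recall(parent_c, parent_r)
    return 0.0 if tp + tr == 0 else 2 * tp * tr / (tp + tr)


# Hypothetical gold standard O_R and learned hierarchy O_C ("car" is misplaced).
reference = {"root": None, "animal": "root", "dog": "animal", "cat": "animal",
             "vehicle": "root", "car": "vehicle"}
computed = {"root": None, "animal": "root", "dog": "animal", "car": "animal"}

print(taxonomic_precision(computed, reference))
print(taxonomic_recall(computed, reference))
print(taxonomic_f1(computed, reference))
```

In this example precision is higher than recall, because `cat` and `vehicle` are missing from the learned hierarchy and each contributes 0 to the recall average.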

Another metric that measures the similarity between two taxonomies is Taxonomic Overlap

$t o_{s c}\left(c_{1}, c_{2}, \mathcal{O}_{1}, \mathcal{O}_{2}\right):=\frac{\left|s c\left(c_{1}, \mathcal{O}_{1}\right) \cap s c\left(c_{2}, \mathcal{O}_{2}\right)\right|}{\left|s c\left(c_{1}, \mathcal{O}_{1}\right) \cup s c\left(c_{2}, \mathcal{O}_{2}\right)\right|}$

This is the relation between two concepts. From this formula the overall TO similarity can be derived.
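
The local taxonomic overlap is the Jaccard index of the two semantic cotopies. A minimal sketch, assuming a tree stored as a child-to-parent dict and using made-up example hierarchies:

```python
def ancestors(c, parent):
    """All strict superconcepts of c, found by walking up the parent links."""
    found = set()
    while parent[c] is not None:
        c = parent[c]
        found.add(c)
    return found


def semantic_cotopy(c, parent):
    """sc(c, O): c itself plus all of its super- and sub-concepts."""
    descendants = {d for d in parent if c in ancestors(d, parent)}
    return {c} | ancestors(c, parent) | descendants


def local_overlap(c1, c2, parent_1, parent_2):
    """to_sc(c1, c2, O_1, O_2): shared cotopy members divided by all cotopy members."""
    sc_1 = semantic_cotopy(c1, parent_1)
    sc_2 = semantic_cotopy(c2, parent_2)
    return len(sc_1 & sc_2) / len(sc_1 | sc_2)


# Hypothetical hierarchies: "car" sits under "vehicle" in one and under "animal" in the other.
ontology_1 = {"root": None, "animal": "root", "dog": "animal", "car": "animal"}
ontology_2 = {"root": None, "animal": "root", "dog": "animal", "cat": "animal",
              "vehicle": "root", "car": "vehicle"}

print(local_overlap("animal", "animal", ontology_1, ontology_2))  # 0.6
print(local_overlap("dog", "dog", ontology_1, ontology_2))        # 1.0
```

Unlike the local precision, the overlap is symmetric: swapping the two ontologies gives the same value.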

It can be shown that
$TO=\frac{TF}{2-TF}$

TO increases monotonically with TF, so optimizing TF is enough.
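
For a single concept pair the identity is easy to check by hand. The derivation below is my own sketch, with $A = sc(c_1, O_1)$, $B = sc(c_2, O_2)$ and lower-case $tp$, $tr$, $tf$, $to$ denoting the local values. Whether it carries over unchanged to the averaged global scores depends on how those averages are defined, which these notes do not spell out.

$tp = \frac{|A \cap B|}{|A|}, \quad tr = \frac{|A \cap B|}{|B|}, \quad tf = \frac{2 \cdot tp \cdot tr}{tp + tr} = \frac{2|A \cap B|}{|A| + |B|}$

$to = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$

$\frac{tf}{2 - tf} = \frac{2|A \cap B|}{2(|A| + |B|) - 2|A \cap B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = to$

For example, $tp = tr = 0.75$ gives $tf = 0.75$ and $to = 0.75 / 1.25 = 0.6$.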

!!! I have an idea now, nice!
