[Paper Notes] On How to Perform a Gold Standard Based Evaluation of Ontology Learning

Original post

2020-06-17 08:22

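Overview

This paper is from ISWC 2006, a CCF class-B conference. It is the first paper from this venue that I have read.
The paper argues that the evaluation of concept hierarchies in ontology learning is still badly underdeveloped. Its main contribution is therefore a new taxonomic measure (taxonomic here refers to the hierarchy structure).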

Introduction

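There are currently three main ways to evaluate:

1 Validation inside an application (presumably by measuring a downstream application)

2 Validation by experts

3 Validation against a predefined gold standard (this is of course our choice!)

This paper focuses on evaluating the concept hierarchy.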

Related Work

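Author's note: I am reading the related work as well because, having failed to find the evaluation method I want in hierarchical clustering, I expect to find it in taxonomy construction evaluation. I have not studied that area of evaluation yet, so it deserves close attention.

Evaluation of the lexical ontology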

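Loosely speaking, an ontology is a set of concepts, i.e. items.
It is typically used for inference. Take the entities $X, Y, Z$:
$X$ is the teacher of $Y$ and $Z$
Once these two triples are formed, a further relation can be derived:
$Y$ is a classmate of $Z$
An ontology is a piece of item information; everything on a wiki is an ontology.

The comparison is usually binary, i.e. the gold standard is compared against the learned ontology or taxonomy.
For the ontology comparison, the measures are: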

Term Precision, Recall

Lexical Precision, Recall

simply Precision, Recall

String matching: evaluates two taxonomies based on edit distance
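A minimal sketch of the edit-distance idea, assuming plain Levenshtein distance between two concept labels. The function name `edit_distance` and the example labels are illustrative and not taken from the paper; how the per-label distances are aggregated into a score for a whole taxonomy is not covered in these notes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


d = edit_distance("taxonomy", "taxonomic")  # substitute y->i, insert c
```

Two labels with a small distance can then be treated as the same term when matching a learned lexicon against the gold standard.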

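Author's note: comparing ontologies is very simple, so I will not study it here. Comparing two sets of words cannot get very complicated, and it is not the focus of this paper.
The focus is the taxonomy, i.e. the evaluation of the structure.

Comparing concept hierarchies is more complex than comparing ontologies

local measures compare the positions of concepts between two hierarchies

global measures average the local measure results over all concept pairs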

$Taxonomic$ $Overlap$
Paper: [6], [7], [8] to read
6: Ontology Learning for the Semantic Web. 2002
7: Measuring similarity between ontologies. 2002 EKAW
8: Learning concept hierarchies from text corpora using formal concept analysis. 2005 JAIR

$Augmented$ $Precision$ $and$ $Recall$

$Learning$ $Accuracy$ $LA$ [10]: compares two concepts by their distance in the tree, using the path length from the root and the path length to their most specific common abstraction.

$Balanced Distance Metric$ $BDM$ [9]

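$OntoRand$ index [11] is a symmetric measure that compares clustering results (and it requires both concept hierarchies to contain the same set of instances)

based on common ancestors

based on the distance between concepts in the tree.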
Paper: [9, 10, 11] to read.
9: Metrics for evaluation of ontology-based information extraction. 2006 EON Workshop
10: Towards text knowledge engineering. 1998 AAAI
11: Methods for ontology evaluation. 2004 Knowledge Web

Criteria for Good Evaluation Measures

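This section describes the criteria the authors consider most important for an evaluation metric for ontology learning.
In other words, these are the criteria for judging evaluation measures themselves.

1 The evaluation of the ontology (the lexical layer) and the evaluation of the concept hierarchy should be separable

2 The effect of an error should be proportional to its severity. For example, an error at the root node matters more than an error at a leaf node
Author's note: this makes sense, because it means the evaluation metric reflects importance. Nodes near the root matter more.

3 A gradual increase in error should lead to a gradual decrease in the measured score. If a slight error made the score drop sharply, it would be hard to tell small errors from serious ones

Many of the measures from the related work above fail to satisfy these criteria...

The authors then propose their own measure, which they say satisfies all of them.
Fine...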

Comparing Learned Ontologies with Gold Standards

Definition 1.
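$O:= (C, root, \le_c)$ is a core ontology, where $C$ is a set of concept identifiers, $root$ is the root node, and the partial order $\le_c$ on $C$ defines the taxonomy. These three elements make up the core ontology that the learning problem targets.

$Ref$ is the reference, i.e. the gold standard hierarchy
$Comp$ is the computed (learned) hierarchy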
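Definition 1 can be pictured as a small data structure. The sketch below is only one possible encoding, under the assumption that the taxonomy is a tree in which every non-root concept has exactly one direct superconcept (a general partial order would also allow several). The class name `CoreOntology` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreOntology:
    root: str
    parent: dict  # maps each non-root concept to its direct superconcept

    @property
    def concepts(self) -> set:
        """The concept set C, including the root."""
        return {self.root} | set(self.parent) | set(self.parent.values())

    def leq(self, a: str, b: str) -> bool:
        """True if a <= b, i.e. b is a itself or one of a's ancestors."""
        node = a
        while node is not None:
            if node == b:
                return True
            node = self.parent.get(node)  # None once we step past the root
        return False


ref = CoreOntology("root", {"animal": "root", "dog": "animal", "cat": "animal"})
```

With this encoding, `ref.leq("dog", "root")` holds because the root is the greatest element of the order, while `ref.leq("animal", "dog")` does not.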

Precision & Recall

$P$, $R$ and $F1$ are simple enough that I will not go over them here

Lexical Precision & Recall

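$O_C$ is the core ontology of the computed result.
$O_R$ is the core ontology of the gold standard.
$LP(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_C|}$
$LR(O_C, O_R) = \frac{|C_C \cap C_R|}{|C_R|}$

Lexical precision and recall ignore structure. They only check whether the concepts in the learned ontology are correct, and whether all gold standard concepts were found. I will not elaborate further.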
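A minimal sketch of lexical precision and recall, assuming each ontology's lexical layer is given as a plain set of concept labels. The function names and the example concept sets are made up for illustration.

```python
def lexical_precision(computed, reference):
    """LP = |C_C ∩ C_R| / |C_C|: share of learned concepts that are correct."""
    computed, reference = set(computed), set(reference)
    return len(computed & reference) / len(computed) if computed else 0.0


def lexical_recall(computed, reference):
    """LR = |C_C ∩ C_R| / |C_R|: share of gold concepts that were found."""
    computed, reference = set(computed), set(reference)
    return len(computed & reference) / len(reference) if reference else 0.0


c_c = {"root", "animal", "dog", "car"}            # learned concepts
c_r = {"root", "animal", "dog", "cat", "plant"}   # gold standard concepts
lp = lexical_precision(c_c, c_r)  # 3 shared out of 4 learned
lr = lexical_recall(c_c, c_r)     # 3 shared out of 5 gold
```

Returning 0.0 for an empty set is a choice made here to avoid division by zero, not something the notes specify.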

Taxonomic Precision & Recall

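Only the computation of taxonomic precision is given here, since recall and F1 follow from it easily.

How similar two concepts are is determined by their characteristics, i.e. their positions in the hierarchy.
Position, in turn, is judged by the objects the two concepts have in common. This requires a characteristic extract $ce$ that describes where a concept sits.

Two concepts from two hierarchies are then compared as follows:
local taxonomic precision $tp_{ce}$ of concept $c_1 \in O_C$ and $c_2 \in O_R$

$tp_{ce}(c_1, c_2, O_C, O_R):= \frac{|ce(c_1, O_C) \cap ce(c_2,O_R)|}{|ce(c_1, O_C)|}$
The characteristic extract is a very important building block.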

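Reference [7] characterizes a concept by its semantic cotopy, i.e. all of its super- and sub-concepts (plus the concept itself, since $\le$ is reflexive).
Given a concept $c \in C$
$sc(c, O):= \{c_i \mid c_i \in C \text{ and } (c_i \le c \text{ or } c \le c_i)\}$

This is obviously unreasonable: if one concept is missing, every concept above it is affected, so the same error is counted repeatedly.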
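The following sketch computes the semantic cotopy and the local taxonomic precision on a toy example. It assumes each hierarchy is a tree stored as a `dict` from concept to direct superconcept, with the root mapped to `None`; the helper names and the toy concepts are illustrative only. The learned hierarchy contains one extra concept, `mammal`, to show how a single lexical mismatch lowers the local score of every related concept.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, walking up to the root."""
    out, node = set(), parent.get(c)
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out


def semantic_cotopy(c, parent):
    """sc(c, O): c itself plus all of its super- and sub-concepts."""
    subs = {x for x in parent if c in ancestors(x, parent)}
    return {c} | ancestors(c, parent) | subs


def local_tp(c1, c2, parent_c, parent_r, ce=semantic_cotopy):
    """tp_ce(c1, c2, O_C, O_R) = |ce(c1,O_C) ∩ ce(c2,O_R)| / |ce(c1,O_C)|."""
    e1, e2 = ce(c1, parent_c), ce(c2, parent_r)
    return len(e1 & e2) / len(e1)


# Learned hierarchy: an extra "mammal" level that the gold standard lacks.
comp = {"root": None, "animal": "root", "mammal": "animal",
        "dog": "mammal", "cat": "mammal"}
ref = {"root": None, "animal": "root", "dog": "animal", "cat": "animal"}

tp_dog = local_tp("dog", "dog", comp, ref)     # 3 of 4 cotopy members match
tp_root = local_tp("root", "root", comp, ref)  # 4 of 5 match
```

Here the one extra concept reduces the local precision of `dog`, `cat`, `animal` and `root` alike, which is the repeated penalty the note above complains about.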

Improved version: the common semantic cotopy $csc$

$csc(c, O_1, O_2) := \{c_i \mid c_i \in C_1 \cap C_2 \text{ and } (c_i <_1 c \text{ or } c <_1 c_i)\}$

The biggest improvement here is that only concepts present in both ontologies are compared.
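A sketch of the common semantic cotopy under the same assumptions as before: each hierarchy is a tree stored as a `dict` from concept to direct superconcept, with the root mapped to `None`. All names are illustrative. The helper `local_tp_csc` assumes the extract for the learned side is $csc(c, O_C, O_R)$ and for the gold side $csc(c, O_R, O_C)$, and returning 0.0 for an empty extract is a choice made here, not something stated in these notes.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, walking up to the root."""
    out, node = set(), parent.get(c)
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out


def common_semantic_cotopy(c, parent_1, parent_2):
    """csc(c, O_1, O_2): strict super- and sub-concepts of c in O_1,
    restricted to concepts that occur in both ontologies."""
    common = set(parent_1) & set(parent_2)
    subs = {x for x in parent_1 if c in ancestors(x, parent_1)}
    return (ancestors(c, parent_1) | subs) & common


def local_tp_csc(c, parent_c, parent_r):
    """Local taxonomic precision using csc as the characteristic extract."""
    e_c = common_semantic_cotopy(c, parent_c, parent_r)
    e_r = common_semantic_cotopy(c, parent_r, parent_c)
    return len(e_c & e_r) / len(e_c) if e_c else 0.0  # 0.0 is an assumption


# Learned hierarchy with an extra "mammal" level missing from the gold one.
comp = {"root": None, "animal": "root", "mammal": "animal",
        "dog": "mammal", "cat": "mammal"}
ref = {"root": None, "animal": "root", "dog": "animal", "cat": "animal"}

csc_dog = common_semantic_cotopy("dog", comp, ref)  # "mammal" is filtered out
tp_dog = local_tp_csc("dog", comp, ref)
```

Because `mammal` is not shared, it no longer counts against `dog`: the purely lexical error stays out of the taxonomic score.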

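The final score is the average of the local scores.

I suggest using the sc version. The final score is computed as follows: a learned concept (a cluster label) that does not exist in the gold standard hierarchy scores 0 outright, which pulls down the final average.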

$T P_{s c}\left(\mathcal{O}_{C}, \mathcal{O}_{R}\right):=\frac{1}{\left|\mathcal{C}_{C}\right|} \sum_{c \in \mathcal{C}_{C}}\left\{\begin{array}{ll} {t p_{s c}\left(c, c, \mathcal{O}_{C}, \mathcal{O}_{R}\right)} & {\text { if } c \in \mathcal{C}_{R}} \\ {0} & {\text { if } c \notin \mathcal{C}_{R}} \end{array}\right.$

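This is the computation of taxonomic precision. The formula for taxonomic recall is not listed, as it is just as simple.
F1 is their harmonic mean: $TF = \frac{2 \cdot TP \cdot TR}{TP + TR}$.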
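The global scores can be sketched as below. The code assumes tree-shaped hierarchies stored as a `dict` from concept to direct superconcept (root mapped to `None`), and it assumes that taxonomic recall is taxonomic precision with the two ontologies swapped. Function names and the toy hierarchies are illustrative.

```python
def ancestors(c, parent):
    """All strict superconcepts of c, walking up to the root."""
    out, node = set(), parent.get(c)
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out


def semantic_cotopy(c, parent):
    """sc(c, O): c itself plus all of its super- and sub-concepts."""
    subs = {x for x in parent if c in ancestors(x, parent)}
    return {c} | ancestors(c, parent) | subs


def taxonomic_precision(parent_c, parent_r):
    """TP_sc: average local precision over all learned concepts;
    a concept absent from the gold standard contributes 0."""
    total = 0.0
    for c in parent_c:
        if c in parent_r:
            e_c, e_r = semantic_cotopy(c, parent_c), semantic_cotopy(c, parent_r)
            total += len(e_c & e_r) / len(e_c)
    return total / len(parent_c)


def taxonomic_recall(parent_c, parent_r):
    """Assumed here: recall is precision with the roles swapped."""
    return taxonomic_precision(parent_r, parent_c)


def taxonomic_f(parent_c, parent_r):
    """Harmonic mean of taxonomic precision and recall."""
    tp, tr = taxonomic_precision(parent_c, parent_r), taxonomic_recall(parent_c, parent_r)
    return 2 * tp * tr / (tp + tr) if tp + tr else 0.0


ref = {"root": None, "animal": "root", "dog": "animal",
       "cat": "animal", "plant": "root"}
# Learned hierarchy: "cat" was wrongly placed under "dog".
comp = {"root": None, "animal": "root", "dog": "animal",
        "cat": "dog", "plant": "root"}

tp = taxonomic_precision(comp, ref)
tr = taxonomic_recall(comp, ref)
tf = taxonomic_f(comp, ref)
```

In this toy case only `dog` and `cat` lose precision (3 of 4 cotopy members match for each), so the misplacement lowers the score gradually rather than all at once.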

There is one more metric for the similarity between two taxonomies, called Taxonomic Overlap

$t o_{s c}\left(c_{1}, c_{2}, \mathcal{O}_{1}, \mathcal{O}_{2}\right):=\frac{\left|s c\left(c_{1}, \mathcal{O}_{1}\right) \cap s c\left(c_{2}, \mathcal{O}_{2}\right)\right|}{\left|s c\left(c_{1}, \mathcal{O}_{1}\right) \cup s c\left(c_{2}, \mathcal{O}_{2}\right)\right|}$

This is the overlap between two individual concepts. The overall TO similarity is derived from it.
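A small sketch of the local taxonomic overlap, assuming the two semantic cotopies have already been computed and are passed in as Python sets. The function name and the example sets are illustrative.

```python
def local_overlap(extract_1, extract_2):
    """to = |A ∩ B| / |A ∪ B| for two characteristic extracts A and B."""
    union = extract_1 | extract_2
    return len(extract_1 & extract_2) / len(union) if union else 0.0


# Cotopies of "dog" in a learned and a gold hierarchy (toy values).
sc_learned = {"dog", "animal", "root", "cat"}
sc_gold = {"dog", "animal", "root"}
to_dog = local_overlap(sc_learned, sc_gold)  # 3 shared out of 4 in the union
```

Unlike local precision, this quantity is symmetric: swapping the two extracts gives the same value.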

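It can be shown that
$TO=\frac{TF}{2-TF}$

Since $\frac{TF}{2-TF}$ increases monotonically with $TF$ on $[0, 1]$, it is enough to optimize TF.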
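The identity is easy to check for a single concept pair. Below is a short derivation sketch, using the shorthand $A = sc(c_1, O_1)$ and $B = sc(c_2, O_2)$ (introduced here, not the paper's notation) and assuming the local F-measure is $tf = \frac{2 \cdot tp \cdot tr}{tp + tr}$ with $|A \cap B| > 0$:

$tp = \frac{|A \cap B|}{|A|}, \quad tr = \frac{|A \cap B|}{|B|} \quad \Rightarrow \quad tf = \frac{2|A \cap B|}{|A| + |B|}$

$\frac{tf}{2 - tf} = \frac{2|A \cap B|}{2(|A| + |B|) - 2|A \cap B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = \frac{|A \cap B|}{|A \cup B|} = to$

The last step uses $|A \cup B| = |A| + |B| - |A \cap B|$. This holds exactly per concept pair. Whether it carries over exactly to the globally averaged scores depends on how those averages are defined, which these notes do not spell out.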

!!! I have an idea now, nice!
