Introduction to WordNet and How to Use It


2020-02-21 12:31

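WordNet is a dictionary. A word may have several different meanings, each corresponding to a different sense. Conversely, each sense may be expressed by several words; for example, topic and subject are synonymous in some contexts. The disambiguated words that belong to a single sense are called lemmas. For example, "publish" is a word, and it has several senses: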

1. (39) print, publish -- (put into print; "The newspaper published the news of the royal couple's divorce"; "These news should not be printed")

2. (14) publish, bring out, put out, issue, release -- (prepare and issue for public distribution or sale; "publish a magazine or newspaper")

3. (4) publish, write -- (have (one's written work) issued for publication; "How many books did Georges Simenon write?"; "She published 25 books during her long career")

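In the first sense, print and publish are both lemmas. The number 39 in parentheses for sense 1 is the number of times publish occurs with sense 1 in an external corpus. Evidently, publish is used in sense 1 most of the time and only rarely in sense 3.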
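The relationship between a word, its senses, and their lemmas can be sketched as a small data structure. The snippet below is only an illustration: it hard-codes the three senses of "publish" listed above, and the `Sense` class and `most_frequent_sense` helper are names made up for this sketch, not part of WordNet or NLTK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One sense of a word: corpus count, lemmas, and a short gloss."""
    count: int
    lemmas: tuple
    gloss: str


# The three senses of "publish" shown above, copied by hand.
PUBLISH_SENSES = (
    Sense(39, ("print", "publish"), "put into print"),
    Sense(14, ("publish", "bring out", "put out", "issue", "release"),
          "prepare and issue for public distribution or sale"),
    Sense(4, ("publish", "write"),
          "have (one's written work) issued for publication"),
)


def most_frequent_sense(senses):
    """Return (index, share) of the sense with the largest corpus count."""
    total = sum(s.count for s in senses)
    best = max(range(len(senses)), key=lambda i: senses[i].count)
    return best, senses[best].count / total


best_index, share = most_frequent_sense(PUBLISH_SENSES)
print(f"sense {best_index + 1} covers {share:.0%} of the counted occurrences")
```

Every sense in the tuple contains the lemma "publish", which is what makes all three of them senses of that word.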

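Using WordNet in practice

NLTK is a natural language processing toolkit for Python that provides functions for accessing WordNet's features. Some commonly used ones are listed below:

Import WordNet itself (the WordNet data must be downloaded once beforehand, for example with nltk.download('wordnet')):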

from nltk.corpus import wordnet

Get all senses of a word, including the senses of its other word forms:

wordnet.synsets('published')

[Synset('print.v.01'),

Synset('publish.v.02'),

Synset('publish.v.03'),

Synset('published.a.01'),

Synset('promulgated.s.01')]
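Each synset name in the output above follows the pattern headword.POS.number, so the result can be grouped by part of speech. The sketch below shows one way to do that. To stay runnable without the WordNet data it uses `StubSynset`, a hypothetical stand-in that mimics only the `name()` and `pos()` methods of an NLTK synset; `group_by_pos` is likewise a helper invented for this example.

```python
from collections import defaultdict


class StubSynset:
    """Hypothetical stand-in for an NLTK Synset (name() and pos() only)."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name

    def pos(self):
        # Names look like 'print.v.01': headword, POS tag, sense number.
        return self._name.rsplit(".", 2)[1]


def group_by_pos(synsets):
    """Group synset names by their part-of-speech tag."""
    groups = defaultdict(list)
    for synset in synsets:
        groups[synset.pos()].append(synset.name())
    return dict(groups)


# The five synsets returned for 'published' in the listing above.
listed = [StubSynset(n) for n in (
    "print.v.01", "publish.v.02", "publish.v.03",
    "published.a.01", "promulgated.s.01",
)]
grouped = group_by_pos(listed)
print(grouped)
```

With NLTK 3 and the WordNet data installed, group_by_pos(wordnet.synsets('published')) should behave the same way, since real synsets expose name() and pos() as methods. The tags are 'n' (noun), 'v' (verb), 'a' (adjective), 's' (adjective satellite), and 'r' (adverb).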

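Get the part of speech of a synset (in NLTK 3, pos is a method):

>>> related = wordnet.synsets('published')[-1]

>>> related.pos()

's'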

Get all lemmas of a sense (in NLTK 3, lemmas is a method):

>>> wordnet.synsets('publish')[0].lemmas()

[Lemma('print.v.01.print'), Lemma('print.v.01.publish')]

Get the number of occurrences of a lemma:

>>> wordnet.synsets('publish')[0].lemmas()[1].count()

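In WordNet, nouns and verbs are organized into complete hierarchical taxonomies, so the distance between two senses in the taxonomy tree can be computed. This distance reflects their semantic similarity: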

>>> x = wordnet.synsets('recommended')[-1]

>>> y = wordnet.synsets('suggested')[-1]

>>> x.shortest_path_distance(y)
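To make the idea of distance in a taxonomy concrete, here is a minimal sketch that computes it on a toy tree. The `PARENT` table is invented for illustration and is not WordNet's real hierarchy. The sketch also assumes each node has a single parent, whereas a WordNet synset can have several hypernyms. The `path_similarity` helper uses 1 / (distance + 1), which to my understanding is the same formula behind NLTK's method of that name.

```python
# Hypothetical child -> parent links; NOT WordNet's real hierarchy.
PARENT = {
    "dog": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def depths_to_ancestors(node):
    """Map node and each of its ancestors to the number of edges from node."""
    depths, steps = {}, 0
    while node is not None:
        depths[node] = steps
        node = PARENT.get(node)
        steps += 1
    return depths


def shortest_path_distance(a, b):
    """Edges on the shortest path from a to b through a common ancestor."""
    up_a, up_b = depths_to_ancestors(a), depths_to_ancestors(b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None  # the two nodes share no ancestor
    return min(up_a[n] + up_b[n] for n in common)


def path_similarity(a, b):
    """Turn a distance into a score in (0, 1]; identical nodes score 1.0."""
    d = shortest_path_distance(a, b)
    return None if d is None else 1 / (d + 1)


print(shortest_path_distance("dog", "cat"))  # meet at "animal"
print(shortest_path_distance("dog", "car"))  # meet only at "entity"
print(path_similarity("dog", "cat"))
```

A smaller distance means the two nodes meet lower in the tree, which is why "dog" comes out closer to "cat" than to "car" in this toy example.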

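Computing similarity for adjectives and adverbs:

Adjectives and adverbs are not organized into a taxonomy, so path distance cannot be used for them. Older NLTK releases return -1 in this case, as shown below; NLTK 3 returns None instead, which the interactive prompt does not display.

>>> a = wordnet.synsets('beautiful')[0]

>>> b = wordnet.synsets('good')[0]

>>> a.shortest_path_distance(b)

-1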

The most useful relation for adjectives and adverbs is "similar to".

>>> a = wordnet.synsets('glorious')[0]

>>> a.similar_tos()

[Synset('incandescent.s.02'),

Synset('divine.s.06'),

……]
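Because "similar to" links synsets rather than words, a common follow-up step is to collect the lemma names of the linked synsets. The sketch below shows that step under stated assumptions: `StubSynset` is a hypothetical stand-in mimicking the `similar_tos()` and `lemma_names()` methods of an NLTK 3 synset, the synset names and words in it are invented placeholders rather than WordNet content, and `similar_words` is a helper made up for this example.

```python
class StubSynset:
    """Hypothetical stand-in for an NLTK Synset, just enough for this sketch."""

    def __init__(self, name, lemma_names, similar=()):
        self._name = name
        self._lemma_names = list(lemma_names)
        self._similar = list(similar)

    def name(self):
        return self._name

    def lemma_names(self):
        return list(self._lemma_names)

    def similar_tos(self):
        return list(self._similar)


def similar_words(synset):
    """Sorted, de-duplicated lemma names of all 'similar to' synsets."""
    words = set()
    for other in synset.similar_tos():
        words.update(other.lemma_names())
    return sorted(words)


# Invented placeholder data; none of this is taken from WordNet.
near_one = StubSynset("example_b.s.01", ["alpha", "beta"])
near_two = StubSynset("example_c.s.01", ["beta", "gamma"])
center = StubSynset("example_a.a.01", ["omega"], similar=[near_one, near_two])

print(similar_words(center))
```

With NLTK 3 and the WordNet data installed, the same function should accept a real synset, for example similar_words(wordnet.synsets('glorious')[0]).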

ICTExtr9
