BosonNLP API: Chinese Semantic Analysis

See also the Python SDK that wraps the BosonNLP HTTP API: http://bosonnlp-py.readthedocs.io/#bosonnlp

BosonNLP website: http://bosonnlp.com/
BosonNLP HTTP API documentation: http://docs.bosonnlp.com/index.html

from __future__ import print_function, unicode_literals
import json

import requests
from bosonnlp import BosonNLP

token = 'your Token'  # replace with your own API token

nlp = BosonNLP(token)  # equivalent to: nlp = BosonNLP('YOUR_API_TOKEN')

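Querying API Rate Limits

Free accounts are limited to a fixed number of API calls per day.

The quota can be raised by purchasing a paid plan.

Given how complete the Chinese text-analysis feature set is, it is commendable that free users get every feature, even with a daily cap.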

# This endpoint returns the details of the caller's BosonNLP API rate limits.
HEADERS = {'X-Token': token}  # use your own API token when testing
RATE_LIMIT_URL = 'http://api.bosonnlp.com/application/rate_limit_status.json'
result = requests.get(RATE_LIMIT_URL, headers=HEADERS).json()

result['limits'].keys()

dict_keys(['review', 'keywords', 'tag', 'classify', 'depparser', 'time', 'summary', 'ner', 'cluster', 'comments', 'suggest', 'sentiment'])

Example: check how many sentiment-analysis calls remain

result['limits']['sentiment'].keys()
# (['rate-limit-limit', 'rate-limit-remaining', 'rate-limit-reset', 'quota-limit', 'count-limit-reset', 'count-limit-limit', 'quota-remaining', 'count-limit-remaining'])
result['limits']['sentiment']['count-limit-remaining']  # remaining sentiment-analysis calls

As a function:

def sentiment_limit_remaining(): 
    result = requests.get(RATE_LIMIT_URL, headers=HEADERS).json()
    return result['limits']['sentiment']['count-limit-remaining']
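The same lookup can be generalized to any endpoint. The sketch below is a hypothetical helper (remaining_calls is not part of the SDK) that works on the already-parsed JSON, assuming the response keeps the 'limits' / 'count-limit-remaining' layout seen in the keys above. The sample payload is made up for illustration.

```python
def remaining_calls(status, endpoint):
    """Return the remaining call count for one endpoint, or None if absent.

    `status` is the parsed JSON from the rate-limit endpoint. The layout
    'limits' -> endpoint -> 'count-limit-remaining' is an assumption
    based on the keys printed earlier in this post.
    """
    limits = status.get('limits', {})
    return limits.get(endpoint, {}).get('count-limit-remaining')


# Made-up sample payload for illustration only, not real API output.
sample_status = {
    'limits': {
        'sentiment': {'count-limit-remaining': 480},
        'tag': {'count-limit-remaining': 0},
    }
}

print(remaining_calls(sample_status, 'sentiment'))
print(remaining_calls(sample_status, 'unknown'))
```

Returning None for a missing endpoint lets the caller tell "no data" apart from "zero calls left".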

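Sentiment Analysis

Core function: nlp.sentiment(data, model='general')

See: http://docs.bosonnlp.com/sentiment.html

The optional model parameter names a model trained on a corpus from a specific industry; it defaults to general.

Model	Industry	URL
general	General	http://api.bosonnlp.com/sentiment/analysis
auto	Automotive	http://api.bosonnlp.com/sentiment/analysis?auto
kitchen	Kitchenware	http://api.bosonnlp.com/sentiment/analysis?kitchen
food	Food and dining	http://api.bosonnlp.com/sentiment/analysis?food
news	News	http://api.bosonnlp.com/sentiment/analysis?news
weibo	Weibo	http://api.bosonnlp.com/sentiment/analysis?weibo

Return value:

Each result is a pair: the first value is the probability that the text is non-negative, the second the probability that it is negative. The two values sum to 1.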

nlp.sentiment(['這家味道還不錯', '菜品太少了而且還不新鮮'], model='weibo')

[[0.9694666780709835, 0.03053332192901642],
[0.07346999807197441, 0.9265300019280256]]

nlp.sentiment(['這家味道還不錯', '菜品太少了而且還不新鮮'], model='food')

[[0.9991737012037423, 0.0008262987962577828],
[9.940036427291687e-08, 0.9999999005996357]]
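To turn the probability pairs into labels, something like the following could work. This is a minimal sketch: label_sentiment and the 0.5 threshold are illustrative choices, not part of the SDK, and the input is the 'food' model output shown above.

```python
def label_sentiment(scores, threshold=0.5):
    """Map [non_negative, negative] probability pairs to text labels.

    The threshold is an illustrative assumption; a pair is labeled
    'negative' when its second value exceeds it.
    """
    labels = []
    for _non_negative, negative in scores:
        labels.append('negative' if negative > threshold else 'non-negative')
    return labels


# Output of the 'food' model copied from the example above.
food_scores = [[0.9991737012037423, 0.0008262987962577828],
               [9.940036427291687e-08, 0.9999999005996357]]

print(label_sentiment(food_scores))
```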

Alternatively, call the HTTP API directly and pass the token in the X-Token header

SENTIMENT_URL = 'http://api.bosonnlp.com/sentiment/analysis?weibo'  # Weibo model endpoint
headers = {'X-Token': token}  # use your own API token when testing
s = [' 他是個傻逼 ', ' 美好的世界 ']
data = json.dumps(s)  # serialize to JSON

The HTTP response body is a JSON list of [double, double] pairs.

resp = requests.post(SENTIMENT_URL, headers=headers, data=data.encode('utf-8'))  # upload data for analysis

resp.text  # show the sentiment scores

'[[0.4434637245024887, 0.5565362754975113], [0.9340287284701145, 0.06597127152988551]]'
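Since resp.text is a raw JSON string, it still has to be decoded before use. The sketch below shows one way to build the endpoint URL from a model name (following the URL table earlier in this section) and to parse the body. sentiment_url and parse_scores are hypothetical helper names, and the body string is the sample response shown above, so no network call is made.

```python
import json

SENTIMENT_BASE_URL = 'http://api.bosonnlp.com/sentiment/analysis'
# Model names taken from the table in this section.
KNOWN_MODELS = ('general', 'auto', 'kitchen', 'food', 'news', 'weibo')


def sentiment_url(model='general'):
    """Build the sentiment endpoint URL for a model name."""
    if model not in KNOWN_MODELS:
        raise ValueError('unknown model: %s' % model)
    if model == 'general':
        return SENTIMENT_BASE_URL
    return SENTIMENT_BASE_URL + '?' + model


def parse_scores(body_text):
    """Decode the JSON body into a list of (non_negative, negative) tuples."""
    return [tuple(pair) for pair in json.loads(body_text)]


# Sample response body copied from the example above.
body = ('[[0.4434637245024887, 0.5565362754975113], '
        '[0.9340287284701145, 0.06597127152988551]]')
scores = parse_scores(body)

print(sentiment_url('weibo'))
print(scores[0])
```

With a live response, resp.json() from requests would do the same decoding as json.loads(resp.text).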

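Word Segmentation and Part-of-Speech Tagging

Core function: nlp.tag(contents, space_mode=0, oov_level=3, t2s=0, special_char_conv=0)

Parameter reference: http://docs.bosonnlp.com/tag.html

POS tag reference: http://docs.bosonnlp.com/tag_rule.html

BosonNLP's part-of-speech tag set is very detailed: 22 major categories and 70 tags in total.

The segmentation and tagging system also offers several options to suit different developers' needs:

Whitespace preservation (space_mode)
New-word enumeration strength (oov_level)
Traditional-to-Simplified conversion (t2s)
Special-character conversion (special_char_conv)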

result = nlp.tag(['成都商報記者 姚永忠', '調用參數及返回值詳細說明見'])
print(result)

[{'tag': ['ns', 'n', 'n', 'nr'], 'word': ['成都', '商報', '記者', '姚永忠']}, {'tag': ['v', 'n', 'c', 'v', 'n', 'ad', 'v', 'v'], 'word': ['調用', '參數', '及', '返回', '值', '詳細', '說明', '見']}]
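Each result holds two parallel lists, so the words and tags usually need to be paired up before they are readable. A minimal sketch, assuming the 'word' / 'tag' layout shown above (format_tagged is a hypothetical helper, and the sample is the first result from the example):

```python
def format_tagged(sentence):
    """Join parallel 'word' and 'tag' lists into a 'word/tag' string."""
    pairs = zip(sentence['word'], sentence['tag'])
    return ' '.join('%s/%s' % (word, tag) for word, tag in pairs)


# First result copied from the nlp.tag output above.
sample = {'tag': ['ns', 'n', 'n', 'nr'],
          'word': ['成都', '商報', '記者', '姚永忠']}

print(format_tagged(sample))
```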

Keyword Extraction

Core function: nlp.extract_keywords(text, top_k=None, segmented=False)

See: http://docs.bosonnlp.com/keywords.html

keywords = nlp.extract_keywords('病毒式媒體網站：讓新聞迅速蔓延', top_k=2)
print(keywords)  # [weight, keyword] pairs; the squared weights of all keywords sum to 1

[[0.5686631749811326, '蔓延'], [0.5671956747680966, '病毒']]
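The normalization note can be checked against the sample output. The sketch below sums the squared weights of the two keywords returned above; squared_weight_mass is a hypothetical helper. The total comes out well below 1, which is consistent with the normalization covering the full keyword list rather than only the top_k entries returned, though that reading is an assumption.

```python
def squared_weight_mass(pairs):
    """Sum of squared weights over [weight, keyword] pairs."""
    return sum(weight ** 2 for weight, _word in pairs)


# Output copied from the extract_keywords example above (top_k=2).
keywords = [[0.5686631749811326, '蔓延'], [0.5671956747680966, '病毒']]

mass = squared_weight_mass(keywords)
print(round(mass, 3))  # share of the squared-weight total held by the top 2
```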

Semantic Suggestion

Core function: nlp.suggest(data)

See: http://docs.bosonnlp.com/suggest.html

term = '粉絲'
result = nlp.suggest(term, top_k=10)
for score, word in result:
    print(score, word)

0.9999999999999996 粉絲/n
0.48602467961311013 腦殘粉/n
0.47638025976400944 聽衆/n
0.4574711603743689 球迷/n
0.4427939662212161 觀衆/n
0.43996388413040877 噴子/n
0.43706751168681585 樂迷/n
0.43651710096540336 鰻魚/n
0.4357353461210975 水軍/n
0.4332090811336725 好友/n
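The suggested words come back with a part-of-speech suffix such as '/n'. A small sketch for splitting that suffix off and dropping low-scoring entries follows; split_suggestions and its min_score cutoff are hypothetical, and the sample rows are copied from the output above.

```python
def split_suggestions(result, min_score=0.0):
    """Turn [score, 'word/pos'] rows into (word, pos, score) tuples.

    Rows scoring below min_score are skipped. A word without a '/'
    separator is kept whole with an empty POS tag.
    """
    cleaned = []
    for score, word in result:
        if score < min_score:
            continue
        text, sep, pos = word.rpartition('/')
        if not sep:
            text, pos = word, ''
        cleaned.append((text, pos, score))
    return cleaned


# A few rows copied from the nlp.suggest output above.
sample = [[0.9999999999999996, '粉絲/n'],
          [0.48602467961311013, '腦殘粉/n'],
          [0.4332090811336725, '好友/n']]

for text, pos, score in split_suggestions(sample, min_score=0.45):
    print(text, pos, score)
```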

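News Classification

Core function: nlp.classify(data)

See: http://docs.bosonnlp.com/classify.html

ID	Category	ID	Category
0	Sports	7	Technology
1	Education	8	Internet
2	Finance	9	Real estate
3	Society	10	International
4	Entertainment	11	Women
5	Military	12	Automobile
6	Domestic	13	Games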

s = ['俄否決安理會譴責敘軍戰機空襲阿勒頗平民',
     '鄧紫棋談男友林宥嘉：我覺得我比他唱得好',
     'Facebook收購印度初創公司']
result = nlp.classify(s)
result

[5, 4, 8]
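The returned IDs are easier to read once mapped back to category names. A minimal sketch, using English renderings of the table above (CATEGORY_NAMES and category_names are hypothetical, not part of the SDK):

```python
# Index = category ID from the table above; names are English translations.
CATEGORY_NAMES = [
    'Sports', 'Education', 'Finance', 'Society', 'Entertainment',
    'Military', 'Domestic', 'Technology', 'Internet', 'Real estate',
    'International', 'Women', 'Automobile', 'Games',
]


def category_names(ids):
    """Map a list of category IDs to their names."""
    return [CATEGORY_NAMES[i] for i in ids]


# IDs copied from the nlp.classify output above.
print(category_names([5, 4, 8]))
```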

News Summarization

Core function: nlp.summary(title, content, word_limit=0.3, not_exceed=False)

See: http://docs.bosonnlp.com/summary.html

content = (
    '騰訊科技訊（劉亞瀾）10月22日消息，前優酷土豆技術副總裁'
    '黃冬已於日前正式加盟芒果TV，出任CTO一職。'
    '資料顯示，黃冬歷任土豆網技術副總裁、優酷土豆集團產品'
    '技術副總裁等職務，曾主持設計、運營過優酷土豆多個'
    '大型高容量產品和系統。'
    '此番加入芒果TV或與芒果TV計劃自主研發智能硬件OS有關。')
title = '前優酷土豆技術副總裁黃冬加盟芒果TV任CTO'
nlp.summary(title, content, 0.1)

'騰訊科技訊（劉亞瀾）10月22日消息，前優酷土豆技術副總裁黃冬已於日前正式加盟芒果TV，出任CTO一職。'

Time Conversion

Core function: nlp.convert_time(data, basetime=None)

See: http://docs.bosonnlp.com/time.html

This strikes me as a rather unusual text-analysis feature; it should be a good fit for text that contains time expressions.

import datetime  # needed when passing basetime
nlp.convert_time(
    "2013年二月二十八日下午四點三十分二十九秒",
    datetime.datetime.today())  # e.g. datetime.datetime(2017, 10, 19, 22, 21, 18, 434128)

{'timestamp': '2013-02-28 16:30:29', 'type': 'timestamp'}

nlp.convert_time("今天晚上8點到明天下午3點", datetime.datetime(2015, 9, 1))

{'timespan': ['2015-09-01 20:00:00', '2015-09-02 15:00:00'],
'type': 'timespan_0'}

nlp.convert_time("今天晚上8點到明天下午3點", datetime.datetime.today())

{'timespan': ['2017-10-21 20:00:00', '2017-10-22 15:00:00'],
'type': 'timespan_0'}
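The returned values are strings, so they need parsing before any date arithmetic. The sketch below handles only the two result shapes shown above ('timestamp' and 'timespan'); any other shape the API may return is not covered and would raise a KeyError. parse_converted_time is a hypothetical helper.

```python
from datetime import datetime

TIME_FORMAT = '%Y-%m-%d %H:%M:%S'


def parse_converted_time(result):
    """Return (start, end) datetimes for a convert_time result.

    A single timestamp is returned as a zero-length span. Only the
    'timestamp' and 'timespan' keys seen in the examples are handled.
    """
    if 'timestamp' in result:
        moment = datetime.strptime(result['timestamp'], TIME_FORMAT)
        return moment, moment
    start_text, end_text = result['timespan']
    return (datetime.strptime(start_text, TIME_FORMAT),
            datetime.strptime(end_text, TIME_FORMAT))


# Result copied from the basetime=datetime.datetime(2015, 9, 1) example above.
span = {'timespan': ['2015-09-01 20:00:00', '2015-09-02 15:00:00'],
        'type': 'timespan_0'}

start, end = parse_converted_time(span)
print(end - start)  # length of the span as a timedelta
```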

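Other Single-Text Analyses

Dependency parsing: http://docs.bosonnlp.com/depparser.html

Named entity recognition: http://docs.bosonnlp.com/ner.html

Multi-Text Analysis

Text clustering: http://docs.bosonnlp.com/cluster.html

Representative opinions: http://docs.bosonnlp.com/comments.html

Note: BosonNLP has since discontinued its service; the NLP features of Baidu AI can be used instead.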
