NLP tokenization with TextBlob

TextBlob

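TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.
It is designed mainly for English text and does not handle Chinese word segmentation.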

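Installing TextBlob
Install it with pip install textblob, run from a terminal (for example, the Terminal window in PyCharm). Features such as part-of-speech tagging and noun phrase extraction also need the NLTK corpora, which can be downloaded with python -m textblob.download_corpora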

Part-of-speech tagging

from textblob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
print(wiki.tags)

Output:
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), 
('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
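Because tags is an ordinary list of (word, tag) tuples using Penn Treebank tags (NNP = proper noun, VBZ = verb in third person singular present, DT = determiner, JJ = adjective, NN = noun), plain list processing is enough to pull out words of one class. The sketch below copies the output above into a literal so it runs without TextBlob; with TextBlob installed you would use tags = wiki.tags instead. The helper name words_with_tag_prefix is hypothetical.

```python
# The (word, tag) pairs printed above, copied so the example runs on its own.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'),
        ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


adjectives = words_with_tag_prefix(tags, "JJ")  # matches JJ, JJR, JJS
nouns = words_with_tag_prefix(tags, "NN")       # matches NN, NNS, NNP, NNPS

print(adjectives)  # ['high-level', 'general-purpose']
print(nouns)       # ['Python', 'programming', 'language']
```

Matching on the tag prefix groups related tags together, e.g. NN and NNP both count as nouns.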

Noun phrase extraction

from textblob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
print(wiki.noun_phrases)

Output:
['python']

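Sentiment analysis
polarity: how positive or negative the text is, in the range [-1, 1]; the closer to -1, the more negative, and the closer to 1, the more positive.
subjectivity: how subjective or objective the text is, in the range [0, 1]; 0 is very objective and 1 is very subjective.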

from textblob import TextBlob
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
print(testimonial.sentiment)

Output:
Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
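The raw scores are often turned into coarse labels. The sketch below does that using the ranges described above. The thresholds (a neutral band of ±0.1 around 0 for polarity and a 0.5 cut-off for subjectivity) and the helper name describe are arbitrary assumptions, not TextBlob defaults. A local namedtuple stands in for the Sentiment result so the code runs without TextBlob; TextBlob's own sentiment result has the same field names, so it should work there too.

```python
from collections import namedtuple

# Stand-in with the same field names as the Sentiment result printed above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def describe(sentiment, neutral_band=0.1, subjective_cutoff=0.5):
    """Map raw scores to coarse labels (thresholds are assumptions)."""
    if not -1.0 <= sentiment.polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if not 0.0 <= sentiment.subjectivity <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")

    # Polarity: sign decides the tone, values near 0 count as neutral.
    if sentiment.polarity > neutral_band:
        tone = "positive"
    elif sentiment.polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"

    # Subjectivity: a single cut-off splits the [0, 1] range in two.
    if sentiment.subjectivity >= subjective_cutoff:
        stance = "mostly subjective"
    else:
        stance = "mostly objective"
    return tone, stance


score = Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
print(describe(score))  # ('positive', 'mostly objective')
```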

Tokenization
Splitting text into sentences and words

from textblob import TextBlob

text = ("Beautiful is better than ugly. "
        "Explicit is better than implicit. "
        "Simple is better than complex.")

# Split the text into sentences with TextBlob
blob = TextBlob(text)
sentences = blob.sentences
print("1 Sentences:", sentences)

words_list = []  # collects the word list of every sentence
for sentence in sentences:
    words_list.append(sentence.words)  # split each sentence into words
    print(sentence.words)
print("2 Words:", words_list)

Output:
1 Sentences: [Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
['Beautiful', 'is', 'better', 'than', 'ugly']
['Explicit', 'is', 'better', 'than', 'implicit']
['Simple', 'is', 'better', 'than', 'complex']
2 Words: [WordList(['Beautiful', 'is', 'better', 'than', 'ugly']), WordList(['Explicit', 'is', 'better', 'than', 'implicit']), WordList(['Simple', 'is', 'better', 'than', 'complex'])]
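To see what sentence and word splitting involve, here is a deliberately naive regex sketch that reproduces the result above for this simple input. It is not how TextBlob works (its default tokenizers are built on NLTK), and the helper names split_sentences and split_words are hypothetical. The last two lines show where such rules break down: abbreviations such as "Mr." end a "sentence" too early, and Chinese text, which has no spaces between words, comes back as a single token, which is also why whitespace-oriented English tokenizers are a poor fit for Chinese.

```python
import re

text = ("Beautiful is better than ugly. "
        "Explicit is better than implicit. "
        "Simple is better than complex.")


def split_sentences(s):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", s.strip()) if part]


def split_words(sentence):
    """Naive rule: a word is a run of word characters, apostrophes or hyphens."""
    return re.findall(r"[\w'-]+", sentence)


sentences = split_sentences(text)
print(sentences)
for sentence in sentences:
    print(split_words(sentence))

# Where the naive rules break down:
print(split_sentences("Mr. Smith left."))  # ['Mr.', 'Smith left.']
print(split_words("我愛自然語言處理"))       # ['我愛自然語言處理'] (one token)
```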

Spelling correction

b = TextBlob("I havv goood speling!")
print(b.correct())

Output:
I have good spelling!
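TextBlob's documentation describes correct() as based on Peter Norvig's spelling corrector. The idea: generate every string one edit (deletion, transposition, replacement, insertion) away from the misspelled word, then two edits away, keep only candidates found in a word-frequency table, and pick the most frequent among the closest ones. The sketch below implements that idea with a tiny hypothetical frequency table (WORD_COUNTS); a real corrector builds the table from a large corpus, so its results depend entirely on that data.

```python
from collections import Counter

# Hypothetical toy frequency table; a real corrector counts words in a large corpus.
WORD_COUNTS = Counter({"i": 500, "have": 300, "good": 200, "god": 40,
                       "spelling": 50, "hive": 5})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only candidates that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Prefer the word itself, then 1-edit, then 2-edit candidates;
    among the chosen group, return the most frequent one (lower-cased)."""
    w = word.lower()
    candidates = known([w]) or known(edits1(w)) or known(edits2(w)) or {w}
    return max(candidates, key=lambda c: WORD_COUNTS[c])


print([correct_word(w) for w in ["havv", "goood", "speling"]])
# ['have', 'good', 'spelling']
```

Note the design choice: edit distance takes priority over frequency, so "gd" becomes "god" (one edit) rather than the more frequent "good" (two edits).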

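Getting word and noun phrase frequencies
There are two ways to get how often a word or noun phrase occurs (both are case-insensitive by default):
1. Through the word_counts dictionary
2. Through the count() method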

monty = TextBlob("We are no longer the Knights who say Ni. We are now the Knights who say Ekki ekki ekki PTANG.")
print("ekki:",monty.word_counts['ekki'])
print("ekki:",monty.words.count('ekki'))

# Case-sensitive count
print("ekki_sensitive:",monty.words.count('ekki',case_sensitive=True))

Output:
ekki: 3
ekki: 3
ekki_sensitive: 2
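The same counts can be reproduced with the standard library, which makes the case-insensitive versus case-sensitive distinction explicit. This is a sketch under one assumption: a "word" is a run of ASCII letters, whereas TextBlob tokenizes with NLTK. For this sentence the results match the output above.

```python
import re
from collections import Counter

text = ("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")

# Assumption: a word is a run of ASCII letters (TextBlob itself uses NLTK).
words = re.findall(r"[A-Za-z]+", text)

# Case-insensitive frequencies, similar in spirit to monty.word_counts.
word_counts = Counter(w.lower() for w in words)

# Case-sensitive count, similar to monty.words.count('ekki', case_sensitive=True).
case_sensitive_ekki = words.count("ekki")

print(word_counts["ekki"])   # 3
print(case_sensitive_ekki)   # 2
print(word_counts["spam"])   # 0: a Counter returns 0 for missing words
```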

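Machine translation
This calls Google's translation service, which cannot be reached from mainland China without a VPN, so it is not demonstrated here. Note also that translate() was deprecated in TextBlob 0.16.0 and has since been removed, so the snippet below only works with older versions.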

from textblob import TextBlob
en_text = "I think KFC is not good."
en_blob = TextBlob(en_text)
zh_text = en_blob.translate(from_lang='en', to='zh-CN')  # older TextBlob versions only
print(zh_text)
