Computing with Language: Simple Statistics

Frequency Distributions

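# text1 and FreqDist come from the NLTK book module
from nltk.book import *
# Build a frequency distribution over the tokens of text1
fdist1 = FreqDist(text1)
# Display it
fdist1
# The 50 most frequent tokens
fdist1.most_common(50)
# Number of times 'whale' occurs
fdist1['whale']
# Cumulative frequency plot of the 50 most frequent tokens
fdist1.plot(50, cumulative=True)
# Hapaxes: words that occur only once
fdist1.hapaxes()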
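
In NLTK 3, FreqDist is a subclass of collections.Counter, so the counting steps above can be tried without downloading any corpus. The following is only a rough sketch under that assumption: the token list is a toy stand-in for text1, and the hapaxes helper is a hypothetical function written here to mirror fdist1.hapaxes(), not NLTK's own code.

```python
from collections import Counter

# Toy token list standing in for text1 (an assumption, not the real corpus).
tokens = "the whale and the sea and the ship saw a whale".split()

# Counter plays the role of FreqDist in this sketch.
fdist = Counter(tokens)

# Most frequent tokens, analogous to fdist1.most_common(50).
top_two = fdist.most_common(2)

# Count for a single word, analogous to fdist1['whale'].
whale_count = fdist["whale"]


def hapaxes(counts):
    """Hypothetical helper: return the words that were seen exactly once."""
    return [word for word, n in counts.items() if n == 1]


print(top_two, whale_count, hapaxes(fdist))
```

Swapping Counter for FreqDist and the toy list for text1 should give the same kind of results on the real text.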

# Define V. V is a set (the vocabulary of text1), not a list
V = set(text1)
# Words in V longer than 15 characters
long_words = [w for w in V if len(w) > 15]
# Sort them
sorted(long_words)

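Python reads a lot like mathematical notation here. Compared with the Java I am currently using, it is much closer to the language of mathematics.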

# Words longer than 7 characters that occur more than 7 times
# (frequent words that reflect the content of the text)
fdist5 = FreqDist(text5)
sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7)
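
To see how the two conditions interact, here is a small sketch of the same filter on made-up data. The token list and the two threshold constants are assumptions chosen so that the toy data produces a result; the example above uses 7 for both.

```python
from collections import Counter

# Made-up tokens standing in for text5 (an assumption).
tokens = ["hello"] * 3 + ["everyone"] * 2 + ["supercalifragilistic"] + ["ok"] * 5

counts = Counter(tokens)

# Hypothetical thresholds, smaller than the 7 used above so the toy data passes.
MIN_LENGTH = 4
MIN_COUNT = 1

# Keep a word only if it is both long enough and frequent enough.
frequent_long_words = sorted(
    w for w in set(tokens) if len(w) > MIN_LENGTH and counts[w] > MIN_COUNT
)

print(frequent_long_words)
```

The long word that appears once is dropped by the frequency test, and the short word that appears often is dropped by the length test, which is the point of combining the two.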

Collocations and Bigrams

Bigrams (pairs of adjacent words)

bigrams(['more', 'is', 'said', 'than', 'done'])

Running the code above directly raises an error:

Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
bigrams(['more','is','said','than','done'])
NameError: name 'nltk' is not defined

nltk has to be imported first:

from nltk import *

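Running it again after the import still does not display the pairs. It returns a generator object instead, so the call has to be wrapped in list():

list(bigrams(['more', 'is', 'said', 'than', 'done']))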
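
Why list() is needed becomes clearer with a hand-written version. The function below is a hypothetical stand-in named bigrams_sketch, not NLTK's implementation: it yields adjacent pairs lazily, so nothing is computed until something iterates over it.

```python
def bigrams_sketch(words):
    """Hypothetical stand-in for nltk's bigrams: lazily yield adjacent pairs."""
    words = list(words)
    for first, second in zip(words, words[1:]):
        yield first, second


pairs = bigrams_sketch(['more', 'is', 'said', 'than', 'done'])

# Printing the generator shows an object description, not the pairs.
print(pairs)

# list() consumes the generator and materializes the pairs.
pair_list = list(pairs)
print(pair_list)
```

A generator can only be consumed once, so a second list(pairs) would come back empty. Keep the list if the pairs are needed again.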

The collocations() method finds the bigrams that occur frequently in a text:

text4.collocations()

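Counting Other Things

# Frequency distribution of word lengths
fdist = FreqDist([len(w) for w in text1])
# The distinct word lengths
fdist.keys()
# The (length, count) pairs of the distribution
fdist.items()
# The most frequent word length
fdist.max()
# Number of words of length 3
fdist[3]
# Proportion of all words that have length 3
fdist.freq(3)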
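
The same word-length statistics can be worked through by hand on a short token list. This is a sketch using collections.Counter in place of FreqDist, with a token list chosen only for illustration; the variable names are assumptions.

```python
from collections import Counter

# Short token list chosen for illustration (an assumption, not all of text1).
tokens = "call me ishmael some years ago never mind how long".split()

# Count how many words there are of each length.
length_counts = Counter(len(w) for w in tokens)

# The most frequent word length, analogous to fdist.max().
most_common_length = length_counts.most_common(1)[0][0]

# Total number of words counted, analogous to fdist.N().
total = sum(length_counts.values())

# Share of words with length 3, analogous to fdist.freq(3).
share_of_length_3 = length_counts[3] / total

print(most_common_length, length_counts[3], share_of_length_3)
```

freq() is simply the count divided by the total, which is why fdist[3] and fdist.freq(3) always move together.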

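Functions defined for NLTK's frequency distributions

Example	Description
fdist = FreqDist(samples)	Create a frequency distribution containing the given samples
fdist[sample] += 1	Increment the count for this sample (written fdist.inc(sample) in NLTK 2)
fdist['monstrous']	Number of times the given sample occurred
fdist.freq('monstrous')	Frequency of the given sample
fdist.N()	Total number of samples
fdist.keys()	The samples (sorted by decreasing frequency only in NLTK 2; use fdist.most_common() in NLTK 3)
for sample in fdist:	Iterate over the samples (in decreasing frequency order in NLTK 2)
fdist.max()	Sample with the greatest count
fdist.tabulate()	Tabulate the frequency distribution
fdist.plot()	Plot the frequency distribution
fdist.plot(cumulative=True)	Plot the cumulative frequency distribution
fdist1 < fdist2	Test whether samples in fdist1 occur less frequently than in fdist2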
