[Python] 13_English Word Frequency Count & Top K Frequent Elements

1. English word frequency count

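As a classic application of dictionaries (key-value pairs), word counting shows up as a practice exercise in almost every language once key-value structures have been introduced.
Requirements:
Write a function wordcount that counts how many times each word occurs in a text (word frequency count). Once counting is done, sort the result by word frequency.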

from collections import Counter  # counts items and ranks them by frequency
text = """
    Enterprise architects will appreciate new capabilities such as lightweight application isolation.
    Application developers will welcome an updated development environment and application-profiling tools. Read more at the Red Hat Developer Blog.
    System administrators will appreciate new management tools and expanded file-system options with improved performance and scalability.

    Deployed on physical hardware, virtual machines, or in the cloud, Red Hat Enterprise Linux 7 delivers the advanced features required for next-generation architectures.
    Where to go from here:

    Red Hat Enterprise Linux 7 Product Page

    The landing page for Red Hat Enterprise Linux 7 information. Learn how to plan, deploy, maintain, and troubleshoot your Red Hat Enterprise Linux 7 system.
    Red Hat Customer Portal

    Your central access point to finding articles, videos, and other Red Hat content, as well as manage your Red Hat support cases.

    Documentation

    Provides documentation related to Red Hat Enterprise Linux and other Red Hat offerings.
    Red Hat Subscription Management

    Web-based administration interface to efficiently manage systems.
    Red Hat Enterprise Linux Product Page

    Provides an entry point to Red Hat Enterprise Linux product offerings.
"""

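# 1. Extract all the words from the string.
#    split() breaks on whitespace only, so punctuation stays attached ("tools." != "tools").
words = text.split()

# 2. Count how many times each word occurs
#       1). Where to store the counts: in a dictionary
#       2). How to process the words: see the loop below
word_count_dict = {}
for word in words:
    if word not in word_count_dict:
        word_count_dict[word] = 1
    else:
        word_count_dict[word] += 1
print(word_count_dict)
# 3. Sort and get the most frequent words
counter = Counter(word_count_dict)
print(counter.most_common(7))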
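
The requirement asks for a function named wordcount, while the script above runs at module level. Below is a minimal sketch of such a function. It assumes that words should be compared case-insensitively and without surrounding punctuation. Neither rule is stated in the original, and the regular expression and the top_n parameter are illustrative choices.

```python
import re
from collections import Counter


def wordcount(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    # Assumption: a "word" is a run of letters/digits, optionally joined by
    # hyphens or apostrophes, e.g. "file-system" or "next-generation".
    words = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())
    # most_common(None) returns every pair, sorted by descending count.
    return Counter(words).most_common(top_n)


sample = "Red Hat Enterprise Linux 7. Red Hat support, Red Hat content."
print(wordcount(sample, 2))  # [('red', 3), ('hat', 3)]
```

Passing a list of words straight to Counter also replaces the manual if/else loop, because Counter does the counting itself.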

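However, as described in the previous post, the setdefault(key, value) method guarantees that a key is initialized to a default value before it is used; if the key already exists, calling setdefault has no effect. The code above can therefore be rewritten with this method. In addition, print writes the whole dictionary on a single line, so we can import the pprint module to get more readable output.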

import pprint
from collections import Counter
words = text.split()

word_dict = {}  # key: word, value: count

# set(words) visits each distinct word once; words.count() rescans the whole list each time
for item in set(words):
    word_dict.setdefault(item, words.count(item))
    
pprint.pprint(word_dict)

count = Counter(word_dict)
pprint.pprint(count.most_common(5))
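
The loop above calls words.count() once per distinct word, so the list is scanned many times. As a hedged alternative, setdefault can also drive a single pass over the words. The sketch below uses a short made-up word list (sample_words is illustrative, not from the original) to show both the counting pattern and the "no effect if the key exists" behaviour.

```python
from collections import Counter

sample_words = "red hat red linux hat red".split()

freq = {}
for word in sample_words:
    # setdefault stores 0 only when the key is missing, then returns the
    # value currently stored under that key.
    freq[word] = freq.setdefault(word, 0) + 1

print(freq)  # {'red': 3, 'hat': 2, 'linux': 1}

# An existing key is left untouched; the stored value is returned instead.
print(freq.setdefault("red", 100))  # 3

# The result matches what Counter computes for the same list.
print(freq == dict(Counter(sample_words)))  # True
```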


2. Top K frequent elements: topKFrequent.py

Given a non-empty array of integers, return the k most frequent elements.

For example,

given the array [1,1,1,2,2,3] and k = 2, return [1,2].

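from collections import Counter  # needed when topKFrequent.py is run on its own

nums = [1,1,1,2,2,3,3,2,1]
k = int(input('>>'))  # read k from the user
results = []

count = Counter(nums)
# most_common(k) returns (element, count) pairs; keep only the element
for item in count.most_common(k):
    results.append(item[0])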

print(results)
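
The same idea can be wrapped in a reusable function that takes k as an argument instead of reading it from input. The sketch below swaps in heapq.nlargest for the selection step. The function name top_k_frequent is a hypothetical choice, and the code assumes 1 <= k <= number of distinct elements.

```python
import heapq
from collections import Counter


def top_k_frequent(nums, k):
    """Return the k most frequent elements of nums, most frequent first."""
    counts = Counter(nums)
    # Iterating over a Counter yields its keys; key=counts.get ranks each
    # key by its frequency, and nlargest keeps only the best k.
    return heapq.nlargest(k, counts, key=counts.get)


print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

With n distinct elements, nlargest avoids fully sorting all n counts when k is small, which is the usual reason to prefer a heap for this problem.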
