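Top K Frequent Words

Problem

Given a non-empty list of words, return the k most frequent words.

The answer should be sorted by frequency from highest to lowest. If two words have the same frequency, the one that comes first alphabetically goes first.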

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words, each appearing 2 times.
    Note that "i" comes before "love" alphabetically.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    appearing 4, 3, 2 and 1 times respectively.

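Notes:

You may assume k is always valid: 1 ≤ k ≤ number of unique elements.
The input words contain only lowercase letters.

Follow-up:

Try to solve it in O(n log k) time and O(n) extra space.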

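Approach

This is a classic problem with many possible approaches. The first solution I thought of was to count each word's occurrences in a dictionary, sort the dictionary's items by count, and take the first k, e.g.: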

#!/usr/bin/env python
# -*- coding: utf-8 -*-

class Solution(object):
    def topKFrequent(self, words, k):
        """
        :type words: List[str]
        :type k: int
        :rtype: List[str]
        """
        # Count how many times each word occurs.
        data = {}
        for word in words:
            data[word] = data.get(word, 0) + 1
        # Sort by count (descending), then by word (ascending), so that
        # words with equal counts come out in alphabetical order.
        sorted_data = sorted(data.items(), key=lambda item: (-item[1], item[0]))
        return [word for word, _ in sorted_data[:k]]


if __name__ == '__main__':
    l = ["i", "love", "leetcode", "i", "love", "coding"]
    print(Solution().topKFrequent(l, 2))
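
The tie-breaking rule is easy to get wrong with this approach, so here is a small check of how the sort key affects the result. It is only a sketch: the variable names are made up for illustration, and it assumes Python 3.7+, where dictionaries keep insertion order. Sorting by count alone leaves tied words in first-seen order, because sorted() is stable; sorting by (-count, word) orders ties alphabetically.

```python
words = ["love", "i", "coding", "i", "love"]

# Tally the words; "love" is seen before "i".
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Key is the count only: tied words keep their first-seen order.
by_count_only = sorted(counts.items(), key=lambda item: item[1], reverse=True)
top_plain = [word for word, _ in by_count_only[:2]]

# Key is (-count, word): tied words are ordered alphabetically.
by_count_then_word = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
top_tiebreak = [word for word, _ in by_count_then_word[:2]]

print(top_plain)     # ['love', 'i']
print(top_tiebreak)  # ['i', 'love']
```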

After submitting, I found a more elegant solution:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import collections
import heapq


class Solution(object):
    def topKFrequent(self, words, k):
        """
        :type words: List[str]
        :type k: int
        :rtype: List[str]
        """
        count = collections.Counter(words)
        # Negate the frequency so the min-heap pops the most frequent word
        # first; equal frequencies fall back to comparing the words.
        heap = [(-freq, word) for word, freq in count.items()]
        heapq.heapify(heap)
        return [heapq.heappop(heap)[1] for _ in range(k)]


if __name__ == '__main__':
    l = ["i", "love", "leetcode", "i", "love", "coding"]
    print(Solution().topKFrequent(l, 2))

This solution pulls in two libraries. Let's take a closer look at each:

collections
heapq

collections.Counter

Here is the source:

def __init__(*args, **kwds):
    '''Create a new, empty Counter object.  And if given, count elements
    from an input iterable.  Or, initialize the count from another mapping
    of elements to their counts.

    >>> c = Counter()                           # a new, empty counter
    >>> c = Counter('gallahad')                 # a new counter from an iterable
    >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
    >>> c = Counter(a=4, b=2)                   # a new counter from keyword args

    '''
    if not args:
        raise TypeError("descriptor '__init__' of 'Counter' object "
                        "needs an argument")
    self, *args = args
    if len(args) > 1:
        raise TypeError('expected at most 1 arguments, got %d' % len(args))
    super(Counter, self).__init__()
    self.update(*args, **kwds)

It mainly delegates to self.update, so let's look at that next:

def update(*args, **kwds):
    '''Like dict.update() but add counts instead of replacing them.

    Source can be an iterable, a dictionary, or another Counter instance.

    >>> c = Counter('which')
    >>> c.update('witch')           # add elements from another iterable
    >>> d = Counter('watch')
    >>> c.update(d)                 # add elements from another counter
    >>> c['h']                      # four 'h' in which, witch, and watch
    4

    '''

    if not args:
        raise TypeError("descriptor 'update' of 'Counter' object "
                        "needs an argument")
    self, *args = args
    if len(args) > 1:
        raise TypeError('expected at most 1 arguments, got %d' % len(args))
    iterable = args[0] if args else None
    if iterable is not None:
        if isinstance(iterable, Mapping):
            if self:
                self_get = self.get
                for elem, count in iterable.items():
                    self[elem] = count + self_get(elem, 0)
            else:
                super(Counter, self).update(iterable)  # fast path when counter is empty
        else:
            _count_elements(self, iterable)
    if kwds:
        self.update(kwds)

def _count_elements(mapping, iterable):
    'Tally elements from the iterable.'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1

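As you can see, Counter also just iterates over the input and stores the counts in a dictionary, the same idea as my first solution. The one shortcut is the fast path: when the input is a mapping and the counter is still empty, it calls dict's own update directly.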
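
To make the two branches of update concrete, here is a small sketch using the word list from Example 1. The extra update calls and their arguments are made up for illustration; only Counter and update come from the standard library.

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]

counter = Counter(words)          # iterable branch: tallies each element
counter.update(["i", "coding"])   # iterable branch again: adds to existing counts
counter.update({"love": 3})       # mapping branch: adds the given counts

print(counter["i"])         # 3
print(counter["coding"])    # 2
print(counter["love"])      # 5
print(counter["leetcode"])  # 1
print(counter["missing"])   # 0, absent keys report zero instead of raising KeyError
```

Note that update adds to the stored counts rather than replacing them, which is the difference from dict.update called out in the docstring.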

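heapq (min-heap)

The heapq module implements a binary min-heap (a priority queue) on top of an ordinary Python list and provides functions to work with it, such as heapify, heappush and heappop. The smallest item is always at index 0, which makes heap-based sorting and top-k selection quick to write in Python.

Source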

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)
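
As a rough illustration of why the heap solution stores (-freq, word) tuples, the sketch below heapifies a few such tuples and pops them. The sample data is made up; tuples compare element by element, so the most negative frequency comes out first and ties are decided by the word.

```python
import heapq

heap = [(-1, "coding"), (-2, "love"), (-1, "leetcode"), (-2, "i")]
heapq.heapify(heap)  # rearranges the list in place into a min-heap

# Each heappop removes and returns the smallest remaining tuple.
first = heapq.heappop(heap)
second = heapq.heappop(heap)
third = heapq.heappop(heap)

print(first)   # (-2, 'i')
print(second)  # (-2, 'love')
print(third)   # (-1, 'coding')
```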

Finally, here is how an expert solved this problem:

import collections
import heapq


class Solution:
    def topKFrequent(self, words, k):
        """
        :type words: List[str]
        :type k: int
        :rtype: List[str]
        """
        # Pair each word with its negated count so "smallest" means most frequent.
        cn = [(-j,i) for i,j in collections.Counter(words).items()]

        return [j[1] for j in heapq.nsmallest(k,cn)]
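
For the follow-up that asks for O(n log k) time, one common technique is to keep a heap of at most k entries and evict the worst candidate whenever it grows past k. The sketch below is one possible way to do that, not taken from the solutions above: the Entry class and the top_k_frequent function are hypothetical names introduced here. Entry defines its own ordering because the word comparison has to be reversed on ties, which plain tuples cannot express.

```python
import collections
import heapq


class Entry(object):
    """Hypothetical helper: the worst candidate compares as the smallest."""

    def __init__(self, freq, word):
        self.freq = freq
        self.word = word

    def __lt__(self, other):
        # A lower frequency is worse; on a tie, the alphabetically later word is worse.
        if self.freq != other.freq:
            return self.freq < other.freq
        return self.word > other.word


def top_k_frequent(words, k):
    count = collections.Counter(words)
    heap = []  # min-heap that never holds more than k entries
    for word, freq in count.items():
        heapq.heappush(heap, Entry(freq, word))
        if len(heap) > k:
            heapq.heappop(heap)  # evict the current worst candidate
    # Popping yields entries from worst to best, so reverse at the end.
    result = [heapq.heappop(heap).word for _ in range(len(heap))]
    return result[::-1]


print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```

Counting takes O(n), and each of the unique words costs at most O(log k) for a push and a pop, so the total stays within O(n log k) time and O(n) space.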
