問題
給一非空的單詞列表,返回前 k 個出現次數最多的單詞。
返回的答案應該按單詞出現頻率由高到低排序。如果不同的單詞有相同出現頻率,按字母順序排序。
示例 1:
輸入: ["i", "love", "leetcode", "i", "love", "coding"], k = 2 輸出: ["i", "love"] 解析: "i" 和 "love" 爲出現次數最多的兩個單詞,均爲2次。 注意,按字母順序 "i" 在 "love" 之前。
示例 2:
輸入: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4 輸出: ["the", "is", "sunny", "day"] 解析: "the", "is", "sunny" 和 "day" 是出現次數最多的四個單詞, 出現次數依次爲 4, 3, 2 和 1 次。
注意:
- 假定 k 總爲有效值, 1 ≤ k ≤ 集合元素數。
- 輸入的單詞均由小寫字母組成。
擴展練習:
- 嘗試以 O(n log k) 時間複雜度和 O(n) 空間複雜度解決。
思路
這道題很經典,思路也有很多。我最先想到的解法是先用字典來儲存單詞出現的個數,再對字典排序,最後拿出前K個,如:
#!/usr/bin/env python # -*- coding: utf-8 -*- class Solution(object): def topKFrequent(self, nums, k): """ :type nums: List[int] :type k: int :rtype: List[int] """ data, res = {}, [] for i in nums: data[i] = data[i] + 1 if i in data else 1 import operator sorted_data = sorted(data.items(), key=operator.itemgetter(1), reverse=True) for i in range(k): res.append(sorted_data[i][0]) return res if __name__ == '__main__': l = ["i", "love", "leetcode", "i", "love", "coding"] print(Solution().topKFrequent(l, 2))
提交之後發現一種更優雅的解法:
#!/usr/bin/env python # -*- coding: utf-8 -*- class Solution(object): def topKFrequent(self, nums, k): """ :type nums: List[int] :type k: int :rtype: List[int] """ import collections count = collections.Counter(nums) heap = [(-freq, word) for word, freq in count.items()] import heapq heapq.heapify(heap) return [heapq.heappop(heap)[1] for _ in range(k)] if __name__ == '__main__': l = ["i", "love", "leetcode", "i", "love", "coding"] print(Solution().topKFrequent(l, 2))
這裏引入了兩個庫,下面去好好看看,分析下
- collections
- heapq
collections.Counter
看看源碼:
1def __init__(*args, **kwds): 2 '''Create a new, empty Counter object. And if given, count elements 3 from an input iterable. Or, initialize the count from another mapping 4 of elements to their counts. 5 6 >>> c = Counter() # a new, empty counter 7 >>> c = Counter('gallahad') # a new counter from an iterable 8 >>> c = Counter({'a': 4, 'b': 2}) # a new counter from a mapping 9 >>> c = Counter(a=4, b=2) # a new counter from keyword args 10 11 ''' 12 if not args: 13 raise TypeError("descriptor '__init__' of 'Counter' object " 14 "needs an argument") 15 self, *args = args 16 if len(args) > 1: 17 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 18 super(Counter, self).__init__() 19 self.update(*args, **kwds)
主要是調用了self.update
函數,再來看看self.update
1def update(*args, **kwds): 2 '''Like dict.update() but add counts instead of replacing them. 3 4 Source can be an iterable, a dictionary, or another Counter instance. 5 6 >>> c = Counter('which') 7 >>> c.update('witch') # add elements from another iterable 8 >>> d = Counter('watch') 9 >>> c.update(d) # add elements from another counter 10 >>> c['h'] # four 'h' in which, witch, and watch 11 4 12 13 ''' 14 15 if not args: 16 raise TypeError("descriptor 'update' of 'Counter' object " 17 "needs an argument") 18 self, *args = args 19 if len(args) > 1: 20 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 21 iterable = args[0] if args else None 22 if iterable is not None: 23 if isinstance(iterable, Mapping): 24 if self: 25 self_get = self.get 26 for elem, count in iterable.items(): 27 self[elem] = count + self_get(elem, 0) 28 else: 29 super(Counter, self).update(iterable) # fast path when counter is empty 30 else: 31 _count_elements(self, iterable) 32 if kwds: 33 self.update(kwds)
1def _count_elements(mapping, iterable): 2 'Tally elements from the iterable.' 3 mapping_get = mapping.get 4 for elem in iterable: 5 mapping[elem] = mapping_get(elem, 0) + 1
可以看到這裏也是使用便利方法,然後利用字典保存次數。如果counter爲空,就直接調用dict中的update。
heapq(小頂堆)
heapq模塊實現了Python中的堆排序,並提供了有關方法。讓用Python實現排序算法有了簡單快捷的方式。
源碼
1def heapify(x): 2 """Transform list into a heap, in-place, in O(len(x)) time.""" 3 n = len(x) 4 # Transform bottom-up. The largest index there's any point to looking at 5 # is the largest with a child index in-range, so must have 2*i + 1 < n, 6 # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so 7 # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is 8 # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. 9 for i in reversed(range(n//2)): 10 _siftup(x, i)
最後看看大神關於本題的寫法
1class Solution: 2 def topKFrequent(self, words, k): 3 """ 4 :type words: List[str] 5 :type k: int 6 :rtype: List[str] 7 """ 8 cn = [(-j,i) for i,j in collections.Counter(words).items()] 9 10 return [j[1] for j in heapq.nsmallest(k,cn)]