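Heaps in Python
- Python has no standalone heap data type, but the standard library provides a module of heap operation functions (heapq) that work on ordinary lists.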
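Function | Description |
---|---|
heappush(heap, x) | Push x onto the heap |
heappop(heap) | Pop and return the smallest element of the heap |
heapify(heap) | Rearrange a list in place so that it satisfies the heap property |
heapreplace(heap, x) | Pop and return the smallest element, then push x onto the heap |
nlargest(n, iter) | Return the n largest elements of iter |
nsmallest(n, iter) | Return the n smallest elements of iter |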
- A heap is a kind of priority queue. A priority queue lets you add objects in any order and, at any time, find and remove the smallest element.
from heapq import heappush
from random import shuffle
a = list(range(10))  # [0, 1, ..., 9]
shuffle(a)
heap = []
for i in a:
    heappush(heap, i)
print(a)  # e.g. [8, 6, 7, 3, 5, 4, 2, 0, 1, 9] (the order varies between runs)
print(heap)  # for that order: [0, 1, 3, 2, 6, 7, 4, 8, 5, 9]
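As a small illustrative sketch (the sample values are made up for this example), repeatedly calling heappop returns the elements in ascending order, and heapreplace pops the current minimum and pushes a new item in one step:

```python
from heapq import heappush, heappop, heapreplace

heap = []
for value in [5, 1, 4, 2, 3]:       # sample values chosen for illustration
    heappush(heap, value)

smallest = heappop(heap)            # removes and returns the minimum
replaced = heapreplace(heap, 10)    # pops the current minimum, then pushes 10

drained = []
while heap:
    drained.append(heappop(heap))   # remaining items come out in ascending order

print(smallest, replaced, drained)  # 1 2 [3, 4, 5, 10]
```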
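The order of the elements is not as arbitrary as it looks. The list is not fully sorted, but with Python's 0-based indexing it must guarantee that the element at position i is greater than or equal to the element at position (i - 1) // 2 (equivalently, the element at position i is less than or equal to the elements at positions 2i + 1 and 2i + 2). This is called the heap property.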
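The heap property can be checked directly. The helper below is a hypothetical function written for this note (it is not part of heapq), and it assumes Python's 0-based list indices, where the parent of index i is (i - 1) // 2:

```python
import heapq

def is_min_heap(items):
    """Return True if every element is >= its parent (0-based indexing)."""
    for i in range(1, len(items)):
        parent = (i - 1) // 2
        if items[parent] > items[i]:
            return False
    return True

print(is_min_heap([0, 1, 3, 2, 6, 7, 4, 8, 5, 9]))  # True
print(is_min_heap([3, 1, 2]))                        # False

data = [3, 1, 2]
heapq.heapify(data)          # rearranges data in place into a valid heap
print(is_min_heap(data))     # True
```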
- The heapify function turns a list into a valid heap in place, in linear time, performing as few element moves as possible.
from heapq import heapify
from random import shuffle
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffle(list1)
print(list1)  # e.g. [2, 6, 5, 7, 4, 1, 9, 3, 0, 8] (the order varies between runs)
heapify(list1)
print(list1)  # for that order: [0, 2, 1, 3, 4, 5, 9, 6, 7, 8]
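One way to see that heapify really produces a usable heap is a minimal heap sort sketch: heapify a copy of the list, then pop until it is empty. The function name heap_sort is an assumption made for this example, not something heapq provides:

```python
from heapq import heapify, heappop

def heap_sort(values):
    """Return the values in ascending order using heapify and heappop."""
    heap = list(values)   # copy, so the caller's list is left untouched
    heapify(heap)         # build the heap in place
    return [heappop(heap) for _ in range(len(heap))]

data = [2, 6, 5, 7, 4, 1, 9, 3, 0, 8]
print(heap_sort(data))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```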
Approaches to the top-K problem:
- Sort and slice:
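a = [-5, 4, -6, 9, 8, 10]
end_3 = sorted(a)[:3]   # the three smallest values
top_3 = sorted(a)[-3:]  # the three largest values, in ascending order
# alternatively, sorted(a, reverse=True)[:3] gives the three largest in descending order
print(end_3)  # [-6, -5, 4]
print(top_3)  # [8, 9, 10]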
- Using nlargest and nsmallest from the heapq module:
from heapq import nlargest, nsmallest
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(nlargest(4, list1))  # [9, 8, 7, 6]
print(nsmallest(4, list1))  # [0, 1, 2, 3]
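Both functions also accept an optional key argument, which helps when the items are records rather than plain numbers. A short sketch, with made-up sample data:

```python
from heapq import nlargest, nsmallest

# (name, score) pairs invented for this example
scores = [("alice", 82), ("bob", 95), ("carol", 67), ("dave", 90)]

top_2 = nlargest(2, scores, key=lambda record: record[1])
bottom_1 = nsmallest(1, scores, key=lambda record: record[1])

print(top_2)     # [('bob', 95), ('dave', 90)]
print(bottom_1)  # [('carol', 67)]
```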
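- Converting the list into a heap with heapify and then taking the top K (nlargest and nsmallest accept any iterable, so the heapify step is optional here):
from heapq import heapify, nlargest, nsmallest
from random import shuffle
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffle(list1)
print(list1)  # e.g. [2, 6, 5, 7, 4, 1, 9, 3, 0, 8] (the order varies between runs)
heapify(list1)
print(nlargest(4, list1))  # [9, 8, 7, 6]
print(nsmallest(4, list1))  # [0, 1, 2, 3]
print(list1)  # for that order: [0, 2, 1, 3, 4, 5, 9, 6, 7, 8] (neither call modifies the list)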
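- Finding the top K among 10 million numbers:
First, take the first k numbers of list1;
then call heapq.heapify on that k-element list to turn it into a heap (heapq heaps are min-heaps);
next, compare each remaining number with the top of the heap: if it is larger than the top, replace the top with heapq.heapreplace(heap, x), which also restores the heap so that the top is again the minimum; otherwise skip it;
in the end, the k elements left in the heap are the top K. Only k numbers are held in the heap at any time, and the total cost is O(n log k).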
import heapq
import random
class TopkHeap(object):
    """Keep the k largest elements seen so far in a min-heap of size k."""
    def __init__(self, k):
        self.k = k
        self.data = []
    def Push(self, elem):
        if len(self.data) < self.k:
            # the heap is not full yet, so just add the element
            heapq.heappush(self.data, elem)
        elif elem > self.data[0]:
            # elem beats the smallest of the current top k, so it takes its place
            heapq.heapreplace(self.data, elem)
    def TopK(self):
        # largest first; sorting a copy leaves the heap itself intact
        return sorted(self.data, reverse=True)
if __name__ == "__main__":
    list_rand = random.sample(range(1000000), 100)
    th = TopkHeap(3)
    for i in list_rand:
        th.Push(i)
    print(th.TopK())  # e.g. [986241, 985101, 975889] (values vary between runs)
    print(sorted(list_rand, reverse=True)[0:3])  # the same three values
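The same idea can also be sketched as a single function that follows the steps literally: heapify the first k values, then call heapreplace for every later value that beats the heap top. The function name top_k and the use of itertools.islice are choices made for this sketch, not part of the original notes:

```python
import heapq
from itertools import islice

def top_k(numbers, k):
    """Return the k largest values from an iterable, largest first."""
    if k <= 0:
        return []
    iterator = iter(numbers)
    heap = list(islice(iterator, k))   # take the first k values
    heapq.heapify(heap)                # min-heap: heap[0] is the smallest of them
    for x in iterator:                 # scan the remaining values once
        if x > heap[0]:
            heapq.heapreplace(heap, x) # drop the smallest, keep x
    return sorted(heap, reverse=True)

print(top_k([-5, 4, -6, 9, 8, 10], 3))  # [10, 9, 8]
```

Because the input is consumed through an iterator, the numbers do not need to be in memory all at once; a generator or a file reader would work as well.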