Python中的堆和优先队列

分别是heapq和queue.PriorityQueue这两个模块

import heapq        

from queue import PriorityQueue as PQ

PriorityQueue模块定义如下所示:

class PriorityQueue(Queue):
    '''Variant of Queue that retrieves open entries in priority order (lowest first).

    Entries are typically tuples of the form:  (priority number, data).
    '''

    def _init(self, maxsize):
        self.queue = []

    def _qsize(self):
        return len(self.queue)

    def _put(self, item):
        heappush(self.queue, item)

    def _get(self):
        return heappop(self.queue)

他含有一个属性queue,输出队列中每个元素，三个方法，分别是qsize()，代表优先级队列中元素个数，put()，用heappush()方法将元素放入队列，get()，用heappop()方法将元素从队列取出。`注意`这个库在解答leetcode相关问题时(比如top-K问题，leetcode-215,347等)不能用。

例子如下

from queue import PriorityQueue as PQ
pq = PQ()
pq.put((-2, 'c'))
pq.put((-3, 'b'))
pq.put((-4,'a'))
pq.queue # output: [(-4, 'a'), (-2, 'c'), (-3, 'b')]
#只能是最小堆，所以按照元素从小到大（如果元素是list或者tuple，那么以第一个元素的顺序）的顺序出队
pq.get() # output: (-4, 'a')
#每当堆首元素出队后，剩下的元素又会重新排成堆
pq.get() #output: [(-3, 'b'), (-2, 'c')]
pq.get() # output: (-3,'b')
pq.get() # output: (-2,'c')
#最后结果为空
pq.queue # output:[]

heappush模块代码如下，功能：将item添加到当前最小堆最后，并再恢复成最小堆（只是把元素插到堆尾再调整，之前已经是堆了，现在只有最后一个元素可能不满足堆得要求，调整也只会沿着一条路径自底向上，时间复杂度O(lgn)）

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)


def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
    parentpos = (pos - 1) >> 1
    parent = heap[parentpos]
    if newitem < parent:
        heap[pos] = parent
        pos = parentpos
        continue
    break
heap[pos] = newitem

heappop模块代码如下，功能：当前最小堆中首元素，再将当前堆尾元素放在堆首，之后将这个列表再回复成堆。(只是把元素插到堆首再调整，之前已经是堆了，现在只有第一个元素可能不满足堆得要求，调整也只会沿着一条路径从根节点向下，时间复杂度O(lgn))

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

heapfiy模块代码如下，功能：在O(len(x))时间复杂度将列表x原地转换成最小堆

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)

这个某块可以直接使用使用，通过import heapq，所以当在leetcode中遇到top-K或类似问题需要用到优先队列时(Python)，不需要用优先队列了，直接用堆即可，不过注意python中的heapq只是最小堆，如果需要最大堆，那么在把元素入堆的时候需要加负号。并且最后输出结果的时候也要加负号，然后在逆序输出。

下面以leetcode-215和leetcode-347题目的用优先队列和堆的解法来进一步了解两者的关系。

leetcode-215

class Solution:
    def findKthLargest(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        import heapq
        #heapq中直接有这个函数
        return heapq.nlargest(k, nums)[-1]


# 用堆，时间复杂度O(N + klog(N))
class Solution:
    def findKthLargest(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: int
        """
        import heapq
        # heapify只能创建一个最小堆,所以加负号，但根据题目意思应该用最大堆
        nums = [-num for num in nums]
        heapq.heapify(nums)
        res = float('inf')
        for _ in range(k):
            res = heapq.heappop(nums)
        return -res

leetcode-347

# top-K问题常用解题思路，用优先队列，时间复杂度O(nlgk)
# 维护一个含有k个元素的优先队列。如果遍历到的元素比队列中的最小频率元素的频率高，则取出队列中最小频率
# 的元素，将新元素入队。最终，队列中剩下的，就是前k个出现频率最高的元素。
class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        freq = {}
        for num in nums:
            if num not in freq:
                freq[num] = 1
            else:
                freq[num] += 1
        # 扫描数组，维护当前出现频率最高的k个元素
        # 在优先队列中，按照频率排序，所以数据对是(频率，元素)的形式
        # python的堆只有最小堆，优先队列就是用堆实现的，所以优先队列也只有最小优先队列
        from queue import PriorityQueue as PQ
        pq = PQ()
        for v, f in freq.items():
            if pq.qsize() == k:
                (fr, va) = pq.get()
                # pq只是最小优先队列，为了实现最大优先队列，所以需要取负数
                if -f > -fr:
                    pq.put((-f, v))
                else:
                    pq.put((fr, va))
            else:
                pq.put((-f, v))
        res = []
        while pq.queue:
            res.append((pq.get()[1]))
        return res
    
    
 #上面用优先队列只是为了了解这个模块中优先队列的用法，实际上用起来也不方便，可以看到上面写的很繁琐，
 不易懂，而且在leetcode上面提交还会显示ImportError: No module named queue


class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        # 上面的优先队列就是用heapq实现的，所以这里直接使用heapq堆
        import heapq
        freq = {}
        res = []
        for num in nums:
            if num not in freq:
                freq[num] = 1
            else:
                freq[num] += 1
        # 1.我们需要按照数字出现频率进行排序，所以val在前，key在后
        # 2.我们需要最大堆，但是heapq只实现了最小堆，所以加个负号，模拟最大堆
        max_heap = [(-val, key) for key, val in freq.items()]
        heapq.heapify(max_heap)
        for i in range(k):
            res.append(heapq.heappop(max_heap)[1])
        return res
    
    
    nums = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 7]
    k = 4
    print(Solution().topKFrequent(nums, k))

参考

最小堆构建、插入、删除的过程图解
 排序算法总结-堆排序

Python中的堆和优先队列

分别是heapq和queue.PriorityQueue这两个模块

PriorityQueue模块定义如下所示:

例子如下

heappush模块代码如下，功能：将item添加到当前最小堆最后，并再恢复成最小堆（只是把元素插到堆尾再调整，之前已经是堆了，现在只有最后一个元素可能不满足堆得要求，调整也只会沿着一条路径自底向上，时间复杂度O(lgn)）

heapfiy模块代码如下，功能：在O(len(x))时间复杂度将列表x原地转换成最小堆

下面以leetcode-215和leetcode-347题目的用优先队列和堆的解法来进一步了解两者的关系。

leetcode-215

leetcode-347

参考

.Net 8.0 下的新RPC，IceRPC之试试的新玩法"打洞"

完美替代postman的软件

Vue mockjs mock.js

关于游戏付费的一点想法

我通过CKA和CKS啦！

安装chromadb注意事项

《最新出炉》系列入门篇-Python+Playwright自动化测试-42-强大的可视化追踪利器Trace Viewer

大数据怎么学？对大数据开发领域及岗位的详细解读，完整理解大数据开发领域技术体系

Python字典的常見操作

word中常見的問題

MATLAB原樣保存imshow或者畫出來的圖像（不留白邊）

Linux服務器沒有內存的原因

安裝mathtype常見問題

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結