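The Concept of a Heap
A heap is a complete binary tree. With the nodes indexed from 0 in level order, every non-leaf node i satisfies one of the following properties:
Key[i] <= Key[2i+1] && Key[i] <= Key[2i+2], or Key[i] >= Key[2i+1] && Key[i] >= Key[2i+2]
That is, the key of every non-leaf node is either no greater than, or no less than, the keys of its left and right children.
Heaps come in two kinds, max-heaps and min-heaps. A heap satisfying Key[i] >= Key[2i+1] && Key[i] >= Key[2i+2] is a max-heap; one satisfying Key[i] <= Key[2i+1] && Key[i] <= Key[2i+2] is a min-heap. It follows that the key at the top of a max-heap is the largest of all the keys, and the key at the top of a min-heap is the smallest.
Max-heaps and min-heaps give good time complexity on the top N problem over massive data sets.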
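The heap property above can be checked directly on an array. The following is a minimal sketch; IsMaxHeap and IsMinHeap are hypothetical helper names that do not appear in the original code. It relies on the fact that the parent of node i sits at index (i-1)/2, which is the inverse of the 2i+1 and 2i+2 child formulas.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if array[0..size-1] is a max-heap, else 0. */
int IsMaxHeap(const uint32_t* array, uint32_t size)
{
    uint32_t i;
    assert(array);
    for (i = 1; i < size; i++)
    {
        /* the parent of node i is node (i-1)/2 */
        if (array[(i - 1) / 2] < array[i])
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if array[0..size-1] is a min-heap, else 0. */
int IsMinHeap(const uint32_t* array, uint32_t size)
{
    uint32_t i;
    assert(array);
    for (i = 1; i < size; i++)
    {
        if (array[(i - 1) / 2] > array[i])
            return 0;
    }
    return 1;
}
```

Checking every node against its parent is equivalent to checking every non-leaf node against its children, and it avoids the bounds tests on 2i+1 and 2i+2.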
First, here is a function that swaps two elements of an array.
/* swap array[i] and array[j] */
void Swap(uint32_t* array, uint32_t i, uint32_t j)
{
    assert(array);
    uint32_t tmp = array[j];
    array[j] = array[i];
    array[i] = tmp;
}
Header files (include these before any of the code in this article)
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <string.h>
#include <stdio.h>
Max-Heap Implementation
/* max-heap adjustment: sift currentNode down until the subtree rooted at it is a max-heap */
void MaxHeapify(uint32_t* array, uint32_t heapSize, uint32_t currentNode)
{
    uint32_t leftChild = 0, rightChild = 0, largest = 0;
    leftChild = 2*currentNode + 1;
    rightChild = 2*currentNode + 2;
    /* pick the largest of the node and its children */
    if(leftChild < heapSize && array[leftChild] > array[currentNode])
        largest = leftChild;
    else
        largest = currentNode;
    if(rightChild < heapSize && array[rightChild] > array[largest])
        largest = rightChild;
    /* if a child is larger, swap it up and keep adjusting below */
    if(largest != currentNode)
    {
        Swap(array, largest, currentNode);
        MaxHeapify(array, heapSize, largest);
    }
}
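The recursion in MaxHeapify is a tail call, so the same adjustment can also be written as a loop. The following is a minimal sketch of that variant; MaxHeapifyIter is a hypothetical name, and the swap is inlined so the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical iterative variant of the max-heap adjustment. */
void MaxHeapifyIter(uint32_t* array, uint32_t heapSize, uint32_t currentNode)
{
    assert(array);
    for (;;)
    {
        uint32_t leftChild = 2*currentNode + 1;
        uint32_t rightChild = 2*currentNode + 2;
        uint32_t largest = currentNode;
        uint32_t tmp;

        /* pick the largest of the node and its children */
        if (leftChild < heapSize && array[leftChild] > array[largest])
            largest = leftChild;
        if (rightChild < heapSize && array[rightChild] > array[largest])
            largest = rightChild;

        /* the node is already in place: stop */
        if (largest == currentNode)
            break;

        /* swap the larger child up and continue from its old position */
        tmp = array[currentNode];
        array[currentNode] = array[largest];
        array[largest] = tmp;
        currentNode = largest;
    }
}
```

The loop visits the same nodes as the recursive version but uses constant stack space, which may matter for very large heaps.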
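/* build a max-heap */
void MaxHeapCreat(uint32_t* array, uint32_t heapSize)
{
    int i = 0;
    /* nodes at index heapSize/2 and above are leaves; adjust from there back to the root */
    for(i = heapSize/2; i >= 0; i--)
    {
        MaxHeapify(array, heapSize, i);
    }
}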
Min-Heap Implementation
/* min-heap adjustment: sift currentNode down until the subtree rooted at it is a min-heap */
void MinHeapify(uint32_t* array, uint32_t heapSize, uint32_t currentNode)
{
    uint32_t leftChild = 0, rightChild = 0, minimum = 0;
    leftChild = 2*currentNode + 1;
    rightChild = 2*currentNode + 2;
    /* pick the smallest of the node and its children */
    if(leftChild < heapSize && array[leftChild] < array[currentNode])
        minimum = leftChild;
    else
        minimum = currentNode;
    if(rightChild < heapSize && array[rightChild] < array[minimum])
        minimum = rightChild;
    /* if a child is smaller, swap it up and keep adjusting below */
    if(minimum != currentNode)
    {
        Swap(array, minimum, currentNode);
        MinHeapify(array, heapSize, minimum);
    }
}
/* build a min-heap */
void MinHeapCreat(uint32_t* array, uint32_t heapSize)
{
    int i = 0;
    /* nodes at index heapSize/2 and above are leaves; adjust from there back to the root */
    for(i = heapSize/2; i >= 0; i--)
    {
        MinHeapify(array, heapSize, i);
    }
}
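The Top N Problem
To find the N largest values in a large data set, use a min-heap: first build a min-heap holding N elements, then insert each of the remaining elements into it. The top of the min-heap is the smallest of the N values kept so far, so a new element replaces it only when the new element is larger. The insertion works as follows: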
/*maintain the top N numbers*/
void MinInsert(uint32_t* array, uint32_t heapSize, uint32_t elem)
{
    if(elem > array[0])
    {
        array[0] = elem;
        MinHeapify(array, heapSize, 0);
    }
}
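Put together, the whole procedure (build a heap from the first N elements, then offer every remaining element to it) might look like the sketch below. TopN and SiftDownMin are hypothetical names introduced for illustration; SiftDownMin is a loop-based equivalent of MinHeapify, repeated here so the block compiles on its own. The sketch assumes 0 < N <= count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: min-heap adjustment, same idea as MinHeapify. */
static void SiftDownMin(uint32_t* heap, uint32_t heapSize, uint32_t node)
{
    for (;;)
    {
        uint32_t left = 2*node + 1, right = 2*node + 2;
        uint32_t minimum = node, tmp;
        if (left < heapSize && heap[left] < heap[minimum])
            minimum = left;
        if (right < heapSize && heap[right] < heap[minimum])
            minimum = right;
        if (minimum == node)
            return;
        tmp = heap[node];
        heap[node] = heap[minimum];
        heap[minimum] = tmp;
        node = minimum;
    }
}

/* Hypothetical helper: leaves the N largest values of data[0..count-1]
   in heap[0..N-1], arranged as a min-heap. Assumes 0 < N <= count. */
void TopN(const uint32_t* data, uint32_t count, uint32_t* heap, uint32_t N)
{
    uint32_t i;
    assert(data && heap && N > 0 && N <= count);

    /* seed the heap with the first N elements and build it */
    for (i = 0; i < N; i++)
        heap[i] = data[i];
    for (i = N/2; i-- > 0; )
        SiftDownMin(heap, N, i);

    /* a later element enters only if it beats the smallest value kept */
    for (i = N; i < count; i++)
    {
        if (data[i] > heap[0])
        {
            heap[0] = data[i];
            SiftDownMin(heap, N, 0);
        }
    }
}
```

After the call, heap[0] is the N-th largest value of the input. Only N elements are ever held in the heap, so the data itself can be streamed rather than kept in memory all at once.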
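To find the N smallest values in a large data set, use a max-heap: first build a max-heap holding N elements, then insert each of the remaining elements into it. Here a new element replaces the top only when it is smaller than the top. The insertion works as follows: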
/*maintain the low N numbers*/
void MaxInsert(uint32_t* array, uint32_t heapSize, uint32_t elem)
{
    if(elem < array[0])
    {
        array[0] = elem;
        MaxHeapify(array, heapSize, 0);
    }
}
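Time Complexity Analysis
A single heap adjustment takes O(log N) time, so solving the top N problem with a heap takes O(n log N) time,
where n is the total number of data items and N is the number of items the heap maintains.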
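The bound can be spelled out as a short worked calculation. The build cost uses the standard result that bottom-up heap construction is linear in the heap size; the figures n = 10^9 and N = 100 are hypothetical, chosen only to compare against a full sort of all n items.

```latex
\begin{aligned}
T(n, N) &= \underbrace{O(N)}_{\text{build the initial heap}}
         + \underbrace{(n - N)\cdot O(\log N)}_{\text{at most one adjustment per remaining item}}
         = O(n \log N) \\
n = 10^{9},\ N = 100:\qquad n \log_2 N &\approx 10^{9} \times 6.64 \approx 6.6 \times 10^{9} \\
n \log_2 n &\approx 10^{9} \times 29.9 \approx 3.0 \times 10^{10}
\end{aligned}
```

Under these assumed figures the heap approach does roughly a fifth of the work of a full sort, and it only needs memory for N items.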
Test Program
int main()
{
    uint32_t i = 0, heapSize = 10;
    uint32_t array[] = {2,20,13,18,15,8,3,5,4,25};
    uint32_t minelem = 10, maxelem = 1;
    /*build min heap and test insert*/
    MinHeapCreat(array, heapSize);
    printf("Output the MinHeap:\n");
    for(i = 0; i < heapSize; i++)
    {
        /* the values are unsigned, so print them with %u */
        printf("%u\t", (unsigned)array[i]);
    }
    MinInsert(array, heapSize, minelem);
    printf("\nOutput insert elem %u:\n", (unsigned)minelem);
    for(i = 0; i < heapSize; i++)
    {
        printf("%u\t", (unsigned)array[i]);
    }
    printf("\n");
    /*build max heap and test insert*/
    MaxHeapCreat(array, heapSize);
    printf("Output the MaxHeap:\n");
    for(i = 0; i < heapSize; i++)
    {
        printf("%u\t", (unsigned)array[i]);
    }
    MaxInsert(array, heapSize, maxelem);
    printf("\nOutput insert elem %u:\n", (unsigned)maxelem);
    for(i = 0; i < heapSize; i++)
    {
        printf("%u\t", (unsigned)array[i]);
    }
    printf("\n");
    return 0;
}