"The Beauty of Programming" (《編程之美》) Study Notes: 2.5 Finding the Largest K Numbers

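I. Problem

Given a large collection of unordered numbers, all assumed to be distinct, find the largest K of them.

Problem analysis:

Input: N numbers, and K.

Output: the largest K of the N numbers. These K numbers do not have to be in sorted order; they only have to be the K largest values in the array.

Constraint: the N numbers are all distinct.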

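II. Solutions

Solution 1: sorting-based algorithms

A. Sort the whole array:

Sort the array with quick sort or heap sort (O(n*lgn)), then read off the largest K numbers (O(k)).

Time complexity: O(n*lgn) + O(k) = O(n*lgn) (since k < n).

Drawback: the algorithm sorts not only the largest K numbers but also the other N-K numbers. The methods below avoid this.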
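
As a minimal sketch of method A, the following uses the standard library `qsort` with a descending comparator. The names `compare_desc` and `top_k_full_sort` are made up for this illustration, and `qsort` is not guaranteed to be quick sort internally, so this only demonstrates the "sort everything, then take the first K" idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort: orders ints from largest to smallest. */
static int compare_desc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    /* yields -1, 0 or 1 without the overflow risk of y - x */
    return (x < y) - (x > y);
}

/* Sorts the whole array in descending order. Afterwards the largest
 * values are data[0..result-1], where result = min(k, n). */
size_t top_k_full_sort(int *data, size_t n, size_t k)
{
    assert(data != NULL || n == 0);
    if (n > 0)
        qsort(data, n, sizeof data[0], compare_desc);
    return k < n ? k : n;
}
```

Calling `top_k_full_sort(a, n, k)` leaves the K largest values at the front of `a`, already in descending order, which is more work than the problem requires.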

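B. Sort only K numbers of the array:

This can be optimized further: since we only need the largest K numbers, sorting the other n-k numbers is unnecessary. We can therefore modify the sort or switch to another one, such as selection sort or exchange (bubble) sort, and run only the passes that place the largest K numbers, which cuts the work spent on the remaining n-k numbers.

Time complexity: O(n*k). When k < lgn, this algorithm beats quick sort.

Drawback: the algorithm still sorts the largest K numbers among themselves. The methods below avoid this.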
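
A rough sketch of method B, assuming selection sort is the chosen variant: only the first K passes are run, each pass moving the largest remaining value to the front. The function name `top_k_partial_selection` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Runs only the first k passes of a descending selection sort.
 * Afterwards data[0..result-1] hold the largest values in descending
 * order, where result = min(k, n). Cost is O(n*k) comparisons. */
size_t top_k_partial_selection(int *data, size_t n, size_t k)
{
    assert(data != NULL || n == 0);
    if (k > n)
        k = n;
    for (size_t i = 0; i < k; i++) {
        /* find the largest value in the unsorted tail data[i..n-1] */
        size_t max_index = i;
        for (size_t j = i + 1; j < n; j++) {
            if (data[j] > data[max_index])
                max_index = j;
        }
        /* move it to position i */
        int temp = data[i];
        data[i] = data[max_index];
        data[max_index] = temp;
    }
    return k;
}
```

The remaining n-k values stay in whatever order the swaps left them, which is exactly the saving over a full sort.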

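C. Group the array by partitioning: (based on the quick sort idea)

Optimizing further: if we can avoid sorting the largest K numbers and also avoid sorting the remaining n-k numbers, the sorting-based approach is at its best. All we actually need is the boundary position that separates the K numbers from the other N-K numbers, with the K numbers on one side of that boundary and the n-K numbers on the other. Quick sort can be adapted to do this.

Time complexity: the book gives the average time complexity of this algorithm as O(n*lgk). The standard analysis of partition-based selection (quickselect) gives expected O(n) time when the pivot is chosen at random, and O(n^2) in the worst case.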

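Optimizations:

1. The recursion can be wrapped in an outer function that handles invalid arguments: return -1 when k <= 0, and return the last index of the array when k >= N.

2. The partition step in the code below uses the last value of the array as the pivot to split the array into two parts. If the pivot is chosen at random instead, the algorithm runs in expected linear time, O(n).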
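
The two optimizations above can be sketched together as follows. This is only an illustration under stated assumptions: `partition_random` and `top_k_select` are names invented here, the recursion is replaced by a loop, and `rand() % range` is used for brevity even though it is slightly biased and `rand` is left unseeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Descending partition of data[begin..end] around a randomly chosen
 * pivot: values >= pivot end up before it, smaller values after it.
 * Returns the final index of the pivot. */
static int partition_random(int *data, int begin, int end)
{
    assert(begin <= end);
    int pick = begin + rand() % (end - begin + 1);
    int pivot = data[pick];
    /* park the pivot at the end, then partition as in the listing below */
    data[pick] = data[end];
    data[end] = pivot;
    int i = begin - 1;
    for (int j = begin; j < end; j++) {
        if (data[j] >= pivot) {
            i++;
            int temp = data[j];
            data[j] = data[i];
            data[i] = temp;
        }
    }
    data[end] = data[++i];
    data[i] = pivot;
    return i;
}

/* Wrapper: returns -1 for invalid input (k <= 0, n <= 0, NULL data),
 * n - 1 when k >= n, and otherwise the index of the last element of the
 * block data[0..index] that holds the k largest values (unsorted). */
int top_k_select(int *data, int n, int k)
{
    if (data == NULL || n <= 0 || k <= 0)
        return -1;
    if (k >= n)
        return n - 1;
    int begin = 0, end = n - 1;
    while (begin < end) {
        int p = partition_random(data, begin, end);
        int left_count = p - begin + 1;
        if (left_count < k) {
            /* keep the whole left part, look for the rest on the right */
            k -= left_count;
            begin = p + 1;
        } else if (left_count > k) {
            /* the answer lies entirely inside the left part */
            end = p - 1;
        } else {
            return p;
        }
    }
    return begin;
}
```

For valid input with 0 < k < n the returned index is always k - 1, so the caller can read the result from `data[0]` through `data[k - 1]`.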

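Drawbacks:

Of the three methods above, the last one, which derives a grouping step from quick sort to obtain the largest K numbers, is the most efficient. All three share one drawback: the array must hold the entire data set in memory at once, so when the data set is large and the machine does not have enough memory, these methods cannot be used.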

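Algorithm steps: (quoted from "The Beauty of Programming")

In this problem, assume the N numbers are stored in an array S. We pick a random element X from S and split the array into two parts, Sa and Sb.

The elements of Sa are greater than or equal to X, and the elements of Sb are less than X. There are then two possibilities:

1. Sa has fewer than K elements: all the numbers in Sa, together with the largest K-|Sa| elements of Sb (|Sa| is the number of elements in Sa), are the largest K numbers of S.

2. Sa has K or more elements: return the largest K elements of Sa.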

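C implementation:

1. The data is ordered from largest to smallest so that the largest K numbers end up at the front of the array, and the quick sort partition step is written for that ordering (the alternative is smallest to largest).

3. The key point is finding the termination condition of the recursion, which differs from plain quick sort.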

/**
 * @file find_large_numbers.c
 * @brief find largest k numbers in a given array.
 * @author chenxilinsidney
 * @version 1.0
 * @date 2015-02-04
 */

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

/**
 * @brief select the last element as the pivot.
 * Reorder the array so that all elements with values not less than the pivot
 * come before the pivot, while all elements with values less than the pivot
 * come after it (descending partition). After this partitioning, the
 * pivot is in its final position.
 *
 * @param[in,out]  array          input and output array
 * @param[in]      index_begin    the begin index of the array(included)
 * @param[in]      index_end      the end index of the array(included)
 *
 * @return the position of the pivot(index from the array)
 */
TYPE partition(TYPE* array, TYPE index_begin, TYPE index_end)
{
    /// pick last element of the array as the pivot
    TYPE pivot = array[index_end];
    /// last index of the region holding elements not less than the pivot
    TYPE i = index_begin - 1;
    TYPE j, temp;
    /// check the array's elements one by one
    for (j = index_begin; j < index_end; j++) {
        if (array[j] >= pivot) {
            /// save the elements not less than pivot to left index of i.
            i++;
            temp = array[j];
            array[j] = array[i];
            array[i] = temp;
        }
    }
    /// set the pivot to the right position
    array[index_end] = array[++i];
    array[i] = pivot;
    /// return the position of the pivot
    return i;
}

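/**
 * @brief find the last index of the block holding the largest N numbers.
 * After the call, the elements from index_begin up to and including the
 * returned index are the largest N numbers (in no particular order).
 * The caller must make sure the range holds at least count elements.
 *
 * @param[in,out]  array          input and output array
 * @param[in]      index_begin    the begin index of the array(included)
 * @param[in]      index_end      the end index of the array(included)
 * @param[in]      count          the count of the largest numbers.
 *
 * @return the index of the last element of the largest-N block
 */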
TYPE find_quick_sort(TYPE* array, TYPE index_begin, TYPE index_end, TYPE count)
{
    assert(count > 0);
    if (index_begin < index_end) {
        /// get pivot by partition
        TYPE index_pivot = partition(array, index_begin, index_end);
        TYPE first_array_count = index_pivot - index_begin + 1;
        DEBUG_PRINT_VALUE("%d", count);
        DEBUG_PRINT_VALUE("%d", index_begin);
        DEBUG_PRINT_VALUE("%d", index_end);
        DEBUG_PRINT_VALUE("%d", index_pivot);
        if (first_array_count < count)
            /// find the other numbers in second part of the array
            return find_quick_sort(array, index_pivot + 1, index_end,
                    count - first_array_count);
        else if (first_array_count > count)
            /// still find N numbers in first part of the array
            return find_quick_sort(array, index_begin, index_pivot - 1,
                    count);
        else
            /// just find N numbers
            return index_pivot;
    } else {
        return index_begin;
    }
}

int main(void) {
    /// read data to array
    TYPE count = 0;
    /// "%d" matches the int element type ("%u" expects unsigned int *)
    while (count < MAX_COUNT && scanf("%d", array + count) == 1) {
        ++count;
    }
    /// find largest N numbers
    TYPE N = 15;
    TYPE i, last_index;
    /// the search is only valid when at least N numbers were read
    if (count >= N) {
        last_index = find_quick_sort(array, 0, count - 1, N);
        printf("get the largest %d numbers:\n", N);
        DEBUG_PRINT_VALUE("%d", last_index);
        for (i = 0; i <= last_index; i++) {
            printf("%d ", array[i]);
        }
        printf("\n");
    } else {
        printf("can not get the largest N numbers.\n");
    }
    return EXIT_SUCCESS;
}

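D. Min-heap of size K: (based on the heap sort idea)

Maintain a min-heap of size K. At any moment the top of the heap is the smallest of the K elements it holds, that is, the K-th largest element seen so far. After one pass over all the data, the K elements in the heap are the largest K numbers of the sequence.

Time complexity: O(n*lgk), the same O(n*lgk) figure quoted for the previous method, so efficiency is unchanged. The space complexity, however, is O(k), so the whole sequence does not have to be held in memory: a single pass over the sequence is enough, and the data can be loaded into memory in segments. This overcomes the drawback of the previous methods.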

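Algorithm steps:

1. While traversing the sequence, build a min-heap of size K from the first K numbers of the array;

2. Compare every remaining element of the array with the top of the min-heap: if the element is smaller than the top, ignore it and move on to the next element; if the element is larger than the top, replace the top with this element and adjust the heap so that it satisfies the heap property again;

3. When the traversal ends, the K elements in the min-heap are the answer.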
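
The listing later in this section relies on a `heap.h` header that is not shown in this post, so here is a self-contained sketch of the three steps. It is an assumption-laden illustration rather than the author's implementation: `sift_down` and `top_k_min_heap` are invented names, the heap is 0-based, and the first K slots of the array double as the heap storage.

```c
#include <assert.h>
#include <stddef.h>

/* Restores the min-heap property for the subtree rooted at index root
 * of a 0-based heap stored in heap[0..size-1]. */
static void sift_down(int *heap, size_t size, size_t root)
{
    for (;;) {
        size_t smallest = root;
        size_t left = 2 * root + 1;
        size_t right = left + 1;
        if (left < size && heap[left] < heap[smallest])
            smallest = left;
        if (right < size && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == root)
            return;
        int temp = heap[root];
        heap[root] = heap[smallest];
        heap[smallest] = temp;
        root = smallest;
    }
}

/* Rearranges data so that data[0..result-1] hold the largest values,
 * organized as a min-heap, where result = min(k, n). O(n*lgk) time. */
size_t top_k_min_heap(int *data, size_t n, size_t k)
{
    assert(data != NULL || n == 0);
    if (k == 0 || n == 0)
        return 0;
    if (k >= n)
        return n;
    /* step 1: build a min-heap from the first k values */
    for (size_t i = k / 2; i-- > 0; )
        sift_down(data, k, i);
    /* step 2: a larger value evicts the current heap top */
    for (size_t i = k; i < n; i++) {
        if (data[i] > data[0]) {
            int temp = data[0];
            data[0] = data[i];
            data[i] = temp;
            sift_down(data, k, 0);
        }
    }
    /* step 3: the heap now holds the k largest values */
    return k;
}
```

For data that does not fit in memory, the same loop would keep the heap in a separate K-element buffer and feed it values as they are read, which is where the O(k) space bound comes from.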

C implementation:

/**
 * @file find_large_numbers_heap.c
 * @brief find largest k numbers in a given array.
 * @author chenxilinsidney
 * @version 1.0
 * @date 2015-02-04
 */

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

#include "heap.h"
// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

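/**
 * @brief move the largest N numbers to the front of the input array.
 * After the call, the first count elements of the array are the
 * largest N numbers, organized as a min-heap.
 * Note: heap.h is not shown in this post; passing array - 1 below suggests
 * a 1-based heap laid directly over the array (an assumption).
 *
 * @param[in,out]  array          input and output array
 * @param[in]      array_length   array length
 * @param[in]      count          the count of the largest numbers.
 */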
void find_heap_sort(TYPE* array, TYPE array_length, TYPE count)
{
    assert(array != NULL && array_length > 0 && count > 0);
    /// array need not to adjust if count larger than or equal to array length
    if (count >= array_length)
        return;
    /// build heap with first count of the array
    Heap heap;
    BuildHeap(&heap, count, array - 1, count);
    /// adjust heap with other elements
    TYPE i, element;
    for (i = count; i < array_length; i++) {
        HeapGet(&heap, &element);
        if (array[i] > element) {
            heap.data[1] = array[i];
            Heapify(heap.data, 1, count);
        }
    }
}

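int main(void) {
    /// read data to array
    TYPE count = 0;
    /// "%d" matches the int element type ("%u" expects unsigned int *)
    while (count < MAX_COUNT && scanf("%d", array + count) == 1) {
        ++count;
    }
    if (count == 0) {
        printf("can not get the largest N numbers.\n");
        return EXIT_FAILURE;
    }
    /// find largest N numbers
    TYPE N = 15;
    TYPE i;
    find_heap_sort(array, count, N);
    /// never print more numbers than were actually read
    TYPE result_count = N < count ? N : count;
    printf("get the largest %d numbers:\n", result_count);
    for (i = 0; i < result_count; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");
    return EXIT_SUCCESS;
}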
