Problem

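A search engine records every query string that users submit in its log files; each query string is 1 to 255 bytes long.
Suppose there are currently ten million records. The query strings repeat heavily: although the total is 10 million, no more than 3 million distinct strings remain after removing duplicates. The more often a query string repeats, the more users searched for it, and the more popular it is. Find the 10 most popular query strings, using no more than 1 GB of memory.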

Approach

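Step 1: Count how often each string occurs with a hash map (called unordered_map in the STL).

Step 2: Use a min-heap of capacity K to select the K strings with the highest counts
(see http://blog.csdn.net/fuyufjh/article/details/48369801).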
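
A rough back-of-the-envelope estimate suggests why the hash map in Step 1 can fit in the 1 GB budget. It takes the worst case from the problem statement (3 million distinct strings, each at the 255-byte maximum) and assumes a 4-byte counter per string. It deliberately ignores per-node and per-string container overhead, which depends on the standard library implementation, so treat it as a sanity check rather than a guarantee:

```latex
\begin{aligned}
\text{string bytes} &\le 3\times10^{6} \times 255\ \text{B} = 7.65\times10^{8}\ \text{B} \approx 730\ \text{MiB} \\
\text{counter bytes} &\approx 3\times10^{6} \times 4\ \text{B} = 1.2\times10^{7}\ \text{B} \approx 11\ \text{MiB} \\
\text{total} &\approx 741\ \text{MiB} < 1\ \text{GiB}
\end{aligned}
```

With N = 10^7 records, M <= 3 x 10^6 distinct strings and K = 10, counting takes O(N) expected time and the heap pass takes O(M log K).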

Code

#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>
#include <unordered_map>
#include <utility>
using namespace std;

typedef pair<string, int> Record;

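// Compares by count with '>' so that priority_queue keeps the smallest
// count on top, i.e. it behaves as a min-heap.
struct RecordComparer {
    bool operator() (const Record &r1, const Record &r2) const {
        return r1.second > r2.second;
    }
};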

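// Returns the k most frequent strings with their counts, in ascending order of count.
vector<Record> TopKNumbers(const vector<string> &input, int k) {
    vector<Record> result;
    if (k <= 0) return result;  // avoids calling top() on an empty heap below

    // Step 1: count the occurrences of each distinct string.
    unordered_map<string, int> stat;
    for (const string &s : input) stat[s]++;

    // Step 2: seed a min-heap with the first k distinct entries.
    priority_queue<Record, vector<Record>, RecordComparer> heap;
    auto iter = stat.begin();
    for (int i = 0; i < k && iter != stat.end(); i++, iter++) {
        heap.push(*iter);
    }
    // Replace the current minimum whenever a more frequent entry appears.
    for (; iter != stat.end(); iter++) {
        if (iter->second > heap.top().second) {
            heap.pop();
            heap.push(*iter);
        }
    }
    // Drain the heap: the least frequent of the top k comes out first.
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}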

/********  Test code  *********/
int main() {
    vector<string> test;
    char buf[20];
    // Generate 100 random strings drawn from STR0..STR19.
    for (int i = 0; i < 100; i++) {
        int x = rand() % 20;
        snprintf(buf, sizeof(buf), "STR%d", x);
        test.push_back(string(buf));
    }
    auto result = TopKNumbers(test, 5);
    // The result is in ascending order of count, so print it in reverse.
    for (auto it = result.rbegin(); it != result.rend(); it++) {
        cout << it->first << '\t' << it->second << endl;
    }
    cout << "============================" << endl;
    // Print the sorted input so the counts can be checked by eye.
    sort(test.begin(), test.end());
    for (const string &s : test) {
        cout << s << endl;
    }
    return 0;
}

Ref

http://blog.csdn.net/v_JULY_v/article/details/6403777
