Programming Pearls, Column 1

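problem:

Stating the problem in computational terms

input: a file containing at most n = 10,000,000 positive integers; every integer is less than n, and no integer appears more than once

output: the integers as a list sorted in ascending order

constraints: at most 1 MB of memory; disk space can be treated as unlimited; the running time must be on the order of seconds, not minutes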


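solution:

The solution is simple: represent the integers with a bitmap (also called a bit vector). If the number i appears in the file, set the i-th bit of the bitmap to 1. After one pass over the input, the set bits mark every number that appears.

One constraint is essential here: no number appears more than once. A single bit can only record presence, not a count.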
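A quick back-of-the-envelope check shows why a bitmap suits the memory constraint where a plain integer array does not. The sketch below is plain arithmetic under two assumptions that the post does not state: the integers would otherwise be stored as 32-bit values, and the helper names bitmap_bytes and int_array_bytes are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for one bit per possible value in [0, n), rounded up.
constexpr std::size_t bitmap_bytes(std::size_t n) {
    return (n + 7) / 8;
}

// Bytes needed to hold up to n values as 32-bit integers (assumed width).
constexpr std::size_t int_array_bytes(std::size_t n) {
    return n * sizeof(std::int32_t);
}

static_assert(bitmap_bytes(10000000) == 1250000, "bitmap: 1.25 million bytes");
static_assert(int_array_bytes(10000000) == 40000000, "array: 40 million bytes");
```

The bitmap needs 1,250,000 bytes against 40,000,000 for the array. Note that 1,250,000 bytes is slightly more than 2^20 = 1,048,576 bytes, so for a single-pass bitmap the 1 MB limit has to be read as approximate.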


pseudocode:

/* phase 1: initialize set to empty */
for i = [0,n)
    bit[i] = 0

/* phase 2: insert present elements into the set */
for each i in the input file
    bit[i] = 1

/* phase 3: write sorted output */
for i = [0,n)
    if bit[i] == 1
        write i on the output file
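The three phases can be sketched in C++ as follows. This is a minimal illustration, not the post's own code: the function name bitmap_sort is hypothetical, the input is taken from a vector rather than a file, and std::vector<bool> stands in for the bitmap (implementations usually pack it to one bit per element).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sort distinct integers in [0, n) using one flag per possible value.
// bitmap_sort is an illustrative name, not something defined in the post.
std::vector<int> bitmap_sort(const std::vector<int>& input, int n) {
    assert(n >= 0);

    // Phase 1: initialize the set to empty (all bits 0).
    std::vector<bool> bit(static_cast<std::size_t>(n), false);

    // Phase 2: insert the elements that are present.
    for (int i : input) {
        assert(i >= 0 && i < n);  // precondition from the problem statement
        bit[static_cast<std::size_t>(i)] = true;
    }

    // Phase 3: emit every index whose bit is set, in ascending order.
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int i = 0; i < n; ++i) {
        if (bit[static_cast<std::size_t>(i)]) {
            sorted.push_back(i);
        }
    }
    return sorted;
}
```

Calling bitmap_sort({5, 1, 9, 3}, 10) yields 1, 3, 5, 9. The running time is proportional to n plus the input size, independent of the order of the input.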


Exercises

1. If there were no memory limit, how would you write the code?

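#include <iostream>
#include <set>

int main() {
    // std::set keeps its elements unique and in ascending order.
    std::set<int> integerSet;

    // Read integers from standard input until EOF or a parse failure.
    int i;
    while (std::cin >> i) {
        integerSet.insert(i);
    }

    // Iterating over a set visits the elements in sorted order.
    std::set<int>::iterator iter;
    for (iter = integerSet.begin(); iter != integerSet.end(); ++iter) {
        std::cout << *iter << " ";
    }
    std::cout << std::endl;

    return 0;
}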

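Why choose set rather than list? The answer depends on which structure suits this problem better, that is, which one costs less, and that comes down to how list and set are implemented.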

Set

Sets are a kind of associative container that stores unique elements, and in which the elements themselves are the keys.

Associative containers are containers especially designed to be efficient accessing its elements by their key (unlike sequence containers, which are more efficient accessing elements by their relative or absolute position).

Internally, the elements in a set are always sorted from lower to higher following a specific strict weak ordering criterion set on container construction.

Sets are typically implemented as binary search trees.

Therefore, the main characteristics of set as an associative container are:

  • Unique element values: no two elements in the set can compare equal to each other. For a similar associative container allowing for multiple equivalent elements, see multiset.
  • The element value is the key itself. For a similar associative container where elements are accessed using a key, but map to a value different than this key, see map.
  • Elements follow a strict weak ordering at all times. Unordered associative arrays, like unordered_set, are available in implementations following TR1.

This container class supports bidirectional iterators.

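To save time I copied the above directly from the C++ Library Reference. As it shows, set does not allow duplicate keys, keeps its elements in sorted order, and uses a binary search tree for lookups.

Now let's look at list: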

List

Lists are a kind of sequence containers. As such, their elements are ordered following a linear sequence.

List containers are implemented as doubly-linked lists; Doubly linked lists can store each of the elements they contain in different and unrelated storage locations. The ordering is kept by the association to each element of a link to the element preceding it and a link to the element following it.

This provides the following advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).

Compared to other base standard sequence containers (vectors and deques), lists perform generally better in inserting, extracting and moving elements in any position within the container, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

The main drawback of lists compared to these other sequence containers is that they lack direct access to the elements by their position; for example, to access the sixth element in a list one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).

Storage is handled automatically by the class, allowing lists to be expanded and contracted as needed.

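list is implemented as a doubly linked list, which pays off when elements are inserted and removed frequently, but that is beside the point for this problem. Note also that list is ordered only in the sense of a linear sequence: it keeps elements in insertion order, not sorted order, so it would still need an explicit sort.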
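To make the difference from set concrete, here is a minimal sketch of sorting through std::list; sort_with_list is a hypothetical helper name. Unlike std::set, the list has to be sorted explicitly, and it needs the member function list::sort because std::sort requires random-access iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical helper: sort the values by way of a std::list.
std::vector<int> sort_with_list(const std::vector<int>& input) {
    // Construction copies the values in input order; nothing is sorted yet.
    std::list<int> values(input.begin(), input.end());

    // Explicit sort step that std::set would not need.
    values.sort();
    assert(std::is_sorted(values.begin(), values.end()));

    return std::vector<int>(values.begin(), values.end());
}
```

Each list node also carries two link pointers, so for ten million small integers the per-element overhead is significant compared with the payload.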

Vector

Vectors are a kind of sequence containers. As such, their elements are ordered following a strict linear sequence.

Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.

But unlike regular arrays, storage in vectors is handled automatically, allowing it to be expanded and contracted as needed.

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time).

Compared to arrays, they provide almost the same performance for these tasks, plus they have the ability to be easily resized. Although, they usually consume more memory than arrays when their capacity is handled automatically (this is in order to accommodate extra storage space for future growth).

The main strength of vector is its dynamically managed storage: a contiguous array that grows as needed.
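For comparison with the set-based program, a vector can also solve the unlimited-memory version of the exercise by collecting the values and sorting once at the end. This is a sketch under the assumption that all values fit in memory; sort_with_vector is a hypothetical name, and the post itself does not use this approach.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sort a copy of the values with std::sort.
std::vector<int> sort_with_vector(std::vector<int> values) {
    std::sort(values.begin(), values.end());

    // The problem statement promises distinct inputs; after sorting,
    // any duplicates would sit next to each other.
    assert(std::adjacent_find(values.begin(), values.end()) == values.end());

    return values;
}
```

The vector stores the integers contiguously with no per-element pointers, whereas set and list allocate a separate node per element.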

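2. Implement the bitset operations using bitwise operators

First, the bitset will be built on an integer array. Because int can have different widths on different machines, I use a fixed-width 32-bit type such as int32_t for portability.

Second, create the array of 32-bit words. How large should it be? About n/32 words. Integer division rounds down, so add 1 to make sure the last few bits have a word to live in.

Third, the set operation. First locate the array index: since indices start at 0, it is simply i/32. Then, to set bit i, OR the existing word with a mask. Which mask? Not i%32 itself, but a 1 shifted left by i%32 positions, that is, 1 << (i % 32).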
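The index and mask arithmetic can be checked with a small worked example. The function names word_index, bit_mask and words_needed are made up for this sketch, and an unsigned 32-bit word is assumed so that shifting into the top bit is well defined.

```cpp
#include <cstddef>
#include <cstdint>

// Which 32-bit word holds bit i.
constexpr std::size_t word_index(std::uint32_t i) {
    return i / 32;  // same as i >> 5
}

// Mask selecting bit i inside its word.
constexpr std::uint32_t bit_mask(std::uint32_t i) {
    return std::uint32_t{1} << (i % 32);  // i % 32 is the same as i & 0x1F
}

// Exact number of words for n bits, rounding up.
constexpr std::size_t words_needed(std::uint32_t n) {
    return (n + 31) / 32;
}

// Bit 37 lives in word 1 at position 5, so its mask is 1 << 5 = 32.
static_assert(word_index(37) == 1, "37 / 32 == 1");
static_assert(bit_mask(37) == 32u, "1 << (37 % 32) == 32");
static_assert(words_needed(10000000) == 312500, "10,000,000 bits");
```

The simpler n/32 + 1 is never too small; it is at most one word larger than the exact round-up (n + 31) / 32.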

With that, here is the bitset code together with a small test:

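#include <cstdint>
#include <iostream>

#define MAX_LENGTH 10000000
#define SHIFT 5      // log2(32): i >> SHIFT equals i / 32
#define MASK 0x1F    // i & MASK equals i % 32

// One bit per value in [0, MAX_LENGTH). The parentheses matter because +
// binds tighter than >>. Unsigned words avoid shifting into a sign bit.
std::uint32_t integerArray[1 + (MAX_LENGTH >> SHIFT)];

void set(std::int32_t i) {
    integerArray[i >> SHIFT] |= (UINT32_C(1) << (i & MASK));
}

void clear(std::int32_t i) {
    integerArray[i >> SHIFT] &= ~(UINT32_C(1) << (i & MASK));
}

bool test(std::int32_t i) {
    return (integerArray[i >> SHIFT] & (UINT32_C(1) << (i & MASK))) != 0;
}

int main() {
    // Phase 1: clear every bit. A global array is already zero-initialized;
    // the loop is kept to mirror the pseudocode.
    for (std::int32_t i = 0; i < MAX_LENGTH; i++) {
        clear(i);
    }
    std::int32_t i;
    while (std::cin >> i) {
        if (i < 0 || i >= MAX_LENGTH) {
            continue;  // skip values outside [0, MAX_LENGTH)
        }
        set(i);
        if (test(i)) {
            std::cout << i << " is set" << std::endl;
        }
    }

    return 0;
}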

That's all for today; I'll continue when I get the chance...

