Princeton Algorithms, Part 1, Week 1: Analysis of Algorithms

This lecture covers how to predict the performance of an algorithm and how to compare different algorithms.

1. Observations

Example: 3-SUM
Given N distinct integers, how many triples sum to exactly zero?

% more 8ints.txt
8
30 -40 -20 -10 40 0 10 5
% java ThreeSum 8ints.txt
4

The following triples sum to zero:

30 -40 10
30 -20 -10
-40 40 0
-10 0 10

1.1 3-SUM: brute-force algorithm

public class ThreeSum
{
    public static int count(int[] a)
    {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i++)
            for (int j = i+1; j < N; j++)
                for (int k = j+1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args)
    {
        int[] a = In.readInts(args[0]);
        StdOut.println(count(a));
    }
}

1.2 Measuring the running time

public static void main(String[] args)
{
    int[] a = In.readInts(args[0]);
    Stopwatch stopwatch = new Stopwatch();    // start timing
    StdOut.println(ThreeSum.count(a));
    double time = stopwatch.elapsedTime();    // elapsed time in seconds
    StdOut.println("elapsed time " + time);
}
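Stopwatch is provided by the course library. To show what such a timer involves, here is a minimal sketch, assuming wall-clock time from System.currentTimeMillis is good enough; the class name SimpleStopwatch is hypothetical and is not the library's class.

```java
// Hypothetical stand-in for a stopwatch class; not the course library's implementation.
class SimpleStopwatch {
    // Wall-clock time (milliseconds) at which this stopwatch was created.
    private final long start = System.currentTimeMillis();

    // Returns the elapsed wall-clock time in seconds since creation.
    public double elapsedTime() {
        long now = System.currentTimeMillis();
        return (now - start) / 1000.0;
    }

    public static void main(String[] args) {
        SimpleStopwatch stopwatch = new SimpleStopwatch();
        long sum = 0;
        for (int i = 0; i < 100_000_000; i++)
            sum += i;                          // some work to time
        System.out.println("sum = " + sum);
        System.out.println("elapsed time " + stopwatch.elapsedTime() + " seconds");
    }
}
```

Wall-clock time also includes whatever else the machine is doing, so repeated runs will vary slightly.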

1.3 Empirical analysis: record the running time for different input sizes

N         time (seconds)
250       0.0
500       0.0
1,000     0.1
2,000     0.8
4,000     6.4
8,000     51.1
16,000    ?

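[Figure: running time versus input size]

Fitting a power law $T(N) = aN^b$ to the measurements (a straight line on a log-log plot) gives $T(N) = 1.006 \times 10^{-10} \times N^{2.999}$.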
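A power-law fit of this kind can be sketched as a least-squares line through the points (lg N, lg T). The sketch below is an illustration only: it assumes the four nonzero timings from the table in 1.3 as input, and the class and method names are hypothetical. Because the table values are rounded, the result is only approximately the fit quoted above.

```java
// Hypothetical helper: fits T(N) = a * N^b by linear regression on a log-log scale.
class PowerLawFit {
    // Returns {a, b} such that lg T = b * lg N + lg a is the least-squares line.
    static double[] fit(double[] n, double[] t) {
        int m = n.length;
        double[] x = new double[m];
        double[] y = new double[m];
        double xMean = 0.0, yMean = 0.0;
        for (int i = 0; i < m; i++) {
            x[i] = Math.log(n[i]) / Math.log(2);   // lg N
            y[i] = Math.log(t[i]) / Math.log(2);   // lg T
            xMean += x[i] / m;
            yMean += y[i] / m;
        }
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < m; i++) {
            sxy += (x[i] - xMean) * (y[i] - yMean);
            sxx += (x[i] - xMean) * (x[i] - xMean);
        }
        double b = sxy / sxx;                      // slope = exponent
        double c = yMean - b * xMean;              // intercept = lg a
        return new double[] { Math.pow(2, c), b };
    }

    public static void main(String[] args) {
        // Timings with a measured value of 0.0 are skipped: lg 0 is undefined.
        double[] n = { 1000, 2000, 4000, 8000 };
        double[] t = { 0.1, 0.8, 6.4, 51.1 };
        double[] ab = fit(n, t);
        System.out.println("a = " + ab[0] + ", b = " + ab[1]);
    }
}
```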

1.4 Doubling hypothesis: a quick way to estimate the exponent b

N        time (seconds)   ratio   lg ratio
250      0.0
500      0.0              4.8     2.3
1,000    0.1              6.9     2.8
2,000    0.8              7.7     2.9
4,000    6.4              8.0     3.0
8,000    51.1             8.0     3.0

$$\frac{T(2N)}{T(N)} = \frac{a(2N)^b}{aN^b} = 2^b$$

$$b = \lg \frac{T(2N)}{T(N)}$$

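Once b is known, substitute it into $T(N) = aN^b$ and solve for a.
Note, however, that this method cannot identify logarithmic factors in the order of growth.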
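The two steps above (estimate b from a doubling ratio, then solve for a) can be sketched as follows. The class and method names are hypothetical, and the inputs are the last two rows of the table, whose times are rounded, so the results are approximate.

```java
// Hypothetical helper illustrating the doubling hypothesis.
class DoublingEstimate {
    // b = lg( T(2N) / T(N) )
    static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    // Solves T(N) = a * N^b for a, given one measurement and the exponent b.
    static double constant(double n, double timeN, double b) {
        return timeN / Math.pow(n, b);
    }

    public static void main(String[] args) {
        double b = exponent(6.4, 51.1);          // T(4000) and T(8000) from the table
        double a = constant(8000, 51.1, b);
        System.out.println("b is about " + b);
        System.out.println("a is about " + a);
    }
}
```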

2. Mathematical models

Total running time = sum of cost × frequency for all operations.
・Need to analyze program to determine set of operations.
・Cost depends on machine, compiler.
・Frequency depends on algorithm, input data.

2.1 Example: 1-SUM

How many instructions as a function of input size N?

int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0)
        count++;

operation              frequency
variable declaration   2
assignment statement   2
less than compare      N + 1
equal to compare       N
array access           N
increment              N to 2N

2.2 Example: 2-SUM

How many instructions as a function of input size N?

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
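            count++;

operation              frequency
variable declaration   $N + 2$
assignment statement   $N + 2$
less than compare      $(N + 1) + (N + (N-1) + \cdots + 1) = (N + 1) + \frac{N(N+1)}{2} = \frac{(N+1)(N+2)}{2}$
equal to compare       $(N-1) + (N-2) + \cdots + 1 = \frac{N(N-1)}{2}$
array access           $2\,((N-1) + (N-2) + \cdots + 1) = N(N-1)$
increment              $N + \frac{N(N-1)}{2}$ to $N + N(N-1)$

The variable j is declared and initialized once per iteration of the outer loop, which is why declarations and assignments grow with N. The increment count includes the N increments of i; without that lower-order term it is $\frac{1}{2}N(N-1)$ to $N(N-1)$.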
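The closed forms for array accesses and equality compares can be checked empirically by instrumenting the loop. This is a sketch under stated assumptions: the counters and the class name are hypothetical additions, and the input is simply 1..N so that no pair sums to zero.

```java
// Hypothetical instrumented version of the 2-SUM loop, counting selected operations.
class TwoSumCounts {
    static long arrayAccesses;   // number of a[...] reads
    static long equalCompares;   // number of == 0 tests

    static int count(int[] a) {
        arrayAccesses = 0;
        equalCompares = 0;
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                arrayAccesses += 2;          // a[i] and a[j]
                equalCompares++;
                if (a[i] + a[j] == 0)
                    count++;
            }
        return count;
    }

    public static void main(String[] args) {
        int n = 100;
        int[] a = new int[n];
        for (int i = 0; i < n; i++)
            a[i] = i + 1;                    // all positive: no pair sums to zero
        count(a);
        System.out.println("array accesses: " + arrayAccesses + ", N(N-1) = " + (long) n * (n - 1));
        System.out.println("equal compares: " + equalCompares + ", N(N-1)/2 = " + (long) n * (n - 1) / 2);
    }
}
```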

Counting every single operation this way is tedious, however, so we apply a few simplifications.

2.3 Simplification 1: cost model

Cost model. Use some basic operation as a proxy for running time.
For example, here we count only the number of array accesses.

2.4 Simplification 2: tilde notation

Estimate running time (or memory) as a function of input size N.
Ignore lower order terms.
- when N is large, terms are negligible
- when N is small, we don’t care
In short, drop the lower-order terms.

operation              frequency                                tilde notation
variable declaration   $N + 2$                                  $\sim N$
assignment statement   $N + 2$                                  $\sim N$
less than compare      $\frac{1}{2}(N + 1)(N + 2)$              $\sim \frac{1}{2} N^2$
equal to compare       $\frac{1}{2} N (N - 1)$                  $\sim \frac{1}{2} N^2$
array access           $N (N - 1)$                              $\sim N^2$
increment              $\frac{1}{2} N (N - 1)$ to $N (N - 1)$   $\sim \frac{1}{2} N^2$ to $\sim N^2$
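For reference, the usual definition of the tilde notation, followed by a worked instance for the less-than-compare row:

$$
f(N) \sim g(N) \iff \lim_{N \to \infty} \frac{f(N)}{g(N)} = 1
$$

$$
\tfrac{1}{2}(N+1)(N+2) = \tfrac{1}{2}N^2 + \tfrac{3}{2}N + 1 \sim \tfrac{1}{2}N^2
$$

The terms $\tfrac{3}{2}N + 1$ become negligible relative to $\tfrac{1}{2}N^2$ as N grows, which is exactly what dropping lower-order terms means.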

2.5 3-Sum

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0)
                count++;
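Applying the array-access cost model to this triple loop gives a short worked count. The if statement runs once for every triple $i < j < k$, and each execution reads three array entries:

$$
\binom{N}{3} = \frac{N(N-1)(N-2)}{6} \sim \tfrac{1}{6}N^3
\qquad\Longrightarrow\qquad
3\binom{N}{3} = \frac{N(N-1)(N-2)}{2} \sim \tfrac{1}{2}N^3 \text{ array accesses}
$$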

3. Order-of-growth classifications

$\log N$, $N$, $N \log N$, $N^2$, $N^3$, $2^N$

3.1 Binary search

Given a sorted array and a key, find the index of the key in the array.

public static int binarySearch(int[] a, int key)
{
    int lo = 0, hi = a.length-1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (key < a[mid]) hi = mid - 1;
        else if (key > a[mid]) lo = mid + 1;
        else return mid;
    }
    return -1;
}

Binary search uses at most $1 + \lg N$ key compares to search in a sorted array of size N.
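A sketch of why this bound holds, assuming N is a power of 2 and counting each loop iteration as one (three-way) key compare. Each iteration at least halves the remaining range, so the number of compares $T(N)$ satisfies:

$$
T(N) \le T(N/2) + 1 \quad (N > 1), \qquad T(1) = 1
$$

$$
T(N) \le T(N/2) + 1 \le T(N/4) + 2 \le \cdots \le T(N/N) + \lg N = 1 + \lg N
$$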

3.2 An $N^2 \log N$ algorithm for 3-SUM

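Earlier we wrote a 3-SUM algorithm whose order of growth is $N^3$, because it enumerates every triple among the N numbers and checks each one for a zero sum. With binary search available, the order of growth can be reduced to $N^2 \log N$ as follows:
1. Sort the input array first; insertion sort has order of growth $N^2$.
2. Iterate over all pairs in the array (two nested loops, $N^2$), and for each pair use binary search to look for the negation of the pair's sum, $-(a[i] + a[j])$, at a cost of $\lg N$. The total is therefore $N^2 \lg N$.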
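The two steps above can be sketched as below. This is an illustration, not the course's code: it swaps in the standard library's Arrays.sort and Arrays.binarySearch for insertion sort and the hand-written binarySearch, assumes the integers are distinct, and ignores integer overflow. The check k > j ensures each triple is counted only once.

```java
import java.util.Arrays;

// Sketch of the sort-then-binary-search approach to 3-SUM.
class ThreeSumFast {
    static int count(int[] input) {
        int[] a = input.clone();             // leave the caller's array untouched
        Arrays.sort(a);                      // step 1: sort
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                // step 2: search for the value that completes the pair to zero
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));
                if (k > j)                   // found, and only count triples with i < j < k
                    count++;
            }
        return count;
    }

    public static void main(String[] args) {
        int[] a = { 30, -40, -20, -10, 40, 0, 10, 5 };   // contents of 8ints.txt
        System.out.println(count(a));                    // prints 4
    }
}
```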

4. Theory of algorithms

Common mistake. Interpreting big-Oh as an approximate model

5. Memory

5.1 Basics

Bit. 0 or 1.
Byte. 8 bits.
Megabyte (MB). 1 million or $2^{20}$ bytes.
Gigabyte (GB). 1 billion or $2^{30}$ bytes.

[Figure: memory usage of common data types]
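Computing the memory usage of a Java object
Add up the object overhead and the memory used by each primitive-type field; for an array inside the object, remember to also count the reference to it. Finally, pad the total up to a multiple of 8 bytes.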
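The padding arithmetic can be sketched as below. The constants are assumptions, not taken from the text above: 16 bytes of object overhead and 4 bytes per int are typical values for a 64-bit JVM, and the real layout is JVM-dependent. The class name and the three-int example object are hypothetical.

```java
// Hypothetical estimator for the size of an object with only primitive fields.
class MemoryEstimate {
    static final long OBJECT_OVERHEAD = 16;  // assumption: typical 64-bit JVM
    static final long INT_BYTES = 4;

    // Rounds a byte count up to the next multiple of 8.
    static long padTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Overhead plus field bytes, padded to a multiple of 8.
    static long objectBytes(long fieldBytes) {
        return padTo8(OBJECT_OVERHEAD + fieldBytes);
    }

    public static void main(String[] args) {
        // An object with three int fields: 16 + 3 * 4 = 28, padded to 32.
        System.out.println(objectBytes(3 * INT_BYTES));   // prints 32
    }
}
```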