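Several binary search implementations, with code and a comparison

Below are four different binary search implementations. All of them find the target when it is present, but they can return different indices when the sorted array contains duplicates. The code and its comments explain why.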

#include "stdafx.h"   // Visual C++ precompiled header
#include <iostream>
#include <conio.h>    // _getch(), Visual C++ specific
using namespace std;

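// Binary search, version 1
// Source: code from our company's engine
// Notes: the upper index is incremented first (nHigh++), so the range is
//        half-open, and the loop runs while (nLow < nHigh).
//        Assumes the range is non-empty on entry, because the do-while body
//        always runs at least once.
int BinSearch1(int *arry, int nDest, int nLow, int nHigh)
{
    nHigh++; // make nHigh one past the last valid index
    int nMid;
    do 
    {
        nMid = (nLow + nHigh) / 2; // nMid never reaches nHigh, so no out-of-bounds access
        if (nDest == arry[nMid])
            return nMid;
        else if (nDest < arry[nMid])
            nHigh = nMid;
        else
            nLow = nMid+1;
    } while (nLow < nHigh); // note the exit condition
    return -1;
}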

// Binary search, version 2
// Source: Programming Pearls
// Notes: the range is closed on both ends, and the loop runs while (nLow <= nHigh)
int BinSearch2(int *arry, int nDest, int nLow, int nHigh)
{
    int nMid;
    while (nLow <= nHigh) 
    {
        nMid = (nLow + nHigh) / 2;
        if (nDest == arry[nMid])
            return nMid;
        else if (nDest < arry[nMid])
            nHigh = nMid-1;
        else
            nLow = nMid+1;
    } 
    return -1;
}

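// Binary search, version 3
// Source: Beautiful Code, Chapter 4 (originally Java; ported to C++ here)
// Notes: the loop runs while (nHigh - nLow > 1)
int BinSearch3(int *arry, int nDest, int nLow, int nHigh)
{
    nLow--;  // one below the first index (-1 when the range starts at 0)
    nHigh++; // one past the last index
    const int nBelow = nLow; // sentinel: nLow still here means every element > nDest
    int nMid;
    while (nHigh - nLow > 1) 
    {
        nMid = nLow + (nHigh - nLow)/2; // avoids overflow for large indices
        if (arry[nMid] > nDest)
            nHigh = nMid;
        else
            nLow = nMid;
    }
    if (nBelow == nLow || arry[nLow] != nDest) 
        return -1;
    else
        return nLow;
}
// Note:
// When the array contains duplicates, this version returns the largest index
// that matches. The code can be rewritten to return the smallest matching
// index instead, as the next version shows.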

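// Binary search, version 4
// Source: a modification of version 3 that returns the smallest matching index
int BinSearch4(int *arry, int nDest, int nLow, int nHigh)
{
    nLow--;  // one below the first index (-1 when the range starts at 0)
    nHigh++; // one past the last index
    int nSaveHigh = nHigh; // sentinel: saved after the increment, so it is one past the end
    int nMid;
    while (nHigh - nLow > 1) 
    {
        nMid = nLow + (nHigh - nLow)/2; // avoids overflow for large indices
        if (arry[nMid] < nDest)
            nLow = nMid;
        else
            nHigh = nMid;
    }
    // If nHigh never moved, every element is < nDest and arry[nHigh] must not be read
    if (nSaveHigh == nHigh || arry[nHigh] != nDest) 
        return -1;
    else
        return nHigh;
}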

int _tmain(int argc, _TCHAR* argv[])
{
//    int arry[] = {2,4,6,8,10};
    int arry[] = {1,1,1,2,2,2};
    int nCount = sizeof(arry)/sizeof(int);
    int (*pFun[])(int *, int , int , int ) = {
        BinSearch1, BinSearch2, BinSearch3, BinSearch4
    };

    for (int j = 0; j < sizeof(pFun)/sizeof(pFun[0]); j++)
    {
        cout<<"BinSearch"<<j+1<<":"<<endl;
        for (int i = 0; i < nCount; i++)
        {
            int nFind = pFun[j](arry, i, 0, nCount-1);
            if (nFind >= 0 && nFind <= nCount-1)
                cout<<"find "<<i<<", a["<<nFind<<"] = "<<arry[nFind]<<endl;
            else
                cout<<"find "<<i<<", not find!"<<endl;
        }
    }
    _getch();
    return 0;
}

The program above produces the following output:

BinSearch1:
find 0, not find!
find 1, a[1] = 1
find 2, a[3] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch2:
find 0, not find!
find 1, a[2] = 1
find 2, a[4] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch3:
find 0, not find!
find 1, a[2] = 1
find 2, a[5] = 2
find 3, not find!
find 4, not find!
find 5, not find!
BinSearch4:
find 0, not find!
find 1, a[0] = 1
find 2, a[3] = 2
find 3, not find!
find 4, not find!
find 5, not find!

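As the output shows, when the sorted sequence contains duplicates, different binary search implementations return different results.

Method 1 finds the 1 at a[1] and the 2 at a[3].

Method 2 finds the 1 at a[2] and the 2 at a[4].

Method 3 finds the 1 at a[2] and the 2 at a[5]. (In both cases the largest matching index in the array.)

Method 4 finds the 1 at a[0] and the 2 at a[3]. (In both cases the smallest matching index in the array.)

Combining methods 3 and 4 therefore gives the first and last index of a given value in a sorted sequence with duplicates.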
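As a minimal sketch of that combination, the two loops can be placed side by side in one helper. The name `FindRange` is made up for this illustration and does not come from either book, and the function takes a zero-based element count instead of the nLow/nHigh pair used above.

```cpp
#include <utility>

// Hypothetical helper (not from the original sources): returns the
// {first, last} index of nDest in the sorted array arry[0..nCount-1],
// or {-1, -1} if nDest is absent.
std::pair<int, int> FindRange(const int *arry, int nCount, int nDest)
{
    // Pass 1 (as in BinSearch4): smallest index whose value is >= nDest.
    int nLow = -1, nHigh = nCount;
    while (nHigh - nLow > 1)
    {
        int nMid = nLow + (nHigh - nLow) / 2;
        if (arry[nMid] < nDest)
            nLow = nMid;
        else
            nHigh = nMid;
    }
    if (nHigh == nCount || arry[nHigh] != nDest)
        return std::make_pair(-1, -1); // value not present
    int nFirst = nHigh;

    // Pass 2 (as in BinSearch3): largest index whose value is <= nDest.
    nLow = nFirst;  // known to hold nDest, so the answer is never below it
    nHigh = nCount;
    while (nHigh - nLow > 1)
    {
        int nMid = nLow + (nHigh - nLow) / 2;
        if (arry[nMid] > nDest)
            nHigh = nMid;
        else
            nLow = nMid;
    }
    return std::make_pair(nFirst, nLow);
}
```

For the test array {1,1,1,2,2,2}, `FindRange(arry, 6, 1)` gives {0, 2} and `FindRange(arry, 6, 2)` gives {3, 5}. The C++ standard library exposes the same two bounds as iterators through `std::lower_bound`, `std::upper_bound` and `std::equal_range`.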


Additional notes on the third method (the author's own explanation, quoted from Chapter 4 of Beautiful Code)

Escaping the Loop
Some look at my binary-search algorithm and ask why the loop always runs to the end
without checking whether it’s found the target. In fact, this is the correct behavior; the
math is beyond the scope of this chapter, but with a little work, you should be able to get
an intuitive feeling for it—and this is the kind of intuition I’ve observed in some of the
great programmers I’ve worked with.
Let’s think about the progress of the loop. Suppose you have n elements in the array,
where n is some really large number. The chance of finding the target the first time
through is 1/n, a really small number. The next iteration (after you divide the search set in
half) is 1/(n/2)—still small—and so on. In fact, the chance of hitting the target becomes
significant only when you’re down to 10 or 20 elements, which is to say maybe the last
four times through the loop. And in the case where the search fails (which is common in
many applications), those extra tests are pure overhead.
You could do the math to figure out when the probability of hitting the target approaches
50 percent, but qualitatively, ask yourself: does it make sense to add extra complexity to
each step of an O(log2 N) algorithm when the chances are it will save only a small number
of steps at the end?
The take-away lesson is that binary search, done properly, is a two-step process. First,
write an efficient loop that positions your low and high bounds properly, then add a simple
check to see whether you hit or missed.

Note on "those extra tests are pure overhead" in the passage above: the extra tests are the three-way branches that check for a hit on every pass through the loop, as in the first and second implementations.
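To make the argument concrete, here is a rough sketch that counts loop passes for both styles. `SearchEarlyExit` and `SearchTwoStep` are hypothetical instrumented variants written for this illustration (modelled on versions 2 and 3 above); the `nSteps` out-parameter is not part of any of the original functions.

```cpp
#include <vector>

// Three-way loop in the style of BinSearch2: tests for a hit on every pass.
// nSteps receives the number of loop passes (illustration only).
int SearchEarlyExit(const std::vector<int> &v, int nDest, int &nSteps)
{
    nSteps = 0;
    int nLow = 0, nHigh = static_cast<int>(v.size()) - 1;
    while (nLow <= nHigh)
    {
        ++nSteps;
        int nMid = nLow + (nHigh - nLow) / 2;
        if (nDest == v[nMid])
            return nMid;
        else if (nDest < v[nMid])
            nHigh = nMid - 1;
        else
            nLow = nMid + 1;
    }
    return -1;
}

// Two-way loop in the style of BinSearch3: position the bounds first,
// then check once for a hit after the loop.
int SearchTwoStep(const std::vector<int> &v, int nDest, int &nSteps)
{
    nSteps = 0;
    int nLow = -1, nHigh = static_cast<int>(v.size());
    while (nHigh - nLow > 1)
    {
        ++nSteps;
        int nMid = nLow + (nHigh - nLow) / 2;
        if (v[nMid] > nDest)
            nHigh = nMid;
        else
            nLow = nMid;
    }
    return (nLow == -1 || v[nLow] != nDest) ? -1 : nLow;
}
```

With 1024 distinct sorted keys, the two-way loop always takes 10 or 11 passes, and the three-way loop never takes more than 11. Each pass of the two-way loop makes one comparison, where the three-way loop may make two, which is the trade-off the quoted passage describes.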


References:

1 Programming Pearls (《編程珠璣》)

2 Beautiful Code (《代碼之美》), English edition

3 Beautiful Code (《代碼之美》), Chinese edition


For reference, here is the Java version from Beautiful Code:

package binary;

public class Finder {
  public static int find(String[] keys, String target) {
    int high = keys.length;
    int low = -1;
    while (high - low > 1) {
      int probe = (low + high) >>> 1;
      if (keys[probe].compareTo(target) > 0)
        high = probe;
      else
        low = probe;
    }
    if (low == -1 || keys[low].compareTo(target) != 0)
      return -1;
    else
      return low;
  }
}
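The Java version computes its probe as `(low + high) >>> 1`, an unsigned shift that still yields the right midpoint if `low + high` overflows a signed int. Versions 3 and 4 above use `nLow + (nHigh - nLow)/2` for the same reason, while versions 1 and 2 use the plain `(nLow + nHigh) / 2`, which overflows for very large indices (and signed overflow is undefined behavior in C++). A small sketch of the two safe forms follows; `SafeMid` and `SafeMidUnsigned` are names invented for this illustration.

```cpp
#include <climits>

// Hypothetical helper: midpoint without forming nLow + nHigh.
// Assumes nLow <= nHigh and that nHigh - nLow fits in an int.
int SafeMid(int nLow, int nHigh)
{
    return nLow + (nHigh - nLow) / 2;
}

// Hypothetical helper mirroring Java's (low + high) >>> 1: add in unsigned
// arithmetic, where wraparound is well defined, then halve.
// Assumes the true sum nLow + nHigh is non-negative.
int SafeMidUnsigned(int nLow, int nHigh)
{
    unsigned int nSum = static_cast<unsigned int>(nLow)
                      + static_cast<unsigned int>(nHigh);
    return static_cast<int>(nSum / 2u);
}
```

Both helpers return `INT_MAX - 1` for the pair (`INT_MAX - 2`, `INT_MAX`), a case where the naive `(nLow + nHigh) / 2` would overflow.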


