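Someone on Weibo recently ran a poll asking which paper readers had learned the most from, and many people named Lowe's paper introducing SIFT. SIFT was indeed a milestone in image feature recognition: its features are stable and highly distinctive, and they have driven progress in many areas such as recognition and registration. The previous article walked through the first step of SIFT feature extraction, building the Gaussian pyramid, and analyzed in detail how the Gaussian pyramid and the difference-of-Gaussian (DoG) pyramid together form a continuous scale space. Building the pyramid is not the goal in itself; the goal is to use it to find extrema.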

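Lowe explains in the paper why the DoG pyramid is used:

1) A DoG image is obtained directly by subtracting one Gaussian image from another, which is simple and efficient.

2) The DoG function approximates the scale-normalized Laplacian of Gaussian, whose maxima and minima are very stable feature points (compared with gradient, Hessian, and Harris corner features).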
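Point 1) can be illustrated with a minimal sketch. It works on a 1-D signal to stay short, and the function names (gaussianBlur1D, differenceOfGaussians), the 3-sigma kernel radius, and the clamped border handling are all assumptions made for this illustration, not OpenCV's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Blur a 1-D signal with a normalized Gaussian kernel; borders are clamped.
// The 3-sigma kernel radius is an illustrative choice.
std::vector<double> gaussianBlur1D(const std::vector<double>& signal, double sigma)
{
    const int radius = static_cast<int>(std::ceil(3.0 * sigma));
    std::vector<double> kernel(2 * radius + 1);
    double sum = 0.0;
    for (int k = -radius; k <= radius; ++k) {
        kernel[k + radius] = std::exp(-0.5 * k * k / (sigma * sigma));
        sum += kernel[k + radius];
    }
    for (double& w : kernel)
        w /= sum;  // weights now sum to 1

    const int n = static_cast<int>(signal.size());
    std::vector<double> out(signal.size(), 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = -radius; k <= radius; ++k) {
            const int j = std::min(std::max(i + k, 0), n - 1);  // clamp at the borders
            out[i] += kernel[k + radius] * signal[j];
        }
    return out;
}

// Difference of Gaussians: the blur at scale k*sigma minus the blur at scale sigma.
std::vector<double> differenceOfGaussians(const std::vector<double>& signal,
                                          double sigma, double k)
{
    const std::vector<double> fine = gaussianBlur1D(signal, sigma);
    const std::vector<double> coarse = gaussianBlur1D(signal, k * sigma);
    std::vector<double> dog(signal.size());
    for (std::size_t i = 0; i < signal.size(); ++i)
        dog[i] = coarse[i] - fine[i];
    return dog;
}
```

In the real pyramid the blurred images already exist, so each DoG layer costs only the final subtraction loop.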

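With this groundwork in place, we can go ahead and look for points in the DoG pyramid.

Determining the keypoints involves two stages: detecting candidate keypoints, and then localizing them precisely while removing the unstable ones.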

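Detecting candidate keypoints

As stated above, the maxima and minima of the Laplacian of Gaussian are very stable feature points, so we look for candidate keypoints in the DoG pyramid. The DoG pyramid is a three-dimensional space (two image dimensions plus one scale dimension), so we search for maxima and minima in 3D. Concretely, the DoG value at the current point is compared with the values of its 26 neighbours: the 8 neighbours at the current scale plus the 9 nearest points in each of the previous and next scales (9*2+8=26), as shown in the figure below:

The corresponding OpenCV source: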


void SIFT::findScaleSpaceExtrema( const vector<Mat>& gauss_pyr, const vector<Mat>& dog_pyr,
                                  vector<KeyPoint>& keypoints ) const
{
    ......
    for( int o = 0; o < nOctaves; o++ ) // each octave
        for( int i = 1; i <= nOctaveLayers; i++ ) // extract keypoints from DoG layers 1 to nOctaveLayers of the octave
        {
            ......
            for( int r = SIFT_IMG_BORDER; r < rows-SIFT_IMG_BORDER; r++) // image rows
            {
                ......
                for( int c = SIFT_IMG_BORDER; c < cols-SIFT_IMG_BORDER; c++) // image columns
                {
                    .......
                    // compare the current point with its 26 neighbours twice:
                    // once to test for a maximum, once to test for a minimum
                    if( std::abs(val) > threshold &&
                       ((val > 0 && val >= currptr[c-1] && val >= currptr[c+1] &&
                         val >= currptr[c-step-1] && val >= currptr[c-step] && val >= currptr[c-step+1] &&
                         val >= currptr[c+step-1] && val >= currptr[c+step] && val >= currptr[c+step+1] &&
                         val >= nextptr[c] && val >= nextptr[c-1] && val >= nextptr[c+1] &&
                         val >= nextptr[c-step-1] && val >= nextptr[c-step] && val >= nextptr[c-step+1] &&
                         val >= nextptr[c+step-1] && val >= nextptr[c+step] && val >= nextptr[c+step+1] &&
                         val >= prevptr[c] && val >= prevptr[c-1] && val >= prevptr[c+1] &&
                         val >= prevptr[c-step-1] && val >= prevptr[c-step] && val >= prevptr[c-step+1] &&
                         val >= prevptr[c+step-1] && val >= prevptr[c+step] && val >= prevptr[c+step+1]) ||
                        (val < 0 && val <= currptr[c-1] && val <= currptr[c+1] &&
                         val <= currptr[c-step-1] && val <= currptr[c-step] && val <= currptr[c-step+1] &&
                         val <= currptr[c+step-1] && val <= currptr[c+step] && val <= currptr[c+step+1] &&
                         val <= nextptr[c] && val <= nextptr[c-1] && val <= nextptr[c+1] &&
                         val <= nextptr[c-step-1] && val <= nextptr[c-step] && val <= nextptr[c-step+1] &&
                         val <= nextptr[c+step-1] && val <= nextptr[c+step] && val <= nextptr[c+step+1] &&
                         val <= prevptr[c] && val <= prevptr[c-1] && val <= prevptr[c+1] &&
                         val <= prevptr[c-step-1] && val <= prevptr[c-step] && val <= prevptr[c-step+1] &&
                         val <= prevptr[c+step-1] && val <= prevptr[c+step] && val <= prevptr[c+step+1])))
                    {
                           ......
                    }
                }
            }
        }
}
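The unrolled comparison in the OpenCV source can be hard to read, so here is a loop-based sketch of the same 26-neighbour test. The Layer struct is a hypothetical stand-in for cv::Mat and isScaleSpaceExtremum is a name made up for this illustration; neither is part of OpenCV.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One DoG layer stored row-major; a hypothetical stand-in for cv::Mat.
struct Layer {
    int rows;
    int cols;
    std::vector<float> data;
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// True if (r, c) in `curr` is a maximum (val > 0) or a minimum (val < 0) among
// its 26 neighbours in `prev`, `curr` and `next`, and |val| exceeds `threshold`.
// The caller must keep (r, c) at least one pixel away from the image border.
bool isScaleSpaceExtremum(const Layer& prev, const Layer& curr, const Layer& next,
                          int r, int c, float threshold)
{
    const float val = curr.at(r, c);
    if (!(std::abs(val) > threshold) || val == 0.0f)
        return false;

    const Layer* layers[3] = { &prev, &curr, &next };
    for (const Layer* layer : layers)
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                const float other = layer->at(r + dr, c + dc);
                // Comparing the centre with itself is harmless because the
                // tests are non-strict (val >= val and val <= val both hold).
                if (val > 0 ? val < other : val > other)
                    return false;
            }
    return true;
}
```

The loop visits 27 samples instead of 26, trading one redundant comparison for much shorter code; the unrolled form avoids loop overhead in the innermost part of the detector.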

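The extrema in scale space have now been found, but two problems remain:

(1) These points are a superset of the final SIFT keypoint set, and the superset contains many "spies": unstable keypoints that must be removed. They fall into two classes: low-contrast points (sensitive to noise) and edge points.

(2) At this stage the coordinates of the extrema are still discrete integers, so their positions need to be determined more precisely.

Because solving problem (2) also handles the low-contrast points of problem (1) along the way, we discuss problem (2) first. The OpenCV source for this part is in the adjustLocalExtrema function in sift.cpp, which is reproduced at the end of this article; first we analyze how the two problems are solved.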

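Precisely localizing keypoints:

An image is a discrete space, so keypoint coordinates are integers, but the true extremum does not necessarily lie at integer coordinates, as shown in the figure below.

Estimating the precise position of the extremum from the discrete samples is therefore important. To do so, Brown and Lowe fit a quadratic function of three variables and locate the extremum iteratively, which works well.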

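The fit is based on the Taylor expansion, which estimates the value at a point B near A from the known values at A. Expanding the DoG function D(x, y, σ) to second order about the sample point, where $\mathbf{x} = (x, y, \sigma)^T$ is the offset from that point and D and its derivatives are evaluated at the sample point:

$$D(\mathbf{x}) = D + \frac{\partial D^T}{\partial \mathbf{x}}\mathbf{x} + \frac{1}{2}\mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2}\mathbf{x}$$

To find the extremum of this expression, take its derivative with respect to $\mathbf{x}$ and set it to zero, which gives

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$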
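To make the solve concrete, here is a minimal single-step sketch without any OpenCV dependency. It estimates the gradient g and the 3x3 Hessian H of D by finite differences on a 3x3x3 block of samples and solves H x = -g with Cramer's rule. The names Cube, Offset, det3, and refineOffset, and the singularity tolerance 1e-12, are assumptions for this illustration.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
// cube[s][y][x] holds DoG samples at offsets -1, 0, +1 (indices 0, 1, 2).
using Cube = std::array<std::array<std::array<double, 3>, 3>, 3>;

struct Offset { double x, y, s; };

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double det3(const Mat3& m)
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Computes the sub-sample offset of the extremum, i.e. the solution of H x = -g.
// Returns false if H is (nearly) singular; the tolerance is illustrative.
bool refineOffset(const Cube& cube, Offset& out)
{
    const double centre = cube[1][1][1];
    // Gradient by central differences.
    const double g[3] = {
        0.5 * (cube[1][1][2] - cube[1][1][0]),  // dD/dx
        0.5 * (cube[1][2][1] - cube[1][0][1]),  // dD/dy
        0.5 * (cube[2][1][1] - cube[0][1][1])   // dD/ds
    };
    // Second derivatives by finite differences.
    const double dxx = cube[1][1][2] + cube[1][1][0] - 2.0 * centre;
    const double dyy = cube[1][2][1] + cube[1][0][1] - 2.0 * centre;
    const double dss = cube[2][1][1] + cube[0][1][1] - 2.0 * centre;
    const double dxy = 0.25 * (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]);
    const double dxs = 0.25 * (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]);
    const double dys = 0.25 * (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]);

    const Mat3 H = {{ {{dxx, dxy, dxs}}, {{dxy, dyy, dys}}, {{dxs, dys, dss}} }};
    const double det = det3(H);
    if (std::fabs(det) < 1e-12)
        return false;

    // Cramer's rule: replace one column of H with -g at a time.
    double sol[3];
    for (int col = 0; col < 3; ++col) {
        Mat3 m = H;
        for (int row = 0; row < 3; ++row)
            m[row][col] = -g[row];
        sol[col] = det3(m) / det;
    }
    out.x = sol[0];
    out.y = sol[1];
    out.s = sol[2];
    return true;
}
```

This performs one step only. The OpenCV function adjustLocalExtrema repeats the step, moving to the neighbouring sample whenever an offset component exceeds 0.5, and gives up after SIFT_MAX_INTERP_STEPS iterations.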

Removing unstable keypoints

Removing low-contrast points

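The steps above give the precise position of the extremum. Substituting the offset $\hat{\mathbf{x}}$ back into the expansion gives:

$$D(\hat{\mathbf{x}}) = D + \frac{1}{2}\frac{\partial D^T}{\partial \mathbf{x}}\hat{\mathbf{x}}$$

This value is used to remove low-contrast points: in Lowe's paper, a keypoint is discarded when $|D(\hat{\mathbf{x}})| < 0.03$ (with pixel values scaled to the range [0, 1]).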
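The contrast check can be sketched as follows: the interpolated value is the centre sample plus half the dot product of the gradient and the offset. The names Vec3, interpolatedValue, and hasEnoughContrast are made up for this illustration, and the default threshold of 0.03 assumes DoG values computed from pixel intensities in [0, 1].

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;  // (x, y, scale) components

// Interpolated DoG value at the refined extremum:
// centre + 0.5 * dot(gradient, offset).
double interpolatedValue(double centre, const Vec3& gradient, const Vec3& offset)
{
    double dot = 0.0;
    for (int i = 0; i < 3; ++i)
        dot += gradient[i] * offset[i];
    return centre + 0.5 * dot;
}

// Keep the candidate only if the magnitude of the interpolated value reaches
// the threshold. The 0.03 default assumes intensities scaled to [0, 1].
bool hasEnoughContrast(double centre, const Vec3& gradient, const Vec3& offset,
                       double threshold = 0.03)
{
    return std::fabs(interpolatedValue(centre, gradient, offset)) >= threshold;
}
```

Note that the test uses the interpolated value rather than the raw sample: a sample of -0.13 with gradient (0.4, -0.2, 0.6) and offset (0.2, -0.1, 0.3) interpolates to only 0.01, so it would be rejected even though the raw sample looks strong.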

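Removing edge points

Many of the extrema in the DoG pyramid lie on edges. Edge points are unstable under even small amounts of noise, so these edge responses need to be removed.

A poorly defined extremum of this kind has a large principal curvature across the edge and a small principal curvature along the edge. The principal curvatures can be obtained from the 2x2 Hessian matrix:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

The principal curvatures of D at a point are proportional to the eigenvalues of H at that point, so the eigenvalues of H tell us the principal curvatures of that point in the DoG pyramid.

Let α be the larger eigenvalue of H and β the smaller one. Then:

$$Tr(H) = D_{xx} + D_{yy} = \alpha + \beta$$

$$Det(H) = D_{xx}D_{yy} - D_{xy}^2 = \alpha\beta$$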

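α and β could be computed from these two expressions, but hold on!

As described above, a bad edge point has a large principal curvature across the edge and a small one along it. We can therefore characterize bad edge points by thresholding the ratio α/β: the larger α/β is, the worse the point and the more it deserves to be removed. That would seem to require actually computing α and β, which is why we said to hold on. Instead, set r = α/β (that is, α = rβ) and use:

$$\frac{Tr(H)^2}{Det(H)} = \frac{(\alpha+\beta)^2}{\alpha\beta} = \frac{(r\beta+\beta)^2}{r\beta^2} = \frac{(r+1)^2}{r}$$

This function is increasing in r for r ≥ 1 (we assumed α is the larger eigenvalue): the larger r is, the larger its value, and conversely a larger value means a larger r. We can therefore judge the size of r indirectly from the known Tr(H) and Det(H). In this step a point is kept only if it satisfies:

$$\frac{Tr(H)^2}{Det(H)} < \frac{(r+1)^2}{r}$$

Lowe's paper uses r = 10.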
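As a worked example, r = 10 gives a threshold of (10+1)^2/10 = 12.1 on Tr(H)^2/Det(H). The check can be sketched as below; passesEdgeTest is a name chosen for this illustration, and the inequality is multiplied through by Det(H) and r to avoid a division, in the same way as the condition in the OpenCV source.

```cpp
// True if the point is kept, i.e. the ratio of principal curvatures is below r.
// dxx, dyy, dxy are the entries of the 2x2 Hessian of the DoG image.
bool passesEdgeTest(double dxx, double dyy, double dxy, double r = 10.0)
{
    const double tr = dxx + dyy;
    const double det = dxx * dyy - dxy * dxy;
    // A non-positive determinant means the curvatures have different signs
    // (or one of them is zero), so the point is not a proper extremum.
    if (det <= 0.0)
        return false;
    // Equivalent to tr^2 / det < (r + 1)^2 / r, written without division.
    return tr * tr * r < (r + 1.0) * (r + 1.0) * det;
}
```

For a blob-like point with dxx = dyy = -1 and dxy = 0 the ratio is 4, well under 12.1, so it is kept. For an edge-like point with dxx = -10 and dyy = -0.1 the ratio is about 102, so it is rejected.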

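At this point, the keypoints extracted from the DoG pyramid have been filtered down to the stable ones.

Below is the OpenCV source that determines the precise keypoint position and filters the keypoints; the main implementation is the adjustLocalExtrema function in sift.cpp: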


// Interpolates a scale-space extremum's location and scale to subpixel
// accuracy to form an image feature. Rejects features with low contrast.
// Based on Section 4 of Lowe's paper.
static bool adjustLocalExtrema( const vector<Mat>& dog_pyr, KeyPoint& kpt, int octv,
                                int& layer, int& r, int& c, int nOctaveLayers,
                                float contrastThreshold, float edgeThreshold, float sigma )
{
    const float img_scale = 1.f/(255*SIFT_FIXPT_SCALE);
    const float deriv_scale = img_scale*0.5f;
    const float second_deriv_scale = img_scale;
    const float cross_deriv_scale = img_scale*0.25f;

    float xi=0, xr=0, xc=0, contr=0;
    int i = 0;

    // as described above, iteratively compute the precise keypoint position
    for( ; i < SIFT_MAX_INTERP_STEPS; i++ )
    {
        int idx = octv*(nOctaveLayers+2) + layer;
        const Mat& img = dog_pyr[idx];
        const Mat& prev = dog_pyr[idx-1];
        const Mat& next = dog_pyr[idx+1];

        Vec3f dD((img.at<sift_wt>(r, c+1) - img.at<sift_wt>(r, c-1))*deriv_scale,
                 (img.at<sift_wt>(r+1, c) - img.at<sift_wt>(r-1, c))*deriv_scale,
                 (next.at<sift_wt>(r, c) - prev.at<sift_wt>(r, c))*deriv_scale);

        float v2 = (float)img.at<sift_wt>(r, c)*2;
        float dxx = (img.at<sift_wt>(r, c+1) + img.at<sift_wt>(r, c-1) - v2)*second_deriv_scale;
        float dyy = (img.at<sift_wt>(r+1, c) + img.at<sift_wt>(r-1, c) - v2)*second_deriv_scale;
        float dss = (next.at<sift_wt>(r, c) + prev.at<sift_wt>(r, c) - v2)*second_deriv_scale;
        float dxy = (img.at<sift_wt>(r+1, c+1) - img.at<sift_wt>(r+1, c-1) -
                     img.at<sift_wt>(r-1, c+1) + img.at<sift_wt>(r-1, c-1))*cross_deriv_scale;
        float dxs = (next.at<sift_wt>(r, c+1) - next.at<sift_wt>(r, c-1) -
                     prev.at<sift_wt>(r, c+1) + prev.at<sift_wt>(r, c-1))*cross_deriv_scale;
        float dys = (next.at<sift_wt>(r+1, c) - next.at<sift_wt>(r-1, c) -
                     prev.at<sift_wt>(r+1, c) + prev.at<sift_wt>(r-1, c))*cross_deriv_scale;

        // 3x3 Hessian built from finite differences of the current pixel and its neighbours
        Matx33f H(dxx, dxy, dxs,
                  dxy, dyy, dys,
                  dxs, dys, dss);

        Vec3f X = H.solve(dD, DECOMP_LU);

        xi = -X[2];
        xr = -X[1];
        xc = -X[0];

        // if every offset is below 0.5, the current position plus the offset is the
        // final precise location; otherwise move to the neighbouring sample and repeat
        if( std::abs(xi) < 0.5f && std::abs(xr) < 0.5f && std::abs(xc) < 0.5f )
            break;

        if( std::abs(xi) > (float)(INT_MAX/3) ||
            std::abs(xr) > (float)(INT_MAX/3) ||
            std::abs(xc) > (float)(INT_MAX/3) )
            return false;

        c += cvRound(xc);
        r += cvRound(xr);
        layer += cvRound(xi);

        if( layer < 1 || layer > nOctaveLayers ||
            c < SIFT_IMG_BORDER || c >= img.cols - SIFT_IMG_BORDER  ||
            r < SIFT_IMG_BORDER || r >= img.rows - SIFT_IMG_BORDER )
            return false;
    }
    // end of iteration

    // ensure convergence of interpolation
    if( i >= SIFT_MAX_INTERP_STEPS )
        return false;

    {
        int idx = octv*(nOctaveLayers+2) + layer;
        const Mat& img = dog_pyr[idx];
        const Mat& prev = dog_pyr[idx-1];
        const Mat& next = dog_pyr[idx+1];

        Matx31f dD((img.at<sift_wt>(r, c+1) - img.at<sift_wt>(r, c-1))*deriv_scale,
                   (img.at<sift_wt>(r+1, c) - img.at<sift_wt>(r-1, c))*deriv_scale,
                   (next.at<sift_wt>(r, c) - prev.at<sift_wt>(r, c))*deriv_scale);
        float t = dD.dot(Matx31f(xc, xr, xi));

        contr = img.at<sift_wt>(r, c)*img_scale + t * 0.5f;
        if( std::abs( contr ) * nOctaveLayers < contrastThreshold ) // reject low-contrast points
            return false;

        // principal curvatures are computed using the trace and det of Hessian
        float v2 = img.at<sift_wt>(r, c)*2.f;
        float dxx = (img.at<sift_wt>(r, c+1) + img.at<sift_wt>(r, c-1) - v2)*second_deriv_scale;
        float dyy = (img.at<sift_wt>(r+1, c) + img.at<sift_wt>(r-1, c) - v2)*second_deriv_scale;
        float dxy = (img.at<sift_wt>(r+1, c+1) - img.at<sift_wt>(r+1, c-1) -
                     img.at<sift_wt>(r-1, c+1) + img.at<sift_wt>(r-1, c-1)) * cross_deriv_scale;
        float tr = dxx + dyy;
        float det = dxx * dyy - dxy * dxy;

        if( det <= 0 || tr*tr*edgeThreshold >= (edgeThreshold + 1)*(edgeThreshold + 1)*det ) // reject edge responses
            return false;
    }

    kpt.pt.x = (c + xc) * (1 << octv);
    kpt.pt.y = (r + xr) * (1 << octv);
    kpt.octave = octv + (layer << 8) + (cvRound((xi + 0.5)*255) << 16);
    kpt.size = sigma*powf(2.f, (layer + xi) / nOctaveLayers)*(1 << octv)*2;
    kpt.response = std::abs(contr);

    return true;
}
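One detail in the listing that the text does not spell out is how kpt.octave packs three values into one integer: the octave index in the low 8 bits, the layer in the next 8 bits, and the scale offset xi quantized to 8 bits above that. The sketch below mirrors that line and adds an inverse. packOctave, unpackOctave, and UnpackedOctave are hypothetical helpers rather than OpenCV functions, std::lround stands in for cvRound (the two can differ on exact .5 ties), and octv is assumed to be in the range 0 to 255.

```cpp
#include <cmath>

struct UnpackedOctave {
    int octave;  // octave index
    int layer;   // layer within the octave
    double xi;   // sub-layer scale offset, recovered to within about 1/255
};

// Mirrors the packing expression in the listing:
// octv + (layer << 8) + (round((xi + 0.5) * 255) << 16).
// Assumes 0 <= octv <= 255, 0 <= layer <= 255 and -0.5 <= xi <= 0.5.
int packOctave(int octv, int layer, double xi)
{
    const int quantized = static_cast<int>(std::lround((xi + 0.5) * 255));
    return octv + (layer << 8) + (quantized << 16);
}

// Inverse of packOctave under the same assumptions.
UnpackedOctave unpackOctave(int packed)
{
    UnpackedOctave u;
    u.octave = packed & 255;
    u.layer = (packed >> 8) & 255;
    u.xi = ((packed >> 16) & 255) / 255.0 - 0.5;
    return u;
}
```

Because xi is stored in 8 bits, the round trip is lossy: packing xi = 0.25 and unpacking it returns roughly 0.249.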