A not-so-successful hands-on project: traffic sign detection with HOG features and an SVM

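This post explains how to detect a handful of traffic signs with HOG features and an SVM classifier. My abilities are limited, so the detection approach is simple; I mainly wrote it to practice programming, and I am publishing it for anyone who might find it useful. The complete code for this project can be downloaded from my GitHub: traffic-sign-detection. If you run into any problem in the post or the code, please point it out so that we can learn from each other. Enough preamble; let me walk through the detection process step by step.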

Dataset

All of the data was collected by a junior schoolmate of mine, to whom I am grateful. I picked six kinds of traffic signs:


data

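Data preprocessing

A total of 1465 photos were taken. Because they were shot on the road with a phone, the images are very large and vary in size (some in landscape, some in portrait), which hurts detection efficiency. I therefore preprocessed all of the images first:
(1) Take the smaller of the image width and height as the crop edge length S, and crop the S×S square at the center of the original image;
(2) Resize the cropped region to 640×640.
The main functions are as follows: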

import os
import cv2


def center_crop(img_array, crop_size=-1, resize=-1, write_path=None):
    """Crop a square from the central area of an image and optionally resize it.
    Args:
        img_array: image array.
        crop_size: edge length of the crop (default: -1, use min(height, width)).
        resize: output edge length (default: -1, keep the cropped size).
        write_path: where to save the result (default: None, do not write to disk).
    Return:
        img_crop: cropped and resized image.
    """
    rows = img_array.shape[0]
    cols = img_array.shape[1]

    # Fall back to the shorter side when no size is given or the requested
    # square would not fit inside the image.
    if crop_size == -1 or crop_size > min(rows, cols):
        crop_size = min(rows, cols)
    row_s = max(int((rows-crop_size)/2), 0)
    row_e = min(row_s+crop_size, rows)
    col_s = max(int((cols-crop_size)/2), 0)
    col_e = min(col_s+crop_size, cols)

    img_crop = img_array[row_s:row_e, col_s:col_e]

    if resize > 0:
        img_crop = cv2.resize(img_crop, (resize, resize))

    if write_path is not None:
        cv2.imwrite(write_path, img_crop)
    return img_crop


def crop_img_dir(img_dir, save_dir, crop_method="center", rename_pre=-1):
    """ crop and save square images from original images saved in img_dir.
    Args:
        img_dir: image directory.
        save_dir: save directory.
        crop_method: crop method (default: "center").
        rename_pre: prename of all images (default: -1, use primary image name).
    Return: none
    """
    img_names = os.listdir(img_dir)
    img_names = [img_name for img_name in img_names if img_name.split(".")[-1]=="jpg"]
    index = 0
    for img_name in img_names:
        img = cv2.imread(os.path.join(img_dir, img_name))

        rename = img_name if rename_pre==-1 else rename_pre+str(index)+".jpg"
        img_out_path = os.path.join(save_dir, rename)

        if crop_method == "center":
            img_crop = center_crop(img, resize=640, write_path=img_out_path)

        if index%100 == 0:
            print("total images number = %d, current image number = %d" % (len(img_names), index))
        index += 1

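Data annotation

The annotations follow the same format as the PASCAL VOC dataset. Positive samples are annotated directly with the labelImg tool; here is a link to the version I used: https://pan.baidu.com/s/1Q0cqJI9Dnvxkj7159Be4Sw. For negative samples, you can write the XML annotation files yourself with Python's xml module. The main functions are as follows: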

from xml.dom.minidom import Document
import os
import cv2

def write_img_to_xml(imgfile, xmlfile):
    """
    write xml file.
    Args:
        imgfile: image file.
        xmlfile: output xml file.
    """
    img = cv2.imread(imgfile)
    img_folder, img_name = os.path.split(imgfile)
    img_height, img_width, img_depth = img.shape
    doc = Document()

    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)

    folder = doc.createElement("folder")
    folder.appendChild(doc.createTextNode(img_folder))
    annotation.appendChild(folder)

    filename = doc.createElement("filename")
    filename.appendChild(doc.createTextNode(img_name))
    annotation.appendChild(filename)

    size = doc.createElement("size")
    annotation.appendChild(size)

    width = doc.createElement("width")
    width.appendChild(doc.createTextNode(str(img_width)))
    size.appendChild(width)

    height = doc.createElement("height")
    height.appendChild(doc.createTextNode(str(img_height)))
    size.appendChild(height)

    depth = doc.createElement("depth")
    depth.appendChild(doc.createTextNode(str(img_depth)))
    size.appendChild(depth)

    with open(xmlfile, "w") as f:
        doc.writexml(f, indent="\t", addindent="\t", newl="\n", encoding="utf-8")


def write_imgs_to_xmls(imgdir, xmldir):
    img_names = os.listdir(imgdir)
    for img_name in img_names:
        img_file = os.path.join(imgdir,img_name)
        xml_file = os.path.join(xmldir, img_name.split(".")[0]+".xml")
        print("%s has been written to xml file in %s" % (img_name, xml_file))
        write_img_to_xml(img_file, xml_file)
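As a minimal sketch of what such a negative annotation looks like, the snippet below builds the same annotation/size structure with the standard library and parses it back. The helper names build_negative_annotation and count_objects, and the file name neg_0.jpg, are made up for illustration. The point is that a negative annotation has no object element, which is how the later parsing code can tell negative images from positive ones.

```python
from xml.dom.minidom import Document
import xml.etree.ElementTree as ET


def build_negative_annotation(folder, filename, width, height, depth):
    """Build a VOC-style annotation string with a <size> block and no <object>."""
    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)

    for tag, text in (("folder", folder), ("filename", filename)):
        node = doc.createElement(tag)
        node.appendChild(doc.createTextNode(text))
        annotation.appendChild(node)

    size = doc.createElement("size")
    annotation.appendChild(size)
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        node = doc.createElement(tag)
        node.appendChild(doc.createTextNode(str(value)))
        size.appendChild(node)

    return doc.toxml()


def count_objects(xml_text):
    """Count the <object> children of the root element."""
    root = ET.fromstring(xml_text)
    return sum(1 for item in root if item.tag == "object")


# Hypothetical file name and image size, used only for this illustration.
xml_text = build_negative_annotation("JPEGImages", "neg_0.jpg", 640, 640, 3)
print(count_objects(xml_text))  # 0
```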

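Splitting the dataset

Here the 1465 images are randomly split into training, test, and validation sets at a ratio of 7:2:1. For convenience, first create a folder named images containing two subfolders, JPEGImages and Annotations, which hold all of the images and the corresponding annotation files respectively. The main functions for splitting the dataset are as follows: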

import os
import shutil
import random

def _copy_file(src_file, dst_file):
    """copy file.
    """
    if not os.path.isfile(src_file):
        print("%s does not exist!" % src_file)
    else:
        fpath, fname = os.path.split(dst_file)
        if not os.path.exists(fpath):
            os.makedirs(fpath)
        shutil.copyfile(src_file, dst_file)


def split_data(data_dir, train_dir, test_dir, valid_dir, ratio=[0.7, 0.2, 0.1], shuffle=True):
    """ split data to train data, test data, valid data.
    Args:
        data_dir -- data dir to be split.
        train_dir, test_dir, valid_dir -- splitted dir.
        ratio -- [train_ratio, test_ratio, valid_ratio].
        shuffle -- shuffle or not.
    """
    all_img_dir = os.path.join(data_dir, "JPEGImages/")
    all_xml_dir = os.path.join(data_dir, "Annotations/")
    train_img_dir = os.path.join(train_dir, "JPEGImages/")
    train_xml_dir = os.path.join(train_dir, "Annotations/")
    test_img_dir = os.path.join(test_dir, "JPEGImages/")
    test_xml_dir = os.path.join(test_dir, "Annotations/")
    valid_img_dir = os.path.join(valid_dir, "JPEGImages/")
    valid_xml_dir = os.path.join(valid_dir, "Annotations/")

    all_imgs_name = os.listdir(all_img_dir)
    img_num = len(all_imgs_name)
    train_num = int(1.0*img_num*ratio[0]/sum(ratio))
    test_num = int(1.0*img_num*ratio[1]/sum(ratio))
    valid_num = img_num-train_num-test_num

    if shuffle:
        random.shuffle(all_imgs_name)
    train_imgs_name = all_imgs_name[:train_num]
    test_imgs_name = all_imgs_name[train_num:train_num+test_num]
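    # slice from the end of the test chunk; [-valid_num:] would return the whole list when valid_num is 0
    valid_imgs_name = all_imgs_name[train_num+test_num:]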

    for img_name in train_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(train_img_dir, img_name)
        xml_dstfile = os.path.join(train_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

    for img_name in test_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(test_img_dir, img_name)
        xml_dstfile = os.path.join(test_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

    for img_name in valid_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(valid_img_dir, img_name)
        xml_dstfile = os.path.join(valid_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

Running the code creates training, test, and validation folders under the specified directories, each containing JPEGImages and Annotations subfolders that hold the results.
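To make the 7:2:1 split concrete, here is a small sketch of the size arithmetic for 1465 images. It uses integer ratio parts instead of the floats in split_data (an assumption made here to sidestep floating-point rounding), and the helper names split_counts and split_names are illustrative only.

```python
def split_counts(total, ratio=(7, 2, 1)):
    """Return (train, test, valid) sizes for integer ratio parts."""
    parts = sum(ratio)
    train = total * ratio[0] // parts
    test = total * ratio[1] // parts
    valid = total - train - test  # the remainder goes to the validation set
    return train, test, valid


def split_names(names, ratio=(7, 2, 1)):
    """Slice an already shuffled list into three consecutive chunks."""
    train, test, _ = split_counts(len(names), ratio)
    # Slicing from train + test stays correct even when the validation
    # share is 0, whereas names[-0:] would return the whole list.
    return names[:train], names[train:train + test], names[train + test:]


print(split_counts(1465))  # (1025, 293, 147)
```

So the 1465 images end up as 1025 training, 293 test, and 147 validation images.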

The dataset for object detection is now ready. Next comes the framework of the detection model.

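Detection framework

The detection approach in this post is very straightforward. Broadly, it consists of three stages: region proposal extraction, HOG feature extraction, and SVM classification.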

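Region proposal extraction

In theory, one could slide windows of different sizes over the whole image, but that is computationally expensive and the window sizes are hard to choose. Since the traffic signs to be detected have fairly regular geometric shapes and colors, detecting shapes (parallelograms, ellipses) and colors (red, blue, and so on) can serve as a first filtering step that reduces computation and improves efficiency. Here I use color information only.

The six sign classes are mainly red and blue (or a combination of both), and different lighting conditions in the environment can shift the colors considerably. So, given an image, I first segment the blue and red regions by color thresholding in HSV space to obtain a binary image, and then run contour detection on the binary image (the code uses OpenCV's findContours and takes the bounding rectangle of each contour). The figure below shows an example: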


bin_img
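The thresholds used in this post split red into two hue ranges, because OpenCV stores 8-bit hue as half the angle in degrees and red sits at both ends of that scale. The sketch below illustrates this with the standard library's colorsys instead of OpenCV, so the conversion only approximates cv2.cvtColor. The helper names to_opencv_hsv, in_range and color_class are illustrative, while the bounds are copied from the thresholding code in this post.

```python
import colorsys

# (H, S, V) lower and upper bounds on OpenCV's 8-bit scale, copied from this post.
BLUE = ((100, 43, 46), (124, 255, 255))
RED_LOW = ((0, 43, 46), (10, 255, 255))
RED_HIGH = ((156, 43, 46), (180, 255, 255))


def to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to (H, S, V) with H in [0, 180) and S, V in [0, 255]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


def in_range(hsv, bounds):
    """True when every channel lies inside its [lower, upper] interval."""
    lower, upper = bounds
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))


def color_class(r, g, b):
    """Return "blue", "red" or None for a single RGB pixel."""
    hsv = to_opencv_hsv(r, g, b)
    if in_range(hsv, BLUE):
        return "blue"
    if in_range(hsv, RED_LOW) or in_range(hsv, RED_HIGH):
        return "red"
    return None


print(to_opencv_hsv(255, 0, 0), color_class(255, 0, 0))    # hue 0, low red range
print(to_opencv_hsv(255, 0, 40), color_class(255, 0, 40))  # hue 175, high red range
print(to_opencv_hsv(0, 0, 255), color_class(0, 0, 255))    # hue 120, blue range
```

A grey pixel such as (128, 128, 128) has zero saturation and is rejected by the S >= 43 bound, which is what keeps most neutral background out of the binary image.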

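As the figure shows, after binarization the outlines of all three signs in the image (two of which are signs we need to detect and recognize) are preserved. Some problems remain, however: (1) there is a lot of background noise, which produces more contours and hurts both detection speed and accuracy; (2) the three signs are very close together, so they may be detected as a single contour. I considered using erosion and dilation to filter out part of the noise, but in my experiments this caused more missed detections, because erosion and dilation can easily destroy the outlines of some signs (the no-honking sign in particular), so that they are lost at the contour detection stage. The final tests therefore did not use erosion and dilation. The thresholding and contour detection functions are as follows: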

import cv2
import numpy as np


def preprocess_img(imgBGR, erode_dilate=True):
    """preprocess the image for contour detection.
    Args:
        imgBGR: source image.
        erode_dilate: erode and dilate or not.
    Return:
        img_bin: a binary image (blue and red).

    """
    rows, cols, _ = imgBGR.shape
    imgHSV = cv2.cvtColor(imgBGR, cv2.COLOR_BGR2HSV)

    Bmin = np.array([100, 43, 46])
    Bmax = np.array([124, 255, 255])
    img_Bbin = cv2.inRange(imgHSV,Bmin, Bmax)

    Rmin1 = np.array([0, 43, 46])
    Rmax1 = np.array([10, 255, 255])
    img_Rbin1 = cv2.inRange(imgHSV,Rmin1, Rmax1)

    Rmin2 = np.array([156, 43, 46])
    Rmax2 = np.array([180, 255, 255])
    img_Rbin2 = cv2.inRange(imgHSV,Rmin2, Rmax2)
    img_Rbin = np.maximum(img_Rbin1, img_Rbin2)
    img_bin = np.maximum(img_Bbin, img_Rbin)

    if erode_dilate is True:
        kernelErosion = np.ones((3,3), np.uint8)
        kernelDilation = np.ones((3,3), np.uint8) 
        img_bin = cv2.erode(img_bin, kernelErosion, iterations=2)
        img_bin = cv2.dilate(img_bin, kernelDilation, iterations=2)

    return img_bin


def contour_detect(img_bin, min_area=0, max_area=-1, wh_ratio=2.0):
    """detect contours in a binary image.
    Args:
        img_bin: a binary image.
        min_area: the minimum area of the contours detected.
            (default: 0)
        max_area: the maximum area of the contours detected.
            (default: -1, no maximum area limitation)
        wh_ratio: the maximum allowed ratio between the long edge and the short edge.
            (default: 2.0)
    Return:
        rects: a list of rects enclosing the contours. if no contour is detected, rects=[]
    """
    rects = []
    # [-2] picks the contour list in OpenCV 2, 3 and 4 (the returned tuple differs between versions)
    contours = cv2.findContours(img_bin.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    if len(contours) == 0:
        return rects

    max_area = img_bin.shape[0]*img_bin.shape[1] if max_area<0 else max_area
    for contour in contours:
        area = cv2.contourArea(contour)
        if area >= min_area and area <= max_area:
            x, y, w, h = cv2.boundingRect(contour)
            if 1.0*w/h < wh_ratio and 1.0*h/w < wh_ratio:
                rects.append([x,y,w,h])
    return rects

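As the function shows, limits on the contour area and on the aspect ratio of the bounding rectangle are added to improve the quality of the proposals. Note that the minimum contour area must not be set too large, or smaller traffic signs in the image will be missed. The aspect-ratio limit should not be too strict either: because of varying viewpoints in real images, the bounding rectangle of a sign can be fairly elongated. In my code the maximum aspect ratio is limited to 2.5 (the wh_ratio default in the function shown here is 2.0).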
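The two filters can be tried out without OpenCV. The sketch below is a simplified stand-in for the tests in contour_detect: keep_rect is a hypothetical helper, and the area is passed in directly rather than measured from a contour. The minimum area is the value the detection script in this post would compute for a 640×640 image.

```python
def keep_rect(w, h, area, min_area=0, max_area=None, wh_ratio=2.0):
    """Return True when a candidate passes the area and aspect-ratio tests."""
    if area < min_area:
        return False
    if max_area is not None and area > max_area:
        return False
    # Both w/h and h/w must stay below the limit.
    return float(w) / h < wh_ratio and float(h) / w < wh_ratio


# For a 640 x 640 image, the detection script uses rows * cols / (25 * 25).
min_area = 640 * 640 / float(25 * 25)
print(min_area)  # 655.36

print(keep_rect(40, 40, 1200, min_area))                # True: large enough and square
print(keep_rect(10, 10, 80, min_area))                  # False: area below the minimum
print(keep_rect(90, 40, 3000, min_area, wh_ratio=2.0))  # False: 2.25 exceeds 2.0
print(keep_rect(90, 40, 3000, min_area, wh_ratio=2.5))  # True: 2.25 is below 2.5
```

The last two calls show how much the choice between 2.0 and 2.5 matters for signs seen at an angle.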

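The proposals have now been selected, but one more thing needs attention: they come in different sizes, while the SVM that follows requires fixed-length feature vectors. So before HOG feature extraction, all proposals should be brought to a fixed size (64×64 in my code). There are two options: (1) simply resize each proposal to the target size. This is easy, but it distorts the object in the original proposal, which does not help the SVM (with a convolutional neural network this matters less, because such networks learn to cope with distortion quite well); (2) extract a square proposal and then resize it to the target size. That is, for an (H×W) proposal, assuming H < W, crop a W×W square centered on the proposal (clipped at the image border), as the detection script later in this post does.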
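Option (2) can be sketched in a few lines of plain Python. The function below mirrors the index arithmetic of the detection script later in this post; the name square_box is made up here for illustration.

```python
def square_box(x, y, w, h, rows, cols):
    """Expand an (x, y, w, h) box to a centered square, clipped to the image.

    Returns (x1, y1, x2, y2), suitable for slicing as img[y1:y2, x1:x2].
    """
    xc = x + w // 2
    yc = y + h // 2
    size = max(w, h)  # the longer side becomes the square's edge length
    x1 = max(0, xc - size // 2)
    y1 = max(0, yc - size // 2)
    x2 = min(cols, xc + size // 2)
    y2 = min(rows, yc + size // 2)
    return x1, y1, x2, y2


print(square_box(100, 200, 40, 80, 640, 640))  # (80, 200, 160, 280)
print(square_box(0, 0, 20, 60, 640, 640))      # (0, 0, 40, 60)
```

The second call shows the border case: the square is clipped to 40×60, so a small amount of distortion remains once it is resized to 64×64.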

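HOG feature extraction

HOG stands for histogram of oriented gradients. I will not go into it here; for the details, see my other post, Histogram of Oriented Gradients (HOG). The implementation uses the feature module of the skimage library: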

from skimage import feature as ft


def hog_feature(img_array, resize=(64,64)):
    """extract hog feature from an image.
    Args:
        img_array: an image array.
        resize: size the image is resized to before extraction.
    Return:
        features: a 1-D ndarray.
    """
    img = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, resize)
    bins = 9
    cell_size = (8, 8)
    cpb = (2, 2)
    norm = "L2"
    features = ft.hog(img, orientations=bins, pixels_per_cell=cell_size, 
                        cells_per_block=cpb, block_norm=norm, transform_sqrt=True)
    return features


def extra_hog_features_dir(img_dir, write_txt, resize=(64,64)):
    """extract hog features from images in a directory.
    Args:
        img_dir: image directory.
        write_txt: the path of a txt file used for saving the hog features of all images.
        resize: size the image is resized to before extraction.
    Return:
        none.
    """
    img_names = os.listdir(img_dir)
    img_names = [os.path.join(img_dir, img_name) for img_name in img_names]
    if os.path.exists(write_txt):
        os.remove(write_txt)

    with open(write_txt, "a") as f:
        index = 0
        for img_name in img_names:
            img_array = cv2.imread(img_name)
            features = hog_feature(img_array, resize)
            # the class name is the part of the file name before the first underscore
            label_name = os.path.basename(img_name).split("_")[0]
            label_num = img_label[label_name]

            row_data = img_name + "\t" + str(label_num) + "\t"

            for element in features:
                row_data = row_data + str(round(element,3)) + " "
            row_data = row_data + "\n"
            f.write(row_data)

            if index%100 == 0:
                print("total image number = %d, current image number = %d" % (len(img_names), index))
            index += 1

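The HOG parameters can be read from the function: the image size is 64×64, gradients are binned into 9 orientations (bins = 9), each cell is 8×8 pixels, each block contains 4 cells (cpb = (2, 2)), and blocks are normalized with the L2 norm (norm = "L2").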
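These settings also fix the length of the feature vector that the SVM sees. As a quick check, the sketch below counts the entries assuming blocks slide one cell at a time, which is how skimage's hog lays them out; hog_length is an illustrative helper, not part of the project code.

```python
def hog_length(img_size=(64, 64), cell=(8, 8), block=(2, 2), bins=9):
    """Length of a flattened HOG descriptor with a block stride of one cell."""
    cells_y = img_size[0] // cell[0]
    cells_x = img_size[1] // cell[1]
    blocks_y = cells_y - block[0] + 1
    blocks_x = cells_x - block[1] + 1
    return blocks_y * blocks_x * block[0] * block[1] * bins


print(hog_length())  # 1764
```

That is 8×8 cells, 7×7 overlapping blocks, and 2×2×9 = 36 values per block, so each proposal becomes a 1764-dimensional vector.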

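SVM classifier

For an introduction to support vector machines, there is a very good tutorial online, 支持向量機通俗導論(理解SVM的三層境界) (roughly, "A Plain Introduction to Support Vector Machines: Understanding the Three Levels of SVM"), which I recommend reading. Here the SVM is used to classify the HOG features extracted from the proposals. I will cover building and augmenting the SVM dataset, then training and testing the model.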

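Building the dataset

This dataset differs from the object detection dataset described at the beginning: we now need a dataset for classification. Since the detection data already exists, the classification data can be generated directly from it, using a method very similar to the region proposal extraction above. The overall idea is to crop the target regions from the detection dataset as positive samples for the SVM, and to crop other regions (containing no target) as negative samples. Specifically:

(1) For images that contain targets, crop a square region directly from the label information (using the longer side as the edge length; a few boxes at the image border end up non-square), and discard poor samples (regions that are very small). The positive samples cropped this way include some background, which helps make the model more robust to noise and also leaves room for data augmentation (such as affine transformation) when samples are scarce.

(2) For images that contain no targets, extract regions by color thresholding (red and blue) and contour detection, crop square regions (using the longer side as the edge length), and discard small ones. Compared with random cropping, this is more targeted: at detection time, many regions whose color resembles a traffic sign will be proposed, so using exactly these regions as negatives helps training a lot.

Below are the functions I used to create the positive and negative samples:

Parsing the annotation of an image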

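import os
import xml.etree.ElementTree as ET

# DATA_PATH (dataset root), classes_num (class name -> index) and classes_name
# (class names) are module-level settings defined elsewhere in the project.


def parse_xml(xml_file):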
    """parse xml_file
    Args:
        xml_file: the input xml file path
    Returns:
        image_path: string
        labels: list of [xmin, ymin, xmax, ymax, class]
    """
    tree = ET.parse(xml_file)
    root = tree.getroot()
    image_path = ''
    labels = []

    for item in root:
        if item.tag == 'filename':
            image_path = os.path.join(DATA_PATH, "JPEGImages/", item.text)
        elif item.tag == 'object':
            obj_name = item[0].text
            obj_num = classes_num[obj_name]
            xmin = int(item[4][0].text)
            ymin = int(item[4][1].text)
            xmax = int(item[4][2].text)
            ymax = int(item[4][3].text)
            labels.append([xmin, ymin, xmax, ymax, obj_num])
    return image_path, labels

Extracting positive and negative samples

def produce_pos_proposals(img_path, write_dir, labels, min_size, square=False, proposal_num=0, ):
    """produce positive proposals based on labels.
    Args:
        img_path: image path.
        write_dir: write directory.
        min_size: the minimum size of the proposals.
        labels: a list of bounding boxes.
            [[x1, y1, x2, y2, cls_num], [x1, y1, x2, y2, cls_num], ...]
        square:  crop a square or not.
    Return:
        proposal_num: proposal numbers.
    """
    img = cv2.imread(img_path)
    rows = img.shape[0]
    cols = img.shape[1]
    for label in labels:
        xmin, ymin, xmax, ymax, cls_num = [int(v) for v in label]
        # remove the proposal with small area
        if xmax-xmin<min_size or ymax-ymin<min_size:
            continue
        # crop a square area
        if square is True:
            xcenter = (xmin + xmax)//2
            ycenter = (ymin + ymax)//2
            size = max(xmax-xmin, ymax-ymin)
            # integer division keeps the slice indices integral under Python 3
            xmin = max(xcenter-size//2, 0)
            xmax = min(xcenter+size//2, cols)
            ymin = max(ycenter-size//2, 0)
            ymax = min(ycenter+size//2, rows)
            proposal = img[ymin:ymax, xmin:xmax]
            proposal = cv2.resize(proposal, (size,size))
        else:
            proposal = img[ymin:ymax, xmin:xmax]

        cls_name = classes_name[cls_num]
        proposal_num[cls_name] +=1
        write_name = cls_name + "_" + str(proposal_num[cls_name]) + ".jpg"
        cv2.imwrite(os.path.join(write_dir,write_name), proposal)
    return proposal_num


def produce_neg_proposals(img_path, write_dir, min_size, square=False, proposal_num=0):
    """produce negative proposals from a negative image.
    Args:
        img_path: image path.
        write_dir: write directory.
        min_size: the minimum size of the proposals.
        square:  crop a square or not.
        proposal_num: current negative proposal numbers.
    Return:
        proposal_num: negative proposal numbers.
    """
    img = cv2.imread(img_path)
    rows = img.shape[0]
    cols = img.shape[1]
    imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    imgBinBlue = cv2.inRange(imgHSV,np.array([100,43,46]), np.array([124,255,255]))
    imgBinRed1 = cv2.inRange(imgHSV,np.array([0,43,46]), np.array([10,255,255]))
    imgBinRed2 = cv2.inRange(imgHSV,np.array([156,43,46]), np.array([180,255,255]))
    imgBinRed = np.maximum(imgBinRed1, imgBinRed2)
    imgBin = np.maximum(imgBinRed, imgBinBlue)

    # [-2] picks the contour list in OpenCV 2, 3 and 4 (the returned tuple differs between versions)
    contours = cv2.findContours(imgBin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    for contour in contours:
        x,y,w,h = cv2.boundingRect(contour)
        if w<min_size or h<min_size:
            continue

        if square is True:
            xcenter = x + w//2
            ycenter = y + h//2
            size = max(w,h)
            # integer division keeps the slice indices integral under Python 3
            xmin = max(xcenter-size//2, 0)
            xmax = min(xcenter+size//2, cols)
            ymin = max(ycenter-size//2, 0)
            ymax = min(ycenter+size//2, rows)
            proposal = img[ymin:ymax, xmin:xmax]
            proposal = cv2.resize(proposal, (size,size))

        else:
            proposal = img[y:y+h, x:x+w]
        write_name = "background" + "_" + str(proposal_num) + ".jpg"
        proposal_num += 1
        cv2.imwrite(os.path.join(write_dir,write_name), proposal)
    return proposal_num


def produce_proposals(xml_dir, write_dir, square=False, min_size=30):
    """produce proposals (positive examples for classification) to disk.
    Args:
        xml_dir: image xml file directory.
        write_dir: write directory of all proposals.
        square: crop a square or not.
        min_size: the minimum size of the proposals.
    Returns:
        proposal_num: a dict of proposal numbers.
    """

    proposal_num = {}
    for cls_name in classes_name:
        proposal_num[cls_name] = 0

    index = 0
    for xml_file in os.listdir(xml_dir):
        img_path, labels = parse_xml(os.path.join(xml_dir,xml_file))
        img = cv2.imread(img_path)
        rows = img.shape[0]
        cols = img.shape[1]

        if len(labels) == 0:
            neg_proposal_num = produce_neg_proposals(img_path, write_dir, min_size, square, proposal_num["background"])
            proposal_num["background"] = neg_proposal_num
        else:
            proposal_num = produce_pos_proposals(img_path, write_dir, labels, min_size, square=True, proposal_num=proposal_num)

        if index%100 == 0:
            print("total xml file number = %d, current xml file number = %d" % (len(os.listdir(xml_dir)), index))
            print("proposal num = %s" % (proposal_num,))
        index += 1

    return proposal_num

The return value proposal_num counts the extracted samples. The numbers I obtained from the training set are:

proposal_num = {'right': 117, 'straight': 334, 'stop': 224, 'no hook': 168, 'crosswalk': 128, 'left': 208, 'background': 1116}

Some of the cropped positive and negative samples:


pos_neg

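The first rows correspond to the six positive classes and the last row is background. The backgrounds found by the code are mainly regions whose color (blue or red) resembles our traffic signs. The same method is applied to the validation set to extract positive and negative samples for tuning and evaluating the SVM, so I will not repeat it here.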

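Augmenting the training data

Judging from the per-class counts above, each positive class has far fewer samples than the background (negative) class. To balance the data, I augmented the positive samples. Because the data contains direction-dependent signs such as turn left and turn right, rotation or mirroring would cause problems (small rotations are fine, of course). I also considered brightness changes, but the normalization in HOG makes the features insensitive to illumination. In the end I chose affine transformations, which are easy to implement with OpenCV; for the theory and sample code, see Affine Transformations in the official OpenCV tutorials. Here are my functions for applying affine transformations to the dataset: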

def affine(img, delta_pix):
    """affine transformation
    Args:
        img: a numpy image array.
        delta_pix: the offset for affine.
    Return:
        res: affined image. 
    """
    rows, cols, _ = img.shape
    # OpenCV points are (x, y) and sizes are (width, height), i.e. (cols, rows)
    pts1 = np.float32([[0,0], [cols,0], [0, rows]])
    pts2 = pts1 + delta_pix
    M = cv2.getAffineTransform(pts1, pts2)
    res = cv2.warpAffine(img, M, (cols, rows))
    return res


def affine_dir(img_dir, write_dir, max_delta_pix):
    """ affine transformation on the images in a directory.
    Args:
        img_dir: image directory.
        write_dir: save directory of affined images.
        max_delta_pix: the maximum offset for affine.
    """
    img_names = os.listdir(img_dir)
    img_names = [img_name for img_name in img_names if img_name.split(".")[-1]=="jpg"]
    for index, img_name in enumerate(img_names):
        img = cv2.imread(os.path.join(img_dir,img_name))
        save_name = os.path.join(write_dir, img_name.split(".")[0]+"f.jpg")
        delta_pix = np.float32(np.random.randint(-max_delta_pix, max_delta_pix+1, [3,2]))
        img_a = affine(img, delta_pix)
        cv2.imwrite(save_name, img_a)

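The parameter max_delta_pix (a positive integer) controls the maximum strength of the random affine transformation: the larger it is, the more pronounced the transformation (too large a value may wipe out the target entirely). I used 10 for augmentation. Note that 10 is only the maximum strength: before each image is transformed, a 3×2 array of random integer offsets delta_pix is drawn from [-max_delta_pix, max_delta_pix], one (dx, dy) pair per reference point, and these offsets determine the strength of the transformation for that image (you can of course draw several times to generate more transformed images). Some example results: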


affine examples
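To see what the three offset pairs do, the 2×3 matrix can be worked out by hand for reference points at three image corners. This is a plain-Python sketch of the same three-point construction that cv2.getAffineTransform performs, not the project's code; affine_from_offsets and apply_affine are illustrative names, and the offsets in the example are arbitrary.

```python
def affine_from_offsets(width, height, deltas):
    """2x3 matrix mapping (0,0), (width,0), (0,height) to those points plus deltas.

    deltas is [(dx0, dy0), (dx1, dy1), (dx2, dy2)], one offset per corner.
    """
    (dx0, dy0), (dx1, dy1), (dx2, dy2) = deltas
    x0, y0 = dx0, dy0              # image of (0, 0)
    x1, y1 = width + dx1, dy1      # image of (width, 0)
    x2, y2 = dx2, height + dy2     # image of (0, height)
    return [
        [(x1 - x0) / float(width), (x2 - x0) / float(height), float(x0)],
        [(y1 - y0) / float(width), (y2 - y0) / float(height), float(y0)],
    ]


def apply_affine(m, x, y):
    """Map a point (x, y) through the 2x3 matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Zero offsets give the identity transform.
print(affine_from_offsets(64, 64, [(0, 0), (0, 0), (0, 0)]))

# Arbitrary example offsets within [-10, 10].
m = affine_from_offsets(64, 64, [(4, -2), (3, 5), (-6, 1)])
print(apply_affine(m, 64, 0))  # the top-right corner moves by (3, 5)
```

With offsets of at most 10 pixels on a sign a few dozen pixels wide, the result is a mild shear, scale and shift, which is why the sign stays recognizable.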

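Training and testing the model

For training I call the svm module in sklearn directly, leaving many parameters at their defaults. During training I found that the penalty factor C has a large effect, and I took a shortcut and simply picked a rough value. (The hyperparameters can be tuned on the validation set mentioned earlier; I will not go into that here.) The functions are as follows: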

import numpy as np
import joblib
from sklearn.svm import SVC


def load_hog_data(hog_txt):
    """ load hog features.
    Args:
        hog_txt: a txt file used to save hog features.
            one line data is formated as "img_path \t cls_num \t hog_feature_vector"
    Return:
        img_names: a list of image names.
        labels: numpy array labels (1-dim).
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
    """
    img_names = []
    labels = []
    hog_features = []
    with open(hog_txt, "r") as f:
        data = f.readlines()
        for row_data in data:
            row_data = row_data.rstrip()
            img_path, label, hog_str = row_data.split("\t")
            img_name = img_path.split("/")[-1]
            hog_feature = hog_str.split(" ")
            hog_feature = [float(hog) for hog in hog_feature]
            #print "hog feature length = ", len(hog_feature)
            img_names.append(img_name)
            labels.append(int(label))
            hog_features.append(hog_feature)
    return img_names, np.array(labels), np.array(hog_features)



def svm_train(hog_features, labels, save_path="./svm_model.pkl"):
    """ SVM train
    Args:
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
        labels: numpy array labels (1-dim).
        save_path: model save path.
    Return:
        none.
    """
    clf = SVC(C=10, tol=1e-3, probability = True)
    clf.fit(hog_features, labels)
    joblib.dump(clf, save_path)
    print("finished.")

def svm_test(svm_model, hog_feature, labels):
    """SVM test
    Args:
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
        labels: numpy array labels (1-dim).
    Return:
        accuracy: test accuracy.
    """
    clf = joblib.load(svm_model)
    accuracy = clf.score(hog_feature, labels)
    return accuracy

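Finally, I trained on 3474 training samples (positive samples doubled by augmentation, negative samples not augmented). With C=10 (other parameters at their defaults), the accuracy on the validation set (322 samples) is 97.2%, meaning that 9 images are misclassified, which is acceptable.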
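These figures can be checked against the sample counts reported earlier. The short calculation below uses only numbers from this post:

```python
proposal_num = {'right': 117, 'straight': 334, 'stop': 224, 'no hook': 168,
                'crosswalk': 128, 'left': 208, 'background': 1116}

# Positives are doubled by augmentation; the background class is left as it is.
positives = sum(n for name, n in proposal_num.items() if name != 'background')
train_size = 2 * positives + proposal_num['background']
print(positives, train_size)     # 1179 3474

# 9 misclassified samples out of 322 validation samples.
valid_size, errors = 322, 9
accuracy = (valid_size - errors) / float(valid_size)
print(round(100 * accuracy, 1))  # 97.2
```

Even after doubling, each positive class (234 to 668 samples) remains smaller than the background class (1116 samples).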

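Detection results

To recap, we can now extract region proposals and classify them, which means we can run detection on a full image. Here are my detection code and some example results.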

import os
import numpy as np 
import cv2
from skimage import feature as ft 
import joblib  # scikit-learn dropped sklearn.externals.joblib in version 0.23

cls_names = ["straight", "left", "right", "stop", "nohonk", "crosswalk", "background"]
img_label = {"straight": 0, "left": 1, "right": 2, "stop": 3, "nohonk": 4, "crosswalk": 5, "background": 6}

def preprocess_img(imgBGR, erode_dilate=True):
    """preprocess the image for contour detection.
    Args:
        imgBGR: source image.
        erode_dilate: erode and dilate or not.
    Return:
        img_bin: a binary image (blue and red).

    """
    rows, cols, _ = imgBGR.shape
    imgHSV = cv2.cvtColor(imgBGR, cv2.COLOR_BGR2HSV)

    Bmin = np.array([100, 43, 46])
    Bmax = np.array([124, 255, 255])
    img_Bbin = cv2.inRange(imgHSV,Bmin, Bmax)

    Rmin1 = np.array([0, 43, 46])
    Rmax1 = np.array([10, 255, 255])
    img_Rbin1 = cv2.inRange(imgHSV,Rmin1, Rmax1)

    Rmin2 = np.array([156, 43, 46])
    Rmax2 = np.array([180, 255, 255])
    img_Rbin2 = cv2.inRange(imgHSV,Rmin2, Rmax2)
    img_Rbin = np.maximum(img_Rbin1, img_Rbin2)
    img_bin = np.maximum(img_Bbin, img_Rbin)

    if erode_dilate is True:
        kernelErosion = np.ones((9,9), np.uint8)
        kernelDilation = np.ones((9,9), np.uint8) 
        img_bin = cv2.erode(img_bin, kernelErosion, iterations=2)
        img_bin = cv2.dilate(img_bin, kernelDilation, iterations=2)

    return img_bin


def contour_detect(img_bin, min_area=0, max_area=-1, wh_ratio=2.0):
    """detect contours in a binary image.
    Args:
        img_bin: a binary image.
        min_area: the minimum area of the contours detected.
            (default: 0)
        max_area: the maximum area of the contours detected.
            (default: -1, no maximum area limitation)
        wh_ratio: the maximum allowed ratio between the long edge and the short edge.
            (default: 2.0)
    Return:
        rects: a list of rects enclosing the contours. if no contour is detected, rects=[]
    """
    rects = []
    # [-2] picks the contour list in OpenCV 2, 3 and 4 (the returned tuple differs between versions)
    contours = cv2.findContours(img_bin.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    if len(contours) == 0:
        return rects

    max_area = img_bin.shape[0]*img_bin.shape[1] if max_area<0 else max_area
    for contour in contours:
        area = cv2.contourArea(contour)
        if area >= min_area and area <= max_area:
            x, y, w, h = cv2.boundingRect(contour)
            if 1.0*w/h < wh_ratio and 1.0*h/w < wh_ratio:
                rects.append([x,y,w,h])
    return rects


def draw_rects_on_img(img, rects):
    """ draw rects on an image.
    Args:
        img: an image where the rects are drawn on.
        rects: a list of rects.
    Return:
        img_rects: an image with rects.
    """
    img_copy = img.copy()
    for rect in rects:
        x, y, w, h = rect
        cv2.rectangle(img_copy, (x,y), (x+w,y+h), (0,255,0), 2)
    return img_copy


def hog_extra_and_svm_class(proposal, clf, resize = (64, 64)):
    """classify the region proposal.
    Args:
        proposal: region proposal (numpy array).
        clf: a SVM model.
        resize: resize the region proposal
            (default: (64, 64))
    Return:
        cls_prop: probability of each class.
    """
    img = cv2.cvtColor(proposal, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, resize)
    bins = 9
    cell_size = (8, 8)
    cpb = (2, 2)
    norm = "L2"
    features = ft.hog(img, orientations=bins, pixels_per_cell=cell_size, 
                        cells_per_block=cpb, block_norm=norm, transform_sqrt=True)
    print("feature shape = %s" % (features.shape,))
    features = np.reshape(features, (1,-1))
    cls_prop = clf.predict_proba(features)
    print("cls prop = %s" % (cls_prop,))
    return cls_prop


if __name__ == "__main__":
    img = cv2.imread("/home/meringue/Documents/traffic_sign_detection/svm_hog_classification/sign_89.jpg")
    rows, cols, _ = img.shape
    img_bin = preprocess_img(img,False)
    cv2.imshow("bin image", img_bin)
    cv2.imwrite("bin_image.jpg", img_bin)
    min_area = img_bin.shape[0]*img.shape[1]/(25*25)
    rects = contour_detect(img_bin, min_area=min_area)
    img_rects = draw_rects_on_img(img, rects)
    cv2.imshow("image with rects", img_rects)
    cv2.imwrite("image_rects.jpg", img_rects)

    clf = joblib.load("./svm_model.pkl")

    img_bbx = img.copy()

    for rect in rects:
        xc = int(rect[0] + rect[2]/2)
        yc = int(rect[1] + rect[3]/2)

        size = max(rect[2], rect[3])
        x1 = max(0, int(xc-size/2))
        y1 = max(0, int(yc-size/2))
        x2 = min(cols, int(xc+size/2))
        y2 = min(rows, int(yc+size/2))
        proposal = img[y1:y2, x1:x2]
        cls_prop = hog_extra_and_svm_class(proposal, clf)
        cls_prop = np.round(cls_prop, 2)[0]
        cls_num = np.argmax(cls_prop)
        cls_name = cls_names[cls_num]
        prop = cls_prop[cls_num]
        # compare string values with !=; "is not" tests object identity
        if cls_name != "background":
            cv2.rectangle(img_bbx,(rect[0],rect[1]), (rect[0]+rect[2],rect[1]+rect[3]), (0,0,255), 2)
            cv2.putText(img_bbx, cls_name+str(prop), (rect[0], rect[1]), 1, 1.5, (0,0,255),2)

    cv2.imshow("detect result", img_bbx)
    cv2.imwrite("detect_result.jpg", img_bbx)
    cv2.waitKey(0)


test result1
test result2

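From left to right, the figures above show the thresholded image, the extracted proposals, and the final detection result (class name + confidence). The precision and recall for each sign class (with an IoU threshold of 0.5) are listed below (the code for computing them is in my GitHub repository, so I do not include it in the post):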

Sign        straight   left    right   no-honk   crosswalk   stop
Precision   41.6%      45.8%   43.5%   45.3%     75.6%       45.7%
Recall      37.1%      39.8%   43.5%   48.3%     50.8%       57.1%
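The evaluation code itself lives in the GitHub repository and is not shown in this post, so the following is only a generic sketch of how precision and recall at an IoU threshold of 0.5 are commonly computed. The function names, the greedy matching strategy and the toy boxes are assumptions for illustration, not the project's implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / float(union) if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_thresh=0.5):
    """Greedy one-to-one matching of same-class boxes; returns (precision, recall).

    detections and ground_truths are lists of (class_name, box).
    """
    matched = set()
    true_pos = 0
    for cls, box in detections:
        for i, (gt_cls, gt_box) in enumerate(ground_truths):
            if i not in matched and gt_cls == cls and iou(box, gt_box) >= iou_thresh:
                matched.add(i)
                true_pos += 1
                break
    precision = true_pos / float(len(detections)) if detections else 0.0
    recall = true_pos / float(len(ground_truths)) if ground_truths else 0.0
    return precision, recall


# Toy boxes for illustration only, not data from this project.
dets = [("stop", (0, 0, 10, 10)), ("left", (50, 50, 10, 10))]
gts = [("stop", (1, 1, 10, 10)), ("right", (50, 50, 10, 10)), ("left", (200, 200, 10, 10))]
print(precision_recall(dets, gts))
```

In the toy example one of two detections matches a labeled sign of the same class, and one of three labeled signs is found, so precision is 1/2 and recall is 1/3.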

Example of real-time detection on video:


video_gif

Setting thresholds of 0.1, 0.2, ..., 0.9 on the probability output by the SVM gives the following trends in mean precision and mean recall:

pre_rec

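The numbers show that the overall detection results are far from satisfactory. The precision and recall curves show that as the confidence threshold increases, mean precision keeps rising while recall stays fairly flat (dropping slightly once the threshold exceeds 0.7). Looking more closely at the detected images, region proposal extraction turns out to be the bottleneck of the detector, in two ways:

(1) Many proposals that should contain signs are missed (see Bad Cases Analysis), which directly causes the low recall.
(2) Some proposals that do contain a sign also contain a lot of noise: when the background has a similar color the sign occupies only a small part of the proposal, or several adjacent signs are boxed together. This directly affects classification and lowers precision.

Raising the confidence threshold removes a large number of false detections, while missed detections are almost unaffected (proposal extraction does not depend on the confidence threshold), so precision improves markedly.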
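The effect of the threshold can be reproduced on a toy example. The numbers below are hypothetical and chosen only to mimic the behavior described above; they are not results from this project, and sweep is an illustrative helper.

```python
def sweep(detections, num_ground_truth, thresholds):
    """detections: list of (confidence, is_true_positive) for non-background outputs."""
    rows = []
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)
        precision = tp / float(len(kept)) if kept else 0.0
        recall = tp / float(num_ground_truth)
        rows.append((t, precision, recall))
    return rows


# Hypothetical toy data: 4 correct detections with high confidence, 4 false
# alarms with lower confidence, and 10 labeled signs in total. The other
# 6 signs were never proposed, so no threshold can bring them back.
toy = [(0.95, True), (0.9, True), (0.85, True), (0.8, True),
       (0.6, False), (0.45, False), (0.3, False), (0.2, False)]
for t, p, r in sweep(toy, 10, [0.1, 0.5, 0.7]):
    print(t, round(p, 2), round(r, 2))
```

In this toy case precision climbs from 0.5 to 0.8 to 1.0 while recall stays at 0.4, because the signs lost at the proposal stage never reach the classifier.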

Bad Cases Analysis

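Based on the results above, I drew all of the detection boxes on the images and went through them one by one. The false and missed detections fall mainly into the following categories:

Uneven lighting
The images were collected outdoors at different times, so some traffic signs in the test set are under strong or weak light, which directly makes proposal extraction harder. Although the HSV color space, which is relatively robust to lighting, was chosen, extremely poor lighting still cannot be handled. I did find that lighting has little effect on classification, because the normalization in HOG keeps the features of the same proposal largely stable under different lighting. In my experiments I considered widening the blue and red threshold ranges, but that also produces more background boxes and slows detection.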
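Why normalized gradient features tolerate lighting changes can be seen on a one-dimensional toy profile. This is a sketch of the principle, not of skimage's HOG: an additive brightness offset cancels in the gradients, and a multiplicative gain cancels in the L2 normalization. Real lighting changes such as shadows or clipped highlights are not pure offsets or gains, so the invariance is only approximate in practice.

```python
import math


def gradients(signal):
    """Forward differences of a 1-D intensity profile."""
    return [b - a for a, b in zip(signal, signal[1:])]


def l2_normalize(vec, eps=1e-12):
    """Divide a vector by its L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec) + eps)
    return [v / norm for v in vec]


profile = [10, 12, 40, 90, 95, 60]     # made-up intensity values
brighter = [v + 30 for v in profile]   # additive offset
stronger = [v * 2 for v in profile]    # multiplicative gain

base = l2_normalize(gradients(profile))
gain = l2_normalize(gradients(stronger))

print(gradients(brighter) == gradients(profile))              # True
print(max(abs(a - b) for a, b in zip(base, gain)) < 1e-9)     # True
```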



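Complex or similar background
The thresholding is based on color, so when the background around a sign has a similar color (buildings, blue sky, and so on), proposal extraction is strongly affected. In the figure below, the two signs on the left are surrounded by residential buildings whose color is close to red, so the thresholded region around them contains a lot of noise, which badly affects SVM classification. Mild erosion and dilation could reduce the noise, but for small signs or signs whose outline is not fully closed it would destroy the original structure and cause missed detections.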


background
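The trade-off with erosion is easy to demonstrate on a tiny binary grid. The sketch below is a plain-Python 3×3 erosion, not cv2.erode (erode3x3 and make_square are illustrative helpers): a solid square only shrinks, while a one-pixel-wide outline, similar to the thin ring of a sign in the binary image, disappears completely.

```python
def erode3x3(grid):
    """Binary erosion with a 3x3 square kernel; out-of-bounds neighbors are skipped."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out[r][c] = 1 if all(neighbors) else 0
    return out


def make_square(size, margin, filled):
    """Draw a square on a size x size grid, either solid or as a 1-pixel outline."""
    grid = [[0] * size for _ in range(size)]
    lo, hi = margin, size - margin - 1
    for r in range(lo, hi + 1):
        for c in range(lo, hi + 1):
            on_edge = r in (lo, hi) or c in (lo, hi)
            if filled or on_edge:
                grid[r][c] = 1
    return grid


solid = erode3x3(make_square(9, 2, filled=True))
outline = erode3x3(make_square(9, 2, filled=False))
print(sum(map(sum, solid)), sum(map(sum, outline)))  # 9 0
```

The solid 5×5 square keeps its 3×3 core, but nothing is left of the outline, and dilation afterwards cannot restore pixels that no longer exist.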

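Partial occlusion or missing parts
Both cases amount to adding noise during feature extraction. They have little effect on proposal extraction and mainly affect the classifier. Adding some occluded signs to the SVM training set could improve robustness. In the figure below, the crosswalk sign on the left is still detected, but with very low confidence.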


cover

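Adjacent signs
This is very common in real scenes. When several signs are joined together, contour detection treats them as a single object, leading to missed and false detections.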


neighbor

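Interference from similar signs or background
Only six sign classes are detected, so all other signs count as background. Some of those signs, however, look very much like the six classes, which makes classification harder and causes false detections.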


similar

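Small targets
Small targets are resized before HOG feature extraction, but the resized image often cannot accurately reflect the HOG features of the small target (the resolution is too low), so the extracted features are coarse and hard to classify.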


tiny

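Ideas for improvement

The bottleneck of the whole framework is region proposal selection. Because only color information is used, there is a lot of noise, and signs are easily missed at the proposal stage. An effective improvement would therefore be to refine proposal extraction, for example by adding shape detection (parallelograms, ellipses, and so on). Shape detection is computationally heavy, though, and may slow detection down so that it cannot keep up in real-time video.

Also, although the SVM's classification accuracy is acceptable here, there are only 7 (6+1) classes. With many more classes, more data and a stronger classifier (such as a convolutional neural network) would be needed to reduce false detections.

Finally, a brute-force option is to drop region proposal extraction altogether and switch to an end-to-end detection framework such as YOLO. In short, there are many directions to explore, and interested readers can try them.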


A final note:
Thank you for reading this far. I hope this post is of some help. Any questions about it are welcome, and I will gladly accept corrections.
