[ONNX] An Experiment in Object Detection with the yolov3.onnx Model

YOLOv3 Principle Analysis

There are already many blog posts analyzing how the model works, so I will not repeat them here. Below are two that I think are particularly well written:
YOLO series: YOLO v3 (In-Depth Analysis)
YOLOv3 Experiment Summary

Source and Overview of the yolov3.onnx Model

Source

darknet -> caffe -> onnx
1. For the darknet-to-caffe conversion, see this reference.
2. The caffe-to-onnx conversion uses the caffe2onnx tool I wrote earlier.

Overview

Model Input

The model's input is a 416x416 image, and the input tensor is named input.

The model input is:
name: "input"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3
      }
      dim {
        dim_value: 416
      }
      dim {
        dim_value: 416
      }
    }
  }
}
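
This signature (and the output signatures below) can be printed with the onnx package; a minimal sketch, assuming yolov3.onnx sits in the working directory:

import onnx

model = onnx.load("yolov3.onnx")
# print the input signature shown above; model.graph.output lists the outputs
print(model.graph.input[0])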

Model Outputs

The model produces three output feature maps, with dimensions 255x13x13, 255x26x26, and 255x52x52, where $255 = 3 \times (80 + 5)$: for each of the 3 anchors, 80 class probabilities plus $t_x, t_y, t_w, t_h$ and the objectness confidence $t_o$.
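
Concretely, each anchor occupies 85 consecutive channels per grid cell, which is why getBBox below indexes the feature map with offsets of 85 * i:

$$255 = 3 \times 85, \qquad 85 = \underbrace{4}_{t_x,\,t_y,\,t_w,\,t_h} + \underbrace{1}_{t_o} + \underbrace{80}_{\text{class probabilities}}$$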

The model outputs are:
name: "layer82-conv_Y"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 255
      }
      dim {
        dim_value: 13
      }
      dim {
        dim_value: 13
      }
    }
  }
}

name: "layer94-conv_Y"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 255
      }
      dim {
        dim_value: 26
      }
      dim {
        dim_value: 26
      }
    }
  }
}

name: "layer106-conv_Y"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 255
      }
      dim {
        dim_value: 52
      }
      dim {
        dim_value: 52
      }
    }
  }
}

There are three outputs in total.

Node Types

The number of nodes of each type is:
LeakyRelu: 72
BatchNormalization: 72
Conv: 75
Upsample: 2
Concat: 4
Add: 23
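
These counts can be reproduced by tallying op_type over the graph nodes; a minimal sketch, again assuming yolov3.onnx is local:

from collections import Counter

import onnx

model = onnx.load("yolov3.onnx")
# count how many nodes of each operator type the graph contains
print(Counter(node.op_type for node in model.graph.node))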

Dependencies

  • onnxruntime
  • numpy
  • cv2
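
All three are installable from pip; note that cv2 is provided by the opencv-python package:

pip install onnxruntime numpy opencv-python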

Approach

The main workflow is shown in the figure below:
[Figure: overall detection pipeline]

The process of obtaining the bounding boxes is as follows:
[Figure: bounding-box extraction flow]

Code

Preparation

Import the libraries and set up the labels and anchors. Since only numpy is used, we implement the sigmoid function ourselves.

import onnxruntime
import numpy as np
import cv2
label = ["background", "person",
        "bicycle", "car", "motorbike", "aeroplane",
        "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench",
        "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
        "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
        "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
        "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
        "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog",
        "pizza", "donut", "cake", "chair", "sofa", "potted plant", "bed", "dining table",
        "toilet", "TV monitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
        "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
        "scissors", "teddy bear", "hair drier", "toothbrush"]
anchors = [[(116,90),(156,198),(373,326)],[(30,61),(62,45),(59,119)],[(10,13),(16,30),(33,23)]]

def sigmoid(x):
    # logistic function, applied element-wise to numpy arrays
    return 1 / (1 + np.exp(-x))
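
Note that the ordering of anchors matches the ordering of the model outputs: anchors[0] holds the largest anchors and is paired with the coarsest 13x13 feature map, which detects the largest objects; getBoxes below relies on this pairing.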

Image Preprocessing

def process_image(img_path):
    img = cv2.imread(img_path)
    img = cv2.resize(img, (416, 416))
    # BGR -> RGB, HWC -> CHW
    image = img[:, :, ::-1].transpose((2, 0, 1))
    # add the batch dimension and scale pixel values to [0, 1]
    image = image[np.newaxis, :, :, :] / 255
    image = np.array(image, dtype=np.float32)
    # return the resized original image and the preprocessed array
    return img, image
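
A quick sanity check of the preprocessing shapes (using the same dog416.jpg test image as main below):

img, data = process_image("dog416.jpg")
print(img.shape, data.shape)  # expect (416, 416, 3) and (1, 3, 416, 416)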

Get the Highest Class Score and Its Index

def getMaxClassScore(class_scores):
    class_score = 0
    class_index = 0
    for i in range(len(class_scores)):
        if class_scores[i] > class_score:
            # offset by 1 because label[0] is "background"
            class_index = i + 1
            class_score = class_scores[i]
    return class_score, class_index

Getting BBoxes + First Filtering (Confidence Threshold)

For every grid cell of each feature map, compute the bbox for each of its three anchors, $(b_x, b_y, b_w, b_h, b_{class\_score}, b_{class\_index})$, and filter them by the confidence threshold.
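
The decoding follows the standard YOLOv3 box equations, here normalized to [0, 1] by the grid size and the 416x416 input resolution, exactly as in the code below:

$$b_x = \frac{\sigma(t_x) + c_x}{W}, \qquad b_y = \frac{\sigma(t_y) + c_y}{H}, \qquad b_w = \frac{p_w e^{t_w}}{416}, \qquad b_h = \frac{p_h e^{t_h}}{416}$$

where $(c_x, c_y)$ is the grid cell index, $W \times H$ is the grid size, and $(p_w, p_h)$ are the anchor dimensions.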

def getBBox(feat, anchors, image_shape, confidence_threshold):
    box = []
    for i in range(len(anchors)):
        for cx in range(feat.shape[0]):
            for cy in range(feat.shape[1]):
                # each anchor occupies 85 consecutive channels
                tx = feat[cx][cy][0 + 85 * i]
                ty = feat[cx][cy][1 + 85 * i]
                tw = feat[cx][cy][2 + 85 * i]
                th = feat[cx][cy][3 + 85 * i]
                cf = feat[cx][cy][4 + 85 * i]
                cp = feat[cx][cy][5 + 85 * i:85 + 85 * i]

                # decode to normalized center coordinates and sizes
                bx = (sigmoid(tx) + cx) / feat.shape[0]
                by = (sigmoid(ty) + cy) / feat.shape[1]
                bw = anchors[i][0] * np.exp(tw) / image_shape[0]
                bh = anchors[i][1] * np.exp(th) / image_shape[1]

                # class score = objectness confidence * class probability
                b_confidence = sigmoid(cf)
                b_class_prob = sigmoid(cp)
                b_scores = b_confidence * b_class_prob
                b_class_score, b_class_index = getMaxClassScore(b_scores)

                if b_class_score > confidence_threshold:
                    box.append([bx, by, bw, bh, b_class_score, b_class_index])
    return box

Second Filtering (Non-Maximum Suppression, NMS)

For the principle and an implementation of NMS, see this reference.
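
The suppression criterion is the intersection-over-union (IoU) of two boxes:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$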

def donms(boxes, nms_threshold):
    # boxes hold (center x, center y, w, h) in normalized coordinates,
    # so convert to corner coordinates before computing overlaps
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    scores = boxes[:, 4]
    areas = boxes[:, 2] * boxes[:, 3]
    order = scores.argsort()[::-1]
    keep = []  # indices of the boxes we keep
    while order.size > 0:
        i = order[0]
        keep.append(i)  # keep the highest-scoring remaining box
        # corners of the intersection rectangle
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        # intersection area; zero when the boxes do not overlap
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        inter = w * h
        # union area = area1 + area2 - intersection
        union = areas[i] + areas[order[1:]] - inter
        # IoU = intersection / union
        IoU = inter / union
        # keep only boxes whose IoU with the current box is below the threshold
        inds = np.where(IoU <= nms_threshold)[0]
        # IoU is one element shorter than order, so shift the indices by one
        order = order[inds + 1]

    final_boxes = [boxes[i] for i in keep]
    return final_boxes
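
Note that donms applies NMS across all classes at once; the more common variant runs it per class (grouping boxes by b_class_index first), so that boxes of different classes do not suppress each other.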

Drawing the Predicted Boxes

def drawBox(boxes, img):
    for box in boxes:
        # convert normalized center/size to pixel corner coordinates on the 416x416 image
        x1 = int((box[0] - box[2] / 2) * 416)
        y1 = int((box[1] - box[3] / 2) * 416)
        x2 = int((box[0] + box[2] / 2) * 416)
        y2 = int((box[1] + box[3] / 2) * 416)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # draw the label name and score near the top-left corner
        cv2.putText(img, label[int(box[5])] + ":" + str(round(box[4], 3)), (x1 + 5, y1 + 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    cv2.imshow('image', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Overall Pipeline

def getBoxes(prediction, confidence_threshold, nms_threshold):
    boxes = []
    for i in range(len(prediction)):
        # (1, 255, H, W) -> take the batch element and transpose to (W, H, 255)
        feature_map = prediction[i][0].transpose((2, 1, 0))
        box = getBBox(feature_map, anchors[i], [416, 416], confidence_threshold)
        boxes.extend(box)
    Boxes = donms(np.array(boxes), nms_threshold)
    return Boxes

def main():
    img, TestData = process_image("dog416.jpg")
    session = onnxruntime.InferenceSession("yolov3.onnx")
    inname = [inp.name for inp in session.get_inputs()][0]
    outname = [output.name for output in session.get_outputs()]

    print("inputs name:", inname, "outputs name:", outname)
    prediction = session.run(outname, {inname: TestData})
    boxes = getBoxes(prediction, 0.25, 0.6)
    drawBox(boxes, img)

if __name__ == "__main__":
    main()

Test Image Results

[Figures: detection results on the dog416.jpg test image]
