Deep Learning Week 4 -- Lesson 3: Object Detection Code

Disclaimer

This article is based on the work of 何宽.

Preface

This article implements car detection with the YOLO algorithm.
To collect the data, a camera is mounted on the hood of a car and takes pictures of the road ahead every few seconds while driving. We want YOLO to recognize 80 classes, so the class label c is represented either as an 80-dimensional vector or as an integer from 1 to 80. We will use pre-trained weights.
**The YOLO algorithm:** real-time and highly accurate. At prediction time it requires only a single forward pass; after non-max suppression it outputs the recognized objects together with their bounding boxes.

Model details

  • The input is a batch of images of shape (m, 608, 608, 3).
  • The output is a list of recognized classes together with bounding boxes. Each bounding box is represented by 6 numbers: $(p_c, b_x, b_y, b_h, b_w, c)$. If c is expanded into an 80-dimensional vector, each bounding box is represented by 85 numbers.

We use 5 anchor boxes. The pipeline is: image input (m, 608, 608, 3) → deep CNN → encoding (m, 19, 19, 5, 85).
(Figure: the encoding pipeline described above)
If the center/midpoint of an object falls inside a grid cell, that cell is responsible for detecting that object. Since we use 5 anchor boxes on a 19x19 grid, each cell holds the encoding of 5 anchor boxes, and each anchor box is described by $p_c, p_x, p_y, p_h, p_w$ plus the 80 class probabilities. Flattening the last two dimensions turns the encoding of the final step from (m, 19, 19, 5, 85) into (m, 19, 19, 425).
For each anchor box of each cell, we compute the element-wise product below and extract the probability that the box contains a given class.
(Figure: class score = box confidence $p_c$ × class probability $c_i$)
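
As a rough NumPy sketch of this score computation (purely illustrative; the shapes follow the (19, 19, 5, ...) encoding above and the random tensors only stand in for real network output):

import numpy as np

# Stand-ins for one image's encoding: a 19x19 grid with 5 anchor boxes per cell
box_confidence = np.random.rand(19, 19, 5, 1)     # p_c for every anchor box
box_class_probs = np.random.rand(19, 19, 5, 80)   # c_1..c_80 for every anchor box

# Element-wise product: probability that a box contains each specific class
box_scores = box_confidence * box_class_probs     # shape (19, 19, 5, 80)

# For every anchor box, the most likely class and its score
box_classes = box_scores.argmax(axis=-1)          # shape (19, 19, 5)
box_class_scores = box_scores.max(axis=-1)        # shape (19, 19, 5)

# Flattening the last two dimensions of the full encoding: (19, 19, 5, 85) -> (19, 19, 425)
encoding = np.random.rand(19, 19, 5, 85)
print(box_classes.shape, box_class_scores.shape, encoding.reshape(19, 19, 425).shape)
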
Steps:

  • Filtering with a threshold on class scores
  • Non-max suppression (intersection over union, non-max suppression)

Filtering with a threshold on class scores

Import the packages:

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model

from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

import yolo_utils

%matplotlib inline

To filter with a threshold, we discard anchor boxes whose predicted score is below a preset value. The model outputs 19x19x5x85 numbers in total, and each anchor box is described by 85 numbers (80 class probabilities plus $p_c, p_x, p_y, p_h, p_w$). We convert the tensor of shape (19, 19, 5, 85) (or (19, 19, 425)) into the following variables:

  • box_confidence: tensor of shape (19, 19, 5, 1) containing $p_c$ (the confidence that some object is present) for each of the 5 anchor boxes predicted in each of the 19x19 cells.
  • boxes: tensor of shape (19, 19, 5, 4) containing $(p_x, p_y, p_h, p_w)$ for all anchor boxes.
  • box_class_probs: tensor of shape (19, 19, 5, 80) containing the detection probabilities $(c_1, c_2, c_3, ..., c_{80})$ of all 80 classes for every anchor box in every cell.

Steps:
1. Compute the box scores (the element-wise product of the confidence and the class probabilities).
2. For each anchor box, find:
2.1. the index of the class with the maximum box score;
2.2. the corresponding maximum box score.
3. Create a mask using the threshold.
4. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes so that only the boxes we want are kept.

def yolo_filter_boxes(box_confidence,boxes,box_class_probs,threshold=0.6):
    """
    Filter the boxes by thresholding on object and class confidence.
    
    Arguments:
        box_confidence - tensor of shape (19,19,5,1), containing p_c (the confidence that some object
                         is present) for each of the 5 anchor boxes predicted in each of the 19x19 cells.
        boxes - tensor of shape (19,19,5,4), containing (p_x,p_y,p_h,p_w) for all anchor boxes.
        box_class_probs - tensor of shape (19,19,5,80), containing the detection probabilities
                          (c_1,c_2,...,c_80) of all classes for every anchor box in every cell.
        threshold - real number; a box is kept only if its class score is higher than this value.
        
    Returns:
        scores - tensor of shape (None,), containing the class scores of the kept boxes.
        boxes - tensor of shape (None,4), containing the (b_x,b_y,b_h,b_w) coordinates of the kept boxes.
        classes - tensor of shape (None,), containing the class indices of the kept boxes.
        
    Note: "None" is used because the exact number of selected boxes is unknown; it depends on the threshold.
        For example, if 10 boxes are kept, the actual shape of scores will be (10,).
    """
    # Step 1: compute the box scores (element-wise product)
    box_scores = box_confidence * box_class_probs
    
    # Step 2: for each box, find the class with the maximum score and that score
    box_classes = K.argmax(box_scores,axis=-1)
    box_class_scores = K.max(box_scores,axis=-1)
    
    # Step 3: create a mask of the boxes whose score is above the threshold
    filtering_mask = (box_class_scores >= threshold)
    
    # Step 4: apply the mask to keep only the boxes we want
    scores = tf.boolean_mask(box_class_scores,filtering_mask)
    boxes = tf.boolean_mask(boxes,filtering_mask)
    classes = tf.boolean_mask(box_classes,filtering_mask)
    
    return scores,boxes,classes

Test:

with tf.Session() as test_a:
    box_confidence = tf.random_normal([19,19,5,1],mean=1,stddev=4,seed=1)
    boxes = tf.random_normal([19,19,5,4],mean=1,stddev=4,seed=1)
    box_class_probs = tf.random_normal([19,19,5,80],mean=1,stddev=4,seed=1)
    scores,boxes,classes = yolo_filter_boxes(box_confidence,boxes,box_class_probs,threshold=0.5)
    print(scores[2].eval(),boxes[2].eval(),classes[2].eval(),scores.shape,boxes.shape,classes.shape)
    
    test_a.close()

Result:

10.750582 [ 8.426533   3.2713668 -0.5313436 -4.9413733] 7 (?,) (?, 4) (?,)

Non-max suppression

Even after threshold filtering removes the low-scoring classes, many anchor boxes still remain. The second filter, which turns the left of the figure below into the right, is non-max suppression (NMS).
(Figure: detections before non-max suppression on the left, after on the right)

Intersection over Union (IoU)

Non-max suppression relies on a very important function called intersection over union (IoU). Let's implement it now. Steps:

  • Define a box by its upper-left and lower-right corners $(x_1, y_1, x_2, y_2)$ instead of by its center point plus width and height.
  • Compute the area of a rectangle as $(y_2 - y_1) \times (x_2 - x_1)$.
  • Find the coordinates of the intersection of the two boxes, $(x_1^i, y_1^i, x_2^i, y_2^i)$:
    - $x_1^i$ = the maximum of the two boxes' $x_1$ coordinates
    - $y_1^i$ = the maximum of the two boxes' $y_1$ coordinates
    - $x_2^i$ = the minimum of the two boxes' $x_2$ coordinates
    - $y_2^i$ = the minimum of the two boxes' $y_2$ coordinates
  • To compute the intersection area, make sure its width and height are both positive; otherwise the intersection area is 0.

def iou(box1,box2):
    """
    Compute the intersection over union (IoU) of two boxes.
    
    Arguments:
        box1 - first box, tuple (x1,y1,x2,y2)
        box2 - second box, tuple (x1,y1,x2,y2)
    Returns:
        iou - real number, the intersection over union
    """
    # Coordinates of the intersection rectangle
    xi1 = np.maximum(box1[0],box2[0])
    yi1 = np.maximum(box1[1],box2[1])
    xi2 = np.minimum(box1[2],box2[2])
    yi2 = np.minimum(box1[3],box2[3])
    # Width and height must be non-negative; otherwise the boxes do not overlap
    inter_area = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)
    
    # Union = sum of both areas minus the intersection
    box1_area = (box1[2]-box1[0])*(box1[3]-box1[1])
    box2_area = (box2[2]-box2[0])*(box2[3]-box2[1])
    union_area = box1_area + box2_area - inter_area
    
    iou = inter_area/union_area
    
    return iou

Test:

box1 = (2,1,4,3)
box2 = (1,2,3,4)
iou_value = iou(box1,box2)
print(iou_value)

Result:

0.14285714285714285

Non-max suppression

Steps:

  • Select the box with the highest score.
  • Compute its overlap (IoU) with all other boxes, and remove any box whose overlap with it exceeds iou_threshold.
  • Go back to step 1 and iterate until no boxes remain with a lower score than the currently selected box.

def yolo_non_max_suppression(scores,boxes,classes,max_boxes=10,iou_threshold=0.5):
    """
    为锚框实现非最大值抑制(Non-max suppression(NMS))
    
    参数:
        scores - tensor类型,维度为(None,),yolo_filter_boxes()的输出
        boxes - tensor类型,维度为(None,4),yolo_filter_boxes()的输出,已缩放到图像大小(见下文)
        classes - tensor类型,维度为(None,),yolo_filter_boxes()的输出,
        max_boxes - 整数,预测的锚框数量的最大值
        iou_threshold - 实数,交并比阈值
    返回:
        scores - tensor类型,维度为(,None),每个锚框的预测的可能值
        boxes - tensor类型,维度为(4,None),预测的锚框的座标
        classes - tensor类型,维度为(,None),每个锚框的预测的分类
        
    注意:“None”是明显小于max_boxes的,这个函数也会改变scores、boxes、classes的维度,这会为下一步操作提供方便。
    """
    max_boxes_tensor = K.variable(max_boxes,dtype="int32")   # tensor used by tf.image.non_max_suppression
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))
    
    # Indices of the boxes kept by non-max suppression
    nms_indices = tf.image.non_max_suppression(boxes,scores,max_boxes_tensor,iou_threshold)
    
    scores = K.gather(scores,nms_indices)
    boxes = K.gather(boxes,nms_indices)
    classes = K.gather(classes,nms_indices)
    
    return scores,boxes,classes

Test:

with tf.Session() as test_b:
    scores = tf.random_normal([54,],mean=1,stddev=4,seed=1)
    boxes = tf.random_normal([54,4],mean=1,stddev=4,seed=1)
    classes = tf.random_normal([54,],mean=1,stddev=4,seed=1)
    scores,boxes,classes = yolo_non_max_suppression(scores,boxes,classes,max_boxes=10,iou_threshold=0.5)
    print(scores[2].eval(),boxes[2].eval(),classes[2].eval(),scores.eval().shape,boxes.eval().shape,classes.eval().shape)
    
    test_b.close()

Result:

6.938395 [-5.299932    3.1379814   4.450367    0.95942086] -2.2452729 (10,) (10, 4) (10,)

Filtering all the boxes

Now we implement a function that takes the output of the CNN and filters all the boxes using the functions we just wrote. The function, yolo_eval(), takes the output of the YOLO encoding and filters the boxes using score thresholding and non-max suppression. There are a few ways of representing boxes that we need to be aware of:

  • boxes=yolo_boxes_to_corners(box_xy,box_wh)

Converts the YOLO box coordinates (x, y, w, h) to corner coordinates (x1, y1, x2, y2) to match the input expected by yolo_filter_boxes() (see the sketch after this list).

  • boxes=yolo_utils.scale_boxes(boxes,image_shape)

Rescales the boxes: the YOLO network runs on 608x608 images, while the test images here are 720x1280, so this step rescales the boxes so that they can be drawn on the original image.
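
For intuition, here is a minimal NumPy sketch of the center-plus-size to corner conversion (illustrative only; the real yolo_boxes_to_corners in yad2k operates on Keras tensors and may order the coordinates differently):

import numpy as np

def boxes_to_corners_sketch(box_xy, box_wh):
    """Convert boxes given as (center x, center y) and (width, height) to (x1, y1, x2, y2) corners."""
    box_mins = box_xy - box_wh / 2.0    # upper-left corner
    box_maxes = box_xy + box_wh / 2.0   # lower-right corner
    return np.concatenate([box_mins, box_maxes], axis=-1)

# One box centered at (0.5, 0.5) with width 0.2 and height 0.4
print(boxes_to_corners_sketch(np.array([[0.5, 0.5]]), np.array([[0.2, 0.4]])))  # [[0.4 0.3 0.6 0.7]]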

Steps:

  • The input image has shape (608, 608, 3).
  • The input image first goes through a CNN model, which returns an output of shape (19, 19, 5, 85).
  • After flattening the last two dimensions, the output shape becomes (19, 19, 425):
    • Each cell of the 19x19 grid contains 425 numbers.
    • 425 = 5 x 85: each cell has 5 anchor boxes, and each anchor box consists of 5 basic numbers plus 80 class predictions.
    • 85 = 5 + 80: the 5 basic numbers are $(p_c, p_x, p_y, p_h, p_w)$ and the remaining 80 are the class predictions.
  • Then select boxes according to the following rules:
    • Score thresholding: discard boxes whose class score is below the threshold.
    • Non-max suppression: compute the IoU and avoid selecting overlapping boxes.
  • Finally, return YOLO's final output.

def yolo_eval(yolo_outputs,image_shape=(720.,1280.),max_boxes=10,score_threshold=0.6,iou_threshold=0.5):
    """
    将YOLO编码的输出(很多锚框)转换为预测框以及它们的分数,框座标和类。
    
    参数:
        yolo_outputs - 编码模型的输出(对于维度为(608,608,3)的图片),包含了4个tensor类型的变量:
                        box_confidence : tensor类型,维度为(None,19,19,5,1)
                        box_xy         : tensor类型,维度为(None,19,19,5,2)
                        box_wh         : tensor类型,维度为(None,19,19,5,2)
                        box_class_probs: tensor类型,维度为(None,19,19,5,80)
        image_shape - tensor类型,维度为(2,),包含了输入的图像的维度,这里是(608,608,)
        max_boxes - 整数,预测的锚框数量的最大值
        acore_threshold - 实数,可能性阈值
        iou_threshold - 实数,交并比阈值
        
    返回:
        scores - tensor类型,维度为(,None),每个锚框的预测的可能值
        boxes - tensor类型,维度为(4,None),预测的锚框的座标
        classes - tensor类型,维度为(,None),每个锚框的预测的分类
    """
    box_confidence,box_xy,box_wh,box_class_probs = yolo_outputs
    
    boxes = yolo_boxes_to_corners(box_xy,box_wh)
    
    scores,boxes,classes = yolo_filter_boxes(box_confidence,boxes,box_class_probs,score_threshold)
    
    boxes = yolo_utils.scale_boxes(boxes,image_shape)
    
    scores,boxes,classes = yolo_non_max_suppression(scores,boxes,classes,max_boxes,iou_threshold)
    
    return scores,boxes,classes

Test:

with tf.Session() as test_c:
    yolo_outputs = (tf.random_normal([19,19,5,1],mean=1,stddev=4,seed=1),
                    tf.random_normal([19,19,5,2],mean=1,stddev=4,seed=1),
                    tf.random_normal([19,19,5,2],mean=1,stddev=4,seed=1),
                    tf.random_normal([19,19,5,80],mean=1,stddev=4,seed=1),
                   )
    scores,boxes,classes = yolo_eval(yolo_outputs)
    print(scores[2].eval(),boxes[2].eval(),classes[2].eval(),scores.eval().shape,boxes.eval().shape,classes.eval().shape)
    
    test_c.close()

Result:

138.79124 [1292.3297  -278.52167 3876.9893  -835.56494] 54 (10,) (10, 4) (10,)

Testing the pre-trained YOLO model

We will use a pre-trained model and test it on the car detection dataset. First, create a session to run the computation graph:

sess = K.get_session()

Defining the classes, anchors and image shape

The information about the 80 classes and the 5 anchor boxes is collected in the files "coco_classes.txt" and "yolo_anchors.txt". Let's load this data into the model.

class_names = yolo_utils.read_classes("model_data/coco_classes.txt")
anchors = yolo_utils.read_anchors("model_data/yolo_anchors.txt")
image_shape = (720.,1280.)
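
yolo_utils.read_classes and yolo_utils.read_anchors simply parse these two text files. A minimal sketch of what they do, assuming coco_classes.txt holds one class name per line and yolo_anchors.txt holds a single line of comma-separated width,height pairs (the actual yolo_utils helpers may differ in details):

import numpy as np

def read_classes_sketch(classes_path):
    # One class name per line, e.g. "person", "bicycle", "car", ...
    with open(classes_path) as f:
        return [line.strip() for line in f if line.strip()]

def read_anchors_sketch(anchors_path):
    # A single line such as "0.57,0.67, 1.87,2.06, ..." -> array of shape (5, 2)
    with open(anchors_path) as f:
        values = [float(x) for x in f.readline().split(",")]
    return np.array(values).reshape(-1, 2)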

Loading the pre-trained model

Training a YOLO model takes a very long time and requires a fairly large dataset of labeled bounding boxes covering a wide range of object classes. Instead, we will load an existing pre-trained Keras YOLO model stored in "yolov2.h5". This loads the weights of a trained YOLO model.

yolo_model = load_model("model_data/yolov2.h5")

Here is a summary of the layers the model contains:

yolo_model.summary()

Result:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 608, 608, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 304, 304, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 304, 304, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 304, 304, 64) 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 152, 152, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 152, 152, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 152, 152, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 152, 152, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 152, 152, 64) 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 152, 152, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 152, 152, 128 512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 76, 76, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 76, 76, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 76, 76, 256)  1024        conv2d_6[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 76, 76, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 76, 76, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 76, 76, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 76, 76, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 76, 76, 256)  1024        conv2d_8[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 38, 38, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 38, 38, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 38, 38, 512)  2048        conv2d_9[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 38, 38, 512)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 38, 38, 256)  1024        conv2d_10[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 38, 38, 512)  2048        conv2d_11[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 38, 38, 256)  1024        conv2d_12[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 38, 38, 512)  2048        conv2d_13[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 19, 19, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 19, 19, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 19, 19, 1024) 4096        conv2d_14[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 19, 19, 512)  2048        conv2d_15[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 19, 19, 1024) 4096        conv2d_16[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 19, 19, 512)  2048        conv2d_17[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 19, 19, 1024) 4096        conv2d_18[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 19, 19, 1024) 4096        conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 38, 38, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64)   256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 38, 38, 64)   0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096        conv2d_20[0][0]                  
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda)      (None, 19, 19, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 19, 19, 1280) 0           space_to_depth_x2[0][0]          
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 19, 19, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096        conv2d_22[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 19, 19, 425)  435625      leaky_re_lu_22[0][0]             
==================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672

Converting the model's output into bounding boxes

yolo_outputs = yolo_head(yolo_model.output,anchors,len(class_names))
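
yolo_head (from yad2k) decodes the raw (m, 19, 19, 425) network output into box_xy, box_wh, box_confidence and box_class_probs. As a rough NumPy sketch of the YOLOv2 decoding idea (not the actual yad2k implementation, which works on Keras tensors and also adds the grid-cell offsets and normalizes by the grid size):

import numpy as np

def yolo_head_sketch(feats, anchors, num_classes):
    # feats: raw output for one image, shape (19, 19, 5 * (5 + num_classes))
    feats = feats.reshape(19, 19, anchors.shape[0], 5 + num_classes)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    box_xy = sigmoid(feats[..., 0:2])                  # box center, relative to its cell
    box_wh = np.exp(feats[..., 2:4]) * anchors         # box size, scaled by the anchor shapes
    box_confidence = sigmoid(feats[..., 4:5])          # p_c
    exp_scores = np.exp(feats[..., 5:])
    box_class_probs = exp_scores / exp_scores.sum(axis=-1, keepdims=True)  # softmax over the classes
    return box_xy, box_wh, box_confidence, box_class_probs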

Filtering the boxes

scores,boxes,classes = yolo_eval(yolo_outputs,image_shape=(720.,1280.),max_boxes=10,score_threshold=0.6,iou_threshold=0.5)

Running the graph on real images

def predict(sess,image_file,is_show_info=True,is_plot=True):
    """
    Run the graph stored in sess to predict boxes for image_file, and print the predicted image and information.
    
    Arguments:
        sess - the tensorflow/keras session containing the YOLO graph
        image_file - name of an image stored in the "images" folder
    Returns:
        out_scores - tensor of shape (None,), predicted scores of the boxes
        out_boxes - tensor of shape (None,4), coordinates of the boxes
        out_classes - tensor of shape (None,), predicted class indices of the boxes
    """
    
    # Preprocess the image
    image, image_data = yolo_utils.preprocess_image("images/" + image_file, model_image_size = (608, 608))
    
    # Run the session, feeding the right placeholders in feed_dict.
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input:image_data, K.learning_phase(): 0})
    
    # Print prediction info
    if is_show_info:
        print("Found " + str(len(out_boxes)) + " boxes in " + str(image_file) + ".")
    
    # Generate colors for drawing the bounding boxes
    colors = yolo_utils.generate_colors(class_names)
    
    # Draw the bounding boxes on the image
    yolo_utils.draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    
    # Save the image with the bounding boxes drawn
    image.save(os.path.join("out", image_file), quality=100)
    
    # Display the image with the bounding boxes
    if is_plot:
        output_image = scipy.misc.imread(os.path.join("out", image_file))
        plt.imshow(output_image)
        
    return out_scores,out_boxes,out_classes

Test:

out_scores,out_boxes,out_classes = predict(sess,"test.jpg")

Result:

Found 7 boxes in test.jpg.
car 0.60 (925, 285) (1045, 374)
car 0.66 (706, 279) (786, 350)
bus 0.67 (5, 266) (220, 407)
car 0.70 (947, 324) (1280, 705)
car 0.74 (159, 303) (346, 440)
car 0.80 (761, 282) (942, 412)
car 0.89 (367, 300) (745, 648)

(Figure: test.jpg with the predicted bounding boxes drawn on it)

Drawing boxes on a batch of images

Let's draw the boxes on all the images from "0001.jpg" to "0120.jpg" in the images folder.

for i in range(1,121):
    # Zero-pad the index to 4 digits, e.g. 1 -> "0001.jpg"
    filename = str(i).zfill(4) + ".jpg"
    print("Current file: " + str(filename))
    
    out_scores,out_boxes,out_classes = predict(sess,filename,is_show_info=False,is_plot=False)
    
print("Drawing complete!")

Result:

Current file: 0001.jpg
Current file: 0002.jpg
Current file: 0003.jpg
car 0.69 (347, 289) (445, 321)
car 0.70 (230, 307) (317, 354)
car 0.73 (671, 284) (770, 315)
Current file: 0004.jpg
car 0.63 (400, 285) (515, 327)
car 0.66 (95, 297) (227, 342)
car 0.68 (1, 321) (121, 410)
car 0.72 (539, 277) (658, 318)

······ (I won't copy the rest of the output here~)

Current file: 0116.jpg
traffic light 0.63 (522, 76) (543, 113)
car 0.80 (5, 271) (241, 672)
Current file: 0117.jpg
Current file: 0118.jpg
Current file: 0119.jpg
traffic light 0.61 (1056, 0) (1138, 131)
Current file: 0120.jpg
Drawing complete!