[Deep Learning Primer] Vehicle Detection and Vehicle Type Recognition with Paddle (Based on YOLOv3 and ResNet18)

Today we will use two open-source Paddle tools, PaddleDetection and X2Paddle, to build a small vehicle detection and vehicle type recognition demo.

Source code: https://github.com/Sharpiless/yolov3-vehicle-detection-paddle

The final detection results look like this:

1. Introduction to PaddleDetection:

Source code: https://github.com/PaddlePaddle/PaddleDetection

Official documentation: https://paddledetection.readthedocs.io/

PaddleDetection was created to provide rich, easy-to-use object detection models for both industry and academia. It not only delivers strong performance and easy deployment, but is also flexible enough to meet the needs of algorithm research.

In short, the toolkit is built on Baidu's open-source Paddle framework. It integrates a variety of image classification and object detection architectures, along with the corresponding training, inference, and deployment tools, so users can customize datasets and model details and quickly bring deep learning applications into production.

Key features:

  1. Easy deployment: the core operators in PaddleDetection models are implemented in C++ or CUDA, and PaddlePaddle's high-performance inference engine makes deployment straightforward on a variety of hardware platforms.

  2. High flexibility: PaddleDetection decouples its components through a modular design, so a wide range of detection models can be assembled easily from configuration files.

  3. High performance: built on PaddlePaddle's high-performance kernels, it has an edge in training speed and GPU memory usage. For example, YOLOv3 trains faster than on other frameworks, and on a Tesla V100 16GB, Mask R-CNN (ResNet50) can reach a single-card batch size of 4 (or even 5).

Supported mainstream models include:

It also supports a range of extended features:

With this toolkit, developers only need to edit the corresponding .yml configuration files to customize and train their own models:

2. Setting Up the Environment and Installing Paddle:

(Local setup: GTX 1050 Ti, CUDA 10.0)

Install Anaconda:

Create a Python environment:

conda create -n paddle_env python=3.6


Activate the environment:

conda activate paddle_env


Install the dependency libraries (e.g. opencv-python, matplotlib, Cython) from the Tsinghua mirror:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -U <package-name> --default-timeout=1000 --user

Install paddlepaddle:

python -m pip install paddlepaddle-gpu

Installing from the Tsinghua mirror also works:
Enter the Python interpreter and test the installation:

python
>>> import paddle.fluid as fluid
>>> fluid.install_check.run_check()

Installation successful!

3. Installing PaddleDetection:

Create a new folder and activate the environment in that directory:
Clone the PaddleDetection repository:

git clone https://github.com/PaddlePaddle/PaddleDetection.git

Install the dependency libraries again:

pip install -r requirements.txt

Add the current directory to PYTHONPATH, then run the tests:

set PYTHONPATH=%PYTHONPATH%;.
python ppdet/modeling/tests/test_architectures.py

Installation successful!

4. Trying Out the YOLOv3 Code:

Install cocotools:

pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI


Download the model weights from: https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/featured_model/CONTRIB_cn.md

Extract the weights into the working directory, then run detection on the sample images:

python -u tools/infer.py -c contrib/VehicleDetection/vehicle_yolov3_darknet.yml \
                         -o weights=vehicle_yolov3_darknet \
                         --infer_dir contrib/VehicleDetection/demo \
                         --draw_threshold 0.2 \
                         --output_dir contrib/VehicleDetection/demo/output

The detection results are saved under contrib/VehicleDetection/demo/output:

As you can see, the detection works very well!

5. The YOLO Family of Algorithms Explained:

In this section, we go over the principles behind the detection algorithm used above.

I have written about these papers before:

[Paper notes] YOLO v1 - You Only Look Once: Unified, Real-Time Object Detection:
https://blog.csdn.net/weixin_44936889/article/details/104384273

[Paper notes] YOLO9000: Better, Faster, Stronger:
https://blog.csdn.net/weixin_44936889/article/details/104387529

[Paper notes] YOLOv3: An Incremental Improvement:
https://blog.csdn.net/weixin_44936889/article/details/104390227

Here is a brief recap, using license plate detection as an example (I drew the figures myself, so they are not the prettiest):

YOLOv1:

Paper: https://arxiv.org/pdf/1506.02640.pdf

YOLO performs end-to-end object detection with a single convolutional neural network [3]. Its basic pipeline is: first, the input image is resized to a fixed size via bilinear interpolation (448×448 in the paper) and divided into non-overlapping grid cells; the resized image is then fed through a convolutional neural network that extracts high-level semantic features; finally, fully connected layers predict, for each cell, the probability that an object is present along with the object's bounding box coordinates. Since no RPN is needed to extract regions of interest, YOLO's network structure is very simple, as shown in the figure:

In other words, YOLO's convolutional network (also called the backbone) splits the input image into equally sized, non-overlapping cells, all of which participate in the convolutional feature extraction. After feature extraction, the feature vector at each cell's position on the feature map is passed through fully connected layers that are responsible for detecting objects whose centers fall inside that cell, outputting the class probabilities and predicted box coordinates for that position, as shown in the figure:


YOLO divides the original image into 7×7 = 49 cells. The backbone is still a classification network, but the last two layers are fully connected, finally producing a 7×7×30 feature map in which each feature point's vector represents one cell. For each cell's feature vector, YOLO uses fully connected layers to output the predictions, which include:

  1. For each of the B candidate boxes the cell may contain (B = 2 in the paper), the 4 predicted coordinates {x, y, w, h}, together with a confidence score for each of the B boxes; the box with the highest confidence is taken as the final result;
  2. The classification confidence for the object the cell may contain, denoted C: a probability vector whose length equals the number of classes, activated with the softmax function;

The confidence is defined as:

Confidence = Pr(Object) × IOU(pred, truth)

where IOU is the intersection-over-union between the predicted box and the ground-truth box. Each cell therefore produces 30 prediction values, and the final fully connected layer outputs a tensor of size S×S×30. Predicting B candidate boxes with separate confidences lets the network handle overlapping objects within the same cell, improving the fault tolerance of the predictions. Increasing B improves robustness, but also greatly increases the computational cost of the fully connected layers.
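To make these quantities concrete, here is a minimal sketch (my own illustration, not code from the paper; the function name `iou` is mine) that computes the IoU of two boxes and the per-cell output length B×5 + C, which yields the 30 values per cell in the Pascal VOC setting (B = 2 boxes, C = 20 classes):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Each cell predicts B boxes of (x, y, w, h, confidence) plus C class scores
S, B, C = 7, 2, 20
per_cell = B * 5 + C        # 30 values per cell
total = S * S * per_cell    # 1470 outputs in all
print(per_cell, total)      # 30 1470
print(round(iou([0, 0, 2, 2], [1, 1, 3, 3]), 4))  # 0.1429 (= 1/7)
```

Two unit boxes overlapping in a 1×1 region share an intersection of 1 and a union of 7, hence the 1/7 above.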

In addition, to avoid the loss of information caused by ReLU zeroing out negative activations, the author replaces every ReLU activation in YOLO with Leaky ReLU.

The final network architecture is shown in the figure:

YOLOv2:

Paper:

YOLO uses grid partitioning to detect objects in different regions, but because its network structure is fairly simple, there is still plenty of room to improve accuracy while preserving detection speed. Building on YOLO, the author therefore applies batch normalization to regularize the feature distributions and speed up convergence; feeds higher-resolution images to the classifier to improve detection accuracy; and introduces anchor (prior) boxes to improve accuracy on small objects. The result is a more effective detector called YOLOv2. On top of that, a joint training algorithm lets the detection task train on large classification datasets (with only the classification loss contributing gradients); the resulting YOLO9000 can recognize and detect more than 9000 object categories.

Batch Normalization, a method introduced with Inception v2, stabilizes the distribution of layer inputs and helps prevent vanishing gradients. It applies a simple linear transformation to a batch of feature values so that they follow a distribution with zero mean and unit variance, keeping the feature distributions of each layer roughly consistent. As a result, the gradients do not change drastically as the network grows deeper, which helps avoid exploding or vanishing gradients. The author uses Batch Normalization extensively in YOLOv2, raising mAP by 2% over the original YOLO. The computation is:
(1) compute the batch mean u;
(2) compute the batch variance σ^2;
(3) standardize the data: x' = (x - u) / √(σ^2 + ε);
(4) scale and shift: y = γx' + β.
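The four steps above can be sketched in NumPy as follows (a toy illustration with scalar γ and β; real implementations keep per-channel parameters and running statistics):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean()                          # (1) batch mean u
    var = x.var()                          # (2) batch variance sigma^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # (3) standardize
    return gamma * x_hat + beta            # (4) scale and shift

x = np.array([1.0, 2.0, 3.0, 4.0])
y = batch_norm(x)
print(y.mean(), y.std())  # close to 0 and 1
```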
To address the loss of accuracy on small and overlapping objects, the author abandons YOLO's grid-plus-fully-connected coordinate regression and instead adopts anchor boxes, similar to SSD and Faster R-CNN.

YOLOv2 therefore removes the fully connected layers and the last downsampling layer to obtain a larger feature map. To ensure the final prediction map has a single center cell, the author uses a 416×416 input; with a downsampling factor of 32, this yields a 13×13 feature map. After adopting anchor boxes, mAP drops by 0.3%, but recall rises from 81% to 88%.

Moreover, to fuse the high-level semantic features needed for classification with the low-level contour features needed for localization, YOLOv2 adds a pass-through layer: it takes the feature map before the last pooling layer (of size 26×26×512) and converts each 2×2 spatial block into channels, yielding a 13×13×2048 map that is concatenated with the 13×13×1024 backbone output to form a 13×13×3072 feature map for prediction.
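The pass-through layer amounts to a space-to-depth rearrangement. Here is a minimal NumPy sketch (the function name `passthrough` is mine, and the exact channel ordering may differ from the official implementation):

```python
import numpy as np

def passthrough(x, stride=2):
    """Move each stride x stride spatial block into the channel dimension."""
    c, h, w = x.shape
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    x = x.transpose(0, 2, 4, 1, 3)  # (c, stride, stride, h//s, w//s)
    return x.reshape(c * stride * stride, h // stride, w // stride)

x = np.random.rand(512, 26, 26).astype('float32')
print(passthrough(x).shape)  # (2048, 13, 13)
```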


During training, the author uses input sizes that are multiples of 32 (320, 352, …, 608) and randomly selects a new input size every 10 batches.
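That multi-scale schedule can be sketched as follows (an illustration based on the YOLOv2 paper's description: the candidate sizes are the multiples of 32 from 320 to 608, and a new size is drawn every 10 batches):

```python
import random

sizes = list(range(320, 608 + 1, 32))  # [320, 352, ..., 608], 10 candidates
size = sizes[-1]
for batch_idx in range(100):
    if batch_idx % 10 == 0:
        size = random.choice(sizes)    # pick a new input resolution
    # ...resize the batch to (size, size) and run one training step...
print(len(sizes), sizes[0], sizes[-1])  # 10 320 608
```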

YOLOv3:

YOLOv3 does not change the network structure dramatically; it mainly folds into the YOLO framework various model structures and training tricks proposed in the object detection literature after YOLOv2.

The author first adds residual connections to the DarkNet-19 backbone (yielding DarkNet-53), which performs about as well as ResNet-101 on ImageNet while being much faster.

In addition, each cell in YOLOv3 uses three anchor boxes of different aspect ratios and sizes, and the network builds a feature pyramid similar to the FPN detector, using feature maps of different sizes and depths to predict objects of different sizes.

In the feature pyramid, YOLOv3 selects three feature maps, obtained via upsampling, to detect objects of different sizes; the three maps have sizes 13, 26, and 52. The pyramid takes the pyramid-shaped feature maps produced by the convolutional network (left) and builds a feature pyramid that fuses high-level semantic information with low-level detail (right); separate convolutional heads with unshared weights then predict object classes and box coordinates on the different pyramid levels:
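The three grid sizes follow directly from the network strides. Assuming the usual 416×416 input, strides of 32, 16, and 8 give exactly the 13, 26, and 52 grids mentioned above:

```python
input_size = 416
strides = [32, 16, 8]
grids = [input_size // s for s in strides]
print(grids)  # [13, 26, 52]
```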


6. Running Detection on Your Own Data:

I have written a program that calls the PaddleDetection vehicle detection model; source code: https://github.com/Sharpiless/yolov3-vehicle-detection-paddle

Give it a star, then download and extract it:

I use VS Code here; select the configured environment:

Testing on an image:

Just change the image path to your own:

Run demo_img.py:

Testing on a video:

Just change the video path to your own:

7. Model Conversion with X2Paddle:

(The following just demonstrates how to use X2Paddle for model conversion; give it a try if you are interested.)

At this point you may be wondering: how is the type recognition implemented?

Here we use an open-source PyTorch vehicle type recognition model and convert it to a Paddle model with the X2Paddle tool.

X2Paddle source code: https://github.com/PaddlePaddle/X2Paddle

A deep learning application has two main parts: training a model with a deep learning framework, and using the trained model for prediction.

Developers end up with different model formats depending on which framework they train with, so predicting with a single framework requires solving the compatibility problem between models from different frameworks. With this in mind, and to help users migrate quickly from other frameworks, PaddlePaddle open-sourced the model conversion tool X2Paddle.

It converts TensorFlow and Caffe models into a format loadable by Paddle Fluid, PaddlePaddle's core framework. X2Paddle also supports converting ONNX models, which effectively covers the many frameworks that can export to ONNX, such as PyTorch, MXNet, and CNTK.

Download the PyTorch source code:

Source code: https://github.com/Sharpiless/Paddle-Car-type-recognition

Give it a star, then download and extract it:

Download the weight file and put it in the src folder:

Link: https://pan.baidu.com/s/1fBwOr9PM9S7LmCgRddX0Gg

Extraction code: pv6e


First run torch2onnx.py to convert the .pth model into an intermediate ONNX model:
Then run:

x2paddle --framework=onnx --model=classifier.onnx --save_dir=pd_model

You can see that the corresponding Paddle model has been generated.

Now replace model.py with:

from paddle.fluid.initializer import Constant
from paddle.fluid.param_attr import ParamAttr
import paddle.fluid as fluid

def x2paddle_net(inputs):
    x2paddle_124 = fluid.layers.fill_constant(shape=[1], dtype='int32', value=0)
    x2paddle_193 = fluid.layers.fill_constant(shape=[1], dtype='int32', value=512)
    x2paddle_194 = fluid.layers.fill_constant(shape=[1], dtype='int32', value=1)
    x2paddle_202 = fluid.layers.fill_constant(shape=[1], dtype='int32', value=262144)
    x2paddle_207 = fluid.layers.fill_constant(shape=[1], dtype='float32', value=9.999999747378752e-06)
    # x2paddle_input_1 = fluid.layers.data(dtype='float32', shape=[1, 3, 224, 224], name='x2paddle_input_1', append_batch_size=False)
    x2paddle_input_1 = inputs
    x2paddle_fc_bias = fluid.layers.create_parameter(dtype='float32', shape=[19], name='x2paddle_fc_bias', attr='x2paddle_fc_bias', default_initializer=Constant(0.0))
    x2paddle_fc_weight = fluid.layers.create_parameter(dtype='float32', shape=[19, 262144], name='x2paddle_fc_weight', attr='x2paddle_fc_weight', default_initializer=Constant(0.0))
    x2paddle_196 = fluid.layers.assign(x2paddle_193)
    x2paddle_197 = fluid.layers.assign(x2paddle_194)
    x2paddle_204 = fluid.layers.assign(x2paddle_202)
    x2paddle_123 = fluid.layers.shape(x2paddle_input_1)
    x2paddle_126 = fluid.layers.conv2d(x2paddle_input_1, num_filters=64, filter_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, param_attr='x2paddle_features_0_weight', name='x2paddle_126', bias_attr=False)
    x2paddle_125 = fluid.layers.gather(input=x2paddle_123, index=x2paddle_124)
    x2paddle_127 = fluid.layers.batch_norm(x2paddle_126, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_1_weight', bias_attr='x2paddle_features_1_bias', moving_mean_name='x2paddle_features_1_running_mean', moving_variance_name='x2paddle_features_1_running_var', use_global_stats=False, name='x2paddle_127')
    x2paddle_195 = fluid.layers.assign(x2paddle_125)
    x2paddle_203 = fluid.layers.assign(x2paddle_125)
    x2paddle_128 = fluid.layers.relu(x2paddle_127, name='x2paddle_128')
    x2paddle_198 = fluid.layers.concat([x2paddle_195, x2paddle_196, x2paddle_197], axis=0)
    x2paddle_205 = fluid.layers.concat([x2paddle_203, x2paddle_204], axis=0)
    x2paddle_129 = fluid.layers.pool2d(x2paddle_128, pool_size=[3, 3], pool_type='max', pool_stride=[2, 2], pool_padding=[1, 1], ceil_mode=False, name='x2paddle_129', exclusive=False)
    x2paddle_130 = fluid.layers.conv2d(x2paddle_129, num_filters=64, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_4_0_conv1_weight', name='x2paddle_130', bias_attr=False)
    x2paddle_131 = fluid.layers.batch_norm(x2paddle_130, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_4_0_bn1_weight', bias_attr='x2paddle_features_4_0_bn1_bias', moving_mean_name='x2paddle_features_4_0_bn1_running_mean', moving_variance_name='x2paddle_features_4_0_bn1_running_var', use_global_stats=False, name='x2paddle_131')
    x2paddle_132 = fluid.layers.relu(x2paddle_131, name='x2paddle_132')
    x2paddle_133 = fluid.layers.conv2d(x2paddle_132, num_filters=64, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_4_0_conv2_weight', name='x2paddle_133', bias_attr=False)
    x2paddle_134 = fluid.layers.batch_norm(x2paddle_133, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_4_0_bn2_weight', bias_attr='x2paddle_features_4_0_bn2_bias', moving_mean_name='x2paddle_features_4_0_bn2_running_mean', moving_variance_name='x2paddle_features_4_0_bn2_running_var', use_global_stats=False, name='x2paddle_134')
    x2paddle_135 = fluid.layers.elementwise_add(x=x2paddle_134, y=x2paddle_129, name='x2paddle_135')
    x2paddle_136 = fluid.layers.relu(x2paddle_135, name='x2paddle_136')
    x2paddle_137 = fluid.layers.conv2d(x2paddle_136, num_filters=64, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_4_1_conv1_weight', name='x2paddle_137', bias_attr=False)
    x2paddle_138 = fluid.layers.batch_norm(x2paddle_137, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_4_1_bn1_weight', bias_attr='x2paddle_features_4_1_bn1_bias', moving_mean_name='x2paddle_features_4_1_bn1_running_mean', moving_variance_name='x2paddle_features_4_1_bn1_running_var', use_global_stats=False, name='x2paddle_138')
    x2paddle_139 = fluid.layers.relu(x2paddle_138, name='x2paddle_139')
    x2paddle_140 = fluid.layers.conv2d(x2paddle_139, num_filters=64, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_4_1_conv2_weight', name='x2paddle_140', bias_attr=False)
    x2paddle_141 = fluid.layers.batch_norm(x2paddle_140, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_4_1_bn2_weight', bias_attr='x2paddle_features_4_1_bn2_bias', moving_mean_name='x2paddle_features_4_1_bn2_running_mean', moving_variance_name='x2paddle_features_4_1_bn2_running_var', use_global_stats=False, name='x2paddle_141')
    x2paddle_142 = fluid.layers.elementwise_add(x=x2paddle_141, y=x2paddle_136, name='x2paddle_142')
    x2paddle_143 = fluid.layers.relu(x2paddle_142, name='x2paddle_143')
    x2paddle_144 = fluid.layers.conv2d(x2paddle_143, num_filters=128, filter_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_5_0_conv1_weight', name='x2paddle_144', bias_attr=False)
    x2paddle_149 = fluid.layers.conv2d(x2paddle_143, num_filters=128, filter_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, param_attr='x2paddle_features_5_0_downsample_0_weight', name='x2paddle_149', bias_attr=False)
    x2paddle_145 = fluid.layers.batch_norm(x2paddle_144, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_5_0_bn1_weight', bias_attr='x2paddle_features_5_0_bn1_bias', moving_mean_name='x2paddle_features_5_0_bn1_running_mean', moving_variance_name='x2paddle_features_5_0_bn1_running_var', use_global_stats=False, name='x2paddle_145')
    x2paddle_150 = fluid.layers.batch_norm(x2paddle_149, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_5_0_downsample_1_weight', bias_attr='x2paddle_features_5_0_downsample_1_bias', moving_mean_name='x2paddle_features_5_0_downsample_1_running_mean', moving_variance_name='x2paddle_features_5_0_downsample_1_running_var', use_global_stats=False, name='x2paddle_150')
    x2paddle_146 = fluid.layers.relu(x2paddle_145, name='x2paddle_146')
    x2paddle_147 = fluid.layers.conv2d(x2paddle_146, num_filters=128, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_5_0_conv2_weight', name='x2paddle_147', bias_attr=False)
    x2paddle_148 = fluid.layers.batch_norm(x2paddle_147, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_5_0_bn2_weight', bias_attr='x2paddle_features_5_0_bn2_bias', moving_mean_name='x2paddle_features_5_0_bn2_running_mean', moving_variance_name='x2paddle_features_5_0_bn2_running_var', use_global_stats=False, name='x2paddle_148')
    x2paddle_151 = fluid.layers.elementwise_add(x=x2paddle_148, y=x2paddle_150, name='x2paddle_151')
    x2paddle_152 = fluid.layers.relu(x2paddle_151, name='x2paddle_152')
    x2paddle_153 = fluid.layers.conv2d(x2paddle_152, num_filters=128, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_5_1_conv1_weight', name='x2paddle_153', bias_attr=False)
    x2paddle_154 = fluid.layers.batch_norm(x2paddle_153, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_5_1_bn1_weight', bias_attr='x2paddle_features_5_1_bn1_bias', moving_mean_name='x2paddle_features_5_1_bn1_running_mean', moving_variance_name='x2paddle_features_5_1_bn1_running_var', use_global_stats=False, name='x2paddle_154')
    x2paddle_155 = fluid.layers.relu(x2paddle_154, name='x2paddle_155')
    x2paddle_156 = fluid.layers.conv2d(x2paddle_155, num_filters=128, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_5_1_conv2_weight', name='x2paddle_156', bias_attr=False)
    x2paddle_157 = fluid.layers.batch_norm(x2paddle_156, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_5_1_bn2_weight', bias_attr='x2paddle_features_5_1_bn2_bias', moving_mean_name='x2paddle_features_5_1_bn2_running_mean', moving_variance_name='x2paddle_features_5_1_bn2_running_var', use_global_stats=False, name='x2paddle_157')
    x2paddle_158 = fluid.layers.elementwise_add(x=x2paddle_157, y=x2paddle_152, name='x2paddle_158')
    x2paddle_159 = fluid.layers.relu(x2paddle_158, name='x2paddle_159')
    x2paddle_160 = fluid.layers.conv2d(x2paddle_159, num_filters=256, filter_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_6_0_conv1_weight', name='x2paddle_160', bias_attr=False)
    x2paddle_165 = fluid.layers.conv2d(x2paddle_159, num_filters=256, filter_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, param_attr='x2paddle_features_6_0_downsample_0_weight', name='x2paddle_165', bias_attr=False)
    x2paddle_161 = fluid.layers.batch_norm(x2paddle_160, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_6_0_bn1_weight', bias_attr='x2paddle_features_6_0_bn1_bias', moving_mean_name='x2paddle_features_6_0_bn1_running_mean', moving_variance_name='x2paddle_features_6_0_bn1_running_var', use_global_stats=False, name='x2paddle_161')
    x2paddle_166 = fluid.layers.batch_norm(x2paddle_165, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_6_0_downsample_1_weight', bias_attr='x2paddle_features_6_0_downsample_1_bias', moving_mean_name='x2paddle_features_6_0_downsample_1_running_mean', moving_variance_name='x2paddle_features_6_0_downsample_1_running_var', use_global_stats=False, name='x2paddle_166')
    x2paddle_162 = fluid.layers.relu(x2paddle_161, name='x2paddle_162')
    x2paddle_163 = fluid.layers.conv2d(x2paddle_162, num_filters=256, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_6_0_conv2_weight', name='x2paddle_163', bias_attr=False)
    x2paddle_164 = fluid.layers.batch_norm(x2paddle_163, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_6_0_bn2_weight', bias_attr='x2paddle_features_6_0_bn2_bias', moving_mean_name='x2paddle_features_6_0_bn2_running_mean', moving_variance_name='x2paddle_features_6_0_bn2_running_var', use_global_stats=False, name='x2paddle_164')
    x2paddle_167 = fluid.layers.elementwise_add(x=x2paddle_164, y=x2paddle_166, name='x2paddle_167')
    x2paddle_168 = fluid.layers.relu(x2paddle_167, name='x2paddle_168')
    x2paddle_169 = fluid.layers.conv2d(x2paddle_168, num_filters=256, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_6_1_conv1_weight', name='x2paddle_169', bias_attr=False)
    x2paddle_170 = fluid.layers.batch_norm(x2paddle_169, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_6_1_bn1_weight', bias_attr='x2paddle_features_6_1_bn1_bias', moving_mean_name='x2paddle_features_6_1_bn1_running_mean', moving_variance_name='x2paddle_features_6_1_bn1_running_var', use_global_stats=False, name='x2paddle_170')
    x2paddle_171 = fluid.layers.relu(x2paddle_170, name='x2paddle_171')
    x2paddle_172 = fluid.layers.conv2d(x2paddle_171, num_filters=256, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_6_1_conv2_weight', name='x2paddle_172', bias_attr=False)
    x2paddle_173 = fluid.layers.batch_norm(x2paddle_172, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_6_1_bn2_weight', bias_attr='x2paddle_features_6_1_bn2_bias', moving_mean_name='x2paddle_features_6_1_bn2_running_mean', moving_variance_name='x2paddle_features_6_1_bn2_running_var', use_global_stats=False, name='x2paddle_173')
    x2paddle_174 = fluid.layers.elementwise_add(x=x2paddle_173, y=x2paddle_168, name='x2paddle_174')
    x2paddle_175 = fluid.layers.relu(x2paddle_174, name='x2paddle_175')
    x2paddle_176 = fluid.layers.conv2d(x2paddle_175, num_filters=512, filter_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_7_0_conv1_weight', name='x2paddle_176', bias_attr=False)
    x2paddle_181 = fluid.layers.conv2d(x2paddle_175, num_filters=512, filter_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, param_attr='x2paddle_features_7_0_downsample_0_weight', name='x2paddle_181', bias_attr=False)
    x2paddle_177 = fluid.layers.batch_norm(x2paddle_176, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_7_0_bn1_weight', bias_attr='x2paddle_features_7_0_bn1_bias', moving_mean_name='x2paddle_features_7_0_bn1_running_mean', moving_variance_name='x2paddle_features_7_0_bn1_running_var', use_global_stats=False, name='x2paddle_177')
    x2paddle_182 = fluid.layers.batch_norm(x2paddle_181, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_7_0_downsample_1_weight', bias_attr='x2paddle_features_7_0_downsample_1_bias', moving_mean_name='x2paddle_features_7_0_downsample_1_running_mean', moving_variance_name='x2paddle_features_7_0_downsample_1_running_var', use_global_stats=False, name='x2paddle_182')
    x2paddle_178 = fluid.layers.relu(x2paddle_177, name='x2paddle_178')
    x2paddle_179 = fluid.layers.conv2d(x2paddle_178, num_filters=512, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_7_0_conv2_weight', name='x2paddle_179', bias_attr=False)
    x2paddle_180 = fluid.layers.batch_norm(x2paddle_179, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_7_0_bn2_weight', bias_attr='x2paddle_features_7_0_bn2_bias', moving_mean_name='x2paddle_features_7_0_bn2_running_mean', moving_variance_name='x2paddle_features_7_0_bn2_running_var', use_global_stats=False, name='x2paddle_180')
    x2paddle_183 = fluid.layers.elementwise_add(x=x2paddle_180, y=x2paddle_182, name='x2paddle_183')
    x2paddle_184 = fluid.layers.relu(x2paddle_183, name='x2paddle_184')
    x2paddle_185 = fluid.layers.conv2d(x2paddle_184, num_filters=512, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_7_1_conv1_weight', name='x2paddle_185', bias_attr=False)
    x2paddle_186 = fluid.layers.batch_norm(x2paddle_185, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_7_1_bn1_weight', bias_attr='x2paddle_features_7_1_bn1_bias', moving_mean_name='x2paddle_features_7_1_bn1_running_mean', moving_variance_name='x2paddle_features_7_1_bn1_running_var', use_global_stats=False, name='x2paddle_186')
    x2paddle_187 = fluid.layers.relu(x2paddle_186, name='x2paddle_187')
    x2paddle_188 = fluid.layers.conv2d(x2paddle_187, num_filters=512, filter_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, param_attr='x2paddle_features_7_1_conv2_weight', name='x2paddle_188', bias_attr=False)
    x2paddle_189 = fluid.layers.batch_norm(x2paddle_188, momentum=0.8999999761581421, epsilon=9.999999747378752e-06, data_layout='NCHW', is_test=True, param_attr='x2paddle_features_7_1_bn2_weight', bias_attr='x2paddle_features_7_1_bn2_bias', moving_mean_name='x2paddle_features_7_1_bn2_running_mean', moving_variance_name='x2paddle_features_7_1_bn2_running_var', use_global_stats=False, name='x2paddle_189')
    x2paddle_190 = fluid.layers.elementwise_add(x=x2paddle_189, y=x2paddle_184, name='x2paddle_190')
    x2paddle_191 = fluid.layers.relu(x2paddle_190, name='x2paddle_191')
    x2paddle_192 = fluid.layers.pool2d(x2paddle_191, pool_type='avg', global_pooling=True, name='x2paddle_192')
    x2paddle_198_cast = fluid.layers.cast(x2paddle_198, dtype='int32')
    x2paddle_199 = fluid.layers.reshape(x2paddle_192, name='x2paddle_199', actual_shape=x2paddle_198_cast, shape=[1, 512, 1])
    x2paddle_200 = fluid.layers.transpose(x2paddle_199, perm=[0, 2, 1], name='x2paddle_200')
    x2paddle_201 = fluid.layers.matmul(x=x2paddle_199, y=x2paddle_200, name='x2paddle_201')
    x2paddle_205_cast = fluid.layers.cast(x2paddle_205, dtype='int32')
    x2paddle_206 = fluid.layers.reshape(x2paddle_201, name='x2paddle_206', actual_shape=x2paddle_205_cast, shape=[1, 262144])
    x2paddle_208 = fluid.layers.elementwise_add(x=x2paddle_206, y=x2paddle_207, name='x2paddle_208')
    x2paddle_209 = fluid.layers.sqrt(x2paddle_208, name='x2paddle_209')
    x2paddle_210_mm = fluid.layers.matmul(x=x2paddle_209, y=x2paddle_fc_weight, transpose_x=False, transpose_y=True, alpha=1.0, name='x2paddle_210_mm')
    x2paddle_210 = fluid.layers.elementwise_add(x=x2paddle_210_mm, y=x2paddle_fc_bias, name='x2paddle_210')

    return [x2paddle_input_1], [x2paddle_210]

def run_net(param_dir="./"):
    import os
    # x2paddle_net takes the input tensor explicitly, so build the data layer here
    image = fluid.layers.data(dtype='float32', shape=[1, 3, 224, 224],
                              name='x2paddle_input_1', append_batch_size=False)
    inputs, outputs = x2paddle_net(image)
    # Flatten any nested output lists without mutating the list mid-iteration
    flat_outputs = []
    for out in outputs:
        if isinstance(out, list):
            flat_outputs.extend(out)
        else:
            flat_outputs.append(out)
    outputs = flat_outputs
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    def if_exist(var):
        b = os.path.exists(os.path.join(param_dir, var.name))
        return b

    fluid.io.load_vars(exe,
                       param_dir,
                       fluid.default_main_program(),
                       predicate=if_exist)

Then create a test_img.py that calls the Paddle model:

import cv2
from pd_model.model_with_code.model import x2paddle_net

import argparse
import functools
import numpy as np
import paddle.fluid as fluid
from PIL import ImageFont, ImageDraw, Image

font_path = r'./simsun.ttc'
font = ImageFont.truetype(font_path, 32)


def putText(img, text, x, y, color=(0, 0, 255)):

    img_pil = Image.fromarray(img)
    draw = ImageDraw.Draw(img_pil)
    b, g, r = color
    a = 0
    draw.text((x, y), text, font=font, fill=(b, g, r, a))
    img = np.array(img_pil)
    return img


# Image preprocessing function
def process_img(img_path='', image_shape=[3, 224, 224]):

    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]

    img = cv2.imread(img_path)
    img = cv2.resize(img, (image_shape[1], image_shape[2]))
    #img = cv2.resize(img,(256,256))
    #img = crop_image(img, image_shape[1], True)

    # BGR -> RGB, then [224, 224, 3] -> [3, 224, 224]
    img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
    #img = img.astype('float32').transpose((2, 0, 1)) / 255
    img_mean = np.array(mean).reshape((3, 1, 1))
    img_std = np.array(std).reshape((3, 1, 1))
    img -= img_mean
    img /= img_std

    img = img.astype('float32')
    img = np.expand_dims(img, axis=0)

    return img

# Model inference helpers


color_attrs = ['Black', 'Blue', 'Brown',
               'Gray', 'Green', 'Pink',
               'Red', 'White', 'Yellow']  # vehicle body colors

direction_attrs = ['Front', 'Rear']  # shooting direction

type_attrs = ['passengerCar', 'saloonCar',
              'shopTruck', 'suv', 'trailer', 'truck', 'van', 'waggon']  # vehicle types


def inference(img):
    fetch_list = [out.name]

    output = exe.run(eval_program,
                     fetch_list=fetch_list,
                     feed={'image': img})
    color_idx, direction_idx, type_idx = get_predict(np.array(output))

    color_name = color_attrs[color_idx]
    direction_name = direction_attrs[direction_idx]
    type_name = type_attrs[type_idx]

    return color_name, direction_name, type_name


def get_predict(output):
    output = np.squeeze(output)
    pred_color = output[:9]
    pred_direction = output[9:11]
    pred_type = output[11:]

    color_idx = np.argmax(pred_color)
    direction_idx = np.argmax(pred_direction)
    type_idx = np.argmax(pred_type)

    return color_idx, direction_idx, type_idx


use_gpu = True
# Inference program
adv_program = fluid.Program()

# Build the program and initialize
with fluid.program_guard(adv_program):
    input_layer = fluid.layers.data(
        name='image', shape=[3, 224, 224], dtype='float32')
    # Allow gradients to be computed with respect to the input
    input_layer.stop_gradient = False

    # model definition
    _, out_logits = x2paddle_net(inputs=input_layer)
    out = fluid.layers.softmax(out_logits[0])

    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    # Load the model parameters
    fluid.io.load_persistables(exe, './pd_model/model_with_code/')

# Clone an evaluation program for testing
eval_program = adv_program.clone(for_test=True)

# im_pt = './a.jpg'
im_pt = './a.png'
img = process_img(im_pt)

color_name, direction_name, type_name = inference(img)

label = 'Color: {}\nDirection: {}\nType: {}'.format(color_name, direction_name, type_name)

img = cv2.imread(im_pt)
img = putText(img, label, x=1, y=10, color=(0, 215, 255))

cv2.imshow('a', img)
cv2.waitKey(0)

cv2.destroyAllWindows()

Run the test:

Success!

8. Summary:

In this article, we used two tools, PaddleDetection and X2Paddle, to build a small project that detects vehicles and recognizes their types in images and videos.

Specifically:

  1. PaddleDetection provides convenient APIs and pretrained models, enabling fast vehicle detection;
  2. X2Paddle solves the problem of converting model weights between different deep learning frameworks.

For more projects and updates, follow my blog: https://blog.csdn.net/weixin_44936889

