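[PaddlePaddle Developer Voices] Liang Yingping is a second-year undergraduate at the Xu Teli School, Beijing Institute of Technology, and an AI development enthusiast.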

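Project Overview

A driverless car relies on sensor, signal processing, communication, and computing technology. It integrates vision, lidar, ultrasonic sensors, microwave radar, GPS, odometers, magnetic compasses, and other on-board sensors to perceive the vehicle's environment and state. Based on the road information, traffic signals, vehicle position, and obstacle information it obtains, it analyzes the situation and sends the desired commands to the main control computer, which controls steering and speed so that the vehicle drives in a human-like way according to its own intent and its surroundings.

This project uses the YOLOv3 model provided by PaddleX, trained on the UA-DETRAC vehicle detection dataset.
The trained model detects different vehicle types such as car, van, and bus, with an mAP of 0.73.
Combined with an open-source lane detection algorithm, it implements the visual perception part of driverless driving: vehicle detection and lane line segmentation.

Final Result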

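About PaddleX

PaddleX is PaddlePaddle's full-workflow development tool. It brings together the PaddlePaddle core framework, model zoo, tools, and components needed for deep learning development, connects the whole development workflow, and offers a concise Python API that users can call directly or build on to meet real production needs. The code is open source on GitHub, and the online PaddleX documentation provides tutorials and API references.

PaddleX code on GitHub:

https://github.com/PaddlePaddle/PaddleX

PaddleX documentation:

https://paddlex.readthedocs.io/zh_CN/latest/index.html

PaddleX official site:

https://www.paddlepaddle.org.cn/paddle/paddlex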

Project Walkthrough

I. Preparing the PaddleX Environment

1. Install the PaddleX library

pip install paddlex -i https://mirror.baidu.com/pypi/simple

2. Set the working directory and use GPU 0

import matplotlib
matplotlib.use('Agg')
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx

os.chdir('/home/aistudio/work/')

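II. Preparing the Data

1. About the dataset

The dataset used is UA-DETRAC, a challenging real-world benchmark for multi-object detection and multi-object tracking. It consists of 10 hours of video captured with a Canon EOS 550D camera at 24 different locations in Beijing and Tianjin, China. The videos were recorded at 25 frames per second (fps) at a resolution of 960×540 pixels. UA-DETRAC contains more than 140,000 frames with 8,250 manually annotated vehicles, for a total of 1.21 million labeled bounding boxes.

2. Prepare the required files

PaddleX supports data in both VOC and COCO formats. The files required are:

labels.txt: stores the target classes, not including the background class;

train_list.txt and val_list.txt: store the relative paths of the images and annotation files used for training and testing;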

!unzip /home/aistudio/data/data34332/VOC2012.zip -d ./

imgs = os.listdir('./VOC2012/JPEGImages')
print('total:', len(imgs))
with open('./VOC2012/train_list.txt', 'w') as f:
    for im in imgs[:-200]:
        info = 'JPEGImages/'+im+' '
        info += 'Annotations/'+im[:-4]+'.xml\n'
        f.write(info)
with open('./VOC2012/val_list.txt', 'w') as f:
    for im in imgs[-200:]:
        info = 'JPEGImages/'+im+' '
        info += 'Annotations/'+im[:-4]+'.xml\n'
        f.write(info)
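
labels.txt is listed above as a required file, but the listing only generates train_list.txt and val_list.txt. One way to produce it, sketched here under the assumption that the annotations are standard VOC XML files with an object/name element, is to scan the Annotations folder and collect the class names. The helper names collect_labels and write_labels are hypothetical, and writing the classes in sorted order is an assumption (the order determines the class ids).

```python
import os
import xml.etree.ElementTree as ET


def collect_labels(annotation_dir):
    """Return the sorted class names found in the VOC XML files of a folder."""
    names = set()
    for fname in os.listdir(annotation_dir):
        if not fname.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(annotation_dir, fname)).getroot()
        # Each annotated object carries its class in <object><name>...</name>
        for obj in root.iter('object'):
            name = obj.findtext('name')
            if name:
                names.add(name.strip())
    return sorted(names)


def write_labels(annotation_dir, out_path):
    """Write one class name per line, the layout a label list file uses."""
    labels = collect_labels(annotation_dir)
    with open(out_path, 'w') as f:
        f.write('\n'.join(labels) + '\n')
    return labels


# Example call (paths taken from the listing above):
# write_labels('./VOC2012/Annotations', './VOC2012/labels.txt')
```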

III. Data Preprocessing

1. Set up the image preprocessing and data augmentation modules

For the parameters, see:

https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html

from paddlex.det import transforms
train_transforms = transforms.Compose([
    transforms.MixupImage(mixup_epoch=250),
    transforms.RandomDistort(),
    transforms.RandomExpand(),
    transforms.RandomCrop(),
    transforms.Resize(target_size=608, interp='RANDOM'),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize(),
])

eval_transforms = transforms.Compose([
    transforms.Resize(target_size=608, interp='CUBIC'),
    transforms.Normalize(),
])
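
To make the first augmentation step more concrete, here is a minimal numpy sketch of the mixup idea for detection: two images are blended with a random weight and their boxes are pooled. This is an illustration, not PaddleX's implementation; the function name mixup, the Beta(1.5, 1.5) weighting, and the zero padding for images of different sizes are all assumptions.

```python
import numpy as np


def mixup(img1, boxes1, img2, boxes2, alpha=1.5, beta=1.5, rng=None):
    """Blend two HxWx3 images and pool their boxes (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    factor = float(rng.beta(alpha, beta))  # blending weight in (0, 1)

    # The canvas is large enough to hold both images, anchored at the top-left.
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    mixed = np.zeros((h, w, 3), dtype=np.float32)
    mixed[:img1.shape[0], :img1.shape[1]] += img1.astype(np.float32) * factor
    mixed[:img2.shape[0], :img2.shape[1]] += img2.astype(np.float32) * (1.0 - factor)

    # Boxes from both images are kept; each gets its image's blending weight.
    boxes = np.concatenate([boxes1, boxes2], axis=0)
    weights = np.concatenate([np.full(len(boxes1), factor),
                              np.full(len(boxes2), 1.0 - factor)])
    return mixed, boxes, weights
```

Because both sets of boxes survive, the detector sees more objects per image; the per-box weights can be used to scale the loss.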

2. Define the data iterators

The dataset has 6,000 images in total; we use 5,800 for training and hold out the remaining 200 for testing.

base = './VOC2012/'

train_dataset = pdx.datasets.VOCDetection(
    data_dir=base,
    file_list=base+'train_list.txt',
    label_list=base+'labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir=base,
    file_list=base+'val_list.txt',
    label_list=base+'labels.txt',
    transforms=eval_transforms)

2020-05-11 07:57:15 [INFO]    Starting to read file list from dataset...
2020-05-11 07:57:16 [INFO]    5800 samples in file ./VOC2012/train_list.txt
creating index...
index created!
2020-05-11 07:57:17 [INFO]    Starting to read file list from dataset...
2020-05-11 07:57:17 [INFO]    200 samples in file ./VOC2012/val_list.txt
creating index...
index created!

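Parameter description:

data_dir (str): Path to the directory containing the dataset.
file_list (str): Path to the file that lists the dataset's image files and their annotation files (each line gives paths relative to data_dir).
label_list (str): Path to the file describing the classes in the dataset.
transforms (paddlex.det.transforms): Preprocessing/augmentation operators applied to each sample; see paddlex.det.transforms.
num_workers (int|str): Number of threads or processes used to preprocess samples. Defaults to 'auto'. When set to 'auto', num_workers is derived from the number of CPU cores: 8 if half the core count is greater than 8, otherwise half the core count.
buffer_size (int): Length of the preprocessing queue buffer, in samples. Defaults to 100.
parallel_method (str): How samples are preprocessed in parallel; 'thread' and 'process' are supported. Defaults to 'thread' (on Windows and Mac, thread is forced and this parameter has no effect).
shuffle (bool): Whether to shuffle the samples. Defaults to False.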
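
The 'auto' rule for num_workers described above can be written out as a few lines of Python. This is only a restatement of the documented rule, not PaddleX's own code; the function name auto_num_workers is hypothetical.

```python
import os


def auto_num_workers(cpu_count=None):
    """Apply the documented 'auto' rule: half the cores, capped at 8."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    half = cpu_count // 2
    return 8 if half > 8 else half


# A 32-core machine gets 8 workers, a 4-core machine gets 2.
print(auto_num_workers(32), auto_num_workers(4))
```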

IV. Defining the YOLOv3 Model and Starting Training

1. About YOLOv3:

Paper:

https://arxiv.org/abs/1804.02767

‘Sometimes you just kinda phone it in for a year, you know?’

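The author says he spent most of the year on Twitter, then played around with GANs for a while, and with the little time left he improved the YOLO algorithm and proposed YOLOv3. YOLOv3 adds the residual structure proposed in ResNet and the feature pyramid, built by upsampling, proposed in FPN. Its most notable feature is that it detects at three different scales: detection is performed by applying 1 x 1 convolutions to three feature maps of different sizes taken from three different places in the network.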
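
As a rough illustration of the three detection scales, the sketch below computes the output grid sizes for a square input, assuming the strides of 32, 16, and 8 and the three anchors per scale used in the YOLOv3 paper. The function name yolov3_head_shapes is hypothetical.

```python
def yolov3_head_shapes(input_size, num_classes, anchors_per_scale=3):
    """Return (grid_h, grid_w, channels) for each of the three YOLOv3 heads."""
    strides = (32, 16, 8)  # coarse to fine, as in the YOLOv3 paper
    # Each anchor predicts 4 box offsets, 1 objectness score, and class scores.
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


# 608x608 input with the 4 classes used in this project
print(yolov3_head_shapes(608, 4))
# [(19, 19, 27), (38, 38, 27), (76, 76, 27)]
```

The coarse 19 x 19 grid is paired with the largest anchors and the fine 76 x 76 grid with the smallest, which is why the anchor sizes matter for small or distant vehicles.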

num_classes = len(train_dataset.labels)
print('class num:', num_classes)
model = pdx.det.YOLOv3(num_classes=num_classes, backbone='DarkNet53')
model.train(
    num_epochs=4,
    train_dataset=train_dataset,
    train_batch_size=4,
    eval_dataset=eval_dataset,
    learning_rate=0.000125,
    lr_decay_epochs=[400, 800],
    save_interval_epochs=2,
    log_interval_steps=200,
    save_dir='./yolov3_darknet53',
    use_vdl=True)

class num: 4

2020-05-11 08:15:15 [INFO]    Load pretrain weights from ./yolov3_darknet53/pretrain/DarkNet53.
2020-05-11 08:15:16 [INFO]    There are 260 varaibles in ./yolov3_darknet53/pretrain/DarkNet53 are loaded.

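Parameter description:

num_classes (int): Number of classes. Defaults to 80.
backbone (str): Backbone network for YOLOv3; one of ['DarkNet53', 'ResNet34', 'MobileNetV1', 'MobileNetV3_large']. Defaults to 'MobileNetV1'.
anchors (list|tuple): Widths and heights of the anchor boxes; None means the default [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]].
anchor_masks (list|tuple): Anchor mask indices used when computing the YOLOv3 loss; None means the default [[6, 7, 8], [3, 4, 5], [0, 1, 2]].
ignore_threshold (float): When computing the YOLOv3 loss, the confidence of predicted boxes whose IoU exceeds ignore_threshold is ignored. Defaults to 0.7.
nms_score_threshold (float): Confidence score threshold for detection boxes; boxes scoring below it are discarded. Defaults to 0.01.
nms_topk (int): Maximum number of boxes kept, by confidence, when running NMS. Defaults to 1000.
nms_keep_topk (int): Total number of boxes kept per image after NMS. Defaults to 100.
nms_iou_threshold (float): IoU threshold used to remove boxes during NMS. Defaults to 0.45.
label_smooth (bool): Whether to use label smoothing. Defaults to False.
train_random_shapes (list|tuple): Image sizes chosen at random during training. Defaults to [320, 352, 384, 416, 448, 480, 512, 544, 576, 608].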
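
The four nms_* parameters are easier to follow next to a small implementation. The following is a single-class greedy NMS sketch in numpy; it is not the multi-class operator PaddleX actually runs, and the function names box_iou and nms are hypothetical. The default values mirror the ones listed above.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, score_threshold=0.01, iou_threshold=0.45,
        topk=1000, keep_topk=100):
    """Greedy NMS for one class; returns indices of the boxes that survive."""
    idx = np.where(scores >= score_threshold)[0]   # nms_score_threshold
    idx = idx[np.argsort(-scores[idx])][:topk]     # nms_topk
    keep = []
    while idx.size > 0:
        best, rest = idx[0], idx[1:]
        keep.append(int(best))
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the winner too much.
        idx = rest[box_iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep[:keep_topk]                        # nms_keep_topk


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                  [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.005])
print(nms(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the fourth is dropped by the score threshold.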

V. Evaluating the Model

Use the evaluate method to evaluate the model; the final mAP is about 0.73.
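
For readers unfamiliar with the metric, mAP is the mean over classes of the average precision (AP). The sketch below computes AP for one class using the VOC-style 11-point interpolation, which is one common definition and is not necessarily the exact one used by the evaluate method. It assumes the detections have already been matched to ground-truth boxes at some IoU threshold; the function name average_precision and its arguments are hypothetical.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_gt):
    """11-point interpolated AP for one class.

    scores: confidence of each detection
    is_true_positive: 1 if the detection matched a ground-truth box, else 0
    num_gt: number of ground-truth boxes of this class
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / (cum_tp + cum_fp)

    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        reached = recall >= t
        # Best precision achievable at recall >= t, or 0 if never reached.
        ap += (precision[reached].max() if reached.any() else 0.0) / 11.0
    return ap


# Three detections, two ground-truth boxes, the second detection is wrong.
print(round(average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2), 4))
```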

VI. Loading the Model for Testing

image_name = './test6.jpg'
result = model.predict(image_name)
pdx.det.visualize(image_name, result, threshold=0.5, save_dir='./output/')

Detection result:

VII. Defining the Lane Detection Model

This part uses an open-source project:

https://github.com/Sharpiless/advanced_lane_detection

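The lane detection algorithm works as follows:

(1) Given a set of chessboard images (in the camera_cal folder), compute the camera calibration matrix and distortion coefficients.

(2) Apply distortion correction to the raw images using the calibration matrix and distortion coefficients.

(3) Use color transforms, gradients, and so on to create a thresholded binary image.

(4) Apply a perspective transform to rectify the binary image ("bird's-eye view").

(5) Detect lane pixels in the image and fit them to find the lane boundaries.

(6) Warp the detected lane boundaries back onto the original image.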
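
Step (5) is delegated to a tools.HistogramLineFitter class whose code is not shown in this article. As a generic illustration of the histogram approach, and not that class's actual implementation, the sketch below finds where each lane line starts in a bird's-eye binary image and fits a second-order polynomial to the nearby pixels. The function names and the margin value are assumptions.

```python
import numpy as np


def find_lane_bases(binary_warped):
    """Locate the x positions where the left and right lane lines start."""
    h, w = binary_warped.shape
    # Column sums over the lower half: lane pixels form two peaks.
    histogram = binary_warped[h // 2:, :].sum(axis=0)
    mid = w // 2
    left_base = int(np.argmax(histogram[:mid]))
    right_base = int(np.argmax(histogram[mid:])) + mid
    return left_base, right_base


def fit_lane(binary_warped, x_base, margin=50):
    """Fit x = a*y^2 + b*y + c to the lane pixels within margin of x_base."""
    ys, xs = np.nonzero(binary_warped)
    near = np.abs(xs - x_base) <= margin
    return np.polyfit(ys[near], xs[near], 2)


# Synthetic bird's-eye image with two vertical lane lines
img = np.zeros((100, 200), dtype=np.uint8)
img[:, 40] = 1
img[:, 150] = 1
left, right = find_lane_bases(img)
print(left, right)  # 40 150
```

Fitting x as a function of y, rather than y as a function of x, keeps the fit well defined for the near-vertical lines seen from above.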

The implementation is as follows:

import numpy as np
import cv2, pickle, glob, os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tools

from moviepy.editor import VideoFileClip
from IPython.display import HTML
# code adopted from: https://github.com/t-lanigan/vehicle-detection-and-tracking/blob/master/road_sensor.py
class GlobalObjects:

    def __init__(self):
        self.__set_folders()
        self.__set_hyper_parameters()
        self.__set_perspective()
        self.__set_kernels()
        self.__set_mask_regions()

    def __set_folders(self):
        # Use one slash for paths.
        self.camera_cal_folder = 'camera_cal/'
        self.test_images = glob.glob('test_images/*.jpg')
        self.output_image_path = 'output_images/test_'
        self.output_movie_path = 'output_movies/done_'


    def __set_hyper_parameters(self):
        self.img_size   = (1280, 720) # (x,y) values for img size (cv2 uses this)
        self.img_shape  = (self.img_size[1], self.img_size[0]) # (y,x) As numpy spits out
        return

    def __set_kernels(self):
        """Kernels used for image processing"""
        self.clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))


    def __set_perspective(self):
        """The src points draw a persepective trapezoid, the dst points draw
        them as a square.  M transforms x,y from trapezoid to square for
        a birds-eye view.  M_inv does the inverse.
        """

        src = np.float32([[(.42 * self.img_shape[1],.65 * self.img_shape[0] ),
                           (.58 * self.img_shape[1], .65 * self.img_shape[0]),
                           (0 * self.img_shape[1],self.img_shape[0]),
                           (1 * self.img_shape[1], self.img_shape[0])]])

        dst = np.float32([[0,0],
                          [self.img_shape[1],0],
                          [0,self.img_shape[0]],
                          [self.img_shape[1],self.img_shape[0]]])

        self.M = cv2.getPerspectiveTransform(src, dst)
        self.M_inv = cv2.getPerspectiveTransform(dst, src)

    def __set_mask_regions(self):
        """These are verticies used for clipping the image.
        """
        self.bottom_clip = np.int32(np.int32([[[60,0], [1179,0], [1179,650], [60,650]]]))
        self.roi_clip =  np.int32(np.int32([[[640, 425], [1179,550], [979,719],
                              [299,719], [100, 550], [640, 425]]]))
class LaneFinder(object):
    """
    The mighty LaneFinder takes in a video from the front camera of a self driving car
    and produces a new video with the traffic lanes highlighted and statistics about where
    the car is relative to the center of the lane shown.
    """    

    def __init__(self):

        self.g             = GlobalObjects()        
        self.thresholder   = tools.ImageThresholder()
        self.distCorrector = tools.DistortionCorrector(self.g.camera_cal_folder)
        self.histFitter    = tools.HistogramLineFitter()
        self.laneDrawer    = tools.LaneDrawer()
        self.leftLane      = tools.Line()
        self.rightLane     = tools.Line()

        return

    def __image_pipeline(self, img):
        """The pipeline for processing images. Globals g are added to functions that need
        access to global variables.
        """
        resized     = self.__resize_image(img)
        undistorted = self.__correct_distortion(resized)
        warped      = self.__warp_image_to_biv(undistorted)
        thresholded = self.__threshold_image(warped)
        lines       = self.__get_lane_lines(thresholded)
        result      = self.__draw_lane_lines(undistorted, thresholded, include_stats=False)

        return result


    def __draw_lane_lines(self, undistorted, thresholded, include_stats):

        lines = {'left_line': self.leftLane,
                 'right_line': self.rightLane }

        return self.laneDrawer.draw_lanes(undistorted,
                                          thresholded,
                                          lines,
                                          self.g.M_inv,
                                          include_stats)

    def __get_lane_lines(self, img):

        self.leftLane    = self.histFitter.get_line(img, self.leftLane, 'left')
        self.rightLane   = self.histFitter.get_line(img, self.rightLane, 'right')

        return True

    def __mask_region(self, img, vertices):
        """
        Masks a region specified by clockwise vertices.
        """

        mask = np.zeros_like(img)   
        if len(img.shape) > 2:
            channel_count = img.shape[2]  # i.e. 3 or 4 depending on your image
            ignore_mask_color = (255,) * channel_count
        else:
            ignore_mask_color = 255
        cv2.fillConvexPoly(mask, vertices, ignore_mask_color)
        masked_image = cv2.bitwise_and(img, mask)
        return masked_image 

    def __resize_image(self, img):
        """
        Image is resized to the selected size for the project.
        """
        return cv2.resize(img, self.g.img_size, 
                          interpolation = cv2.INTER_CUBIC)

    def __correct_distortion(self, img):
        return self.distCorrector.undistort(img)

    def __threshold_image(self, img):
        return self.thresholder.get_thresholded_image(img)

    def __warp_image_to_biv(self, img):
        return cv2.warpPerspective(img, self.g.M, self.g.img_size)


    def test_one_image(self, pt):
        image = (mpimg.imread(pt))
        return self.__image_pipeline(image)

VIII. Final Result

%matplotlib inline
obj = LaneFinder()
result = obj.test_one_image('./output/visualize_test6.jpg')
print(type(result), result.shape)

plt.figure(figsize=(15,12))
plt.imshow(result)
plt.savefig('result.png')
plt.show()

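Summary

This project used the high-level API provided by PaddleX to train and deploy the vehicle detection model for a driverless-driving task quickly and efficiently. My strongest impression is that Paddle gives developers a very good development environment. The whole workflow can be used or integrated through the Python API, which offers comprehensive, flexible, and open deep learning functionality, leaves more room for customization, lowers the barrier to deploying industrial models quickly, and comes with application-level software and visualization services.
Dataset and model selection: For training I settled on the UA-DETRAC dataset, which I also converted to VOC format and published on AI Studio. For the model I settled on the YOLOv3 provided by PaddleX. The algorithm performs well on public datasets such as COCO and VOC, and in my experience it has also generalized better than other algorithms on other tasks.
Development process: The initial results were not ideal, with an mAP of only about 0.64 on UA-DETRAC. I tried adjusting hyperparameters such as the learning rate and batch size and used different data augmentation methods, but the improvement was negligible. Finally, on consulting the original paper, I found that YOLOv3 uses K-means to obtain the anchor box sizes. After making that change and retraining, detection accuracy improved considerably (mAP of about 0.79).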

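Drawbacks of hand-set anchor sizes:

Before the change, the anchors used the default values. Although these anchors provide ROIs of different sizes and aspect ratios, for a specific task some anchor sizes do not represent the targets well and can even add unnecessary computation. For small-object detection, for example, the larger anchors are almost never selected as positive samples. And if anchor sizes differ greatly from target sizes, detection quality suffers.

Joseph Redmon and the other YOLO authors recommend using K-means clustering instead of manual design: clustering the ground-truth boxes of the training set automatically produces a set of anchor sizes better suited to the dataset, which can improve the network's detection results.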

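Obtaining anchor sizes with K-means:

Joseph Redmon wanted anchors to be as similar as possible to the target boxes and as close to them as possible, so he proposed the following metric d for choosing anchor sizes:

d(box, anchor) = 1 - IOU(box, anchor)

where IOU is the intersection over union of the ground-truth box and the anchor box.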
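
A small worked example may help here. It assumes the distance is d = 1 - IOU, which matches the line distances = 1 - self.ious_ in the clustering code later in this article, and it compares boxes by width and height only, as if they shared a corner. The function names wh_iou and anchor_distance are hypothetical.

```python
def wh_iou(box_wh, anchor_wh):
    """IOU of two boxes given as (width, height), aligned at one corner."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def anchor_distance(box_wh, anchor_wh):
    """Distance used for clustering: 0 for identical shapes, near 1 for very different ones."""
    return 1.0 - wh_iou(box_wh, anchor_wh)


# A 20x10 box against a 10x10 anchor:
# intersection = 10*10 = 100, union = 200 + 100 - 100 = 200, IOU = 0.5
print(anchor_distance((20, 10), (10, 10)))  # 0.5
```

Unlike Euclidean distance on widths and heights, this measure does not penalize large boxes more than small ones for the same relative mismatch.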

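The final algorithm is therefore:

1. Randomly select K boxes as the initial anchors;
2. Using the IOU metric, assign each box to the anchor nearest to it;
3. Compute the mean width and height of all boxes in each cluster and update the anchors;
4. Repeat steps 2 and 3 until the anchors no longer change or the maximum number of iterations is reached.

The anchor sizes obtained on the UA-DETRAC dataset are: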

(13,11),(17,15),(23,17),(29,23),

(41,29),(68,33),(51,46),(93,57),(135,95)

Reference code:

https://github.com/ybcc2015/DeepLearning-Utils/tree/master/Anchor-Kmeans

def iou(boxes, anchors):
    # Compute IOU from widths and heights only (boxes and anchors aligned at one corner)
    w_min = np.minimum(boxes[:, 0, np.newaxis], anchors[np.newaxis, :, 0])
    h_min = np.minimum(boxes[:, 1, np.newaxis], anchors[np.newaxis, :, 1])
    inter = w_min * h_min

    box_area = boxes[:, 0] * boxes[:, 1]
    anchor_area = anchors[:, 0] * anchors[:, 1]
    union = box_area[:, np.newaxis] + anchor_area[np.newaxis]

    return inter / (union - inter)

def fit(self, boxes):
        if self.n_iter > 0:
            self.n_iter = 0

        np.random.seed(self.random_seed)
        n = boxes.shape[0]

        # Initialize the anchors from randomly chosen boxes
        self.anchors_ = boxes[np.random.choice(n, self.k, replace=True)]
        self.labels_ = np.zeros((n,))

        while True:
            self.n_iter += 1
            if self.n_iter > self.max_iter:
                break

            self.ious_ = self.iou(boxes, self.anchors_)
            distances = 1 - self.ious_
            cur_labels = np.argmin(distances, axis=1)

            # If the assignments no longer change, the clustering has converged; stop iterating
            if (cur_labels == self.labels_).all():
                break

            # Update the anchor sizes
            for i in range(self.k):
                self.anchors_[i] = np.mean(boxes[cur_labels == i], axis=0)

            self.labels_ = cur_labels
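
The excerpt above shows two pieces of a larger class. For readers who want something they can run directly, here is a self-contained sketch of the same clustering loop. It differs from the excerpt in ways that are my own assumptions: the initial anchors are drawn without replacement, an empty cluster keeps its previous anchor instead of producing NaN, and the result is sorted by area. The function names wh_iou_matrix and anchor_kmeans are hypothetical.

```python
import numpy as np


def wh_iou_matrix(boxes, anchors):
    """Pairwise IOU between (N, 2) boxes and (K, 2) anchors given as (w, h)."""
    w_min = np.minimum(boxes[:, 0, None], anchors[None, :, 0])
    h_min = np.minimum(boxes[:, 1, None], anchors[None, :, 1])
    inter = w_min * h_min
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union


def anchor_kmeans(boxes, k, max_iter=300, seed=0):
    """Cluster (w, h) pairs with distance 1 - IOU; returns (k, 2) anchors."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    labels = np.full(len(boxes), -1)

    for _ in range(max_iter):
        # Assign each box to the anchor with the smallest 1 - IOU distance.
        new_labels = np.argmin(1.0 - wh_iou_matrix(boxes, anchors), axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the anchors are too
        labels = new_labels
        for i in range(k):
            members = boxes[labels == i]
            if len(members) > 0:  # an empty cluster keeps its old anchor
                anchors[i] = members.mean(axis=0)

    # Sort small to large, the order used by the default anchor list.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]


# Toy data: two obvious size groups
toy = np.array([[10, 10]] * 50 + [[100, 100]] * 50)
print(anchor_kmeans(toy, k=2))
```

On real data, boxes would hold the width and height of every ground-truth box in the training set with k=9, and the rounded result could then be supplied through the anchors parameter described in the YOLOv3 parameter list.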

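The object detection scenario in this case can also be implemented with PaddleDetection, PaddlePaddle's object detection suite, which provides a more specialized end-to-end development kit and tools. Interested readers are welcome to try it out.

PaddleDetection GitHub repository: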

https://github.com/PaddlePaddle/PaddleDetection

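More Resources

If you run into problems, you can join the official PaddlePaddle QQ group to discuss them: 703252161.

PaddleX technical discussion QQ group: 1045148026

To learn more about PaddlePaddle, see the following resources.

Official website: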

https://www.paddlepaddle.org.cn

For more ways to use PaddleX, visit the project repositories:

GitHub:

https://github.com/PaddlePaddle/PaddleX

Gitee:

https://gitee.com/paddlepaddle/PaddleX

PaddlePaddle open-source framework repositories:

GitHub:

https://github.com/PaddlePaddle/Paddle

Gitee:

https://gitee.com/paddlepaddle/Paddle

END

PaddleX for Driverless Driving: Hands-On Vehicle Detection and Lane Line Segmentation with YOLOv3
