一種基於目標檢測實現黑花屏分類任務的方案

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"視頻幀的黑、花屏的檢測是視頻質量檢測中比較重要的一部分,傳統做法是由測試人員通過肉眼來判斷視頻中是否有黑、花屏的現象,這種方式不僅耗費人力且效率較低。爲了進一步節省人力、提高效率,一種自動的檢測方法是大家所期待的。目前,通過分類網絡模型對視頻幀進行分類來自動檢測是否有黑、花屏是比較可行且高效的。然而,在項目過程中,視頻幀數據的收集比較困難,數據量較少,部分花屏和正常屏之間差異不夠明顯,導致常用的分類算法難以滿足項目對分類準確度的要求。因此本文嘗試了一種利用目標檢測算法實現分類的方式,幫助改善單純的分類的算法效果不夠理想的問題。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"核心技術與架構圖"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般分類任務的流程如下圖,首先需要收集數據,構成數據集;併爲每一類數據定義一個類型標籤,例如:0、1、2;再選擇一個合適的分類網絡進行分類模型的訓練,圖像分類的網絡有很多,常見的有VggNet, ResNet,DenseNet等;最後用訓練好的模型對新的數據進行預測,輸出新數據的類別。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/54\/548a38c2833d0332eb8a1a0be8b9b9a7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目標檢測任務的流程不同於分類任務,其在定義類別標籤的時候還需要對目標位置進行標註;目標檢測的方法也有很多,例如Fast R-CNN, SSD,YOLO等;模型訓練的中間過程也比分類模型要複雜,其輸出一般爲目標的位置、目標置信度以及分類結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/40\/40453baaef2bc8d36f702089a1c8c407.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於分類算法依賴於一定量的數據,在項目實踐中,數據量較少或圖像類間差異較小時,傳統分類算法效果不一定能滿足項目需求。這時,不妨考慮用目標檢測的方式來做‘分類’。接下來以Yolov5爲例來介紹如何將目標檢測框架用於實現單純的分類任務。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"技術實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了分類之外,目標檢測還可以從自然圖像中的大量預定義類別中識別出目標實例的位置。大家可能會考慮目標檢測模型用於分類是不是過於繁瑣或者用目標檢測框架來做單純的分類對代碼的修改比較複雜。這裏,我們將用一種非常簡單的方式直接在數據標註和輸出內容上稍作修改就能實現單純的分類了。接下來將介紹一下具體實現方法:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據的標註"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實現目標檢測時,需要對數據中的目標進行標註,這一過程是十分繁瑣的。但在用於純粹的分類上可以將這一繁瑣過程簡單化,無需手動標註,直接將整張圖作爲我們的目標,目標中心也就是圖像的中心點。只需讀取整張圖像,獲得其長、寬以及中心點的座標就可以完成標註了。並定義好類別標籤,正常屏爲0,花屏爲:1,黑屏爲2。具體實現如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"OBJECT_DICT = {\"Normalscreen\": 0, \"Colorfulscreen\": 1, \"Blackscreen\": 2}\ndef parse_json_file(image_path):\n imageName = os.path.basename(image_path).split('.')[0]\n img = cv2.imread(image_path)\n size = img.shape\n label = image_path.split('\/')[4].split('\\\\')[0]\n label = OBJECT_DICT.get(label)\n imageWidth = size[0]\n imageHeight = size[1]\n label_dict = {}\n xmin, ymin = (0, 0)\n xmax, ymax = (imageWidth, imageHeight)\n xcenter = (xmin + xmax) \/ 2\n xcenter = xcenter \/ float(imageWidth)\n ycenter = (ymin + ymax) \/ 2\n ycenter = ycenter \/ float(imageHeight)\n width = ((xmax - xmin) \/ float(imageWidth))\n heigt = ((ymax - ymin) \/ float(imageHeight))\n label_dict.update({label: [str(xcenter), str(ycenter), str(width), str(heigt)]})\n label_dict = sorted(label_dict.items(), key=lambda x: x[0])\n return imageName, label_dict"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"訓練過程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該過程與目標檢測的訓練過程一致,不需要進行大的修改,只需要根據數據集的特性對參數進行調整。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"# 加載數據,獲取訓練集、測試集圖片路徑\nwith open(opt.data) as f:\n data_dict = yaml.load(f, Loader=yaml.FullLoader) \n with torch_distributed_zero_first(rank):\n check_dataset(data_dict) \ntrain_path = data_dict['train']\ntest_path = data_dict['val']\nNumber_class, names = (1, ['item']) if opt.single_cls else (int(data_dict['nc']), data_dict['names']) \n\n# 創建模型\nmodel = Model(opt.cfg, ch=3, nc=Number_class).to(device)\n\n# 學習率的設置\nlf = lambda x: ((1 + math.cos(x * math.pi \/ epochs)) \/ 2) * (1 - hyp['lrf']) + hyp['lrf'] \nscheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)\n\n# 訓練\nfor epoch in range(start_epoch, epochs): \nmodel.train()"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"損失的計算"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"損失由三部分組成,邊框損失,目標損失,分類損失,具體如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"def compute_loss(p, targets, model):\n device = targets.device\n loss_cls, loss_box, loss_obj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)\n tcls, tbox, indices, anchors = build_targets(p, targets, model) \nh = model.hyp\n # 定義損失函數\n BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['cls_pw']])).to(device)\n BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([h['obj_pw']])).to(device)\n cp, cn = smooth_BCE(eps=0.0)\n # 損失\n nt = 0 \n np = len(p) \n balance = [4.0, 1.0, 0.4] if np == 3 else [4.0, 1.0, 0.4, 0.1] \nfor i, pi in enumerate(p): \n image, anchor, gridy, gridx = indices[i] \n tobj = torch.zeros_like(pi[..., 0], device=device) \n n = image.shape[0] \n if n:\n nt += n # 計算目標\n ps = pi[anchor, image, gridy, gridx]\n pxy = ps[:, :2].sigmoid() * 2. - 0.5\n pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]\n predicted_box = torch.cat((pxy, pwh), 1).to(device) giou = bbox_iou(predicted_box.T, tbox[i], x1y1x2y2=False, CIoU=True) \n loss_box += (1.0 - giou).mean() \n tobj[image, anchor, gridy, gridx] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) \n if model.nc > 1:\n t = torch.full_like(ps[:, 5:], cn, device=device)\n t[range(n), tcls[i]] = cp\n loss_cls += BCEcls(ps[:, 5:], t) \n loss_obj += BCEobj(pi[..., 4], tobj) * balance[i] \n s = 3 \/ np\n loss_box *= h['giou'] * s\n loss_obj *= h['obj'] * s * (1.4 if np == 4 else 1.)\n loss_cls *= h['cls'] * s\n bs = tobj.shape[0]\n loss = loss_box + loss_obj + loss_cls\n return loss * bs, torch.cat((loss_box, loss_obj, loss_cls, loss)).detach()"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"對輸出內容的處理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進行預測時,會得到所有檢測到的目標的位置(x,y,w,h),objectness置信度和分類結果。由於最終目的是對整張圖進行分類,可以忽略位置信息,重點考慮置信度和分類結果:將檢測到的目標類別作爲分類結果,如果同時檢測出多個目標,可以將置信度最大的目標的類別作爲分類結果。代碼如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"def detect(opt,img):\n out, source, weights, view_img, save_txt, imgsz = \\\n opt.output, img, opt.weights, opt.view_img, opt.save_txt, opt.img_size\n device = select_device(opt.device)\n half = device.type != 'cpu'\n model = experimental.attempt_load(weights, map_location=device)\n imgsz = check_img_size(imgsz, s=model.stride.max())\n if half:\n model.half()\n img = letterbox(img)[0]\n img = img[:, :, ::-1].transpose(2, 0, 1)\n img = np.ascontiguousarray(img)\n img_warm = torch.zeros((1, 3, imgsz, imgsz), device=device)\n _ = model(img_warm.half() if half else img_warm) if device.type != 'cpu' else None \n img = torch.from_numpy(img).to(device)\n img = img.half() if half else img.float()\n img \/= 255.0\n if img.ndimension() == 3:\n img = img.unsqueeze(0)\n pred = model(img, augment=opt.augment)[0]\n # 應用非極大值抑制\n pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)\n # 處理檢測的結果\n for i, det in enumerate(pred): \n if det is not None and len(det):\n det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img.shape).round()\n all_conf = det[:, 4]\n if len(det[:, -1]) > 1:\n ind = torch.max(all_conf, 0)[1]\n c = torch.take(det[:, -1], ind)\ndetect_class = int(c)\n else:\n for c in det[:, -1]:\n detect_class = int(c)\n return detect_class"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"效果展示"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了將視頻幀進行黑、花屏分類,測試人員根據經驗將屏幕分爲正常屏(200張)、花屏(200張)和黑屏(200張)三類,其中正常屏幕標籤爲0,花屏的標籤爲1,黑屏的標籤爲2。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/40\/4065c690a71462e57161dcbd0d1e9994.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了進一步說明該方法的有效性,我們將基於Yolov5的‘分類’效果與ResNet分類效果做了對比。根據測試人員對ResNet分類效果的反饋來看,ResNet模型容易將正常屏與花屏錯誤分類,例如,下圖被測試人員定義爲正常屏:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8d\/8db7c8f3f65c85feecc93990be21e614.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ResNet的分類結果爲1,即爲花屏,顯然,這不是我們想要的結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8d\/8db7c8f3f65c85feecc93990be21e614.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於Yolov5的分類結果爲0,即爲正常屏,這是我們所期待的結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7a\/7affda10b525aa092cf1c16338098e0c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時,通過對一批測試數據的分類效果來看,Yolov5的分類效果比ResNet的分類準確度更高,ResNet的分類準確率爲88%,而基於Yolov5的分類準確率高達97%。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於較小數據集的黑、花屏的分類問題,採用Yolov5來實現分類相較於ResNet的分類效果會更好一些。當我們在做圖像分類任務時,純粹的分類算法不能達到想要的效果時,不妨嘗試一下用目標檢測框架來分類吧!雖然過程稍微複雜一些,但可能會有不錯的效果。目前目標檢測框架有很多,用它們完成分類任務的處理方式大致和本文所描述的類似,可以根據數據集的特徵選擇合適目標檢測架構來實現分類。本文主要介紹瞭如何將現有的目標檢測框架直接用於單純的圖像分類任務,當然,爲了使得結構更簡潔,也可以將目標檢測中的分類網絡提取出來用於分類。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:360技術(ID:qihoo_tech)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/JpQx8UzzcDD3jtQPbe4tvA","title":"xxx","type":null},"content":[{"type":"text","text":"一種基於目標檢測實現黑花屏分類任務的方案"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章