CVPR 2019 Image Segmentation Paper Reading (Part 1)

Original post

2019-07-30 03:42

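Since I am mainly studying semantic image segmentation, this series focuses on the semantic segmentation papers from CVPR.

DFANet: https://arxiv.org/abs/1904.02216

The paper comes from Megvii (曠視).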

Abstract: This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet with 8× less FLOPs and 2× faster than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% Mean IOU with 3.4 GFLOPs while inferring on a higher resolution image.

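My notes on the abstract, sentence by sentence:

The paper introduces DFANet, an extremely efficient CNN architecture for semantic segmentation under resource constraints.

The proposed network starts from a single lightweight backbone and aggregates discriminative features through a sub-network cascade and a sub-stage cascade, respectively.

Thanks to multi-scale feature propagation, DFANet greatly reduces the number of parameters while still obtaining a sufficient receptive field and improving the model's learning ability, striking a balance between speed and segmentation performance.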

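Experiments on the Cityscapes and CamVid datasets demonstrate DFANet's superior performance: it needs 8× fewer FLOPs and runs 2× faster than existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. (Here FLOPs, with a lowercase s, means floating-point operations, i.e. the amount of computation one forward pass requires; it has nothing to do with the flop in Texas hold'em. It is easy to confuse with FLOPS, floating-point operations per second, which describes hardware throughput.) The unit prefixes are the same in both cases:
one MFLOPS (megaFLOPS) is one million (10^6) floating-point operations per second,
one GFLOPS (gigaFLOPS) is one billion (10^9) floating-point operations per second,
one TFLOPS (teraFLOPS) is one trillion (10^12) floating-point operations per second,
one PFLOPS (petaFLOPS) is one quadrillion (10^15) floating-point operations per second.
So the 1.7 GFLOPs quoted in the abstract means 1.7 × 10^9 floating-point operations per forward pass.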
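To make the difference between an operation count (FLOPs) and a throughput (FLOPS) concrete, here is a minimal sketch. It assumes the common convention that one multiply-accumulate counts as two floating-point operations (some papers count it as one), and the layer shape and device throughput are made-up numbers for illustration, not values from the paper.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    """Operation count of one standard k x k convolution layer.

    Assumption: 1 multiply-accumulate = 2 floating-point operations.
    Bias and activation costs are ignored.
    """
    macs = h_out * w_out * c_in * c_out * k * k
    return 2 * macs


def ideal_seconds(total_flops, device_flops_per_second):
    """Lower bound on runtime: operation count divided by peak throughput."""
    return total_flops / device_flops_per_second


if __name__ == "__main__":
    # Hypothetical layer: 128 x 128 output map, 64 -> 128 channels, 3 x 3 kernel.
    layer = conv2d_flops(128, 128, 64, 128, 3)
    print(f"layer cost: {layer / 1e9:.2f} GFLOPs")

    # The 1.7 GFLOPs figure is from the abstract; the 1 TFLOPS device is hypothetical.
    print(f"ideal time: {ideal_seconds(1.7e9, 1e12) * 1e3:.2f} ms")
```

Real inference time is usually well above this bound because memory traffic and kernel launch overhead are not captured by the operation count, which is why papers report measured FPS alongside FLOPs.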

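Overall, the segmentation accuracy is around 70% (mean IoU). For reference: Mask R-CNN, 91% mAP.

I have not yet worked out whether mean IoU and mAP can be compared side by side.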
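On the metric question: mean IoU is computed per pixel (intersection over union for each class, averaged over classes), whereas mAP averages precision over detected instances, so the two numbers measure different things and are not directly comparable. A minimal sketch of mean IoU follows; the function name and the flat-list input format are my own choices for illustration, and real benchmarks accumulate the counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for flat sequences of class ids.

    Classes that appear in neither pred nor target are skipped so that
    they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou(pred, target, num_classes=2))
```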

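Summary: DFANet is a model designed for resource-constrained settings; its defining feature is that it is built around lightweight layers.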

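Introduction:

The introduction points out that although semantic segmentation has progressed well, real-time segmentation is still not mature. High-accuracy segmentation on powerful hardware works fine, but segmentation under tight resource constraints has received far less attention.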

To reduce computation, many approaches either shrink the input image or prune redundant channels in the network (see "A deep convolutional encoder-decoder architecture for image segmentation" and Adam Paszke, Abhishek Chaurasia, Sangpil Kim, and Eugenio Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147, 2016).

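Both approaches are rather crude. They assume that cutting the resolution makes the model run faster, but what is actually lost is spatial information, so small objects can no longer be segmented properly.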
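The loss of small objects is easy to see on a toy label map. The sketch below uses a made-up 8 × 8 mask and plain nearest-neighbour subsampling (one of several possible downsampling methods, chosen here only for simplicity): a one-pixel object disappears entirely, while a large region survives.

```python
def make_mask():
    """Hypothetical 8 x 8 label map: background 0, a 1-pixel object (class 1),
    and a 4 x 4 region (class 2)."""
    mask = [[0] * 8 for _ in range(8)]
    mask[3][5] = 1
    for r in range(4, 8):
        for c in range(4):
            mask[r][c] = 2
    return mask


def downsample_nearest(mask, factor):
    """Keep every `factor`-th row and column (nearest-neighbour subsampling)."""
    return [row[::factor] for row in mask[::factor]]


def classes_present(mask):
    """Set of class ids that occur anywhere in the mask."""
    return {value for row in mask for value in row}


if __name__ == "__main__":
    full = make_mask()
    half = downsample_nearest(full, 2)
    print("before:", sorted(classes_present(full)))
    print("after: ", sorted(classes_present(half)))
```

Once the small object is gone from the input, no amount of network capacity can recover it, which is the spatial-information argument made above.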

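Two further papers (not listed here) build multi-branch architectures to fuse spatial information. The idea works, but it puts real pressure on compute resources.

This leads to the paper at hand, which needs neither heavy compute nor gives up real-time speed.

The authors also bring up spatial pyramid pooling and, as usual in this line of work, make a point of the information lost in convolutional layers.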

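They then propose two strategies. First, reuse the high-level features and connect them to every level.

Second, combine features from different stages along the network's processing path to strengthen the feature representation.

(At first glance, how do these two differ from FCN? The figure in the paper is the most direct way to see it.)

More specifically, the design fuses the spatial pyramid and multi-scale pooling ideas, and it comes down to three aspects: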
