The Road to Advanced Semantic Segmentation: Looking Back at CVPR 2015 (Part 4)

Deep Hierarchical Parsing for Semantic Segmentation
Abhishek Sharma, Oncel Tuzel, David W. Jacobs

 

 

Abstract:

This paper proposes a learning-based approach to scene parsing inspired by the deep Recursive Context Propagation Network (RCPN). RCPN is a deep feed-forward neural network that utilizes the contextual information from the entire image, through bottom-up followed by top-down context propagation via random binary parse trees. This improves the feature representation of every super-pixel in the image for better classification into semantic categories. We analyze RCPN and propose two novel contributions to further improve the model. We first analyze the learning of RCPN parameters and discover the presence of bypass error paths in the computation graph of RCPN that can hinder contextual propagation. We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function. Secondly, we use an MRF on the parse tree nodes to model the hierarchical dependency present in the output. Both modifications provide performance boosts over the original RCPN and the new system achieves state-of-the-art performance on Stanford Background, SIFT-Flow and Daimler urban datasets.
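To make the bottom-up and top-down propagation in the abstract concrete, here is a minimal sketch of an RCPN-style forward pass in plain Python. It is an illustration under assumptions, not the authors' implementation: every module is reduced to a single tanh layer with random weights, the super-pixel adjacency graph is assumed to be connected, and all names (`random_parse_tree`, `rcpn_forward`, `init_params`) are made up for this example.

```python
import math
import random

def matvec(W, x):
    """Plain-list matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def layer(W, x):
    """One dense tanh layer; an assumed stand-in for each RCPN module."""
    return [math.tanh(v) for v in matvec(W, x)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(d_feat, d_sem, n_classes, rng):
    """Random weights for the four modules (hypothetical sizes)."""
    return {
        "mapper": rand_matrix(d_sem, d_feat, rng),
        "combiner": rand_matrix(d_sem, 2 * d_sem, rng),
        "decombiner": rand_matrix(d_sem, 2 * d_sem, rng),
        "categorizer": rand_matrix(n_classes, d_sem, rng),
    }

def random_parse_tree(n_leaves, adjacency, rng):
    """Merge randomly chosen adjacent regions until a single root remains.

    Leaves are 0..n_leaves-1; each merge creates the next node id, so a
    parent always has a larger id than its children.
    Assumes the adjacency graph is symmetric and connected.
    """
    neighbors = {i: set(adjacency[i]) for i in range(n_leaves)}
    children = {}
    next_id = n_leaves
    while len(neighbors) > 1:
        a = rng.choice(sorted(k for k in neighbors if neighbors[k]))
        b = rng.choice(sorted(neighbors[a]))
        merged = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
        for n in merged:
            neighbors[n] -= {a, b}
            neighbors[n].add(next_id)
        neighbors[next_id] = merged
        children[next_id] = (a, b)
        next_id += 1
    return children, next_id - 1

def rcpn_forward(features, adjacency, params, rng):
    """Return one label distribution per super-pixel."""
    n = len(features)
    children, root = random_parse_tree(n, adjacency, rng)
    # Semantic mapper: visual features -> semantic space.
    sem = {i: layer(params["mapper"], f) for i, f in enumerate(features)}
    # Bottom-up: the combiner merges two child features into the parent.
    for node in sorted(children):
        left, right = children[node]
        sem[node] = layer(params["combiner"], sem[left] + sem[right])
    # Top-down: the decombiner mixes a node's own feature with its parent's
    # context feature. The node's own feature enters here directly.
    ctx = {root: sem[root]}
    for node in sorted(children, reverse=True):
        for child in children[node]:
            ctx[child] = layer(params["decombiner"], sem[child] + ctx[node])
    # Categorizer: context-enhanced feature -> label distribution.
    return [softmax(matvec(params["categorizer"], ctx[i])) for i in range(n)]

# Toy example: 4 super-pixels in a chain, 5-dim features, 2 classes.
rng = random.Random(0)
features = [[rng.random() for _ in range(5)] for _ in range(4)]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
params = init_params(d_feat=5, d_sem=3, n_classes=2, rng=rng)
probs = rcpn_forward(features, adjacency, params, rng)
```

Because the tree is random, two calls with different seeds generally propagate context along different merge orders. Note how each leaf's own semantic feature feeds the decombiner directly; in this sketch that is the kind of direct path a network could lean on while ignoring the combiner.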

 


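Having just read the abstract, the same familiar words show up again: contextual information. This paper is really an improvement on the RCPN algorithm, and it improves two things. The first reminds me of ResNet, since both are about direct (bypass) paths in the computation graph. I checked: ResNet came out in late 2015, after CVPR 2015, so this paper was discussing bypass paths earlier. The direction is the opposite, though. ResNet adds shortcut connections on purpose, whereas here the bypass path is treated as a problem and is countered by adding the classification loss of the internal parse-tree nodes. The second improvement is a Markov random field on the parse tree.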

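From just these two CVPR segmentation papers, one can already see that image segmentation research in 2015 centered on context, convolutional neural networks, and various conditional random fields. Undeniably, these ideas are still the soul of semantic segmentation today.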

 

 

1. Introduction

Semantic segmentation refers to the problem of labeling every pixel in an image with the correct semantic category. Handling the immense variability in the appearance of semantic categories requires the use of context to achieve human-level accuracy, as shown, for example, by [24, 14, 13]. Specifically, [14, 13] found that human performance in labeling a super-pixel is worse than a computer when both have access to that super-pixel only. Effectively using context presents a significant challenge, especially when a real-time solution is required. An elegant deep recursive neural network approach for semantic segmentation was proposed in [19], referred to as RCPN. The main idea was to facilitate the propagation of contextual information from each super-pixel to every other super-pixel through random binary parse trees. First, a semantic mapper mapped visual features of the super-pixels into a semantic space. This was followed by a recursive combination of semantic features of two adjacent image regions, using a combiner, to yield the holistic feature vector of the entire image, termed the root feature. Next, the global information contained in the root feature was disseminated to every super-pixel in the image, using a decombiner, followed by classification of each super-pixel via a categorizer. The parameters were learned by minimizing the classification loss of the super-pixels by backpropagation through structure [5]. RCPN was shown to outperform recent approaches in terms of per-pixel accuracy (PPA) and mean-class accuracy (MCA). Most interestingly, it was almost two orders of magnitude faster than competing algorithms. RCPN’s speed and state-of-the-art performance motivate us to carefully analyze it. In this paper we show that it still has some weaknesses and we show how to remedy them. In particular, the direct path from the semantic mapper to the categorizer gives rise to bypass errors that can cause RCPN to bypass the combiner and decombiner assembly. 
This can cause back-propagation to reduce RCPN to a simple multilayer neural network for each super-pixel. We propose modifications to RCPN that overcome this problem:

1. Pure-node RCPN - We improve the loss function by adding the classification loss of those internal nodes of the random parse trees that correspond to a single semantic category, referred to as pure-nodes. This serves three purposes.

a) It provides more labels for training, which results in better generalization.

b) It encourages stronger gradients deep in the network.

c) Lastly, it tackles the problem of bypass errors, resulting in better use of contextual information.

2. Tree MRF RCPN - Pure-node RCPN also provides us with reliable estimates of the internal node label distributions. We utilize the label distribution of the internal nodes to define a tree-style MRF on the parse tree to model the hierarchical dependency between the nodes.

The resulting architectures provide promising improvements over the previous state-of-the-art on three semantic segmentation datasets: Stanford background [6], SIFT flow [11] and Daimler urban [16]. The next section describes some of the related works followed by a brief overview of RCPN in Sec. 3. We describe our proposed methods in Sec. 4 followed by experiments in Sec. 5. Finally, we conclude in Sec. 6.
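The two contributions listed in the introduction can be sketched on a toy parse tree. This is a rough illustration under assumptions, not the paper's formulation: the extra loss is an unweighted sum of cross-entropy terms, the tree MRF uses a simple Potts-style penalty when parent and child labels differ (the paper's actual potentials are not given in this excerpt), and the names `pure_nodes`, `pure_node_loss`, and `tree_map` are invented for this example.

```python
import math

# Toy parse tree: leaves 0..3 are super-pixels, 4..6 are internal nodes.
# Parents always have larger ids than their children; 6 is the root.
CHILDREN = {4: (0, 1), 5: (2, 3), 6: (4, 5)}
ROOT = 6

def pure_nodes(children, leaf_labels):
    """Internal nodes whose leaves all share one label, mapped to that label."""
    sets = {i: {lab} for i, lab in enumerate(leaf_labels)}
    for node in sorted(children):  # children are visited before parents
        left, right = children[node]
        sets[node] = sets[left] | sets[right]
    return {n: next(iter(sets[n])) for n in children if len(sets[n]) == 1}

def pure_node_loss(probs, children, leaf_labels):
    """Cross-entropy over leaves plus pure internal nodes.

    The unweighted sum is an assumption made for illustration.
    """
    targets = dict(enumerate(leaf_labels))
    targets.update(pure_nodes(children, leaf_labels))
    return -sum(math.log(probs[n][lab]) for n, lab in targets.items())

def tree_map(probs, children, root, n_labels, penalty=1.0):
    """MAP labels on the tree by max-product in log space.

    Unary term: log of each node's predicted label probability.
    Pairwise term (assumed Potts-style): -penalty if parent != child.
    """
    def pair(parent_lab, child_lab):
        return 0.0 if parent_lab == child_lab else -penalty

    msg, arg = {}, {}

    def score(node, lab):
        # Unary score plus the best scores of all child subtrees.
        s = math.log(probs[node][lab])
        for ch in children.get(node, ()):
            s += msg[ch][lab]
        return s

    # Upward pass: each non-root node sends its parent one message per label.
    for node in sorted(probs):
        if node == root:
            continue
        msg[node], arg[node] = [], []
        for p in range(n_labels):
            best = max(range(n_labels),
                       key=lambda c: score(node, c) + pair(p, c))
            msg[node].append(score(node, best) + pair(p, best))
            arg[node].append(best)

    # Downward pass: pick the root label, then backtrack to the leaves.
    labels = {root: max(range(n_labels), key=lambda c: score(root, c))}
    for node in sorted(children, reverse=True):
        for ch in children[node]:
            labels[ch] = arg[ch][labels[node]]
    return labels

# Leaf 1 weakly prefers label 1, but its sibling and parent prefer label 0.
PROBS = {
    0: [0.9, 0.1], 1: [0.45, 0.55], 2: [0.1, 0.9], 3: [0.2, 0.8],
    4: [0.9, 0.1], 5: [0.1, 0.9], 6: [0.6, 0.4],
}
labels = tree_map(PROBS, CHILDREN, ROOT, n_labels=2)
```

With ground-truth leaf labels `[0, 0, 1, 1]`, nodes 4 and 5 are pure and the root is mixed, so the loss gains two extra supervised terms. In the MRF example, leaf 1 ends up with label 0 even though its own distribution slightly favors label 1, because agreeing with its parent outweighs the small unary difference.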

 


 

The introduction starts by laying out what the paper builds on: RCPN and super-pixels.

 

 

 

 
