Amazon Proposes a Scale-Aware Attention Network for Crowd Counting

Original

osc_emgrwx5d

2021-01-30 09:57

Preface

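A lot of papers have come out recently, with SOTA results everywhere. The day before yesterday I posted "SenseTime et al. propose a unified multi-object tracking framework", and today's post is about crowd counting, also known as crowd density estimation. The next post will probably be a SOTA paper on object detection.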

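Note that Amusi won't walk through the newest papers in detail (and might not be able to anyway). More importantly, a paper is something you have to savor for yourself. In a couple of days the authors may publish their own explanation, and reading the paper alongside it is the most authentic way to understand it.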

Main text

《Scale-Aware Attention Network for Crowd Counting》

arXiv: https://arxiv.org/abs/1901.06026

Authors: Amazon

Note: this paper was freshly released on January 21, 2019.

Abstract：In crowd counting datasets, people appear at different scales, depending on their distance to the camera. To address this issue, we propose a novel multi-branch scale-aware attention network that exploits the hierarchical structure of convolutional neural networks and generates, in a single forward pass, multi-scale density predictions from different layers of the architecture. To aggregate these maps into our final prediction, we present a new soft attention mechanism that learns a set of gating masks. Furthermore, we introduce a scale-aware loss function to regularize the training of different branches and guide them to specialize on a particular scale. As this new training requires ground-truth annotations for the size of each head, we also propose a simple, yet effective technique to estimate it automatically. Finally, we present an ablation study on each of these components and compare our approach against the literature on 4 crowd counting datasets: UCF-QNRF, ShanghaiTech A & B and UCF_CC_50. Without bells and whistles, our approach achieves state-of-the-art on all these datasets. We observe a remarkable improvement on the UCF-QNRF (25%) and a significant one on the others (around 10%).

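Summary: People in crowd counting datasets appear at different scales depending on their distance from the camera. To handle this, the paper proposes a multi-branch scale-aware attention network that exploits the hierarchical structure of a CNN to produce multi-scale density predictions from different layers in a single forward pass. A new soft attention mechanism learns a set of gating masks that aggregate these maps into the final prediction. A scale-aware loss function regularizes the branches during training and guides each one to specialize on a particular scale; because this requires ground-truth head sizes, the authors also propose a simple technique to estimate them automatically. An ablation study covers each component, and comparisons on four datasets (UCF-QNRF, ShanghaiTech A & B, and UCF_CC_50) show state-of-the-art (SOTA) results, with an improvement of 25% on UCF-QNRF and around 10% on the others.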
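To make the aggregation step more concrete, here is a minimal sketch of how per-branch density maps might be fused with soft gating masks. It is not the authors' implementation. It assumes the masks come from a per-pixel softmax over branch logits and that all branch maps have already been resized to the same resolution. The names `softmax`, `aggregate`, and `crowd_count` are made up for illustration, and the numbers are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(density_maps, attention_logits):
    """Fuse B branch density maps (each H x W) into a single H x W map.

    Assumption: the gating masks are a softmax across branches at every
    pixel, so the B gates at a pixel are non-negative and sum to 1.
    """
    num_branches = len(density_maps)
    height = len(density_maps[0])
    width = len(density_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gates = softmax(
                [attention_logits[b][y][x] for b in range(num_branches)]
            )
            fused[y][x] = sum(
                gate * density_maps[b][y][x] for b, gate in enumerate(gates)
            )
    return fused


def crowd_count(density_map):
    """The predicted count is the sum of the density map."""
    return sum(sum(row) for row in density_map)


# Toy example: two branches, 2 x 2 maps, equal logits everywhere.
branch_maps = [
    [[0.5, 0.0], [0.0, 0.5]],  # hypothetical "small heads" branch
    [[0.0, 1.0], [1.0, 0.0]],  # hypothetical "large heads" branch
]
equal_logits = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused_map = aggregate(branch_maps, equal_logits)
print(crowd_count(fused_map))
```

With equal logits every branch contributes half of its density at each pixel. Pushing the logits strongly toward one branch makes the fused map approach that branch's prediction, which is the behavior a gating mask is meant to provide.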

Our multi-branch architecture

Key contributions

Baseline network for crowd counting

Scale-aware soft attention masks

Scale-aware loss regularization

Estimating the size of each head

Experimental results

