Deep Meta Learning for Real-Time Visual Tracking based on Target-Specific Feature Space || Visual Tracking || Paper Reading

Reposted from: https://www.cnblogs.com/wangxiaocvpr/p/8193880.html


2018-01-04  15:58:15 

  

  Preface: why read this paper? It appears to be the first to apply meta-learning to visual tracking, and it strikes a good balance between speed and accuracy.

  Introduction:

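  As we know, two of the more important problems in tracking are learning the features of the target object and coping with changes in its appearance. Many algorithms keep improving on these two points. Neural networks have recently provided a good solution for feature representation, but appearance changes are still not handled well: most methods collect a set of target object samples from the tracking results and update the model with it from time to time. With this strategy, however, it is unavoidable that the classifier overfits and loses its generalization capabilities due to the insufficient number of training samples.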

  Against this background and motivation, the paper proposes an end-to-end visual tracking network structure that consists of two main parts:

  one is a Siamese matching network for target search,

  the other is a meta-learning network for adaptive feature space.

  The main focus here is the meta-learning network, a parameter prediction network proposed by the authors, which borrows from recent work on meta-learning for the few-shot learning problem.

  

  The proposed meta-learner network is trained to provide the matching network with additional convolutional kernels so that the feature space of the matching network can be modified adaptively to adopt new appearance templates obtained in the course of tracking. The meta-learner network only sees the gradients from the last layer of the matching network, given new training samples for the appearance.

  We also employ a novel training scheme for the meta-learner network to maintain the generalization capability of the feature space by preventing the meta-learner network from generating new parameters that cause overfitting of the matching network. By incorporating our meta-learner network, the target-specific feature space can be constructed instantly with a single forward pass without any iterative computation and optimization and free from the innate overfitting. Fig. 1 illustrates the motivation of the proposed visual tracking algorithm.
  

  

  Tracking with Meta-Learner:

  1. Overview of Proposed Method  

  1.1. Components

  The network structure in this paper consists of two parts: the matching network and the meta-learning network.

  The Siamese matching network computes the response map between two image patches:

  

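  The feature-extraction CNN in this part is a fully convolutional network, and the loss function measures the difference between the predicted response map and the ground-truth response map.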
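  To make the response map idea concrete, here is a minimal sketch in plain Python. It makes several assumptions that are not from the paper: small 2-D lists stand in for the CNN feature maps, the matching step is a plain sliding-window cross-correlation, and mean squared error stands in for the loss (the notes only say the loss measures the difference between the two maps). The names `cross_correlate`, `mse_loss` and `argmax_2d` are made up for illustration.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at (i, j).
            score = sum(
                search[i + u][j + v] * template[u][v]
                for u in range(th)
                for v in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def mse_loss(predicted, target):
    """Assumed loss: mean squared difference between two equally sized maps."""
    diffs = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(diffs) / len(diffs)


def argmax_2d(response):
    """Return the (row, col) position of the largest response."""
    best = max(
        (value, i, j)
        for i, row in enumerate(response)
        for j, value in enumerate(row)
    )
    return best[1], best[2]


# Toy "feature maps": the target is a 2x2 bright block inside a 4x4 search area.
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
template = [[1, 1], [1, 1]]

response = cross_correlate(search, template)
peak = argmax_2d(response)  # (1, 2): where the template matches best
```

  In a tracker the peak of the response map gives the estimated target position inside the search region; during training, a loss such as `mse_loss(response, ground_truth)` would be minimized instead.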

  

  Meta-learning network: this network provides the matching network with target-specific weights, given an image patch of the target with context patches {z1, ..., zM}.

  To adapt the weights toward the target patch, the average negative gradient $\delta$ of the loss function is used to update the last layer of the matching network:

  

  The design of the meta-learning network rests on one assumption: the characteristic of $\delta$ is empirically different according to a target. What does this sentence mean?
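  Leaving that assumption aside, it may help to see how $\delta$ itself could be computed. Read literally, it is $\delta = -\frac{1}{M}\sum_{i=1}^{M} \partial \ell_i / \partial w_N$, where $\ell_i$ is the loss on context patch $z_i$; this form is inferred from the wording above, not quoted from the paper. The sketch below assumes a toy last layer (one weighted sum over a feature vector) and a squared-error loss so that the gradient can be written by hand. The function names and both modelling choices are illustrative assumptions.

```python
def last_layer_output(weights, features):
    """Toy last layer: a single weighted sum of the input features."""
    return sum(w * f for w, f in zip(weights, features))


def loss_gradient(weights, features, label):
    """Gradient of the assumed loss 0.5 * (y_hat - label)**2 w.r.t. the weights."""
    error = last_layer_output(weights, features) - label
    return [error * f for f in features]


def average_negative_gradient(weights, samples):
    """delta = -(1/M) * sum of per-sample gradients over the M samples."""
    m = len(samples)
    delta = [0.0] * len(weights)
    for features, label in samples:
        grad = loss_gradient(weights, features, label)
        delta = [d - g / m for d, g in zip(delta, grad)]
    return delta


# Two hypothetical context samples: (feature vector, desired response).
samples = [([1.0, 0.0], 1.0), ([0.0, 2.0], 1.0)]
weights = [0.0, 0.0]

delta = average_negative_gradient(weights, samples)  # [0.5, 1.0]
```

  Note that $\delta$ is only computed here, not applied as a gradient step: in the method described in these notes it is handed to the meta-learning network as an input.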

  

  Then, taking $\delta$ as input, the meta-learning network $g_\theta(\cdot)$ outputs the target-specific weights $w^{target}$ corresponding to that input:

  

  where $\theta$ denotes the parameters of the meta-learning network. The new weights are then used to update the original weights of the matching network:
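  The original formulas are not reproduced in this repost. Pieced together from the prose, the two steps presumably amount to the following. This is a reconstruction, not a quotation: $w^{adapt}$ and $w_1, \dots, w_{N-1}$ are assumed names for the adapted weight set and the untouched earlier layers, and $[\cdot,\cdot]$ denotes concatenation.

```latex
\begin{aligned}
w^{target} &= g_\theta(\delta), \\
w^{adapt}  &= \{\, w_1, \dots, w_{N-1}, [\, w_N, w^{target} \,] \,\}
\end{aligned}
```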

  

  where the update concatenates $w^{target}$ to $w_{N}$ of the last layer for feature extraction. The flowchart of the proposed method is shown in Fig. 2.
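  The adaptation step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the real $g_\theta$ is a trained network, whereas `meta_learner` below is a single linear map; the last layer is viewed as a list of 1x1 kernels (one per output channel); and concatenation is taken to mean appending the generated kernels as extra output channels.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def meta_learner(delta, theta):
    """Hypothetical g_theta: one linear map per generated kernel."""
    return [matvec(matrix, delta) for matrix in theta]


def adapt_last_layer(w_n, w_target):
    """Concatenate the new kernels after the original ones (w_n is not modified)."""
    return w_n + w_target


def apply_layer(kernels, features):
    """1x1-convolution view: each kernel yields one output channel."""
    return [sum(k * f for k, f in zip(kernel, features)) for kernel in kernels]


# Original last layer: two kernels over two input channels.
w_n = [[1.0, 0.0], [0.0, 1.0]]
# Flattened average negative gradient (made-up numbers).
delta = [0.5, 1.0, 0.0, -0.5]
# Made-up meta-learner parameters producing one extra kernel.
theta = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]]

w_target = meta_learner(delta, theta)       # one forward pass, no iteration
w_adapt = adapt_last_layer(w_n, w_target)   # three kernels in total

features = [2.0, 4.0]
adapted = apply_layer(w_adapt, features)    # [2.0, 4.0, 5.0]
```

  The first two output channels are exactly what the original layer produces, and only the appended channel is target-specific. Under these assumptions, that is one way to read the claim that the feature space is adapted in a single forward pass while the original weights stay untouched.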
  

  

  

  

  

  Experiment:

    
