Table of Contents
- [1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
- [2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
- [3] Vision-Dialog Navigation by Exploring Cross-modal Memory
- [4] VQA with No Questions-Answers Training
- [5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
- [6] Local-Global Video-Text Interactions for Temporal Grounding
- [7] Hypergraph Attention Networks for Multimodal Learning
- Summary
[1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
- Prof. Huchuan Lu (盧湖川)
- Existing methods: language -> vision only, with no vision -> language. (Here `->` means "guides".)
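To make the two directions of guidance concrete, here is a minimal sketch using plain scaled dot-product attention. This is a generic stand-in, not the paper's actual modules; the feature shapes, the `cross_attend` helper, and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys):
    """Each query row gathers a weighted sum of key rows (scaled dot-product)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 16))    # hypothetical: 5 word features, dim 16
pixels = rng.normal(size=(12, 16))  # hypothetical: 12 flattened spatial features

# language -> vision: each visual location is re-expressed using the words
lang_guided_pixels = cross_attend(pixels, words)   # shape (12, 16)

# vision -> language: each word is re-expressed using the visual locations
vis_guided_words = cross_attend(words, pixels)     # shape (5, 16)
```

A one-directional model would compute only the first of the two calls; the point in the notes is that the second direction is usually missing.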
[2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
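- Si Liu (劉偲, Beihang University) and Guanbin Li (李冠斌, Sun Yat-sen University)
- Existing methods are two-stage (generate proposals, then pick the best proposal), which is relatively slow.
- Brings correlation filtering into the cross-modal setting: the language feature acts as the kernel and is correlated against the image feature to get a response map (the bbox center), and then w and h are regressed.
- Looks a lot like SiamRPN, except that one branch has been swapped for the other modality.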
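The correlation step described above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: it assumes the language feature is a single vector used as a 1x1 kernel, the function names `correlation_response` and `peak_location` are made up here, and the w/h regression head is left out.

```python
import numpy as np

def correlation_response(image_feat, lang_kernel):
    """Correlate a language vector against every spatial location.

    image_feat:  (C, H, W) visual feature map
    lang_kernel: (C,) language feature used as a 1x1 correlation kernel
    returns:     (H, W) response map
    """
    # A 1x1 cross-correlation reduces to a per-location dot product over channels.
    return np.einsum("chw,c->hw", image_feat, lang_kernel)

def peak_location(response):
    """Return (row, col) of the strongest response, taken as the box center."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

# Toy data: plant one location whose feature matches the language vector.
rng = np.random.default_rng(0)
C, H, W = 8, 6, 7
lang_kernel = rng.normal(size=C)
lang_kernel /= np.linalg.norm(lang_kernel)
image_feat = 0.01 * rng.normal(size=(C, H, W))
image_feat[:, 2, 4] += lang_kernel  # the "referred" location

response = correlation_response(image_feat, lang_kernel)
center = peak_location(response)
```

Because the whole image is scored in one pass, there is no proposal generation or ranking step, which is where the speed-up over two-stage methods would come from.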
[3] Vision-Dialog Navigation by Exploring Cross-modal Memory
- A cross-modal memory problem?
- Navigation: previously based only on dialog history -> now adds a vision module.
[4] VQA with No Questions-Answers Training
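- Can be trained without answers.
- Questions are generated through a question graph; the answers to the generated questions are not meaningful.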
[5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
- Uh, I didn't really follow this one.
[6] Local-Global Video-Text Interactions for Temporal Grounding
[7] Hypergraph Attention Networks for Multimodal Learning
Summary
This session wrapped up really quickly: 1 hour 20 minutes.