Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Original post

2018-08-27 11:53

This is one of the early papers to explore combining a CNN with an RNN to solve high-level computer vision tasks.

Abstract

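Models based on deep convolutional networks have come to dominate recent image interpretation tasks. Here we investigate whether recurrent models can also handle tasks that involve sequences, visual or otherwise. We develop a novel recurrent convolutional architecture for large-scale learning that is end-to-end trainable, and apply it to video recognition, image description, and image retrieval problems. Existing models typically assume a fixed spatio-temporal receptive field or use simple temporal averaging for sequence processing. The recurrent convolutional models in this paper are instead "doubly deep": they are compositional in both space and time. Such models may have an advantage when target concepts are complex or training data are limited, and they make it possible to learn long-term dependencies. Long-term RNN models can map variable-length inputs (e.g. video frames) to variable-length outputs (e.g. natural language). Our model is connected directly to modern visual convnet models, so temporal dynamics and convolutional perceptual representations can be learned jointly. Our results show that these models can achieve state-of-the-art results.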

Introduction

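This paper proposes a model called LRCN, whose structure is shown in the figure above. The model is evaluated on video activity recognition, image caption generation, and video description tasks. We show that LRCN is generally applicable to visual time-series modeling. We argue that long-term RNNs can provide a clear improvement on visual tasks, particularly when ample training data are available to learn or refine the representation.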

Long term Recurrent Convolutional Network model

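In this paper, the authors propose the LRCN model, which combines a deep hierarchical visual feature extractor with a model that can learn to recognize and synthesize temporal dynamics. LRCN passes each visual input v_t through a feature transformation to produce a fixed-length feature representation. Once features have been computed for the whole visual input sequence, the sequence model takes over.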
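The per-frame feature step can be illustrated with a minimal sketch. It assumes a toy stand-in for the CNN (a single shared linear map followed by ReLU); the names `make_feature_transform`, `phi`, and `extract_features` are hypothetical and do not come from the paper. The point is only that the same parameters are applied to every frame independently, and that each frame ends up as a vector of the same fixed length.

```python
import random

def make_feature_transform(frame_size, feature_dim, seed=0):
    """Build a toy stand-in for the CNN: one shared linear map plus ReLU."""
    rng = random.Random(seed)
    # V holds the shared parameters; the same V is reused at every time step.
    V = [[rng.uniform(-0.1, 0.1) for _ in range(frame_size)]
         for _ in range(feature_dim)]

    def phi(frame):
        # frame: flat list of pixel values of length frame_size
        return [max(0.0, sum(w * p for w, p in zip(row, frame))) for row in V]

    return phi

def extract_features(frames, phi):
    """Apply the same transform to each frame independently."""
    return [phi(v_t) for v_t in frames]

# Five fake "frames" of 16 pixels each (hypothetical data).
rng = random.Random(1)
frames = [[rng.random() for _ in range(16)] for _ in range(5)]

phi = make_feature_transform(frame_size=16, feature_dim=4)
features = extract_features(frames, phi)  # one fixed-length vector per frame
```

Because each frame is processed independently of the others, this stage can run over all time steps in parallel; only the sequence model that follows has to run step by step.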

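In its most general form, a sequence model with parameters W maps an input x_t and the previous hidden state h_{t-1} to an output z_t and an updated hidden state h_t. Because each step depends on the hidden state from the step before, inference must run sequentially. The paper divides sequential learning tasks into three classes: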

1. Sequential inputs, fixed outputs. Many frames go in, but only a fixed number of outputs come out, as in action recognition.

2. Fixed inputs, sequential outputs. The input has a fixed size, while the output has variable length, as in image captioning.

3. Sequential inputs, sequential outputs. Neither the input length nor the output length is fixed, as in video description.
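The recurrence and the first two task classes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a plain tanh RNN cell in place of an LSTM, random untrained weights, and hypothetical helper names (`make_step`, `run_sequence`, and so on). Averaging the per-step outputs and repeating a fixed input at every step are shown as simple options for handling classes 1 and 2.

```python
import math
import random

def make_step(input_dim, hidden_dim, output_dim, seed=0):
    """Build a toy recurrent step: (x_t, h_prev) -> (z_t, h_t)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Together these matrices play the role of the parameters W.
    W_xh = mat(hidden_dim, input_dim)
    W_hh = mat(hidden_dim, hidden_dim)
    W_hz = mat(output_dim, hidden_dim)

    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def step(x_t, h_prev):
        a, b = matvec(W_xh, x_t), matvec(W_hh, h_prev)
        h_t = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        z_t = matvec(W_hz, h_t)
        return z_t, h_t

    return step

def run_sequence(step, inputs, hidden_dim):
    """Run the recurrence in order; step t needs the hidden state from t-1."""
    h = [0.0] * hidden_dim
    outputs = []
    for x_t in inputs:
        z_t, h = step(x_t, h)
        outputs.append(z_t)
    return outputs

def sequential_in_fixed_out(step, inputs, hidden_dim):
    """Class 1: average the per-step outputs into a single prediction."""
    outputs = run_sequence(step, inputs, hidden_dim)
    n = len(outputs)
    return [sum(z[i] for z in outputs) / n for i in range(len(outputs[0]))]

def fixed_in_sequential_out(step, x, num_steps, hidden_dim):
    """Class 2: feed the same fixed input at every output step."""
    return run_sequence(step, [x] * num_steps, hidden_dim)

# Hypothetical data: 5 time steps of 4-dimensional features.
rng = random.Random(2)
inputs = [[rng.random() for _ in range(4)] for _ in range(5)]
step = make_step(input_dim=4, hidden_dim=3, output_dim=2)

single_prediction = sequential_in_fixed_out(step, inputs, hidden_dim=3)
output_sequence = fixed_in_sequential_out(step, inputs[0], num_steps=7, hidden_dim=3)
```

Class 3 is not shown. One common way to handle it is an encoder-decoder arrangement, where one recurrent pass summarizes the input sequence and a second pass generates the output sequence, so that the two lengths need not match.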
