2019
CVPR
《STEP: Spatio-Temporal Progressive Learning for Video Action Detection》, [pytorch]
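- The open-source code is simple and easy to use, and an AVA-pretrained model is provided;
- The drawback is slow inference: about 0.4 FPS on a single GPU, far from real-time.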
《TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection》
《Spatio-Temporal Video Re-Localization by Warp LSTM》
ICCV
《SlowFast Networks for Video Recognition》 (+《Non-local Neural Networks》), [PySlowFast]
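- Setting up the environment is somewhat tedious but poses no real obstacle; the model zoo offers many kinds of pretrained models, so there are more options to choose from;
- Judging from the results reported in the paper, it performs considerably better than STEP;
- The drawbacks: no demo is provided, so you have to extract a usable one from the training scripts yourself; inference speed is also a problem, at roughly 0.2 FPS.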
others
《Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals》
《A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos》
《You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization》, [pytorch]
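- The environment is simple to set up and the code is easy to use. It was released only recently, and the authors provide a model pretrained on UCF101-24;
- According to the paper, it is the current state of the art on UCF101-24 and J-HMDB-21, and it runs at up to 61 FPS, which is comfortably real-time;
- The drawbacks: there is no demo, so you have to extract one from the training scripts yourself. There is also no AVA-pretrained model yet, though the authors say one is in progress.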
2018
CVPR
《AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions》
IEEE TIP
《Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection》
others
《Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern》
2017
CVPR
《Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos》
《ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification》
《Spatio-Temporal Naive-Bayes Nearest-Neighbor (ST-NBNN) for Skeleton-Based Action Recognition》
ICCV
《Action Tubelet Detector for Spatio-Temporal Action Localization》, [caffe]
《Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos》, [caffe]
《TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal》
2015
ICCV
《Learning to Track for Spatio-Temporal Action Localization》
《Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks》