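StNet is the base network architecture behind the winning entry of the ActivityNet Kinetics Challenge 2018. This open-source release implements StNet on a ResNet50 backbone; users can configure other backbone networks in the same way. The model introduces the concept of a "super-image" and applies 2D convolution to super-images to model local spatio-temporal correlations in video. It also models global spatio-temporal dependencies with temporal modeling blocks, and finally applies a temporal Xception block to the extracted feature sequence for long-range temporal modeling. The main StNet architecture is shown in the figure below: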
StNet Framework Overview
For details, see the AAAI 2019 paper StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition.
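To make the super-image idea concrete, here is a minimal NumPy sketch; it is not the repository's implementation. It assumes a clip is sampled as T segments of N consecutive RGB frames, and that a super-image is formed by stacking the N frames of one segment along the channel axis, which gives 3N channels. The function name `make_super_images` and the `[frames, channels, height, width]` layout are illustrative assumptions.

```python
import numpy as np


def make_super_images(frames, seg_len):
    """Stack every `seg_len` consecutive frames along the channel axis.

    frames:  array of shape [T * seg_len, C, H, W]
    returns: array of shape [T, seg_len * C, H, W]
    """
    total, c, h, w = frames.shape
    if total % seg_len != 0:
        raise ValueError("number of frames must be a multiple of seg_len")
    num_segments = total // seg_len
    # In C order, output channel n * C + ch of segment t is channel ch
    # of input frame t * seg_len + n.
    return frames.reshape(num_segments, seg_len * c, h, w)


# Example: 7 segments of 5 RGB frames at 224x224 (sizes chosen for illustration).
clip = np.zeros((7 * 5, 3, 224, 224), dtype=np.float32)
super_images = make_super_images(clip, seg_len=5)
print(super_images.shape)  # (7, 15, 224, 224)
```

A 2D convolution over such a 15-channel input sees all five frames of a segment at once, which is how local spatio-temporal correlations can be captured without 3D convolution.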





export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
                --config=./configs/stnet.yaml \
                --log_interval=10 \
                --valid_interval=1 \
                --use_gpu=True \
                --save_dir=./data/checkpoints

bash run.sh train STNET ./configs/stnet.yaml
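  • To train from scratch, load ResNet50 weights trained on ImageNet as the initialization parameters. Download and extract these model parameters, then set the pretrain argument in the launch command above, or in the run.sh script, to the path of the extracted parameters. If pretrain is not set manually, the program downloads the parameters automatically and saves them under ~/.paddle/weights/ResNet50_pretrained.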

  • Alternatively, download the released model and pass the path to its weights via --resume for finetuning and further development.

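**Data reader:** The model reads mp4 data from the Kinetics-400 dataset. Each video is divided into seg_num segments, seg_len frames are sampled from each segment, and each frame is randomly augmented and then resized to target_size.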
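The sampling scheme described above can be sketched as follows. This is a hedged illustration, not the repository's reader: the function name `sample_frame_indices`, the random offset during training, and the centered offset during evaluation are assumptions. Only `seg_num` and `seg_len` come from the description.

```python
import random


def sample_frame_indices(num_frames, seg_num, seg_len, training=True, rng=None):
    """Pick seg_len consecutive frame indices from each of seg_num equal segments.

    Assumes num_frames >= 1.
    """
    rng = rng or random.Random()
    avg_duration = num_frames // seg_num
    indices = []
    for seg in range(seg_num):
        start = seg * avg_duration
        slack = avg_duration - seg_len
        if slack > 0:
            # Random offset when training, centered offset otherwise.
            offset = rng.randint(0, slack) if training else slack // 2
        else:
            offset = 0
        for k in range(seg_len):
            # Clamp so short videos never index past the last frame.
            indices.append(min(start + offset + k, num_frames - 1))
    return indices


# A 300-frame video, 7 segments of 5 frames, deterministic (evaluation) mode.
print(sample_frame_indices(300, seg_num=7, seg_len=5, training=False)[:5])
# [18, 19, 20, 21, 22]
```

Each sampled frame would then be augmented and resized to target_size before being batched.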




git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleCV/video




Kinetics dataset download: prepare the dataset by following the instructions at https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/video/data/dataset/README.md#Kinetics數據集


# First, generate the label file needed for preprocessing
python generate_label.py kinetics-400_train.csv kinetics400_label.txt

# Then run the following (8 worker processes in this example):
python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8

# For the train split, set:
Source_dir=$Code_Root/data/dataset/kinetics/data_k400/train_mp4
Target_dir=$Code_Root/data/dataset/kinetics/data_k400/train_pkl

# For the val split, set:
Source_dir=$Code_Root/data/dataset/kinetics/data_k400/val_mp4
Target_dir=$Code_Root/data/dataset/kinetics/data_k400/val_pkl

# This decodes the mp4 files and saves them as pkl files.


cd $Code_Root/data/dataset/kinetics
ls $Code_Root/data/dataset/kinetics/data_k400/train_pkl/* > train.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > val.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > test.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > infer.list

# This generates the file lists; each line of train.list and val.list is the absolute path of one pkl file.
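For environments where the `ls` commands are inconvenient (for example on Windows), the same lists can be produced from Python. This is a minimal sketch under stated assumptions: the helper name `write_file_list` is hypothetical, and unlike `ls dir/*` it keeps only files ending in `.pkl`.

```python
import glob
import os


def write_file_list(pkl_dir, list_path):
    """Write the absolute path of every .pkl file in pkl_dir, one per line.

    Returns the number of paths written.
    """
    pattern = os.path.join(os.path.abspath(pkl_dir), "*.pkl")
    paths = sorted(glob.glob(pattern))
    with open(list_path, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)


# Example calls, run from $Code_Root/data/dataset/kinetics:
# write_file_list("data_k400/train_pkl", "train.list")
# write_file_list("data_k400/val_pkl", "val.list")
```

Sorting the paths keeps the list stable across runs, which makes it easier to compare lists after re-running preprocessing.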


TRAIN: batch_size: 64
assert os.path.exists(args.pretrain + ".pdparams"), \


# Single-GPU training: export CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
		--config=./configs/stnet.yaml \
		--log_interval=10 \
		--valid_interval=1 \
		--use_gpu=True \
		--save_dir=./data/checkpoints \
		--fix_random_seed=False

bash run.sh train STNET ./configs/stnet.yaml



/home/dell/miniconda3/bin/python3.7 /home/dell/PycharmProjects/stnet_train_paddle/train.py --model_name=STNET --config=./configs/stnet.yaml --log_interval=10 --valid_interval=1 --use_gpu=True --save_dir=./data/checkpoints --fix_random_seed=False --pretrain=/home/dell/.paddle/weights/STNET
DALI is not installed, you can improve performance if use DALI
python video2pkl.py kinetics-400_train.csv \
		data/dataset/kinetics/data_k400/train_mp4 \
		data/dataset/kinetics/data_k400/train_pkl \


#  Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#    http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import glob
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3
from multiprocessing import Pool
# example command line: python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8
# kinetics-400_train.csv is the training set file of K400 official release
# each line contains label,youtube_id,time_start,time_end,split,is_cc
assert (len(sys.argv) == 5)

# Open kinetics-400_train.csv and read its rows into a list
f = open(sys.argv[1])
source_dir = sys.argv[2]
target_dir = sys.argv[3]
num_threads = sys.argv[4]
all_video_entries = [x.strip().split(',') for x in f.readlines()]
all_video_entries = all_video_entries[1:]

# Read the Kinetics-400 category-to-label mapping
category_label_map = {}
f = open('kinetics400_label.txt')
for line in f:
    ens = line.strip().split(' ')
    category = " ".join(ens[0:-1])
    label = int(ens[-1])
    category_label_map[category] = label

def generate_pkl(entry):
    mode = entry[4]
    category = entry[0].strip('"')
    category_dir = category
    video_path = os.path.join(
        entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4")
    video_path = os.path.join(source_dir, category_dir, video_path)
    label = category_label_map[category]

    vid = './' + video_path.split('/')[-1].split('.')[0]
    if os.path.exists(video_path):
        if not os.path.exists(vid):
            os.makedirs(vid)
        # Split the video into individual JPEG frames
        os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg')
    else:
        print("File not exists {}".format(video_path))
        return

    images = sorted(glob.glob(vid + '/*.jpg'))
    ims = []
    for img in images:
        f = open(img, 'rb')
        # Collect the raw JPEG bytes of every frame of this 10 s clip in ims
        ims.append(f.read())
        f.close()

    output_pkl = vid + ".pkl"
    output_pkl = os.path.join(target_dir, output_pkl)
    f = open(output_pkl, 'wb')
    # The generated pkl file holds three items: vid, label, list of images
    pickle.dump((vid, label, ims), f, protocol=2)
    f.close()

    os.system('rm -rf %s' % vid)

pool = Pool(processes=int(sys.argv[4]))
pool.map(generate_pkl, all_video_entries)
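The script expects each clip at `<source_dir>/<category>/<youtube_id>_<start>_<end>.mp4`, with start and end zero-padded to six digits. The following sketch isolates that naming rule so it can be checked without running ffmpeg. The helper name `entry_to_video_path` is hypothetical, and the category in the example row is a placeholder rather than a value taken from the dataset.

```python
import os


def entry_to_video_path(entry, source_dir):
    """Map one CSV row to <source_dir>/<category>/<youtube_id>_<start>_<end>.mp4."""
    category = entry[0].strip('"')
    file_name = "%s_%06d_%06d.mp4" % (entry[1], int(entry[2]), int(entry[3]))
    return os.path.join(source_dir, category, file_name)


# Hypothetical row: the category name is a placeholder, not taken from the dataset.
row = ['"some category"', "0-nxKQTMo-Y", "0", "10", "train", "0"]
print(entry_to_video_path(row, "train_mp4"))
# On POSIX systems: train_mp4/some category/0-nxKQTMo-Y_000000_000010.mp4
```

If the script prints "File not exists" for most entries, comparing a path built this way against the actual directory layout is a quick first check.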


import six.moves.cPickle as pickle
inf = pickle.load(open(r'0-nxKQTMo-Y_000000_000010.pkl', 'rb'))
# Output:
<class 'tuple'>: ('./0-nxKQTMo-Y_000000_000010', 183, [b'\xff\xd8\xff...\xe0\x00\x10]

tuple[2] is a list of images: <class 'list'>: [b'\xff\xd8\xff\xe0\x00... (for a 30 fps source video, 10 s of data yields 300 images here)
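A generated file can be sanity-checked without decoding any images. The sketch below uses only the standard library and assumes the pkl was written under Python 3, so the frames load as `bytes`. The helper name `summarize_pkl` is hypothetical. It relies on the fact that every JPEG stream begins with the bytes `\xff\xd8`, which matches the output shown above.

```python
import pickle

JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this marker


def summarize_pkl(path):
    """Return (vid, label, frame_count, all_frames_look_like_jpeg)."""
    with open(path, "rb") as f:
        vid, label, frames = pickle.load(f)
    all_jpeg = all(frame[:2] == JPEG_SOI for frame in frames)
    return vid, label, len(frames), all_jpeg


# Example call:
# print(summarize_pkl("0-nxKQTMo-Y_000000_000010.pkl"))
```

A frame count of zero usually means ffmpeg produced no images for that clip, so it is worth filtering such files out before building the file lists.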



GitHub: https://github.com/PaddlePaddle/models/blob/release/1.8/PaddleCV/video/models/stnet/README.md


Baidu Brain: a readable walkthrough of STNET (note that this project uses the HMDB 51 dataset)


