Action Recognition - Video Classification: StNet (STNET) Training

I. Introduction to the STNET Model

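The StNet framework is the base network that won the ActivityNet Kinetics Challenge 2018. The open-source release here is the StNet model implemented on ResNet50; users can configure versions built on other backbone networks in the same way. The model introduces the concept of a "super-image" and applies 2D convolution to super-images to model local spatio-temporal correlations in the video. In addition, temporal modeling blocks capture the video's global spatio-temporal dependencies, and finally a temporal Xception block performs long-range temporal modeling on the extracted feature sequence. The main network structure of StNet is shown in the figure below:
Figure: StNet Framework Overview
For details, see the AAAI 2019 paper "StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition".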
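To make the "super-image" idea concrete, here is a minimal pure-Python sketch. It assumes a clip is organised as clip[segment][frame][channel], where each channel plane is represented by a placeholder string; this layout and the helper name to_super_images are assumptions for illustration, not code from the StNet source. The frames of one segment are stacked along the channel axis, so a segment of seglen RGB frames becomes one image with 3*seglen channels, which matches the training log later in this post where conv1 is reported as taking 3*seglen input channels.

```python
def to_super_images(clip):
    """Stack the frames of each segment along the channel axis.

    clip[s][f] is one frame, given as a list of 3 channel planes.
    The result has one entry per segment with 3 * seglen planes.
    With NumPy this would be a reshape from [seg_num, seglen, 3, H, W]
    to [seg_num, 3 * seglen, H, W].
    """
    return [[plane for frame in segment for plane in frame] for segment in clip]


# seg_num=7 and seglen=5 are the values printed in the training log.
seg_num, seglen = 7, 5
clip = [
    [["seg%d_frame%d_%s" % (s, f, c) for c in "RGB"] for f in range(seglen)]
    for s in range(seg_num)
]
super_images = to_super_images(clip)
print(len(super_images), len(super_images[0]))  # 7 15
print(super_images[0][:4])
# ['seg0_frame0_R', 'seg0_frame0_G', 'seg0_frame0_B', 'seg0_frame1_R']
```

A 2D convolution over such a 15-channel input mixes information from five neighbouring frames at once, which is how local temporal context enters an otherwise 2D network.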

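Data

StNet is trained on the Kinetics-400 action recognition dataset released by DeepMind. For downloading and preparing the data, see the data instructions.

Training

Once the data is ready, training can be started in either of the following two ways: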

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
                --config=./configs/stnet.yaml \
                --log_interval=10 \
                --valid_interval=1 \
                --use_gpu=True \
                --save_dir=./data/checkpoints \
                --fix_random_seed=False \
                --pretrain=$PATH_TO_PRETRAIN_MODEL

bash run.sh train STNET ./configs/stnet.yaml
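  • To train from scratch, ResNet50 weights trained on ImageNet must be loaded as the initial parameters. Download and extract these model parameters, then set the pretrain argument in the command above or in the run.sh script to the path of the extracted parameters. If you do not download them manually and set pretrain, the program downloads them automatically and saves them under ~/.paddle/weights/ResNet50_pretrained.

  • You can also download the released model and pass the path to its weights via --resume for fine-tuning and further development.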

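Data reader: the model reads mp4 data from the Kinetics-400 dataset. From each video it extracts seg_num segments and seg_len frames per segment (the config key is spelled seglen), applies random augmentation to each frame, and then scales the frames to target_size.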
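The sampling step described above can be sketched as follows. This is a simplified illustration, not the reader shipped with the repository: the function name sample_frame_indices, the random start during training, the centered start otherwise, and the wrap-around for short videos are all assumptions made for the example.

```python
import random


def sample_frame_indices(num_frames, seg_num, seglen, train=True, rng=random):
    """Return seg_num * seglen frame indices for one video.

    The video is cut into seg_num equal segments. From each segment a run
    of seglen consecutive frames is taken: a random start when training,
    a centered start otherwise. Indices wrap around for very short videos.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    average_dur = num_frames // seg_num
    indices = []
    for seg in range(seg_num):
        if average_dur >= seglen:
            slack = average_dur - seglen
            offset = rng.randint(0, slack) if train else slack // 2
            start = seg * average_dur + offset
        elif average_dur >= 1:
            start = seg * average_dur
        else:
            start = seg
        indices.extend((start + k) % num_frames for k in range(seglen))
    return indices


# 300 frames corresponds to a 10 s clip at 30 fps.
idx = sample_frame_indices(num_frames=300, seg_num=7, seglen=5, train=False)
print(len(idx))  # 35
print(idx[:5])   # [18, 19, 20, 21, 22]
```

With seg_num=7 and seglen=5 (the values in the training config), each video contributes 35 frames per sample regardless of its length.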


II. Training the STNET Model in Practice

1. Download the source code

Note: the repository also contains other models; only the code under models/PaddleCV/video is used here.

git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleCV/video

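2. Download the pretrained model

I downloaded the released model, whose weights can be passed via --resume for fine-tuning and similar work: https://paddlemodels.bj.bcebos.com/video_classification/STNET.pdparams
Place it in the ~/.paddle/weights/ folder.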
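If you prefer to script this step, the following is a minimal sketch using only the Python standard library. The helper names local_weight_path and fetch_weights are hypothetical, and the sketch assumes the file should keep its original name (STNET.pdparams) inside ~/.paddle/weights, as described above.

```python
import os
import urllib.request

STNET_URL = "https://paddlemodels.bj.bcebos.com/video_classification/STNET.pdparams"


def local_weight_path(url, weights_dir="~/.paddle/weights"):
    """Map a weights URL to its expected location under weights_dir."""
    return os.path.join(os.path.expanduser(weights_dir), os.path.basename(url))


def fetch_weights(url=STNET_URL, weights_dir="~/.paddle/weights"):
    """Download the file only if it is not already present; return its path."""
    target = local_weight_path(url, weights_dir)
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target


# Only the path is computed here; call fetch_weights() to actually download.
print(os.path.basename(local_weight_path(STNET_URL)))  # STNET.pdparams
```

The path printed by local_weight_path, minus the .pdparams suffix, is the prefix that the training command later in this post passes as --pretrain.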

3. Download the dataset

Kinetics dataset download: prepare the dataset by following the instructions at https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/video/data/dataset/README.md#Kinetics數據集

The data must be converted to pkl format:

# First generate the dataset label file needed for preprocessing
python generate_label.py kinetics-400_train.csv kinetics400_label.txt

# Then run the following:
python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir  8 # using 8 processes as an example

# For the train data, set:
Source_dir=$Code_Root/data/dataset/kinetics/data_k400/train_mp4
Target_dir=$Code_Root/data/dataset/kinetics/data_k400/train_pkl

# For the val data, set:
Source_dir=$Code_Root/data/dataset/kinetics/data_k400/val_mp4
Target_dir=$Code_Root/data/dataset/kinetics/data_k400/val_pkl

# This decodes the mp4 files and saves them as pkl files.
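The label file consumed by video2pkl.py has one "category index" pair per line, with the index as the last space-separated token (this is how the script parses it, as shown in section III). As a rough sketch of how such a file could be produced from the CSV, assuming indices are assigned in sorted order of the category names: generate_label.py itself is not shown in this post and may differ, and build_label_lines and the sample rows below are made up for illustration.

```python
import csv
import io


def build_label_lines(csv_text):
    """Return 'category index' lines from the text of a Kinetics CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    categories = sorted({row[0] for row in reader if row})
    return ["{} {}".format(name, i) for i, name in enumerate(categories)]


# Invented sample rows in the label,youtube_id,time_start,time_end,split,is_cc layout.
sample = (
    "label,youtube_id,time_start,time_end,split,is_cc\n"
    '"riding a bike",aaaaaaaaaaa,0,10,train,0\n'
    "abseiling,bbbbbbbbbbb,5,15,train,0\n"
    '"riding a bike",ccccccccccc,2,12,train,0\n'
)
label_lines = build_label_lines(sample)
for line in label_lines:
    print(line)
# abseiling 0
# riding a bike 1
```

Category names may contain spaces, which is why the parser in video2pkl.py joins every token except the last one back into the category name.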

Generate the training and validation lists:

cd $Code_Root/data/dataset/kinetics
ls $Code_Root/data/dataset/kinetics/data_k400/train_pkl/* > train.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > val.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > test.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > infer.list

# This generates the corresponding file lists. Each line of train.list and val.list is the absolute path of one pkl file, for example:
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-097
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-114
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-118
# or
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-085
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-086
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-090
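The same lists can be produced from Python, which also guarantees absolute paths regardless of how the directory was given. This is a minimal sketch; write_file_list is a hypothetical helper, and the demo runs on a throwaway directory with empty placeholder files rather than real pkl data.

```python
import glob
import os
import tempfile


def write_file_list(pkl_dir, list_path):
    """Write the absolute path of every entry in pkl_dir, one per line."""
    paths = sorted(os.path.abspath(p) for p in glob.glob(os.path.join(pkl_dir, "*")))
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return len(paths)


# Demo on a temporary directory containing two empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    pkl_dir = os.path.join(root, "train_pkl")
    os.makedirs(pkl_dir)
    for name in ("a.pkl", "b.pkl"):
        open(os.path.join(pkl_dir, name), "wb").close()
    list_path = os.path.join(root, "train.list")
    count = write_file_list(pkl_dir, list_path)
    with open(list_path) as f:
        listed = f.read().splitlines()
    print(count)  # 2
```

Checking the returned count against the number of videos you expect is a cheap way to catch a wrong directory before a long training run.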

4. StNet training

Preparation:
1. My GPU is an RTX 2070 with 8 GB of memory, so configs/stnet.yaml needs to be modified:
TRAIN: batch_size: 64
2. When I ran it, train.py raised an error; line 170 needs to be changed to:
assert os.path.exists(args.pretrain + ".pdparams"), \
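The assert above is cut off in the original. As a standalone illustration of the same kind of check, here is a small sketch that assumes the weights are stored as <prefix>.pdparams, as the assert suggests. check_pretrain is a hypothetical helper, not a function in train.py. It also expands a literal "~", in case the shell passed the argument through unexpanded.

```python
import os
import tempfile


def check_pretrain(pretrain):
    """Return the weight-file path for a --pretrain prefix, or raise."""
    prefix = os.path.expanduser(pretrain)  # handle a literal "~" in the argument
    weight_file = prefix + ".pdparams"
    if not os.path.exists(weight_file):
        raise FileNotFoundError("pretrain weights not found: " + weight_file)
    return weight_file


# Demo with a placeholder file standing in for STNET.pdparams.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "STNET.pdparams"), "wb").close()
    found = check_pretrain(os.path.join(root, "STNET"))
    print(os.path.basename(found))  # STNET.pdparams
```

Failing early with the full file name in the message makes a wrong --pretrain path much easier to diagnose than an error deep inside the weight-loading code.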

Training:
Training can be started in either of the following two ways:

# Single-GPU training: export CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
		--config=./configs/stnet.yaml \
		--log_interval=10 \
		--valid_interval=1 \
		--use_gpu=True \
		--save_dir=./data/checkpoints \
		--fix_random_seed=False \
		--pretrain=$HOME/.paddle/weights/STNET

bash run.sh train STNET ./configs/stnet.yaml

5. Training results

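I got the Kinetics pkl data from a friend for this run. The dataset is small, so training finished quickly.
Testing is not easy to set up, so I have not done it yet. At least training works; I will look into training on my own data later.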

/home/dell/miniconda3/bin/python3.7 /home/dell/PycharmProjects/stnet_train_paddle/train.py --model_name=STNET --config=./configs/stnet.yaml --log_interval=10 --valid_interval=1 --use_gpu=True --save_dir=./data/checkpoints --fix_random_seed=False --pretrain=/home/dell/.paddle/weights/STNET
DALI is not installed, you can improve performance if use DALI
[INFO: train.py:  254]: Namespace(batch_size=None, config='./configs/stnet.yaml', epoch=None, fix_random_seed=False, is_profiler=0, learning_rate=None, log_interval=10, model_name='STNET', no_memory_optimize=False, pretrain='/home/dell/.paddle/weights/STNET', profiler_path='./', resume=None, save_dir='./data/checkpoints', use_gpu=True, valid_interval=1)
[INFO: config_utils.py:   70]: ---------------- Train Arguments ----------------
[INFO: config_utils.py:   72]: MODEL:
[INFO: config_utils.py:   74]:     name:STNET
[INFO: config_utils.py:   74]:     format:pkl
[INFO: config_utils.py:   74]:     num_classes:400
[INFO: config_utils.py:   74]:     seg_num:7
[INFO: config_utils.py:   74]:     seglen:5
[INFO: config_utils.py:   74]:     image_mean:[0.485, 0.456, 0.406]
[INFO: config_utils.py:   74]:     image_std:[0.229, 0.224, 0.225]
[INFO: config_utils.py:   74]:     num_layers:50
[INFO: config_utils.py:   74]:     topk:5
[INFO: config_utils.py:   72]: TRAIN:
[INFO: config_utils.py:   74]:     epoch:60
[INFO: config_utils.py:   74]:     short_size:256
[INFO: config_utils.py:   74]:     target_size:224
[INFO: config_utils.py:   74]:     num_reader_threads:12
[INFO: config_utils.py:   74]:     buf_size:1024
[INFO: config_utils.py:   74]:     batch_size:64
[INFO: config_utils.py:   74]:     num_gpus:8
[INFO: config_utils.py:   74]:     use_gpu:True
[INFO: config_utils.py:   74]:     filelist:./data/dataset/kinetics/train.list
[INFO: config_utils.py:   74]:     learning_rate:0.01
[INFO: config_utils.py:   74]:     learning_rate_decay:0.1
[INFO: config_utils.py:   74]:     l2_weight_decay:0.0001
[INFO: config_utils.py:   74]:     momentum:0.9
[INFO: config_utils.py:   74]:     total_videos:224684
[INFO: config_utils.py:   74]:     pretrain_base:./data/dataset/pretrained/ResNet50_pretrained
[INFO: config_utils.py:   72]: VALID:
[INFO: config_utils.py:   74]:     short_size:256
[INFO: config_utils.py:   74]:     target_size:224
[INFO: config_utils.py:   74]:     num_reader_threads:12
[INFO: config_utils.py:   74]:     buf_size:1024
[INFO: config_utils.py:   74]:     batch_size:128
[INFO: config_utils.py:   74]:     filelist:./data/dataset/kinetics/val.list
[INFO: config_utils.py:   72]: TEST:
[INFO: config_utils.py:   74]:     seg_num:25
[INFO: config_utils.py:   74]:     short_size:256
[INFO: config_utils.py:   74]:     target_size:256
[INFO: config_utils.py:   74]:     num_reader_threads:12
[INFO: config_utils.py:   74]:     buf_size:1024
[INFO: config_utils.py:   74]:     batch_size:4
[INFO: config_utils.py:   74]:     filelist:./data/dataset/kinetics/test.list
[INFO: config_utils.py:   72]: INFER:
[INFO: config_utils.py:   74]:     seg_num:25
[INFO: config_utils.py:   74]:     short_size:256
[INFO: config_utils.py:   74]:     target_size:256
[INFO: config_utils.py:   74]:     num_reader_threads:12
[INFO: config_utils.py:   74]:     buf_size:1024
[INFO: config_utils.py:   74]:     batch_size:1
[INFO: config_utils.py:   74]:     filelist:./data/dataset/kinetics/infer.list
[INFO: config_utils.py:   74]:     video_path:
[INFO: config_utils.py:   74]:     kinetics_labels:./data/dataset/kinetics_labels.json
[INFO: config_utils.py:   75]: -------------------------------------------------
W0520 19:19:57.004699 29621 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W0520 19:19:57.007618 29621 device_context.cc:244] device: 0, cuDNN Version: 7.5.
W0520 19:19:57.007634 29621 device_context.cc:270] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
[INFO: stnet.py:  163]: Load pretrain weights from /home/dell/.paddle/weights/STNET, exclude fc, batch_norm, xception, conv3d layers.
[INFO: stnet.py:  173]: Delete conv3d_0.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete conv3d_0.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_24.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_24.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_24.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_24.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete conv3d_1.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete conv3d_1.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_44.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_44.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_44.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete batch_norm_44.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att_conv.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att_conv.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att_2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att_2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bndw.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bndw.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bndw.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bndw.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att1.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att1.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att1_2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_att1_2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_dw.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_dw.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn2.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete xception_bn2.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete fc_0.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  173]: Delete fc_0.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py:  179]: conv1_weights is transformed from [Cout, 3, Kh, Kw] into [Cout, 3*seglen, Kh, Kw]
[INFO: accuracy_metrics.py:   34]: Resetting train metrics...
[INFO: accuracy_metrics.py:   34]: Resetting valid metrics...
[INFO: train_utils.py:   46]: ------- learning rate [0.], learning rate counter [-1] -----
reader shuffle seed 0
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 756, lines length: 756, total: 756
I0520 19:20:00.296499 29621 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0520 19:20:00.323364 29621 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0520 19:20:00.353277 29621 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0520 19:20:00.369305 29621 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:20:01] Epoch 0, iter 0, time 2.6954545974731445, 	Loss: 6.386939,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:20:11] Epoch 0, iter 10, time 1.144268274307251, 	Loss: 7.805868,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:20:25] Epoch 0, iter 20, time 1.8410155773162842, 	Loss: 12.061253,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:20:41] Epoch 0, iter 30, time 1.1101765632629395, 	Loss: 9.782310,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:20:57] Epoch 0, iter 40, time 1.6426472663879395, 	Loss: 6.662434,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:21:12] Epoch 0, iter 50, time 1.0377476215362549, 	Loss: 7.927030,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:21:27] Epoch 0, iter 60, time 1.8886034488677979, 	Loss: 7.618662,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:21:42] Epoch 0, iter 70, time 1.4986093044281006, 	Loss: 10.829721,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:21:57] Epoch 0, iter 80, time 1.5821115970611572, 	Loss: 10.367525,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 19:22:12] Epoch 0, iter 90, time 1.1715724468231201, 	Loss: 10.460176,	top1_acc: 0.00, 	top5_acc: 0.00
[INFO: train_utils.py:  122]: [TRAIN] Epoch 0 training finished, average time: 1.4606082644513858
[INFO: accuracy_metrics.py:   34]: Resetting valid metrics...
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 76, lines length: 76, total: 76
share_vars_from is set, scope is ignored.
I0520 19:22:21.868980 29621 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0520 19:22:21.874469 29621 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0520 19:22:21.879205 29621 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0520 19:22:21.883241 29621 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[INFO: metrics_util.py:  143]: [TEST] test_iter 0 	Loss: 24.683359,	top1_acc: 0.00, 	top5_acc: 12.50
[INFO: metrics_util.py:  184]: [TEST] Finish	Loss: 24.371124,	top1_acc: 1.56, 	top5_acc: 12.50
[INFO: train_utils.py:   46]: ------- learning rate [0.01], learning rate counter [93] -----
reader shuffle seed 1
...
...(middle part omitted)
...
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 756, lines length: 756, total: 756
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:48:33] Epoch 58, iter 0, time 4.258568286895752, 	Loss: 0.576913,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:48:45] Epoch 58, iter 10, time 1.3456335067749023, 	Loss: 0.317392,	top1_acc: 100.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:48:59] Epoch 58, iter 20, time 1.1674671173095703, 	Loss: 0.671914,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:49:14] Epoch 58, iter 30, time 1.9933085441589355, 	Loss: 0.784231,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:49:29] Epoch 58, iter 40, time 1.1111698150634766, 	Loss: 0.930491,	top1_acc: 87.50, 	top5_acc: 87.50
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:49:42] Epoch 58, iter 50, time 1.8673505783081055, 	Loss: 0.543070,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:49:57] Epoch 58, iter 60, time 1.634033203125, 	Loss: 0.919805,	top1_acc: 62.50, 	top5_acc: 87.50
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:50:11] Epoch 58, iter 70, time 0.9776091575622559, 	Loss: 0.418453,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:50:27] Epoch 58, iter 80, time 0.9865224361419678, 	Loss: 1.184469,	top1_acc: 62.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:50:42] Epoch 58, iter 90, time 1.0359406471252441, 	Loss: 0.816228,	top1_acc: 75.00, 	top5_acc: 100.00
[INFO: train_utils.py:  122]: [TRAIN] Epoch 58 training finished, average time: 1.4280521023658015
[INFO: accuracy_metrics.py:   34]: Resetting valid metrics...
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 76, lines length: 76, total: 76
[INFO: metrics_util.py:  143]: [TEST] test_iter 0 	Loss: 6.010135,	top1_acc: 18.75, 	top5_acc: 37.50
[INFO: metrics_util.py:  184]: [TEST] Finish	Loss: 10.055953,	top1_acc: 6.25, 	top5_acc: 28.12
[INFO: train_utils.py:   46]: ------- learning rate [0.01], learning rate counter [5545] -----
reader shuffle seed 59
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 756, lines length: 756, total: 756
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:51:02] Epoch 59, iter 0, time 2.8308911323547363, 	Loss: 0.166549,	top1_acc: 100.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:51:18] Epoch 59, iter 10, time 1.7925031185150146, 	Loss: 0.996779,	top1_acc: 75.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:51:31] Epoch 59, iter 20, time 1.660839319229126, 	Loss: 0.654242,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:51:45] Epoch 59, iter 30, time 2.043001651763916, 	Loss: 0.896705,	top1_acc: 75.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:51:57] Epoch 59, iter 40, time 1.0093119144439697, 	Loss: 0.860474,	top1_acc: 75.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:52:11] Epoch 59, iter 50, time 1.7284026145935059, 	Loss: 0.901385,	top1_acc: 62.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:52:26] Epoch 59, iter 60, time 1.1205298900604248, 	Loss: 0.465495,	top1_acc: 87.50, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:52:41] Epoch 59, iter 70, time 1.2147243022918701, 	Loss: 0.953047,	top1_acc: 75.00, 	top5_acc: 100.00
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:52:56] Epoch 59, iter 80, time 1.788550853729248, 	Loss: 0.777208,	top1_acc: 75.00, 	top5_acc: 87.50
[INFO: metrics_util.py:  143]: [TRAIN 2020-05-20 21:53:11] Epoch 59, iter 90, time 0.9238030910491943, 	Loss: 1.572572,	top1_acc: 50.00, 	top5_acc: 87.50
[INFO: train_utils.py:  122]: [TRAIN] Epoch 59 training finished, average time: 1.4281317880076747
[INFO: accuracy_metrics.py:   34]: Resetting valid metrics...
[INFO: kinetics_reader.py:  249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py:  253]: read images from 0, length: 76, lines length: 76, total: 76
[INFO: metrics_util.py:  143]: [TEST] test_iter 0 	Loss: 4.260227,	top1_acc: 18.75, 	top5_acc: 43.75
[INFO: metrics_util.py:  184]: [TEST] Finish	Loss: 8.431045,	top1_acc: 9.38, 	top5_acc: 26.56

Process finished with exit code 0
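A quick sanity check on the numbers in this log. The sketch below assumes that the reader divides the configured batch_size by num_gpus to get the per-card batch, and that incomplete batches are dropped. Both are inferences from the log values, not something confirmed from the source code.

```python
# Values read from the log above.
train_videos, valid_videos = 756, 76               # "total: 756" / "total: 76"
train_batch, valid_batch, num_gpus = 64, 128, 8    # from the printed config

# Assumption: per-card batch = batch_size // num_gpus, incomplete batches dropped.
train_per_card = train_batch // num_gpus           # 8
valid_per_card = valid_batch // num_gpus           # 16
train_iters = train_videos // train_per_card       # 94 iterations per epoch
valid_samples = (valid_videos // valid_per_card) * valid_per_card  # 64

print(train_iters)             # 94
print(train_iters * 1 - 1)     # 93, the learning rate counter after epoch 0
print(train_iters * 59 - 1)    # 5545, the counter after epoch 58
print(100 / valid_samples)     # 1.5625, logged as top1_acc 1.56 after epoch 0
print(100 * 6 / valid_samples) # 9.375, logged as top1_acc 9.38 after epoch 59
```

Under these assumptions the numbers line up: training accuracy moves in steps of 12.5% (one sample out of 8), and validation is scored on only 64 samples. With training accuracy near 75 to 100% and validation top-1 under 10%, the model is overfitting this small subset, so the run shows that the pipeline works rather than giving a usable accuracy figure.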

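III. Analysis of the pkl Data Files

What if we want to train on our own data?
Then the videos have to be converted to pkl files in the same way as the Kinetics dataset, so we need to understand how the Kinetics pkl files are generated. The source is in data/dataset/kinetics/video2pkl.py.
Let's analyze it starting from the command that runs the script: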

python video2pkl.py kinetics-400_train.csv \
		data/dataset/kinetics/data_k400/train_mp4 \
		data/dataset/kinetics/data_k400/train_pkl \
		8

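kinetics-400_train.csv: the Kinetics-400 dataset list, containing the video source, clip start and end times, label, and so on.
data/dataset/kinetics/data_k400/train_mp4: the source video directory
data/dataset/kinetics/data_k400/train_pkl: the target directory for the pkl files
8: the number of worker processes (the script passes it to multiprocessing.Pool)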
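Each CSV row is mapped to an mp4 file under the source directory. The sketch below mirrors the naming logic of the script shown next (youtube id, then start and end time zero-padded to six digits), leaving out the redundant "./" path component. clip_path is a hypothetical helper, and the category name and is_cc value in the sample row are placeholders.

```python
import os


def clip_path(entry, source_dir):
    """Build the mp4 path for one CSV row, following video2pkl.py's naming."""
    category = entry[0].strip('"')
    file_name = "%s_%06d_%06d.mp4" % (entry[1], int(entry[2]), int(entry[3]))
    return os.path.join(source_dir, category, file_name)


# Row layout: label,youtube_id,time_start,time_end,split,is_cc
row = '"some category",0-nxKQTMo-Y,0,10,train,0'.split(",")
path = clip_path(row, "train_mp4")
print(os.path.basename(path))  # 0-nxKQTMo-Y_000000_000010.mp4
```

This means the source directory must be organised as one sub-folder per category, with clips named after the video id and time range, or the script reports the file as missing and skips it.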

#  Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import glob
try:
    import cPickle as pickle
except ImportError:
    import pickle
from multiprocessing import Pool
# example command line: python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8
#
# kinetics-400_train.csv is the training set file of K400 official release
# each line contains label,youtube_id,time_start,time_end,split,is_cc
assert (len(sys.argv) == 5)

# Open kinetics-400_train.csv and read the list of entries
f = open(sys.argv[1])
source_dir = sys.argv[2]
target_dir = sys.argv[3]
num_threads = sys.argv[4]
all_video_entries = [x.strip().split(',') for x in f.readlines()]
all_video_entries = all_video_entries[1:]
f.close()

# Read the Kinetics-400 label information
category_label_map = {}
f = open('kinetics400_label.txt')
for line in f:
    ens = line.strip().split(' ')
    category = " ".join(ens[0:-1])
    label = int(ens[-1])
    category_label_map[category] = label
f.close()

def generate_pkl(entry):
    mode = entry[4]
    category = entry[0].strip('"')
    category_dir = category
    video_path = os.path.join(
        './',
        entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4")
    video_path = os.path.join(source_dir, category_dir, video_path)
    label = category_label_map[category]

    vid = './' + video_path.split('/')[-1].split('.')[0]
    if os.path.exists(video_path):
        if not os.path.exists(vid):
            os.makedirs(vid)
        # Split the video into individual frame images here
        os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg')
    else:
        print("File not exists {}".format(video_path))
        return

    images = sorted(glob.glob(vid + '/*.jpg'))
    ims = []
    for img in images:
        f = open(img, 'rb')
        # Append the raw bytes of each frame of this 10 s clip to ims
        ims.append(f.read())
        f.close()

    output_pkl = vid + ".pkl"
    output_pkl = os.path.join(target_dir, output_pkl)
    f = open(output_pkl, 'wb')
    # Note the pkl format: a tuple of 3 items (vid, label, list of images)
    pickle.dump((vid, label, ims), f, protocol=2)
    f.close()

    os.system('rm -rf %s' % vid)

pool = Pool(processes=int(sys.argv[4]))
pool.map(generate_pkl, all_video_entries)
pool.close()
pool.join()
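Stripped of the ffmpeg call and the process pool, the core of generate_pkl is just packing (vid, label, list of JPEG byte strings) with pickle protocol 2. The following is a minimal sketch of that step for frames that have already been extracted. frames_to_pkl is a hypothetical helper, and the demo uses fake two-frame data (only the JPEG magic bytes, not real images).

```python
import glob
import os
import pickle
import tempfile


def frames_to_pkl(frames_dir, label, target_dir):
    """Pack a directory of JPEG frames into one (vid, label, images) pkl."""
    vid = "./" + os.path.basename(os.path.normpath(frames_dir))
    images = []
    for path in sorted(glob.glob(os.path.join(frames_dir, "*.jpg"))):
        with open(path, "rb") as f:
            images.append(f.read())  # raw JPEG bytes, not decoded pixels
    output_pkl = os.path.join(target_dir, os.path.basename(vid) + ".pkl")
    with open(output_pkl, "wb") as f:
        pickle.dump((vid, label, images), f, protocol=2)
    return output_pkl


# Demo with two fake "frames" in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    frames_dir = os.path.join(root, "my_video_0001")
    os.makedirs(frames_dir)
    for i in (1, 2):
        with open(os.path.join(frames_dir, "%06d.jpg" % i), "wb") as f:
            f.write(b"\xff\xd8\xff" + bytes([i]))
    out = frames_to_pkl(frames_dir, label=3, target_dir=root)
    with open(out, "rb") as f:
        vid, label, images = pickle.load(f)
    print(vid, label, len(images))  # ./my_video_0001 3 2
```

Sorting the frame file names matters: the zero-padded %06d.jpg names produced by ffmpeg sort into temporal order, and the reader relies on that order when sampling segments.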

For example, I tried opening one of the pkl files, 0-nxKQTMo-Y_000000_000010.pkl:

import six.moves.cPickle as pickle
inf = pickle.load(open(r'0-nxKQTMo-Y_000000_000010.pkl', 'rb'))
print(inf)
# Output (truncated):
<class 'tuple'>: ('./0-nxKQTMo-Y_000000_000010', 183, [b'\xff\xd8\xff...\xe0\x00\x10]

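The result is a tuple:
tuple[0] is the file name (the video id).
tuple[1] is the corresponding label index (see kinetics400_label.txt).
tuple[2] is a list of images, each stored as the raw bytes of one JPEG frame: <class 'list'>: [b'\xff\xd8\xff\xe0\x00... (for a 30 fps source video, 10 s of data gives 300 images here)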
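When preparing your own data, it helps to check a pkl file against this structure before training. This is a small sketch under the assumption that every frame is a JPEG, so each byte string starts with the JPEG magic bytes b'\xff\xd8\xff'. summarize_sample is a hypothetical helper, and the demo uses a fake in-memory sample instead of a real file.

```python
import pickle

JPEG_MAGIC = b"\xff\xd8\xff"


def summarize_sample(sample):
    """Return (vid, label, frame_count, all_frames_look_like_jpeg)."""
    vid, label, images = sample
    looks_jpeg = all(img.startswith(JPEG_MAGIC) for img in images)
    return vid, label, len(images), looks_jpeg


# Stand-in for pickle.load(open(path, "rb")): a tiny fake sample with 300 frames.
fake = ("./0-nxKQTMo-Y_000000_000010", 183, [JPEG_MAGIC + b"\xe0"] * 300)
restored = pickle.loads(pickle.dumps(fake, protocol=2))
summary = summarize_sample(restored)
print(summary)
# ('./0-nxKQTMo-Y_000000_000010', 183, 300, True)
```

Turning a frame into pixels would additionally need an image library to decode the JPEG bytes; the check above only looks at the container structure.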


IV. Generating pkl Files from Your Own Video Dataset



Appendix: Related Resources

GitHub: https://github.com/PaddlePaddle/models/blob/release/1.8/PaddleCV/video/models/stnet/README.md

Baidu AI Open Platform: https://www.paddlepaddle.org.cn/modelbasedetail/stnet

Baidu Brain: a readable version of STNET (note that this project uses the HMDB51 data)

Kinetics dataset download: prepare the dataset by following the instructions at https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/video/data/dataset/README.md#Kinetics數據集
