Human Pose Estimation Datasets (Pose Estimation/Keypoint): MSCOCO (year by year), LSP, FLIC, MPII, AI Challenge, and Evaluation Metrics

A collection of notes on pose-estimation datasets.

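LSP:
    Samples: 2K (the original LSP; the extension adds 10K training images)
    Number of joints: 14
    Full body, single person

Leeds Sports Dataset [12] and its extension [13], jointly denoted LSP. Combined, they contain 11000 training and 1000 testing images. These are images from sports activities and as such are quite challenging in terms of appearance and especially articulations.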

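FLIC:
    Samples: 20K (FLIC-full)
    Number of joints: 9 (the DeepPose paper quoted below counts 10 upper-body joints)
    Upper body, single person. FLIC consists of 5003 images (3987 training, 1016 testing) taken from films. The images are annotated on the upper body, with most figures facing the camera straight on.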

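MPII:
    Samples: 25K
    Number of joints: 16
    Full body, single/multi-person, 40K people, 410 human activities

MPII Human Pose consists of around 25k images with annotations for multiple people, providing 40k annotated samples (28k training, 11k testing).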

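MSCOCO:
    Samples: >= 300K
    Number of joints: 17 (some pipelines, e.g. OpenPose, add a neck joint for 18)
    Full body, multi-person, keypoints on 100K people. The training set has 118,287 images.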

AI Challenge:
    Samples: 210K training, 30K validation, 30K testing
    Number of joints: 14
    Full body, multi-person, 380K people

Evaluation metrics:

LSP, FLIC, MPII:

Percentage of Correct Parts (PCP)

Percent of Detected Joints (PDJ), an improved version of PCP
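
A minimal sketch of PCP, assuming the common "strict" definition: a limb counts as correct when both of its predicted endpoints lie within half the ground-truth limb length of the corresponding ground-truth endpoints. The function name `pcp`, the limb representation, and the `alpha` parameter are illustrative choices, not taken from any specific toolkit.

```python
import math

def pcp(pred_limbs, gt_limbs, alpha=0.5):
    """Strict PCP over a list of limbs.

    Each limb is a pair of (x, y) endpoints. A limb is correct when both
    predicted endpoints are within alpha * (ground-truth limb length) of
    their ground-truth counterparts.
    """
    correct = 0
    for (p_a, p_b), (g_a, g_b) in zip(pred_limbs, gt_limbs):
        limit = alpha * math.dist(g_a, g_b)
        if math.dist(p_a, g_a) <= limit and math.dist(p_b, g_b) <= limit:
            correct += 1
    return correct / len(gt_limbs)

# Two limbs of length 10; the second prediction misses one endpoint by 6 px.
gt = [((0, 0), (0, 10)), ((0, 0), (10, 0))]
pred = [((0, 1), (0, 11)), ((0, 6), (10, 0))]
print(pcp(pred, gt))  # 0.5
```

Because the tolerance scales with limb length, short (foreshortened) limbs are penalized more heavily, which is one motivation usually given for PDJ.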

COCO:

mAP

Mask R-CNN recombines the 2014 splits, which gives the COCO 2017 split. The corresponding annotation files are:

train2014: person_keypoints_train2014.json (80k)

val2014: ① person_keypoints_val2014.json

② person_keypoints_minival2014.json (5k, the commonly used validation set)

③ person_keypoints_valminusminival2014.json (35k)

train2017 = person_keypoints_train2014.json + person_keypoints_valminusminival2014.json

So the training set has about 115k images (80k + 35k). The exact count is 118,287 (82,783 + 35,504).
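
The recombination above can be sketched as a plain merge of two COCO-style annotation dicts. This is only an illustration: `merge_coco` is a hypothetical helper, and the toy dicts stand in for the contents of person_keypoints_train2014.json and person_keypoints_valminusminival2014.json.

```python
def merge_coco(*datasets):
    """Concatenate COCO-style dicts, keeping each id once per section."""
    sections = ("images", "annotations", "categories")
    merged = {name: [] for name in sections}
    seen = {name: set() for name in sections}
    for dataset in datasets:
        for name in sections:
            for item in dataset.get(name, []):
                if item["id"] not in seen[name]:
                    seen[name].add(item["id"])
                    merged[name].append(item)
    return merged

# Toy stand-ins for the two annotation files (hypothetical contents).
train2014 = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [{"id": 10, "image_id": 1}],
    "categories": [{"id": 1, "name": "person"}],
}
valminusminival2014 = {
    "images": [{"id": 3}],
    "annotations": [{"id": 11, "image_id": 3}],
    "categories": [{"id": 1, "name": "person"}],
}

train2017_like = merge_coco(train2014, valminusminival2014)
print(len(train2017_like["images"]), len(train2017_like["annotations"]))  # 3 2
```

With the real files, each dict would come from `json.load` on the corresponding annotation file, and `len(merged["images"])` gives the image count to compare against 118,287.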

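With 1 GPU and 2 images per GPU, one epoch is about 57,500 iterations (115k / 2), or about 59,144 with the exact image count. The authors train for about 720k iterations, which is roughly 12.17 epochs at the exact count.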

"keypoints_coco_2014_train": {
	"img_dir": "coco/train2014",
	"ann_file": "coco/annotations/person_keypoints_train2014.json",
	},
	"keypoints_coco_2014_val": {
	"img_dir": "coco/val2014",
	"ann_file": "coco/annotations/person_keypoints_val2014.json"
	},
	"keypoints_coco_2014_minival": {
	"img_dir": "coco/val2014",
	"ann_file": "coco/annotations/person_keypoints_minival2014.json",
	},
	"keypoints_coco_2014_valminusminival": {
	"img_dir": "coco/val2014",
	"ann_file": "coco/annotations/person_keypoints_valminusminival2014.json",
	},

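This dataset requires localizing person keypoints in challenging, uncontrolled conditions.

Training: the trainval dataset (57k images and 150k person instances).

Validation: the minival dataset (5000 images).

Testing: the test-dev set (20k images) + the test-challenge set (20k images).

Results are evaluated with OKS-based mAP, where OKS (object keypoint similarity) measures how close a predicted pose is to the ground-truth pose.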
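
A minimal sketch of OKS for a single person, assuming the usual COCO formulation: OKS = Σ_i exp(-d_i² / (2 s² k_i²)) δ(v_i > 0) / Σ_i δ(v_i > 0), where d_i is the pixel distance for keypoint i, s² is the object's segment area, and k_i = 2σ_i is a per-keypoint constant. The `COCO_SIGMAS` values are the constants I believe the COCO evaluation code uses for the 17 keypoints; treat them as an assumption and check them against your toolkit. The function name `oks` is illustrative.

```python
import math

# Per-keypoint sigmas for the 17 COCO keypoints (assumed values).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]

def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object keypoint similarity between one prediction and one ground truth.

    pred, gt: lists of (x, y); visibility: ground-truth v flags (0 = unlabeled);
    area: ground-truth object segment area in pixels squared.
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2 * sigma
        total += math.exp(-d2 / (2 * area * k ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0

gt_pose = [(i * 10.0, i * 5.0) for i in range(17)]
shifted = [(x + 3.0, y) for x, y in gt_pose]
visible = [2] * 17
print(oks(gt_pose, gt_pose, visible, area=1000.0))  # 1.0
print(oks(shifted, gt_pose, visible, area=1000.0))
```

A prediction is then counted as a true positive when its OKS exceeds a threshold, and the reported mAP averages AP over thresholds from 0.50 to 0.95 in steps of 0.05.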

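So:

img_batch == 2, num_GPU = 1: 57000 / 2 = 28500 iters/epoch, so 72000 iterations cover only about 2.53 epochs.

img_batch == 2, num_GPU = 8: 57000 / 16 = 3562.5 iters/epoch, so 90000 iterations are about 25 epochs.

To get 25 epochs on 1 GPU I need 25 × 28500 = 712500 iterations, which is the so-called 720000.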
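
The arithmetic above can be checked with a few lines of Python. The helper names are illustrative; the image counts (57,000 and 118,287) and iteration budgets come from these notes.

```python
def iters_per_epoch(num_images, imgs_per_gpu, num_gpus):
    """Iterations needed to see every image once."""
    return num_images / (imgs_per_gpu * num_gpus)

def epochs_covered(total_iters, num_images, imgs_per_gpu, num_gpus):
    """How many passes over the data a fixed iteration budget gives."""
    return total_iters / iters_per_epoch(num_images, imgs_per_gpu, num_gpus)

print(iters_per_epoch(57_000, 2, 1))                      # 28500.0
print(iters_per_epoch(57_000, 2, 8))                      # 3562.5
print(round(epochs_covered(72_000, 57_000, 2, 1), 2))     # 2.53
print(round(epochs_covered(90_000, 57_000, 2, 8), 2))     # 25.26
print(25 * iters_per_epoch(57_000, 2, 1))                 # 712500.0
print(round(epochs_covered(720_000, 118_287, 2, 1), 2))   # 12.17
```

The last line shows where the 12.17-epoch figure comes from: 720k iterations at 2 images per iteration over the exact 118,287 training images.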

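MSCOCO 2014: contains over 150k training instances and over 80k test instances, with 250k person instances labeled with 17 keypoints. People in this dataset overlap less often than in crowd-focused datasets, and its Crowd Index is concentrated near zero.

keypoints-challenge2016 (deprecated): includes 105,698 training and around 80,000 testing human instances. The training set contains over 1 million labeled keypoints. The test set is divided into four roughly equal splits: test-challenge, test-dev, test-standard, and test-reserve.

MSCOCO 2017: includes the trainval dataset (57k images and 150k person instances). We evaluate our approach on the val2017 set and the test-dev2017 set, containing 5000 images and 20k images respectively.

COCO 2014 + minival == 2017

2018: The annotations for the training and validation sets are public (over 150k people and 1.7 million labeled keypoints). The dataset includes over 200k images and 250k person instances labeled with keypoints (most people in COCO are at medium or large scale).

The MPII Human Pose dataset [2] consists of images taken from a wide range of real-world activities, with full-body pose annotations. There are around 25k images with 40k subjects, of which 12k are test subjects and the rest are training subjects. The data augmentation and training strategy are the same as for MS COCO, except that the input size is 256×256 for comparison with other methods.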

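The following is reposted; I no longer remember the source. Please let me know if it infringes.

Before 2017, the test set had four splits (dev / standard / reserve / challenge). Starting in 2017, the test set was simplified to only the dev / challenge splits; the other two were removed.

2017 Test Set Splits
The 2017 COCO test set contains ~40K test images. It is divided into two roughly equal splits of about 20K images each: test-dev and test-challenge.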

split	#imgs	submit limit	scores available	leaderboard
Val	~5K	no limit	immediate	none
Test-Dev	~20K	5 per day	immediate	year-round
Test-Challenge	~20K	5 total	workshop	workshop

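Test-Dev: The test-dev split is the default test data for testing under general circumstances. Results in papers should generally be reported on test-dev to allow fair public comparison. Each participant is limited to 5 submissions per day to avoid overfitting. Note that each participant can publish only one submission to the public leaderboard (a paper may, however, report multiple test-dev results). The test-dev server remains open year-round.
Test-Challenge: The test-challenge split is used for the annual COCO challenges. Results are revealed at the relevant workshop (typically ECCV or ICCV). Each participant is limited to a maximum of 5 submissions over the course of the challenge. If you submit multiple entries, the best result based on test-dev AP is selected as your entry for the competition. Note that each participant can publish only one submission to the public leaderboard. The test-challenge server remains open for a fixed period before each year's competition.

The images belonging to each split are defined in image_info_test-dev2017 (for test-dev) and image_info_test2017 (for both test-dev and test-challenge). Information for the test-challenge images is not provided explicitly. Instead, when participating in the challenge, results must be submitted on the full test set (test-dev and test-challenge together). This serves two goals. First, participants get automatic feedback on their submission by seeing evaluation results on test-dev before the challenge workshop. Second, after the challenge workshop, it gives future participants an opportunity to compare against challenge entries on the test-dev split. We emphasize that when submitting to the full test set (image_info_test2017), results must be generated on all images without distinguishing between splits. Finally, we note that the 2017 dev / challenge splits contain the same images as the 2015 dev / challenge splits, so results across years can be compared directly.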

2015 Test Set Splits
This test set was used for the 2015 and 2016 detection and keypoint challenges. It is no longer used and the evaluation server is closed.

2014 Test Set Splits
The 2014 test set is only used for the captioning challenge. Please see the caption eval page for details.

https://arxiv.org/pdf/1603.06937.pdf

We evaluate our network on two benchmark datasets, FLIC [1] and MPII Human Pose [21]. FLIC is composed of 5003 images (3987 training, 1016 testing) taken from films. The images are annotated on the upper body with most figures facing the camera straight on. MPII Human Pose consists of around 25k images with annotations for multiple people providing 40k annotated samples (28k training, 11k testing). The test annotations are not provided so in all of our experiments we train on a subset of training images while evaluating on a heldout validation set of around 3000 samples. MPII consists of images taken from a wide range of human activities with a challenging array of widely articulated full-body poses.

Evaluation is done using the standard Percentage of Correct Keypoints (PCK) metric which reports the percentage of detections that fall within a normalized distance of the ground truth. For FLIC, distance is normalized by torso size, and for MPII, by a fraction of the head size (referred to as PCKh).
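
A minimal sketch of PCK as described above: a keypoint counts as correct when its distance to the ground truth is at most a threshold times a normalizing size. The function name `pck` and its parameters are illustrative; how the torso or head size is measured differs between benchmarks and is left to the caller here.

```python
import math

def pck(pred, gt, norm_size, threshold):
    """Fraction of keypoints within threshold * norm_size of the ground truth.

    norm_size is the torso size for FLIC-style PCK, or the head size for
    MPII-style PCKh.
    """
    limit = threshold * norm_size
    hits = sum(math.dist(p, g) <= limit for p, g in zip(pred, gt))
    return hits / len(gt)

gt_points = [(0, 0), (10, 0), (20, 0), (30, 0)]
pred_points = [(0, 3), (10, 6), (20, 0), (30, 11)]   # errors: 3, 6, 0, 11 px

print(pck(pred_points, gt_points, norm_size=10, threshold=0.5))  # PCKh@0.5 -> 0.5
print(pck(pred_points, gt_points, norm_size=60, threshold=0.2))  # PCK@0.2  -> 1.0
```

Passing a torso diameter as `norm_size` gives a PDJ-style score, since PDJ also thresholds joint distance by a fraction of the torso diameter.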

FLIC: Results can be seen in Figure 6 and Table 1. Our results on FLIC are very competitive, reaching 99% PCK@0.2 accuracy on the elbow and 97% on the wrist. It is important to note that these results are observer-centric, which is consistent with how others have evaluated their output on FLIC.

MPII: We achieve state-of-the-art results across all joints on the MPII Human Pose dataset. All numbers can be seen in Table 2 along with PCK curves in Figure 7. On difficult joints like the wrist, elbows, knees, and ankles we improve upon the most recent state-of-the-art results by an average of 3.5% (PCKh@0.5) with an average error rate of 12.8% down from 16.3%. The final elbow accuracy is 91.2% and wrist accuracy is 87.1%. Example predictions made by the network on MPII can be seen in Figure 5.

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42237.pdf

Datasets. There is a wide variety of benchmarks for human pose estimation. In this work we use datasets, which have large number of training examples sufficient to train a large model such as the proposed DNN, as well as are realistic and challenging. The first dataset we use is Frames Labeled In Cinema (FLIC), introduced by [19], which consists of 4000 training and 1000 test images obtained from popular Hollywood movies. The images contain people in diverse poses and especially diverse clothing. For each labeled human, 10 upper body joints are labeled. The second dataset we use is Leeds Sports Dataset [12] and its extension [13], which we will jointly denote by LSP. Combined they contain 11000 training and 1000 testing images. These are images from sports activities and as such are quite challenging in terms of appearance and especially articulations. In addition, the majority of people have 150 pixel height which makes the pose estimation even more challenging. In this dataset, for each person the full body is labeled with total 14 joints.

We refer to this metric as Percent of Detected Joints (PDJ).
