[Paper Reading Series] Pose Estimation

Human Pose Estimation

一、Definition:

  • Human pose estimation is defined as the problem of localizing human joints in an image.
  • Challenges: strong articulation, and joints that are small or barely visible.

 

二、Two methods:

  1. Top-down: locate each person first, then locate that person's joints.
  2. Bottom-up: locate all joints first, then decide which person each joint belongs to.

 

Paper

一、DeepPose (CVPR 2014)

Title: DeepPose: Human Pose Estimation via Deep Neural Networks

Authors: Alexander Toshev, Christian Szegedy (Google)

Main contributions:

1、Formulates pose estimation as a regression problem over joint coordinates, using the entire image as input for each joint.

2、Proposes a cascade of DNN-based pose predictors.

 

Method:

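Pose vector: $y = (\ldots, y_i^T, \ldots)^T$, $i \in \{1, \ldots, k\}$, where $y_i$ holds the x and y coordinates of the $i$-th joint.

A labeled image: $(x, y)$, where $x$ is the image data and $y$ is the ground-truth pose vector.

Normalize the joint coordinates with respect to a box $b = (b_c, b_w, b_h)$ (center, width, height) bounding the human body or parts of it:

$N(y_i; b) = \begin{pmatrix} 1/b_w & 0 \\ 0 & 1/b_h \end{pmatrix} (y_i - b_c)$

Normalized training pair: $(N(x; b), N(y; b))$, where $N(x; b)$ is the image cropped to the box $b$.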
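The joint normalization step can be sketched in a few lines. This is a minimal illustration, assuming the DeepPose convention that a box is given by its center, width, and height, and that normalization subtracts the center and divides by the width and height. The names `Box`, `normalize`, and `denormalize` are hypothetical and chosen only for this example.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    """Bounding box b = (b_c, b_w, b_h): center (cx, cy), width, height."""
    cx: float
    cy: float
    w: float
    h: float


def normalize(joint: Point, box: Box) -> Point:
    """N(y_i; b): translate by the box center, then scale by 1/width and 1/height."""
    x, y = joint
    return ((x - box.cx) / box.w, (y - box.cy) / box.h)


def denormalize(joint: Point, box: Box) -> Point:
    """Inverse mapping: from box-relative coordinates back to image coordinates."""
    u, v = joint
    return (u * box.w + box.cx, v * box.h + box.cy)


# Toy values, not taken from any dataset.
box = Box(cx=100.0, cy=200.0, w=50.0, h=100.0)
wrist = (125.0, 150.0)
normalized = normalize(wrist, box)
print(normalized)                    # (0.5, -0.5)
print(denormalize(normalized, box))  # (125.0, 150.0)
```

Working in box-relative coordinates means the regressor sees targets in roughly the same range regardless of how large the person appears in the image.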

Network architecture

1. The first stage estimates an initial pose, as outlined in the previous section.

2. In each subsequent stage, additional DNN regressors are trained to predict the displacement of each joint location from the previous stage's estimate to the true location.
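The cascade can be sketched as a loop that adds each stage's predicted displacement to the current estimate. This is only an illustration of the update rule: the stage regressors here are toy stand-ins (a hypothetical `make_toy_refiner` that moves each joint halfway to a known target), whereas the real regressors are trained DNNs that look at an image crop around each current joint estimate.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interface: a stage maps (image, current joints) to one
# (dx, dy) displacement per joint.
StageRegressor = Callable[[object, Sequence[Point]], List[Point]]


def run_cascade(image: object,
                initial_pose: Sequence[Point],
                refiners: Sequence[StageRegressor]) -> List[Point]:
    """Refine an initial pose by adding each stage's predicted displacements."""
    pose = list(initial_pose)
    for refine in refiners:
        deltas = refine(image, pose)
        pose = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pose, deltas)]
    return pose


def make_toy_refiner(target: Sequence[Point]) -> StageRegressor:
    """Toy stand-in for a trained regressor: moves each joint halfway to a known target."""
    def refine(_image: object, pose: Sequence[Point]) -> List[Point]:
        return [((tx - x) / 2, (ty - y) / 2)
                for (x, y), (tx, ty) in zip(pose, target)]
    return refine


target = [(10.0, 20.0), (30.0, 40.0)]
initial = [(2.0, 4.0), (38.0, 48.0)]
refined = run_cascade(None, initial, [make_toy_refiner(target)] * 3)
print(refined)  # [(9.0, 18.0), (31.0, 41.0)]
```

With this toy refiner, each stage halves the remaining error, which mirrors the intent of the cascade: later stages only need to correct what earlier stages got wrong.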

 

二、Stacked Hourglass Networks (ECCV 2016)

Title: Stacked Hourglass Networks for Human Pose Estimation

Authors: Alejandro Newell, Kaiyu Yang, and Jia Deng (University of Michigan)

Key idea:

1. The network captures and consolidates information across all scales of the image.

2. Each hourglass pools down to a very low resolution, then upsamples and combines features across multiple resolutions.

3. Multiple hourglass modules are placed consecutively, end to end, to form the stacked network.

Network architecture:

1、Residual Module

All convolutional layers use stride 1 with size-preserving padding (padding 1 for the 3×3 convolution), so the spatial size of the data is unchanged and only the depth (number of channels) changes.
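The size-preserving claim follows from the standard convolution output-size formula, out = (in + 2·padding − kernel) / stride + 1. The sketch below traces a feature map through an assumed bottleneck layout of 1×1, 3×3, and 1×1 convolutions. The layer list and channel counts are illustrative assumptions for this example, not a specification of the network.

```python
from typing import List, Tuple


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed bottleneck layout: (kernel, padding, output channels) per convolution.
# The channel counts are illustrative only.
LAYERS = [(1, 0, 128), (3, 1, 128), (1, 0, 256)]


def trace_residual_branch(channels: int, height: int, width: int) -> List[Tuple[int, int, int]]:
    """Return the (channels, height, width) shape after each convolution."""
    shapes = [(channels, height, width)]
    for kernel, padding, out_channels in LAYERS:
        height = conv_output_size(height, kernel, stride=1, padding=padding)
        width = conv_output_size(width, kernel, stride=1, padding=padding)
        shapes.append((out_channels, height, width))
    return shapes


shapes = trace_residual_branch(256, 64, 64)
for shape in shapes:
    print(shape)
```

Because the height and width never change, the output of the branch can be added element-wise to the skip connection, which is what makes the module residual.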

2、Hourglass (HG) module

 

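  • The symmetric topology is similar to that of conv-deconv and encoder-decoder networks.
  • Top-down processing uses simple nearest neighbor upsampling and skip connections rather than unpooling or deconvolution layers.
  • The network outputs a set of heatmaps, one per joint.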
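Two of the points above, nearest neighbor upsampling and reading a joint location from a heatmap, can be illustrated with plain Python lists. This is a dependency-free sketch. `upsample_nearest` and `heatmap_peak` are hypothetical helper names, and taking the argmax is only the simplest way to turn a heatmap into a coordinate.

```python
from typing import List, Tuple

Grid = List[List[float]]


def upsample_nearest(grid: Grid, factor: int = 2) -> Grid:
    """Nearest neighbor upsampling: repeat every value `factor` times along both axes."""
    return [[value for value in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def heatmap_peak(heatmap: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the highest response; ties resolve to the first one found."""
    rows, cols = len(heatmap), len(heatmap[0])
    return max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: heatmap[rc[0]][rc[1]])


low_res = [[0.1, 0.9],
           [0.3, 0.2]]
high_res = upsample_nearest(low_res)
for row in high_res:
    print(row)
print(heatmap_peak(high_res))  # (0, 2)
```

Nearest neighbor upsampling has no learned parameters, which is the contrast with deconvolution layers drawn in the list above.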

3、Total network with intermediate supervision
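Intermediate supervision means that every hourglass in the stack produces its own heatmap prediction, and a loss against the same ground truth is applied to each of them. A minimal sketch, assuming a mean squared error loss per stack and a plain sum across stacks (the helper names are hypothetical):

```python
from typing import List, Sequence

Grid = List[List[float]]


def mse(pred: Grid, target: Grid) -> float:
    """Mean squared error between two heatmaps of the same shape."""
    diffs = [(p - t) ** 2
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)


def intermediate_supervision_loss(stack_outputs: Sequence[Grid], target: Grid) -> float:
    """Sum the heatmap loss over the prediction of every hourglass in the stack."""
    return sum(mse(output, target) for output in stack_outputs)


target = [[0.0, 1.0],
          [0.0, 0.0]]
first_stack = [[0.5, 0.5],
               [0.0, 0.0]]   # coarse early prediction
second_stack = [[0.0, 1.0],
                [0.0, 0.0]]  # refined later prediction
total = intermediate_supervision_loss([first_stack, second_stack], target)
print(total)  # 0.125
```

Supervising the early stacks directly gives them a training signal of their own instead of relying only on gradients flowing back from the final output.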

 

三、Convolutional Part Heatmap Regression (ECCV 2016)

Title: Human pose estimation via Convolutional Part Heatmap Regression

Authors: Adrian Bulat and Georgios Tzimiropoulos (University of Nottingham)

Key idea: the method can effectively handle occlusion.

Loss: a pixelwise sigmoid cross-entropy loss for part detection, plus a pixelwise L2 loss for heatmap regression.
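The cross-entropy term can be sketched per pixel as follows. This is an illustration under assumptions: the network is taken to output one logit per pixel, the targets are binary part masks, and the numerically stable form max(z, 0) − z·t + log(1 + e^(−|z|)) is used in place of the textbook expression. The function name is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def pixelwise_sigmoid_cross_entropy(logits: Grid, targets: Grid) -> float:
    """Mean over pixels of the binary cross-entropy between sigmoid(logit) and target."""
    total = 0.0
    count = 0
    for logit_row, target_row in zip(logits, targets):
        for z, t in zip(logit_row, target_row):
            # Stable rewrite of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(z).
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


targets = [[1.0, 0.0]]
uncertain = pixelwise_sigmoid_cross_entropy([[0.0, 0.0]], targets)
confident_right = pixelwise_sigmoid_cross_entropy([[5.0, -5.0]], targets)
confident_wrong = pixelwise_sigmoid_cross_entropy([[-5.0, 5.0]], targets)
print(uncertain, confident_right, confident_wrong)
```

A logit of 0 corresponds to a probability of 0.5, so the uncertain case evaluates to ln 2, while confident correct predictions drive the loss towards 0.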

 

四、FAN (ICCV 2017)

Title: How far are we from solving the 2D & 3D Face Alignment problem?

Authors: Adrian Bulat and Georgios Tzimiropoulos (University of Nottingham)

1、2D FAN

2、2D to 3D FAN

1. The input RGB channels are augmented with 68 additional channels, one for each 2D landmark.

2. The 3D annotations are actually the 2D projections of the 3D coordinates.
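The channel augmentation in point 1 can be sketched as stacking 68 landmark maps onto the 3 RGB channels, giving a 71-channel input. How each landmark is encoded in its channel is an assumption in this example: a 2D Gaussian centered on the landmark, with `sigma` as a free parameter. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_heatmap(height: int, width: int,
                     center: Tuple[float, float], sigma: float = 1.0) -> Grid:
    """One channel with a 2D Gaussian centered on a landmark given as (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def build_augmented_input(rgb: Sequence[Grid],
                          landmarks_2d: Sequence[Tuple[float, float]],
                          sigma: float = 1.0) -> List[Grid]:
    """Stack the RGB channels with one heatmap channel per 2D landmark."""
    height, width = len(rgb[0]), len(rgb[0][0])
    heatmaps = [gaussian_heatmap(height, width, point, sigma) for point in landmarks_2d]
    return list(rgb) + heatmaps


# Toy 8x8 image and 68 dummy landmark positions (not real annotations).
rgb = [[[0.0] * 8 for _ in range(8)] for _ in range(3)]
landmarks = [(i % 8, (i // 8) % 8) for i in range(68)]
stacked = build_augmented_input(rgb, landmarks)
print(len(stacked))  # 71
```

Feeding the 2D landmarks as extra channels lets the second network condition its prediction on the 2D result rather than starting from the raw image alone.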

 

五、CPN (CVPR 2018)

Title: Cascaded Pyramid Network for Multi-Person Pose Estimation

Authors: Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun

Institution: Face++

Performance: winner of the COCO 2017 keypoint detection challenge

Problem: detecting hard keypoints

Key idea: two stages, GlobalNet and RefineNet, built on a feature pyramid

Network:

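1. Human joints are roughly divided into two classes: easy ones and hard ones. Different strategies handle the two classes: the easy keypoints are solved first, then the hard ones. When handling the hard keypoints, the network is given more context information and a larger receptive field.

2. The channels with the largest loss values are dynamically selected for backpropagation. My understanding is that the RefineNet loss pays more attention to the keypoints with large loss values, and these are usually the hard keypoints.

3. GlobalNet: a ResNet-50 backbone whose conv2 to conv5 feature maps form the pyramid.

4. Feature maps are upsampled before the element-wise sum.

5. L2 loss: applied to the heatmaps. L2 loss*: L2 loss with online hard keypoint mining, which backpropagates only through the channels of the hard keypoints.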
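The online hard keypoint mining step can be sketched as: compute the L2 loss per heatmap channel, keep only the `top_k` channels with the largest loss, and average over those. This is a minimal illustration on toy 1×1 heatmaps. The function names and the choice of `top_k` are assumptions for this example, and in a real training loop the unselected channels would simply contribute no gradient.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def heatmap_l2(pred: Grid, target: Grid) -> float:
    """Mean squared error for a single keypoint channel."""
    diffs = [(p - t) ** 2
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)


def ohkm_loss(preds: Sequence[Grid], targets: Sequence[Grid],
              top_k: int) -> Tuple[float, List[int]]:
    """Average the loss over the top_k hardest channels; also return their indices."""
    losses = [heatmap_l2(p, t) for p, t in zip(preds, targets)]
    hardest = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:top_k]
    return sum(losses[i] for i in hardest) / top_k, sorted(hardest)


# Four keypoint channels with toy 1x1 heatmaps; the target response is 1.0 everywhere.
preds = [[[0.0]], [[1.0]], [[0.5]], [[0.75]]]
targets = [[[1.0]] for _ in preds]
loss, hard_channels = ohkm_loss(preds, targets, top_k=2)
print(loss, hard_channels)  # 0.625 [0, 2]
```

Channels 1 and 3 are already well predicted, so they are left out and the update concentrates on channels 0 and 2.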

 

六、Fast Human Pose Estimation (CVPR 2019)

Title: Fast Human Pose Estimation

Authors: Feng Zhang (1), Xiatian Zhu (2), Mao Ye (1)

1. University of Electronic Science and Technology of China

2. Vision Semantics Limited
