This paper is mainly about estimating joint positions.
Its contribution is capturing spatial context across all stages.
Notable sentences
- The magnitude of back-propagated gradients decreases in strength with the number of intermediate layers between the output layer and the input layer.
- Intermediate supervision has the advantage that, even though the full architecture can have many layers, it does not fall prey to the vanishing gradient problem as the intermediate loss functions replenish the gradients at each stage.
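
The two sentences above can be illustrated with a toy calculation. The sketch below is not the paper's architecture: it assumes a scalar chain `h_k = weight * h_{k-1}` with a squared-error loss at each supervised layer, and the function name `activation_grads` and every parameter value are hypothetical. It compares the gradient that reaches the first layer when only the last layer has a loss against the case where a loss is attached every four layers.

```python
def activation_grads(num_layers, weight, x, target, supervised_layers):
    """Return |dL/dh_k| for k = 1..num_layers in a toy scalar chain.

    Assumed model (illustration only): h_k = weight * h_{k-1}, and
    L = sum over supervised layers k of 0.5 * (h_k - target) ** 2.
    """
    # Forward pass: store every activation, h[0] is the input.
    h = [x]
    for _ in range(num_layers):
        h.append(weight * h[-1])

    # Backward pass: walk from the last layer towards the input.
    grads = [0.0] * (num_layers + 1)
    upstream = 0.0  # gradient arriving from layer k + 1
    for k in range(num_layers, 0, -1):
        # A supervised layer injects its own loss gradient locally.
        local = (h[k] - target) if k in supervised_layers else 0.0
        grads[k] = upstream + local
        # Chain rule through h_{k} = weight * h_{k-1}.
        upstream = weight * grads[k]
    return [abs(g) for g in grads[1:]]


NUM_LAYERS = 12
WEIGHT = 0.5  # |weight| < 1, so back-propagated gradients shrink per layer

# Loss only at the output layer.
final_only = activation_grads(NUM_LAYERS, WEIGHT, 1.0, 1.0, {12})
# Loss at the end of each "stage" (every 4 layers here, an arbitrary choice).
intermediate = activation_grads(NUM_LAYERS, WEIGHT, 1.0, 1.0, {4, 8, 12})

print("layer 1 gradient, final loss only:       ", final_only[0])
print("layer 1 gradient, intermediate supervision:", intermediate[0])
```

With the final loss alone, the gradient at layer 1 has been multiplied by `weight` eleven times before it arrives. With intermediate losses, the nearest supervised layer is only three steps away, so the early layers receive a much larger signal.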