Log-Sum-Exp Pooling
Papers
From Image-level to Pixel-level Labeling with Convolutional Networks
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
LSE Pooling
Before reading these two papers, the pooling operations I knew were Max Pooling and Average Pooling. Both papers instead use Log-Sum-Exp (LSE) Pooling, which is defined as:
$$
x_p=\frac{1}{r}\cdot\log\left[\frac{1}{S}\cdot\sum_{(i,j)\in\mathbf{S}}\exp(r\cdot x_{ij})\right]
$$
Here $x_{ij}$ is the activation at position $(i,j)$, $(i,j)$ is a point in the pooling region $\mathbf{S}$, $S=s\times s$ is the total number of points in $\mathbf{S}$, and $r$ is a hyperparameter.
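The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper: the function name `lse_pool` and the sample values are made up here, and subtracting the maximum before exponentiating is an added numerical-stability choice that does not change the result algebraically.

```python
import math

def lse_pool(values, r):
    """Log-Sum-Exp pooling of a flat sequence of activations.

    Computes (1/r) * log((1/n) * sum(exp(r * x))) for r > 0.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    values = list(values)
    n = len(values)
    # Assumption for stability: shift by the maximum so exp() cannot overflow.
    # (1/r) * log(mean(exp(r * x))) == m + (1/r) * log(mean(exp(r * (x - m))))
    m = max(values)
    total = sum(math.exp(r * (x - m)) for x in values)
    return m + math.log(total / n) / r

region = [0.1, 0.5, 0.2, 0.9]  # a hypothetical 2x2 pooling region, flattened
print(lse_pool(region, r=5.0))
```

The pooled value always lies between the smallest and largest activation in the region, and the shift keeps the function usable even for large activations or large `r`.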
In the first paper, the authors describe the role of LSE Pooling as follows:
The hyper-parameter r controls how smooth one wants the approximation to be: high r values implies having an effect similar to the max, very low values will have an effect similar to the score averaging. The advantage of this aggregation is that pixels having similar scores will have a similar weight in the training procedure, r controlling this notion of “similarity”.
In the second paper, the authors describe it as follows:
By controlling the hyper-parameter, r, the pooled value ranges from the maximum in S (when $r\to\infty$) to average ($r\to 0$).
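Both quotes can be checked numerically by sweeping $r$ over several orders of magnitude. The sketch below is only an illustration: the helper `lse_pool`, the sample region, and the chosen values of $r$ are all assumptions made for this example.

```python
import math

def lse_pool(values, r):
    # Illustrative helper: (1/r) * log(mean(exp(r * x))), shifted by the max for stability.
    m = max(values)
    total = sum(math.exp(r * (x - m)) for x in values)
    return m + math.log(total / len(values)) / r

region = [0.1, 0.5, 0.2, 0.9]  # hypothetical activations
mean_value = sum(region) / len(region)
max_value = max(region)

# Small r should land near the mean, large r near the max.
for r in (1e-3, 0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"r={r:>8}: {lse_pool(region, r):.4f}")
print(f"mean={mean_value:.4f}, max={max_value:.4f}")
```

The printed values should move monotonically from roughly the mean toward the max as $r$ grows.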
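The figure below gives an intuitive picture:
Mathematical Proof
As a rigorous university student, I can't stop at an intuitive picture, so let's work through the mathematical proof!
Before the proof, it helps to simplify the expression by flattening the pooling region into $n$ values $x_1,\dots,x_n$: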
$$
x_p=\frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right]
$$
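The single-index form is what a windowed implementation effectively computes once each window is flattened. The sketch below assumes non-overlapping $s\times s$ windows over a 2D list. That layout, the function names `lse_pool` and `lse_pool_2d`, and the sample feature map are illustrative choices, not something specified by the papers.

```python
import math

def lse_pool(values, r):
    # Illustrative helper: (1/r) * log(mean(exp(r * x))), shifted by the max for stability.
    m = max(values)
    total = sum(math.exp(r * (x - m)) for x in values)
    return m + math.log(total / len(values)) / r

def lse_pool_2d(feature_map, s, r):
    """Pool non-overlapping s x s windows of a 2D list of activations."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - s + 1, s):
        pooled_row = []
        for left in range(0, cols - s + 1, s):
            # Flatten the window: the double index (i, j) becomes a single index i,
            # so the window holds n = s * s values.
            window = [feature_map[top + di][left + dj]
                      for di in range(s) for dj in range(s)]
            pooled_row.append(lse_pool(window, r))
        pooled.append(pooled_row)
    return pooled

feature_map = [  # hypothetical 4x4 activations
    [0.1, 0.5, 0.3, 0.3],
    [0.2, 0.9, 0.3, 0.3],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0, 4.0],
]
print(lse_pool_2d(feature_map, s=2, r=5.0))
```

A window whose activations are all equal pools to that same value for any $r$, which is a quick sanity check on the flattening.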
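Proof that $r\to 0$ corresponds to Average Pooling
First, we need the inequality of arithmetic and geometric means (AM-GM) for non-negative numbers $a_1,\dots,a_n$: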
$$
\frac{a_1+a_2+\dots+a_n}{n}\ge\sqrt[n]{a_1\cdot a_2\cdots a_n}
$$
with equality if and only if $a_1=a_2=\dots=a_n$.
Next, move the factor $\frac{1}{r}$ inside the logarithm:

$$
\begin{aligned}
x_p &= \frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right] \\
&= \log\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}
\end{aligned}
$$
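Applying AM-GM with $a_i=e^{r\cdot x_i}$ and raising both sides to the power $\frac{1}{r}$ (which preserves the inequality for $r>0$):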
$$
\begin{aligned}
\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}
&\ge \left(\prod_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{n}\cdot\frac{1}{r}} \\
&= \left(\prod_{i=1}^{n}e^{x_i}\right)^{\frac{1}{n}}
\end{aligned}
$$
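Equality requires all $e^{r\cdot x_i}$ to be equal, which for unequal $x_i$ happens only in the limit $r\to 0$ (at $r=0$ itself the exponent $\frac{1}{r}$ is undefined). Substituting into the full expression, for $r>0$: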
$$
\begin{aligned}
x_p
&= \log\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} \\
&\ge \log\left(\prod_{i=1}^{n}e^{x_i}\right)^{\frac{1}{n}} \\
&= \frac{1}{n}\sum_{i=1}^{n}x_i
\end{aligned}
$$
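So for every $r>0$ the average is a lower bound on $x_p$. That the bound is attained in the limit follows from L'Hôpital's rule, since both $\log\left(\frac{1}{n}\sum_{i=1}^{n}e^{r\cdot x_i}\right)$ and $r$ tend to $0$ as $r\to 0$:

$$
\lim_{r\to 0}x_p=\lim_{r\to 0}\frac{\sum_{i=1}^{n}x_i\,e^{r\cdot x_i}}{\sum_{i=1}^{n}e^{r\cdot x_i}}=\frac{1}{n}\sum_{i=1}^{n}x_i
$$

This proves that $r\to 0$ corresponds to Average Pooling.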
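The lower bound and the limit can both be checked numerically. The sketch below is a sanity check under assumed sample values, not part of the proof: `lse_pool` is an illustrative helper, and the gap $x_p-\frac{1}{n}\sum_i x_i$ should stay non-negative and shrink as $r$ decreases.

```python
import math

def lse_pool(values, r):
    # Illustrative helper: (1/r) * log(mean(exp(r * x))), shifted by the max for stability.
    m = max(values)
    total = sum(math.exp(r * (x - m)) for x in values)
    return m + math.log(total / len(values)) / r

values = [0.1, 0.5, 0.2, 0.9]  # hypothetical activations
mean_value = sum(values) / len(values)

for r in (1.0, 0.1, 0.01, 0.001):
    gap = lse_pool(values, r) - mean_value
    # The gap is non-negative (AM-GM lower bound) and shrinks as r -> 0.
    print(f"r={r}: x_p - mean = {gap:.6f}")
```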
Proof that $r\to\infty$ corresponds to Max Pooling
$$
\begin{aligned}
x_p
&= \frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right] \\
&= \log\left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}-\frac{1}{r}\cdot\log(n)
\end{aligned}
$$
Since $r>0$, we have:
$$
\left[\max_i\left(e^{r\cdot x_i}\right)\right]^{\frac{1}{r}}\le\left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}\le\left[n\cdot\max_i\left(e^{r\cdot x_i}\right)\right]^{\frac{1}{r}}
$$
Taking the logarithm of each term gives:
$$
\max_i(x_i)\le\log\left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}\le\frac{1}{r}\cdot\log(n)+\max_i(x_i)
$$
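Subtracting $\frac{1}{r}\cdot\log(n)$ from each term gives $\max_i(x_i)-\frac{1}{r}\cdot\log(n)\le x_p\le\max_i(x_i)$. As $r\to\infty$ we have $\frac{1}{r}\cdot\log(n)\to 0$, so $x_p\to\max_i(x_i)$ by the squeeze theorem. This proves that $r\to\infty$ corresponds to Max Pooling.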
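The bounds used in this argument imply that $x_p$ sits between $\max_i(x_i)-\frac{1}{r}\cdot\log(n)$ and $\max_i(x_i)$ for any $r>0$. The sketch below checks this on assumed sample values. The helper `lse_pool` and the chosen values of $r$ are illustrative only.

```python
import math

def lse_pool(values, r):
    # Illustrative helper: (1/r) * log(mean(exp(r * x))), shifted by the max for stability.
    m = max(values)
    total = sum(math.exp(r * (x - m)) for x in values)
    return m + math.log(total / len(values)) / r

values = [0.1, 0.5, 0.2, 0.9]  # hypothetical activations
n = len(values)
max_value = max(values)

for r in (1.0, 10.0, 100.0, 1000.0):
    lower = max_value - math.log(n) / r  # lower bound, tightens as r grows
    x_p = lse_pool(values, r)
    print(f"r={r}: {lower:.5f} <= {x_p:.5f} <= {max_value:.5f}")
```

The width of the interval is $\frac{1}{r}\cdot\log(n)$, so larger pooling regions need a larger $r$ to get equally close to the max.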