What is TensorFlow's padding actually doing?
When designing networks, I used to just roughly work out the input/output shapes of the feature maps. Padding generally comes in two modes, 'valid' and 'same', and I never remembered the exact computation for each. Then, while converting a model to Caffe, I discovered that the two frameworks pad differently, so I wanted to figure out exactly how TensorFlow pads. The blog posts online all give different formulas, so I decided to read the source code myself.
First, let's experiment by calling the API directly; that is the most intuitive starting point.
```python
import numpy as np
import tensorflow as tf

test = np.random.random([1, 10, 10, 3])
valid_pad = tf.layers.average_pooling2d(test, pool_size=[3, 3], strides=3,
                                        padding='valid')
same_pad = tf.layers.average_pooling2d(test, pool_size=[3, 3], strides=3,
                                       padding='same')
# Printing the bound method (not calling it) shows the tensor repr,
# which includes the static shape.
print(valid_pad.get_shape)
print(same_pad.get_shape)
```

```
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_11/AvgPool:0' shape=(1, 3, 3, 3) dtype=float64>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_12/AvgPool:0' shape=(1, 4, 4, 3) dtype=float64>>
```
So 'valid' and 'same' really do give different shapes. How, then, is the output length computed for each padding mode? Here is the relevant helper from the TensorFlow (Keras) source:
```python
def conv_output_length(input_length, filter_size, padding, stride, dilation=1):
    """Determines output length of a convolution given input length.

    Arguments:
        input_length: integer.
        filter_size: integer.
        padding: one of "same", "valid", "full", "causal"
        stride: integer.
        dilation: dilation rate, integer.

    Returns:
        The output length (integer).
    """
    if input_length is None:
        return None
    assert padding in {'same', 'valid', 'full', 'causal'}
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding in ['same', 'causal']:
        output_length = input_length
    elif padding == 'valid':
        output_length = input_length - dilated_filter_size + 1
    elif padding == 'full':
        output_length = input_length + dilated_filter_size - 1
    return (output_length + stride - 1) // stride
```
When dilated convolutions are not involved (dilation = 1), we have

dilated_filter_size = filter_size

So in 'same' mode the computation is

output_length = input_length
res = (output_length + stride - 1) // stride

Substituting input_length = 10, filter_size = 3, stride = 3 from the example above:

(10 + 3 - 1) // 3 = 4
And in 'valid' mode?

output_length = input_length - dilated_filter_size + 1
res = (output_length + stride - 1) // stride

Substituting input_length = 10, filter_size = 3, stride = 3:

(10 - 3 + 1 + 3 - 1) // 3 = 3
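To double-check this arithmetic without running TensorFlow at all, the helper can be exercised directly. The function is re-stated here (simplified to the non-dilated cases used in this post) so the snippet runs on its own:

```python
def conv_output_length(input_length, filter_size, padding, stride, dilation=1):
    """Output length of a conv/pool layer; same logic as the Keras helper above."""
    if input_length is None:
        return None
    dilated = filter_size + (filter_size - 1) * (dilation - 1)
    if padding in ('same', 'causal'):
        length = input_length
    elif padding == 'valid':
        length = input_length - dilated + 1
    else:  # 'full'
        length = input_length + dilated - 1
    return (length + stride - 1) // stride

# First experiment: input length 10, pool 3, stride 3
print(conv_output_length(10, 3, 'valid', 3))  # 3
print(conv_output_length(10, 3, 'same', 3))   # 4
```

These match the static shapes (1, 3, 3, 3) and (1, 4, 4, 3) printed by TensorFlow above.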
Let's try another example:
```python
test = np.random.random([1, 127, 127, 3])
valid_pad = tf.layers.average_pooling2d(test, pool_size=[5, 5], strides=3,
                                        padding='valid')
same_pad = tf.layers.average_pooling2d(test, pool_size=[5, 5], strides=3,
                                       padding='same')
print(valid_pad.get_shape)
print(same_pad.get_shape)
```

```
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_13/AvgPool:0' shape=(1, 41, 41, 3) dtype=float64>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_14/AvgPool:0' shape=(1, 43, 43, 3) dtype=float64>>
```
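The same arithmetic, checked directly for this second experiment (input length 127, pool 5, stride 3):

```python
input_length, pool, stride = 127, 5, 3

# 'valid': output_length = input_length - pool + 1, then ceil-divide by stride
valid_out = (input_length - pool + 1 + stride - 1) // stride
# 'same': output_length = input_length, then ceil-divide by stride
same_out = (input_length + stride - 1) // stride

print(valid_out, same_out)  # 41 43
```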
You can plug these numbers into the formulas above and confirm that the answers match.
I prefer to let the facts speak for themselves. We can now compute the output dimensions, but concretely, how does 'same' actually pad the feature map?
```python
test = np.random.randint(1, 5, [1, 5, 5, 1]).astype(np.float32)
print(np.squeeze(test))
valid_pad = tf.layers.average_pooling2d(test, pool_size=[2, 2], strides=2,
                                        padding='valid')
same_pad_1 = tf.layers.average_pooling2d(test, pool_size=[2, 2], strides=2,
                                         padding='same')
same_pad_0 = tf.layers.average_pooling2d(test, pool_size=[2, 2], strides=3,
                                         padding='same')
same_pad_2 = tf.layers.average_pooling2d(test, pool_size=[3, 3], strides=4,
                                         padding='same')
same_pad_3 = tf.layers.average_pooling2d(test, pool_size=[4, 4], strides=4,
                                         padding='same')
print(valid_pad.get_shape)
print(same_pad_0.get_shape)
print(same_pad_1.get_shape)
print(same_pad_2.get_shape)
print(same_pad_3.get_shape)
sess = tf.Session()
print("valid_pad: ", np.squeeze(sess.run(valid_pad)))
print("same_pad_0: ", np.squeeze(sess.run(same_pad_0)))
print("same_pad_1: ", np.squeeze(sess.run(same_pad_1)))
print("same_pad_2: ", np.squeeze(sess.run(same_pad_2)))
print("same_pad_3: ", np.squeeze(sess.run(same_pad_3)))
```
```
[[4. 2. 1. 1. 3.]
 [4. 3. 4. 1. 2.]
 [1. 1. 3. 1. 3.]
 [4. 4. 4. 4. 1.]
 [4. 4. 2. 4. 4.]]
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_22/AvgPool:0' shape=(1, 2, 2, 1) dtype=float32>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_24/AvgPool:0' shape=(1, 2, 2, 1) dtype=float32>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_23/AvgPool:0' shape=(1, 3, 3, 1) dtype=float32>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_25/AvgPool:0' shape=(1, 2, 2, 1) dtype=float32>>
<bound method Tensor.get_shape of <tf.Tensor 'average_pooling2d_26/AvgPool:0' shape=(1, 2, 2, 1) dtype=float32>>
valid_pad:  [[3.25 1.75]
 [2.5  3.  ]]
same_pad_0:  [[3.25 1.75]
 [4.   3.25]]
same_pad_1:  [[3.25 1.75 2.5 ]
 [2.5  3.   2.  ]
 [4.   3.   4.  ]]
same_pad_2:  [[3.25 1.75]
 [4.   3.25]]
same_pad_3:  [[2.5555556 1.8333334]
 [3.6666667 3.25     ]]
```
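The `same_pad_3` numbers reveal two things worth verifying by hand with NumPy: where the padding goes, and that TensorFlow's average pooling divides only by the number of real (non-padded) elements in each window, not by the full kernel size. A sketch, assuming the 5×5 input printed above:

```python
import numpy as np

# The 5x5 input printed above
x = np.array([[4., 2., 1., 1., 3.],
              [4., 3., 4., 1., 2.],
              [1., 1., 3., 1., 3.],
              [4., 4., 4., 4., 1.],
              [4., 4., 2., 4., 4.]])

# same_pad_3: pool 4x4, stride 4, output 2x2.
# Total padding per dimension = (2 - 1) * 4 + 4 - 5 = 3,
# split as 1 before (top/left) and 2 after (bottom/right).
# The top-left 4x4 window therefore covers rows -1..2, cols -1..2;
# only the 3x3 block x[0:3, 0:3] is real data, so TF averages over 9
# elements, not 16.
top_left = x[0:3, 0:3].sum() / 9
print(round(top_left, 7))  # 2.5555556, matching same_pad_3[0][0]
```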
Now let me summarize what TensorFlow's padding actually does.
‘VALID’
'valid' is simple: it drops data. If the kernel would extend past the part of the feature map that has not yet been covered, that remainder is simply discarded. This loses information, so 'same' is usually preferable.
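The dropping behavior can be sketched in a few lines. For the first experiment (input length 10, pool 3, stride 3), the windows start at 0, 3, 6 and must fit entirely inside the input, so the last column is never touched:

```python
# VALID: windows start at 0, stride, 2*stride, ... and must fit entirely.
input_length, pool, stride = 10, 3, 3
starts = list(range(0, input_length - pool + 1, stride))
print(starts)  # [0, 3, 6] -> 3 windows, matching the 'valid' output length

covered = max(starts) + pool
print(input_length - covered)  # 1 -> one column of data is simply discarded
```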
‘SAME’
'same' uses every element of the feature map, zero-padding around it as needed. Per dimension:

If the total padding is 1, one column/row of zeros is added on the right and bottom.
If the total padding is 2, one column/row of zeros is added on each of the four sides.
If the total padding is 3, one row/column is added on the top and left, and two on the bottom and right.
…
In general, if the total padding is even, it is split evenly between the two sides; if it is odd, the right and bottom get one more row/column of zeros than the left and top.
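The rule above can be written out as a small helper. This is a sketch of the computation TensorFlow performs per dimension for SAME padding (output length is the ceiling of input/stride, and the "before" side gets the smaller half):

```python
def same_padding(input_length, filter_size, stride):
    """Zero padding TF applies for SAME, per dimension: (before, after)."""
    output_length = (input_length + stride - 1) // stride  # ceil(in / stride)
    pad_total = max((output_length - 1) * stride + filter_size - input_length, 0)
    pad_before = pad_total // 2          # top / left  (the smaller half)
    pad_after = pad_total - pad_before   # bottom / right (gets the extra 1)
    return pad_before, pad_after

# Examples from this post:
print(same_padding(5, 2, 2))   # (0, 1): one row/column of zeros, right/bottom
print(same_padding(5, 4, 4))   # (1, 2): one before, two after (same_pad_3)
print(same_padding(10, 3, 3))  # (1, 1): even total padding, split evenly
```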
Blog posts online are just personal interpretations. If you really want to understand this, try it yourself!