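This series on the Deformable Convolutional Networks paper has five parts; this is Part 5:

[ ] Part1: A quick introduction to implementing affine transformations
[ ] Part2: Spatial Transformer Networks paper walkthrough
[ ] Part3: Implementing STN in TensorFlow
[ ] Part4: Deformable Convolutional Networks paper walkthrough
[x] Part5: Implementing Deformable ConvNets in TensorFlow

This part explains how to implement Deformable ConvNets in TensorFlow.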

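Introduction to Deformable Convolution

The code released with the paper is written in MXNet; this blog uses a simplified TensorFlow/Keras implementation.

Known issues with this TensorFlow version:

- The forward pass is slow: in the example model below, a step with the deformable convolution layers takes roughly 240 ms, while the regular CNN needs less than 10 ms.
- It only implements the deformable convolution layer; there is no deformable RoI pooling layer.
- It is built on Keras; the original MXNet implementation is much faster.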

Related resources:

Illustration on the deformed MNIST dataset (figure not reproduced here).

A first look at Deformable Conv

Download the code:

git clone https://github.com/felixlaumon/deform-conv.git

Change into the cloned directory and create a new Jupyter notebook there:

Imports:

from __future__ import division
# %env CUDA_VISIBLE_DEVICES=0

import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.models import Model
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam, SGD
from deform_conv.layers import ConvOffset2D
from deform_conv.callbacks import TensorBoard
from deform_conv.cnn import get_cnn, get_deform_cnn
from deform_conv.mnist import get_gen
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
K.set_session(sess)

# Configure the training and test data
batch_size = 32
n_train = 60000
n_test = 10000
steps_per_epoch = int(np.ceil(n_train / batch_size))
validation_steps = int(np.ceil(n_test / batch_size))

# Regular MNIST training set
train_gen = get_gen(
    'train', batch_size=batch_size,
    scale=(1.0, 1.0), translate=0.0,
    shuffle=True
)

# Regular MNIST test set
test_gen = get_gen(
    'test', batch_size=batch_size,
    scale=(1.0, 1.0), translate=0.0,
    shuffle=False
)

# Deformed (scaled and translated) MNIST training set
train_scaled_gen = get_gen(
    'train', batch_size=batch_size,
    scale=(1.0, 2.5), translate=0.2,
    shuffle=True
)
# Deformed (scaled and translated) MNIST test set
test_scaled_gen = get_gen(
    'test', batch_size=batch_size,
    scale=(1.0, 2.5), translate=0.2,
    shuffle=False
)
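
As a quick check of the step arithmetic above, the sketch below recomputes `steps_per_epoch` and `validation_steps` using only the standard library. The helper name `steps_for` is made up for illustration; the sample counts and batch size are the ones configured above.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return int(math.ceil(n_samples / batch_size))


# 60000 / 32 divides evenly, so every training batch is full.
steps_per_epoch = steps_for(60000, 32)
# 10000 / 32 = 312.5, so there are 312 full batches plus one batch of 16.
validation_steps = steps_for(10000, 32)

print(steps_per_epoch, validation_steps)
```

The first value is the same 1875 that shows up as `1875/1875` in the Keras progress bar during training.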

Regular CNN model

Train the regular CNN model:

inputs, outputs = get_cnn()
model = Model(inputs=inputs, outputs=outputs)
model.summary()  # print the network structure
optim = Adam(1e-3)
# optim = SGD(1e-3, momentum=0.99, nesterov=True)
loss = categorical_crossentropy
model.compile(optim, loss, metrics=['accuracy'])

model.fit_generator(
    train_gen, steps_per_epoch=steps_per_epoch,
    epochs=10, verbose=1,
    validation_data=test_gen, validation_steps=validation_steps
)
model.save_weights('models/cnn.h5')

Model structure:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 28, 28, 1)         0         
_________________________________________________________________
conv11 (Conv2D)              (None, 28, 28, 32)        320       
_________________________________________________________________
conv11_relu (Activation)     (None, 28, 28, 32)        0         
_________________________________________________________________
conv11_bn (BatchNormalizatio (None, 28, 28, 32)        128       
_________________________________________________________________
conv12 (Conv2D)              (None, 14, 14, 64)        18496     
_________________________________________________________________
conv12_relu (Activation)     (None, 14, 14, 64)        0         
_________________________________________________________________
conv12_bn (BatchNormalizatio (None, 14, 14, 64)        256       
_________________________________________________________________
conv21 (Conv2D)              (None, 14, 14, 128)       73856     
_________________________________________________________________
conv21_relu (Activation)     (None, 14, 14, 128)       0         
_________________________________________________________________
conv21_bn (BatchNormalizatio (None, 14, 14, 128)       512       
_________________________________________________________________
conv22 (Conv2D)              (None, 7, 7, 128)         147584    
_________________________________________________________________
conv22_relu (Activation)     (None, 7, 7, 128)         0         
_________________________________________________________________
conv22_bn (BatchNormalizatio (None, 7, 7, 128)         512       
_________________________________________________________________
avg_pool (GlobalAveragePooli (None, 128)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 10)                1290      
_________________________________________________________________
out (Activation)             (None, 10)                0         
=================================================================
Total params: 242,954
Trainable params: 242,250
Non-trainable params: 704
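
The parameter counts in this summary can be reproduced by hand. The sketch below is plain Python with made-up helper names (`conv2d_params`, `batchnorm_params`, `dense_params`). It assumes the usual Keras accounting: a bias per convolution filter, and four vectors per BatchNormalization layer, of which the moving mean and variance are non-trainable.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Weights of a k x k convolution, plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def batchnorm_params(c):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * c


def dense_params(n_in, n_out):
    """Fully connected weights plus one bias per output unit."""
    return n_in * n_out + n_out


# conv11, conv12, conv21, conv22 from the summary above
convs = [conv2d_params(3, 1, 32), conv2d_params(3, 32, 64),
         conv2d_params(3, 64, 128), conv2d_params(3, 128, 128)]
bns = [batchnorm_params(c) for c in (32, 64, 128, 128)]
fc = dense_params(128, 10)

total = sum(convs) + sum(bns) + fc
non_trainable = sum(bns) // 2          # only the moving statistics
trainable = total - non_trainable

print(convs, bns, fc)
print(total, trainable, non_trainable)
```

The three totals agree with the `Total params`, `Trainable params` and `Non-trainable params` lines printed by `model.summary()`.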

Evaluate the regular CNN on both the regular and the deformed MNIST datasets:

# ---
# Evaluate normal CNN

model.load_weights('models/cnn.h5', by_name=True)

val_loss, val_acc = model.evaluate_generator(
    test_gen, steps=validation_steps
)
print('Test accuracy', val_acc)
# 0.9874

val_loss, val_acc = model.evaluate_generator(
    test_scaled_gen, steps=validation_steps
)
print('Test accuracy with scaled images', val_acc)
# 0.5701

Test results:

Test accuracy 0.9884
Test accuracy with scaled images 0.577

Deform-Conv model with deformable layers

Train the model. Note that this is a fine-tune on top of the regular CNN trained above:

# ---
# Deformable CNN

inputs, outputs = get_deform_cnn(trainable=False)
model = Model(inputs=inputs, outputs=outputs)
model.load_weights('models/cnn.h5', by_name=True)
model.summary()
optim = Adam(5e-4)
# optim = SGD(1e-4, momentum=0.99, nesterov=True)
loss = categorical_crossentropy
model.compile(optim, loss, metrics=['accuracy'])

model.fit_generator(
    train_scaled_gen, steps_per_epoch=steps_per_epoch,
    epochs=20, verbose=1,
    validation_data=test_scaled_gen, validation_steps=validation_steps
)
# Epoch 20/20
# 1875/1875 [==============================] - 442s 236ms/step - loss: 0.2554 - acc: 0.9203 - val_loss: 0.2030 - val_acc: 0.9357
model.save_weights('models/deform_cnn.h5')

Network structure and training log:

Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 28, 28, 1)         0         
_________________________________________________________________
conv11 (Conv2D)              (None, 28, 28, 32)        320       
_________________________________________________________________
conv11_relu (Activation)     (None, 28, 28, 32)        0         
_________________________________________________________________
conv11_bn (BatchNormalizatio (None, 28, 28, 32)        128       
_________________________________________________________________
conv12_offset (ConvOffset2D) (None, 28, 28, 32)        18432     
_________________________________________________________________
conv12 (Conv2D)              (None, 14, 14, 64)        18496     
_________________________________________________________________
conv12_relu (Activation)     (None, 14, 14, 64)        0         
_________________________________________________________________
conv12_bn (BatchNormalizatio (None, 14, 14, 64)        256       
_________________________________________________________________
conv21_offset (ConvOffset2D) (None, 14, 14, 64)        73728     
_________________________________________________________________
conv21 (Conv2D)              (None, 14, 14, 128)       73856     
_________________________________________________________________
conv21_relu (Activation)     (None, 14, 14, 128)       0         
_________________________________________________________________
conv21_bn (BatchNormalizatio (None, 14, 14, 128)       512       
_________________________________________________________________
conv22_offset (ConvOffset2D) (None, 14, 14, 128)       294912    
_________________________________________________________________
conv22 (Conv2D)              (None, 7, 7, 128)         147584    
_________________________________________________________________
conv22_relu (Activation)     (None, 7, 7, 128)         0         
_________________________________________________________________
conv22_bn (BatchNormalizatio (None, 7, 7, 128)         512       
_________________________________________________________________
avg_pool (GlobalAveragePooli (None, 128)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 10)                1290      
_________________________________________________________________
out (Activation)             (None, 10)                0         
=================================================================
Total params: 630,026
Trainable params: 387,776
Non-trainable params: 242,250

Epoch 1/20
1875/1875 [==============================] - 397s 212ms/step - loss: 0.3851 - acc: 0.8873 - val_loss: 0.2935 - val_acc: 0.9102
Epoch 2/20
1875/1875 [==============================] - 281s 150ms/step - loss: 0.3454 - acc: 0.8971 - val_loss: 0.2775 - val_acc: 0.9123
Epoch 3/20
1875/1875 [==============================] - 316s 169ms/step - loss: 0.3299 - acc: 0.8994 - val_loss: 0.2838 - val_acc: 0.9127
Epoch 4/20
1875/1875 [==============================] - 348s 186ms/step - loss: 0.3299 - acc: 0.8994 - val_loss: 0.2839 - val_acc: 0.9120
Epoch 5/20
1875/1875 [==============================] - 372s 198ms/step - loss: 0.3198 - acc: 0.9014 - val_loss: 0.2781 - val_acc: 0.9149
Epoch 6/20
1875/1875 [==============================] - 378s 202ms/step - loss: 0.3057 - acc: 0.9040 - val_loss: 0.2475 - val_acc: 0.9243
Epoch 7/20
1875/1875 [==============================] - 468s 250ms/step - loss: 0.2942 - acc: 0.9076 - val_loss: 0.2487 - val_acc: 0.9234
Epoch 8/20
1875/1875 [==============================] - 469s 250ms/step - loss: 0.2917 - acc: 0.9085 - val_loss: 0.2448 - val_acc: 0.9211
Epoch 9/20
1875/1875 [==============================] - 442s 236ms/step - loss: 0.2936 - acc: 0.9075 - val_loss: 0.2383 - val_acc: 0.9248
Epoch 10/20
1875/1875 [==============================] - 431s 230ms/step - loss: 0.2928 - acc: 0.9079 - val_loss: 0.2516 - val_acc: 0.9208
Epoch 11/20
1875/1875 [==============================] - 458s 244ms/step - loss: 0.2886 - acc: 0.9089 - val_loss: 0.2347 - val_acc: 0.9262
Epoch 12/20
1875/1875 [==============================] - 434s 231ms/step - loss: 0.2830 - acc: 0.9099 - val_loss: 0.2342 - val_acc: 0.9253
Epoch 13/20
1875/1875 [==============================] - 453s 242ms/step - loss: 0.2745 - acc: 0.9127 - val_loss: 0.2308 - val_acc: 0.9257
Epoch 14/20
1875/1875 [==============================] - 449s 240ms/step - loss: 0.2795 - acc: 0.9124 - val_loss: 0.2279 - val_acc: 0.9287
Epoch 15/20
1875/1875 [==============================] - 458s 244ms/step - loss: 0.2709 - acc: 0.9139 - val_loss: 0.2338 - val_acc: 0.9288
Epoch 16/20
1875/1875 [==============================] - 422s 225ms/step - loss: 0.2767 - acc: 0.9116 - val_loss: 0.2145 - val_acc: 0.9286
Epoch 17/20
1875/1875 [==============================] - 364s 194ms/step - loss: 0.2663 - acc: 0.9160 - val_loss: 0.2259 - val_acc: 0.9302
Epoch 18/20
1875/1875 [==============================] - 366s 195ms/step - loss: 0.2665 - acc: 0.9162 - val_loss: 0.2118 - val_acc: 0.9325
Epoch 19/20
1875/1875 [==============================] - 403s 215ms/step - loss: 0.2634 - acc: 0.9168 - val_loss: 0.2204 - val_acc: 0.9309
Epoch 20/20
1875/1875 [==============================] - 442s 236ms/step - loss: 0.2554 - acc: 0.9203 - val_loss: 0.2030 - val_acc: 0.9357
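
The extra parameters in the deformable model's summary come entirely from the three `ConvOffset2D` layers. The sketch below is a hand calculation with a made-up helper name, `offset_conv_params`. It assumes each offset layer is a bias-free 3x3 convolution from `c` to `2c` channels, as the class definition shows, and that the BatchNormalization layers stay trainable because `get_deform_cnn` does not pass `trainable` to them.

```python
def offset_conv_params(channels, k=3):
    """ConvOffset2D: k x k convolution from `channels` to 2 * `channels` maps, no bias."""
    return k * k * channels * (2 * channels)


# conv12_offset, conv21_offset, conv22_offset
offsets = [offset_conv_params(c) for c in (32, 64, 128)]

base_total = 242954                        # total of the regular CNN, from its summary
bn_total = 4 * (32 + 64 + 128 + 128)       # four BatchNormalization layers
bn_trainable = bn_total // 2               # gamma and beta

total = base_total + sum(offsets)
# With trainable=False the Conv2D and Dense layers are frozen, so only the
# offset layers and the BatchNormalization scale/shift parameters are trained.
trainable = sum(offsets) + bn_trainable
non_trainable = total - trainable

print(offsets)
print(total, trainable, non_trainable)
```

So fine-tuning here updates the offset convolutions (and the batch-norm scale and shift) while reusing the frozen convolution and dense weights from the regular CNN.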

Effect of deformable convolution:

# --
# Evaluate deformable CNN

model.load_weights('models/deform_cnn.h5')

val_loss, val_acc = model.evaluate_generator(
    test_scaled_gen, steps=validation_steps
)
print('Test accuracy of deformable convolution with scaled images', val_acc)
# 0.9255

val_loss, val_acc = model.evaluate_generator(
    test_gen, steps=validation_steps
)
print('Test accuracy of deformable convolution with regular images', val_acc)
# 0.9727

Output:

Test accuracy of deformable convolution with scaled images 0.9323
Test accuracy of deformable convolution with regular images 0.9016

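Comparison with the regular CNN:

Model	Regular MNIST	Deformed MNIST
Regular CNN	0.9884	0.577
Deformable CNN	0.9016	0.9323

Accuracy on regular MNIST drops noticeably (from 0.9884 to 0.9016). The model gained three deformable layers and was fine-tuned only on deformed MNIST; in practice it could also be fine-tuned on regular MNIST to adapt to it.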

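Analysis of the deformable convolution model

The two models used above, the regular CNN and the deformable CNN, come from deform-conv/deform_conv/cnn.py.

The code is as follows: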

def get_cnn():
    inputs = l = Input((28, 28, 1), name='input')

    # conv11
    l = Conv2D(32, (3, 3), padding='same', name='conv11')(l)
    l = Activation('relu', name='conv11_relu')(l)
    l = BatchNormalization(name='conv11_bn')(l)

    # conv12
    l = Conv2D(64, (3, 3), padding='same', strides=(2, 2), name='conv12')(l)
    l = Activation('relu', name='conv12_relu')(l)
    l = BatchNormalization(name='conv12_bn')(l)

    # conv21
    l = Conv2D(128, (3, 3), padding='same', name='conv21')(l)
    l = Activation('relu', name='conv21_relu')(l)
    l = BatchNormalization(name='conv21_bn')(l)

    # conv22
    l = Conv2D(128, (3, 3), padding='same', strides=(2, 2), name='conv22')(l)
    l = Activation('relu', name='conv22_relu')(l)
    l = BatchNormalization(name='conv22_bn')(l)

    # out
    l = GlobalAvgPool2D(name='avg_pool')(l)
    l = Dense(10, name='fc1')(l)
    outputs = l = Activation('softmax', name='out')(l)

    return inputs, outputs


def get_deform_cnn(trainable):
    inputs = l = Input((28, 28, 1), name='input')

    # conv11
    l = Conv2D(32, (3, 3), padding='same', name='conv11', trainable=trainable)(l)
    l = Activation('relu', name='conv11_relu')(l)
    l = BatchNormalization(name='conv11_bn')(l)

    # conv12
    l_offset = ConvOffset2D(32, name='conv12_offset')(l)
    l = Conv2D(64, (3, 3), padding='same', strides=(2, 2), name='conv12', trainable=trainable)(l_offset)
    l = Activation('relu', name='conv12_relu')(l)
    l = BatchNormalization(name='conv12_bn')(l)

    # conv21
    l_offset = ConvOffset2D(64, name='conv21_offset')(l)
    l = Conv2D(128, (3, 3), padding='same', name='conv21', trainable=trainable)(l_offset)
    l = Activation('relu', name='conv21_relu')(l)
    l = BatchNormalization(name='conv21_bn')(l)

    # conv22
    l_offset = ConvOffset2D(128, name='conv22_offset')(l)
    l = Conv2D(128, (3, 3), padding='same', strides=(2, 2), name='conv22', trainable=trainable)(l_offset)
    l = Activation('relu', name='conv22_relu')(l)
    l = BatchNormalization(name='conv22_bn')(l)

    # out
    l = GlobalAvgPool2D(name='avg_pool')(l)
    l = Dense(10, name='fc1', trainable=trainable)(l)
    outputs = l = Activation('softmax', name='out')(l)

    return inputs, outputs

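The difference between the two models is easy to see: get_deform_cnn has three more ConvOffset2D layers than get_cnn. All other layers are identical, which is why deform_cnn can be fine-tuned from the weights of cnn.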
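
The fine-tuning step relies on `load_weights(..., by_name=True)`, which restores weights only into layers whose names match those in the saved file. The sketch below mimics that name matching on plain Python lists. It is an illustration of the idea, not Keras's loading code, and the variable names are made up.

```python
# Layer names as they appear in the two functions above.
cnn_layers = [
    'input',
    'conv11', 'conv11_relu', 'conv11_bn',
    'conv12', 'conv12_relu', 'conv12_bn',
    'conv21', 'conv21_relu', 'conv21_bn',
    'conv22', 'conv22_relu', 'conv22_bn',
    'avg_pool', 'fc1', 'out',
]
deform_layers = [
    'input',
    'conv11', 'conv11_relu', 'conv11_bn',
    'conv12_offset', 'conv12', 'conv12_relu', 'conv12_bn',
    'conv21_offset', 'conv21', 'conv21_relu', 'conv21_bn',
    'conv22_offset', 'conv22', 'conv22_relu', 'conv22_bn',
    'avg_pool', 'fc1', 'out',
]

# Layers that can take weights from cnn.h5, and layers that start from their initializer.
shared = [name for name in deform_layers if name in cnn_layers]
new = [name for name in deform_layers if name not in cnn_layers]

print(new)
```

Only the three offset layers have no counterpart in the regular CNN, so they are the only ones that start from their random initialization.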

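The ConvOffset2D layer

Following the paper, the layer can be broken into these steps:

Let U be the feature map produced by the preceding ordinary convolution, with $N$ channels. (In this implementation $N$ is the number of input channels $c$; in the paper $N$ is the number of sampling locations in the kernel.)
For deformable convolution, apply an ordinary convolution with $2 N$ output channels to U. The resulting $2 N$-channel feature map holds the sampling offsets ($2 N$ because there is one offset for each of the x and y directions).
The offset map has shape $(b, h, w, 2 c)$ and the input has shape $(b, h, w, c)$. Adding the offsets to the original sampling positions x gives the actual sampling positions coord.
- Reshape the offsets from $(b, h, w, 2 c)$ to $(b * c, h, w, 2)$
- Reshape the input from $(b, h, w, c)$ to $(b * c, h, w)$
- Call tf_batch_map_offsets to do the sampling, which gives the sampled values as $(b * c, h, w)$
- Reshape the sampled $(b * c, h, w)$ back to the final output $(b, h, w, c)$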
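
The shape changes in the steps above can be traced with NumPy. This is a minimal sketch, not the layer itself: the function names mirror the static methods of `ConvOffset2D`, the sizes `b, h, w, c` are arbitrary, and an all-zero array stands in for the output of the offset convolution.

```python
import numpy as np

b, h, w, c = 2, 4, 4, 3
x = np.arange(b * h * w * c, dtype=np.float32).reshape(b, h, w, c)
offsets = np.zeros((b, h, w, 2 * c), dtype=np.float32)  # stand-in for the offset conv output


def to_bc_h_w(t):
    """(b, h, w, c) -> (b*c, h, w): one single-channel map per (sample, channel) pair."""
    return t.transpose(0, 3, 1, 2).reshape(-1, h, w)


def to_bc_h_w_2(t):
    """(b, h, w, 2c) -> (b*c, h, w, 2), using the same transpose-then-reshape as the layer."""
    return t.transpose(0, 3, 1, 2).reshape(-1, h, w, 2)


def to_b_h_w_c(t):
    """(b*c, h, w) -> (b, h, w, c): undo to_bc_h_w."""
    return t.reshape(-1, c, h, w).transpose(0, 2, 3, 1)


x_flat = to_bc_h_w(x)
off_flat = to_bc_h_w_2(offsets)
restored = to_b_h_w_c(x_flat)

print(x_flat.shape, off_flat.shape, restored.shape)
# Map number 1 in the flattened batch is channel 1 of sample 0.
print(np.array_equal(x_flat[1], x[0, :, :, 1]))
print(np.array_equal(restored, x))
```

Folding the channel axis into the batch axis lets the sampling code treat every channel as an independent 2D map with its own offsets.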

Let's go through it step by step.

`ConvOffset2D` class definition

The key ConvOffset2D class is defined in deform_conv/layers.py:

class ConvOffset2D(Conv2D):
    """
    ConvOffset2D learns 2D offsets and uses bilinear interpolation to output the resampled (deformed) feature map
    """

    def __init__(self, filters, init_normal_stddev=0.01, **kwargs):
        """Init

        Parameters
        ----------
        filters : int
            Number of channel of the input feature map
        init_normal_stddev : float
            Normal kernel initialization
        **kwargs:
            Pass to superclass. See Conv2D layer in Keras
        """

        self.filters = filters
        # Note the doubled channel count: the output feature map holds the x and y offsets
        super(ConvOffset2D, self).__init__(
            self.filters * 2, (3, 3), padding='same', use_bias=False,
            kernel_initializer=RandomNormal(0, init_normal_stddev),
            **kwargs
        )

    def call(self, x):
        """Return the deformed featured map"""
        x_shape = x.get_shape()

        # The convolution outputs a feature map with twice the channels: the offsets, shape (b, h, w, 2c)
        offsets = super(ConvOffset2D, self).call(x)

        # Reshape offsets to (b*c, h, w, 2): b*c maps, each of size h x w
        offsets = self._to_bc_h_w_2(offsets, x_shape)

        # Reshape the input x the same way: (b*c, h, w)
        x = self._to_bc_h_w(x, x_shape)

        # Bilinear sampling gives the resampled x_offset: (b*c, h, w)
        x_offset = tf_batch_map_offsets(x, offsets)

        # Reshape back to the original layout, x_offset: (b, h, w, c)
        x_offset = self._to_b_h_w_c(x_offset, x_shape)

        return x_offset

    def compute_output_shape(self, input_shape):
        """Output shape is the same as input shape

        Because this layer does only the deformation part
        """
        return input_shape

    @staticmethod
    def _to_bc_h_w_2(x, x_shape):
        """(b, h, w, 2c) -> (b*c, h, w, 2)"""
        x = tf.transpose(x, [0, 3, 1, 2])
        x = tf.reshape(x, (-1, int(x_shape[1]), int(x_shape[2]), 2))
        return x

    @staticmethod
    def _to_bc_h_w(x, x_shape):
        """(b, h, w, c) -> (b*c, h, w)"""
        x = tf.transpose(x, [0, 3, 1, 2])
        x = tf.reshape(x, (-1, int(x_shape[1]), int(x_shape[2])))
        return x

    @staticmethod
    def _to_b_h_w_c(x, x_shape):
        """(b*c, h, w) -> (b, h, w, c)"""
        x = tf.reshape(
            x, (-1, int(x_shape[3]), int(x_shape[1]), int(x_shape[2]))
        )
        x = tf.transpose(x, [0, 2, 3, 1])
        return x
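
One detail of `_to_bc_h_w_2` is easy to miss: a reshape after a transpose regroups elements in memory order. The NumPy sketch below replays the same two operations on a tiny array whose values encode where they came from. It is an observation about what these operations do, not a statement about the author's intent, and the encoding `10 * channel + column` is just an illustrative choice.

```python
import numpy as np

h, w = 1, 2
# One sample, one row, two columns, two offset channels (so c = 1).
# Each value is 10 * channel + column, which makes its origin readable.
offsets = np.array([[[[0., 10.],
                      [1., 11.]]]])          # shape (b, h, w, 2c) = (1, 1, 2, 2)

# Same operations as ConvOffset2D._to_bc_h_w_2
paired = offsets.transpose(0, 3, 1, 2).reshape(-1, h, w, 2)

print(paired.shape)
print(paired[0, 0, 0])   # the pair attached to pixel (0, 0)
print(paired[0, 0, 1])   # the pair attached to pixel (0, 1)
```

In this example the pair attached to pixel (0, 0) holds channel 0 at columns 0 and 1, rather than channels 0 and 1 at column 0. The offsets are learned end to end, so training can still adapt, but the last axis should not be read as "the two offset channels of one pixel".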

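Now let's look more closely at the tf_batch_map_offsets function:

The tf_batch_map_offsets function

From the previous step we have the offsets with shape $(b * c, h, w, 2)$ and the reshaped input with shape $(b * c, h, w)$.

First, add the offsets to the original sampling positions of the feature map to get the actual sampling positions.
Then call tf_batch_map_coordinates to perform bilinear interpolation and produce the output.

The reshaping is a long chain of tensor operations, and I did not fully follow it.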

def tf_batch_map_offsets(input, offsets, order=1):
    """Batch map offsets into input

    Parameters
    ---------
    input : tf.Tensor. shape = (b, s, s)
    offsets: tf.Tensor. shape = (b, s, s, 2)

    Returns
    -------
    tf.Tensor. shape = (b, s * s), one value per sampling position
    """

    input_shape = tf.shape(input)
    batch_size = input_shape[0]
    input_size = input_shape[1]

    offsets = tf.reshape(offsets, (batch_size, -1, 2))
    grid = tf.meshgrid(
        tf.range(input_size), tf.range(input_size), indexing='ij'
    )
    grid = tf.stack(grid, axis=-1)
    grid = tf.cast(grid, 'float32')
    grid = tf.reshape(grid, (-1, 2))
    grid = tf_repeat_2d(grid, batch_size)
    coords = offsets + grid  # actual sampling coordinates

    mapped_vals = tf_batch_map_coordinates(input, coords)  # bilinear interpolation
    return mapped_vals
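
The grid construction in `tf_batch_map_offsets` can be mirrored in NumPy to see what `coords` contains. This is a sketch under assumptions: `sampling_coords` is a made-up name, and NumPy broadcasting replaces the repository's `tf_repeat_2d` helper, which repeats the grid once per batch entry.

```python
import numpy as np


def sampling_coords(offsets):
    """offsets: (b, s, s, 2) -> absolute sampling coordinates of shape (b, s*s, 2)."""
    b, s = offsets.shape[0], offsets.shape[1]
    # indexing='ij' makes the first coordinate the row index and the second the column index.
    rows, cols = np.meshgrid(np.arange(s), np.arange(s), indexing='ij')
    grid = np.stack([rows, cols], axis=-1).reshape(-1, 2).astype(np.float32)  # (s*s, 2)
    # Broadcasting adds the same regular grid to every map in the batch.
    return offsets.reshape(b, -1, 2) + grid[None, :, :]


offsets = np.zeros((2, 3, 3, 2), dtype=np.float32)
offsets[0, 1, 1] = [0.5, -0.25]      # shift only the centre pixel of the first map
coords = sampling_coords(offsets)

print(coords.shape)
print(coords[0, 4])    # centre pixel (1, 1) of map 0, moved by its offset
print(coords[1, 4])    # centre pixel of map 1, zero offset, stays on the grid
```

With all-zero offsets every sampling position stays on the regular grid and the layer reduces to an identity resampling. The small initial standard deviation (`init_normal_stddev=0.01`) starts training close to that case.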

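The tf_batch_map_coordinates function performs the bilinear interpolation:

Find the four grid points surrounding each sampling position.
Look up the pixel values at those points and interpolate bilinearly to get the sampled value.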
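
Written out, the interpolation performed by the code that follows looks like this. The symbols are my own shorthand: $p = (p_0, p_1)$ is a clipped sampling position, $d = p - \lfloor p \rfloor$ is its fractional part, and $v_{lt}, v_{rt}, v_{lb}, v_{rb}$ are the feature values at the four surrounding grid points, named after `coords_lt`, `coords_rt`, `coords_lb` and `coords_rb` in the code.

$$
\begin{aligned}
v_{lt} &= x(\lfloor p_0 \rfloor, \lfloor p_1 \rfloor), &
v_{rt} &= x(\lceil p_0 \rceil, \lfloor p_1 \rfloor), \\
v_{lb} &= x(\lfloor p_0 \rfloor, \lceil p_1 \rceil), &
v_{rb} &= x(\lceil p_0 \rceil, \lceil p_1 \rceil), \\
v_t &= v_{lt} + (v_{rt} - v_{lt})\, d_0, &
v_b &= v_{lb} + (v_{rb} - v_{lb})\, d_0, \\
v &= v_t + (v_b - v_t)\, d_1 .
\end{aligned}
$$

When $p$ lies exactly on a grid point, floor and ceil coincide, $d = 0$, and $v$ is simply the value at that point.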


def tf_batch_map_coordinates(input, coords, order=1):
    """Batch version of tf_map_coordinates

    Only supports 2D feature maps

    Parameters
    ----------
    input : tf.Tensor. shape = (b, s, s)
    coords : tf.Tensor. shape = (b, n_points, 2)

    Returns
    -------
    tf.Tensor. shape = (b, n_points)
    """

    input_shape = tf.shape(input)
    batch_size = input_shape[0]
    input_size = input_shape[1]
    n_coords = tf.shape(coords)[1]

    # Make sure the positions, after adding the offsets, stay inside the feature map
    coords = tf.clip_by_value(coords, 0, tf.cast(input_size, 'float32') - 1)

    # Get the four corner coordinates around each sampling position, for bilinear interpolation
    coords_lt = tf.cast(tf.floor(coords), 'int32')
    coords_rb = tf.cast(tf.ceil(coords), 'int32')
    coords_lb = tf.stack([coords_lt[..., 0], coords_rb[..., 1]], axis=-1)
    coords_rt = tf.stack([coords_rb[..., 0], coords_lt[..., 1]], axis=-1)

    idx = tf_repeat(tf.range(batch_size), n_coords)

    # Look up the pixel values at the given integer coordinates
    def _get_vals_by_coords(input, coords):
        indices = tf.stack([
            idx, tf_flatten(coords[..., 0]), tf_flatten(coords[..., 1])
        ], axis=-1)
        vals = tf.gather_nd(input, indices)
        vals = tf.reshape(vals, (batch_size, n_coords))
        return vals

    # Pixel values at the four corners
    vals_lt = _get_vals_by_coords(input, coords_lt)
    vals_rb = _get_vals_by_coords(input, coords_rb)
    vals_lb = _get_vals_by_coords(input, coords_lb)
    vals_rt = _get_vals_by_coords(input, coords_rt)

    # Bilinear interpolation
    coords_offset_lt = coords - tf.cast(coords_lt, 'float32')
    vals_t = vals_lt + (vals_rt - vals_lt) * coords_offset_lt[..., 0]
    vals_b = vals_lb + (vals_rb - vals_lb) * coords_offset_lt[..., 0]
    mapped_vals = vals_t + (vals_b - vals_t) * coords_offset_lt[..., 1]

    # Return the bilinearly interpolated samples
    return mapped_vals
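
To check the interpolation logic in isolation, here is a NumPy sketch of the same floor/ceil scheme for a single 2D map. It is a simplified stand-in rather than the repository's function: `bilinear_sample` is a made-up name, there is no batch dimension, and fancy indexing replaces `tf.gather_nd`.

```python
import numpy as np


def bilinear_sample(img, coords):
    """img: (s, s) map; coords: (n, 2) float (row, col) positions -> (n,) sampled values."""
    s = img.shape[0]
    coords = np.clip(coords, 0, s - 1)        # keep positions inside the map
    lt = np.floor(coords).astype(int)         # corner with the smaller indices
    rb = np.ceil(coords).astype(int)          # corner with the larger indices
    d = coords - lt                           # fractional part along each axis

    v_lt = img[lt[:, 0], lt[:, 1]]
    v_rb = img[rb[:, 0], rb[:, 1]]
    v_lb = img[lt[:, 0], rb[:, 1]]
    v_rt = img[rb[:, 0], lt[:, 1]]

    v_t = v_lt + (v_rt - v_lt) * d[:, 0]      # interpolate along axis 0
    v_b = v_lb + (v_rb - v_lb) * d[:, 0]
    return v_t + (v_b - v_t) * d[:, 1]        # then along axis 1


img = np.arange(9, dtype=np.float64).reshape(3, 3)    # img[r, c] = 3 * r + c
pts = np.array([[1.0, 1.0],     # exactly on a grid point
                [0.5, 0.5],     # centre of the top-left cell
                [2.7, 5.0]])    # out of bounds along axis 1, then clipped to (2.0, 2.0)
vals = bilinear_sample(img, pts)
print(vals)
```

Because the test image is a linear function of the row and column index, bilinear interpolation reproduces it exactly, which makes the expected values easy to verify by hand.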
