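This article follows Chapter 3 of the paper Deep Bilateral Learning for Real-Time Image Enhancement and introduces the network architecture of HDRNet by explaining its key ideas alongside the corresponding code.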
Paper project page: https://groups.csail.mit.edu/graphics/hdrnet/
3 OUR ARCHITECTURE
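Most of the computation happens on a low-resolution copy of the input (the yellow-shaded part of the figure above). This stream ends in the prediction of a set of local affine transforms A, represented as a bilateral grid. Empirically, image enhancement depends not only on local image features but also on global ones, such as the histogram, the average intensity, or even the scene category. The low-resolution stream is therefore split into two paths that extract local (3.1.2) and global (3.1.3) features, and the two paths are then fused (3.1.4) to produce the final prediction.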
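The full-resolution stream in the lower half of the figure needs little computation, but it plays an important role in capturing high-frequency effects and preserving edges. For this purpose the paper introduces a slicing node inspired by bilateral grid processing (marked in red in the figure). Using a learned guidance map g (3.4.1), this node performs a data-dependent lookup into A to obtain the sliced coefficients Ā (3.3). Finally, a local color transform (shown in green) is applied to every pixel to produce the final output O.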
3.1 Low-resolution prediction of bilateral coefficients
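The low-resolution input has a fixed size of 256×256. It first passes through a stack of strided convolutional layers that extract low-level features and reduce the spatial resolution (3.1.1). The stream then splits into two asymmetric paths:

- The local path is fully convolutional. It learns local features and carries image data forward with the spatial layout preserved.
- The global path combines convolutional and fully connected (FC) layers. It learns a fixed-size global feature vector whose receptive field covers the whole low-resolution image.

The two paths are then fused into the features F, and a pointwise linear layer turns F into the output A.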
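The following small pure-Python sketch traces the tensor shapes through the low-resolution stream, with the batch dimension omitted. It assumes the defaults implied by the code comments in this article:

- a 256×256 input and spatial_bin = 16;
- grid depth gd = 8 and channel multiplier cm = 1;
- 3 output channels, each with 4 coefficients (3 color inputs plus an offset).

The function name `lowres_shapes` is hypothetical.

```python
import math


def lowres_shapes(net_input_size=256, spatial_bin=16, gd=8, cm=1, n_out=3, n_in=4):
    """Return {layer_name: shape} for one image (batch dimension omitted)."""
    shapes = {}
    n_ds_layers = int(math.log2(net_input_size / spatial_bin))  # 4 for 256 -> 16
    h = net_input_size
    for i in range(n_ds_layers):              # each stride-2 conv halves H and W
        h //= 2
        shapes[f"splat/conv{i+1}"] = (h, h, cm * 2**i * gd)
    # Local path: stride-1 convs keep the 16x16 resolution
    shapes["local/conv2"] = (h, h, 8 * cm * gd)
    # Global path: two more stride-2 convs (16 -> 8 -> 4), flatten, three FC layers
    g = h // 4
    shapes["global/flatten"] = (g * g * 8 * cm * gd,)
    shapes["global/fc1"] = (32 * cm * gd,)
    shapes["global/fc2"] = (16 * cm * gd,)
    shapes["global/fc3"] = (8 * cm * gd,)
    shapes["fusion"] = (h, h, 8 * cm * gd)
    shapes["prediction/conv1"] = (h, h, gd * n_out * n_in)  # pointwise (1x1) conv
    shapes["unroll_grid"] = (h, h, gd, n_out, n_in)         # bilateral grid A
    return shapes


for name, shape in lowres_shapes().items():
    print(f"{name:18s} {shape}")
```

Changing `spatial_bin` or `gd` in the call shows how grid resolution and depth trade off against the number of predicted channels.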
3.1.1 Low-level features
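These layers reduce the spatial dimensions of the data by a total factor of 2^{n_S}, where n_S is the number of layers. The paper uses n_S = 4.
n_S has two effects:
- It sets the spatial downsampling from the low-resolution input to the final grid of affine coefficients A. The larger n_S, the coarser the grid A.
- It controls the complexity of the prediction. With more layers, the spatial support grows exponentially and the nonlinearities become more complex, so the network can extract more complex patterns from the input.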
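Both effects of n_S can be checked with a little arithmetic. The sketch below assumes every low-level layer is a 3×3 convolution with stride 2, as in the code that follows, applied to a 256×256 input. It computes two quantities:

- the grid resolution, 256 / 2^{n_S};
- the receptive field of one grid cell, which grows as 2^{n_S+1} − 1 pixels.

The helper name is hypothetical.

```python
def grid_and_receptive_field(n_s, input_size=256, kernel=3, stride=2):
    """Return (grid side length, receptive field in input pixels) after n_s layers."""
    rf, jump = 1, 1  # receptive field size, and input-pixel step between neighbouring outputs
    for _ in range(n_s):
        rf += (kernel - 1) * jump  # a k x k kernel widens the field by (k - 1) steps
        jump *= stride             # striding doubles the step seen by the next layer
    return input_size // stride**n_s, rf


for n_s in range(1, 6):
    side, rf = grid_and_receptive_field(n_s)
    print(f"n_S = {n_s}: grid {side}x{side}, receptive field {rf}x{rf} px")
```

With the paper's n_S = 4, each cell of the 16×16 grid sees a 31×31 window of the 256×256 input. The local and global paths enlarge this window further.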
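```python
# -----------------------------------------------------------------------
# 3.1.1 Low-level features S^i
# Excerpt from the coefficient-prediction method: bs (batch size), gd (number of
# grid depth bins), cm (channel multiplier), spatial_bin, params, is_training and
# the conv/fc helpers are defined elsewhere in the hdrnet code.
with tf.variable_scope('splat'):
    # log2(256 / 16) = 4 stride-2 conv layers, i.e. n_S = 4
    n_ds_layers = int(np.log2(params['net_input_size']/spatial_bin))
    current_layer = input_tensor
    for i in range(n_ds_layers):  # 4 conv layers
        if i > 0:  # don't normalize first layer
            use_bn = params['batch_norm']
        else:
            use_bn = False
        # cm*2^i*gd output channels; the shapes below imply cm*gd = 8, i.e. 8, 16, 32, 64
        current_layer = conv(current_layer, cm*(2**i)*gd, 3, stride=2,
                             batch_norm=use_bn, is_training=is_training,
                             scope='conv{}'.format(i+1))
    splat_features = current_layer  # (bs, 16, 16, 64)
# -----------------------------------------------------------------------
```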
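Each iteration of the loop above is a strided convolution. The NumPy sketch below spells out what one such layer computes: a plain 3×3, stride-2 convolution followed by ReLU. It has two simplifications:

- It leaves out batch normalization.
- It assumes zero padding of one pixel on each side, which may differ from the padding used by the hdrnet `conv` helper.

`conv3x3_stride2` is a hypothetical name.

```python
import numpy as np


def conv3x3_stride2(x, w, b):
    """One low-level layer: 3x3 convolution, stride 2, zero padding 1, then ReLU.

    x: (H, W, C_in) feature map, w: (3, 3, C_in, C_out) kernel, b: (C_out,) bias
    """
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))       # zero-pad height and width
    Ho, Wo = (H - 1) // 2 + 1, (W - 1) // 2 + 1     # equals H/2, W/2 for even sizes
    out = np.zeros((Ho, Wo, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            # Input samples seen by kernel tap (dy, dx) at every output position
            patch = xp[dy:dy + 2 * Ho:2, dx:dx + 2 * Wo:2, :]  # (Ho, Wo, C_in)
            out += patch @ w[dy, dx]                           # (Ho, Wo, C_out)
    return np.maximum(out + b, 0.0)


# Four layers with cm*gd = 8 take a 256x256x3 image to 16x16x64
rng = np.random.default_rng(0)
feat = rng.random((256, 256, 3))
for i in range(4):
    c_out = 8 * 2**i
    w = rng.normal(0.0, 0.1, (3, 3, feat.shape[2], c_out))
    feat = conv3x3_stride2(feat, w, np.zeros(c_out))
print(feat.shape)  # (16, 16, 64)
```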
3.1.2 Local features path
```python
# -----------------------------------------------------------------------
# 3.1.2 Local features L^i: two conv layers on top of the low-level features
with tf.variable_scope('local'):
    current_layer = splat_features
    # Two stride-1 conv layers; the spatial resolution stays 16x16
    current_layer = conv(current_layer, 8*cm*gd, 3,
                         batch_norm=params['batch_norm'],
                         is_training=is_training,
                         scope='conv1')
    # don't normalize before fusion (no activation and no bias: a linear output)
    current_layer = conv(current_layer, 8*cm*gd, 3, activation_fn=None,
                         use_bias=False, scope='conv2')
    grid_features = current_layer  # (bs, 16, 16, 64)
# -----------------------------------------------------------------------
```
3.1.3 Global features path
```python
# -----------------------------------------------------------------------
# 3.1.3 Global features G^i: two conv layers followed by three fully connected layers
with tf.variable_scope('global'):
    n_global_layers = int(np.log2(spatial_bin/4))  # 4x4 at the coarsest lvl (= 2 here)
    current_layer = splat_features
    for i in range(2):  # two stride-2 conv layers: 16x16 -> 8x8 -> 4x4
        current_layer = conv(current_layer, 8*cm*gd, 3, stride=2,
                             batch_norm=params['batch_norm'], is_training=is_training,
                             scope="conv{}".format(i+1))
    # Flatten to (bs, 4*4*64) = (bs, 1024)
    _, lh, lw, lc = current_layer.get_shape().as_list()
    current_layer = tf.reshape(current_layer, [bs, lh*lw*lc])
    # Three fully connected layers: 256 -> 128 -> 64 units
    current_layer = fc(current_layer, 32*cm*gd,
                       batch_norm=params['batch_norm'], is_training=is_training,
                       scope="fc1")
    current_layer = fc(current_layer, 16*cm*gd,
                       batch_norm=params['batch_norm'], is_training=is_training,
                       scope="fc2")
    # don't normalize before fusion
    current_layer = fc(current_layer, 8*cm*gd, activation_fn=None, scope="fc3")
    global_features = current_layer  # (1, 64)
# -----------------------------------------------------------------------
```
3.1.4 Fusion and linear prediction & 3.2 Image features as a bilateral grid
```python
# -----------------------------------------------------------------------
# 3.1.4 Fuse the local and global features
# "fuse the contributions of the local and global paths with a pointwise affine mixing followed by a ReLU activation"
# local/conv2 and global/fc3 have no activation, so their weights already act as the
# mixing weights; only an addition and a ReLU remain here.
with tf.name_scope('fusion'):
    fusion_grid = grid_features  # (1, 16, 16, 64)
    fusion_global = tf.reshape(global_features, [bs, 1, 1, 8*cm*gd])  # (1, 1, 1, 64)
    fusion = tf.nn.relu(fusion_grid + fusion_global)  # (1, 16, 16, 64); Eq. (2): the fused features F
# fusion is a 16*16*64 array of features
# -----------------------------------------------------------------------
```
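The fusion step relies on broadcasting: the single global vector is added to every cell of the 16×16 local feature map before the ReLU. The NumPy sketch below performs the same computation. The function name `fuse` is hypothetical, and the demo assumes a batch size of 1.

```python
import numpy as np


def fuse(local_feats, global_feats):
    """F = ReLU(L + G), broadcasting one global vector per image over the grid.

    local_feats:  (B, H, W, C) linear output of the local path
    global_feats: (B, C)       linear output of the global path
    """
    B, _, _, C = local_feats.shape
    g = global_feats.reshape(B, 1, 1, C)     # same reshape as in the TF code
    return np.maximum(local_feats + g, 0.0)  # G is added at every (x, y)


rng = np.random.default_rng(0)
L = rng.normal(size=(1, 16, 16, 64))
G = rng.normal(size=(1, 64))
F = fuse(L, G)
print(F.shape)  # (1, 16, 16, 64)
```

Why only an addition and a ReLU are needed: local/conv2 and global/fc3 are linear (activation_fn=None), so their weights already play the role of the mixing weights in the paper's pointwise affine fusion.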
```python
# -----------------------------------------------------------------------
# 3.1.4 linear prediction, from fusion we make our final 1*1 linear prediction to produce a 16*16 map with 96 channels
with tf.variable_scope('prediction'):
    current_layer = fusion  # (1, 16, 16, 64)
    current_layer = conv(current_layer, gd*cls.n_out()*cls.n_in(), 1,
                         activation_fn=None, scope='conv1')  # Eq. (3): coefficients A, (1, 16, 16, 96)
    # 3.2 Image features as a bilateral grid
    with tf.name_scope('unroll_grid'):  # Eq. (4)
        current_layer = tf.stack(
            tf.split(current_layer, cls.n_out()*cls.n_in(), axis=3), axis=4)  # (1, 16, 16, 8, 12)
        current_layer = tf.stack(
            tf.split(current_layer, cls.n_in(), axis=4), axis=5)  # (1, 16, 16, 8, 3, 4)
    tf.add_to_collection('packed_coefficients', current_layer)
# -----------------------------------------------------------------------
```
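To see where each of the 96 predicted channels ends up in the grid, the same split/stack sequence can be replayed in NumPy; `np.split` and `np.stack` behave like `tf.split` and `tf.stack` here. The sketch assumes d = 8, n_out = 3 and n_in = 4.

Under these assumptions, the coefficient for depth bin z, output channel o and input channel i comes from channel d·(n_out·i + o) + z. In other words, every block of d consecutive channels becomes the depth axis of one coefficient. This matches the unrolling referenced as Eq. (4) in the code comments, where channel dc + z becomes depth z of coefficient c. `unroll_grid` is a hypothetical helper name.

```python
import numpy as np


def unroll_grid(pred, n_out=3, n_in=4):
    """Replay the tf.split/tf.stack calls: (B, H, W, d*n_out*n_in) -> (B, H, W, d, n_out, n_in)."""
    x = np.stack(np.split(pred, n_out * n_in, axis=3), axis=4)  # (B, H, W, d, n_out*n_in)
    return np.stack(np.split(x, n_in, axis=4), axis=5)          # (B, H, W, d, n_out, n_in)


# Fill each channel with its own index so the mapping is easy to read off
pred = np.arange(16 * 16 * 96, dtype=np.float32).reshape(1, 16, 16, 96)
grid = unroll_grid(pred)
d, n_out = 8, 3
z, o, i = 5, 2, 1
print(grid.shape)  # (1, 16, 16, 8, 3, 4)
print(grid[0, 0, 0, z, o, i] == pred[0, 0, 0, d * (n_out * i + o) + z])  # True
```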
3.3 Upsampling with a trainable slicing layer
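```python
# 3.3 Upsampling with a trainable slicing layer
# Takes the guidance map g and the coefficients A stored as a bilateral grid. The op
# slices A into full-resolution coefficients Ā and, in the same step, applies them
# to the full-resolution input im, so it returns the output image.
@classmethod
def _output(cls, im, guide, coeffs):
    with tf.device('/gpu:0'):
        out = bilateral_slice_apply(coeffs, guide, im, has_offset=True, name='slice')
    return out
```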
```python
def bilateral_slice_apply(grid, guide, input_image, has_offset=True, name=None):
    """Slices into a bilateral grid using the guide map and applies the affine transform.

    Args:
      grid: (Tensor) [batch_size, grid_h, grid_w, depth, n_coeffs] grid to slice
        from (a 6-D [..., n_out, n_in] grid is flattened first); n_coeffs is
        n_outputs*(n_input+1) when has_offset is True.
      guide: (Tensor) [batch_size, h, w] guide map to slice along.
      input_image: (Tensor) [batch_size, h, w, n_input] input data onto which to
        apply the affine transform.
      has_offset: (bool) whether each group of coefficients includes an offset term.
      name: (string) name for the operation.
    Returns:
      sliced: (Tensor) [batch_size, h, w, n_outputs] sliced output.
    """
    with tf.name_scope(name):
        gridshape = grid.get_shape().as_list()
        if len(gridshape) == 6:
            gs = tf.shape(grid)
            _, _, _, _, n_out, n_in = gridshape
            # Flatten the last two axes: (1, 16, 16, 8, 3, 4) -> (1, 16, 16, 8, 12)
            grid = tf.reshape(grid, tf.stack([gs[0], gs[1], gs[2], gs[3], gs[4]*gs[5]]))
            # grid = tf.concat(tf.unstack(grid, None, axis=5), 4)
        sliced = hdrnet_ops.bilateral_slice_apply(grid, guide, input_image, has_offset=has_offset)
        return sliced
```
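The 6-D branch of the wrapper only merges the last two axes. Like NumPy's reshape, `tf.reshape` is row-major, so the n_in = 4 coefficients of each output channel stay contiguous: entry (o, i) lands at flat index o·n_in + i. With has_offset=True, presumably the last entry of each group of four is the offset term.

The NumPy check below confirms this layout. It assumes the (1, 16, 16, 8, 3, 4) grid shape from the comments above.

```python
import numpy as np

# 6-D coefficient grid (B, gh, gw, d, n_out, n_in) filled with distinct values
grid6 = np.arange(16 * 16 * 8 * 3 * 4, dtype=np.float32).reshape(1, 16, 16, 8, 3, 4)
B, gh, gw, d, n_out, n_in = grid6.shape

# Same operation as the tf.reshape in bilateral_slice_apply
grid5 = grid6.reshape(B, gh, gw, d, n_out * n_in)
print(grid5.shape)  # (1, 16, 16, 8, 12)

# Row-major flattening: coefficient (o, i) sits at index o * n_in + i
o, i = 2, 3
print(np.array_equal(grid5[..., o * n_in + i], grid6[..., o, i]))  # True
```

The shape function that follows inverts this count. With has_offset=True, 12 coefficients and 3 input channels give 12 // (3 + 1) = 3 output channels.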
```python
@ops.RegisterShape('BilateralSliceApply')
def _bilateral_slice_shape(op):
    grid_tensor = op.inputs[0]   # grid after the reshape, (1, 16, 16, 8, 12)
    guide_tensor = op.inputs[1]  # guidance map, (1, h, w)
    input_tensor = op.inputs[2]  # full-resolution image, (1, h, w, 3)
    has_offset = op.get_attr('has_offset')
    chan_in = input_tensor.get_shape()[-1]
    chan_grid = grid_tensor.get_shape()[-1]
    if has_offset:
        chan_out = chan_grid // (chan_in+1)  # 12 // (3 + 1) = 3
    else:
        chan_out = chan_grid // chan_in
    # Output shape: the guide's (batch, h, w) plus chan_out channels
    return [guide_tensor.get_shape().concatenate(chan_out)]
```
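The slicing op performs trilinear interpolation of the grid at (x, y, g[x, y]). In the paper's notation, Ā_c[x, y] = Σ_{i,j,k} τ(s_x x − i) τ(s_y y − j) τ(d·g[x, y] − k) A_c[i, j, k], where τ(·) = max(1 − |·|, 0) is the tent function.

The NumPy sketch below implements that sum for a single image. It assumes cell-centred coordinates and clamps indices at the grid border; the CUDA kernel in hdrnet may handle these details differently. `bilateral_slice` is a hypothetical name. Unlike the fused op, it returns the sliced coefficients Ā rather than the transformed image.

```python
import numpy as np


def bilateral_slice(grid, guide):
    """Trilinearly interpolate a bilateral grid at (y, x, guide[y, x]).

    grid:  (gh, gw, d, n) coefficients, e.g. (16, 16, 8, 12)
    guide: (h, w) guidance map with values in [0, 1]
    returns (h, w, n) per-pixel coefficients
    """
    gh, gw, d, n = grid.shape
    h, w = guide.shape
    # Continuous grid coordinates of every pixel (cell-centred convention, assumed)
    ys = (np.arange(h) + 0.5) * gh / h - 0.5
    xs = (np.arange(w) + 0.5) * gw / w - 0.5
    gy, gx = np.meshgrid(ys, xs, indexing="ij")  # (h, w) each
    gz = guide * d - 0.5                          # depth coordinate taken from the guide
    y0, x0, z0 = np.floor(gy), np.floor(gx), np.floor(gz)

    out = np.zeros((h, w, n))
    for dy in (0, 1):
        for dx in (0, 1):
            for dz in (0, 1):
                yi, xi, zi = y0 + dy, x0 + dx, z0 + dz
                # Product of tent weights tau(t) = max(1 - |t|, 0) along the three axes
                wgt = (np.maximum(1 - np.abs(gy - yi), 0)
                       * np.maximum(1 - np.abs(gx - xi), 0)
                       * np.maximum(1 - np.abs(gz - zi), 0))
                # Clamp indices so border pixels reuse the edge cells
                yi = np.clip(yi, 0, gh - 1).astype(int)
                xi = np.clip(xi, 0, gw - 1).astype(int)
                zi = np.clip(zi, 0, d - 1).astype(int)
                out += wgt[..., None] * grid[yi, xi, zi]
    return out


rng = np.random.default_rng(0)
grid = rng.normal(size=(16, 16, 8, 12))    # one image's coefficient grid
guide = rng.random((64, 48))               # guidance map in [0, 1]
print(bilateral_slice(grid, guide).shape)  # (64, 48, 12)
```

The interpolation weights of the eight neighbouring cells always sum to 1, so a constant grid slices to the same constant. This is also why the edge cells are clamped rather than treated as zero.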
3.4 Assembling the full-resolution output
3.4.1 Guidance map auxiliary network
```python
@classmethod
def _guide(cls, input_tensor, params, is_training):
    # 3.4.1 Compute the guidance map g from the full-resolution image;
    # input_tensor is the full-res input of shape (1, ?, ?, 3)
    npts = 16  # number of control points for the curve
    nchans = input_tensor.get_shape().as_list()[-1]
    guidemap = input_tensor
    # Color space change
    # 3x3 identity plus one tiny random scalar added to every entry; "M is initialized to the identity"
    idtity = np.identity(nchans, dtype=np.float32) + np.random.randn(1).astype(np.float32)*1e-4
    # Learnable 3x3 color matrix ccm, initialized from the matrix above
    ccm = tf.get_variable('ccm', dtype=tf.float32, initializer=idtity)
    with tf.name_scope('ccm'):
        # Learnable per-channel bias ccm_bias, initialized to 0
        ccm_bias = tf.get_variable('ccm_bias', shape=[nchans,], dtype=tf.float32, initializer=tf.constant_initializer(0.0))
        # The next two lines compute the term inside the parentheses of Eq. (6)
        guidemap = tf.matmul(tf.reshape(input_tensor, [-1, nchans]), ccm)  # pixels as (N, 3) rows times the matrix
        guidemap = tf.nn.bias_add(guidemap, ccm_bias, name='ccm_bias_add')  # add the bias
        guidemap = tf.reshape(guidemap, tf.shape(input_tensor))  # back to the original shape (1, ?, ?, 3)
    # Per-channel curve: the block below implements Eq. (7)
    with tf.name_scope('curve'):
        # 16 evenly spaced breakpoints in [0, 1): [0, 0.0625, 0.125, ..., 0.9375]
        shifts_ = np.linspace(0, 1, npts, endpoint=False, dtype=np.float32)
        shifts_ = shifts_[np.newaxis, np.newaxis, np.newaxis, :]
        shifts_ = np.tile(shifts_, (1, 1, nchans, 1))  # (1, 1, 3, 16): one copy per channel
        guidemap = tf.expand_dims(guidemap, 4)  # add a trailing axis: (1, ?, ?, 3, 1)
        shifts = tf.get_variable('shifts', dtype=tf.float32, initializer=shifts_)  # learnable thresholds t_{c,i}
        slopes_ = np.zeros([1, 1, 1, nchans, npts], dtype=np.float32)
        slopes_[:, :, :, :, 0] = 1.0  # only the first slope is 1, so each curve starts as max(x, 0)
        slopes = tf.get_variable('slopes', dtype=tf.float32, initializer=slopes_)  # learnable slopes a_{c,i}, (1, 1, 1, 3, 16)
        guidemap = tf.reduce_sum(slopes*tf.nn.relu(guidemap-shifts), axis=[4])  # Eq. (7)
    # Sum the per-channel curves rho_c with a 1x1 convolution (the outer sum of Eq. (6))
    guidemap = tf.contrib.layers.convolution2d(
        inputs=guidemap,
        num_outputs=1, kernel_size=1,
        weights_initializer=tf.constant_initializer(1.0/nchans),
        biases_initializer=tf.constant_initializer(0),
        activation_fn=None,
        variables_collections={'weights': [tf.GraphKeys.WEIGHTS], 'biases': [tf.GraphKeys.BIASES]},
        outputs_collections=[tf.GraphKeys.ACTIVATIONS],
        scope='channel_mixing')
    guidemap = tf.clip_by_value(guidemap, 0, 1)
    guidemap = tf.squeeze(guidemap, axis=[3])  # (1, ?, ?)
    return guidemap
```
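Reading the initializers together shows what the guidance map computes before training:

- ccm starts (almost) at the identity, with zero bias.
- Each curve ρ_c(x) = Σ_i a_{c,i} max(x − t_{c,i}, 0) has only its first slope set to 1, at shift 0.
- The 1×1 channel-mixing convolution starts with weights 1/nchans and bias 0.

For pixel values in [0, 1], the initial guide is therefore just the mean of the three color channels. The NumPy sketch below reproduces Eqs. (6) and (7) for one image to check this. It ignores the 1e-4 perturbation of ccm, and `guide_map` and its argument names are hypothetical.

```python
import numpy as np


def guide_map(img, ccm, ccm_bias, shifts, slopes, mix_w, mix_b):
    """Guidance map g from a full-resolution image, following Eqs. (6)-(7).

    img: (H, W, C); ccm: (C, C); ccm_bias: (C,)
    shifts, slopes: (C, npts) thresholds t_{c,i} and slopes a_{c,i} of each curve
    mix_w: (C,) 1x1-conv weights; mix_b: scalar bias
    """
    x = img @ ccm + ccm_bias                          # color transform, (H, W, C)
    hinge = np.maximum(x[..., None] - shifts, 0.0)    # max(x - t_{c,i}, 0), (H, W, C, npts)
    curves = (slopes * hinge).sum(axis=-1)            # rho_c(x), (H, W, C)
    return np.clip(curves @ mix_w + mix_b, 0.0, 1.0)  # channel mixing, then clip, (H, W)


# Initial values mirroring the TF initializers (without the 1e-4 noise)
C, npts = 3, 16
shifts = np.tile(np.linspace(0, 1, npts, endpoint=False), (C, 1))
slopes = np.zeros((C, npts))
slopes[:, 0] = 1.0
img = np.random.default_rng(0).random((4, 5, C))
g = guide_map(img, np.eye(C), np.zeros(C), shifts, slopes, np.full(C, 1.0 / C), 0.0)
print(np.allclose(g, img.mean(axis=-1)))  # True
```

Training then moves ccm, the breakpoints, the slopes and the mixing weights away from this luminance-like starting point.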
3.4.2 Assembling the final output
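In the paper, the final output applies an affine color transform at every full-resolution pixel, using the sliced Ā as coefficients:

O_c = Ā_{n_φ+(n_φ+1)c} + Σ_{c'=0}^{n_φ−1} Ā_{c'+(n_φ+1)c} φ_{c'}

Here φ is the full-resolution input, with n_φ = 3 channels. In the code, this step happens inside bilateral_slice_apply together with the slicing.

The NumPy sketch below shows only the apply part. It assumes the per-pixel coefficients are grouped by output channel as [three weights, offset], consistent with the row-major (n_out, n_in) layout of the grid. `apply_affine` is a hypothetical name.

```python
import numpy as np


def apply_affine(coeffs, img):
    """Per-pixel affine color transform: O_c = sum_{c'} A[c, c'] * I_{c'} + offset_c.

    coeffs: (H, W, n_out * (n_in + 1)) sliced coefficients, grouped per output
            channel as [w_0, ..., w_{n_in-1}, offset]
    img:    (H, W, n_in) full-resolution input
    returns (H, W, n_out)
    """
    H, W, n = coeffs.shape
    n_in = img.shape[-1]
    A = coeffs.reshape(H, W, n // (n_in + 1), n_in + 1)  # (H, W, n_out, n_in + 1)
    # Weighted sum of the input channels plus the offset, at every pixel
    return np.einsum("hwoi,hwi->hwo", A[..., :n_in], img) + A[..., n_in]


# Identity weights with offsets of 0.1 shift every channel by 0.1
img = np.random.default_rng(0).random((4, 5, 3))
ident = np.hstack([np.eye(3), np.full((3, 1), 0.1)]).reshape(-1)  # 12 values
coeffs = np.tile(ident, (4, 5, 1))
out = apply_affine(coeffs, img)
print(np.allclose(out, img + 0.1))  # True
```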
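Model results
After downloading the pretrained models with download.py in the pretrained_models directory, I ran the pretrained models experts/experts_cm1/expertA and experts/experts_cm1/expertB on an image. The results are shown below: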
Tensorboard
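To get a more intuitive picture of the network structure and of how the model processes data, I generated the computation graph with TensorBoard. I then annotated it with the corresponding sections and equations of the paper, as shown below: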
References:
Gharbi M, Chen J, Barron J T, et al. Deep bilateral learning for real-time image enhancement[J]. ACM Transactions on Graphics (TOG), 2017, 36(4): 118.