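I have recently been working on cell counting for pathology AI, which requires classifying every cell in an image. A plain CNN combined with conventional image segmentation is unlikely to work well here. There are broadly two ways to tackle the problem: object detection and image segmentation. Object detection is represented by algorithms such as Faster R-CNN, RetinaNet, YOLOv3, and SSD; image segmentation is represented by U-Net, among others. This post gives a brief overview of U-Net.

The frameworks I use most are TensorFlow, PyTorch, and Keras, so this post includes code for all three. Pick whichever implementation matches your habits.

That said, for image segmentation I would actually recommend the official TensorFlow tutorial first: Image Segmentation

Note: most of this post is borrowed from other people's blogs rather than being original work, so it is marked as a repost. References are listed at the end.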


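I. Preliminaries

1. Deconvolution

The key step in U-Net as described here is upsampling, which relies on deconvolution (transposed convolution). See the following references for details.

Deconvolution (transposed convolution), reference 1: Convolutional Neural Networks (CNN) (1): Image Convolution and Deconvolution (Transposed Convolution)

Deconvolution (transposed convolution), reference 2: Convolution arithmetic tutorial

A deconvolution can in essence be rewritten as an ordinary convolution, so the notion of convolution is extended below (reference: the MATLAB documentation on 2-D convolution).

2-D convolution can be computed in several forms (shapes): 1. full 2. same 3. valid

full - returns the full 2-D convolution (see the figure below).

same - returns the central part of the convolution, with the same size as A (see the figure below).

valid - returns only the part of the convolution computed without zero-padded edges (see the figure below).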
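
The three shapes can be checked with a few lines of plain Python. This is only a sketch for illustration: the function name conv2d and the nested-list representation are my own choices, not taken from the references, and the same branch assumes an odd-sized kernel.

```python
def conv2d(a, k, mode="full"):
    """2-D convolution of nested lists a (H x W) and k (h x w)."""
    H, W, h, w = len(a), len(a[0]), len(k), len(k[0])
    # Full convolution: every position where the kernel overlaps a at all.
    full = [[0] * (W + w - 1) for _ in range(H + h - 1)]
    for i in range(H):
        for j in range(W):
            for p in range(h):
                for q in range(w):
                    full[i + p][j + q] += a[i][j] * k[p][q]
    if mode == "full":
        return full
    if mode == "same":
        # Central part with the same size as a (odd kernel sizes assumed).
        top, left = (h - 1) // 2, (w - 1) // 2
        return [row[left:left + W] for row in full[top:top + H]]
    if mode == "valid":
        # Only positions where the kernel lies entirely inside a.
        return [row[w - 1:W] for row in full[h - 1:H]]
    raise ValueError("mode must be 'full', 'same' or 'valid'")


a = [[1] * 5 for _ in range(5)]   # 5 x 5 input
k = [[1] * 3 for _ in range(3)]   # 3 x 3 kernel
for mode in ("full", "same", "valid"):
    out = conv2d(a, k, mode)
    print(mode, len(out), "x", len(out[0]))
```

For a 5 x 5 input and a 3 x 3 kernel the three modes give 7 x 7, 5 x 5, and 3 x 3 outputs respectively.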

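2. Image segmentation with a plain CNN

Early on, people tried to do image segmentation with a conventional CNN. One example is the NIPS 2012 paper Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images.

Idea: classify every pixel of the image. A patch is taken around each pixel, treated as an image in its own right, and fed to the network for training.

This approach has two obvious drawbacks. First, it is highly redundant: a patch has to be extracted for every pixel, and the patches of neighbouring pixels are nearly identical, so training is very slow. Second, receptive field and localization accuracy cannot both be had: a larger patch gives more context but needs more pooling, which blurs the location, while a smaller patch sees very little context.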
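
The redundancy is easy to quantify. The sketch below is not the code from the NIPS paper; extract_patch, shared_fraction, the zero padding at the borders, and the 65 x 65 window on a 512 x 512 image are all assumptions chosen for illustration.

```python
def extract_patch(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) patch centred on (row, col).

    Pixels outside the image are filled with 0 (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0
             for c in range(col - half, col + half + 1)]
            for r in range(row - half, row + half + 1)]


def shared_fraction(half):
    """Fraction of pixels shared by the patches of two horizontally adjacent pixels."""
    side = 2 * half + 1
    return (side - 1) / side


image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(extract_patch(image, 0, 0, 1))       # 3 x 3 patch around the top-left pixel
print(512 * 512)                           # one patch per pixel of a 512 x 512 image
print(round(shared_fraction(32), 4))       # overlap between neighbouring 65 x 65 patches
```

With a 65 x 65 window, two neighbouring patches share 64 of their 65 columns, so the network repeats almost the same computation for every pixel.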

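3. FCN (fully convolutional network)

"Fully convolutional" means that the fully connected layers of the original network are replaced by convolutional layers, so that the network contains no fully connected layers and can accept inputs of arbitrary size.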
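
To see why a fully connected layer can be swapped for a convolution, here is a minimal sketch in plain Python. The helper names dense and conv_valid and the toy numbers are made up for this example. A dense unit over a flattened 2 x 2 input equals a valid convolution (cross-correlation, as deep learning frameworks define it) with a 2 x 2 kernel, and the convolutional form also runs on larger inputs, where it yields a map of outputs instead of a single value.

```python
def dense(x_flat, weights, bias):
    """One fully connected unit: weighted sum of the flattened input."""
    return sum(w * x for w, x in zip(weights, x_flat)) + bias


def conv_valid(image, kernel, bias):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(kernel[p][q] * image[i + p][j + q]
                      for p in range(kh) for q in range(kw))
            row.append(acc + bias)
        out.append(row)
    return out


kernel = [[1, 2], [3, 4]]
weights = [1, 2, 3, 4]             # the same weights, flattened row by row
small = [[1, 0], [2, 1]]
large = [[1, 0, 2], [2, 1, 0], [0, 1, 1]]

fc_out = dense([v for row in small for v in row], weights, 0.5)
conv_small = conv_valid(small, kernel, 0.5)   # 1 x 1 map, same value as fc_out
conv_large = conv_valid(large, kernel, 0.5)   # 2 x 2 map on the larger input
print(fc_out, conv_small, conv_large)
```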

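For semantic segmentation (pixel-level classification), Jonathan Long et al. published Fully Convolutional Networks for Semantic Segmentation in 2015, which used an FCN to achieve a first working form of image segmentation. The details are not covered here; see the related post FCN (Fully Convolutional Networks) Explained.

The segmentation results at that point were still not ideal. Later work refined the upsampling part of this design to obtain more precise segmentation, and that is the U-Net described in this post. U-Net is a special kind of fully convolutional network; many segmentation networks, U-Net included, are improvements on FCN.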

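II. Introduction to U-Net

Parts of this section are taken from two posts: U-Net Paper and Code Walkthrough, and Understanding the Deep Learning Segmentation Network U-Net (U-Net: Convolutional Networks for Biomedical Image Segmentation)

Original paper: http://www.arxiv.org/pdf/1505.04597.pdf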

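1. Basic architecture

U-Net has two parts. The first is feature extraction (convolution layers), similar to VGG, Inception, ResNet, and so on. The second is upsampling (upsampling layers). The activations just before each pooling layer in the convolution layers are concatenated to the activations of the corresponding upsampling layer. The network is called U-Net because its structure is shaped like a U.

Feature extraction (convolution layers): every pooling layer produces a new scale, so there are five scales in total including the original image scale.
Upsampling (upsampling layers): after each upsampling step, the result is fused with the feature map of the same scale from the feature extraction part, which has the same number of channels. The feature map from the feature extraction part has to be cropped before fusion. Fusion here means concatenation.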
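
The five scales and the amount of cropping can be traced with simple arithmetic. The sketch below assumes the configuration of the original paper (two unpadded 3 x 3 convolutions per block, 2 x 2 pooling, 2 x 2 up-convolution, 572 x 572 input); the function name unet_sizes is my own.

```python
def unet_sizes(input_size, depth=4, convs_per_block=2, kernel=3):
    """Trace the spatial size through a U-Net with unpadded convolutions."""
    shrink = convs_per_block * (kernel - 1)   # pixels lost per block
    size, skips = input_size, []
    for _ in range(depth):
        size -= shrink                  # two valid 3 x 3 convolutions
        skips.append(size)              # feature map kept for the skip connection
        if size % 2:
            raise ValueError("size must be even before 2 x 2 pooling")
        size //= 2                      # 2 x 2 max pooling
    size -= shrink                      # block at the bottom of the U
    crops = []
    for skip in reversed(skips):
        size *= 2                       # 2 x 2 up-convolution
        crops.append((skip - size) // 2)    # pixels cropped from each side of the skip map
        size -= shrink                  # two valid 3 x 3 convolutions
    return skips, crops, size


skips, crops, out_size = unet_sizes(572)
print(skips)      # sizes of the four skip feature maps
print(crops)      # crop per side before each concatenation
print(out_size)   # size of the output segmentation map
```

For a 572 x 572 input this gives skip maps of 568, 280, 136, and 64 pixels, a 28 x 28 map at the bottom, crops of 4, 16, 40, and 88 pixels per side, and a 388 x 388 output.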

U-Net can be implemented as a ResNet, VGG, or Inception backbone followed by upsampling.

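Architecture:
a. U-Net builds on the FCN architecture. The authors modified and extended it so that it yields precise segmentations from very few training images.
b. An upsampling stage is added, with a large number of feature channels, which allows more of the original image's texture and context information to propagate through the high-resolution layers.
c. U-Net has no fully connected layers and uses valid convolutions throughout, so every pixel of the segmentation map is computed from complete context in the input. As a result the output is smaller than the input (Keras implementations, however, typically use same convolutions). For very large inputs, the overlap-tile strategy gives a seamless segmentation.

d. To predict pixels near the border of the image, the missing context is extrapolated by mirroring the input image. Feeding the whole large image at once would also work in principle; the tiling strategy exists because GPU memory is limited.
e. Another difficulty in cell segmentation is separating touching cells of the same class. The authors therefore propose a weighted loss, in which the background labels between two touching cells get a higher weight.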
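
The weight map in the paper has the form w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)), where w_c balances class frequencies, d1 is the distance to the border of the nearest cell, and d2 the distance to the border of the second nearest cell; the paper uses w0 = 10 and sigma of about 5 pixels. A minimal per-pixel sketch follows. The function name border_weight is hypothetical, and computing d1 and d2 from a label mask is left out.

```python
import math


def border_weight(d1, d2, class_weight=1.0, w0=10.0, sigma=5.0):
    """Loss weight for one pixel, given its distances to the two nearest cells."""
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))


print(border_weight(0, 0))      # pixel squeezed between two touching cells
print(border_weight(2, 3))      # narrow gap between two cells
print(border_weight(50, 50))    # background far away from any cell
```

Pixels in a thin gap between two cells get a weight close to class_weight + w0, while background far from any cell falls back to the class weight alone.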

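2. Input and output

Medical images are usually very large, and the whole image cannot be fed into the network at once for segmentation, so it has to be cut into small patches. Because of its network structure, U-Net is well suited to cutting patches with overlap. In the figure, the red box marks the region to be segmented, but the patch that is cut out also includes the surrounding area. Another important reason for the overlap is that the surrounding area supplies texture and other information for the border of the segmented region. As the edges of the yellow box show, cutting the image into small patches does not degrade the segmentation.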
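
A rough sketch of cutting one overlapping tile with mirrored borders is shown below. The helper names reflect_index and extract_tile are made up for this example, and the mirroring does not repeat the edge pixel. With the sizes of the original paper, out_size would be 388 and margin 92, since (572 - 388) / 2 = 92.

```python
def reflect_index(i, n):
    """Map any integer index onto [0, n) by mirroring at the borders (n >= 2)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def extract_tile(image, top, left, out_size, margin):
    """Input tile needed to predict the out_size x out_size block at (top, left).

    The tile extends `margin` pixels beyond the block on every side; pixels
    that fall outside the image are filled in by mirroring.
    """
    h, w = len(image), len(image[0])
    rows = range(top - margin, top + out_size + margin)
    cols = range(left - margin, left + out_size + margin)
    return [[image[reflect_index(r, h)][reflect_index(c, w)] for c in cols]
            for r in rows]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
tile = extract_tile(image, top=0, left=0, out_size=2, margin=1)
for row in tile:
    print(row)
```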

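3. Backpropagation

On backpropagation in U-Net: convolution and pooling layers are well known to support backpropagation. The upsampling part of U-Net can use either plain upsampling or deconvolution, so how do gradients flow through these? As noted in the preliminaries, a deconvolution (transposed convolution) can be rewritten as a convolution, so it can be backpropagated through as well. Plain interpolation-based upsampling is a fixed linear map of its input and is differentiable in the same way.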
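
The link between convolution and transposed convolution can be checked numerically in one dimension. In the sketch below (my own toy code, not from the referenced posts), conv1d_transpose scatters each input value through the kernel, which is exactly the operation that carries gradients back through conv1d. The two maps are adjoint: the inner product of conv1d(x) with g equals the inner product of x with conv1d_transpose(g).

```python
def conv1d(x, w, stride=1):
    """Valid 1-D cross-correlation with the given stride."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(w[k] * x[i * stride + k] for k in range(len(w)))
            for i in range(n_out)]


def conv1d_transpose(y, w, stride=1):
    """Transposed convolution: scatter every y[i] through the kernel."""
    out = [0] * ((len(y) - 1) * stride + len(w))
    for i, yi in enumerate(y):
        for k, wk in enumerate(w):
            out[i * stride + k] += wk * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x = [1, 2, 3, 4, 5, 6]
w = [1, -1]
g = [1, 2, 3]                        # stand-in for a gradient arriving from above

y = conv1d(x, w, stride=2)           # downsamples 6 values to 3
up = conv1d_transpose(g, w, stride=2)    # upsamples 3 values back to 6
print(y, up, dot(y, g), dot(x, up))
```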

III. U-Net implementations

1. U-Net in PyTorch

This part is taken from: Image Segmentation with U-Net (by PyTorch)

It uses a ResNet34 + upsampling architecture.

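import torch
import torch.nn as nn
import torch.nn.functional as F


class SaveFeatures():
    """Stores the output of module m on every forward pass via a forward hook."""
    features=None
    def __init__(self, m): self.hook = m.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output): self.features = output
    def remove(self): self.hook.remove()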


class UnetBlock(nn.Module):
  def __init__(self, up_in, down_in, n_out, dp=False, ps=0.25):
    super().__init__()
    up_out = down_out = n_out // 2
    self.tr_conv = nn.ConvTranspose2d(up_in, up_out, 2, 2, bias=False)
    self.conv = nn.Conv2d(down_in, down_out, 1, bias=False)
    self.bn = nn.BatchNorm2d(n_out)
    self.dp = dp
    if dp: self.dropout = nn.Dropout(ps, inplace=True)
  
  def forward(self, up_x, down_x):
    x1 = self.tr_conv(up_x)
    x2 = self.conv(down_x)
    x = torch.cat([x1, x2], dim=1)
    x = self.bn(F.relu(x))
    return self.dropout(x) if self.dp else x


class Unet34(nn.Module):
  def __init__(self, rn, drop_i=False, ps_i=None, drop_up=False, ps=None):
    super().__init__()
    self.rn = rn
    # Hook the backbone layers whose activations feed the skip connections.
    self.sfs = [SaveFeatures(rn[i]) for i in [2, 4, 5, 6]]
    self.drop_i = drop_i
    # Resolve the default dropout rate before it is used to build the layer.
    if ps_i is None: ps_i = 0.1
    if drop_i:
      self.dropout = nn.Dropout(ps_i, inplace=True)
    if ps is not None: assert len(ps) == 4
    if ps is None: ps = [0.1] * 4
    self.up1 = UnetBlock(512, 256, 256, drop_up, ps[0])
    self.up2 = UnetBlock(256, 128, 256, drop_up, ps[1])
    self.up3 = UnetBlock(256, 64, 256, drop_up, ps[2])
    self.up4 = UnetBlock(256, 64, 256, drop_up, ps[3])
    self.up5 = nn.ConvTranspose2d(256, 1, 2, 2)
  
  def forward(self, x):
    x = F.relu(self.rn(x))
    x = self.dropout(x) if self.drop_i else x
    x = self.up1(x, self.sfs[3].features)
    x = self.up2(x, self.sfs[2].features)
    x = self.up3(x, self.sfs[1].features)
    x = self.up4(x, self.sfs[0].features)
    x = self.up5(x)
    return x[:, 0]
  
  def close(self):
    for o in self.sfs: o.remove()

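By registering a forward hook with nn.Module.register_forward_hook(), the activations of selected ResNet34 layers (2, 4, 5, 6) are saved and then concatenated into the corresponding upsampling layers during upsampling. The upsampling layers use ConvTranspose2d() for deconvolution. ConvTranspose2d() works in the opposite direction to Conv2d(): it is used to increase the grid size of the feature map.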
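
How much the grid grows follows from the output-size formulas given in the PyTorch documentation for Conv2d and ConvTranspose2d. The helper functions below are my own plain-Python restatement of those formulas, not part of PyTorch, and the 128 x 128 input is an assumed example.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output size of nn.Conv2d along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """Output size of nn.ConvTranspose2d along one spatial dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Assumption: a 128 x 128 input, which the truncated ResNet34 reduces by a
# factor of 32 to a 4 x 4 feature map.
size = 128 // 32
sizes = [size]
for _ in range(5):                     # up1 .. up4 and the final up5
    size = conv_transpose2d_out(size, kernel=2, stride=2)
    sizes.append(size)
print(sizes)
```

Each ConvTranspose2d with kernel size 2 and stride 2 doubles the grid, so the five upsampling steps in Unet34 bring a 4 x 4 map back to 128 x 128.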

Training

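Training the U-Net model takes roughly two steps:

Use an LR test to find a suitable learning rate range.
Train the model with Cyclical Learning Rates (CLR) until it starts to overfit.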
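
As an illustration of what a cyclical schedule looks like, here is the triangular policy in plain Python. This is a sketch only: the function name triangular_lr and the numbers are made up, and the schedule actually used by the source post's training library may differ.

```python
def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate climbs linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    cycle = iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle - 1)   # 1 at the ends, 0 at the peak
    return base_lr + (max_lr - base_lr) * (1 - x)


# base_lr and max_lr would come from the LR test; these values are placeholders.
for it in (0, 50, 100, 150, 200, 250):
    print(it, triangular_lr(it, base_lr=1e-4, max_lr=1e-2, step_size=100))
```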

wd = 4e-4
arch = resnet34
ps_i = 0.05
ps = np.array([0.1, 0.1, 0.1, 0.1]) * 1
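# `cut` (the layer at which the ResNet34 backbone is truncated) is not defined in this excerpt
m_base = get_base_model(arch, cut, True)
m = to_gpu(Unet34(m_base, drop_i=True, drop_up=True, ps=ps, ps_i=ps_i))
# `UnetModel` and `md` (the data object handed to the learner) are not defined in this excerpt either
models = UnetModel(m)
learn = ConvLearner(md, models)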
learn.opt_fn = optim.Adam
learn.crit = nn.BCEWithLogitsLoss()
learn.metrics = [accuracy_thresh(0.5), miou]

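Stop training once the loss can no longer be reduced by varying the learning rate, the validation loss has converged, and the model is at risk of overfitting.

Besides the code above, there are several other good implementations online: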

https://github.com/milesial/Pytorch-UNet

http://www.andrewjanowczyk.com/pytorch-unet-for-digital-pathology-segmentation/

https://github.com/ugent-korea/pytorch-unet-segmentation

2. U-Net in TensorFlow

Code source: https://github.com/jakeret/tf_unet

Walkthrough source: U-Net Paper and Code Walkthrough

2-1. Layers

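Initializing weights and biases

# The code in this section uses the TensorFlow 1.x API (tf.truncated_normal, tf.log).
import logging
from collections import OrderedDict

import numpy as np
import tensorflow as tf

def weight_variable(shape, stddev=0.1, name="weight"):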
    initial = tf.truncated_normal(shape, stddev=stddev)
    return tf.Variable(initial, name=name)

def weight_variable_devonc(shape, stddev=0.1, name="weight_devonc"):
    return tf.Variable(tf.truncated_normal(shape, stddev=stddev), name=name)

def bias_variable(shape, name="bias"):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

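Creating the convolution and pooling layers
The padding used here is VALID, as specified in the paper. deconv2d is the deconvolution, i.e. the upsampling step. Taking the first upsampling as an example, if the input x has shape [None,28,28,1024], the output has shape [None,56,56,512], because deconv2d doubles the height and width and halves the number of channels. For the details of how deconvolution is computed, see https://blog.csdn.net/nijiayan123/article/details/79416764.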

def conv2d(x, W, b, keep_prob_):
    with tf.name_scope("conv2d"):
        conv_2d = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID')
        conv_2d_b = tf.nn.bias_add(conv_2d, b)
        return tf.nn.dropout(conv_2d_b, keep_prob_)

def deconv2d(x, W,stride):
    with tf.name_scope("deconv2d"):
        x_shape = tf.shape(x)
        output_shape = tf.stack([x_shape[0], x_shape[1]*2, x_shape[2]*2, x_shape[3]//2])
        return tf.nn.conv2d_transpose(x, W, output_shape, strides=[1, stride, stride, 1], padding='VALID', name="conv2d_transpose")

def max_pool(x,n):
    return tf.nn.max_pool(x, ksize=[1, n, n, 1], strides=[1, n, n, 1], padding='VALID')

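Concatenating the feature maps from the down path with the deconvolution output of the up path (the down-path map x1 is first center-cropped to the size of x2)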

def crop_and_concat(x1,x2):
    with tf.name_scope("crop_and_concat"):
        x1_shape = tf.shape(x1)
        x2_shape = tf.shape(x2)
        # offsets for the top left corner of the crop
        offsets = [0, (x1_shape[1] - x2_shape[1]) // 2, (x1_shape[2] - x2_shape[2]) // 2, 0]
        size = [-1, x2_shape[1], x2_shape[2], -1]
        x1_crop = tf.slice(x1, offsets, size)
        return tf.concat([x1_crop, x2], 3)

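Computing the pixel-wise softmax and cross entropy
Note that each pixel is a prediction target here. In an ordinary classification task the final output is usually a vector of shape [1,class_nums], and the predicted label is the class with the highest score after softmax. Here the final output is a 3-D tensor of shape [width,height,class_nums], and a label has to be predicted for every pixel separately, hence the name pixel-wise softmax.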

def pixel_wise_softmax(output_map):
    with tf.name_scope("pixel_wise_softmax"):
        max_axis = tf.reduce_max(output_map, axis=3, keepdims=True)
        exponential_map = tf.exp(output_map - max_axis)
        normalize = tf.reduce_sum(exponential_map, axis=3, keepdims=True)
        return exponential_map / normalize

def cross_entropy(y_,output_map):
    return -tf.reduce_mean(y_*tf.log(tf.clip_by_value(output_map,1e-10,1.0)), name="cross_entropy")
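
The same two computations can be followed on a tiny example in plain Python. This is a sketch that mirrors the TensorFlow functions above on nested lists, for a single image without the batch axis; the toy logits and labels are made up. Subtracting the per-pixel maximum before exponentiating keeps exp() from overflowing, and, like tf.reduce_mean without an axis argument, the mean here runs over all elements including the class axis.

```python
import math


def pixel_wise_softmax(logits):
    """Softmax over the class axis of an H x W x C nested list."""
    probs = []
    for row in logits:
        prob_row = []
        for pixel in row:
            peak = max(pixel)                        # subtract the max for stability
            exps = [math.exp(v - peak) for v in pixel]
            total = sum(exps)
            prob_row.append([e / total for e in exps])
        probs.append(prob_row)
    return probs


def cross_entropy(one_hot, probs, eps=1e-10):
    """Mean of -y * log(p) over every element, with p clipped to [eps, 1]."""
    terms = [y * math.log(min(max(p, eps), 1.0))
             for y_row, p_row in zip(one_hot, probs)
             for y_px, p_px in zip(y_row, p_row)
             for y, p in zip(y_px, p_px)]
    return -sum(terms) / len(terms)


logits = [[[0.0, 0.0], [1000.0, 0.0]]]    # 1 x 2 image, 2 classes
labels = [[[1, 0], [1, 0]]]               # both pixels belong to class 0
probs = pixel_wise_softmax(logits)
loss = cross_entropy(labels, probs)
print(probs, loss)
```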

2-2. U-Net

The network has four main parts: preprocessing, down convolution, up convolution, and Output Map
preprocessing

def create_conv_net(x, keep_prob, channels, n_class, layers=3, features_root=16, filter_size=3, pool_size=2,
                    summaries=True):
    """
    Creates a new convolutional unet for the given parametrization.
    :param x: input tensor, shape [?,nx,ny,channels]
    :param keep_prob: dropout probability tensor
    :param channels: number of channels in the input image
    :param n_class: number of output labels
    :param layers: number of layers in the net
    :param features_root: number of features in the first layer
    :param filter_size: size of the convolution filter
    :param pool_size: size of the max pooling operation
    :param summaries: Flag if summaries should be created
    """

    logging.info(
        "Layers {layers}, features {features}, filter size {filter_size}x{filter_size}, pool size: {pool_size}x{pool_size}".format(
            layers=layers,
            features=features_root,
            filter_size=filter_size,
            pool_size=pool_size))

    # Placeholder for the input image
    with tf.name_scope("preprocessing"):
        nx = tf.shape(x)[1]
        ny = tf.shape(x)[2]
        x_image = tf.reshape(x, tf.stack([-1, nx, ny, channels]))
        in_node = x_image
        batch_size = tf.shape(x_image)[0]

    weights = []
    biases = []
    convs = []
    pools = OrderedDict()
    deconv = OrderedDict()
    dw_h_convs = OrderedDict()
    up_h_convs = OrderedDict()

    in_size = 1000
    size = in_size

down convolution
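With layers=3 there are three down-convolution blocks. Each block consists of two convolutions followed by one pooling step, except the last block, which has no pooling.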

    # down layers
    for layer in range(0, layers):
        with tf.name_scope("down_conv_{}".format(str(layer))):
            features = 2 ** layer * features_root
            stddev = np.sqrt(2 / (filter_size ** 2 * features))
            if layer == 0:
                w1 = weight_variable([filter_size, filter_size, channels, features], stddev, name="w1")
            else:
                w1 = weight_variable([filter_size, filter_size, features // 2, features], stddev, name="w1")

            w2 = weight_variable([filter_size, filter_size, features, features], stddev, name="w2")
            b1 = bias_variable([features], name="b1")
            b2 = bias_variable([features], name="b2")

            conv1 = conv2d(in_node, w1, b1, keep_prob)
            tmp_h_conv = tf.nn.relu(conv1)
            conv2 = conv2d(tmp_h_conv, w2, b2, keep_prob)
            dw_h_convs[layer] = tf.nn.relu(conv2)

            weights.append((w1, w2))
            biases.append((b1, b2))
            convs.append((conv1, conv2))

            size -= 4
            if layer < layers - 1:
                pools[layer] = max_pool(dw_h_convs[layer], pool_size)
                in_node = pools[layer]
                size /= 2

    in_node = dw_h_convs[layers - 1]

up convolution
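With layers=3 there are two up-convolution blocks (the loop runs layers - 1 times). Each block consists of one deconvolution, one crop-and-concatenate step, and two convolutions.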

    # up layers
    for layer in range(layers - 2, -1, -1):
        with tf.name_scope("up_conv_{}".format(str(layer))):
            features = 2 ** (layer + 1) * features_root
            stddev = np.sqrt(2 / (filter_size ** 2 * features))

            wd = weight_variable_devonc([pool_size, pool_size, features // 2, features], stddev, name="wd")
            bd = bias_variable([features // 2], name="bd")
            h_deconv = tf.nn.relu(deconv2d(in_node, wd, pool_size) + bd)
            h_deconv_concat = crop_and_concat(dw_h_convs[layer], h_deconv)
            deconv[layer] = h_deconv_concat

            w1 = weight_variable([filter_size, filter_size, features, features // 2], stddev, name="w1")
            w2 = weight_variable([filter_size, filter_size, features // 2, features // 2], stddev, name="w2")
            b1 = bias_variable([features // 2], name="b1")
            b2 = bias_variable([features // 2], name="b2")

            conv1 = conv2d(h_deconv_concat, w1, b1, keep_prob)
            h_conv = tf.nn.relu(conv1)
            conv2 = conv2d(h_conv, w2, b2, keep_prob)
            in_node = tf.nn.relu(conv2)
            up_h_convs[layer] = in_node

            weights.append((w1, w2))
            biases.append((b1, b2))
            convs.append((conv1, conv2))

            size *= 2
            size -= 4

Output Map

    # Output Map
    with tf.name_scope("output_map"):
        weight = weight_variable([1, 1, features_root, n_class], stddev)
        bias = bias_variable([n_class], name="bias")
        conv = conv2d(in_node, weight, bias, tf.constant(1.0))
        output_map = tf.nn.relu(conv)
        up_h_convs["out"] = output_map

    if summaries:
        with tf.name_scope("summaries"):
            for i, (c1, c2) in enumerate(convs):
                tf.summary.image('summary_conv_%02d_01' % i, get_image_summary(c1))
                tf.summary.image('summary_conv_%02d_02' % i, get_image_summary(c2))

            for k in pools.keys():
                tf.summary.image('summary_pool_%02d' % k, get_image_summary(pools[k]))

            for k in deconv.keys():
                tf.summary.image('summary_deconv_concat_%02d' % k, get_image_summary(deconv[k]))

            for k in dw_h_convs.keys():
                tf.summary.histogram("dw_convolution_%02d" % k + '/activations', dw_h_convs[k])

            for k in up_h_convs.keys():
                tf.summary.histogram("up_convolution_%s" % k + '/activations', up_h_convs[k])

    variables = []
    for w1, w2 in weights:
        variables.append(w1)
        variables.append(w2)

    for b1, b2 in biases:
        variables.append(b1)
        variables.append(b2)

    return output_map, variables, int(in_size - size)
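
The third return value is the number of pixels by which the output is smaller than the input. The sketch below repeats only the size bookkeeping of create_conv_net in plain Python, assuming the default 3 x 3 filters and 2 x 2 pooling; unet_offset is a hypothetical helper name, not part of tf_unet.

```python
def unet_offset(layers=3, in_size=1000):
    """Replay the `size` bookkeeping of create_conv_net (3 x 3 filters, 2 x 2 pooling)."""
    size = in_size
    for layer in range(layers):
        size -= 4                    # two valid 3 x 3 convolutions
        if layer < layers - 1:
            size /= 2                # pooling, skipped after the last down block
    for _ in range(layers - 1):
        size *= 2                    # deconvolution doubles the size
        size -= 4                    # two valid 3 x 3 convolutions
    return int(in_size - size)


print(unet_offset(layers=3))
print(unet_offset(layers=5))
```

With layers=3 the output is 40 pixels smaller than the input in each spatial dimension. With layers=5 the difference is 184 pixels, which matches the 572 to 388 reduction of the original paper.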