文章目錄

3.1.鏈式法則與梯度彌散

ResNet（Residual Neural Network）由微軟研究員的 Kaiming He 等四位華人提出，通過使用 Residual Uint 成功訓練152層深的神經網絡，在 ILSVRC 2015比賽中獲得了冠軍，取得了 3.57%的top-5 的錯誤率，同時參數量卻比 VGGNet低，效果非常突出，因爲它“簡單與實用”並存，之後很多方法都建立在ResNet50或者ResNet101的基礎上完成的，檢測，分割，識別等領域都紛紛使用ResNet，Alpha zero 也使用了ResNet，所以可見ResNet確實很好用。ResNet的結構可以極快的加速超深神經網絡的訓練，模型的準確率也有非常大的提升。之前我們學習了Inception V3，而Inception V4則是將 Inception Module和ResNet相結合。可以看到ResNet是一個推廣性非常好的網絡結構，甚至可以直接應用到 Inception Net中。

1.Highway Network簡介

在ResNet之前，瑞士教授 Schmidhuber 提出了 Highway Network，原理與ResNet很相似。這位Schmidhuber 教授同時也是 LSTM網絡的發明者，而且是早在1997年發明的，可謂是神經網絡領域元老級的學者。通常認爲神經網絡的深度對其性能非常重要，但是網絡越深其訓練難度越大，Highway Network的目標就是解決極深的神經網絡難以訓練的問題。
Highway Network相當於修改了每一層的激活函數，此前的激活函數只是對輸入做一個非線性變換 y = H(x, WH) ，Highway Network 則允許保留一定比例的原始輸入 x，即 y = H(x, WH)*T(x, WT) + x * C(x, WC) ，其中 T是變換系數，C爲保留係數。論文中令 C= 1 - T。這樣前面一層的信息，有一定比例可以不經過矩陣乘法和非線性變換，直接傳輸到下一層，彷彿一條信息高速公路，因而得名 Highway Network。Highway Network主要通過 gating units 學習如何控制網絡中的信息流，即學習信息應保留的比例。
這個可學習的 gating機制，正是借鑑自Schmidhuber 教授早年的 LSTM 訓練神經網絡中的gating。幾百乃至上千層深的 Highway Network可以直接使用梯度下降算法訓練，並可以配合多種非線性激活函數，學習極深的神經網絡現在變得可行了。事實上，Highway Network 的設計在理論上允許其訓練任意深的網絡，其優化方法基本上與網絡的深度獨立，而傳統的神經網絡結構則對深度非常敏感，訓練複雜度隨着深度增加而急劇增加。

2.模型加深存在的問題

ResNet 和 HighWay Network非常類似，也就是允許原始輸入信息直接傳輸到後面的層中。ResNet最初的靈感來自這個問題：在不斷加神經網絡的深度時，會出現一個 Degradation 的問題，即準確率會先上升然後達到飽和，再持續增加深度則會導致準確率下降。這並不是一個過擬合的問題，因爲不光在測試集上誤差增大，訓練集本身誤差也會增大。假設有一個比較淺的網絡達到了飽和的準確率，那麼後面再加上幾個y=x的全等映射層，起碼誤差不會增加，即更深的網絡不應該帶來訓練集上誤差上升。而這裏提到的使用全等映射直接將前一層輸出傳到後面的思想，就是 ResNet的靈感來源。

假定某段神經網絡的輸入是 x，期望輸出是 H(x)，如果我們直接把輸入 x 傳到輸出作爲初始結果，那麼此時我們需要學習的目標就是 F(x) = H(x) - x。如下圖所示，這就是一個ResNet的殘差學習單元（Residual Unit），ResNet相當於將學習目標改變了，不再是學習一個完整的輸出 H(x)，只是輸出和輸入的差別 H(x) - x，即殘差。

如下圖所示，CIFIR10 數據的一個實驗，左側爲訓練誤差，右側是測試誤差，不光在測試集上誤差比較大，訓練集本身的誤差也非常大。

隨着網絡越深，精準度的變化如下圖：

通過實驗可以發現：隨着網絡層級的不斷增加，模型精度不斷得到提升，而當網絡層級增加到一定的數目以後，訓練精度和測試精度迅速下降，這說明當網絡變得很深以後，深度網絡變得更加難以訓練了。

3.爲什麼深度模型難以訓練

爲什麼隨着網絡層級越深，模型效果卻變差了呢？

3.1.鏈式法則與梯度彌散

下圖是一個簡單的神經網絡圖，由輸入層，隱含層，輸出層構成：

回想一下神經網絡反向傳播的原理，先通過正向傳播計算出結果 output，然後通過與樣本比較得出誤差值 $E_{total}$ ：

根據誤差結果，利用著名的“鏈式法則”求偏導，使結果誤差反向傳播從而得出權重w調整的梯度。下圖是輸出結果到隱含層的反向傳播過程（隱含層到輸入層的反向傳播過程也是類似）：

通過不斷迭代，對參數矩陣進行不斷調整後，使得輸出結果的誤差值更小，使輸出結果與事實更加接近。

從上面的過程來看，神經網絡在反向傳播過程中要不斷地傳播梯度，而當網絡層數加深時，梯度在傳播過程中會逐漸消失（假如採用Sigmoid函數，對於幅度爲1的信號，每向後傳遞一層，梯度就衰減爲原來的 0.25，層數越多，衰減越厲害），導致無法對前面網絡層的權重進行有效的調整。

#3.2.冪的特點
$1.01^{ 365} = 37.783$
$0.99 ^{365} = 0.0255$

4.ResNet的特點

假設：假如有一個比較淺網絡（Shallow Net）的準確率達到了飽和，那麼後面再加上幾個 y = x 的恆等映射（Identity Mappings），按理說，即使準確率不能再提速了，起碼誤差不會增加（也即更深的網絡不應該帶來訓練集上誤差的上升），但是實驗證明準確率下降了，這說明網絡越深，訓練難度越大。而這裏提到的使用恆等映射直接將前一層輸出傳到後面的思想，便是著名深度殘差網絡ResNet的靈感來源。

ResNet引入了殘差網絡結構（residual Network），通過這種殘差網絡結構，可以把網絡層弄得很深（據說目前可以達到1000多層），並且最終的分類效果也非常好，殘差網絡的基本結構如下圖所示，很明顯，該圖示帶有跳躍結構的：

F(x) 是一個殘差映射 w, r, t 恆等，如果說恆等是理想，很容易將權重值設定爲0，如果理想化映射更接近於恆等映射，便更容易發現微小波動。
殘差網絡借鑑了高速網絡（Highway Network）的跨層鏈接思想，但對其進行修改（殘差項原本是帶權值的，但是ResNet用恆等映射代替之）

假定某段神經網絡的輸入是x，期望輸出是H(x)，即H(x)是期望的複雜潛在映射，如果是要學習這樣的模型，則訓練難度會比較大；

保證訓練準確率不下降的辦法：
回想前面的假設，如果已經學習到較飽和的準確率（或者當發現下層的誤差變大時），那麼接下來的學習目標就轉變爲恆等映射的學習，也就是使輸入x近似於輸出H(x)，以保持在後面的層次中不會造成精度下降。

在上圖的殘差網絡結構圖中，通過“shortcut connections（捷徑連接）”的方式，直接把輸入x傳到輸出作爲初始結果，輸出結果爲H(x)=F(x)+x，當F(x)=0時，那麼H(x)=x，也就是上面所提到的恆等映射。於是，ResNet相當於將學習目標改變了，不再是學習一個完整的輸出，而是目標值H(X)和x的差值，也就是所謂的殘差F(x) = H(x)-x，因此，後面的訓練目標就是要將殘差結果逼近於0，使到隨着網絡加深，準確率不下降。
這種殘差跳躍式的結構，打破了傳統的神經網絡n-1層的輸出只能給n層作爲輸入的慣例，使某一層的輸出可以直接跨過幾層作爲後面某一層的輸入，其意義在於爲疊加多層網絡而使得整個學習模型的錯誤率不降反升的難題提供了新的方向。
至此，神經網絡的層數可以超越之前的約束，達到幾十層、上百層甚至千層，爲高級語義特徵提取和分類提供了可行性。

下面感受一下34層的深度殘差網絡的結構圖：

從圖中可以看出，怎麼有一些“shortcut connections（捷徑連接）”是實現，有一些是虛線，有什麼區別呢？

因爲經過“shortcut-connections（捷徑連接）”後，H(x) = F(x) + x，如果 F(x) 和 x 通道相同，則可直接相加，那麼通道不同怎麼相加呢。上圖的實線，虛線就是爲了區分這兩種情況的：

實線的Connection部分，表示通道相同，如上圖的第一個粉色矩形和第三個粉色矩形，都是 3*3*64 的特徵圖，由於通道相同，所以採用計算方式爲H(x) = F(x) + x；
虛線的 Connection 部分，表示通道不同，如上圖的第一個綠色矩形和第三個粉色矩形，分別爲 3*3*64 和 3*3*128 的特徵圖，通道不同，採用的計算方式爲 H(x) = F(x) + Wx，其中 W 爲卷積操作，用來調整x維度的。
下圖是兩層及三層的ResNet殘差學習模塊：

兩種結構分別針對 ResNet34（左圖）和 ResNet50/101/152（右圖），其目的主要就是爲了降低參數的數目，左圖是兩個 3*3*256 的卷積，參數數目：3*3*256*256*2 = 1179648（輸入卷積核的參數個數輸出濾波器個數，width×height×in_channel×out_channel），右圖是第一個11的卷積把256維通道降到64維，然後在最後通過1*1卷積恢復，整體上用的參數數目爲：1*1*256*64 + 3*3*64*64 + 1*1*64*256 = 69632，右圖的參數數量比左圖減少 16.94倍，因此，右圖的主要目的就是爲了減少參數量，從而減少計算量。

對於常規的ResNet，可以用於34層或者更少的網絡中（左圖）；對於更深的網絡（如101層），則使用右圖，其目的是減少計算和參數量。
經檢驗，深度殘差網絡的確解決了退化問題，如下圖所示，上圖爲平原網絡（plain network）網絡層次越深（34層）比網絡層次淺的（18層）的誤差率更高；右圖爲殘差網絡ResNet的網絡層次越深（34層）比網絡層次淺（18層）的誤差率更低。

5.VGGNet-19 VS ResNet-34（ResNet的創新點）

在提出殘差學習的思想，傳統的卷積網絡或者全連接網絡在信息傳遞的時候或多或少會存在信息丟失，損耗等問題，同時還有導致梯度小時或梯度爆炸，導致很深的網絡無法訓練。ResNet在一定程度上解決了這個問題，通過直接將輸入信息繞道傳到輸出，保護信息的完整性，整個網絡只需要學習輸入，輸出差別的那一部分，簡化學習目標和難度。

下圖所示爲 VGGNet-19，以及一個34層深的普通卷積網絡，和34層深的ResNet網絡的對比圖。可以看到普通直連的卷積神經網絡和ResNet的最大區別在於，ResNet有很多旁路的支線將輸入直接連到後面的層，使得後面的層可以直接學習殘差，這種結構也被稱爲 shortcut或 skip connections。

傳統的卷積層或全連接層在信息傳遞時，或多或少的會存在信息丟失，損耗等問題。ResNet 在某種程度上解決了這個問題，通過直接將輸入信息繞道傳到輸出，保護信息的完整性，整個網絡則需要學習輸入，輸出差別的那一部分，簡化學習目標和難度。

在ResNet的論文中，處理下圖中的兩層殘差學習單元，還有三層的殘差學習單元。兩層的殘差學習單元中包含兩個相同輸出通道數（因爲殘差等於目標輸出減去輸入，即 H(x) - x，因此輸入，輸出維度需保持一致）的 3*3 卷積；而3層的殘差網絡則使用了 Network In Network 和 Inception Net中的 1*1 卷積，並且是在中間 3*3 的卷積前後都使用了 1*1 卷積，有先降維再升維的操作。另外，如果有輸入，輸出維度不同的情況，我們可以對 x 做一個線性映射變換維度，再連接到後面的層。

下圖爲 VGG-19 ，直連的 34層網絡，和ResNet的34層網絡的結構對比：

6.ResNet不同層數的網絡配置

下圖是ResNet 不同層數時的網絡配置（這裏我們特別提出ResNet50和ResNet101，主要是因爲他們的出鏡率很高，所以需要做特別的說明）：

　　上表中，我們一共提出了五種深度的ResNet，分別是18， 34， 50， 101和152，首先看圖2最左側，我們發現所有的網絡都分爲五部分，分別是 conv1, conv2_x, conv3_x, conv4_x , conv5_x，之後的其他論文也會專門用這個稱呼指代 ResNet 50 或者 101 的每部分。
　　拿 101-layer 那列，我們先看看 101-layer 是不是真的是 101 層網絡，首先有個輸入 7*7*64的卷積，然後經過 3 + 4 + 23+ 3 = 33 個 building block ，每個 block 爲3層，所以有 33*3 = 99 層，最後有個 fc 層（用於分類），所有有 1+99+1=101層，確實有101層網絡；

注意1：101 層網絡僅僅指卷積或者全連接層，而激活層或者 Pooling 層並沒有計算在內；
注意2：這裏我們關注50-layer 和 101-layer 這兩列，可以發現，他們唯一的不同在於 conv4_x, ResNet50有6個block，而 ResNet101有 23 個 block，插了17個block，也就是 17*3=51層。

在使用了ResNet的結構後，可以發現層數不斷加深導致的訓練集上誤差增大的現象被消除了，ResNet 網絡的訓練誤差會隨着層數增大而逐漸減小，並且在測試機上的表現也會變好。在ResNet推出後不久，Google就借鑑了ResNet的精髓，提出了 Inception V4和 Inception-ResNet-V2，並通過融合這兩個模型，在 ILSVRC數據集上取得了驚人的 3.08%的錯誤率。可見，ResNet及其思想對卷積神經網絡研究的貢獻確實非常顯著，具有很強的推廣性。在ResNet的作者的第二篇相關論文 Identity Mappings in Deep Rsidual Networks中，ResNet V2被提出。ResNet V2和 ResNet V1 的主要區別在於，作者通過研究 ResNet 殘差學習單元的傳播公式，發現前饋和反饋信息可以直接傳輸，因此 skip connection 的非線性激活函數（如ReLU）替換爲 Identity Mappings(y = x)。同時，ResNet V2在每一層中都使用了Batch Normalization。這樣處理之後，新的殘差學習單元將比以前更容易訓練且泛化性更強。
　　根據 Schmidhuber 教授的觀點，ResNet 類似於一個沒有Gates 的LSTM 網絡，即將輸入 x 傳遞到後面層的過程是一直髮生的，而不是學習出來的。同時，最近也有兩篇論文表示，ResNet 基本等價於 RNN且ResNet的效果類似於在多層網絡間的集成方法（ensemble）。ResNet在加深網絡層數上做出來重大貢獻，而另一篇論文 The Power of Depth for Feedforward Neural Networks 則從理論上證明了加深網絡比加寬網絡更有效，算是給ResNet 提供了聲援，也是給深度學習爲什麼要深纔有效提供合理的解釋。

7.TensorFlow 實現ResNet V2網絡

在ResNet的作者的第二篇相關論文《Identity Mappings in Deep Residual Networks》中，提出了ResNet V2。ResNet V2 和 ResNet V1 的主要區別在於，作者通過研究 ResNet 殘差學習單元的傳播公式，發現前饋和反饋信號可以直接傳輸，因此“shortcut connection”（捷徑連接）的非線性激活函數（如ReLU）替換爲 Identity Mappings。同時，ResNet V2 在每一層中都使用了 Batch Normalization。這樣處理後，新的殘差學習單元比以前更容易訓練且泛化性更強。
　　
下面我們使用TensorFlow實現一個ResNet V2 網絡。我們依然使用方便的 contrib.slim 庫來輔助創建 ResNet，其餘載入的庫還有原生的 collections。本文代碼主要來自於TensorFlow的開源實現。
　　
我們使用 collections.namedtuple 設計ResNet 基本Block 模塊組的 named tuple，並用它創建 Block 的類，但只包含數據結構，不包含具體方法。我們要定義一個典型的 Block，需要輸入三個參數，分別是 scope，unit_fn 和 args。

以Block(‘block1’, bottleneck, [(256, 64, 1]) x 2 + [(256, 64, 2 )]) 這一行代碼爲例，它可以定義一個典型的Block，其中 block1 就是我們這個Block 的名稱（或 scope）；bottleneck 是ResNet V2中的殘差學習單元；
而最後一個參數 [(256, 64, 1]) x 2 + [(256, 64, 2 )] 則是這個Block 的 args，args 是一個列表，其中每個元素都對應一個 bottleneck殘差學習單元，前面兩個元素都是（256,64,1），最後一個是（256,64,2）。每一個元素都是一個三元 tuple，即 （depth，depth_bottleneck, stride）。
比如（256， 64， 3）代表構建的 bottleneck 殘差學習單元（每個殘差學習單元包含三個卷積層）中，第三層輸出通道數 depth 爲 256，前兩層輸出通道數 depth_bottleneck 爲64，且中間那層的步長 stride 爲3。這個殘差學習單元結構即爲 [(1x1/s1, 64), (3x3/s2, 64), (1x1/s1, 256)]。而在這個Block中，一共有3個bottleneck殘差學習單元，除了最後一個的步長由3變爲2，其餘都一致。

#_*_coding:utf-8_*_import collectionsimport tensorflow as tf

slim = tf.contrib.slim
class Block(collections.namedtuple('Block', ['scope', 'uint_fn', 'args'])):
    'A named tuple describing a ResNet block'

下面定義一個降採樣 subsample的方法，參數包括 inputs（輸入），factor（採樣因子）和scope。這個函數也非常簡單，如果factor爲1，則不做修改直接返回 inputs；如果不爲1，則使用 slim.max_pool2d 最大池化來實現，通過1x1的池化尺寸，stride作步長，即可實現降採樣。

def subsample(inputs, factor, scope=None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)

再定義一個 conv2d_same函數創建卷積層。先判斷 stride 是否爲1，如果爲1，則直接使用 slim.conv2d 並令 padding 模式爲SAME。如果 stride 不爲1，則顯式地 pad zero，要pad zero 的總數爲 Kernel_size -1 ,pad_beg 爲 pad/2，pad_end 爲餘下的部分。接下來使用 tf.pad 對輸入變量進行補零操作。最後，因爲已經進行了 zero padding ,所以只需要使用一個 padding 模式爲VALID 的 slim.conv2d 創建這個卷積層。

def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
    else:
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                                 [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                           padding='VALID', scope=scope)

接下來定義堆疊Blocks的函數，參數中的 net 即爲輸入，blocks是之前定義的Block 的class 的列表，而 outputs_collections 則是用來收集各個 end_points 的 collections。下面使用兩層循環，逐個Block，逐個Residual Uint 地堆疊，先使用兩個 tf.variable_scope 將殘差學習單元命名爲 block1 / uint_1 的形式。在第二層循環中，我們拿到每個Block中每個Residual Unit的args，並展開爲 depth，depth_bottleneck 和 stide，其含義在前面定義Blocks類時已經學習過。然後使用 unit_fn 函數（即殘差學習單元的生成函數）順序地創建並連接所有的殘差學習單元。最後，我們使用 slim.utils.collect_named_outpouts 函數將輸出 net 添加到 collection 中。最後，當所有 Block 中的所有Residual Unit 都堆疊完之後，我們再返回最後的 net 作爲 stack_blocks_dense 函數的結果。bottleneck=block.unit_fn

@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections=None):

    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' % (i+1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    net = block.unit_fn(net,
                                        depth=unit_depth,
unit_depth_bottleneck=unit_depth_bottleneck,
                                        steide=unit_stride)
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
    return net

這裏創建 ResNet通用的 arg_scope，關於 arg_scope，我們已經知道其功能——用來定義某些函數的參數默認值。這裏定義訓練標記 is_training 默認爲TRUE，權重衰減速率 weight_decay 默認爲 0.0001，BN的衰減速率默認爲 0.997，BN的 epsilon默認爲 1e-5，BN的 scale默認爲 TRUE，和Inception V3定義 arg_scope一樣，先設置好BN的各項參數，然後通過slim.arg_scope將 slim.conv2d的幾個默認參數設置好：權重正則器設置爲 L2正則，權重初始化器設爲 slim.variance_scaling_initializer()，激活函數設爲 ReLU，標準化器設爲 BN。並將最大池化的padding模式默認設爲 SAME（注意，ResNet原論文中使用的 VALID模式，設爲SAME可讓特徵對其更簡單，大家可以嘗試改爲 VALID）。最後將幾層嵌套的 arg_scope 作爲結果返回。

def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }

    with slim.arg_scope(
        [slim.conv2d],
        weights_regularizer=slim.l2_regularizer(weight_decay),
        weights_initializer=slim.variance_scaling_initializer(),
        activation_fn=tf.nn.relu,
        normalizer_fn=slim.batch_norm,
        normalizer_params=batch_norm_params
    ):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc

接下來定義核心的 bottleneck 殘差學習單元，它是ResNet V2 的論文中提到的 Full Preactivation Residual Unit 的一個變種。它和ResNet V1 中的殘差學習單元的主要區別有兩點，一是在每層前都用了Batch Bormalization，而是對輸入進行 practivation，而不是在卷積進行激活函數處理。我們來看一下bottleneck 函數的參數，inputs是輸入，depth，depth_bottleneck和stride這三個參數前面的 Batch Normalization，並使用 ReLU函數進行預激活Preactivate。然後定義 shortcut（即直連的 x）：如果殘差單元的輸入通道數 depth_in和輸出通道數 depth一致，那麼使用 subsample按步長爲 stride 對 Inputs 進行空間上的降採樣（確保空間尺寸和殘差一致，因爲殘差中間那層的卷積步長爲 stride）；如果輸入，輸出通道數不一樣，我們用步長爲 stride 的11 卷積改變其通道數，使得與輸出通道數一致。然後定義 Residual（殘差），residual這裏有3層，先是一個11尺寸，步長爲1，輸出通道數爲depth_bottleneck的卷積，然後是一個33尺寸，步長爲 stride，輸出通道數爲 depth_bottleneck的卷積，最後是一個11的卷積，步長爲1，輸出通道數爲depth的卷積，得到最終的 residual，這裏注意最後一層沒有正則項也沒有激活函數。然後將residual 和 shortcut 相加，得到最後結果 output，再使用 slim.utils.collect_named_outpouts 將結果添加進 collection並返回 output 作爲函數結果。

@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')

        if depth == depth_in:  # 此輸出形狀和三個卷積的輸出形狀一致
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                   normalizer_fn=None, activation_fn=None,
                                   scope='shortcut')
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                               scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalozer_fn=None, activation_fn=None,
                               scope='conv3')
        output = shortcut + residual

        return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)

下面定義生成ResNet V2 的主函數，我們只需要預先定義好網絡的殘差學習模塊組blocks，它就可以生成對應的完整的ResNet。先看看這個函數的參數，Inputs 即輸入，blocks爲定義好的Block類的列表，num_classes是最後輸出的類數。
global_pool 標誌是否加上最後的一層全局平均池化，
include_root_block 標誌是否加上ResNet網絡最前面通常使用的77卷積和最大池化，
reuse標誌是否重用，
scope是整個網絡的名稱。
在函數體內，我們先定義好variable_scope及 end_points_collection，再通過 slim.arg_scope 將（slim.con2d，bottleneck， stack_block_dense）這三個函數的參數 outputs_collections默認設爲 end_points_collection。然後根據 include_root_block標記，創建ResNet最前面的 64輸出通道的步長爲2的77卷積，然後再接一個步長爲2的33的最大池化。經歷兩個步長爲2的層，圖片尺寸已經被縮小爲1/4。然後，使用前面定義好的 stack_blocks_dense 將殘差學習模塊組生成好，再根據標記添加全局池化層，這裏用 tf.reduce_mean 實現全局平均池化，效率比直接用 avg_pool高。下面根據是否有分類數，添加一個輸出通道數爲 Num_classes的11卷積（該卷積層無激活函數和正則項），再添加一個 Softmax層輸出網絡結果。同時使用 slim.utils.convert_collection_to_dict 將 collection 轉化爲Python的 dict，最後返回 net 和 end_points。

def resnet_v2(inputs,
              blocks,
              num_classes=None,
              global_pool=True,
              include_root_block=True,
              reuse=None,
              scope=None):
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck,
                             stack_blocks_dense],
                            outputs_collections=end_points_collection):
            net = inputs
            if include_root_block:
                with slim.arg_scope([slim.conv2d],
                                    activation_fn=None, normalizer_fn=None):
                    net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
            net = stack_blocks_dense(net, blocks)
            net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
            if global_pool:
                net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
            if num_classes is not None:
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                                  normalizer_fn=None, scope='logits')
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection
            )
            if num_classes is not None:
                end_points['predictions'] = slim.softmax(net, scope='predictions')
            return net, end_points

至此，我們就將 ResNet 的生成函數定義好了。下面根據ResNet不同層數時的網絡配置圖中推薦的幾個不同深度的ResNet網絡配置，來設計層數分別爲 50， 101， 152 和 200 的ResNet。

我們先來看 50層的ResNet，其嚴格遵守了圖中的設置，4個殘差學習Blocks 的 units數量分別爲3， 4， 6和3，總層數即爲（3+4+6+3）x3+2=50。需要注意的時，殘差學習模塊之前的卷積，池化已經將尺寸縮小爲4倍，我們前3個Blocks又都包含步長爲2的層，因此總尺寸縮小了 4*(2*2*2)=32倍，輸入圖片尺寸最後變爲 224/32=7 。和 Inception V3很像，ResNet 不斷使用步長爲2的層來縮減尺寸，但同時輸出通道數也在持續增加，最後達到了 2048。

def resnet_v2_50(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_50'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)
    ]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

101 層的ResNet 和50層相比，主要變化就是把4個Blocks的units 數量從3， 4， 6，3提升到了3， 4， 23， 3 。即將第三個殘差學習Block 的units 數增加到接近4倍。

def resnet_v2_101(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_101'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)
    ]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

然後152層的ResNet，則是將第二個Block 的units數提高到8，將第三個 Block的 units 數提高到36。Units數量提升的主要場所依然是第三個Block。

def resnet_v2_152(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_152'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)
    ]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

最後，200層的Resnet 相比152層的ResNet ，沒有繼續提升第三個Block的units數，而是將第二個Block的 units 數一下子提升到了23。

def resnet_v2_200(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_200'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)
    ]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)

最後我們使用一直以來的測評函數 timne_tensorflow_run，來測試 152層深的 ResNet（即獲得 ILSVRC 2015 冠軍的版本）的forward 性能。圖片尺寸迴歸到AlexNet ，VGGNet的 224*224，batch_size 爲32。我們將 is_training 這個 FLAG置爲FALSE。然後使用 resnet_v2_152 創建網絡，再由 time_tensorflow_run 函數測評其 forward 性能。這裏不再對訓練時的性能進行測試了，大家可以自行測試求解ResNet全部參數的梯度所需要的時間。

def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration=%.3f'%(datetime.now(),
                                                    i - num_steps_burn_in, duration))
                total_duration += duration
                total_duration_squared += duration * duration

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch'% (datetime.now(),
                                                               info_string, num_batches, mn, sd))
if __name__ == '__main__':
    batch_size = 32
    height, width = 224, 224
    inputs = tf.random_uniform((batch_size, height, width, 3))
    with slim.arg_scope(resnet_arg_scope(is_training=False)):
        net, endpoints = resnet_v2_152(inputs, 1000)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    num_batches = 100
time_tensorflow_run(sess, net, 'Forward')

這裏可以看到，雖然這個ResNet有152層深，但其forward計算耗時並沒有特別誇張，相比 VGGNet 和 Inception_v3，大概只增加了 50%，每batch爲 0.122 秒。這說明 ResNet也是一個實用的卷積神經網絡結構，不僅支持超深網絡的訓練，同時在實際工業應用時也有不差的forward 性能。

2019-09-17 13:40:28.111221: step 0, duration=0.124
2019-09-17 13:40:29.336873: step 10, duration=0.122
2019-09-17 13:40:30.555401: step 20, duration=0.122
2019-09-17 13:40:31.774261: step 30, duration=0.122
2019-09-17 13:40:32.993206: step 40, duration=0.122
2019-09-17 13:40:34.210301: step 50, duration=0.122
2019-09-17 13:40:35.426938: step 60, duration=0.122
2019-09-17 13:40:36.644774: step 70, duration=0.122
2019-09-17 13:40:37.861877: step 80, duration=0.122
2019-09-17 13:40:39.078488: step 90, duration=0.122
2019-09-17 13:40:40.173907: Forward across 100 steps, 0.012 +/- 0.037 sec / batch

本文我們完整的學習了ResNet的基本原理及Tensorflow實現，也設計了一系列不同深度的 ResNet。如果大家感興趣可以自行探索不同深度，乃至不同殘差單元結構的ResNet的分類性能。例如，ResNet 原論文中主要增加的時第二個和第三個Block的 units數，大家可以嘗試增加其餘兩個Block的 units數，或者修改bottleneck單元中的 depth，depth_bottleneck等參數，可對其參數設置的意義加深理解。ResNet 可以算是深度學習中的一個里程碑式的圖片，真正意義上支持極深神經網絡的訓練。其網絡結構值得反覆思索，如Google等已將其融合到自家的 Inception Net中，並取得了非常好的效果。相信ResNet的成功也會啓發其他在深度學習領域研究的靈感。

完整代碼如下：

import collections
import tensorflow as tf

slim = tf.contrib.slim


class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    """A named tuple describing a ResNet block.
    Its parts are:
      scope: The scope of the `Block`.
      unit_fn: The ResNet unit function which takes as input a `Tensor` and
        returns another `Tensor` with the output of the ResNet unit.
      args: A list of length equal to the number of units in the `Block`. The list
        contains one (depth, depth_bottleneck, stride) tuple for each unit in the
        block to serve as argument to unit_fn.
    """


def subsample(inputs, factor, scope=None):
    """Subsamples the input along the spatial dimensions.
    Args:
      inputs: A `Tensor` of size [batch, height_in, width_in, channels].
      factor: The subsampling factor.
      scope: Optional variable_scope.
    Returns:
      output: A `Tensor` of size [batch, height_out, width_out, channels] with the
        input, either intact (if factor == 1) or subsampled (if factor > 1).
    """
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)


def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    """Strided 2-D convolution with 'SAME' padding.
    When stride > 1, then we do explicit zero-padding, followed by conv2d with
    'VALID' padding.
    Note that
       net = conv2d_same(inputs, num_outputs, 3, stride=stride)
    is equivalent to
       net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
       net = subsample(net, factor=stride)
    whereas
       net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
    is different when the input's height or width is even, which is why we add the
    current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
    Args:
      inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
      num_outputs: An integer, the number of output filters.
      kernel_size: An int with the kernel_size of the filters.
      stride: An integer, the output stride.
      rate: An integer, rate for atrous convolution.
      scope: Scope.
    Returns:
      output: A 4-D tensor of size [batch, height_out, width_out, channels] with
        the convolution output.
    """
    if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
    else:
        # kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs,
                        [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                           padding='VALID', scope=scope)


@slim.add_arg_scope
def stack_blocks_dense(net, blocks,
                       outputs_collections=None):
    """Stacks ResNet `Blocks` and controls output feature density.
    First, this function creates scopes for the ResNet in the form of
    'block_name/unit_1', 'block_name/unit_2', etc.
    Args:
      net: A `Tensor` of size [batch, height, width, channels].
      blocks: A list of length equal to the number of ResNet `Blocks`. Each
        element is a ResNet `Block` object describing the units in the `Block`.
      outputs_collections: Collection to add the ResNet block outputs.
    Returns:
      net: Output tensor

    class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
    [(512, 128, 1)] * 7 + [(512, 128, 2)]
Out[2]:
[(512, 128, 1),
 (512, 128, 1),
 (512, 128, 1),
 (512, 128, 1),
 (512, 128, 1),
 (512, 128, 1),
 (512, 128, 1),
 (512, 128, 2)]
    blocks = [
        Block(  # a class
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    """
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    net = block.unit_fn(net,
                                        depth=unit_depth,
                                        depth_bottleneck=unit_depth_bottleneck,
                                        stride=unit_stride)
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)

    return net


def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
    """Defines the default ResNet arg scope.
    TODO(gpapan): The batch-normalization related default values above are
      appropriate for use in conjunction with the reference ResNet models
      released at https://github.com/KaimingHe/deep-residual-networks. When
      training ResNets from scratch, they might need to be tuned.
    Args:
      is_training: Whether or not we are training the parameters in the batch
        normalization layers of the model.
      weight_decay: The weight decay to use for regularizing the model.
      batch_norm_decay: The moving average decay when estimating layer activation
        statistics in batch normalization.
      batch_norm_epsilon: Small constant to prevent division by zero when
        normalizing activations by their variance in batch normalization.
      batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
        activations in the batch normalization layer.
    Returns:
      An `arg_scope` to use for the resnet models.
    """
    batch_norm_params = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
    }

    with slim.arg_scope(
            [slim.conv2d],
            weights_regularizer=slim.l2_regularizer(weight_decay),
            weights_initializer=slim.variance_scaling_initializer(),
            activation_fn=tf.nn.relu,
            normalizer_fn=slim.batch_norm,
            normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
            # The following implies padding='SAME' for pool1, which makes feature
            # alignment easier for dense prediction tasks. This is also used in
            # https://github.com/facebook/fb.resnet.torch. However the accompanying
            # code of 'Deep Residual Learning for Image Recognition' uses
            # padding='VALID' for pool1. You can switch to that choice by setting
            # slim.arg_scope([slim.max_pool2d], padding='VALID').
            with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
                return arg_sc


@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
    """Bottleneck residual unit variant with BN before convolutions.
    This is the full preactivation residual unit variant proposed in [2]. See
    Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck
    variant which has an extra bottleneck layer.
    When putting together two consecutive ResNet blocks that use this unit, one
    should use stride = 2 in the last unit of the first block.
    Args:
      inputs: A tensor of size [batch, height, width, channels].
      depth: The depth of the ResNet unit output.
      depth_bottleneck: The depth of the bottleneck layers.
      stride: The ResNet unit's stride. Determines the amount of downsampling of
        the units output compared to its input.
      rate: An integer, rate for atrous convolution.
      outputs_collections: Collection to add the ResNet unit output.
      scope: Optional variable_scope.
    Returns:
      The ResNet unit's output.
    """
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')
        if depth == depth_in:
            shortcut = subsample(inputs, stride, 'shortcut')
        else:
            shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                   normalizer_fn=None, activation_fn=None,
                                   scope='shortcut')

        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                               scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalizer_fn=None, activation_fn=None,
                               scope='conv3')

        output = shortcut + residual

        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.name,
                                                output)


def resnet_v2(inputs,
              blocks,
              num_classes=None,
              global_pool=True,
              include_root_block=True,
              reuse=None,
              scope=None):
    """Generator for v2 (preactivation) ResNet models.
    This function generates a family of ResNet v2 models. See the resnet_v2_*()
    methods for specific model instantiations, obtained by selecting different
    block instantiations that produce ResNets of various depths.
    Args:
      inputs: A tensor of size [batch, height_in, width_in, channels].
      blocks: A list of length equal to the number of ResNet blocks. Each element
        is a resnet_utils.Block object describing the units in the block.
      num_classes: Number of predicted classes for classification tasks. If None
        we return the features before the logit layer.
      include_root_block: If True, include the initial convolution followed by
        max-pooling, if False excludes it. If excluded, `inputs` should be the
        results of an activation-less convolution.
      reuse: whether or not the network and its variables should be reused. To be
        able to reuse 'scope' must be given.
      scope: Optional variable_scope.
    Returns:
      net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
        If global_pool is False, then height_out and width_out are reduced by a
        factor of output_stride compared to the respective height_in and width_in,
        else both height_out and width_out equal one. If num_classes is None, then
        net is the output of the last ResNet block, potentially after global
        average pooling. If num_classes is not None, net contains the pre-softmax
        activations.
      end_points: A dictionary from components of the network to the corresponding
        activation.
    Raises:
      ValueError: If the target output_stride is not valid.

    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]

    """
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck,
                             stack_blocks_dense],
                            outputs_collections=end_points_collection):
            net = inputs
            if include_root_block:
                # We do not include batch normalization or activation functions in conv1
                # because the first ResNet unit will perform these. Cf. Appendix of [2].
                with slim.arg_scope([slim.conv2d],
                                    activation_fn=None, normalizer_fn=None):
                    net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
                net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
            net = stack_blocks_dense(net, blocks)
            # This is needed because the pre-activation variant does not have batch
            # normalization or activation functions in the residual unit output. See
            # Appendix of [2].
            net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
            if global_pool:
                # Global average pooling.
                net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
            if num_classes is not None:
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                                  normalizer_fn=None, scope='logits')
            # Convert end_points_collection into a dictionary of end_points.
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)
            if num_classes is not None:
                end_points['predictions'] = slim.softmax(net, scope='predictions')
            return net, end_points


def resnet_v2_50(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_50'):
    """ResNet-50 model of [1]. See resnet_v2() for arg and return description."""
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_101(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_101'):
    """ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_152(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_152'):
    """ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_200(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_200'):
    """ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
    blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block=True, reuse=reuse, scope=scope)


from datetime import datetime
import math
import time


def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))


if __name__ == '__main__':
    batch_size = 32
    height, width = 224, 224
    inputs = tf.random_uniform((batch_size, height, width, 3))
    with slim.arg_scope(resnet_arg_scope(is_training=False)):
        net, end_points = resnet_v2_152(inputs, 1000)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    num_batches = 100
    time_tensorflow_run(sess, net, "Forward") 




blocks = [
        Block(
            'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]

resnet_v1_101網絡圖

blocks = [
Block(
‘block1’, bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
Block(
‘block2’, bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
Block(
‘block3’, bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
Block(
‘block4’, bottleneck, [(2048, 512, 1)] * 3)]

本文是學習ResNet網絡的筆記，參考了《tensorflow實戰》這本書中關於ResNet的章節，寫的非常好，所以在此做了筆記，侵刪。
　　而且本文在學習中，摘抄了下面博客的ResNet筆記，也寫的通俗易通：
　　https://my.oschina.net/u/876354/blog/1634322　　　　　
　　https://www.zybuluo.com/rianusr/note/1419006
　　https://my.oschina.net/u/876354/blog/1622896
參考文獻：https://blog.csdn.net/u013181595/article/details/80990930
　　　　　https://blog.csdn.net/lanran2/article/details/79057994
ResNet的論文文獻： https://arxiv.org/abs/1512.03385
強烈建議學習何凱文關於深度殘差網絡的兩篇經典論文，深度殘差網絡的主要思想，便是來自下面兩篇論文：

《Deep Residual Learning for Image Recognition》（基於深度殘差學習的圖像識別）

《Identity Mappings in Deep Residual Networks》（深度殘差網絡中的特徵映射）

在學習後，確實對ResNet 理解了不少，在此很感謝。
原文出處：https://www.cnblogs.com/wj-1314/p/11519663.html

ResNet（Residual Neural Network）的原理和tf實現

文章目錄

1.Highway Network簡介

2.模型加深存在的問題

3.爲什麼深度模型難以訓練

3.1.鏈式法則與梯度彌散

4.ResNet的特點

5.VGGNet-19 VS ResNet-34（ResNet的創新點）

6.ResNet不同層數的網絡配置

7.TensorFlow 實現ResNet V2網絡

resnet_v1_101網絡圖

工作中用到的腳本合集

24-5-18 X

爬蟲入門--糗百

條件隨機場的相關知識--（1）

Mysql---數據庫綜合筆記和在ubuntu中的使用

解決mysql [1045] Access denied for user ‘root‘@‘192.168.180.248‘ (using password: YES)

可視化數據庫管理工具

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結