Building ResNet from Scratch with Python

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近年來,深度學習和計算機視覺領域取得了一系列突破。特別是行業引入了非常深的卷積神經網絡後,在這些模型的幫助下,圖像識別和圖像分類等問題取得了非常好的成果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此這些年來,深度學習架構變得越來越深(層越來越多)以解決越來越複雜的任務,這也有助於提高分類和識別任務的性能,並讓它們表現穩健。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但當我們繼續向神經網絡添加更多層時,模型訓練起來也越來越困難,模型的準確度開始飽和,然後還會下降。於是ResNet誕生了,讓我們擺脫了這種窘境,並能幫助解決這個問題。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"什麼是ResNet?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"殘差網絡(ResNet)是著名的深度學習模型之一,由任少清、何開明、孫健和張翔宇在他們的論文中引入。這篇2015年的論文全名叫“Deep Residual Learning for Image Recognition”[1]。ResNet模型是迄今爲止廣泛流行和最成功的深度學習模型之一。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"殘差塊"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着這些殘差(Residual)塊的引入,訓練非常深的網絡時面臨的問題得到了緩解,ResNet模型由這些塊組成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/d7\/a7\/d797d7d91690d15ffcd4d7ef857b71a7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"來源:“圖像識別的深度殘差學習”論文"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着這些殘差塊的引入,訓練非常深的網絡時面臨的問題得到了緩解,ResNet模型由這些塊組成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上圖中,我們可以注意到的第一件事是跳過模型的某些層的直接連接。這種連接稱爲“跳過連接”,是殘差塊的核心。由於存在這種跳過連接,輸出是不相同的。如果沒有跳過連接,輸入‘X將乘以層的權重,然後添加一個偏置項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後是激活函數f(),我們得到輸出爲H(x)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"H(x)=f(wx+b)或H(x)=f(x)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在引入了新的跳過連接技術,輸出H(x)更改爲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"H(x)=f(x)+x"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"nu
mber":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是輸入的維度可能與輸出的維度不同,這可能發生在卷積層或池化層中。因此,這個問題可以用這兩種方法來處理:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用跳過連接填充零以增加其維度。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1×1卷積層被添加到輸入以匹配維度。在這種情況下,輸出爲:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"H(x)=f(x)+w1.x"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏添加了一個額外的參數w1,而在使用第一種方法時沒有添加額外的參數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ResNet中的這些跳過連接技術通過梯度流經的替代快捷路徑來解決深度CNN中梯度消失的問題。此外,如果有任何層損害了架構的性能,跳過連接也能起作用,它將被正則化跳過。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"ResNet的架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"架構中有一個34層的普通網絡,其靈感來自VGG-19,其中添加了快捷連接或跳過連接。這些跳過連接或殘差塊將架構轉換爲殘差網絡,如下圖所示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/d9\/a4\/d9752fbf94f63f1b708566d5f94517a4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"來源:“圖像識別的深度殘差學習”論文"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"將ResNet與Keras結合使用:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Keras是一個開源深度學習庫,能夠在TensorFlow上運行。Keras 
### Let's Build ResNet from Scratch

![](https://static001.geekbang.org/resource/image/98/eb/9895fc6a50db13cdf5279dbdacc7bfeb.png)

Source: the "Deep Residual Learning for Image Recognition" paper

We will use the figure above as a reference and start building the network.

The ResNet architecture reuses its CNN block many times, so we create a class for the CNN block that takes the input and output channels. There is a BatchNorm2d after every conv layer.

```python
import torch
import torch.nn as nn
```

```python
class block(nn.Module):
    def __init__(
        self, in_channels, intermediate_channels, identity_downsample=None, stride=1
    ):
        super(block, self).__init__()
        self.expansion = 4
        self.conv1 = nn.Conv2d(
            in_channels, intermediate_channels, kernel_size=1, stride=1, padding=0, bias=False
        )
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        self.conv2 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False,
        )
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        self.conv3 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels * self.expansion,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=False,
        )
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample
        self.stride = stride

    def forward(self, x):
        identity = x.clone()

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)

        # Add the (possibly downsampled) identity back in: H(x) = f(x) + x
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)
        x += identity
        x = self.relu(x)
        return x
```
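As a quick sanity check on the block (a sketch that assumes `block`, `torch` and `nn` from the snippets above are in scope): with `intermediate_channels=64` and an expansion factor of 4, a dummy input with 64 channels should come out with 256 channels, so the identity branch needs a 1×1 downsample to match.

```python
# Shape check for the bottleneck block defined above.
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256),
)
b = block(in_channels=64, intermediate_channels=64,
          identity_downsample=downsample, stride=1)
x = torch.randn(2, 64, 56, 56)
print(b(x).shape)  # expected: torch.Size([2, 256, 56, 56])
```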
Then create a ResNet class that takes as input the block, the layer configuration, the number of image channels and the number of classes. In the code below, the function `_make_layer` creates the ResNet layers; it takes the block, the number of residual blocks, the number of intermediate channels and the stride as input.

```python
class ResNet(nn.Module):
    def __init__(self, block, layers, image_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Essentially the entire ResNet architecture is in these 4 lines below
        self.layer1 = self._make_layer(
            block, layers[0], intermediate_channels=64, stride=1
        )
        self.layer2 = self._make_layer(
            block, layers[1], intermediate_channels=128, stride=2
        )
        self.layer3 = self._make_layer(
            block, layers[2], intermediate_channels=256, stride=2
        )
        self.layer4 = self._make_layer(
            block, layers[3], intermediate_channels=512, stride=2
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x

    def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride):
        identity_downsample = None
        layers = []

        # If we halve the spatial size (e.g. 56x56 -> 28x28 with stride=2) or the number
        # of channels changes, we need to adapt the identity (skip connection) so it can
        # be added to the output of the block ahead
        if stride != 1 or self.in_channels != intermediate_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(
                    self.in_channels,
                    intermediate_channels * 4,
                    kernel_size=1,
                    stride=stride,
                    bias=False,
                ),
                nn.BatchNorm2d(intermediate_channels * 4),
            )

        layers.append(
            block(self.in_channels, intermediate_channels, identity_downsample, stride)
        )

        # The expansion size is always 4 for ResNet-50, 101 and 152
        self.in_channels = intermediate_channels * 4

        # For example, in the first ResNet layer 256 channels are mapped to 64 in the
        # intermediate layers and then back to 256. Hence no identity downsample is
        # needed in the remaining blocks, since stride = 1 and the channel count is
        # unchanged.
        for i in range(num_residual_blocks - 1):
            layers.append(block(self.in_channels, intermediate_channels))

        return nn.Sequential(*layers)
```
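To see what `_make_layer` builds, here is a small illustrative trace (again assuming the classes above are in scope): the first stage keeps the 56×56 resolution and expands to 256 channels, while each later stage halves the resolution and doubles the channel count.

```python
# Trace of the stages produced by _make_layer for the ResNet-50 layout [3, 4, 6, 3].
model = ResNet(block, [3, 4, 6, 3], image_channels=3, num_classes=1000)
x = torch.randn(1, 64, 56, 56)   # the tensor as it arrives at layer1 (after conv1 + maxpool)
out1 = model.layer1(x)
out2 = model.layer2(out1)
print(out1.shape)  # torch.Size([1, 256, 56, 56])
print(out2.shape)  # torch.Size([1, 512, 28, 28])
```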
Then define the different versions of ResNet:

- For ResNet-50 the layer sequence is [3, 4, 6, 3].
- For ResNet-101 the layer sequence is [3, 4, 23, 3].
- For ResNet-152 the layer sequence is [3, 8, 36, 3]. (See the "Deep Residual Learning for Image Recognition" paper.)

```python
def ResNet50(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 6, 3], img_channel, num_classes)


def ResNet101(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 23, 3], img_channel, num_classes)


def ResNet152(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 8, 36, 3], img_channel, num_classes)
```
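Optionally, these hand-written variants can be cross-checked against torchvision's reference implementation (a sketch that assumes torchvision is installed). The layer layout is the same, so the parameter totals should agree, at roughly 25.6 million for ResNet-50.

```python
import torchvision

ours = ResNet50(img_channel=3, num_classes=1000)
reference = torchvision.models.resnet50()   # randomly initialised reference model
print(sum(p.numel() for p in ours.parameters()))
print(sum(p.numel() for p in reference.parameters()))
```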
Then write a small test to check whether the model works correctly.

```python
def test():
    net = ResNet101(img_channel=3, num_classes=1000)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = net.to(device)
    y = net(torch.randn(4, 3, 224, 224).to(device))
    print(y.size())
```

```python
test()
```

For the test case above, the output should be:

![](https://static001.geekbang.org/resource/image/58/ed/587c9d3f45430d79c49e1b13ec4799ed.png)

The full code can be accessed here:

https://github.com/BakingBrains/Deep_Learning_models_implementation_from-scratch_using_pytorch_/blob/main/ResNet_.py

[1]: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Deep Residual Learning for Image Recognition, Dec 2015. https://arxiv.org/abs/1512.03385

**Original article:** https://www.analyticsvidhya.com/blog/2021/06/build-resnet-from-scratch-with-python/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29