Street View Character Recognition 3: Character-Model Recognition

In the previous chapters, we framed the street view character recognition problem as a fixed-length character multi-classification problem: a classifier is trained for each character position, and the string is recognized by combining their predictions.

1 Learning Objectives

  • Learn the fundamentals and principles of CNNs
  • Understand fine-tuning, a transfer learning technique
  • Build a CNN model with the PyTorch framework and train it

2 Convolutional Neural Networks (CNN)

  1. Recommended introductory material: for a CNN primer, see 《機器學習_ 學習筆記 (all in one)_V0.96.pdf》.
    If plain text still feels too abstract, CNN Explainer walks through every stage of a convolutional neural network visually: for example, how a single convolution kernel transforms the pixels of the three RGB channels into one feature map, and how the fully connected layer arrives at the final classification result.

  2. CNNs are used for classification mainly because they are good at automatically extracting image features. A convolution kernel is a single-feature extractor; the figure below shows an example of single-feature extraction.
    [figure: single-feature extraction example]
    What I find most interesting is that, after several layers of convolution and pooling, the network progresses from extracting simple features to extracting complex ones.
    [figure: simple features combining into complex features]

  3. The job of a max pooling layer is downsizing. The multiple kernels of a convolution layer produce different features, and max pooling shrinks the spatial size while preserving those features as much as possible.
    [figure: max pooling example]

  4. A feature map is essentially the input to the next layer; see the introduction to what a feature map is.
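The downsizing effect of max pooling described in point 3 can be sketched in plain Python. This is a toy example, assuming a single-channel 4×4 feature map and a 2×2 window with stride 2:

```python
# A minimal sketch of 2x2 max pooling (stride 2) on a single-channel
# feature map: the spatial size is halved, while the strongest
# activation in each window is kept.
def max_pool_2x2(fmap):
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i][j], fmap[i][j + 1],
                fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 0],
]
pooled = max_pool_2x2(feature_map)  # 4x4 -> 2x2
print(pooled)  # -> [[4, 5], [3, 7]]
```

Note how each output value is the maximum of one 2×2 window, so the dominant activations survive even though three quarters of the values are discarded.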

3 Backpropagation

Recommended reading: Michael Nielsen's "Neural Networks and Deep Learning".

4 Transfer Learning: Fine-Tuning

4.1 Purpose

The purpose of transfer learning is to make full use of complex models that have already been trained to maturity on large-scale data. Such models extract fairly general image features, which help with recognizing edges, textures, shapes, object parts, and so on. Transferring the knowledge learned on a source dataset (such as ImageNet) to extract features on the target dataset yields higher accuracy than training on a small dataset alone, and saves time.

4.2 Fine-tuning

4.2.1 Workflow

Fine-tuning is a common transfer learning technique. When the target dataset is far smaller than the source dataset, fine-tuning helps improve the model's generalization ability. The main steps are:

  1. Choose a neural network model similar to the current application as the source model.
    [Addendum 05/27] On "similar to the current application": CNNTricks notes that the two most important factors are the size of the target dataset and its similarity to the source dataset. Different combinations of these two factors call for different strategies. For example, when the target dataset is similar to the source dataset: if the target dataset is small, train a linear classifier on the features extracted from the source model's top layers; if you have plenty of data at hand, fine-tune the source model's top layers with a small learning rate. When the target dataset differs substantially from the source dataset: if there is enough data, fine-tune a larger number of the source model's layers, again with a small learning rate; if data is scarce, a linear classifier trained on the network's features may not be accurate enough, and an SVM classifier is recommended instead.

The original passage from CNNTricks: Fine-tune on pre-trained models. Nowadays, many state-of-the-art deep networks are released by famous research groups, e.g., the Caffe Model Zoo and the VGG Group. Thanks to the wonderful generalization abilities of pre-trained deep models, you can employ these pre-trained models for your own applications directly. For further improving the classification performance on your data set, a very simple yet effective approach is to fine-tune the pre-trained models on your own data. The two most important factors are the size of the new data set (small or big) and its similarity to the original data set. Different strategies of fine-tuning can be utilized in different situations. For instance, a good case is that your new data set is very similar to the data used for training the pre-trained models. In that case, if you have very little data, you can just train a linear classifier on the features extracted from the top layers of the pre-trained models. If you have quite a lot of data at hand, fine-tune a few top layers of the pre-trained models with a small learning rate. However, if your own data set is quite different from the data used in the pre-trained models but you have enough training images, a large number of layers should be fine-tuned on your data, also with a small learning rate, to improve performance. However, if your data set not only contains little data but is also very different from the data used in the pre-trained models, you will be in trouble. Since the data is limited, it seems better to train only a linear classifier. Since the data set is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train an SVM classifier on activations/features from somewhere earlier in the network.

  2. Copy all layers and parameters of the source model except the output layer (the source model's output layer is tied to the source dataset's labels, so it is excluded) to obtain the target model;
  3. Add to the target model an output layer whose size equals the number of classes in the target dataset, and randomly initialize that layer's parameters;
  4. Train the target model on the target dataset: the output layer is trained from scratch, while the remaining layers are fine-tuned starting from the source model's parameters.
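Step 4 can be expressed directly with PyTorch's per-parameter-group learning rates. This is a minimal sketch: `backbone` here is a tiny stand-in for the copied pretrained layers, and `new_head` for the freshly initialized output layer; the learning rates are illustrative, not prescribed by the text.

```python
import torch
import torch.nn as nn

# Stand-ins: `backbone` plays the role of the copied source-model
# layers, `new_head` the randomly initialized output layer.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
new_head = nn.Linear(16, 10)  # output size = number of target classes

# Fine-tune the copied layers gently; train the new head from scratch
# at a larger learning rate.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": new_head.parameters(), "lr": 1e-2},
], momentum=0.9)
```

The same pattern applies unchanged when `backbone` is a real pretrained network such as ResNet-18.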

4.2.2 Example PyTorch implementation

  1. We use ResNet-18 pretrained on the ImageNet dataset as the source model pretrained_net. Passing pretrained=True downloads and loads the pretrained parameters automatically.
  2. The source model's fully connected layer transforms the output of ResNet's final global average pooling layer into 1000-class scores. Removing the source model's output layer yields the target model's backbone model_conv, which contains only the feature extractor and no classifier.
  3. After model_conv we append the multiple classification heads needed by the fixed-length character model: fc1, fc2, fc3, fc4, fc5.
  4. Samples from the target dataset are passed through the feature extractor model_conv, and concatenating the outputs of the classification heads yields the fixed-length string result.

Defining a model in PyTorch involves building the network structure and writing the forward method that returns the output. model_conv returns, for each batch of images, features of shape (batch_size, channel, W, H), so we use view() to reshape them to (batch_size, channel*W*H) before feeding them into the fully connected layers.
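The reshape can be seen in isolation. A minimal sketch, assuming ResNet-18's 512-channel output after global average pooling:

```python
import torch

# After global average pooling, ResNet-18 emits features of shape
# (batch_size, 512, 1, 1); view() flattens everything after the batch
# dimension so the fully connected heads receive (batch_size, 512).
feature = torch.zeros(4, 512, 1, 1)
flat = feature.view(feature.shape[0], -1)
print(tuple(flat.shape))  # -> (4, 512)
```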

import torch.nn as nn
from torchvision import models

class SVHN_Model2(nn.Module):
	def __init__(self):
		super(SVHN_Model2, self).__init__()
		# ImageNet-pretrained ResNet-18 as the source model
		pretrained_net = models.resnet18(pretrained=True)
		pretrained_net.avgpool = nn.AdaptiveAvgPool2d(1)
		# Drop the 1000-class output layer; keep only the feature extractor
		self.model_conv = nn.Sequential(*list(pretrained_net.children())[:-1])
		# Five classification heads, one per character position
		# (11 classes: digits 0-9 plus a blank/padding class)
		self.fc1 = nn.Linear(512, 11)
		self.fc2 = nn.Linear(512, 11)
		self.fc3 = nn.Linear(512, 11)
		self.fc4 = nn.Linear(512, 11)
		self.fc5 = nn.Linear(512, 11)

	def forward(self, image):
		feature = self.model_conv(image)
		feature = feature.view(feature.shape[0], -1)  # reshape to (batch_size, 512)
		c1 = self.fc1(feature)
		c2 = self.fc2(feature)
		c3 = self.fc3(feature)
		c4 = self.fc4(feature)
		c5 = self.fc5(feature)
		return c1, c2, c3, c4, c5
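Step 4 above, combining the five head outputs into the final string, can be sketched in plain Python. Assumptions labeled here: each head yields 11 scores, and class 10 is treated as a "no character" padding class (the `decode` helper and the fake scores are illustrative, not part of the original code).

```python
# Hypothetical post-processing: take the argmax of each head's 11
# scores and join the predicted digits into the final string, skipping
# the assumed padding class 10.
def decode(head_scores):
    chars = []
    for scores in head_scores:
        cls = max(range(len(scores)), key=lambda i: scores[i])
        if cls != 10:  # 10 = blank / padding class (assumption)
            chars.append(str(cls))
    return "".join(chars)

# Fake scores for one sample whose heads predict classes 1, 9, 10, 10, 10.
one_hot = lambda i: [1.0 if j == i else 0.0 for j in range(11)]
heads = [one_hot(i) for i in (1, 9, 10, 10, 10)]
print(decode(heads))  # -> 19
```

In practice the same argmax-and-join step would be applied to the tensors c1 through c5 returned by the model's forward pass.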

5 References

Dive into Deep Learning (《動手學深度學習》)
《機器學習_ 學習筆記 (all in one)_V0.96.pdf》
