CNN Structure
CNN Layers
VGG-16
A CNN is made up of layers that process visual information. A CNN first takes in an input image and then passes it through these layers. There are a few different types of layers; the most commonly used are convolutional, pooling, and fully-connected layers. First, let's take a look at a complete CNN architecture. Below is a network called VGG-16, which has been trained to recognize a variety of image classes. It takes an image as input and outputs a predicted class for that image.
VGG-16
Defining Layers in PyTorch
A convolutional neural network is made up of a series of simple layers:
- Convolutional layers
- Maxpooling layers
- Fully-connected (linear) layers
To define a neural network in PyTorch, you create and name a new class for the network and define its layers in the __init__ function. Note: during training, PyTorch will be able to perform backpropagation by keeping track of the network's feedforward behavior and using autograd to calculate the updates to the network's weights.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self, n_classes):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 32 output channels/feature maps
        # 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # fully-connected layer
        # 32*4 input size to account for the downsampled image size after pooling
        # num_classes outputs (for n_classes of image data)
        self.fc1 = nn.Linear(32*4, n_classes)

    # define the feedforward behavior
    def forward(self, x):
        # one conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))

        # prep for linear layer by flattening the feature maps into feature vectors
        x = x.view(x.size(0), -1)
        # linear layer
        x = F.relu(self.fc1(x))

        # final output
        return x

# instantiate and print your Net
n_classes = 20 # example number of classes
net = Net(n_classes)
print(net)
Visualizing the output of four filters
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'images/udacity_sdc.png'

# load color image
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)

# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255

# plot image
plt.imshow(gray_img, cmap='gray')
plt.show()
Defining filters
import numpy as np

## TODO: Feel free to modify the numbers here, to try out another filter!
filter_vals = np.array([[-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1]])

print('Filter shape: ', filter_vals.shape)

# define four filters
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])

# For an example, print out the values of filter 1
print('Filter 1: \n', filter_1)
### do not modify the code below this line ###

# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y]<0 else 'black')
Initialize a single convolutional layer so that it contains all of the filters you've created
import torch
import torch.nn as nn
import torch.nn.functional as F

# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):

    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)

        # returns both layers
        return conv_x, activated_x

# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

# print out the layer in the network
print(model)
Visualizing the output of each filter. First, we'll define a helper function, viz_layer, that takes in a specific layer and a number of filters (optional argument), and displays the output of that layer once an image has been passed through it.
# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))
# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)

# visualize the output of a conv layer
viz_layer(conv_layer)
# visualize the output of an activated conv layer
viz_layer(activated_layer)
Pooling Layers
After a couple of convolutional layers (+ReLU), the VGG-16 network includes a maxpooling layer.
- A pooling layer takes in an image (usually a filtered image) and outputs a reduced version of that image
- A pooling layer reduces the dimensionality of its input
- A maxpooling layer looks at areas of an input image (like the 4x4 pixel area pictured below) and keeps only the maximum pixel value from each area in a new, reduced-size output.
- Maxpooling is the most common type of pooling layer in CNNs, but there are also other types, such as average pooling.
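The downsampling described above is easy to see with a small, made-up example; a sketch using nn.MaxPool2d directly, where the 4x4 input values are arbitrary:

```python
import torch
import torch.nn as nn

# a maxpool layer with a 2x2 window and stride 2 halves each spatial dimension
pool = nn.MaxPool2d(2, 2)

# one 4x4 single-channel "image" (batch, channels, height, width)
x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])

out = pool(x)
print(out)        # each 2x2 region is replaced by its maximum: [[4, 8], [12, 16]]
print(out.shape)  # torch.Size([1, 1, 2, 2])
```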
Fully-Connected Layers
A fully-connected layer comes after a series of convolutional and pooling layers; note its flattened shape. The job of a fully-connected layer is to connect the input it sees to the desired form of output. Typically, this means converting a matrix of image features into a feature vector of dimensions 1xC, where C is the number of classes. As an example, say we are sorting images into ten classes: you could give a fully-connected layer a set of [pooled, activated] feature maps as input and ask it to use a combination of these features (multiplying them, adding them, combining them, and so on) to output a 10-item-long feature vector. This vector condenses the information in the feature maps into a single feature vector.
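A minimal sketch of this flatten-and-connect step; the 20 maps of size 5x5 and the 10 classes are illustrative numbers:

```python
import torch
import torch.nn as nn

# suppose pooling left us with 20 feature maps of size 5x5;
# flatten to a vector of 20*5*5 = 500 features, then map to 10 class scores
fc = nn.Linear(20 * 5 * 5, 10)

feature_maps = torch.randn(1, 20, 5, 5)           # one image's pooled features
flat = feature_maps.view(feature_maps.size(0), -1)  # flatten maps into a vector
scores = fc(flat)                                  # combine features into class scores

print(flat.shape)    # torch.Size([1, 500])
print(scores.shape)  # torch.Size([1, 10])
```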
softmax
The very last layer you see in this network is a softmax function. The softmax function can take any vector of values as input, and it returns a vector of the same length whose values are all in the range (0, 1) and together sum to 1.
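For illustration, softmax can be written in a few lines of NumPy; a sketch, since inside a PyTorch network you would use F.softmax or F.log_softmax instead:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores ("logits")
probs = softmax(scores)

print(probs)        # each value lies in (0, 1)
print(probs.sum())  # the values sum to 1.0
```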
Train in PyTorch
Once you've loaded a training dataset, your next job will be to define a CNN and train it to classify that set of images.
Loss and Optimizer
To train a model, you define how it trains by selecting a loss function and an optimizer. These functions determine how the model updates its parameters as it trains, and they can affect how quickly the model converges. For a classification problem like this one, cross-entropy loss is typically used; it can be defined in code as criterion = nn.CrossEntropyLoss(). PyTorch also includes standard stochastic optimizers such as stochastic gradient descent and Adam. You're encouraged to try different optimizers and see how your model responds to these choices.
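Put together, the two choices look like this; a sketch in which the tiny nn.Linear model is just a placeholder standing in for the CNN defined later:

```python
import torch.nn as nn
import torch.optim as optim

# a tiny placeholder model, standing in for a real CNN
net = nn.Linear(10, 2)

# cross-entropy loss for classification
criterion = nn.CrossEntropyLoss()

# stochastic gradient descent with a small learning rate;
# optim.Adam(net.parameters(), lr=0.001) is one alternative worth trying
optimizer = optim.SGD(net.parameters(), lr=0.001)
```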
Typically, we train any network for a number of epochs, or cycles through the training dataset. Here are the steps that the training function performs as it iterates over the training dataset:
- Prepares all input images and label data for training
- Passes the input through the network (forward pass)
- Computes the loss (how far the predicted classes are from the correct labels)
- Propagates gradients back into the network's parameters (backward pass)
- Updates the weights (parameter update)
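The steps above can be sketched as a single training step on toy data; the linear model and the random batch are placeholders for a real CNN and a DataLoader batch:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# placeholders for a real CNN and a batch from a DataLoader
net = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

inputs = torch.randn(8, 10)          # prepare a batch of inputs...
labels = torch.randint(0, 3, (8,))   # ...and their corresponding labels

outputs = net(inputs)                # forward pass
loss = criterion(outputs, labels)    # compute the loss
optimizer.zero_grad()                # clear any old gradients
loss.backward()                      # backward pass
optimizer.step()                     # parameter update

# the loss on the same batch should drop after one update
new_loss = criterion(net(inputs), labels)
print(loss.item(), new_loss.item())
```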
CNN for Classification
Define the layers in __init__
A conv/pool layer pair can be defined like this (in __init__):
# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)

# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
                          download=True, transform=data_transform)

test_data = FashionMNIST(root='./data', train=False,
                         download=True, transform=data_transform)

# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
This cell iterates over the training dataset, loading a random batch of image/label data by calling next() on the train_loader's iterator. It then plots the batch of images and labels in a 2 x batch_size/2 grid.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])
For a convolutional neural network, we'll use a simple series of layers:
- Convolutional layers
- Maxpooling layers
- Fully-connected (linear) layers
flattening
Recall that to move from the output of a convolutional/pooling layer to a linear layer, you must first flatten the extracted features into a vector. If you've used the deep learning library Keras, you may have seen this done with Flatten(); in PyTorch, you can flatten an input x with x = x.view(x.size(0), -1).
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 10 output channels/feature maps
        # 3x3 square convolution kernel
        ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
        # the output Tensor for one image will have the dimensions: (10, 26, 26)
        # after one pool layer, this becomes (10, 13, 13)
        self.conv1 = nn.Conv2d(1, 10, 3)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 10 inputs, 20 outputs, 3x3 conv
        ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
        # the output tensor will have dimensions: (20, 11, 11)
        # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
        self.conv2 = nn.Conv2d(10, 20, 3)

        # 20 outputs * the 5*5 filtered/pooled map size
        # 10 output channels (for the 10 classes)
        self.fc1 = nn.Linear(20*5*5, 10)

    # define the feedforward behavior
    def forward(self, x):
        # two conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # prep for linear layer
        # flatten the inputs into a vector
        x = x.view(x.size(0), -1)

        # one linear layer
        x = F.relu(self.fc1(x))
        # a softmax layer to convert the 10 outputs into a distribution of class scores
        x = F.log_softmax(x, dim=1)

        # final output
        return x

# instantiate and print your Net
net = Net()
print(net)
TODO: Specify the loss function and optimizer
import torch.optim as optim

## TODO: specify loss function
# since this network outputs log-probabilities (log_softmax), use NLL loss;
# nn.CrossEntropyLoss() combines log_softmax and nn.NLLLoss() in one single class
criterion = nn.NLLLoss()

## TODO: specify optimizer
# stochastic gradient descent with a small learning rate
optimizer = optim.SGD(net.parameters(), lr=0.001)
A note on accuracy
before training
correct = 0
total = 0

# Iterate through test dataset
for images, labels in test_loader:

    # forward pass to get outputs
    # the outputs are a series of class scores
    outputs = net(images)

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # count up total number of correct labels
    # for which the predicted and true labels are equal
    total += labels.size(0)
    correct += (predicted == labels).sum()

# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total

# print it out!
print('Accuracy before training: ', accuracy)
Here are the steps that this training function performs as it iterates over the training dataset:
- Zeros the gradients to prepare for a forward pass
- Passes the input through the network (forward pass)
- Computes the loss (how far the predicted classes are from the correct labels)
- Propagates gradients back into the network’s parameters (backward pass)
- Updates the weights (parameter update)
- Prints out the calculated loss
def train(n_epochs):

    loss_over_time = [] # to track the loss as the network trains

    for epoch in range(n_epochs):  # loop over the dataset multiple times

        running_loss = 0.0

        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding labels
            inputs, labels = data

            # zero the parameter (weight) gradients
            optimizer.zero_grad()

            # forward pass to get outputs
            outputs = net(inputs)

            # calculate the loss
            loss = criterion(outputs, labels)

            # backward pass to calculate the parameter gradients
            loss.backward()

            # update the parameters
            optimizer.step()

            # print loss statistics
            # to convert loss into a scalar and add it to running_loss, we use .item()
            running_loss += loss.item()

            if batch_i % 1000 == 999:    # print every 1000 batches
                avg_loss = running_loss/1000
                # record and print the avg loss over the 1000 batches
                loss_over_time.append(avg_loss)
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
                running_loss = 0.0

    print('Finished Training')
    return loss_over_time

# define the number of epochs to train for
n_epochs = 30 # start small to see if your model works, initially

# call train and record the loss over time
training_loss = train(n_epochs)
Visualizing the loss
# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()
Test
# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set the module to evaluation mode
net.eval()

for batch_i, data in enumerate(test_loader):

    # get the input images and their corresponding labels
    inputs, labels = data

    # forward pass to get outputs
    outputs = net(inputs)

    # calculate the loss
    loss = criterion(outputs, labels)

    # update average test loss
    test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # compare predictions to true label
    # this creates a `correct` Tensor that holds the number of correctly classified images in a batch
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))

    # calculate test accuracy for *each* object class
    # we get the scalar value of correct items for a class, by calling `correct[i].item()`
    for i in range(batch_size):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Saving the model
# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_simple.pt'

# after training, save your model parameters in the dir 'saved_models'
torch.save(net.state_dict(), model_dir+model_name)
Visualizing feature maps
Load the data
# our basic libraries
import torch
import torchvision

# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN

## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()

test_data = FashionMNIST(root='./data', train=False,
                         download=True, transform=data_transform)

# Print out some stats about the test data
print('Test data, number of images: ', len(test_data))
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20

test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])
Define the network architecture
The various layers that make up any neural network are documented in the PyTorch documentation. For a convolutional neural network, we'll use a simple series of layers:
- Convolutional layers
- Maxpooling layers
- Fully-connected (linear) layers
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        # 1 input image channel (grayscale), 10 output channels/feature maps
        # 3x3 square convolution kernel
        ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
        # the output Tensor for one image will have the dimensions: (10, 26, 26)
        # after one pool layer, this becomes (10, 13, 13)
        self.conv1 = nn.Conv2d(1, 10, 3)

        # maxpool layer
        # pool with kernel_size=2, stride=2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 10 inputs, 20 outputs, 3x3 conv
        ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
        # the output tensor will have dimensions: (20, 11, 11)
        # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
        self.conv2 = nn.Conv2d(10, 20, 3)

        # 20 outputs * the 5*5 filtered/pooled map size
        self.fc1 = nn.Linear(20*5*5, 50)

        # dropout with p=0.4
        self.fc1_drop = nn.Dropout(p=0.4)

        # finally, create 10 output channels (for the 10 classes)
        self.fc2 = nn.Linear(50, 10)

    # define the feedforward behavior
    def forward(self, x):
        # two conv/relu + pool layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        # prep for linear layer
        # this line of code is the equivalent of Flatten in Keras
        x = x.view(x.size(0), -1)

        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc1_drop(x)
        x = self.fc2(x)

        # final output
        return x
# instantiate your Net
net = Net()

# load the net parameters by name
net.load_state_dict(torch.load('saved_models/fashion_net_ex.pt'))

print(net)
import torch.optim as optim

## TODO: specify loss function
# using cross entropy which combines softmax and NLL loss
criterion = nn.CrossEntropyLoss()

## TODO: specify optimizer
# stochastic gradient descent with a small learning rate AND some momentum
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# Calculate accuracy before training
correct = 0
total = 0

# Iterate through test dataset
for images, labels in test_loader:

    # forward pass to get outputs
    # the outputs are a series of class scores
    outputs = net(images)

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # count up total number of correct labels
    # for which the predicted and true labels are equal
    total += labels.size(0)
    correct += (predicted == labels).sum()

# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total

# print it out!
print('Accuracy before training: ', accuracy)
Train
def train(n_epochs):

    loss_over_time = [] # to track the loss as the network trains

    for epoch in range(n_epochs):  # loop over the dataset multiple times

        running_loss = 0.0

        for batch_i, data in enumerate(train_loader):
            # get the input images and their corresponding labels
            inputs, labels = data

            # zero the parameter (weight) gradients
            optimizer.zero_grad()

            # forward pass to get outputs
            outputs = net(inputs)

            # calculate the loss
            loss = criterion(outputs, labels)

            # backward pass to calculate the parameter gradients
            loss.backward()

            # update the parameters
            optimizer.step()

            # print loss statistics
            # to convert loss into a scalar and add it to running_loss, we use .item()
            running_loss += loss.item()

            if batch_i % 1000 == 999:    # print every 1000 batches
                avg_loss = running_loss/1000
                # record and print the avg loss over the 1000 batches
                loss_over_time.append(avg_loss)
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
                running_loss = 0.0

    print('Finished Training')
    return loss_over_time
# define the number of epochs to train for
n_epochs = 30 # start small to see if your model works, initially

# call train
training_loss = train(n_epochs)
Visualizing the loss
# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()
Test
# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set the module to evaluation mode
net.eval()

for batch_i, data in enumerate(test_loader):

    # get the input images and their corresponding labels
    inputs, labels = data

    # forward pass to get outputs
    outputs = net(inputs)

    # calculate the loss
    loss = criterion(outputs, labels)

    # update average test loss
    test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))

    # get the predicted class from the maximum value in the output-list of class scores
    _, predicted = torch.max(outputs.data, 1)

    # compare predictions to true label
    # this creates a `correct` Tensor that holds the number of correctly classified images in a batch
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))

    # calculate test accuracy for *each* object class
    # we get the scalar value of correct items for a class, by calling `correct[i].item()`
    for i in range(batch_size):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_ex.pt'

# after training, save your model parameters in the dir 'saved_models'
torch.save(net.state_dict(), model_dir+model_name)