CNN Weights - Learnable Parameters in PyTorch Neural Networks

Our Neural Network

import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, t):
        # implement the forward pass
        return t

The hyperparameters we’ve used up to this point were the parameters we used to construct our network’s architecture through the layers we defined and assigned as class attributes.
Hyperparameter values are chosen manually and somewhat arbitrarily.

Learnable Parameters

Learnable parameters are parameters whose values are learned during the training process.
With learnable parameters, we typically start out with a set of arbitrary values, and these values then get updated in an iterative fashion as the network learns.
Appropriate values are values that minimize the loss function.
Well, the learnable parameters are the weights inside our network, and they live inside each layer.

Getting an Instance of the Network

network = Network()  

When this code executes, the code inside the __init__ class constructor runs, assigning our layers as attributes before the object instance is returned.

The name init is short for initialize. In an object’s case, the attributes are initialized with values, and these values can indeed be other objects. In this way, objects can be nested inside other objects.

This is the case with our network class whose class attributes are initialized with instances of PyTorch layer classes. After the object is initialized, we can then access our object using the network variable.

> print(network)
Network(
    (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
    (fc1): Linear(in_features=192, out_features=120, bias=True)
    (fc2): Linear(in_features=120, out_features=60, bias=True)
    (out): Linear(in_features=60, out_features=10, bias=True)
)

What’s in the string representation?

Convolutional Layers

For the convolutional layers, the kernel_size argument is a Python tuple (5,5) even though we only passed the number 5 in the constructor.
This is because our filters actually have a height and width, and when we pass a single number, the code inside the layer’s constructor assumes that we want a square filter.
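A quick check makes this concrete (a small sketch, assuming PyTorch is installed): passing a single int and passing an explicit tuple produce the same square kernel.

```python
import torch.nn as nn

# Passing a single int produces a square kernel; a tuple is equivalent.
a = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
b = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 5))

print(a.kernel_size)  # (5, 5)
print(b.kernel_size)  # (5, 5)
```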

self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

The stride is an additional parameter that we could have set, but we left it out. When the stride is not specified in the layer constructor, the layer automatically sets it to (1, 1).

The stride tells the conv layer how far the filter should slide after each operation in the overall convolution. This tuple says to slide by one unit when moving to the right and also by one unit when moving down.
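We can see the effect of the stride on the output size (a small sketch with an assumed 28 x 28 input, e.g. a Fashion-MNIST image):

```python
import torch
import torch.nn as nn

# Default stride is (1, 1); a larger stride shrinks the output spatial size.
conv_s1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)            # stride=(1, 1)
conv_s2 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=2)  # stride=(2, 2)

t = torch.randn(1, 1, 28, 28)
print(conv_s1(t).shape)  # torch.Size([1, 6, 24, 24])
print(conv_s2(t).shape)  # torch.Size([1, 6, 12, 12])
```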

Linear Layers

For the linear layers, we have an additional parameter called bias which has a default value of True. It is possible to turn the bias off by setting it to False.
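When the bias is turned off, the layer's bias attribute is simply None (a small sketch):

```python
import torch.nn as nn

with_bias = nn.Linear(in_features=4, out_features=3)              # bias=True by default
no_bias = nn.Linear(in_features=4, out_features=3, bias=False)

print(with_bias.bias.shape)  # torch.Size([3])
print(no_bias.bias)          # None
```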

self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10)      

Accessing the Network’s Layers

In Python and many other programming languages, we access attributes and methods of objects using dot notation.

> network.conv1
Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

> network.conv2
Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))

> network.fc1
Linear(in_features=192, out_features=120, bias=True)

> network.fc2                                    
Linear(in_features=120, out_features=60, bias=True)

> network.out
Linear(in_features=60, out_features=10, bias=True)

Accessing the Layer Weights

> network.conv1.weight
Parameter containing:
tensor([[[[ 0.0692,  0.1029, -0.1793,  0.0495,  0.0619],
            [ 0.1860,  0.0503, -0.1270, -0.1240, -0.0872],
            [-0.1924, -0.0684, -0.0028,  0.1031, -0.1053],
            [-0.0607,  0.1332,  0.0191,  0.1069, -0.0977],
            [ 0.0095, -0.1570,  0.1730,  0.0674, -0.1589]]],


        [[[-0.1392,  0.1141, -0.0658,  0.1015,  0.0060],
            [-0.0519,  0.0341,  0.1161,  0.1492, -0.0370],
            [ 0.1077,  0.1146,  0.0707,  0.0927,  0.0192],
            [-0.0656,  0.0929, -0.1735,  0.1019, -0.0546],
            [ 0.0647, -0.0521, -0.0687,  0.1053, -0.0613]]],


        [[[-0.1066, -0.0885,  0.1483, -0.0563,  0.0517],
            [ 0.0266,  0.0752, -0.1901, -0.0931, -0.0657],
            [ 0.0502, -0.0652,  0.0523, -0.0789, -0.0471],
            [-0.0800,  0.1297, -0.0205,  0.0450, -0.1029],
            [-0.1542,  0.1634, -0.0448,  0.0998, -0.1385]]],


        [[[-0.0943,  0.0256,  0.1632, -0.0361, -0.0557],
            [ 0.1083, -0.1647,  0.0846, -0.0163,  0.0068],
            [-0.1241,  0.1761,  0.1914,  0.1492,  0.1270],
            [ 0.1583,  0.0905,  0.1406,  0.1439,  0.1804],
            [-0.1651,  0.1374,  0.0018,  0.0846, -0.1203]]],


        [[[ 0.1786, -0.0800, -0.0995,  0.1690, -0.0529],
            [ 0.0685,  0.1399,  0.0270,  0.1684,  0.1544],
            [ 0.1581, -0.0099, -0.0796,  0.0823, -0.1598],
            [ 0.1534, -0.1373, -0.0740, -0.0897,  0.1325],
            [ 0.1487, -0.0583, -0.0900,  0.1606,  0.0140]]],


        [[[ 0.0919,  0.0575,  0.0830, -0.1042, -0.1347],
            [-0.1615,  0.0451,  0.1563, -0.0577, -0.1096],
            [-0.0667, -0.1979,  0.0458,  0.1971, -0.1380],
            [-0.1279,  0.1753, -0.1063,  0.1230, -0.0475],
            [-0.0608, -0.0046, -0.0043, -0.1543,  0.1919]]]],
       requires_grad=True)

The output is a tensor. One thing to notice about the weight tensor output is that it says Parameter containing at the top. This is because this particular tensor is special: its values, or scalar components, are learnable parameters of our network.

This means that the values inside this tensor, the ones we see above, are actually learned as the network is trained. As we train, these weight values are updated in such a way that the loss function is minimized.

PyTorch Parameter Class

To keep track of all the weight tensors inside the network, PyTorch has a special class called Parameter.
The Parameter class extends the tensor class, so the weight tensor inside every layer is an instance of this Parameter class. This is why we see the Parameter containing text at the top of the string representation output.
PyTorch’s nn.Module class is basically looking for any attributes whose values are instances of the Parameter class, and when it finds one, it keeps track of it.
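We can see this tracking behavior directly (a small sketch with an assumed toy module named Tiny): plain tensor attributes are ignored, while nn.Parameter attributes are registered automatically.

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 2)
        # A plain tensor attribute is NOT tracked by the module...
        self.plain = torch.zeros(2)
        # ...but an nn.Parameter attribute is registered automatically.
        self.tracked = nn.Parameter(torch.zeros(2))

tiny = Tiny()
print(isinstance(tiny.fc.weight, nn.Parameter))   # True

names = [name for name, _ in tiny.named_parameters()]
print('tracked' in names, 'plain' in names)       # True False
```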

Weight Tensor Shape

For the convolutional layers, the weight values live inside the filters, and in code, the filters are actually the weight tensors themselves.

The convolution operation inside a layer is an operation between the input channels to the layer and the filter inside the layer.

Remember, the shape of a tensor really encodes all the information we need to know about the tensor.

For the first conv layer, we have 1 color channel that should be convolved by 6 filters of size 5x5 to produce 6 output channels. This is how we interpret the values inside our layer constructor.

> network.conv1
Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

Inside our layer, though, we don’t explicitly have a separate weight tensor for each of the 6 filters. We actually represent all 6 filters using a single weight tensor whose shape reflects, or accounts for, the 6 filters.

The shape of the weight tensor for the first convolutional layer shows us that we have a rank-4 weight tensor. The first axis has a length of 6, and this accounts for the 6 filters.

> network.conv1.weight.shape
torch.Size([6, 1, 5, 5])

The second axis has a length of 1 which accounts for the single input channel, and the last two axes account for the height and width of the filter.

The way to think about this is as if we are packaging all of our filters into a single tensor.
Now, the second conv layer has 12 filters, and instead of convolving a single input channel, there are 6 input channels coming from the previous layer.

> network.conv2.weight.shape
torch.Size([12, 6, 5, 5])

Think of this value of 6 here as giving each of the filters some depth. Instead of having a filter that convolves all of the channels iteratively, our filter has a depth that matches the number of channels.

Our filters are represented using a single tensor, and each filter inside the tensor has a depth that accounts for the input channels being convolved.

  1. All filters are represented using a single tensor.
  2. Filters have depth that accounts for the input channels.

Our tensors are rank-4 tensors.
The first axis represents the number of filters.
The second axis represents the depth of each filter which corresponds to the number of input channels being convolved.
The last two axes represent the height and width of each filter. We can pull out any single filter by indexing into the weight tensor’s first axis.

(Number of filters, Depth, Height, Width)
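Indexing the first axis gives us a single filter whose shape is (Depth, Height, Width), a small sketch using the same two conv layers:

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

print(conv1.weight.shape)     # torch.Size([6, 1, 5, 5])
print(conv2.weight.shape)     # torch.Size([12, 6, 5, 5])

# Index the first axis to pull out a single filter.
print(conv1.weight[0].shape)  # torch.Size([1, 5, 5])
print(conv2.weight[0].shape)  # torch.Size([6, 5, 5])
```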

Weight Matrix

With linear layers or fully connected layers, we have flattened rank-1 tensors as input and as output. The way we transform the in_features to the out_features in a linear layer is by using a rank-2 tensor that is commonly called a weight matrix.

This is due to the fact that the weight tensor is of rank-2 with height and width axes.

> network.fc1.weight.shape
torch.Size([120, 192])

> network.fc2.weight.shape
torch.Size([60, 120])

> network.out.weight.shape
torch.Size([10, 60])

Here we can see that each of our linear layers has a rank-2 weight tensor. The pattern we can see is that the height of the weight tensor has the length of the desired output features, and its width has the length of the input features.

Matrix Multiplication

For each row-column combination in the output, the value is obtained by taking the dot product of the corresponding row of the first matrix with the corresponding column of the second matrix.
The dot product means that we sum the products of corresponding components.
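We can verify one output element by hand (a small sketch using the same matrix that appears later in the Transform Using a Matrix section): the first element of the result is the dot product of the matrix's first row with the input vector.

```python
import torch

A = torch.tensor([
    [1., 2., 3., 4.],
    [2., 3., 4., 5.],
    [3., 4., 5., 6.]
])
x = torch.tensor([1., 2., 3., 4.])

# First output element = dot product of A's first row with x:
# 1*1 + 2*2 + 3*3 + 4*4 = 30
print(torch.dot(A[0], x))  # tensor(30.)
print(A.matmul(x))         # tensor([30., 40., 50.])
```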

Linear Function Represented Using a Matrix

Specifically, the weight matrix is a linear function, also called a linear map, that maps a vector space of 4 dimensions to a vector space of 3 dimensions.

When we change the weight values inside the matrix, we are actually changing this function, and this is exactly what we want to do as we search for the function that our network is ultimately approximating.

Using PyTorch for Matrix Multiplication

Here, we have the in_features and the weight_matrix as tensors (they are defined below in the Transform Using a Matrix section), and we’re using the tensor method called matmul() to perform the operation. The name matmul(), as we now know, is short for matrix multiplication.

> weight_matrix.matmul(in_features)
tensor([30., 40., 50.])

Accessing the Network’s Parameters

for param in network.parameters():
    print(param.shape)

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([12, 6, 5, 5])
torch.Size([12])
torch.Size([120, 192])
torch.Size([120])
torch.Size([60, 120])
torch.Size([60])
torch.Size([10, 60])
torch.Size([10])

for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

conv1.weight 		 torch.Size([6, 1, 5, 5])
conv1.bias 		 torch.Size([6])
conv2.weight 		 torch.Size([12, 6, 5, 5])
conv2.bias 		 torch.Size([12])
fc1.weight 		 torch.Size([120, 192])
fc1.bias 		 torch.Size([120])
fc2.weight 		 torch.Size([60, 120])
fc2.bias 		 torch.Size([60])
out.weight 		 torch.Size([10, 60])
out.bias 		 torch.Size([10])
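Since every parameter tensor is reachable through parameters(), we can also count the network's total number of learnable parameters (a small sketch that rebuilds the same Network class and sums numel() over its parameters):

```python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

network = Network()

# numel() gives the element count of each parameter tensor (weights and biases).
total = sum(p.numel() for p in network.parameters())
print(total)  # 32998
```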

How Linear Layers Work

When the input features are received by a linear layer, they are received in the form of a flattened 1-dimensional tensor and are then multiplied by the weight matrix. This matrix multiplication produces the output features.

Transform Using a Matrix

in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1,2,3,4],
    [2,3,4,5],
    [3,4,5,6]
], dtype=torch.float32)

> weight_matrix.matmul(in_features)
tensor([30., 40., 50.])

In general, the weight matrix defines a linear function that maps a 1-dimensional tensor with four elements to a 1-dimensional tensor with three elements. We can think of this function as a mapping from 4-dimensional Euclidean space to 3-dimensional Euclidean space.

This is how linear layers work as well. They map an in_feature space to an out_feature space using a weight matrix.

Transform Using a PyTorch Linear Layer

Here, we define a linear layer that accepts 4 in_features and transforms them into 3 out_features, so we go from 4-dimensional space to 3-dimensional space.

fc = nn.Linear(in_features=4, out_features=3, bias=False)

The weight matrix lives inside the PyTorch Linear class and is created by PyTorch. The Linear class uses the numbers 4 and 3 that are passed to the constructor to create a 3 x 4 weight matrix.
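We can replace the layer's randomly initialized weight matrix with our own values and confirm that the layer reproduces the matmul() result from before (a small sketch; wrapping the tensor in nn.Parameter keeps it registered as a learnable parameter):

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3, bias=False)

# Overwrite PyTorch's random 3 x 4 weight matrix with our own values.
weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)
fc.weight = nn.Parameter(weight_matrix)

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
print(fc(in_features))  # tensor([30., 40., 50.], ...)
```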

Mathematical Notation of the Linear Transformation

y = Ax + b

where A is the weight matrix, x is the input vector, b is the bias vector, and y is the output.
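The full transformation with a bias can be checked the same way (a small sketch with assumed 2-dimensional values for A, b, and x):

```python
import torch
import torch.nn as nn

# y = Ax + b: A is fc.weight, b is fc.bias.
A = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([1., 1.])
x = torch.tensor([1., 2.])

fc = nn.Linear(in_features=2, out_features=2)
fc.weight = nn.Parameter(A)
fc.bias = nn.Parameter(b)

print(fc(x))            # tensor([ 6., 12.], ...)
print(A.matmul(x) + b)  # tensor([ 6., 12.])
```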
