cs231n assignment2 ConvolutionalNetworks

Convolution

Naive forward pass

Implement the convolution operation step by step, following the convolution procedure, in the most straightforward way possible.

N, C, H, W = x.shape
F, _, HH, WW = w.shape
stride, pad = conv_param['stride'], conv_param['pad']
# integer division by the stride
H_out = 1 + (H + 2 * pad - HH) // stride
W_out = 1 + (W + 2 * pad - WW) // stride
# pre-allocate the output tensor
out = np.zeros((N, F, H_out, W_out))

# Only the height and width dimensions are padded, so only the third and fourth axes get padding.
# Pad with a constant value of 0 (constant_values=0).
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)

for i in range(N):  # for the i-th image
	for f in range(F):  # for the f-th filter
		for j in range(H_out):
			for k in range(W_out):
				out[i, f, j, k] = np.sum(x_pad[i, :, j * stride: HH + j * stride, k * stride: WW + k * stride] * w[f]) + b[f]
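As a quick sanity check, the output spatial size should follow H_out = 1 + (H + 2*pad - HH) // stride. A minimal sketch, assuming the loop above is wrapped in the assignment's conv_forward_naive(x, w, b, conv_param):

import numpy as np

x = np.random.randn(2, 3, 8, 8)      # N=2 images, C=3 channels, 8x8 spatial size
w = np.random.randn(4, 3, 3, 3)      # F=4 filters of size 3x3
b = np.zeros(4)
conv_param = {'stride': 2, 'pad': 1}

out, _ = conv_forward_naive(x, w, b, conv_param)
print(out.shape)  # (2, 4, 4, 4), since 1 + (8 + 2*1 - 3) // 2 = 4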

Naive backward pass

x, w, b, conv_param = cache
N, C, H, W = x.shape
F, _, HH, WW = w.shape
stride, pad = conv_param['stride'], conv_param['pad']
H_out = 1 + (H + 2 * pad - HH) // stride
W_out = 1 + (W + 2 * pad - WW) // stride

# padding
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
# pre-allocate the gradients
dx = np.zeros_like(x)
dx_pad = np.zeros_like(x_pad)
dw = np.zeros_like(w)
db = np.zeros_like(b)

for i in range(N):  # for the i-th image
	for f in range(F):  # for the f-th filter
		for j in range(H_out):
			for k in range(W_out):
				window = x_pad[i, :, j * stride: HH + j * stride, k * stride: WW + k * stride]
				db[f] += dout[i, f, j, k]
				dw[f] += window * dout[i, f, j, k]
				dx_pad[i, :, j * stride: HH + j * stride, k * stride: WW + k * stride] += w[f] * dout[i, f, j, k]

dx = dx_pad[:, :, pad: -pad, pad: -pad]
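The backward pass can be verified with a numeric gradient check. A minimal sketch, assuming the two snippets above are the assignment's conv_forward_naive / conv_backward_naive and using the eval_numerical_gradient_array helper shipped with the assignment:

import numpy as np
from cs231n.gradient_check import eval_numerical_gradient_array

x = np.random.randn(4, 3, 5, 5)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2)
dout = np.random.randn(4, 2, 5, 5)
conv_param = {'stride': 1, 'pad': 1}

dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)
# relative error should be tiny (around 1e-8)
print(np.max(np.abs(dx - dx_num) / (np.abs(dx) + np.abs(dx_num) + 1e-12)))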

Max-Pooling

Naive forward

N, C, H, W = x.shape
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
H_out = 1 + (H - pool_height) // stride
W_out = 1 + (W - pool_width) // stride

out = np.zeros((N, C, H_out, W_out))
for j in range(H_out):
	for k in range(W_out):
		out[:, :, j, k] = np.max(x[:, :, j * stride: pool_height + j * stride, k * stride: pool_width + k * stride], axis=(2, 3))

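Again a quick shape check, assuming the loop above is wrapped in the assignment's max_pool_forward_naive(x, pool_param):

import numpy as np

x = np.random.randn(2, 3, 8, 8)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, _ = max_pool_forward_naive(x, pool_param)
print(out.shape)  # (2, 3, 4, 4), since 1 + (8 - 2) // 2 = 4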

Naive backward

x, pool_param = cache
N, C, H, W = x.shape
pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
H_out = 1 + (H - pool_height) // stride
W_out = 1 + (W - pool_width) // stride

dx = np.zeros_like(x)

for i in range(H_out):
	for j in range(W_out):
		x_masked = x[:, :, i * stride: i * stride + pool_height, j * stride: j * stride + pool_width]
		max_x_masked = np.max(x_masked, axis=(2, 3))[:, :, None, None]
		# route the upstream gradient only to the positions that held the max
		dx[:, :, i * stride: i * stride + pool_height, j * stride: j * stride + pool_width] += (x_masked == max_x_masked) * dout[:, :, i, j][:, :, None, None]
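One subtlety of this mask trick: if several entries in a window are tied for the maximum, (x_masked == max_x_masked) marks all of them, so the upstream gradient is routed to every tied position instead of a single winner. A tiny plain-NumPy illustration (not assignment code):

import numpy as np

window = np.array([[1.0, 3.0],
                   [3.0, 0.0]])       # two entries tie for the max
mask = (window == window.max())
print(mask)        # both 3.0 positions are True
print(mask * 5.0)  # an upstream gradient of 5 flows to both positions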

Three-layer ConvNet

cnn.py
Implement the three-layer network directly with the layer helpers that are already packaged (conv_relu_pool_forward, affine_relu_forward, and so on).

First compute the scores:

out_conv, cache_conv = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
out_fc1, cache_fc1 = affine_relu_forward(out_conv, W2, b2)
scores, cache_fc2 = affine_forward(out_fc1, W3, b3)

Then compute the loss and the gradients; note that the regularization term must be added:

loss, dout = softmax_loss(scores, y)
loss += 0.5 * self.reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2) + np.sum(W3 ** 2))


dx3, dW3, db3 = affine_backward(dout, cache_fc2)
dx2, dW2, db2 = affine_relu_backward(dx3, cache_fc1)
dx1, dW1, db1 = conv_relu_pool_backward(dx2, cache_conv)
grads['W3'] = dW3 + self.reg * W3
grads['b3'] = db3
grads['W2'] = dW2 + self.reg * W2
grads['b2'] = db2
grads['W1'] = dW1 + self.reg * W1
grads['b1'] = db1
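A minimal usage sketch, assuming the constructor and loss() interface of ThreeLayerConvNet in cnn.py (argument names taken from the assignment skeleton):

import numpy as np
from cs231n.classifiers.cnn import ThreeLayerConvNet

model = ThreeLayerConvNet(num_filters=8, filter_size=3, hidden_dim=50, reg=0.1)
X = np.random.randn(10, 3, 32, 32)   # a small random batch
y = np.random.randint(10, size=10)

loss, grads = model.loss(X, y)
print(loss)                          # with reg > 0 this is slightly above the reg=0 loss
print(sorted(grads.keys()))          # ['W1', 'W2', 'W3', 'b1', 'b2', 'b3']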

Spatial Batch Normalization

Batchnorm is very useful for fully connected networks: it acts as a form of regularization and helps prevent overfitting. For convolutional networks the earlier batchnorm needs a small modification, mainly at the level of data dimensions. The previous batchnorm operated on 2-D data, while the data in a convolutional network is 4-D, so transpose and reshape are used to convert between the two layouts.

Forward pass

spatial_batchnorm_forward() in layers.py:

N, C, H, W = x.shape
x_new = x.transpose(0, 2, 3, 1).reshape(N * H * W, C)
out, cache = batchnorm_forward(x_new, gamma, beta, bn_param)
out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
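A quick way to check the transpose/reshape logic (a sketch assuming the code above is spatial_batchnorm_forward in layers.py): in train mode with gamma=1 and beta=0, the per-channel mean of the output should be close to 0 and the per-channel std close to 1.

import numpy as np

N, C, H, W = 4, 3, 6, 6
x = 10 + 4 * np.random.randn(N, C, H, W)
gamma, beta = np.ones(C), np.zeros(C)

out, _ = spatial_batchnorm_forward(x, gamma, beta, {'mode': 'train'})
print(out.mean(axis=(0, 2, 3)))  # roughly [0, 0, 0]
print(out.std(axis=(0, 2, 3)))   # roughly [1, 1, 1]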

Backward pass

spatial_batchnorm_backward() in layers.py:

N, C, H, W = dout.shape
dout_new = dout.transpose(0, 2, 3, 1).reshape(N * H * W, C)
dx, dgamma, dbeta = batchnorm_backward_alt(dout_new, cache)
dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)

Group Normalization

Group Normalization was proposed in a paper published at ECCV in 2018. It is a compromise between Layer Normalization and Instance Normalization: it removes the dependence on the batch size while still keeping a good convergence speed.
The core idea is to split the feature channels into groups and then normalize over the [N, G, C//G, H, W] layout.

forward

N, C, H, W = x.shape
# split the channels into G groups and reshape accordingly
x_group = x.reshape((N, G, C // G, H, W))
mean = np.mean(x_group, axis=(2, 3, 4), keepdims=True)
var = np.var(x_group, axis=(2, 3, 4), keepdims=True)
x_norm = (x_group - mean) / np.sqrt(var + eps)  # normalize
x_norm = x_norm.reshape((N, C, H, W))  # restore the original shape
out = x_norm * gamma + beta
cache = (x, gamma, beta, G, eps, mean, var, x_norm)
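A sanity check for the grouping (a sketch assuming the code above is spatial_groupnorm_forward(x, gamma, beta, G, gn_param) in layers.py, with gamma and beta of shape (1, C, 1, 1)): with gamma=1 and beta=0, every (sample, group) slice of the output should have mean close to 0 and variance close to 1; G=1 recovers layer norm and G=C recovers instance norm.

import numpy as np

N, C, H, W, G = 2, 6, 4, 4, 3
x = 5 + 2 * np.random.randn(N, C, H, W)
gamma = np.ones((1, C, 1, 1))
beta = np.zeros((1, C, 1, 1))

out, _ = spatial_groupnorm_forward(x, gamma, beta, G, {'eps': 1e-5})
grouped = out.reshape(N, G, C // G, H, W)
print(grouped.mean(axis=(2, 3, 4)))  # close to 0 for every (sample, group)
print(grouped.var(axis=(2, 3, 4)))   # close to 1 for every (sample, group)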

backward

The backward-pass code follows another author's blog post; see [1] for the link.

N, C, H, W = dout.shape
x, gamma, beta, G, eps, mean, var, x_norm = cache

dbeta = np.sum(dout, axis=(0, 2, 3), keepdims=True)
dgamma = np.sum(dout * x_norm, axis=(0, 2, 3), keepdims=True)

dx_norm = dout * gamma
dx_groupnorm = dx_norm.reshape((N, G, C // G, H, W))

x_group = x.reshape((N, G, C // G, H, W))
dvar = np.sum(dx_groupnorm * -1.0 / 2 * (x_group - mean) / (var + eps) ** (3.0 / 2), axis=(2, 3, 4), keepdims=True)

N_GROUP = C // G * H * W
dmean1 = np.sum(dx_groupnorm * -1.0 / np.sqrt(var + eps), axis=(2, 3, 4), keepdims=True)
dmean2_var = dvar * -2.0 / N_GROUP * np.sum(x_group - mean, axis=(2, 3, 4), keepdims=True)
dmean = dmean1 + dmean2_var

dx_group1 = dx_groupnorm * 1.0 / np.sqrt(var + eps)
dx_group2_mean = dmean * 1.0 / N_GROUP
dx_group3_var = dvar * 2.0 / N_GROUP * (x_group - mean)
dx_group = dx_group1 + dx_group2_mean + dx_group3_var

dx = dx_group.reshape((N, C, H, W))

References
[1] https://blog.csdn.net/weixin_42880443/article/details/81589745
[2] https://blog.csdn.net/u013832707/article/details/83059540
