Let's start with a simple example that shows the difference between PyTorch and TensorFlow.
import torch
import tensorflow as tf

if __name__ == "__main__":
    A = torch.Tensor([0])
    B = torch.Tensor([10])
    while (A < B)[0]:
        A += 2
        B += 1
    print(A, B)

    C = tf.constant([0])
    D = tf.constant([10])
    while (C < D)[0]:
        C = tf.add(C, 2)
        D = tf.add(D, 1)
    print(C, D)

Output:
tensor([20.]) tensor([20.])
tf.Tensor([20], shape=(1,), dtype=int32) tf.Tensor([20], shape=(1,), dtype=int32)
Here we can see that PyTorch is more concise: it needs fewer API calls and stays closer to plain Python programming.
Tensors
import torch

if __name__ == "__main__":
    a = torch.Tensor([[1, 2], [3, 4]])
    print(a)
    print(a.shape)
    print(a.type())

    b = torch.Tensor(2, 3)
    print(b)
    print(b.type())

    c = torch.ones(2, 2)
    print(c)
    print(c.type())

    d = torch.zeros(2, 2)
    print(d)
    print(d.type())

    # Define an identity (diagonal) matrix
    e = torch.eye(2, 2)
    print(e)
    print(e.type())

    # An all-zero matrix with the same shape as b
    f = torch.zeros_like(b)
    print(f)

    # An all-one matrix with the same shape as b
    g = torch.ones_like(b)
    print(g)

    # A sequence
    h = torch.arange(0, 11, 1)
    print(h)
    print(h.type())

    # 4 evenly spaced values between 2 and 10
    i = torch.linspace(2, 10, 4)
    print(i)

Output:
tensor([[1., 2.],
[3., 4.]])
torch.Size([2, 2])
torch.FloatTensor
tensor([[7.0976e+22, 4.1828e+09, 4.2320e+21],
[1.1818e+22, 7.0976e+22, 1.8515e+28]])
torch.FloatTensor
tensor([[1., 1.],
[1., 1.]])
torch.FloatTensor
tensor([[0., 0.],
[0., 0.]])
torch.FloatTensor
tensor([[1., 0.],
[0., 1.]])
torch.FloatTensor
tensor([[0., 0., 0.],
[0., 0., 0.]])
tensor([[1., 1., 1.],
[1., 1., 1.]])
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
torch.LongTensor
tensor([ 2.0000, 4.6667, 7.3333, 10.0000])
From these results we can see that PyTorch lets us define a tensor either from concrete values or directly by shape; defining by shape yields uninitialized (effectively random) values. Then there are the all-zeros, all-ones, and identity matrices, the zeros_like/ones_like variants with a matching shape, and the arange/linspace sequences, all of which TensorFlow also provides. But some parts differ from TensorFlow:
# A matrix of random values in [0, 1)
j = torch.rand(2, 2)
print(j)

# Sample from a normal distribution, given a mean and per-element standard deviations
k = torch.normal(mean=0.0, std=torch.rand(5))
print(k)

# Sample from a uniform distribution
l = torch.Tensor(2, 2).uniform_(-1, 1)
print(l)

# A random permutation
m = torch.randperm(10)
print(m)
print(m.type())

Output:
tensor([[0.6257, 0.5132],
[0.5961, 0.8336]])
tensor([ 0.4738, 0.3879, 0.0394, -0.3446, 0.4863])
tensor([[ 0.6498, -0.8387],
[ 0.3767, -0.9012]])
tensor([9, 6, 1, 3, 0, 2, 7, 5, 8, 4])
torch.LongTensor
Note that both torch.arange(0, 11, 1) and torch.randperm(10) return results of type torch.LongTensor, while everything else above is torch.FloatTensor.
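A quick way to check this, and to convert when a float tensor is needed (a minimal sketch using only the calls shown above):

```python
import torch

h = torch.arange(0, 11, 1)
m = torch.randperm(10)
print(h.type(), m.type())   # both report torch.LongTensor

# Convert explicitly when float values are required:
h_float = h.float()
print(h_float.type())       # torch.FloatTensor
```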
Tensor attributes
Every Tensor has three attributes: torch.dtype, torch.device, and torch.layout. torch.dtype is simply the element data type, which is easy to understand.
- torch.device identifies the device on which a torch.Tensor is stored after creation, e.g. CPU or GPU. GPUs are generally referred to via cuda; if the machine has multiple GPUs, they are addressed as cuda:0, cuda:1, cuda:2, and so on.
torch.layout describes the memory layout of a torch.Tensor. By default a tensor is dense, i.e. the kind we use every day, backed by one contiguous block of memory. Alternatively a tensor can be stored sparsely, in which case only the coordinates of the non-zero elements are kept. The full definition of a dense tensor looks like this:
a = torch.tensor([1, 2, 3], dtype=torch.float32, device=torch.device('cpu'))
print(a)

Output:
tensor([1., 2., 3.])
To place it on a GPU instead, define it as:
a = torch.tensor([1, 2, 3], dtype=torch.float32, device=torch.device('cuda:0'))
With a single graphics card you can select the default GPU:
a = torch.tensor([1, 2, 3], dtype=torch.float32, device=torch.device('cuda'))
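In practice the device is often chosen at runtime, falling back to the CPU when no GPU is available; a small sketch of that pattern:

```python
import torch

# Select cuda:0 when a GPU is present, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1, 2, 3], dtype=torch.float32, device=device)  # created on the device
b = torch.ones(3).to(device)                                     # moved after creation
print(a.device, b.device)
```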
Sparsity describes the number of non-zero elements in a tensor: the fewer non-zero elements (the more zeros), the sparser the data, with an all-zero tensor being the extreme case. Low rank, by contrast, describes correlation within the data itself: the rank captures the linear dependencies among a matrix's vectors; see the section on matrix rank in the Linear Algebra notes (part 2). Sparsity can make a model much simpler. Machine learning has parametric and non-parametric models; for a parametric model, if many of the parameters are zero, the model can be simplified, since zero-valued terms can be dropped. The parameter count goes down and the model becomes simpler, which is why learning sparse parameterizations is such an important property in machine learning. That is the meaning of sparsity from the machine-learning perspective. In addition, a sparse representation reduces memory overhead. Suppose a 100*100 matrix has a single non-zero value and the other 9999 entries are zero: stored as a dense tensor it needs 10000 memory cells, but stored as a sparse tensor we only need to record the coordinate of the non-zero element. So when defining a sparse tensor we supply the coordinates and the values:
# Coordinates of the non-zero elements
indices = torch.tensor([[0, 1, 1], [2, 0, 2]])
# Values of the non-zero elements
values = torch.tensor([3, 4, 5], dtype=torch.float32)
# Define a sparse tensor
a = torch.sparse_coo_tensor(indices, values, [2, 4])
print(a)
# Convert to a dense tensor
print(a.to_dense())

Output:
tensor(indices=tensor([[0, 1, 1],
[2, 0, 2]]),
values=tensor([3., 4., 5.]),
size=(2, 4), nnz=3, layout=torch.sparse_coo)
tensor([[0., 0., 3., 0.],
[4., 0., 5., 0.]])
After converting to dense form, we can see that the sparse definition placed the three non-zero values 3, 4, 5 at positions (0,2), (1,0), (1,2), with zeros everywhere else.
We can also define a diagonal matrix as a sparse tensor:
# Coordinates of the non-zero elements
indices = torch.tensor([[0, 1, 2, 3], [0, 1, 2, 3]])
# Values of the non-zero elements
values = torch.tensor([3, 4, 5, 6], dtype=torch.float32)
# Define a sparse tensor and convert it to dense form
a = torch.sparse_coo_tensor(indices, values, [4, 4]).to_dense()
print(a)

Output:
tensor([[3., 0., 0., 0.],
[0., 4., 0., 0.],
[0., 0., 5., 0.],
[0., 0., 0., 6.]])
Why do we put some data on the CPU and some on the GPU? In image processing we need to allocate the data involved sensibly: data loading and preprocessing are typically done on the CPU, while parameter computation, inference, and backpropagation usually run on the GPU. Allocating resources this way maximizes utilization and keeps training and iteration fast.
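That division of labor can be sketched as follows; the random batch and the small Linear layer are placeholders standing in for a real data pipeline and model:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Loading and preprocessing happen on the CPU: new tensors live there by default.
batch = torch.rand(4, 3)

# The model's parameters and the compute-heavy forward pass go to the GPU when available.
model = torch.nn.Linear(3, 2).to(device)
out = model(batch.to(device))   # move the batch over only for the computation
print(out.shape)
```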
Arithmetic operations on tensors
- Addition
import torch

if __name__ == "__main__":
    a = torch.tensor([1], dtype=torch.int32)
    b = torch.tensor([2], dtype=torch.int32)
    c = a + b
    print(c)
    c = torch.add(a, b)
    print(c)
    c = a.add(b)
    print(c)
    a.add_(b)
    print(a)

Output:
tensor([3], dtype=torch.int32)
tensor([3], dtype=torch.int32)
tensor([3], dtype=torch.int32)
tensor([3], dtype=torch.int32)
PyTorch offers four forms of addition. The first three behave identically; the fourth, add_, writes the result of a + b directly back into a.
A vector or matrix can also be added to a scalar, in which case the scalar is added to every component of the vector or matrix:
import torch

if __name__ == "__main__":
    a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)
    b = torch.tensor([2], dtype=torch.int32)
    c = a + b
    print(c)
    c = torch.add(a, b)
    print(c)
    c = a.add(b)
    print(c)
    a.add_(b)
    print(a)

Output:
tensor([[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
tensor([[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
tensor([[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
tensor([[3, 4, 5],
[6, 7, 8]], dtype=torch.int32)
If both operands are vectors or matrices, their shapes must be broadcast-compatible; in the simplest case the last dimensions must have equal length (or one of them must be 1), otherwise an error is raised:
import torch

if __name__ == "__main__":
    a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)
    b = torch.tensor([2, 4, 6], dtype=torch.int32)
    c = a + b
    print(c)
    c = torch.add(a, b)
    print(c)
    c = a.add(b)
    print(c)
    a.add_(b)
    print(a)

Output:
tensor([[ 3, 6, 9],
[ 6, 9, 12]], dtype=torch.int32)
tensor([[ 3, 6, 9],
[ 6, 9, 12]], dtype=torch.int32)
tensor([[ 3, 6, 9],
[ 6, 9, 12]], dtype=torch.int32)
tensor([[ 3, 6, 9],
[ 6, 9, 12]], dtype=torch.int32)
- Subtraction
import torch

if __name__ == "__main__":
    a = torch.tensor([1], dtype=torch.int32)
    b = torch.tensor([2], dtype=torch.int32)
    c = b - a
    print(c)
    c = torch.sub(b, a)
    print(c)
    c = b.sub(a)
    print(c)
    b.sub_(a)
    print(b)

Output:
tensor([1], dtype=torch.int32)
tensor([1], dtype=torch.int32)
tensor([1], dtype=torch.int32)
tensor([1], dtype=torch.int32)
The other rules for subtraction are the same as for addition.
- Multiplication
Hadamard product (element-wise multiplication):
import torch

if __name__ == "__main__":
    a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)
    b = torch.tensor([2, 4, 6], dtype=torch.int32)
    c = a * b
    print(c)
    c = torch.mul(a, b)
    print(c)
    c = a.mul(b)
    print(c)
    a.mul_(b)
    print(a)

Output:
tensor([[ 2, 8, 18],
[ 8, 20, 36]], dtype=torch.int32)
tensor([[ 2, 8, 18],
[ 8, 20, 36]], dtype=torch.int32)
tensor([[ 2, 8, 18],
[ 8, 20, 36]], dtype=torch.int32)
tensor([[ 2, 8, 18],
[ 8, 20, 36]], dtype=torch.int32)
- Division
import torch

if __name__ == "__main__":
    a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)
    b = torch.tensor([2, 4, 6], dtype=torch.int32)
    c = a / b
    print(c)
    c = torch.div(a, b)
    print(c)
    c = a.div(b)
    print(c)
    a.div_(b)
    print(a)

Output:
tensor([[0.5000, 0.5000, 0.5000],
[2.0000, 1.2500, 1.0000]])
tensor([[0.5000, 0.5000, 0.5000],
[2.0000, 1.2500, 1.0000]])
tensor([[0.5000, 0.5000, 0.5000],
[2.0000, 1.2500, 1.0000]])
tensor([[0.5000, 0.5000, 0.5000],
[2.0000, 1.2500, 1.0000]])
- Matrix operations
2-D matrix multiplication:
import torch

if __name__ == "__main__":
    a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
    b = torch.Tensor([[2, 4], [11, 13], [7, 9]])
    c = a @ b
    print(c)
    c = torch.mm(a, b)
    print(c)
    c = torch.matmul(a, b)
    print(c)
    c = a.matmul(b)
    print(c)
    c = a.mm(b)
    print(c)

Output:
tensor([[ 45., 57.],
[105., 135.]])
tensor([[ 45., 57.],
[105., 135.]])
tensor([[ 45., 57.],
[105., 135.]])
tensor([[ 45., 57.],
[105., 135.]])
tensor([[ 45., 57.],
[105., 135.]])
For the mathematical meaning of matrix multiplication, see the matrix-matrix multiplication section of the Linear Algebra notes.
For higher-dimensional tensors (dim > 2), matrix multiplication is defined only over the last two dimensions; the leading (batch) dimensions must match, much like matrix indexing, and only torch.matmul() supports this case.
import torch

if __name__ == "__main__":
    a = torch.ones(1, 2, 3, 4)
    b = torch.ones(1, 2, 4, 3)
    c = torch.matmul(a, b)
    print(c)
    c = a.matmul(b)
    print(c)

Output:
tensor([[[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]],
[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]]]])
tensor([[[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]],
[[4., 4., 4.],
[4., 4., 4.],
[4., 4., 4.]]]])
This multiplication actually acts on a's last two dimensions (3, 4) and b's (4, 3); the leading two dimensions must agree, here 1, 2 for both.
- Power
import torch

if __name__ == "__main__":
    a = torch.Tensor([1, 2, 3])
    c = torch.pow(a, 2)
    print(c)
    c = a ** 2
    print(c)
    a.pow_(2)
    print(a)

    # Compute e to the power a
    a = torch.Tensor([2])
    c = torch.exp(a)
    print(c)
    c = a.exp_()
    print(c)

Output:
tensor([1., 4., 9.])
tensor([1., 4., 9.])
tensor([1., 4., 9.])
tensor([7.3891])
tensor([7.3891])
- Square root
import torch

if __name__ == "__main__":
    a = torch.Tensor([1, 2, 3])
    c = torch.sqrt(a)
    print(c)
    c = a.sqrt()
    print(c)
    a.sqrt_()
    print(a)

Output:
tensor([1.0000, 1.4142, 1.7321])
tensor([1.0000, 1.4142, 1.7321])
tensor([1.0000, 1.4142, 1.7321])
- Logarithms
import torch

if __name__ == "__main__":
    a = torch.Tensor([1, 2, 3])
    c = torch.log2(a)
    print(c)
    c = torch.log10(a)
    print(c)
    c = torch.log(a)
    print(c)
    torch.log_(a)
    print(a)

Output:
tensor([0.0000, 1.0000, 1.5850])
tensor([0.0000, 0.3010, 0.4771])
tensor([0.0000, 0.6931, 1.0986])
tensor([0.0000, 0.6931, 1.0986])
The in-place concept and broadcasting
- In-place (also called in-situ) means an operation writes its result directly into the operand's existing storage instead of allocating a temporary tensor, i.e. the operation happens "in place".
import torch

if __name__ == "__main__":
    a = torch.Tensor([1, 2, 3])
    b = torch.Tensor([4])
    a = a + b  # not in-place: a + b allocates a new tensor and rebinds the name a
    print(a)

Output:
tensor([5., 6., 7.])
Methods with a trailing underscore, such as the add_, sub_, and mul_ seen earlier, are the in-place operations; by contrast, a = a + b above creates a new tensor.
- Broadcasting: tensor arguments can be automatically expanded to a common size. Two conditions must hold:
- every tensor has at least one dimension
- the shapes are right-aligned, i.e. compatible when compared from the trailing dimension
import torch

if __name__ == "__main__":
    a = torch.rand(2, 1, 1)
    print(a)
    b = torch.rand(3)
    print(b)
    c = a + b
    print(c)
    print(c.shape)

Output:
tensor([[[0.9496]],
[[0.5661]]])
tensor([0.0402, 0.8962, 0.5040])
tensor([[[0.9898, 1.8458, 1.4536]],
[[0.6063, 1.4623, 1.0701]]])
torch.Size([2, 1, 3])
In this example, a's last dimension has size 1 but a has higher dimensions, while b is a 3-vector with no higher dimensions. From the shape of c we can see that c's higher dimensions match a, while its last dimension matches b. In other words, when the shapes are aligned from the right, each pair of dimensions must either have equal length or contain a 1 for the operation to proceed; otherwise an error is raised. This is the broadcasting mechanism.
import torch

if __name__ == "__main__":
    a = torch.rand(2, 4, 1, 3)
    print(a)
    b = torch.rand(4, 2, 3)
    print(b)
    c = a + b
    print(c)
    print(c.shape)

Output:
tensor([[[[0.1862, 0.8673, 0.2926]],
[[0.6385, 0.6885, 0.8268]],
[[0.3837, 0.3433, 0.0975]],
[[0.4689, 0.4580, 0.4023]]],
[[[0.1647, 0.5968, 0.5279]],
[[0.8252, 0.7446, 0.1916]],
[[0.9649, 0.6015, 0.5151]],
[[0.7504, 0.8202, 0.7865]]]])
tensor([[[0.7811, 0.8357, 0.2585],
[0.8866, 0.3935, 0.4450]],
[[0.2543, 0.7985, 0.1959],
[0.5357, 0.3883, 0.4426]],
[[0.8317, 0.2597, 0.9586],
[0.2829, 0.8665, 0.2853]],
[[0.7220, 0.7107, 0.9395],
[0.8345, 0.0955, 0.3690]]])
tensor([[[[0.9673, 1.7029, 0.5510],
[1.0727, 1.2608, 0.7375]],
[[0.8928, 1.4871, 1.0227],
[1.1742, 1.0768, 1.2694]],
[[1.2154, 0.6029, 1.0561],
[0.6666, 1.2098, 0.3828]],
[[1.1909, 1.1687, 1.3418],
[1.3034, 0.5535, 0.7713]]],
[[[0.9458, 1.4325, 0.7864],
[1.0512, 0.9904, 0.9729]],
[[1.0795, 1.5432, 0.3876],
[1.3609, 1.1329, 0.6342]],
[[1.7966, 0.8611, 1.4737],
[1.2478, 1.4680, 0.8004]],
[[1.4724, 1.5309, 1.7260],
[1.5849, 0.9157, 1.1555]]]])
torch.Size([2, 4, 2, 3])
Rounding and modulo
import torch

if __name__ == "__main__":
    a = torch.rand(2, 2)
    a.mul_(10)
    print(a)
    # Round down
    print(torch.floor(a))
    # Round up
    print(torch.ceil(a))
    # Round to nearest
    print(torch.round(a))
    # Integer part (truncate toward zero)
    print(torch.trunc(a))
    # Fractional part
    print(torch.frac(a))
    # Modulo
    print(a % 2)

Output:
tensor([[5.8996, 9.2745],
[1.0162, 8.2628]])
tensor([[5., 9.],
[1., 8.]])
tensor([[ 6., 10.],
[ 2., 9.]])
tensor([[6., 9.],
[1., 8.]])
tensor([[5., 9.],
[1., 8.]])
tensor([[0.8996, 0.2745],
[0.0162, 0.2628]])
tensor([[1.8996, 1.2745],
[1.0162, 0.2628]])
Comparison operations
import torch
import numpy as np

if __name__ == "__main__":
    a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
    b = torch.Tensor([[1, 4, 9], [6, 5, 7]])
    c = torch.rand(2, 4)
    d = a
    print(a)
    print(b)
    # Element-wise equality (shapes must match); returns a boolean tensor of the same shape
    print(torch.eq(a, b))
    print(torch.eq(a, d))
    # Check whether both shape and values are identical; the two tensors may have
    # different shapes, in which case the result is simply False
    print(torch.equal(a, b))
    print(torch.equal(a, c))
    # Element-wise >= (shapes must match); returns a boolean tensor of the same shape
    print(torch.ge(a, b))
    # Element-wise > (shapes must match); returns a boolean tensor of the same shape
    print(torch.gt(a, b))
    # Element-wise <= (shapes must match); returns a boolean tensor of the same shape
    print(torch.le(a, b))
    # Element-wise < (shapes must match); returns a boolean tensor of the same shape
    print(torch.lt(a, b))
    # Element-wise != (shapes must match); returns a boolean tensor of the same shape
    print(torch.ne(a, b))

Output:
tensor([[1., 2., 3.],
[4., 5., 6.]])
tensor([[1., 4., 9.],
[6., 5., 7.]])
tensor([[ True, False, False],
[False, True, False]])
tensor([[True, True, True],
[True, True, True]])
False
False
tensor([[ True, False, False],
[False, True, False]])
tensor([[False, False, False],
[False, False, False]])
tensor([[True, True, True],
[True, True, True]])
tensor([[False, True, True],
[ True, False, True]])
tensor([[False, True, True],
[ True, False, True]])
Sorting
a = torch.Tensor([1, 4, 4, 3, 5])
# Ascending sort
print(torch.sort(a))
# Descending sort
print(torch.sort(a, descending=True))

b = torch.Tensor([[1, 4, 4, 3, 5], [2, 3, 1, 3, 5]])
print(b.shape)
# Ascending sort (along the last dimension by default)
print(torch.sort(b))
# Ascending sort along the first dimension
print(torch.sort(b, dim=0))
# Descending sort
print(torch.sort(b, descending=True))
# Descending sort along the first dimension
print(torch.sort(b, dim=0, descending=True))

Output:
torch.return_types.sort(
values=tensor([1., 3., 4., 4., 5.]),
indices=tensor([0, 3, 1, 2, 4]))
torch.return_types.sort(
values=tensor([5., 4., 4., 3., 1.]),
indices=tensor([4, 1, 2, 3, 0]))
torch.Size([2, 5])
torch.return_types.sort(
values=tensor([[1., 3., 4., 4., 5.],
[1., 2., 3., 3., 5.]]),
indices=tensor([[0, 3, 1, 2, 4],
[2, 0, 1, 3, 4]]))
torch.return_types.sort(
values=tensor([[1., 3., 1., 3., 5.],
[2., 4., 4., 3., 5.]]),
indices=tensor([[0, 1, 1, 0, 0],
[1, 0, 0, 1, 1]]))
torch.return_types.sort(
values=tensor([[5., 4., 4., 3., 1.],
[5., 3., 3., 2., 1.]]),
indices=tensor([[4, 1, 2, 3, 0],
[4, 1, 3, 0, 2]]))
torch.return_types.sort(
values=tensor([[2., 4., 4., 3., 5.],
[1., 3., 1., 3., 5.]]),
indices=tensor([[1, 0, 0, 0, 0],
[0, 1, 1, 1, 1]]))
Top K
a = torch.Tensor([[1, 4, 4, 3, 5], [2, 3, 1, 3, 6]])
# The single largest value along the first dimension
print(torch.topk(a, k=1, dim=0))
# The 2 largest values along the first dimension; a has only 2 rows
# in that dimension, so k can be at most 2
print(torch.topk(a, k=2, dim=0))
# The 2 largest values along the second dimension; here k can be up to 5,
# since that dimension holds 5 values
print(torch.topk(a, k=2, dim=1))

Output:
torch.return_types.topk(
values=tensor([[2., 4., 4., 3., 6.]]),
indices=tensor([[1, 0, 0, 0, 1]]))
torch.return_types.topk(
values=tensor([[2., 4., 4., 3., 6.],
[1., 3., 1., 3., 5.]]),
indices=tensor([[1, 0, 0, 0, 1],
[0, 1, 1, 1, 0]]))
torch.return_types.topk(
values=tensor([[5., 4.],
[6., 3.]]),
indices=tensor([[4, 1],
[4, 1]]))
The k-th smallest value
a = torch.Tensor([[1, 4, 4, 3, 5], [2, 3, 1, 3, 6], [4, 5, 6, 7, 8]])
# The 2nd smallest value along the first dimension
print(torch.kthvalue(a, k=2, dim=0))
# The 2nd smallest value along the second dimension
print(torch.kthvalue(a, k=2, dim=1))

Output:
torch.return_types.kthvalue(
values=tensor([2., 4., 4., 3., 6.]),
indices=tensor([1, 0, 0, 0, 1]))
torch.return_types.kthvalue(
values=tensor([3., 2., 5.]),
indices=tensor([3, 0, 1]))
Validity checks
import torch
import numpy as np

a = torch.rand(2, 3)
b = torch.Tensor([1, 2, np.nan])
print(a)
# Finite?
print(torch.isfinite(a))
print(torch.isfinite(a / 0))
# Infinite?
print(torch.isinf(a / 0))
# NaN?
print(torch.isnan(a))
print(torch.isnan(b))

Output:
tensor([[0.8657, 0.4002, 0.3988],
[0.2066, 0.5564, 0.3181]])
tensor([[True, True, True],
[True, True, True]])
tensor([[False, False, False],
[False, False, False]])
tensor([[True, True, True],
[True, True, True]])
tensor([[False, False, False],
[False, False, False]])
tensor([False, False, True])
Trigonometric functions
import torch

if __name__ == '__main__':
    a = torch.Tensor([0, 0, 0])
    print(torch.cos(a))

Output:
tensor([1., 1., 1.])
Statistical functions
import torch

if __name__ == '__main__':
    a = torch.rand(2, 2)
    print(a)
    # Mean
    print(torch.mean(a))
    # Mean along the first dimension
    print(torch.mean(a, dim=0))
    # Sum
    print(torch.sum(a))
    # Sum along the first dimension
    print(torch.sum(a, dim=0))
    # Product of all elements
    print(torch.prod(a))
    # Product along the first dimension
    print(torch.prod(a, dim=0))
    # Indices of the maxima along the first dimension
    print(torch.argmax(a, dim=0))
    # Indices of the minima along the first dimension
    print(torch.argmin(a, dim=0))
    # Standard deviation
    print(torch.std(a))
    # Variance
    print(torch.var(a))
    # Median
    print(torch.median(a))
    # Mode (most frequent value)
    print(torch.mode(a))

    a = torch.rand(2, 2) * 10
    print(a)
    # Histogram with 6 bins (min=max=0 means use the data's own min and max)
    print(torch.histc(a, 6, 0, 0))

    a = torch.randint(0, 10, [10])
    print(a)
    print(torch.bincount(a))

Output:
tensor([[0.3333, 0.3611],
[0.4208, 0.6395]])
tensor(0.4387)
tensor([0.3771, 0.5003])
tensor(1.7547)
tensor([0.7541, 1.0006])
tensor(0.0324)
tensor([0.1403, 0.2309])
tensor([1, 1])
tensor([0, 0])
tensor(0.1388)
tensor(0.0193)
tensor(0.3611)
torch.return_types.mode(
values=tensor([0.3333, 0.4208]),
indices=tensor([0, 0]))
tensor([[1.9862, 7.6381],
[9.2323, 7.4402]])
tensor([1., 0., 0., 0., 2., 1.])
tensor([1, 1, 4, 1, 2, 7, 9, 1, 4, 7])
tensor([0, 4, 1, 0, 2, 0, 0, 2, 0, 1])
Random sampling
import torch

if __name__ == '__main__':
    # Fix the random seed
    torch.manual_seed(1)
    # Means
    mean = torch.rand(1, 2)
    # Standard deviations
    std = torch.rand(1, 2)
    # Sample from a normal distribution
    print(torch.normal(mean, std))

Output:
tensor([[0.7825, 0.7358]])
Norms
import torch

if __name__ == '__main__':
    a = torch.rand(2, 1)
    b = torch.rand(2, 1)
    print(a)
    print(b)
    # p-norm distances for p = 1, 2, 3
    print(torch.dist(a, b, p=1))
    print(torch.dist(a, b, p=2))
    print(torch.dist(a, b, p=3))
    # 2-norm of a
    print(torch.norm(a))
    # 3-norm of a
    print(torch.norm(a, p=3))
    # Frobenius norm of a
    print(torch.norm(a, p='fro'))

Output:
tensor([[0.3291],
[0.8294]])
tensor([[0.0810],
[0.9734]])
tensor(0.3921)
tensor(0.2869)
tensor(0.2633)
tensor(0.8923)
tensor(0.8463)
tensor(0.8923)
For background on norms, see the Euclidean and Minkowski distances in the Machine Learning Algorithms notes, and L1/L2 regularization in the Machine Learning Algorithms notes (part 2).
Tensor clamping
import torch

if __name__ == '__main__':
    a = torch.rand(2, 2) * 10
    print(a)
    # Clamp: values below 2 become 2, values above 5 become 5,
    # values in [2, 5] are unchanged
    a.clamp_(2, 5)
    print(a)

Output:
tensor([[1.6498, 5.2090],
[9.7682, 2.3269]])
tensor([[2.0000, 5.0000],
[5.0000, 2.3269]])
Tensor indexing and filtering
import torch

if __name__ == '__main__':
    a = torch.rand(4, 4)
    b = torch.rand(4, 4)
    print(a)
    print(b)
    # Where a > 0.5 take the value from a, otherwise from b
    out = torch.where(a > 0.5, a, b)
    print(out)
    # Coordinates where a is greater than b
    out = torch.where(a > b)
    print(out)
    # Build a new tensor from rows 0, 3, 2 of a (first dimension)
    out = torch.index_select(a, dim=0, index=torch.tensor([0, 3, 2]))
    print(out)
    # Build a new tensor from columns 0, 3, 2 of a (second dimension)
    out = torch.index_select(a, dim=1, index=torch.tensor([0, 3, 2]))
    print(out)

    a = torch.linspace(1, 16, 16).view(4, 4)
    print(a)
    # Gather along the first dimension: each index value picks the row,
    # while its own position determines the column
    out = torch.gather(a, dim=0,
                       index=torch.tensor([[0, 1, 1, 1],
                                           [0, 1, 2, 2],
                                           [0, 1, 3, 3]]))
    print(out)
    # Gather along the second dimension
    out = torch.gather(a, dim=1,
                       index=torch.tensor([[0, 1, 1, 1],
                                           [0, 1, 2, 2],
                                           [0, 1, 3, 3]]))
    print(out)

    mask = torch.gt(a, 8)
    print(mask)
    # Select the values where mask is True
    out = torch.masked_select(a, mask)
    print(out)
    # Flatten a into a vector first, then take the values at the given indices;
    # the output is also a vector
    out = torch.take(a, index=torch.tensor([0, 15, 13, 10]))
    print(out)

    a = torch.tensor([[0, 1, 2, 0], [2, 3, 0, 1]])
    # Coordinates of the non-zero elements
    out = torch.nonzero(a)
    print(out)

Output:
tensor([[0.2301, 0.4003, 0.6914, 0.4822],
[0.7194, 0.8242, 0.1110, 0.6387],
[0.7997, 0.9174, 0.6136, 0.7631],
[0.9998, 0.5568, 0.9027, 0.7765]])
tensor([[0.4136, 0.4748, 0.0058, 0.3138],
[0.7306, 0.7052, 0.5451, 0.1708],
[0.0622, 0.9961, 0.7769, 0.2812],
[0.5140, 0.5198, 0.2314, 0.2854]])
tensor([[0.4136, 0.4748, 0.6914, 0.3138],
[0.7194, 0.8242, 0.5451, 0.6387],
[0.7997, 0.9174, 0.6136, 0.7631],
[0.9998, 0.5568, 0.9027, 0.7765]])
(tensor([0, 0, 1, 1, 2, 2, 3, 3, 3, 3]), tensor([2, 3, 1, 3, 0, 3, 0, 1, 2, 3]))
tensor([[0.2301, 0.4003, 0.6914, 0.4822],
[0.9998, 0.5568, 0.9027, 0.7765],
[0.7997, 0.9174, 0.6136, 0.7631]])
tensor([[0.2301, 0.4822, 0.6914],
[0.7194, 0.6387, 0.1110],
[0.7997, 0.7631, 0.6136],
[0.9998, 0.7765, 0.9027]])
tensor([[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]])
tensor([[ 1., 6., 7., 8.],
[ 1., 6., 11., 12.],
[ 1., 6., 15., 16.]])
tensor([[ 1., 2., 2., 2.],
[ 5., 6., 7., 7.],
[ 9., 10., 12., 12.]])
tensor([[False, False, False, False],
[False, False, False, False],
[ True, True, True, True],
[ True, True, True, True]])
tensor([ 9., 10., 11., 12., 13., 14., 15., 16.])
tensor([ 1., 16., 14., 11.])
tensor([[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 3]])
Combining and concatenating tensors
import torch

if __name__ == '__main__':
    a = torch.zeros((2, 4))
    b = torch.ones((2, 4))
    # Concatenate along the first dimension: no new dimension is added,
    # only the size of that dimension grows
    out = torch.cat((a, b), dim=0)
    print(out)

    a = torch.linspace(1, 6, 6).view(2, 3)
    b = torch.linspace(7, 12, 6).view(2, 3)
    print(a)
    print(b)
    # Stack a and b as independent elements: a new dimension is added,
    # whose size 2 corresponds to the two tensors a and b
    out = torch.stack((a, b), dim=0)
    print(out)
    print(out.shape)
    # Pair each row of a with the same-index row of b;
    # the pairing forms a new dimension
    out = torch.stack((a, b), dim=1)
    print(out)
    print(out.shape)
    # Recover the original a
    print(out[:, 0, :])
    # Recover the original b
    print(out[:, 1, :])
    # Pair each element of a with the same-position element of b;
    # the pairing forms a new (last) dimension
    out = torch.stack((a, b), dim=2)
    print(out)
    print(out.shape)
    # Recover the original a
    print(out[:, :, 0])
    # Recover the original b
    print(out[:, :, 1])

Output:
tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
tensor([[1., 2., 3.],
[4., 5., 6.]])
tensor([[ 7., 8., 9.],
[10., 11., 12.]])
tensor([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[ 7., 8., 9.],
[10., 11., 12.]]])
torch.Size([2, 2, 3])
tensor([[[ 1., 2., 3.],
[ 7., 8., 9.]],
[[ 4., 5., 6.],
[10., 11., 12.]]])
torch.Size([2, 2, 3])
tensor([[1., 2., 3.],
[4., 5., 6.]])
tensor([[ 7., 8., 9.],
[10., 11., 12.]])
tensor([[[ 1., 7.],
[ 2., 8.],
[ 3., 9.]],
[[ 4., 10.],
[ 5., 11.],
[ 6., 12.]]])
torch.Size([2, 3, 2])
tensor([[1., 2., 3.],
[4., 5., 6.]])
tensor([[ 7., 8., 9.],
[10., 11., 12.]])
Tensor slicing
import torch

if __name__ == '__main__':
    a = torch.rand(3, 4)
    print(a)
    # Chunk along the first dimension: 3 rows cannot be split evenly into 2,
    # so the first chunk gets 2 rows and the second gets 1
    out = torch.chunk(a, 2, dim=0)
    print(out)
    # Chunk along the second dimension: 4 columns split evenly into 2 chunks of 2
    out = torch.chunk(a, 2, dim=1)
    print(out)
    out = torch.split(a, 2, dim=0)
    print(out)
    out = torch.split(a, 2, dim=1)
    print(out)
    # Split into pieces of the lengths given in the list
    out = torch.split(a, [1, 1, 1], dim=0)
    print(out)

Output:
tensor([[0.5683, 0.0800, 0.2068, 0.8908],
[0.8924, 0.8733, 0.6078, 0.8697],
[0.0428, 0.0265, 0.3515, 0.3164]])
(tensor([[0.5683, 0.0800, 0.2068, 0.8908],
[0.8924, 0.8733, 0.6078, 0.8697]]), tensor([[0.0428, 0.0265, 0.3515, 0.3164]]))
(tensor([[0.5683, 0.0800],
[0.8924, 0.8733],
[0.0428, 0.0265]]), tensor([[0.2068, 0.8908],
[0.6078, 0.8697],
[0.3515, 0.3164]]))
(tensor([[0.5683, 0.0800, 0.2068, 0.8908],
[0.8924, 0.8733, 0.6078, 0.8697]]), tensor([[0.0428, 0.0265, 0.3515, 0.3164]]))
(tensor([[0.5683, 0.0800],
[0.8924, 0.8733],
[0.0428, 0.0265]]), tensor([[0.2068, 0.8908],
[0.6078, 0.8697],
[0.3515, 0.3164]]))
(tensor([[0.5683, 0.0800, 0.2068, 0.8908]]), tensor([[0.8924, 0.8733, 0.6078, 0.8697]]), tensor([[0.0428, 0.0265, 0.3515, 0.3164]]))
Tensor reshaping
import torch

if __name__ == '__main__':
    a = torch.rand(2, 3)
    print(a)
    # Flatten a, then reshape to the requested shape
    out = torch.reshape(a, (3, 2))
    print(out)
    # Transpose
    print(torch.t(out))

    a = torch.rand(1, 2, 3)
    print(a)
    # Swap two dimensions
    out = torch.transpose(a, 0, 1)
    print(out)
    print(out.shape)
    # Remove dimensions of size 1
    out = torch.squeeze(a)
    print(out)
    print(out.shape)
    # Insert a dimension of size 1 at the given position (here: the last)
    out = torch.unsqueeze(a, -1)
    print(out)
    print(out.shape)
    # Split along the second dimension
    out = torch.unbind(a, dim=1)
    print(out)
    # Flip along the second dimension
    print(torch.flip(a, dims=[1]))
    # Flip along the third dimension
    print(torch.flip(a, dims=[2]))
    # Flip along both the second and third dimensions
    print(torch.flip(a, dims=[1, 2]))
    # Rotate 90 degrees counterclockwise
    out = torch.rot90(a)
    print(out)
    print(out.shape)
    # Rotate 180 degrees counterclockwise
    out = torch.rot90(a, 2)
    print(out)
    print(out.shape)
    # Rotate 360 degrees counterclockwise
    out = torch.rot90(a, 4)
    print(out)
    print(out.shape)
    # Rotate 90 degrees clockwise
    out = torch.rot90(a, -1)
    print(out)
    print(out.shape)

Output:
tensor([[7.7492e-01, 8.1314e-01, 6.4422e-01],
[9.8577e-01, 5.3938e-01, 2.3049e-04]])
tensor([[7.7492e-01, 8.1314e-01],
[6.4422e-01, 9.8577e-01],
[5.3938e-01, 2.3049e-04]])
tensor([[7.7492e-01, 6.4422e-01, 5.3938e-01],
[8.1314e-01, 9.8577e-01, 2.3049e-04]])
tensor([[[0.4003, 0.3144, 0.7292],
[0.0459, 0.5821, 0.7332]]])
tensor([[[0.4003, 0.3144, 0.7292]],
[[0.0459, 0.5821, 0.7332]]])
torch.Size([2, 1, 3])
tensor([[0.4003, 0.3144, 0.7292],
[0.0459, 0.5821, 0.7332]])
torch.Size([2, 3])
tensor([[[[0.4003],
[0.3144],
[0.7292]],
[[0.0459],
[0.5821],
[0.7332]]]])
torch.Size([1, 2, 3, 1])
(tensor([[0.4003, 0.3144, 0.7292]]), tensor([[0.0459, 0.5821, 0.7332]]))
tensor([[[0.0459, 0.5821, 0.7332],
[0.4003, 0.3144, 0.7292]]])
tensor([[[0.7292, 0.3144, 0.4003],
[0.7332, 0.5821, 0.0459]]])
tensor([[[0.7332, 0.5821, 0.0459],
[0.7292, 0.3144, 0.4003]]])
tensor([[[0.0459, 0.5821, 0.7332]],
[[0.4003, 0.3144, 0.7292]]])
torch.Size([2, 1, 3])
tensor([[[0.0459, 0.5821, 0.7332],
[0.4003, 0.3144, 0.7292]]])
torch.Size([1, 2, 3])
tensor([[[0.4003, 0.3144, 0.7292],
[0.0459, 0.5821, 0.7332]]])
torch.Size([1, 2, 3])
tensor([[[0.4003, 0.3144, 0.7292]],
[[0.0459, 0.5821, 0.7332]]])
torch.Size([2, 1, 3])
Tensor filling
import torch

if __name__ == '__main__':
    # A 2x3 matrix filled with 10
    a = torch.full((2, 3), 10)
    print(a)

Output:
tensor([[10, 10, 10],
[10, 10, 10]])
Computing derivatives
import torch
from torch.autograd import Variable

if __name__ == '__main__':
    # Equivalent to x = Variable(torch.ones(2, 2), requires_grad=True);
    # requires_grad=True adds x to the backward graph so gradients are computed for it
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    z = y ** 2 * 3
    # Differentiate with respect to x.
    # Equivalent to torch.autograd.backward(z, grad_tensors=torch.ones(2, 2))
    z.backward(torch.ones(2, 2))
    print(x.grad)
    print(y.grad)
    print(x.grad_fn)
    print(y.grad_fn)
    print(z.grad_fn)

Output:
tensor([[18., 18.],
[18., 18.]])
None
None
<AddBackward0 object at 0x7fad8f1b0c50>
<MulBackward0 object at 0x7fad8f1b0c50>
This is the derivative of a composite function: dz/dx = 6y · dy/dx = 6(x + 2) · 1 = 6 · 3 = 18 at x = 1, and since x contains four ones, all four gradient entries are 18. For how derivatives are computed, see the section on derivatives and differentials of single-variable functions in the Calculus notes.
Variable has since been merged into Tensor; each tensor's requires_grad flag controls whether its gradient is computed. This is how layers are frozen, e.g. keeping a pretrained network's parameters fixed while training only the layers added after it, or, in a multi-task network, computing gradients for one branch while freezing the parameters of the others.
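A minimal sketch of freezing, using a toy two-layer network as a stand-in for a real pretrained model:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))

# Freeze the first layer, standing in for a pretrained backbone.
for p in net[0].parameters():
    p.requires_grad = False

net(torch.rand(2, 4)).sum().backward()

print(net[0].weight.grad)        # None: the frozen layer receives no gradient
print(net[1].weight.grad.shape)  # the trainable layer does
```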
- A few Autograd concepts
1. Leaf tensors
In a computation graph of the kind PyTorch builds, the input X is called a leaf tensor, i.e. a leaf node of the graph. By default, PyTorch only populates .grad for leaf tensors; printing the gradient of an intermediate result such as Y returns None.
2. grad vs. grad_fn
- grad: the tensor's gradient value. The gradient from the previous step must be zeroed before each backward call, otherwise gradients keep accumulating.
- grad_fn: usually None for leaf nodes; only result nodes have a meaningful grad_fn, which indicates the type of function that produced the node.
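Both points can be seen in a few lines (a small sketch):

```python
import torch

x = torch.ones(2, requires_grad=True)

(x * 2).sum().backward()
print(x.grad)            # tensor([2., 2.])

# Without zeroing, a second backward pass accumulates into x.grad.
(x * 2).sum().backward()
print(x.grad)            # tensor([4., 4.])

x.grad.zero_()           # reset, as an optimizer's zero_grad() would
(x * 2).sum().backward()
print(x.grad)            # tensor([2., 2.])

y = x * 2
print(x.grad_fn)         # None: x is a leaf
print(y.grad_fn)         # a MulBackward0 object: y is a result node
```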
3. The backward function
torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False)
- tensors: the tensor(s) to differentiate. torch.autograd.backward(z) is equivalent to z.backward(); this holds for the scalar case, while for non-scalar tensors grad_tensors must be supplied.
- grad_tensors: used when computing the gradient of a matrix. It is itself a tensor, whose shape generally must match that of the tensor being differentiated.
- retain_graph: normally PyTorch frees the computation graph automatically after one backward call, so to call backward repeatedly on the same variable this must be set to True.
- create_graph: if True, a dedicated graph of the derivative is built, which makes computing higher-order derivatives convenient.
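Both flags can be demonstrated on y = x³ (a minimal sketch):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 3

# retain_graph=True keeps the graph alive so backward can run on it again later.
y.backward(retain_graph=True)
print(x.grad)            # tensor([12.]): dy/dx = 3x^2 = 12 at x = 2

# create_graph=True records the derivative computation itself,
# so the second derivative can be taken.
x.grad.zero_()
g, = torch.autograd.grad(y, x, create_graph=True)
g.backward()
print(x.grad)            # tensor([12.]): d(3x^2)/dx = 6x = 12 at x = 2
```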
The signature of torch.autograd.grad is:

def grad(
    outputs: _TensorOrTensors,
    inputs: _TensorOrTensors,
    grad_outputs: Optional[_TensorOrTensors] = None,
    retain_graph: Optional[bool] = None,
    create_graph: bool = False,
    only_inputs: bool = True,
    allow_unused: bool = False
)
It computes and returns the sum of gradients of outputs with respect to inputs. outputs: the dependent variable, i.e. the function to differentiate. inputs: the independent variables. grad_outputs: same as grad_tensors in backward. only_inputs: compute gradients only for inputs. allow_unused (bool, optional): if False, passing inputs that were not used when computing outputs (and whose gradient is therefore always zero) is an error.
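A small sketch of using torch.autograd.grad instead of backward:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()

# grad() returns the gradients rather than storing them in .grad.
(dx,) = torch.autograd.grad(outputs=y, inputs=x)
print(dx)        # tensor([2., 4.]): dy/dx = 2x
print(x.grad)    # None: grad() did not touch x.grad
```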
4. Other functions in the torch.autograd package
- torch.autograd.enable_grad: context manager that enables gradient computation
- torch.autograd.no_grad: context manager that disables gradient computation
- torch.autograd.set_grad_enabled(mode): context manager that turns gradient computation on or off.
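These context managers are also exposed at the top level of torch; a minimal sketch of all three:

```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():                 # no graph is recorded inside this block
    y = x * 2
print(y.requires_grad)                # False

with torch.enable_grad():             # (re-)enables recording
    z = x * 2
print(z.requires_grad)                # True

with torch.set_grad_enabled(False):   # mode chosen by a boolean flag
    w = x * 2
print(w.requires_grad)                # False
```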
Custom Functions in Autograd
import torch

class line(torch.autograd.Function):

    @staticmethod
    def forward(ctx, w, x, b):
        """
        Forward pass.
        :param ctx: context object
        :param w:
        :param x:
        :param b:
        :return:
        """
        # y = wx + b
        ctx.save_for_backward(w, x, b)
        return w * x + b

    @staticmethod
    def backward(ctx, grad_out):
        """
        Backward pass.
        :param ctx: context object
        :param grad_out: gradient from the next stage
        :return:
        """
        w, x, b = ctx.saved_tensors
        # Partial derivative with respect to w
        grad_w = grad_out * x
        # Partial derivative with respect to x
        grad_x = grad_out * w
        # Partial derivative with respect to b
        grad_b = grad_out
        return grad_w, grad_x, grad_b

if __name__ == '__main__':
    w = torch.rand(2, 2, requires_grad=True)
    x = torch.rand(2, 2, requires_grad=True)
    b = torch.rand(2, 2, requires_grad=True)
    out = line.apply(w, x, b)
    out.backward(torch.ones(2, 2))
    print(w, x, b)
    print(w.grad, x.grad, b.grad)

Output:
tensor([[0.7784, 0.2882],
[0.7826, 0.8178]], requires_grad=True) tensor([[0.4062, 0.4722],
[0.7921, 0.9470]], requires_grad=True) tensor([[0.7012, 0.9489],
[0.2466, 0.1548]], requires_grad=True)
tensor([[0.4062, 0.4722],
[0.7921, 0.9470]]) tensor([[0.7784, 0.2882],
[0.7826, 0.8178]]) tensor([[1., 1.],
[1., 1.]])
This walks through taking the partial derivatives of a linear function; the results show that the partial derivative with respect to w is x, with respect to x is w, and with respect to b is 1.
- Each primitive autograd operation is really two functions that operate on Tensors:
- The forward function computes the output Tensors from the input Tensors
- The backward function receives the gradient of some scalar value with respect to the output Tensors, and computes the gradient of that same scalar with respect to the input Tensors
- Finally, the apply method runs the operation; it is defined in _FunctionBase, the parent class of Function
Non-zero padding
This is in contrast to ordinary (zero) padding.
import torch

if __name__ == '__main__':
    a = torch.arange(9, dtype=torch.float).reshape((1, 3, 3))
    print(a)
    m = torch.nn.ReflectionPad2d(1)
    # Pad the border of a with reflected (non-zero) values
    out = m(a)
    print(out)

Output:
tensor([[[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.]]])
tensor([[[4., 3., 4., 5., 4.],
[1., 0., 1., 2., 1.],
[4., 3., 4., 5., 4.],
[7., 6., 7., 8., 7.],
[4., 3., 4., 5., 4.]]])
Building a neural network
For background on neural networks, see the TensorFlow Deep Learning notes; it is not repeated here.
Boston housing price prediction
Let's look at the data first.
import numpy as np
import torch
from sklearn import datasets

if __name__ == '__main__':
    boston = datasets.load_boston()
    X = torch.from_numpy(boston.data)
    y = torch.from_numpy(boston.target)
    y = torch.unsqueeze(y, -1)
    data = torch.cat((X, y), dim=-1)
    print(data)
    print(data.shape)

Output:
tensor([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 3.9690e+02, 4.9800e+00,
2.4000e+01],
[2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 3.9690e+02, 9.1400e+00,
2.1600e+01],
[2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 3.9283e+02, 4.0300e+00,
3.4700e+01],
...,
[6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 3.9690e+02, 5.6400e+00,
2.3900e+01],
[1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 3.9345e+02, 6.4800e+00,
2.2000e+01],
[4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 3.9690e+02, 7.8800e+00,
1.1900e+01]], dtype=torch.float64)
torch.Size([506, 14])
import torch
from sklearn import datasets

if __name__ == '__main__':
    boston = datasets.load_boston()
    X = torch.from_numpy(boston.data)
    y = torch.from_numpy(boston.target)
    y = torch.unsqueeze(y, -1)
    data = torch.cat((X, y), dim=-1)
    print(data)
    print(data.shape)

    y = torch.squeeze(y)
    X_train = X[:496]
    y_train = y[:496]
    X_test = X[496:]
    y_test = y[496:]

    class Net(torch.nn.Module):

        def __init__(self, n_feature, n_output):
            super(Net, self).__init__()
            self.hidden = torch.nn.Linear(n_feature, 100)
            self.predict = torch.nn.Linear(100, n_output)

        def forward(self, x):
            out = self.hidden(x)
            out = torch.relu(out)
            out = self.predict(out)
            return out

    net = Net(13, 1)
    loss_func = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

    for i in range(10000):
        pred = net.forward(X_train.float())
        pred = torch.squeeze(pred)
        loss = loss_func(pred, y_train.float()) * 0.001
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print("item:{},loss:{}".format(i, loss))
        print(pred[:10])
        print(y_train[:10])

        pred = net.forward(X_test.float())
        pred = torch.squeeze(pred)
        loss_test = loss_func(pred, y_test.float()) * 0.001
        print("item:{},loss_test:{}".format(i, loss_test))
        print(pred[:10])
        print(y_test[:10])

Output (final training result):
item:9999,loss:0.0034487966913729906
tensor([26.7165, 22.6610, 33.0955, 34.6687, 36.8087, 29.3654, 22.9609, 20.9920,
17.1832, 20.8744], grad_fn=<SliceBackward0>)
tensor([24.0000, 21.6000, 34.7000, 33.4000, 36.2000, 28.7000, 22.9000, 27.1000,
16.5000, 18.9000], dtype=torch.float64)
item:9999,loss_test:0.007662008982151747
tensor([14.5801, 18.2911, 21.3332, 16.9826, 19.6432, 21.8298, 18.5557, 23.6807,
22.3610, 18.0118], grad_fn=<SliceBackward0>)
tensor([19.7000, 18.3000, 21.2000, 17.5000, 16.8000, 22.4000, 20.6000, 23.9000,
22.0000, 11.9000], dtype=torch.float64)
Handwritten digit recognition
import torch import torchvision.datasets as dataset import torchvision.transforms as transforms import torch.utils.data as data_utils if __name__ == '__main__': train_data = dataset.MNIST(root='mnist', train=True, transform=transforms.ToTensor(), download=True) test_data = dataset.MNIST(root='mnist', train=False, transform=transforms.ToTensor(), download=False) train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True) test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True) class CNN(torch.nn.Module): def __init__(self): super(CNN, self).__init__() self.conv = torch.nn.Sequential( torch.nn.Conv2d(1, 32, kernel_size=5, padding=2), torch.nn.BatchNorm2d(32), torch.nn.ReLU(), torch.nn.MaxPool2d(2) ) self.fc = torch.nn.Linear(14 * 14 * 32, 10) def forward(self, x): out = self.conv(x) out = out.view(out.size()[0], -1) out = self.fc(out) return out cnn = CNN() loss_func = torch.nn.CrossEntropyLoss() optimizer = torch.optim.Adam(cnn.parameters(), lr=0.01) for epoch in range(10): for i, (images, labels) in enumerate(train_loader): outputs = cnn(images) loss = loss_func(outputs, labels) optimizer.zero_grad() loss.backward() optimizer.step() print("epoch is {}, ite is {}/{}, loss is {}".format(epoch + 1, i, len(train_data) // 64, loss.item())) loss_test = 0 accuracy = 0 for i, (images, labels) in enumerate(test_loader): outputs = cnn(images) loss_test += loss_func(outputs, labels) _, pred = outputs.max(1) accuracy += (pred == labels).sum().item() accuracy = accuracy / len(test_data) loss_test = loss_test / (len(test_data) // 64) print("epoch is {}, accuracy is {}, loss test is {}".format(epoch + 1, accuracy, loss_test.item()))
Output:
epoch is 1, ite is 937/937, loss is 0.08334837108850479
epoch is 1, accuracy is 0.9814, loss test is 0.06306721270084381
epoch is 2, ite is 937/937, loss is 0.08257070928812027
epoch is 2, accuracy is 0.9824, loss test is 0.05769834667444229
epoch is 3, ite is 937/937, loss is 0.02539072372019291
epoch is 3, accuracy is 0.9823, loss test is 0.05558949336409569
epoch is 4, ite is 937/937, loss is 0.014101949520409107
epoch is 4, accuracy is 0.982, loss test is 0.05912528932094574
epoch is 5, ite is 937/937, loss is 0.0016860843170434237
epoch is 5, accuracy is 0.9835, loss test is 0.05862809345126152
epoch is 6, ite is 937/937, loss is 0.04285441339015961
epoch is 6, accuracy is 0.9817, loss test is 0.06716518104076385
epoch is 7, ite is 937/937, loss is 0.0026565147563815117
epoch is 7, accuracy is 0.9831, loss test is 0.05950026586651802
epoch is 8, ite is 937/937, loss is 0.02730828896164894
epoch is 8, accuracy is 0.9824, loss test is 0.058563172817230225
epoch is 9, ite is 937/937, loss is 0.00010762683814391494
epoch is 9, accuracy is 0.9828, loss test is 0.0673145055770874
epoch is 10, ite is 937/937, loss is 0.0021532117389142513
epoch is 10, accuracy is 0.9852, loss test is 0.0562417209148407
CIFAR-10 Image Classification
- VGGNet architecture
VGGNet is a plain, serially stacked architecture. It should not be made too deep, otherwise vanishing gradients set in.
import torch
import torch.nn.functional as F
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils

if __name__ == '__main__':
    train_data = dataset.CIFAR10(root='cifa', train=True, transform=transforms.ToTensor(), download=True)
    test_data = dataset.CIFAR10(root='cifa', train=False, transform=transforms.ToTensor(), download=False)
    train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
    test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True)

    class VGGbase(torch.nn.Module):
        def __init__(self):
            super(VGGbase, self).__init__()
            self.conv1 = torch.nn.Sequential(
                torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(64),
                torch.nn.ReLU()
            )
            self.max_pooling1 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
            self.conv2_1 = torch.nn.Sequential(
                torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(128),
                torch.nn.ReLU()
            )
            self.conv2_2 = torch.nn.Sequential(
                torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(128),
                torch.nn.ReLU()
            )
            self.max_pooling2 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
            self.conv3_1 = torch.nn.Sequential(
                torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(256),
                torch.nn.ReLU()
            )
            self.conv3_2 = torch.nn.Sequential(
                torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(256),
                torch.nn.ReLU()
            )
            self.max_pooling3 = torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=1)
            self.conv4_1 = torch.nn.Sequential(
                torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(512),
                torch.nn.ReLU()
            )
            self.conv4_2 = torch.nn.Sequential(
                torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(512),
                torch.nn.ReLU()
            )
            self.max_pooling4 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
            self.fc = torch.nn.Linear(512 * 4, 10)

        def forward(self, x):
            batchsize = x.size()[0]
            out = self.conv1(x)
            out = self.max_pooling1(out)
            out = self.conv2_1(out)
            out = self.conv2_2(out)
            out = self.max_pooling2(out)
            out = self.conv3_1(out)
            out = self.conv3_2(out)
            out = self.max_pooling3(out)
            out = self.conv4_1(out)
            out = self.conv4_2(out)
            out = self.max_pooling4(out)
            out = out.view(batchsize, -1)
            out = self.fc(out)
            out = F.log_softmax(out, dim=1)
            return out

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    epoch_num = 200
    lr = 0.01
    net = VGGbase().to(device)
    loss_func = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    # decay the learning rate every 5 epochs to 0.9 times its previous value
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)
    for epoch in range(epoch_num):
        net.train()
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _, pred = torch.max(outputs, dim=1)
            correct = pred.eq(labels.data).cpu().sum()
            print("epoch is ", epoch, "step ", i, "loss is: ", loss.item(),
                  "mini-batch correct is: ", (100.0 * correct / 64).item())
        # step the scheduler once per epoch; without this call the StepLR schedule never takes effect
        scheduler.step()
Output (partial):
epoch is 32 step 605 loss is: 0.010804953053593636 mini-batch correct is: 100.0
epoch is 32 step 606 loss is: 0.02166593447327614 mini-batch correct is: 98.4375
epoch is 32 step 607 loss is: 0.1924218237400055 mini-batch correct is: 95.3125
epoch is 32 step 608 loss is: 0.04531310871243477 mini-batch correct is: 96.875
epoch is 32 step 609 loss is: 0.03866473212838173 mini-batch correct is: 98.4375
epoch is 32 step 610 loss is: 0.0039138575084507465 mini-batch correct is: 100.0
epoch is 32 step 611 loss is: 0.009379544295370579 mini-batch correct is: 100.0
epoch is 32 step 612 loss is: 0.2707091271877289 mini-batch correct is: 93.75
epoch is 32 step 613 loss is: 0.016424348577857018 mini-batch correct is: 100.0
epoch is 32 step 614 loss is: 0.001230329042300582 mini-batch correct is: 100.0
epoch is 32 step 615 loss is: 0.013688713312149048 mini-batch correct is: 100.0
epoch is 32 step 616 loss is: 0.0062867505475878716 mini-batch correct is: 100.0
epoch is 32 step 617 loss is: 0.005267560016363859 mini-batch correct is: 100.0
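The fully connected layer of VGGbase takes 512 * 4 inputs. We can check where that number comes from by tracking CIFAR-10's 32x32 spatial size through the four pooling stages (a pure-Python sketch; the `pool` helper is defined here for illustration and is not part of the model code):

```python
# output size of a pooling/conv window: floor((n + 2p - k) / s) + 1
def pool(n, k=2, s=2, p=0):
    return (n + 2 * p - k) // s + 1

n = 32          # CIFAR-10 input is 32x32
n = pool(n)     # max_pooling1 -> 16
n = pool(n)     # max_pooling2 -> 8
n = pool(n, p=1)  # max_pooling3 has padding=1 -> 5
n = pool(n)     # max_pooling4 -> 2
print(n * n * 512)  # 2048 == 512 * 4, matching the fc layer
```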
- ResNet architecture
ResNet combines serial stacking with shortcut (skip) connections, so the network can be made very deep without vanishing gradients.
import torch
import torch.nn.functional as F
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils

if __name__ == '__main__':
    train_data = dataset.CIFAR10(root='cifa', train=True, transform=transforms.ToTensor(), download=True)
    test_data = dataset.CIFAR10(root='cifa', train=False, transform=transforms.ToTensor(), download=False)
    train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
    test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True)

    class ResBlock(torch.nn.Module):
        def __init__(self, in_channel, out_channel, stride=1):
            super(ResBlock, self).__init__()
            self.layer = torch.nn.Sequential(
                torch.nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1),
                torch.nn.BatchNorm2d(out_channel),
                torch.nn.ReLU(),
                torch.nn.Conv2d(out_channel, out_channel, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(out_channel)
            )
            # identity shortcut; replaced by a projection when shape changes
            self.shortcut = torch.nn.Sequential()
            if in_channel != out_channel or stride > 1:
                self.shortcut = torch.nn.Sequential(
                    torch.nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1),
                    torch.nn.BatchNorm2d(out_channel)
                )

        def forward(self, x):
            out1 = self.layer(x)
            out2 = self.shortcut(x)
            out = out1 + out2
            out = F.relu(out)
            return out

    class ResNet(torch.nn.Module):
        def make_layer(self, block, out_channel, stride, num_block):
            layers_list = []
            for i in range(num_block):
                if i == 0:
                    in_stride = stride
                else:
                    in_stride = 1
                layers_list.append(block(self.in_channel, out_channel, in_stride))
                self.in_channel = out_channel
            return torch.nn.Sequential(*layers_list)

        def __init__(self):
            super(ResNet, self).__init__()
            self.in_channel = 32
            self.conv1 = torch.nn.Sequential(
                torch.nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(32),
                torch.nn.ReLU()
            )
            self.layer1 = self.make_layer(ResBlock, 64, 2, 2)
            self.layer2 = self.make_layer(ResBlock, 128, 2, 2)
            self.layer3 = self.make_layer(ResBlock, 256, 2, 2)
            self.layer4 = self.make_layer(ResBlock, 512, 2, 2)
            self.fc = torch.nn.Linear(512, 10)

        def forward(self, x):
            out = self.conv1(x)
            out = self.layer1(out)
            out = self.layer2(out)
            out = self.layer3(out)
            out = self.layer4(out)
            out = F.avg_pool2d(out, 2)
            out = out.view(out.size()[0], -1)
            out = self.fc(out)
            out = F.log_softmax(out, dim=1)
            return out

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    epoch_num = 200
    lr = 0.01
    net = ResNet().to(device)
    loss_func = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    # decay the learning rate every 5 epochs to 0.9 times its previous value
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)
    for epoch in range(epoch_num):
        net.train()
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _, pred = torch.max(outputs, dim=1)
            correct = pred.eq(labels.data).cpu().sum()
            print("epoch is ", epoch, "step ", i, "loss is: ", loss.item(),
                  "mini-batch correct is: ", (100.0 * correct / 64).item())
        # step the scheduler once per epoch; without this call the StepLR schedule never takes effect
        scheduler.step()
- MobileNet architecture
For details on MobileNet, see MobileNet in the TensorFlow deep learning notes. MobileNet replaces a standard convolution unit with a grouped (depthwise) convolution followed by a 1x1 pointwise convolution, which compresses both computation and parameter count and yields a much lighter network.
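Before the full network, it helps to see why the depthwise separable unit is cheaper. A back-of-the-envelope parameter count (pure Python, bias terms ignored; the 128-to-256 channel sizes are chosen only for illustration):

```python
def standard_conv_params(k, c_in, c_out):
    # a standard k x k convolution mixes every input channel into every output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel; pointwise: a 1x1 conv that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 256)        # 294912
dws = depthwise_separable_params(3, 128, 256)  # 1152 + 32768 = 33920
print(std, dws, round(std / dws, 1))           # roughly an 8.7x reduction
```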
import torch
import torch.nn.functional as F
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils

if __name__ == '__main__':
    train_data = dataset.CIFAR10(root='cifa', train=True, transform=transforms.ToTensor(), download=True)
    test_data = dataset.CIFAR10(root='cifa', train=False, transform=transforms.ToTensor(), download=False)
    train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
    test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True)

    class MobileNet(torch.nn.Module):
        def conv_dw(self, in_channel, out_channel, stride):
            return torch.nn.Sequential(
                # depthwise separable convolution: per-channel 3x3 conv followed by a 1x1 pointwise conv
                torch.nn.Conv2d(in_channel, in_channel, kernel_size=3, stride=stride,
                                padding=1, groups=in_channel, bias=False),
                torch.nn.BatchNorm2d(in_channel),
                torch.nn.ReLU(),
                torch.nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=1, padding=0, bias=False),
                torch.nn.BatchNorm2d(out_channel),
                torch.nn.ReLU()
            )

        def __init__(self):
            super(MobileNet, self).__init__()
            self.conv1 = torch.nn.Sequential(
                torch.nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(32),
                torch.nn.ReLU()
            )
            self.conv_dw2 = self.conv_dw(32, 32, 1)
            self.conv_dw3 = self.conv_dw(32, 64, 2)
            self.conv_dw4 = self.conv_dw(64, 64, 1)
            self.conv_dw5 = self.conv_dw(64, 128, 2)
            self.conv_dw6 = self.conv_dw(128, 128, 1)
            self.conv_dw7 = self.conv_dw(128, 256, 2)
            self.conv_dw8 = self.conv_dw(256, 256, 1)
            self.conv_dw9 = self.conv_dw(256, 512, 2)
            self.fc = torch.nn.Linear(512, 10)

        def forward(self, x):
            out = self.conv1(x)
            out = self.conv_dw2(out)
            out = self.conv_dw3(out)
            out = self.conv_dw4(out)
            out = self.conv_dw5(out)
            out = self.conv_dw6(out)
            out = self.conv_dw7(out)
            out = self.conv_dw8(out)
            out = self.conv_dw9(out)
            out = F.avg_pool2d(out, 2)
            out = out.view(out.size()[0], -1)
            out = self.fc(out)
            out = F.log_softmax(out, dim=1)
            return out

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    epoch_num = 200
    lr = 0.01
    net = MobileNet().to(device)
    loss_func = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    # decay the learning rate every 5 epochs to 0.9 times its previous value
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)
    for epoch in range(epoch_num):
        net.train()
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _, pred = torch.max(outputs, dim=1)
            correct = pred.eq(labels.data).cpu().sum()
            print("epoch is ", epoch, "step ", i, "loss is: ", loss.item(),
                  "mini-batch correct is: ", (100.0 * correct / 64).item())
        # step the scheduler once per epoch; without this call the StepLR schedule never takes effect
        scheduler.step()
- InceptionNet architecture
For details on InceptionNet, see InceptionNet in the TensorFlow deep learning notes. InceptionNet combines parallel branches with serial stacking; widening the network in this way improves its performance.
import torch
import torch.nn.functional as F
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils

if __name__ == '__main__':
    train_data = dataset.CIFAR10(root='cifa', train=True, transform=transforms.ToTensor(), download=True)
    test_data = dataset.CIFAR10(root='cifa', train=False, transform=transforms.ToTensor(), download=False)
    train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
    test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True)

    class BaseInception(torch.nn.Module):
        def ConvBNRelu(self, in_channel, out_channel, kernel_size):
            return torch.nn.Sequential(
                torch.nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size,
                                stride=1, padding=kernel_size // 2),
                torch.nn.BatchNorm2d(out_channel),
                torch.nn.ReLU()
            )

        def __init__(self, in_channel, out_channel_list, reduce_channel_list):
            super(BaseInception, self).__init__()
            self.branch1_conv = self.ConvBNRelu(in_channel, out_channel_list[0], 1)
            self.branch2_conv1 = self.ConvBNRelu(in_channel, reduce_channel_list[0], 1)
            self.branch2_conv2 = self.ConvBNRelu(reduce_channel_list[0], out_channel_list[1], 3)
            self.branch3_conv1 = self.ConvBNRelu(in_channel, reduce_channel_list[1], 1)
            self.branch3_conv2 = self.ConvBNRelu(reduce_channel_list[1], out_channel_list[2], 5)
            self.branch4_pool = torch.nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.branch4_conv = self.ConvBNRelu(in_channel, out_channel_list[3], 3)

        def forward(self, x):
            out1 = self.branch1_conv(x)
            out2 = self.branch2_conv1(x)
            out2 = self.branch2_conv2(out2)
            out3 = self.branch3_conv1(x)
            out3 = self.branch3_conv2(out3)
            out4 = self.branch4_pool(x)
            out4 = self.branch4_conv(out4)
            # concatenate the four branches along the channel dimension
            out = torch.cat([out1, out2, out3, out4], dim=1)
            return out

    class InceptionNet(torch.nn.Module):
        def __init__(self):
            super(InceptionNet, self).__init__()
            self.block1 = torch.nn.Sequential(
                torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
                torch.nn.BatchNorm2d(64),
                torch.nn.ReLU()
            )
            self.block2 = torch.nn.Sequential(
                torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
                torch.nn.BatchNorm2d(128),
                torch.nn.ReLU()
            )
            self.block3 = torch.nn.Sequential(
                BaseInception(in_channel=128, out_channel_list=[64, 64, 64, 64],
                              reduce_channel_list=[16, 16]),
                torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
            self.block4 = torch.nn.Sequential(
                BaseInception(in_channel=256, out_channel_list=[96, 96, 96, 96],
                              reduce_channel_list=[32, 32]),
                torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
            self.fc = torch.nn.Linear(1536, 10)

        def forward(self, x):
            out = self.block1(x)
            out = self.block2(out)
            out = self.block3(out)
            out = self.block4(out)
            out = F.avg_pool2d(out, 2)
            out = out.view(out.size()[0], -1)
            out = self.fc(out)
            out = F.log_softmax(out, dim=1)
            return out

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    epoch_num = 200
    lr = 0.01
    net = InceptionNet().to(device)
    loss_func = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    # decay the learning rate every 5 epochs to 0.9 times its previous value
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)
    for epoch in range(epoch_num):
        net.train()
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _, pred = torch.max(outputs, dim=1)
            correct = pred.eq(labels.data).cpu().sum()
            print("epoch is ", epoch, "step ", i, "loss is: ", loss.item(),
                  "mini-batch correct is: ", (100.0 * correct / 64).item())
        # step the scheduler once per epoch; without this call the StepLR schedule never takes effect
        scheduler.step()
- Using a pretrained model
Here we take ResNet18 as an example. With a pretrained model we do not need to build the network ourselves, and as long as the model does not need pruning, starting from pretrained weights speeds up development considerably.
import torch
import torchvision.datasets as dataset
import torchvision.transforms as transforms
import torch.utils.data as data_utils
from torchvision import models

if __name__ == '__main__':
    train_data = dataset.CIFAR10(root='cifa', train=True, transform=transforms.ToTensor(), download=True)
    test_data = dataset.CIFAR10(root='cifa', train=False, transform=transforms.ToTensor(), download=False)
    train_loader = data_utils.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
    test_loader = data_utils.DataLoader(dataset=test_data, batch_size=64, shuffle=True)

    class ResNet18(torch.nn.Module):
        def __init__(self):
            super(ResNet18, self).__init__()
            # load the pretrained model
            self.model = models.resnet18(pretrained=True)
            self.num_features = self.model.fc.in_features
            # replace the final layer with a 10-class head for CIFAR-10
            self.model.fc = torch.nn.Linear(self.num_features, 10)

        def forward(self, x):
            out = self.model(x)
            return out

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    epoch_num = 200
    lr = 0.01
    net = ResNet18().to(device)
    loss_func = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    # decay the learning rate every 5 epochs to 0.9 times its previous value
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)
    for epoch in range(epoch_num):
        net.train()
        for i, (images, labels) in enumerate(train_loader):
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _, pred = torch.max(outputs, dim=1)
            correct = pred.eq(labels.data).cpu().sum()
            print("epoch is ", epoch, "step ", i, "loss is: ", loss.item(),
                  "mini-batch correct is: ", (100.0 * correct / 64).item())
        # step the scheduler once per epoch; without this call the StepLR schedule never takes effect
        scheduler.step()
Image Augmentation
Image augmentation is a family of preprocessing methods applied before an image enters the network. In the figure above, the leftmost image is the original, and the four images to its right are generated from it. This enlarges the dataset fourfold, and the transformed images may occur in real scenes even though they are absent from the original data; after training on them, the model becomes more robust to the wide variety of images it meets in practice.
The figure above shows some common methods; in practice there are many more. The leftmost image is the original. Rotation is a random rotation: the camera may be tilted when a photo is taken, and feeding rotated copies to the model lets it recognize tilted images correctly. Blur simulates photos taken, for instance, through a fogged-up lens, keeping the model robust to them. Contrast randomly adjusts contrast, since different people prefer different levels, some more vivid, some darker. Scaling simulates photos taken at different distances so the model can handle objects near and far. Illumination varies the exposure so the model can recognize objects under different lighting. Projective applies a perspective transform, warping the spatial layout of the image to simulate shooting from different angles, so the model can cope with those views as well.
import torchvision.transforms as transforms
from PIL import Image

if __name__ == '__main__':
    trans = transforms.Compose([
        transforms.ToTensor(),  # scales pixels to [0, 1] and converts to float32
        transforms.RandomRotation(45),  # random rotation
        transforms.RandomAffine(45),  # random affine transform
        # standardize: the first tuple is the per-channel (r, g, b) mean,
        # the second tuple the per-channel (r, g, b) standard deviation
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])
    unloader = transforms.ToPILImage()
    image = Image.open("/Users/admin/Documents/444.jpeg")
    print(image)
    image.show()
    image_out = trans(image)
    image = unloader(image_out)
    print(image_out.size())
    image.show()
Output:
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1896x1279 at 0x7F801385F750>
torch.Size([3, 1279, 1896])
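As a side note, Normalize in the pipeline above applies (x - mean) / std per channel; a quick pure-Python check of what happens to a pure-white pixel in the red channel:

```python
# ToTensor scales pixels to [0, 1]; Normalize then applies (x - mean) / std.
mean_r, std_r = 0.485, 0.229  # red-channel values used above
x = 1.0                       # a pure-white pixel after ToTensor
print((x - mean_r) / std_r)   # about 2.25
```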
GAN Networks
For the fundamentals of GANs, see part three of the TensorFlow deep learning notes.
Here we will implement a CycleGAN. Dataset download: https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/ . We use the apple2orange.zip dataset for training.
First, the dataset loader:
import glob
import random
import os
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import torchvision.transforms as transforms


class ImageDataset(Dataset):
    def __init__(self, root='', transform=None, model='train'):
        self.transform = transforms.Compose(transform)
        self.pathA = os.path.join(root, model + "A/*")
        self.pathB = os.path.join(root, model + "B/*")
        self.list_A = glob.glob(self.pathA)
        self.list_B = glob.glob(self.pathB)

    def __getitem__(self, index):
        im_pathA = self.list_A[index % len(self.list_A)]
        im_pathB = random.choice(self.list_B)
        im_A = Image.open(im_pathA)
        im_B = Image.open(im_pathB)
        item_A = self.transform(im_A)
        item_B = self.transform(im_B)
        return {"A": item_A, "B": item_B}

    def __len__(self):
        return max(len(self.list_A), len(self.list_B))


if __name__ == '__main__':
    root = "/Users/admin/Downloads/apple2orange"
    transform_ = [transforms.Resize(256, Image.BILINEAR), transforms.ToTensor()]
    dataloader = DataLoader(dataset=ImageDataset(root, transform_, 'train'),
                            batch_size=1, shuffle=True, num_workers=1)
    for i, batch in enumerate(dataloader):
        print(i)
        print(batch)
The generator and discriminator models:
import torch
import torch.nn.functional as F


class ResBlock(torch.nn.Module):
    def __init__(self, in_channel):
        super(ResBlock, self).__init__()
        self.conv_block = torch.nn.Sequential(
            # reflection padding instead of zero padding
            torch.nn.ReflectionPad2d(1),
            torch.nn.Conv2d(in_channel, in_channel, kernel_size=3),
            # instance norm: normalize within each channel
            torch.nn.InstanceNorm2d(in_channel),
            torch.nn.ReLU(inplace=True),
            torch.nn.ReflectionPad2d(1),
            torch.nn.Conv2d(in_channel, in_channel, kernel_size=3),
            torch.nn.InstanceNorm2d(in_channel)
        )

    def forward(self, x):
        return x + self.conv_block(x)


class Generator(torch.nn.Module):
    '''Generator'''
    def __init__(self):
        super(Generator, self).__init__()
        net = [
            torch.nn.ReflectionPad2d(3),
            torch.nn.Conv2d(3, 64, kernel_size=7),
            torch.nn.InstanceNorm2d(64),
            torch.nn.ReLU(inplace=True)
        ]
        in_channel = 64
        out_channel = in_channel * 2
        # downsample twice
        for _ in range(2):
            net += [
                torch.nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=2, padding=1),
                torch.nn.InstanceNorm2d(out_channel),
                torch.nn.ReLU(inplace=True)
            ]
            in_channel = out_channel
            out_channel = in_channel * 2
        # 9 residual blocks
        for _ in range(9):
            net += [ResBlock(in_channel)]
        # upsample twice
        out_channel = in_channel // 2
        for _ in range(2):
            net += [
                torch.nn.ConvTranspose2d(in_channel, out_channel, kernel_size=3, stride=2,
                                         padding=1, output_padding=1),
                torch.nn.InstanceNorm2d(out_channel),
                torch.nn.ReLU(inplace=True)
            ]
            in_channel = out_channel
            out_channel = in_channel // 2
        # output layer
        net += [
            torch.nn.ReflectionPad2d(3),
            torch.nn.Conv2d(in_channel, 3, kernel_size=7),
            torch.nn.Tanh()
        ]
        self.model = torch.nn.Sequential(*net)

    def forward(self, x):
        return self.model(x)


class Discriminator(torch.nn.Module):
    '''Discriminator'''
    def __init__(self):
        super(Discriminator, self).__init__()
        model = [
            torch.nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            torch.nn.LeakyReLU(0.2, inplace=True)
        ]
        model += [
            torch.nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            torch.nn.InstanceNorm2d(128),
            torch.nn.LeakyReLU(0.2, inplace=True)
        ]
        model += [
            torch.nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            torch.nn.InstanceNorm2d(256),
            torch.nn.LeakyReLU(0.2, inplace=True)
        ]
        model += [
            torch.nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),
            torch.nn.InstanceNorm2d(512),
            torch.nn.LeakyReLU(0.2, inplace=True)
        ]
        model += [torch.nn.Conv2d(512, 1, kernel_size=4, padding=1)]
        self.model = torch.nn.Sequential(*model)

    def forward(self, x):
        x = self.model(x)
        # average the patch predictions into a single score per image
        return F.avg_pool2d(x, x.size()[2:]).view(x.size()[0], -1)


if __name__ == '__main__':
    G = Generator()
    D = Discriminator()
    input_tensor = torch.ones((1, 3, 256, 256), dtype=torch.float)
    out = G(input_tensor)
    print(out.size())
    out = D(input_tensor)
    print(out.size())
Output:
torch.Size([1, 3, 256, 256])
torch.Size([1, 1])
Utility classes and functions:
import random
import torch
import numpy as np


def tensor2image(tensor):
    image = 127.5 * (tensor[0].cpu().float().numpy() + 1.0)
    if image.shape[0] == 1:
        image = np.tile(image, (3, 1, 1))
    return image.astype(np.uint8)


class ReplayBuffer():
    def __init__(self, max_size=50):
        assert (max_size > 0), "Empty buffer or trying to create a black hole. Be careful."
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, data):
        to_return = []
        for element in data.data:
            element = torch.unsqueeze(element, 0)
            if len(self.data) < self.max_size:
                self.data.append(element)
                to_return.append(element)
            else:
                # with probability 0.5, return an older fake and store the new one
                if random.uniform(0, 1) > 0.5:
                    i = random.randint(0, self.max_size - 1)
                    to_return.append(self.data[i].clone())
                    self.data[i] = element
                else:
                    to_return.append(element)
        return torch.cat(to_return)


class LambdaLR():
    def __init__(self, n_epochs, offset, decay_start_epoch):
        assert ((n_epochs - decay_start_epoch) > 0), "Decay must start before the training session ends!"
        self.n_epochs = n_epochs
        self.offset = offset
        self.decay_start_epoch = decay_start_epoch

    def step(self, epoch):
        return 1.0 - max(0, epoch + self.offset - self.decay_start_epoch) / (self.n_epochs - self.decay_start_epoch)


def weights_init_normal(m):
    classname = m.__class__.__name__
    # normal_/constant_ replace the deprecated torch.nn.init.normal/constant
    if classname.find('Conv') != -1:
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm2d') != -1:
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)
Model training:
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from PIL import Image
import torch
from pytorch.gan.models import Generator, Discriminator
from pytorch.gan.utils import ReplayBuffer, LambdaLR, weights_init_normal
from pytorch.gan.dataset import ImageDataset
import itertools
import tensorboardX
import os

if __name__ == '__main__':
    os.environ["OMP_NUM_THREADS"] = "1"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batchsize = 1
    size = 256
    lr = 0.0002
    n_epoch = 200
    epoch = 0
    decay_epoch = 100
    netG_A2B = Generator().to(device)
    netG_B2A = Generator().to(device)
    netD_A = Discriminator().to(device)
    netD_B = Discriminator().to(device)
    loss_GAN = torch.nn.MSELoss()
    loss_Cycle = torch.nn.L1Loss()
    loss_identity = torch.nn.L1Loss()
    opt_G = torch.optim.Adam(itertools.chain(netG_A2B.parameters(), netG_B2A.parameters()),
                             lr=lr, betas=(0.5, 0.9999))
    opt_DA = torch.optim.Adam(netD_A.parameters(), lr=lr, betas=(0.5, 0.9999))
    opt_DB = torch.optim.Adam(netD_B.parameters(), lr=lr, betas=(0.5, 0.9999))
    lr_scheduler_G = torch.optim.lr_scheduler.LambdaLR(opt_G, lr_lambda=LambdaLR(n_epoch, epoch, decay_epoch).step)
    lr_scheduler_DA = torch.optim.lr_scheduler.LambdaLR(opt_DA, lr_lambda=LambdaLR(n_epoch, epoch, decay_epoch).step)
    lr_scheduler_DB = torch.optim.lr_scheduler.LambdaLR(opt_DB, lr_lambda=LambdaLR(n_epoch, epoch, decay_epoch).step)
    data_root = "/Users/admin/Downloads/apple2orange"
    input_A = torch.ones([1, 3, size, size], dtype=torch.float).to(device)
    input_B = torch.ones([1, 3, size, size], dtype=torch.float).to(device)
    label_real = torch.ones([1], dtype=torch.float, requires_grad=False).to(device)
    label_fake = torch.zeros([1], dtype=torch.float, requires_grad=False).to(device)
    fake_A_buffer = ReplayBuffer()
    fake_B_buffer = ReplayBuffer()
    log_path = "logs"
    writer_log = tensorboardX.SummaryWriter(log_path)
    transforms_ = [
        transforms.Resize(int(256 * 1.12), Image.BICUBIC),
        transforms.RandomCrop(256),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ]
    dataloader = DataLoader(dataset=ImageDataset(data_root, transforms_),
                            batch_size=batchsize, shuffle=True, num_workers=1)
    step = 0
    for epoch in range(n_epoch):
        for i, batch in enumerate(dataloader):
            real_A = torch.tensor(input_A.copy_(batch['A']), dtype=torch.float).to(device)
            real_B = torch.tensor(input_B.copy_(batch['B']), dtype=torch.float).to(device)

            # generator step
            opt_G.zero_grad()
            same_B = netG_A2B(real_B)
            loss_identity_B = loss_identity(same_B, real_B) * 5.0
            same_A = netG_B2A(real_A)
            loss_identity_A = loss_identity(same_A, real_A) * 5.0
            fake_B = netG_A2B(real_A)
            pred_fake = netD_B(fake_B)
            loss_GAN_A2B = loss_GAN(pred_fake, label_real)
            fake_A = netG_B2A(real_B)
            pred_fake = netD_A(fake_A)
            loss_GAN_B2A = loss_GAN(pred_fake, label_real)
            recovered_A = netG_B2A(fake_B)
            loss_cycle_ABA = loss_Cycle(recovered_A, real_A) * 10.0
            recovered_B = netG_A2B(fake_A)
            loss_cycle_BAB = loss_Cycle(recovered_B, real_B) * 10.0
            loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + \
                loss_cycle_ABA + loss_cycle_BAB
            loss_G.backward()
            opt_G.step()

            # discriminator A step
            opt_DA.zero_grad()
            pred_real = netD_A(real_A)
            loss_D_real = loss_GAN(pred_real, label_real)
            fake_A = fake_A_buffer.push_and_pop(fake_A)
            pred_fake = netD_A(fake_A.detach())
            # the fake loss must be computed on pred_fake, not pred_real
            loss_D_fake = loss_GAN(pred_fake, label_fake)
            loss_D_A = (loss_D_real + loss_D_fake) * 0.5
            loss_D_A.backward()
            opt_DA.step()

            # discriminator B step
            opt_DB.zero_grad()
            pred_real = netD_B(real_B)
            loss_D_real = loss_GAN(pred_real, label_real)
            # fake B images go through the B buffer, not the A buffer
            fake_B = fake_B_buffer.push_and_pop(fake_B)
            pred_fake = netD_B(fake_B.detach())
            loss_D_fake = loss_GAN(pred_fake, label_fake)
            loss_D_B = (loss_D_real + loss_D_fake) * 0.5
            loss_D_B.backward()
            opt_DB.step()

            print("loss_G:{}, loss_G_identity:{}, loss_G_GAN:{}, "
                  "loss_G_cycle:{}, loss_D_A:{}, loss_D_B:{}".format(
                      loss_G, loss_identity_A + loss_identity_B,
                      loss_GAN_A2B + loss_GAN_B2A,
                      loss_cycle_ABA + loss_cycle_BAB, loss_D_A, loss_D_B))
            writer_log.add_scalar("loss_G", loss_G, global_step=step + 1)
            writer_log.add_scalar("loss_G_identity", loss_identity_A + loss_identity_B, global_step=step + 1)
            writer_log.add_scalar("loss_G_GAN", loss_GAN_A2B + loss_GAN_B2A, global_step=step + 1)
            writer_log.add_scalar("loss_G_cycle", loss_cycle_ABA + loss_cycle_BAB, global_step=step + 1)
            writer_log.add_scalar("loss_D_A", loss_D_A, global_step=step + 1)
            writer_log.add_scalar("loss_D_B", loss_D_B, global_step=step + 1)
            step += 1
        lr_scheduler_G.step()
        lr_scheduler_DA.step()
        lr_scheduler_DB.step()
        torch.save(netG_A2B.state_dict(), "models/netG_A2B.pth")
        torch.save(netG_B2A.state_dict(), "models/netG_B2A.pth")
        torch.save(netD_A.state_dict(), "models/netD_A.pth")
        torch.save(netD_B.state_dict(), "models/netD_B.pth")
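The LambdaLR helper from the utilities keeps the learning-rate factor at 1.0 up to decay_epoch and then decays it linearly to zero by n_epoch; its step function can be checked standalone in pure Python:

```python
# standalone copy of LambdaLR.step with the training script's settings
def decay_factor(epoch, n_epochs=200, offset=0, decay_start_epoch=100):
    return 1.0 - max(0, epoch + offset - decay_start_epoch) / (n_epochs - decay_start_epoch)

print(decay_factor(0))    # 1.0 (flat before decay starts)
print(decay_factor(150))  # 0.5 (halfway through the decay window)
print(decay_factor(200))  # 0.0 (fully decayed at the end of training)
```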
Model Development and Deployment
AI development and deployment generally split into a training platform and a deployment platform. Training is dominated by Nvidia's CUDA; deployment divides into server-side and on-device (edge) deployment.
- Edge AI inference chips target computer-like devices such as phones, cars, cameras, IoT, and many embedded devices
- Nvidia: CUDA GPUs, and the Jetson line for embedded use
- Intel: Movidius VPU (NCS2)
- Apple: the NPU in the A12 processor and later
- Qualcomm: Snapdragon processors
- Huawei: Kirin processors (Da Vinci architecture)
- On-device inference software frameworks
- Desktop: PyTorch, TensorFlow
- iOS: Apple's Core ML, the PyTorch library, etc.
- Android: the TFLite framework, the PyTorch library, the NCNN library, etc.
- Intel NCS: Intel's NCSDK
- Nvidia embedded devices: TensorRT
- Deploying PyTorch models on-device
- PyTorch's official C++ package is called LibTorch
- iOS: PyTorch -> ONNX -> Core ML -> iOS
- Android: PyTorch -> ONNX -> ncnn -> Android, or PyTorch -> ONNX -> TensorFlow -> Android
ONNX
ONNX, the Open Neural Network Exchange format, is a standard for representing deep learning models that lets them move between frameworks. Frameworks that can load and run ONNX models include Caffe2, PyTorch, MXNet, ML.NET, TensorRT, and Microsoft CNTK; TensorFlow supports it unofficially.
Visualization tool: netron
pip install netron
- Converting PyTorch to ONNX
pip install cython protobuf numpy
sudo apt-get install libprotobuf-dev protobuf-compiler
pip install onnx
- How to export ONNX correctly
- Wherever a shape or size return value is consumed, e.g. tensor.view(tensor.size(0), -1), avoid using tensor.size's return value directly; wrap it in int(): tensor.view(int(tensor.size(0)), -1)
- For nn.Upsample or nn.functional.interpolate, specify the ratio with scale_factor instead of the size parameter
- In reshape and view operations, put the -1 on the batch dimension and spell out the other dimensions explicitly; never hard-code the batch dimension to a concrete number
- Pass dynamic_axes to torch.onnx.export and mark only the batch dimension as dynamic; we only need a dynamic batch, and there are other solutions for dynamic width and height
These practices matter because they simplify the exported graph and eliminate Gather/Shape nodes. Skipping them often appears to work, but once the requirements become more complex all sorts of problems surface.
Deploying YOLOv5
The YOLOv5 GitHub repository: https://github.com/ultralytics/YOLOV5
After cloning the code, enter the yolov5-master folder and run
python export.py --include=onnx
Afterwards two new files appear in the folder: yolov5s.onnx and yolov5s.pt.
Then run
(base) admindeMBP:yolov5-master admin$ netron
Serving at http://localhost:8080
Open http://127.0.0.1:8080/ in a browser and load yolov5s.onnx to see a visualization of the model.
Now look at yolo.py in the models folder. As noted earlier, wherever a shape or size return value is consumed we should avoid using tensor.size's return value directly and wrap it in int(), so we make the following changes.
Line 53 becomes
bs, _, ny, nx = map(int, x[i].shape)
And since reshape/view operations should keep the -1 on the batch dimension, with the other dimensions computed explicitly and the batch dimension never hard-coded, line 68 becomes
z.append(y.view(-1, int(y.size(1) * y.size(2) * y.size(3)), self.no))
Run the export again:
python export.py --include=onnx
Open yolov5s.onnx with the visualizer again; clicking a Reshape node now shows the change.
TensorRT
TensorRT is Nvidia's inference framework. It optimizes and accelerates models specifically for Nvidia hardware, making the fullest use of the GPU and maximizing inference performance.
Framework download: https://github.com/shouxieai/tensorRT_Pro
Let's first write a simple PyTorch model and export it to ONNX.
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(1, 1, 3, stride=1, padding=1, bias=True)
        self.conv.weight.data.fill_(0.3)
        self.conv.bias.data.fill_(0.2)

    def forward(self, x):
        x = self.conv(x)
        return x.view(x.size(0), -1)


if __name__ == '__main__':
    model = Model().eval()
    x = torch.full((1, 1, 3, 3), 1.0)
    y = model(x)
    torch.onnx.export(model, (x,), "onnx1.onnx", verbose=True)
Running the program produces a file named onnx1.onnx, which we can open with netron.
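Since the input is a 3x3 tensor of ones, the kernel is filled with 0.3, padding is 1, and the bias is 0.2, the expected output can be computed by hand (a pure-Python check, independent of PyTorch):

```python
# same-padded 3x3 convolution with a constant weight w and bias b
def conv3x3_same(inp, w, b):
    n = len(inp)
    out = [[b] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        out[i][j] += w * inp[ii][jj]
    return out

out = conv3x3_same([[1.0] * 3 for _ in range(3)], 0.3, 0.2)
# corners see 4 inputs (about 1.4), edges 6 (about 2.0), the centre all 9 (about 2.9)
print(out)
```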