Chapter 11: CNNs (1)

Convolutional Networks


Convolutional networks, also known as convolutional neural networks (CNNs), are a specialized kind of neural network for processing data that has a known, grid-like topology. These networks employ a mathematical operation called convolution, which is a specialized kind of linear operation.


The convolution operation

$$ (f * g)(x) = \int f(\alpha)\, g(x - \alpha)\, d\alpha $$

Here f is the input and g is the kernel.

Discrete convolution

$$ s[t] = (f * g)(t) = \sum_{a=-\infty}^{\infty} f[a]\, g[t - a] $$

Except for the finite set of points at which values are stored, these functions are taken to be zero.

2-D discrete convolution

If we use a 2-D image I as the input, then

$$ s[i, j] = (I * K)[i, j] = \sum_{m} \sum_{n} I[m, n]\, K[i - m, j - n] $$

$$ = \sum_{m} \sum_{n} I[i - m, j - n]\, K[m, n] $$

Why convolution?

Convolution brings three very important properties:

  • sparse interactions
  • parameter sharing
  • equivariant representations

sparse interactions

If the input has m dimensions and the output has n dimensions, a fully connected weight matrix requires m × n parameters and O(m × n) runtime. If we limit the number of connections per output to k, only k × n parameters are needed and the runtime is O(k × n).

Parameter sharing

Using the same set of parameters in more than one function.

For example,

$$ s_2 = \alpha x_1 + \beta x_2 + \gamma x_3 $$
$$ s_3 = \alpha x_2 + \beta x_3 + \gamma x_4 $$

Here α, β, γ are the shared parameters. In this way, convolution lowers the memory requirements and improves statistical efficiency.

equivariance

To say that a function is equivariant means that when the input changes, the output changes in a corresponding way. More precisely, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)).


Pooling

A pooling function replaces the output of the network at a given location with a summary statistic of the nearby outputs. Pooling keeps the output approximately unchanged under small translations of the input. This invariance matters most when we care about whether some feature is present rather than exactly where it is; for example, in face recognition we want to detect facial features without caring about their precise locations.

In addition, pooling is essential for handling inputs of varying size. By varying the size of the pooling regions, the next layer can always receive data of a fixed size, regardless of the input size.


Convolutional Networks


Convolutional networks, also known as convolutional neural networks (CNNs), are a specialized kind of neural network for processing data that has a known, grid-like topology. The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.


The convolution operation

$$ (f * g)(x) = \int f(\alpha)\, g(x - \alpha)\, d\alpha $$

In convolutional network terminology, f is often referred to as the input and g as the kernel; the output is sometimes referred to as the feature map.

Discrete convolution

$$ s[t] = (f * g)(t) = \sum_{a=-\infty}^{\infty} f[a]\, g[t - a] $$

Except for the finite set of points for which we store the values, these functions are considered to be zero.
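As a concrete illustration (not part of the original text), the discrete convolution defined above can be written out directly; the input signal x and kernel k below are made-up example values, and the built-in np.convolve is shown only as a cross-check.

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4., 5.])   # example input f (made-up values)
k = np.array([1., 0., -1.])              # example kernel g (made-up values)

def conv1d(f, g):
    """Direct implementation of s[t] = sum_a f[a] * g[t - a],
    treating both functions as zero outside their stored ranges."""
    out = np.zeros(len(f) + len(g) - 1)
    for t in range(len(out)):
        for a in range(len(f)):
            if 0 <= t - a < len(g):
                out[t] += f[a] * g[t - a]
    return out

print(conv1d(x, k))
print(np.convolve(x, k))  # NumPy's built-in convolution gives the same result
```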

2-d discrete convolution

If we use a 2d image I as input, then

$$ s[i, j] = (I * K)[i, j] = \sum_{m} \sum_{n} I[m, n]\, K[i - m, j - n] $$

$$ = \sum_{m} \sum_{n} I[i - m, j - n]\, K[m, n] $$
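A minimal NumPy sketch of the 2-D formula, assuming a "valid"-style output (the flipped kernel is kept fully inside the image); the image and kernel values are invented for illustration.

```python
import numpy as np

def conv2d(I, K):
    """'Valid' 2-D discrete convolution: the kernel is flipped in both axes
    and slid over the image, so each output is sum_{m,n} I[i-m, j-n] K[m, n],
    restricted to positions where the kernel fits entirely inside I."""
    kh, kw = K.shape
    H, W = I.shape
    Kf = K[::-1, ::-1]                     # flip the kernel (convolution, not correlation)
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)   # example 4x4 "image"
K = np.array([[0., 1.], [2., 3.]])             # example 2x2 kernel
print(conv2d(I, K))
# scipy.signal.convolve2d(I, K, mode='valid') would give the same values.
```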

Motivation

Convolution leverages three important ideas that can help improve a machine learning system:

  • sparse interactions
  • parameter sharing
  • equivariant representations

sparse interactions

If there are m inputs and n outputs, then matrix multiplication requires m × n parameters and the algorithms used in practice have O(m × n) runtime (per example). If we limit the number of connections each output may have to k, then the sparsely connected approach requires only k × n parameters and O(k × n) runtime.
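To make the counts above concrete, here is a small back-of-the-envelope comparison; the sizes m, n, and k are hypothetical.

```python
# Hypothetical layer sizes, chosen only to make the counts concrete.
m, n, k = 1_000_000, 1_000_000, 3   # inputs, outputs, connections per output

dense_params  = m * n               # fully connected: every output sees every input
sparse_params = k * n               # sparse: each output sees only k inputs

print(f"dense : {dense_params:.1e} parameters, O(m*n) runtime per example")
print(f"sparse: {sparse_params:.1e} parameters, O(k*n) runtime per example")
```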

Parameter sharing

Using the same parameter for more than one function in a model.

For example, consider two outputs computed with the same kernel weights α, β, γ:

$$ s_2 = \alpha x_1 + \beta x_2 + \gamma x_3 $$
$$ s_3 = \alpha x_2 + \beta x_3 + \gamma x_4 $$

Here α, β, γ are the parameters shared across positions. Because of this sharing, convolution reduces the memory requirements of the model and improves its statistical efficiency.
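A small sketch (with invented weights and inputs) of how the same three parameters are reused at every position, reproducing s_2 and s_3 from the equations above.

```python
import numpy as np

alpha, beta, gamma = 0.5, -1.0, 2.0    # one shared set of weights (made-up values)
x = np.array([1., 2., 3., 4., 5.])     # example inputs x_1 ... x_5

# The same (alpha, beta, gamma) is applied at every position:
# s_t = alpha * x_{t-1} + beta * x_t + gamma * x_{t+1}  (1-based indexing as in the text)
s = {t: alpha * x[t - 2] + beta * x[t - 1] + gamma * x[t]
     for t in range(2, len(x))}

print(s[2])   # s_2 = alpha*x_1 + beta*x_2 + gamma*x_3
print(s[3])   # s_3 = alpha*x_2 + beta*x_3 + gamma*x_4
```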

equivariance

To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). Convolution is equivariant to translation: shifting the input and then convolving gives the same result as convolving and then shifting the output.
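A quick numerical check of this property for convolution and translation, with made-up values; shift plays the role of g and conv the role of f, and the zeros at the borders of x keep the comparison exact.

```python
import numpy as np

x = np.array([0., 0., 1., 3., 2., 0., 0., 0., 0., 0.])  # zeros at the borders
k = np.array([1., -1., 2.])

shift = lambda v: np.roll(v, 1)       # g: translate the signal by one step
conv  = lambda v: np.convolve(v, k)   # f: convolve with the kernel

# f(g(x)) == g(f(x)): shifting then convolving equals convolving then shifting.
print(np.allclose(conv(shift(x)), shift(conv(x))))   # True
```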


Pooling

A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. Pooling helps to make the representation approximately invariant to small translations of the input. Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.

Besides, pooling is essential for handling inputs of varying size. Varying the size of and offset between the pooling regions makes it possible for the classification layer to always receive the same number of summary statistics regardless of the input size.
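A minimal max-pooling sketch over non-overlapping 2x2 regions, assuming the input height and width are divisible by the pool size; the input values are invented.

```python
import numpy as np

def max_pool2d(x, pool=2):
    """Non-overlapping max pooling: each pool x pool block of the input
    is replaced by a single summary statistic, its maximum."""
    H, W = x.shape
    assert H % pool == 0 and W % pool == 0, "sketch assumes divisible input size"
    return x.reshape(H // pool, pool, W // pool, pool).max(axis=(1, 3))

x = np.array([[1., 2., 0., 1.],
              [4., 3., 1., 0.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
print(max_pool2d(x))   # [[4. 1.]
                       #  [2. 8.]]
```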
