2020-6-10 Andrew Ng - NN&DL - Week 3 Shallow Neural Networks (Quiz)

Reference: https://zhuanlan.zhihu.com/p/31270944

1, Which of the following are true? (Check all that apply.) Note that only the correct options are listed here.

  • X is a matrix in which each column is one training example.
  • $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.
  • $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
  • $a^{[2]}$ denotes the activation vector of the 2nd layer.

Answer: all of the options listed above are correct.

=================================================================
2,The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

Answer: True. See the reference link above; a quick numerical check is sketched below.
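
As a minimal sanity check (assumed zero-mean random inputs; not part of the original quiz), the snippet below compares the output means of tanh and sigmoid:

import numpy as np

# Sketch only: tanh output averages near 0, sigmoid output near 0.5,
# because sigmoid maps everything into (0, 1).
np.random.seed(0)
z = np.random.randn(100000)

print(np.tanh(z).mean())                # close to 0
print((1 / (1 + np.exp(-z))).mean())    # close to 0.5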

=================================================================
3,Which of these is a correct vectorized implementation of forward propagation for layer l, where 1≤l≤L?

  • $Z^{[l]}=W^{[l]}A^{[l]}+b^{[l]}$, $A^{[l+1]}=g^{[l]}(Z^{[l]})$
  • $Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}$, $A^{[l]}=g^{[l]}(Z^{[l]})$. Correct (see the sketch after the options).
  • $Z^{[l]}=W^{[l-1]}A^{[l]}+b^{[l-1]}$, $A^{[l]}=g^{[l]}(Z^{[l]})$
  • $Z^{[l]}=W^{[l]}A^{[l]}+b^{[l]}$, $A^{[l+1]}=g^{[l+1]}(Z^{[l]})$
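
The sketch below (shapes and activation chosen for illustration; not from the quiz) implements the correct option, $Z^{[l]}=W^{[l]}A^{[l-1]}+b^{[l]}$, $A^{[l]}=g^{[l]}(Z^{[l]})$, vectorized over m examples:

import numpy as np

def forward_layer(A_prev, W, b, g=np.tanh):
    # A_prev: (n[l-1], m), W: (n[l], n[l-1]), b: (n[l], 1)
    Z = W @ A_prev + b              # b is broadcast across the m columns
    A = g(Z)
    return Z, A

A_prev = np.random.randn(3, 5)      # 3 units in layer l-1, 5 examples
W = np.random.randn(4, 3) * 0.01    # 4 units in layer l
b = np.zeros((4, 1))
Z, A = forward_layer(A_prev, W, b)
print(Z.shape, A.shape)             # (4, 5) (4, 5)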

=================================================================
4,You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

  • ReLU
  • Leaky ReLU
  • sigmoid. Correct: its output lies in (0, 1), so it can be read as the probability that y = 1 (cucumber).
  • tanh

=================================================================
5,Consider the following code:

A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)

What will B.shape be?

  • (1, 3)
  • (4, 1). Correct
  • (, 3)
  • (4, )

The keepdims argument controls whether the reduced axis is kept (with size 1) in the result. Summing a 4×3 matrix along axis 1, i.e. across each row, gives a 4×1 matrix. A quick check is sketched below.
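
For reference, a quick check (random values; not part of the quiz) of what keepdims changes:

import numpy as np

A = np.random.randn(4, 3)

B = np.sum(A, axis=1, keepdims=True)
print(B.shape)                      # (4, 1): the summed axis is kept with size 1

C = np.sum(A, axis=1)
print(C.shape)                      # (4,): without keepdims the axis is dropped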

=================================================================
6,Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are True? (Check all that apply)

  • Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons. Correct (see the reference link above and the sketch after this list).
  • Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
  • Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.
  • The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
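
A minimal sketch (a hypothetical 2-4-1 network with made-up data; not part of the quiz) of why zero initialization never breaks symmetry: every hidden neuron produces the same activation, so every row of W1 also receives the same gradient and the rows can never become different.

import numpy as np

np.random.seed(1)
X = np.random.randn(2, 5)           # 2 features, 5 examples

W1 = np.zeros((4, 2))               # zero initialization
b1 = np.zeros((4, 1))
A1 = np.tanh(W1 @ X + b1)

print(A1)                           # all 4 rows (neurons) are identical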

=================================================================
7,Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

Answer: False.

Logistic regression has no hidden layer. If you initialize the weights to zero, the first example x fed into logistic regression will produce an output of zero, but the derivatives of logistic regression depend on the input x (precisely because there is no hidden layer), and x is not zero. So at the second iteration the weight values follow the distribution of x and differ from one another, as long as x is not a constant vector. A minimal sketch of this first gradient step follows.
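
A minimal sketch (made-up data; not part of the quiz): starting from w = 0, the very first gradient step of logistic regression already gives different values per weight, because dw = X(a - y)^T / m depends on the inputs rather than on w.

import numpy as np

np.random.seed(2)
X = np.random.randn(3, 10)          # 3 features, 10 examples
Y = (np.random.rand(1, 10) > 0.5).astype(float)

w = np.zeros((3, 1))
b = 0.0
A = 1 / (1 + np.exp(-(w.T @ X + b)))   # first iteration: every prediction is 0.5
dw = X @ (A - Y).T / X.shape[1]

print(dw.ravel())                   # generally non-zero and different for each weight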

=================================================================
8,You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?

  • It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
  • This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow. Correct

If w is large then, from $z = wx + b$, z will also be large, so tanh(z) falls in the flat part of the curve where the slope is close to 0, and gradient descent converges very slowly. This is why w is usually initialized with an extra small scaling factor. A quick check of the tanh slope follows.
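
A quick check (assumed sample inputs; not part of the quiz) that the slope of tanh, $1 - \tanh^2(z)$, is essentially zero once |z| is large, which is exactly what happens when the weights are scaled by 1000:

import numpy as np

for z in [0.5, 5.0, 50.0]:
    print(z, 1 - np.tanh(z) ** 2)   # ~0.79, ~1.8e-04, ~0.0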

=================================================================
9,Consider the following 1 hidden layer neural network:

[Figure: a neural network with 2 input units, one hidden layer of 4 units, and a single output unit]

All of the following options are correct:

  • $b^{[1]}$ will have shape (4, 1)
  • $W^{[1]}$ will have shape (4, 2)
  • $W^{[2]}$ will have shape (1, 4)
  • $b^{[2]}$ will have shape (1, 1)

=================================================================
10,In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?

  • $Z^{[1]}$ and $A^{[1]}$ are (4, m). Correct.
  • $Z^{[1]}$ and $A^{[1]}$ are (1, 4)
  • $Z^{[1]}$ and $A^{[1]}$ are (4, 1)
  • $Z^{[1]}$ and $A^{[1]}$ are (4, 2)

The dimension rules behind questions 9 and 10 are explained in the reference link above; in summary (a shape check follows this list):

  • $Z^{[l]}$ and $A^{[l]}$ have shape $(n^{[l]}, m)$
  • $b^{[l]}$ has shape $(n^{[l]}, 1)$
  • $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$
  • $n^{[l]}$ is the number of units in layer l, and m is the number of training examples
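
Below is a minimal shape check (a hypothetical m = 7 examples; not part of the quiz) for the 2-4-1 network of questions 9 and 10:

import numpy as np

m = 7
X  = np.random.randn(2, m)          # (n[0], m)
W1 = np.random.randn(4, 2) * 0.01   # (n[1], n[0])
b1 = np.zeros((4, 1))               # (n[1], 1)
W2 = np.random.randn(1, 4) * 0.01   # (n[2], n[1])
b2 = np.zeros((1, 1))               # (n[2], 1)

Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))          # sigmoid output layer

print(Z1.shape, A1.shape)           # (4, 7) (4, 7)
print(Z2.shape, A2.shape)           # (1, 7) (1, 7)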