Andrew Ng - NN&DL - Week 3: Shallow Neural Networks (Homework)

1, Which of the following are true? (Check all that apply.) Note: only the correct options are listed.

  • X is a matrix in which each column is one training example.
  • $a^{[2]}_4$ is the activation output of the 4th neuron in the 2nd layer.
  • $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
  • $a^{[2]}$ denotes the activation vector of the 2nd layer.
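The notation maps directly onto NumPy indexing. A minimal sketch, assuming hypothetical sizes and using random numbers as a stand-in for real layer-2 activations:

```python
import numpy as np

# Hypothetical sizes: n_x input features, m training examples, n2 units in layer 2.
n_x, m, n2 = 3, 20, 5
rng = np.random.default_rng(0)

# X stacks the training examples as columns: shape (n_x, m).
X = rng.standard_normal((n_x, m))
x_12 = X[:, 11]            # x^{(12)}: the 12th example (NumPy indices start at 0)

# Random stand-in for the layer-2 activations of all m examples: shape (n2, m).
A2 = rng.standard_normal((n2, m))
a2_ex12 = A2[:, 11]        # a^{[2](12)}: layer-2 activation vector for example 12, shape (n2,)
a2_4_ex12 = a2_ex12[3]     # a^{[2](12)}_4: output of the 4th neuron of layer 2 for that example
```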



2, The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False? (Answer: True)
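A quick numerical check of the "closer to zero" claim, as a sketch that assumes zero-mean, standard-normal pre-activations (an illustrative choice, not something the quiz specifies). It also checks the identity $\tanh(z) = 2\sigma(2z) - 1$, which shows tanh is a sigmoid rescaled from the range (0, 1) to (-1, 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed pre-activations: zero-mean standard normal samples.
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)

tanh_mean = np.tanh(z).mean()     # close to 0
sigmoid_mean = sigmoid(z).mean()  # close to 0.5

# tanh is a shifted and rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1.
assert np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1)

print(f"mean tanh output:    {tanh_mean:.3f}")
print(f"mean sigmoid output: {sigmoid_mean:.3f}")
```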



3, Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?


  • $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$, $A^{[l+1]} = g^{[l]}(Z^{[l]})$
  • $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$, $A^{[l]} = g^{[l]}(Z^{[l]})$ (Correct)
  • $Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]}$, $A^{[l]} = g^{[l]}(Z^{[l]})$
  • $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$, $A^{[l+1]} = g^{[l+1]}(Z^{[l]})$
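Applying the correct update layer by layer, starting from $A^{[0]} = X$, gives full forward propagation. A minimal NumPy sketch; the `forward` helper, the 3-4-1 layer sizes, and the tanh/sigmoid activations are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params, activations):
    """Vectorized forward propagation for layers l = 1..L.

    params[l] = (W[l], b[l]) and activations[l] = g[l]; A[0] = X.
    """
    A = X  # A^{[0]} = X, shape (n^{[0]}, m)
    for l in range(1, len(params) + 1):
        W, b = params[l]
        Z = W @ A + b          # Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}; b broadcasts over the m columns
        A = activations[l](Z)  # A^{[l]} = g^{[l]}(Z^{[l]})
    return A

# Hypothetical 3-4-1 network on m = 5 random examples.
sizes = [3, 4, 1]
rng = np.random.default_rng(0)
params = {
    l: (rng.standard_normal((sizes[l], sizes[l - 1])) * 0.01, np.zeros((sizes[l], 1)))
    for l in range(1, len(sizes))
}
activations = {1: np.tanh, 2: sigmoid}
X = rng.standard_normal((sizes[0], 5))
Y_hat = forward(X, params, activations)
print(Y_hat.shape)  # (1, 5)
```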

4, You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?


  • ReLU
  • Leaky ReLU
  • sigmoid (Correct)
  • tanh
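Sigmoid is the natural choice because its output lies in (0, 1) and can be read as $P(y=1 \mid x)$, whereas ReLU and Leaky ReLU are unbounded and tanh ranges over (-1, 1). A small sketch with made-up output-layer pre-activations, thresholding at 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output-layer pre-activations Z^{[2]} for 4 examples.
Z2 = np.array([[-3.0, -0.2, 0.4, 5.0]])
A2 = sigmoid(Z2)                       # each entry lies in (0, 1): read as P(y=1 | x)
predictions = (A2 > 0.5).astype(int)   # 1 = cucumber, 0 = watermelon
print(predictions)  # [[0 0 1 1]]
```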

5, Consider the following code:

import numpy as np
A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)

What will B.shape be?

  • (1, 3)
  • (4, 1) (Correct)
  • (, 3)
  • (4, )
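The difference between the options comes down to `axis` and `keepdims`: `axis=1` sums across each row, and `keepdims=True` keeps the summed axis with size 1 instead of dropping it. Without it you get a rank-1 array of shape (4,), which can broadcast in surprising ways. A short comparison:

```python
import numpy as np

A = np.random.randn(4, 3)

B = np.sum(A, axis=1, keepdims=True)   # sum across each row; axis 1 kept with size 1
C = np.sum(A, axis=1)                  # same sums, but axis 1 is dropped
D = np.sum(A, axis=0, keepdims=True)   # sum down each column instead

print(B.shape)  # (4, 1)
print(C.shape)  # (4,)
print(D.shape)  # (1, 3)
```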


6, Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are True? (Check all that apply)


  • Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons. (Correct)
  • Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
  • Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.
  • The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
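The symmetry argument can be checked by training a tiny network twice: once from all-zero parameters and once from small random weights. A minimal sketch, assuming a hypothetical 2-4-1 network (tanh hidden layer, sigmoid output, cross-entropy loss), synthetic data, and an illustrative `train` helper. With zero init every hidden unit receives the same update, so the rows of $W^{[1]}$ stay identical; with tanh in particular the hidden activations are $\tanh(0) = 0$, so in this setup $W^{[1]}$ never moves at all.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, X, Y, lr=0.5, iters=100):
    """Plain gradient descent: tanh hidden layer, sigmoid output, cross-entropy loss."""
    m = X.shape[1]
    for _ in range(iters):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Backward pass
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Parameter update (new arrays; the caller's arrays are not modified)
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

# Hypothetical data: 2 features, 200 examples, label = 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))
Y = (X.sum(axis=0, keepdims=True) > 0).astype(float)

# Zero initialization: all 4 hidden units start identical.
W1z, _, W2z, _ = train(np.zeros((4, 2)), np.zeros((4, 1)), np.zeros((1, 4)), np.zeros((1, 1)), X, Y)
# Small random initialization breaks the symmetry.
W1r, _, W2r, _ = train(rng.standard_normal((4, 2)) * 0.01, np.zeros((4, 1)),
                       rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1)), X, Y)

print("zero init, rows of W1 identical:  ", np.allclose(W1z, W1z[0]))   # True
print("random init, rows of W1 identical:", np.allclose(W1r, W1r[0]))   # False
```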

7, Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False? (Answer: False)



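Logistic regression has no hidden layer, so there is no symmetry to break. If the weights are initialized to zero, then $z = w^T x + b = 0$ for every example and the first prediction is $\sigma(0) = 0.5$; but the gradient $dw = (a - y)\,x$ depends on the input $x$, which is not zero (there is no hidden layer in between). So after the first update the weights follow the distribution of $x$ and, as long as $x$ is not a constant vector, they differ from each other.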
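One gradient step on a made-up dataset shows this concretely: starting from $w = 0$, every prediction is 0.5, yet the vectorized gradient $dw = \frac{1}{m} X (A - Y)^T$ already has distinct entries. A minimal sketch; the data and the learning rate 0.1 are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 3 features, 4 examples (columns), binary labels.
X = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, -1.0, 3.0, 2.0],
              [2.0, 0.5, 1.0, -2.0]])
Y = np.array([[1, 0, 1, 0]])
m = X.shape[1]

w = np.zeros((3, 1))
b = 0.0

A = sigmoid(w.T @ X + b)   # every prediction is sigmoid(0) = 0.5
dw = X @ (A - Y).T / m     # gradient depends on x, so its entries generally differ
db = np.sum(A - Y) / m

w = w - 0.1 * dw           # one gradient descent step with learning rate 0.1
print(w.ravel())
```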

8, You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?


  • It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
  • This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow. (Correct)
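The slowdown comes from the derivative $\tanh'(z) = 1 - \tanh^2(z)$, which is close to 1 near $z = 0$ and close to 0 once $|z|$ is large. A sketch comparing a small (×0.01) and a large (×1000) weight scale on hypothetical random inputs for a 4-unit hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 1000))       # hypothetical inputs: 2 features, 1000 examples

results = {}
for scale in (0.01, 1000.0):
    W1 = rng.standard_normal((4, 2)) * scale
    Z1 = W1 @ X
    A1 = np.tanh(Z1)
    grad = 1 - A1 ** 2                   # tanh'(z) = 1 - tanh(z)^2, the factor that scales dZ1
    results[scale] = grad.mean()
    print(f"scale={scale}: mean |Z1| = {np.abs(Z1).mean():.2f}, mean tanh'(Z1) = {grad.mean():.4f}")
```

With the large scale almost every unit sits in the flat, saturated part of tanh, so backpropagated gradients are tiny and gradient descent barely moves.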


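9, Consider the following 1-hidden-layer neural network (figure not reproduced; the shapes below imply 2 input features, 4 hidden units, and 1 output unit):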


  • $b^{[1]}$ will have shape (4, 1)
  • $W^{[1]}$ will have shape (4, 2)
  • $W^{[2]}$ will have shape (1, 4)
  • $b^{[2]}$ will have shape (1, 1)

10, In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?


  • $Z^{[1]}$ and $A^{[1]}$ are (4, m) (Correct)
  • $Z^{[1]}$ and $A^{[1]}$ are (1, 4)
  • $Z^{[1]}$ and $A^{[1]}$ are (4, 1)
  • $Z^{[1]}$ and $A^{[1]}$ are (4, 2)


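  • The matrices $Z^{[l]}$ and $A^{[l]}$ have dimensions $(n^{[l]}, m)$.
  • $b^{[l]}$ has dimensions $(n^{[l]}, 1)$.
  • $W^{[l]}$ has dimensions $(n^{[l]}, n^{[l-1]})$.
  • $n^{[l]}$ is the number of units in layer $l$, and $m$ is the number of training examples.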
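These shape rules can be verified for the network in questions 9 and 10 by building the parameters and running one forward pass. A sketch assuming layer sizes 2-4-1 (inferred from the shapes in question 9), an arbitrary m = 7, and tanh/sigmoid activations chosen for illustration:

```python
import numpy as np

# Layer sizes inferred from question 9: n[0] = 2 inputs, n[1] = 4 hidden units, n[2] = 1 output.
layer_dims = [2, 4, 1]
m = 7  # arbitrary (hypothetical) number of training examples

rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01  # (n[l], n[l-1])
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))                                    # (n[l], 1)

X = rng.standard_normal((layer_dims[0], m))   # A[0] = X: (n[0], m)
Z1 = params["W1"] @ X + params["b1"]          # (4, m)
A1 = np.tanh(Z1)                              # (4, m)
Z2 = params["W2"] @ A1 + params["b2"]         # (1, m)
A2 = 1.0 / (1.0 + np.exp(-Z2))                # (1, m)

for name in ("W1", "b1", "W2", "b2"):
    print(name, params[name].shape)
print("Z1", Z1.shape, "A1", A1.shape)
```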
