2020-6-10 Andrew Ng - NN&DL - Week 3: Shallow Neural Networks (Quiz)

Reference: https://zhuanlan.zhihu.com/p/31270944

1. Which of the following are true? (Check all that apply.) Note: only the correct options are listed below.

  • X is a matrix in which each column is one training example.
  • a^{[2]}_4 is the activation output by the 4th neuron of the 2nd layer.
  • a^{[2](12)} denotes the activation vector of the 2nd layer for the 12th training example.
  • a^{[2]} denotes the activation vector of the 2nd layer.

Answer: all of the listed options are correct.

=================================================================
2. The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

Answer: True. See the reference link.
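As a rough illustration (a minimal numpy sketch, not part of the original quiz), feeding the same zero-mean pre-activations through tanh and sigmoid shows that the tanh outputs stay roughly centered around zero while the sigmoid outputs do not:

import numpy as np

np.random.seed(0)
z = np.random.randn(1000)            # zero-mean pre-activations

tanh_out = np.tanh(z)
sigmoid_out = 1 / (1 + np.exp(-z))

print(tanh_out.mean())               # close to 0
print(sigmoid_out.mean())            # close to 0.5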

=================================================================
3. Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?

  • Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]},  A^{[l+1]} = g^{[l]}(Z^{[l]})
  • Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]},  A^{[l]} = g^{[l]}(Z^{[l]}). Correct; see the numpy sketch below.
  • Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]},  A^{[l]} = g^{[l]}(Z^{[l]})
  • Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]},  A^{[l+1]} = g^{[l+1]}(Z^{[l]})
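For concreteness, here is a minimal numpy sketch of that vectorized forward step for one layer (the names W_l, b_l, A_prev and the choice of tanh as the activation g are illustrative assumptions, not from the quiz):

import numpy as np

def forward_layer(A_prev, W_l, b_l, g=np.tanh):
    # A_prev: activations of layer l-1, shape (n_prev, m)
    # W_l: weights of layer l, shape (n_l, n_prev); b_l: biases, shape (n_l, 1)
    Z_l = W_l @ A_prev + b_l      # Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}
    A_l = g(Z_l)                  # A^{[l]} = g^{[l]}(Z^{[l]})
    return Z_l, A_l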

=================================================================
4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

  • ReLU
  • Leaky ReLU
  • sigmoid. Correct: its output lies in (0, 1), so it can be read as the probability that y = 1, which is what the output layer of a binary classifier needs.
  • tanh

=================================================================
5. Consider the following code:

import numpy as np

A = np.random.randn(4, 3)               # a 4x3 matrix of standard normal values
B = np.sum(A, axis=1, keepdims=True)    # sum across each row, keeping the column axis

What will B.shape be?

  • (1, 3)
  • (4, 1). Correct.
  • (, 3)
  • (4, )

The keepdims parameter controls whether the reduced dimension is kept. Summing a 4-row by 3-column matrix across each row (axis=1) gives a 4x1 matrix, so B.shape is (4, 1).
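A quick check (a minimal sketch; the extra variable B_no_keep is only for comparison and is not part of the quiz):

import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
B_no_keep = np.sum(A, axis=1)

print(B.shape)          # (4, 1): the summed axis is kept with length 1
print(B_no_keep.shape)  # (4,): without keepdims the axis is dropped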

=================================================================
6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are true? (Check all that apply.)

  • Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons. Correct; see the reference link and the sketch below.
  • Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".
  • Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.
  • The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
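To see why, here is a minimal numpy sketch (the 2-3-1 layer sizes and the tanh/sigmoid choices are illustrative assumptions): with all-zero initialization, every row of the gradient dW1 is identical, so the hidden units can never diverge from one another.

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 5)                     # 5 training examples, 2 features each
Y = (np.random.rand(1, 5) > 0.5) * 1.0        # binary labels

W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))   # all-zero initialization
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))

# Forward pass: every hidden unit computes exactly the same thing
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Backward pass (binary cross-entropy)
m = X.shape[1]
dZ2 = A2 - Y
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
dW1 = dZ1 @ X.T / m

print(dW1)   # every row is identical, so the symmetry is never broken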

=================================================================
7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?

Answer: False.

Logistic regression has no hidden layer. If you initialize the weights to zero, the first example x will produce an output of zero, but the derivatives of logistic regression depend on the input x (there is no hidden layer), and x is not zero. So at the second iteration the weight values follow the distribution of x and differ from each other, as long as x is not a constant vector.
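A minimal numpy sketch of this (the data and the single full-batch gradient step are illustrative assumptions): starting from w = 0, one update already gives unequal weight components, because the gradient is driven by x itself.

import numpy as np

np.random.seed(1)
X = np.random.randn(3, 8)                 # 8 examples, 3 features
Y = (np.random.rand(1, 8) > 0.5) * 1.0

w = np.zeros((3, 1))
b = 0.0
m = X.shape[1]

A = 1 / (1 + np.exp(-(w.T @ X + b)))      # every output is 0.5 when w = 0, b = 0
dw = X @ (A - Y).T / m                    # gradient depends on x, not on w
w = w - 0.1 * dw

print(w.ravel())                          # the components already differ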

=================================================================
8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?

  • It doesn't matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
  • This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow. Correct.

If w is large, then by z = wx + b, z will also be large, so tanh(z) lands in the flat regions of the curve where the slope is close to 0, and gradient descent converges very slowly. That is why w is usually initialized with a small multiplying factor.
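A minimal numpy sketch (the scale factor 1000 mirrors the question; the derivative formula tanh'(z) = 1 - tanh(z)^2 is standard):

import numpy as np

np.random.seed(2)
z_small = np.random.randn(5)              # pre-activations from small weights
z_large = z_small * 1000                  # pre-activations from weights scaled by 1000

def tanh_grad(z):
    # derivative of tanh: 1 - tanh(z)^2
    return 1 - np.tanh(z) ** 2

print(tanh_grad(z_small))                 # reasonably sized gradients
print(tanh_grad(z_large))                 # essentially zero: the units are saturated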

=================================================================
9. Consider the following 1-hidden-layer neural network:

(Figure omitted: a 1-hidden-layer network with 2 input features, 4 hidden units, and 1 output unit, as implied by the shapes below.)

All of the following options are correct:

  • b^{[1]} will have shape (4, 1)
  • W^{[1]} will have shape (4, 2)
  • W^{[2]} will have shape (1, 4)
  • b^{[2]} will have shape (1, 1)

=================================================================
10. In the same network as the previous question, what are the dimensions of Z^{[1]} and A^{[1]}?

  • Z^{[1]} and A^{[1]} are (4, m). Correct.
  • Z^{[1]} and A^{[1]} are (1, 4)
  • Z^{[1]} and A^{[1]} are (4, 1)
  • Z^{[1]} and A^{[1]} are (4, 2)

For the dimension questions in 9 and 10, see the reference link. In short:

  • Z and A have shape Z^{[l]}, A^{[l]}: (n^{[l]}, m)
  • b^{[l]} has shape (n^{[l]}, 1)
  • W^{[l]} has shape (n^{[l]}, n^{[l-1]})
  • n^{[l]} is the number of units in layer l

These rules can be checked with the small sketch below.
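A minimal numpy sketch (the layer sizes [2, 4, 1] match the network in question 9; m = 5 examples is an arbitrary choice):

import numpy as np

layer_sizes = [2, 4, 1]                   # n^{[0]}, n^{[1]}, n^{[2]}
m = 5                                     # number of training examples
A = np.random.randn(layer_sizes[0], m)    # A^{[0]} = X

for l in range(1, len(layer_sizes)):
    W = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * 0.01
    b = np.zeros((layer_sizes[l], 1))
    Z = W @ A + b
    A = np.tanh(Z)
    print(l, W.shape, b.shape, Z.shape, A.shape)
    # layer 1: (4, 2) (4, 1) (4, 5) (4, 5)
    # layer 2: (1, 4) (1, 1) (1, 5) (1, 5)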