Convolutional Neural Networks Quiz 3

Question 1
You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What would be the label for the following image? Recall y=[pc,bx,by,bh,bw,c1,c2,c3]
[Image]
y=[1,0.3,0.7,0.3,0.3,0,1,0]

y=[1,0.7,0.5,0.3,0.3,0,1,0]

y=[1,0.3,0.7,0.5,0.5,0,1,0]

y=[1,0.3,0.7,0.5,0.5,1,0,0]

y=[0,0.2,0.4,0.5,0.5,0,1,0]

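Explanation: The image contains an object, so pc = 1. The object is a car, so c1 = 0, c2 = 1, c3 = 0. The car's midpoint is at roughly (0.3, 0.7), so bx = 0.3 and by = 0.7. The car covers about 0.3 x 0.3 of the image, so bh = bw = 0.3.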
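The label construction described above can be sketched in a few lines of Python. The helper name make_label and the use of None as a stand-in for "?" are assumptions made for illustration; they are not part of the quiz.

```python
def make_label(pc, box=None, class_index=None, num_classes=3):
    """Build y = [pc, bx, by, bh, bw, c1, ..., cK] as a plain list.

    class_index is 1-based (1 = pedestrian, 2 = car, 3 = motorcycle).
    None is used here as a stand-in for the "?" (don't care) entries.
    """
    if pc == 0:
        # No object: every remaining component is a don't-care.
        return [0] + [None] * (4 + num_classes)
    bx, by, bh, bw = box
    one_hot = [1 if k == class_index else 0 for k in range(1, num_classes + 1)]
    return [1, bx, by, bh, bw] + one_hot


# A car (c = 2) centered near (0.3, 0.7), covering about 0.3 x 0.3 of the image.
y = make_label(1, box=(0.3, 0.7, 0.3, 0.3), class_index=2)
print(y)  # [1, 0.3, 0.7, 0.3, 0.3, 0, 1, 0]
```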


Question 2
Continuing from the previous problem, what should y be for the image below? Remember that “?” means “don’t care”, which means that the neural network loss function won’t care what the neural network gives for that component of the output. As before, y=[pc,bx,by,bh,bw,c1,c2,c3].
[Image]
y=[1,?,?,?,?,0,0,0]

y=[0,?,?,?,?,?,?,?]

y=[0,?,?,?,?,0,0,0]

y=[1,?,?,?,?,?,?,?]

y=[?,?,?,?,?,?,?,?]

Explanation: There is no object, so the first component pc is 0. Every other component is a don't-care, written as ?.
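To see what "don't care" means for the loss, here is a minimal sketch that assumes a plain squared-error loss (a simplification; a real detector would typically mix several loss terms). The function name detection_loss is hypothetical, and None stands in for "?".

```python
def detection_loss(y_true, y_pred):
    """Squared-error loss that skips don't-care components (None in y_true)."""
    if y_true[0] == 0:
        # No object: only the pc component contributes to the loss.
        return (y_pred[0] - y_true[0]) ** 2
    # Object present: every component contributes.
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))


y_true = [0, None, None, None, None, None, None, None]
# Two predictions that agree on pc but differ everywhere else.
pred_a = [0.1, 0.5, 0.5, 0.2, 0.2, 0.3, 0.3, 0.4]
pred_b = [0.1, 0.9, 0.1, 0.8, 0.8, 0.9, 0.0, 0.1]
print(detection_loss(y_true, pred_a) == detection_loss(y_true, pred_b))  # True
```

Because the two predictions share the same pc, they receive the same loss, which is exactly what the "?" entries express.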


Question 3
You are working on a factory automation task. Your system will see a can of soft-drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft drink can always appears as the same size in the image. There is at most one soft drink can in each image. Here’re some typical images in your training set:
[Image]
What is the most appropriate set of output units for your neural network?

Logistic unit (for classifying if there is a soft-drink can in the image)

Logistic unit, bx and by

Logistic unit, bx, by, bh (since bw = bh)

Logistic unit, bx, by, bh, bw

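Explanation: There are two requirements. The first is whether the image contains a soft-drink can, which takes a logistic unit. The second is its bounding box. The box is always square (bw = bh) and the can always appears at the same size, so bh and bw are known constants that the network does not need to output. That leaves the logistic unit plus bx and by.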


Question 4
If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

N

2N

3N

N^2

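Explanation:
[Image]
Human pose detection works the same way: annotating key points at characteristic positions on the body records the pose. Each landmark needs two values (x, y), so the network needs 2N output units.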
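As a small illustration of the 2N count, the sketch below flattens a list of (x, y) landmarks into the target vector a network would regress. The function name and the sample coordinates are made up for the example.

```python
def flatten_landmarks(points):
    """Turn [(x1, y1), (x2, y2), ...] into [x1, y1, x2, y2, ...]."""
    return [coord for point in points for coord in point]


# Three hypothetical landmarks, in normalized image coordinates.
landmarks = [(0.30, 0.40), (0.70, 0.40), (0.50, 0.65)]
target = flatten_landmarks(landmarks)
print(len(target))  # 6, i.e. 2N for N = 3
```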


Question 5
When training one of the object detection systems described in lecture, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

True

False


Question 6
Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.

True

False

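Explanation: Increasing the stride means fewer window positions are evaluated. That lowers the computational cost, but it cannot improve accuracy, so the statement is false.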
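The effect of the stride on cost can be checked with a quick count. This sketch assumes a square image, a square window, and no padding; the sizes 28 and 14 are arbitrary values chosen for the example.

```python
def num_windows(image_size, window_size, stride):
    """Number of window positions the classifier must evaluate."""
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


for stride in (1, 2, 7):
    print(stride, num_windows(28, 14, stride))
# 1 225
# 2 64
# 7 9
```

A larger stride means fewer classifier evaluations, and also coarser localization, since the window can only land on a sparser set of positions.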


Question 7
In the YOLO algorithm, at training time, only one cell —the one containing the center/midpoint of an object— is responsible for detecting this object.

True

False

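Explanation: An object is assigned to a grid cell as follows: find the object's midpoint, then assign the object to the cell that contains that midpoint. Even if the object spans several cells, it is assigned only to the cell containing its midpoint; the other cells are marked "0", meaning no object.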
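The assignment rule can be sketched as follows, assuming bx and by are the midpoint coordinates normalized to [0, 1] over the whole image and the grid is square. The function name responsible_cell is hypothetical.

```python
def responsible_cell(bx, by, grid_size):
    """Return (row, col) of the grid cell that contains the midpoint (bx, by)."""
    # Clamp so that a midpoint exactly on the right or bottom edge stays in the grid.
    col = min(int(bx * grid_size), grid_size - 1)
    row = min(int(by * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.3, 0.7, 3))   # (2, 0)
print(responsible_cell(0.3, 0.7, 19))  # (13, 5)
```

Only the returned cell gets pc = 1 for this object; the box size plays no role in the assignment.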


Question 8
What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.

[Image]

1/6

1/9

1/10

None of the above

Explanation: The intersection is 1x1 = 1 and the union is 2x2 + 2x3 - 1 = 9, so IoU = 1/9.
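The arithmetic can be verified with a short IoU function. The corner coordinates below are an assumed placement that matches the stated sizes (a 2x2 box, a box 2 wide and 3 tall, and a 1x1 overlap); the image itself does not give coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


upper_left = (0, 0, 2, 2)   # 2x2 box
lower_right = (1, 1, 3, 4)  # 2 wide, 3 tall, overlapping the first in a 1x1 region
print(iou(upper_left, lower_right))  # 1/9, about 0.111
```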


Question 9
Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
[Image]
3

4

5

6

7

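Explanation: the NMS algorithm, taking single-object detection as an example:
For each grid cell of the image, the predicted output is yi = [Pc bx by bh bw], where Pc is the probability that an object is present.
Discard every bounding box with Pc ≤ 0.6 (this question uses a threshold of 0.4 instead).
While any bounding boxes remain:

  • Select the box with the largest Pc and output it as a prediction;
  • Discard every remaining box whose IoU with the selected box is ≥ 0.5.

In this example, first discard the boxes with Pc ≤ 0.4, namely the small car in the lower right. Next, select the largest Pc, 0.98; it overlaps no other box, so count = 1. Then select 0.74; it overlaps the 0.46 tree, but their IoU is below 0.5, so the tree is not discarded, and count = 2. Then select 0.73, which discards the 0.62 car, so count = 3. Then select 0.58, so count = 4. Last comes the 0.46 tree, so count = 5.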
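The procedure above can be sketched in Python for the single-class case. The box coordinates and scores below are invented for illustration, since the quiz image gives no coordinates, and the function names are hypothetical. The thresholds follow this question: discard scores ≤ 0.4, suppress at IoU ≥ 0.5.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(boxes, score_threshold=0.4, iou_threshold=0.5):
    """boxes is a list of (score, (x1, y1, x2, y2)); returns the kept boxes."""
    # Step 1: drop low-confidence boxes, then sort by descending score.
    candidates = sorted(
        (b for b in boxes if b[0] > score_threshold),
        key=lambda b: b[0],
        reverse=True,
    )
    kept = []
    # Step 2: repeatedly keep the best box and drop boxes that overlap it too much.
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [b for b in candidates if iou(best[1], b[1]) < iou_threshold]
    return kept


# Hypothetical predictions, not the boxes from the quiz image.
predictions = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # IoU with the 0.9 box is 81/119, so it is suppressed
    (0.7, (20, 20, 30, 30)),
    (0.6, (8, 8, 18, 18)),    # small overlap with the 0.9 box, so it is kept
    (0.3, (40, 40, 50, 50)),  # below the score threshold
]
print([score for score, _ in non_max_suppression(predictions)])  # [0.9, 0.7, 0.6]
```

With several classes, detectors commonly run this loop once per class, so boxes of different classes do not suppress each other.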


Question 10
Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume y as the target value for the neural network; this corresponds to the last layer of the neural network. (y may include some “?”, or “don’t cares”). What is the dimension of this output volume?

19x19x(25x20)

19x19x(20x25)

19x19x(5x20)

19x19x(5x25)

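Explanation: Pc, bx, by, bh, bw take 5 values and the 20 classes take another 20, so each anchor box needs 25 values. With 5 anchor boxes this is multiplied by 5, giving 19x19x(5x25).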
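The same count written as a small helper, assuming each anchor box stores [Pc, bx, by, bh, bw] followed by one value per class. The function name yolo_output_shape is made up for this example.

```python
def yolo_output_shape(grid_size, num_classes, num_anchors):
    """Shape of the target volume y for one image."""
    per_anchor = 5 + num_classes  # Pc, bx, by, bh, bw plus the class entries
    return (grid_size, grid_size, num_anchors * per_anchor)


print(yolo_output_shape(19, 20, 5))  # (19, 19, 125)
```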


Reference: http://blog.csdn.net/koala_tree/article/details/78597575
