A single neural network predicts bounding boxes and class probabilities in one evaluation
Introduction
(1)simple
We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
(2)global image
Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about classes as well as their appearance.
In contrast, Fast R-CNN mistakes background patches for objects because it can't see the larger context.
(3)YOLO learns generalizable representations of objects
when generalizing to new domains (e.g. artwork), it outperforms other detectors by a wide margin
(4)YOLO lags behind state-of-the-art detection systems in accuracy, especially at localizing small objects
2 Unified Detection
entire image -> bounding boxes
predicts all bounding boxes across all classes for an image simultaneously
end to end
define confidence as Pr(Object) * IOU
if no object is in the cell, Pr(Object) = 0, so confidence = 0
if an object is in the cell, Pr(Object) = 1, so confidence = IOU (intersection over union) between the predicted box and the ground truth
IOU measures how much the predicted box overlaps the ground-truth box; its minimum is 0 and its maximum is 1. For the exact calculation, see the article below.
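A minimal sketch of the IOU calculation described above (the helper name and the (x1, y1, x2, y2) corner format are my own choices, not from the paper):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to 0 so disjoint boxes give zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0, disjoint boxes give 0.0, matching the min/max range noted above.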
each bounding box consists of 5 predictions: x, y, w, h, confidence
each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object)
At test time we multiply the conditional class probabilities by the individual box confidence predictions:
Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
this gives class-specific confidence scores for each box
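The multiplication above can be sketched as follows (function name is hypothetical; box_conf is the box's Pr(Object) * IOU confidence):

```python
def class_scores(class_probs, box_conf):
    """Class-specific confidence scores for one box.

    class_probs: the C conditional probabilities Pr(Class_i | Object)
                 predicted by the box's grid cell.
    box_conf:    that box's confidence, Pr(Object) * IOU.
    """
    # Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
    return [p * box_conf for p in class_probs]
```

Each resulting score encodes both how likely the class is and how well the box fits the object.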
Question: how do the box confidences map onto the grid cells?
a grid cell is responsible for an object if the center of the object's box falls in that cell
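The center-to-cell assignment can be sketched like this (helper name is my own; it assumes box center coordinates normalized to [0, 1]):

```python
def responsible_cell(x_center, y_center, S=7):
    """Return the (row, col) of the S*S grid cell containing the box center.

    x_center, y_center: box center, normalized to [0, 1] of image size.
    """
    # Scale to grid units and truncate; clamp so x = 1.0 stays in the last cell.
    col = min(int(x_center * S), S - 1)
    row = min(int(y_center * S), S - 1)
    return row, col
```

Only this one cell's predictions are "responsible" for the object during training.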
Model:
(1)detection as regression problem
(2)image -> divide into S*S grid cells
(3)cell ->B bounding boxes
(4)the predictions form an S*S*(B*5+C) tensor
parameters: for YOLO on PASCAL VOC, S = 7 and B = 2. The data has 20 labelled classes, so C = 20. The prediction is a 7*7*(5*2+20) = 7*7*30 tensor
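The tensor size works out as a quick check (values taken from the notes above):

```python
# YOLO on PASCAL VOC: grid size, boxes per cell, number of classes.
S, B, C = 7, 2, 20
depth = B * 5 + C       # each cell predicts B boxes * 5 values + C class probs
total = S * S * depth   # full prediction tensor for one image
print(S, S, depth, total)  # → 7 7 30 1470
```

So one forward pass of the network emits 1470 numbers per image, decoded into boxes, confidences, and class probabilities.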