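1. First, the aspect ratio

SSD expects a square input. If an image is rectangular, the model resizes it automatically, which distorts the image and lowers accuracy. Possible solutions:

A. Let the model resize the image and accept the loss of accuracy.

B. Split the image into several square tiles. Some object features may then be cut across several tiles.

C. Pad the image to a square. For example, pad a 300*900 image with white on both sides to get a 900*900 square. The cost is that each target becomes smaller relative to the whole image, which hurts the detection of small features.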
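Option C can be sketched with NumPy as below. This is a minimal illustration, not part of any library: `pad_to_square` is a hypothetical helper, and it assumes an H x W x C `uint8` image and pixel-coordinate label boxes in [x_min, y_min, x_max, y_max] order. Because the padding is split evenly between the two sides, the label boxes keep their pixel size and only need to be shifted by the padding offset.

```python
import numpy as np


def pad_to_square(image, boxes, fill=255):
    """Pad an H x W x C image to a square (option C), centering the original.

    boxes: [[x_min, y_min, x_max, y_max], ...] in pixel coordinates.
    Returns (padded_image, shifted_boxes, (pad_left, pad_top)).
    """
    h, w = image.shape[:2]
    side = max(h, w)
    # Split the padding between both sides of the shorter dimension.
    pad_top = (side - h) // 2
    pad_left = (side - w) // 2
    # fill=255 gives white borders for uint8 images.
    padded = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    padded[pad_top:pad_top + h, pad_left:pad_left + w] = image
    # Boxes keep their pixel size; they are only shifted by the offset.
    offset = np.array([pad_left, pad_top, pad_left, pad_top], dtype=np.float32)
    shifted = np.asarray(boxes, dtype=np.float32) + offset
    return padded, shifted, (pad_left, pad_top)


# A 300-high, 900-wide image with one labeled box.
img = np.zeros((300, 900, 3), dtype=np.uint8)
padded, boxes, offset = pad_to_square(img, [[100, 50, 200, 150]])
print(padded.shape)  # (900, 900, 3)
print(offset)        # (0, 300)
```

The object itself is unchanged in pixels, but once the 900*900 square is scaled down to 300*300 it shrinks by a factor of 3, which is the small-object cost mentioned in option C.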

For reference, the Stack Overflow answer linked below puts it this way:

You have several options from then on:

- Just letting TF reshape the input to (w, h) with the resizer, without preprocessing. The problem is that the images will be deformed, which may (or may not, depending on your data and the objects you're trying to detect) be a problem.
- Cropping all the images into sub-images with the same aspect ratio as (w, h). Problem: you'll lose part of the images or have to run more inferences for each image.
- Padding all images (with black pixels or random white noise) to get images with the same aspect ratio as (w, h). You'll have to do some coordinate translations on the output bounding boxes (the coordinates you get will be in the augmented image, and you'll have to translate them to the initial coordinates by multiplying them by old_size/new_size on both axes). The problem is that some objects will be downsized (relative to the full image size) more than others, which may or may not be a problem depending on your data and what you're trying to detect.
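The coordinate translation for the padding option can be sketched as follows. It assumes the detector returns normalized [ymin, xmin, ymax, xmax] boxes relative to the padded square (the TF Object Detection API reports `detection_boxes` this way) and that the padding was centered as in option C. `unpad_boxes` is a hypothetical helper: it scales the boxes back to padded-image pixels, then subtracts the padding offset. If padding was added only on the right and bottom, the offset is zero and the step reduces to a pure rescale.

```python
import numpy as np


def unpad_boxes(norm_boxes, side, offset, orig_size):
    """Map detections on a padded square image back to the original image.

    norm_boxes: N x 4 normalized [ymin, xmin, ymax, xmax] relative to the
                padded square (assumed detector output format).
    side:       side length in pixels of the padded square.
    offset:     (pad_left, pad_top) used when padding.
    orig_size:  (height, width) of the original image.
    Returns N x 4 pixel boxes [ymin, xmin, ymax, xmax] in the original image.
    """
    pad_left, pad_top = offset
    h, w = orig_size
    # Normalized -> pixels of the padded square.
    boxes = np.asarray(norm_boxes, dtype=np.float32).reshape(-1, 4) * side
    # Remove the padding offset (y values use pad_top, x values pad_left).
    boxes -= np.array([pad_top, pad_left, pad_top, pad_left], dtype=np.float32)
    # Clip boxes that extend into the padded border.
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, h)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, w)
    return boxes


# Detection on the 900x900 square made from a 300x900 (h x w) image.
det = [[350 / 900, 100 / 900, 450 / 900, 200 / 900]]
print(unpad_boxes(det, 900, (0, 300), (300, 900)))  # approx. [[50, 100, 150, 200]]
```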

Conclusion: it is best to make the images square, otherwise they will be distorted. Option C is recommended.

Reference: https://stackoverflow.com/questions/48145456/tensorflow-object-detection-api-ssd-model-using-keep-aspect-ratio-resizer?rq=1

In addition:

SSD and Faster R-CNN work quite differently from one another, so even though F-RCNN has no such constraint, for SSD you need input images that always have the same size (actually you need the feature maps to always have the same size, but the best way to ensure that is to always use the same input size). This is because it ends with fully connected layers, for which you need to know the size of the feature maps; whereas for F-RCNN there are only convolutions (which work on any input size) up to the ROI-pooling layer (which doesn't need a fixed image size either).
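In the SSD paper, default boxes are laid out on the grid of each prediction feature map, with centers at ((i + 0.5)/f_k, (j + 0.5)/f_k) and scales spaced linearly between s_min = 0.2 and s_max = 0.9, so the box set is tied to the feature-map sizes that a given input size produces. The sketch below illustrates that dependency. The helper names and the reduced aspect-ratio set {1, 2, 1/2} are simplifying assumptions, and the 50x50 map for a 400x400 input is a rough figure assuming stride 8.

```python
import itertools
import math


def layer_scales(m, s_min=0.2, s_max=0.9):
    """Scales s_k spaced linearly from s_min to s_max over m layers (SSD paper)."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Normalized default boxes (cx, cy, w, h) for one square feature map.

    Centers sit at ((j + 0.5) / f, (i + 0.5) / f), one set of boxes per cell.
    The aspect-ratio set is a simplified assumption.
    """
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes


# Boxes built once for the 38x38 map that a 300x300 input yields at stride 8.
anchors = default_boxes(38, layer_scales(6)[0])
print(len(anchors))  # 4332 (= 38 * 38 * 3)

# A 400x400 input gives roughly a 50x50 map at stride 8, so the head would emit
# a different number of predictions than there are precomputed boxes.
print(50 * 50 * 3 == len(anchors))  # False
```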

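2. Image size

After the processing in section 1 the images are square. Should we also preprocess them to 300*300 or 512*512?

No manual step is needed: when reading the data, the model automatically resizes the input to 300*300 or 512*512.

For details, see the CSDN article cited below:

“Reason 6: SSD fixes the input image size and resizes images of different sizes to 300x300 or 512x512. Compared with Faster R-CNN, this already saves a lot of computation at the input, not to mention the later stages, so no wonder it is fast!”

Alternatively, you can resize the images yourself during step 1; just remember to update the coordinates in the label files accordingly.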
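If you resize the images yourself, every label coordinate must be multiplied by the scale factor of its own axis. A minimal sketch, assuming pixel boxes in [x_min, y_min, x_max, y_max] order (the order used in Pascal VOC XML labels); `scale_boxes` is a hypothetical helper name.

```python
def scale_boxes(boxes, src_size, dst_size=(300, 300)):
    """Rescale pixel boxes [x_min, y_min, x_max, y_max] after resizing an
    image from src_size to dst_size, both given as (width, height)."""
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    return [[x0 * sx, y0 * sy, x1 * sx, y1 * sy] for x0, y0, x1, y1 in boxes]


# A box on the 900x900 padded image, after resizing the image to 300x300.
print(scale_boxes([[100, 350, 200, 450]], (900, 900)))  # roughly [[33.3, 116.7, 66.7, 150.0]]
```

Using separate factors per axis also covers the non-square case, where resizing changes the aspect ratio of the boxes along with the image.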

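3. Switching to an SSD N*N model

What if I want neither 300*300 nor 512*512?

Switch to SSD640.

Enlarging the model input is not recommended, because it produces more bounding boxes (BBs):

As the figure in the original article shows, Faster R-CNN with a 1000x600 input produces 6000 BBs; SSD300 with a 300x300 input produces 8732 BBs; and SSD512 with a 512x512 input produces 24564 BBs. Now imagine SSD also running at 1000x600: how many BBs would it produce? The number could be huge! Yet SSD claims to be much faster than Faster R-CNN, YOLO and similar algorithms, so let us analyze why.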
————————————————
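Copyright notice: This is an original article by CSDN blogger 「技術挖掘者」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original and this notice.
Original article: https://blog.csdn.net/WZZ18191171661/article/details/79444217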
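The default-box counts in the quote follow directly from each prediction layer's feature-map size and the number of boxes per location. The sketch below uses the layer configuration of the original SSD300 and SSD512 models; the helper name is hypothetical. Because the count grows with the square of the feature-map side, a larger input quickly multiplies the number of boxes that must be classified and regressed.

```python
def count_default_boxes(fmap_sizes, boxes_per_location):
    """Total default boxes: sum of f * f * boxes-per-cell over prediction layers."""
    return sum(f * f * b for f, b in zip(fmap_sizes, boxes_per_location))


# Feature-map sizes and boxes per location of the original SSD300 / SSD512.
ssd300 = count_default_boxes([38, 19, 10, 5, 3, 1], [4, 6, 6, 6, 4, 4])
ssd512 = count_default_boxes([64, 32, 16, 8, 4, 2, 1], [4, 6, 6, 6, 6, 4, 4])
print(ssd300, ssd512)  # 8732 24564
```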

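4. Improvements for small objects

SSD's weakness:
SSD is still relatively poor at detecting small objects and does not reach the level of Faster R-CNN. This is mainly because small objects are trained mostly with anchors from the lower-level feature maps (small objects have a higher IoU with the lower-level anchors), and lower-level features are not nonlinear enough to be trained to sufficient accuracy.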
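The IoU argument can be checked with a toy calculation. The sketch below is illustrative only: the 20 px object and the 30 px and 60 px square anchors are hypothetical sizes in a 300x300 input, chosen to stand for a lower and a higher prediction layer. The smaller anchor overlaps the small object far better, so the object tends to be matched to the lower layer.

```python
def iou(a, b):
    """Intersection over union of two boxes [x_min, y_min, x_max, y_max]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def centered_box(cx, cy, size):
    """Square box of side `size` centered at (cx, cy)."""
    half = size / 2
    return [cx - half, cy - half, cx + half, cy + half]


# Hypothetical sizes in a 300x300 input: a 20 px object and two anchors.
obj = centered_box(150, 150, 20)
low_level_anchor = centered_box(150, 150, 30)
high_level_anchor = centered_box(150, 150, 60)
print(round(iou(obj, low_level_anchor), 3))   # 0.444
print(round(iou(obj, high_level_anchor), 3))  # 0.111
```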

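The CSDN author's personal view: whether SSD is a good choice depends on your application and requirements, and you need to validate the performance of any detection algorithm on your own scenario. For example, if your scenes are dense and contain many small objects, I strongly recommend Faster R-CNN; it can also be sped up by optimizing it for a specific network. If your application has very strict speed requirements, SSD should certainly be considered first. As for evaluation results on benchmark test sets, they still differ substantially from real-world data, so the algorithm's performance needs further evaluation on your own data.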

A. Improved SSD: RSSD