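Goals

1 Get an intuitive feel for the whole pipeline

2 Learn new ideas

Plan

1 Read through the English paper (alongside the Chinese translation): 4 days

2 Skim blog posts on the history of R-CNN: 5 days

3 Study R-CNN

4 Fast R-CNN (7 days)

5 Faster R-CNN, Mask R-CNN, later work

Actual

1 Read through the translated paper (5 days, until 2018/3/5)【Done】

2 Skim the history:

R-CNN (Done)

Fast R-CNN ()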

http://deeplearning.csail.mit.edu/

http://chuansong.me/n/353443351445

https://www.zhihu.com/people/jennywei528/answers

http://www.cnblogs.com/wuxiangli/p/7066707.html

https://zhuanlan.zhihu.com/p/21533724

1 Read through the translated paper (5 days)【Done】

https://wenku.baidu.com/view/55616aeb03d276a20029bd64783e0912a2167ced.html

https://alvinzhu.xyz/2017/10/07/mask-r-cnn/#fn:18

http://blog.csdn.net/myGFZ/article/details/79136610

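2 Too difficult, so skim the history instead (2 days) (R-CNN -> Fast R-CNN -> Faster R-CNN), overview【Done】

Reading order

These three posts translate the same paper; the third one helps in understanding the first two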

（1）https://zhuanlan.zhihu.com/p/26655034

（2）https://zhuanlan.zhihu.com/p/26652657

（3）https://zhuanlan.zhihu.com/p/30967656

Object detection = image classification + bounding-box localization

————————————————————————Fast R-CNN

https://zhuanlan.zhihu.com/p/30368989
https://www.jianshu.com/p/7c35ba55ad61
http://jacobkong.github.io/posts/1679631826/
https://zhuanlan.zhihu.com/p/24780395
http://blog.csdn.net/wopawn/article/details/52463853

https://www.jianshu.com/p/38bed2b9f49a

Background on SPPnet:

https://www.davex.pw/2018/02/11/paper-reading-of-spp-net/
http://hellodfan.com/2017/09/30/%E7%89%A9%E4%BD%93%E6%A3%80%E6%B5%8B%E8%AE%BA%E6%96%87-SPPNet/
https://blog.csdn.net/skying_li/article/details/70158924

Fine-tuning: http://blog.csdn.net/u014381600/article/details/71511794

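Highlights:
1) Shared computation: RoI pooling
Before (R-CNN): an image has about 2000 candidate regions, and each candidate region needs its own forward pass (through AlexNet) to get its feature map. Now: run one forward pass on the input image to get the feature map of the whole image, then extract the features of the 2000 candidate regions from that single feature map.
2) Multi-task: one model handles steps 2, 3 and 4 above (end-to-end)
SVM: replaced by a softmax at the CNN output layer

LR: a linear regression layer that outputs bounding-box coordinates is added at the CNN output layer.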
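
To make highlight 1) concrete, here is a minimal sketch of RoI max pooling on a toy single-channel feature map. The function name `roi_max_pool`, the inclusive `(x1, y1, x2, y2)` box convention and the way bins are split are assumptions made for illustration; real implementations work on multi-channel tensors and may split bins differently.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a 2-D feature map down to out_h x out_w.

    feature_map: list of rows (one channel only, for brevity).
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive (assumed).
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Rows covered by bin i: floor for the start, ceil for the end,
        # so every bin contains at least one row.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# One shared feature map, several RoIs pooled from it without new forward passes.
toy_map = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
for toy_roi in [(0, 0, 3, 3), (1, 1, 3, 3)]:
    print(toy_roi, roi_max_pool(toy_map, toy_roi, 2, 2))
```

Whatever the size of the RoI, the output is always `out_h` x `out_w`, which is what lets the fully connected layers that follow take a fixed-size input.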

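【Training phase】

A: Input the training set
(1) Use positive samples (ground-truth boxes + proposals with IoU > 0.5) and negative samples (proposals with 0.1 < IoU < 0.5; these background samples make up 75% of the sampled RoIs) (data augmentation: horizontal flip with 50% probability). Replace the last pooling layer of VGG with an RoI pooling layer, extract features of the whole image, and obtain the feature map

Run selective search to get about 2000 box coordinates

B: Region proposals: 2000 box coordinates + the corresponding feature map

(2) On the feature map, find the feature box that each proposal maps to

(3) Use RoI pooling to bring every feature box to the same size H*W
(4) Pass the feature box through the fully connected layers to get a fixed-size feature vector (the truncated SVD of these layers is a test-time speed-up only)

C: Fixed-size feature vector

(5) Softmax classification scores

Bounding-box regression

(6) Apply non-maximum suppression and keep the most likely regions

E: Refined box coordinates
Drawbacks: 1) three training models 2) selective search is slow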
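
The sampling rule in the first training step depends on IoU between each proposal and the ground-truth boxes. Below is a minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples with positive area; the helper names `iou` and `label_proposal`, the treatment of the exact boundary values, and the "ignored" label for IoU below 0.1 are illustrative assumptions rather than details taken from these notes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def label_proposal(proposal, gt_boxes, fg_thresh=0.5, bg_low=0.1):
    """Label a proposal by its best IoU with any ground-truth box."""
    best = max(iou(proposal, gt) for gt in gt_boxes)
    if best >= fg_thresh:       # boundary handling is an assumption
        return "foreground"
    if best >= bg_low:
        return "background"
    return "ignored"            # too little overlap to be sampled


gt = [(0, 0, 10, 10)]
for p in [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]:
    print(p, label_proposal(p, gt))
```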
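【Test phase】
A: Input one image

(1) Feed it into the ImageNet-pretrained network, extract features of the whole image, and obtain the feature map

Run selective search to get about 2000 box coordinates

B: Region proposals: 2000 box coordinates + the corresponding feature map

(2) On the feature map, find the feature box that each proposal maps to

(3) Use RoI pooling to bring every feature box to the same size H*W

(4) Pass the feature box through the fully connected layers (sped up with truncated SVD) to get a fixed-size feature vector

C: Fixed-size feature vector

(5) Softmax classification scores

Bounding-box regression

(6) Apply non-maximum suppression and keep the most likely regions

E: Refined box coordinates
Drawbacks: 1) three training models 2) selective search is slow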
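
The non-maximum suppression step of the test phase can be sketched as a greedy loop: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The names `box_iou` and `nms` and the default threshold of 0.3 are assumptions for this sketch; boxes are `(x1, y1, x2, y2)` tuples with positive area, and in practice the procedure is run separately for each class.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thresh=0.3):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap an already kept box too much.
        if all(box_iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


toy_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
toy_scores = [0.9, 0.8, 0.7]
print(nms(toy_boxes, toy_scores))
```

In the toy input the second box overlaps the first heavily and has a lower score, so only the first and third boxes survive.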

Faster R-CNN: improvement (the region proposer)

https://zhuanlan.zhihu.com/p/30720870

https://www.jianshu.com/p/8f78a9350117

http://jacobkong.github.io/posts/3802700508/

https://www.jianshu.com/p/3a2b92206658

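Input: an image (no region proposals required)

Output: the object class of each region + the associated tight bounding box

The Region Proposal Network (RPN) reuses the image features computed during the CNN's forward pass, taken from the last shared convolutional layer rather than the first. At each location it builds k anchor boxes (with common aspect ratios), and for each anchor box it outputs a bounding box and an objectness score for that location. The proposals are then passed to Fast R-CNN.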
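
As a rough illustration of the k anchor boxes, the sketch below generates anchors centred on one location. The scales (128, 256, 512) and ratios (0.5, 1, 2), which give k = 9, are the defaults reported in the Faster R-CNN paper rather than values from these notes; the function name `make_anchors` and the height/width definition of "ratio" are assumptions for the example.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors centred at (cx, cy).

    Each anchor is (x1, y1, x2, y2). The ratio is taken as height / width
    (an assumption) and the area is kept at scale * scale.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


for anchor in make_anchors(0, 0):
    print([round(v, 1) for v in anchor])
```

The RPN slides over the shared feature map and, for every location, predicts one objectness score and one box refinement per anchor.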

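1. Train the RPN on its own;

2. Train Fast R-CNN on its own, using the proposals generated by the RPN from step 1;

3. Use the network from step 2 to initialize the RPN and train it again;

4. Train Fast R-CNN once more to fine-tune the parameters.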

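Mask R-CNN: extension to pixel-level segmentation

1) RoIPool -> RoIAlign (bilinear interpolation)

2) A Fully Convolutional Network (FCN) is added on top of the Faster R-CNN CNN features to generate masks (binary mask segmentation output), deciding whether a given pixel belongs to the object 【When segmenting each region, the coupling between classes is removed. Suppose there are K classes. A typical segmentation method directly predicts one output with K channels, where each channel stands for one class and the channels compete with each other. Mask R-CNN instead predicts K binary masks, one per class, each a per-pixel foreground/background decision, so the prediction for each class is independent】

Input: CNN feature map.

Output: a binary matrix indicating whether each pixel belongs to the object 【a matrix with 1s at every position where the pixel belongs to the object and 0s everywhere else; this is called a binary mask.】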
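
The bilinear interpolation behind RoIAlign in item 1) can be sketched as follows: instead of rounding a sampling point to the nearest feature-map cell as RoIPool does, the value at a fractional position is blended from its four neighbours. This is only the sampling primitive, not a full RoIAlign; the name `bilinear_sample`, the single-channel list-of-rows layout and the clamping at the border are assumptions for illustration.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Sample a 2-D feature map at a fractional (y, x) position."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid range (border handling is an assumption).
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


toy_map = [[0.0, 10.0],
           [20.0, 30.0]]
print(bilinear_sample(toy_map, 0.5, 0.5))   # centre of the four cells
print(bilinear_sample(toy_map, 0.0, 0.25))  # a quarter of the way along the top row
```

Because no coordinate is rounded, the pooled features stay aligned with the input pixels, which matters for pixel-level masks.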

Faster R-CNN:

Region Proposal Network (RPN: proposes candidate object bounding boxes) + Fast R-CNN (extracts features using RoIPool (RoI: region of interest) from each candidate box and performs classification and bounding-box regression)

http://blog.csdn.net/gavin__zhou/article/details/51996615

http://blog.csdn.net/u014544555/article/details/79381342

http://blog.csdn.net/tigerda/article/details/78527870?locationNum=2&fps=1

http://blog.csdn.net/linolzhang/article/details/71774168

https://zhuanlan.zhihu.com/p/32830206

2 Mask R-CNN

Mask：A mask encodes an input object’s spatial layout.

Task:

object instance segmentation

Output:

1 detect objects in an image 2 generate a high-quality segmentation mask for each instance

The framework in one sentence:

add a branch for predicting an object mask in parallel with the existing branch for bounding box recognition

Framework overview:

(RPN) + (Fast R-CNN: outputs bounding-box classification and regression + FCN: outputs a binary mask for each RoI)

Feature extraction: ResNet-FPN / ResNet-C4: the convolutional backbone architecture used for feature extraction over an entire image

Classification: the network head for bounding-box recognition (classification and regression) and mask prediction that is applied separately to each RoI.

Loss：

Multi-task loss: for each sampled RoI, L = Lcls + Lbox + Lmask (definition: the mask branch has a K*m*m-dimensional output for each RoI, which encodes K binary masks of resolution m*m, one for each of the K classes; computation: per-pixel sigmoid + average binary cross-entropy loss; for an RoI whose ground-truth class is k, Lmask is computed on the k-th mask only)
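
A small sketch of Lmask as described above: take only the mask predicted for the ground-truth class k, apply a per-pixel sigmoid, and average the binary cross-entropy over the m*m pixels. The function name `mask_loss` and the nested-list layout are assumptions for illustration; the cross-entropy is written in its numerically stable form on the raw logits.

```python
import math


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the gt_class mask only.

    mask_logits: K masks, each an m x m list of raw scores (pre-sigmoid).
    gt_mask: m x m list of 0/1 values for this RoI.
    gt_class: index k of the RoI's ground-truth class.
    """
    logits_k = mask_logits[gt_class]  # the other K - 1 masks add no loss
    total, count = 0.0, 0
    for logit_row, gt_row in zip(logits_k, gt_mask):
        for z, t in zip(logit_row, gt_row):
            # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


toy_logits = [
    [[0.0, 0.0], [0.0, 0.0]],      # class 0: undecided everywhere
    [[9.0, -9.0], [-9.0, 9.0]],    # class 1: confident prediction
]
toy_gt = [[1, 0], [0, 1]]
print(mask_loss(toy_logits, toy_gt, gt_class=0))
print(mask_loss(toy_logits, toy_gt, gt_class=1))
```

Since each class has its own sigmoid mask, classes do not compete at the pixel level; the classification branch decides which of the K masks is used.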

Installation:

https://www.cnblogs.com/Anita9002/p/8335710.html

http://blog.csdn.net/wei_guo_xd/article/details/78579534

http://blog.csdn.net/xiongchao99/article/details/79122428

https://yq.aliyun.com/articles/238716
