如何利用AI識別口罩下的人臉?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"今天的大街上戴口罩的人越來越多,你可能會想:他們摘了口罩都長什麼樣呢?至少我們STRV機器學習(ML)團隊就有這樣的疑問。作爲一個機器學習團隊,我們很快意識到問題比想象中更容易解決。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"想知道我們是如何設計出一種可以從人臉圖像上移除口罩的ML工具的嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"本文將指導你完成構建深度學習ML模型的整個過程——從初始設置、數據收集和選擇適當的模型,到訓練和微調。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在深入研究之前,我們先來定義任務的性質。我們試圖解決的問題可以看作是"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"圖像修復"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",也就是恢復受損圖像或填充缺失部分的過程。下面就是圖像修復的例子:輸入的圖像有一些白色缺失,經過處理這些缺失被補足了。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/8f\/e5\/8ffdd13c89863c04c79bb3731161d4e5.png","alt":null,"title":"使用部分卷積進行圖像修復的示例","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"解決完定義的問題後我們再提一點:除了本文之外,我們還準備了一個GitHub帳戶,其中包含你需要的所有內容實現,以及Jupyter Notebook“mask2face.ipynb”,你可以在其中運行本文提到的所有內容。只需單擊幾下,即可訓練你自己的神經網絡。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"接下來,讓我們正式開始吧。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、準備工作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如果你想在計算機上執行本文所述的所有步驟,可以從我們的"},{"type":"link","attrs":{"href":"https:\/\/github.com\/strvcom\/strv-ml-mask2face","title":null,"type":null},"content":[{"type":"text","text":"GitHub"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"克隆此項目。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"首先,我們來爲Python項目準備虛擬環境。你可以使用你喜歡的任何虛擬環境,只要確保從environment.yml和requirements.txt安裝所有必需的依賴項即可。不熟悉虛擬環境或Conda的話可以參考這篇"},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/a-guide-to-conda-environments-bc6180fc533","title":null,"type":null},"content":[{"type":"text","text":"文章"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。如果你熟悉Conda,還可以在克隆的GitHub項目目錄中運行以下命令來初始化Conda環境:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"conda env create -f environment.yml\nconda activate 
mask2face"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"現在,你已經有了一個帶有所有必需依賴項的環境,接下來我們來定義目標和目的。對於這個項目,我們想要創建一個ML模型,該模型可以向我們展示戴口罩的人摘下口罩的樣子。我們的模型有一個輸入——戴口罩的人的圖像;一個輸出——摘下口罩的人的圖像。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、實現"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.高層ML管道"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"下圖很好地展示了整個項目的高層管道。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d0\/81\/d04a76df92441456d8b1923f02eac881.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們從一個帶有預先計算的面部界標的面部數據集開始,該數據集是通過口罩生成器處理的,它使用這些界標將口罩放在臉上。現在我們有了帶有成對圖像(戴和不戴口罩)的數據集,我們就可以繼續定義ML模型的架構了。管道的最後一部分是找到最佳損失函數和組成各個部分的所有必要腳本,以便我們可以訓練和評估模型。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.數據生成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"要想訓練這個深度學習模型,我們需要採用大量數據,也就是大量輸入和輸出的圖像對。當然,要收集每個人戴口罩\/不戴口罩的輸入和輸出圖像是不切實際的。"}]},{"type"
:"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"當前,市面上有很多人臉圖像數據集,主要用於訓練人臉檢測算法。我們可以採用這樣的數據集,在人臉上繪製口罩——於是我們就有了圖像對。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/75\/24\/75b9ae41023dc58681a100a5dd6bbd24.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們嘗試了兩個數據集。其中一個數據庫是馬薩諸塞大學[1]的現實世界"},{"type":"link","attrs":{"href":"http:\/\/vis-www.cs.umass.edu\/lfw\/","title":null,"type":null},"content":[{"type":"text","text":"人臉標記數據集"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。這裏是它的104MB 
gzip壓縮tar"},{"type":"link","attrs":{"href":"http:\/\/vis-www.cs.umass.edu\/lfw\/lfw-deepfunneled.tgz","title":null,"type":null},"content":[{"type":"text","text":"文件"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",其中包含整個數據集,超過5,000張圖片。這個數據集非常適合我們的情況,因爲它包含的圖像主要都是人臉。但對於最終結果,我們使用了"},{"type":"link","attrs":{"href":"http:\/\/mmlab.ie.cuhk.edu.hk\/projects\/CelebA.html","title":null,"type":null},"content":[{"type":"text","text":"CelebA數據集"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",它更大(200,000個樣本),並且包含更高質量的圖像。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"接下來,我們需要定位面部界標,以便將口罩放置在正確的位置。爲此,我們使用了一個預訓練的"},{"type":"link","attrs":{"href":"http:\/\/dlib.net\/","title":null,"type":null},"content":[{"type":"text","text":"dlib"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"面部界標檢測器。你可以使用其他任何類似的數據集,只要確保你可以找到預計算的面部界標點(可參考這個GitHub"},{"type":"link","attrs":{"href":"https:\/\/github.com\/LynnHo\/Facial-Landmarks-of-Face-Datasets","title":null,"type":null},"content":[{"type":"text","text":"存儲庫"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")或自己計算界標。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.口罩生成器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"al
ign":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"一開始,我們做了一個口罩生成器的簡單實現,將一個多邊形放置在臉上,使多邊形頂點與面部界標的距離隨機化。這樣我們就可以快速生成一個簡單的數據集,並測試項目背後的想法是否可行。一旦確定它確實有效,我們就開始尋找一種更強大的解決方案,以更好地反映現實場景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"GitHub上有一個很棒的項目"},{"type":"link","attrs":{"href":"https:\/\/github.com\/aqeelanwar\/MaskTheFace","title":null,"type":null},"content":[{"type":"text","text":"Mask The Face"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",已經解決了口罩生成問題。它從臉部界標點估計口罩位置,估計臉部傾斜角度以從數據庫中選擇最合適的口罩,最後將口罩放置在臉上。可用的口罩數據庫包括了手術口罩、有各種顏色和紋理的布口罩、幾種呼吸器,甚至是防毒面罩。"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"import matplotlib.pyplot as plt\nimport matplotlib.image as mpimg\nfrom utils.data_generator import DataGenerator\nfrom utils.configuration import Configuration\n\n# You can update configuration.json to change behavior of the generator\nconfiguration = Configuration()\ndg = DataGenerator(configuration)\n\n# Generate images\ndg.generate_images()\n\n# Plot a few examples of image pairs\nn_examples = 5\ninputs, outputs = dg.get_dataset_examples(n_examples)\nf, axarr = plt.subplots(2, n_examples, figsize=(20,10))\nfor i in range(len(inputs)):\n axarr[1, i].imshow(mpimg.imread(inputs[i]))\n axarr[0, 
i].imshow(mpimg.imread(outputs[i]))"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"現在我們已經準備好了數據集,是時候搭建深度神經網絡模型架構了。在這項工作中,沒有人能聲稱存在一個客觀的“最佳”選項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"選擇合適架構的過程總是取決於許多因素,例如時間要求(你要實時處理視頻還是要離線預處理一批圖像?)、硬件需求(模型應在搭載高性能GPU的羣集上運行,還是要在低功耗移動設備上運行?)等等。每次你都要尋找正確的參數,並針對你的具體情況進行設置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如果你想進一步瞭解這個問題,可以參考這個KDnuggets"},{"type":"link","attrs":{"href":"https:\/\/www.kdnuggets.com\/2019\/09\/no-free-lunch-data-science.html","title":null,"type":null},"content":[{"type":"text","text":"帖子"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"或這篇學術"},{"type":"link","attrs":{"href":"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.390.9412&rep=rep1&type=pdf","title":null,"type":null},"content":[{"type":"text","text":"文章"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.卷積神經網絡"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"卷積神經網絡(CNN)是一種利用卷積核過濾器的神經網絡架構。它適用於各種問題,例如時間序列分析、自然語言處理和推薦系統,但主要用於各種圖像相關的用途,例如對象分類、圖像分割、圖像分析和圖像修復。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"CNN的核心是能夠檢測輸入圖像視覺特徵的卷積層。當我們一層層堆疊多個卷積層時,它們傾向於檢測不同的特徵。第一層通常會提取更復雜的特徵,例如邊角或邊緣。當你深入CNN時,卷積層將開始檢測更高級的特徵,例如對象、面部等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"有關CNN的詳細說明,請參閱這篇TechTalks"},{"type":"link","attrs":{"href":"https:\/\/bdtechtalks.com\/2020\/01\/06\/convolutional-neural-networks-cnn-convnets\/","title":null,"type":null},"content":[{"type":"text","text":"文章"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"或這些斯坦福"},{"type":"link","attrs":{"href":"https:\/\/cs231n.github.io\/convolutional-networks\/","title":null,"type":null},"content":[{"type":"text","text":"筆記"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e1\/29\/e144c175411063a814227c0b20327929.png","alt":null,"title":"CNN架構示例","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"上圖顯示了用於圖像檢測的CNN示例。這並不是我們要解決的問題,但CNN架構是任何修復架構的必要組成部分。"}]},{"type":"heading","attrs":{"a
lign":null,"level":3},"content":[{"type":"text","text":"6.ResNet塊"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在討論修復架構之前,我們先來談談起作用的構建塊,它們稱爲ResNet塊或殘差塊。在傳統的神經網絡或CNN中,每一層都連接到下一層。在具有殘差塊的網絡中,每一層也會連接到下一層,但還會再連接兩層或更多層。我們引入了ResNet塊以進一步提高性能,後文會具體介紹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c3\/4c\/c3c460cfc52e7817ef497e30f1856d4c.png","alt":null,"title":"ResNet構建塊[7]","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"神經網絡能夠估計任何函數,我們可以認爲增加層數可以提高估計的準確性。但由於諸如梯度消失或維數詛咒之類的問題,層數增加到一定程度就不會繼續提升性能了,甚至會讓性能倒退。這就是爲什麼有很多研究致力於解決這些問題,而性能最好的解決方案之一就是殘差塊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
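爲了更直觀地說明殘差塊“輸出 = F(x) + x”的跳過連接思想,下面給出一個極簡的NumPy示意(假設性示例代碼,並非本文模型的實際實現,函數名與維度均爲演示用):

```python
# 殘差塊思想的最小示意:輸出 = ReLU(F(x) + x),
# 其中 F 代表若干權重層,x 經由跳過連接(恆等映射)直接相加。
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """F(x) = W2 · relu(W1 · x);返回 relu(F(x) + x)。"""
    f = w2 @ relu(w1 @ x)       # 權重層部分 F(x)
    return relu(f + x)          # 跳過連接:加上恆等映射 x

dim = 8
x = rng.normal(size=dim)
w1 = rng.normal(scale=0.1, size=(dim, dim))
w2 = rng.normal(scale=0.1, size=(dim, dim))

y = residual_block(x, w1, w2)
assert y.shape == x.shape  # 跳過連接要求輸入輸出維度一致

# 極端情況:權重全爲零時 F(x)=0,殘差塊退化爲恆等映射(再經ReLU)。
# 這正是殘差塊容易學習恆等函數、從而緩解梯度消失的直觀原因。
y_id = residual_block(x, np.zeros((dim, dim)), np.zeros((dim, dim)))
assert np.allclose(y_id, relu(x))
```

正文所說的“還會再連接兩層或更多層”,對應這裏跨過兩個權重層直接相加的 `x`。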
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"殘差塊允許使用跳過連接或標識函數,將信息從初始層傳遞到下一層。通過賦予神經網絡使用標識函數的能力,相比單純地增加層數,我們可以構建性能更好的網絡。你可以在參考資料中閱讀有關ResNet及其變體的更多信息。[8]"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7.編碼器-解碼器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"編碼器-解碼器架構由兩個單獨的神經網絡組成:編碼器提取輸入(嵌入)的一個固定長度表示,而解碼器從該表示生成輸出。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/9c\/b3\/9cf445675c58daa37fddcfe3f8187bb3.png","alt":null,"title":"用於圖像分割的編碼器-解碼器網絡[6]","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"你會注意到,編碼器部分與上一節中描述的CNN非常相似。經過驗證的分類CNN通常用作編碼器的基礎,甚至直接用作編碼器,只是沒有最後一個(分類)層。這個架構可以用來生成新圖像,這正是我們所需要的。但它的性能卻不是那麼好,因此我們來看一下更好的東西。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"8.U-net"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"U-net最初是爲圖像分割而開發的卷積神經網絡架構[2],但它在其他許多任務(例如圖像修復或圖像着色)中也展示了自己的能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image",
"attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ed\/ec\/ed3062f2f696fdb4649b29b32556dbec.png","alt":null,"title":"原始文章中的U-net架構[2]","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們前面之所以提到ResNet塊有一個重要原因。事實上,將ResNet塊與U-net架構結合使用對整體性能的影響最大。你可以在下圖中看到添加ResNet塊的架構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/14\/7b\/14aa6bcd021b99f69f0a88d97889647b.png","alt":null,"title":"U-net中使用的Upscale ResNet塊(頂部)和downscale resNet塊(底部)","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"當你將上面的U-net架構與上一節中的編碼器-解碼器架構進行比較時,它們看起來非常相似,但有一個關鍵的區別:U-net實現了一種稱爲“跳過連接”的功能,該功能將identity從反捲積塊傳播到另一側對應的上採樣塊(上圖中的灰色箭頭)。這是對編碼器-解碼器架構的兩處顯著改進。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"首先,已知跳過連接可以加快學習過程並幫助解決梯度消失問題[5];其次,它們可以將信息從編碼器直接傳遞到解碼器,從而有助於減少下采樣期間的信息丟失。我們可以認爲它們能傳播我們希望保持不變的口罩外部圖像的所有部分,同時還有助於生成口罩下面的臉部圖像。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"這正是我們所需要的!跳過連接將有助於保留我們要傳播到輸出的部分輸入,而U-net的編碼器-解碼器部分將檢測到口罩並將其替換爲下面的嘴部圖像。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"9.損失函數"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如何選擇損失函數是你需要解決的最重要的問題之一,使用正確的損失函數可能會得到性能出色的模型,反之就會得到令人失望的模型。這就是我們花很多時間選擇最佳模型的原因所在。下面,我們來討論幾種選項。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"均方誤差(MSE)和均值絕對誤差(MAE)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"MSE和MAE都是損失函數,都基於我們模型生成的圖像將口罩應用到面部之前。這似乎正是我們所需要的,但我們並不打算訓練可以像素級完美重現口罩下隱藏內容的模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們希望我們的模型理解口罩下面是嘴巴和鼻子,甚至可能要理解來自那些未被隱藏的事物(例如眼睛)所包含的情感,從而生成悲傷的、快樂的或可能是驚訝的面孔。這意味着,即使不能完美地捕獲每個像素,實際上也可以產生一個很好的結果;更重要的是,它可以學習如何在任何臉部上泛化,而不僅僅是對訓練數據集中的面孔進行泛化。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"結構相似性指數(SSIM)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"SSIM是用於度量兩個圖像之間相似度的度量標準,由Wang等人在2004年提出[3]。它專注於解決MSE\/MAE所存在的問題。它提供了一個數值表達式,用來展示兩張圖像之間的相似度。它通過對比圖像之間的三個測量值來做到這一點:亮度、對比度和結構。最終得分是所有三個測量值的加權組合,分數從0到1,1表示圖像完全相同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"下圖說明了MSE存在的問題:左上圖是未經修改的原始圖像;其他圖像均有不同形式的失真。原始圖像與其他圖像之間的均方誤差大致相同(大約480),而SSIM的變化很大。例如,模糊圖像和分割後的圖像與原始圖像的相似度絕對不如其他圖像,但MSE幾乎相同——儘管面部特徵和細節丟失了。另一方面,偏色圖像和對比度拉伸的圖像與人眼中的原始圖像非常相似(SSID指標也是一樣),但MSE表示不同意這個結論。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ff\/c4\/ffa9223b418b500f3e9c55cac9ae26c4.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、結果"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"訓練"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們使用ADAM優化器和SSIM損失函數,通過U-net架構對模型進行訓練,將數據集分爲測試部分(1,000張圖像)、訓練部分(其餘80%的數據集)和驗證部分(其餘80%的數據集)。我們的第一個實驗爲幾張測試圖像生成了不錯但不太清晰的輸出圖像。是時候嘗試使用架構和損失函數來提高性能了,下面是我們嘗試的一些更改:卷積過濾器的層數和大小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"更多的卷積過濾器和更深的網絡意味着更多的參數(大小爲[8、8、256]的2D卷積層具有59萬個參數,大小爲[4、4、512]的層具有230萬個參數)和更多的訓練時間。由於每層中過濾器的深度和數量是我們模型架構的構造器的輸入參數,因此使用不同的值進行實驗非常容易。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"嘗試一段時間後,我們發現對我們而言,以下設置可以達到性能和模型大小之間的最佳平衡:"}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"# Train model with different number of layers and filter sizes\nfrom utils.architectures import UNet\nfrom utils.model import Mask2FaceModel\n\n# Feel free to experiment with the number of filters and their sizes\nfilters = (64, 128, 128, 256, 256, 512)\nkernels = ( 7, 7, 7, 3, 3, 3)\ninput_image_size=(256, 256, 3)\narchitecture = UNet.RESNET\ntraining_epochs = 20\nbatch_size = 12\n\nmodel = Mask2FaceModel.build_model(architecture=architecture, input_size=input_image_size, filters=filters, kernels=kernels)\nmodel.summary()\nmodel.train(epochs=training_epochs, batch_size=batch_size, loss_function='ssim_l1_loss'"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們做了一些實驗,上面代碼塊中的設置對我們來說是最好的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"現在,我們已經通過上述調整對模型進行了訓練和調整,下面我們來看一些結果!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/99\/a9\/994caf1b5b2b1431b02c337717a82ba9.png","alt":null,"title":"左:我們模型的輸入;中:沒有口罩的輸入圖像(預期輸出);右:我們模型的輸出","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"如你所見,給定的網絡在我們的測試數據上生成了很好的結果。這個網絡具有泛化能力,並且似乎"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"可以很好地識別情緒,從而生成微笑或悲傷的面孔"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。另一方面,這裏當然也有改進的空間。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"四、進一步改進的想法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"雖說使用ResNet塊的U-net網絡效果很好,但我們也可以看到生成的摘口罩圖像不是很清晰。一種解決方法是用一個提煉網絡擴展我們的網絡,如[4]中和下圖中所述。此外還可以進行其他一些改進。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/67\/2f\/67a1267802bab217b360efa30593a92f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"headi
ng","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.改善數據集"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"根據我們實驗中獲取的經驗,數據集的選擇可以對結果產生重大影響。下一步,我們將合併不同的數據集以使樣本具有更大的多樣性,從而更好地模擬現實世界的數據。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"另一項可行改進是調整將口罩與面部組合的方式,使它們看起來更自然"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。[12]是很好的靈感來源。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.變分自動編碼器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們已經提到了編碼器-解碼器架構,其中編碼器部分將輸入圖像映射到嵌入中。我們可以將嵌入視爲多維潛在空間中的單點。在許多方面,變分自動編碼器與編碼器-解碼器是很像的;主要區別在於,變分自動編碼器將輸入映射爲潛在空間中圍繞一點的多元正態分佈。這意味着編碼在設計上是連續的,可以實現更好的隨機採樣和內插。這可能會極大地改善網絡輸出生成圖像的平滑度。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.生成對抗網絡"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"GAN能夠生成與真實照片無法區分的結果,這主要歸功於完全不同的學習方法。我們當前的模型試圖將訓練過程中的損失降到最低,而GAN則由兩個獨立的神經網絡組成:生成器和鑑別器。生成器生成輸出圖像,而鑑別器嘗試確定圖像是真實圖像還是由生成器生成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
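變分自動編碼器“映射爲分佈”的做法通常通過重參數化技巧實現,可以用下面的NumPy片段示意(假設性示例:編碼器輸出的均值μ與對數方差log σ²在此直接給定,並非真實編碼器輸出):

```python
# 重參數化技巧:編碼器輸出均值 mu 和對數方差 log_var,
# 採樣 z = mu + sigma * eps(eps ~ N(0, I)),使採樣步驟對 mu、log_var 可導。
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var):
    eps = rng.standard_normal(mu.shape)      # 從標準正態分佈採樣噪聲
    return mu + np.exp(0.5 * log_var) * eps  # z = mu + sigma * eps

latent_dim = 16
mu = np.zeros(latent_dim)        # 演示用:假定編碼器輸出零均值
log_var = np.zeros(latent_dim)   # 演示用:log_var = 0 即 sigma = 1

z = reparameterize(mu, log_var)
assert z.shape == (latent_dim,)
```

由於編碼是圍繞μ的連續分佈而非單點,潛在空間中相鄰的z解碼後輸出相近,這正是正文所說“更好的隨機採樣和內插”的來源。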
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在學習過程中,兩個網絡都會動態更新,讓表現越來越好,直到最後鑑別器無法確定所生成的圖像是否真實,生成器所生成的圖像就與真實圖像無法區分了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/87\/b4\/8769b22afba940681d47e9d6303ef8b4.png","alt":null,"title":"從源A和源B創建混合面孔的示例[9]","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"GAN的結果很好,但在訓練過程中通常會出現收斂問題,而且訓練時間很長。由於參數衆多,GAN模型通常也要複雜得多,因此不太適合導出到手機上。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.Concat ImageNet和FaceNet嵌入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在許多方面,U-net的瓶頸層都可以用作特徵提取嵌入。[10]、[11]等文章建議,將不同網絡的嵌入並置可以提高整體性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們嘗試將嵌入(瓶頸層)與ImageNet和FaceNet的兩種不同嵌入結合在一起。我們期望這可以添加有關人臉及其特徵的更多信息,以幫助U-net的上採樣部分進行人臉修復。這無疑提高了性能,但另一方面,它使整個模型更加複雜,並且與“訓練”部分中提到的其他改進相比,其性能提升要小得多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/3c\/e3\/3c035c63c0dd2448878bf8d8f9eeb8e3.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五、總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"這種人臉重建面臨許多挑戰。我們發現,要想獲得最佳結果,就需要一種創新的方法來融合各種數據集和技術。我們必須適當地解決諸如遮擋、照明和姿勢多樣性等具體問題。問題無法解決的話,在傳統的手工解決方案和深度神經網絡中都會有顯著的精度下降,方案最後可能只能處理一類照片。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"但正是這些挑戰讓我們發現這個項目非常具有吸引力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們着手創建Mask2Face的原因是要爲我們的ML部門打造一個典型示例。我們觀察世界上正在發生的事情(口罩檢測),並尋找不怎麼常見的路徑(摘下口罩)。任務越難,學到的經驗越多。ML的核心目標是解決看似不可能的問題,我們希望一直遵循這一理念。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"參考資料"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[1]"},{"type":"link","attrs":{"href":"http:\/\/vis-www.cs.umass.edu\/lfw\/","title":null,"type":null},"content":[{"type":"text","text":"http:\/\/vis-www.cs.umass.edu\/lfw\/"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[2] "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1505.04597","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/arxiv.org\/abs\/1505.04597"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[3] Wang, Zhou; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. (2004-04-01). \"Image quality assessment: from error visibility to structural similarity\". IEEE Transactions on Image Processing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[4] Elharrouss, O., Almaadeed, N., Al-Maadeed, S. et al. “Image Inpainting: A Review”. Neural Process Lett 51, 2007–2028 (2020). "},{"type":"link","attrs":{"href":"https:\/\/doi.org\/10.1007\/s11063-019-10163-0","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/doi.org\/10.1007\/s11063-019-10163-0"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[5] Adaloglou, Nikolas. \"Intuitive Explanation of Skip Connections in Deep Learning\". 
(2020) "},{"type":"link","attrs":{"href":"https:\/\/theaisummer.com\/skip-connections\/","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/theaisummer.com\/skip-connections\/"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[6] Hyeonwoo Noh, Seunghoon Hong and Bohyung Han. “Learning Deconvolution Network for Semantic Segmentation”. (2015) "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/1505.04366","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/arxiv.org\/abs\/1505.04366"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[7]. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 
arXiv preprint arXiv:1512.03385, 2015."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[8] "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/an-overview-of-resnet-and-its-variants-5281e2f56035","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/an-overview-of-resnet-and-its-variants-5281e2f56035"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[9] T. Karras, S. Laine and T. Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv preprint arXiv:1812.04948, 2019"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[10] Danny Francis, Phuong Anh Nguyen, Benoit Huet and Chong-Wah Ngo. Fusion of Multimodal Embeddings for Ad-Hoc Video Search. 
ICCV 2019"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[11] Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei and Ran He. Free-Form Image Inpainting via Contrastive Attention Network. arXiv preprint arXiv:2010.15643, 2020"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"[12] "},{"type":"link","attrs":{"href":"https:\/\/medium.com\/neuromation-blog\/neuronuggets-cut-and-paste-in-deep-learning-a296d3e7e876","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/medium.com\/neuromation-blog\/neuronuggets-cut-and-paste-in-deep-learning-a296d3e7e876"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/www.strv.com\/blog\/mask2face-how-we-built-ai-that-shows-face-beneath-mask-engineering","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/www.strv.com\/blog\/mask2face-how-we-built-ai-that-shows-face-beneath-mask-engineering"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]}