Alibaba's Latest Research Applies Causal Inference to Make Visual AI Smarter | CVPR 2021 Paper Walkthrough

An AI that has learned from photos of humans and photos of fish: how will it react the first time it sees a photo of a mermaid? It is familiar with both human faces and fish bodies, but it cannot imagine something it has never seen. Recently, Alibaba DAMO Academy introduced causal inference methods into computer vision, aiming to overcome this weakness of machine learning and let AI imagine things it has never seen. The paper has been accepted by CVPR 2021, a top computer vision conference.

Computer vision (CV) is the science of making machines "see": by turning unstructured image and video data into structured feature representations, it lets AI understand visual information. Since the rise of deep learning, AI has surpassed humans on many CV tasks; still, compared with human visual understanding, AI remains a decidedly "low-dimensional" intelligence.

Imagining a mermaid from the images of a human and a fish is effortless for people, yet an AI will very likely misfile a mermaid under either "human" or "fish", because it lacks imagination, a higher-level cognitive ability. Today's machine learning essentially fits observed data, so an AI only recognizes what it has been trained on; confronted with objects beyond the training data, it often lapses into "artificial stupidity".

Turing Award laureate Judea Pearl, the founder of causal calculus, argues that human imagination stems from a brain equipped with causal reasoning: humans habitually ask "why", that is, they seek causal relations. With this cognitive system, we use "small data" to handle the real world's unlimited supply of "big tasks", whereas AI can only use "big data" to handle "small tasks". If AI could learn causal reasoning, it might break through its "IQ ceiling" and even move toward strong AI.

Causal inference theory has greatly inspired researchers, and its combination with machine learning is attracting growing attention. In industry, DAMO Academy's City Brain Lab was the first to bring causal inference into CV, using causal models to empower machine learning models and make visual AI smarter. This year the team, in collaboration with Nanyang Technological University, had three papers using causal inference methods accepted by CVPR 2021, among them "Counterfactual Zero-Shot and Open-Set Visual Recognition".

![image](https://static001.geekbang.org/wechat/images/27/270965e5bcb1b9dd58f7609c73dbb8c2.png)

(Left: the AI "imagination" produced by existing methods; middle: the core of the algorithm proposed in the DAMO paper; right: imagination produced with the DAMO framework. In the left and right plots, red marks training-set samples, blue marks samples of classes the AI has never seen, and green marks the AI's imagination of those unseen classes.)

Zero-shot learning asks a machine to classify object classes it has never seen; open-set recognition asks it to label never-seen classes as "unknown". Both tasks depend on imagination. "Counterfactual Zero-Shot and Open-Set Visual Recognition" proposes a counterfactual algorithmic framework: it disentangles sample attributes (e.g., an object's pose) from class attributes (e.g., whether it has feathers), then performs counterfactual generation conditioned on the sample attributes. On common benchmarks, the algorithm's accuracy exceeds the existing state of the art by 2.2% to 4.3%. Paper author Zhongqi Yue notes that the evolution of AI's cognitive intelligence has only just begun and industry exploration is still at an early stage; the team will keep improving and optimizing the algorithms.

The City Brain Lab explains that data-driven machine learning models commonly face data imbalance: "Take a city: its information is long-tail distributed. Against the mass of normal information, anomalies such as traffic accidents, vehicle violations, and sudden disasters occur with tiny probability and yield few samples. The problem can be partially solved by collecting far more rare samples, but that is costly and inefficient."

With the lab's algorithm, AI can reach unbiased anomaly-detection results using only normal samples. In an emergency, say an abnormal interaction between a vehicle and a pedestrian, the City Brain need not feign understanding or look away; it can recognize and report the event in real time. Going forward, the technique may be applied to optimizing a city's basic visual algorithm stack, sensing rare urban anomalies, and even multimodal semantic search and intelligent image-text generation.

Below is Zhongqi Yue's walkthrough of the CVPR 2021 paper "Counterfactual Zero-Shot and Open-Set Visual Recognition". Code: https://github.com/yue-zhongqi/gcm-cf.

In existing zero-shot learning and open-set recognition, recognition rates on seen and unseen classes are severely imbalanced. We find that this imbalance stems from distorted imagination of unseen-class samples, and we propose a counterfactual framework that keeps generation faithful by conditioning on sample attributes, obtaining consistent gains on every evaluation dataset. The main strengths of this work:

1. The proposed GCM-CF is a seen/unseen binary classifier; after the binary split, any supervised learning method (on the seen classes) and any zero-shot learning method (on the unseen classes) can be plugged in;
2. The proposed counterfactual generation framework fits a variety of generative models, e.g., those based on VAE, GAN, or Flow;
3. We give an easy-to-implement algorithm for disentangling two groups of concepts.

Next I will walk through the task we target, the framework we propose, and the corresponding algorithm. Outline:

- Section 1: Zero-Shot Learning and Open-Set Recognition
- Section 2: The Counterfactual Generation Framework
- Section 3: The Proposed GCM-CF Algorithm
- Section 4: Experiments

## Section 1: Zero-Shot Learning and Open-Set Recognition

![image](https://static001.geekbang.org/wechat/images/b8/b84c882271cbd959c92acfaa8e2daefb.jpeg)

Most people know the antelope and the tapir (pictured above). What would an antelope with a tapir-like nose look like? You can probably picture something like the animal on the right (it is called a saiga antelope). What we just did is **zero-shot learning (ZSL)**: although we have never seen a saiga, our existing knowledge of antelopes and tapirs lets us imagine what this unseen class looks like, which amounts to recognizing the animal. Indeed, this ability to generalize existing knowledge to unseen things is a key reason humans learn so fast.

![image](https://static001.geekbang.org/wechat/images/88/8874dd5458888d40a33eccbe354b08c5.jpeg)

Now a road-sign example. We easily recognize the two signs on the left as familiar, seen ones, while the one on the right is a strange sign we have never seen. Humans handle such **open-set recognition (OSR)** with ease, because we are not only familiar with seen samples but also able to reason about the unknown, so we know where the boundary between seen and unseen lies.

![image](https://static001.geekbang.org/wechat/images/81/81219f0b4e02aee378d96709026b2fbd.jpeg)

In machine learning, the two tasks are defined as in the figure above. In ZSL, the training set contains images of the seen classes S; besides each image's class label, every class carries an attribute vector describing it (has wings, round face, etc.). There are two test settings: in Conventional ZSL, the test images all come from the unseen classes U, whose dense attribute labels are also given at test time; in Generalized ZSL, the test set contains images from both S and U. The OSR training set is no different from ordinary supervised learning; only at test time do samples from classes unseen in training appear, and the classifier must not only recognize the seen classes correctly but also label the unseen ones "unknown".

![image](https://static001.geekbang.org/wechat/images/cc/cc5230fbedb44b34091bbb1252992859.png)

The dominant ZSL and OSR methods are generation-based: in ZSL, for instance, one generates images from the unseen classes' attributes and compares in image space. **However, the generative model naturally biases toward the seen training set, distorting its imagination of unseen classes** (the underlying cause is the entanglement of the attributes; I will not expand on this here, please see the paper). For example, a model that saw elephants' long trunks in training will imagine the unseen tapir's trunk as an elephant's. The left plot shows the distortion: red marks training samples, blue marks ground-truth unseen-class samples, and green marks existing methods' imagination of the unseen classes. The imagined samples have drifted out of the sample space, resembling neither the seen nor the unseen classes (the green points deviate from both the blue and the red). This explains why seen- and unseen-class recognition rates are imbalanced: a classifier learned from the green and red samples (black dashed line) **sacrifices unseen-class recall to raise seen-class recall**.

## Section 2: The Counterfactual Generation Framework

How, then, can imagination stay faithful? Consider how humans imagine. To picture an ancient creature, we start from its fossil skeleton (left figure); to picture a scene in an animated world, we reference the real world (right figure). **Such imagination is, at heart, counterfactual inference**: given this fossil (the fact), what would the creature look like if it were alive (the counterfact)? Given this real-world scene, what would it look like in an animated world? Built on the bedrock of facts, our imagination becomes plausible rather than unmoored.

![image](https://static001.geekbang.org/wechat/images/0a/0a53d52a9ace52603cb990c0fcca235e.jpeg)

![image](https://static001.geekbang.org/wechat/images/55/550360c0554cbeb47b8f027cf879300f.png)

Can counterfactuals yield plausible imagination in ZSL and OSR? We first build a **Generative Causal Model (GCM)** for the two tasks: we assume an observed image X is generated from a sample attribute Z (class-agnostic, e.g., the object's pose) and a class attribute Y (e.g., has feathers, round face). Existing generation-based methods in effect learn P(X | Z, Y); setting Y to some class's attribute (e.g., the dense label in ZSL) and Z to Gaussian noise then yields many samples of that class.

![image](https://static001.geekbang.org/wechat/images/1b/1b1ea50453d14a1e2d0106365bc2924b.png)

**The biggest difference between counterfactual generation and existing generative models is that generation is conditioned on a specific sample attribute (the fact) rather than on Gaussian noise.** Concretely (figure above), for an image x we obtain its sample attribute z with an encoder (e.g., front-view, walking); conditioned on this fact z and various class attributes y (the counterfacts), we generate counterfactual images of different classes (a front-view, walking cat, sheep, chicken, and so on). Intuitively, since the counterfactual cat, sheep, and chicken images all look unlike x, x surely belongs to none of those classes. This intuition has theoretical backing, the **Counterfactual Consistency Rule**: when the counterfact coincides with the fact, the counterfactual outcome is simply the factual outcome. For example, if the fact is that I ate ice cream yesterday and got a stomachache, then the answer to the counterfactual question "what if I had eaten ice cream yesterday?" is: a stomachache. So how do we solve ZSL and OSR with the consistency rule?

## Section 3: The GCM-CF Algorithm

![image](https://static001.geekbang.org/wechat/images/ae/ae29fd4c5c4e7a51e521394ea32c58ea.png)

The GCM-CF procedure is summarized in the figure above. **In essence it is a consistency-rule-based binary classifier that decides whether a sample belongs to a seen or an unseen class.**

At training time we learn a GCM (the training procedure comes shortly). At test time, for each sample x we run the counterfactual generation described in the previous section: take the sample's own z, combine it with different class attributes y, and generate samples with P(X | Z, Y). The generated samples can be proven **faithful** (counterfactual faithful), i.e., they lie in the sample space, so we may compare x with the generated samples under a metric in sample space and use the consistency rule to judge whether x belongs to a seen or an unseen class.

Task by task: in ZSL we generate counterfactual samples from the unseen classes' attributes, then train a linear classifier on the training samples (seen classes) and the generated samples (unseen classes). After classifying an input sample, we average its top-K probabilities over the seen classes and over the unseen classes. If the unseen average is the smaller one, we judge the sample unlike the unseen classes (not consistent), label it as seen, and classify it with a classifier trained with supervision on the seen classes (this is contraposition of the consistency rule; see the paper for details); if instead it is consistent, we label it unseen and classify it with any Conventional ZSL algorithm. In OSR, where no information about unseen classes is available, we generate the counterfactual samples from the seen classes' one-hot labels; if x is far from every generated sample in Euclidean distance (not consistent), we judge it unseen and label it "unknown"; otherwise we simply use the supervised classifier.

As you can see, the core requirement is to generate faithful samples, so that the consistency rule can be used for inference. This property is guaranteed by the Counterfactual Faithfulness Theorem, which in short says: **faithful generation holds if and only if the sample attribute Z and the class attribute Y are disentangled.** We achieve this with three losses:

- β-VAE loss: requires that the z encoded from x, together with x's own class attribute y, reconstructs x, and that the encoded z closely follows an isotropic Gaussian; making Z's distribution independent of Y effects the disentanglement;
- Contrastive loss: among the counterfactual samples, x should resemble only the one generated with its own class attribute and stay far from those generated with other class attributes; this stops the generator from relying solely on the information in Z while ignoring Y, further disentangling Y's information out of Z;
- GAN loss: directly requires the counterfactual samples to be judged real by a discriminator; via the necessary-and-sufficient condition above, faithfulness further promotes disentanglement.

## Section 4: Experiments

Before the experiments, note that the official Proposed Split datasets commonly used in ZSL previously had a data-leakage bug that inflated some methods' seen-class (S) performance. Last year the official site released Proposed Split V2, which fixes the bug. All experiments below run on the corrected datasets.

**Mitigating the seen/unseen imbalance.** The t-SNE below visualizes the counterfactual generations: by conditioning on sample attributes (blue stars are unseen-class samples, red stars seen ones), the generated unseen-class samples are indeed faithful (they sit amid the blue points), and the resulting decision boundary (black line) is balanced. The same shows on the four common ZSL datasets: our method lifts unseen-class (U) accuracy by a large margin, and with it the overall harmonic mean H, reaching SOTA. Existing methods do have a simple fix for the imbalance: directly adjusting the seen-class logits. Sweeping the adjustment strength traces a seen-versus-unseen accuracy curve; ours (red line) is higher at every strength, showing that our method relieves the imbalance at its root, which mere calibration cannot.

![image](https://static001.geekbang.org/wechat/images/87/873ac560e08018e3f635e5073be4ca4e.jpeg)

**A strong seen/unseen classifier.** Our method can pair with any conventional ZSL algorithm. We tested the inference-based RelationNet and three generation-based methods built on different generative networks; every one improved with our method, surpassing the use of the current SOTA TF-VAEGAN as the seen/unseen classifier.

![image](https://static001.geekbang.org/wechat/images/28/2884b19e0df13dda4dbc14265a6f356d.png)

**A strong open-set classifier.** We ran OSR experiments on several common datasets (F1 metric) and achieved SOTA. Since the number of unseen classes in OSR is unknown, a good classifier must do well whether that number is small or large. The right plot traces F1 against the number of unseen classes (few to many); our method (blue) is best in every case, and its F1 barely drops even when unseen classes abound (tail of the blue curve), showing strong robustness.

![image](https://static001.geekbang.org/wechat/images/ac/ac16f806c6cbc0e03e21011d066332ab.jpeg)

## Conclusion

This work is our modest exploration of disentangled representation: we relax the unattainable full disentanglement of every factor into disentanglement between two groups of concepts (sample attributes and class attributes), and the faithfulness that disentanglement brings is what makes our counterfactual generation framework possible. It also illustrates that disentanglement is an important precondition of causal inference: once distinct concepts are separated (e.g., by disentangled representations), we can reason over the causal relations among them and reach robust, stable, generalizable conclusions.

I have also seen pessimism and skepticism toward disentanglement. True, even its definition remains unsettled, never mind methods and evaluation. But such difficulties are to be expected: disentanglement is helping machines cross a level, from learning regularities in observed data to probing the causes that produce the data, much as knowing that the sun rises every day is easy while understanding why it rises took humanity millennia. I encourage everyone to follow and explore disentanglement; perhaps the next breakthrough will be yours.

Citation:

@inproceedings{yue2021counterfactual,
  title={Counterfactual Zero-Shot and Open-Set Visual Recognition},
  author={Yue, Zhongqi and Wang, Tan and Zhang, Hanwang and Sun, Qianru and Hua, Xian-Sheng},
  booktitle={CVPR},
  year={2021}
}