Apple Researchers Propose Ensemble Inversion, a Technique for Reconstructing Training Data from Diverse Machine Learning Models

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"MI Attacks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Model inversion (MI) attacks have drawn considerable attention in recent years. An MI attack abuses a trained machine learning (ML) model to infer sensitive information about the model's original training data. The model under attack is typically frozen during inversion and used by the attacker to guide the training of a generator, such as a generative adversarial network (GAN), which ultimately reconstructs the distribution of the model's original training data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Scrutinizing MI techniques is therefore essential to building proper model protection mechanisms."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cb\/cbf032edcff6c9aefbaceb62ff92a0cc.png","alt":"Title","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reconstructing training data with high quality from a single model is difficult. However, the existing MI literature has not considered the possibility that several models are attacked at the same time, a setting in which the attacker can find additional information and points of entry."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If an attack succeeds, the original training samples are leaked, and if the training data contains personally identifiable information, the privacy of the data subjects in the dataset is put at risk."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Ensemble Inversion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apple researchers propose an ensemble inversion technique that estimates the distribution of the original training data by training a generator constrained by an ensemble of trained models that share subjects or entities."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},
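"content":[{"type":"text","text":"Compared with MI on a single ML model, the technique markedly improves the quality of the generated samples, which capture the attributes that distinguish the entities in the dataset. High-quality results were obtained without any dataset, and the work also shows that an auxiliary dataset resembling the presumed training data improves the inversion further. The researchers study in depth how model diversity within the ensemble affects the result, and add further constraints that push the reconstructed samples toward sharp predictions and high activations, which improves the accuracy of the reconstructed training images."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with MI attacks on a single model, the proposed approach shows a clear improvement in reconstruction performance. The study uses farthest model sampling (FMS) to optimize the diversity of the models in the ensemble, and builds an inversion ensemble with a clear correspondence of classes across models. The richer information in the models' output vectors is also used to derive better constraints, so that target quality can be assessed more reliably."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mainstream deep convolutional neural networks (DCNNs) can be trained on arbitrarily large datasets using stochastic training methods such as mini-batch stochastic gradient descent (SGD). DCNN models are sensitive to their initial random weights and to statistical noise in the training dataset, and because the learning algorithm is stochastic, the same training set can produce models that focus on different features. To reduce this variance, researchers commonly use ensemble learning, a simple technique for improving the performance of discriminatively trained DCNNs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d3\/d39eda45045530593ceeceecf2f4245c.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although the paper builds on ensemble learning, it defines the term “ensemble” differently."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To invert a model successfully, an attacker cannot assume that the target model was trained with ensemble learning, but can assemble an attack ensemble by collecting related models. In other words, in the context of an “ensemble inversion attack”, “ensemble” does not require that the models were trained as an ensemble; it refers to the collection of related models that the attacker gathers from various sources."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":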
[{"type":"text","text":"For example, a researcher may keep collecting new training data and use it to retrain and update the current model, and an attacker can gather these successive models into an ensemble and exploit it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With this strategy, the accuracy of data-free inversion of MNIST handwritten digits improved by 70.9%, and the accuracy of the auxiliary-data-based experiment improved by 17.9%; face inversion accuracy improved by 21.1% over the baseline experiments. The paper aims to evaluate existing model inversion strategies more systematically. Future research should focus on developing defenses against ensemble-based model inversion attacks of this kind."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ensemble inversion technique proposed in the paper exploits the diversity within an ensemble of ML models to improve model inversion performance. Combining a one-hot loss with a loss that maximizes the output activation improves sample quality further. In addition, filtering out generated samples that have a small maximum activation in the attacked models improves the inversion results. To determine how the diversity of the target models affects ensemble inversion performance, the researchers also examined how the target models behave under various degrees of difference."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2111.03702.pdf","title":null,"type":null},"content":[{"type":"text","text":"Reconstructing Training Data from Diverse ML Models by Ensemble Inversion"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original English article"},{"type":"text","text":": "},{"type":"link","attrs":{"href":"https:\/\/www.marktechpost.com\/2021\/11\/27\/apple-researchers-propose-a-method-for-reconstructing-training-data-from-diverse-machine-learning-models-by-ensemble-inversion","title":null,"type":null},"content":[{"type":"text","text":"Apple Researchers Propose A Method For Reconstructing Training Data From Diverse Machine Learning Models By Ensemble Inversion"}]}]}]}
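The conclusion mentions two ingredients that are easier to picture with numbers: a one-hot loss combined with a term that maximizes the output activation, and a filter that drops generated samples whose maximum activation is small. The following is a minimal NumPy sketch of that idea, not the paper's implementation. The function names, the `weight` and `threshold` parameters, the choice to average over models, and the use of raw logits as "activations" are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_inversion_loss(logits_per_model, target, weight=0.1):
    """Hypothetical combined loss for a batch of generated samples.

    logits_per_model: array of shape (n_models, n_samples, n_classes),
        the outputs of each frozen ensemble member on the generated samples.
    target: int array of shape (n_samples,), the class each sample
        is supposed to represent.
    weight: assumed trade-off between the two terms.
    """
    probs = softmax(logits_per_model)
    idx = np.arange(target.shape[0])
    # One-hot (cross-entropy) term: every model should predict the target class.
    one_hot_term = -np.log(probs[:, idx, target] + 1e-12)
    # Activation term: reward a large raw output for the target class.
    activation_term = -logits_per_model[:, idx, target]
    # Average over models and samples.
    return float((one_hot_term + weight * activation_term).mean())

def keep_confident(logits_per_model, threshold):
    # Maximum activation per sample, averaged over the ensemble members.
    max_activation = logits_per_model.max(axis=-1).mean(axis=0)
    # Boolean mask: True for samples that pass the filter.
    return max_activation >= threshold

# Two models, two generated samples, three classes (made-up numbers).
logits = np.array([
    [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    [[4.0, 0.0, 0.0], [0.0, 0.2, 0.0]],
])
target = np.array([0, 0])
print(ensemble_inversion_loss(logits, target))
print(keep_confident(logits, threshold=1.0))
```

In a real attack the logits would come from frozen networks and the loss would be backpropagated into a generator. Here the arrays are fixed so the two terms and the filter can be inspected directly.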