BERT Compression in Practice at China Merchants Securities (Part 2): How to Build a 3-Layer 8-bit Model?

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BERT,全稱 Bidirectional Encoder Representation from Transformers,是一款於 2018 年發佈,在包括問答和語言理解等多個任務中達到頂尖性能的語言模型。它不僅擊敗了之前最先進的計算模型,而且在答題方面也有超過人類的表現。招商證券希望藉助BERT提升自研NLP平臺的能力,爲旗下智能產品家族賦能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在前一篇"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/fyWR8cOmI7xtfEY3rqA3","title":"","type":null},"content":[{"type":"text","text":"蒸餾模型"}]},{"type":"text","text":"中,招商證券信息技術中心 NLP 開發組已經初步實踐了BERT模型壓縮方法,成功將12層BERT模型縮減爲3層。在本次分享中,研發人員們將介紹更簡潔的模塊替換方法,以及削減參數比特位的量化方法,並將這幾種方法有機結合實現了將BERT體積壓縮至1\/10的目標。"}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. BERT-of-Theseus模塊替換"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 概述"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"BERT-of-Theseus"}]},{"type":"text","text":"[1]主要通過模塊替換的方法進行模型壓縮。不同於模型蒸餾方法需要根據模型和任務制定複雜的損失函數以及引入大量額外超參,Theseus壓縮方法顯得簡潔許多:該方法同樣需要一個大模型作爲“先驅”,而規模較小的目標模型作爲“後輩”(類似蒸餾方法中的教師模型和學生模型),對於“先驅”BERT模型來說,主體部分是由多個結構相同的Transformer Encoder組成,“後輩”模型將“先驅”中的每N個Transformer Encoder模塊替換爲1個Transformer Encoder模塊,從而實現模型的壓縮。具體實現過程如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在BERT模型中,第i個Encoder的輸出爲:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/5b\/6a\/5b13ba5736bf81aee20d773513c3906a.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“先驅”和“後輩”中第i個模塊的輸出分別爲:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/06\/4f\/06d725effd292238d25f7fc1fce8b94f.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於“後輩”模型的1個Encoder模塊將會替換N個“先驅”Encoder模塊,因此可以將每N個“先驅”Encoder模塊分爲一個邏輯組,從而與“後輩”模型對應。Theseus方法就是用“後輩”Encoder模塊替換對應的“先驅”邏輯組,具體的替換的過程比較直觀:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,設置一個概率p,通過伯努利分佈函數獲得模塊替換概率:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c3\/fb\/c337080b7d90fdd0d1a30cdde7cc2afb.png","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste
":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}