Xiaomi's Exploration and Optimization of Pre-trained Models

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"導讀:預訓練模型在NLP大放異彩,並開啓了預訓練-微調的NLP範式時代。由於工業領域相關業務的複雜性,以及工業應用對推理性能的要求,大規模預訓練模型往往不能簡單直接地被應用於NLP業務中。本文將爲大家帶來小米在預訓練模型的探索與優化。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"預訓練簡介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/eb\/53\/eb241bcf1afcf2a58156b5c9e0482f53.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"預訓練與詞向量的方法一脈相承。詞向量是從任務無關和大量的無監督語料中學習到詞的分佈式表達,即文檔中詞的向量化表達。在得到詞向量之後,一般會輸入到下游任務中,進行後續的計算,從而得到任務相關的模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,詞向量的學習方法存在一個問題:不能對文檔中的上下文進行建模,對於上面的例子“蘋果”在兩個句子中的表達意思是不一樣的,而詞向量的表達卻是同一個,所以在表達能力的多樣性上會有侷限,這是一種靜態的Word Embedding。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在後面的發展中,有了根據上下文建模的Word Embedding,比如,可以在學習上嘗試使用雙向LSTM模型,在非監督語料學習詞向量,這比靜態的詞向量網絡會複雜一些,最後可以通過隱層得到動態的詞向量輸入到下游任務中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1. 序列建模方法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/8e\/0f\/8e3359495cd5cb9bfd27e857188b1a0f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在NLP中,一般使用序列建模的方法。之前比較常用的序列建模是LSTM遞歸神經網絡,其問題是建模時,句子中兩個遠距離詞之間的交互是間接的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"17年Transformer發佈之後,在NLP任務中取得了很大的提升。這裏面Self-Attention可以對任意詞語間進行直接的交互,Multi-head Attention可以表達在不同類型的進行語義交互。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2. 預訓練模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/41\/44\/41894539yydfc9bfae644f5f35f6e444.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}