滴滴翻譯技術探索與實踐

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. 背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機器翻譯系統是使用深度學習技術從其支持的語言中翻譯大量文本的服務。服務將 \"源\" 文本從一種語言轉換爲不同的 \"目標\" 語言。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機器翻譯技術背後的概念和使用它的接口相對簡單,但背後的技術是極其複雜的,並彙集了一些前沿技術,特別是深度機器學習、大數據、語言學、GPU加速計算等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大規模商業化使用的機器翻譯主要經歷了SMT與NMT階段:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 統計機器翻譯(SMT)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機器翻譯行業使用的比較早的技術是統計機器翻譯 (SMT)。SMT採用統計分析方法,根據源語言的語境,利用已有的雙語語料學習到的短語翻譯知識,將源語言的分隔片段短語轉化爲對應的目標短語,最後利用語言模型估計出一個句子的最佳可能譯文。SMT也是最早商用的機器翻譯系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e0\/12\/e09130c8cddfa88f71308d0c059aa312.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"統計機器翻譯結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 神經機器翻譯(NMT)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"神經機器翻譯是利用深度神經網絡將源句子進行編碼,再進行解碼翻譯爲目標語言。2016年,Google發表GNMT[1],改變了SMT的地位,使翻譯技術發生了根本性的轉變,自此進入NMT時代,並且有了現在更高的翻譯質量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d9\/c4\/d986a57f8f294f248a53ef045a13d9c4.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"神經機器翻譯結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.3 SMT與NMT的相同點"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兩者都需要大量的標註數據(人類翻譯內容 )來訓練翻譯系統。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兩者不能用做雙語字典。翻譯是基於潛在翻譯列表翻譯單詞,並根據句子中使用的單詞的上下文進行翻譯。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. 評價指標 (BLEU)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於機器翻譯的結果,可以採用人工評估的方法,但是比較受限。首先是人工評估具有一定的主觀性,其次每次訓練結果需要人工評估成本較高。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章