Volcano Engine's Latest Advances in Machine Writing and Machine Translation

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"人工智能正在改變人們創造、獲取、分享及消費信息的模式,然而高效高質有用的內容創作仍然困難重重,保證大衆能公正的獲取到準確信息也充滿挑戰。本文,InfoQ 經授權整理了字節跳動AI Lab總監李磊近期在火山引擎智能增長技術專場的演講(火山引擎是字節跳動旗下的數字服務與智能科技及品牌),其分享了文本生成技術進展、挑戰及火山引擎的實踐經驗。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着新媒體平臺的興起,人工智能技術已經大大提高了信息內容的創作,而個性化推薦算法的信息又爲信息內容的分發提供了極大的便利,這其中,文本生成技術非常重要,因爲它在很多的應用場景有廣泛的應用,比如機器翻譯、機器寫作、對話機器人以及自動問答。2019年在《管理科學》雜誌上MIT研究人員發表的一項最新研究表明,機器翻譯技術已經將國際化貿易量提高了10%,這相當於將地球上的各個國家之間的距離縮短了25% [1]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近年來,字節跳動也研發了多項先進的機器翻譯技術,目前字節跳動自研的火山翻譯平臺已經有公司內外的50多個客戶使用,支持超過50多種語言的互相翻譯。此外,在字節跳動我們研發了Xiaomingbot自動寫稿平臺,自2016年上線以來,已經累計寫了60萬篇文章,覆蓋了17項的體育賽事,支持6種語言,在自媒體平臺上面也有15萬的粉絲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面給大家展示一下Xiaomingbot如何自動寫新聞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/31\/31804d508e9f17855105d0d5b3d162e9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的系統將從數據源獲取到比賽信息,例如球員比賽佈陣、球員的進球等等信息。同時我們還會利用計算機視覺的算法,對比賽視頻進行分析識別出其中的球員、球衣上面的號碼,球員的運動軌跡、球員的動作、球員的位置以及關鍵的一些場景等等。再利用這些信息我們利用文本生成算法寫出最後的文章 [2]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在另外一項研究當中我們使用計算機視覺的算法去分析斯諾克比賽的運動、桌上球的運動軌跡、以及利用機器學習最後去預測球員的擊球策略,預測下一杆球會落到哪個袋,並且利用這些預測去生成最終的比賽解說 [3]。這對於一些非職業的觀衆來說,非常有助於幫助理解球賽的進程。這是我們算法最終生成的一些解說情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/64\/643c8fab83fce8a5ca739ef1fe83ea5d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本場講座,會分爲五部分內容。第一部分,我會給大家先簡單介紹一下什麼是序列生成問題,它有什麼樣的難度和挑戰;第二部分,將介紹深度隱變量模型,Deep latent Variable Models for Text Generation;第三部分,我將介紹文本生成當中如果加上限制之後,如何做更好的算法,我們提出了一類蒙特卡洛採樣算法來做文本生成;第四部分會介紹機器翻譯當中如何使一個模型可以去獲取四項雙語語言能力。最後一部分介紹多語言的機器翻譯,我們最新的一個工作mRASP。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4a\/4a16f2a25735a279fb2b24232cb19d69.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"序列生成問題的難度和挑戰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在自然語言中,所有自然語言聲稱的核心問題是對句子序列做建模,比如說這樣一個句子的The quick brown fox jumps over the lazy dog句號,這裏有10個字符,Modeling的問題就是對這10個字符的聯合概率去建模,也就任意一個句子長度爲L的句子,我需要對整個L各字符對它算出它的聯合概率分佈。當然最基本的一種方法是叫Auto-Regressive Language model,是把這個聯合概率分解成下面這個形式,每一個部分它實際上是第i個字符的概率,是建立在前面1到i-1個字符的基礎之上,這具體的每一個概率可以有很多建模的方法。比如說現在從2017年開始比較流行的叫Transformer網絡裏面對個條件概率的建模是使用多層的多頭注意力機制(Muti-Head Attention)來建模的 [4]。當然這個Transformer有很多的參數,實際學習當中就需要找到最好的一組參數,使得語料裏面的聯合概率最大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8c\/8c99409c000964b3b731541cc0299dff.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在另外一些問題當中,例如機器翻譯、對話生成以及自動問答當中,我們通常會有一個輸入,輸入也是一個序列,我們要針對這個輸入做一個輸出,例如機器翻譯,給定一個輸入的英文句子(X),我們要輸出一個目標語言中文的句子(Y),所以我們要對Y|X這樣一個條件概率去建模,同樣可以用之前提到的Transformer模型來對這個概率建模。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/85\/857add8c3777fa6afd5d836eca17e656.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"把深度生成模型按照方法類別去歸一個類,大致可以分成這樣幾類:按照自然估計的方法可以分成概率密度有沒有顯式密度(explicit density),以及隱式密度(implicit density)。顯式密度當中又分是否密度是可直接計算的,例如像自迴歸分解(Auto-Regressive Factorization)裏面的Transformer模型 [4]。如果不是自迴歸分解,還有像馬爾科夫分解(Markov Factorization)以及並行分解(Parallel Factorization)。像最新做的一些工作就GLAT等等這樣一些工作就可以做並行分解。在顯式密度中另外一塊是不可高效計算的密度(Intractable Density),也是今天需要重點介紹的一類模型,"},{"type":"text","marks":[{"type":"strong"}],"text":"叫隱變量模型(Latent Variable 
Model)"},{"type":"text","text":",典型的代表有DSSVAE、VTM等,本場講座也將會介紹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c9\/c97fb82f71770aea1a07c45c95bacfab.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3bd3fb79b86d59d5a1cae227ae2f5f34.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如說這個密度沒有顯式公式的,是隱式的,也就是說你無法嚴格地寫出它的概率分佈,通常可以寫出它的能量函數(Energy Function),可以是條件能量模型(Conditional Energy Based model)或者是受限概率模型(Constrained Probability Model)。這次,我們會特別介紹受限概率模型如何來快速生成句子。包含CGMH、MHA、TSMH等一系列算法。但有一部分內容這裏不會介紹,就是對抗學習(Adversarial learning),它已經超出極大自然概率估計這個範圍以外。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/739e82452c56b64f826a0696a96d7476.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來的一部分我將會介紹文本生成的深度隱變量模型(Deep Latent Variable Models for Text Generation)。我具體會介紹兩類工作,一類是我們如何從文本當中學到可解釋的深度隱含表示。第二類是我們如何從文本當中學到解耦的一個表示,並且利用這個解耦的表示來做更好的文本生成。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"文本生成的深度隱變量模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/47\/4797da10ae03077142f51e09fd0e116d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先看第一部分,我們要去學可解釋的隱層表示,那麼,什麼是可解釋?我們看這樣一個具體的問題:我們從對話的句子當中希望去學到對話的一個隱表示,並且這個隱表示對應一定的語義關係,例如這裏兩個對話,”Remind me about the football game”,”Will it be overcast tomorrow”。這兩個對話句子對應兩個不同的意圖,第一個意圖是希望去給它一個提醒(Remind),第二個意圖是問路(request for the information about where),這兩個意圖我們希望從句子本身通過學這樣一個生成模型去學到,你在使用當中就可以根據對應的不同的意圖去生成不同的回答。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b9\/b9050f1138f5f6230baa07779446e9ac.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"傳統的做法是用變分自編碼(Variational Auto-encoder)的方法,去學一個隱表示,這個方法具體是假設有一個隱變量(Latent Variable) 
Deep generative models can be roughly grouped by how they estimate likelihood: models with an explicit density and models with an implicit density. Explicit-density models split further by whether the density is directly tractable, as in autoregressive factorization (e.g., the Transformer [4]), or uses other factorizations such as Markov factorization and parallel factorization; recent work such as GLAT falls under parallel factorization. The other branch of explicit-density models has an intractable density, and this is the class I will focus on today: latent variable models, with representatives such as DSSVAE and VTM, both covered in this talk.

If the density has no explicit formula, i.e., it is implicit and you cannot write its probability distribution down exactly, you can usually still write down an energy function; such models include conditional energy-based models and constrained probability models. In this talk I will look specifically at how constrained probability models can generate sentences quickly, covering a series of algorithms including CGMH, MHA, and TSMH. One topic I will not cover is adversarial learning, which falls outside the scope of maximum likelihood estimation.

In the next part I will introduce deep latent variable models for text generation. I will cover two lines of work: how to learn interpretable deep latent representations from text, and how to learn disentangled representations from text and use them for better generation.

Deep Latent Variable Models for Text Generation

Let us start with learning interpretable latent representations. What does "interpretable" mean? Consider a concrete problem: from dialogue utterances we want to learn a latent representation that corresponds to some semantic structure. Take the two utterances "Remind me about the football game" and "Will it be overcast tomorrow". They express two different intents: the first asks for a reminder, and the second requests weather information. We want a generative model to learn these intents from the sentences themselves, so that at inference time we can generate different replies according to the intent.

The traditional approach is the variational autoencoder (VAE), which assumes a latent variable Z with a Gaussian prior, from which the text sentence X is generated. Using this setup, Kingma and Welling proposed the VAE algorithm in 2013, which learns the latent representation through variational inference. This method can certainly generate sentences and learn a latent representation. However, when you project that representation into a low-dimensional space and visualize it, you find that sentences of all kinds are blended together in one large mass with no visible clusters, which makes the latent representation hard to interpret.
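A minimal PyTorch sketch may make this setup concrete (module names and sizes are illustrative assumptions, not the original code): the encoder maps a sentence to the mean and variance of q(z|x), z is sampled with the reparameterization trick, and the decoder reconstructs the sentence; training minimizes the negative ELBO, i.e., reconstruction loss plus a KL term against the Gaussian prior.

```python
import torch
import torch.nn as nn

class TextVAE(nn.Module):
    """Sketch of a sentence VAE with a standard Gaussian prior on z."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.z_to_h = nn.Linear(z_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)
        _, h = self.encoder(emb)                              # h: (1, B, hid)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec, _ = self.decoder(emb, h0)                        # teacher forcing
        logits = self.out(dec)
        # KL( q(z|x) || N(0, I) ), closed form for Gaussians
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl

model = TextVAE()
tokens = torch.randint(0, 10000, (4, 20))                     # a dummy batch
logits, kl = model(tokens)
# Negative ELBO: next-token reconstruction loss plus the KL term.
rec = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss = rec + kl.mean()
```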
How can we obtain an interpretable representation from this latent variable Z? A natural improvement is to place a prior variable C on top of Z, where, unlike the continuous Z, the prior C is discrete. Z then follows a Gaussian mixture distribution, and we hope to learn meaningful C and Z from the raw text, so that sentences with different semantics and intents fall into different clusters, i.e., different values of C.

The important intuition here is that introducing discrete variables into a latent variable model should markedly improve interpretability. That hope is appealing, but in practice, when you project the learned Z into a low-dimensional space, you hit a mode-collapse problem: sentences with different intents end up mixed together in latent space with no clear separation. How do we understand why this happens, and how do we correct it so that we get the interpretable latent representation we want?

Our DEMVAE work, published at ICML 2020 [5], solves exactly this problem.

First, we generalize the model to a very broad family called exponential-family mixture VAEs. We assume the sentence X is generated from a latent variable z, where z follows a mixture of exponential-family densities: C is discrete and encodes the mixture proportions, and each mixture component is an exponential-family distribution. The Gaussian mixture VAE mentioned above is one concrete member of this family.

Estimating this model directly still runs into mode collapse. Our solution came from a careful analysis of the loss function: it suffices to add an extra penalty, a dispersion term, to the variational ELBO. With this term, the different mixture modes no longer collapse onto each other, and the model learns a more meaningful latent space.
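As a loose illustration only (the paper derives the exact dispersion term for general exponential-family mixtures; the penalty below is our own simplified stand-in, not the paper's formula), one can add a term that rewards the mixture component means for spreading apart:

```python
import torch

def dispersion_penalty(means, weight=1.0):
    """Illustrative stand-in for a dispersion term: penalize mixture
    component means for collapsing onto their average, i.e. reward spread."""
    center = means.mean(dim=0, keepdim=True)         # (1, z_dim)
    spread = ((means - center) ** 2).sum(dim=-1)     # squared distance per component
    return -weight * spread.mean()                   # negative => encourages spread

K, z_dim = 8, 32
component_means = torch.randn(K, z_dim, requires_grad=True)
# total_loss = neg_elbo + dispersion_penalty(component_means)
print(dispersion_penalty(component_means))
```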
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生成的過程是怎麼樣的呢?一個輸入數據X,表示成field,position和value的一個三元組集合。我們先從輸入的數據X裏面去計算內容變量 C,這個可以通過一個神經網絡來實現。第二步我們從Z的先驗(例如高斯分佈)裏面去採樣一個Z,得到Z的值,這是相當於從一個無限大的模板庫裏面去採樣選擇一個模板。第三個是把C和Z合併之後,利用另外一個神經網絡,例如Transformer可以去做生成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ad\/add345125bb8b0dea65843848a91e3ec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用這個變分模板機(Variational Template Machine)它最大的好處是不僅能夠利用成對的表格數據和句子來訓練,還可以利用額外的原始文本,這個並沒有對應的表格數據也可以用來訓練,並且提升這個模型的性能。這就相當於做了一個反向翻譯,根據原始文本找到了對應的C和Z,即模板和內容的後驗分佈,等同於製造了更多的一些僞平行語料,而這些僞平行語料可以用來提升學習的效果 [6]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2f\/2fcdfedefd5695449a0b5effdea99f59.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 我們在WIKI Data和SPNLG的Data上面去做了實驗,前者根據數據去生成個人簡介,後者是根據餐館的一些屬性去生成餐館的描述。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/eb\/eb7617bbb57af024508759094fc3ad72.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏兩幅圖比較了我們變分模板機VTM方法和其他一些生成方法的性能優劣。縱軸是BLEU SCORE,是用來衡量的生成結果和真實結果之間的相關性,所以越高越好。橫軸是self-BLEU,是用來衡量同一個方法生成的不同句子之間的相關性,我們希望同一個方法生成的句子,相互之間相關性越小越好。所以理想情況是:左上角的位置,質量最高,BLUE SCORE最高,而Self-BLEU相關性越好,Self-BLEU要越低越好。我們提出的這個變分模板機方法,它在質量上面和Self-BLEU兩方面都取得了最好的分數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/df\/dfc8af005fea2406ad712b2d0d8fd0b4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們也比較了變分模板機的優勢,如果完整的變分模板機它並不使用原始文本的話,只用成對數據對它來訓練,它的性能就會下降,它的Self-BLEU質量會下降,同時它的多樣性會降低。所以額外的原始數據還是非常重要的,我們也驗證了在這個過程當中有一些重要的訓練目標,也是起了非常關鍵的作用,去掉它也會使性能下降。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用這個變分模板機VTM模型,我們得到的額外的一個好處是我們能夠去分析原始數據裏面它對應的隱變量,以及通過這個隱變量找到數據的一些合理的結構。例如,我們把模板變量z映射到二維空間去之後,我們會發現每一個句子實際上有一些獨立的聚類,比如說右邊這個聚類它對應於因果描述,裏面的句子基本上都有一些because、since、with等等這樣一些表達因果的模式在裏面,這個是完全從數據裏面學到的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/77\/77e6a88655d7089a360c8c69e6205ef1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果大家關心的話,這裏有生成的一些例子,這是從用戶畫像去生成用戶簡歷的一個例子,通過在模板變量裏面做不同的採樣,我們可以得到不同的模板值,把它與表格裏面學到的內容變量合併之後我們可去生成不同的句子,不同的句子長度和寫作風格都有很大的差別,這樣就得到了比較多樣,並且質量比較高的一些句子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6c\/6cb902b66895299432523dd85851f1d4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用類似的解耦表示學習(Disentangled Representation Learning)的方法,我們也可以去學到句子的語法表示以及語義表示。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/21\/210589e23213d0bac8e69dbf44295a0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個語法表示和語義表示有什麼作用?我們可以做一個非常有趣的實驗,叫『句子嫁接』。例如有兩個句子,“There is an apple on the table”,“The dog is behind the door”。我們可以從從第一個句子裏面學到它的語法表示,從第二個句子裏面學到它的語義表示,把前者語法表示和後者語義表示合併起來,通過DSSVAE模型 [7],生成另外一個句子,“There is a dog behind the door”。從形式上,它非常接近第一個句子,都是there-be句型;從語義上,它更接近第二個句子,這就是句子嫁接。有了句子嫁接之後,我們可以利用這個技術在質量非常高的文章上面去學一些模型。比如一些業餘作者要寫文章的時候,我們就可以用這些高質量文章上面學出的模型去幫助業餘的作者改進他們的寫作內容。這是第二部分,文本生成的深度隱變量模型(Deep Latent Variable Models for Text generation)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三部分我將介紹一下,如果文本生成過程當中有額外的條件限制,如何高效地去做生成。這個問題是我們在火山引擎的實踐當中發現的。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"受限文本生成的蒙特卡洛方法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fc\/fc5394b9ac02459ee851a55fd7915076.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如說我們要廣告主設計一個廣告,希望在廣告文案當中出現一些給定的關鍵詞,這個問題可以描述成受限文本的生成(Constrained Text Generation)。這裏具體的限制是keyword-occurrence constraint,即這些詞必須要在句子當中出現。針對關鍵詞限制(keyword occurrence),傳統的算法是格束搜索(grid beam search) [8]。通過格束搜索,我們能夠去生成一些句子,句子中必然會包含給定的關鍵詞,但是這種方法並不能保證會生成質量比較高的句子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e5\/e5077f6edc856a33c6e286ae66d2e953.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們提出了一個新的基於採樣的文本生成框架。首先我們把目標問題和目標函數拆解成兩部分,"},{"type":"text","marks":[{"type":"strong"}],"text":"第一部分是預訓練好的語言模型表徵句子概率(pre-trained language model)"},{"type":"text","text":",這部分代表了句子本身的通順程度,所以可以用以前訓練好的語言模型來表示,對應圖中橘黃色的部分。第二部分代表的是"},{"type":"text","marks":[{"type":"strong"}],"text":"受限的文本"},{"type":"text","text":","},{"type":"text","marks":[{"type":"strong"}],"text":"這可以用指示函數(indicator function)來表示圖中藍色的部分"},{"type":"text","text":"。而我們目標的句子,實際上是這兩部分的交集,也就是圖中紅色的部分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6f\/6f1be00c6ed979f51ce876693aa72e30.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的目標是從紅色的部分裏面去生成既通順又滿足約束的高質量句子。所有的文本生成問題幾乎都可以用這樣一個框架來表示。而有了這樣一個目標問題的表示之後,我們發現這個目標函數實際上不是一個合理的、有效的概率分佈,因爲它並沒有歸一化,要直接去找出其中的概率最高的樣本點是比較困難的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們提出了一個新方法——CGMH [9]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先我們從原始語料當中可以預訓練一個語言模型,例如現在比較流行的GPT2或者GPT3 
Our goal is to generate high-quality sentences from that red region, sentences that are both fluent and satisfy the constraints; almost every text generation problem can be expressed in this framework. Once the target is written this way, however, we find that the objective is not a properly normalized probability distribution, and directly searching for its highest-probability points is hard.

We proposed a new method, CGMH [9].

First we pre-train a language model on raw text, for example the now-popular GPT-2 or GPT-3 [10]. Then, starting from an initial sentence, we repeatedly edit it: each step may insert, replace, or delete one word. For the resulting new sentence, the Metropolis-Hastings algorithm decides whether to accept the edit or keep the previous sentence. Iterating these edits eventually yields high-quality sentences. That is the core idea of CGMH.
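A condensed sketch of this loop (our own toy simplification: the "language model" score and the proposal distribution are crude stand-ins, whereas real CGMH proposes words from a pre-trained LM and corrects for proposal asymmetry):

```python
import math, random

KEYWORDS = {"coffee", "cheap"}
VOCAB = ["the", "shop", "serves", "coffee", "cheap", "good", "very", "tea"]

def log_pi(sent):
    """Unnormalized target: a toy 'LM' score times the keyword-occurrence
    indicator (CGMH uses a real pre-trained language model here)."""
    if not KEYWORDS.issubset(sent):
        return -math.inf                      # constraint violated
    return -0.5 * len(sent)                   # stand-in for an LM log-probability

def propose(sent):
    """Randomly insert, replace, or delete one word."""
    sent, i = list(sent), random.randrange(len(sent))
    op = random.choice(["insert", "replace", "delete"])
    if op == "insert":
        sent.insert(i, random.choice(VOCAB))
    elif op == "replace":
        sent[i] = random.choice(VOCAB)
    elif len(sent) > 1:
        del sent[i]
    return sent

random.seed(0)
x = ["coffee", "cheap"]                        # start from the keywords themselves
for _ in range(2000):
    y = propose(x)
    delta = log_pi(y) - log_pi(x)
    # Metropolis-Hastings acceptance (proposal asymmetry ignored in this sketch)
    if delta >= 0 or random.random() < math.exp(delta):
        x = y
print(" ".join(x))
```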
[10]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/2227060afc691a3d1700f713018dd7b1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更復雜的限制(Constraints)是我們有一些邏輯的或者組合的限制,在這個情況下,要去做生成實際上就非常難了。比如我要把一個陳述句改成一個疑問句,同時關鍵信息要保留,不能缺失,就需要加上比較多的組合的限制以及邏輯語義上的限制。邏輯語義上的限制加了之後如何去做生成,這是比較難的一個問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/06\/0614fbdf6e39e84cf9c251d836d19871.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣,我們把它建模成採樣的形式,把目標函數分成兩部分,第一部分有語言模型,第二部分有限制,不過這裏的限制根據邏輯公式去做了一個構造,根據這個限制去做生成,我們提出了一個新的算法,叫TSMH(Tree Search enhanced Metropolis-Hastings),這個算法可以高效地針對目標函數去做採樣 [11]。這是介紹的帶限制的文本如何去做生成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ae\/aeb7351a57a0e0f9b156ee67470d13c5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我將介紹一下我們在神經網絡機器翻譯方面最新研究的方法,如何去提升神經網絡機器翻譯的性能。首先我要介紹鏡像生成式模型(Mirror Generative Model),這是2020年發表在ICLR會議上面的一個新方法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/2252827c95afdc435f4678964cb152db.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"鏡像生成式模型如何提升神經網絡機器翻譯"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們知道,神經網絡機器翻譯是非常喫數據的,一個好的翻譯模型需要大量的平行雙語語料來訓練。有很多的語對之間並沒有這麼大量的平行語料,例如對於中文到印第語的翻譯,實際上是無法找到中文和印第語之間大量的平行語料的。自然的一個問題是:我們能否利用單一語料去做訓練,例如英語到泰米爾語的翻譯當中,我們有大量的英語或者泰米爾語的單語語料,我們利用單語的語料和少量的平行語料一起來做更好的訓練。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/24\/247c96c0236314dd7b38ffd4ec2e5357.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何做到這一點呢?實際上當我們觀察人的語言能力,我們從中得到一個啓發。當一個人會中文和英文的時候,他必然同時具有四種能力:能用中文造句,能用英文造句,能把中文翻譯成英文,也能把英文翻譯成中文。實際上這裏代表了四種語言能力,我們把前兩種對應到神經網絡裏面的語言模型,把後兩種對應到神經網絡裏面的兩個翻譯方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/66\/668ce824df6575b9a7fd01e3f1669171.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼,我們能否做一個模型,使得它像人一樣只要會兩種語言,就會與這兩種語言相關的四種語言技能?答案是肯定的。我們可以在兩個語言句子X、Y之間引入一個隱變量Z,這個隱變量同時跟原語言以及目標語言有關。把它作爲一個橋樑之後,我們把四種技能都整合到一個模型裏面,做目標語言的生成,就是P(Y|Z),原語言到目標語言的翻譯就是P(Y|X、Z),原語言的語言模型就是P(X|Z),而目標語言到原語言的翻譯模型就是P(X|Y,Z)。如何把這四個概率都放到一個框架裏面去呢?我們有一個重要的發現,就是鏡像性(Mirror property),我們發現生成概率P(X,Y|Z),實際上可以寫成這樣對稱的形式,最終把它分解成四項,而這四項分別代表了原語言和目標語言的生成能力,以及原語言到目標語言、目標語言到原語言的翻譯能力。而把四個放到一起之後,我們就可以去聯合做優化,也就是我們提出的鏡像生成式神經機器翻譯模型(MGNMT) [12]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cf\/cf6d599c64a9ec01981755776e4ebe26.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用這個鏡像生成模型(MGNMT),我們在多個數據上都得到了最好的翻譯結果。在低資源的情況下,相對於傳統的Transformer或者Transformer聯合反向翻譯(Back Translation)的方式,MGNMT都有比較一致的、顯著的提高。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ae\/ae61f7507eef902733e29ec2e768e3cf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在高資源的情況下(例如英德語向),利用MGNMT加上額外的非平行語料之後,我們可以依然比Transformer加上反向翻譯的方法有明顯提升,並且驗證了非平行語料的數據是非常有用的,而MGNMT在低資源語向的提升會更大一些。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/9599a424767c7dec8ce183b28a64a8bc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"多語言翻譯預訓練"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剛纔提到兩個語言之間的翻譯,我們下一步要介紹的是我們如何做更多語言的翻譯。我們在最新的發表在今年EMNLP會議上的工作mRASP的論文當中就提出了一個多語言預訓練的方法。mRASP是一個全新的範式去訓練一個多語言翻譯模型,並且在很多場景裏面進行少量微調之後,就可以讓它在目標語對之間的翻譯有較大的提升 
[13]。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4f\/4f37211f3680576060f6fa793143012f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"世界上有非常非常多的語言,如果你數一下,真正有人使用的人類語言有超過6900種,我們這裏的目標是去構建一個統一的翻譯模型,能夠自動翻譯任何語對。這當然是機器翻譯的最終目標,這個目標也是非常具有挑戰性的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b5\/b50fda4097661d065815de056fc0b617.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們爲什麼要把很多門語言放在一起訓練?第一個現實的原因是,要訓練一個好的機器翻譯的模型需要大量的平行語對,而很多語對之間並沒有平行語料,所以很多語對之間是非常稀疏的。第二個原因是,根據我們的直觀經驗,在語對之間有很多共同的信息是可以遷移的。我們知道,如果一個人學德語需要花一年時間,他學法語也需要花一年的時間,這是單獨學習的情況。如果他花一年時間先學了德語之後,再去學法語,只需要花三個月時間就可以學會法語了。也就是說,當一個人有了學習德語的能力之後,再去學另一門語言,可以大大縮短他學習其他語言的時間,這就是常說的觸類旁通。這就給我們一個很大的啓示,我們在做多語言翻譯的時候,也許把很多語言放在一起學,總的代價可以比單獨學習各門語言的代價總和要小得多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cb\/cb1427c2b4bb12e36548bf0d9abc6ba2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從模型上來講,我們還有一個更深層次的目標,更偏數學的一個直觀想法是:假如單獨學習英語、法語、西班牙語、德語、意大利語等語言的翻譯,我們可能學到一個比較好的表示,但是這些表示之間都沒有相互的關係。其實這些語言之間,我們依然可以找到一些雙語的語對把它們連接起來,這些語對具有相同的意思。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e8\/e8dd05d1f5868d6750068a54f9e75dd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們就希望通過這些具有相同意思、並且在各個語言裏面都出現的一些句子作爲錨點,有了這個錨點之後,我們再去統一地學習所有的語言的表示,這樣就會學到一個更好的表示。在這個表示的框架下,一個句子即使在不同的語言裏面,只要它有同樣的語義,就會映射到同樣一個表示空間裏面的向量上面去。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f0\/f05ac8e51c1e1b80c8bdc131b59e01b3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這也是我們提出的mRASP核心思想。mRASP翻譯的模型是用基於Transformer的編解碼器(Encoder-Decoder),我們在輸入端加了編碼器(Encoder)的語言標識符去表示它輸入的語種,在解碼器(Decoder)做了一個額外的輸入是目標語言的語言標識符,表示它需要翻譯的語種。"}]},{"type":"paragraph","attrs":{
"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了使用大量的雙語平行句對來訓練之外,我們還利用這些平行句對做了一個數據增強。通過發明的隨機對齊替換(Random Aligned Substitution)的方法,我們把原句裏面一些詞通過同義詞詞典找到它對應的另外一種語言裏面的同義詞,然後做隨機替換,之後把替換後的源端句子和真正的目標句子再組合成一個僞平行句對,通過這樣的方式去做訓練之後,就可以得到一個比較好的模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過mRASP這個方法,我們在很多場景下去做了多種語言翻譯的測試,這裏面顯示了我們通過mRASP訓練了一個初始的模型,這個統一的模型我們在具體語對平行數據上又去微調。比如說這裏英語到白俄羅斯語(Be),我們應用mRASP預訓練好的模型在英語到白俄羅斯語微調之後得到的翻譯模型,和在英語到白俄羅斯語雙語語料上面直接訓練出一個Transformer翻譯模型做比較之後,發現mRASP可以大大提升翻譯的性能。在極低資源方向(Extremely-Low Resource Directions)以及低資源方向(Low Resource Directions)這兩種場景下,我們都發現mRASP這樣做預訓練微調之後會得到更好的翻譯,提升都在10個點以上。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cb\/cbc1dc4bf74b22811aa495a80e7ebdcc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在中等資源(Medium Resource)和高資源(Rich Resource,指雙語語對有100萬以上的平行語料)兩個場景下,我們發現mRASP方法仍然有比較大的提升,我們也和之前提出的所有其他方法做了對比,包括XLM、CTNMT、MASS以及mBART等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/dd\/dd650e405490fc2fa6c1f5f157651abd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們也做了另外一個實驗,mRASP是否對未見語種也有效?通過mRASP訓練了之後,我們在一些從來沒有見過的語對上面去做微調,例如從荷蘭語(Nl)到葡萄牙語(Pt)。這兩個語言都沒有在mRASP的預訓練語料裏面出現過,而且微調階段雙語平行語料只有1.25萬,非常少,如果直接在這個語對上面用Transformer去訓練的話,得不到任何有意義的結果,BLEU SCORE會是0。用mRASP預訓練好的模型,在荷蘭語到葡萄牙語的語料上面去微調之後,會得到一些有意義的翻譯結果,而BLEU SCORE也有了10個點的提升(從0漲到13)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fc\/fc5f3f2d92e83f75456cc1f1fcd83229.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單總結下我的演講內容。這裏我介紹了多模態協作機器人Xiaomingbot,也介紹了兩種從數據當中學到解耦隱表示(Disentangled Latent Representation)的方法,包括變分模板機VTM,用來做數據到文本的生成(Data-to-Text 
To briefly summarize the talk. I introduced the multimodal robot reporter Xiaomingbot; two methods for learning disentangled latent representations from data, namely the Variational Template Machine (VTM) for data-to-text generation and DSSVAE, which learns separate syntactic and semantic latent representations from text; and the DEMVAE method for learning meaningful latent representations and semantic clusters from raw text. I then showed how, when generation carries extra constraints, methods such as CGMH, MHA, and TSMH generate high-quality sentences that satisfy those constraints. Finally, I presented two new machine translation methods: the mirror-generative model MGNMT, which combines parallel and non-parallel data to learn all four language skills between two languages, and mRASP, which extends machine translation pre-training to a large number of language pairs, training them jointly into one strong model that is then fine-tuned on downstream translation tasks for very effective gains.

We have also open-sourced some of these algorithms and tools, including mRASP, whose training code and trained models are both released. We recently released LightSeq, a high-performance sequence inference toolkit [15]. It is optimized for Nvidia GPUs, with rewritten computation kernels for sequence generation, and on tasks such as machine translation it is more than 10 times faster than the TensorFlow version.

Finally, we have launched the Volcano Translate system; if you are interested, try it at translate.volcengine.cn. The Volcano Engine AI platform also integrates modules including video translation, machine translation, and smart simultaneous interpretation; see the Volcano Engine site, volcengine.cn, for details.

About the author:

Li Lei, Director of ByteDance AI Lab and Distinguished Scientist at ByteDance, holds a Ph.D. in Computer Science from Carnegie Mellon University and works on machine translation, machine writing, and intelligent robots.

Volcano Engine is ByteDance's digital services and intelligent technology brand. Building on the big data, AI, and infrastructure capabilities ByteDance uses to serve hundreds of millions of users, it provides enterprises with systematic, full-pipeline solutions to help them innovate pragmatically and achieve sustained, rapid growth. Volcano Engine offers dozens of enterprise products and services across six areas: data intelligence, visual intelligence, speech intelligence, intelligent applications, multimedia technology, and cloud native, covering the core needs of different kinds of enterprises at different stages of the business lifecycle, from development to application to operations.

References:

[1] E. Brynjolfsson, X. Hui and M. Liu, "Does machine translation affect international trade? Evidence from a large digital platform," Management Science, vol. 65, no. 12, pp. 5449-5460, 2019.
[2] R. Xu, J. Cao, M. Wang, J. Chen, H. Zhou, Y. Zeng, Y. Wang, L. Chen, X. Yin, X. Zhang, S. Jiang, Y. Wang and L. Li, "Xiaomingbot: A Multilingual Robot News Reporter," in the 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020.
[3] Z. Sun, J. Chen, H. Zhou, D. Zhou, L. Li and M. Jiang, "GraspSnooker: Automatic Chinese Commentary Generation for Snooker Videos," in the 28th International Joint Conference on Artificial Intelligence (IJCAI): Demo, 2019.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin, "Attention is All You Need," in NeurIPS, 2017.
[5] W. Shi, H. Zhou, N. Miao and L. Li, "Dispersed Exponential Family Mixture VAEs for Interpretable Text Generation," in Proceedings of the 37th International Conference on Machine Learning, 2020.
[6] R. Ye, W. Shi, H. Zhou, Z. Wei and L. Li, "Variational Template Machine for Data-to-Text Generation," in Proceedings of the International Conference on Learning Representations, 2020.
[7] B. Bao, H. Zhou, S. Huang, L. Li, L. Mou, O. Vechtomova, X. Dai and J. Chen, "Generating Sentences from Disentangled Syntactic and Semantic Spaces," in the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
[8] C. Hokamp and Q. Liu, "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search," in the 55th Annual Meeting of the Association for Computational Linguistics, 2017.
[9] N. Miao, H. Zhou, L. Mou, R. Yan and L. Li, "CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling," in the 33rd AAAI Conference on Artificial Intelligence, 2019.
[10] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child et al., "Language Models are Few-Shot Learners," in Advances in Neural Information Processing Systems, 2020.
[11] H. Zhang, N. Miao, H. Zhou and L. Li, "Generating Fluent Adversarial Examples for Natural Languages," in the 57th Annual Meeting of the Association for Computational Linguistics - short papers, 2019.
[12] M. Zhang, N. Jiang, L. Li and Y. Xue, "Language Generation via Combinatorial Constraint Satisfaction: A Tree Search Enhanced Monte-Carlo Approach," in the Conference on Empirical Methods in Natural Language Processing (EMNLP) - Findings, 2020.
[13] Z. Zheng, H. Zhou, S. Huang, L. Li, X. Dai and J. Chen, "Mirror Generative Models for Neural Machine Translation," in International Conference on Learning Representations, 2020.
[14] Z. Lin, X. Pan, M. Wang, X. Qiu, J. Feng, H. Zhou and L. Li, "Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information," in the Conference on Empirical Methods in Natural Language Processing, 2020.
[15] "LightSeq," [Online]. Available: https://github.com/bytedance/lightseq. [Accessed 2020].
