GPT-2 vs. GPT-3: An OpenAI Showdown
Thanks to the diversity of the datasets used during training, these models can generate convincing text across many different domains. GPT-2 was trained on roughly ten times the parameters and data of its predecessor, GPT; GPT-3, in turn, has more than one hundred times as many parameters as GPT-2. So the question becomes: which Transformer should you choose?

Which Transformer should I choose: GPT-2 or GPT-3?

The Generative Pretrained Transformer (GPT) is OpenAI's innovation in natural language processing (NLP). These models are considered the most advanced of their kind — and potentially dangerous in the wrong hands. A GPT is an unsupervised generative model: it takes an input such as a sentence and tries to produce an appropriate response, and the data used to train it is unlabeled.

[Image: https://static001.geekbang.org/resource/image/0b/19/0bae4b3ec72027da4d104a424922c319.jpg]

What is GPT-2?

[Image: https://static001.geekbang.org/resource/image/e8/73/e84b399237fe31188f610f40dd904773.jpg]

GPT-2, short for "Generative Pretrained Transformer 2", is a Transformer-based language model that OpenAI released in February 2019. It is trained without supervision toward a single objective: predicting the next word in a sentence. The model is open source and has more than 1.5 billion parameters, which it uses to generate the next sequence of text for a given sentence.

Thanks to the diversity of the dataset used in training, it can produce adequate generations for text from a range of domains. GPT-2 has ten times the parameters and was trained on ten times the data of its predecessor, GPT.

Language tasks such as reading, summarization, and translation can be learned by GPT-2 from raw text, without domain-specific training data.

Some limitations of natural language processing

Certain limitations must be kept in mind when working with natural language generation. It is a very active research area, but one still in its infancy that has not yet overcome its shortcomings. These include repetitive text, misunderstanding of highly technical and specialized topics, and misinterpretation of contextual phrases.

Language and linguistics form a complex, vast field that usually takes humans years of training and exposure to master — not only understanding what words mean, but also how context shapes sentences, how to give answers, and how to use appropriate slang. The approach can also be used to build custom, scalable models for different domains. One example OpenAI gives is training GPT-2 on the Amazon Reviews dataset, teaching the model to write reviews conditioned on attributes such as star rating and category.

What is GPT-3?

[Image: https://static001.geekbang.org/resource/image/45/6c/45b908558a3200b4f9321fa7de45fe6c.jpg]

In short, GPT-3 — "Generative Pretrained Transformer 3" — is the third release of the GPT model and an upgrade over GPT-2. The third version takes the GPT model to a whole new level: it was trained with 175 billion parameters, more than one hundred times as many as GPT-2's 1.5 billion.

GPT-3 was trained on an open-source dataset called Common Crawl (https://commoncrawl.org/), along with other text selected by OpenAI, such as Wikipedia articles.

GPT-3 was created to be more robust than GPT-2, in that it can handle more niche topics. GPT-2 was known to perform poorly when given tasks in specialized areas such as music and storytelling. GPT-3 goes further, handling tasks such as answering questions, writing essays, summarizing text, translating between languages, and generating computer code. That it can generate computer code is a major feat in itself. You can see some GPT-3 examples here (https://gpt3examples.com/).

Many programmers have long worried about being replaced by artificial intelligence, and that worry now seems to be edging toward reality. With the spread of deepfake videos, AI-driven voice and text have also begun to imitate humans. Before long, when you make a phone call or communicate online (in a chat app, for example), it may be hard to tell whether you are talking to a real person or to an AI.

GPT-3 as a sequential text prediction model

Although GPT-3 is still a language prediction model, a more accurate description might be a sequential text prediction model. Its algorithmic architecture is considered the most advanced of its kind because of the sheer amount of pretraining data it uses.

GPT-3 generates sentences by taking an input, interpreting the meaning of the language semantically, and attempting to output a meaningful sentence for the user. Because it does not use labeled data, the model has no notion of what is right or wrong — this is unsupervised learning.

Because these models can automate many language-based tasks, they have become increasingly well known and popular, for instance when users communicate with companies through chatbots. GPT-3 is currently in a private beta, meaning that users who want to try the model must sign up on a waiting list. It is offered as an API accessed over the cloud. For now, these models are available only to individuals and businesses with the resources to access them.

One example of this model at work can be seen with the prompt "I want to go out to play so I went to the ____". A good response here would be something like "park" or "playground", rather than something like "car wash".

Conditioned on the prompt text, the probability of "park" or "playground" is therefore higher than the probability of "car wash". When the model is trained, it is fed millions of text samples, which it converts into numeric vector representations. This is a form of data compression that the model uses to turn text back into valid sentences. The compression and decompression process improves the accuracy with which the model computes the conditional probabilities of words. It opens up a whole new world of possibilities — but it also has its limitations.

Some limitations of GPT-2 and GPT-3

Although the Generative Pretrained Transformer is a great milestone in the artificial intelligence race, it is not equipped to handle complex and long-form language. Imagine a sentence or paragraph containing vocabulary from a highly specialized field such as literature, finance, or medicine: without sufficient prior training, the model cannot respond appropriately.

Given the enormous demands on computing resources and power, this is not currently a feasible solution for the general public. Billions of parameters require staggering computing resources to run and train.

It is also a black-box model. In a business setting, what users need most is to understand the underlying processes. GPT-3 remains closed to the public, with access limited to a select few. Potential users must register their interest and wait for an invitation before they can test the model themselves. This is done to prevent abuse of such a powerful model: an algorithm that can reproduce human language patterns carries many ethical implications for society as a whole.

GPT-3 outperforms GPT-2

With its stronger performance and significantly more parameters, GPT-3 covers text on more topics and is clearly better than its predecessor. The model is so advanced that, even with its limitations, OpenAI has decided to keep it locked down and release it only to selected individuals who submit their reasoning for using the model. Ultimately, releasing it as an API lets OpenAI control requests and minimize abuse of the model.

Another point worth noting: in September 2020, Microsoft announced an "exclusive" license for GPT-3. Others can still use the public API to receive output, but only Microsoft has control of the source code. For this reason, EleutherAI (https://www.eleuther.ai/) has been working on its own Transformer-based language models, loosely designed around the GPT architecture. One of its goals is to use its own GPT-Neo to replicate a GPT-3-sized model and open-source it to the public for free. You can follow GPT-Neo's progress in the GitHub repo (https://github.com/EleutherAI/gpt-neo).

AI still has a long way to go before it deals a decisive blow to the field of language generation, because these models have not yet mastered the nuances of human language. The precision and variety of tasks that need to be handled still exceed current capabilities. But with the rapid progress of new GPT models, the next big breakthrough may be just around the corner.

About the author:

Kevin Vu manages the Exxact Corp blog and works with many talented writers who cover different aspects of deep learning.

Original link:

https://www.exxactcorp.com/blog/Deep-Learning/gpt2-vs-gpt3-the-openai-showdown
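The fill-in-the-blank example above ("I went to the ____") comes down to conditional next-word probabilities. The sketch below is not GPT — it is a deliberately tiny bigram-style counter over an invented corpus — but under those simplifying assumptions it shows why "park" ends up more probable than "car wash" given the prompt:

```python
from collections import Counter

# Toy illustration (NOT GPT itself): estimate the conditional probability
# P(next word | previous word == "the") by counting over a tiny, invented
# corpus. GPT models learn far richer contextual statistics, but the core
# idea -- next-word prediction via conditional probabilities -- is the same.
corpus = [
    "i want to go out to play so i went to the park",
    "i want to go out to play so i went to the park",
    "i want to go out to play so i went to the playground",
    "i drove to the car wash",
]

# Count every word that appears immediately after "the".
follow_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        if prev == "the":
            follow_counts[nxt] += 1

# Normalize counts into conditional probabilities.
total = sum(follow_counts.values())
probs = {word: count / total for word, count in follow_counts.items()}

# Given this data, "park" is the most probable completion of
# "... I went to the ____", ahead of "playground" and "car".
print(probs)
```

Swap in a larger corpus and longer contexts and this counting idea becomes an n-gram language model; GPT replaces the counting with a Transformer trained on billions of tokens.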
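The article also notes that training text is converted into numeric representations before the model can work with it. Real GPT models use learned byte-pair-encoding tokenizers and dense embedding vectors; the minimal sketch below only illustrates the first stage of that pipeline — mapping words to integer IDs — and every name and sentence in it is invented for this example:

```python
def build_vocab(sentences):
    """Assign a stable integer ID to every distinct word."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into a list of integer IDs."""
    return [vocab[word] for word in sentence.split()]

def decode(ids, vocab):
    """Invert the encoding: integer IDs back to words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

sentences = ["i want to go out to play"]
vocab = build_vocab(sentences)
ids = encode("i want to go out to play", vocab)

# The round trip is lossless: the IDs fully determine the text.
assert decode(ids, vocab) == "i want to go out to play"
print(ids)
```

Note how the repeated word "to" maps to the same ID each time it appears — that reuse of shared symbols is what makes the numeric form a kind of compression.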