Language AI Is Really Heating Up

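{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In just a few years, deep learning algorithms have advanced to the point where they can beat the world's best players at board games and recognize faces as accurately as humans do (or perhaps even better). But mastering the unique and far-reaching complexity of human language has proven to be one of the toughest challenges facing AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Could that be about to change?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If computers could effectively understand all human language, it would completely transform how we engage with brands, businesses, and organizations around the world. Today, most companies do not have the time to answer every question their customers ask. But imagine a company that could listen to, understand, and answer every question, at any time and on any channel. To seize this huge opportunity, my team is already working with some of the world's most innovative organizations and their ecosystems of technology platforms to establish one-to-one customer conversations at scale. But there is still a great deal of work to do."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It took until 2015 to build an algorithm that could recognize faces with an accuracy comparable to that of humans. Facebook's DeepFace is 97.4% accurate, just shy of the 97.5% that humans achieve. For reference, "},{"type":"link","attrs":{"href":"https:\/\/www.theverge.com\/2014\/7\/7\/5878069\/why-facebook-is-beating-the-fbi-at-facial-recognition","title":"","type":null},"content":[{"type":"text","text":"the FBI's facial recognition algorithm"}]},{"type":"text","text":" reaches only 85% accuracy, meaning it is still wrong in more than one out of every seven cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The FBI's algorithm was handcrafted by a team of engineers. Every feature, such as the size of the nose and the relative position of the eyes, was programmed manually. Facebook's algorithm works with learned features instead. Facebook used a special deep learning architecture called a convolutional neural network, which mimics how the different layers of our visual cortex process images. Because we do not know exactly how we see, the connections between these layers are learned by the algorithm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Facebook was able to pull this off because it figured out how to put the two essential components of human-level AI in place: an architecture that can learn features, and high-quality data labeled by millions of users who tagged their friends in the photos they shared."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Language is in sight"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Vision is a problem that evolution has solved in millions of different species. Language, however, appears to be far more complex. As far as we know, we are currently the only species that communicates with a complex language."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Less than a decade ago, AI algorithms tried to understand what a text was about simply by counting how often certain words occurred. But this approach clearly ignores the fact that words have synonyms and only carry meaning within a particular context."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2013, Tomas Mikolov and his team at Google discovered how to create an architecture that can learn the meaning of words. Their "},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Word2vec","title":"","type":null},"content":[{"type":"text","text":"word2vec"}]},{"type":"text","text":" algorithm maps synonyms close to one another, can model aspects of meaning such as size, gender, and speed, and can even learn functional relations such as countries and their capitals."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The missing piece, however, was context. The real breakthrough in this field came in 2018, when Google introduced the "},{"type":"link","attrs":{"href":"https:\/\/venturebeat.com\/2018\/11\/02\/google-open-sources-bert-a-state-of-the-art-training-technique-for-natural-language-processing\/","title":"","type":null},"content":[{"type":"text","text":"BERT"}]},{"type":"text","text":" model. Jacob Devlin and his team repurposed an architecture typically used for machine translation and made it learn the meaning of a word from its context in a sentence."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By teaching the model to fill in missing words in Wikipedia articles, the team was able to embed the structure of language in the BERT model. With only a limited amount of high-quality labeled data, they were able to fine-tune BERT for a wide range of tasks, from finding the right answer to a question to really understanding what a sentence is about. They were the first to truly nail the two essentials of language understanding: the right architecture and large amounts of high-quality data to learn from."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2019, researchers at Facebook took this a step further. They trained a BERT-like model on more than 100 languages simultaneously. The model can learn a task in one language, for example English, and then perform the same task in any of the other languages, such as Arabic, Chinese, and Hindi. This language-agnostic model performs as well as BERT on the language it was trained on, and moving from one language to another costs only a limited amount of performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All of these techniques are impressive in their own right, but it was in early 2020 that researchers at Google were finally able to surpass human performance on a broad range of language understanding tasks. Google pushed the BERT architecture to its limits by training a larger network on more data. The resulting model, known as T5, now does better than humans at labeling sentences and finding the right answer to a question. The language-agnostic mT5 model, "},{"type":"link","attrs":{"href":"https:\/\/venturebeat.com\/2020\/10\/26\/google-open-sources-mt5-a-multilingual-model-trained-on-over-101-languages\/","title":"","type":null},"content":[{"type":"text","text":"released in October"}]},{"type":"text","text":", is almost as good as bilingual humans at switching from one language to another, but it can handle more than 100 languages at once. And "},{"type":"link","attrs":{"href":"https:\/\/venturebeat.com\/2021\/01\/12\/google-trained-a-trillion-parameter-ai-language-model\/","title":"","type":null},"content":[{"type":"text","text":"the trillion-parameter model Google has just announced"}]},{"type":"text","text":" makes this kind of model bigger and more powerful still."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Possibilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Imagine chatbots that can understand what you write in any language imaginable. They will truly comprehend the context and remember past conversations. The answers you get will no longer be generic but right to the point."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over time, as companies invest in fine-tuning these models, we will see limited applications emerge. And if we trust Moore's Law, we could see more complex applications in about five years. But new models will also emerge that outperform the T5 algorithm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the start of 2021, we are within touching distance of AI's most significant breakthrough and the endless possibilities it will unlock."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pieter Butters is Director of Engineering for Machine Learning and AI at Sinch."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/venturebeat.com\/2021\/01\/17\/language-ai-is-really-heating-up\/"}]}]}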