Xiaomi's Peng Li: The Technical Breakthrough for Knowledge Graphs Lies in Scenario-Specific Optimization

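{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"With the growth of the internet, knowledge graphs and deep learning have been widely adopted and have changed how data is acquired and computed across business scenarios. Knowledge graphs have become infrastructure for intelligent applications such as question answering and product recommendation, giving upper-layer services a basis for semantic understanding and explainability. Knowledge computing is a key step in building a knowledge graph: it represents, classifies, fuses, and models data, knowledge, experience, and information so that knowledge is expressed in a structure closer to human cognition."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To learn more about how Xiaomi explores and applies knowledge computing and knowledge graph technology in its business scenarios, ahead of the "},{"type":"link","attrs":{"href":"https:\/\/aicon.infoq.cn\/2021\/beijing\/schedule","title":"xxx","type":null},"content":[{"type":"text","text":"AICon Artificial Intelligence Conference"}]},{"type":"text","text":" (Beijing, 2021)"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", InfoQ interviewed Peng Li, head of the Knowledge Graph Platform team in Xiaomi's Artificial Intelligence Department, about how knowledge computing solutions are applied and advanced in Xiaomi's business scenarios."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Knowledge Graphs in Practice Across Xiaomi's Business Scenarios"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In an era of exploding data, knowledge graph technology is an important part of cognitive intelligence, and its importance is especially clear now that AI is closely tied to industry."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Since 2012, "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1NDA4NjU2MA==&mid=2247518792&idx=2&sn=da607ee3f988305570a9531a7a87c37d&chksm=fbea3387cc9dba91be577de89c3da1248635786e5cdc9b660fe665db2590eff0f7e785f001e9&scene=27#wechat_redirect","title":"xxx","type":null},"content":[{"type":"text","text":"knowledge graphs"}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" have gone through several stages of development. The earlier concepts trace back to the semantic networks of 1960, and a series of evolutions followed before today's knowledge graph took shape. In 1968, Turing Award winner Edward Feigenbaum developed DENDRAL, the world's first expert system, and later formally proposed the concept of knowledge engineering at the Fifth International Conference on Artificial Intelligence, with the goal of building knowledge into computer systems to solve complex problems that only domain experts could solve. In 1999, Sir Tim Berners-Lee, inventor of the World Wide Web and Turing Award winner, proposed the Semantic Web, whose core idea is to represent the internet with knowledge and build up common-sense knowledge. It developed slowly, however, because of its small scale and unclear application scenarios. As a result, before 2012 both academia and industry generally regarded knowledge graph technology as being at an early stage of development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As a company that started in hardware, Xiaomi was not an early mover in knowledge graphs. According to Peng Li, head of the Knowledge Graph Platform team in the Artificial Intelligence Department, when he joined Xiaomi in 2018, "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/1yPMvABqfOSllIBPimpv","title":"xxx","type":null},"content":[{"type":"text","text":"Xiaomi"}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"'s knowledge graph platform had only just started. It still ran like a small workshop: processes and processing logic were not standardized, and basic infrastructure such as workflow control and data management was missing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"After joining in 2018, Peng Li's first phase of work covered schema-layer construction, building the graph ingestion pipeline, abstracting computation logic units, the release process, and quality control, in order to standardize ingestion and improve the quality and efficiency of knowledge ingestion. The second phase focused on refining and optimizing key data and algorithms according to specific business needs. The third phase optimized service efficiency and explored application scenarios for industry graphs. Over these three phases, the team's knowledge acquisition, knowledge alignment, link prediction, and entity linking "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/p3DY1hdJ2QTYLylQkrhI","title":"xxx","type":null},"content":[{"type":"text","text":"algorithms"}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" were iterated and optimized along several dimensions: from nothing to something, from shallow to deep, and from slow to fast."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"These phases sound simple but were not easy in practice, and all kinds of problems came up during technical iteration. Asked about the difficulties, Peng Li gave the example of iterating on the entity linking algorithm. He said that XiaoAi is one of the key businesses the knowledge graph team serves, and in the XiaoAi question-answering scenario the entity linking algorithm ran into two problems: missing semantics and a high concurrency requirement. First, XiaoAi user queries are usually short. Taking person-related queries as an example, about 81% are short texts containing a single entity, which leaves the entity without context or semantics and makes entity disambiguation in entity linking difficult. Second, one use of entity linking is to support short-text understanding, and the business scenario requires the algorithm to sustain 2,000 QPS. For the first problem, since the text itself has no context, the team could only start from users' prior features. Earlier entity linking work also used statistical features such as entity popularity, but adding that feature alone brought no clear improvement. Since quality was evaluated at the user level anyway, they brought in features such as user likes, shares, user search popularity, and entity popularity and built an MLP-based coarse ranking layer. Evaluated on its own, the coarse filtering showed a clear effect. For speed, the computational bottleneck was mainly entity disambiguation, so they ranked in two stages there. The coarse ranking was first used to prune candidates, with two goals: reducing computation and reducing noise. A deep model then performed fine ranking to improve precision, sped up with fast-transformer and model quantization. After this rework, random-sample accuracy improved by XX and the service's computational performance improved 30-fold. The person responsible for the task was also very excited about the performance gain. The team applied the method in the CCKS competition and ultimately took first place in the entity linking track."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"With the iteration problems solved, the next step was to make the technology serve the business. Xiaomi has many applications of knowledge graphs and knowledge computing. During the COVID-19 outbreak last year, for example, the team applied knowledge graphs to epidemic prevention and published the solution in the IEEE knowledge graph case collection on resuming work and production, which earned praise from national bureau leadership. In Xiaomi's own e-commerce business, they combined users, products, and scenarios into an e-commerce graph, applied key knowledge computing techniques such as recommendation to Xiaomi Youpin and Xiaomi's online store (小米網), and published the case in 認知智能時代:知識圖譜實例案例集 (The Era of Cognitive Intelligence: A Collection of Knowledge Graph Cases). Many other cases involved optimizations and innovations tailored to specific business needs, such as fault detection in smart factories and intelligent materials procurement."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"There are many more cases like these. Knowledge computing has not only served the business but also unified the company's knowledge system, improved efficiency across departments by delivering knowledge as a service, and closed the loop on knowledge accumulation. Through this continued exploration, Xiaomi's knowledge graph technology has matured and the team has grown more cohesive."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Technical Challenges and Breakthroughs for Knowledge Graphs Today"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Behind technology that serves the business, what matters is how the technology is built. Knowledge graph construction has moved from manual construction, to crowd-intelligence construction, to automatic acquisition and construction. Manual and automated construction each have strengths and weaknesses, and Peng Li considers the contrast clear: manual construction yields limited volume with high precision and fine granularity, but at high cost; automatic construction yields large volume at low cost, with somewhat lower precision than manual construction and coarser granularity. Depending on who does the work, manual construction can be divided into expert construction and crowdsourced construction. Expert-built knowledge is precise and trustworthy, but experts are scarce and expensive. Crowdsourced knowledge depends on uncontrollable factors such as contributors' background knowledge and diligence, so the data may be contaminated to varying degrees. Automatic construction concentrates effort on algorithm optimization and needs relatively little labor. Because it usually targets open text, the volume of knowledge is typically far larger than with manual construction, but precision is affected by fluctuations and changes in the data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Overall, knowledge construction today mostly combines manual and automated methods, with automation doing most of the work and people assisting with quality control. But where an industry's knowledge is only sparsely covered by general-purpose knowledge, automatic construction stops working and manual construction by experts takes over."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Manual or automated, building a knowledge graph is a complex systems engineering effort, and no single technique fits every scenario. Companies across the industry have been building and applying knowledge graphs in the domains they know best. For industry knowledge, the framework is essentially a top-down construction process made up of key stages: knowledge modeling, knowledge acquisition, knowledge fusion, knowledge reasoning, knowledge storage, and knowledge application. General-purpose frameworks and algorithms are rare, however, and most teams adapt each stage to the needs of their own applications. There are also enterprise construction platforms such as PoolParty, lods, and Stardog, but they offer limited control over business compatibility and adaptation, which makes them a poor fit for extending and computing over in-house business."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Many in the industry now argue that knowledge graph technology has reached a general-purpose plus multi-source heterogeneous stage. On this point, Peng Li said the multi-source heterogeneous stage has existed for quite some time, and that one of the strengths of knowledge graphs is fusing and aligning multi-source heterogeneous data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"A graph's knowledge can come from open datasets or from vertical resource sites. The sources are numerous, and the data varies widely in form and organization (text, images, video, audio, time series, and so on), so knowledge alignment and fusion is an important part of knowledge computing. Multi-source heterogeneous scenarios are common. In credit verification for government services, for example, a user's social security payments, property records, rental records, and insurance records are spread across different departments, each with its own storage and structure, and this information has to be aggregated before it can support higher-level analysis and decision-making."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Once a technology reaches a certain stage, it inevitably runs into bottlenecks. Industry and academia have different goals. Industry's goal is deployment, and deploying knowledge graphs raises fairly fine-grained problems: how to extract multi-source heterogeneous data with high quality, how to fuse and align that data, how to build an efficient general-purpose construction framework, how to get the most value out of the graph in applications, and how to make knowledge graphs capable of complex knowledge reasoning. These are technical problems that have to be solved one by one. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Peng Li said frankly that the way to break through on these problems is still to optimize specifically for one's own scenarios"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Knowledge graph applications in industry have recently boomed. They appear in electric power, healthcare, finance, justice, energy, government services, bio-genomics, and other scenarios that involve semantic understanding and knowledge reasoning. Even so, the complex reasoning capability of knowledge graphs, and the performance of that reasoning, still have much room to improve in the era of cognitive intelligence. Once complex reasoning improves, knowledge graphs will be applied more deeply and widely in scenarios that depend on explainability and understandability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" \n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Interviewee:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Peng Li is head of the Knowledge Graph Platform team in Xiaomi's Artificial Intelligence Department. He worked at Baidu from 2012 to 2018 and joined Xiaomi in May 2018, where he now leads the graph platform team in the Knowledge Graph Department. He is mainly responsible for building and deploying Xiaomi's knowledge graph, and has brought knowledge graph technology to business scenarios such as intelligent question answering, intelligent customer service, product recommendation, and product search in XiaoAi (小愛同學), Xiaomi's online store (小米網), and the Game Center."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Recommended event:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At AICon Beijing on November 6, Wang Bin, head of Xiaomi AI Lab, is producing the track “Frontier Explorations in Cognitive Intelligence”. Besides Peng Li's talk on knowledge computing, the track includes talks on Meituan's exploration of new-retail knowledge graphs, Alibaba's multimodal pre-training models, and graph neural networks in practice from 郵電大學 (University of Posts and Telecommunications). For details, follow the link ["},{"type":"link","attrs":{"href":"https:\/\/aicon.infoq.cn\/2021\/beijing\/schedule","title":"xxx","type":null},"content":[{"type":"text","text":"AICon Artificial Intelligence Conference"}]},{"type":"text","text":"]. We hope the talks in this track give you more to think about."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/76\/60\/766cfe85a0232ab3a893ce9063340560.jpg","alt":null,"title":null,"style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}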