Is the Fusion of Deep Learning and Big Data Systems a Dead Direction?

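{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2016, artificial intelligence entered the public eye as a Go champion, and deep learning research blossomed everywhere. One direction, however, stayed rather niche: the fusion of deep learning with big data systems. Four years on, the field has drawn little industry attention and has few results to show. Under those circumstances, is it still worth pursuing research on fusing deep learning with big data systems?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To find an answer, InfoQ interviewed Li Li (李立), an expert researcher in the Value-Added Services Department of Tencent Interactive Entertainment. As an expert in artificial intelligence, he has his own views on where this research direction stands and where it is heading. At the QCon Global Software Development Conference (Shenzhen), held December 6-7, 2020, he will also give a talk in the “Frontier Directions and Practical Applications of Artificial Intelligence” track, titled “"},{"type":"link","attrs":{"href":"https:\/\/qcon.infoq.cn\/2020\/shenzhen\/presentation\/2895","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"Thoughts on and Applications of Fusing Deep Learning with Big Data Systems"}]},{"type":"text","text":"”, sharing more of the thinking and exploration he and his team have done on the future of this direction."}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Why Combine Deep Learning with Big Data Systems?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Big data exists as the foundation of deep learning,” Li Li says. By this he means that “the data in big data systems is the fuel for deep learning; without the massive data those systems hold, many deep learning models would end up overfitting.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As is well known, the development of artificial intelligence depends on three elements: data, algorithms, and computing power. Of these, data is a crucial foundation, which is why the A+B+C model (AI + Big Data + Cloud Computing) has become standard equipment for many companies building out AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Big data systems and deep learning training systems are usually two separate, independent systems. Data in the big data system is transferred over IO to the deep learning training system, where training then takes place."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, setting up a separate cluster for deep learning forces developers to build multiple programs for a single machine learning pipeline. "},{"type":"text","marks":[{"type":"strong"}],"text":"Separate clusters require moving large datasets between them, which introduces unnecessary system complexity and end-to-end learning latency"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fusing deep learning with big data systems therefore means connecting the two systems. Concretely, deep learning models are trained directly on the big data cluster."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2017, when deep learning was all the rage, quite a few companies explored this direction."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The best-known project is TensorFlowOnSpark, open-sourced by Yahoo."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On February 13, 2017, Yahoo announced that it was open-sourcing TensorFlowOnSpark, a project that brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining the salient features of the deep learning framework TensorFlow and the big data framework Apache Spark, TensorFlowOnSpark makes distributed deep learning easy to set up."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Other companies later released similar tools. On June 28, 2019, for example, Alibaba launched Flink-AI-Extended, which combines TensorFlow and Flink and aims to give users a more convenient and useful tool."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet in both industry and academia this direction remains quite niche, and some attempts have failed. Li Li says there is not a single killer application scenario. On this he comments: “One conclusion from our thinking about this direction is that, for the fusion of deep learning and big data systems, "},{"type":"text","marks":[{"type":"strong"}],"text":"fully supporting every type of deep learning is unrealistic; the direction has to find the scenarios that suit it"},{"type":"text","text":".”"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Is the Fusion of Deep Learning and Big Data a Dead Direction?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Judging from the current state of things, the fusion of deep learning and big data systems is a dead direction (a dead end),” Li Li told InfoQ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Li Li gives two reasons. First, Kubernetes has matured, and the current mainstream practice is to build distributed deep learning training clusters on K8S. Second, because big data clusters lack mature GPU scheduling, the fused approach is poorly suited to training compute-intensive models in natural language processing, computer vision, and audio processing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He adds, however: “That said, with a new positioning and a new choice of route, the fusion of deep learning and big data systems can still find its own value.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the direction does find its value, that will push big data systems to improve their GPU scheduling further. In Li Li's view, GPU scheduling in Spark, Storm, and Flink is all relatively immature today, and progress has been slow, because no sufficiently large use case requires big data systems to have mature GPU scheduling. Once the fusion of deep learning and big data proves to be of substantial value, big data systems will have ample reason and demand to develop their GPU scheduling further."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusion"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although the fusion of deep learning and big data systems has been developing for years, it is still not accepted in industry practice, and some consider it a dead direction. Li Li and his team have nonetheless uncovered some new possibilities. For more details, come to "},{"type":"link","attrs":{"href":"https:\/\/qcon.infoq.cn\/2020\/shenzhen\/","title":"xxx","type":null},"content":[{"type":"text","text":"QCon Shenzhen"}]},{"type":"text","text":" and talk with Li Li in person."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/9e\/a4\/9eda9f5ac8b9fyy68c4cf50e100e64a4.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"The conference is approaching. Scan the QR code in the image or "},{"type":"link","attrs":{"href":"https:\/\/qcon.infoq.cn\/2020\/shenzhen\/schedule","title":"xxx","type":null},"content":[{"type":"text","text":"click here"}],"marks":[{"type":"size","attrs":{"size":10}}]},{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":" to view the conference schedule. Conference inquiries: 17310043226 (same number on WeChat)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the Interviewee"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Li Li is an expert researcher in the Value-Added Services Department of Tencent Interactive Entertainment. A graduate of the Department of Computer Science at Peking University, he has extensive experience in technology R&D. His main research areas include machine learning, recommender systems, and game AI bots. He has participated in and led several national research projects, published multiple academic papers, and holds multiple invention patents."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}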