Multimodal Search Technology in the 5G+ Intelligent Era

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the release of the iPhone 4 in 2010, smartphones came into wide use among everyone from university students to the elderly and children, and the mobile internet boomed. Over the past two years, 5G has made downloads faster and faster, and alongside traditional text search, newer modes such as voice search and image search now appear in more and more products."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This talk introduces multimodal search technology in four parts:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multimodal search: born on mobile, thriving in the 5G+ intelligent era"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Voice search: hear clearly + understand + satisfy"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visual search: what you see is what you get"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Breaking out of the circle”: unlimited possibilities"}]}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Multimodal search: born on mobile, thriving in the 5G+ intelligent era"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
The concept of multimodal search"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1b\/1b06435f52f4b7b931fa937242b5b43b.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multimodal search takes two forms: visual search and voice search. In the Baidu app, the voice button at the bottom is the entry point for voice search, and the camera button at the right of the search box is the entry point for visual search. Voice search is a good substitute for typed search, while visual search helps users easily find the information behind an image."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Why Baidu began building up multimodal search technology in 2015"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/da\/daef4e6513dbe872d5d77d654b8e9f02.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The arrival of the smartphone era, led by the iPhone, made voice input possible."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4G networks greatly increased upload and download speeds, so uploading images was no longer difficult."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our user base expanded from young and middle-aged adults outward to children and the elderly."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 
How multimodal search changes in the new 5G era"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e5\/e58f3bf7d05ec85a95ca24060ae6e1cf.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Immersive experiences. With the greater bandwidth of 5G, we need experiences that go beyond video and are more immersive."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lower latency. As 5G rolls out, and especially as services are deployed across the three tiers of cloud, edge and device, many models can be moved forward from the cloud onto the device, which may bring considerable convenience."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"New hardware. The widespread use of smart speakers, Bluetooth earphones, smartwatches and smart glasses further drives user demand for multimodal search."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Voice search: hear clearly + understand + satisfy"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
The three goals of voice search: hear clearly, understand, satisfy"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hear clearly: accurately convert the speech signal into text. There are many challenges here: ① noisy environments; ② dialects; ③ speech that is too quiet."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Understand: even once speech has been converted to text, we cannot simply hand that text to the search engine as we would a traditional typed query. The reasons: ① colloquial phrasing; ② long-tail queries; ③ consecutive searches. For example, a user may first ask “What time is it in London now?”, but the next time, instead of saying “What time is it in Paris now?”, they will simply ask “What about Paris?”"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Satisfy: at some voice entry points, such as smart speakers, we cannot realistically read the top ten search results aloud to the user; we can only give the single most accurate top-1 result."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Technical approach"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0bab7691f9b7ac610626c98bae363115.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The overall technical framework for these three stages is as follows:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the hear-clearly stage. On the input side there are two steps: ① speech recognition, which mainly converts the acoustic signal into raw text; ② speech error correction, which rewrites the user's original wording into a query that the search engine can actually understand. On the output side, when the content is finally presented, it passes through broadcast-text generation and speech synthesis to make the interaction more natural."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The understand stage has four main parts: ① 
query generalization, which maps users' long-tail phrasings onto higher-frequency queries that the search engine understands better; ② spoken-language understanding, which can be cast as a QA problem; ③ context understanding; ④ management of the whole search session."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The satisfy stage. Standing on the shoulders of Baidu's general search, certain scenarios require more precise answers for users, so this stage relies on intelligent question answering and knowledge-graph technology, and ultimately provides some specific services."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Visual search: what you see is what you get"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Goal"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/87\/8710a6561146612c3e8828cfee675487.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The goal of visual search is “what you see is what you get”: whether the user takes a photo with the phone or views something live through the camera, we can return the content behind it. There are roughly three challenges here."}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Interaction. Interaction technology is an important stage that strongly affects how efficiently users can interact."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Perception. In text search, users express a passage or their need through fairly high-level abstractions; visual search instead has to perceive at the pixel level and build up higher-level, object-level information."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recognition. Understanding the information behind each object that a set of pixels represents."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
Achievements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After several years of work, Baidu has built up solid technical experience and a visual perception and search engine that is among the more advanced worldwide. In terms of interaction, the phone can give the user good perceptual feedback in roughly 100 milliseconds. The system covers more than 60 scenarios and indexes more than 80 million kinds of entities, several billion products and more than 100 billion images."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Visual technology"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f9\/f9c5eb7d5b8932fb5fa15eed947266ff.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visual search technology is divided into roughly three layers:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first layer is visual perception, which is computed mainly on the user's phone. It includes 2D and 3D detection, 2D and 3D tracking, some simple scene recognition, and support for AR positioning and rendering."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second layer is visual recognition: once perception has completed on the phone, we run a more detailed information search on the perceived objects to satisfy the user's need."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The third layer is foundational technology, which supports the perception and recognition layers above. It includes image, text and video understanding, perception technologies for the human body, faces and so on, as well as basic cloud-side and device-side performance optimization and multimodal QA."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 
Visual perception pipeline"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Visual perception is built as a framework for perception computing and MR interaction over a video stream. The framework runs entirely on the device and consists of six main steps:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Detection and segmentation. This step finds the basic objects in the frame and their object types."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tracking. Because the interaction is continuous and the frame keeps moving, tracking and localization are needed to hold on to the exact position of each tracked object."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Coarse-grained understanding. The device performs a simple analysis of the traffic so that requests can be routed accurately."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AR presentation. After the cloud returns the search results, the result information is presented in AR."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MR interaction. Through finger gestures, body movements or facial expressions, users can interact further with the AR content."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Application scenarios. Finally, the framework supports several existing product forms, including dynamic multi-object recognition, photo-based question search, AR translation and real-time word lookup."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Evolution of the visual perception algorithms"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ec\/ec749606f6f2837b72dcc2d13c877f2e.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"
pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2017 we made our first attempt at on-device object detection; the goal was a lightweight model."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second stage, in 2018, addressed a new problem that arises when detecting on consecutive frames: detection stability."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From 2019 we wanted to further improve detection of small objects, and did some work on multi-scale detection and automatic network architecture search."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2020 the focus shifted to multi-stage distillation and exploring anchor-free models to further improve overall detection quality."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Algorithm iterations"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First-generation detector: in the initial lightweight stage we went straight for a one-stage detection method, using the public MobileNet-v1 combined with pruning to speed up the model. We also made some simple optimizations to the loss at the layer level, and tried focal 
loss."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second-generation detector: as mentioned above, we found a new problem. During continuous detection the detector's output can change drastically from frame to frame, which is the root cause of unstable detection across consecutive frames. We proposed, for the first time, a definition of the problem and a formula to quantify it, and by combining information from multiple frames we resolved the stability issue well while preserving performance."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third-generation detector: the main aim was to improve recall on small objects, approached from two directions. The first was the overall network architecture, since YOLOv3 is friendlier to small objects; the second was introducing network architecture search, which further improves results on small objects."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fourth-generation detector: we observed that although the improved YOLOv3 already performs well, there is still a large gap compared with RetinaNet-50 or even larger models. We therefore hope to further improve detection accuracy through distillation."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Visual recognition and retrieval pipeline"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a8\/a8e9adaea07df4f53cb77e9e82239b50.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The basic flow is: extract features with SIFT or a CNN, then retrieve with ANN (approximate nearest neighbor) search."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We went through roughly three stages of evolution:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first stage, when we started in 2015, was based on supervised methods."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second stage began in 
2018, when we introduced semi-supervised methods and trained image and video feature representations in a data-driven way."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From 2020 we have been upgrading the algorithms from semi-supervised to unsupervised methods, hoping to use more data to learn feature representations that better fit each task's own scenario and generalize better."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Problems with supervised methods"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, labeled datasets tend to be small and also relatively noisy."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, because they are small, sample diversity is often insufficient."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, with manual labeling, the amount of intelligence is only as great as the amount of human labor; labeling is very costly and the turnaround is long."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Choice and evolution of unsupervised methods"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7e\/7e6795c5876023a1fd5169b791d22a62.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0
,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first is a more traditional approach based on spectral clustering: compute pairwise similarities from the feature vectors, then use the cluster IDs as labels for the data."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9a\/9a27ec7a53053536ded44c2be0bbc334.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second is represented by the BYOL algorithm: apply multiple augmentations to an image to obtain transformed versions of it, then learn a good feature representation of the whole image by comparing the original image with its transformed versions."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"“Breaking out of the circle”: unlimited possibilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With unlimited possibilities beyond the current circle, where will the technology and product forms of multimodal search go next? The Du Xiaoxiao (度曉曉) app is one answer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/54\/542309c0d732f95d63c76a0955a2c139.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Du Xiaoxiao is a new multimodal search product first released at the 2020 Baidu World conference. Technically it sits at the intersection of three major fields, voice, vision and text, bringing together speech recognition, image recognition, intelligent search, NLU and multi-turn dialogue. It also has a virtual avatar and speech synthesis that conveys a range of emotions. Behind it is a fusion of many Baidu technologies and products, covering information and service search as well as interactive entertainment."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the future, integrating more multimodal search technologies will open up even more possibilities."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"a
lign":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Li Guohong (李國洪)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Senior R&D Engineer, Baidu | Head of Multimodal Search Strategy, Baidu"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reprinted from: DataFunTalk (ID: dataFunTalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/SZOwA-LD_JU8fhdAAw2QtA","title":"xxx","type":null},"content":[{"type":"text","text":"Multimodal Search Technology in the 5G+ Intelligent Era"}]}]}]}
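The retrieval flow in section 5 of the article (extract one feature vector per image, then look up its nearest neighbours) can be sketched as follows. This is a minimal illustration under stated assumptions, not Baidu's implementation: it uses an exact brute-force cosine search in place of a real ANN index, toy 3-dimensional vectors in place of SIFT or CNN features, and hypothetical names (`l2_normalize`, `top_k`, `index`) that do not come from the article.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Return the k (item_id, cosine similarity) pairs most similar to query.

    This is an exact linear scan over every stored vector. A real ANN index
    would avoid visiting every vector; the scan here only shows the contract.
    """
    q = l2_normalize(query)
    scored = []
    for item_id, vec in index.items():
        v = l2_normalize(vec)
        scored.append((item_id, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "features" (hypothetical); real descriptors are far larger.
index = {
    "red_shoe": [0.9, 0.1, 0.0],
    "blue_shoe": [0.1, 0.9, 0.0],
    "red_boot": [0.8, 0.2, 0.1],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print([item_id for item_id, _ in results])  # ['red_shoe', 'red_boot']
```

A production system would likely swap the linear scan for an approximate index so that lookups stay fast over billions of vectors, trading a small amount of recall for speed.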