Thoughts on Graph Algorithms for Mining Online Black-Market Accounts

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Overview: "},{"type":"text","text":"Online networks contain black-market users who seek profit through illegal or otherwise improper means. Because these users are the source of malicious content, controlling them helps keep every business line running in a healthy, stable way. Many organizations in industry and academia currently mine such users with tree-based models or community detection, but both approaches have shortcomings. To mine black-market users more effectively, we combine graph representation learning with clustering. This article shares our thinking on, and application of, graph algorithms in black-market mining, covering:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Background and goals of the graph algorithm design"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deploying and optimizing the GraphSAGE graph algorithm"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Isolated nodes & heterophily"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Summary and reflections"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background and goals of the graph algorithm design"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
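Background of the graph algorithm design"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Online networks contain black-market users who profit through illegal or otherwise improper means, such as soliciting prostitution, promoting pornography, or promoting gambling, and in more serious cases crimes such as trafficking drugs or firearms. Many organizations in industry and academia have released APIs and solutions that work on content such as images and text. This talk instead presents an account-level approach. Why mine black-market accounts at the account level?"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d2\/d2421ad8f8618f8a2cd994b89bf138b6.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"There are three main reasons:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Malicious accounts are the source of online black-market activity, so mining them at the account level strikes that source precisely;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Account behavior is costly to disguise: a user's habits and relationship network are hard to change in the short term, whereas a single piece of illicit content can evade existing algorithms in many ways. Black-market users may not understand the algorithms, but they can still disrupt the models with “down-to-earth” tricks such as smearing part of an image, putting a mosaic over sensitive regions, or adding a black border. Such simple adversarial tweaks can seriously degrade content-based detection algorithms;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ It allows prevention: account-level associations let us flag suspicious accounts ahead of time and handle or restrict them before they commit any offence."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"How exactly are black-market accounts mined?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a brief look at the analogous recommendation scenario, taking ad recommendation as an example. Typically, the advertiser gives the platform a set of target user tags; the platform finds the users carrying those tags and pushes the ads to them. Alternatively, the advertiser gives the platform a seed package of users, and the platform finds similar users and pushes the ads to them. Common examples include Facebook Lookalike Audiences and Google similar 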
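audiences."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The black-market scenario is similar to the recommendation scenario and splits into two tasks:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Find users of a target malicious category. For example, to find users who spread prostitution solicitations, we assign such users the solicitation label; this is essentially a user classification problem;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Expand from black-market seed users, i.e. use historical black-market users for user expansion and recall, which can be done through coloring (label) diffusion or similar-user retrieval."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The traditional approach to classifying malicious users is tree-based models such as XGBoost and GBDT. Their shortcoming is obvious: they ignore the relationships between users. The other recall approach is community detection (similar-user recall); commonly used algorithms include FastUnfolding and Copra. Its weakness is equally clear: the communities are small to begin with, so few users are recalled, and several seed users often fall into the same community, which makes it hard to recall a large number of suspicious users."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/88\/88b00561b42307ff24c7610022dfb6bc.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We therefore recall users by combining graph representation learning with clustering. Graph representation learning maps each node's attributes and structural features into a low-dimensional space, producing a node feature vector that feeds downstream tasks such as user classification (i.e. node classification). The key requirement is that the low-dimensional mapping preserve the structure and node attribute information of the original graph."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1b\/1bb2a4f4be694cd2b316f5bc780ec201.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 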
Goals of the graph algorithm design"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Coverage and precision of the algorithm;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Reasonably sized user groups, so that the groups are usable;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ Support for incremental features and ease of use in downstream tasks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Most business scenarios involve dynamic networks. If the model supports incremental features, it does not need to be retrained when new nodes are added, which greatly shortens the development workflow, saves machine learning resources, and reduces the time needed to complete a task."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b8\/b83e0c17cbd20b0f3b68ac4af9ec4c01.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Deploying and optimizing the GraphSAGE graph algorithm"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. The core idea of GraphSAGE"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GraphSAGE rests on two core ideas: neighbor sampling and feature aggregation."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4b\/4b9634b3b6424b7dd0ddeb47e45143a5.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In GraphSAGE's aggregation step, the node's own attribute features and the features of its sampled neighbors each go through a linear transformation; the two results are concatenated and passed through another linear transformation to produce the target node's embedding. That embedding is then used for downstream tasks. Training can be unsupervised, for example with NCE loss."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
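Advantages of GraphSAGE"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Through neighbor sampling, GraphSAGE solves GCN's memory explosion problem and turns transductive learning into inductive learning, so node embeddings no longer have to be retrained every time, which enables incremental features. Why does random neighbor sampling turn a transductive model into an inductive one that supports incremental features? In the original transductive setting (full-neighbor aggregation without sampling, as in GCN), each node label corresponds to exactly one local structure and one embedding. Once random neighbor sampling is introduced, each node label corresponds to many local structures and many embeddings. This guards against overfitting during training and improves generalization, which is what allows the model to support incremental features."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7e\/7ee164401623e205794c8fe42063f976.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Drawbacks of GraphSAGE"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① The original GraphSAGE cannot handle weighted graphs; it can only aggregate neighbors with equal weight;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Sampling introduces randomness, so the embedding of the same node is unstable at inference time;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ The cap on the number of sampled neighbors causes some local information to be lost;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ Too many GCN layers easily cause over-smoothing during training."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 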
Optimizing GraphSAGE"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address the drawbacks above, we optimized GraphSAGE."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Aggregation optimization"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This fixes equal-weight aggregation. Instead of aggregating neighbors directly, we normalize the edge weights, multiply each neighbor's features by its normalized weight, and then fuse the result with the target node's features. This has two main benefits: neighbors with larger edge weights influence the target node more, and because edge-weight normalization is done during preprocessing, it has almost no effect on the algorithm's speed."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/75\/759e16424eeb7506e298499183274ab1.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Pruning optimization"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This fixes unstable embeddings. During training we want randomness to prevent overfitting, but at inference time we want to remove it. We therefore prune the original network directly, keeping only the K highest-weight edges of each node. At inference time, the features of all K neighbors of the target node are aggregated into it, again with weighted aggregation. This has two main benefits: as long as the network structure is unchanged, the same node always gets the same embedding; and, while preserving accuracy, it greatly reduces the density of the graph and the memory overhead."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b7\/b712f119d37770f752a2dcb0b8fb19dc.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 
Sampling optimization"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This addresses both the loss of local information and over-smoothing during training. We replace the original sampling with DGL's sampling approach. Specifically, we concatenate each node's attribute features in advance with the mean of the attribute features of all its neighbors, so that every node already carries some information about its surroundings in its initial state. With the same number of sampled nodes, this captures more local information. GCN models usually use two layers: adding a third layer leads to memory explosion, and adding a fourth leads to over-smoothing, where the feature distributions converge and nodes become indistinguishable. With DGL sampling, a two-layer GCN effectively draws on three hops of neighbors, without over-smoothing."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e6\/e6ee7891f2577ce9d49326f4339b1960.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Evaluation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Evaluation uses two main metrics: clustering (community) accuracy and the malicious rate among recalled users. The comparison below sets our method against the earlier FastUnfolding and node2vec approaches on clustering accuracy, recalled malicious rate, average community size, and running time:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fd\/fd63d64b484477240f84fb617ed10f0d.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Isolated nodes & heterophily"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
Handling isolated nodes in black-market mining"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After being penalized, black-market users usually register new accounts or switch to backup accounts quickly, so isolated nodes are unavoidable in black-market mining, much like the cold-start problem in recommendation. Take node2vec: it builds its training node sequences through random walks, so an isolated node with no edges never appears in the training set. To solve this, we bring in EGES, an algorithm designed for cold start in recommender systems. Each attribute of a node is mapped to its own embedding, and the attribute embeddings are passed through an attention layer; for example, N attribute embeddings are combined by attention weights into a single node-level embedding. A new node then no longer depends on the relationship network or on user interactions: it obtains its embedding directly from its own attributes, which solves the isolated-node problem."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/52\/524ddb4bba7194cdd963c68a6e331c1a.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In deployment we proposed GraphSAGE-EGES, which combines the strengths of both algorithms: the initial node features in GraphSAGE are replaced with the EGES-enhanced attribute features. The resulting framework is shown in the figure below:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b8\/b882d53651aa4973e142d1af68a21818.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach improves clustering accuracy by 2 percentage points."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
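Handling heterophily in black-market networks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In an ordinary network, a user's first-order neighbors are mostly users of the same type. In a citation network, for example, the papers citing a data mining paper are mostly about data mining too. Such networks are called homophilous. In black-market relationship networks, however, heterophily is very high: black-market users connect not only to each other but also to normal users. This structure causes problems. In the heterophilous network shown below, half of the first-order neighbors of the circled normal node are malicious accounts, so prediction or clustering is quite likely to label it malicious. All three first-order neighbors of the circled malicious node are normal accounts, so it will most likely be labeled normal. Both effects reduce the algorithm's accuracy."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a2\/a23740394d22ad358f00cf9d719e9e8f.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve this, we need to ask whether the network structure is reasonable. To build a reasonable structure, the links between malicious and normal accounts should be removed and the links among malicious accounts strengthened."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/50\/500b435bd7e86f9baace42eeda2a8e28.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the structure is reasonable, prediction and clustering become more accurate. We therefore introduce graph structure learning and try the LDS algorithm on this problem."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The idea of LDS is to adjust the network structure while training the GCN parameters. Starting from an initial structure (adjacency matrix), we first fix the GCN and train the adjacency matrix; after a few iterations we fix the adjacency matrix and train the GCN; after several rounds of this alternation, a reasonable network structure emerges. In essence, this is a maximum likelihood estimation problem over Bernoulli distributions: learning the adjacency matrix means learning whether the edge between two nodes should exist, which is a 0-1 distribution. Given the network structure and the node labels, the algorithm estimates which structure is most plausible under the current labels. That is its core idea."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":nul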
l},"content":[{"type":"text","text":"In practice, many business scenarios have poorly structured graphs, and some have no relationship information at all. When no complete network is available at the start, the network is usually initialized with KNN, after which a more reasonable structure is learned, leading to better node prediction and clustering."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary and reflections"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here are a few takeaways from deploying and choosing algorithms:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① For graph algorithms, feature engineering and the way the graph is built matter a great deal. If the graph structure is unreasonable, then however powerful the model and however good the feature engineering, the results will fall short of the ideal;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Business scenarios differ in how separable their users are, and no single algorithm solves every one of them. FastUnfolding and node2vec, mentioned above, can outperform GraphSAGE in certain scenarios, so algorithm selection and optimization must be tailored to the scenario at hand;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ Algorithms that make it into production in industry are usually simple and direct, and such algorithms often work better."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the speaker:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Harry, Senior Researcher at Tencent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reposted from: DataFunTalk (ID: datafuntalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/sZ7VQz26c5mrWAsnMKx8Hw","title":"xxx","type":null},"content":[{"type":"text","text":"圖算法在網絡黑產挖掘中的思考"}]}]}]}
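The aggregation and pruning optimizations described in the GraphSAGE section can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the speaker's implementation: the helper names `prune_top_k`, `normalize_weights` and `weighted_aggregate`, the dictionary-based graph layout, the prune-then-normalize order, and the toy data are all hypothetical. The learned linear transformations are left out so that only the deterministic neighbor weighting is visible.

```python
from typing import Dict, List, Tuple

# node -> list of (neighbor, edge_weight); an assumed layout for illustration
Graph = Dict[str, List[Tuple[str, float]]]


def prune_top_k(graph: Graph, k: int) -> Graph:
    """Keep only the k highest-weight edges of every node (hypothetical helper)."""
    pruned: Graph = {}
    for node, edges in graph.items():
        # Sort by weight descending; break ties by neighbor name for determinism.
        ranked = sorted(edges, key=lambda e: (-e[1], e[0]))
        pruned[node] = ranked[:k]
    return pruned


def normalize_weights(graph: Graph) -> Graph:
    """Preprocessing step: make each node's remaining edge weights sum to 1."""
    normalized: Graph = {}
    for node, edges in graph.items():
        total = sum(w for _, w in edges)
        normalized[node] = [(n, w / total) for n, w in edges] if total > 0 else []
    return normalized


def weighted_aggregate(
    node: str, graph: Graph, features: Dict[str, List[float]]
) -> List[float]:
    """Concatenate a node's own features with the weighted sum of its neighbors'."""
    own = features[node]
    agg = [0.0] * len(own)
    for neighbor, weight in graph[node]:
        for i, value in enumerate(features[neighbor]):
            agg[i] += weight * value
    return own + agg  # list concatenation; learned linear layers are omitted


# Toy data, made up for illustration only.
raw_graph: Graph = {
    "a": [("b", 3.0), ("c", 1.0), ("d", 0.5)],
    "b": [("a", 3.0)],
    "c": [("a", 1.0)],
    "d": [("a", 0.5)],
}
features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, 0.0], "d": [9.0, 9.0]}

graph = normalize_weights(prune_top_k(raw_graph, k=2))
embedding_a = weighted_aggregate("a", graph, features)
print(embedding_a)  # [1.0, 0.0, 1.0, 1.5]
```

Because pruning replaces random sampling with a fixed top-K neighbor set, calling `weighted_aggregate` again on an unchanged graph returns the same vector, which is the stability property the pruning optimization aims for. Node `d`, the lowest-weight neighbor of `a`, no longer contributes to the embedding of `a` at all.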