How to Detect Malicious Bots on Your Enterprise Network in 5 Steps

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is no secret that malicious bots play a crucial role in security breaches of enterprise networks. Malware often uses bots to spread across an enterprise network. Detecting and removing malicious bots is complicated, however, because many routine processes in an operating environment, such as software updates, are themselves bots."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Until recently, security teams had no effective way to tell “good” bots from “bad” ones. Open-source feeds and community rules claim to identify bots, but they help little because they produce too many false positives. Security analysts end up exhausted from chasing irrelevant alerts triggered by good bots."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Cato, we faced the same problem in protecting our customers' networks. To solve it, we developed a new approach: a multi-dimensional methodology, implemented in our security as a service, that identifies over 72% more malicious incidents than open-source feeds or community rules alone."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Best of all, you can apply a similar strategy on your own network. Your tools are the stock in trade of any network engineer: access to your network, a way to capture traffic such as a tap sensor, and enough disk space to store a week's worth of packets. Here is how to analyze those captured packets to better protect your network."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Five Principles for Identifying Malicious Bots and Traffic"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article we introduce a multi-dimensional approach to identifying malicious bots. No single rule is likely to identify them accurately, but combining and analyzing several rules leaves them with nowhere to hide. The idea is simple: progressively narrow the hunt from all the sessions people generate day to day down to the subset that threatens the network."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The steps are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Separate humans from bots"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Distinguish browsers from other clients"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Find bots within browsers"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Analyze the payload"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Determine the risk level of the target"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's go through them step by step."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Separate Humans from Bots by Communication Frequency"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By their nature, bots tend to communicate with their targets continuously, because they need to receive commands, send keep-alive signals, or exfiltrate data. The first step in separating bots from humans is therefore to pick out machines that communicate with a target mechanically and repeatedly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We are looking for hosts that communicate with many targets regularly and continuously. In our experience, about a week of traffic is enough to judge the nature of client-target communication. Statistically, the more regular these sessions are, the more likely they are to be bot-generated (see Figure 1)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cf\/cf54b98bceb0c7849991f4554369b3b7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows bot communication frequency collected in mid-May of this year. The traffic is evenly distributed over time, which is a strong indication that it is bot traffic."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Distinguish Browsers from Other Clients"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Merely knowing which hosts harbor a bot is not very useful; as noted earlier, most hosts may generate bot traffic. We also need to determine what type of client is communicating on the network. Generally, benign bots live inside browsers, while malicious bots operate outside them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Operating systems contain many kinds of clients and libraries that generate traffic; “Chrome”, “WinINet”, and “Java Runtime Environment” are all client types. At first glance their traffic may look much the same, but there are ways to tell them apart."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Start with the application-layer headers. Because most firewall configurations allow HTTP and TLS to any address, many bots use these protocols to communicate with their targets. By identifying groups of client-configured HTTP and TLS features, we can readily pick out bots operating outside a browser."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every HTTP session has a set of request headers that define the request and how the server should handle it. The order and values of these headers are fixed when the HTTP request is composed (see Figure 2). Similarly, TLS session attributes such as cipher suites, the extension list, ALPN (Application-Layer Protocol Negotiation), and elliptic curves are established in the initial TLS message, the “ClientHello”, which is unencrypted. Clustering sessions by their HTTP and TLS attributes helps, to some extent, to group bots by type."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, if we capture TLS traffic whose cipher suites do not match what a browser would send, we can be fairly sure the traffic was generated outside a browser; such non-human behavior means a bot is producing it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4d\/4d2ac8dc7ae3137410f8b46a668d3b36.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The example in the figure shows the headers of a message generated by a Windows cryptographic library. Bots can be identified by looking for header sequences, keys, and values that do not match the expected configuration."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Find Bots Within Browsers"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another way to identify malicious bots is to look at specific information in the HTTP headers. Web browsers usually produce clean, well-formed headers. In a normal browsing session, clicking a link in the browser generates a request whose Referer header names the page the link came from. Bot traffic is different: the Referer is typically empty because the URL is requested directly, and some malicious clients even forge it. Requests that look identical in every flow are therefore very likely bot traffic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/58\/58440ec2b7fcce0b0eab6a92d59cb6a2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Example of headers containing a Referer in a browsing session"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The User-Agent (UA) string identifies the program initiating the request. Third-party services such as fingerbank.org match the program version in the UA against known versions and use that information to help detect abnormal bots."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, current browsers put “Mozilla\/5.0” in the UA field. A lower Mozilla version, or a missing UA, usually indicates an abnormal bot. No reputable browser generates traffic without a UA value."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Payload Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We do not want to limit bot detection to HTTP and TLS, however, so more protocols need to be taken into account. Take IRC: bots on IRC have contributed a good deal to botnet activity. We have also observed known malware samples that use proprietary, unknown protocols over known ports; these can be flagged using application identification."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The direction of traffic (inbound or outbound) is also valuable for identifying bots. Devices connected directly to the Internet are constantly exposed to scanning, so these bots can be treated as inbound scanners. Outbound scanning, on the other hand, indicates that a device has been infected by a scanning bot. Such an infection can harm the targets being scanned and damage the reputation of the organization's IP addresses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below plots traffic over a period of time; an activity pattern like this is very likely the work of a scanning bot. A graph of this kind can be produced by computing traffic per second."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a9\/a9586e9e279b69eb0f86a9ebbee8d3e5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Example of high-frequency outbound scanning activity"}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Target Analysis: Know Your Destinations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Having looked at how client-server communication frequency and client type help identify malicious bots, we now introduce another dimension: the bot's target, or destination. Two factors need to be considered when assessing a target:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The target's reputation"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The target's popularity"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Target reputation estimates the likelihood that a domain is malicious, based on experience gathered from many sources. It is either provided by a third-party service or calculated from the attack reports that users submit."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In most cases, however, target or URL reputation alone is not enough to flag a malicious bot. Millions of new domains are registered every month, and without sufficient context a domain-reputation system cannot reliably judge whether a domain is trustworthy, as the resulting high false-positive rate shows."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Putting It All Together"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Summing up the methods above, a session is highly likely to be a malicious bot if it meets the following conditions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is machine-generated rather than human-generated"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It originates outside a browser, or it is browser traffic with anomalous metadata"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It communicates with low-popularity targets, which is even more suspicious when the target is flagged as malicious or is uncategorized. Legitimate, benign bots should not be communicating with low-popularity targets."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Practice: Unmasking Andromeda Malware on the Network"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combining these methods lets us find many kinds of threats on the network. To put theory into practice, let's work through a classic example, the Andromeda bot. Andromeda is a downloader commonly used to deliver other malware, and analyzing the data with four of the methods described in this article reveals it."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Target Reputation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Several reputable services identify “disorderstatus.ru” as a malicious domain, mostly labeling it as a “known infection source” or “botnet”. That alone, however, does not prove that a host communicating with the domain is infected with Andromeda; the user may simply have visited the site. Moreover, the URL is likely to be categorized only as “unknown” or “not malicious”."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Target Popularity"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Only about one user in ten thousand communicates with this target, which is highly unusual and gives the target a low popularity score."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Communication Frequency"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During a week of data collection, we saw traffic between the client and the target on three consecutive days. Such repetitive communication is another indication of a bot."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/10\/10ec4a0698534eea07066898b3354e3c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Client-target traffic between the user and disorderstatus.ru: roughly an hour of sustained communication on each of three days"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Header Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The “User-Agent” value in the request headers is “Mozilla\/4.0”, which is not a valid version for a modern browser, so the request was most likely produced by a bot."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/83\/831a6707b1a40ef3d92afb485fe8e710.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows the HTTP headers we captured in traffic to disorderstatus.ru. Notably, none of the captured requests contains a “Referer” header, and the UA value is “Mozilla\/4.0”. Both points indicate an Andromeda session."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bot detection over IP networks is not easy, but it is becoming a fundamental part of malware detection. We believe that combining the five methods described in this article will make bot detection more effective."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/www.catonetworks.com\/blog\/how-to-identify-malicious-bots-on-your-network-in-5-steps"}]}]}