When Selenium Gets Identified as a Crawler

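After a certain site shipped a new release, I had to fix an old project that drives Chrome through Selenium. One page requires a click, and no matter what I did the click had no effect. I tried the following: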

A plain click, e.g. driver.find_element_by_id('id').click()
Running JavaScript in the browser, e.g. driver.execute_script('document.getElementById("id").click()')
Selenium's ActionChains, where I tried move_to_element, move_to_element_with_offset, and the other methods

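Finally, I clicked the button by hand in the Chrome window that Selenium had launched, and that did nothing either. At that point I concluded the site had identified my Chrome as a crawler.

There is a question on Stack Overflow: Can a website detect when you are using selenium with chromedriver?

It has many answers, and one of them quotes a remark by a certain CEO: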

Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

So Selenium is not a cure-all: a site has many ways to check whether you are a crawler. What can you do about it?
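To make "many ways to check" a little more concrete, here is a minimal sketch of reading one well-known signal, the navigator.webdriver property that the W3C WebDriver specification defines for browsers under automation. I do not know what this particular site actually checked, and this flag alone probably does not explain it, so treat it purely as an illustration. The function name reports_webdriver is my own; the only Selenium call it relies on is execute_script.

```python
def reports_webdriver(driver):
    """Return True if the page-visible navigator.webdriver flag is set.

    `driver` is assumed to be any object exposing Selenium's
    execute_script(script) method, e.g. a webdriver.Chrome() instance.
    """
    # The property can come back as True, False, or undefined
    # (which Selenium converts to None), so normalise it to a bool.
    return bool(driver.execute_script("return navigator.webdriver"))
```

With a live session this would be called as reports_webdriver(driver) after the page has loaded; a True result means any script on the page can see the same thing.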

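Some answers suggest patching the chromedriver source code. At that point you might as well write your own browser.

To my surprise, I got it working in the end.

The fix is simple: drive Firefox instead of Chrome.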

self.driver = webdriver.Firefox()

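That one line of code solved it.

As for why, I spent a long time searching online for the differences between Firefox and Chrome, then looked up how Selenium works and found this: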

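When Selenium 2.x introduced the WebDriver concept, it offered a completely different way of interacting with the browser: it takes the browser's native API and wraps it in a more object-oriented Selenium WebDriver API that operates directly on the elements in a page, and even on the browser itself (taking screenshots, resizing the window, starting, closing, installing extensions, configuring certificates, and so on). Because it uses the browser's native API, it is much faster, and the stability of the calls becomes the browser vendor's responsibility, which is clearly a sounder arrangement. A side effect is that browser vendors differ somewhat in how they operate on and render web elements, so Selenium WebDriver has to provide a separate implementation for each vendor. For example, Firefox has its own FirefoxDriver, Chrome has its own ChromeDriver, and so on.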

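Since each browser is driven by its own implementation, a check that catches ChromeDriver will not necessarily catch Firefox, and I suspect that is what happened here. So my advice: if driving Chrome ever fails like this, try Firefox.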
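That advice can be sketched as a small fallback helper. This is only a sketch under my own assumptions: first_working_driver, selenium_factories and the works callback are names I made up for illustration, and works stands in for whatever check tells you the click really took effect on your target page.

```python
def first_working_driver(factories, works):
    """Return (name, driver) for the first browser where works(driver) is true.

    factories: list of (name, zero-argument callable returning a driver).
    works:     callable taking a driver and returning True when the page
               behaves as expected (hypothetical, supplied by the caller).
    Returns (None, None) if no browser passes.
    """
    for name, make_driver in factories:
        try:
            driver = make_driver()
        except Exception:
            continue  # this browser or its driver binary is unavailable
        try:
            ok = bool(works(driver))
        except Exception:
            ok = False
        if ok:
            return name, driver
        driver.quit()  # close the failed browser before trying the next
    return None, None


def selenium_factories():
    """Chrome first, then Firefox, in the order tried in this post."""
    # Imported lazily so the helper above has no hard Selenium dependency.
    from selenium import webdriver
    return [("chrome", webdriver.Chrome), ("firefox", webdriver.Firefox)]
```

A call would look like first_working_driver(selenium_factories(), works), where works loads the page, performs the click, and checks for the result you expect.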
