Getting selenium + chromedriver past Zhihu's anti-scraping detection
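When you use selenium to scrape Taobao or other sites, or to simulate a login, a slider CAPTCHA appears, and whether you drag it with ActionChains or by hand, the page politely tells you something like "Oops, network error, please refresh". Why?
After some searching (behind a proxy) and a lot of reading, it turns out that a selenium-driven browser exposes a few telltale values, for example: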
window.navigator.webdriver
window.navigator.languages
window.navigator.plugins.length
The most important of these is webdriver. The WebDriver specification defines it as:
partial interface Navigator {
readonly attribute boolean webdriver;
};
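The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false. Browsers that do not implement the attribute return undefined.
This property lets websites determine that the user agent is under WebDriver control, and it can be used to help mitigate denial-of-service attacks.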
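How to check:
Inspect → Console → enter window.navigator.webdriver
In a normal browser the result is false or undefined (depending on the browser); in a browser driven by selenium it is true.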
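The same check can be run from Python. This is a minimal sketch, assuming `driver` is a selenium WebDriver (anything with an `execute_script` method will do). The names `read_fingerprint`, `looks_automated` and `StubDriver` are illustrative and not part of the original post. Note the explicit `return` in the JavaScript: without it, `execute_script` gives back `None`.

```python
# JavaScript evaluated in the page. The `return` is required, otherwise
# execute_script() gives back None regardless of the real value.
FINGERPRINT_JS = """
return {
    webdriver: window.navigator.webdriver,
    languages: window.navigator.languages,
    pluginCount: window.navigator.plugins.length
};
"""


def read_fingerprint(driver):
    """Read the three navigator values listed earlier and return them as a dict.

    `driver` is assumed to be a selenium WebDriver; only its
    execute_script() method is used.
    """
    return driver.execute_script(FINGERPRINT_JS)


def looks_automated(fingerprint):
    """True only when navigator.webdriver was reported as true."""
    return fingerprint.get("webdriver") is True


class StubDriver:
    """Hypothetical stand-in for a real browser session, used for a dry run."""

    def __init__(self, values):
        self._values = values

    def execute_script(self, script):
        # A real driver would run `script` in the page; the stub just
        # returns canned values.
        return self._values


if __name__ == "__main__":
    driver = StubDriver({"webdriver": True, "languages": ["en-US"], "pluginCount": 0})
    fingerprint = read_fingerprint(driver)
    print(fingerprint, looks_automated(fingerprint))
```

With a real session you would pass the object returned by `webdriver.Chrome(...)` instead of the stub.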
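OK, here is what we do next.
Getting past selenium detection: override the webdriver value. One way to do this is code injection through mitmproxy; the complete code further down injects the same snippet with selenium's execute_script: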
Object.defineProperties(navigator,{webdriver:{get:() => false}});
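For the mitmproxy route, a script can expose a module-level `response(flow)` hook and rewrite HTML before the browser receives it, so the override runs ahead of any scripts that come later in the page. The sketch below is an assumption about how such a script might look, not the script used for this post: the helper `inject_override` and the choice to insert right after the `<head>` tag are illustrative.

```python
import re

# The override from the text, wrapped in a <script> tag.
OVERRIDE = (
    "<script>"
    "Object.defineProperties(navigator,{webdriver:{get:() => false}});"
    "</script>"
)

# Matches the opening <head> tag, with or without attributes,
# but not tags such as <header>.
_HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def inject_override(html):
    """Insert the override right after the first <head> tag, if there is one."""
    return _HEAD_RE.sub(lambda m: m.group(0) + OVERRIDE, html, count=1)


def response(flow):
    """mitmproxy event hook, called for each completed server response."""
    content_type = flow.response.headers.get("content-type", "")
    if "text/html" in content_type:
        # Only HTML documents are rewritten; everything else passes through.
        flow.response.text = inject_override(flow.response.text)
```

Saved under a name such as `inject.py` (hypothetical), it could be loaded with `mitmdump -s inject.py`, with Chrome configured to use that proxy. A site whose Content-Security-Policy forbids inline scripts would block the injected tag, which this sketch does not handle.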
Complete code
# -*- coding: utf-8 -*-
# @Time : 2019/3/5 15:11
# @Author : One Fine
from time import sleep

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

try:
    import http.cookiejar as cookielib
except ImportError as e:
    print("Falling back to Py2.x cookielib", e)
    import cookielib  # Py2.x compatibility


class ZhihuAccount(object):
    """
    Entry point: check_login, which returns True or False.
    """

    def __init__(self):
        # Selenium 3 style; newer Selenium 4 releases expect a Service object instead of executable_path
        self.browser = webdriver.Chrome(executable_path='D:/selenium/chromedriver.exe')
        self.session = requests.session()
        self.session.cookies = cookielib.LWPCookieJar(filename='zhihu_cookie.text')
        self.headers = {
            'Referer': 'https://www.zhihu.com/signup?next=%2F',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/72.0.3626.121 Safari/537.36',
        }
        # Load saved cookies; load_cookies() falls back to login() if loading fails
        self.load_cookies()

    def login(self, username='', password=''):
        if username == '' or password == '':
            username = input('Username: ')
            password = input('Password: ')
        self.browser.get('https://www.zhihu.com/signup?next=%2F')
        try:
            self.browser.find_element(By.XPATH, '//*[@id="root"]/div/main/div/div/div/div[2]/div[2]/span').click()  # click
            self.browser.find_element(By.XPATH, '//*[@id="root"]//input[@name="username"]').send_keys(username)
            sleep(2)
            self.browser.find_element(By.XPATH, '//*[@id="root"]//input[@name="password"]').send_keys(password)
            # Override navigator.webdriver before submitting the form
            self.browser.execute_script('Object.defineProperties(navigator,{webdriver:{get:() => false}});')
            # `return` is required, otherwise execute_script() always gives back None
            status = self.browser.execute_script('return window.navigator.webdriver')
            # Submit; this could be guarded with: if status in (None, False)
            self.browser.find_element(By.XPATH, '//*/form/button').click()
            sleep(1)
            # Copy the browser's cookies into the requests session as part of login
            for cookie in self.browser.get_cookies():
                self.session.cookies.set_cookie(
                    cookielib.Cookie(version=0, name=cookie['name'], value=cookie['value'],
                                     port='80', port_specified=False, domain=cookie['domain'],
                                     domain_specified=True, domain_initial_dot=False,
                                     path=cookie['path'], path_specified=True,
                                     secure=cookie['secure'], rest={},
                                     expires=cookie['expiry'] if "expiry" in cookie else None,
                                     discard=False, comment=None, comment_url=None, rfc2109=False))
            self.session.cookies.save()
            return True
        except Exception as e:
            print("Login failed", e)
            return False

    def load_cookies(self):
        try:
            self.session.cookies.load(ignore_discard=True)
            return True
        except Exception as e:
            print("Could not load zhihu_cookie", e)
            print("Logging in again...")
            # First login attempt:
            if self.login():
                print("Cookies loaded successfully")
                return True
            else:
                print("Failed to load cookies")
                return False

    def check_login(self):
        # Use the status code of the settings page to decide whether we are logged in
        inbox_url = 'https://www.zhihu.com/settings/account'
        response = self.session.get(inbox_url, headers=self.headers, allow_redirects=False)
        status = True
        if response.status_code != 200:
            # Second login attempt:
            # print("Logging in again...")
            if not self.login():
                status = False
        # Close the browser:
        self.browser.quit()
        self.session.close()
        return status


if __name__ == '__main__':
    account = ZhihuAccount()
    if account.check_login():
        print("Login succeeded")
    else:
        print("Login failed")
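The cookie hand-off from the browser to the requests session is the part most worth trying in isolation. Below is a minimal standard-library sketch of it. `to_cookiejar_cookie` and `save_and_reload` are hypothetical helpers, and the sample dict only mimics the shape of an entry from selenium's `get_cookies()` with made-up values.

```python
import http.cookiejar as cookielib
import os
import tempfile


def to_cookiejar_cookie(cookie):
    """Convert one selenium-style cookie dict into an http.cookiejar.Cookie."""
    domain = cookie["domain"]
    return cookielib.Cookie(
        version=0,
        name=cookie["name"],
        value=cookie["value"],
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=cookie.get("path", "/"),
        path_specified=True,
        secure=cookie.get("secure", False),
        expires=cookie.get("expiry"),  # None marks a session cookie
        discard=False,
        comment=None,
        comment_url=None,
        rest={},
        rfc2109=False,
    )


def save_and_reload(selenium_cookies, filename):
    """Write the cookies to an LWP-format file, then read them back."""
    jar = cookielib.LWPCookieJar(filename=filename)
    for cookie in selenium_cookies:
        jar.set_cookie(to_cookiejar_cookie(cookie))
    jar.save(ignore_discard=True, ignore_expires=True)

    # Session cookies come back marked as discardable, so ignore_discard
    # is needed here as well (the post's load_cookies() does the same).
    reloaded = cookielib.LWPCookieJar(filename=filename)
    reloaded.load(ignore_discard=True, ignore_expires=True)
    return reloaded


if __name__ == "__main__":
    # Made-up cookie in the shape selenium returns.
    sample = [{"name": "session_token", "value": "abc123", "domain": ".zhihu.com",
               "path": "/", "secure": True, "expiry": 4102444800}]
    path = os.path.join(tempfile.mkdtemp(), "zhihu_cookie.txt")
    jar = save_and_reload(sample, path)
    print([(c.name, c.domain) for c in jar])
```

A jar reloaded this way can be assigned to `session.cookies` on a requests session, which is what the class above does with its own cookie file.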
References:
Detecting selenium and getting past the detection https://zhuanlan.zhihu.com/p/55956954
Bypassing selenium detection to simulate login https://zhuanlan.zhihu.com/p/56040461
Common ways to call JavaScript from the selenium web automation testing framework https://blog.csdn.net/cxx654/article/details/79949366
Getting cookies with selenium to skip login https://blog.csdn.net/weixin_40444270/article/details/80593058
Using LWPCookieJar https://blog.csdn.net/nimade511/article/details/52540437