1. Crawler code

For Chrome headless configuration, basic installation, and usage, see:
http://www.voidcn.com/article/p-hwlrznzi-bpz.html
https://blog.csdn.net/xc_zhou/article/details/80823855

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
# Before starting ChromeDriver, set the experimental option excludeSwitches to
# ['enable-automation'] to counter WebDriver detection
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
chrome_options.add_argument('--headless')
# Route all traffic through the local mitmproxy instance
chrome_options.add_argument('--proxy-server=http://127.0.0.1:8080')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')  # disable the sandbox
chrome_options.add_argument('--disable-setuid-sandbox')
# chrome_options.add_argument('--single-process')  # run in a single process
# chrome_options.add_argument('--process-per-tab')  # one process per tab
# chrome_options.add_argument('--process-per-site')  # one process per site
# chrome_options.add_argument('--in-process-plugins')  # do not run plugins in a separate process
# chrome_options.add_argument('--disable-popup-blocking')  # disable popup blocking
chrome_options.add_argument('--disable-images')  # disable images
chrome_options.add_argument('--blink-settings=imagesEnabled=false')
chrome_options.add_argument('--incognito')  # start in incognito mode
chrome_options.add_argument('--lang=zh-CN')  # set the language to Simplified Chinese
chrome_options.add_argument(
    '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36')
chrome_options.add_argument('--hide-scrollbars')
chrome_options.add_argument('--disable-bundled-ppapi-flash')
chrome_options.add_argument('--mute-audio')
chrome_options.add_argument('lang=zh_CN.UTF-8')
# chrome_options.add_extension(r'C:\hdmbdioamgdkppmocchpkjhbpfmpjiei-3.0.1-Crx4Chrome.com.crx')  # add an extension
# chrome_options.add_argument('--disable-extensions')  # disable extensions
# chrome_options.add_argument('--disable-plugins')

# Selenium 4 style: the driver path goes through a Service object and the options
# are passed as `options=`. On Selenium 3, pass executable_path=r"C:\chromedriver.exe" instead.
# The raw string keeps the backslash in the Windows path from being read as an escape.
DRIVER = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"),
                          options=chrome_options)

DRIVER.get("http://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html")
# WebDriverWait(DRIVER, n) on its own never waits unless .until() is called,
# so use a fixed pause to give the page's asynchronous checks time to finish.
time.sleep(2)
page_source = DRIVER.page_source

DRIVER.quit()
print(page_source)
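
To see whether the disguise took effect, it can help to read back the properties that detection scripts usually inspect, before the driver is closed. The following is only a sketch and not part of the original crawler: the helper name `read_fingerprint` and the choice of properties are assumptions for illustration. It relies solely on the `execute_script` method that Selenium drivers provide, so any object exposing that method will work.

```python
# Hypothetical helper: read back the browser properties that the mitmproxy
# script in section 2 tries to override.
FINGERPRINT_SCRIPTS = {
    "webdriver": "return navigator.webdriver;",
    "plugins_length": "return navigator.plugins.length;",
    "has_window_chrome": "return !!window.chrome;",
    "user_agent": "return navigator.userAgent;",
}


def read_fingerprint(driver):
    """Run each snippet through driver.execute_script and collect the results."""
    return {name: driver.execute_script(js)
            for name, js in FINGERPRINT_SCRIPTS.items()}
```

A possible use is to call `read_fingerprint(DRIVER)` after `DRIVER.get(...)` and before `DRIVER.quit()`, then print the returned dictionary next to the page source.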

2. mitmproxy script code

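import mitmproxy.http


class JsCheckPass:
    """mitmproxy addon that prepends anti-detection JavaScript to selected responses."""

    def response(self, flow: mitmproxy.http.HTTPFlow):
        # Snippets that overwrite the properties headless-detection scripts read.
        t = 'window.chrome = true;'
        t0 = 'Object.defineProperties(navigator,{webdriver:{get:() => false}});'
        # No '//' comment inside this string: the snippets are joined on one line,
        # so a line comment would swallow the closing '};'.
        t1 = 'window.navigator.chrome = {runtime: {}};'
        t2 = '''
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5, 6],
              });
            '''
        if 'chrome-headless-test' in flow.request.url or 'um.js' in flow.request.url:
            flow.response.text = t + t0 + t1 + t2 + flow.response.text
            # Make the permission check compare against a state it will never see.
            flow.response.text = flow.response.text.replace("permissionStatus.state === 'prompt'",
                                                            "permissionStatus.state === 'promptzzzzzzzzz'")


addons = [
    JsCheckPass(),
]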
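
The decision and the rewrite inside `response` can also be kept apart from mitmproxy, which makes them easy to check without starting the proxy. This is a minimal sketch under stated assumptions: the names `HEADLESS_PATCH`, `TARGET_MARKERS`, `should_patch` and `patch_body` are hypothetical and do not appear in the original script; the injected statements and URL markers are the ones used above.

```python
# Hypothetical refactoring of the addon logic into plain functions.
HEADLESS_PATCH = (
    "window.chrome = true;"
    "Object.defineProperties(navigator,{webdriver:{get:() => false}});"
    "window.navigator.chrome = {runtime: {}};"
    "Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5, 6]});"
)

# URL fragments that identify the responses to rewrite.
TARGET_MARKERS = ("chrome-headless-test", "um.js")


def should_patch(url):
    """Return True when the URL contains one of the target markers."""
    return any(marker in url for marker in TARGET_MARKERS)


def patch_body(url, body):
    """Prepend the patch and neutralise the permission check for target URLs."""
    if not should_patch(url):
        return body
    patched = HEADLESS_PATCH + body
    return patched.replace("permissionStatus.state === 'prompt'",
                           "permissionStatus.state === 'promptzzzzzzzzz'")
```

Inside the addon, the hook body would then shrink to something like `flow.response.text = patch_body(flow.request.url, flow.response.text)`.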

3. Running

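Enter the virtual environment from cmd and run mitmweb -s addons.py to start the proxy. mitmproxy listens on port 8080 by default, which matches the --proxy-server=http://127.0.0.1:8080 argument in section 1. Once headless Chrome is configured to use the proxy, requests and responses can be modified in transit.

Crawler detection is usually done in JavaScript: the browser sends a request, the server returns a JS file, and the result of running that JS in the browser is the final verdict. Once you find that JS file and modify its code, you can change what the page finally shows.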
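
Finding the JS file that performs the check is the manual part of this approach. One way to narrow it down, sketched here under assumptions, is to scan each response body for property names that detection scripts tend to read. The pattern list and the function name `find_detection_markers` are illustrative guesses and far from exhaustive.

```python
import re

# Assumed, non-exhaustive list of strings that headless-detection scripts often touch.
DETECTION_PATTERNS = {
    "webdriver": r"navigator\.webdriver",
    "plugins": r"navigator\.plugins",
    "languages": r"navigator\.languages",
    "permissions": r"permissions\.query",
    "window_chrome": r"window\.chrome",
}


def find_detection_markers(js_text):
    """Return the sorted names of all patterns that occur in js_text."""
    return sorted(name for name, pattern in DETECTION_PATTERNS.items()
                  if re.search(pattern, js_text))
```

Called from the addon's `response` hook on `flow.response.text`, a non-empty result would mark the URL as a candidate worth inspecting by hand.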

Reference tutorials:
https://www.cnblogs.com/yangjintao/p/10599868.html
https://blog.csdn.net/freeking101/article/details/83901842
https://blog.csdn.net/Chen_chong__/article/details/85526088
https://blog.wolfogre.com/posts/usage-of-mitmproxy/
https://www.jianshu.com/p/0eb46f21fee9

These tutorials are enough for basic usage.

There is another way to disguise the crawler; see:
https://ask.csdn.net/questions/382674
https://blog.csdn.net/sinly100/article/details/79184559
