Automating Firefox with Selenium in Python: loading a Firefox profile and calling POST/GET methods

Original

zycdn

2020-07-03 00:53

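Scraping the Baidu Tieba home page with BeautifulSoup alone only reaches the images in the first 20 hot posts. To collect the images from the full hot feed, we need to simulate scrolling in a real browser so that the page fires the XHR requests that load more posts.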

Install the Python package

pip install selenium

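Install a browser driver

Firefox driver (geckodriver)

Chrome driver (chromedriver)

Opera driver

Extract the downloaded file and add its directory to the PATH environment variable.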

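Simulating Firefox browser behavior

Both the browser and its driver must be installed, and their versions must be compatible.
The directory containing the driver must be on the PATH environment variable; otherwise, pass the driver path explicitly when creating the browser object.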
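
The PATH requirement can be checked before a browser is launched. The following is a minimal sketch that uses only the standard library; `find_driver` is a hypothetical helper name, not part of Selenium.

```python
import shutil


def find_driver(name):
    """Return the full path of a driver executable found on PATH, or None."""
    # shutil.which searches the directories listed in PATH and honours
    # PATHEXT on Windows, so "geckodriver" also matches geckodriver.exe.
    return shutil.which(name)


if __name__ == "__main__":
    for name in ("geckodriver", "chromedriver"):
        path = find_driver(name)
        print(name, "->", path if path else "not found on PATH")
```

If a driver is reported as not found, either add its directory to PATH or pass its location explicitly when creating the browser object.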

from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Firefox()
driver.get("https://tieba.baidu.com/index.html")
# Scroll to the bottom four times so the page loads more hot posts
for i in range(1, 5):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    time.sleep(1)  # wait for the newly requested posts to render
# Parse the rendered page and select the images in the hot feed
html = BeautifulSoup(driver.page_source, "lxml")
imgs = html.select("#new_list li img")
driver.close()
driver.quit()
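
A fixed number of scrolls may stop too early or keep scrolling after the feed is exhausted. One possible variation, sketched here under the assumption that the page grows taller whenever more posts load, is to scroll until `document.body.scrollHeight` stops changing. `scroll_until_stable` is a hypothetical helper; it accepts any object that offers Selenium's `execute_script` method.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give the XHR-loaded posts time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = new_height
    return rounds
```

In the script above, the `for` loop could be replaced by `scroll_until_stable(driver)`; `max_rounds` caps the work on feeds that never end.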

Simulating Chrome browser behavior

from selenium import webdriver
driver = webdriver.Chrome()  # start Chrome with the default automation settings
driver.get(r'https://blog.csdn.net/zycdn')
driver.close()
driver.quit()

Loading a Chrome profile fails

from selenium import webdriver
# User profile path
user_data_dir = r'--user-data-dir=C:\Users\Administrator\AppData\Local\Google\Chrome\User Data'
# Load the profile data
option = webdriver.ChromeOptions()
option.add_argument(user_data_dir)
# Start the browser with these options
driver = webdriver.Chrome(chrome_options=option, executable_path=r'D:\bin\chromedriver.exe')
# This kept failing with the error below, so I switched to Firefox
'''
Warning (from warnings module):
  File "__main__", line 1
DeprecationWarning: use options instead of chrome_options
Traceback (most recent call last):
selenium.common.exceptions.WebDriverException: Message: unknown error: failed to write prefs file
'''

Loading a Firefox profile

Some extensions only work after being disabled and re-enabled.

from selenium import webdriver
# Profile path
profile_path = r'C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\cievpga3.default-release'
# Load the profile data
profile = webdriver.FirefoxProfile(profile_path)
# Start the browser with this profile
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'D:\bin\geckodriver.exe')
driver.get(r'http://****************.21tb.com')
driver.close()
driver.quit()
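
As far as I know, newer Selenium 4 releases no longer accept the `firefox_profile` and `executable_path` keyword arguments used above. A rough equivalent in the Selenium 4 style is sketched below: the profile directory is passed to Firefox with the `-profile` argument and the driver path goes through a `Service` object. `profile_arguments` and `open_firefox` are hypothetical helper names. One difference to be aware of: this approach is expected to run Firefox directly on the given profile directory rather than on a temporary copy.

```python
def profile_arguments(profile_path):
    """Command-line arguments that make Firefox start with an existing profile."""
    return ["-profile", profile_path]


def open_firefox(profile_path, driver_path):
    """Start Firefox with the given profile using the Selenium 4 style API."""
    # Imported here so the helper above stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    for argument in profile_arguments(profile_path):
        options.add_argument(argument)
    service = Service(executable_path=driver_path)
    return webdriver.Firefox(options=options, service=service)
```

Usage would look like `driver = open_firefox(profile_path, r'D:\bin\geckodriver.exe')`, with `profile_path` set as in the script above.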

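Calling POST and GET methods through page functions

The POST and GET requests are made inside JavaScript functions, and Python reads back the data they fetch.

from selenium import webdriver
# Profile path (shown in Firefox under Help > Troubleshooting Information)
profile_path = r'C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\cievpga3.default-release'
# Load the profile data
profile = webdriver.FirefoxProfile(profile_path)
# Start the browser with this profile
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'D:\bin\geckodriver.exe')
driver.get(r'http://************************.21tb.com')
print(driver.execute_script('return test()'))  # a simple function can return its value directly
# gettest and posttest are injected into the page by an extension
driver.execute_script('return gettest()')  # for complex cases, run the function first, then read the data
print(driver.execute_script('return get_data'))
driver.execute_script('return posttest()')  # for complex cases, run the function first, then read the data
print(driver.execute_script('return post_data'))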
driver.close()
driver.quit()
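
Reading `get_data` immediately after calling `gettest()` can race with the request if the function stores its result asynchronously. A minimal sketch of one way to handle this is to poll until the variable holds a value. `wait_for_js_global` is a hypothetical helper, and it assumes the injected functions store their result in a page-level variable such as `get_data` or `post_data`, as in the script above.

```python
import time


def wait_for_js_global(driver, name, timeout=10.0, poll=0.2):
    """Poll until the page-level variable `name` holds a value, then return it.

    `driver` is any object with Selenium's execute_script(script) method.
    Raises TimeoutError if the variable is still unset after `timeout` seconds.
    """
    # The typeof check avoids a ReferenceError while the variable does not
    # exist yet; undefined and null both reach Python as None.
    script = "return (typeof {0} === 'undefined') ? null : {0};".format(name)
    deadline = time.monotonic() + timeout
    while True:
        value = driver.execute_script(script)
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("JavaScript variable %r was not set in time" % name)
        time.sleep(poll)
```

With this helper, the two-step pattern becomes `driver.execute_script('gettest()')` followed by `print(wait_for_js_global(driver, 'get_data'))`. A value left over from an earlier call would be returned at once, so the variable should be reset in the page if the function is called more than once.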

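# Other tutorials describe running JS functions like this.
# It does not work, even when the function is defined first and then called:
# the return inside .then() runs after execute_script has already returned, so None is printed.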
js=r"""
var url = "*****************";
axios.get(url).then(function (response) {
  total = response.data.data.total;
  return total;
});
"""
print(driver.execute_script(js))
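
For a promise-based call like the one above, Selenium offers `execute_async_script`, which passes a callback as the last script argument and returns whatever value the script hands to that callback. The sketch below assumes the page already exposes `axios`, as the original snippet does; `ASYNC_GET_JS` and `fetch_total` are hypothetical names, and the `response.data.data.total` path is copied from the snippet above.

```python
ASYNC_GET_JS = """
var url = arguments[0];
var done = arguments[arguments.length - 1];  // callback supplied by Selenium
axios.get(url).then(function (response) {
  done(response.data.data.total);
}).catch(function (error) {
  done({error: String(error)});
});
"""


def fetch_total(driver, url, timeout=30):
    """Run an axios GET inside the page and return the value passed to done()."""
    driver.set_script_timeout(timeout)  # seconds to wait for done() to be called
    return driver.execute_async_script(ASYNC_GET_JS, url)
```

The call blocks until `done(...)` runs in the page or the script timeout expires, so no polling from Python is needed.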

Injection tool

FeHelper (Web developer assistant extension)
Write the code to be injected yourself.

More Selenium usage

Finding elements

from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
browser.get("https://tieba.baidu.com/index.html")
new_list = browser.find_element_by_id('new_list')
user_name = browser.find_element_by_name('user_name')
active = browser.find_element_by_class_name('active')
p = browser.find_element_by_tag_name('p')

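# find_element_by_name: find a single element by name
# find_element_by_xpath: find a single element by XPath
# find_element_by_link_text: find a single element by link text
# find_element_by_partial_link_text: find a single element by partial link text
# find_element_by_tag_name: find a single element by tag name
# find_element_by_class_name: find a single element by class name
# find_element_by_css_selector: find a single element by CSS selector
# find_elements_by_name: find multiple elements by name
# find_elements_by_xpath: find multiple elements by XPath
# find_elements_by_link_text: find multiple elements by link text
# find_elements_by_partial_link_text: find multiple elements by partial link text
# find_elements_by_tag_name: find multiple elements by tag name
# find_elements_by_class_name: find multiple elements by class name
# find_elements_by_css_selector: find multiple elements by CSS selector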
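
Selenium 4 exposes the same lookups through `find_element(by, value)` and `find_elements(by, value)`, and to my knowledge newer 4.x releases no longer ship the `find_element_by_*` helpers. The sketch below maps each legacy suffix to its locator strategy. `LEGACY_TO_STRATEGY` and `find` are hypothetical names; the strategy strings are the values of Selenium's `By` constants (for example `By.ID == "id"`), written out here so the block has no third-party imports.

```python
# Locator strategy strings; these equal selenium.webdriver.common.by.By
# constants such as By.ID ("id") and By.CSS_SELECTOR ("css selector").
LEGACY_TO_STRATEGY = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def find(driver, legacy_suffix, value, many=False):
    """Call find_element / find_elements for a legacy find_element_by_* suffix."""
    strategy = LEGACY_TO_STRATEGY[legacy_suffix]
    if many:
        return driver.find_elements(strategy, value)
    return driver.find_element(strategy, value)
```

For example, `find(browser, "id", "new_list")` corresponds to `browser.find_element(By.ID, "new_list")`, and `find(browser, "tag_name", "p", many=True)` corresponds to `browser.find_elements(By.TAG_NAME, "p")`.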

Getting element information

btn_more = browser.find_element_by_id('btn_more')
print(btn_more.get_attribute('class'))  # get the class attribute
print(btn_more.get_attribute('href'))  # get the href attribute
print(btn_more.text)  # get the text content

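Interacting with elements

btn_more = browser.find_element_by_id('btn_more')
btn_more.click()  # simulate a click, e.g. to trigger "load more"

from selenium.webdriver.common.by import By
input_search = browser.find_element(By.ID, 'q')
input_search.clear()  # clear the input field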

Executing JavaScript

browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
browser.execute_script('alert("To Bottom")')
