Learning Selenium (Python)

Original

2020-02-23 14:20

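The Selenium library provides an API called WebDriver. WebDriver drives a real browser: like BeautifulSoup or other Selector objects, it can locate elements on a page, and it can also interact with those elements (send text, click, and so on) and perform the other actions a web crawler needs. Most of the functions we use day to day are WebDriver methods.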

(1) Chrome startup options

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

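# Create an options object to hold the startup arguments
options = Options()
# Add startup arguments
options.add_argument('--incognito')  # incognito mode
options.add_argument('--headless')  # run without a visible window
options.add_argument('--window-size=1366,768')  # browser window size
# Chrome is used here; executable_path is the path to the chromedriver binary
# (this keyword is Selenium 3 style; newer Selenium 4 releases no longer accept it)
driver = Chrome(executable_path="path/to/chromedriver", options=options)
driver.get("page URL")

For an introduction to the available startup arguments, see this blog post.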

(2) Browser window settings

# Give the browser window a fixed width and height
driver.set_window_size(480, 800)
# Maximize the browser window
driver.maximize_window()

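(3) Locating elements

# Selenium 3 style helpers; later Selenium 4 releases remove them in favor of find_element(By.<STRATEGY>, value)
driver.find_element_by_id
driver.find_element_by_name
driver.find_element_by_xpath
driver.find_element_by_link_text
driver.find_element_by_partial_link_text
# Locate by tag name
driver.find_element_by_tag_name
driver.find_element_by_class_name
driver.find_element_by_css_selector

For how to write XPath expressions and CSS selectors, see the blog posts I shared earlier.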

(4) Clicking, reading text, and reading attributes

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

options = Options()
driver = Chrome(executable_path="E:\\Ksoftware\\chromedriver\\chromedriver.exe",options=options)
driver.get("http://www.baidu.com")
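driver.find_element_by_link_text("新闻").click()  # "新闻" is the "News" link

driver.back()
location_xinwen = driver.find_element_by_class_name("mnav")
# Use .text to get the element's visible text
print("text is: " + location_xinwen.text)

# Use get_attribute() to read an attribute of a tag
all_a = driver.find_elements_by_tag_name("a")
for a in all_a:
    href = a.get_attribute("href")
    # get_attribute() returns None when the tag has no href, so skip those
    if href is not None:
        print("href is: " + href)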

Part (4) covered:
1.click()
2.text
3.get_attribute()

(5) Mouse action chains

# Import the ActionChains class
from selenium.webdriver import ActionChains

# Move the mouse over the element ac
ac = driver.find_element_by_xpath('element')
ActionChains(driver).move_to_element(ac).perform()

# Click on ac
ac = driver.find_element_by_xpath('elementA')
ActionChains(driver).move_to_element(ac).click(ac).perform()

# Double-click on ac
ac = driver.find_element_by_xpath("elementB")
ActionChains(driver).move_to_element(ac).double_click(ac).perform()

# Right-click on ac
ac = driver.find_element_by_xpath('elementC')
ActionChains(driver).move_to_element(ac).context_click(ac).perform()

# Press the left button on ac and hold it down
ac = driver.find_element_by_xpath('elementF')
ActionChains(driver).move_to_element(ac).click_and_hold(ac).perform()

# Drag ac1 and drop it on ac2
ac1 = driver.find_element_by_xpath('elementD')
ac2 = driver.find_element_by_xpath('elementE')
ActionChains(driver).drag_and_drop(ac1, ac2).perform()

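(6) Typing into a text box
Typing into a text box is straightforward: locate the text box, send it a string, then submit the form.

# Clear whatever is already in the located input box
driver.find_element_by_id("").clear()
# Type the content
driver.find_element_by_id("").send_keys("")
# Click the "百度一下" (Baidu search) button to run the search
driver.find_element_by_id("").click()
# submit() can replace click() here; it submits the form that contains the element, which for a search box usually has the same effect
driver.find_element_by_id("").submit()

In Python 2, prefixing the string passed to send_keys() with u (making it a Unicode literal) avoids errors when typing Chinese. In Python 3 every str is already Unicode, so the prefix is not needed.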

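(7) Working with nested frames and multiple windows
Elements inside a frame embedded in the page cannot be located directly; switch into the frame (or window) first with the calls below.

# "frameName" is a placeholder for the frame's name or id
driver.switch_to.frame("frameName")

driver.switch_to.window("windowsName")

# window_handles can also be used to get a handle for each open window
for handle in driver.window_handles:
    driver.switch_to.window(handle)

For more details, see the linked post.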
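Looping over `window_handles` as above ends on whichever handle comes last. A common refinement is to remember the handles that existed before an action and then switch to the one that is new. The sketch below shows that idea; `switch_to_new_window` is a hypothetical helper name, and it relies only on `driver.window_handles` and `driver.switch_to.window()`.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window whose handle is not in known_handles.

    Returns the new handle, or None if no new window has appeared.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None
```

A typical use would be to store `before = set(driver.window_handles)`, click the link that opens a new window, and then call `switch_to_new_window(driver, before)`.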

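(8) Locating a group of elements
There is an excellent blog post that shows how to tick every checkbox on a page, how to untick boxes that are already checked, and so on.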
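The referenced post is not reproduced here, so the following is only a minimal sketch of the idea rather than that post's code. It assumes the checkboxes have already been collected as a list of elements, for example with `driver.find_elements_by_css_selector("input[type='checkbox']")`, and it uses the element methods `is_selected()` and `click()`. The function name `set_all_checkboxes` is made up for this illustration.

```python
def set_all_checkboxes(checkboxes, checked=True):
    """Click each checkbox whose state differs from `checked`.

    Returns the number of clicks performed.
    """
    clicks = 0
    for box in checkboxes:
        # Only click boxes that are not yet in the desired state,
        # otherwise a click would toggle them the wrong way.
        if box.is_selected() != checked:
            box.click()
            clicks += 1
    return clicks
```

Passing `checked=False` unticks every box that is currently ticked, which covers the second case the post describes.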

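(9) Handling drop-down lists
Drop-down lists are handled slightly differently: we first have to locate the drop-down element itself, then pick an option from its list and click it.

We can also handle drop-down lists with the Select class.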

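# Import the Select class
from selenium.webdriver.support.ui import Select

# Locate the <select> element by its name attribute
select = Select(driver.find_element_by_name(' '))
# Indexes start at 0
select.select_by_index(1)
# value is an attribute of the <option> tag, not the text shown in the drop-down
select.select_by_value(" ")
# visible_text is the text of the <option> tag, i.e. what the drop-down displays
select.select_by_visible_text(u' ')
# Deselect everything (only valid for a multi-select; otherwise NotImplementedError is raised)
select.deselect_all()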

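(10) Explicit waits
An explicit wait names a condition and lets the script continue as soon as that condition holds. It specifies both the condition and a maximum wait time; if the element still has not been found when the time runs out, a TimeoutException is raised.
Selenium ships with many built-in wait conditions that we can call directly.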

(11) Keyboard actions

from selenium.webdriver.common.keys import Keys
# Select everything in the input box (Ctrl+A)
driver.find_element_by_id("").send_keys(Keys.CONTROL,'a')

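(12) Odds and ends

# The time module can pause the script so that actions do not start before the page has finished loading
import time
time.sleep(3)  # seconds; 3 is only an example value

# Selenium's own implicit wait is smarter: element lookups keep retrying for up to the given number of seconds
driver.implicitly_wait(10)  # seconds; 10 is only an example value

# Go forward and back in the browser history
driver.forward()
driver.back()

# Save a screenshot of the page; useful for watching a crawl when the browser runs headless
driver.save_screenshot("photo_name.png")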

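# Print the page source after rendering
print(driver.page_source)

# Accept the alert shown on the page
driver.switch_to.alert.accept()

# Selenium can also execute JavaScript; use "return" to get a value back
title = driver.execute_script("return document.title")

# Scroll to the bottom of the page
js = "document.documentElement.scrollTop = 10000000"
driver.execute_script(js)

# Scroll the page down by 700 pixels
driver.execute_script("window.scrollBy(0, 700)")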

# Get all cookies
driver.get_cookies()
# Delete one cookie by name
driver.delete_cookie("cookie_name")
# Delete all cookies
driver.delete_all_cookies()
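One common reason to read cookies at all is to reuse a logged-in session in a later run. The sketch below is one possible way to do that, not something taken from the post: it saves the list returned by `get_cookies()` to a JSON file and later feeds each entry back through `add_cookie()`. The function names `save_cookies` and `load_cookies` and the file path are assumptions.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file; return how many were saved."""
    cookies = driver.get_cookies()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh, ensure_ascii=False, indent=2)
    return len(cookies)


def load_cookies(driver, path):
    """Add the cookies stored in a JSON file to the session; return how many were added."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

With a real browser, `add_cookie()` only accepts cookies for the site that is currently open, so the usual order is to call `driver.get()` on that site first, then `load_cookies()`, then reload the page.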

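This post draws on several learning resources and covers most of the commonly used features. Because Selenium knowledge comes in many small, scattered pieces, the post does not follow a strong logical thread and is not very tightly organized; please bear with it.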
