Using Selenium for Web Scraping

1. Environment setup

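1. Install selenium: pip install selenium
2. Install the Chrome browser and download the ChromeDriver build that matches its version (link).
3. Add ChromeDriver to the system PATH environment variable.
4. Use the regular Chrome browser for debugging. The original setup ran real jobs on PhantomJS (a headless browser) through the driver bundled with selenium, but Selenium 4 no longer ships PhantomJS support, so run Chrome in headless mode for real jobs instead.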
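
As a rough illustration of step 4, the sketch below starts Chrome without a visible window. The helper name `make_headless_chrome`, the `HEADLESS_ARGS` list, and the window size are assumptions made for this example, not part of the original notes. The `--headless=new` flag needs a reasonably recent Chrome; older builds expect plain `--headless`.

```python
# Assumed flags for this sketch; older Chrome builds use "--headless" instead.
HEADLESS_ARGS = ["--headless=new", "--window-size=1920,1080"]


def make_headless_chrome(extra_args=()):
    """Start Chrome without a visible window and return the driver."""
    # Imported inside the function so this module still loads
    # on a machine where selenium is not installed yet.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in [*HEADLESS_ARGS, *extra_args]:
        options.add_argument(arg)
    # ChromeDriver must be discoverable, e.g. through PATH (step 3).
    return webdriver.Chrome(options=options)
```

Calling `br = make_headless_chrome()` should then give a driver that can be used like the one returned by `webdriver.Chrome()`, with `br.quit()` called once the work is done.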

2. General workflow

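1. Import selenium.webdriver and start the driver for the browser you want.
2. Open a URL; the rendered page source is then available.
3. Several tabs can be open at once. Switch between them with br.switch_to.window(br.window_handles[n]), where n is the index of the tab. (The older br.switch_to_window(...) spelling was removed in Selenium 4.)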
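
Step 3 can be sketched as a small helper. Indexing `window_handles` by position works in simple cases, but the WebDriver specification does not promise any particular order, so this sketch compares the handles before and after opening the tab. The names `find_new_handle` and `open_in_new_tab` are hypothetical; `br` is assumed to be a running driver.

```python
def find_new_handle(before, after):
    """Return the single handle that is in `after` but not in `before`."""
    new = [h for h in after if h not in before]
    if len(new) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new)}")
    return new[0]


def open_in_new_tab(br, url):
    """Open `url` in a new tab and move the driver's focus to that tab."""
    before = list(br.window_handles)
    # window.open creates the tab, but the driver stays on the old one.
    br.execute_script("window.open(arguments[0]);", url)
    handle = find_new_handle(before, br.window_handles)
    br.switch_to.window(handle)
    return handle
```

For example, `open_in_new_tab(br, "https://www.sogou.com")` would leave `br.title` reporting the title of the new tab rather than the previous one.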

3. Usage example

python
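from selenium import webdriver
from selenium.webdriver.common.by import By
import time

br = webdriver.Chrome()
br.get("https://www.baidu.com")
time.sleep(1)  # give the browser some time to render the page

page = br.page_source  # rendered page source
# print(page)
# The XPath must select an element, not an attribute; read attribute values as shown below.
s = br.find_element(By.XPATH, "//a[contains(text(),'學術')]")
# print(s.text)  # text inside the HTML tag
# print(s.get_attribute('href'))  # value of an attribute of the tag
# print(s.get_property('name'))  # value of a DOM property of the element

# Run a JS script that opens a new tab. The driver still controls the previous tab;
# to control the new tab and read its content, switch window handles as shown below.
# A new tab opened by a simulated click behaves the same way.
js_script = 'window.open("https://www.sogou.com");'
br.execute_script(js_script)
print(br.title)  # 百度一下，你就知道 (a new tab is open, but the driver is still on the previous one)

# Switching between tabs
br.switch_to.window(br.window_handles[1])  # the index is the tab's position; here it is Sogou
print(br.title)  # 搜狗搜索引擎 - 上網從搜狗開始

# Type text into an input box and simulate a click
br.find_element(By.XPATH, "//input[@id='query']").send_keys('python')  # locate the input box and type
br.find_element(By.XPATH, "//input[@id='stb']").click()  # locate the search button and click it
time.sleep(1)
print(br.title)  # python - 搜狗搜索

# Scrolling the page
br.execute_script("window.scrollBy(0,800)")  # scroll down 800px
time.sleep(2)
# Without the scroll above, the first attempt at this click failed because the element
# was outside the visible area, so scroll until the element is visible before clicking.
br.find_element(By.XPATH, "//a[@id='sogou_vr_11013201_title_2']").click()
time.sleep(6)
print(br.title)  # python - 搜狗搜索 (confirms that after a click opens a new tab, the driver is still on the previous one)
br.switch_to.window(br.window_handles[-1])
print(br.title)  # 搜狗翻譯 - 上網從搜狗開始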

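# # Import the ActionChains class (By is needed for the locator below)
# from selenium.webdriver import ActionChains
# from selenium.webdriver.common.by import By
# # Move the mouse onto the element ac. This is especially useful for pages such as the
# # 今日頭條 (Toutiao) home page, which load more content when the mouse reaches the bottom (pull-up / pull-down refresh).
# ac = br.find_element(By.XPATH, 'element')
# ActionChains(br).move_to_element(ac).perform()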


br.close()  # close the current tab
br.quit()  # quit the whole browser
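
The example scrolls by a fixed 800px before clicking a link that was outside the visible area. One possible alternative, shown here only as a sketch, is to ask the browser to scroll the target element itself into view. The helper name `scroll_into_view_and_click` is hypothetical; `br` is assumed to be a running driver and `element` an element it has already located.

```python
def scroll_into_view_and_click(br, element):
    """Scroll `element` to the middle of the viewport, then click it."""
    # scrollIntoView is a standard DOM method; the element is passed to
    # the script as arguments[0] instead of being looked up again in JS.
    br.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    element.click()
```

It would be called with an element returned by `br.find_element`, and it avoids guessing how many pixels to scroll. A short pause or wait may still be needed on pages that load content lazily while scrolling.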