Batch-printing web pages to PDF files with Python (Part 2)

Original

2021-09-29 13:26

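　　In my earlier post, "Batch-printing web pages to PDF files with Python (Part 1)", I described in detail how to use Python + selenium and pass specific print settings to the browser through chrome_options.add_experimental_option('prefs', prefs), in order to batch-print web pages to PDF files. Unfortunately, with that approach I have not yet found a way to pass the [Pages] setting:

[Figure: the Pages setting in question]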

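　　If you only want to print certain pages of a web page to PDF, this becomes a real nuisance. After going through a lot of material, I found that Chrome's experimental Page.printToPDF method (part of the Chrome DevTools Protocol) solves the problem. It accepts the following parameters:

[Figure: parameter table for Page.printToPDF]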

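　　As you can see, it covers almost every setting we might want to pass, including the page range (pageRanges: "Paper ranges to print, e.g., '1-5, 8, 11-13'. Defaults to the empty string, which means print all pages."). The method has to be called with Chrome running in headless mode (otherwise it raises an error). For example, to print only the first page of a web page to PDF, the following sample program can serve as a reference: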

import base64
import json
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

from logon_token import userName, passWord  # the author's own module holding the login credentials


def send_devtools(driver, cmd, params=None):
    """Send a DevTools command through chromedriver and return its result."""
    # Note: _url and _request are private attributes of Selenium's
    # command executor and may change between Selenium releases.
    resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
    url = driver.command_executor._url + resource
    body = json.dumps({'cmd': cmd, 'params': params or {}})
    response = driver.command_executor._request('POST', url, body)
    return response.get('value')


def save_as_pdf(driver, path, options=None):
    """Print the current page with Page.printToPDF and write the PDF to path."""
    result = send_devtools(driver, "Page.printToPDF", options)
    # The PDF comes back base64-encoded in the 'data' field.
    with open(path, 'wb') as file:
        file.write(base64.b64decode(result['data']))


if __name__ == "__main__":
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")

    driver = webdriver.Chrome(options=options)
    try:
        # Open the login page, fill in the username and password to log in
        # and obtain the user's permissions, then go to the page we want.
        driver.get("Your Login web URL")
        # Pass the credentials as script arguments so that quotes in them
        # cannot break the JavaScript.
        driver.execute_script(
            'document.querySelector("#username").value = arguments[0];'
            'document.querySelector("#password").value = arguments[1];',
            userName, passWord)
        driver.find_element(By.XPATH, '//*[@id="LoginForm"]/div[4]/button').click()

        driver.get("your final requestUrl")

        time.sleep(2)  # for complex, non-static pages, allow a suitable delay so the page can finish loading

        # DevTools parameter names are camelCase, e.g. pageRanges.
        save_as_pdf(driver, r'Your fileName for the new pdf file',
                    {'landscape': True, 'pageRanges': '1-1'})
    finally:
        driver.quit()
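
　　The fixed time.sleep(2) above is a guess at how long the page needs. As a rough alternative, here is a minimal sketch that polls document.readyState until the browser reports the page as loaded. The helper name wait_until_loaded and its timeout and poll_interval parameters are my own assumptions, not part of selenium. Note too that readyState only covers the initial load, so content fetched later by scripts may still need an extra delay.

```python
import time


def wait_until_loaded(driver, timeout=10.0, poll_interval=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # execute_script runs JavaScript in the page and returns its result.
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

　　In the program above, this would be called right after driver.get("your final requestUrl"), in place of (or before) the fixed sleep.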

　　In the browser's headless mode this method is stable and efficient, so feel free to give it a try.
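
　　If your selenium version exposes execute_cdp_cmd on the Chrome driver (Selenium 4 does), the same Page.printToPDF call can be made without touching the driver's private attributes. The sketch below assumes that, and it also builds the pageRanges string from a list of pages. The helpers format_page_ranges and print_pages_to_pdf are hypothetical names of mine, not selenium APIs.

```python
import base64
from pathlib import Path


def format_page_ranges(pages):
    """Turn [1, (3, 5), 8] into a pageRanges string such as '1, 3-5, 8'.

    Each item is either a single 1-based page number or a (first, last) tuple.
    An empty list gives '', which Page.printToPDF treats as "all pages".
    """
    parts = []
    for item in pages:
        if isinstance(item, tuple):
            first, last = item
            if first < 1 or last < first:
                raise ValueError(f"invalid page range: {item!r}")
            parts.append(f"{first}-{last}")
        else:
            if item < 1:
                raise ValueError(f"invalid page number: {item!r}")
            parts.append(str(item))
    return ", ".join(parts)


def print_pages_to_pdf(driver, path, pages=(), landscape=False):
    """Print the page currently loaded in driver to path via Page.printToPDF."""
    params = {"landscape": landscape, "pageRanges": format_page_ranges(pages)}
    # Assumption: driver is a selenium Chrome driver that has execute_cdp_cmd.
    result = driver.execute_cdp_cmd("Page.printToPDF", params)
    # The PDF is returned base64-encoded in the 'data' field.
    Path(path).write_bytes(base64.b64decode(result["data"]))
    return params
```

　　With a real headless driver, print_pages_to_pdf(driver, "first_page.pdf", pages=[1], landscape=True) would correspond to the call in the sample program. For batch printing, the same function can be called in a loop, once per URL, after each driver.get(...).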
