Fixing Images That Fail to Load in Selenium Screenshots -- [Lazy Loading]

Original

暮良文王

2020-10-11 13:30

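Requirement:

Take a screenshot, then convert it to PDF.

Problem:

After taking a screenshot with Selenium, the images on the page have not loaded.

As shown in the figure below:

Cause:

The site uses lazy loading: page elements are loaded dynamically only when the browser's vertical scrollbar reaches the corresponding position.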

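What is image lazy loading?

Image lazy loading is a web page optimization technique. An image is a network resource, and like any other static resource it consumes bandwidth when requested. Loading every image on the page at once greatly increases the load time of the first screen.

To solve this, the front end and back end cooperate so that an image is loaded only when it appears in the browser's current viewport. This technique, which reduces the number of image requests on the first screen, is called "image lazy loading".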
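To make the viewport rule concrete, here is a toy model in plain Python. It is only a sketch: the `Img` class, the `visible_images` helper, and the page layout are all hypothetical, and a real site decides this in the browser with JavaScript. It shows why a screenshot taken without scrolling misses images further down the page.

```python
from dataclasses import dataclass


@dataclass
class Img:
    """A hypothetical image: vertical offset and height in CSS pixels."""
    name: str
    top: int
    height: int


def visible_images(images, scroll_y, viewport_height):
    """Return the names of images that overlap the current viewport."""
    view_top = scroll_y
    view_bottom = scroll_y + viewport_height
    return [
        img.name
        for img in images
        if img.top < view_bottom and img.top + img.height > view_top
    ]


# Hypothetical page: three images spread over a 3000 px tall page.
PAGE = [Img("a.png", 100, 300), Img("b.png", 1200, 300), Img("c.png", 2500, 300)]

# Without scrolling, an 800 px viewport only ever requests the first image.
loaded_without_scroll = visible_images(PAGE, scroll_y=0, viewport_height=800)

# Scrolling in 500 px steps brings every image into view at some point.
loaded_with_scroll = set()
for y in range(0, 3000, 500):
    loaded_with_scroll.update(visible_images(PAGE, y, 800))
```

In this model only `a.png` is requested when the page is never scrolled, while stepping through the page touches all three images.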

Solution:

Simulate a person dragging the scrollbar so that the page loads its content.

Code that simulates scrolling:

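        # Read the page height; it can grow as lazy content loads.
        js_height = "return document.body.clientHeight"
        driver.get(link)
        k = 1
        height = driver.execute_script(js_height)
        # Scroll down 500 px at a time until the offset passes the page height.
        while k * 500 < height:
            driver.execute_script("window.scrollTo(0, {})".format(k * 500))
            # Give lazily loaded images a moment to start loading.
            time.sleep(0.2)
            # Re-read the height, since newly loaded content may extend the page.
            height = driver.execute_script(js_height)
            k += 1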
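The fixed `time.sleep` calls above are a guess at how long images take to arrive. A possible refinement, sketched here as an assumption and not part of the original script, is to poll the page after scrolling until every `<img>` reports itself as loaded. The helper name `wait_for_images` and its `timeout` and `poll` parameters are hypothetical; `driver` only needs an `execute_script` method, as a Selenium WebDriver has.

```python
import time

# JavaScript that counts <img> elements the browser has not finished loading.
JS_PENDING_IMAGES = """
return Array.from(document.images)
    .filter(img => !(img.complete && img.naturalWidth > 0)).length;
"""


def wait_for_images(driver, timeout=10.0, poll=0.2):
    """Poll until every <img> reports as loaded, or `timeout` seconds pass.

    `driver` is any object with an `execute_script(js)` method, such as a
    Selenium WebDriver. Returns True if all images loaded in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(JS_PENDING_IMAGES) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Calling `wait_for_images(driver)` right after the scroll loop would replace the blind wait. Two caveats: an image whose URL is broken never counts as loaded, so the call simply runs until the timeout, and a site that shows a placeholder image until its own script swaps in the real one may report "loaded" too early.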

Full code:

#!/usr/bin/python3
# -*- coding:utf-8 -*-
"""
@author: lms
@file: screenshot.py
@time: 2020/10/10 13:02
@desc: 
"""

import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from PIL import Image


def screenshot_and_convert_to_pdf(link):
    path = './'

    # Headless mode is required; otherwise the screenshot cannot cover the
    # whole page and only captures the height of your own screen
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.implicitly_wait(20)
        driver.get(link)

        # Simulate a person scrolling to trigger lazily loaded images
        js_height = "return document.body.clientHeight"
        k = 1
        height = driver.execute_script(js_height)
        while k * 500 < height:
            driver.execute_script("window.scrollTo(0, {})".format(k * 500))
            time.sleep(0.2)
            # Re-read the height, since newly loaded content may extend the page
            height = driver.execute_script(js_height)
            k += 1

        time.sleep(1)
        # The key to a full-page capture: read the page width and height with JS
        width = driver.execute_script("return document.documentElement.scrollWidth")
        height = driver.execute_script("return document.documentElement.scrollHeight")
        print(width, height)
        # Resize the browser window to the width and height just measured
        driver.set_window_size(width, height)
        time.sleep(1)

        png_path = os.path.join(path, '{}.png'.format('123456'))
        # Take the screenshot
        driver.save_screenshot(png_path)

        # Convert PNG to PDF; convert to RGB first, since the screenshot may
        # carry an alpha channel that Pillow cannot write to PDF
        image1 = Image.open(png_path)
        im1 = image1.convert('RGB')
        pdf_path = png_path.replace('.png', '.pdf')
        im1.save(pdf_path)

    except Exception as e:
        print(e)
    finally:
        # Always shut the browser down, even if a step above failed
        driver.quit()


if __name__ == '__main__':
    screenshot_and_convert_to_pdf('https://mp.weixin.qq.com/s/nJRnGpPVeJ1kdMIOwiPNpg')
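The script above gets a full-page capture by resizing the window to the page size. A different route, offered here only as a hedged alternative and not as the author's method, is to ask Chrome for the whole page through the Chrome DevTools Protocol. The sketch assumes a Chromium-based Selenium driver, which exposes `execute_cdp_cmd`; the helper name `capture_full_page` is hypothetical, and the scrolling step for lazy loading is still needed beforehand.

```python
import base64


def capture_full_page(driver, png_path):
    """Save a full-page PNG via the Chrome DevTools Protocol.

    Assumes `driver` is a Chromium-based Selenium driver exposing
    `execute_cdp_cmd(cmd, params)`. Returns the captured (width, height).
    """
    metrics = driver.execute_cdp_cmd("Page.getLayoutMetrics", {})
    # Newer Chrome versions report CSS pixels under "cssContentSize";
    # fall back to the older "contentSize" key if it is absent.
    size = metrics.get("cssContentSize") or metrics["contentSize"]
    result = driver.execute_cdp_cmd(
        "Page.captureScreenshot",
        {
            "format": "png",
            # Allow the capture to extend past the visible viewport.
            "captureBeyondViewport": True,
            "clip": {
                "x": 0,
                "y": 0,
                "width": size["width"],
                "height": size["height"],
                "scale": 1,
            },
        },
    )
    # The protocol returns the image as a base64-encoded string.
    with open(png_path, "wb") as f:
        f.write(base64.b64decode(result["data"]))
    return size["width"], size["height"]
```

With this approach the window size is left alone, so it may also work when the browser is not headless, though that has not been verified against the page used in this article.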

Screenshot after the fix:

Thanks for reading~
