[ Python ] Learning Crawler Libraries: Parsing with re Regular Expressions to Scrape Funny Images from Qiushibaike (糗事百科)

Original

2020-06-23 02:56

Scraping an image

Uses a GET request from the requests module
to download the image at a specified URL.

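import requests

if __name__ == '__main__':
    url = "https://pic.qiushibaike.com/system/pictures/12296/122960119/medium/8L45TQR77BQYY1C6.jpg"
    # response.text is the body decoded as a string;
    # response.content is the body as raw bytes, which is what an image needs
    response = requests.get(url)
    img_data = response.content
    # Open the file in binary mode ('wb') so the bytes are written unchanged
    with open('./a.jpg', 'wb') as fp:
        fp.write(img_data)

    print('Scraping finished!')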
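
The difference between text and binary data noted in the comment above can be tried without touching the network. The following is only a minimal sketch: `FAKE_JPEG` is a made-up stand-in for `response.content`, and `looks_like_jpeg` and `save_binary` are hypothetical helper names, not part of requests.

```python
import os
import tempfile

# Hypothetical stand-in for response.content: a JPEG start marker plus filler bytes.
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"\x00" * 16


def looks_like_jpeg(data):
    """Return True if data starts with the JPEG start-of-image marker FF D8."""
    return data[:2] == b"\xff\xd8"


def save_binary(data, path):
    """Write raw bytes to path in binary mode; return the number of bytes written."""
    with open(path, "wb") as fp:
        return fp.write(data)


# Image bytes are generally not valid text: the byte 0xFF never occurs in UTF-8.
try:
    FAKE_JPEG.decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False

# Write to a temporary directory and read the file back to confirm nothing changed.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.jpg")
    written = save_binary(FAKE_JPEG, path)
    with open(path, "rb") as fp:
        round_trip = fp.read()

print(looks_like_jpeg(FAKE_JPEG), decodes_as_text, written, round_trip == FAKE_JPEG)
```

In a real run, the bytes passed to `save_binary` would be `response.content` rather than the fake data used here.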

Qiushibaike (糗事百科)

Scrape the funny images from specified pages of Qiushibaike.
Target URL: https://www.qiushibaike.com/imgrank/

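import os
import re

import requests

if __name__ == '__main__':
    # Create a folder to hold all downloaded images
    os.makedirs('./qiutu', exist_ok=True)
    # Send a browser-like User-Agent with every request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/78.0.3904.108 Safari/537.36'
    }
    # URL template; %d is replaced by the page number
    url = "https://www.qiushibaike.com/imgrank/page/%d/"
    # Scrape the images on the first two pages
    for pageNum in range(1, 3):
        new_url = url % pageNum
        response = requests.get(new_url, headers=headers)
        page_text = response.text
        # Capture the src of each <img> inside a <div class="thumb">
        ex = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'
        # re.S lets "." match newlines, so one match can span several lines
        img_src_list = re.findall(ex, page_text, re.S)
        for src in img_src_list:
            # The captured src has no scheme, so prepend one to get a full URL
            src = 'https:' + src
            img_data = requests.get(url=src, headers=headers).content
            # Use the last path segment of the URL as the file name
            img_name = src.split('/')[-1]
            imgPath = './qiutu/' + img_name
            with open(imgPath, 'wb') as fp:
                fp.write(img_data)
            print(img_name + ' downloaded successfully!')

    print('Scraping finished!')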
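
The regex step can be checked in isolation before pointing it at the live site. The sketch below runs the same pattern against a made-up HTML fragment: `SAMPLE_HTML`, the `pic.example.com` host, and the function name `extract_image_urls` are all assumptions for illustration, and the real page markup may differ. It also swaps the manual `'https:' + src` for `urljoin` from the standard library, which resolves a scheme-less `//host/path` against the page URL.

```python
import re
from urllib.parse import urljoin

# Hypothetical HTML fragment shaped like the markup the pattern expects.
SAMPLE_HTML = """
<div class="thumb">
<a href="/article/1" target="_blank">
<img src="//pic.example.com/system/pictures/1/medium/AAA.jpg" alt="first">
</a>
</div>
<div class="thumb">
<a href="/article/2" target="_blank">
<img src="//pic.example.com/system/pictures/2/medium/BBB.jpg" alt="second">
</a>
</div>
"""

PATTERN = '<div class="thumb">.*?<img src="(.*?)" alt.*?</div>'
PAGE_URL = "https://www.qiushibaike.com/imgrank/"


def extract_image_urls(page_text, page_url=PAGE_URL):
    """Return absolute image URLs found in page_text."""
    # re.S makes "." match newlines, so the lazy ".*?" can cross line breaks.
    srcs = re.findall(PATTERN, page_text, re.S)
    # urljoin fills in the scheme of page_url for "//host/path" references.
    return [urljoin(page_url, src) for src in srcs]


image_urls = extract_image_urls(SAMPLE_HTML)
# File name = last path segment of each URL.
file_names = [u.rsplit("/", 1)[-1] for u in image_urls]
# Without re.S the pattern cannot get past the first newline in this sample.
without_flag = re.findall(PATTERN, SAMPLE_HTML)

print(image_urls)
print(file_names)
print(without_flag)
```

Dropping `re.S` yields an empty list for this sample, because every tag sits on its own line and `.` stops at each newline.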

Source: 爬蟲開發入門丨老男孩IT教育 (Introduction to Crawler Development, Oldboy IT Education)
