Python Web Scraping in Practice: Using the requests Module to Scrape Street-Snap Photos from Toutiao

Original

2021-11-23 00:23

Preface

This post uses Python to scrape street-snap (街拍) photos from Toutiao (今日頭條).

Let's get started.

Development Tools

Python version: 3.6.4

Required modules:

re;

requests;

plus a few other modules that ship with Python (such as json).

Environment Setup

Install Python and add it to your PATH environment variable, then use pip to install the required modules.

Browser Details

Code for fetching the article links:

import requests
import json
import re

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}

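def get_first_data(offset):
    """Fetch one page of search results as JSON text; return None on failure."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': '街拍',  # search keyword: "street snap"
        'autoload': 'true',
        'count': '20',
        'cur_tab': '1',
        'from':'search_tab'
    }
    try:
        # Keep the request inside the try block so connection errors are caught too.
        response = requests.get(url='https://www.toutiao.com/search_content/', headers=headers, params=params)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException:
        print("Failed to fetch the search results")
        return None

def handle_first_data(html):
    """Yield the article URL of every item in the search results."""
    if not html:
        return
    data = json.loads(html)
    if data and "data" in data:
        for item in data.get("data"):
            yield item.get("article_url")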

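A note on error handling in requests: calling raise_for_status() on the response object raises an exception if the download failed, that is, if the server answered with a 4xx or 5xx status code. Wrap the call in try and except statements to handle the error so the program does not crash.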
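
As a minimal sketch of the behavior described above, the snippet below builds two requests.Response objects by hand, so it runs without network access, and shows that raise_for_status() raises only for the error status. The helper names body_or_none and fake_response are hypothetical, and setting the private _content attribute is only a demo shortcut; neither appears in the original code.

```python
import requests


def body_or_none(response):
    """Return the response body, or None if the status code signals an error."""
    try:
        # Raises requests.exceptions.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        print("request failed:", exc)
        return None
    return response.text


def fake_response(status_code, body=b""):
    """Build a Response by hand (hypothetical demo helper, no network needed)."""
    response = requests.Response()
    response.status_code = status_code
    response._content = body  # private attribute, set here only for the demo
    response.encoding = "utf-8"
    return response


ok = body_or_none(fake_response(200, b"hello"))
failed = body_or_none(fake_response(404))
print(ok, failed)
```

Catching requests.exceptions.RequestException instead of HTTPError would also cover connection errors and timeouts raised by requests.get itself.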

The requests documentation (Chinese edition) is available at: http://cn.python-requests.org/zh_CN/latest/

Code for extracting the image links:

def get_second_data(url):
    """Fetch the HTML of one article page; return None on failure."""
    if url:
        try:
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            return response.text
        except requests.exceptions.RequestException:
            print("Error while opening the article link")
    return None

def handle_second_data(html):
    """Return the list of image URLs found in the page's gallery data."""
    if html:
        pattern = re.compile(r'gallery: JSON.parse\((.*?)\),', re.S)
        result = re.search(pattern, html)
        if result:
            # The captured text is a JSON string literal that itself contains
            # JSON, so it is decoded twice.
            data = json.loads(json.loads(result.group(1)))
            if data and "sub_images" in data:
                # Collect one URL per image.
                return [item.get('url') for item in data.get("sub_images")]
        else:
            print("No gallery data found")
    return None
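
The double json.loads call in handle_second_data can be tried offline. The sketch below assumes, as the regular expression implies, that the article page embeds the gallery as gallery: JSON.parse("..."); the sample_html string and its example.com URLs are made up for illustration.

```python
import json
import re

# Hypothetical page fragment: the argument of JSON.parse is a JSON string
# literal whose content is itself a JSON document.
inner = json.dumps({"sub_images": [{"url": "http://example.com/a"},
                                   {"url": "http://example.com/b"}]})
sample_html = ("var page = {\n  gallery: JSON.parse(" + json.dumps(inner)
               + "),\n  other: 1\n};")

pattern = re.compile(r'gallery: JSON.parse\((.*?)\),', re.S)
match = pattern.search(sample_html)

# First decode: string literal -> JSON text. Second decode: JSON text -> dict.
json_text = json.loads(match.group(1))
data = json.loads(json_text)
image_urls = [item["url"] for item in data["sub_images"]]
print(image_urls)
```

A single json.loads would return only the inner JSON text as a str, which is why the lookup of sub_images needs the second decode.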

Code for downloading the images:

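def download_image(imageUrl):
    """Download every URL in the list and save it as a .jpg file."""
    for url in imageUrl:
        try:
            response = requests.get(url)
            response.raise_for_status()
        except requests.exceptions.RequestException:
            # Skip this image instead of writing missing or stale data.
            print("Download failed, skipping: {0}".format(url))
            continue
        # Name the file after the last 10 characters of the URL; replace any
        # "/" so the result is a single file name.
        name = url[-10:].replace("/", "_")
        with open("images" + name + ".jpg", "wb") as ob:
            ob.write(response.content)
        print(name + " downloaded successfully! " + url)

def main():
    html = get_first_data(0)
    if not html:
        return
    for url in handle_first_data(html):
        html = get_second_data(url)
        if html:
            result = handle_second_data(html)
            if result:
                try:
                    download_image(result)
                except OSError:
                    # Writing a file failed; move on to the next article.
                    print("Problem with {0}, skipping".format(result))
                    continue

if __name__ == '__main__':
    main()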
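
main() fetches only the first page of results (offset 0). One possible way to walk further pages is sketched below. It assumes that the endpoint advances offset in steps of count, which the source does not confirm; the helper names page_offsets and search_url are hypothetical. The URL is built with urllib.parse.urlencode purely to show what the query string looks like, without sending any request.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.toutiao.com/search_content/"


def page_offsets(pages, count=20):
    """Yield the offset of each page, assuming offset grows by `count` per page."""
    for page in range(pages):
        yield page * count


def search_url(offset, count=20):
    """Build the search URL for one page, mirroring the params in get_first_data."""
    params = {
        "offset": offset,
        "format": "json",
        "keyword": "街拍",
        "autoload": "true",
        "count": count,
        "cur_tab": "1",
        "from": "search_tab",
    }
    # urlencode percent-encodes the UTF-8 bytes of the non-ASCII keyword.
    return SEARCH_URL + "?" + urlencode(params)


urls = [search_url(offset) for offset in page_offsets(3)]
for url in urls:
    print(url)
```

Each offset could then be passed to get_first_data(offset) in place of the hard-coded 0.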

Finally, the Download Succeeds

