Crawling JD.com Images by Keyword with Python

import re
import urllib.request
import urllib.error
import os
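def craw(url,page,savedir):
    # url may be a URL string or a urllib.request.Request object
    html1=urllib.request.urlopen(url).read()
    # Decode the bytes to text; str() on bytes would give the "b'...'" repr instead
    html1=html1.decode('utf-8',errors='ignore')
    # Cut out the product-list block; re.S lets '.' match across newlines
    pat1='<div id="J_goodsList".*<div class="p-commit">'
    result1=re.compile(pat1,re.S).findall(html1)
    if not result1:
        print('No product list found on page',page)
        return
    result1=result1[0]
    # Lazy-loaded image URLs are protocol-relative (they start with //)
    pat2 ='source-data-lazy-img="(//.*?jpg)'
    imag = re.compile(pat2).findall(result1)
    # enumerate keeps the counter in step with the images, even when a download fails
    for x,imagurl in enumerate(imag,1):
        imagname = savedir + 'page'+str(page) + '_img'+str(x) + '.jpg'
        imagurl = 'https:' + imagurl
        try:
            urllib.request.urlretrieve(imagurl,filename=imagname)
            print('Saved page',page,'image',x)
        except urllib.error.URLError as e:
            # HTTPError carries .code, plain URLError carries .reason
            print('Failed page',page,'image',x,':',getattr(e,'code',None) or e.reason)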
if __name__ =="__main__":
    page_= 45  # number of displayed result pages to crawl per keyword
    # Search keywords sent to JD: shirt, vest shirt, vest, women's business attire, women's suit
    key = ['襯衫','馬甲襯衫','馬甲','女生職業裝','女士西服']
    for name in key:
        savedir = './img/' + name + '/'
        os.makedirs(savedir, exist_ok=True)
        # Two consecutive values of the page parameter map to one displayed page
        for i in range(1,2*page_+1):
            # Displayed page number: 1.0, 1.5, 2.0, ... (.5 marks the second half of a page)
            key2=(i+1)/2
            key_temp=urllib.request.quote(name)
            url2='https://search.jd.com/Search?keyword='+key_temp +'&enc=utf-8&page='+ str(i)
            # Imitate a browser
            req = urllib.request.Request(url2)
            req.add_header("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
            # Accept-Encoding is deliberately not sent: urllib does not decompress gzip/br responses
            req.add_header("Accept-Language","zh-CN,zh;q=0.9")
            req.add_header("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
            # Pass the request that carries the browser headers
            craw(req,key2,savedir)
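
The extraction step inside craw can be tried offline before hitting the site. The sketch below applies the same two regular expressions to a made-up HTML fragment. Both the fragment and the helper name extract_image_urls are assumptions for illustration; the fragment only mimics the attributes the script looks for and is not real JD markup.

```python
import re

# Hypothetical fragment that only mimics the attributes craw() searches for;
# it is not copied from a real JD page.
SAMPLE_HTML = '''
<div id="J_goodsList">
  <img source-data-lazy-img="//img.example.com/a/1.jpg" />
  <img source-data-lazy-img="//img.example.com/b/2.jpg" />
  <div class="p-commit"></div>
</div>
'''

def extract_image_urls(html):
    """Return absolute image URLs found inside the product-list block."""
    # re.S lets '.' span newlines, since the block covers several lines
    block = re.search(r'<div id="J_goodsList".*<div class="p-commit">', html, re.S)
    if block is None:
        return []
    # The captured URLs are protocol-relative, so prepend the scheme
    relative = re.findall(r'source-data-lazy-img="(//.*?jpg)', block.group(0))
    return ['https:' + u for u in relative]

print(extract_image_urls(SAMPLE_HTML))
```

If the page layout changes and the product-list block is no longer found, the helper returns an empty list instead of raising an IndexError.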

This code was found online, and I no longer remember the URL. If you are the original author, please let me know.
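
As a small worked example of the page numbering in the main loop: the script requests page values 1 to 2*page_ and labels each with a displayed page number. The helpers below, displayed_page and build_search_url, are hypothetical names introduced here. They only restate the arithmetic and URL construction already in the script and make no claim about how JD paginates internally.

```python
from urllib.parse import quote

def displayed_page(i):
    """Map the page parameter i (1, 2, 3, ...) to the label used in file names."""
    # Both branches of the script's if/else reduce to this one expression
    return (i + 1) / 2

def build_search_url(keyword, i):
    """Build the search URL the script requests for a keyword and page parameter."""
    # quote() percent-encodes the keyword as UTF-8
    return ('https://search.jd.com/Search?keyword=' + quote(keyword)
            + '&enc=utf-8&page=' + str(i))

for i in range(1, 5):
    print(i, displayed_page(i), build_search_url('shirt', i))
```

With page_ = 45 the loop therefore issues 90 requests per keyword, labelled 1.0 through 45.5.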
