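Inspired by ~wangweijun's post "Crawling every hero skin in Honor of Kings with 20 lines of Python",
I tried the same approach on LOL skins. Sure enough, the method is much the same, since both are Tencent games.
First, go to the LOL official website.
Open the hero database (資料庫).
Fetch the detailed list of all heroes, which includes each hero's ID, epithet, English name, Chinese name, and so on.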
url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
herolist = requests.get(url)
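To make the shape of that list concrete, here is a minimal sketch of how the fields can be pulled out once the response has been parsed with `.json()`. The `sample` payload is hypothetical and only mirrors the three keys the full script reads (`heroId`, `name`, `title`); the values, and the assumption that `heroId` is a string, are for illustration only. `summarize_heroes` is a helper name introduced here, not part of the original code.

```python
# Hypothetical payload shaped like the parsed hero_list.js response.
# Only the keys used later in the post are included; values are illustrative.
sample = {
    "hero": [
        {"heroId": "1", "name": "黑暗之女", "title": "安妮"},
        {"heroId": "2", "name": "狂戰士", "title": "奧拉夫"},
    ]
}


def summarize_heroes(payload):
    """Return one (heroId, name, title) tuple per entry under the 'hero' key."""
    return [(h["heroId"], h["name"], h["title"]) for h in payload["hero"]]


for hero_id, name, title in summarize_heroes(sample):
    print(hero_id, name, title)
```

Reading each hero as one tuple keeps the ID, epithet, and name together, which avoids having to keep three parallel lists in sync.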
Take the skin image URLs of 火女 (Annie) as an example:
https://game.gtimg.cn/images/lol/act/img/skin/big1000.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big1001.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big1002.jpg
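The pattern is easy to spot: "big" is followed by the hero ID (1 here), and 000, 001, 002 are three-digit skin codes. Once we can assemble these image URLs, the rest is just downloading.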
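The URL pattern can be sketched as a small function. This is only an illustration of the rule described here (hero ID followed by a zero-padded three-digit skin code); `skin_url` and `BASE` are names introduced for this example.

```python
# Prefix shared by the skin image URLs shown in the post.
BASE = 'https://game.gtimg.cn/images/lol/act/img/skin/big'


def skin_url(hero_id, skin_index):
    """Build a skin image URL: hero ID, then the skin code padded to 3 digits."""
    return f"{BASE}{hero_id}{skin_index:03d}.jpg"


# Rebuild the three example URLs for hero ID 1.
for k in range(3):
    print(skin_url(1, k))
```

Using the `03d` format pads the skin code in one step, so there is no need to branch on whether the code is below 10.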
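Here is the complete code.
Before running it, create a folder named "lol" in the place where you want to store the images (E:\Picture\lol in the code below).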
import os
import requests

url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
save_root = "E:\\Picture\\lol"  # the "lol" folder created beforehand

herolist = requests.get(url, timeout=10)  # fetch the hero list JSON file
herolist_json = herolist.json()  # parse the response as JSON (once)
hero_name = [x['name'] for x in herolist_json['hero']]  # hero epithets
hero_title = [x['title'] for x in herolist_json['hero']]  # hero names
hero_number = [x['heroId'] for x in herolist_json['hero']]  # hero IDs

# Download the images
def downloadPic():
    for name, title, j in zip(hero_name, hero_title, hero_number):
        # Create one folder per hero; exist_ok lets the script be re-run
        hero_dir = os.path.join(save_root, name + "-" + title)
        os.makedirs(hero_dir, exist_ok=True)
        for k in range(20):
            # Build the URL: hero ID followed by the skin code padded to three digits
            onehero_link = 'https://game.gtimg.cn/images/lol/act/img/skin/big' + str(j) + '%03d' % k + '.jpg'
            try:
                # Request the URL
                im = requests.get(onehero_link, headers=headers, timeout=10)
            except requests.exceptions.RequestException:
                # requests raises its own exceptions, not urllib's URLError
                print('Request failed or timed out, moving on to the next one')
                continue
            if im.status_code == 200:
                # Write the image into the hero's folder
                with open(os.path.join(hero_dir, str(k) + '.jpg'), 'wb') as f:
                    f.write(im.content)

downloadPic()
The result is one folder per hero inside lol, each holding the skin images that were found.
The following approach can also retrieve the skin names:
import json
import os
import time
import urllib.request

import jsonpath
import requests

print("Working hard to become a crawler master")
timestart = time.time()
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
if not os.path.exists("json"):
    os.mkdir("json")
for s in range(555, 600):  # hero IDs to try; IDs that fail to download or parse are skipped below
    try:
        hero_urls = 'https://game.gtimg.cn/images/lol/act/img/js/hero/'+str(s)+'.js'
        j_name = 'json/zms'+str(s)+'.json'
        # Save the hero's data file locally, then load it
        urllib.request.urlretrieve(url=hero_urls, filename=j_name)
        with open(j_name, 'r', encoding='utf-8') as f:
            obj = json.load(f)
        hero_name = jsonpath.jsonpath(obj, '$.hero..name')
        hero_title = jsonpath.jsonpath(obj, '$.hero..title')
        skins_name = jsonpath.jsonpath(obj, '$.skins..name')
        skins_mainImg = jsonpath.jsonpath(obj, '$.skins..mainImg')
        print("Crawling {}".format(hero_name[0]))
        docname = hero_title[0] + " " + hero_name[0]
        if not os.path.exists(docname):
            os.mkdir(docname)
        for i in range(len(skins_name)):
            if skins_mainImg[i] != "":  # skip entries that have no main image
                try:
                    im = requests.get(skins_mainImg[i], headers=headers, timeout=10)
                except requests.exceptions.RequestException:
                    print('Request failed or timed out, moving on to the next one')
                    continue
                # A "/" in a skin name would be read as a path separator, so replace it
                filename = skins_name[i].replace("/", "-") + ".jpg"
                with open(os.path.join(docname, filename), 'wb') as f:
                    f.write(im.content)  # write the image file
    except Exception:
        continue
timeend = time.time()
print("Total time: {} seconds".format(timeend-timestart))
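The selection step in the loop above can also be sketched without jsonpath, using plain dictionary access. This is an illustration under assumptions, not the author's method: it assumes each hero file parses to a dict whose `skins` entry is a list of objects with `name` and `mainImg` keys (as the JSONPath expressions suggest). The `sample` data, its URLs, and the helper name `skin_targets` are hypothetical.

```python
def skin_targets(obj):
    """Return (filename, url) pairs for every skin that has a main image.

    Slashes in skin names are replaced so the name is usable as a file name.
    """
    targets = []
    for skin in obj["skins"]:
        url = skin.get("mainImg", "")
        if not url:
            continue  # entries without a main image are skipped
        filename = skin["name"].replace("/", "-") + ".jpg"
        targets.append((filename, url))
    return targets


# Hypothetical data shaped like one parsed hero file; URLs are placeholders.
sample = {
    "hero": {"name": "Epithet", "title": "Name"},
    "skins": [
        {"name": "Base Skin", "mainImg": "https://example.invalid/a.jpg"},
        {"name": "Recolor", "mainImg": ""},
        {"name": "A/B Skin", "mainImg": "https://example.invalid/b.jpg"},
    ],
}

for filename, url in skin_targets(sample):
    print(filename, url)
```

Separating "decide what to download" from "download it" makes the filtering and file-name handling easy to check without touching the network.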