Python 3 Crawler: Scraping All League of Legends Hero Skins

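#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @author: Chris iven
# Python version 3.6

# 1. Analyze how the official LOL site behaves.
# All of this page's data is generated by JavaScript: none of it is in the HTML page itself,
# it all lives in a separate JS file. So we only need to fetch that JS data.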

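"""
Crawling steps and approach:

1. Request and download the heroes' JS data, and save a copy to a local file.

2. Parse the data; here this is done by the function get_hero_data.
It parses the downloaded data and extracts each LOL hero's English name and id.

3. Request and download:
Build URLs from the extracted data, in particular the image links, then download the images.
"""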

import os
import re

import requests

# Browser-like User-Agent header sent with every request
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.7 Safari/537.36"}
# Root folder for the downloaded skins ("League of Legends heroes and skins")
SAVE_DIR = "英雄聯盟各英雄和皮膚"


class LOL_Spider(object):
    def __init__(self, url):
        self.url = url

    def get_hero_data(self):
        response = requests.get(self.url, headers=HEADERS, timeout=10)
        # Fail loudly instead of parsing a missing or stale local file
        response.raise_for_status()
        data = response.text

        # Keep a local copy of the raw JS data
        with open("hero_data.js", "w", encoding="utf-8") as f:
            f.write(data)

        hero_name = []  # English hero names
        hero_id = []    # hero ids, used to build the image URLs
        # Match the first block of data: the "keys" object mapping id -> name
        pattern1 = re.compile(r'"keys":\{(.*?)\},"data"')
        first_data = re.findall(pattern1, data)[0]
        pattern2 = re.compile(r'"(.*?)":"(.*?)"')
        for hid, name in re.findall(pattern2, first_data):
            hero_id.append(hid)
            hero_name.append(name)
        print(hero_name, "\n", hero_id)
        return hero_name, hero_id

    def download_pic(self, hero_name, hero_id):
        for name, hid in zip(hero_name, hero_id):
            hero_dir = os.path.join(SAVE_DIR, name)
            for j in range(15):  # try at most 15 skins per hero
                # The skin index is zero-padded to 3 digits: 000, 001, ..., 010, ...
                url = f"http://ossweb-img.qq.com/images/lol/web201310/skin/big{hid}{j:03d}.jpg"
                print("Download link:", url)
                response = requests.get(url, headers=HEADERS, timeout=10)
                if response.status_code != 200 or b"404 page not found" in response.content:
                    print(name, "skins finished downloading!")
                    break
                # makedirs also creates SAVE_DIR if it does not exist yet
                os.makedirs(hero_dir, exist_ok=True)
                with open(os.path.join(hero_dir, str(j) + ".jpg"), "wb") as f:
                    f.write(response.content)

    def Start_Spider(self):
        hero_name, hero_id = self.get_hero_data()
        self.download_pic(hero_name, hero_id)


if __name__ == "__main__":
    url = "http://lol.qq.com/biz/hero/champion.js"
    lol = LOL_Spider(url)
    lol.Start_Spider()
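Steps 1 and 2 of the docstring come down to pulling the `"keys"` object out of the JS text and splitting it into id/name pairs. The sketch below runs the same two-regex approach offline on a small hard-coded sample. `SAMPLE_JS` is a made-up excerpt that only assumes champion.js contains `"keys":{...},"data"` as the script's regex expects; `parse_keys`, `KEYS_PATTERN` and `PAIR_PATTERN` are hypothetical names for illustration.

```python
import re

# Hypothetical excerpt shaped like champion.js (assumption, not the real file):
# a JS assignment whose object has a "keys" member mapping hero id -> English name.
SAMPLE_JS = (
    'if(!LOLherojs)var LOLherojs={};'
    'LOLherojs.champion={"keys":{"266":"Aatrox","103":"Ahri"},'
    '"data":{"Aatrox":{"id":"Aatrox"}}};'
)

# Capture everything between "keys":{ and },"data"
KEYS_PATTERN = re.compile(r'"keys":\{(.*?)\},"data"')
# Each "id":"name" pair inside that block
PAIR_PATTERN = re.compile(r'"(.*?)":"(.*?)"')


def parse_keys(js_text):
    """Return (hero_id, hero_name) pairs from the "keys" block of the JS text."""
    match = KEYS_PATTERN.search(js_text)
    if match is None:
        raise ValueError('no "keys" block found in the JS text')
    return PAIR_PATTERN.findall(match.group(1))


pairs = parse_keys(SAMPLE_JS)
print(pairs)  # [('266', 'Aatrox'), ('103', 'Ahri')]
```

Using `re.search` with a `None` check, rather than `re.findall(...)[0]`, turns a changed page layout into an explicit error message instead of a bare `IndexError`.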

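Step 3 builds each image URL from the hero id plus a skin index. Plain concatenation such as `"00" + str(j)` produces `"0010"` once the index reaches 10, so the sketch below pads the index with a format spec instead. The base URL is copied from the script; `skin_url` is a hypothetical helper, and the three-digit skin index is an assumption inferred from the script's `"00" + str(j)` pattern.

```python
# Base URL taken from the script above
BASE_URL = "http://ossweb-img.qq.com/images/lol/web201310/skin/big"


def skin_url(hero_id, skin_index):
    """Hero id followed by the skin index zero-padded to 3 digits (assumed format)."""
    return f"{BASE_URL}{hero_id}{skin_index:03d}.jpg"


for j in (0, 9, 10):
    print(skin_url("266", j))
# http://ossweb-img.qq.com/images/lol/web201310/skin/big266000.jpg
# http://ossweb-img.qq.com/images/lol/web201310/skin/big266009.jpg
# http://ossweb-img.qq.com/images/lol/web201310/skin/big266010.jpg
```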