Analyzing JD.com shopping data with a Python crawler
Analyzing women's bra purchase data
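1. First, fetch the review data. The score and page query parameters are filled in through the two %d placeholders: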
url= "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"
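The URL above is a template. The first %d is the score parameter and the second is the page number, which is how the core code further down fills it in (score 0 to 5, pages 1 to 149). Here is a minimal sketch of generating those URLs. It assumes nothing beyond the template itself, and the helper names build_comment_url and all_comment_urls are hypothetical:

```python
from urllib.parse import parse_qs, urlparse

# URL template from above; the two %d placeholders are score and page, in that order
BASE_URL = ("https://sclub.jd.com/comment/productPageComments.action"
            "?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10")


def build_comment_url(score, page):
    """Fill the score and page placeholders of the review URL template."""
    return BASE_URL % (score, page)


def all_comment_urls(scores=range(0, 6), pages=range(1, 150)):
    """Yield one URL per (page, score) pair, in the same order as the crawler's loops."""
    for page in pages:
        for score in scores:
            yield build_comment_url(score, page)


if __name__ == "__main__":
    url = build_comment_url(3, 2)
    print(url)
    # Decode the query string to check that the placeholders landed where expected
    print(parse_qs(urlparse(url).query))
```

With the default ranges this produces 6 × 149 = 894 URLs, one request each.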
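2. Next, examine the format of the returned data. Each element of the response's comments array looks like this: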
{
"id":11577732545,
"topped":0,
"guid":"6f8c009f-bb28-404a-8bbb-8e2d2c1b67b0",
"content":"是夏天穿的薄款,就是感覺聚攏效果沒說的那麼好",
"creationTime":"2018-06-10 21:12:45",
"isTop":false,
"referenceId":"11565382120",
"referenceImage":"jfs/t21367/46/2125692555/126045/5500b502/5b480df9Nc703f5cd.jpg",
"referenceName":"都市麗人文胸大碼內衣性感聚攏無痕無鋼圈深v透氣薄款洞洞杯調整上託胸罩 2B7513 紫灰 75B/34",
"referenceTime":"2018-06-01 15:57:58",
"referenceType":"Product",
"referenceTypeId":0,
"firstCategory":1315,
"secondCategory":1345,
"thirdCategory":1364,
"replies":[],
"replyCount":1,
"replyCount2":1,
"score":5,
"status":1,
"title":"",
"usefulVoteCount":0,
"uselessVoteCount":0,
"userImage":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userImageUrl":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userLevelId":"56",
"userProvince":"",
"viewCount":0,
"orderId":0,
"isReplyGrade":false,
"nickname":"Z***t",
"productColor":"黑色",
"productSize":"75B/34",
"userClientShow":"來自京東Android客戶端",
"userLevelName":"銅牌會員",
"userClient":4,
"images":[]
}
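The next sketch pulls the fields used later out of one response body. It assumes, as the core code does, that the comments sit in a top-level comments array. SAMPLE_RESPONSE is a trimmed, hypothetical body built from the sample record above, and extract_fields is an illustrative name:

```python
import json

# Hypothetical, trimmed response body: one comment keeping only the fields used later,
# with values copied from the sample record above
SAMPLE_RESPONSE = """
{"comments": [
  {"id": 11577732545,
   "productColor": "黑色",
   "productSize": "75B/34",
   "userClientShow": "來自京東Android客戶端",
   "score": 5}
]}
"""


def extract_fields(body):
    """Return (color, size, client) tuples for every comment in one response body."""
    data = json.loads(body)
    rows = []
    # A missing comments key is treated as an empty page
    for comment in data.get("comments", []):
        rows.append((
            comment.get("productColor", ""),
            comment.get("productSize", ""),
            comment.get("userClientShow", ""),
        ))
    return rows


print(extract_fields(SAMPLE_RESPONSE))
```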
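Then analyze the product color (productColor), ordering client (userClientShow), size (productSize) and similar fields in the purchase records.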
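Once the records are collected, a ranking is just a frequency count per field. The sketch below uses collections.Counter. The items list is hypothetical data in the shape produced by parse_comment. The iPhone client string is also an assumption, since only the Android one appears in the sample:

```python
from collections import Counter


def rank_field(items, key):
    """Count each value of `key` (skipping empty ones) and return (value, count) pairs, most common first."""
    counts = Counter(item[key] for item in items if item.get(key))
    return counts.most_common()


# Hypothetical records in the shape produced by parse_comment (color / size / userClientShow)
items = [
    {"color": "黑色", "size": "75B/34", "userClientShow": "來自京東Android客戶端"},
    {"color": "膚色", "size": "80B/36", "userClientShow": "來自京東iPhone客戶端"},
    {"color": "黑色", "size": "75B/34", "userClientShow": ""},
]

for key in ("color", "size", "userClientShow"):
    print(key, rank_field(items, key))
```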
After that, we use Python charting tools to visualize the data.
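The article does not say which charting library was used. As a dependency-free stand-in, the sketch below renders a (label, count) ranking as horizontal text bars. With matplotlib, the equivalent would be a plt.barh call. The counts are hypothetical, and the labels are in English because len() counts characters rather than display width, so CJK labels would misalign:

```python
def text_bar_chart(ranking, width=30, bar_char="#"):
    """Render (label, count) pairs as horizontal text bars scaled to the largest count."""
    if not ranking:
        return ""
    top = max(count for _, count in ranking)
    label_width = max(len(str(label)) for label, _ in ranking)
    lines = []
    for label, count in ranking:
        # Any non-zero count gets at least one bar character
        bar = bar_char * max(1, round(count * width / top)) if count else ""
        lines.append(f"{str(label):<{label_width}} | {bar} {count}")
    return "\n".join(lines)


# Hypothetical counts, e.g. the output of a Counter(...).most_common() call
print(text_bar_chart([("black", 12), ("nude", 9), ("pink", 1)], width=12))
```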
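Ranking by purchased color
Skin-tone (nude) and black are the most purchased colors, presumably so they don't show through summer clothing. Oddly, nobody bought pink. Is cute pink not appealing?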
Ranking by size
Sadly, the movies lied: buyers' sizes cluster mainly at 75B and 80B (I'm not sure exactly how large that is, either~~~).
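Size strings can be split into a band number and a cup letter before counting. That way, variants such as 75B/34 and 75B count as the same size, and cups can be tallied across bands. The sketch assumes the "<number><letters>" prefix format seen in the sample record. parse_size and cup_distribution are hypothetical helpers, and sizes that do not match (for example a letter size like M) are skipped:

```python
import re
from collections import Counter

# Assumed format, based on the sample record: a 2-3 digit band followed by cup letter(s), e.g. "75B/34"
SIZE_RE = re.compile(r"^\s*(\d{2,3})\s*([A-Z]+)")


def parse_size(size):
    """Split a size string such as '75B/34' into (band, cup); return None if it does not match."""
    m = SIZE_RE.match(size or "")
    if not m:
        return None
    return int(m.group(1)), m.group(2)


def cup_distribution(sizes):
    """Count cup letters across size strings, ignoring sizes that cannot be parsed."""
    counts = Counter()
    for s in sizes:
        parsed = parse_size(s)
        if parsed:
            counts[parsed[1]] += 1
    return counts


print(parse_size("75B/34"))
print(cup_distribution(["75B/34", "80B/36", "75A/34", "M"]))
```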
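Phone vs. size ranking
This compares the ordering client with the purchased size. Judging from the data, women ordering from an iPhone seem to buy somewhat smaller sizes than those ordering from an Android phone.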
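One platform may simply have more orders than the other, so comparing raw counts can mislead. Normalizing within each platform is safer. The sketch below shows that cross-tabulation. It assumes the platform can be read from userClientShow by keyword: Android matches the sample, while the iPhone keyword is an assumption. The helper names are hypothetical:

```python
from collections import Counter, defaultdict


def client_platform(client_show):
    """Map a userClientShow string to a coarse platform label by keyword (assumed keywords)."""
    text = (client_show or "").lower()
    if "iphone" in text:
        return "iPhone"
    if "android" in text:
        return "Android"
    return "other"


def size_by_platform(items):
    """Cross-tabulate platform against size: {platform: Counter({size: count})}."""
    table = defaultdict(Counter)
    for item in items:
        table[client_platform(item.get("userClientShow"))][item.get("size", "")] += 1
    return table


def share(counter, value):
    """Fraction of one platform's records with the given size (0.0 if the platform has no records)."""
    total = sum(counter.values())
    return counter[value] / total if total else 0.0
```

Comparing share(table["iPhone"], size) with share(table["Android"], size) for each size puts both platforms on the same scale, regardless of how many orders each contributed.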
Core code
import json

import requests


class BraSpider(object):
    # Review API; the two %d placeholders are score and page, in that order
    base_url = "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"

    def parse_comment(self, response, ret):
        """Append the color, size and client of every comment on one page to the list ret."""
        content = json.loads(response.text)
        comments = content.get('comments', [])
        for comment in comments:
            item = {}
            # item['content'] = comment['content']
            # item['guid'] = comment['guid']
            # item['id'] = comment['id']
            # item['time'] = comment['referenceTime']
            item['color'] = comment['productColor']
            item['size'] = comment['productSize']
            item['userClientShow'] = comment['userClientShow']
            ret.append(item)

    def start_requests(self):
        comments_ret = []
        hot_tag_ret = {}  # not populated in this excerpt
        ret = {}
        # Pages 1-149 for each score value 0-5
        for page in range(1, 150):
            for score in range(0, 6):
                url = self.base_url % (score, page)
                response = requests.get(url, timeout=10)
                if response.status_code == 200:
                    self.parse_comment(response, comments_ret)
        ret['comments'] = comments_ret
        ret['tag'] = hot_tag_ret
        return ret
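One possible refinement, which is not the author's code: instead of always requesting 149 pages for every score, move on once a page comes back with no comments. The sketch uses urllib.request so it needs only the standard library (the original uses requests). It takes the fetch function as a parameter so it can be exercised without network access. Whether the endpoint still responds, and in which encoding, is an assumption:

```python
import json
import time
from urllib.request import urlopen

# Same URL template as the core code: score and page go into the two %d placeholders
BASE_URL = ("https://sclub.jd.com/comment/productPageComments.action"
            "?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10")


def fetch_json(url, timeout=10):
    """Download one page with the standard library and decode it as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset (assumption)
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


def crawl(fetch=fetch_json, scores=range(0, 6), max_page=149, delay=0.0):
    """Collect color/size/client per comment, moving to the next score once a page is empty."""
    results = []
    for score in scores:
        for page in range(1, max_page + 1):
            data = fetch(BASE_URL % (score, page))
            comments = (data or {}).get("comments") or []
            if not comments:
                break  # treat an empty (or missing) page as the end of this score's reviews
            for c in comments:
                results.append({
                    "color": c.get("productColor"),
                    "size": c.get("productSize"),
                    "userClientShow": c.get("userClientShow"),
                })
            if delay:
                time.sleep(delay)  # optional pause between requests
    return results
```

Calling crawl(delay=1.0) would issue real requests with a one-second pause between pages. Unlike the original, this version loops score-first rather than page-first, which is what allows it to stop each score independently.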
GitHub link
https://github.com/libinbin-1014/python-study/blob/master/bar/bar-scrapy.py