An Interesting Python Web-Scraping Exercise

Original

libinbin_1014

2018-08-23 10:21

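Analyzing JD.com shopping data with a Python scraper

Analyzing women's underwear purchase data

1. First, fetch the review data. The two %d placeholders in the URL are the review score and the page number.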

url= "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"
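As a minimal sketch of how the two placeholders get filled, the helper below formats the URL for a given score and page. The name build_url is hypothetical and used only for illustration; the template itself is the one shown above.

```python
# URL template from the post; the first %d is the score, the second is the page.
BASE_URL = (
    "https://sclub.jd.com/comment/productPageComments.action"
    "?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"
)


def build_url(score, page):
    """Return the comment URL for one score filter and one page (hypothetical helper)."""
    return BASE_URL % (score, page)


print(build_url(0, 1))
```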

2. Next, examine the format of the returned data. Each entry in the response's comments list looks like this:

{
"id":11577732545,
"topped":0,
"guid":"6f8c009f-bb28-404a-8bbb-8e2d2c1b67b0",
"content":"是夏天穿的薄款，就是感覺聚攏效果沒說的那麼好",
"creationTime":"2018-06-10 21:12:45",
"isTop":false,
"referenceId":"11565382120",
"referenceImage":"jfs/t21367/46/2125692555/126045/5500b502/5b480df9Nc703f5cd.jpg",
"referenceName":"都市麗人文胸大碼內衣性感聚攏無痕無鋼圈深v透氣薄款洞洞杯調整上託胸罩 2B7513 紫灰 75B/34",
"referenceTime":"2018-06-01 15:57:58",
"referenceType":"Product",
"referenceTypeId":0,
"firstCategory":1315,
"secondCategory":1345,
"thirdCategory":1364,
"replies":[],
"replyCount":1,
"replyCount2":1,
"score":5,
"status":1,
"title":"",
"usefulVoteCount":0,
"uselessVoteCount":0,
"userImage":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userImageUrl":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userLevelId":"56",
"userProvince":"",
"viewCount":0,
"orderId":0,
"isReplyGrade":false,
"nickname":"Z***t",
"productColor":"黑色",
"productSize":"75B/34",
"userClientShow":"來自京東Android客戶端",
"userLevelName":"銅牌會員",
"userClient":4,
"images":[]
}
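To make the structure concrete, here is a small sketch that decodes a trimmed copy of the record above and keeps only the three fields used later in the analysis. The function name extract_fields is an assumption for illustration, and the use of .get with an empty default is a defensive choice, not something the original code does.

```python
import json

# Trimmed copy of the record above, wrapped in the "comments" list
# that the response carries.
raw = json.dumps({
    "comments": [
        {
            "productColor": "黑色",
            "productSize": "75B/34",
            "userClientShow": "來自京東Android客戶端",
            "score": 5,
        }
    ]
})


def extract_fields(comment):
    """Keep only color, size and client from one comment record (hypothetical helper)."""
    return {
        "color": comment.get("productColor", ""),
        "size": comment.get("productSize", ""),
        "userClientShow": comment.get("userClientShow", ""),
    }


payload = json.loads(raw)
items = [extract_fields(c) for c in payload["comments"]]
print(items)
```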

Then analyze the product color, client, size and other fields in the users' purchase data.
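One simple way to do this kind of analysis is to count how often each value occurs. The sketch below uses collections.Counter from the standard library; the helper name rank and the sample rows are made up for illustration and are not real review data.

```python
from collections import Counter


def rank(items, key):
    """Return (value, count) pairs for one field, most frequent first (hypothetical helper)."""
    return Counter(item[key] for item in items if item.get(key)).most_common()


# Made-up rows in the shape produced by the scraper, for illustration only.
sample = [
    {"color": "黑色", "size": "75B/34"},
    {"color": "黑色", "size": "80B/36"},
    {"color": "膚色", "size": "75B/34"},
    {"color": "", "size": "75B/34"},  # empty values are skipped
]

print(rank(sample, "color"))
print(rank(sample, "size"))
```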

After that, we use Python charting tools to visualize the data.
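The post does not say which charting library was used. As a dependency-free stand-in (not the author's method), the sketch below renders counts as a plain-text bar chart; the function name text_bar_chart and the sample numbers are hypothetical.

```python
def text_bar_chart(counts, width=30):
    """Render (label, count) pairs as horizontal bars, longest bar = `width` characters."""
    if not counts:
        return ""
    top = max(count for _, count in counts)
    lines = []
    for label, count in counts:
        # Scale each bar relative to the largest count.
        bar = "#" * round(count / top * width) if top else ""
        lines.append(f"{label:<8} {bar} {count}")
    return "\n".join(lines)


# Made-up counts for illustration only.
print(text_bar_chart([("black", 4), ("nude", 2), ("pink", 1)], width=20))
```

With a real plotting library the same (label, count) pairs would feed a bar plot directly.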

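Ranking of purchased colors

The chart shows that skin tone and black sell the most, presumably because they show through less under summer clothes. Oddly, nobody bought pink. Isn't cute a good look?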

Size ranking

Sadly, the movies lie: sizes cluster mainly around 75B and 80B (I have no idea how big that actually is).
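The raw productSize values look like 75B/34, while the ranking talks about 75B and 80B. The post does not show how the values were cleaned; one plausible approach, sketched here as an assumption, is to keep only the part before the slash. The name normalize_size is hypothetical.

```python
def normalize_size(product_size):
    """Reduce a productSize such as '75B/34' to its band-and-cup part, e.g. '75B'."""
    return product_size.split("/")[0].strip().upper()


print(normalize_size("75B/34"))
print(normalize_size("80b"))
```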

Phone vs. size ranking

This chart cross-tabulates phone type against size. It looks as if iPhone users tend toward smaller sizes than Android users.
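A cross-tabulation like this can be sketched by grouping sizes per phone family and comparing shares rather than raw counts, so that a larger group does not dominate the comparison. Everything below is illustrative: the helper names are hypothetical, the sample rows are made up, and the iPhone client string is an assumption modeled on the Android one seen in the data.

```python
from collections import Counter, defaultdict


def client_family(client_show):
    """Map a userClientShow string to a coarse phone family."""
    text = client_show.lower()
    if "iphone" in text:
        return "iPhone"
    if "android" in text:
        return "Android"
    return "other"


def size_share_by_client(items):
    """Return {family: {size: share}}; shares within one family sum to 1."""
    counts = defaultdict(Counter)
    for item in items:
        counts[client_family(item["userClientShow"])][item["size"]] += 1
    shares = {}
    for family, sizes in counts.items():
        total = sum(sizes.values())
        shares[family] = {size: n / total for size, n in sizes.items()}
    return shares


# Made-up rows for illustration only, not real review data.
sample = [
    {"userClientShow": "來自京東Android客戶端", "size": "75B"},
    {"userClientShow": "來自京東Android客戶端", "size": "80B"},
    {"userClientShow": "來自京東iPhone客戶端", "size": "75B"},  # assumed string
]
shares = size_share_by_client(sample)
print(shares)
```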

Core code

import json

import requests


class BraSpider(object):

    # The first %d is the review score filter, the second is the page number.
    base_url = "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"

    def parse_comment(self, response, ret):
        # Decode the JSON body and keep only the fields used in the analysis.
        content = json.loads(response.text)
        comments = content['comments']
        for comment in comments:
            item = {}
            #item['content'] = comment['content']
            #item['guid'] = comment['guid']
            #item['id'] = comment['id']
            #item['time'] = comment['referenceTime']
            item['color'] = comment['productColor']
            item['size'] = comment['productSize']
            item['userClientShow'] = comment['userClientShow']
            ret.append(item)

    def start_requests(self):
        comments_ret = []
        hot_tag_ret = {}  # never filled in this excerpt; kept so the result shape is unchanged
        ret = {}
        # Walk pages 1..149 and, for each page, score values 0..5.
        for page in range(1, 150):
            for i in range(0, 6):
                url = self.base_url % (i, page)
                response = requests.get(url)
                # Pages that do not return HTTP 200 are skipped.
                if response.status_code == 200:
                    self.parse_comment(response, comments_ret)

        ret['comments'] = comments_ret
        ret['tag'] = hot_tag_ret
        return ret
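The loop above assumes every 200 response carries valid JSON. As a hedged sketch of a more tolerant variant (not the author's code), the function below takes the HTTP call as a parameter so it can be exercised offline, skips bodies that fail to decode, and can pause between requests. The names crawl, fetch and fake_fetch are hypothetical.

```python
import json
import time


def crawl(fetch, pages, scores, delay=0.0):
    """Collect comment records by calling fetch(score, page) -> body text or None."""
    results = []
    for page in pages:
        for score in scores:
            text = fetch(score, page)
            if not text:
                continue  # request failed: skip this page
            try:
                payload = json.loads(text)
            except ValueError:
                continue  # body was not JSON (e.g. an error page)
            if isinstance(payload, dict):
                results.extend(payload.get("comments") or [])
            if delay:
                time.sleep(delay)  # be gentle with the server
    return results


def fake_fetch(score, page):
    """Stand-in for a real HTTP call so the sketch runs without network access."""
    if page == 2:
        return "<html>blocked</html>"
    return json.dumps({"comments": [{"productSize": "75B/34", "score": score}]})


collected = crawl(fake_fetch, pages=range(1, 4), scores=range(0, 2))
print(len(collected))
```

In real use, fetch would wrap requests.get and return response.text only when the status code is 200.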

GitHub link

https://github.com/libinbin-1014/python-study/blob/master/bar/bar-scrapy.py
