Python Crawler: Handling Ajax Asynchronous Loading ----- Toutiao Street Snaps (街拍)

Original

2020-06-26 11:41

Straight to the code:

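    import os
    import re
    from hashlib import md5
    from multiprocessing.pool import Pool
    from urllib.parse import urlencode

    import requests

    # Fetch the Ajax content: the search results are loaded by an XHR call,
    # so request that JSON API directly with the same query parameters
    def get_page(offset):
        params = {
            'aid': '24',
            'offset': offset,
            'format': 'json',
            'keyword': '街拍',
            'autoload': 'true',
            'count': '20',
            'cur_tab': '1',
            'from': 'search_tab',
            'pd': 'synthesis',
            'en_qc': '1',
            'timestamp': '1555409953329',
            'app_name': 'web_search',
        }
        url = 'http://www.toutiao.com/api/search/content/?' + urlencode(params)
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                return response.json()
        except (requests.RequestException, ValueError):
            # Network error, or the body was not valid JSON
            pass
        return None

    # Parse the page: yield one record per image
    def get_image(json_data):
        data = json_data.get('data')
        if data:
            for item in data:
                title = item.get('title')
                image_list = item.get('image_list')
                # Only items that actually carry images are processed
                if image_list:
                    for image in image_list:
                        yield {
                            'image': image.get('url'),
                            'title': title,
                        }
        else:
            print('No data in this response')

    def save(item):
        # The title becomes the folder name; 'untitled' is a fallback for missing titles,
        # and characters that are illegal in paths are replaced with '_'
        title = item.get('title') or 'untitled'
        folder = re.sub(r'[\\/:*?"<>|]', '_', title)
        os.makedirs(folder, exist_ok=True)
        # Get the image link
        image_url = item.get('image')
        if not image_url:
            return
        # Build the full image URL if it is protocol-relative ("//...")
        if image_url.startswith('//'):
            image_url = 'http:' + image_url
        try:
            response = requests.get(image_url, timeout=10)
        except requests.RequestException:
            return
        if response.status_code == 200:
            # Name the file after the MD5 of its content so duplicate images are skipped
            file_path = '{0}/{1}.{2}'.format(folder, md5(response.content).hexdigest(), 'jpg')
            if not os.path.exists(file_path):
                with open(file_path, 'wb') as f:
                    f.write(response.content)

    def main(offset):
        json_data = get_page(offset)
        # Only parse when the request succeeded (get_page returns None otherwise)
        if json_data:
            for item in get_image(json_data):
                print(item)
                save(item)

    GROUP_START = 0
    GROUP_END = 50

    if __name__ == '__main__':
        # Process pool: each worker handles one page offset (20 results per page)
        pool = Pool()
        groups = [x * 20 for x in range(GROUP_START, GROUP_END)]
        pool.map(main, groups)
        pool.close()
        pool.join()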
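The request side can be checked without touching the network by building the URL that `get_page` would fetch for each page. This is a minimal sketch, assuming the endpoint and parameters above behave as they did when the post was written; `build_search_url` and `page_offsets` are hypothetical helper names that only factor out what `get_page` and the `groups` list already do.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint and query parameters copied from get_page above
BASE_URL = 'http://www.toutiao.com/api/search/content/?'
PAGE_SIZE = 20  # matches 'count': '20'


def build_search_url(offset, keyword='街拍'):
    """Return the Ajax search URL for one page of results (no request is sent)."""
    params = {
        'aid': '24',
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': str(PAGE_SIZE),
        'cur_tab': '1',
        'from': 'search_tab',
        'pd': 'synthesis',
        'en_qc': '1',
        'timestamp': '1555409953329',
        'app_name': 'web_search',
    }
    return BASE_URL + urlencode(params)


def page_offsets(start, end):
    """Offsets handed to Pool.map: one per page, stepping by the page size."""
    return [x * PAGE_SIZE for x in range(start, end)]


if __name__ == '__main__':
    for offset in page_offsets(0, 3):
        url = build_search_url(offset)
        # urlencode percent-encodes the Chinese keyword as UTF-8;
        # parse_qs decodes it back to show the round trip
        print(offset, parse_qs(urlparse(url).query)['keyword'])
```

Each worker in the pool gets one offset, so pages 0, 20, 40, ... are requested in parallel rather than one after another.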

Result: the script prints each image record as it goes and saves the images into one folder per article title.
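The parsing and file-naming logic can also be exercised offline. The sketch below assumes a decoded page has the shape `get_image` reads: a `data` list of items, each with a `title` and an `image_list` whose entries carry a `url`. `iter_images`, `image_path`, and `sample_page` are hypothetical names, and the sample data is made up, not real API output. Unlike `get_image`, it silently skips pages without `data` and images without a URL.

```python
import hashlib
import re


def iter_images(page):
    """Yield {'image', 'title'} records from one decoded search-API page."""
    for item in page.get('data') or []:
        title = item.get('title')
        for image in item.get('image_list') or []:
            url = image.get('url')
            if url:
                yield {'image': url, 'title': title}


def image_path(title, content):
    """Folder from the sanitized title, file name from the MD5 of the image bytes."""
    folder = re.sub(r'[\\/:*?"<>|]', '_', title or 'untitled')
    return '{0}/{1}.jpg'.format(folder, hashlib.md5(content).hexdigest())


# Hypothetical page in the shape get_image expects (not real API output)
sample_page = {
    'data': [
        {'title': 'street/snap',
         'image_list': [{'url': 'http://example.com/a.jpg'},
                        {'url': 'http://example.com/b.jpg'}]},
        {'title': 'no images here'},
    ]
}

records = list(iter_images(sample_page))
print(len(records))  # 2
print(image_path('street/snap', b'same') == image_path('street/snap', b'same'))  # True
```

Because the file name is the MD5 of the image bytes, the same picture always maps to the same path, which is why `save` can skip files that already exist.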
