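Web applications commonly use JSON (JavaScript Object Notation) to transfer data, for example:
- Inspecting the HTTP requests you send by using the httpbin API (http://httpbin.org/).
- Having a crawler render pages through the Splash rendering engine.
Requirement: read JSON data in Python.
Solution:
Use the json module from the standard library; its loads() and dumps() functions handle reading and writing JSON data.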
- With the requests module:
>>> import requests
>>> r = requests.get('http://httpbin.org/headers')
>>> r
<Response [200]>
>>> r.content
b'{\n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.22.0"\n }\n}\n'
>>> r.text
'{\n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.22.0"\n }\n}\n'
- With the json module:
Parsing JSON data (deserialization): json.loads()
>>> import json
>>> d = json.loads(r.text) # the JSON object is parsed into a Python dict
>>> d
{'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}}
>>> d['headers']
{'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}
>>> d['headers']['Host']
'httpbin.org'
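The parsing step can also be tried without any network access. The following is a minimal sketch, assuming a shortened, hard-coded copy of the httpbin response body; the variable names are illustrative only. It shows that json.loads() accepts both str (like r.text) and bytes (like r.content, on Python 3.6+), and that malformed input raises json.JSONDecodeError.

```python
import json

# Hard-coded, shortened copy of the response body shown above (an assumption
# made so the example runs offline).
raw_text = '{\n "headers": {\n "Accept": "*/*", \n "Host": "httpbin.org"\n }\n}\n'
raw_bytes = raw_text.encode("utf-8")  # the shape r.content would have

from_text = json.loads(raw_text)    # str input
from_bytes = json.loads(raw_bytes)  # bytes input, accepted since Python 3.6

host = from_bytes["headers"]["Host"]
print(host)
print(from_text == from_bytes)

# Python-style single quotes are not valid JSON, so parsing fails.
try:
    json.loads("{'single': 'quotes'}")
except json.JSONDecodeError as exc:  # a subclass of ValueError
    error_message = str(exc)
    print("not valid JSON:", error_message)
```

With requests, r.json() wraps the same parsing step, so json.loads(r.text) is mainly useful when the JSON text comes from somewhere other than a Response object.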
- Example scenario:
Spider (JSON) → Splash → Web
Create the Splash container
$ sudo docker pull scrapinghub/splash
$ sudo docker run -itd -p 8050:8050 scrapinghub/splash
Serializing data to JSON: json.dumps()
>>> import requests
>>> import json
>>> requests.post
<function post at 0x7fad7195c378>
>>> url = 'http://localhost:8050/render.html'
>>> headers = {'content-type': 'application/json'}
>>> data = {'url': 'http://jd.com', 'timeout': 20, 'images': 0} # JD.com as the example; timeout sets the render time limit, images=0 means images are not rendered
>>> json_data = json.dumps(data) # convert the Python dict to JSON text, i.e. serialize it
>>> json_data
'{"url": "http://jd.com", "timeout": 20, "images": 0}'
>>> r2 = requests.post(url, headers=headers, data=json_data)
>>> r2
<Response [200]>
>>> r2.text
'<!DOCTYPE html><html class="o2_mini csstransitions cssanimations o2_webkit o2_safari o2_602"><head>\n <meta charset="utf8" version="1">\n <title>京東(JD.COM)-正品低價、品質保障、配送及時、輕鬆購物!</title>\n <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=yes">\n <meta name="description" content="京東JD.COM-專業的綜合網上購物商城,銷售家電、數碼通訊、電腦、家居百貨、服裝服飾、母嬰、圖書、食品等數萬個品牌優質商品.便捷、誠信的服務,爲您提供愉悅的網上購物體驗!">\n #中間省略 M11.4,0.9C11.4,0.9,11.4,0.9,11.4,0.9L11.4,0.9z"></path></symbol></defs></svg><ul class="elevator_list"><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_01" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">京東秒殺</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_02" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">特色優選</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_03" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">頻道廣場</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_04" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">爲你推薦</span></a></li><li class="elevator_item"><a class="elevator_lk elevator_lk2" href="//jdcs.jd.com/chat/index.action?venderId=1&entry=jd_web_jimi_jdhome" target="_blank" clstag="h|keycount|core|elvt_05"><span class="elevator_lk_bg"></span><svg><use xlink:href="#icon_timline"></use></svg><span class="elevator_lk_txt">客服</span></a></li><li class="elevator_item"><a class="elevator_lk elevator_lk2" href="//surveys.jd.com/index.php?r=survey/index/sid/889711/newtest/Y/lang/zh-Hans" target="_blank" clstag="h|keycount|core|elvt_06"><span 
class="elevator_lk_bg"></span><svg><use xlink:href="#icon_feedback"></use></svg><span class="elevator_lk_txt">反饋</span></a></li></ul><a class="elevator_totop" href="javascript: void(0);" clstag="h|keycount|core|elvt_07" tabindex="-1" aria-hidden="true"><span class="elevator_totop_icon">\ue606</span><span class="elevator_totop_txt">頂部</span></a></div></div></div>\n<script type="text/javascript">\n window.point.dom = new Date().getTime();\n</script>\n\n\n\n\n<script type="text/javascript" src="//misc.360buyimg.com/mtd/pc/index_2019/1.0.0/static/js/runtime.js"></script>\n\n<script type="text/javascript" src="//misc.360buyimg.com/mtd/pc/index_2019/1.0.0/static/js/index.chunk.js"></script>\n\n<script type="text/javascript">\n window.point.js = new Date().getTime();\n</script>\n</body></html>'
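json.dumps() also takes a few formatting options that matter when the data contains non-ASCII text, as the rendered page above does. The sketch below reuses the request dict and adds a hypothetical "title" key purely for illustration; it is not part of the Splash API.

```python
import json

# "title" is a made-up key added only to show non-ASCII handling.
data = {"url": "http://jd.com", "timeout": 20, "images": 0, "title": "京東"}

# Default: non-ASCII characters are escaped as \uXXXX sequences.
default_json = json.dumps(data)

# ensure_ascii=False keeps the characters as they are.
readable_json = json.dumps(data, ensure_ascii=False)

# separators=(",", ":") drops the spaces after commas and colons.
compact_json = json.dumps(data, separators=(",", ":"))

# indent and sort_keys produce a stable, human-readable layout.
pretty_json = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)

print(default_json)
print(readable_json)
print(compact_json)
print(pretty_json)
```

All four strings decode back to the same dict with json.loads(); the options change only how the text looks.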
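Besides json.loads() and json.dumps(), the json module also provides json.load() and json.dump().
dumps() and dump() both serialize. dumps() only produces a str; dump() additionally requires a file object (not a raw file descriptor) and writes the serialized text to that file.
loads() and load() both deserialize. loads() only parses text that is already in memory; load() takes a file object, reads it, and parses its contents.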
>>> data
{'url': 'http://jd.com', 'timeout': 20, 'images': 0}
>>> f = open('demo.json', 'w')
>>> json.dump(data, f) # serialize the dict to JSON and write it to the file
>>> f.close()
$ cat demo.json
{"url": "http://jd.com", "timeout": 20, "images": 0}
>>> f2 = open('demo.json')
>>> json.load(f2) # read the JSON file and parse it back into a dict
{'url': 'http://jd.com', 'timeout': 20, 'images': 0}
>>> f2.close()
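The same round trip can be written with with blocks so the files are closed automatically. This is a sketch under a few assumptions: it writes into a temporary directory instead of the current one, and it passes encoding="utf-8" explicitly, which the session above does not do.

```python
import json
import os
import tempfile

data = {"url": "http://jd.com", "timeout": 20, "images": 0}

# A temporary directory keeps the example from leaving demo.json behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.json")

    # dump(): serialize and write in one call; the file closes on block exit.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

    # Read the raw text back to see exactly what dump() wrote.
    with open(path, encoding="utf-8") as f:
        file_text = f.read()

    # load(): read and parse in one call.
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(file_text)
print(file_text == json.dumps(data))  # dump() writes the same text dumps() returns
print(restored == data)
```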