26. Reading and writing JSON data

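Web applications commonly exchange data in JSON (JavaScript Object Notation) format, for example:

  1. Using the http://httpbin.org/ API to inspect the HTTP requests you send.

  2. A crawler using the Splash rendering engine to render pages.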

Requirement: read and write JSON data in Python.

Solution:

Use the json module from the standard library; its loads() and dumps() functions parse and produce JSON data.
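Before the network examples, a minimal round trip may help show what the two functions do. The record below is a hypothetical sample made up for illustration; only json.dumps() and json.loads() come from the text above.

```python
import json

# Hypothetical sample record, used only for illustration.
record = {"name": "demo", "tags": ["a", "b"], "active": True, "score": None}

# Serialize: Python object -> JSON text (a str).
text = json.dumps(record)

# Deserialize: JSON text -> Python object.
restored = json.loads(text)

print(text)
print(restored == record)
```

Note how Python's True and None become JSON's true and null in the text, and are converted back on the way in.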


  • With the requests module:
>>> import requests

>>> r = requests.get('http://httpbin.org/headers')

>>> r
<Response [200]>

>>> r.content
b'{\n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.22.0"\n  }\n}\n'

>>> r.text
'{\n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.22.0"\n  }\n}\n'
  • With the json module:

Parsing JSON data (deserialization): json.loads()

>>> import json

>>> d = json.loads(r.text)              # parsed into a Python dict

>>> d
{'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}}

>>> d['headers']
{'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}

>>> d['headers']['Host']
'httpbin.org'
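json.loads() also accepts the raw bytes shown in r.content, and it raises an exception on malformed input. The sketch below runs offline: the body is a shortened copy of the httpbin response above, and the invalid string is a made-up example.

```python
import json

# Shortened copy of the r.content bytes shown earlier, so no network is needed.
body = b'{\n  "headers": {\n    "Host": "httpbin.org"\n  }\n}\n'

# json.loads() accepts bytes as well as str (Python 3.6+).
parsed = json.loads(body)
host = parsed["headers"]["Host"]

# Malformed input raises json.JSONDecodeError, a subclass of ValueError.
try:
    json.loads("{not valid json}")
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(host)
print(type(error).__name__)
```

With requests, calling r.json() performs the same parsing on the response body in one step.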

  • Worked example:

Spider(Json) → Splash → Web

Create the Splash container:

$ sudo docker pull scrapinghub/splash

$ sudo docker run -itd -p 8050:8050 scrapinghub/splash

Serializing data to JSON: json.dumps()

>>> import requests

>>> import json

>>> requests.post
<function post at 0x7fad7195c378>

>>> url = 'http://localhost:8050/render.html'

>>> headers = {'content-type': 'application/json'}

>>> data = {'url': 'http://jd.com', 'timeout': 20, 'images': 0}             # jd.com as the example; timeout sets the rendering timeout, images=0 skips images

>>> json_data = json.dumps(data)                # convert the Python dict to JSON text, i.e. serialize it

>>> json_data
'{"url": "http://jd.com", "timeout": 20, "images": 0}'

>>> r2 = requests.post(url, headers=headers, data=json_data)

>>> r2
<Response [200]>

>>> r2.text

'<!DOCTYPE html><html class="o2_mini csstransitions cssanimations o2_webkit o2_safari o2_602"><head>\n    <meta charset="utf8" version="1">\n    <title>京東(JD.COM)-正品低價、品質保障、配送及時、輕鬆購物!</title>\n    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=yes">\n    <meta name="description" content="京東JD.COM-專業的綜合網上購物商城,銷售家電、數碼通訊、電腦、家居百貨、服裝服飾、母嬰、圖書、食品等數萬個品牌優質商品.便捷、誠信的服務,爲您提供愉悅的網上購物體驗!">\n   #中間省略  M11.4,0.9C11.4,0.9,11.4,0.9,11.4,0.9L11.4,0.9z"></path></symbol></defs></svg><ul class="elevator_list"><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_01" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">京東秒殺</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_02" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">特色優選</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_03" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">頻道廣場</span></a></li><li class="elevator_item"><a class="elevator_lk" href="javascript:void(0);" clstag="h|keycount|core|elvt_04" tabindex="-1" aria-hidden="true"><span class="elevator_lk_bg"></span><span class="elevator_lk_txt">爲你推薦</span></a></li><li class="elevator_item"><a class="elevator_lk elevator_lk2" href="//jdcs.jd.com/chat/index.action?venderId=1&amp;entry=jd_web_jimi_jdhome" target="_blank" clstag="h|keycount|core|elvt_05"><span class="elevator_lk_bg"></span><svg><use xlink:href="#icon_timline"></use></svg><span class="elevator_lk_txt">客服</span></a></li><li class="elevator_item"><a class="elevator_lk elevator_lk2" href="//surveys.jd.com/index.php?r=survey/index/sid/889711/newtest/Y/lang/zh-Hans" target="_blank" clstag="h|keycount|core|elvt_06"><span 
class="elevator_lk_bg"></span><svg><use xlink:href="#icon_feedback"></use></svg><span class="elevator_lk_txt">反饋</span></a></li></ul><a class="elevator_totop" href="javascript: void(0);" clstag="h|keycount|core|elvt_07" tabindex="-1" aria-hidden="true"><span class="elevator_totop_icon">\ue606</span><span class="elevator_totop_txt">頂部</span></a></div></div></div>\n<script type="text/javascript">\n    window.point.dom = new Date().getTime();\n</script>\n\n\n\n\n<script type="text/javascript" src="//misc.360buyimg.com/mtd/pc/index_2019/1.0.0/static/js/runtime.js"></script>\n\n<script type="text/javascript" src="//misc.360buyimg.com/mtd/pc/index_2019/1.0.0/static/js/index.chunk.js"></script>\n\n<script type="text/javascript">\n    window.point.js = new Date().getTime();\n</script>\n</body></html>'
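The same POST can also be assembled with only the standard library. This is a sketch under assumptions, not the method used above: it assumes the Splash container started earlier is listening on localhost:8050, and it only builds the request object without sending it.

```python
import json
import urllib.request

url = 'http://localhost:8050/render.html'
data = {'url': 'http://jd.com', 'timeout': 20, 'images': 0}

# urllib needs the request body as bytes, so encode the JSON text as UTF-8.
body = json.dumps(data).encode('utf-8')

req = urllib.request.Request(
    url,
    data=body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)

# Nothing is sent until urllib.request.urlopen(req) is called.
print(req.get_method())
print(req.data)
```

When requests is available, requests.post(url, json=data) is a shorter route: it serializes the dict and sets the Content-Type header itself.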

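Besides json.loads() and json.dumps(), the json module also provides json.load() and json.dump().

dumps() and dump() both serialize. dumps() only serializes to a str; dump() additionally requires a file object (anything with a write() method) and writes the serialized text to it.

loads() and load() both deserialize. loads() only parses a string it is given; load() takes a file object (anything with a read() method), reads its contents, and parses them.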
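The relationship between the two pairs can be checked without touching the disk. In this sketch io.StringIO stands in for a real file; that substitution is an assumption made for illustration only.

```python
import io
import json

data = {'url': 'http://jd.com', 'timeout': 20, 'images': 0}

# io.StringIO behaves like an open text file held in memory.
buffer = io.StringIO()
json.dump(data, buffer)

# dump() wrote the same text that dumps() returns.
same_text = buffer.getvalue() == json.dumps(data)

# Rewind, then load() reads from the file-like object and parses in one step.
buffer.seek(0)
loaded = json.load(buffer)

print(same_text)
print(loaded)
```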

>>> data
{'url': 'http://jd.com', 'timeout': 20, 'images': 0}

>>> f = open('demo.json', 'w')

>>> json.dump(data, f)              # serialize the dict to JSON and write it to f

>>> f.close()
$ cat demo.json
{"url": "http://jd.com", "timeout": 20, "images": 0}
>>> f2 = open('demo.json')

>>> json.load(f2)               # read the file and parse its JSON back into a dict
{'url': 'http://jd.com', 'timeout': 20, 'images': 0}

>>> f2.close()
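In a script, with blocks are a common way to make sure the files get closed. The sketch below repeats the demo.json round trip that way; writing into a temporary directory is an assumption made here so the example leaves no files behind.

```python
import json
import os
import tempfile

data = {'url': 'http://jd.com', 'timeout': 20, 'images': 0}

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.json')

    # The with block closes the file even if dump() raises.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)

    # Reopen for reading and parse the contents back into a dict.
    with open(path, encoding='utf-8') as f:
        restored = json.load(f)

print(restored)
```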
