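1. Open JD.com and search for "ipad"

The first load fetches only 30 products. Scrolling down loads the next 30, so a full page holds 60 products.

Press F12 and inspect the request in the Network tab.

The scroll request uses a URL like this:

Request URL: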

https://search.jd.com/s_new.php?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&bs=1&wq=ipa&ev=exbrand_Apple%5E&page=4&s=90&scrolling=y&log_id=1581908729.84407&tpl=1_M&show_items=11714725279,100008348540,57520498136,100002716261,57520498128,56721987303,11795481337,43625057973,11244282613,100008348558,20764151799,43673117804,100000306383,14157116051,100008348556,62655564481,12966165063,57520498124,100002716273,100004245960,11244282617,11714725282,62655564477,56638765596,43673108398,11244282627,100000205038,11302227154,12966165074,100000206156

Compare it with the ordinary search URL:

https://search.jd.com/Search?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&bs=1&wq=ipa&ev=exbrand_Apple%5E&page=3&s=60&click=0

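Comparing the parameters:

The scroll request carries several extra parameters, "&scrolling", "&log_id" and "&show_items". We try them one at a time to see which ones are actually required.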
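The comparison can also be done programmatically. The following is a minimal sketch using only the standard library; the two URLs are the ones quoted above, except that show_items is shortened to two IDs for readability, and the helper name param_names is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# URLs copied from the text above; show_items is shortened to two IDs (assumption for brevity)
scroll_url = ("https://search.jd.com/s_new.php?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1"
              "&vt=2&bs=1&wq=ipa&ev=exbrand_Apple%5E&page=4&s=90&scrolling=y"
              "&log_id=1581908729.84407&tpl=1_M&show_items=11714725279,100008348540")
plain_url = ("https://search.jd.com/Search?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1"
             "&vt=2&bs=1&wq=ipa&ev=exbrand_Apple%5E&page=3&s=60&click=0")


def param_names(url):
    """Return the set of query parameter names in a URL (hypothetical helper)."""
    return set(parse_qs(urlsplit(url).query))


only_in_scroll = sorted(param_names(scroll_url) - param_names(plain_url))
only_in_plain = sorted(param_names(plain_url) - param_names(scroll_url))

print("only in scroll request:", only_in_scroll)
print("only in ordinary URL:", only_in_plain)
```

Besides the three parameters named above, this diff also reports tpl on the scroll request and click on the ordinary URL.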

In the end, these URLs turn out to be accessible:

https://search.jd.com/Search?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&bs=1&ev=exbrand_Apple%5E&page=5&s=121&scrolling=y

https://search.jd.com/Search?keyword=ipad&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&bs=1&ev=exbrand_Apple%5E&page=6&s=151&scrolling=y

From these we can derive the general pattern:

"https://search.jd.com/Search?keyword=" + keyword + "&enc=utf-8&qrst=1&rt=1&stop=1&vt=1&stock=1&page=" + str(p) + "&s=" + str(1 + (p - 1) * 30) + "&click=0&scrolling=y"

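As a rough sketch, the pattern can be wrapped in a small function. The name build_search_url is hypothetical, and the call to quote (to URL-encode non-ASCII keywords) is an addition that the original expression does not have.

```python
from urllib.parse import quote


def build_search_url(keyword, p):
    """Build the search URL for request page p (1-based), following the pattern above."""
    start = 1 + (p - 1) * 30  # index of the first product on this request page
    return ("https://search.jd.com/Search?keyword=" + quote(keyword)  # quote() is an addition
            + "&enc=utf-8&qrst=1&rt=1&stop=1&vt=1&stock=1&page=" + str(p)
            + "&s=" + str(start) + "&click=0&scrolling=y")


# URLs for the first four request pages of 30 products each
urls = [build_search_url("ipad", p) for p in range(1, 5)]
for url in urls:
    print(url)
```

For p = 5 and p = 6 this gives s=121 and s=151, matching the two working URLs found earlier.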
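With a full page of products in hand, the goal is the price and other details of each product, so we first need each product's link.

One such URL is https://item.jd.com/57521237589.html, where "57521237589" is the product ID.

Open F12 again and copy the element's XPath so that we can extract these links.

The copied XPath, //*[@id="J_goodsList"]/ul/li[1]/div/div[1]/a, needs one change: drop the "[1]" from "li[1]", otherwise it matches only a single product's link.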

from lxml import etree
html = etree.HTML(response.text)
# collect the href of every product anchor in the result list
links = html.xpath('//*[@id="J_goodsList"]/ul/li/div/div[1]/a/@href')
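Once the hrefs are extracted, the product ID can be pulled out of each link. The sketch below assumes that some hrefs may be protocol-relative (starting with "//"), which the original does not state; normalize_link and product_id are hypothetical helper names.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://search.jd.com/"  # the page the links were found on


def normalize_link(href):
    """Turn a possibly protocol-relative href into an absolute URL (assumption: such hrefs occur)."""
    return urljoin(BASE_URL, href)


def product_id(url):
    """Return the numeric product ID from an item.jd.com URL, or None if it does not match."""
    match = re.search(r"item\.jd\.com/(\d+)\.html", url)
    return match.group(1) if match else None


link = normalize_link("//item.jd.com/57521237589.html")
print(link, product_id(link))
```

Returning None for non-matching links makes it easy to filter out anchors that do not point at a product page.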

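2. Get the product details (price, review count, ...)

Inspecting with F12 shows that JD loads prices via JavaScript, so they cannot be read directly from a <span> in the HTML.

The price comes from a URL such as https://p.3.cn/prices/mgets?skuIds=3575321949, where the skuIds value is the product ID.

The response is JSON, which we parse with the following code: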

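import json
# drop the last character of the body before parsing
jsons = json.loads(response.text[0:-1])
price = jsons[0]['p']  # 'p' holds the price of the first (only) SKU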
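A minimal sketch of the price step, separated from the network call so it can be tried offline. The sample body is hypothetical: only the 'p' field is taken from the code above, and the value is made up. Note that this sketch uses strip() to remove trailing whitespace rather than slicing off the last character.

```python
import json

PRICE_API = "https://p.3.cn/prices/mgets?skuIds="  # endpoint quoted in the text above


def price_url(sku_id):
    """Build the price URL for one product ID (hypothetical helper)."""
    return PRICE_API + str(sku_id)


def parse_price(body):
    """Parse the JSON body and return the 'p' field of the first entry."""
    data = json.loads(body.strip())  # strip() instead of body[0:-1]: a swapped-in choice
    return data[0]["p"]


# Hypothetical response body; a real response may contain more fields
sample_body = '[{"p": "2999.00"}]\n'
price = parse_price(sample_body)
print(price_url("3575321949"), price)
```

In a real run, body would be response.text from a request to price_url(sku_id).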

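3. Get the remaining product information and write it to a CSV file

The remaining fields can be extracted directly from the page's HTML source.

Writing to the CSV file: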

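import csv
# utf-8 so that Chinese product names are written correctly
with open("test.csv", "a", newline="", encoding="utf-8") as csvfile:
    rows = ("Product name", "Product price", "Product link")
    writer = csv.writer(csvfile)
    writer.writerow(rows)  # header row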
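Because the file is opened in append mode, the header would be written again on every run. One possible way around that, sketched here under the assumption that each product is a (name, price, link) tuple, is to write the header only when the file is new or empty. The function name append_products is hypothetical.

```python
import csv
import os

HEADER = ("Product name", "Product price", "Product link")


def append_products(path, products):
    """Append product rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        if need_header:
            writer.writerow(HEADER)
        writer.writerows(products)  # each product is a (name, price, link) tuple
```

A call such as append_products("test.csv", [(name, price, link)]) can then be made once per scraped page without duplicating the header.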

For the complete code, see:

GitHub: https://github.com/Tomy-Enrique/Spider/tree/master/JD_spider

Gitee: https://gitee.com/TomyEnrique/Spider/tree/master/JD_spider

一個路過的Developer


Python crawler: scraping JD.com products (price, review count), 2020-02-16
