Web Crawler Basics, Example 004: Scraping a JD.com Product Page

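import requests

url = "https://item.jd.com/100006349791.html"
try:
    # A timeout keeps the script from hanging if the server never answers
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    # Guess the text encoding from the response body
    r.encoding = r.apparent_encoding
    # [:1000] is a string slice: the first 1000 characters
    print(r.text[:1000])
except requests.RequestException as e:
    print("Request failed:", e)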

JD.com has anti-scraping measures, so the request above fails.
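One likely reason a plain request is easy to reject is the User-Agent that `requests` sends when none is supplied. The sketch below only inspects those default headers locally and makes no network call. That JD.com filters on this header specifically is an assumption, not something the site documents.

```python
import requests

# Headers that requests attaches when the caller supplies none.
default_headers = requests.utils.default_headers()

# The default User-Agent has the form "python-requests/<version>",
# which is easy for a server to tell apart from a real browser.
print(default_headers["User-Agent"])
```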

The code after the fix:

import requests
headers={
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36",
    "Cookie":"unpl=V2_ZzNsbUJSFxJ0AUVSZ0kOAWUfFwgXVF8dcwoVSH5MC1FhChJbQwNEEGlJKFRzEVQZJkB8XkJfQwklTShUehhaAWAzEVxBVl8UcBRGXWoZVQ5kBRlZRmdDJXUJR1V6GloGbgIibXJXQSV0OEZdexhYBmECGlpyUkZFdQhBBi8FXVdkUA5YS1FLCXwBQVVnHFwFZVBHVRBXSx13OEBS; __jda=122270672.1810527096.1583840132.1583840132.1587205676.1; __jdv=122270672|kong|t_1000027280_100756|zssc|14e60827-ac53-4dd2-973b-4dfe78170e64-p_1999-pr_2191-at_100756|1587205676408; __jdc=122270672; __jdu=1810527096; shshshfpa=5c31968b-41e5-cba8-dba2-b39569a132c9-1587205677; shshshfpb=um3kANbAFjMv5MDiflvoSBQ%3D%3D; 3AB9D23F7A4B3C9B=D35J7U2PGKXJ2GUPPEPRNBKPJWDZYS34NGT3TOIN5D7WXWYROAMHFI2GO6BGTEFJHQO6BPSSQX7BTCQ35PFLX64BUY; areaId=6; ipLoc-djd=6-379-388-0; shshshfp=0ca00c61b8d65ee5c54c217cb3fe41ca"
}
url = "https://item.jd.com/100006349791.html"
try:
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException as e:
    print("Request failed:", e)

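How the fix works:

Add request headers so the request looks like it comes from a browser. Many browser User-Agent strings are published online; any common one will do.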
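To check that the headers are really attached before hitting the site, the request can be built without being sent. This is a minimal sketch that reuses the URL and User-Agent from the code above; the `session` and `prepared` names are illustrative, and using a `Session` is a design choice here, not part of the original fix.

```python
import requests

url = "https://item.jd.com/100006349791.html"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36",
}

# A Session applies the same headers to every request made through it.
session = requests.Session()
session.headers.update(headers)

# Build the request without sending it, then inspect what would go out.
prepared = session.prepare_request(requests.Request("GET", url))
print(prepared.headers["User-Agent"])
```

Sending it afterwards is `session.send(prepared, timeout=10)`, or simply `session.get(url, timeout=10)`.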

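Cookie: copy the Cookie header that your own browser sends when it loads the page (it appears under the request headers in the browser's developer tools).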
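A raw Cookie header is one long string, which is awkward to edit. As a sketch, it can be split into a dictionary and passed to `requests` through the `cookies=` parameter instead of the `headers` dictionary. The helper name `parse_cookie_header` is hypothetical, and the sample string is a shortened piece of the cookie shown in the code above.

```python
def parse_cookie_header(raw):
    """Split a raw Cookie header ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape
        if not pair or "=" not in pair:
            continue
        # Split on the first "=" only, since values may contain "="
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


raw = "__jdc=122270672; __jdu=1810527096; areaId=6; ipLoc-djd=6-379-388-0"
cookies = parse_cookie_header(raw)
print(cookies)
```

The result can then be used as `requests.get(url, headers=headers, cookies=cookies, timeout=10)`. Cookies copied from a browser expire, so a failing request may just mean the value needs refreshing.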

 
