1.3 Crawler Requests and Responses

Original

2020-06-22 06:13

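# requests: requests and responses. Normally a browser sends the request; here a Python script sends it, imitating a browser's request environment to fetch data.
# Data types: HTML, XML, text, images, music, video
# Automation scripts
# Crawler workflow: pick a target, analyze it, then implement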

import requests

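# 1. Send a request over the HTTP protocol
# GET: the kind of request made from the address bar; parameters travel in the URL, so they are visible and limited in practice by URL length
# POST: data travels in the request body, so it does not appear in the URL and can be much larger
# The two methods pass parameters differently
resp_get_1 = requests.get(url="http://httpbin.org/get", params={"name": "xiaoming"})
resp_get_2 = requests.post(url="http://httpbin.org/post", data={"name": "xiaoming"})  # a POST response, despite the variable name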

# 2. Pass parameters as a dict
# GET:  params={}  (appended to the URL as a query string)
# POST: data={}    (sent form-encoded in the request body)

# 3. Inspect the final request URL
url_get_1 = resp_get_1.url
print(url_get_1)
url_get_2 = resp_get_2.url
print(url_get_2)
# Example: https://maoyan.com/?name=xiaoming  "?" introduces the parameters, each written name=value, with "&" between pairs
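
The query-string rule described in the comment above can be tried without sending any request. The following is a minimal sketch using only the standard library's `urllib.parse`; it produces the same kind of URL that `requests` reports in `resp.url` when `params` is given. The `page` parameter is a hypothetical extra added only to show the `&` separator.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "http://httpbin.org/get"
# "page" is a hypothetical second parameter, added only to show the "&" separator
params = {"name": "xiaoming", "page": 2}

# urlencode turns the dict into name=value pairs joined by "&"
query = urlencode(params)
full_url = f"{base_url}?{query}"
print(full_url)

# Going the other way: split the URL and parse the query string back into a dict
parsed = parse_qs(urlsplit(full_url).query)
print(parsed)
```

Note that `parse_qs` returns every value as a list of strings, because a parameter name may appear more than once in a query string.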

# 4. Check the status code. Success: 200; not found: 404; forbidden: 403; server errors: 500, 505
print(resp_get_1.status_code,resp_get_2.status_code)
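
As a rough illustration of how the status codes listed above fall into groups, here is a small sketch built on the standard library's `http.HTTPStatus`. The helper name `describe_status` is hypothetical and not part of `requests`; it accepts any integer, such as the value of `resp.status_code`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about
        phrase = "Unknown"

    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 403, 404, 500, 505):
    print(describe_status(code))
```

In a crawler, a call such as `describe_status(resp.status_code)` could be logged before deciding whether to parse the body or retry.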

# 5. Set the encoding to avoid garbled Chinese text
resp = requests.get(url="http://www.baidu.com/")
coding = resp.encoding = "utf-8"  # sets resp.encoding and keeps the same value in coding
print("Encoding:", coding)

# 6. Read the response data
text = resp.text  # body decoded to str using resp.encoding
print(text)
content = resp.content  # raw body as bytes
print(content)
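
Steps 5 and 6 are connected: `resp.text` is simply `resp.content` decoded with `resp.encoding`. The sketch below reproduces the garbled-text problem offline with plain `bytes`, assuming the server sends UTF-8 while the client decodes as ISO-8859-1 (the fallback `requests` uses for text responses whose headers declare no charset). The sample string is made up and stands in for a real page body.

```python
# Made-up sample standing in for resp.content: UTF-8 bytes of a Chinese string
raw = "百度一下".encode("utf-8")

# Decoding with the wrong codec yields garbled characters (mojibake)
wrong = raw.decode("iso-8859-1")

# Decoding with the right codec recovers the text, which is what
# setting resp.encoding = "utf-8" achieves before reading resp.text
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

Because ISO-8859-1 maps every byte to a character, the wrong decode never raises an error; it silently returns unreadable text, which is why the encoding has to be set explicitly.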

# 7. Request headers
url = "http://www.qianlima.com/"
# headers is a dict: the colon sits outside the quotes, between key and value
head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
resp = requests.get(url=url,headers=head)
print(resp.status_code)

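# 8. IP proxy
# 9. Set a timeout (in seconds)
# Proxy URLs are normally written scheme://host:port; the original gives only the IP address
proxies = {"http": "110.83.46.180", "https": "110.83.46.180"}
resp = requests.get(url=url, headers=head, proxies=proxies, timeout=10)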
print(resp.status_code)
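
The `proxies` argument is a dict that maps a URL scheme to a proxy address, usually written as `scheme://host:port`. Here is a minimal sketch of building such a dict; the helper `build_proxies` is hypothetical, and the port `8080` is an assumption made for illustration only, since the original lists no port.

```python
def build_proxies(host: str, port: int, scheme: str = "http") -> dict:
    """Build a proxies dict that routes both http and https traffic through one proxy."""
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# The IP comes from the script above; port 8080 is an assumed placeholder
proxies_example = build_proxies("110.83.46.180", 8080)
print(proxies_example)
```

The result can be passed as `proxies=proxies_example` together with `timeout=10`, as in step 9. Free proxy addresses tend to go stale quickly, so a request through one may well fail or time out.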
