HttpClient Usage in Detail
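Many crawlers today are written in Python, but a crawler can be written in any language. Before Python became popular, many of the companies I knew of wrote theirs in Java, and other languages were surely in use too. Back to the point: I have been learning .NET Core recently, so I tried writing a crawler in C#. .NET Core is cross-platform, so a single command runs the program on Linux, much like a Python script. A crawler cannot avoid a library for sending HTTP requests. In Python the common choice is requests, with aiohttp for asynchronous work. The C# counterpart is the HttpClient class (in System.Net.Http), which also supports asynchronous, highly concurrent requests and supports them very well.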
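1. Setting up a test service
Before sending HTTP requests we need a service or site to send them to. Any website would do, but an arbitrary site is not much help for learning. A ready-made service that echoes each request's parameters and headers back in a tidy format is ideal for testing, and that service is the well-known httpbin.org. The official instance at http://httpbin.org is rather slow, so you can host your own, which is very simple (see my notes on setting up an httpbin service with Docker), or start with the instance I already host at http://zhousonglin.cn:8080/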
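2. Sending a GET request
GET is the most common request: most of the time we fetch data with GET, while POST usually comes up only for things like login. The code below is my asynchronous wrapper for GET requests. Microsoft also recommends asynchronous code wherever possible, for reasons I won't go into here.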
// Requires: using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a GET request.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlGet(string requestUrl)
{
    string responseBody = string.Empty;
    using (HttpClient httpClient = new HttpClient())
    {
        // GetAsync sets the HTTP method itself; ConnectionClose sends "Connection: close".
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        // The header is named "User-Agent"; the typed property sets it correctly.
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
        // Shortcut when only the body is needed: await httpClient.GetStringAsync(requestUrl);
        response.EnsureSuccessStatusCode();
        responseBody = await response.Content.ReadAsStringAsync();
    }
    return responseBody;
}
Usage:
// Call from an async method, for example static async Task Main().
string urlRequestGet = "http://zhousonglin.cn:8080/get";
string responseStr = await HtmlGet(urlRequestGet);
Console.WriteLine(responseStr);
Result: the service echoes the request back as JSON, including the request headers and the URL.
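The GET wrapper above creates a new HttpClient on every call. Microsoft's guidance for HttpClient recommends reusing one instance instead, because creating and disposing clients at a high rate can exhaust sockets under load, which matters for a crawler. The following is a minimal sketch of that idea, not the author's method: the class name `SharedHttp` and the helpers `BuildGet` and `GetAsync` are hypothetical names chosen for illustration. Headers are set per request on an HttpRequestMessage rather than on the shared client.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SharedHttp
{
    // One client for the whole process; HttpClient supports concurrent requests.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: builds a GET request that carries a User-Agent header.
    public static HttpRequestMessage BuildGet(string requestUrl, string userAgent)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, requestUrl);
        request.Headers.UserAgent.ParseAdd(userAgent);
        return request;
    }

    // Sends the request through the shared client and returns the body as text.
    public static async Task<string> GetAsync(string requestUrl, string userAgent)
    {
        using HttpRequestMessage request = BuildGet(requestUrl, userAgent);
        using HttpResponseMessage response = await Client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

From an async method this could be called as `string body = await SharedHttp.GetAsync("http://zhousonglin.cn:8080/get", "Mozilla/5.0");`.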
3. Sending a POST request
POST parameters can be passed in two ways: as form fields or as a JSON string.
3.1 Passing form parameters
// Requires: using System.Collections.Generic; using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a POST request with form parameters.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <param name="postParams">The form fields to send.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlPost(string requestUrl, Dictionary<string, string> postParams)
{
    string responseBody = string.Empty;
    using (HttpClient httpClient = new HttpClient())
    {
        // PostAsync sets the HTTP method itself; ConnectionClose sends "Connection: close".
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        // Encodes the fields as application/x-www-form-urlencoded.
        HttpContent postContent = new FormUrlEncodedContent(postParams);
        HttpResponseMessage response = await httpClient.PostAsync(requestUrl, postContent);
        response.EnsureSuccessStatusCode();
        responseBody = await response.Content.ReadAsStringAsync();
    }
    return responseBody;
}
Usage:
// Call from an async method, for example static async Task Main().
string urlRequestPost = "http://zhousonglin.cn:8080/post";
Dictionary<string, string> postParams = new Dictionary<string, string>()
{
    { "say", "Hello" },
    { "ask", "question" }
};
string responseStr = await HtmlPost(urlRequestPost, postParams);
Console.WriteLine(responseStr);
Result: the echoed JSON shows the submitted fields under "form".
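To see what a form POST actually puts on the wire, the content object can be inspected without sending anything. This is a small sketch under that assumption; `FormBodyDemo` and `DescribeAsync` are hypothetical names, and only the FormUrlEncodedContent calls come from the framework.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormBodyDemo
{
    // Returns the Content-Type media type and the encoded body for a set of form fields.
    public static async Task<(string MediaType, string Body)> DescribeAsync(
        Dictionary<string, string> fields)
    {
        using var content = new FormUrlEncodedContent(fields);
        string body = await content.ReadAsStringAsync();
        return (content.Headers.ContentType.MediaType, body);
    }
}
```

For the two fields used above, the media type is application/x-www-form-urlencoded and the body is a list of name=value pairs joined with `&`, such as `say=Hello&ask=question`.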
3.2 Passing JSON parameters
// Requires: using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Sends a POST request with a JSON body.
/// </summary>
/// <param name="requestUrl">The URL to request.</param>
/// <param name="jsonParams">The JSON string to send as the body.</param>
/// <returns>The response body as a string.</returns>
public static async Task<string> HtmlPostJson(string requestUrl, string jsonParams)
{
    string responseBody = string.Empty;
    using (HttpClient httpClient = new HttpClient())
    {
        // PostAsync sets the HTTP method itself; ConnectionClose sends "Connection: close".
        httpClient.DefaultRequestHeaders.ConnectionClose = true;
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        // Sets Content-Type to application/json with a UTF-8 charset in one step.
        HttpContent content = new StringContent(jsonParams, System.Text.Encoding.UTF8, "application/json");
        HttpResponseMessage response = await httpClient.PostAsync(requestUrl, content);
        response.EnsureSuccessStatusCode();
        responseBody = await response.Content.ReadAsStringAsync();
    }
    return responseBody;
}
Usage:
// JsonConvert comes from the Newtonsoft.Json NuGet package (using Newtonsoft.Json;).
public class User
{
    public string Name { get; set; }
    public string Sex { get; set; }
}
// Call from an async method, for example static async Task Main().
string urlRequestPost = "http://zhousonglin.cn:8080/post";
User user = new User()
{
    Name = "Dahlin",
    Sex = "male"
};
string jsonParam = JsonConvert.SerializeObject(user);
string responseStr = await HtmlPostJson(urlRequestPost, jsonParam);
Console.WriteLine(responseStr);
Result: the echoed JSON shows the raw request body under "data" and the parsed object under "json".
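The usage above serializes with Newtonsoft.Json. On .NET Core 3.0 and later, the built-in System.Text.Json can do the same job without an extra package. The sketch below swaps in that serializer as an alternative, not as the author's method; `DemoUser`, `JsonBodyDemo`, and `ToJsonContent` are hypothetical names used only for illustration.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the User class shown above.
public class DemoUser
{
    public string Name { get; set; }
    public string Sex { get; set; }
}

public static class JsonBodyDemo
{
    // Serializes any object with System.Text.Json and wraps it as a JSON request body.
    public static StringContent ToJsonContent<T>(T value)
    {
        string json = JsonSerializer.Serialize(value);
        return new StringContent(json, Encoding.UTF8, "application/json");
    }
}
```

The returned content can be passed straight to `PostAsync`, for example `await httpClient.PostAsync(url, JsonBodyDemo.ToJsonContent(user));`.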
4. Downloading files
Crawlers mostly fetch text, but sometimes we also want images or videos. HttpClient supports file downloads too; the wrapper looks like this:
// Requires: using System.IO; using System.Net.Http; using System.Threading.Tasks;
/// <summary>
/// Downloads a file.
/// </summary>
/// <param name="requestUrl">The URL of the file.</param>
/// <param name="fileName">The local path to save the file to.</param>
/// <returns>A task that completes once the file has been written.</returns>
public static async Task HtmlDownloadFile(string requestUrl, string fileName)
{
    using HttpClient httpClient = new HttpClient();
    // GetAsync sets the HTTP method itself; ConnectionClose sends "Connection: close".
    httpClient.DefaultRequestHeaders.ConnectionClose = true;
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    HttpResponseMessage response = await httpClient.GetAsync(requestUrl);
    response.EnsureSuccessStatusCode();
    // Await the read directly instead of chaining ContinueWith, then write the bytes to disk.
    byte[] data = await response.Content.ReadAsByteArrayAsync();
    await File.WriteAllBytesAsync(fileName, data);
}
Usage:
// Call from an async method, for example static async Task Main().
string urlPicture = "http://qn.zhousonglin.cn/DaGuanYuan34.jpg?imageslim";
await HtmlDownloadFile(urlPicture, "1.jpg");
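The wrapper above reads the whole response into a byte array, which is fine for pictures but can be costly for large videos. One possible alternative is to stream the body to disk in chunks. The sketch below assumes that approach; `StreamingDownload`, `SaveAsync`, and `DownloadAsync` are hypothetical names, and the 81920-byte buffer is just an illustrative choice.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class StreamingDownload
{
    private static readonly HttpClient Client = new HttpClient();

    // Copies any readable stream to a file chunk by chunk.
    public static async Task SaveAsync(Stream source, string fileName)
    {
        using var target = new FileStream(
            fileName, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 81920, useAsync: true);
        await source.CopyToAsync(target);
    }

    // ResponseHeadersRead returns as soon as the headers arrive,
    // so the body is not buffered in memory before it is copied.
    public static async Task DownloadAsync(string requestUrl, string fileName)
    {
        using HttpResponseMessage response =
            await Client.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        using Stream body = await response.Content.ReadAsStreamAsync();
        await SaveAsync(body, fileName);
    }
}
```

It would be called the same way as the wrapper above: `await StreamingDownload.DownloadAsync(urlPicture, "1.jpg");`.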
For HttpClient, the methods above are basically enough. There are more advanced uses as well, such as customizing the message handler, HttpClientHandler, or sending cookies, as follows:
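// Requires: using System; using System.Net; using System.Net.Http;
CookieContainer cookieContainer = new CookieContainer();
// Add(Cookie) throws when the cookie has no Domain, so bind the cookie to the target site's URI.
// The URI here is the test service used above; replace it with the site being crawled.
cookieContainer.Add(new Uri("http://zhousonglin.cn:8080/"), new Cookie("XXXXXX", "XXXXXXX"));
HttpClientHandler httpClientHandler = new HttpClientHandler()
{
    CookieContainer = cookieContainer,
    AllowAutoRedirect = true,
    UseCookies = true
};
HttpClient httpClient = new HttpClient(httpClientHandler);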
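A CookieContainer decides per URL which cookies to attach, and that decision can be checked offline with GetCookieHeader. The following is a small sketch for inspecting it; `CookieDemo` and `HeaderFor` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Stores one cookie for a site and returns the Cookie header
    // the container would send with a request to that site.
    public static string HeaderFor(Uri site, string name, string value)
    {
        var jar = new CookieContainer();
        jar.Add(site, new Cookie(name, value));
        return jar.GetCookieHeader(site);
    }
}
```

Printing `CookieDemo.HeaderFor(new Uri("http://zhousonglin.cn:8080/"), "token", "abc")` is a quick way to confirm that a cookie is scoped to the site you expect before attaching the container to a handler.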
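Other uses, such as adding a proxy, are much the same. Press F12 (Go to Definition) on HttpClientHandler and it becomes clear, so I won't go into detail here. I may write a follow-up on advanced usage once I need it; it mostly comes down to customizing and overriding the base classes the framework exposes.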
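As a rough illustration of the two ideas mentioned here, the sketch below routes traffic through a proxy via HttpClientHandler and adds a custom handler by deriving from DelegatingHandler. It is only one possible arrangement: `UserAgentHandler` and `ClientFactory` are hypothetical names, and the proxy address is whatever the caller supplies.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical custom handler: adds a User-Agent to requests that lack one.
public class UserAgentHandler : DelegatingHandler
{
    private readonly string _userAgent;

    public UserAgentHandler(string userAgent, HttpMessageHandler inner) : base(inner)
    {
        _userAgent = userAgent;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Headers.UserAgent.Count == 0)
        {
            request.Headers.UserAgent.ParseAdd(_userAgent);
        }
        // Pass the request on to the inner handler, which performs the network call.
        return base.SendAsync(request, cancellationToken);
    }
}

public static class ClientFactory
{
    // Builds a client whose requests go through the given proxy and the custom handler.
    public static HttpClient Create(string proxyUrl, string userAgent)
    {
        var inner = new HttpClientHandler
        {
            Proxy = new WebProxy(proxyUrl),
            UseProxy = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(new UserAgentHandler(userAgent, inner));
    }
}
```

A client built with `ClientFactory.Create("http://127.0.0.1:8888", "Mozilla/5.0")` would send every request through a proxy listening at that address; the address is an example, not a real service.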