Learning Python: Using Proxies

Original

2021-12-25 21:40

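Today's post covers how to use proxy IPs in a Python crawler, with sample code along the way. This may be trivial for experienced crawler developers, but it should still be useful for newcomers. If that is what you need, read on.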

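When you collect data with a crawler, your IP can easily get blocked if the target site enforces strict limits on request rate or request count, which means you cannot continue working for a while. This is where proxies come in: however the site blocks individual addresses, your program can keep going as long as it has fresh IPs to send requests from.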
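To make the idea concrete, here is a minimal Python sketch of falling back to the next proxy in a pool when a request fails. The names `fetch_via_proxy` and `fetch_with_rotation`, the injectable `fetch` parameter, and any proxy addresses you pass in are hypothetical and only for illustration; the code relies on nothing beyond the standard library's `urllib.request`.

```python
import urllib.error
import urllib.request


def fetch_via_proxy(url, proxy_url, timeout=10):
    """Fetch url through a single proxy using only the standard library."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_rotation(url, proxies, fetch=fetch_via_proxy):
    """Try each proxy in turn and return the first successful response body.

    `proxies` is a list of proxy URLs (hypothetical values, supplied by the caller).
    `fetch` is injectable so the rotation logic can be exercised without a network.
    """
    last_error = None
    for proxy_url in proxies:
        try:
            return fetch(url, proxy_url)
        except (urllib.error.URLError, OSError) as exc:
            # This proxy is blocked or unreachable; move on to the next one.
            last_error = exc
    raise RuntimeError(f"all {len(proxies)} proxies failed") from last_error
```

With a tunnel proxy, the provider does this rotation on its side, so the client only needs a single proxy address; the sketch is mainly useful when you manage a list of proxies yourself.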

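Besides discussing why proxy IPs matter, this post also shares a proxy mode that suits beginners: the dynamic tunnel proxy. Many providers offer one, but quality varies between them, so test against your own needs. Here is a common way to use a tunnel proxy (the sample is written in C#):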

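using System;
using System.IO;
using System.Net;
using System.Text;

// Target page to request
string targetUrl = "http://httpbin.org/ip";


// Proxy server (product site: www.16yun.cn)
string proxyHost = "http://t.16yun.cn";
string proxyPort = "31111";

// Proxy credentials
string proxyUser = "username";
string proxyPass = "password";

// Configure the proxy server; the second argument bypasses the proxy for local addresses
WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);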


ServicePointManager.Expect100Continue = false;

var request = WebRequest.Create(targetUrl) as HttpWebRequest;

request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.Method    = "GET";
request.Proxy     = proxy;

//request.Proxy.Credentials = CredentialCache.DefaultCredentials;

request.Proxy.Credentials = new System.Net.NetworkCredential(proxyUser, proxyPass);

// Set the Proxy-Tunnel header
// Random ran = new Random();
// int tunnel = ran.Next(1, 10000);
// request.Headers.Add("Proxy-Tunnel", tunnel.ToString());


//request.Timeout = 20000;
//request.ServicePoint.ConnectionLimit = 512;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
//request.Headers.Add("Cache-Control", "max-age=0");
//request.Headers.Add("DNT", "1");


//String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
//request.Headers.Add("Proxy-Authorization", "Basic " + encoded);

// Send the request and read the response body as UTF-8 text
using (var response = request.GetResponse() as HttpWebResponse)
using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    string htmlStr = sr.ReadToEnd();
    Console.WriteLine(htmlStr);
}
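
Since this post is about Python crawlers, here is a rough Python counterpart of the C# sample, using only the standard library. It reuses the same target URL, proxy host, port, and placeholder credentials; the helper names `build_proxy_url`, `fetch_through_tunnel`, and `main` are made up for illustration. The optional `Proxy-Tunnel` header mirrors the commented-out lines in the C# code. How the provider interprets that header is an assumption here, so check the provider's documentation.

```python
import random
import urllib.request
from urllib.parse import quote

TARGET_URL = "http://httpbin.org/ip"
PROXY_HOST = "t.16yun.cn"
PROXY_PORT = 31111
PROXY_USER = "username"  # placeholder, as in the C# sample
PROXY_PASS = "password"  # placeholder, as in the C# sample


def build_proxy_url(host, port, user, password):
    """Return a proxy URL with percent-encoded credentials embedded."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def fetch_through_tunnel(url, proxy_url, tunnel_id=None, timeout=20):
    """GET url through the proxy; ProxyHandler sends the embedded credentials."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(url, method="GET")
    if tunnel_id is not None:
        # Assumed provider-specific header, taken from the commented-out C# lines.
        request.add_header("Proxy-Tunnel", str(tunnel_id))
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main():
    proxy_url = build_proxy_url(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
    # Same range as ran.Next(1, 10000) in C#, whose upper bound is exclusive.
    tunnel_id = random.randint(1, 9999)
    print(fetch_through_tunnel(TARGET_URL, proxy_url, tunnel_id=tunnel_id))
```

Call `main()` after filling in real credentials. For plain `http://` targets like the one here, the request headers reach the proxy along with the request. For `https://` targets, `urllib` opens a CONNECT tunnel first and sends custom request headers only to the target site, so a header meant for the proxy would need a different approach.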

That wraps up this example of using a proxy in a crawler. It only covers part of the topic, and we will go deeper next time. I hope it helps with your learning.
