Disclaimer: for learning and exchange only. Do not use this for improper purposes.
In the previous post we managed to obtain proxy IPs. With a proxy IP, plus a spoofed User-Agent and Referer (which tells the site which page you arrived from), we can fabricate a completely virtual visitor and use it for purposes such as inflating page views.
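Put together, one forged visitor boils down to two dictionaries handed to requests: a proxies dict and a headers dict. A minimal sketch with made-up values (the real proxy list and UA/Referer pools are built later in this post):

```python
import random

# Hypothetical placeholder values -- real ones are collected below.
proxy_ip = '1.2.3.4:8080'
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0 Safari/537.36',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)',
]
referers = ['https://blog.csdn.net/', 'https://www.baidu.com/']

# Route the request through the proxy so the visit comes from its IP.
proxies = {'http': 'http://' + proxy_ip, 'https': 'https://' + proxy_ip}
# Randomize the browser identity and the "came from" page.
headers = {
    'User-Agent': random.choice(user_agents),
    'Referer': random.choice(referers),
}
print(proxies)
print(sorted(headers))
```

Passing both to `requests.get(url, headers=headers, proxies=proxies)` is all a fake "visit" takes.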
First, obtaining the proxy IPs, copied straight from the earlier post:
from bs4 import BeautifulSoup
import requests
import random
import concurrent.futures, os

headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch, br',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'close',
}
ip_url = 'http://httpbin.org/ip'

def get_ip_list(url):
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.text, 'lxml')
    ips = soup.find_all('tr')
    ip_list = []
    for i in range(1, len(ips)):  # skip the table header row
        td = ips[i].find_all('td')
        ip_list.append(td[1].text + ':' + td[2].text)
    ip_list = list(set(ip_list))  # deduplicate
    print(ip_list)
    # test all candidates concurrently, one thread per proxy
    with concurrent.futures.ThreadPoolExecutor(len(ip_list)) as x:
        for ip in ip_list:
            x.submit(ip_test, ip)

def ip_test(ip):
    proxies = {
        'http': 'http://' + ip,
        'https': 'https://' + ip,
    }
    print(proxies)
    try:
        response = requests.get(ip_url, headers=headers, proxies=proxies, timeout=3)
        if response.status_code == 200:
            with open('可用IP.txt', 'a') as f:
                f.write(ip)
                f.write('\n')
            print('test passed')
            print(proxies)
            print(response.text)
    except Exception as e:
        print(e)

def get_random_ip(ip_list):
    proxy_ip = random.choice(ip_list)
    proxies = {'http': 'http://' + proxy_ip, 'https': 'https://' + proxy_ip}
    return proxies

if __name__ == '__main__':
    url = 'https://www.xicidaili.com/wt'
    if os.path.exists('可用IP.txt'):  # start from a clean list each run
        os.remove('可用IP.txt')
    get_ip_list(url)
    get_ip_list(url + '/2')
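Note that `get_random_ip` is defined above but never called in this script; it is a helper for callers who want one random proxy rather than the whole list. A self-contained sketch of how it would be used, with made-up addresses:

```python
import random

def get_random_ip(ip_list):
    # pick one proxy at random and wrap it in the dict requests expects
    proxy_ip = random.choice(ip_list)
    return {'http': 'http://' + proxy_ip, 'https': 'https://' + proxy_ip}

ip_list = ['1.2.3.4:8080', '5.6.7.8:3128']  # placeholder addresses
proxies = get_random_ip(ip_list)
print(proxies)
```

Each call returns a fresh random choice, so repeated requests naturally rotate through the pool.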
The fetched proxy IPs are stored in the file 可用IP.txt. For the view-inflating main program, create a separate .py file.
First the imports, plus several User-Agents and Referers to disguise ourselves with, and the target URL:
import requests
import random
import time

user_agent_list = [
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
    'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36',
]
referer_list = [
    'https://blog.csdn.net/Xylon_/article/details/100053138',
    'http://blog.csdn.net/',
    #'https://www.baidu.com/link?url=TVS47tYso1NWxFTD8ieQOOe5q3HpJEdFDAXcGZb_F6ooFilKVeXTt7zTUJgZ0jSr&wd=&eqid=b5f9b4bd00121a9e000000035d60fa47'
]
url = 'https://blog.csdn.net/Xylon_/article/details/100053138'
Next, the main program.
Read the file of usable proxies:
if __name__ == '__main__':
    ip_list = []
    with open('可用IP.txt', 'r') as f:
        while True:
            line = f.readline()
            if not line:
                break
            ip_list.append(line.strip('\n'))
    print(ip_list)
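The read loop above can also be written as a one-line comprehension, which skips blank lines as a bonus. A sketch using `io.StringIO` as a stand-in for `open('可用IP.txt')` so it runs without the file:

```python
import io

# Stand-in for the real file handle from open('可用IP.txt', 'r').
f = io.StringIO('1.2.3.4:8080\n5.6.7.8:3128\n\n')
ip_list = [line.strip() for line in f if line.strip()]
print(ip_list)  # ['1.2.3.4:8080', '5.6.7.8:3128']
```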
Then simulate a visit with each proxy IP in turn:
build the proxies dict from the IP, pick a random UA and Referer for the headers, and request the page. That completes one fake "visit".
for ip in ip_list:
    proxies = {
        'http': 'http://' + ip,
        'https': 'https://' + ip,
    }
    headers = {
        'User-Agent': random.choice(user_agent_list),
        'Referer': random.choice(referer_list),
    }
    try:
        page = requests.get(url, headers=headers, proxies=proxies, timeout=3)
        if page.status_code == 200:
            print('visit succeeded ' + str(proxies))
            time.sleep(random.randint(5, 30))  # random pause between visits
    except Exception as e:
        print(e)
The page being visited is my previous post: https://blog.csdn.net/Xylon_/article/details/100053138
During testing I found that CSDN has anti-cheating measures: frequent visits within a short window are treated as abnormal behaviour, so the interval is randomized (around 30 s works well).
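One way to centre the pause around 30 s while still randomizing it is uniform jitter around a base delay; the exact bounds here are a guess, not something CSDN documents:

```python
import random

def visit_delay(base=30.0, jitter=10.0):
    """Return a random pause in seconds: base +/- jitter."""
    return base + random.uniform(-jitter, jitter)

delays = [round(visit_delay(), 1) for _ in range(5)]
print(delays)  # five values, each between 20.0 and 40.0
```

Replacing `time.sleep(random.randint(5, 30))` in the loop with `time.sleep(visit_delay())` keeps the average interval near 30 s.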
Test comparison
Before inflating:
After inflating:
Out of roughly ten attempts the effective-visit success rate was about fifty percent; lengthening the interval or taking other countermeasures would likely improve it.