CSDN neither rate-limits requests to its view-counting endpoint by IP, nor deduplicates repeated visits from the same IP within the same time window when tallying views. That is why this script can inflate view counts.
GitHub repo: https://github.com/hailinli/accessCsdn
一、Approach
1. Parse all article links out of the list page https://blog.csdn.net/linhai1028/article/list/2
2. Visit each of those articles in turn
Because CSDN will count another view of the same article only after roughly 30 seconds, we put the fetch loop on a timer that fires once every 30-plus seconds; pushing the view count past 10,000 is then no trouble at all.
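Step 1 above can be sketched on its own with lxml. The XPath query is the same one the full code below uses; the HTML fragment here is made up purely for illustration of the structure of a CSDN article-list page:

```python
from lxml import etree

# A made-up fragment mimicking the structure of a CSDN article-list page.
sample_html = '''
<html><body>
  <div class="article-list">
    <a href="https://blog.csdn.net/linhai1028/article/details/111">Post 1</a>
    <a href="https://blog.csdn.net/linhai1028/article/details/222">Post 2</a>
  </div>
  <a href="https://blog.csdn.net/other">outside the list, not collected</a>
</body></html>
'''

tree = etree.HTML(sample_html)
# Every <a> href inside the div with class "article-list".
links = tree.xpath('//div[@class="article-list"]//a/@href')
print(links)
```

Only the two links inside the `article-list` div are returned; anything outside it is ignored.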
二、Implementation
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 18/6/24 8:39 PM
# @Author : Lihailin<[email protected]>
# @Desc :
# @File : accessCsdn.py
# @Software: PyCharm
from lxml import etree
import crawBase  # request helper from the same repository
import time


class AccessCsdn(crawBase.CrawBase):
    '''
    Drive view counts on a CSDN blog.
    '''

    def getArticals(self, url):
        '''
        Parse all article links out of one list page, e.g.
        https://blog.csdn.net/linhai1028/article/list/2
        :param url: URL of a single article-list page
        :return: list of article URLs found on that page
        '''
        content = self.get(url)
        html = etree.HTML(content)
        return html.xpath('//div[@class="article-list"]//a/@href')

    def getAllArticals(self, urlBase):
        '''
        Walk the paginated list (/article/list/1, /article/list/2, ...)
        and collect the article links from every page.
        :param urlBase: blog root, e.g. https://blog.csdn.net/linhai1028
        :return: list of all article URLs (may contain duplicates)
        '''
        i = 1
        urls = []
        url = urlBase
        while True:
            t = self.getArticals(url)
            if len(t) == 0:  # an empty page means we ran past the last one
                break
            urls += t
            i += 1
            url = urlBase.rstrip('/') + '/article/list/%s' % i
        return urls

    def run(self, url, sec):
        '''
        Hit every article over and over to drive up the view count.
        :param url: blog root URL
        :param sec: seconds to sleep between visits
        :return:
        '''
        urls = list(set(self.getAllArticals(url)))  # deduplicate
        while True:
            for articleUrl in urls:
                self.get(articleUrl)
                time.sleep(sec)


if __name__ == '__main__':
    url = "https://blog.csdn.net/linhai1028/"
    accessCsdn = AccessCsdn()
    accessCsdn.run(url, 40)
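The `crawBase.CrawBase` base class lives in the same repository and is not reproduced in this post. A minimal stand-in for its `get` method, built on the `requests` library from the environment list, might look like the sketch below; the browser-style User-Agent and the error handling are assumptions of this sketch, not taken from the repo:

```python
import requests


class CrawBase:
    '''Minimal hypothetical stand-in for the repo's crawBase.CrawBase.'''

    def __init__(self):
        self.session = requests.Session()
        # Present a browser-like User-Agent; a bare python-requests UA is
        # more likely to be filtered. This exact value is an assumption.
        self.session.headers['User-Agent'] = (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')

    def get(self, url):
        '''Fetch url and return the response body as text ('' on failure).'''
        try:
            resp = self.session.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            return ''
```

Returning an empty string on failure keeps the calling loop alive: `etree.HTML('')` style parse failures are far easier to tolerate than an unhandled network exception killing the whole run.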
三、Usage
git clone git@github.com:hailinli/accessCsdn.git
cd accessCsdn
python accessCsdn.py
Environment
- Python 3
- requests 2.18
- lxml 4.2