Crawling the Zhihu hot list and caching it in Redis

The script below requests the Zhihu hot list, extracts each hot item's link with XPath, keys every URL by its SHA1 digest, and stores the zlib-compressed page text in a Redis hash, so pages that are already cached are skipped on later runs.

import pickle
import zlib
from hashlib import sha1

import redis
import requests
from lxml import etree
def paqu():
    """Fetch the Zhihu hot list and cache each linked page in a Redis hash."""
    url = 'https://www.zhihu.com/hot'
    headers = {
        'Host': 'www.zhihu.com',
        'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',
        'Connection': 'keep-alive',
        'Pragma': 'no-cache',
        'Cookie': 'your own cookie here',
        'Cache-Control': 'no-cache',
        'Upgrade-Insecure-Requests': '1',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    }


    response = requests.get(url, headers=headers)
    html_content = response.content
    html = etree.HTML(html_content)
    links = html.xpath('//div[contains(@class,"HotItem-content")]/a/@href')

    # fill in your own Redis host and password before running
    client = redis.Redis(host='', password='', port=6379, db=1)

    hasher_proto = sha1()
    print(links)
    for link in links:
        # copy the prototype so every URL starts from a fresh hasher
        hasher = hasher_proto.copy()
        # digest the URL into a SHA1 hex string to use as the hash field key
        hasher.update(link.encode('utf-8'))
        field_key = hasher.hexdigest()
        # only fetch and cache pages we have not seen before
        if not client.hexists('zhihu', field_key):
            html_page = requests.get(link, headers=headers).text
            # pickle the page text and zlib-compress it before caching
            zipped_page = zlib.compress(pickle.dumps(html_page))
            client.hset('zhihu', field_key, zipped_page)
    print('Cached {} pages in total'.format(client.hlen('zhihu')))

if __name__ == '__main__':
    paqu()
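
To read a page back out of the cache, reverse the two steps: fetch the field from the 'zhihu' hash, decompress with zlib, then unpickle. The following is a minimal sketch, assuming the same (here empty) Redis credentials as above; the lookup URL is a hypothetical example:

import pickle
import zlib
from hashlib import sha1

import redis

def read_cached_page(url):
    # assumes the same Redis instance and db=1 used by paqu()
    client = redis.Redis(host='', password='', port=6379, db=1)
    field_key = sha1(url.encode('utf-8')).hexdigest()
    zipped_page = client.hget('zhihu', field_key)
    if zipped_page is None:
        return None  # not cached yet
    # reverse the compress + pickle applied when caching
    return pickle.loads(zlib.decompress(zipped_page))

if __name__ == '__main__':
    # hypothetical URL; use one that paqu() actually cached
    page = read_cached_page('https://www.zhihu.com/question/123456789')
    print(page[:200] if page else 'page not cached')

Note that html_page is already a str, so zlib.compress(html_page.encode('utf-8')) would cache it without pickle; pickling only matters if you later store non-string objects in the same hash.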
