Scraping basic article info from 30daydo ("30 Days to Try New Things") with Python (for beginners)

Original

2020-02-21 19:20

import csv
import time

import requests
from bs4 import BeautifulSoup

# Listing pages 1 to 30, sorted by newest.
urls = ["http://30daydo.com/sort_type-new__day-0__is_recommend-0__page-{}".format(i) for i in range(1, 31)]

# newline="" keeps the csv module from writing blank rows on Windows.
# If Excel shows garbled characters, encoding='utf-8-sig' can be used instead.
with open('filename.csv', 'w', encoding='utf-8', newline="") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(["Title", "Category", "Author", "Comments", "Views", "Time", "URL"])

    for j, url in enumerate(urls, start=1):
        html_code = requests.get(url, timeout=10)
        html_code.encoding = "utf-8"
        print("Fetching page", j, ",", html_code.status_code, ",", url)

        soup = BeautifulSoup(html_code.text, "html.parser")

        # One element per article on the listing page.
        soup_2 = soup.find_all(class_="aw-question-content")

        for soup_3 in soup_2:
            spans = soup_3.find_all("span", attrs={"class": "text-color-999"})
            some_data = spans[0].get_text()

            # Some entries start with a "貢獻" (contribution) span; the stats are then in the next span.
            # The string must match the text the site actually serves.
            if some_data == "貢獻":
                some_data = spans[1].get_text()

            parts = some_data.split(" • ")

            # Entries with a "關注" (follow) count have one extra leading field.
            offset = 2 if "關注" in some_data else 1
            pls = parts[offset]        # comment count
            llcs = parts[offset + 1]   # view count
            times = parts[offset + 2]  # time

            title_link = soup_3.find("h4").find("a")
            data = {
                "name": title_link.get_text(),
                "fl": soup_3.find("a", attrs={"class": "aw-question-tags"}).get_text(),
                "author": soup_3.find("a", attrs={"class": "aw-user-name"}).get_text(),
                "pls": pls,
                "llcs": llcs,
                "time": times,
                "url": title_link.get("href"),
            }

            csv_writer.writerow(
                [data["name"], data["fl"], data["author"], data["pls"], data["llcs"], data["time"], data["url"]])

        # Pause between pages to avoid hammering the site.
        time.sleep(2)
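
The field-splitting step in the script can also be pulled out into a small function and tried without touching the network. The following is only a sketch: `parse_stats` is a hypothetical helper name, and the two sample lines are made up for illustration, not copied from the site. It assumes, as the script does, that entries containing the follow marker carry one extra leading field.

```python
def parse_stats(text, follow_marker="關注", sep=" • "):
    """Split a stats line into (comments, views, time).

    Lines containing follow_marker have one extra leading field,
    so the three fields of interest start one position later.
    """
    parts = text.split(sep)
    offset = 2 if follow_marker in text else 1
    if len(parts) < offset + 3:
        raise ValueError("unexpected stats line: {!r}".format(text))
    comments, views, when = parts[offset:offset + 3]
    return comments, views, when


# Hypothetical sample lines, invented for this example only.
with_follow = "topic • 3 關注 • 5 comments • 120 views • 2020-02-20"
without_follow = "topic • 5 comments • 120 views • 2020-02-20"

print(parse_stats(with_follow))
print(parse_stats(without_follow))
```

Raising `ValueError` on a short line makes a layout change on the site easier to spot than a bare `IndexError` would.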

Screenshot of the output opened in Excel:
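
The output can also be checked without Excel by reading the file back with the `csv` module. This is a minimal sketch: `preview_csv` is a hypothetical helper, and the demo writes a throwaway file with made-up rows so that the block runs on its own. In practice, pass the path of the CSV file the script wrote.

```python
import csv
import os
import tempfile


def preview_csv(path, limit=5):
    """Return the header row and up to `limit` data rows from a CSV file."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


# Demo on a temporary file with invented rows (not real scraped data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Category", "Author"])
    writer.writerow(["Sample title", "Sample category", "Sample author"])

header, rows = preview_csv(demo_path)
print(header)
print(rows)
```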

Ferencz

