No more agonizing over what to write in the monthly Party-member report ------ python bs4

Original

maimaimaimai66

2018-09-06 01:48

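Party members hold a themed Party-activity meeting every month. Afterwards we also have to write up our reflections on it, which is a real chore. So I learned some Python specifically for this: a script that scrapes People's Daily (人民日報) for articles whose headlines contain the meeting's theme keyword. It took a little over two days, on and off, and it was good fun. Python is genuinely powerful. Its syntax is close enough to other languages that a quick overview is all you need to get started~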

Tools: Python 3.6, bs4, requests, pyinstaller

The code is pretty clumsy and is only meant for my own use.

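```python
import os
import calendar
import requests
from bs4 import BeautifulSoup

key = input('Keyword to search for: ')
monthstr = input('Start month (2018): ')
daystr = input('Start day (2018): ')
monthend = input('End month (2018): ')
dayend = input('End day (2018): ')
print("Searching 2018-" + monthstr + "-" + daystr + " to 2018-" + monthend + "-" + dayend
      + " for articles about \"" + key + "\"...")
# Page 1 of each day's issue is <Hhttpfront>MM/DD<Hhttpend>
Hhttpfront = "http://paper.people.com.cn/rmrb/html/2018-"
Hhttpend = "/nbs.D110000renmrb_01.htm"
print("Files will be saved to: " + os.getcwd())
for i in range(int(monthstr), int(monthend) + 1):
    # Start at daystr only in the first month and stop at dayend only in the last month;
    # every month in between is scanned in full
    first = int(daystr) if i == int(monthstr) else 1
    last = int(dayend) if i == int(monthend) else calendar.monthrange(2018, i)[1]
    for j in range(first, last + 1):
        Hhttpmi = str(i).zfill(2) + "/" + str(j).zfill(2)
        newsHtml = requests.get(Hhttpfront + Hhttpmi + Hhttpend, timeout=10)
        newsHtml.encoding = "utf-8"
        soup = BeautifulSoup(newsHtml.text, "html.parser")
        # Each <area> in the page's image map links to one article
        for newsItem in soup.find_all('area'):
            href = newsItem.get('href')
            if not href:
                continue
            url = Hhttpfront + Hhttpmi + "/" + href
            newsHtmlf = requests.get(url, timeout=10)
            newsHtmlf.encoding = "utf-8"
            soupf = BeautifulSoup(newsHtmlf.text, "html.parser")
            # Check the headline (h1) and subheadings (h3) for the keyword
            alltag = soupf.find_all('h1') + soupf.find_all('h3')
            for newsItemf in alltag:
                if newsItemf.string and key in newsItemf.string:
                    print("Title:  " + newsItemf.string + "    issue url: " + url)
                    # One file per day, e.g. 爬蟲數據0906.txt ("crawler data")
                    filename = "爬蟲數據" + str(i).zfill(2) + str(j).zfill(2) + ".txt"
                    # 'with' closes the file even when no paragraph has text
                    with open(filename, "a+", encoding='utf-8') as f:
                        for ptext in soupf.find_all('p'):
                            if ptext.string is not None:
                                f.write(ptext.string + "\n")
                    break  # one matching heading is enough; don't save the article twice
```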
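Nested month/day loops make it easy to get the start and end days wrong. One alternative is to step through the dates with datetime.timedelta, which handles month boundaries (and year boundaries) by itself. Each article's href can then be joined onto the page URL with urllib.parse.urljoin. Below is a minimal sketch of that approach. It assumes only the page-1 URL pattern used in the script above. The names page1_urls and article_url are hypothetical helpers, not part of the original script.

```python
import datetime
from urllib.parse import urljoin

# URL pattern from the script above (assumed stable): html/YYYY-MM/DD/nbs.D110000renmrb_01.htm
FRONT = "http://paper.people.com.cn/rmrb/html/"
PAGE1 = "nbs.D110000renmrb_01.htm"


def page1_urls(start, end):
    """Hypothetical helper: yield (date, page-1 URL) for every day from start to end inclusive."""
    day = start
    while day <= end:
        # date objects accept strftime codes inside f-string format specs
        yield day, f"{FRONT}{day:%Y-%m}/{day:%d}/{PAGE1}"
        day += datetime.timedelta(days=1)


def article_url(page_url, href):
    """Hypothetical helper: resolve an <area> href relative to the page it came from."""
    return urljoin(page_url, href)
```

For example, page1_urls(datetime.date(2018, 1, 30), datetime.date(2018, 2, 2)) yields four URLs, for 30 January through 2 February. A URL-building bug can then be caught without touching the network.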

Finally, I used pyinstaller to turn the .py into an .exe, so other people can run it without installing Python.
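PyInstaller can also be driven from Python instead of the command line. Below is a minimal sketch. It assumes the scraper is saved as spider.py (a hypothetical name) and that PyInstaller is installed. The functions pyinstaller_args and build are illustrative helpers. PyInstaller.__main__.run and the --name and --onefile options belong to PyInstaller itself.

```python
import os


def pyinstaller_args(script, onefile=True):
    """Hypothetical helper: build the PyInstaller argument list for a script."""
    # Name the executable after the script file, e.g. spider.py -> spider
    name = os.path.splitext(os.path.basename(script))[0]
    args = [script, "--name", name]
    if onefile:
        # Bundle everything into a single executable instead of a folder
        args.append("--onefile")
    return args


def build(script):
    """Hypothetical helper: run PyInstaller in-process with the arguments above."""
    # Imported lazily so pyinstaller_args works even without PyInstaller installed
    import PyInstaller.__main__
    PyInstaller.__main__.run(pyinstaller_args(script))
```

Calling build("spider.py") should be roughly equivalent to running "pyinstaller spider.py --name spider --onefile". By default, PyInstaller writes the result to the dist folder.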

Programming really is pretty useful~
