Scraping news from a news site and saving it to Excel

Original

2018-09-11 02:33

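#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# author: Jacky
# Third-party packages: selenium, beautifulsoup4, lxml, xlwt (Firefox must be installed)

import os

import xlwt
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

BASE_URL = 'http://www.yidianzixun.com'


def text_of(parent, cls):
    """Return the stripped text of the first descendant with class cls, or '' if missing."""
    node = parent.find(class_=cls)
    return node.get_text(strip=True) if node else ''


driver = webdriver.Firefox()
driver.implicitly_wait(3)
first_url = BASE_URL + '/channel/c6'
driver.get(first_url)
# Click the refresh icon once to load a fresh batch of stories
driver.find_element(By.CLASS_NAME, 'icon-refresh').click()
# Send the Down key repeatedly so the page lazy-loads more stories; the keys go to
# <body> because WebDriver can reject key input sent to a non-focusable icon
body = driver.find_element(By.TAG_NAME, 'body')
for _ in range(1, 90):
    body.send_keys(Keys.DOWN)
soup = BeautifulSoup(driver.page_source, 'lxml')
# print(soup.prettify())  # uncomment to dump the rendered HTML for debugging

articles = []
# class_ (not class, which is a Python keyword); a multi-class string matches the exact class attribute
for article in soup.find_all(class_='item doc style-small-image style-content-middle'):
    title = text_of(article, 'doc-title')
    source = text_of(article, 'source')
    comment = text_of(article, 'comment-count')
    link = BASE_URL + article.get('href', '')
    articles.append([title, source, comment, link])
print(articles)
driver.quit()

wbk = xlwt.Workbook(encoding='utf-8')
sheet = wbk.add_sheet('yidianzixun')
# Header row
for col, header in enumerate(['title', 'source', 'comment', 'link']):
    sheet.write(0, col, header)
# One row per article, starting directly below the header
for i, row in enumerate(articles, start=1):
    for col, value in enumerate(row):
        sheet.write(i, col, value)
os.makedirs('zixun', exist_ok=True)  # the output folder must exist before saving
wbk.save(os.path.join('zixun', 'zixun.xls'))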
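The script sends the Down key a fixed 89 times and assumes that is enough to load the stories it wants. The sketch below shows a more adaptive alternative. It assumes the channel page keeps appending stories as you scroll, and it keeps scrolling until the story count stops growing.

`scroll_until_stable`, its parameters, and the default pause and patience values are hypothetical choices made for illustration. They are not part of the original script or of Selenium. The browser actions are passed in as callbacks, so the loop itself can be exercised without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(scroll: Callable[[], None],
                        count_items: Callable[[], int],
                        max_rounds: int = 90,
                        pause: float = 1.0,
                        patience: int = 3) -> int:
    """Scroll repeatedly until the item count stops growing.

    Stops after `patience` consecutive scrolls that add no new items, or after
    `max_rounds` scrolls in total, and returns the last item count seen.
    """
    seen = count_items()
    idle = 0
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give the page time to fetch the next batch
        now = count_items()
        if now > seen:
            seen, idle = now, 0  # new items arrived, reset the idle counter
        else:
            idle += 1
            if idle >= patience:
                break  # nothing new for several rounds: assume the feed is exhausted
    return seen
```

With Selenium, `scroll` could run `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `count_items` could return `len(driver.find_elements(By.CSS_SELECTOR, ".item.doc"))`. Both wirings are untested assumptions about this site's markup and loading behaviour.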

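The extraction step can also be done without BeautifulSoup or lxml. The sketch below is dependency-free and uses the standard library's `html.parser.HTMLParser`. That is a swapped-in technique, not what the script itself uses.

It assumes each story is an element carrying the classes `item doc style-small-image style-content-middle`, with `doc-title`, `source` and `comment-count` descendants, as the script's selectors imply. `StoryParser` is a hypothetical name. Relative `href`s are resolved with `urllib.parse.urljoin` instead of string concatenation, and a missing field becomes an empty string rather than raising an `AttributeError` as `.find(...).get_text()` would.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have an end tag; they are not tracked on the stack
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}
# Assumed markup, taken from the selectors in the script above
STORY_CLASSES = {'item', 'doc', 'style-small-image', 'style-content-middle'}
FIELD_CLASSES = {'doc-title': 'title', 'source': 'source',
                 'comment-count': 'comment'}


class StoryParser(HTMLParser):
    """Collect [title, source, comment, link] rows from story elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []
        self._stack = []    # (tag, role) for each open non-void element
        self._story = None  # fields of the story currently being read
        self._field = None  # field whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get('class') or '').split())
        role = None
        if self._story is None and STORY_CLASSES <= classes:
            # A story element: resolve its href against the site root
            href = attrs.get('href') or ''
            self._story = {'title': '', 'source': '', 'comment': '',
                           'link': urljoin(self.base_url, href)}
            role = 'story'
        elif self._story is not None and self._field is None:
            for cls, field in FIELD_CLASSES.items():
                if cls in classes:
                    self._field, role = field, 'field'
                    break
        self._stack.append((tag, role))

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _ in self._stack):
            return  # stray end tag with no matching start tag
        # Pop back to the matching start tag, closing anything left unclosed
        while self._stack:
            open_tag, role = self._stack.pop()
            if role == 'field':
                self._field = None
            elif role == 'story':
                s = self._story
                self.rows.append([s['title'].strip(), s['source'].strip(),
                                  s['comment'].strip(), s['link']])
                self._story = None
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self._field is not None:
            self._story[self._field] += data
```

To use it on the rendered page, call `parser.feed(driver.page_source)` before `driver.quit()` and read `parser.rows`. Unlike BeautifulSoup's exact-string `class_` match, this sketch treats the story classes as a set, so extra or reordered classes still match.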