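Using PyCharm and the Scrapy crawler framework to scrape the Pingdingshan University news site (http://news.pdsu.edu.cn/). Requirement: scrape the news columns and write the results to lm.txt.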

Original

2020-06-15 01:05

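I know plenty of people have already found blog posts on this, but I'm going to write another one anyway. No special reason, except that I couldn't follow what the others wrote.
Even installing libraries in PyCharm works differently from the other way of doing it, so why would this part be the same?
The first step assumes the required libraries (Scrapy and BeautifulSoup, which the code below imports) are already installed. If you don't know how to do that, see my previous post.
Open PyCharm and go to View --> Tool Windows --> Terminal (this is a terminal, basically a cmd window that lives inside PyCharm). You should see a path at the prompt.
If there isn't one, or it isn't where you want to work, that's fine: just change directory from the command line, for example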

cd D:\python

Then type

scrapy startproject xinwen
# scrapy startproject <project name>; the name is up to you, but the steps below assume xinwen

You'll notice that a new project folder with some files in it has appeared.
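If you want to confirm what "appeared", here is a minimal sketch that checks for the files `scrapy startproject` normally generates. It assumes the project is called xinwen (the directory the next step changes into) and a reasonably recent Scrapy version; older versions may not create every file listed, and the helper names are made up for illustration.

```python
from pathlib import Path


def expected_project_files(project_name):
    """Relative paths that `scrapy startproject <project_name>` normally creates."""
    root = Path(project_name)
    pkg = root / project_name  # inner package with the same name
    return [
        root / "scrapy.cfg",
        pkg / "__init__.py",
        pkg / "items.py",
        pkg / "middlewares.py",
        pkg / "pipelines.py",
        pkg / "settings.py",
        pkg / "spiders" / "__init__.py",
    ]


def missing_files(base_dir, project_name):
    """Return the expected files that do not exist under base_dir."""
    base = Path(base_dir)
    return [
        p.as_posix()
        for p in expected_project_files(project_name)
        if not (base / p).exists()
    ]


# Run from the directory where you executed `scrapy startproject`.
# An empty list means the skeleton is complete.
print(missing_files(".", "xinwen"))
```

Later on, the spider file itself goes into the inner spiders folder of this skeleton.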
Moving on,

cd xinwen

Once you are inside the xinwen directory, run

scrapy genspider lm news.pdsu.edu.cn
# here lm is the spider name (and the name of the generated file), and news.pdsu.edu.cn is the domain you want to crawl

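A new file, lm.py, now appears in the project's spiders folder. Open it without hesitation and paste in the code further down. Once the file is saved, go back to the terminal and run

scrapy crawl lm

Here is the code for lm.py: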

# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup


class LmSpider(scrapy.Spider):
    name = 'lm'
    # must cover the host used in start_urls
    allowed_domains = ['news.pdsu.edu.cn']
    start_urls = ['http://news.pdsu.edu.cn/']

    def parse(self, response):
        soup = BeautifulSoup(response.text, 'html.parser')
        # the column headings are taken to be the <h2 class="fl"> elements
        headings = soup.find_all('h2', class_='fl')
        content = ''
        for lm in headings:
            print(lm.text)
            content += lm.text + '\n'
        # append the column names to lm.txt on drive E, written as UTF-8
        with open('e:\\lm.txt', 'a+', encoding='utf-8') as fp:
            fp.write(content)
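To see what the `find_all('h2', class_='fl')` step is doing without hitting the site, here is a rough offline sketch that uses only the standard library's html.parser in place of BeautifulSoup. The sample HTML and the names `H2FlCollector` and `extract_columns` are made up for illustration; the real page is only assumed to mark its column headings as `<h2 class="fl">`, as the spider does.

```python
from html.parser import HTMLParser

# Made-up sample markup, NOT copied from the real site.
SAMPLE_HTML = """
<div><h2 class="fl">News</h2><a href="#">more</a></div>
<div><h2 class="fl title">Notices</h2></div>
<div><h2 class="fr">Ignored</h2></div>
"""


class H2FlCollector(HTMLParser):
    """Collect the text of every <h2> whose class list contains 'fl'."""

    def __init__(self):
        super().__init__()
        self.titles = []     # finished headings
        self._buffer = None  # text pieces of the heading being read, else None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "fl" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def extract_columns(html):
    """Return the heading texts found in an HTML string."""
    parser = H2FlCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


print(extract_columns(SAMPLE_HTML))
```

In the spider itself BeautifulSoup does this matching for you; the sketch is just a way to experiment with the selection rule on a string.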

Then take a look at your E drive: lm.txt should be there.
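If you would rather check the result from Python than open the file by hand, something like the following should do. It is only a sketch: `read_columns` is a made-up helper, the path is the one the spider writes to, and since the file's encoding depends on how it was opened for writing, the code tries UTF-8 first and then falls back to the platform default.

```python
from pathlib import Path


def read_columns(path):
    """Return the non-empty lines of the output file, stripped of whitespace."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Fall back to the platform default encoding.
        text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


output = Path("e:/lm.txt")  # same location the spider writes to
if output.exists():
    for name in read_columns(output):
        print(name)
else:
    print("lm.txt not found; has the spider been run yet?")
```

Because the spider appends to the file, running it several times will list the same column names more than once.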
And that's it.
