Python Web Scraping in Practice: The National College Information Query System

Original

2018-12-26 22:02

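Since my undergraduate major was Remote Sensing Science and Technology, I scraped the National College Information Query System for every institution that has offered that major to date.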

Target URL: https://gkcx.eol.cn/soudaxue/querySchoolSpecialty.html?&argspecialtyname=%E9%81%A5%E6%84%9F%E7%A7%91%E5%AD%A6%E4%B8%8E%E6%8A%80%E6%9C%AF&argzycengci=%E6%9C%AC%E7%A7%91
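
The query string in this URL is percent-encoded UTF-8. As a small sketch using only the standard library, the snippet below decodes the two parameters and rebuilds a per-page URL of the same shape the script requests. The helper names `decode_params` and `build_page_url` are hypothetical and exist only for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "https://gkcx.eol.cn/soudaxue/querySchoolSpecialty.html"
TARGET = (BASE + "?&argspecialtyname=%E9%81%A5%E6%84%9F%E7%A7%91%E5%AD%A6"
          "%E4%B8%8E%E6%8A%80%E6%9C%AF&argzycengci=%E6%9C%AC%E7%A7%91")


def decode_params(url):
    """Return the query parameters of `url` as a dict of decoded strings."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def build_page_url(specialty, page):
    """Hypothetical helper: build the URL for one page of results."""
    # quote() percent-encodes the UTF-8 bytes of the specialty name.
    return f"{BASE}?&argspecialtyname={quote(specialty)}&page={page}"


if __name__ == "__main__":
    params = decode_params(TARGET)
    print(params)
    print(build_page_url(params["argspecialtyname"], 1))
```

Building the URL from the decoded name keeps the search term readable in the source and makes it easy to query a different major.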

Required Python packages:

1. BeautifulSoup

2. selenium

3. csv

The code:

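#!/usr/bin/python
# -*- coding: utf-8 -*-
# author:zhoulong_GISER
# blog:https://blog.csdn.net/qq_33356563
from bs4 import BeautifulSoup
from selenium import webdriver

# Header-cell labels and bare newlines to skip; the labels must match the page text exactly.
SKIP = ("學校名稱", "專業名稱", "重點專業", "院校屬性", "高校對比", '\n')


def main():
    # Path to the local PhantomJS binary; adjust it for your machine.
    driver_path = r'E:\spiter\data\phantomjs.exe\phantomjs-2.1.1-windows\bin\phantomjs.exe'
    value = []
    # Note: webdriver.PhantomJS was removed in Selenium 4, so this script needs Selenium 3.x.
    driver = webdriver.PhantomJS(executable_path=driver_path)
    # Fetch result pages 1 through 4.
    for i in range(1, 5):
        url = 'https://gkcx.eol.cn/soudaxue/querySchoolSpecialty.html?&argspecialtyname=%E9%81%A5%E6%84%9F%E7%A7%91%E5%AD%A6%E4%B8%8E%E6%8A%80%E6%9C%AF&page=' + str(i)
        driver.get(url)
        # Parse the rendered HTML, since the table is filled in by JavaScript.
        data = driver.page_source
        dfcontent = BeautifulSoup(data, 'lxml')
        trs = dfcontent.find_all('tr')
        for tr in trs:
            tup1 = []
            for td in tr:
                if td.string not in SKIP:
                    if str(td.string)[-3:] == "...":
                        # Long names are cut off with "..."; swap it for "學",
                        # assuming that is the character that was cut off.
                        tdstring = str(td.string[0:-3]) + "學"
                        tup1.append(tdstring)
                    else:
                        tup1.append(td.string)
            # The first remaining cell in each row is the school name.
            if len(tup1) != 0:
                value.append(tup1[0])
    # Shut down the browser process once all pages are fetched.
    driver.quit()
    # Remove duplicates while keeping first-seen order
    lis = []
    for va in value:
        if va not in lis:
            lis.append(va)
    for li in lis:
        print(li)


if __name__ == '__main__':
    main()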
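
The package list names csv, but the script above only prints the school names. Here is a minimal sketch of how the deduplicated list might be saved instead, assuming a hypothetical output file `schools.csv` and hypothetical helpers `unique_in_order` and `save_names`. It uses `dict.fromkeys` in place of the manual duplicate-removal loop; both keep the first occurrence of each name. The sample values are placeholders, not scraped results.

```python
import csv


def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


def save_names(names, path):
    """Hypothetical helper: write one school name per row to a CSV file."""
    # utf-8-sig writes a BOM, which helps Excel detect the encoding of Chinese text.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["school"])
        writer.writerows([name] for name in names)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real scraped output.
    value = ["School A", "School B", "School A"]
    save_names(unique_in_order(value), "schools.csv")
```

In the script, `value` would be the list built inside `main()`, so the call would sit where the final print loop is now.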

Results (in no particular order):
