Python Crawler (at a Friend's Request) - Working Version

Environment: Windows 10, Python 3.7

Tools: PyCharm, Anaconda

Main packages: BeautifulSoup (bs4), re
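BeautifulSoup is not part of the standard library; it ships on PyPI as beautifulsoup4. A one-line install sketch, assuming pip is available (under Anaconda, installing the same package through conda should work as well):

pip install beautifulsoup4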

Code:


#!/usr/bin/python
# -*- coding: UTF-8 -*-
import re
from urllib import request

from bs4 import BeautifulSoup

# Fetch one research-report page from eastmoney and parse it with the built-in html.parser.
html = request.urlopen("http://data.eastmoney.com/report/20181101/APPISWTR4upPASearchReport.html")
bs = BeautifulSoup(html, "html.parser")
print("title")
print(bs.title)

# Dump the name / http-equiv / content attributes of every <meta> tag.
print("meta")
links = bs.find_all("meta")
for count, link in enumerate(links, start=1):
    print(count)
    attrs = link.attrs
    if "name" in attrs:
        print("name:", attrs["name"])
    if "http-equiv" in attrs:
        print("httpEquiv:", attrs["http-equiv"])
    if "content" in attrs:
        print("content:", attrs["content"])

# Find the <p> whose first child mentions "盈利預測" (earnings forecast);
# the paragraph we actually want is the one two <p> tags after it.
print("p")
ps = bs.find_all("p")
index = -1
for i, p in enumerate(ps):
    contents = p.contents
    if contents and "盈利預測" in str(contents[0]):
        index = i
        break
needContent = ""
if index != -1 and index + 2 < len(ps):
    needContent = str(ps[index + 2])
print(needContent)

# Extract the forecast year range, the EPS figures (in yuan), and the quoted rating.
match1 = re.search(r'[\u4e00-\u9fa5]{4}20[0-9]{2}[\u4e00-\u9fa5]-20[0-9]{2}[\u4e00-\u9fa5]', needContent)
match2 = re.search(r'EPS爲.*元', needContent)
match3 = re.search(r'[\u4e00-\u9fa5]{4}“.*”[\u4e00-\u9fa5]{2}', needContent)
print(match1.group() if match1 else "year range not matched")
print(match2.group() if match2 else "EPS not matched")
print(match3.group() if match3 else "rating not matched")
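If the site ever rejects plain urllib requests (not something the original post runs into, but common with finance portals), the same fetch can carry a browser-like User-Agent header. A minimal sketch; the header value is only an illustrative placeholder:

from urllib import request

from bs4 import BeautifulSoup

url = "http://data.eastmoney.com/report/20181101/APPISWTR4upPASearchReport.html"
# Wrap the URL in a Request object so a custom User-Agent header can be attached.
req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with request.urlopen(req, timeout=10) as resp:
    bs = BeautifulSoup(resp.read(), "html.parser")
print(bs.title)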

