Learning Python: Commonly Used Built-in Modules (HTMLParser)

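With the HTMLParser class from the standard library's html.parser module, Python can extract text, images, and other content from a web page.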
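As a minimal sketch of the image case, the snippet below collects the `src` attribute of every `<img>` tag in a small inline HTML string. The class name `ImageCollector` and the sample markup are hypothetical and chosen only for illustration. It relies on `handle_starttag` receiving `attrs` as a list of `(name, value)` tuples.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, e.g. [('src', 'logo.png')]
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.sources.append(src)


# Hypothetical sample markup; the second <img> has no src and is skipped.
html = '<p>Logo: <img src="logo.png" alt="logo"><img alt="no source"></p>'

collector = ImageCollector()
collector.feed(html)
collector.close()
print(collector.sources)
```

Storing the values on the parser instance, rather than printing them inside the callback, makes the result easy to reuse after `feed()` returns.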

Example

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

' HTMLParser '

__author__ = 'Kevin Gong'

from html.parser import HTMLParser
from urllib import request

class EventSearchParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.flag = 0  # state: 1 = inside a target tag, 0 = not a target tag

    def handle_starttag(self, tag, attrs):
        if tag == 'h3' and ('class', 'event-title') in attrs:  # select the event name
            self.flag = 1
        elif tag == 'time' and any(name == 'datetime' for name, _ in attrs):  # select the event time; any() avoids an IndexError when <time> has no attributes
            self.flag = 1
        elif tag == 'span' and ('class', 'event-location') in attrs:  # select the event location
            self.flag = 1

    def handle_data(self, data):
        if self.flag:
            print(data)
            self.flag = 0  # reset the state

with request.urlopen('https://www.python.org/events/python-events/') as f:
    data = f.read().decode('utf-8')

parser = EventSearchParser()
parser.feed(data)

Output:

PyCon CZ 2020 (canceled)
05 June – 07 June
Ostrava, Czech Republic
PyLondinium 2020 (postponed)
05 June – 07 June
London, UK
PyCon Odessa 2020
13 June – 14 June
Odessa, Ukraine
Python Web Conference 2020 (Online-Worldwide)
17 June – 19 June
https://2020.pythonwebconf.com
Better Python Unit Tests
23 June
Online
FlaskCon (online)
04 July – 05 July
Online
Python fwdays'20
23 May
Online
Python fwdays'20
16 May
Online
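The state-flag technique above can also be tried without a network connection. The following is a sketch under stated assumptions: `SAMPLE` is hypothetical markup modeled on the tags the parser looks for, not a copy of the real page, and `EventCollector` is an illustrative name. Instead of printing, it stores each captured text fragment in a list, and it skips whitespace-only fragments so that indentation between tags does not consume the flag.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, modeled on the tags the parser above targets.
SAMPLE = '''
<ul>
  <li>
    <h3 class="event-title"><a href="/events/1/">PyCon Example 2020</a></h3>
    <p>
      <time datetime="2020-06-05T00:00:00+00:00">05 June</time>
      <span class="event-location">Online</span>
    </p>
  </li>
</ul>
'''


class EventCollector(HTMLParser):
    """Same state-flag idea, but stores the text instead of printing it."""

    def __init__(self):
        super().__init__()
        self.capture = False  # True while the next text fragment is wanted
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) tuples -> dict
        if tag == 'h3' and attrs.get('class') == 'event-title':
            self.capture = True
        elif tag == 'time' and 'datetime' in attrs:
            self.capture = True
        elif tag == 'span' and attrs.get('class') == 'event-location':
            self.capture = True

    def handle_data(self, data):
        text = data.strip()
        if self.capture and text:
            self.items.append(text)
            self.capture = False  # reset until the next target tag


collector = EventCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.items)
```

Note that the flag survives the nested `<a>` start tag inside `<h3>`, so the first non-empty text after a target tag is captured even when it sits one level deeper.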