Getting the Full Year's Statutory Holiday Dates with Python

Original

Test_Box

2020-05-20 14:07

Parsing the calendar endpoint

Target URL: https://wannianrili.51240.com/

1. Simulate the request and capture the traffic
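
As a rough sketch of what the captured request looks like, the snippet below builds the GET URL for a single month. The `ajax/` path and the `q` parameter are taken from the complete code later in this post; the helper name `build_month_url` is hypothetical and introduced only for illustration. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Endpoint taken from the complete code in this post
AJAX_URL = 'https://wannianrili.51240.com/ajax/'


def build_month_url(year, month):
    """Return the GET URL for one month (hypothetical helper)."""
    # The endpoint expects q in the form YYYY-MM, e.g. 2020-05
    query = urlencode({'q': '%s-%02d' % (year, month)})
    return AJAX_URL + '?' + query


print(build_month_url('2020', 5))
```

Passing the same `q` value through `params=` in `requests`, as the complete code does, should produce an equivalent URL.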

2. Analyze the structure of the page source
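
To make the structure concrete, here is a minimal sketch that walks a hand-written HTML fragment. The fragment is an assumption reconstructed from the XPath expressions used in this post (a `div` with class `wnrl_riqi` wrapping an `a` whose class may be `wnrl_riqi_xiu` or `wnrl_riqi_ban`), and its cell texts are invented placeholders, not real site output. The standard library `html.parser` stands in for `lxml` so the sketch has no third-party dependency; `DayCellParser` is a hypothetical name.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the real page may carry more attributes and nesting
SAMPLE = (
    '<div class="wnrl_riqi"><a class="wnrl_riqi_xiu">'
    '<span>01</span><span>Labour Day</span></a></div>'
    '<div class="wnrl_riqi"><a><span>02</span><span>Saturday</span></a></div>'
    '<div class="wnrl_riqi"><a class="wnrl_riqi_ban">'
    '<span>09</span><span>Saturday</span></a></div>'
)


class DayCellParser(HTMLParser):
    """Collect the <a> class and the text nodes of every day cell."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one dict per day cell
        self._current = None  # cell being filled, or None outside a cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div' and attrs.get('class') == 'wnrl_riqi':
            self._current = {'class': None, 'texts': []}
        elif tag == 'a' and self._current is not None:
            self._current['class'] = attrs.get('class')

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current['texts'].append(data.strip())

    def handle_endtag(self, tag):
        if tag == 'div' and self._current is not None:
            self.cells.append(self._current)
            self._current = None


parser = DayCellParser()
parser.feed(SAMPLE)
parser.close()
for cell in parser.cells:
    print(cell['class'], cell['texts'])
```

Only the first and third cells carry one of the two marker classes, so only those would be collected.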

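3. Code logic

If the label in the top-left corner of a date cell reads 休 (day off) or 班 (make-up workday), that date is one to collect;
iterate over the date cells and check whether the class attribute is wnrl_riqi_xiu (休, day off) or wnrl_riqi_ban (班, make-up workday);
read the text under the span tags, which gives the day of the month and the holiday name.
Key code: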

response = s.get(url, headers=headers, params=payload)
element = etree.HTML(response.text)
html = element.xpath('//div[@class="wnrl_riqi"]')
print('In Working:', year_month)
for _element in html:
    # Read the attributes of the <a> node inside the day cell
    item = _element.xpath('./a')[0].attrib
    if item.get('class') == 'wnrl_riqi_xiu':
        tag = '休假'  # day off
    elif item.get('class') == 'wnrl_riqi_ban':
        tag = '補班'  # make-up workday
    else:
        # Ordinary day or unrelated class: skip it, so tag is never stale or undefined
        continue
    # Text nodes of the cell: day of the month first, holiday name second
    _span = _element.xpath('.//text()')
    result.append({'Date': year_month + '-' + _span[0], 'Holiday': _span[1], 'Tag': tag})
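
The classification step can also be isolated from the network and XPath code. The following is a minimal sketch, not the author's implementation: `TAG_BY_CLASS` and `to_record` are hypothetical names, and the sample texts are invented placeholders. It maps the class value to a tag with a dictionary and returns `None` for ordinary days instead of relying on an `if`/`elif` chain.

```python
# Class names come from the post; the tag strings match the ones the post writes out
TAG_BY_CLASS = {
    'wnrl_riqi_xiu': '休假',  # day off
    'wnrl_riqi_ban': '補班',  # make-up workday
}


def to_record(year_month, css_class, texts):
    """Return a result row, or None when the cell is not a marked day."""
    tag = TAG_BY_CLASS.get(css_class)
    # Skip unmarked cells and cells that lack a day number or a name
    if tag is None or len(texts) < 2:
        return None
    return {'Date': year_month + '-' + texts[0], 'Holiday': texts[1], 'Tag': tag}


print(to_record('2020-05', 'wnrl_riqi_xiu', ['01', 'Labour Day']))
print(to_record('2020-05', None, ['02', 'Saturday']))
```

Keeping this step as a pure function makes it easy to check without requesting the site.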

4. Complete code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author:      Joson
# @DateTime:    2020-05-19 10:25
# @Description: https://wannianrili.51240.com/
# @Version:     1.0

import csv
import requests
from lxml import etree

class WanNianRiLi(object):
    """Scrape holiday data from the wannianrili (perpetual calendar) endpoint.
    Params: year, the year as a four-digit string
    """
    def __init__(self, year):
        self.year = year
        data = self.parseHTML()
        self.exportCSV(data)

    def parseHTML(self):
        """Parse the page."""
        url = 'https://wannianrili.51240.com/ajax/'
        s = requests.session()
        headers = {
            'Host': 'wannianrili.51240.com',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
            'Accept': '*/*',
            'Sec-Fetch-Site': 'same-origin',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Dest': 'empty',
            'Referer': 'https://wannianrili.51240.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
        }
        result = []
        # Build the list of months, e.g. '2020-01' through '2020-12'
        dateList = [self.year + '-' + '%02d' % i for i in range(1, 13)]
        for year_month in dateList:
            # Reuse the session and url created above for every month
            payload = {'q': year_month}
            response = s.get(url, headers=headers, params=payload)
            element = etree.HTML(response.text)
            html = element.xpath('//div[@class="wnrl_riqi"]')
            print('In Working:', year_month)
            for _element in html:
                # Read the attributes of the <a> node inside the day cell
                item = _element.xpath('./a')[0].attrib
                if item.get('class') == 'wnrl_riqi_xiu':
                    tag = '休假'  # day off
                elif item.get('class') == 'wnrl_riqi_ban':
                    tag = '補班'  # make-up workday
                else:
                    # Ordinary day or unrelated class: skip it, so tag is never stale or undefined
                    continue
                # Text nodes of the cell: day of the month first, holiday name second
                _span = _element.xpath('.//text()')
                result.append({'Date': year_month + '-' + _span[0], 'Holiday': _span[1], 'Tag': tag})
        print(result)
        return result

    def exportCSV(self, data):
        """Export to CSV."""
        headers = ['Date', 'Holiday', 'Tag']
        # utf-8-sig writes a BOM so spreadsheet tools detect UTF-8 and the Chinese text is not garbled
        with open(self.year + 'Holiday.csv', 'w', newline='', encoding='utf-8-sig') as f:
            f_csv = csv.DictWriter(f, headers)
            f_csv.writeheader()
            f_csv.writerows(data)
            
if __name__ == '__main__':
    rili = WanNianRiLi('2020')
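
Once the CSV exists, one possible use is a lookup that answers whether a given date is a day off. This is a sketch under assumptions: the rows in `SAMPLE_CSV` are hand-written placeholders in the same three-column layout, `load_tags` and `is_day_off` are hypothetical helpers, and treating unmarked Saturdays and Sundays as days off is an assumption of this sketch rather than something the post states.

```python
import csv
import io
from datetime import date

# Placeholder rows in the Date,Holiday,Tag layout written by exportCSV
SAMPLE_CSV = (
    'Date,Holiday,Tag\n'
    '2020-05-01,Labour Day,休假\n'
    '2020-05-09,Saturday,補班\n'
)


def load_tags(fileobj):
    """Map each date in the CSV to its tag."""
    tags = {}
    for row in csv.DictReader(fileobj):
        # int() accepts both '01' and '1', so zero padding does not matter
        year, month, day = (int(part) for part in row['Date'].split('-'))
        tags[date(year, month, day)] = row['Tag']
    return tags


def is_day_off(day, tags):
    """Marked days win; unmarked days fall back to the plain weekend rule."""
    tag = tags.get(day)
    if tag == '休假':
        return True
    if tag == '補班':
        return False
    return day.weekday() >= 5  # Saturday or Sunday


tags = load_tags(io.StringIO(SAMPLE_CSV))
print(is_day_off(date(2020, 5, 1), tags))   # Friday marked as a day off
print(is_day_off(date(2020, 5, 9), tags))   # Saturday marked as a make-up workday
print(is_day_off(date(2020, 5, 10), tags))  # unmarked Sunday
```

To read the real file instead of the sample, open it with `encoding='utf-8-sig'` and `newline=''` and pass the file object to `load_tags`.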
