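1. Background
Recently I have been writing up solutions to problems I solved. Copying CodeForces problem statements by hand was tedious, mostly because of the math formulas, so I used Python's urllib.request and the BeautifulSoup4 library to scrape the problem information. This saved a lot of time when writing the solutions.
Since others may run into the same problem, I'm sharing my notes here.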
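2. Prerequisites
urllib ships with Python's standard library, so it does not need to be installed with pip. Install BeautifulSoup4, plus lxml, the parser that the code below passes to BeautifulSoup:
```shell
pip3 install beautifulsoup4 lxml
```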
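The scraping code passes 'lxml' to BeautifulSoup, which only works if the lxml package is installed. A minimal sketch, assuming you would rather fall back to Python's built-in 'html.parser' than crash; `pickParser` is a hypothetical name, not part of the original code:

```python
import importlib.util

def pickParser():
    """Return "lxml" when the lxml package is installed, else Python's built-in parser."""
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    # html.parser ships with Python, so BeautifulSoup can always fall back to it
    return "html.parser"

# urllib is a standard-library package, so its spec is always found
print(importlib.util.find_spec("urllib.request") is not None)  # True
print(pickParser())
```

With this helper, the parsing line would become `soup = BeautifulSoup(html, pickParser())`.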
3. Fetching the Page Content
Take CodeForces 1353 B. Two Arrays And Swaps as an example.
```python
# Import libraries
import urllib.request
import bs4
from bs4 import BeautifulSoup

# Problem attributes
problemSet = "1353"
problemId = "B"
# Problem URL
url = f"https://codeforces.com/problemset/problem/{problemSet}/{problemId}"
# Fetch the page content
html = urllib.request.urlopen(url).read()
# Parse the HTML (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(html, 'lxml')
# Dictionary that stores the extracted fields
data_dict = {}
# Locate the main problem statement
mainContent = soup.find_all(name="div", attrs={"class": "problem-statement"})[0]
```
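`urlopen(url)` sends Python's default `Python-urllib/3.x` User-Agent and, without a timeout, can wait indefinitely. If a request is rejected or stalls (this depends on the server and is an assumption, not something the original code ran into), a `urllib.request.Request` with an explicit header plus a `timeout` is a small change. `buildRequest` and the header value are hypothetical:

```python
import urllib.request

def buildRequest(problemSet, problemId):
    """Build a Request for a CodeForces problem page with an explicit User-Agent."""
    url = f"https://codeforces.com/problemset/problem/{problemSet}/{problemId}"
    # Example value only (an assumption, not something CodeForces documents)
    headers = {"User-Agent": "Mozilla/5.0"}
    return urllib.request.Request(url, headers=headers)

request = buildRequest("1353", "B")
print(request.full_url)  # https://codeforces.com/problemset/problem/1353/B
# urllib stores header names with str.capitalize(), hence "User-agent"
print(request.get_header("User-agent"))  # Mozilla/5.0

# Fetching needs network access; the timeout (in seconds) stops a stalled request
# html = urllib.request.urlopen(request, timeout=10).read()
```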
4. Content Processing
4.1. Title and Limits
Start with the simpler fields: the problem title, the time limit, and the memory limit.
```python
# Title, time limit, and memory limit
# Title
data_dict['Title'] = f"CodeForces {problemSet} " + mainContent.find_all(name="div", attrs={"class": "title"})[0].contents[-1]
# Time Limit
data_dict['Time Limit'] = mainContent.find_all(name="div", attrs={"class": "time-limit"})[0].contents[-1]
# Memory Limit
data_dict['Memory Limit'] = mainContent.find_all(name="div", attrs={"class": "memory-limit"})[0].contents[-1]
```
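Why `.contents[-1]` rather than `get_text()`: each limit `<div>` nests a label `<div>` before the value. The inline HTML below is a simplified, hand-written imitation of the statement header (an assumption about the page, trimmed for brevity):

```python
from bs4 import BeautifulSoup

# Simplified, hand-written imitation of the statement header (assumption)
html = (
    '<div class="header">'
    '<div class="title">B. Two Arrays And Swaps</div>'
    '<div class="time-limit"><div class="property-title">time limit per test</div>1 second</div>'
    '<div class="memory-limit"><div class="property-title">memory limit per test</div>256 megabytes</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
timeDiv = soup.find("div", class_="time-limit")

# get_text() also picks up the nested label
print(timeDiv.get_text())     # time limit per test1 second
# contents[-1] is just the trailing text node after the label <div>
print(timeDiv.contents[-1])   # 1 second
print(soup.find("div", class_="memory-limit").contents[-1])  # 256 megabytes
```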
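Next, define a function that processes the paragraphs of the main content: it replaces the odd spaces with ordinary ones and converts the triple dollar signs ($$$) that CodeForces puts around formulas into the single $ used by Markdown.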
```python
def divTextProcess(div):
    """
    Convert the text of the <p> tags inside a <div> to Markdown.
    """
    paragraphs = []
    # Process each <p> tag in turn
    for each in div.find_all("p"):
        strBuffer = ''
        for content in each.contents:
            if isinstance(content, bs4.element.Tag):
                # HTML elements such as <span>: render their text in bold
                text = "**" + content.get_text() + "**"
            else:
                # Plain text node
                text = str(content)
            # Assumption: the odd spaces are non-breaking spaces (U+00A0)
            strBuffer += text.replace("\u00a0", " ").replace("$$$", "$")
        paragraphs.append(strBuffer)
    # Blank line between paragraphs only, not between fragments of one paragraph
    return '\n\n'.join(paragraphs)
```
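CodeForces writes inline formulas as `$$$...$$$` and, as far as I can tell, display formulas as `$$$$$$...$$$$$$`. A quick check that the single `replace("$$$", "$")` handles both; `toMarkdownMath` is a hypothetical helper mirroring the two replacements in `divTextProcess`, and the non-breaking space is an assumption about what the "odd spaces" are:

```python
def toMarkdownMath(text):
    """Hypothetical helper mirroring the two replacements in divTextProcess."""
    # Assumption: the odd spaces are non-breaking spaces (U+00A0)
    text = text.replace("\u00a0", " ")
    # str.replace scans left to right without overlaps, so a run of six
    # dollar signs becomes two: display math $$$$$$x$$$$$$ turns into $$x$$
    return text.replace("$$$", "$")

print(toMarkdownMath("swap $$$a_i$$$\u00a0and $$$b_j$$$"))  # swap $a_i$ and $b_j$
print(toMarkdownMath("$$$$$$\\sum a_i$$$$$$"))              # $$\sum a_i$$
```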
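4.2. Problem Description
Next comes the problem description. Its <div> has neither an id nor a class attribute, so it is selected by position instead: it is the element at index 10 of the list returned by find_all("div") (the 11th div, since indexing starts at 0).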
```python
# Problem description: the div at index 10 (the 11th div)
data_dict['Problem Description'] = divTextProcess(mainContent.find_all("div")[10])
```
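Index 10 depends on how many nested divs the header contains (one per limit and one per label), so it breaks if that layout changes. A sketch of a less position-dependent lookup, assuming the description is always the first `<div>` after `div.header`, as in the page used here; the inline HTML is a trimmed, hand-written imitation:

```python
from bs4 import BeautifulSoup

# Trimmed imitation of the statement layout (assumption): a header div,
# then a class-less div with the description, then the input specification
html = (
    '<div class="problem-statement">'
    '<div class="header"><div class="title">B. Two Arrays And Swaps</div></div>'
    '<div><p>You are given two arrays.</p></div>'
    '<div class="input-specification"><p>The first line...</p></div>'
    '</div>'
)
mainContent = BeautifulSoup(html, "html.parser").find("div", class_="problem-statement")
# Take the first sibling <div> after the header instead of a fixed index
legend = mainContent.find("div", class_="header").find_next_sibling("div")
print(legend.p.get_text())  # You are given two arrays.
```

In the real script, `legend` would then be passed to `divTextProcess` in place of `mainContent.find_all("div")[10]`.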
4.3. Input
The input specification:
```python
div = mainContent.find_all(name="div", attrs={"class": "input-specification"})[0]
data_dict['Input'] = divTextProcess(div)
```
4.4. Output
The output specification:
```python
div = mainContent.find_all(name="div", attrs={"class": "output-specification"})[0]
data_dict['Output'] = divTextProcess(div)
```
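Sections 4.3 and 4.4 differ only in the class name, and `find_all(...)[0]` raises `IndexError` if a page lacks the section. A sketch that loops over both and skips missing ones; the inline HTML is a trimmed imitation, and plain `get_text()` stands in for `divTextProcess` to keep the example self-contained:

```python
from bs4 import BeautifulSoup

# Trimmed imitation of a statement with an input section but no output section
html = (
    '<div class="problem-statement">'
    '<div class="input-specification"><div class="section-title">Input</div>'
    '<p>The first line contains $$$t$$$.</p></div>'
    '</div>'
)
mainContent = BeautifulSoup(html, "html.parser").find("div", class_="problem-statement")

sections = {}
for key, cls in [("Input", "input-specification"), ("Output", "output-specification")]:
    # find() returns None for a missing section instead of raising IndexError
    div = mainContent.find("div", class_=cls)
    if div is None:
        continue
    # Plain get_text() stands in for divTextProcess to keep this self-contained
    sections[key] = "\n\n".join(p.get_text() for p in div.find_all("p"))

print(sections)  # {'Input': 'The first line contains $$$t$$$.'}
```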
4.5. Sample Input & Output
The sample input and output, each wrapped in a code fence:
```python
# Sample input
div = mainContent.find_all(name="div", attrs={"class": "input"})[0]
data_dict['Sample Input'] = "```cpp\n" + div.find_all("pre")[0].get_text().strip("\n") + "\n```"
# Sample output
div = mainContent.find_all(name="div", attrs={"class": "output"})[0]
data_dict['Sample Output'] = "```cpp\n" + div.find_all("pre")[0].get_text().strip("\n") + "\n```"
```
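Reading the `<pre>` as a single text node only works when it holds plain text. On current pages, as far as I can tell, each sample line is wrapped in its own `<div class="test-example-line ...">`, so the lines run together or `contents[0]` is a Tag; older markup may also split lines with `<br/>`. A sketch covering both layouts, with the inline HTML as hand-written assumptions; `preToText` is a hypothetical name:

```python
from bs4 import BeautifulSoup

def preToText(pre):
    """Return the text of a sample <pre> with one line per test line."""
    # Layout 1 (assumption): every line wrapped in its own <div>
    lines = pre.find_all("div")
    if lines:
        return "\n".join(line.get_text() for line in lines)
    # Layout 2: plain text, with lines split by newlines or <br/> tags
    for br in pre.find_all("br"):
        br.replace_with("\n")
    return pre.get_text().strip("\n")

divStyle = BeautifulSoup(
    '<pre><div class="test-example-line">2</div>'
    '<div class="test-example-line">1 2</div></pre>',
    "html.parser").pre
brStyle = BeautifulSoup("<pre>2<br/>1 2<br/></pre>", "html.parser").pre

print(preToText(divStyle) == preToText(brStyle))  # True
print("```cpp\n" + preToText(divStyle) + "\n```")
```

In the script, the sample line would become `"```cpp\n" + preToText(div.find("pre")) + "\n```"`.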
4.6. Note
The note explaining the samples, which only some problems have:
```python
# Only add a Note if the page has one
note = mainContent.find(name="div", attrs={"class": "note"})
if note is not None:
    data_dict['Note'] = divTextProcess(note)
```
4.7. Source
The problem link, formatted as a Markdown link:
```python
data_dict['Source'] = '[' + data_dict['Title'] + '](' + url + ')'
```
5. Output
```python
for key, value in data_dict.items():
    print('### ' + key + '\n')
    # Merge bold fragments back into the surrounding sentence
    print(value.replace("\n\n**", "**").replace("**\n\n", "**") + '\n')
```
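Printing to the console still leaves a copy-and-paste step. A minimal sketch that builds the same text as a string and writes it to a file; `toMarkdown`, the two-field dictionary, and the file name are hypothetical stand-ins:

```python
import os
import tempfile

def toMarkdown(dataDict):
    """Hypothetical helper: build the same text the print loop writes to the console."""
    text = ""
    for key, value in dataDict.items():
        # Same clean-up as the print loop
        value = value.replace("\n\n**", "**").replace("**\n\n", "**")
        # print() adds a newline after each argument, so each field ends with a blank line
        text += "### " + key + "\n\n" + value + "\n\n"
    return text

# Small stand-in for data_dict, using values from the output shown here
sample = {"Time Limit": "1 second", "Memory Limit": "256 megabytes"}
markdown = toMarkdown(sample)

# Save it as a Markdown file instead of copying from the console (path is an example)
path = os.path.join(tempfile.gettempdir(), "1353B.md")
with open(path, "w", encoding="utf-8") as f:
    f.write(markdown)
```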
Below is the final output:
```markdown
### Title

CodeForces 1353 B. Two Arrays And Swaps

### Time Limit

1 second

### Memory Limit

256 megabytes

### Problem Description

You are given two arrays $a$ and $b$ both consisting of $n$ positive (greater than zero) integers. You are also given an integer $k$.

In one move, you can choose two indices $i$ and $j$ ($1 \le i, j \le n$) and swap $a_i$ and $b_j$ (i.e. $a_i$ becomes $b_j$ and vice versa). Note that $i$ and $j$ can be equal or different (in particular, swap $a_2$ with $b_2$ or swap $a_3$ and $b_9$ both are acceptable moves).

Your task is to find the **maximum** possible sum you can obtain in the array $a$ if you can do no more than (i.e. at most) $k$ such moves (swaps).

You have to answer $t$ independent test cases.

### Input

The first line of the input contains one integer $t$ ($1 \le t \le 200$) — the number of test cases. Then $t$ test cases follow.

The first line of the test case contains two integers $n$ and $k$ ($1 \le n \le 30; 0 \le k \le n$) — the number of elements in $a$ and $b$ and the maximum number of moves you can do. The second line of the test case contains $n$ integers $a_1, a_2, \dots, a_n$ ($1 \le a_i \le 30$), where $a_i$ is the $i$-th element of $a$. The third line of the test case contains $n$ integers $b_1, b_2, \dots, b_n$ ($1 \le b_i \le 30$), where $b_i$ is the $i$-th element of $b$.

### Output

For each test case, print the answer — the **maximum** possible sum you can obtain in the array $a$ if you can do no more than (i.e. at most) $k$ swaps.

### Sample Input

// The generated text wraps the samples in a ```cpp code fence; it is omitted here for display
5
2 1
1 2
3 4
5 5
5 5 6 6 5
1 2 5 4 3
5 3
1 2 3 4 5
10 9 10 10 9
4 0
2 2 4 3
2 4 2 3
4 4
1 2 2 1
4 4 5 4

### Sample Output

6
27
39
11
17

### Note

In the first test case of the example, you can swap $a_1 = 1$ and $b_2 = 4$, so $a=[4, 2]$ and $b=[3, 1]$.

In the second test case of the example, you don't need to swap anything.

In the third test case of the example, you can swap $a_1 = 1$ and $b_1 = 10$, $a_3 = 3$ and $b_3 = 10$ and $a_2 = 2$ and $b_4 = 10$, so $a=[10, 10, 10, 4, 5]$ and $b=[1, 9, 3, 2, 9]$.

In the fourth test case of the example, you cannot swap anything.

In the fifth test case of the example, you can swap arrays $a$ and $b$, so $a=[4, 4, 5, 4]$ and $b=[1, 2, 2, 1]$.

### Source

[CodeForces 1353 B. Two Arrays And Swaps](https://codeforces.com/problemset/problem/1353/B)
```
Contact email: [email protected]
GitHub: https://github.com/CurrenWong
Feel free to repost, Star, or Fork. If you have questions, you are welcome to reach me by email.