[Python] Practical Python Code Snippets: Scaffolding

Original

huanqing2010

2020-04-14 09:37

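There are plenty of examples online, but they tend to be too large, too small, or too convoluted: hard to follow and hard to extract for reuse.

Everything here comes from my own practice. Each snippet is kept atomic so that it can be pieced together later with little effort, which makes it more practical.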

Created: April 5, 2020.

Updated: April 5, 2020.

1 Directory and File Operations

1.1 Walking a nested directory tree with os.walk()

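'''
Note: os.walk() recurses on its own and visits every level of the tree.
On each iteration, root is the current directory, files lists all files in it,
and dirs lists all subdirectories in it.
'''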
import os
path = r'c:\code\py'

for root, dirs, files in os.walk(path):
	for name in files:
		print(os.path.join(root, name))

	for name in dirs:
		print(os.path.join(root, name))
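The loop above can be adapted to collect only certain files. The following is a minimal sketch, not part of the original snippet: the helper name find_files, the choice to skip hidden directories, and the temporary directory tree are all assumptions made for illustration.

```python
import os
import tempfile

def find_files(top, suffix):
    """Return sorted paths of files under `top` whose names end with `suffix`."""
    matches = []
    for root, dirs, files in os.walk(top):
        # Prune hidden directories in place so os.walk() does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)

# Build a small throwaway tree to demonstrate (hypothetical file names).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'pkg', 'sub'))
    os.makedirs(os.path.join(top, '.git'))
    for rel in ('a.py', 'pkg/b.py', 'pkg/sub/c.txt', '.git/d.py'):
        with open(os.path.join(top, *rel.split('/')), 'w', encoding='utf-8') as f:
            f.write('')
    found = [os.path.relpath(p, top) for p in find_files(top, '.py')]
    print(found)
```

Modifying dirs in place only affects the traversal because os.walk() walks top-down by default; with topdown=False the subdirectories have already been visited by the time the list is seen.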

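1.2 Replacing multi-line text with a regular expression: re.sub()

re.sub() does regex-based replacement and is more powerful than str.replace(old, new), which only replaces literal substrings.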

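1. A detailed explanation of the re.sub() parameters, and the official English documentation for the function.

2. A concise yet comprehensive introduction to regular expressions.

3. The regular expression reference on the official Python site: Regular Expression Syntax.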

# Multi-line replacement with a regular expression via re.sub

import re

inputStr = '''
	<tag>
		<hello>
			<this is a string>
		</hello>
	</tag>
	'''

''' Replace the hello element and its children '''
# The more robust approach
pattern = re.compile(r'<hello>.*</hello>', re.S)
# A more brittle alternative: pattern = r'<hello>.*\n.*\n.*</hello>'
newTxt = r'<hello class="456"></hello>'
rst = re.sub(pattern, newTxt, inputStr)
print(rst)

'''
Output:
	<tag>
		<hello class="456"></hello>
	</tag>
'''

'''
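Warning:
If <hello> </hello> appears more than once further down the text,
this pattern matches from the first <hello> all the way to the last </hello>,
because .* is greedy.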
'''
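One common way around the greedy match described in the warning is the non-greedy quantifier .*?, which stops at the nearest </hello>. The sketch below uses a made-up two-element input and a made-up replacement string to show the difference; it is an illustration, not part of the original snippet.

```python
import re

# Hypothetical input with two separate hello elements.
text = '<hello>one</hello>\n<hello>two</hello>'

greedy = re.compile(r'<hello>.*</hello>', re.S)   # runs to the LAST </hello>
lazy = re.compile(r'<hello>.*?</hello>', re.S)    # stops at the NEAREST </hello>

greedy_result = greedy.sub('<hello/>', text)
lazy_result = lazy.sub('<hello/>', text)

print(greedy_result)  # one replacement: both elements were swallowed by a single match
print(lazy_result)    # two replacements: each element was matched separately
```

Neither form understands nesting; for real HTML or XML a proper parser is usually the safer choice.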

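A brief explanation of the regular expression:

By default . does not match a newline. To make it do so, use the re.S (DOTALL) flag:
pattern = re.compile(r'<hello>.*</hello>', re.S)

This approach is more robust: the pattern matches no matter how many lines the hello element contains.

By contrast, pattern = r'<hello>.*\n.*\n.*</hello>' is rigid: it requires exactly two newlines between <hello> and </hello>, so it stops matching as soon as the content inside the hello element spans more than one line.

In .* the . stands for any character on the current line and the * for any number of repetitions (including zero); \n stands for a newline.
The pattern <hello>.*\n.*\n.*</hello> therefore looks for text that satisfies the following:
<hello> + any characters up to the end of that line + a newline + any characters on the second line + the second line's newline + any characters at the start of the third line up to </hello>.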
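The behaviour described above can be checked directly with re.search(). This is a small sketch with made-up input strings; the variable names are only for illustration.

```python
import re

three_lines = '<hello>\n  body\n</hello>'
four_lines = '<hello>\n a\n b\n</hello>'

# Without re.S, "." stops at the first newline, so the pattern cannot reach </hello>.
without_dotall = re.search(r'<hello>.*</hello>', three_lines)

# With re.S, "." also matches newlines, so any number of lines is fine.
with_dotall_3 = re.search(r'<hello>.*</hello>', three_lines, re.S)
with_dotall_4 = re.search(r'<hello>.*</hello>', four_lines, re.S)

# The fixed-shape pattern matches only when exactly two newlines separate the tags.
fixed_3 = re.search(r'<hello>.*\n.*\n.*</hello>', three_lines)
fixed_4 = re.search(r'<hello>.*\n.*\n.*</hello>', four_lines)

print(without_dotall, bool(with_dotall_3), bool(with_dotall_4), bool(fixed_3), fixed_4)
```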

import re

inputStr = '''
	<tag>
		<hello>
			<this is a string>
		</hello>
	</tag>
	'''

'''Replace only the word string inside <this is a string> with newstr'''

pattern = re.compile(r'(<hello>.*<this is a )string(>.*</hello>)', re.S)
newTxt = r'\g<1>newstr\g<2>'
rst = re.sub(pattern, newTxt, inputStr)
print(rst)


'''
Output:

	<tag>
		<hello>
			<this is a newstr>
		</hello>
	</tag>
	
'''

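A brief explanation of the regular expression:
Parentheses in pattern are capturing groups. Groups are numbered automatically starting from 1, and the replacement string (the second argument, newTxt) can refer to them by number in order to keep parts of the matched text.
In newTxt, \g<1> inserts the text captured by group 1 (the first pair of parentheses in pattern) and \g<2> inserts the text captured by group 2 (the second pair of parentheses in pattern).

An easy way to think about it: first write the regular expression that matches the whole span, then wrap the parts you want to keep in parentheses.

re.sub() takes five parameters. Three are required: pattern, repl, string. Two are optional: count and flags. See the manual for details.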
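One reason to prefer the \g<1> form over the shorter \1 is that it stays unambiguous when the group reference is followed by a digit. The sketch below uses a made-up pattern and input to illustrate this, together with a named group as an alternative; none of these names come from the original snippet.

```python
import re

# Goal: keep the captured prefix and put a literal "0" right after it.
pattern = re.compile(r'(id=)\d+')

# \g<1>0 means: group 1, then the character "0".
safe = pattern.sub(r'\g<1>0', 'id=42')

# \10 would be read as a reference to group 10, which this pattern does not have.
try:
    pattern.sub(r'\10', 'id=42')
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = exc

# A named group avoids counting parentheses altogether.
named = re.sub(r'(?P<prefix>id=)\d+', r'\g<prefix>0', 'id=42')

print(safe, named, ambiguous_error)
```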
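As a rough illustration of the two optional parameters, the sketch below passes count and flags by keyword on a made-up input string. It also shows that repl may be a function instead of a string, which the standard library supports.

```python
import re

text = 'Cat cat CAT'

# flags=re.I makes the match case-insensitive; every occurrence is replaced.
all_replaced = re.sub(r'cat', 'dog', text, flags=re.I)

# count=1 limits the substitution to the first match.
first_only = re.sub(r'cat', 'dog', text, count=1, flags=re.I)

# Without flags the match is case-sensitive.
case_sensitive = re.sub(r'cat', 'dog', text)

# repl can be a function that receives the match object and returns the replacement.
upper = re.sub(r'cat', lambda m: m.group(0).upper(), text)

print(all_replaced, first_only, case_sensitive, upper, sep=' | ')
```

Passing count and flags by keyword avoids accidentally handing a flag to the count parameter, which sits before flags in the signature.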

1.3 Reading a Chinese text file

Prefer the with statement: it closes the file for you, so there is no manual close() to forget, which is safer.

f = open(filename, 'r', encoding='utf-8')
cnt = f.read()
f.close()

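# Note: encoding must be passed as a keyword argument (encoding=...). The third
# positional parameter is buffering, so open(filename, 'r', 'utf-8') raises a TypeError.
# Function signature: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)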

#-----------------------------------------------------------

'''
When reading a file, prefer the with keyword: the file is closed automatically
and no manual close() is needed.
'''
with open(filename, 'r', encoding='utf-8') as f:
    cnt = f.read()

Reference: the open() function documentation on Python.org.
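The snippets above assume that filename already exists. For a self-contained run, the sketch below first writes a small UTF-8 file into a temporary directory and then reads it back in two ways; the sample text and file name are made up for illustration.

```python
import os
import tempfile

sample = '你好，世界\n第二行\n'

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    with open(filename, 'w', encoding='utf-8') as f:
        f.write(sample)

    # Read the whole file at once.
    with open(filename, 'r', encoding='utf-8') as f:
        cnt = f.read()

    # Or iterate line by line, which avoids loading a large file into memory.
    with open(filename, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    # Decoding with the wrong codec fails loudly instead of returning garbage.
    try:
        with open(filename, 'r', encoding='ascii') as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

print(lines, decode_failed)
```

If encoding is omitted, open() falls back to a platform-dependent default, which is why stating it explicitly matters for Chinese text.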

1.4 Replacing (part of) a file's contents in Python


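f = open(filename, 'r+', encoding='utf-8')

cnt = f.read()

replaceTxt = cnt.replace(.....)

f.seek(0)        # move the file position back to the start of the file
f.truncate()     # cut the file off at the current position, i.e. empty it

f.write(replaceTxt)
f.close()        # flush the new contents and release the file

# Note: seek(0) is required; without it the result is not what you expect.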

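Without seek(0), the file position is still at the end of the file after read(), so truncate() removes nothing and write() appends the new text after the old contents. See the referenced article for details.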
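Put together, the read, seek, truncate, write sequence might look like the sketch below. The helper name replace_in_file, the sample text, and the replaced words are assumptions for illustration; the original snippet leaves the arguments to replace() unspecified.

```python
import os
import tempfile

def replace_in_file(filename, old, new):
    """Replace every occurrence of `old` with `new` in a UTF-8 text file."""
    with open(filename, 'r+', encoding='utf-8') as f:
        cnt = f.read()      # after read(), the position is at the end of the file
        f.seek(0)           # go back to the start before truncating
        f.truncate()        # the file is now empty
        f.write(cnt.replace(old, new))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello world, hello python')

    replace_in_file(path, 'hello', 'hi')

    with open(path, 'r', encoding='utf-8') as f:
        result = f.read()

print(result)
```

This rewrites the file in place, so an interruption between truncate() and write() can lose data; writing to a temporary file and renaming it afterwards is a common, more cautious alternative.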
