Reading and writing Word documents in Python with python-docx

The python-docx library can be used to create and edit Microsoft Word (.docx) files. Official documentation: https://python-docx.readthedocs.io/en/latest/index.html


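Note: .doc is Microsoft's proprietary binary format. .docx is the format used by Microsoft Office 2007 and later; it is a compressed file format based on the Office Open XML standard and usually takes less space than the equivalent .doc file. A .docx file is essentially a ZIP archive, so you can rename it to .zip and unpack it: word/document.xml holds most of the document's content, and images are stored under word/media. python-docx does not support .doc files; a workaround is to convert the .doc to .docx in code first and then process the result.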
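Because a .docx file is a ZIP archive, its contents can be inspected with the standard library alone. The following is a minimal sketch under that assumption; the helper names `list_parts` and `paragraph_texts` are hypothetical and not part of python-docx, and the text extraction ignores tabs, line breaks and other markup that python-docx handles properly.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx_path):
    """Return the names of all files stored inside the .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


def paragraph_texts(docx_path):
    """Return the text of every <w:p> element in word/document.xml.

    This walks the whole XML tree, so paragraphs inside tables are included.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread over one or more <w:t> elements
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts
```

Both functions accept a file path or a binary file-like object, since `zipfile.ZipFile` accepts either.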

1. Installing the package

`pip3 install python-docx`
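
Note that the distribution is installed as python-docx but imported as `docx`. A small sketch for checking whether the package is present, using only the standard library (Python 3.8 or later); the function name `installed_version` is a hypothetical helper:

```python
from importlib import metadata


def installed_version(dist_name="python-docx"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if python-docx is installed, otherwise None
print(installed_version())
```
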
2. Creating a Word document

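The script below is based on the example in the official documentation, with a few small changes and with comments explaining each call.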
from docx import Document
from docx.shared import Inches

document = Document()

# Add a heading and set its level; the range is 0 to 9 and the default is 1
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tabs (\t), newlines (\n) or carriage returns (\r)
p = document.add_paragraph('A plain paragraph having some ')
# Append runs of text to the paragraph; each run can have its own formatting
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add a bulleted list (each item is preceded by a small dot)
document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add a numbered list (each item is preceded by a number)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture (the image file must exist in the working directory)
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam'),
)

# Add a table with one row and three columns.
# Some of the available table styles:
#   Normal Table
#   Table Grid
#   Light Shading, Light Shading Accent 1 to Light Shading Accent 6
#   Light List, Light List Accent 1 to Light List Accent 6
#   Light Grid, Light Grid Accent 1 to Light Grid Accent 6
#   (there are many more; the rest are omitted here)
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')

# Get the list of cells in the first row
hdr_cells = table.rows[0].cells
# Set the text of the three header cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# item_id is used instead of id to avoid shadowing the built-in id()
for qty, item_id, desc in records:
    # Append a row to the table and get its list of cells
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc

document.add_page_break()

# Save the .docx document
document.save('demo.docx')

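Running the script creates demo.docx in the current working directory. It contains the title, the formatted paragraph, the level 1 heading, the quote, both lists, the picture, the three-column table and a trailing page break.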
3. Reading a Word document

from docx import Document

doc = Document('demo.docx')

# Text of every paragraph
for para in doc.paragraphs:
    print(para.text)

# Index and text of every paragraph
for i, para in enumerate(doc.paragraphs):
    print(i, para.text)

# Tables
for tb in doc.tables:
    # Rows
    for row in tb.rows:
        # Cells in the row
        for cell in row.cells:
            print(cell.text)
            # Alternatively, concatenate the text of the cell's paragraphs:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)
Output (the whitespace-only paragraphs that hold the picture and the page break are omitted):

Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list
0 Document Title
1 A plain paragraph having some bold and some italic.
2 Heading, level 1
3 Intense quote
4 first item in unordered list
5 second item in unordered list
6 first item in ordered list
7 second item in ordered list
Qty
Id
Desc
3
101
Spam
7
422
Eggs
4
631
Spam, spam, eggs, and spam
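
Printing cell by cell loses the row structure. When the first row of a table is a header, as in demo.docx, the rows can be turned into dictionaries instead. This is a minimal sketch; `table_to_records` is a hypothetical helper that only relies on the `rows`, `cells` and `text` attributes used in the reading script above, and it assumes the header texts are unique.

```python
def table_to_records(table):
    """Convert a table whose first row is a header into a list of dicts.

    `table` is expected to behave like a python-docx table: it has `.rows`,
    each row has `.cells`, and each cell has `.text`.
    """
    rows = [[cell.text.strip() for cell in row.cells] for row in table.rows]
    if not rows:
        return []
    header, *body = rows
    # Pair each header text with the value in the same column
    return [dict(zip(header, values)) for values in body]
```

For the demo.docx built earlier, `table_to_records(Document('demo.docx').tables[0])` should return three dictionaries, the first being `{'Qty': '3', 'Id': '101', 'Desc': 'Spam'}`.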
