Python讀取PDF的兩種方式

 首先要安裝庫:

pip install pdfminer3

代碼很簡單: 

from urllib.request import urlopen
from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO


def readPDF(pdfFile):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    process_pdf(rsrcmgr, device, pdfFile)
    device.close()
    content = retstr.getvalue()
    retstr.close()
    return content


pdfFile = open("Python編程:從入門到實踐.pdf", 'rb')
# pdfFile = urlopen("http://pythonscraping.com/pages/warandpeace/chapter1.pdf")
outputString = readPDF(pdfFile)
print(outputString)
pdfFile.close()

如果要通過url獲取,只需要把:

# pdfFile = urlopen("http://pythonscraping.com/pages/warandpeace/chapter1.pdf")

這行代碼的註釋去除即可……

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章