最近老是要做比对Report的工作,有个想法写个Python小程序来比对两个文件。
以前只用过xlrd库,处理的是xls文件,做一些简单的数据处理
这次在写小工具的同时也学习一下新的东西
由于报表是Office 2007类型(也就是Xlsx类型的)
上网查了下,决定采用openpyxl
到官网上按着流程装了一发
由于要拖BitBucket的源码,我还先装了SourceTree(别怕麻烦
据说SourceTree还有Git的功能,下次体验一下。
Openpyxl装好以后,直接在Python里面
from openpyxl import *
就可以直接使用了
Openxyxl还提供了一个很棒的教程(不过有点太简单了
而且这个库各种文档都不是特别详细…有什么都要跑去看它的源码是怎么做的...
最终代码如下
from openpyxl import *
def compareExcel(ename1, ename2):
print "------------------------------------"
print "Comparing",ename1,ename2
# A bool to verify if 2 xlsx is the same.
fileSame = True
# Load workbook, and get sheetname
wb1 = load_workbook(filename = ename1)
wb2 = load_workbook(filename = ename2)
sn1 = wb1.get_sheet_names()
sn2 = wb2.get_sheet_names()
if (sn1 != sn2):
print "Two file has different sheets."
print ename1,"has sheet names:", sn1
print ename2,"has sheet names:", sn2
else:
sn = sn1
for wsn in sn:
# Get worksheet
ws1 = wb1.get_sheet_by_name(name = wsn)
ws2 = wb2.get_sheet_by_name(name = wsn)
c = ws1.get_highest_column()
r = ws1.get_highest_row()
# This can be replace by ws1.get_dimension
if ((ws2.get_highest_column()!= c) or (ws2.get_highest_row() != r)):
print "DIFFERDENT at SHEET-",wsn,": Rows or columns not the same!"
fileSame = False
else:
#Compare every cell.
flag = True
for i in range(1,r+1):
for j in range(1,c+1):
c1 = ws1.cell(None,i,j)
c2 = ws2.cell(None,i,j)
if (c1):
if (c2):
if (c1.value != c2.value):
if ((wsn == "Internal Info") and ((i == 4) and (j == 2)) or ((i == 5) and (j == 3))):
continue
print "DIFFERDENT_VALUE at SHEET-",wsn,": At (",i,",",j,")",
print "diff FROM",c1.value,"TO",c2.value
flag = False
else:
print "DIFFERDENT_TO_NONE at SHEET-",wsn,": At (",i,",",j,")"
print "diff FROM",c1.value
flag = False
else:
if (c2):
print "DIFFERDENT_TO_NONE at SHEET-",wsn,": At (",i,",",j,")"
print "diff FROM",c2.value
flag = False
fileSame = fileSame and flag
if fileSame:
print "SAME_FILE:", ename1, ename2
print "------------------------------------"
print