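1. Overview of the compilation process
Python is an interpreted language, but the interpreter does not execute source text directly. It first compiles the source file into bytecode, which can be cached on disk as a .pyc file, and then hands that bytecode to the Python virtual machine for execution. You will not always find a .pyc file after running a program, though. By default only imported modules are cached as .pyc files so that they can be reused on the next import; the main script that you run is not. To get a .pyc file for a script, compile it explicitly by running a compile module with the -m option, for example python -m py_compile printname.py.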
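One way to produce a .pyc explicitly is the standard-library py_compile module. The sketch below is only an illustration: the file contents and the temporary directory are made up for the example, and the output path is passed explicitly so the result does not land in __pycache__.

```python
import os
import py_compile
import tempfile

# Hypothetical throwaway script, created only for this illustration.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "printname.py")
with open(src_path, "w") as f:
    f.write("print('hello')\n")

# Compile to an explicit output path instead of the default __pycache__ location.
pyc_path = os.path.join(workdir, "printname.pyc")
py_compile.compile(src_path, cfile=pyc_path, doraise=True)

print(os.path.exists(pyc_path), os.path.getsize(pyc_path))
```

With doraise=True a syntax error in the source raises py_compile.PyCompileError instead of only printing a message.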
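2. Reversing Python
We will not go into the .pyc file format in detail here. Because the bytecode in a .pyc file is stored in serialized (marshalled) form behind a short header, we use library modules to reverse it. The code is as follows: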
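import dis
import marshal

# Skip the .pyc header: 8 bytes on Python 2, 12 bytes on Python 3.3-3.6,
# 16 bytes on Python 3.7 and later. Adjust to match your interpreter.
HEADER_SIZE = 16

with open("printname.pyc", "rb") as f:
    b_data = f.read()

PyCodeObjectData = b_data[HEADER_SIZE:]
Pyobj = marshal.loads(PyCodeObjectData)
dis.dis(Pyobj)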
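The number of bytes to skip before the marshalled code object depends on the interpreter that wrote the file. The following sketch picks the header size from the Python version and checks the magic number before unmarshalling. The helper name pyc_header_size and the sample source are assumptions made for this example, and the .pyc has to be loaded by the same Python version that compiled it, because the marshal format is version-specific.

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile
import types


def pyc_header_size(version=sys.version_info):
    """Return the .pyc header length in bytes for the given Python version."""
    if version >= (3, 7):
        return 16  # magic, flags, then mtime + size or a source hash
    if version >= (3, 3):
        return 12  # magic, mtime, source size
    return 8       # magic, mtime


# Build a throwaway .pyc with the running interpreter (hypothetical sample source).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.py")
with open(src_path, "w") as f:
    f.write("def add(c, d):\n    return c + d\n")
pyc_path = py_compile.compile(
    src_path, cfile=os.path.join(workdir, "sample.pyc"), doraise=True
)

with open(pyc_path, "rb") as f:
    data = f.read()

# The first 4 bytes identify the bytecode version of the compiling interpreter.
if data[:4] != importlib.util.MAGIC_NUMBER:
    raise ValueError("this .pyc was written by a different Python version")

code_obj = marshal.loads(data[pyc_header_size():])
print(type(code_obj).__name__, code_obj.co_names)
```

If the magic number does not match, marshal.loads may fail or return garbage, so it is safer to stop early than to guess.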
dis is the standard-library module for disassembling Python bytecode. We can also use dis to turn source code into bytecode:
import dis
def a(c,d):
    return c+d
dis.dis(a)
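When the listing needs to be processed rather than just printed, dis.get_instructions yields one Instruction object per opcode. A minimal sketch using the same function a; the exact opcodes you see depend on the Python version, so no output is shown here.

```python
import dis


def a(c, d):
    return c + d


# Collect (offset, opname, argrepr) for each instruction instead of printing.
instructions = [
    (ins.offset, ins.opname, ins.argrepr) for ins in dis.get_instructions(a)
]
for offset, opname, argrepr in instructions:
    print(offset, opname, argrepr)
```

Having the instructions as data makes it easy to filter by opcode name or to group them by source line.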
3. Reading Python bytecode through a worked challenge
Let us look at the bytecode of the challenge first.
Disassembly of a:
3 0 LOAD_CONST 1 (0)
2 BUILD_LIST 1
4 LOAD_GLOBAL 0 (len)
6 LOAD_FAST 0 (s)
8 CALL_FUNCTION 1
10 BINARY_MULTIPLY
12 STORE_FAST 1 (o)
4 14 LOAD_GLOBAL 1 (enumerate)
16 LOAD_FAST 0 (s)
18 CALL_FUNCTION 1
20 GET_ITER
>> 22 FOR_ITER 24 (to 48)
24 UNPACK_SEQUENCE 2
26 STORE_FAST 2 (i)
28 STORE_FAST 3 (c)
5 30 LOAD_FAST 3 (c)
32 LOAD_CONST 2 (2)
34 BINARY_MULTIPLY
36 LOAD_CONST 3 (60)
38 BINARY_SUBTRACT
40 LOAD_FAST 1 (o)
42 LOAD_FAST 2 (i)
44 STORE_SUBSCR
46 JUMP_ABSOLUTE 22
6 >> 48 LOAD_FAST 1 (o)
50 RETURN_VALUE
Disassembly of b:
9 0 LOAD_GLOBAL 0 (zip)
2 LOAD_FAST 0 (s)
4 LOAD_FAST 1 (t)
6 CALL_FUNCTION 2
8 GET_ITER
>> 10 FOR_ITER 22 (to 34)
12 UNPACK_SEQUENCE 2
14 STORE_FAST 2 (x)
16 STORE_FAST 3 (y)
10 18 LOAD_FAST 2 (x)
20 LOAD_FAST 3 (y)
22 BINARY_ADD
24 LOAD_CONST 1 (50)
26 BINARY_SUBTRACT
28 YIELD_VALUE
30 POP_TOP
32 JUMP_ABSOLUTE 10
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
Disassembly of c:
13 0 LOAD_CONST 1 (<code object <listcomp> at 0x7ff31a16f0e0, file "vuln.py", line 13>)
2 LOAD_CONST 2 ('c.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (s)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7ff31a16f0e0, file "vuln.py", line 13>:
13 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 12 (to 18)
6 STORE_FAST 1 (c)
8 LOAD_FAST 1 (c)
10 LOAD_CONST 0 (5)
12 BINARY_ADD
14 LIST_APPEND 2
16 JUMP_ABSOLUTE 4
>> 18 RETURN_VALUE
Disassembly of e:
16 0 LOAD_CONST 1 (<code object <listcomp> at 0x7ff31a16f240, file "vuln.py", line 16>)
2 LOAD_CONST 2 ('e.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (s)
8 GET_ITER
10 CALL_FUNCTION 1
12 STORE_FAST 0 (s)
17 14 LOAD_CONST 3 (<code object <listcomp> at 0x7ff31a16f2f0, file "vuln.py", line 17>)
16 LOAD_CONST 2 ('e.<locals>.<listcomp>')
18 MAKE_FUNCTION 0
20 LOAD_GLOBAL 0 (b)
22 LOAD_GLOBAL 1 (a)
24 LOAD_FAST 0 (s)
26 CALL_FUNCTION 1
28 LOAD_GLOBAL 2 (c)
30 LOAD_FAST 0 (s)
32 CALL_FUNCTION 1
34 CALL_FUNCTION 2
36 GET_ITER
38 CALL_FUNCTION 1
40 STORE_FAST 1 (o)
18 42 LOAD_GLOBAL 3 (bytes)
44 LOAD_FAST 1 (o)
46 CALL_FUNCTION 1
48 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7ff31a16f240, file "vuln.py", line 16>:
16 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 12 (to 18)
6 STORE_FAST 1 (c)
8 LOAD_GLOBAL 0 (ord)
10 LOAD_FAST 1 (c)
12 CALL_FUNCTION 1
14 LIST_APPEND 2
16 JUMP_ABSOLUTE 4
>> 18 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7ff31a16f2f0, file "vuln.py", line 17>:
17 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (c)
8 LOAD_FAST 1 (c)
10 LOAD_CONST 0 (5)
12 BINARY_XOR
14 LOAD_CONST 1 (30)
16 BINARY_SUBTRACT
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
Disassembly of main:
21 0 LOAD_GLOBAL 0 (input)
2 LOAD_CONST 1 ('Guess?')
4 CALL_FUNCTION 1
6 STORE_FAST 0 (s)
22 8 LOAD_CONST 2 (b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed')
10 STORE_FAST 1 (o)
23 12 LOAD_GLOBAL 1 (e)
14 LOAD_FAST 0 (s)
16 CALL_FUNCTION 1
18 LOAD_FAST 1 (o)
20 COMPARE_OP 2 (==)
22 POP_JUMP_IF_FALSE 34
24 24 LOAD_GLOBAL 2 (print)
26 LOAD_CONST 3 ('Correct!')
28 CALL_FUNCTION 1
30 POP_TOP
32 JUMP_FORWARD 8 (to 42)
26 >> 34 LOAD_GLOBAL 2 (print)
36 LOAD_CONST 4 ('Wrong...')
38 CALL_FUNCTION 1
40 POP_TOP
>> 42 LOAD_CONST 0 (None)
44 RETURN_VALUE
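The main function is clearly at the end, so we work backwards and start with it. The number in the leftmost column is the source line number; all bytecode listed under the same number was compiled from one line of source. Within one source line, it is easiest to read the instructions from the bottom up.
Line 21: the last instruction, "STORE_FAST 0 (s)", stores the result in the local variable s. One line up, "CALL_FUNCTION 1" calls a function with one argument. Above that, "LOAD_CONST 1 ('Guess?')" shows that the argument is 'Guess?'. The first instruction, "LOAD_GLOBAL 0 (input)", shows that the function being called is input. Taken together, the statement is s=input('Guess?').
Line 22 assigns a byte string to o, that is, o=b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed'.
Line 23: the last instruction, "POP_JUMP_IF_FALSE 34", marks a conditional: execution falls through when the condition is true and jumps to 34 when it is false. The 34 here is not a source line number but the bytecode offset printed in front of each instruction. Reading upwards, we first meet the operator "==", then the variable o, which is the right-hand operand, and then a call with one argument, e(s), which is the left-hand operand. So the statement is if e(s)==o:.
Line 24: the last instruction is "JUMP_FORWARD 8 (to 42)", which skips to offset 42 once the if branch is done. The remaining instructions are print('Correct!').
Line 26: the final two instructions, "LOAD_CONST 0 (None)" and "RETURN_VALUE", mean that the function returns None, that is, it has no explicit return. The instructions above them are clearly else: print('Wrong...'). The decompiled main function is therefore: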
def main():
    s=input('Guess?')
    o=b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed'
    if e(s)==o:
        print('Correct!')
    else:
        print('Wrong...')
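Reading the instructions by hand is the same as running them on a stack: loads push a value, a call pops its arguments and the callable, and a store pops the result. The sketch below replays lines 21 and 23 of main symbolically. The rebuild helper and the tuple encoding of the instructions are hypothetical, written only for this illustration, and only the handful of opcodes needed here are handled.

```python
def rebuild(instructions):
    """Symbolically execute a straight-line instruction list into source lines."""
    stack, lines = [], []
    for opname, arg in instructions:
        if opname in ("LOAD_GLOBAL", "LOAD_FAST"):
            stack.append(arg)                      # push a name
        elif opname == "LOAD_CONST":
            stack.append(repr(arg))                # push a literal
        elif opname == "CALL_FUNCTION":
            # arg is the argument count; the callable sits beneath the arguments
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(f"{func}({', '.join(args)})")
        elif opname == "COMPARE_OP":
            right, left = stack.pop(), stack.pop()
            stack.append(f"{left} {arg} {right}")
        elif opname == "STORE_FAST":
            lines.append(f"{arg} = {stack.pop()}")
        elif opname == "POP_JUMP_IF_FALSE":
            lines.append(f"if {stack.pop()}:")     # jump target is ignored here
        else:
            raise ValueError(f"unhandled opcode: {opname}")
    return lines


line_21 = [("LOAD_GLOBAL", "input"), ("LOAD_CONST", "Guess?"),
           ("CALL_FUNCTION", 1), ("STORE_FAST", "s")]
line_23 = [("LOAD_GLOBAL", "e"), ("LOAD_FAST", "s"), ("CALL_FUNCTION", 1),
           ("LOAD_FAST", "o"), ("COMPARE_OP", "=="), ("POP_JUMP_IF_FALSE", 34)]

print(rebuild(line_21))  # ["s = input('Guess?')"]
print(rebuild(line_23))  # ['if e(s) == o:']
```

The operand that is loaded last ends up on the right-hand side, which is why o, loaded after the call to e(s), is the right operand of the comparison.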
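Moving up to function e, look at line 16 first. The result is stored in s, and it comes from calling a function built from a <listcomp> code object with the iterator over s as its argument. Looking at that code object, it loops over the elements and appends ord(c) for each element c. So this line is s=[ord(c) for c in s]. Line 17 is similar, except for what is iterated over: the function b is called with two arguments, and each argument is itself a call with the single argument s, namely a(s) and c(s). The body of this comprehension computes (c^5)-30, so the line is o=[(c^5)-30 for c in b(a(s),c(s))]. Line 18 is return bytes(o). The decompiled function e is therefore: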
def e(s):
    s=[ord(c) for c in s]
    o=[(c^5)-30 for c in b(a(s),c(s))]
    return bytes(o)
Moving up to function c, decompiling it is straightforward given the above. The result is:
def c(s):
    return [(c+5) for c in s]
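Next, function b. Start with line 9. "UNPACK_SEQUENCE 2", "STORE_FAST 2 (x)" and "STORE_FAST 3 (y)" show that each item is unpacked into (x,y). Above them, "FOR_ITER 22 (to 34)" at offset 10 marks a loop: when the iterator is exhausted, execution jumps 22 bytes forward from the next instruction to offset 34, so the loop spans offsets 10 to 34. Further up, "LOAD_GLOBAL 0 (zip) LOAD_FAST 0 (s) LOAD_FAST 1 (t) CALL_FUNCTION 2 GET_ITER" shows that the items come from zip(s,t). Line 9 therefore decompiles to "for (x,y) in zip(s,t):". Now line 10. The last two instructions, "LOAD_CONST 0 (None)" and "RETURN_VALUE", show that the function has no explicit return. Moving up into the loop body, "YIELD_VALUE" shows that the value is handed back with yield, and above it is the expression x+y-50. The reversed function b is therefore: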
def b(s,t):
    for (x,y) in zip(s,t):
        yield x+y-50
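Because of the YIELD_VALUE instruction, b is a generator function: calling it returns a lazy iterator rather than a list, which is why e can loop over it directly. A quick check with made-up sample inputs:

```python
import inspect


def b(s, t):
    for x, y in zip(s, t):
        yield x + y - 50


print(inspect.isgeneratorfunction(b))   # True
print(list(b([60, 70], [10, 20])))      # [20, 40]
```

The sample lists are arbitrary; 60+10-50 and 70+20-50 give the two values shown.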
The last function is a. With the experience gained above it is easy to decompile:
def a(s):
    o=[0]*len(s)
    for i,c in enumerate(s):
        o[i]=c*2-60
    return o
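A cheap way to sanity-check a reconstruction is to compare its name tables with the indices printed in the dump. In the disassembly of a, the locals are numbered 0 (s), 1 (o), 2 (i), 3 (c) and the globals 0 (len), 1 (enumerate). The sketch below reads the same tables from the rebuilt function; it checks names only, since the exact opcodes differ between Python versions.

```python
def a(s):
    o = [0] * len(s)
    for i, c in enumerate(s):
        o[i] = c * 2 - 60
    return o


code = a.__code__
print(code.co_varnames)  # local slots, in the order the dump numbers them
print(code.co_names)     # global names, in the order the dump numbers them
print(a([40, 50]))       # [20, 40]
```

If the order of either tuple differed from the dump, a variable would have been introduced in a different place than in the original source.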
The complete decompiled program is:
def a(s):
    o=[0]*len(s)
    for i,c in enumerate(s):
        o[i]=c*2-60
    return o
def b(s,t):
    for (x,y) in zip(s,t):
        yield x+y-50
def c(s):
    return [(c+5) for c in s]
def e(s):
    s=[ord(c) for c in s]
    o=[(c^5)-30 for c in b(a(s),c(s))]
    return bytes(o)
def main():
    s=input('Guess?')
    o=b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed'
    if e(s)==o:
        print('Correct!')
    else:
        print('Wrong...')
# The dump does not include the module-level code; calling main() here is an assumption.
main()
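Finally we invert the decompiled logic to recover the input. For a character with code p, a contributes 2p-60 and c contributes p+5, so b yields 3p-105 and e outputs ((3p-105)^5)-30. Undoing these steps on each target byte v gives p=(((v+30)^5)+50+55)//3. The code is as follows: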
o=b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed'
ll=[]
for i in o:
    # Iterating over bytes yields ints in Python 3, so no hex decoding is needed.
    ll.append((((i+30)^5)+50+55)//3)
m=""
for ii in ll:
    m=m+chr(ii)
print(m)
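A round trip is a simple way to confirm both the decompilation and the inversion: feed the recovered string back through the reconstructed e and compare with the target bytes. The sketch below assumes Python 3; the names TARGET and invert are introduced only for this example.

```python
TARGET = b'\xae\xc0\xa1\xab\xef\x15\xd8\xca\x18\xc6\xab\x17\x93\xa8\x11\xd7\x18\x15\xd7\x17\xbd\x9a\xc0\xe9\x93\x11\xa7\x04\xa1\x1c\x1c\xed'


def a(s):
    o = [0] * len(s)
    for i, c in enumerate(s):
        o[i] = c * 2 - 60
    return o


def b(s, t):
    for x, y in zip(s, t):
        yield x + y - 50


def c(s):
    return [ch + 5 for ch in s]


def e(s):
    s = [ord(ch) for ch in s]
    return bytes([(v ^ 5) - 30 for v in b(a(s), c(s))])


def invert(target):
    """Undo e() per byte: v = ((3p - 105) ^ 5) - 30, so p = (((v + 30) ^ 5) + 105) // 3."""
    return "".join(chr((((v + 30) ^ 5) + 105) // 3) for v in target)


recovered = invert(TARGET)
print(recovered)
print(e(recovered) == TARGET)
```

If the comparison prints False, either one of the reconstructed functions or the inversion formula is wrong, which narrows down where to look.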