Parsing instructions and bytecode with Androguard

Parsing bytecode is a common requirement. The extracted bytecode can feed many kinds of work, such as numerical analysis and machine learning.

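What is bytecode? Java introduced the concept of a virtual machine: an abstract machine that sits between the hardware and the compiled program. On every platform this virtual machine presents the same interface to compiled programs. The compiler only has to target the virtual machine and emit code that it understands; an interpreter then turns that code into machine code for the specific system and runs it. In Java, the code that the virtual machine understands is called bytecode (stored in files with the .class extension). It targets no particular processor, only the virtual machine. The interpreter differs from platform to platform, but the virtual machine it implements is the same. Java source is compiled to bytecode, and the virtual machine executes the bytecode by handing each instruction to the interpreter, which translates it into machine code for the machine it is running on. This is why Java is described as both compiled and interpreted.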

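Advantages of bytecode: bytecode goes some way toward solving the low execution speed of traditional interpreted languages while keeping their portability. Java programs therefore run fairly efficiently and, because bytecode is not tied to any particular machine, they run on many different computers without recompilation.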

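On Android, Java bytecode is converted to Dalvik bytecode, and the bytecode of every method is stored in the Dalvik (DEX) file. Androguard offers three different ways of obtaining that bytecode.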

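Bytecode is built from 16-bit units, but Androguard displays it in 8-bit units. Offsets that appear in the bytecode are likewise expressed in bytes, and all indices are given as byte positions.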

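Because Dalvik is closely related to Java, all integer values are represented as signed "int" (32-bit) or "long" (64-bit) values.
Values are shown in decimal or hexadecimal. A hexadecimal value carries the suffix "h", so "f7a0h" is the same value as "63392".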
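
The notation above can be read back into Python integers with a few lines. This is a minimal sketch, not part of Androguard: parse_value and as_signed32 are hypothetical helper names, and the sketch assumes that any token ending in "h" is hexadecimal and everything else is decimal.

```python
def parse_value(text):
    """Parse a decimal literal or an 'h'-suffixed hexadecimal literal."""
    text = text.strip()
    if text.lower().endswith("h"):
        # Drop the trailing 'h' and read the rest as base 16
        return int(text[:-1], 16)
    return int(text, 10)


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


print(parse_value("f7a0h"))     # 63392
print(parse_value("63392"))     # 63392
print(as_signed32(0xFFFFFFFF))  # -1
```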

Getting the raw bytecode

To get the raw byte representation of a method's bytecode, the first step is, as always, to load the APK under test:

ubuntu@ubuntu:~$ androguard analyze /home/ubuntu/Desktop/meeting.apk 
Please be patient, this might take a while.
Found the provided file is of type 'APK'
[INFO    ] androguard.apk: Starting analysis on AndroidManifest.xml
[INFO    ] androguard.apk: APK file was successfully validated!
[INFO    ] androguard.analysis: Adding DEX file version 35
[INFO    ] androguard.analysis: Reading bytecode took : 0min 00s
[INFO    ] androguard.analysis: Adding DEX file version 35
[INFO    ] androguard.analysis: Reading bytecode took : 0min 00s
[INFO    ] androguard.analysis: End of creating cross references (XREF) run time: 0min 00s
Added file to session: SHA256::689673bed0f4d6121a63f3c9fd88efb538ec316561d426120c440d8be89f6256
Loaded APK file...
>>> a
<androguard.core.bytecodes.apk.APK object at 0x7f3b27bb8390>
>>> d
[<androguard.core.bytecodes.dvm.DalvikVMFormat object at 0x7f3b1ae9d978>, <androguard.core.bytecodes.dvm.DalvikVMFormat object at 0x7f3b1ae36b00>]
>>> dx
<analysis.Analysis VMs: 2, Classes: 85, Methods: 340, Strings: 122>

Androguard version 3.4.0a1 started

Then print the raw bytes of every method with a few lines of Python:

In [1]: for method in dx.get_methods():
   ...:     if method.is_external():
   ...:         continue
   ...:     m = method.get_method()
   ...:     if m.get_code():
   ...:         print(m.get_code().get_bc().get_raw())

The output consists largely of binary data, for example:

bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')
bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')
bytearray(b'p\x10\xf3\x00\x00\x00\x0e\x00')
bytearray(b'\x12\x01i\x01E\x00\x1a\x00C\x01i\x00C\x00\x1a\x00\xa1\x02i\x00F\x00i\x01D\x00\x0e\x00')
bytearray(b'p\x10\x00\x00\x00\x00\x0e\x00')
bytearray(b'\x1d\x02b\x00C\x00\x1a\x01\x08\x00n \x03\x01\x10\x00\n\x008\x00\x1e\x00"\x00G\x00o\x10\x03\x00\x02\x00\x0c\x01q\x10\x06\x01\x01\x00\x0c\x01p \t\x01\x10\x00b\x01C\x00n \x0f\x01\x10\x00\x0c\x00n\x10\x11\x01\x00\x00\x0c\x00i\x00C\x00\x12\x10\x1e\x02\x0f\x00b\x00C\x00\x1a\x01\x08\x00n \xfe\x00\x10\x00\n\x00;\x00\xf5\xff"\x00G\x00o\x10\x03\x00\x02\x00\x0c\x01q\x10\x06\x01\x01\x00\x0c\x01p \t\x01\x10\x00\x1a\x01\x08\x00n \x0f\x01\x10\x00\x0c\x00b\x01C\x00n \x0f\x01\x10\x00\x0c\x00n\x10\x11\x01\x00\x00\x0c\x00i\x00C\x00(\xd4\r\x00\x1e\x02\'\x00')
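
To relate the byte-oriented view to the 16-bit units that Dalvik actually uses, the raw bytes can be regrouped with the standard struct module. The sketch below takes the first bytearray from the output above and assumes the usual little-endian DEX layout. The variable names are illustrative only.

```python
import struct

# First method body from the output above
raw = bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')

# Byte view, which is how Androguard presents the code
print(raw.hex())  # 7010090000000e00

# The same data regrouped as 16-bit little-endian code units
units = struct.unpack("<%dH" % (len(raw) // 2), raw)
print([format(u, "04x") for u in units])  # ['1070', '0009', '0000', '000e']

# The opcode is the low byte of an instruction's first unit,
# i.e. the first byte at the instruction's byte offset
print(raw[0], raw[6])  # 112 14
```

The two opcodes, 112 at byte 0 and 14 at byte 6, are the same values that the disassembly in the next section reports for invoke-direct and return-void.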

Getting disassembled instructions

The following code prints the disassembled form of every method:

In [2]: for method in dx.get_methods():
   ...:      if method.is_external():
   ...:          continue
   ...:      m = method.get_method()
   ...:      for idx, ins in m.get_instructions_idx():
   ...:          print(idx, ins.get_op_value(), ins.get_name(), ins.get_output())
   ...: 

Output:

0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void 
0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void 
0 18 const/4 v1, 0
2 105 sput-object v1, Lcom/wrapper/proxyapplication/WrapperProxyApplication;->shellApp Landroid/app/Application;
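
For numerical analysis or machine learning, one simple feature that can be derived from this listing is an opcode histogram. The following is a minimal sketch under stated assumptions: opcode_histogram is a hypothetical helper, and the sample tuples are copied by hand from the output above so that the block runs without Androguard.

```python
from collections import Counter


def opcode_histogram(instructions):
    """Count mnemonics in an iterable of (idx, op_value, name) tuples."""
    return Counter(name for _idx, _op_value, name in instructions)


# Hand-copied from the output above. Inside the Androguard shell the tuples
# could instead be built from m.get_instructions_idx(), as in In [2].
sample = [
    (0, 112, "invoke-direct"),
    (6, 14, "return-void"),
    (0, 112, "invoke-direct"),
    (6, 14, "return-void"),
    (0, 18, "const/4"),
    (2, 105, "sput-object"),
]

hist = opcode_histogram(sample)
for name, count in hist.most_common():
    print(name, count)
```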

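To get the disassembly for a specific class only, pass the class name to find_methods. The method also accepts a methodname pattern if a single method is wanted, and both arguments are treated as regular expressions: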

In [3]: for m in dx.find_methods("Lcom/tencent/wemeet/app/MyWrapperProxyApplication;"):
   ...:      print(m.full_name)
   ...:      for idx, ins in m.get_method().get_instructions_idx():
   ...:         print(idx, ins.get_op_value(), ins.get_name(), ins.get_output())
   ...:         

The output looks like this:

Lcom/tencent/wemeet/app/MyWrapperProxyApplication; <init> ()V
0 112 invoke-direct v0, Lcom/wrapper/proxyapplication/WrapperProxyApplication;-><init>()V
6 14 return-void 
Lcom/tencent/wemeet/app/MyWrapperProxyApplication; initProxyApplication (Landroid/content/Context;)V
0 110 invoke-virtual v7, Landroid/content/Context;->getApplicationInfo()Landroid/content/pm/ApplicationInfo;
6 12 move-result-object v4
8 84 iget-object v0, v4, Landroid/content/pm/ApplicationInfo;->sourceDir Ljava/lang/String;
12 18 const/4 v1, 0
14 34 new-instance v2, Ljava/util/zip/ZipFile;
18 112 invoke-direct v2, v0, Ljava/util/zip/ZipFile;-><init>(Ljava/lang/String;)V
24 7 move-object v1, v2
26 57 if-nez v1, +00dh
30 113 invoke-static Landroid/os/Process;->myPid()I
36 10 move-result v4
38 113 invoke-static v4, Landroid/os/Process;->killProcess(I)V
44 18 const/4 v4, 0
46 113 invoke-static v4, Ljava/lang/System;->exit(I)V
52 113 invoke-static v7, v1, Lcom/wrapper/proxyapplication/Util;->PrepareSecurefiles(Landroid/content/Context; Ljava/util/zip/ZipFile;)I
58 110 invoke-virtual v1, Ljava/util/zip/ZipFile;->close()V
64 98 sget-object v4, Lcom/wrapper/proxyapplication/Util;->CPUABI Ljava/lang/String;
68 26 const-string v5, "x86"
72 51 if-ne v4, v5, +031h
76 34 new-instance v4, Ljava/lang/StringBuilder;
80 110 invoke-virtual v7, Landroid/content/Context;->getFilesDir()Ljava/io/File;
86 12 move-result-object v5
88 110 invoke-virtual v5, Ljava/io/File;->getAbsolutePath()Ljava/lang/String;
94 12 move-result-object v5
96 113 invoke-static v5, Ljava/lang/String;->valueOf(Ljava/lang/Object;)Ljava/lang/String;
102 12 move-result-object v5
104 112 invoke-direct v4, v5, Ljava/lang/StringBuilder;-><init>(Ljava/lang/String;)V
110 26 const-string v5, "/prodexdir/"
114 110 invoke-virtual v4, v5, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
120 12 move-result-object v4
122 98 sget-object v5, Lcom/wrapper/proxyapplication/Util;->libname Ljava/lang/String;
126 110 invoke-virtual v4, v5, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
132 12 move-result-object v4
134 110 invoke-virtual v4, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
140 12 move-result-object v4
142 113 invoke-static v4, Ljava/lang/System;->load(Ljava/lang/String;)V
148 14 return-void 
150 13 move-exception v3
152 110 invoke-virtual v3, Ljava/io/IOException;->printStackTrace()V
158 40 goto -42h
160 13 move-exception v3
162 110 invoke-virtual v3, Ljava/io/IOException;->printStackTrace()V
168 40 goto -34h
170 98 sget-object v4, Lcom/wrapper/proxyapplication/Util;->libname Ljava/lang/String;
174 113 invoke-static v4, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V
180 40 goto -10h
Lcom/tencent/wemeet/app/MyWrapperProxyApplication; onCreate ()V
0 111 invoke-super v0, Lcom/wrapper/proxyapplication/WrapperProxyApplication;->onCreate()V
6 14 return-void 
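
In the listing above, the first column is a byte offset, while the branch operands such as "+00dh" or "-42h" are consistent with signed counts of 16-bit code units. That reading is an observation from this particular output, not a documented guarantee. Under that assumption, a hypothetical helper branch_target can compute where each branch lands:

```python
def branch_target(idx, operand):
    """Return the byte offset a branch jumps to.

    idx     -- byte offset of the branch instruction (first column)
    operand -- branch operand as printed, e.g. '+00dh' or '-42h'
    """
    # Assumption: the operand is a signed hexadecimal count of 16-bit units
    units = int(operand.rstrip("h"), 16)
    return idx + 2 * units


print(branch_target(26, "+00dh"))  # 52
print(branch_target(72, "+031h"))  # 170
print(branch_target(158, "-42h"))  # 26
print(branch_target(180, "-10h"))  # 148
```

Each result is the offset of an instruction that exists in the listing: 52 is the call to PrepareSecurefiles, 170 is the loadLibrary path, 26 is the if-nez check, and 148 is return-void.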

Get processed bytecode from decompiler

In [9]: for method in dx.get_methods():
   ...:     if method.is_external():
   ...:         continue
   ...:     m = method.get_method()
   ...:     print(m.source())
   ...:     

The code above prints the decompiled source of every method. Sample output:

	private declared_synchronized boolean Fixappname()
    {
        try {
            if (!com.wrapper.proxyapplication.WrapperProxyApplication.className.startsWith(.)) {
                if (com.wrapper.proxyapplication.WrapperProxyApplication.className.indexOf(.) < 0) {
                    com.wrapper.proxyapplication.WrapperProxyApplication.className = new StringBuilder(String.valueOf(super.getPackageName())).append(.).append(com.wrapper.proxyapplication.WrapperProxyApplication.className).toString();
                }
            } else {
                com.wrapper.proxyapplication.WrapperProxyApplication.className = new StringBuilder(String.valueOf(super.getPackageName())).append(com.wrapper.proxyapplication.WrapperProxyApplication.className).toString();
            }
        } catch (String v0_11) {
            throw v0_11;
        }
        return 1;
    }

The DAD decompiler can also produce an abstract syntax tree (AST), which is easy to use for analyzing the code itself:

from pprint import pprint
from androguard.decompiler.dad.decompile import DvMethod

for method in dx.get_methods():
    # External methods have no code to decompile
    if method.is_external():
        continue
    dv = DvMethod(method)
    dv.process(doAST=True)
    pprint(dv.get_ast())

The output has the following form:

{'body': ['BlockStatement',
          None,
          [['LocalDeclarationStatement',
            None,
            [['TypeName', ('.int', 0)], ['Local', 'v1_0']]],
           ['LocalDeclarationStatement',
            ['ClassInstanceCreation',
             (java/io/File, <init>, (Ljava/lang/String;)V),
             [['Local', 'p5']],
             ['TypeName', (java/io/File, 0)]],
            [['TypeName', (java/io/File, 0)], ['Local', 'v0_1']]],
           ['IfStatement',
            None,
            ['BinaryInfix',
             [['Parenthesis',
               [['Unary',
                 [['MethodInvocation',
                   [['Local', 'v0_1']],
                   (java/io/File, exists, ()Z),
                   exists,
                   True]],
                 '!',
                 False]]],
              ['Parenthesis',
               [['BinaryInfix',
                 [['MethodInvocation',
                   [['Local', 'v0_1']],
                   (java/io/File, length, ()J),
                   length,
                   True],
                  ['Local', 'p6']],
                 '!=']]]],
             '||'],
            [['BlockStatement',
              None,
              [['ExpressionStatement',
                ['Assignment',
                 [['Local', 'v1_0'], ['Literal', '0', ('.int', 0)]],
                 '']]]],
             ['BlockStatement',
              None,
              [['ExpressionStatement',
                ['Assignment',
                 [['Local', 'v1_0'], ['Literal', '1', ('.int', 0)]],
                 '']]]]]],
           ['ReturnStatement', ['Local', 'v1_0']]]],
 'comments': [],
 'flags': ['private', 'static'],
 'params': [[['TypeName', (java/lang/String, 0)], ['Local', 'p5']],
            [['TypeName', ('.long', 0)], ['Local', 'p6']]],
 'ret': ['TypeName', ('.boolean', 0)],
 'triple': (com/wrapper/proxyapplication/Util,
            isFileValid,
            (Ljava/lang/String;J)Z)}

This AST is equivalent to the following source code:

private static boolean isFileValid(String p5, long p6)
    {
        int v1_0;
        java.io.File v0_1 = new java.io.File(p5);
        if ((!v0_1.exists()) || (v0_1.length() != p6)) {
            v1_0 = 0;
        } else {
            v1_0 = 1;
        }
        return v1_0;
    }
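
Because the AST is made of plain nested lists, it can be walked with ordinary Python. The sketch below tallies node kinds using a heuristic: any list whose first element is a string is treated as a node. count_node_types is a hypothetical helper, the heuristic is meant for the 'body' part of the AST only, and the sample is a fragment copied by hand from the AST shown earlier (the `v1_0 = 0;` branch).

```python
from collections import Counter


def count_node_types(node, counts=None):
    """Recursively tally node kinds in a DAD-style nested-list AST."""
    if counts is None:
        counts = Counter()
    if isinstance(node, list):
        if node and isinstance(node[0], str):
            # Heuristic: ['Kind', child, ...] is a node of type 'Kind'
            counts[node[0]] += 1
            children = node[1:]
        else:
            # Otherwise it is a plain list of child nodes
            children = node
        for child in children:
            count_node_types(child, counts)
    # Strings, tuples, None and booleans are leaves and are ignored
    return counts


# Hand-copied fragment of the AST shown earlier
sample = ['BlockStatement',
          None,
          [['ExpressionStatement',
            ['Assignment',
             [['Local', 'v1_0'], ['Literal', '0', ('.int', 0)]],
             '']]]]

counts = count_node_types(sample)
for kind, n in sorted(counts.items()):
    print(kind, n)
```

Inside the Androguard shell the same function could be applied to dv.get_ast()['body'].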
