[DynamoRIO Beginner Tutorial] Part 4: inc2add.c

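The comment at the top of the sample explains what it does. It performs a dynamic optimization: it converts "inc" instructions to "add 1" whenever that is worthwhile and feasible, without perturbing the target application's behavior. The sample illustrates that a microarchitecture-specific optimization is best performed at runtime, when the underlying processor is known. So the difference from the previous sample, div.c, is that this time we actually replace and optimize code.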

As usual, let's start with dr_client_main.

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* We only used drreg for liveness, not for spilling, so we need no slots. */
    drreg_options_t ops = { sizeof(ops), 0 /*no slots needed*/, false };
    if (!drmgr_init() || drreg_init(&ops) != DRREG_SUCCESS)
        DR_ASSERT(false);

    /* Register for our events: process exit, and code transformation.
     * We're changing the app's code, rather than just inserting observational
     * instrumentation.
     */
    dr_register_exit_event(event_exit);
    if (!drmgr_register_bb_app2app_event(event_instruction_change, NULL))
        DR_ASSERT(false);

    /* Long ago, this optimization would target the Pentium 4 (identified via
     * "proc_get_family() == FAMILY_PENTIUM_4"), where an add of 1 is faster
     * than an inc.  For illustration purposes we leave a boolean controlling it
     * but we turn it on all the time in this sample and leave it for future
     * work to determine whether to disable it on certain microarchitectures.
     */
    enable = true;

    /* Initialize our global variables. */
    num_examined = 0;
    num_converted = 0;
}

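The comment says this optimization once targeted the Pentium 4, where an add of 1 was faster than an inc. Our goal here is to learn DynamoRIO, so we don't need to worry about that.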

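`drreg_options_t ops = { sizeof(ops), 0 /*no slots needed*/, false };`

This is the first time we have seen this line. drreg_options_t holds the options used to initialize the drreg extension, and it is passed to drreg_init(&ops) right below. drreg is an extension that helps manage registers. Through it, a client can reserve a register for its own use, and drreg takes care of spilling and restoring it. The second field of ops, 0, is the number of spill slots needed. This sample uses drreg only for liveness analysis and never spills a register, so it is set to 0. The third field (`conservative`, false here) controls how conservatively drreg treats application register liveness. I haven't worked out the remaining details of this extension yet; more on that later.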

Next comes a batch of initialization that sets up both drmgr and drreg.

dr_register_exit_event(event_exit) registers the exit callback.

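drmgr_register_bb_app2app_event(event_instruction_change, NULL) is another key call. drmgr splits basic block creation into four stages: app2app, analysis, insertion, and instru2instru. This function registers a callback for the first of them, app2app. As a result, event_instruction_change sees each new block before the other three stages do. app2app is the stage meant for transforming the application's own code, which is exactly what this sample does.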
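To get a feel for the ordering, here is a toy model in plain C. It is not drmgr code. `stage_fn`, `callbacks`, `run_bb_pipeline` and `inc2add_app2app` are made-up names. A real drmgr callback receives the drcontext, tag and instruction list rather than a text buffer. The model only shows that a callback registered for app2app runs before the analysis, insertion and instru2instru stages.

```c
#include <string.h>

/* Toy model of drmgr's four basic-block stages (NOT the drmgr API).
 * It only demonstrates the order in which per-stage callbacks fire. */
enum stage { APP2APP, ANALYSIS, INSERTION, INSTRU2INSTRU, NUM_STAGES };

static const char *const stage_names[NUM_STAGES] = {
    "app2app", "analysis", "insertion", "instru2instru"
};

/* Hypothetical callback type: appends a note to a text buffer. */
typedef void (*stage_fn)(char *buf, size_t buf_size);

/* One callback slot per stage; NULL means nothing is registered. */
static stage_fn callbacks[NUM_STAGES];

static void append(char *buf, size_t buf_size, const char *s)
{
    strncat(buf, s, buf_size - strlen(buf) - 1);
}

/* Stand-in for event_instruction_change: it lives in the app2app slot. */
static void inc2add_app2app(char *buf, size_t buf_size)
{
    append(buf, buf_size, "[rewrite inc->add]");
}

/* Visits the stages in order, recording each stage name and letting the
 * registered callback (if any) add its note right after it. */
static void run_bb_pipeline(char *buf, size_t buf_size)
{
    buf[0] = '\0';
    for (int s = 0; s < NUM_STAGES; s++) {
        append(buf, buf_size, stage_names[s]);
        if (callbacks[s] != NULL)
            callbacks[s](buf, buf_size);
        if (s + 1 < NUM_STAGES)
            append(buf, buf_size, " ");
    }
}
```

If you register only the app2app slot and run the pipeline, you get `app2app[rewrite inc->add] analysis insertion instru2instru`. The rewrite happens before any later stage looks at the block.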

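Of the remaining variables, enable decides whether the optimization is performed at all. num_examined and num_converted count the inc/dec instructions that were examined and converted.

event_exit: the exit callback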

static void
event_exit(void)
{
    char msg[256];
    int len;
    if (enable) {
        len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),
                          "converted %d out of %d inc/dec to add/sub\n", num_converted,
                          num_examined);
    } else {
        len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),
                          "decided to keep all original inc/dec\n");
    }
    DR_ASSERT(len > 0);
    msg[sizeof(msg) / sizeof(msg[0]) - 1] = '\0';
    DISPLAY_STRING(msg);

    if (!drmgr_unregister_bb_app2app_event(event_instruction_change) ||
        drreg_exit() != DRREG_SUCCESS)
        DR_ASSERT(false);
    drmgr_exit();
}

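The first part prints the result; nothing special there.
The second part unregisters the callback and shuts down the extensions. drreg_exit() and drmgr_exit() pair with their _init() counterparts.
The interesting call is drmgr_unregister_bb_app2app_event(event_instruction_change).
This raised a question for me: why do some samples unregister their basic block callback while others don't? The answer is in the documentation of dr_unregister_bb_event():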

We do not recommend unregistering for the basic block event unless it aways returned DR_EMIT_STORE_TRANSLATIONS (including when for_trace is true, or if the client has a trace creation callback that returns DR_EMIT_STORE_TRANSLATIONS). Unregistering can prevent proper state translation on a later fault or other translation event for this basic block or for a trace that includes this basic block. Instead of unregistering, turn the event callback into a nop.

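The documentation advises against unregistering the basic block event unless the callback always returned DR_EMIT_STORE_TRANSLATIONS. Unfortunately, this sample never returns that value: event_instruction_change always returns DR_EMIT_DEFAULT.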

Next, the registered basic block callback, event_instruction_change:

/* Replaces inc with add 1, dec with sub 1.
 * If cannot replace (eflags constraints), leaves original instruction alone.
 */
static dr_emit_flags_t
event_instruction_change(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
                         bool translating)
{
    int opcode;
    instr_t *instr, *next_instr;

    /* Only bother replacing for hot code, i.e., when for_trace is true, and
     * when the underlying microarchitecture calls for it.
     */
    if (!for_trace || !enable)
        return DR_EMIT_DEFAULT;

    for (instr = instrlist_first_app(bb); instr != NULL; instr = next_instr) {
        /* We're deleting some instrs, so get the next first. */
        next_instr = instr_get_next_app(instr);
        opcode = instr_get_opcode(instr);
        if (opcode == OP_inc || opcode == OP_dec) {
            if (!translating)
                ATOMIC_INC(num_examined);
            if (replace_inc_with_add(drcontext, instr, bb)) {
                if (!translating)
                    ATOMIC_INC(num_converted);
            }
        }
    }

    return DR_EMIT_DEFAULT;
}

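The comment says the optimization is applied only to hot code. So the block is rewritten only when for_trace is true, that is, when the current basic block is being added to a trace.

instrlist_first_app(bb) returns the first application (non-meta) instruction in the list bb.
instr_get_next_app(instr) returns the next application instruction after instr. The loop fetches it up front because instr itself may be replaced and destroyed during the iteration.
instr_get_opcode(instr) returns the opcode of instr.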

The actual replacement is done by replace_inc_with_add; we'll come back to that function shortly.

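ATOMIC_INC is a macro defined at the top of the file that increments a counter atomically. On Windows it calls _InterlockedIncrement; elsewhere it uses inline assembly with a `lock incl` instruction. As the comment next to the counters says, atomic updates avoid the hassle of locking when several threads run the callback at the same time.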
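If you'd rather not write inline assembly, C11's `<stdatomic.h>` offers the same guarantee portably. The sketch below is an illustrative substitute, not what inc2add.c uses, and it runs as an ordinary program outside DynamoRIO. `num_examined_demo`, `worker` and `run_counters` are hypothetical names. Several threads increment one shared counter, and because every increment is atomic, the final value is exact.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCS_PER_THREAD 100000

/* Hypothetical counter: a C11 atomic instead of the sample's ATOMIC_INC. */
static atomic_int num_examined_demo;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCS_PER_THREAD; i++)
        atomic_fetch_add_explicit(&num_examined_demo, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count
 * (or -1 if not all threads could be created). */
static int run_counters(void)
{
    pthread_t threads[NUM_THREADS];
    int created = 0;

    atomic_store(&num_examined_demo, 0);
    for (; created < NUM_THREADS; created++)
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
            break;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    if (created != NUM_THREADS)
        return -1;
    return atomic_load(&num_examined_demo);
}
```

Relaxed ordering is enough here. The total is read only after pthread_join, which already synchronizes with each worker, so the counter itself needs no ordering guarantees.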

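Another point puzzled me: num_examined and num_converted are incremented only when translating is false.
When translating is false, the callback is being invoked to create the basic block. When it is true, DR is calling it again to reconstruct the block for state translation. That means mapping a code cache address back to the original application address, for example when a fault occurs inside the block.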

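If the statistics are skipped during translation, why not put the check at the very top of the callback? Because the translation pass must reproduce the same code that was emitted the first time, or the code cache addresses would not map back correctly. So the replacement itself must still run when translating is true. Only the counting is skipped, so that each conversion is counted once.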
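The pattern in event_instruction_change can be modeled outside DynamoRIO in a few lines: the rewrite runs on every call, but the counters are bumped only when translating is false. In this sketch a block is just a string of opcode letters. `transform`, `examined` and `converted` are hypothetical names. Unlike the real replace_inc_with_add, every inc/dec converts here, because the model has no CF check. Running it once for creation and once for translation yields identical code, while the counters move only once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block model: 'i' = inc, 'd' = dec, 'a' = add 1,
 * 's' = sub 1, 'm' = any other instruction. */
static int examined, converted;

/* Same shape as event_instruction_change: the rewrite is deterministic and
 * always happens; only the statistics depend on `translating`. */
static void transform(char *block, bool translating)
{
    for (size_t i = 0; block[i] != '\0'; i++) {
        if (block[i] == 'i' || block[i] == 'd') {
            if (!translating)
                examined++;
            block[i] = (block[i] == 'i') ? 'a' : 's';
            if (!translating)
                converted++;
        }
    }
}
```

Transforming `"mimd"` with translating false and a second copy with translating true gives `"mams"` both times, with both counters at 2.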

replace_inc_with_add

/* Replaces inc with add 1, dec with sub 1.
 * Returns true if successful, false if not.
 */
static bool
replace_inc_with_add(void *drcontext, instr_t *instr, instrlist_t *bb)
{
    instr_t *new_instr;
    uint eflags;
    int opcode = instr_get_opcode(instr);

    DR_ASSERT(opcode == OP_inc || opcode == OP_dec);

    /* Add/sub writes CF, inc/dec does not, so we make sure that's ok.
     * We use drreg's liveness analysis, which includes the rest of this block.
     * To be more sophisticated, we could examine instructions at target of each
     * direct exit instead of assuming CF is live across any branch.
     */
    if (drreg_aflags_liveness(drcontext, instr, &eflags) != DRREG_SUCCESS ||
        (eflags & EFLAGS_READ_CF) != 0) {

        return false;
    }
    if (opcode == OP_inc) {

        new_instr =
            INSTR_CREATE_add(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    } else {

        new_instr =
            INSTR_CREATE_sub(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    }
    if (instr_get_prefix_flag(instr, PREFIX_LOCK))
        instr_set_prefix_flag(new_instr, PREFIX_LOCK);
    instr_set_translation(new_instr, instr_get_app_pc(instr));
    instrlist_replace(bb, instr, new_instr);
    instr_destroy(drcontext, instr);
    return true;
}

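This function uses quite a few API calls, so let's go through it carefully.
As the comment explains, the replacement add/sub instructions write the CF flag (the carry flag) in the flags register, while inc/dec leave CF untouched. So before replacing inc/dec with add/sub, we must make sure the instructions that follow do not read CF before something else overwrites it. That is why the code runs drreg's liveness analysis on the instruction.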

if (drreg_aflags_liveness(drcontext, instr, &eflags) != DRREG_SUCCESS ||
        (eflags & EFLAGS_READ_CF) != 0)

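drreg_aflags_liveness performs the liveness analysis. On success it returns DRREG_SUCCESS and stores the result in eflags. The result is a bitmask of EFLAGS_READ_* bits covering the arithmetic flags, where a set bit means that flag is live, i.e. it may be read before it is written again. The code then tests the EFLAGS_READ_CF bit.
Note also this sentence in the comment: We use drreg’s liveness analysis, which includes the rest of this block.
The analysis covers the rest of the current basic block, from instr to the end of the block. Nothing beyond the block's exits is examined. As the comment says, CF is simply assumed to be live across any branch, which is conservative but safe.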

Once the liveness analysis shows that the replacement cannot affect the instructions that follow, we go ahead with the actual replacement.

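INSTR_CREATE_add and INSTR_CREATE_sub create an add instruction and a sub instruction, respectively. The destination operand is taken from the original instruction with instr_get_dst(instr, 0), and the source is the 8-bit immediate 1 (OPND_CREATE_INT8(1)). The new instruction is stored in new_instr and has not been inserted into the basic block yet.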
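To see what the swap means at the byte level, here is a small encoder for the register forms, using the standard opcodes from Intel's manual. `inc r/m32` is FF /0, `dec r/m32` is FF /1, `add r/m32, imm8` is 83 /0 ib, and `sub r/m32, imm8` is 83 /5 ib. 32-bit mode also has a one-byte 40+r form of inc, whose bytes 64-bit mode reuses as REX prefixes. The function names are hypothetical. In the real client, DR's encoder picks the encoding when it emits the block. The sketch only shows that the add/sub form carries the extra immediate byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte (32-bit operand size). */
enum reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* ModRM with mod=11 (register operand): 11 ddd rrr, ddd = opcode extension. */
static uint8_t modrm_reg(unsigned digit, enum reg r)
{
    return (uint8_t)(0xC0u | (digit << 3) | (unsigned)r);
}

/* Writes "inc r32" (FF /0) or "dec r32" (FF /1); returns the length. */
static size_t encode_incdec(uint8_t *out, enum reg r, int is_dec)
{
    out[0] = 0xFF;
    out[1] = modrm_reg(is_dec ? 1u : 0u, r);
    return 2;
}

/* Writes "add r32, imm8" (83 /0 ib) or "sub r32, imm8" (83 /5 ib). */
static size_t encode_addsub_imm8(uint8_t *out, enum reg r, int is_sub, int8_t imm)
{
    out[0] = 0x83;
    out[1] = modrm_reg(is_sub ? 5u : 0u, r);
    out[2] = (uint8_t)imm;
    return 3;
}
```

With this encoder, `inc ecx` becomes FF C1 and `add eax, 1` becomes 83 C0 01.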

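**instr_get_prefix_flag(instr, PREFIX_LOCK)**
**instr_set_prefix_flag(new_instr, PREFIX_LOCK)**
As their names suggest, these two functions get and set an instruction prefix flag. At first I didn't know what an instruction prefix was, so I looked it up: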

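x86 prefixes fall into four groups, and one instruction can carry several prefixes, normally at most one from each group. Each prefix is one byte, and in 32-bit code the order of the prefixes is not prescribed. The groups and their machine codes are:

Operand-size prefix (66H)

Address-size prefix (67H)

Segment override prefixes (2EH, 3EH, 26H, 64H, 65H, 36H)

Lock and repeat prefixes (lock F0H, repeat F3H/F2H)

Here we only care about the lock prefix. If the original instruction has one, the new instruction must get it too. Otherwise a `lock inc` on a memory operand would become a plain, non-atomic `add`, and the application's synchronization could break.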
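At the byte level, checking for a lock prefix means walking the legacy prefix bytes in front of the opcode. This is only a sketch. `classify_prefix` and `has_lock_prefix` are hypothetical, REX and VEX prefixes are ignored, and in the client itself instr_get_prefix_flag does this work on DR's decoded representation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefix groups as listed in the Intel SDM, vol. 2. */
enum prefix_group {
    NOT_PREFIX = 0,
    GROUP_LOCK_REP = 1,  /* F0, F2, F3 */
    GROUP_SEGMENT = 2,   /* 2E, 36, 3E, 26, 64, 65 */
    GROUP_OPSIZE = 3,    /* 66 */
    GROUP_ADDRSIZE = 4   /* 67 */
};

static enum prefix_group classify_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
        return GROUP_LOCK_REP;
    case 0x2E: case 0x36: case 0x3E: case 0x26: case 0x64: case 0x65:
        return GROUP_SEGMENT;
    case 0x66:
        return GROUP_OPSIZE;
    case 0x67:
        return GROUP_ADDRSIZE;
    default:
        return NOT_PREFIX;
    }
}

/* Scans the leading legacy prefixes (32-bit code, no REX handling) and
 * reports whether a LOCK (F0) prefix is among them. */
static bool has_lock_prefix(const uint8_t *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (classify_prefix(bytes[i]) == NOT_PREFIX)
            return false; /* reached the opcode without seeing F0 */
        if (bytes[i] == 0xF0)
            return true;
    }
    return false;
}
```

For example, `F0 FF 00` is `lock inc dword ptr [eax]` and is reported as locked, while `FF 00` (`inc dword ptr [eax]`) is not.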

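**instr_set_translation(new_instr, instr_get_app_pc(instr))** sets the translation address of the new instruction.
When a basic block is copied from the original executable into the code cache, the addresses of its instructions necessarily change. So each instruction in the code cache has a translation address: the address of the corresponding instruction in the original program.
We have just created a new instruction, so we must give it a translation address too. This holds even though the original program has an inc/dec at that address while the new instruction is an add/sub. DR needs this address when it translates a code cache address back to an application address, for example to report a fault at the correct application pc.
In short, instr_get_app_pc fetches the original instruction's app_pc, its address in the application, and that same address is then set as the translation of new_instr.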
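A toy translation table makes the idea concrete. The layout below is invented. A 2-byte `inc ecx` (FF C1) at application address 0x401002 became a 3-byte `add ecx, 1` at cache address 0x7002. Whichever byte of the add a fault lands on, the lookup returns 0x401002. `struct xl8_entry` and `translate_cache_pc` are hypothetical names. DR does not necessarily keep such a table: when a block callback returns DR_EMIT_DEFAULT, DR can recreate the block by re-running the callbacks with translating set to true.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction record: where the instruction starts in the
 * code cache and which application address it translates to. */
struct xl8_entry {
    uintptr_t cache_pc; /* start of the instruction in the code cache */
    uintptr_t app_pc;   /* original application address */
};

/* tab must be sorted by cache_pc; cache_end is one past the last byte.
 * Returns the app pc of the instruction containing cache_pc, or 0 if
 * cache_pc lies outside the block. */
static uintptr_t translate_cache_pc(const struct xl8_entry *tab, size_t n,
                                    uintptr_t cache_end, uintptr_t cache_pc)
{
    if (n == 0 || cache_pc < tab[0].cache_pc || cache_pc >= cache_end)
        return 0;
    size_t i = 0;
    while (i + 1 < n && tab[i + 1].cache_pc <= cache_pc)
        i++;
    return tab[i].app_pc;
}
```

With entries {0x7000, 0x401000}, {0x7002, 0x401002}, {0x7005, 0x401004} and the block ending at 0x7007, a fault at 0x7004 (the last byte of the add) maps back to 0x401002, the address of the original inc.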

instrlist_replace(bb, instr, new_instr); replaces the old instruction instr with new_instr in the basic block. It does not destroy the old instruction, which is why a call to destroy it follows.

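instr_destroy(drcontext, instr): the documentation says it performs instr_free() and then deallocates the thread-local heap storage that instr_create() allocated for instr.
The instructions in bb were heap-allocated by DR when it decoded the block, so instr_destroy releases both the instruction's internal storage and the instr_t itself. instrlist_replace has already removed instr from the list, so nothing else refers to it, and skipping this call would leak memory.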
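Putting the loop from event_instruction_change together with the replace-then-destroy pair gives the following plain-C model. `struct node`, `list_replace` and `convert_all` are hypothetical stand-ins for instr_t, instrlist_replace and the callback. Like instrlist_replace, `list_replace` only relinks the nodes, so the caller must free the old one. The model also shows why the loop reads the next pointer first: after `free(n)`, the old node can no longer be touched.

```c
#include <stdlib.h>

/* Hypothetical mini instruction list (not DynamoRIO types). */
enum op { OP_INC, OP_DEC, OP_ADD1, OP_SUB1, OP_MOV };

struct node {
    enum op op;
    struct node *prev, *next;
};

struct list {
    struct node *first;
};

static struct node *node_new(enum op op)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n != NULL)
        n->op = op;
    return n;
}

/* Puts new_n where old_n was. Like instrlist_replace, it does NOT free old_n. */
static void list_replace(struct list *l, struct node *old_n, struct node *new_n)
{
    new_n->prev = old_n->prev;
    new_n->next = old_n->next;
    if (old_n->prev != NULL)
        old_n->prev->next = new_n;
    else
        l->first = new_n;
    if (old_n->next != NULL)
        old_n->next->prev = new_n;
    old_n->prev = old_n->next = NULL;
}

/* Same loop shape as event_instruction_change. Returns the number converted. */
static int convert_all(struct list *l)
{
    int converted = 0;
    struct node *n, *next;
    for (n = l->first; n != NULL; n = next) {
        next = n->next; /* n may be freed below, so fetch next first */
        if (n->op == OP_INC || n->op == OP_DEC) {
            struct node *repl = node_new(n->op == OP_INC ? OP_ADD1 : OP_SUB1);
            if (repl == NULL)
                continue; /* out of memory: keep the original, like a failed check */
            list_replace(l, n, repl);
            free(n); /* the instr_destroy step: the list no longer owns n */
            converted++;
        }
    }
    return converted;
}
```

A list `mov, inc, dec, mov` becomes `mov, add1, sub1, mov`, and convert_all returns 2.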

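This sample breaks down into three steps: finding and filtering instructions, creating and preparing the new instruction, and replacing the old one.
The tricky part of step one is analyzing the liveness of the flags within the block to decide whether the replacement is safe.
Step two involves the most API calls. When creating a new instruction you have to take care of the prefixes, the translation address, the operands, and other key details.
Step three is very simple, just a call or two. Don't forget to destroy the old instruction.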

The complete code:

#include "dr_api.h"
#include "drmgr.h"
#include "drreg.h"

#ifdef WINDOWS
#    define DISPLAY_STRING(msg) dr_messagebox(msg)
#    define ATOMIC_INC(var) _InterlockedIncrement((volatile LONG *)(&(var)))
#else
#    define DISPLAY_STRING(msg) dr_printf("%s\n", msg);
#    define ATOMIC_INC(var) __asm__ __volatile__("lock incl %0" : "=m"(var) : : "memory")
#endif

static bool enable;

/* Use atomic operations to increment these to avoid the hassle of locking. */
static int num_examined, num_converted;

/* Replaces inc with add 1, dec with sub 1.
 * Returns true if successful, false if not.
 */
static bool
replace_inc_with_add(void *drcontext, instr_t *inst, instrlist_t *trace);

static dr_emit_flags_t
event_instruction_change(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
                         bool translating);

static void
event_exit(void);

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    /* We only used drreg for liveness, not for spilling, so we need no slots. */
    drreg_options_t ops = { sizeof(ops), 0 /*no slots needed*/, false };
    //dr_set_client_name("DynamoRIO Sample Client 'inc2add'",
    //                  "http://dynamorio.org/issues");
    if (!drmgr_init() || drreg_init(&ops) != DRREG_SUCCESS)
        DR_ASSERT(false);

    /* Register for our events: process exit, and code transformation.
     * We're changing the app's code, rather than just inserting observational
     * instrumentation.
     */
    dr_register_exit_event(event_exit);
    if (!drmgr_register_bb_app2app_event(event_instruction_change, NULL))
        DR_ASSERT(false);

    /* Long ago, this optimization would target the Pentium 4 (identified via
     * "proc_get_family() == FAMILY_PENTIUM_4"), where an add of 1 is faster
     * than an inc.  For illustration purposes we leave a boolean controlling it
     * but we turn it on all the time in this sample and leave it for future
     * work to determine whether to disable it on certain microarchitectures.
     */
    enable = true;

    /* Initialize our global variables. */
    num_examined = 0;
    num_converted = 0;
}

static void
event_exit(void)
{
    char msg[256];
    int len;
    if (enable) {
        len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),
                          "converted %d out of %d inc/dec to add/sub\n", num_converted,
                          num_examined);
    } else {
        len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),
                          "decided to keep all original inc/dec\n");
    }
    DR_ASSERT(len > 0);
    msg[sizeof(msg) / sizeof(msg[0]) - 1] = '\0';
    DISPLAY_STRING(msg);

    if (!drmgr_unregister_bb_app2app_event(event_instruction_change) ||
        drreg_exit() != DRREG_SUCCESS)
        DR_ASSERT(false);
    drmgr_exit();
}

/* Replaces inc with add 1, dec with sub 1.
 * If cannot replace (eflags constraints), leaves original instruction alone.
 */
static dr_emit_flags_t
event_instruction_change(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
                         bool translating)
{
    int opcode;
    instr_t *instr, *next_instr;

    /* Only bother replacing for hot code, i.e., when for_trace is true, and
     * when the underlying microarchitecture calls for it.
     */
    if (!for_trace || !enable)
        return DR_EMIT_DEFAULT;

    for (instr = instrlist_first_app(bb); instr != NULL; instr = next_instr) {
        /* We're deleting some instrs, so get the next first. */
        next_instr = instr_get_next_app(instr);
        opcode = instr_get_opcode(instr);
        if (opcode == OP_inc || opcode == OP_dec) {
            if (!translating)
                ATOMIC_INC(num_examined);
            if (replace_inc_with_add(drcontext, instr, bb)) {
                if (!translating)
                    ATOMIC_INC(num_converted);
            }
        }
    }

    return DR_EMIT_DEFAULT;
}

/* Replaces inc with add 1, dec with sub 1.
 * Returns true if successful, false if not.
 */
static bool
replace_inc_with_add(void *drcontext, instr_t *instr, instrlist_t *bb)
{
    instr_t *new_instr;
    uint eflags;
    int opcode = instr_get_opcode(instr);

    DR_ASSERT(opcode == OP_inc || opcode == OP_dec);

    /* Add/sub writes CF, inc/dec does not, so we make sure that's ok.
     * We use drreg's liveness analysis, which includes the rest of this block.
     * To be more sophisticated, we could examine instructions at target of each
     * direct exit instead of assuming CF is live across any branch.
     */
    if (drreg_aflags_liveness(drcontext, instr, &eflags) != DRREG_SUCCESS ||
        (eflags & EFLAGS_READ_CF) != 0) {

        return false;
    }
    if (opcode == OP_inc) {

        new_instr =
            INSTR_CREATE_add(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    } else {

        new_instr =
            INSTR_CREATE_sub(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    }
    if (instr_get_prefix_flag(instr, PREFIX_LOCK))
        instr_set_prefix_flag(new_instr, PREFIX_LOCK);
    instr_set_translation(new_instr, instr_get_app_pc(instr));
    instrlist_replace(bb, instr, new_instr);
    instr_destroy(drcontext, instr);
    return true;
}
