My LLVM Study Notes: OLLVM Obfuscation Research, BCF Part

I needed to do code protection, so I took some time to study the three protection schemes in OLLVM: BCF (Bogus Control Flow), FLA (Control Flow Flattening), and SUB (Instructions Substitution). This post covers BCF.

1. The BCF header file exposes two functions to the outside:

// Namespace
namespace llvm {
	Pass *createBogus ();
	Pass *createBogus (bool flag);
}

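Both functions create the pass for BCF. They differ mainly in the flag that later reaches toObfuscate, which decides whether the current function needs BCF protection.

2. In OLLVM, the BCF pass is managed by PassManager. For how the BCF pass is added and invoked, see: //need add

3. The entry point of BCF is runOnFunction, shown below: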

    virtual bool runOnFunction(Function &F){
      // Check if the number of applications is correct
      if (ObfTimes <= 0) {
        errs()<<"BogusControlFlow application number -bcf_loop=x must be x > 0";
        return false;
      }

      // Check if the percentage is correct
      if ( !((ObfProbRate > 0) && (ObfProbRate <= 100)) ) {
        errs()<<"BogusControlFlow application basic blocks percentage -bcf_prob=x must be 0 < x <= 100";
        return false;
      }
      // If bcf annotations
      if(toObfuscate(flag,&F,"bcf")) {
        bogus(F);
        doF(*F.getParent());
        return true;
      }

      return false;
    } // end of runOnFunction()

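1) First it checks whether ObfTimes is not greater than 0; if so, no obfuscation is performed. ObfTimes is the number of times obfuscation is applied to one function. It can be set at compile time with the following option:

-mllvm -bcf_loop=3

This applies BCF obfuscation three times to each function.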

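2) Next it checks that ObfProbRate is greater than 0 and at most 100; if not, no obfuscation is performed. ObfProbRate is the probability that each basic block gets obfuscated. It can be set at compile time with the following option:

-mllvm -bcf_prob=40

This gives each basic block a 40% chance of being obfuscated with BCF.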

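3) toObfuscate then checks the flag passed in when the BCF pass was created, together with the annotations on the function to be protected. If the check passes, BCF obfuscation starts. Step 4 below analyzes toObfuscate, and step 5 analyzes the BCF obfuscation itself.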
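The two range checks at the top of runOnFunction can be isolated as a small standalone sketch. checkBcfOptions is a hypothetical helper written for this note, not an OLLVM function; it only mirrors the two conditions shown above and returns the reason for a rejection.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of OLLVM): returns an empty string when both
// options pass the checks in runOnFunction, otherwise the rejection reason.
std::string checkBcfOptions(int obfTimes, int obfProbRate) {
    // -bcf_loop: number of BCF rounds per function, must be positive.
    if (obfTimes <= 0)
        return "-bcf_loop=x must be x > 0";
    // -bcf_prob: per-block selection percentage, must lie in (0, 100].
    if (!(obfProbRate > 0 && obfProbRate <= 100))
        return "-bcf_prob=x must be 0 < x <= 100";
    return "";
}
```

Under these checks, -bcf_loop=3 with -bcf_prob=40 is accepted, while a probability of 0 or 101 makes the pass return without touching the function.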

4. Utils (/lib/Transforms/Obfuscation/Utils.cpp) --> toObfuscate

Its prototype is:

bool toObfuscate(bool flag, Function *f, std::string attribute)

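It takes three parameters. The first, flag, is the value passed in when the BCF pass was created (see step 1); if no flag was passed, it defaults to false. The second is a pointer to the function currently being analyzed; note that this is the function in its IR form. The third is the annotation string (which can be attached in source code via the attribute keyword). Overall the function has two stages, annotation analysis and flag analysis:

4.1 Annotation analysis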

  std::string attr = attribute;
  std::string attrNo = "no" + attr;
 
  // Check if declaration
  if (f->isDeclaration()) {
    return false;
  }
 
  // Check external linkage
  if(f->hasAvailableExternallyLinkage() != 0) {
    return false;
  }
 
  // We have to check the nobcf flag first
  // Because .find("bcf") is true for a string like "bcf" or
  // "nobcf"
  if (readAnnotate(f).find(attrNo) != std::string::npos) {
    return false;
  }
 
  // If bcf annotations
  if (readAnnotate(f).find(attr) != std::string::npos) {
    return true;
  }

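First it checks whether the current function is only a declaration; if so it returns false, meaning no BCF protection.

Next it checks whether the function has available_externally linkage; if so it returns false.

Then it reads the annotations on the function; if 'nobcf' is found, it returns false.

If 'bcf' is found among the annotations, it returns true.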
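For reference, this is roughly what such annotations look like at the source level. The sketch assumes a Clang-based toolchain, where the annotate attribute is available; OBF is a hypothetical convenience macro introduced here (it is not something OLLVM provides), and the two functions are made-up examples.

```cpp
#include <cassert>

// OBF is a hypothetical macro for this note. Clang understands the annotate
// attribute; for other compilers the macro expands to nothing.
#if defined(__clang__)
#define OBF(tag) __attribute__((annotate(tag)))
#else
#define OBF(tag)
#endif

// Annotated "bcf": the annotation check returns true for this function,
// whatever flag the pass was created with.
OBF("bcf") int checkLicense(int key) { return (key ^ 0x5A) == 0x3C; }

// Annotated "nobcf": the annotation check returns false, so this function
// is left alone even when BCF is enabled for everything else.
OBF("nobcf") int hotLoop(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}
```

The annotations do not change what the functions compute; they only leave a string on the function that readAnnotate can later search.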

4.2 Flag analysis

  // If bcf flag is set
  if (flag == true) {
    return true;
  }

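If none of the checks above has returned (which means the function is not available_externally, is not a bare declaration, and carries no matching annotation), flag is checked last: if it is true the function returns true, otherwise it returns false.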
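The order of the checks matters, so here is a minimal standalone sketch of the same decision sequence. FunctionInfo and shouldObfuscate are hypothetical stand-ins written for this note: the struct replaces the facts that the real code reads from an llvm::Function, and its annotations field plays the role of the string returned by readAnnotate.

```cpp
#include <cassert>
#include <string>

// Stand-in for the facts toObfuscate reads from an llvm::Function.
struct FunctionInfo {
    bool isDeclaration;        // f->isDeclaration()
    bool availableExternally;  // f->hasAvailableExternallyLinkage()
    std::string annotations;   // what readAnnotate(f) would return
};

// Mirrors the order of checks described above.
bool shouldObfuscate(bool flag, const FunctionInfo &f, const std::string &attr) {
    const std::string attrNo = "no" + attr;
    if (f.isDeclaration) return false;        // nothing to transform
    if (f.availableExternally) return false;  // body is not owned by this module
    // "no<attr>" must be tested first, because "nobcf" also contains "bcf".
    if (f.annotations.find(attrNo) != std::string::npos) return false;
    if (f.annotations.find(attr) != std::string::npos) return true;
    return flag;  // no annotation: the pass-wide flag decides
}
```

With this ordering, a "nobcf" annotation wins over a true flag, and a "bcf" annotation wins over a false flag.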

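5. BogusControlFlow (/lib/Transforms/Obfuscation/BogusControlFlow.cpp) --> bogus(Function &F)

Continuing from step 3: once the checks confirm that BCF obfuscation is needed, bogus is called first and then doF. bogus performs the actual BCF obfuscation, while doF mainly replaces the always-true conditions in the module. The code of bogus is analyzed piece by piece below:

5.1 Statistics and debug information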

void bogus(Function &F) {
      // For statistics and debug
      ++NumFunction;
      int NumBasicBlocks = 0;
      bool firstTime = true; // First time we do the loop in this function
      bool hasBeenModified = false;
      DEBUG_WITH_TYPE("opt", errs() << "bcf: Started on function " << F.getName() << "\n");
      DEBUG_WITH_TYPE("opt", errs() << "bcf: Probability rate: "<< ObfProbRate<< "\n");
      if(ObfProbRate < 0 || ObfProbRate > 100){
        DEBUG_WITH_TYPE("opt", errs() << "bcf: Incorrect value,"
            << " probability rate set to default value: "
            << defaultObfRate <<" \n");
        ObfProbRate = defaultObfRate;
      }
      DEBUG_WITH_TYPE("opt", errs() << "bcf: How many times: "<< ObfTimes<< "\n");
      if(ObfTimes <= 0){
        DEBUG_WITH_TYPE("opt", errs() << "bcf: Incorrect value,"
            << " must be greater than 1. Set to default: "
            << defaultObfTime <<" \n");
        ObfTimes = defaultObfTime;
      }
      NumTimesOnFunctions = ObfTimes;
      int NumObfTimes = ObfTimes;

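The code above is mainly for statistics and debugging. It also records the requested number of obfuscation rounds in NumObfTimes, which serves as the loop counter later.

5.2 Recording the basic blocks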

do{
          DEBUG_WITH_TYPE("cfg", errs() << "bcf: Function " << F.getName()
              <<", before the pass:\n");
          DEBUG_WITH_TYPE("cfg", F.viewCFG());
          // Put all the function's block in a list
          std::list<BasicBlock *> basicBlocks;
          for (Function::iterator i=F.begin();i!=F.end();++i) {
            basicBlocks.push_back(&*i);
          }
          DEBUG_WITH_TYPE("gen", errs() << "bcf: Iterating on the Function's Basic Blocks\n");

The code above walks all basic blocks of the function and saves them in basicBlocks.

5.3 Adding bogus control flow to a single basic block

          while(!basicBlocks.empty()){
            NumBasicBlocks ++;
            // Basic Blocks' selection
            if((int)llvm::cryptoutils->get_range(100) <= ObfProbRate){
              DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
                  << NumBasicBlocks <<" selected. \n");
              hasBeenModified = true;
              ++NumModifiedBasicBlocks;
              NumAddedBasicBlocks += 3;
              FinalNumBasicBlocks += 3;
              // Add bogus flow to the given Basic Block (see description)
              BasicBlock *basicBlock = basicBlocks.front();
              addBogusFlow(basicBlock, F);
            }
            else{
              DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
                  << NumBasicBlocks <<" not selected.\n");
            }
            // remove the block from the list
            basicBlocks.pop_front();

            if(firstTime){ // first time we iterate on this function
              ++InitNumBasicBlocks;
              ++FinalNumBasicBlocks;
            }
          }

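Each basic block of the function is randomly selected, or not, for obfuscation. A selected block is passed to addBogusFlow, which adds the bogus control flow and is the core of BCF; section 5.3.1 walks through it.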
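To get a feel for how the block count grows, the selection loop can be simulated without LLVM. This is a sketch under stated assumptions: std::mt19937 stands in for llvm::cryptoutils, get_range(100) is assumed to return a value from 0 to 99, every selected block is counted as adding three new blocks (matching the += 3 counters above), and the outer loop is assumed to repeat once per requested round. simulateBlockCount is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Simulates how many basic blocks a function has after `times` rounds of
// BCF, when each listed block is selected with the given percentage.
std::size_t simulateBlockCount(std::size_t initialBlocks, int probRate,
                               int times, unsigned seed) {
    std::mt19937 rng(seed);  // assumption: replaces llvm::cryptoutils
    std::uniform_int_distribution<int> pick(0, 99);
    std::size_t blocks = initialBlocks;
    for (int round = 0; round < times; ++round) {
        // Only blocks that exist at the start of the round are listed.
        const std::size_t snapshot = blocks;
        for (std::size_t i = 0; i < snapshot; ++i) {
            if (pick(rng) <= probRate)
                blocks += 3;  // three blocks are added per selected block
        }
    }
    return blocks;
}
```

With a probability of 100 every block is selected, so each round multiplies the block count by four: one block becomes 4 after one round and 16 after two.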

5.3.1 BogusControlFlow (/lib/Transforms/Obfuscation/BogusControlFlow.cpp) --> addBogusFlow(BasicBlock * basicBlock, Function &F)

The code of addBogusFlow is analyzed part by part below:

1) Split the basic block

      BasicBlock::iterator i1 = basicBlock->begin();
      if(basicBlock->getFirstNonPHIOrDbgOrLifetime())
        i1 = (BasicBlock::iterator)basicBlock->getFirstNonPHIOrDbgOrLifetime();
      Twine *var;
      var = new Twine("originalBB");
      BasicBlock *originalBB = basicBlock->splitBasicBlock(i1, *var);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: First and original basic blocks: ok\n");

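The code above splits the current basic block. After the split, the first block holds only PHI nodes, debug info, and lifetime markers, and the second block (named originalBB) holds the remaining instructions.

2) Create the alteredBB block (a bogus block)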

      Twine * var3 = new Twine("alteredBB");
      BasicBlock *alteredBB = createAlteredBasicBlock(originalBB, *var3, &F);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Altered basic block: ok\n");

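alteredBB is created using originalBB as a template: originalBB is cloned, and junk instructions are then added to the clone. The detailed logic of createAlteredBasicBlock is covered in step 7.

3) Adjust the terminators of basicBlock and alteredBB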

      alteredBB->getTerminator()->eraseFromParent();
      basicBlock->getTerminator()->eraseFromParent();
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator removed from the altered"
          <<" and first basic blocks\n");

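This erases the terminator instruction (the instruction that ends a block, such as a return or a branch) from alteredBB (the bogus block created above) and from basicBlock (the block holding only PHI nodes and debug info). The point is to detach both blocks from their original successors.

4) Create an always-true compare instruction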

      Value * LHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
      Value * RHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Value LHS and RHS created\n");

      // The always true condition. End of the first block
      Twine * var4 = new Twine("condition");
      FCmpInst * condition = new FCmpInst(*basicBlock, FCmpInst::FCMP_TRUE , LHS, RHS, *var4);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Always true condition created\n");

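The comparison above takes the floating-point constants 1.0 and 1.0 as operands and uses the FCMP_TRUE predicate, which evaluates to true regardless of its operands. condition is the resulting compare instruction.

5) Wire up the jumps between basicBlock, originalBB, and alteredBB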

      // Jump to the original basic block if the condition is true or
      // to the altered block if false.
      BranchInst::Create(originalBB, alteredBB, (Value *)condition, basicBlock);
      DEBUG_WITH_TYPE("gen",
          errs() << "bcf: Terminator instruction in first basic block: ok\n");

      // The altered block loop back on the original one.
      BranchInst::Create(originalBB, alteredBB);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator instruction in altered block: ok\n");

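The always-true comparison is used to create a conditional branch at the end of basicBlock: if the condition is true, control goes from basicBlock to originalBB; if it is false, control goes to alteredBB (which in practice never happens).

An unconditional branch is then inserted at the end of alteredBB so that it jumps to originalBB.

6) Split originalBB again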

      BasicBlock::iterator i = originalBB->end();

      // Split at this point (we only want the terminator in the second part)
      Twine * var5 = new Twine("originalBBpart2");
      BasicBlock * originalBBpart2 = originalBB->splitBasicBlock(--i , *var5);
      DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator part of the original basic block"
          << " is isolated\n");
      // the first part go either on the return statement or on the begining
      // of the altered block.. So we erase the terminator created when splitting.
      originalBB->getTerminator()->eraseFromParent();
      // We add at the end a new always true condition
      Twine * var6 = new Twine("condition2");
      FCmpInst * condition2 = new FCmpInst(*originalBB, CmpInst::FCMP_TRUE , LHS, RHS, *var6);
      BranchInst::Create(originalBBpart2, alteredBB, (Value *)condition2, originalBB);

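The terminator at the end of originalBB is split off into originalBBpart2, and a conditional branch is added at the end of the remaining originalBB: if the condition is true it jumps to originalBBpart2, and if it is false it jumps to alteredBB. Because the compare instruction uses the FCMP_TRUE predicate (here on the constants 1.0 and 1.0), the condition is always true, so control only ever flows from originalBB to originalBBpart2.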
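The resulting shape can be written down as a tiny graph and checked. The sketch below is a plain C++ model, not LLVM code: block names are strings, bogusCfg and reachable are hypothetical helpers, and successor 0 of a two-way branch is taken to be the "true" target, as in the BranchInst::Create calls above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Successor lists after addBogusFlow; index 0 is the "true" target.
using Cfg = std::map<std::string, std::vector<std::string>>;

Cfg bogusCfg() {
    return {
        {"basicBlock",      {"originalBB", "alteredBB"}},
        {"alteredBB",       {"originalBB"}},
        {"originalBB",      {"originalBBpart2", "alteredBB"}},
        {"originalBBpart2", {}},
    };
}

// Blocks reachable from `entry`. With onlyTakenEdges set, a two-way branch
// follows only successor 0, modelling a condition that is always true.
std::set<std::string> reachable(const Cfg &cfg, const std::string &entry,
                                bool onlyTakenEdges) {
    std::set<std::string> seen;
    std::vector<std::string> work{entry};
    while (!work.empty()) {
        std::string bb = work.back();
        work.pop_back();
        if (!seen.insert(bb).second) continue;  // already visited
        const std::vector<std::string> &succ = cfg.at(bb);
        for (std::size_t i = 0; i < succ.size(); ++i) {
            if (onlyTakenEdges && succ.size() == 2 && i == 1) continue;
            work.push_back(succ[i]);
        }
    }
    return seen;
}
```

Statically all four blocks are connected, and alteredBB even forms a loop with originalBB, which is what a disassembler sees. Following only the edges that are actually taken, alteredBB is never reached.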

That is the core logic for obfuscating a single basic block. The logic of doF is described next.

6. BogusControlFlow (/lib/Transforms/Obfuscation/BogusControlFlow.cpp) --> doF(Module &M)

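doF finds every always-true comparison in the module (usually the current file); step 5 above creates two of them for each obfuscated basic block. It then replaces each one with the following expression:

(y < 10 || x * (x - 1) % 2 == 0)

This expression is also always true, because the product of two consecutive integers is even; it is just harder to recognize than the plain comparison of 1.0 with 1.0. The code of doF breaks down as follows:

1) Create two global variables x and y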

      Twine * varX = new Twine("x");
      Twine * varY = new Twine("y");
      Value * x1 =ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);
      Value * y1 =ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);

      GlobalVariable 	* x = new GlobalVariable(M, Type::getInt32Ty(M.getContext()), false,
          GlobalValue::CommonLinkage, (Constant * )x1,
          *varX);
      GlobalVariable 	* y = new GlobalVariable(M, Type::getInt32Ty(M.getContext()), false,
          GlobalValue::CommonLinkage, (Constant * )y1,
          *varY);


2) Find all always-true conditions

      std::vector<Instruction*> toEdit, toDelete;
      BinaryOperator *op,*op1 = NULL;
      LoadInst * opX , * opY;
      ICmpInst * condition, * condition2;
      // Looking for the conditions and branches to transform
      for(Module::iterator mi = M.begin(), me = M.end(); mi != me; ++mi){
        for(Function::iterator fi = mi->begin(), fe = mi->end(); fi != fe; ++fi){
          //fi->setName("");
          TerminatorInst * tbb= fi->getTerminator();
          if(tbb->getOpcode() == Instruction::Br){
            BranchInst * br = (BranchInst *)(tbb);
            if(br->isConditional()){
              FCmpInst * cond = (FCmpInst *)br->getCondition();
              unsigned opcode = cond->getOpcode();
              if(opcode == Instruction::FCmp){
                if (cond->getPredicate() == FCmpInst::FCMP_TRUE){
                  DEBUG_WITH_TYPE("gen",
                      errs()<<"bcf: an always true predicate !\n");
                  toDelete.push_back(cond); // The condition
                  toEdit.push_back(tbb);    // The branch using the condition
                }
              }
            }
          }
          /*
          for (BasicBlock::iterator bi = fi->begin(), be = fi->end() ; bi != be; ++bi){
            bi->setName(""); // setting the basic blocks' names
          }
          */
        }
      }

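The code above is straightforward: it loops over every basic block in the Module, looks at conditional branches whose condition is an FCmp with the FCMP_TRUE predicate, and records the condition in toDelete and the branch in toEdit.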
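The search only records what it finds and leaves the rewriting to later loops, presumably so that nothing is modified while the blocks are still being walked. A minimal model of that collect-then-edit pattern, with hypothetical types standing in for LLVM's branch and compare instructions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a block terminator.
enum class Pred { None, FCmpTrue, Other, Opaque };

struct Branch {
    bool conditional;
    Pred pred;  // predicate of the branch condition, None if unconditional
};

// Phase 1: record the indices of conditional branches on an always-true
// comparison, without changing anything yet.
std::vector<std::size_t> findAlwaysTrue(const std::vector<Branch> &terminators) {
    std::vector<std::size_t> toEdit;
    for (std::size_t i = 0; i < terminators.size(); ++i) {
        const Branch &br = terminators[i];
        if (br.conditional && br.pred == Pred::FCmpTrue)
            toEdit.push_back(i);
    }
    return toEdit;
}

// Phase 2: rewrite only the recorded branches.
void rewrite(std::vector<Branch> &terminators,
             const std::vector<std::size_t> &toEdit) {
    for (std::size_t idx : toEdit)
        terminators[idx].pred = Pred::Opaque;  // stands for the new predicate
}
```

After rewrite runs, a second call to findAlwaysTrue finds nothing, which corresponds to a module with no FCMP_TRUE branches left.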

3) Replace the expression

      // Replacing all the branches we found
      for(std::vector<Instruction*>::iterator i =toEdit.begin();i!=toEdit.end();++i){
        //if y < 10 || x*(x-1) % 2 == 0
        opX = new LoadInst ((Value *)x, "", (*i));
        opY = new LoadInst ((Value *)y, "", (*i));

        op = BinaryOperator::Create(Instruction::Sub, (Value *)opX,
            ConstantInt::get(Type::getInt32Ty(M.getContext()), 1,
              false), "", (*i));
        op1 = BinaryOperator::Create(Instruction::Mul, (Value *)opX, op, "", (*i));
        op = BinaryOperator::Create(Instruction::URem, op1,
            ConstantInt::get(Type::getInt32Ty(M.getContext()), 2,
              false), "", (*i));
        condition = new ICmpInst((*i), ICmpInst::ICMP_EQ, op,
            ConstantInt::get(Type::getInt32Ty(M.getContext()), 0,
              false));
        condition2 = new ICmpInst((*i), ICmpInst::ICMP_SLT, opY,
            ConstantInt::get(Type::getInt32Ty(M.getContext()), 10,
              false));
        op1 = BinaryOperator::Create(Instruction::Or, (Value *)condition,
            (Value *)condition2, "", (*i));

        BranchInst::Create(((BranchInst*)*i)->getSuccessor(0),
            ((BranchInst*)*i)->getSuccessor(1),(Value *) op1,
            ((BranchInst*)*i)->getParent());
        DEBUG_WITH_TYPE("gen", errs() << "bcf: Erase branch instruction:"
            << *((BranchInst*)*i) << "\n");
        (*i)->eraseFromParent(); // erase the branch
      }

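The code above builds the expression y < 10 || x*(x-1) % 2 == 0 in front of instruction i, creates a new conditional branch on it with the same two successors, and then erases the old branch.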
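That the predicate really is always true, including under 32-bit wrap-around, can be checked with a short standalone sketch. opaquePredicate and holdsForSamples are hypothetical names; unsigned arithmetic is used to model the wrapping i32 multiply and the URem in the generated IR.

```cpp
#include <cassert>
#include <cstdint>

// Models the generated condition: y < 10 || x*(x-1) % 2 == 0.
bool opaquePredicate(std::uint32_t x, std::int32_t y) {
    // Unsigned arithmetic wraps modulo 2^32, like the i32 Sub and Mul.
    std::uint32_t product = x * (x - 1u);
    return y < 10 || product % 2u == 0u;
}

// Checks the right-hand side alone by picking y so that y < 10 is false.
bool holdsForSamples() {
    const std::uint32_t edge[] = {0u, 1u, 2u, 3u,
                                  0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
    for (std::uint32_t x : edge)
        if (!opaquePredicate(x, 1000)) return false;
    for (std::uint32_t x = 0; x < 100000u; ++x)
        if (!opaquePredicate(x, 1000)) return false;
    return true;
}
```

One of x and x-1 is always even, so the low bit of the product is 0 even after wrap-around. Since the globals x and y both start at 0 in the code above, the left-hand side y < 10 is true at run time as well.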

4) Remove the original conditions

      // Erase all the associated conditions we found
      for(std::vector<Instruction*>::iterator i =toDelete.begin();i!=toDelete.end();++i){
        DEBUG_WITH_TYPE("gen", errs() << "bcf: Erase condition instruction:"
            << *((Instruction*)*i)<< "\n");
        (*i)->eraseFromParent();
      }

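That is the basic code logic of the BCF transformation in OLLVM. An official before-and-after diagram of the transformation is attached.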
