For a code-protection project I spent some time studying the three protection schemes in OLLVM: BCF (Bogus Control Flow), FLA (Control Flow Flattening), and SUB (Instructions Substitution). This post covers BCF.
1. The BCF header file exposes the following two functions:
// Namespace
namespace llvm {
Pass *createBogus ();
Pass *createBogus (bool flag);
}
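Both functions create the BCF pass. They differ mainly in the flag that is later handed to toObfuscate, which decides whether the current function should receive BCF protection.
2. In OLLVM, the BCF pass is managed by the PassManager. For how the BCF pass is added and invoked, see: //need add
3. The entry point of BCF is the runOnFunction function, shown below: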
virtual bool runOnFunction(Function &F){
// Check if the percentage is correct
if (ObfTimes <= 0) {
errs()<<"BogusControlFlow application number -bcf_loop=x must be x > 0";
return false;
}
// Check if the number of applications is correct
if ( !((ObfProbRate > 0) && (ObfProbRate <= 100)) ) {
errs()<<"BogusControlFlow application basic blocks percentage -bcf_prob=x must be 0 < x <= 100";
return false;
}
// If fla annotations
if(toObfuscate(flag,&F,"bcf")) {
bogus(F);
doF(*F.getParent());
return true;
}
return false;
} // end of runOnFunction()
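1) It first checks whether ObfTimes is less than or equal to 0; if so, no obfuscation is performed. ObfTimes is the number of times the obfuscation is applied to a function, and can be set at compile time with the following argument: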
-mllvm -bcf_loop=3
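This applies BCF three times to each function.
2) It then checks that ObfProbRate is greater than 0 and at most 100; if not, no obfuscation is performed. ObfProbRate is the probability that each basic block is obfuscated, and can be set at compile time with the following argument: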
-mllvm -bcf_prob=40
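This gives each basic block a 40% chance of being obfuscated by BCF.
3) toObfuscate then checks the flag passed in when the BCF pass was created, together with the annotations on the function to be protected. If the check passes, BCF obfuscation starts. Step 4 below analyzes toObfuscate, and step 5 analyzes the BCF obfuscation steps.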
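The two option checks at the top of runOnFunction can be sketched as a small standalone function. This is only an illustration: BcfOptions and validateBcfOptions are hypothetical names, and the real pass reads the values from command-line options rather than from a struct.

```cpp
#include <string>

// Hypothetical stand-in for the two command-line options
// (-mllvm -bcf_loop=N and -mllvm -bcf_prob=N).
struct BcfOptions {
    int obfTimes;     // value of -bcf_loop
    int obfProbRate;  // value of -bcf_prob
};

// Returns an empty string when the options are acceptable, otherwise the
// reason the pass would refuse to run (mirrors the two early returns).
std::string validateBcfOptions(const BcfOptions &opts) {
    if (opts.obfTimes <= 0)
        return "bcf_loop must be > 0";
    if (!(opts.obfProbRate > 0 && opts.obfProbRate <= 100))
        return "bcf_prob must satisfy 0 < x <= 100";
    return "";
}
```

With this model, {3, 40} passes, while {0, 40} and {1, 0} are rejected before any obfuscation happens.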
4. Utils(/lib/Transforms/Obfuscation/Utils.cpp)-->toObfuscate
Its prototype is:
bool toObfuscate(bool flag, Function *f, std::string attribute)
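It takes three parameters. The first, flag, is the value passed in when the BCF pass was created (see step 1); if no flag was passed, it defaults to false. The second is a pointer to the function currently being analyzed; note that this is the function in its IR form. The third is the annotation string (which can be added in source code with the attribute keyword). Overall the function has two stages, annotation analysis and flag analysis:
4.1. Annotation analysis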
std::string attr = attribute;
std::string attrNo = "no" + attr;
// Check if declaration
if (f->isDeclaration()) {
return false;
}
// Check external linkage
if(f->hasAvailableExternallyLinkage() != 0) {
return false;
}
// We have to check the nobcf flag first
// Because .find("bcf") is true for a string like "bcf" or
// "nobcf"
if (readAnnotate(f).find(attrNo) != std::string::npos) {
return false;
}
// If bcf annotations
if (readAnnotate(f).find(attr) != std::string::npos) {
return true;
}
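It first checks whether the current function is only a declaration; if so it returns false, meaning no BCF protection.
It then checks whether the function has available_externally linkage (its body is visible for optimization but is not emitted in this module); if so it returns false.
Next it reads the annotations on the function; if 'nobcf' is found, it returns false.
If 'bcf' is found in the annotations, it returns true.
4.2. Flag analysis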
// If bcf flag is set
if (flag == true) {
return true;
}
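If none of the checks above has returned (so the function is a real definition, does not have available_externally linkage, and carries no matching annotation), the flag is checked: if it is true the function returns true, otherwise false.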
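The decision order of toObfuscate can be modelled without LLVM. In the sketch below, FunctionInfo and shouldObfuscate are hypothetical names; the struct keeps only the properties that the real check reads from the function, and annotations stands for whatever readAnnotate(f) would return.

```cpp
#include <string>

// Hypothetical stand-in for llvm::Function, reduced to what the check reads.
struct FunctionInfo {
    bool isDeclaration;
    bool availableExternally;
    std::string annotations;  // what readAnnotate(f) would return
};

bool shouldObfuscate(bool flag, const FunctionInfo &f, const std::string &attr) {
    if (f.isDeclaration) return false;
    if (f.availableExternally) return false;
    // "no" + attr must be tested first: "nobcf" also contains "bcf".
    if (f.annotations.find("no" + attr) != std::string::npos) return false;
    if (f.annotations.find(attr) != std::string::npos) return true;
    return flag;  // no annotation: fall back to the pass-wide flag
}
```

The ordering matters: a function annotated with nobcf is skipped even when the flag is true, and a function annotated with bcf is obfuscated even when the flag is false.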
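5. BogusControlFlow(/lib/Transforms/Obfuscation/BogusControlFlow.cpp)-->bogus(Function &F)
Continuing from step 3: once the checks confirm that BCF should be applied, bogus is called first and then doF. bogus performs the actual BCF obfuscation, while doF replaces the always-true conditions in the module. bogus is analyzed piece by piece below:
5.1. Statistics and debug information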
void bogus(Function &F) {
// For statistics and debug
++NumFunction;
int NumBasicBlocks = 0;
bool firstTime = true; // First time we do the loop in this function
bool hasBeenModified = false;
DEBUG_WITH_TYPE("opt", errs() << "bcf: Started on function " << F.getName() << "\n");
DEBUG_WITH_TYPE("opt", errs() << "bcf: Probability rate: "<< ObfProbRate<< "\n");
if(ObfProbRate < 0 || ObfProbRate > 100){
DEBUG_WITH_TYPE("opt", errs() << "bcf: Incorrect value,"
<< " probability rate set to default value: "
<< defaultObfRate <<" \n");
ObfProbRate = defaultObfRate;
}
DEBUG_WITH_TYPE("opt", errs() << "bcf: How many times: "<< ObfTimes<< "\n");
if(ObfTimes <= 0){
DEBUG_WITH_TYPE("opt", errs() << "bcf: Incorrect value,"
<< " must be greater than 1. Set to default: "
<< defaultObfTime <<" \n");
ObfTimes = defaultObfTime;
}
NumTimesOnFunctions = ObfTimes;
int NumObfTimes = ObfTimes;
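This code is mainly for statistics and debugging. It also records the requested number of obfuscation rounds in NumObfTimes, which serves as the loop counter later on.
5.2. Recording the basic blocks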
do{
DEBUG_WITH_TYPE("cfg", errs() << "bcf: Function " << F.getName()
<<", before the pass:\n");
DEBUG_WITH_TYPE("cfg", F.viewCFG());
// Put all the function's block in a list
std::list<BasicBlock *> basicBlocks;
for (Function::iterator i=F.begin();i!=F.end();++i) {
basicBlocks.push_back(&*i);
}
DEBUG_WITH_TYPE("gen", errs() << "bcf: Iterating on the Function's Basic Blocks\n");
This code iterates over all basic blocks of the function and stores them in basicBlocks.
5.3. Adding bogus control flow to a single basic block
while(!basicBlocks.empty()){
NumBasicBlocks ++;
// Basic Blocks' selection
if((int)llvm::cryptoutils->get_range(100) <= ObfProbRate){
DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
<< NumBasicBlocks <<" selected. \n");
hasBeenModified = true;
++NumModifiedBasicBlocks;
NumAddedBasicBlocks += 3;
FinalNumBasicBlocks += 3;
// Add bogus flow to the given Basic Block (see description)
BasicBlock *basicBlock = basicBlocks.front();
addBogusFlow(basicBlock, F);
}
else{
DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
<< NumBasicBlocks <<" not selected.\n");
}
// remove the block from the list
basicBlocks.pop_front();
if(firstTime){ // first time we iterate on this function
++InitNumBasicBlocks;
++FinalNumBasicBlocks;
}
}
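Each basic block in the function is randomly selected for obfuscation or skipped. If it is selected, addBogusFlow is called to add the bogus control flow. addBogusFlow is the core of BCF and is detailed in 5.3.1.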
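The selection loop can be sketched as follows. selectBlocks is a hypothetical helper, and roll stands in for cryptoutils->get_range(100), which is assumed here to return a value from 0 to 99; the random source is injected so the behaviour can be checked deterministically.

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <vector>

// Returns the indices of the blocks that would be handed to addBogusFlow.
// `roll` is a hypothetical replacement for cryptoutils->get_range(100).
std::vector<int> selectBlocks(int blockCount, int probRate,
                              const std::function<std::uint32_t()> &roll) {
    // Snapshot first: the pass adds new blocks while it works, so it walks
    // a copy of the block list rather than the live function.
    std::list<int> worklist;
    for (int i = 0; i < blockCount; ++i) worklist.push_back(i);

    std::vector<int> selected;
    while (!worklist.empty()) {
        // Same comparison as the pass: a roll equal to probRate also selects.
        if (static_cast<int>(roll()) <= probRate)
            selected.push_back(worklist.front());
        worklist.pop_front();
    }
    return selected;
}
```

Under the assumption that the roll is uniform over 0 to 99, the <= comparison means that -bcf_prob=40 selects a block for 41 of the 100 possible rolls, so the effective rate is slightly above the nominal one.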
5.3.1. BogusControlFlow(/lib/Transforms/Obfuscation/BogusControlFlow.cpp)-->addBogusFlow(BasicBlock * basicBlock, Function &F)
The code of addBogusFlow, section by section:
1) Splitting the basic block
BasicBlock::iterator i1 = basicBlock->begin();
if(basicBlock->getFirstNonPHIOrDbgOrLifetime())
i1 = (BasicBlock::iterator)basicBlock->getFirstNonPHIOrDbgOrLifetime();
Twine *var;
var = new Twine("originalBB");
BasicBlock *originalBB = basicBlock->splitBasicBlock(i1, *var);
DEBUG_WITH_TYPE("gen", errs() << "bcf: First and original basic blocks: ok\n");
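This code splits the current basic block. After the split, the first block contains only PHI nodes, debug info, and lifetime intrinsics, and the second block (named originalBB) holds the remaining instructions.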
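What the split does can be illustrated with a toy model in which a block is just a list of instruction names. Block and splitAt are hypothetical and are not LLVM's API; the sketch only mirrors the documented effect of splitBasicBlock, which moves the instructions from the split point onward into a new block and ends the old block with an unconditional branch to it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a basic block: a list of instruction names.
using Block = std::vector<std::string>;

// Moves everything from index `at` to the end into a new block and appends
// an unconditional branch to the block that stays behind.
Block splitAt(Block &block, std::size_t at, const std::string &newName) {
    auto pos = block.begin() + static_cast<std::ptrdiff_t>(at);
    Block tail(pos, block.end());
    block.erase(pos, block.end());
    block.push_back("br " + newName);
    return tail;
}
```

Splitting {"phi", "add", "ret"} at index 1 leaves {"phi", "br originalBB"} in the first block and returns {"add", "ret"} as the new one. The appended branch is the terminator that addBogusFlow later erases from basicBlock.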
2) Creating the alteredBB block (a bogus block)
Twine * var3 = new Twine("alteredBB");
BasicBlock *alteredBB = createAlteredBasicBlock(originalBB, *var3, &F);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Altered basic block: ok\n");
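alteredBB is created using originalBB as a template: originalBB is cloned, and junk instructions are then added to the clone. The detailed logic of createAlteredBasicBlock is covered in step 7.
3) Adjusting the terminators of basicBlock and alteredBB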
alteredBB->getTerminator()->eraseFromParent();
basicBlock->getTerminator()->eraseFromParent();
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator removed from the altered"
<<" and first basic blocks\n");
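The terminator instructions at the end of alteredBB (the bogus block created above) and basicBlock (the block that holds only PHI nodes and debug info) are erased from their blocks. A terminator is the instruction that ends a block, such as a return or a branch; erasing it detaches the block from its original successors.
4) Creating an always-true comparison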
Value * LHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
Value * RHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Value LHS and RHS created\n");
// The always true condition. End of the first block
Twine * var4 = new Twine("condition");
FCmpInst * condition = new FCmpInst(*basicBlock, FCmpInst::FCMP_TRUE , LHS, RHS, *var4);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Always true condition created\n");
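This is a comparison between the floating-point constants 1.0 and 1.0 with the predicate FCMP_TRUE, which evaluates to true regardless of its operands. condition is the resulting compare instruction.
5) Wiring up the branches between basicBlock, originalBB, and alteredBB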
// Jump to the original basic block if the condition is true or
// to the altered block if false.
BranchInst::Create(originalBB, alteredBB, (Value *)condition, basicBlock);
DEBUG_WITH_TYPE("gen",
errs() << "bcf: Terminator instruction in first basic block: ok\n");
// The altered block loop back on the original one.
BranchInst::Create(originalBB, alteredBB);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator instruction in altered block: ok\n");
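A conditional branch on the always-true comparison is appended to basicBlock: if the condition is true, control goes from basicBlock to originalBB; if false, to alteredBB (which never actually happens).
An unconditional branch to originalBB is then appended to alteredBB.
6) Splitting originalBB again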
BasicBlock::iterator i = originalBB->end();
// Split at this point (we only want the terminator in the second part)
Twine * var5 = new Twine("originalBBpart2");
BasicBlock * originalBBpart2 = originalBB->splitBasicBlock(--i , *var5);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator part of the original basic block"
<< " is isolated\n");
// the first part go either on the return statement or on the begining
// of the altered block.. So we erase the terminator created when splitting.
originalBB->getTerminator()->eraseFromParent();
// We add at the end a new always true condition
Twine * var6 = new Twine("condition2");
FCmpInst * condition2 = new FCmpInst(*originalBB, CmpInst::FCMP_TRUE , LHS, RHS, *var6);
BranchInst::Create(originalBBpart2, alteredBB, (Value *)condition2, originalBB);
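The terminator at the end of originalBB is split off into originalBBpart2, and a conditional branch is appended to the shortened originalBB: if the condition is true it jumps to originalBBpart2, and if false to alteredBB. Because the comparison uses the FCMP_TRUE predicate, it is always true, so at run time control only ever goes from originalBB to originalBBpart2.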
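The shape of the control flow produced for one obfuscated basic block can be written down as a small successor table. This is a sketch only: the block names follow the code above, "exit" is a placeholder for wherever the original terminator led (it may have no successor at all if it was a return), and bogusFlowShape and runtimePath are hypothetical helpers.

```cpp
#include <map>
#include <string>
#include <vector>

using Cfg = std::map<std::string, std::vector<std::string>>;

// Successor lists after addBogusFlow. For conditional branches the first
// successor is the "true" target.
Cfg bogusFlowShape() {
    return {
        {"basicBlock",      {"originalBB", "alteredBB"}},       // condition
        {"originalBB",      {"originalBBpart2", "alteredBB"}},  // condition2
        {"alteredBB",       {"originalBB"}},                    // unconditional
        {"originalBBpart2", {"exit"}},                          // old terminator
    };
}

// Follows only the first ("true") edge, which is what happens at run time
// because both conditions are always true.
std::vector<std::string> runtimePath(const Cfg &cfg) {
    std::vector<std::string> path;
    std::string cur = "basicBlock";
    while (cfg.count(cur)) {
        path.push_back(cur);
        cur = cfg.at(cur).front();
    }
    return path;
}
```

The run-time path is basicBlock, originalBB, originalBBpart2. alteredBB is reachable in the graph, and it even forms a loop with originalBB, but it is never executed; that apparent extra flow is what a static analysis has to untangle.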
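That is the core logic for obfuscating a single basic block. The doF function is described next.
6. BogusControlFlow(/lib/Transforms/Obfuscation/BogusControlFlow.cpp)-->doF(Module &M)
doF finds every always-true comparison in the module (usually the current file; step 5 created two for each obfuscated basic block) and replaces it with the following expression:
(y < 10 || x * (x - 1) % 2 == 0)
This expression is also always true, because the product of two consecutive integers is even; it is simply harder to recognize than the plain always-true comparison of 1.0 and 1.0. The code of doF works as follows:
1) Creating two global variables x and y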
Twine * varX = new Twine("x");
Twine * varY = new Twine("y");
Value * x1 =ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);
Value * y1 =ConstantInt::get(Type::getInt32Ty(M.getContext()), 0, false);
GlobalVariable * x = new GlobalVariable(M, Type::getInt32Ty(M.getContext()), false,
GlobalValue::CommonLinkage, (Constant * )x1,
*varX);
GlobalVariable * y = new GlobalVariable(M, Type::getInt32Ty(M.getContext()), false,
GlobalValue::CommonLinkage, (Constant * )y1,
*varY);
2) Finding all always-true conditions
std::vector<Instruction*> toEdit, toDelete;
BinaryOperator *op,*op1 = NULL;
LoadInst * opX , * opY;
ICmpInst * condition, * condition2;
// Looking for the conditions and branches to transform
for(Module::iterator mi = M.begin(), me = M.end(); mi != me; ++mi){
for(Function::iterator fi = mi->begin(), fe = mi->end(); fi != fe; ++fi){
//fi->setName("");
TerminatorInst * tbb= fi->getTerminator();
if(tbb->getOpcode() == Instruction::Br){
BranchInst * br = (BranchInst *)(tbb);
if(br->isConditional()){
FCmpInst * cond = (FCmpInst *)br->getCondition();
unsigned opcode = cond->getOpcode();
if(opcode == Instruction::FCmp){
if (cond->getPredicate() == FCmpInst::FCMP_TRUE){
DEBUG_WITH_TYPE("gen",
errs()<<"bcf: an always true predicate !\n");
toDelete.push_back(cond); // The condition
toEdit.push_back(tbb); // The branch using the condition
}
}
}
}
/*
for (BasicBlock::iterator bi = fi->begin(), be = fi->end() ; bi != be; ++bi){
bi->setName(""); // setting the basic blocks' names
}
*/
}
}
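This part is straightforward: it loops over every basic block in the module and collects the conditional branches whose condition is an FCMP_TRUE comparison, together with those comparisons.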
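The scan follows a collect-first, rewrite-later pattern, which avoids modifying blocks while they are being iterated. A minimal model of the filter, using hypothetical types (Pred, Terminator, findBogusBranches) rather than LLVM's classes, might look like this:

```cpp
#include <cstddef>
#include <vector>

// Toy terminator description; hypothetical, not LLVM's classes.
enum class Pred { FcmpTrue, Other };
struct Terminator {
    bool isBranch;
    bool isConditional;
    Pred condition;  // only meaningful when isConditional is true
};

// Returns the indices of the terminators that would be rewritten:
// conditional branches whose condition is an always-true comparison.
std::vector<std::size_t> findBogusBranches(const std::vector<Terminator> &terms) {
    std::vector<std::size_t> toEdit;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        const Terminator &t = terms[i];
        if (t.isBranch && t.isConditional && t.condition == Pred::FcmpTrue)
            toEdit.push_back(i);  // collect now, rewrite in a later loop
    }
    return toEdit;
}
```

Unconditional branches, non-branch terminators, and conditional branches on any other predicate are all left alone.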
3) Replacing the expression
// Replacing all the branches we found
for(std::vector<Instruction*>::iterator i =toEdit.begin();i!=toEdit.end();++i){
//if y < 10 || x*(x-1) % 2 == 0
opX = new LoadInst ((Value *)x, "", (*i));
opY = new LoadInst ((Value *)y, "", (*i));
op = BinaryOperator::Create(Instruction::Sub, (Value *)opX,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 1,
false), "", (*i));
op1 = BinaryOperator::Create(Instruction::Mul, (Value *)opX, op, "", (*i));
op = BinaryOperator::Create(Instruction::URem, op1,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 2,
false), "", (*i));
condition = new ICmpInst((*i), ICmpInst::ICMP_EQ, op,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 0,
false));
condition2 = new ICmpInst((*i), ICmpInst::ICMP_SLT, opY,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 10,
false));
op1 = BinaryOperator::Create(Instruction::Or, (Value *)condition,
(Value *)condition2, "", (*i));
BranchInst::Create(((BranchInst*)*i)->getSuccessor(0),
((BranchInst*)*i)->getSuccessor(1),(Value *) op1,
((BranchInst*)*i)->getParent());
DEBUG_WITH_TYPE("gen", errs() << "bcf: Erase branch instruction:"
<< *((BranchInst*)*i) << "\n");
(*i)->eraseFromParent(); // erase the branch
}
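For each collected branch i, this code builds the expression y < 10 || x*(x-1) % 2 == 0 just before i, creates a new conditional branch on it with the same two successors, and erases the old branch.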
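The instruction sequence can be mirrored in plain C++ to see why the result is always true. opaquePredicate is a hypothetical name; x and y stand for the values loaded from the two globals, and unsigned 32-bit arithmetic is used so that wraparound behaves like the i32 operations.

```cpp
#include <cstdint>

// Mirrors the instructions built in the loop, one line per instruction.
bool opaquePredicate(std::uint32_t x, std::int32_t y) {
    std::uint32_t sub = x - 1u;       // Instruction::Sub
    std::uint32_t mul = x * sub;      // Instruction::Mul (wraps modulo 2^32)
    std::uint32_t rem = mul % 2u;     // Instruction::URem
    bool isEven = (rem == 0u);        // ICmpInst::ICMP_EQ
    bool ySmall = (y < 10);           // ICmpInst::ICMP_SLT
    return isEven || ySmall;          // Instruction::Or
}
```

One of x and x-1 is always even, so their product is even, and wrapping modulo 2^32 does not change the lowest bit. The left operand of the Or is therefore true for every x, and the value of y never matters.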
4) Removing the original conditions
// Erase all the associated conditions we found
for(std::vector<Instruction*>::iterator i =toDelete.begin();i!=toDelete.end();++i){
DEBUG_WITH_TYPE("gen", errs() << "bcf: Erase condition instruction:"
<< *((Instruction*)*i)<< "\n");
(*i)->eraseFromParent();
}
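That is the basic code logic of the BCF transformation in OLLVM. Attached is the official diagram of the control flow before and after the transformation.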