VUzzer: A Detailed Analysis of How It Works

1. Installation (in a VMware 15.01 environment):

Because VUzzer is an old project that is no longer maintained, it needs an old environment: install Ubuntu 14.04 (Linux) and downgrade its kernel to 3.13.0-24, as follows:

#Install the 3.13.0-24 kernel
sudo apt-get install linux-image-3.13.0-24-generic 
#Reboot
sudo reboot

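Then, when the machine boots, press Esc to open the kernel selection menu and choose 3.13.0-24. (Note: the display misbehaves with this kernel. Full-screen mode turns the screen black, so keep the VM in a small window; this problem is still unresolved.) After booting, check the kernel version with uname -r, then uninstall the original kernel: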

sudo apt-get purge linux-image-<version>
sudo apt-get purge linux-headers-<version>

Next, install VUzzer:

#Download the VUzzer source code
git clone https://github.com/vusec/vuzzer 
gcc --version
g++ --version
#Check the gcc and g++ versions; if they are not 4.8, install 4.8 with the following commands
sudo apt-get install gcc-4.8
sudo apt-get install g++-4.8
#Download Pin 2.14 from the official website, then go back to the vuzzer folder and create a link to the Pin directory there
ln -s /path-to-pin-homes pin

python --version
#Check that Python 2.7 is installed; if it is not, install it with
sudo apt-get install python2.7
#Download the EWAHBoolArray source code
git clone https://github.com/lemire/EWAHBoolArray
#Copy the 4 header files in the headers folder of EWAHBoolArray to /usr/include/ (run this inside the EWAHBoolArray directory)
sudo cp headers/* /usr/include/
#Install BitMagic
sudo apt-get install bmagic
#Install BitVector: download it from https://engineering.purdue.edu/kak/dist/BitVector-2.2.html, extract it and run the following command in the BitVector directory
sudo python setup.py install
#Build VUzzer: first go back to the vuzzer folder
export PIN_ROOT=$(pwd)/pin
cd ./support/libdft/src
make clean
#Then go back to the vuzzer folder (cd ../../..)
make support-libdft
make 
make -f mymakefile
#If obj-ia32/dtracker.so and obj-ia32/bbcounts2.so exist, the installation succeeded
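The success check in the last step can be scripted. A minimal sketch, assuming it is run from the vuzzer folder; the two file names come from the step above, while missing_artifacts is a hypothetical helper, not part of VUzzer:

```python
import os

# Build outputs named in the installation steps, relative to the vuzzer folder.
EXPECTED = ["obj-ia32/dtracker.so", "obj-ia32/bbcounts2.so"]


def missing_artifacts(vuzzer_dir):
    """Return the expected build outputs that do not exist under vuzzer_dir."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(vuzzer_dir, p))]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("build incomplete, missing: " + ", ".join(missing))
    else:
        print("VUzzer build looks complete")
```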

2. Using VUzzer

The entry point of VUzzer is runfuzzer.py. Running python runfuzzer.py -h prints the following:

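-s: the command line of the program under test, e.g. -s '/bin/a %s'. Put %s where the input file path goes; VUzzer substitutes its generated inputs there when searching for bugs.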

-i: the directory containing the initial seeds, e.g. -i 'datatemp/a/'. There must be at least three initial seed files.

-w: the .pkl file generated for the program (the basic-block weight file); -n: the .names file generated for the program (the CMP instruction information file).

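-l: the number of binaries to monitor; -o: the base address of the program or library; -b: the name of the library to monitor. Below we show how to test a binary program with VUzzer.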
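How -s is used can be seen later in runfuzzer.py, which builds each command with (config.SUT % tfl).split(' '). The sketch below mirrors a subset of the argparse options defined there, plus that substitution step; parse_args and build_command are hypothetical helper names:

```python
import argparse


def parse_args(argv):
    # A subset of the options runfuzzer.py defines, with the same defaults.
    p = argparse.ArgumentParser(description="VUzzer options (sketch)")
    p.add_argument("-s", "--sut", required=True)
    p.add_argument("-i", "--inputd", required=True)
    p.add_argument("-w", "--weight", required=True)
    p.add_argument("-n", "--name", required=True)
    p.add_argument("-l", "--libnum", type=int, default=1)
    p.add_argument("-o", "--offsets", default="0x00000000")
    p.add_argument("-b", "--libname", default="")
    return p.parse_args(argv)


def build_command(sut, input_path):
    # Put the input file where %s stands, then split on spaces as runfuzzer.py does.
    return (sut % input_path).split(" ")


args = parse_args(["-s", "./bin/a %s", "-i", "datatemp/a/",
                   "-w", "idafiles/a.pkl", "-n", "idafiles/a.names"])
print(build_command(args.sut, "data/ex-1.txt"))
```

For the example in this article it prints ['./bin/a', 'data/ex-1.txt']. Because the command is split on single spaces, input paths that contain spaces would break it.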

First, we write a C program:

#include<stdio.h>
#include<stdlib.h>
int main(int argc,char** argv)
{
	char s[30];
	FILE* fp;
	fp=fopen(argv[1],"r+");
	if(fp==NULL)
	{
		exit(1);
	}
	fscanf(fp,"%29s",s);
	if(s[0]=='W')
	{
		if(s[10]=='A')
		{
			fscanf(fp,"%s",s);
			printf("%s\n",s);
		}
		else
		{
			printf("wrong");
		}
	}
	else
	{
		printf("wrong");
	}
	return 0;
}

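Compile it with gcc a.c -o a to get a 32-bit binary (add -m32 if the system is 64-bit). The program is written this way because VUzzer can only drive the target through its command line and can only feed it input through a file. Note that the second fscanf can overflow: its "%s" has no field width, so a long second token overruns the 30-byte buffer s.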

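Open the binary a in IDA, choose File > Script file and select the BB-weightv4.py script. IDA runs the script and generates the basic-block weight file (.pkl) and the CMP instruction information file (.names). Put the newly generated files in vuzzer/idafiles, put the program a in vuzzer/bin, create a folder a under vuzzer/datatemp and put three initial seed files in it. Then run python runfuzzer.py -s './bin/a %s' -i 'datatemp/a/' -w 'idafiles/a.pkl' -n idafiles/a.names to start VUzzer. The result is shown in the figure below.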
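A minimal sketch for preparing the seed folder. The three seed contents are hypothetical: the article requires at least three seeds, and dry_run() aborts if a seed already crashes the program. One seed here already satisfies s[0]=='W' and s[10]=='A', so it reaches the second fscanf with a short, harmless token:

```python
import os

# Hypothetical seed contents for the test program a.c.
SEEDS = {
    "seed1.txt": b"hello\n",              # fails the s[0]=='W' check
    "seed2.txt": b"W123456789B\n",        # passes s[0]=='W', fails s[10]=='A'
    "seed3.txt": b"W123456789A world\n",  # reaches the second fscanf with a short token
}


def write_seeds(directory):
    """Create `directory` if needed and write the seed files into it."""
    os.makedirs(directory, exist_ok=True)
    for name, data in SEEDS.items():
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)
    return sorted(os.listdir(directory))
```

Calling write_seeds('datatemp/a') from the vuzzer folder creates the three files.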

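Here, all seed files used during testing are stored in the data directory, seeds that cause a crash are stored in outd/crashInputs, some crash records are written to error.log, per-generation statistics are written to stats.log, and the results of the CMP analysis are stored in cmp.out.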

3. How VUzzer works

3.1 Generating the weight file and the CMP information file

def findCMPopnds(): uses the IDA API to find cmp instructions, reads their immediate operands, and returns them as [set(strings), set(characters)].
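The IDA-specific search is not reproduced here. The sketch below only illustrates the final conversion step, under the assumption that each immediate is treated as a 32-bit little-endian value, multi-byte immediates go into the string set and one-byte immediates into the character set; the exact rules in BB-weightv4.py may differ, and immediates_to_sets is a hypothetical name:

```python
import struct


def immediates_to_sets(immediates):
    """Split CMP immediates into [set of multi-byte strings, set of single bytes].

    Assumption for illustration: values are 32-bit, stored little-endian (x86),
    and zero bytes at the high end are dropped.
    """
    strings, chars = set(), set()
    for imm in immediates:
        raw = struct.pack("<I", imm & 0xFFFFFFFF).rstrip(b"\x00")
        if len(raw) == 0:
            continue  # comparisons against 0 carry no useful byte
        elif len(raw) == 1:
            chars.add(raw)
        else:
            strings.add(raw)
    return [strings, chars]


print(immediates_to_sets([0x57, 0x41, 0x464C457F, 0]))
```

For the test program above, cmp instructions against 'W' (0x57) and 'A' (0x41) would likely end up in the character set.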

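def get_children(BB): uses breadth-first search to collect the start addresses of all blocks of the function that are reachable from block BB, and returns them as a list.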
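The same breadth-first traversal, sketched in plain Python on a CFG given as a dictionary of successor lists instead of IDA basic-block objects (the dictionary representation and the addresses are assumptions for illustration):

```python
from collections import deque


def get_children(succs, start):
    """Blocks reachable from `start`, in breadth-first order.

    `succs` maps a block address to its successor addresses. `start` itself is
    only included if a loop leads back to it.
    """
    seen = set()
    order = []
    queue = deque(succs.get(start, []))
    while queue:
        bb = queue.popleft()
        if bb in seen:
            continue
        seen.add(bb)
        order.append(bb)
        queue.extend(succs.get(bb, []))
    return order


# Hypothetical CFG with a loop between 0x20 and 0x40.
CFG = {0x10: [0x20, 0x30], 0x20: [0x40], 0x30: [0x40], 0x40: [0x20]}
print([hex(b) for b in get_children(CFG, 0x10)])
```

In calculate_weight() the result is used to detect loops: if a predecessor pb appears among the children of the current block, the edge pb -> current closes a cycle.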

def calculate_weight(func, fAddr): uses a Markov model over the program's control-flow graph to compute the probability of reaching each block.

def calculate_weight(func, fAddr):
    edges.clear()
    temp = deque([]) #工作隊列
    rootFound= False
    visited=[] # 已經訪問的BBID列表(已計算權重)
    shadow=[]
    noorphan=True
    #首先計算邊的概率
    for block in func:
        pLen=len(list(block.succs()))
        if pLen == 0:
            continue
        eProb=1.0/pLen # a block with n successors gives each outgoing edge probability 1/n
        for succBB in block.succs():
            if (succBB.startEA <= block.startEA) and (len(list(succBB.preds()))>1):
                # backward edge into a block with several predecessors: treated as a loop, edge probability set to 1
                edges[(block.startEA,succBB.startEA)]=1.0
            else:
                # forward edge: probability 1/n
                edges[(block.startEA,succBB.startEA)]=eProb
    print "[*] Finished edge probability calculation"
    #for edg in edges:
        #print " %x -> %x: %3.1f "%(edg[0],edg[1],edges[edg])
    # lets find the root BB
    #orphanage=[]#home for orphan BBs
    orphID=[]
    for block in func:
        if len(list(block.preds())) == 0:
        # Note: this check alone was not working, as there are orphan BBs in the code. Really!!!
            if block.startEA == fAddr:
                rootFound=True
                root = block
            else:
                if rootFound==True:
                    noorphan=False
                    break
                pass
    # Now all the BBs should be children of the root node, and those that are not are orphans. This check is required only if we have orphans.
    if noorphan == False:
        rch=get_children(root)
        rch.append(fAddr)# add root also as a non-orphan BB
        for blk in func:
            if blk.startEA not in rch:
                weight[blk.startEA]=(1.0,blk.endEA)
                visited.append(blk.id)
                orphID.append(blk.id)
        #print "[*] orphanage calculation done."
        del rch
    if rootFound==True:
        #print "[*] found root BB at %x"%(root.startEA,)
        # give the entry block probability 1, then traverse the CFG breadth-first
        weight[root.startEA] = (1.0,root.endEA)
        visited.append(root.id)
        print "[*] Root found. Starting weight calculation."
        for sBlock in root.succs():
            #if sBlock.id not in shadow:
            #print "Pushing successor %x"%(sBlock.startEA,)
            temp.append(sBlock)
            shadow.append(sBlock.id)
        loop=dict()# this is a temp dictionary to avoid get_children() call everytime a BB is analysed.
        while len(temp) > 0:
            current=temp.popleft()
            shadow.remove(current.id)
            print "current: %x"%(current.startEA,)
            if current.id not in loop:
                loop[current.id]=[]
            # we check for orphan BB and give them a lower score
            # by construction and assumptions, this case should not hit!
            if current.id in orphID:
                #weight[current.startEA]=(0.5,current.endEA)
                #visited.append(current.id)
                continue

            tempSum=0.0
            stillNot=False
            chCalculated=False
            for pb in current.preds():
                #print "[*] pred of current %x"%(pb.startEA,)
                if pb.id not in visited:
                    if edges[(pb.startEA,current.startEA)]==0.0:
                        weight[pb.startEA]=(0.5,pb.endEA)
                        #artificial insertion
                        #print "artificial insertion branch"
                        continue
                    # in a loop, a predecessor may not have a probability yet; such a predecessor is given an assumed value of 0.5
                    if pb.id not in [k[0] for k in loop[current.id]]:
                        if chCalculated == False:
                            chCurrent=get_children(current)
                            chCalculated=True
                        # if pb is reachable from current, the edge pb -> current closes a loop
                        if pb.startEA in chCurrent:
                            # this BB is in a loop. we give less score to such BB
                            weight[pb.startEA]=(0.5,pb.endEA)
                            loop[current.id].append((pb.id,True))
                            #print "loop branch"
                            continue
                        else:
                            loop[current.id].append((pb.id,False))
                    else:
                        if (pb.id,True) in loop[current.id]:
                            weight[pb.startEA]=(0.5,pb.endEA)
                            continue
                            
                    #print "not pred %x"%(pb.startEA,)
                    if current.id not in shadow:
                        temp.append(current)
                        #print "pushed back %x"%(current.startEA,)
                        shadow.append(current.id)
                    stillNot=True
                    break
            # all predecessors are known: probability of this block = sum of (predecessor probability * edge probability)
            if stillNot == False:
                # as we sure to get weight for current, we push its successors
                for sb in current.succs():
                    if sb.id in visited:
                        continue
                    if sb.id not in shadow:
                        temp.append(sb)
                        shadow.append(sb.id)
                for pb in current.preds():
                    tempSum = tempSum+ (weight[pb.startEA][0]*edges[(pb.startEA,current.startEA)])
                weight[current.startEA] = (tempSum,current.endEA)
                visited.append(current.id)
                del loop[current.id]
                print "completed %x"%(current.startEA,)
                #print "remaining..."
                #for bs in temp:
                    #print "\t %x"%(bs.startEA,)
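For an acyclic CFG the calculation above reduces to: each edge out of a block with n successors has probability 1/n, the entry block has probability 1, every other block's probability is the sum over its predecessors of (predecessor probability * edge probability), and main() later turns each probability p into the weight 1/p. A minimal sketch on a hypothetical CFG, deliberately leaving out the loop and orphan handling of the real script (block_probabilities is a hypothetical name):

```python
from collections import deque


def block_probabilities(succs, entry):
    """Reach probability of each block in an acyclic CFG {block: [successors]}.

    Unlike calculate_weight(), loops and orphan blocks are not handled: a
    ValueError is raised if some block can never get all its predecessors.
    """
    preds = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(b)
    # A block with n successors gives each outgoing edge probability 1/n.
    edge = {(b, s): 1.0 / len(ss) for b, ss in succs.items() for s in ss}
    prob = {entry: 1.0}
    queue = deque(succs.get(entry, []))
    stalled = 0
    while queue:
        cur = queue.popleft()
        if cur in prob:
            continue
        if any(p not in prob for p in preds[cur]):
            queue.append(cur)  # wait until every predecessor is known
            stalled += 1
            if stalled > len(queue):
                raise ValueError("loop or orphan predecessor; not handled here")
            continue
        stalled = 0
        prob[cur] = sum(prob[p] * edge[(p, cur)] for p in preds[cur])
        queue.extend(succs.get(cur, []))
    return prob


CFG = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E"], "D": ["E"], "E": []}
probs = block_probabilities(CFG, "A")
weights = {b: 1.0 / p for b, p in probs.items()}  # as main() does later
print(probs)
print(weights)
```

Here D sits behind two 50/50 branches and gets weight 4.0, while E, which every path reaches, gets 1.0: hard-to-reach blocks end up with higher weights.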

def analysis(): splits the program into functions, builds the control-flow graph of each function and passes it to calculate_weight(func, fAddr) to compute the weights.

def main()

def main():
    strings=[]
    start = timeit.default_timer()
    # compute the probability of every basic block
    analysis()
    # collect the CMP operand information
    strings=findCMPopnds()
    stop = timeit.default_timer()
    # weight of each BB = 1/probability; fweight maps BB start address -> (BB weight, address just past the end of the BB)
    for bb in weight:
        fweight[bb]=(1.0/weight[bb][0],weight[bb][1])
    print"[**] Printing weights..."
    for bb in fweight:
        print "BB [%x-%x] -> %3.2f"%(bb,fweight[bb][1],fweight[bb][0])
    print " [**] Total Time: ", stop - start
    print "[**] Total functions analyzed: %d"%(fCount,)
    print "[**] Total BB analyzed: %d"%(len(fweight),)
    outFile=GetInputFile() # name of the file being analysed
    strFile=outFile+".names"
    outFile=outFile+".pkl"
    fd=open(outFile,'w')
    # store the BB weights in the .pkl file
    pickle.dump(fweight,fd)
    fd.close()
    strFD=open(strFile,'w')
    # store the CMP information in the .names file
    pickle.dump(strings,strFD)
    strFD.close()
    print "[*] Saved results in pickle files: %s, %s"%(outFile,strFile)
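The post-processing in main() can be reproduced outside IDA. A minimal sketch in Python 3 with hypothetical addresses; the original is Python 2 and writes through a text-mode file handle ('w'), whereas Python 3 needs binary mode for pickle. Protocol 2 is chosen so that Python 2.7, which VUzzer runs under, can still read the file:

```python
import pickle


def to_weights(prob):
    """{bb_start: (probability, bb_end)} -> {bb_start: (1/probability, bb_end)}."""
    return {bb: (1.0 / p, end) for bb, (p, end) in prob.items()}


def save_pickle(path, obj):
    # Binary mode is required in Python 3; protocol 2 is readable by Python 2.7.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical basic blocks: start address -> (probability, end address).
weight = {0x8048400: (1.0, 0x8048410), 0x8048410: (0.25, 0x8048420)}
fweight = to_weights(weight)
print(fweight)
```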

3.2 Seed generation and mutation in VUzzer

This part is mainly implemented in runfuzzer.py, gautils.py and operators.py. Let us look at how it works.

def main():

def main():
    check_env()
    # parse the command-line options into config variables
    parser = argparse.ArgumentParser(description='VUzzer options')
    parser.add_argument('-s','--sut', help='SUT commandline',required=True)
    parser.add_argument('-i','--inputd', help='seed input directory (relative path)',required=True)
    parser.add_argument('-w','--weight', help='path of the pickle file(s) for BB weights (separated by comma, in case there are two) ',required=True)
    parser.add_argument('-n','--name', help='Path of the pickle file(s) containing strings from CMP inst (separated by comma if there are two).',required=True)
    parser.add_argument('-l','--libnum', help='Number of binaries to monitor (only application or used libraries)',required=False, default=1)
    parser.add_argument('-o','--offsets',help='base-address of application and library (if used), separated by comma', required=False, default='0x00000000')
    parser.add_argument('-b','--libname',help='library name to monitor',required=False, default='')
    args = parser.parse_args()
    config.SUT=args.sut
    config.INITIALD=os.path.join(config.INITIALD, args.inputd)
    config.LIBNUM=int(args.libnum)
    config.LIBTOMONITOR=args.libname
    config.LIBPICKLE=[w for w in args.weight.split(',')]
    config.NAMESPICKLE=[n for n in args.name.split(',')]
    config.LIBOFFSETS=[o for o in args.offsets.split(',')]
    ih=config.PINCMD.index("#") # find the index of the "#" placeholder in the PINCMD list so it can be replaced with the libname
    config.PINCMD[ih]=args.libname


    ###################################

    config.minLength=get_min_file(config.INITIALD)
    # empty (or recreate) the working directories
    try:
        shutil.rmtree(config.KEEPD)
    except OSError:
        pass
    os.mkdir(config.KEEPD)
    
    try:
        os.mkdir("outd")
    except OSError:
        pass
    
    try:
        os.mkdir("outd/crashInputs")
    except OSError:
        gau.emptyDir("outd/crashInputs")

    crashHash=[]
    try:
        os.mkdir(config.SPECIAL)
    except OSError:
        gau.emptyDir(config.SPECIAL)
    
    try:
        os.mkdir(config.INTER)
    except OSError:
        gau.emptyDir(config.INTER)
	
    ###### open the pickle files: read in the contents of the .pkl and .names files
    gau.prepareBBOffsets()
    if config.PTMODE:
        pt = simplept.simplept()
    else:
        pt = None
    if config.ERRORBBON==True:
        # identify the program's error-handling BBs
        gbb,bbb=dry_run()
    else:
        gbb=0
   # gau.die("dry run over..")
    import timing
    #selftest()
    noprogress=0
    currentfit=0
    lastfit=0
    
    config.CRASHIN.clear()
    stat=open("stats.log",'w')
    stat.write("**** Fuzzing started at: %s ****\n"%(datetime.now().isoformat('+'),))
    stat.write("**** Initial BB for seed inputs: %d ****\n"%(gbb,))
    stat.flush()
    os.fsync(stat.fileno())
    stat.write("Genaration\t MINfit\t MAXfit\t AVGfit MINlen\t Maxlen\t AVGlen\t #BB\t AppCov\t AllCov\n")
    stat.flush()
    os.fsync(stat.fileno())
    starttime=time.clock()
    allnodes = set()
    alledges = set()
    try:
        shutil.rmtree(config.INPUTD)
    except OSError:
        pass
    shutil.copytree(config.INITIALD,config.INPUTD)
    # the initial seeds are now copied into the data directory; first we get the taint of the initial inputs
    get_taint(config.INITIALD)
    
    print "MOst common offsets and values:", config.MOSTCOMMON
    #gg=raw_input("press enter to continue..")
    config.MOSTCOMFLAG=True
    crashhappend=False
    filest = os.listdir(config.INPUTD)
    filenum=len(filest)
    if filenum < config.POPSIZE:
        gau.create_files(config.POPSIZE - filenum)
    
    if len(os.listdir(config.INPUTD)) != config.POPSIZE:
        gau.die("something went wrong. number of files is not right!")

    efd=open(config.ERRORS,"w")
    gau.prepareBBOffsets()
    writecache = True
    genran=0
    bbslide=10 # this is used to call run_error_BB() functions
    keepslide=3
    keepfilenum=config.BESTP
    # main loop: generate seeds with the genetic algorithm and fuzz the program with them
    while True:
        print "[**] Generation %d\n***********"%(genran,)
        del config.SPECIALENTRY[:]
        del config.TEMPTRACE[:]
        del config.BBSEENVECTOR[:]
        config.SEENBB.clear()
        config.TMPBBINFO.clear()
        config.TMPBBINFO.update(config.PREVBBINFO)
        
        fitnes=dict()
        execs=0
        config.cPERGENBB.clear()
        config.GOTSTUCK=False
       
        if config.ERRORBBON == True:
            if genran > config.GENNUM/5:
                bbslide = max(bbslide,config.GENNUM/20)
                keepslide=max(keepslide,config.GENNUM/100)
                keepfilenum=keepfilenum/2
        #config.cPERGENBB.clear()
        #config.GOTSTUCK=False
            if 0< genran < config.GENNUM/5 and genran%keepslide == 0:
                copy_files(config.INPUTD,config.KEEPD,keepfilenum)
                
        #let's find out some of the error handling BBs
            if  genran >20 and genran%bbslide==0:
                stat.write("\n**** Error BB cal started ****\n")
                stat.flush()
                os.fsync(stat.fileno())
                run_error_bb(pt)
                copy_files(config.KEEPD,config.INPUTD,len(os.listdir(config.KEEPD))*1/10)
            #copy_files(config.INITIALD,config.INPUTD,1)
        files=os.listdir(config.INPUTD)
        # run the program on each seed file, check whether it triggers a bug, and compute the seed's fitness
        for fl in files:
                # substitute the seed file into the command line, run it and collect the result
                tfl=os.path.join(config.INPUTD,fl)
                iln=os.path.getsize(tfl)
                args = (config.SUT % tfl).split(' ')
                progname = os.path.basename(args[0])
                #print ''
                #print 'Input file sha1:', sha1OfFile(tfl)
                #print 'Going to call:', ' '.join(args)
                (bbs,retc)=execute(tfl)
                # compute the fitness (weighted by the BB weights when BBWEIGHT is enabled)
                if config.BBWEIGHT == True:
                    fitnes[fl]=gau.fitnesCal2(bbs,fl,iln)
                else:
                    fitnes[fl]=gau.fitnesNoWeight(bbs,fl,iln)

                execs+=1
                # the following runs when the seed file crashes the program
                if retc < 0 and retc != -2:
                    print "[*]Error code is %d"%(retc,)
                    efd.write("%s: %d\n"%(tfl, retc))
                    efd.flush()
                    os.fsync(efd)
                    tmpHash=sha1OfFile(config.CRASHFILE)
                    # copy the seed file into outd/crashInputs and the SPECIAL folder (only for crashes not seen before)
                    if tmpHash not in crashHash:
                            crashHash.append(tmpHash)
                            tnow=datetime.now().isoformat().replace(":","-")
                            nf="%s-%s.%s"%(progname,tnow,gau.splitFilename(fl)[1])
                            npath=os.path.join("outd/crashInputs",nf)
                            shutil.copyfile(tfl,npath)
                            shutil.copy(tfl,config.SPECIAL)
                            config.CRASHIN.add(fl)
                    # with the STOPONCRASH option enabled, fuzzing stops at the first crash
                    if config.STOPONCRASH == True:
                        #efd.close()
                        crashhappend=True
                        break
        # gather statistics on the seeds' fitness scores and file sizes
        fitscore=[v for k,v in fitnes.items()]
        maxfit=max(fitscore)
        avefit=sum(fitscore)/len(fitscore)
        mnlen,mxlen,avlen=gau.getFileMinMax(config.INPUTD)
        print "[*] Done with all input in Gen, starting SPECIAL. \n"
        #### copy special inputs in SPECIAL directory and update coverage info ###
        spinputs=os.listdir(config.SPECIAL)
        # remove SPECIAL inputs that this generation's inputs have displaced (present in PREVBBINFO but no longer in TMPBBINFO)
        for sfl in spinputs:
                if sfl in config.PREVBBINFO and sfl not in config.TMPBBINFO:
                        tpath=os.path.join(config.SPECIAL,sfl)
                        os.remove(tpath)
                        if sfl in config.TAINTMAP:
                            del config.TAINTMAP[sfl]
        config.PREVBBINFO=copy.deepcopy(config.TMPBBINFO)
        spinputs=os.listdir(config.SPECIAL)
        # copy into SPECIAL the inputs of this generation that brought new coverage
        for inc in config.TMPBBINFO:
                config.SPECIALENTRY.append(inc)
                if inc not in spinputs:
                        incp=os.path.join(config.INPUTD,inc)
                        shutil.copy(incp,config.SPECIAL)
                        #del fitnes[incp]
        # compute the code coverage of this generation
        appcov,allcov=gau.calculateCov()
        stat.write("\t%d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\t %d\n"%(genran,min(fitscore),maxfit,avefit,mnlen,mxlen,avlen,len(config.cPERGENBB),appcov,allcov))
        stat.flush()
        os.fsync(stat.fileno())
        print "[*] Wrote to stat.log\n"
        if crashhappend == True:
            break
        #lets find out some of the error handling BBs
        #if genran >20 and genran%5==0:
         #   run_error_bb(pt)
        genran += 1
        #this part is to get initial fitness that will be used to determine if fuzzer got stuck.
        # check whether the best fitness improves; if it stays unchanged for more than 20 generations, the fuzzer is considered stuck
        lastfit=currentfit
        currentfit=maxfit
        if currentfit==lastfit:#lastfit-config.FITMARGIN < currentfit < lastfit+config.FITMARGIN:
            noprogress +=1
        else:
            noprogress =0
        if noprogress > 20:
            config.GOTSTUCK=True
            stat.write("Heavy mutate happens now..\n")
            noprogress =0
        if (genran >= config.GENNUM) and (config.STOPOVERGENNUM == True):
            break
        # copy inputs to SPECIAL folder (if they do not yet included in this folder
        #spinputs=os.listdir(config.SPECIAL)
        #for sfl in spinputs:
        #        if sfl in config.PREVBBINFO and sfl not in config.TMPBBINFO:
        #                tpath=os.path.join(config.SPECIAL,sfl)
        #                os.remove(tpath)
        #config.PREVBBINFO=copy.deepcopy(config.TMPBBINFO)
        #spinputs=os.listdir(config.SPECIAL)
        #for inc in config.TMPBBINFO:
        #        config.SPECIALENTRY.append(inc)
        #        if inc not in spinputs:
        #                incp=os.path.join(config.INPUTD,inc)
        #                shutil.copy(incp,config.SPECIAL)
        #                #del fitnes[incp]
        # run the taint analysis on the inputs in SPECIAL to update the CMP information
        if len(os.listdir(config.SPECIAL))>0:
            if len(os.listdir(config.SPECIAL))<config.NEWTAINTFILES:
                get_taint(config.SPECIAL)
            else:
                try:
                    os.mkdir("outd/tainttemp")
                except OSError:
                    gau.emptyDir("outd/tainttemp")
                if conditional_copy_files(config.SPECIAL,"outd/tainttemp",config.NEWTAINTFILES) == 0:
                    get_taint("outd/tainttemp")
            #print "MOst common offsets and values:", config.MOSTCOMMON
            #gg=raw_input("press any key to continue..")
        print "[*] Going for new generation creation.\n" 
        gau.createNextGeneration3(fitnes,genran)
        #raw_input("press any key...")

    efd.close()
    stat.close()
    libfd_mm.close()
    libfd.close()
    endtime=time.clock()
    
    print "[**] Totol time %f sec."%(endtime-starttime,)
    print "[**] Fuzzing done. Check %s to see if there were crashes.."%(config.ERRORS,)
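The stuck detection near the end of the loop compares the best fitness of consecutive generations. A minimal sketch of just that counter; update_progress and first_stuck_generation are hypothetical names, and the limit of 20 comes from the code above:

```python
def update_progress(noprogress, lastfit, currentfit, limit=20):
    """Return (new_noprogress, stuck) following the rule in runfuzzer.py's main loop."""
    if currentfit == lastfit:
        noprogress += 1
    else:
        noprogress = 0
    if noprogress > limit:
        return 0, True  # the counter is reset once "heavy mutate" is triggered
    return noprogress, False


def first_stuck_generation(maxfits, limit=20):
    """Index of the first generation that triggers heavy mutation, or None."""
    noprogress, currentfit = 0, 0
    for gen, maxfit in enumerate(maxfits):
        lastfit, currentfit = currentfit, maxfit
        noprogress, stuck = update_progress(noprogress, lastfit, currentfit, limit)
        if stuck:
            return gen
    return None
```

In runfuzzer.py, config.GOTSTUCK is reset at the start of every generation, so the flag only affects the following call to gau.createNextGeneration3(); how that function uses it is not shown in this article.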

def dry_run():

def dry_run():
    ''' this function executes the initial test set to determine error handling BBs in the SUT. Such BBs are given zero weights during actual fuzzing.
    It records the BBs executed on normal runs and on runs with (probably invalid) random inputs.
'''
    print "[*] Starting dry run now..."
    tempbad=[]
    dfiles=os.listdir(config.INITIALD)
    if len(dfiles) <3:
        gau.die("not sufficient initial files")
    # run the program on the initial seeds and mark the BBs executed on normal runs
    for fl in dfiles:
        tfl=os.path.join(config.INITIALD,fl)
        try:
            f=open(tfl, 'r')
            f.close()
        except:
            gau.die("can not open our own input %s!"%(tfl,))
        (bbs,retc)=execute(tfl)
        if retc < 0:
            gau.die("looks like we already got a crash!!")
        config.GOODBB |= set(bbs.keys())
    print "[*] Finished good inputs (%d)"%(len(config.GOODBB),)
    #now let's run the SUT on probably invalid files. For that we need to create them first.
    print "[*] Starting bad inputs.."
    lp=0
    badbb=set()
    while lp <2:
        try:
                shutil.rmtree(config.INPUTD)
        except OSError:
                pass

        os.mkdir(config.INPUTD)
        # generate 30 files of random bytes to use as test inputs
        gau.create_files_dry(30)
        dfiles=os.listdir(config.INPUTD)
        # BBs reached by these inputs but never by the good seeds are candidate error-handling BBs; those common to all of them are kept
        for fl in dfiles:
            tfl=os.path.join(config.INPUTD,fl)
            (bbs,retc)=execute(tfl)
            if retc < 0:
                gau.die("looks like we already got a crash!!")
            tempbad.append(set(bbs.keys()) - config.GOODBB)
            
        tempcomn=set(tempbad[0])
        for di in tempbad:
            tempcomn.intersection_update(set(di))
        badbb.update(tempcomn)
        lp +=1
    #else:
    #  tempcomn = set()
    ###print "[*] finished bad inputs (%d)"%(len(tempbad),)
    config.ERRORBBALL=badbb.copy()
    print "[*] finished common BB. TOtal such BB: %d"%(len(badbb),)
    for ebb in config.ERRORBBALL:
        print "error bb: 0x%x"%(ebb,)
    time.sleep(5)
    if config.LIBNUM == 2:
        baseadr=config.LIBOFFSETS[1]
        for ele in tempcomn:
            if ele < baseadr:
                config.ERRORBBAPP.add(ele)
            else:
                config.ERRORBBLIB.add(ele-baseadr)
                         
    del tempbad
    del badbb
    #del tempgood
    # GOODBB now holds the start addresses of the BBs seen on normal runs and ERRORBBALL those of the error-handling BBs; return their counts
    return len(config.GOODBB),len(config.ERRORBBALL)
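The core of dry_run() is set arithmetic. A minimal sketch of it on plain sets of BB addresses (error_bbs is a hypothetical name); unlike dry_run(), it performs a single round of bad inputs instead of two and does not split the result between application and library using the -o base addresses:

```python
def error_bbs(good_runs, bad_runs):
    """good_runs / bad_runs: lists of sets of BB addresses, one set per execution."""
    good = set().union(*good_runs)
    # BBs each bad input reached that no good input ever reached
    only_bad = [bbs - good for bbs in bad_runs]
    if not only_bad:
        return set()
    # keep only the BBs shared by every bad run
    common = set(only_bad[0])
    for s in only_bad[1:]:
        common &= s
    return common


print(error_bbs([{1, 2, 3}, {1, 2, 4}], [{1, 9, 10}, {2, 9, 11}, {3, 9}]))
```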

def read_taint(fpath): returns the CMP information encountered while running the current seed file.

def get_taint(dirin):

def get_taint(dirin):
    ''' This function is used to get taintflow for each CMP instruction to find which offsets in the input are used at the instructions. It also gets the values used in the CMP.'''
    print "[*] starting taintflow calculation."
    files=os.listdir(dirin)
    #taintmap=dict()#this is a dictionary to keep taintmap of each input file. Key is the input file name and value is a tuple returned by read_taint, wherein 1st element is a set of all offsets used in cmp and 2nd elment is a dictionary with key a offset and value is a set of values at that offsetthat were found in CMP instructions.
    #mostcommon=dict()# this dictionary keeps offsets which are common across all the inputs with same value set. 
    # record the CMP information for each seed file that has not been analysed yet
    for fl in files:
        if fl in config.TAINTMAP:
            continue
        pfl=os.path.join(dirin,fl)
        rcode=execute2(pfl,fl)
        if rcode ==255:
            continue
            gau.die("pintool terminated with error 255 on input %s"%(pfl,))
        config.TAINTMAP[fl]=read_taint(pfl)
        config.LEAMAP[fl]=read_lea()
        #print config.TAINTMAP[fl][1]
        #raw_input("press key..")
    # offsets whose CMP values are shared by all seed files are recorded in config.MOSTCOMMON
    if config.MOSTCOMFLAG==False:
        print "computing MOSTCOM calculation..."
        for k1,v1 in config.TAINTMAP.iteritems():
            for off1,vset1 in v1[1].iteritems():
                tag=True
                if off1 > config.MAXOFFSET:
                    config.TAINTMAP[k1][0].add(off1)
                    #print "[==] ",k1,off1
                    continue
                                        
                for k2,v2 in config.TAINTMAP.iteritems():
                    if off1 not in v2[1]:
                        config.TAINTMAP[k1][0].add(off1)
                        #print k2,v2[1]
                        tag=False
                        break
                    #print "passed..", off1
                    if len(set(vset1) & set(v2[1][off1]))==0:#set(vset1) != set(v2[off1])
                        print k1, k2, off1, set(vset1), set(v2[1][off1])
                        config.TAINTMAP[k1][0].add(off1)
                        tag=False
                        break
                    #print "passed set", vset1
                if tag==True:
                    config.MOSTCOMMON[off1]=vset1[:]
                    #print "[++]",config.MOSTCOMMON[off1]
            break # we just want to take one input and check if all the offsets in other inputs have commonality.
    else:
        print "computing MORECOM calculation..."
        for k1,v1 in config.TAINTMAP.iteritems():
            for off1,vset1 in v1[1].iteritems():
                tag=True
                #if off1 > config.MAXOFFSET:
                    #print k1,off1
                #    continue
                for k2,v2 in config.TAINTMAP.iteritems():
                    if off1 not in v2[1]:
                        config.TAINTMAP[k1][0].add(off1)
                        #print k2,v2[1]
                        tag=False
                        break
                    #print "passed..", off1
                    if len(set(vset1) ^ set(v2[1][off1]))>3:#vset1 != v2[1][off1]:
                        print k2, vset1, v2[1][off1]
                        config.TAINTMAP[k1][0].add(off1)
                        tag=False
                        break
                    #print "passed set", vset1
                if tag==True:
                    config.MORECOMMON[off1]=vset1[:]
                    #print config.MOSTCOMMON[off1]
            break # we just want to take one input and check if all the offsets in other inputs have commonality.
    print "[*] taintflow finished."  
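According to the comment in get_taint(), config.TAINTMAP[file] is a tuple whose first element is a set of offsets used in CMPs and whose second element maps an offset to the values compared there. A simplified sketch of the MOSTCOMMON branch under that assumption (most_common is a hypothetical name); unlike the original, it does not add the non-common offsets back into TAINTMAP, and the one input it starts from is simply the first dictionary entry:

```python
def most_common(taintmap, max_offset=None):
    """Offsets (from the first input) whose CMP values overlap in every input.

    taintmap: {filename: (set_of_offsets, {offset: [values compared there]})}
    Returns {offset: values}.
    """
    if not taintmap:
        return {}
    first = next(iter(taintmap.values()))
    common = {}
    for off, vals in first[1].items():
        if max_offset is not None and off > max_offset:
            continue  # the original also skips offsets beyond config.MAXOFFSET
        if all(off in other[1] and set(vals) & set(other[1][off])
               for other in taintmap.values()):
            common[off] = list(vals)
    return common


TAINTMAP = {
    "a": (set(), {0: [0x57], 10: [0x41], 20: [1]}),
    "b": (set(), {0: [0x57], 10: [0x42]}),
    "c": (set(), {0: [0x57, 0x58], 10: [0x41]}),
}
print(most_common(TAINTMAP))
```

Offset 0 is kept because every input compares it against 0x57 ('W'); offset 10 is dropped because input "b" compares it against a different value.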

gautils.py:

def create_files_dry(num):

def create_files_dry(num):
    ''' This function creates num number of files in the input directory. This is called if we do not have enough initial population.
''' 
    #files=os.listdir(config.INPUTD)
    # use the original seed files in the datatemp directory (config.INITIALD) as the base files
    files=os.listdir(config.INITIALD)
    #files=random.sample(filef, 2)
    # initialize the GAoperator class from operators.py
    ga=operators.GAoperator(random.Random(),[set(),set()])
    fl=random.choice(files)
    bn, ext = splitFilename(fl)
    while (num != 0):
        #if random.uniform(0.1,1.0)>(1.0 - config.PROBCROSS) and (num >1):
         #   #we are going to use crossover, so we get two parents.
         #   par=random.sample(files, 2)
         #   bn, ext = splitFilename(par[0])
         #   #fp1=os.path.join(config.INPUTD,par[0])
         #   #fp2=os.path.join(config.INPUTD,par[1])
         #   fp1=os.path.join(config.INITIALD,par[0])
         #   fp2=os.path.join(config.INITIALD,par[1])
         #   p1=readFile(fp1)
         #   p2=readFile(fp2)
         #   ch1,ch2 = ga.crossover(p1,p2)
         #   np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
         #   np2=os.path.join(config.INPUTD,"ex-%d.%s"%(num-1,ext))
         #   writeFile(np1,ch1)
         #   writeFile(np2,ch2)
         #   num -= 2
        #else:
        fl=random.choice(files)
        #bn, ext = splitFilename(fl)
        #fp=os.path.join(config.INPUTD,fl)
        fp=os.path.join(config.INITIALD,fl)
        p1=readFile(fp)
        #ch1= ga.mutate(p1)
        # generate a string of random length with ga.totally_random(); its arguments are not actually used
        ch1= ga.totally_random(p1,fl)
        # write the string to a new file in the data folder (config.INPUTD)
        np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
        writeFile(np1,ch1)
        num -= 1
        #files.extend(os.listdir(config.INPUTD))
    return 0
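create_files_dry() only needs files with random content. Since operators.totally_random() is not shown here, the sketch below uses plain random bytes of random length as a stand-in; create_random_files and max_len are hypothetical:

```python
import os
import random


def create_random_files(directory, num, ext="txt", max_len=1024, rng=None):
    """Write `num` files of random bytes and random length (1..max_len).

    Files are named ex-<n>.<ext> as in create_files_dry(); max_len is a
    hypothetical limit, not a VUzzer setting.
    """
    rng = rng or random.Random()
    os.makedirs(directory, exist_ok=True)
    paths = []
    for n in range(num, 0, -1):
        length = rng.randint(1, max_len)
        data = bytes(rng.randrange(256) for _ in range(length))
        path = os.path.join(directory, "ex-%d.%s" % (n, ext))
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths
```

Passing a seeded random.Random makes the generated files reproducible, which helps when comparing dry runs.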

def create_files(num):

def create_files(num):
    ''' This function creates num number of files in the input directory. This is called if we do not have enough initial population.
    Addition: once a new file is created by mutation/crossover, we query the MOSTCOMMON dict to find offsets and replace the values at those offsets in the new files. In the case of mutation, we also use the taintmap of the parent input to get other offsets that are used in CMP and change them. For crossover, as there are two parents involved, we cannot query just one, so we do a random change on those offsets from any of the parents in the resulting children.
''' 
    #files=os.listdir(config.INPUTD)
    files=os.listdir(config.INITIALD)
    # initialize GAoperator; note that the CMP information (config.ALLSTRINGS) is passed in here
    ga=operators.GAoperator(random.Random(),config.ALLSTRINGS)
    while (num != 0):
        # when this condition holds (and at least two files remain to be created), two seed files are crossed over
        if random.uniform(0.1,1.0)>(1.0 - config.PROBCROSS) and (num >1):
            #we are going to use crossover, so we get two parents.
            par=random.sample(files, 2)
            bn, ext = splitFilename(par[0])
            #fp1=os.path.join(config.INPUTD,par[0])
            #fp2=os.path.join(config.INPUTD,par[1])
            fp1=os.path.join(config.INITIALD,par[0])
            fp2=os.path.join(config.INITIALD,par[1])
            p1=readFile(fp1)
            p2=readFile(fp2)
            # perform the crossover
            ch1,ch2 = ga.crossover(p1,p2)
            # now we make changes according to taintflow info.
            ch1=taint_based_change(ch1,par[0])
            ch2=taint_based_change(ch2,par[1])
            np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
            np2=os.path.join(config.INPUTD,"ex-%d.%s"%(num-1,ext))
            writeFile(np1,ch1)
            writeFile(np2,ch2)
            num -= 2
        # otherwise, a single file is mutated
        else:
            fl=random.choice(files)
            bn, ext = splitFilename(fl)
            #fp=os.path.join(config.INPUTD,fl)
            fp=os.path.join(config.INITIALD,fl)
            p1=readFile(fp)
            # mutate the seed with a randomly chosen strategy
            ch1= ga.mutate(p1,fl)
            ch1=taint_based_change(ch1,fl)
            np1=os.path.join(config.INPUTD,"ex-%d.%s"%(num,ext))
            writeFile(np1,ch1)
            num -= 1
    return 0
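The actual operators (GAoperator.crossover() and GAoperator.mutate() in operators.py) and taint_based_change() are not shown in this article. The sketch below uses hypothetical, simplified stand-ins to illustrate the flow of create_files(): single-point crossover, a mutation that inserts a CMP string or overwrites one byte with a CMP byte, and a taint-based step that writes observed CMP values at known offsets, following the docstring above:

```python
import random


def crossover(p1, p2, rng):
    # Assumed operator: single-point crossover, swapping the tails after one cut.
    n = min(len(p1), len(p2))
    if n < 2:
        return p1, p2
    cut = rng.randint(1, n - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(data, cmp_strings, cmp_chars, rng):
    # Assumed operator: insert a CMP string, or overwrite one byte with a CMP byte.
    data = bytearray(data)
    if cmp_strings and (rng.random() < 0.5 or not data):
        pos = rng.randint(0, len(data))
        data[pos:pos] = rng.choice(sorted(cmp_strings))
    elif data:
        pos = rng.randrange(len(data))
        if cmp_chars:
            data[pos] = rng.choice(sorted(cmp_chars))[0]
        else:
            data[pos] = rng.randrange(256)
    return bytes(data)


def taint_based_change(child, mostcommon, rng):
    # Write one of the CMP values observed at each known offset into the child.
    child = bytearray(child)
    for off, values in mostcommon.items():
        if off < len(child):
            child[off] = rng.choice(values) & 0xFF
    return bytes(child)


rng = random.Random(0)
c1, c2 = crossover(b"AAAAAAAAAAAA", b"BBBBBBBBBBBB", rng)
child = taint_based_change(mutate(c1, {b"\x7fELF"}, {b"W"}, rng),
                           {0: [0x57], 10: [0x41]}, rng)
print(c1, c2, child)
```

Applying the taint-based change last means the offsets known to be checked by CMPs (here 'W' at 0 and 'A' at 10, as in the test program) survive whatever the random operators did.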

 

 

 

 

 
