Introduction to Clang 10: SanitizerCoverage

1.Introduction

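LLVM has a simple built-in code coverage instrumentation (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of these callbacks are provided and implement simple coverage reporting and visualization. However, if you only need coverage visualization, you may want to use SourceBasedCodeCoverage instead.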

2.Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard, the compiler inserts the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard variable (uint32_t).

The compiler also inserts calls to a module constructor:

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call.

The __sanitizer_cov_trace_pc_* functions should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

// trace-pc-guard-example.cc
int sub() {
	int d=9-5;
	return d;}
int foo() {
	int c=sub()+5;
	return c;}
int main() {
	int f=foo();
	return 0;
}

clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
INIT: 0x530c50 0x530c5c
guard: 0x530c58 3 PC 0x4f86e6 in main trace-pc-guard-example.cc:7
guard: 0x530c54 2 PC 0x4f86b6 in foo() trace-pc-guard-example.cc:4
guard: 0x530c50 1 PC 0x4f8686 in sub() trace-pc-guard-example.cc:1
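
The comment in the callback above notes that setting *guard to 0 stops further reports for that edge. A minimal sketch of that idea follows, assuming a tool that only cares about the first time each edge is reached. The counter g_first_hits is a hypothetical name introduced for this illustration and is not part of the sanitizer interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical counter, introduced only for this illustration.
static uint32_t g_first_hits = 0;

// Numbers the guards 1..N so that a non-zero guard means "not reported yet".
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t n = 0;
  assert(start <= stop);
  if (start == stop || *start) return;  // Empty range or already initialized.
  for (uint32_t *g = start; g < stop; ++g) *g = ++n;
}

// Reports an edge the first time it is reached, then disables its guard.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // This edge was already reported.
  printf("first hit of edge %u\n", *guard);
  ++g_first_hits;
  *guard = 0;  // Suppress further reports for this edge.
}
```

In an instrumented program the compiler emits the calls. For a quick experiment the two callbacks can also be called by hand on a small local array of guards.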

3.Inline 8bit-counters

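Experimental, may change or disappear in the future.

With -fsanitize-coverage=inline-8bit-counters, the compiler inserts inline counter increments on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of a callback the instrumentation simply increments a counter.

Users need to implement a single function to capture the counters at startup.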

extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
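
The callback body above is left empty. One possible way to fill it in is sketched below, assuming a single instrumented DSO: the range is stored in two globals, and two helpers read and reset the counters. The names g_counters_begin, g_counters_end, CountCoveredEdges and ResetCounters are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical globals that remember the counter range of one DSO.
static char *g_counters_begin = nullptr;
static char *g_counters_end = nullptr;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  assert(start <= end);
  if (start == end) return;  // Nothing to capture.
  g_counters_begin = start;
  g_counters_end = end;
}

// Returns how many edges currently have a non-zero counter.
size_t CountCoveredEdges() {
  size_t covered = 0;
  for (char *c = g_counters_begin; c < g_counters_end; ++c)
    if (*c) ++covered;
  return covered;
}

// Zeroes all counters, for example between two inputs of a fuzzer.
void ResetCounters() {
  for (char *c = g_counters_begin; c < g_counters_end; ++c) *c = 0;
}
```

A program with several instrumented DSOs would need to keep one range per call of the callback instead of a single pair of pointers.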

4.PC-Table

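Experimental, may change or disappear in the future.

Note: for linkers other than LLD, this instrumentation might be incompatible with dead code stripping (-Wl,-gc-sections), resulting in a significant binary size overhead. For more information, see Bug 34636.

With -fsanitize-coverage=pc-table, the compiler creates a table of instrumented PCs. This requires either -fsanitize-coverage=inline-8bit-counters or -fsanitize-coverage=trace-pc-guard.

Users need to implement a single function to capture the PC table at startup: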

extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
  // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard)
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
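
As a rough sketch of how the captured table could be read, the code below walks the [PC, PCFlags] pairs and counts the entries whose bit0 is set. It assumes a single DSO, and the helper names NumInstrumentedBlocks and NumFunctionEntries are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals that remember the PC table of one DSO.
static const uintptr_t *g_pcs_beg = nullptr;
static const uintptr_t *g_pcs_end = nullptr;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  // The table is made of [PC, PCFlags] pairs, so its length is even.
  assert((pcs_end - pcs_beg) % 2 == 0);
  g_pcs_beg = pcs_beg;
  g_pcs_end = pcs_end;
}

// Number of instrumented blocks, i.e. the number of pairs in the table.
size_t NumInstrumentedBlocks() {
  return static_cast<size_t>(g_pcs_end - g_pcs_beg) / 2;
}

// Number of blocks whose PCFlags has bit0 set (function entry blocks).
size_t NumFunctionEntries() {
  size_t entries = 0;
  for (const uintptr_t *p = g_pcs_beg; p < g_pcs_end; p += 2)
    if (p[1] & 1) ++entries;
  return entries;
}
```

Because the table has one pair per guard or counter, index i of the table describes the same block as guard or counter i of the same DSO.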

As an example, we can use some of the functions above to collect information about a running program (that is, to compute its code coverage).

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	std::string s1="abcdefghijik";
	int i;
	std::cin>>s;
	if(s==s1){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

 

//san.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#include <assert.h>
#include <vector>
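// Pick exactly one definition; defining the macro twice triggers a redefinition diagnostic.
#ifdef _WIN32
#define ATTRIBUTE_INTERFACE __declspec(dllexport)
#else
#define ATTRIBUTE_INTERFACE __attribute__((visibility("default")))
#endif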
struct Module {
	uint32_t *Start, *Stop;
};

static const size_t kNumPCs = 1 << 21;
uint8_t __sancov_trace_pc_guard_8bit_counters[kNumPCs];
uintptr_t __sancov_trace_pc_pcs[kNumPCs];
Module Modules[4096];
size_t NumModules=0;  // linker-initialized.
size_t NumGuards=0;  // linker-initialized.
uint8_t *Counters() {
	return __sancov_trace_pc_guard_8bit_counters;
}
uintptr_t *PCs(){
	return __sancov_trace_pc_pcs;
}
size_t GetNumPCs() { return kNumPCs<NumGuards + 1?kNumPCs:NumGuards + 1; }
//std::vector<uintptr_t> PCsCopy(GetNumPCs());
uintptr_t *PCs();
uintptr_t GetPC(size_t Idx) {
	assert(Idx < GetNumPCs());
	return PCs()[Idx];
}
size_t GetTotalPCCoverage() {
	size_t Res = 0;
	for (size_t i = 1, N = GetNumPCs(); i < N; i++)
		if (PCs()[i])
			Res++;
	return Res;
}
//ATTRIBUTE_INTERFACE
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *Guard) {
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	uint32_t Idx = *Guard;
	__sancov_trace_pc_pcs[Idx] = PC;
	__sancov_trace_pc_guard_8bit_counters[Idx]++;
	//size_t NumFeatures = CollectFeatures([&](size_t Feature) -> bool {return Feature%3;});
	printf("GetTotalPCCoverage() is %zu\n",GetTotalPCCoverage());
	//GetNumPCs
}
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *Start, uint32_t *Stop) {
	if (Start == Stop || *Start) return;
	assert(NumModules < sizeof(Modules) / sizeof(Modules[0]));
	for (uint32_t *P = Start; P < Stop; P++) {
		NumGuards++;
		if (NumGuards == kNumPCs) {
			printf(
				"WARNING: The binary has too many instrumented PCs.\n"
				"         You may want to reduce the size of the binary\n"
				"         for more efficient fuzzing and precise coverage data\n");
		}
		*P = NumGuards % kNumPCs;
	}
	Modules[NumModules].Start = Start;
	Modules[NumModules].Stop = Stop;
	NumModules++;
}

The output is as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp,func foo.cc -c
# clang++ san.cc foo.o -fsanitize=address -o a
# ./a
GetTotalPCCoverage() is 1
GetTotalPCCoverage() is 2
GetTotalPCCoverage() is 3
aaaaaaaaaaaaaaaaa
GetTotalPCCoverage() is 4
wrong

 

5.Tracing PCs

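With -fsanitize-coverage=trace-pc, the compiler inserts __sanitizer_cov_trace_pc() on every edge. With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call. These callbacks are not implemented in the Sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).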
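
Since both callbacks must come from the user, a minimal user-side sketch could look like the following. It simply logs the caller PC of each edge and the target of each indirect call into fixed-size arrays. The array names and the limit kMaxRecords are assumptions made for this example; a real tool such as a kernel fuzzer uses its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size logs; a real tool would pick its own storage.
static const size_t kMaxRecords = 1024;
static uintptr_t g_edge_pcs[kMaxRecords];
static size_t g_num_edge_pcs = 0;
static uintptr_t g_callees[kMaxRecords];
static size_t g_num_callees = 0;

// Called on every instrumented edge. There is no guard argument, so the
// return address is the only identification of the edge.
extern "C" void __sanitizer_cov_trace_pc() {
  if (g_num_edge_pcs < kMaxRecords)
    g_edge_pcs[g_num_edge_pcs++] =
        reinterpret_cast<uintptr_t>(__builtin_return_address(0));
}

// Called before every indirect call with the address of the call target.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  if (g_num_callees < kMaxRecords)
    g_callees[g_num_callees++] = reinterpret_cast<uintptr_t>(callee);
}
```

The file that defines these callbacks should itself be compiled without -fsanitize-coverage, otherwise the callbacks would be instrumented too.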

6.Instrumentation points

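  • edge (default): edges are instrumented (see below).
  • bb: basic blocks are instrumented.
  • func: only the entry block of every function is instrumented.

Use these flags together with trace-pc-guard or trace-pc, like this: -fsanitize-coverage=func,trace-pc-guard

When edge or bb is used, some of the edges/blocks may still be left uninstrumented (pruned) if such instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This could be useful for better coverage visualization.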

7.Edge coverage

Consider this code:

void foo(int *a) {
  if (a)
    *a = 0;
}

It contains 3 basic blocks; let's name them A, B, C:

A
|\
| \
|  B
| /
|/
C

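If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks: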

A
|\
| \
D  B
| /
|/
C
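
The splitting step can be illustrated with a toy graph model. The sketch below is not LLVM's implementation. It represents a CFG as a list of named edges, treats an edge as critical when its source has more than one successor and its destination has more than one predecessor, and replaces each such edge with two edges through a new dummy block. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // source => destination

// Counts successors per source block and predecessors per destination block.
static void CountDegrees(const std::vector<Edge> &edges,
                         std::map<std::string, int> *succ,
                         std::map<std::string, int> *pred) {
  for (const Edge &e : edges) {
    ++(*succ)[e.first];
    ++(*pred)[e.second];
  }
}

// An edge is critical when its source has several successors and its
// destination has several predecessors.
std::vector<Edge> FindCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> succ, pred;
  CountDegrees(edges, &succ, &pred);
  std::vector<Edge> critical;
  for (const Edge &e : edges)
    if (succ[e.first] > 1 && pred[e.second] > 1) critical.push_back(e);
  return critical;
}

// Replaces every critical edge X=>Y with X=>dummy and dummy=>Y.
// Dummy blocks are named "split0", "split1", ... (assumed to be unused names).
std::vector<Edge> SplitCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> succ, pred;
  CountDegrees(edges, &succ, &pred);
  std::vector<Edge> result;
  int next_dummy = 0;
  for (const Edge &e : edges) {
    if (succ[e.first] > 1 && pred[e.second] > 1) {
      std::string dummy = "split" + std::to_string(next_dummy++);
      result.push_back({e.first, dummy});
      result.push_back({dummy, e.second});
    } else {
      result.push_back(e);
    }
  }
  return result;
}
```

For the graph with edges A=>B, A=>C and B=>C, only A=>C is critical, and the dummy block inserted on it plays the role of D in the second diagram.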

8.Tracing data flow

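Support for data-flow-guided fuzzing. With -fsanitize-coverage=trace-cmp, the compiler inserts extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler instruments integer division instructions (to capture the right argument of division), and with -fsanitize-coverage=trace-gep it instruments LLVM GEP instructions (to capture array indices).

Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.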

// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);

// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);
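
The layout of the Cases array is the least obvious part of these declarations, so here is a small sketch of a user-defined switch callback that decodes it according to the comments above. The SwitchInfo record, the g_last_switch global and the LastSwitchMatchedACase helper are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of the most recent switch, for illustration only.
struct SwitchInfo {
  uint64_t Val = 0;                     // The switch operand.
  uint64_t BitWidth = 0;                // Size of Val in bits (Cases[1]).
  std::vector<uint64_t> CaseConstants;  // Copy of Cases[2:].
};
static SwitchInfo g_last_switch;

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  uint64_t num_cases = Cases[0];  // Number of case constants.
  g_last_switch.Val = Val;
  g_last_switch.BitWidth = Cases[1];
  g_last_switch.CaseConstants.assign(Cases + 2, Cases + 2 + num_cases);
}

// Returns true if the recorded operand equals one of the case constants.
bool LastSwitchMatchedACase() {
  for (uint64_t c : g_last_switch.CaseConstants)
    if (c == g_last_switch.Val) return true;
  return false;
}
```

A fuzzer could use the recorded constants as candidate values for the bytes that feed the switch operand.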

For example:

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	int i;
	std::cin>>s;
	if(s[0]=='w'){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

//san.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2)
{
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	printf("cmp4PC is %lu,Arg1 is %u,Arg2 is %u\n",PC,Arg1,Arg2);
}

The output is as follows:


# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp foo.cc -c
# clang++ san.cc foo.o -fsanitize=address
# ./a.out 
qqqqqqqqqqqqqqq
cmp4PC is 5211447,Arg1 is 119,Arg2 is 113
wrong
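
In the run above, Arg1 is 119 (the constant 'w') and Arg2 is 113 (the input character 'q'). One way a fuzzer might use such pairs is to measure how close the input is to the constant. The sketch below keeps the smallest Hamming distance seen so far. This is an illustrative heuristic with hypothetical names, not the behavior of any particular tool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state: smallest distance seen so far.
// 33 is larger than any Hamming distance between two 32-bit values.
static int g_min_distance = 33;

// Number of bits in which two 32-bit values differ.
static int HammingDistance(uint32_t a, uint32_t b) {
  return __builtin_popcount(a ^ b);
}

// Called before a 4-byte comparison in which Arg1 is a compile-time constant.
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2) {
  int d = HammingDistance(Arg1, Arg2);
  if (d < g_min_distance) g_min_distance = d;
}
```

For the pair 119 and 113 the distance is 2, because the two values differ only in bits 1 and 2. An input that lowers the distance is closer to satisfying the comparison.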

9.Default implementation

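The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You can use this implementation to dump the coverage to disk at process exit.

Example: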

//cov.cc
#include<stdio.h>
__attribute__((noinline))
void foo(){printf("foo\n");}
int main(int argc,char **argv)
{
	if(argc==2)
	{
		foo();
	}
	printf("main\n");
}

% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov

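Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file is also created for every DSO.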

10.Sancov data format

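The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets. The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.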
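
A minimal reader for this layout might look like the sketch below. It works on an in-memory copy of the file and assumes the file was written on a machine with the same byte order as the one reading it. ParseSancov and the constant names are hypothetical; the sancov tool described in the next section is the supported way to process these files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;  // 64-bit offsets
static const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;  // 32-bit offsets

// Parses an in-memory copy of a .sancov file into a list of offsets.
// Returns false if the magic is unknown or the size is inconsistent.
bool ParseSancov(const unsigned char *data, size_t size,
                 std::vector<uint64_t> *offsets) {
  uint64_t magic;
  if (size < sizeof(magic)) return false;
  std::memcpy(&magic, data, sizeof(magic));
  size_t width;
  if (magic == kMagic64) width = 8;
  else if (magic == kMagic32) width = 4;
  else return false;
  if ((size - sizeof(magic)) % width != 0) return false;
  offsets->clear();
  for (size_t pos = sizeof(magic); pos < size; pos += width) {
    if (width == 8) {
      uint64_t v64;
      std::memcpy(&v64, data + pos, sizeof(v64));
      offsets->push_back(v64);
    } else {
      uint32_t v32;
      std::memcpy(&v32, data + pos, sizeof(v32));
      offsets->push_back(v32);
    }
  }
  return true;
}
```

This matches the sizes in the example above: a 24-byte file with the 64-bit magic holds (24 - 8) / 8 = 2 offsets, and a 32-byte file holds 3.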

11.Sancov Tool

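A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization tasks autonomously, without any extra support from the environment. You need to pass the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. Sancov matches these files using module names and binary file names.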

12.Coverage Reports

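Experimental

.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Thus the .sancov file has to be symbolized first to produce a .symcov file: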

sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The .symcov file can be browsed overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.

13.Output directory

By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path:

% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

 
