1.Introduction

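LLVM has a simple code coverage instrumentation built in (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of these callbacks are provided and implement simple coverage reporting and visualization. If you only need coverage visualization, however, you may want to use SourceBasedCodeCoverage instead.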

2.Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard, the compiler inserts the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard variable (uint32_t).

The compiler also inserts a call to a module constructor:

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call.

The functions __sanitizer_cov_trace_pc_* should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

// trace-pc-guard-example.cc
int sub() {
	int d=9-5;
	return d;}
int foo() {
	int c=sub()+5;
	return c;}
int main() {
	int f=foo();
	return 0;
}

clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

INIT: 0x530c50 0x530c5c
guard: 0x530c58 3 PC 0x4f86e6 in main trace-pc-guard-example.cc:7
guard: 0x530c54 2 PC 0x4f86b6 in foo() trace-pc-guard-example.cc:4
guard: 0x530c50 1 PC 0x4f8686 in sub() trace-pc-guard-example.cc:1
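
The comments in the callback above mention that consecutive guard values can index an array or a bit vector. A minimal sketch of that idea follows. The names kMaxGuards, g_covered, g_num_guards and CoveredEdges are hypothetical and not part of the sanitizer run-time, and the table size is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bookkeeping for this sketch. Plain zero-initialized arrays are
// used because the init callback may run before C++ static constructors.
static const uint32_t kMaxGuards = 1 << 16;
static bool g_covered[kMaxGuards];  // Slot 0 is unused: guard ids start at 1.
static uint32_t g_num_guards = 0;

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Empty or already numbered range.
  for (uint32_t *g = start; g < stop; ++g) {
    // Guards that do not fit into the table stay 0 and are never reported.
    *g = (g_num_guards + 1 < kMaxGuards) ? ++g_num_guards : 0;
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Disabled or already recorded.
  assert(*guard < kMaxGuards);
  g_covered[*guard] = true;  // The guard value is the array index.
  *guard = 0;                // Do not report this edge again.
}

// Number of distinct edges recorded so far.
size_t CoveredEdges() {
  size_t n = 0;
  for (uint32_t i = 1; i <= g_num_guards; ++i) n += g_covered[i] ? 1 : 0;
  return n;
}
```

As in the build commands above, a file like this would be compiled without -fsanitize-coverage so that the callbacks themselves are not instrumented.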

3.Inline 8bit-counters

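Experimental, may change or disappear in the future.

With -fsanitize-coverage=inline-8bit-counters, the compiler inserts an inline counter increment on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of a callback the instrumentation simply increments a counter.

Users need to implement a single function to capture the counters at startup: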

extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
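
One possible way to fill in that function is sketched below. It simply remembers the counter range and offers two helpers to inspect and reset it. The globals and the helpers CountHitEdges and ResetCounters are hypothetical names, and the sketch assumes a single instrumented DSO (a later call would overwrite the stored range).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical storage for the counter range of one DSO.
static char *g_counters_begin = nullptr;
static char *g_counters_end = nullptr;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  assert(start <= end);
  g_counters_begin = start;  // Remember [start, end) for later inspection.
  g_counters_end = end;
}

// Number of edges whose counter is non-zero. A counter that has wrapped
// around to exactly 0 would be missed by this check.
size_t CountHitEdges() {
  size_t hit = 0;
  for (char *c = g_counters_begin; c < g_counters_end; ++c)
    if (*c) ++hit;
  return hit;
}

// Zero all counters, for example before the next run of the code under test.
void ResetCounters() {
  for (char *c = g_counters_begin; c < g_counters_end; ++c) *c = 0;
}
```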

4.PC-Table

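Experimental, may change or disappear in the future.

Note: with linkers other than LLD, this instrumentation may be incompatible with dead code stripping (-Wl,-gc-sections), resulting in significant binary size overhead. For more information, see Bug 34636.

With -fsanitize-coverage=pc-table, the compiler creates a table of instrumented PCs. This requires either -fsanitize-coverage=inline-8bit-counters or -fsanitize-coverage=trace-pc-guard.

Users need to implement a single function to capture the PC table at startup: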

extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
  // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard)
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
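
The [PC, PCFlags] layout described in the comments can be walked two words at a time. The sketch below stores the table and counts instrumented blocks and function entries. The globals and the helpers NumInstrumentedBlocks and NumFunctionEntries are hypothetical names, and a single DSO is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical storage for the PC table of one DSO.
static const uintptr_t *g_pcs_beg = nullptr;
static const uintptr_t *g_pcs_end = nullptr;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  assert((pcs_end - pcs_beg) % 2 == 0);  // Entries come as [PC, PCFlags] pairs.
  g_pcs_beg = pcs_beg;
  g_pcs_end = pcs_end;
}

// Each instrumented block occupies two pointer-sized words.
size_t NumInstrumentedBlocks() {
  return static_cast<size_t>(g_pcs_end - g_pcs_beg) / 2;
}

// Counts the blocks whose PCFlags have bit0 set (function entry blocks).
size_t NumFunctionEntries() {
  size_t n = 0;
  for (const uintptr_t *p = g_pcs_beg; p < g_pcs_end; p += 2)
    if (p[1] & 1) ++n;
  return n;
}
```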

As an example, the functions above can be used to collect run-time information about a program, that is, to compute how much of it was covered:

// foo.cc
#include <iostream>
#include <string>

int add(int i, int j) {
	return i + j;
}

int main() {
	std::string s;
	std::string s1 = "abcdefghijik";
	int i;
	std::cin >> s;
	if (s == s1) {
		i = add(3, 5);
	} else {
		std::cout << "wrong" << std::endl;
	}
	return 0;
}

// san.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#include <assert.h>
#include <vector>
// Export attribute for the callbacks; only one variant applies per platform.
#ifdef _WIN32
#define ATTRIBUTE_INTERFACE __declspec(dllexport)
#else
#define ATTRIBUTE_INTERFACE __attribute__((visibility("default")))
#endif
struct Module {
	uint32_t *Start, *Stop;
};

static const size_t kNumPCs = 1 << 21;
uint8_t __sancov_trace_pc_guard_8bit_counters[kNumPCs];
uintptr_t __sancov_trace_pc_pcs[kNumPCs];
Module Modules[4096];
size_t NumModules=0;  // linker-initialized.
size_t NumGuards=0;  // linker-initialized.
uint8_t *Counters() {
	return __sancov_trace_pc_guard_8bit_counters;
}
uintptr_t *PCs(){
	return __sancov_trace_pc_pcs;
}
size_t GetNumPCs() { return kNumPCs<NumGuards + 1?kNumPCs:NumGuards + 1; }
//std::vector<uintptr_t> PCsCopy(GetNumPCs());
uintptr_t GetPC(size_t Idx) {
	assert(Idx < GetNumPCs());
	return PCs()[Idx];
}
// Counts the slots that hold a recorded PC, i.e. the edges seen so far.
size_t GetTotalPCCoverage() {
	size_t Res = 0;
	for (size_t i = 1, N = GetNumPCs(); i < N; i++)
		if (PCs()[i])
			Res++;
	return Res;
}
//ATTRIBUTE_INTERFACE
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *Guard) {
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	uint32_t Idx = *Guard;
	__sancov_trace_pc_pcs[Idx] = PC;
	__sancov_trace_pc_guard_8bit_counters[Idx]++;
	//size_t NumFeatures = CollectFeatures([&](size_t Feature) -> bool {return Feature%3;});
	printf("GetTotalPCCoverage() is %zu\n",GetTotalPCCoverage());
	//GetNumPCs
}
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *Start, uint32_t *Stop) {
	if (Start == Stop || *Start) return;  // Empty range or already initialized.
	assert(NumModules < sizeof(Modules) / sizeof(Modules[0]));
	for (uint32_t *P = Start; P < Stop; P++) {
		NumGuards++;
		if (NumGuards == kNumPCs) {
			printf(
				"WARNING: The binary has too many instrumented PCs.\n"
				"         You may want to reduce the size of the binary\n"
				"         for more efficient fuzzing and precise coverage data\n");
		}
		// The guard value doubles as the index into the PC and counter arrays.
		*P = NumGuards % kNumPCs;
	}
	Modules[NumModules].Start = Start;
	Modules[NumModules].Stop = Stop;
	NumModules++;
}

The output looks like this:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp,func foo.cc -c
# clang++ san.cc foo.o -fsanitize=address -o a
# ./a
GetTotalPCCoverage() is 1
GetTotalPCCoverage() is 2
GetTotalPCCoverage() is 3
aaaaaaaaaaaaaaaaa
GetTotalPCCoverage() is 4
wrong

5.Tracing PCs

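With -fsanitize-coverage=trace-pc, the compiler inserts __sanitizer_cov_trace_pc() on every edge. With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call. These callbacks are not implemented in the sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).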
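
Since these two callbacks have to be supplied by the user, here is a minimal sketch of what they might look like. It appends every PC (or callee address) to a fixed-size buffer and drops entries once the buffer is full. The buffer, its capacity, and the helpers Record, TraceSize and TraceAt are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size trace buffer; once it is full, entries are dropped.
static const size_t kTraceCapacity = 1 << 12;
static uintptr_t g_trace[kTraceCapacity];
static size_t g_trace_size = 0;

static void Record(uintptr_t value) {
  if (g_trace_size < kTraceCapacity) g_trace[g_trace_size++] = value;
}

// Inserted on every edge with -fsanitize-coverage=trace-pc.
extern "C" void __sanitizer_cov_trace_pc() {
  // The return address identifies the instrumented edge.
  Record(reinterpret_cast<uintptr_t>(__builtin_return_address(0)));
}

// Inserted on every indirect call when indirect-calls is also given.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  Record(reinterpret_cast<uintptr_t>(callee));
}

size_t TraceSize() { return g_trace_size; }

uintptr_t TraceAt(size_t i) {
  assert(i < g_trace_size);
  return g_trace[i];
}
```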

6.Instrumentation points

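edge (default): edges are instrumented (see below).
bb: basic blocks are instrumented.
func: only the entry block of every function is instrumented.

Use these flags together with trace-pc-guard or trace-pc, like this: -fsanitize-coverage=func,trace-pc-guard.

When edge or bb is used, some of the edges/blocks may still be left uninstrumented (pruned) if such instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This could be useful for better coverage visualization.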

7.Edge coverage

Consider this code:

void foo(int *a) {
  if (a)
    *a = 0;
}

It contains 3 basic blocks; let's name them A, B, and C:

A
|\
| \
|  B
| /
|/
C

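If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks: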

A
|\
| \
D  B
| /
|/
C
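
The splitting step can be sketched on a small graph representation. This is an illustration only, not LLVM's implementation. The Edge alias and SplitCriticalEdges are hypothetical names, and the sketch assumes the usual definition of a critical edge: its source has more than one successor and its destination has more than one predecessor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (source block, destination block)

// Splits every critical edge of a CFG whose blocks are numbered
// 0..*num_blocks-1. Each critical edge S=>D is replaced by S=>N and N=>D,
// where N is a fresh dummy block; *num_blocks is updated accordingly.
std::vector<Edge> SplitCriticalEdges(const std::vector<Edge> &edges,
                                     int *num_blocks) {
  std::vector<int> succs(*num_blocks, 0), preds(*num_blocks, 0);
  for (const Edge &e : edges) {
    assert(e.first >= 0 && e.first < *num_blocks);
    assert(e.second >= 0 && e.second < *num_blocks);
    ++succs[e.first];
    ++preds[e.second];
  }
  std::vector<Edge> result;
  for (const Edge &e : edges) {
    if (succs[e.first] > 1 && preds[e.second] > 1) {
      int dummy = (*num_blocks)++;  // New block placed on the critical edge.
      result.push_back({e.first, dummy});
      result.push_back({dummy, e.second});
    } else {
      result.push_back(e);
    }
  }
  return result;
}
```

With A=0, B=1 and C=2, the input edges are {0,1}, {0,2}, {1,2}. Only A=>C is critical, so the sketch replaces it with A=>D and D=>C, where D=3.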

8.Tracing data flow

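Support for data-flow-guided fuzzing. With -fsanitize-coverage=trace-cmp, the compiler inserts extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler instruments integer division instructions (to capture the right argument of division), and with -fsanitize-coverage=trace-gep it instruments LLVM GEP instructions (to capture array indices).

Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.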

// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);

// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);
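
The layout of the Cases array passed to __sanitizer_cov_trace_switch is the least obvious part of this interface, so a small decoding sketch may help. SwitchInfo, DecodeSwitch and the two counters are hypothetical names introduced here; the callback body is just one possible use of the decoded data.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical decoded view of one switch statement.
struct SwitchInfo {
  uint64_t value = 0;               // The switch operand.
  uint64_t bits = 0;                // Width of the operand in bits.
  std::vector<uint64_t> constants;  // The case constants.
  bool matched = false;             // True if value equals one of the constants.
};

// Decodes the array described above: Cases[0] is the number of constants,
// Cases[1] the operand width in bits, Cases[2:] the constants themselves.
SwitchInfo DecodeSwitch(uint64_t val, const uint64_t *cases) {
  SwitchInfo info;
  info.value = val;
  const uint64_t num_cases = cases[0];
  info.bits = cases[1];
  assert(info.bits >= 1 && info.bits <= 64);
  for (uint64_t i = 0; i < num_cases; ++i) {
    info.constants.push_back(cases[2 + i]);
    if (cases[2 + i] == val) info.matched = true;
  }
  return info;
}

// Hypothetical statistics kept by the callback.
static uint64_t g_switches_seen = 0;
static uint64_t g_switches_matched = 0;

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  SwitchInfo info = DecodeSwitch(Val, Cases);
  ++g_switches_seen;
  if (info.matched) ++g_switches_matched;
}
```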

An example:

// foo.cc
#include <iostream>
#include <string>

int add(int i, int j) {
	return i + j;
}

int main() {
	std::string s;
	int i;
	std::cin >> s;
	if (s[0] == 'w') {
		i = add(3, 5);
	} else {
		std::cout << "wrong" << std::endl;
	}
	return 0;
}

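// san.cc
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
// Called before a 32-bit comparison in which Arg1 is a compile-time constant.
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2)
{
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	// PRIuPTR is the portable printf format for uintptr_t.
	printf("cmp4PC is %" PRIuPTR ",Arg1 is %u,Arg2 is %u\n",PC,Arg1,Arg2);
}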

The output is as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp foo.cc -c
# clang++ san.cc foo.o -fsanitize=address
# ./a.out 
qqqqqqqqqqqqqqq
cmp4PC is 5211447,Arg1 is 119,Arg2 is 113
wrong
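
The operands reported by the comparison callbacks can also be turned into a rough measure of how close a comparison came to succeeding. The sketch below assumes, purely for illustration, that closeness is measured as the number of differing bits. BitDistance and g_best_distance are hypothetical names, and __builtin_popcountll is a Clang/GCC builtin.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits in which the two operands differ; 0 means they are equal.
int BitDistance(uint64_t a, uint64_t b) {
  return __builtin_popcountll(a ^ b);
}

// Hypothetical state: the smallest distance observed so far.
static int g_best_distance = 65;  // Larger than any possible distance.

extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2) {
  int d = BitDistance(Arg1, Arg2);
  assert(d >= 0 && d <= 32);
  if (d < g_best_distance) g_best_distance = d;
}
```

For the run shown above, Arg1 is 119 ('w') and Arg2 is 113 ('q'); the two values differ in two bits, so BitDistance(119, 113) is 2.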

9.Default implementation

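The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You can use this implementation to dump the coverage to disk at process exit.

Example: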

// cov.cc
#include <stdio.h>
__attribute__((noinline))
void foo() { printf("foo\n"); }

int main(int argc, char **argv) {
	if (argc == 2)
		foo();
	printf("main\n");
}

% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov

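Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file is also created for every DSO.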

10.Sancov data format

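The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets. The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.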
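
A reader for this format fits in a few lines. The sketch below decodes a .sancov file that has already been loaded into memory. ParseSancov and the constant names are hypothetical, and the sketch assumes the file was written on a machine with the same byte order as the one reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Magic values for 64-bit and 32-bit offsets.
static const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
static const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Fills *offsets with the offsets stored in the file contents. Returns false
// if the header is missing, the magic is unknown, or the payload size is not
// a multiple of the offset size.
bool ParseSancov(const std::vector<unsigned char> &data,
                 std::vector<uint64_t> *offsets) {
  assert(offsets != nullptr);
  offsets->clear();
  if (data.size() < sizeof(uint64_t)) return false;
  uint64_t magic;
  std::memcpy(&magic, data.data(), sizeof(magic));
  size_t width;
  if (magic == kMagic64) width = 8;
  else if (magic == kMagic32) width = 4;
  else return false;
  if ((data.size() - sizeof(magic)) % width != 0) return false;
  for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
    uint64_t value = 0;
    if (width == 8) {
      std::memcpy(&value, &data[pos], 8);
    } else {
      uint32_t v32;
      std::memcpy(&v32, &data[pos], 4);
      value = v32;
    }
    offsets->push_back(value);
  }
  return true;
}
```

This matches the sizes in the earlier run: a file with 2 PCs and 64-bit offsets is 8 + 2 * 8 = 24 bytes, and one with 3 PCs is 32 bytes.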

11.Sancov Tool

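A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization tasks autonomously, without any extra support from the environment. You need to pass it the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. Sancov matches these files using module names and binary file names.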

12.Coverage Reports

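Experimental

.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Thus the .sancov file has to be symbolized first to produce a .symcov file: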

sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The .symcov file can be browsed overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.

13.Output directory

By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path:

% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

An introduction to Clang 10: SanitizerCoverage