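LevelDB Source Code Analysis--6

5 Operation Log 2

5.3 Reading the Log

Reading the log is clearly more complex than writing it: the reader has to verify checksums, detect corruption, and handle a variety of error conditions.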

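5.3.1 Class Hierarchy

Let's first look at the class diagram for the classes involved in reading, shown in Figure 5.3-1.

Reader relies on two interfaces: Reporter, which reports errors, and SequentialFile, which reads the log file.

> Reporter has a single method: void Corruption(size_t bytes, const Status& status);

> SequentialFile has two methods:

    Status Read(size_t n, Slice* result, char* scratch);
    Status Skip(uint64_t n);

A note on Read: since the result is already passed back through *result, why is there also a *scratch parameter? This has to do with Slice. A Slice only holds a char* pointer supplied from outside; it does not allocate or manage memory itself. Read therefore needs the caller to provide a buffer where the bytes are actually stored, and *result may end up pointing into that buffer.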
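To make the role of scratch concrete, here is a minimal sketch. MiniSlice, MemSequentialFile and ReadOnce are hypothetical stand-ins invented for illustration, not LevelDB classes, and Read returns a bool here where the real interface returns a Status. The point it shows is that the slice owns nothing: the bytes live in the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for leveldb::Slice: a non-owning (pointer, length) view.
struct MiniSlice {
  const char* data = nullptr;
  std::size_t size = 0;
  std::string ToString() const { return std::string(data, size); }
};

// Hypothetical in-memory "file" that mimics the shape of SequentialFile::Read.
class MemSequentialFile {
 public:
  explicit MemSequentialFile(std::string contents)
      : contents_(std::move(contents)) {}

  // Copies up to n bytes into scratch and makes *result view those bytes.
  // Returns true on success (assumption: the real interface returns a Status).
  bool Read(std::size_t n, MiniSlice* result, char* scratch) {
    const std::size_t avail = contents_.size() - pos_;
    const std::size_t to_read = n < avail ? n : avail;
    std::memcpy(scratch, contents_.data() + pos_, to_read);
    pos_ += to_read;
    result->data = scratch;  // the slice points into caller-owned memory
    result->size = to_read;
    return true;
  }

 private:
  std::string contents_;
  std::size_t pos_ = 0;
};

// Reads up to n bytes using a caller-owned buffer and copies them out
// before the buffer goes out of scope.
std::string ReadOnce(MemSequentialFile& file, std::size_t n) {
  char scratch[64];  // caller-owned storage; must outlive the slice
  assert(n <= sizeof(scratch));
  MiniSlice result;
  file.Read(n, &result, scratch);
  return result.ToString();
}
```

If scratch were destroyed before the slice was used, the slice would dangle, which is why the caller, not Read, decides where the bytes live.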

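Figure 5.3-1

The Reader class has several member variables worth noting:

    bool eof_;                       // the last Read() returned fewer than kBlockSize bytes, which implies EOF
    uint64_t last_record_offset_;    // offset of the last record returned by ReadRecord
    uint64_t end_of_buffer_offset_;  // current read offset, i.e. the file offset just past the end of buffer_
    uint64_t const initial_offset_;  // offset at which to start looking for the first record to return
    Slice buffer_;                   // the data that has been read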

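5.3.2 Log Reading Flow

Reader's main interface is ReadRecord. Let's walk through this function.

S1 Jump to the position specified by the caller via the initial offset and start reading the log file from there. The jump is a direct call to the Skip interface of SequentialFile.

The initial offset passed in by the caller has to be adjusted first; both the adjustment and the jump are implemented in SkipToInitialBlock.

    if (last_record_offset_ < initial_offset_) {  // current offset < requested offset, so skip ahead
      if (!SkipToInitialBlock()) return false;
    }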

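The following code is the logic in SkipToInitialBlock that adjusts the read offset:

    // Compute the offset within the block, then round down to the start of the block to read
    size_t offset_in_block = initial_offset_ % kBlockSize;
    uint64_t block_start_location = initial_offset_ - offset_in_block;

    // If the offset falls in the last 6 bytes of the block, no complete record
    // can start there (a header alone needs 7 bytes), so move to the next block
    if (offset_in_block > kBlockSize - 6) {
      offset_in_block = 0;
      block_start_location += kBlockSize;
    }

    end_of_buffer_offset_ = block_start_location;  // set the read offset
    if (block_start_location > 0) file_->Skip(block_start_location);  // skip ahead

The code first computes the offset within the block and then rounds down to the start of the block to be read. Reading must always begin on a block boundary so that whole blocks are read; that is the purpose of the adjustment.

The member variable end_of_buffer_offset_ records this value, which is used in subsequent reads.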
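The rounding can be tried out in isolation. The sketch below repeats the arithmetic from SkipToInitialBlock as a free function; BlockStartFor is a hypothetical name used only here, and kBlockSize is assumed to be 32768, LevelDB's log block size.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBlockSize = 32768;  // assumed log block size (32 KB)

// Returns the file offset of the block where reading should start for a
// given initial offset, mirroring the arithmetic quoted above.
std::uint64_t BlockStartFor(std::uint64_t initial_offset) {
  const std::uint64_t offset_in_block = initial_offset % kBlockSize;
  std::uint64_t block_start = initial_offset - offset_in_block;
  // An offset inside the last 6 bytes of a block cannot begin a record,
  // so reading starts at the next block instead.
  if (offset_in_block > kBlockSize - 6) {
    block_start += kBlockSize;
  }
  return block_start;
}
```

For example, an initial offset of 32778 (10 bytes into the second block) rounds down to 32768, while 32763 (inside the last few bytes of the first block) moves forward to 32768.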

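S2 Before entering the while loop, initialize a few flags:

    // Whether we are currently inside a fragmented record, i.e. a FIRST-type record has been seen
    bool in_fragmented_record = false;
    uint64_t prospective_record_offset = 0;  // offset of the logical record we are reading

S3 Enter the while(true) loop, which runs until a kLastType or kFullType record is read or the end of the file is reached. Reading one complete physical record from the log file is done by ReadPhysicalRecord.

A read error does not terminate the loop; the error is reported and execution continues until a user record is read successfully or the end of the file is reached.

S3.1 Read a record from the file

    uint64_t physical_record_offset = end_of_buffer_offset_ - buffer_.size();
    const unsigned int record_type = ReadPhysicalRecord(&fragment);

physical_record_offset holds the offset of the record currently being read. The record is then handled according to its record_type; there are 7 cases in total: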

S3.2 FULL type (kFullType): a complete log record; the user record data is returned successfully. A workaround for early versions is also needed here: early versions of LevelDB could emit an empty kFirstType log record at the end of a block.

    if (in_fragmented_record) {
      if (scratch->empty()) in_fragmented_record = false;
      else ReportCorruption(scratch->size(), "partial record without end(1)");
    }
    prospective_record_offset = physical_record_offset;
    scratch->clear();  // clear scratch; on success the data is not returned through scratch
    *record = fragment;
    last_record_offset_ = prospective_record_offset;  // update the last record offset
    return true;

S3.3 FIRST type (kFirstType): the first record in a series of log records (fragments). The same workaround for early versions applies.

The data is accumulated in scratch and the loop continues with the next read; only once a LAST-type log record has been read is the data returned through record.

If another FIRST or FULL type log record is encountered while scratch is not empty, the log file is corrupted.

    if (in_fragmented_record) {
      if (scratch->empty()) in_fragmented_record = false;
      else ReportCorruption(scratch->size(), "partial record without end(2)");
    }
    prospective_record_offset = physical_record_offset;
    scratch->assign(fragment.data(), fragment.size());  // copy the fragment into scratch
    in_fragmented_record = true;  // set the fragment flag

S3.4 MIDDLE type (kMiddleType): this case is simple. If we are not inside a fragmented record, report an error; otherwise just append the data to scratch.

    if (!in_fragmented_record) {
      ReportCorruption(fragment.size(), "missing start of fragmented record(1)");
    } else {
      scratch->append(fragment.data(), fragment.size());
    }

S3.5 LAST type (kLastType): the last record in a series of log records (fragments). If we are not inside a fragmented record, report an error.

    if (!in_fragmented_record) {
      ReportCorruption(fragment.size(), "missing start of fragmented record(2)");
    } else {
      scratch->append(fragment.data(), fragment.size());
      *record = Slice(*scratch);
      last_record_offset_ = prospective_record_offset;
      return true;
    }

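That completes the 4 normal log record types. The remaining 3 cases handle errors; the extra type values are declared in the Reader class:

    enum {
      kEof = kMaxRecordType + 1,  // reached the end of the file
      // Invalid record. Currently there are 3 situations that return a bad record:
      // * CRC check failed (ReadPhysicalRecord reports a drop)
      // * The record has length 0 (No drop is reported)
      // * The record lies before the specified initial_offset (No drop is reported)
      kBadRecord = kMaxRecordType + 2
    };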

S3.6 End of file (kEof): return false without returning any result.

    if (in_fragmented_record) {
      ReportCorruption(scratch->size(), "partial record without end(3)");
      scratch->clear();
    }
    return false;

S3.7 Invalid record (kBadRecord): if we are inside a fragmented record, report an error and discard what has been accumulated; the loop then continues.

    if (in_fragmented_record) {
      ReportCorruption(scratch->size(), "error in middle of record");
      in_fragmented_record = false;
      scratch->clear();
    }

S3.8 Default branch: an unknown record type was encountered; report an error and clear scratch.

    ReportCorruption(…, "unknown record type %u", record_type);
    in_fragmented_record = false;  // reset the fragment flag
    scratch->clear();  // clear scratch

That is the whole logic of ReadRecord; it takes some effort to explain.
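The four normal cases form a small state machine, which may be easier to follow as one compact sketch. This is a simplification under stated assumptions, not the LevelDB implementation: Fragment, ReassembleResult and Reassemble are hypothetical names, the input is an in-memory list of already-parsed physical records instead of a file, corruptions are only counted, and the whole list is processed in one call instead of returning after each logical record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Record type values as used by the LevelDB log format.
enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

// Hypothetical: one physical record that has already been parsed.
struct Fragment {
  RecordType type;
  std::string payload;
};

// Hypothetical result holder for this sketch.
struct ReassembleResult {
  std::vector<std::string> records;  // logical records recovered
  int corruptions = 0;               // number of problems that would be reported
};

ReassembleResult Reassemble(const std::vector<Fragment>& fragments) {
  ReassembleResult out;
  std::string scratch;
  bool in_fragmented_record = false;
  for (const Fragment& f : fragments) {
    switch (f.type) {
      case kFullType:
      case kFirstType:
        // A new record starts while an unfinished, non-empty one is pending.
        if (in_fragmented_record && !scratch.empty()) ++out.corruptions;
        if (f.type == kFullType) {
          in_fragmented_record = false;
          scratch.clear();
          out.records.push_back(f.payload);
        } else {
          in_fragmented_record = true;
          scratch = f.payload;
        }
        break;
      case kMiddleType:
        if (!in_fragmented_record) ++out.corruptions;  // missing start
        else scratch += f.payload;
        break;
      case kLastType:
        if (!in_fragmented_record) {
          ++out.corruptions;  // missing start
        } else {
          scratch += f.payload;
          out.records.push_back(scratch);
          scratch.clear();
          in_fragmented_record = false;
        }
        break;
    }
  }
  // Input ended in the middle of a fragmented record (the kEof case above).
  if (in_fragmented_record) ++out.corruptions;
  return out;
}
```

Feeding it FULL("a"), FIRST("b"), MIDDLE("c"), LAST("d") yields the two logical records "a" and "bcd" with no corruption counted.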

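5.3.3 Reading a Record from the Log File

This is the ReadPhysicalRecord function mentioned earlier. It calls the Read interface of SequentialFile to read data from the file.

The function immediately enters a while(true) loop whose purpose is to read one complete record. The data read is stored in the member variable buffer_. This structure looks a little odd; in fact, the while(true) loop is not strictly necessary.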

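The basic logic of the function is as follows:

S1 If buffer_ is smaller than the record header size kHeaderSize, one of the following branches is taken:

S1.1 If eof_ is false, the end of the file has not been reached yet; clear the buffer and read more data.

    buffer_.clear();  // the previous read was a full one, so the leftover bytes are a trailer to skip
    Status status = file_->Read(kBlockSize, &buffer_, backing_store_);
    end_of_buffer_offset_ += buffer_.size();  // advance the buffer read offset
    if (!status.ok()) {  // read failed: set eof_ to true, report the error and return kEof
      buffer_.clear();
      ReportDrop(kBlockSize, status);
      eof_ = true;
      return kEof;
    } else if (buffer_.size() < kBlockSize) {
      eof_ = true;  // fewer bytes than requested (kBlockSize) were read, so this is the end of the file
    }
    continue;  // go around the loop again

S1.2 If eof_ is true and the buffer is empty, the end of the file has been reached normally; return kEof.

S1.3 Otherwise, eof_ is true and the buffer is not empty, which means the file ends with an incomplete record; report an error and return kEof.

    size_t drop_size = buffer_.size();
    buffer_.clear();
    ReportCorruption(drop_size, "truncated record at end of file");
    return kEof;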

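S2 Reaching this point means buffer_ holds at least kHeaderSize bytes: either data is left over from an earlier read, or the Read in the previous iteration refilled the buffer and, after the continue, the check buffer_.size() >= kHeaderSize led here.

Parse the header of the log record and check that the length is consistent with the data in the buffer.

According to the log format, the first 4 bytes are the crc32, followed by a 2-byte length and a 1-byte type. They are parsed as follows:

    const char* header = buffer_.data();
    const uint32_t length = ((header[4]) & 0xff) | ((header[5] & 0xff) << 8);
    const uint32_t type = header[6];
    if (kHeaderSize + length > buffer_.size()) {  // the length runs past the buffer, report an error
      size_t drop_size = buffer_.size();
      buffer_.clear();
      ReportCorruption(drop_size, "bad record length");
      return kBadRecord;  // return kBadRecord
    }
    if (type == kZeroType && length == 0) {  // for the zero type, no error is reported
      buffer_.clear();
      return kBadRecord;  // still return kBadRecord
    }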
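The header layout can be exercised on its own. The sketch below decodes the 7-byte header into a struct; RecordHeader, DecodeHeader and PayloadFits are hypothetical helpers for illustration, and the stored checksum is assumed to be little-endian like the length field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHeaderSize = 4 + 2 + 1;  // crc32 + length + type

// Hypothetical holder for the decoded header fields.
struct RecordHeader {
  std::uint32_t stored_crc;  // bytes 0..3, assumed little-endian
  std::uint32_t length;      // bytes 4..5, little-endian
  unsigned int type;         // byte 6
};

// Decodes the 7 header bytes. Each byte goes through unsigned char first
// so that values >= 0x80 are not sign-extended.
RecordHeader DecodeHeader(const char* header) {
  auto byte = [header](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(header[i]));
  };
  RecordHeader h;
  h.stored_crc = byte(0) | (byte(1) << 8) | (byte(2) << 16) | (byte(3) << 24);
  h.length = byte(4) | (byte(5) << 8);
  h.type = byte(6);
  return h;
}

// True if the header plus payload fits in a buffer of the given size;
// a false result corresponds to the "bad record length" branch.
bool PayloadFits(const RecordHeader& h, std::size_t buffer_size) {
  return kHeaderSize + h.length <= buffer_size;
}
```

With header bytes 01 02 03 04 34 12 01, the decoded length is 0x1234 and the type is 1 (kFullType).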

S3 Verify the CRC32; if the check fails, report an error and return kBadRecord.

S4 If the record starts before the initial offset, skip it and return kBadRecord; otherwise return the record data and its type.

    buffer_.remove_prefix(kHeaderSize + length);
    if (end_of_buffer_offset_ - buffer_.size() - kHeaderSize - length < initial_offset_) {
      result->clear();
      return kBadRecord;
    }
    *result = Slice(header + kHeaderSize, length);
    return type;
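The offset expression in S4 is easier to check with concrete numbers. In this sketch, RecordStartOffset and StartsBeforeInitialOffset are hypothetical helpers that repeat the same arithmetic; remaining stands for buffer_.size() after the remove_prefix call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHeaderSize = 7;  // crc32 (4) + length (2) + type (1)

// File offset at which the just-consumed physical record began.
// end_of_buffer_offset: file offset just past the end of the buffered data.
// remaining: bytes still in the buffer after the record was removed.
// length: payload length of the record.
std::uint64_t RecordStartOffset(std::uint64_t end_of_buffer_offset,
                                std::size_t remaining, std::size_t length) {
  return end_of_buffer_offset - remaining - kHeaderSize - length;
}

// True if the record starts before initial_offset and should be skipped.
bool StartsBeforeInitialOffset(std::uint64_t end_of_buffer_offset,
                               std::size_t remaining, std::size_t length,
                               std::uint64_t initial_offset) {
  return RecordStartOffset(end_of_buffer_offset, remaining, length) <
         initial_offset;
}
```

Suppose one full block of 32768 bytes has been read from offset 0. After a first record with a 100-byte payload is consumed, 32661 bytes remain and the record start works out to 0. After a second record with a 50-byte payload, 32604 bytes remain and its start is 107. With an initial offset of 107, the first record is skipped and the second one is returned.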

That is the logic for reading a record from the log file, and it completes the log reading path. Next we move on to the on-disk storage part, the sstable.
