A log (Log) is the container for log segments (Log Segment); it defines many operations for managing those segments.
Log Source Code Structure
The Log source code lives in the log package of the Kafka core module, in the file Log.scala.
Log Class & Object
The Log object:
object Log {
  val LogFileSuffix = ".log"
  val IndexFileSuffix = ".index"
  val TimeIndexFileSuffix = ".timeindex"
  val ProducerSnapshotFileSuffix = ".snapshot"
  val TxnIndexFileSuffix = ".txnindex"
  val DeletedFileSuffix = ".deleted"
  val CleanedFileSuffix = ".cleaned"
  val SwapFileSuffix = ".swap"
  val CleanShutdownFile = ".kafka_cleanshutdown"
  val DeleteDirSuffix = "-delete"
  val FutureDirSuffix = "-future"
}
These are all the constants defined in the Log object. The familiar .log, .index, .timeindex and .txnindex suffixes are all here. A few of the other file types are worth introducing:
- .snapshot files are snapshots Kafka takes for idempotent or transactional Producers.
- .deleted files are created by the segment-deletion operation. Deleting segment files is currently asynchronous: the Broker first renames a segment file from the .log suffix to the .deleted suffix. If you see a pile of files ending in .deleted, don't panic; Kafka is simply in the middle of deleting log segment files.
- .cleaned and .swap files are both products of Compaction.
- The -delete suffix applies to directories: when you delete a topic, its partition directories are renamed with this suffix.
- The -future suffix is used when moving a topic partition directory to a new location, which is a fairly advanced use case.
def filenamePrefixFromOffset(offset: Long): String = {
  val nf = NumberFormat.getInstance()
  nf.setMinimumIntegerDigits(20)
  nf.setMaximumFractionDigits(0)
  nf.setGroupingUsed(false)
  nf.format(offset)
}
This method computes a log segment file name from a given offset. Kafka segment file names are fixed at 20 digits, so the method left-pads the given offset with zeros to produce a fixed 20-character string.
For example, given an offset of 12345, the corresponding segment file on the Broker's disk would be named 00000000000000012345.log.
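The padding logic is easy to mirror outside of Scala. Here is a minimal Python sketch of the same computation (the function name is mine, not Kafka's):

```python
def filename_prefix_from_offset(offset: int) -> str:
    # Left-pad the offset with zeros to a fixed width of 20 digits,
    # mirroring Log.filenamePrefixFromOffset in Log.scala.
    return f"{offset:020d}"

# Offset 12345 maps to segment file 00000000000000012345.log
print(filename_prefix_from_offset(12345) + ".log")  # 00000000000000012345.log
```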
The Log class:
class Log(@volatile var dir: File,
          @volatile var config: LogConfig,
          @volatile var logStartOffset: Long,
          @volatile var recoveryPoint: Long,
          scheduler: Scheduler,
          brokerTopicStats: BrokerTopicStats,
          val time: Time,
          val maxProducerIdExpirationMs: Int,
          val producerIdExpirationCheckIntervalMs: Int,
          val topicPartition: TopicPartition,
          val producerStateManager: ProducerStateManager,
          logDirFailureChannel: LogDirFailureChannel) extends Logging with KafkaMetricsGroup {
  // ...
}
dir and logStartOffset are the most important properties. dir is the directory the log lives in, i.e. the topic partition's path, and logStartOffset is the log's current earliest offset. Both are declared as volatile vars, meaning their values can change and may be updated by multiple threads.
Two concepts come up constantly with the Log class: the LEO and the HW, illustrated by the figure below:
The log's current end offset, the Log End Offset (LEO), is the offset of the next message to be appended to the log. The Log Start Offset is its counterpart: the offset of the earliest message currently visible in the log. Offsets before the Log Start Offset may have expired and been truncated away.
In the figure, offset 8 is the High Watermark, the dividing line between committed and uncommitted messages.
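To make the three boundary offsets concrete, here is a small Python sketch. The High Watermark of 8 comes from the figure; the Log Start Offset of 0 and LEO of 15 are numbers I made up for illustration:

```python
# Illustrative values only: Log Start Offset = 0, High Watermark = 8, LEO = 15.
log_start_offset = 0
high_watermark = 8
log_end_offset = 15  # the offset the next appended message will get

def committed_offsets():
    # Offsets in [log_start_offset, high_watermark) are committed and
    # visible to consumers.
    return list(range(log_start_offset, high_watermark))

def uncommitted_offsets():
    # Offsets in [high_watermark, log_end_offset) exist but are not yet committed.
    return list(range(high_watermark, log_end_offset))

print(len(committed_offsets()), len(uncommitted_offsets()))  # 8 7
```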
Other important properties of the Log class:
@volatile private var nextOffsetMetadata: LogOffsetMetadata = _
@volatile private var highWatermarkMetadata: LogOffsetMetadata = LogOffsetMetadata(logStartOffset)
private val segments: ConcurrentNavigableMap[java.lang.Long, LogSegment] = new ConcurrentSkipListMap[java.lang.Long, LogSegment]
@volatile var leaderEpochCache: Option[LeaderEpochFileCache] = None
nextOffsetMetadata is essentially the LEO: the offset of the next message to be appended.
highWatermarkMetadata is the partition log's high watermark.
segments is one of the most important properties of the Log class. It holds all the log segments of this partition log, stored as a Map whose keys are the segments' base offsets and whose values are the segment objects themselves. The Kafka source uses a ConcurrentNavigableMap (concretely, a ConcurrentSkipListMap) to store them.
The Leader Epoch Cache object is mainly used to decide whether to truncate the log (Truncation) after a failure. The older mechanism, which relied on the high watermark for this decision, could leave replicas with inconsistent data. The Leader Epoch Cache is a cache that maps the partition leader's epoch values to their corresponding start offsets.
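What makes a skip-list map convenient here is its ordered floor lookup: finding the segment that contains a given offset is a floor-entry query on the base offsets. A rough Python analogue, using bisect in place of ConcurrentSkipListMap and made-up segment names:

```python
import bisect

# base offset -> segment (the names are illustrative, not Kafka's)
segments = {0: "segment-0", 100: "segment-100", 250: "segment-250"}
base_offsets = sorted(segments)

def floor_segment(offset: int):
    # Equivalent of ConcurrentSkipListMap.floorEntry: the segment with the
    # largest base offset that is <= the requested offset.
    idx = bisect.bisect_right(base_offsets, offset) - 1
    return None if idx < 0 else segments[base_offsets[idx]]

print(floor_segment(180))  # segment-100
```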
Initialization of the Log class:
locally {
  val startMs = time.milliseconds
  // create the log directory if it doesn't exist
  Files.createDirectories(dir.toPath)
  initializeLeaderEpochCache()
  val nextOffset = loadSegments()
  /* Calculate the offset of the next message */
  nextOffsetMetadata = LogOffsetMetadata(nextOffset, activeSegment.baseOffset, activeSegment.size)
  leaderEpochCache.foreach(_.truncateFromEnd(nextOffsetMetadata.messageOffset))
  logStartOffset = math.max(logStartOffset, segments.firstEntry.getValue.baseOffset)
  // The earliest leader epoch may not be flushed during a hard failure. Recover it here.
  leaderEpochCache.foreach(_.truncateFromStart(logStartOffset))
  // Any segment loading or recovery code must not use producerStateManager, so that we can build the full state here
  // from scratch.
  if (!producerStateManager.isEmpty)
    throw new IllegalStateException("Producer state must be empty during log initialization")
  loadProducerState(logEndOffset, reloadFromCleanShutdown = hasCleanShutdownFile)
  info(s"Completed load of log with ${segments.size} segments, log start offset $logStartOffset and " +
    s"log end offset $logEndOffset in ${time.milliseconds() - startMs} ms")
}
The main logic is illustrated by the diagram below:
Let's focus on step three, loading the log segments. Here is the implementation of loadSegments:
private def loadSegments(): Long = {
  // first do a pass through the files in the log directory and remove any temporary files
  // and find any interrupted swap operations
  val swapFiles = removeTempFilesAndCollectSwapFiles()
  // Now do a second pass and load all the log and index files.
  // We might encounter legacy log segments with offset overflow (KAFKA-6264). We need to split such segments. When
  // this happens, restart loading segment files from scratch.
  retryOnOffsetOverflow {
    // In case we encounter a segment with offset overflow, the retry logic will split it after which we need to retry
    // loading of segments. In that case, we also need to close all segments that could have been left open in previous
    // call to loadSegmentFiles().
    logSegments.foreach(_.close())
    segments.clear()
    loadSegmentFiles()
  }
  // Finally, complete any interrupted swap operations. To be crash-safe,
  // log files that are replaced by the swap segment should be renamed to .deleted
  // before the swap file is restored as the new segment file.
  completeSwapOperations(swapFiles)
  if (!dir.getAbsolutePath.endsWith(Log.DeleteDirSuffix)) {
    val nextOffset = retryOnOffsetOverflow {
      recoverLog()
    }
    // reset the index size of the currently active log segment to allow more entries
    activeSegment.resizeIndexes(config.maxIndexSize)
    nextOffset
  } else {
    if (logSegments.isEmpty) {
      addSegment(LogSegment.open(dir = dir,
        baseOffset = 0,
        config,
        time = time,
        fileAlreadyExists = false,
        initFileSize = this.initFileSize,
        preallocate = false))
    }
    0
  }
}
This code walks the partition's log directory twice.
First, it removes the temporary files left over from a previous failure (.cleaned, .swap, .deleted files and so on); removeTempFilesAndCollectSwapFiles implements this logic.
Then it clears all existing segment objects and walks the partition directory again, rebuilding the segments map and deleting orphaned index files that have no corresponding segment file. After these two passes, it completes any interrupted swap operations by calling completeSwapOperations.
Once all of that is done, it calls recoverLog to recover the segment objects, and finally returns the recovered partition log's LEO.
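The four steps above can be sketched as a skeleton, with the helpers passed in as stand-ins for Kafka's private methods (every name here is illustrative):

```python
def load_segments(remove_temp_files, load_segment_files, complete_swaps, recover_log):
    # Pass 1: remove temporary files, collecting interrupted .swap files.
    swap_files = remove_temp_files()
    # Pass 2: rebuild the segments map from the .log/.index files on disk.
    load_segment_files()
    # Complete any interrupted swap operations.
    complete_swaps(swap_files)
    # Recover the segments past the recovery point; return the resulting LEO.
    return recover_log()

calls = []
leo = load_segments(
    remove_temp_files=lambda: calls.append("clean") or ["a.swap"],
    load_segment_files=lambda: calls.append("load"),
    complete_swaps=lambda s: calls.append("swap"),
    recover_log=lambda: calls.append("recover") or 42,
)
print(calls, leo)  # ['clean', 'load', 'swap', 'recover'] 42
```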
Let's look at the implementation of removeTempFilesAndCollectSwapFiles:
private def removeTempFilesAndCollectSwapFiles(): Set[File] = {
  // a local helper that deletes the index files associated with a log file
  def deleteIndicesIfExist(baseFile: File, suffix: String = ""): Unit = {
    info(s"Deleting index files with suffix $suffix for baseFile $baseFile")
    val offset = offsetFromFile(baseFile)
    Files.deleteIfExists(Log.offsetIndexFile(dir, offset, suffix).toPath)
    Files.deleteIfExists(Log.timeIndexFile(dir, offset, suffix).toPath)
    Files.deleteIfExists(Log.transactionIndexFile(dir, offset, suffix).toPath)
  }
  var swapFiles = Set[File]()
  var cleanFiles = Set[File]()
  var minCleanedFileOffset = Long.MaxValue
  // iterate over all files under the partition's log directory
  for (file <- dir.listFiles if file.isFile) {
    if (!file.canRead) // if the file is unreadable, throw an IOException
      throw new IOException(s"Could not read file $file")
    val filename = file.getName
    if (filename.endsWith(DeletedFileSuffix)) { // if it ends with .deleted
      debug(s"Deleting stray temporary file ${file.getAbsolutePath}")
      Files.deleteIfExists(file.toPath) // it was left over from a previous failure, delete it outright
    } else if (filename.endsWith(CleanedFileSuffix)) { // if it ends with .cleaned
      minCleanedFileOffset = Math.min(offsetFromFileName(filename), minCleanedFileOffset) // track the smallest offset among all .cleaned file names
      cleanFiles += file // and add the file to the set of files to delete
    } else if (filename.endsWith(SwapFileSuffix)) { // if it ends with .swap
      val baseFile = new File(CoreUtils.replaceSuffix(file.getPath, SwapFileSuffix, ""))
      info(s"Found file ${file.getAbsolutePath} from interrupted swap operation.")
      if (isIndexFile(baseFile)) { // if the .swap file was originally an index file
        deleteIndicesIfExist(baseFile) // delete the original index files
      } else if (isLogFile(baseFile)) { // if the .swap file was originally a log file
        deleteIndicesIfExist(baseFile) // delete the original index files
        swapFiles += file // and add it to the set of .swap files to recover
      }
    }
  }
  // from the collected .swap files, single out those whose base offset is >= minCleanedFileOffset and delete them as invalid
  val (invalidSwapFiles, validSwapFiles) = swapFiles.partition(file => offsetFromFile(file) >= minCleanedFileOffset)
  invalidSwapFiles.foreach { file =>
    debug(s"Deleting invalid swap file ${file.getAbsoluteFile} minCleanedFileOffset: $minCleanedFileOffset")
    val baseFile = new File(CoreUtils.replaceSuffix(file.getPath, SwapFileSuffix, ""))
    deleteIndicesIfExist(baseFile, SwapFileSuffix)
    Files.deleteIfExists(file.toPath)
  }
  // Now that we have deleted all .swap files that constitute an incomplete split operation, let's delete all .clean files
  cleanFiles.foreach { file =>
    debug(s"Deleting stray .clean file ${file.getAbsolutePath}")
    Files.deleteIfExists(file.toPath)
  }
  // finally, return the set of valid .swap files
  validSwapFiles
}
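Stripped of all the I/O, the classification logic above can be sketched like this (the file names in the example are simplified stand-ins for real 20-digit segment names):

```python
def classify(filenames):
    # Mirrors the first pass: .deleted files go straight to deletion,
    # .cleaned files are collected (tracking the minimum offset), and
    # .swap files are kept for recovery.
    to_delete, cleaned, swaps = [], [], []
    min_cleaned_offset = float("inf")
    for name in filenames:
        offset = int(name.split(".")[0])
        if name.endswith(".deleted"):
            to_delete.append(name)
        elif name.endswith(".cleaned"):
            min_cleaned_offset = min(min_cleaned_offset, offset)
            cleaned.append(name)
        elif name.endswith(".swap"):
            swaps.append(name)
    # A .swap file whose offset is >= min_cleaned_offset belongs to an
    # incomplete split operation and is invalid.
    valid_swaps = [s for s in swaps if int(s.split(".")[0]) < min_cleaned_offset]
    return to_delete, cleaned, valid_swaps

print(classify(["100.deleted", "200.cleaned", "150.swap", "300.swap"]))
```

With these inputs, the .swap file at offset 300 is discarded because it sits at or beyond the smallest .cleaned offset (200), while the one at 150 survives as valid.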
After removeTempFilesAndCollectSwapFiles has run, the code clears the existing segment collection and reloads the segment files. That is the second pass, implemented mainly by loadSegmentFiles.
private def loadSegmentFiles(): Unit = {
  // sort the files by the offset in their names, ascending, then iterate over them
  for (file <- dir.listFiles.sortBy(_.getName) if file.isFile) {
    if (isIndexFile(file)) { // if it is an index file
      val offset = offsetFromFile(file)
      val logFile = Log.logFile(dir, offset)
      if (!logFile.exists) { // make sure the corresponding log file exists; otherwise log a warning and delete the index file
        warn(s"Found an orphaned index file ${file.getAbsolutePath}, with no corresponding log file.")
        Files.deleteIfExists(file.toPath)
      }
    } else if (isLogFile(file)) { // if it is a log file
      val baseOffset = offsetFromFile(file)
      val timeIndexFileNewlyCreated = !Log.timeIndexFile(dir, baseOffset).exists()
      // create the corresponding LogSegment instance and add it to segments
      val segment = LogSegment.open(dir = dir,
        baseOffset = baseOffset,
        config,
        time = time,
        fileAlreadyExists = true)
      try segment.sanityCheck(timeIndexFileNewlyCreated)
      catch {
        case _: NoSuchFileException =>
          error(s"Could not find offset index file corresponding to log file ${segment.log.file.getAbsolutePath}, " +
            "recovering segment and rebuilding index files...")
          recoverSegment(segment)
        case e: CorruptIndexException =>
          warn(s"Found a corrupted index file corresponding to log file ${segment.log.file.getAbsolutePath} due " +
            s"to ${e.getMessage}, recovering segment and rebuilding index files...")
          recoverSegment(segment)
      }
      addSegment(segment)
    }
  }
}
The third step handles the set of valid .swap files returned by the first step. completeSwapOperations does this:
private def completeSwapOperations(swapFiles: Set[File]): Unit = {
  // iterate over all valid .swap files
  for (swapFile <- swapFiles) {
    val logFile = new File(CoreUtils.replaceSuffix(swapFile.getPath, SwapFileSuffix, "")) // the corresponding log file
    val baseOffset = offsetFromFile(logFile) // and its base offset
    // create the corresponding LogSegment instance
    val swapSegment = LogSegment.open(swapFile.getParentFile,
      baseOffset = baseOffset,
      config,
      time = time,
      fileSuffix = SwapFileSuffix)
    info(s"Found log file ${swapFile.getPath} from interrupted swap operation, repairing.")
    // recover the swap segment
    recoverSegment(swapSegment)
    // We create swap files for two cases:
    // (1) Log cleaning where multiple segments are merged into one, and
    // (2) Log splitting where one segment is split into multiple.
    // Both of these mean that the resultant swap segments be composed of the original set, i.e. the swap segment
    // must fall within the range of existing segment(s). If we cannot find such a segment, it means the deletion
    // of that segment was successful. In such an event, we should simply rename the .swap to .log without having to
    // do a replace with an existing segment.
    // find the old segments (if any) that the swap segment overlaps and that still exist on disk
    val oldSegments = logSegments(swapSegment.baseOffset, swapSegment.readNextOffset).filter { segment =>
      segment.readNextOffset > swapSegment.baseOffset
    }
    // replace the old segments (possibly none) with the swap segment, renaming .swap to .log
    replaceSegments(Seq(swapSegment), oldSegments.toSeq, isRecoveredSwapFile = true)
  }
}
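The overlap check at the heart of this method can be approximated as follows. This is a simplification of the logSegments(...).filter(...) expression above, not Kafka code: `existing` maps each segment's base offset to its next offset, and the helper returns the base offsets of the segments the recovered swap segment should replace.

```python
def segments_to_replace(existing, swap_base, swap_next):
    # Keep the segments that overlap the swap segment's offset range
    # [swap_base, swap_next); if none survive, the earlier deletion
    # succeeded and the .swap file is simply renamed to .log.
    return sorted(base for base, nxt in existing.items()
                  if base < swap_next and nxt > swap_base)

# Three existing segments; a swap segment covering [100, 250) replaces two of them.
print(segments_to_replace({0: 100, 100: 200, 200: 300}, 100, 250))  # [100, 200]
```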
The final step is the recoverLog operation:
private def recoverLog(): Long = {
  // if we have the clean shutdown marker, skip recovery
  // i.e. only recover if there is no file ending in .kafka_cleanshutdown (usually there isn't one)
  if (!hasCleanShutdownFile) {
    // fetch all unflushed segments beyond the last recovery point
    val unflushed = logSegments(this.recoveryPoint, Long.MaxValue).toIterator
    var truncated = false
    // iterate over the unflushed segments
    while (unflushed.hasNext && !truncated) {
      val segment = unflushed.next
      info(s"Recovering unflushed segment ${segment.baseOffset}")
      val truncatedBytes =
        try {
          // recover the segment
          recoverSegment(segment, leaderEpochCache)
        } catch {
          case _: InvalidOffsetException =>
            val startOffset = segment.baseOffset
            warn("Found invalid offset during recovery. Deleting the corrupt segment and " +
              s"creating an empty one with starting offset $startOffset")
            segment.truncateTo(startOffset)
        }
      if (truncatedBytes > 0) { // if invalid messages forced a truncation, delete all remaining segments
        warn(s"Corruption found in segment ${segment.baseOffset}, truncating to offset ${segment.readNextOffset}")
        removeAndDeleteSegments(unflushed.toList, asyncDelete = true)
        truncated = true
      }
    }
  }
  // after all that, if the segment collection is non-empty
  if (logSegments.nonEmpty) {
    val logEndOffset = activeSegment.readNextOffset
    if (logEndOffset < logStartOffset) { // the partition's LEO must not be smaller than the Log Start Offset; otherwise delete all segments
      warn(s"Deleting all segments because logEndOffset ($logEndOffset) is smaller than logStartOffset ($logStartOffset). " +
        "This could happen if segment files were deleted from the file system.")
      removeAndDeleteSegments(logSegments, asyncDelete = true)
    }
  }
  // after all that, if the segment collection is empty
  if (logSegments.isEmpty) {
    // create at least one new segment, starting at logStartOffset, and add it to the collection
    addSegment(LogSegment.open(dir = dir,
      baseOffset = logStartOffset,
      config,
      time = time,
      fileAlreadyExists = false,
      initFileSize = this.initFileSize,
      preallocate = config.preallocate))
  }
  // update the recovery point and return it
  recoveryPoint = activeSegment.readNextOffset
  recoveryPoint
}
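The core recovery loop reduces to the following sketch: recover each unflushed segment in order, and as soon as one reports truncated bytes, drop everything after it. Here `recover` stands in for recoverSegment and returns the number of bytes truncated; the names and values are illustrative.

```python
def recover_unflushed(segments, recover):
    kept, removed = [], []
    it = iter(segments)
    for seg in it:
        truncated_bytes = recover(seg)
        kept.append(seg)  # the segment itself survives, possibly truncated
        if truncated_bytes > 0:
            removed = list(it)  # all remaining segments are deleted
            break
    return kept, removed

# Segment 2 is corrupted: it is kept (truncated), and segments 3 and 4 are removed.
kept, removed = recover_unflushed([1, 2, 3, 4], lambda s: 5 if s == 2 else 0)
print(kept, removed)  # [1, 2] [3, 4]
```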
Finally, a summary that extends the earlier mind map:
This article covered how a log loads its log segments. So what operations follow once loading is done? Don't go anywhere: leave a like and read the next article.