揭祕Kafka的硬盤設計方案,快速完成PB級數據擴容需求!

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"疫情期間,騰訊醫療爲全國人民提供了及時精準的疫情信息服務。騰訊雲kafka作爲騰訊醫療大數據架構中的關鍵組件。在面對業務短時間內成倍的數據存儲需求的情況下,如何快速響應、快速擴容以支持業務的穩定運行的呢? 本文將從Kafka集羣底層物理機層面硬盤的設計方案,來講解面對不同的業務需求場景,如何選擇好合適的磁盤方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"醫療資訊場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"騰訊醫療的使用場景是典型的日誌分析系統。Kafka作爲消息中間件,起到了數據聚合、流量削峯的作用。如下圖所示 :"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a7\/a73cce35a0492b9a2ce9039dcb22be73.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌分析系統架構在醫療的實例中,Kafka承載着峯值GB\/s的數據吞吐和大量的數據存儲壓力。這對底層的硬盤性能提出了很大的要求。比如吞吐\/存儲能力、快速擴縮容能力等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以一般在硬盤方案設計的時候需要綜合考慮下列因素:大容量存儲高吞吐量的IO能力快速擴縮容能力數據的安全性低冗餘的存儲組建Kafka集羣,常見的硬盤構建主要包括:單硬盤讀寫、多目錄讀寫、硬盤陣列(RAID0,RAID10)、邏輯卷(LVM)等方案。下面將深入分析各個方案的優劣勢,供讀者選擇參考。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"硬盤方案概述"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"硬盤存儲方案的設計使用的是現在成熟的工業化方案,並沒有特殊的創新。硬盤存儲方案的選擇更多的是從Apache Kafka產品的視角出發,考慮哪種方案更貼合使用者的業務需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"選擇硬盤介質"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"工業界的硬盤市場基本是機械硬盤和固態硬盤(SSD)的天下。在超大規模的存儲容量場景下,SSD的價格依舊是它的硬傷。對於Kafka這種高IO的應用,固態硬盤的損壞率和使用壽命是一個很大的問題。所以,機械硬盤以其便宜的價格及大容量成爲了不二之選。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機械硬盤需要解決的兩個問題是:如何提高硬盤IO能力;在硬盤損壞成爲一個常態的情況下,又該如何保持業務系統的穩定。我們先從這兩個方面來分析下。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"提高硬盤IO能力"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般使用以下指標衡量硬盤的性能:IOPS:每秒讀\/寫次數,單位爲次(計數)。存儲設備的底層驅動類型決定了不同的 IOPS。吞吐量:每秒的讀寫數據量,單位爲MB\/s。時延:I\/O 操作的發送時間到接收確認所經過的時間,單位爲秒。Kafka程序本身通過順序讀寫、Page Cache、零拷貝等方案,從應用層面極大的利用了硬盤的性能。但是,一旦硬盤吞吐能力不足,Kafka集羣提供服務的能力將大打折扣。因爲Kafka的使用場景和運行方式,最關注的性能指標是吞吐量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在自建集羣的場景下,用獨立主機掛載單塊硬盤的方式是最常用的方案。在單硬盤讀寫的基礎上,提高硬盤吞吐能力的方案主要有如下幾種:單硬盤讀寫Kafka的多目錄讀寫RAID硬盤陣列方案Logical Volume Manage(LVM)條帶化方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"方案一: 單硬盤讀寫"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單機單硬盤部署是在自建集羣當中最常見的一種方案,也是在實踐當中用的最多的一種方案。如下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e6\/e6d0fce0bb62d7ddcaaf3812bbe092ed.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單機單硬盤集羣部署方案上圖是一個由三臺節點構成的Kafka集羣,集羣的每個節點掛載一塊SATA\/SSD的數據盤,用來存放Kafka的數據。該方案的特點就是思路簡單,搭建快捷。很適合自建的小規模集羣。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方案,當發現硬盤能力不足時,最直接有效的解決思路就是垂直擴容。即提高單盤的IO能力,比如將5400 轉\/秒的硬盤換爲7400 轉\/秒的,或者換爲10000 轉\/秒、甚至10000 轉\/秒以上的更高轉速。當機械硬盤的能力不足時,直接換爲大容量的SSD。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這種方案的缺點也比較明顯。首先,單塊的SSD吞吐量是有上限的,當Kafka流量增大,SSD也有承受不了的一天,單塊硬盤的吞吐將限制Kafka集羣的吞吐能力。另外SSD的價格大約是SATA的好幾倍。以當前騰訊雲上的硬盤價格爲例,SSD價格是高性能雲硬盤的3倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,當集羣規模持續擴大時,該方案並不是一個長久的選擇。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"方案二: Kafka的多目錄讀寫"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當集羣的壓力持續增大,單塊的硬盤滿足不了需求或者基於成本考慮,不想使用SSD了,該怎麼辦呢? Apache Kafka官方在0.8開始,提供了多目錄讀寫的能力。將log.dir屬性變爲log.dirs。官方解釋如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A comma-separated list of one or more directories in which Kafka data is stored. Each new partition that is created will be placed in the directory which currently has the fewest partitions"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單說,就是支持配置多個日誌文件夾,文件夾之間用逗號隔開,這樣做在實際項目中有非常大的好處,即支持多硬盤的讀寫能力。在server.properties配置文件添加如上配置:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"log.dirs=\/data,\/data1,\/data2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"添加了這個配置後,有什麼效果呢,看下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/53\/5317453d8888b111a5d3a767d11ffbdf.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖,假設有一個有9個分區1個副本的topicA。這9個分區會平均分佈在節點1,2,3上。假設節點1上分配了0、1、2三個分區。那麼Kafka會將這三個分區的數據目錄分別放在\/data、\/data1、\/data2三個目錄下。partition的數據目錄分別爲: \/data\/topicA-0、\/data1\/topicA-1、\/data2\/topicA-2。至於爲什麼會均勻分佈就不詳細展開了,有興趣同學可以去參閱相關資料。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此時當Kafka往這三個分區寫入數據的時候 ,就可以利用到三塊硬盤的IO能力。其實這是一個很好用的方案。但是多目錄讀寫方案也有一些情況是不能處理的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們來看下這種情況:因爲業務特點不同,有的業務會出現數據冷熱明顯的問題。可能出現有的partition的量很大,有的partition量很小。這就有可能出現單個Partion的量達到了硬盤的IO瓶頸,此時就又回到了單硬盤的方案遇到的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以當業務有這種場景的時候,多目錄的方案可能會有一些侷限。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"方案三: RAID磁盤陣列"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RAID就是磁盤陣列,由很多塊獨立的硬盤組合成一個容量巨大的硬盤組,利用多個硬盤產生加成效果提升整個硬盤IO能力。典型代表有RAID0、RAID1、RAID5、RAID10等。由於篇幅原因,就不展開解釋RAID的相關知識。有興趣的可以查閱相關資料。簡單比對一下以上方案的優劣勢:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/19\/190d244189c20e9197063814f7e7d462.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"RAID10方案"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用RAID方案的原因就是爲了解決單硬盤IO瓶頸的問題。經過以上方案比對,我們以RAID10爲例,展開說明RAID做了什麼,以及應該選擇哪種RAID方案。先看下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5b\/5b0623a0c45985ef6c01e9386995a4ab.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中單機用四塊盤組成RAID10,即先用兩塊盤組成一塊RAID1的虛擬盤,再用兩塊虛擬盤構建成一塊虛擬的RAID0盤,並掛載到\/data目錄下。理論上來看,假設單盤吞吐量是100MB,那四塊盤組成的RAID10陣列的吞吐就是200MB,且每份數據在底層存儲,是雙副本的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏RAID1的作用是,即使底層有一塊數據盤損壞,系統會自動讀寫另外一塊備份盤的數據。此時系統還是能夠正常使用的。RAID0的作用是並行IO,提高整體的IO能力。從理論上講,三塊硬盤的並行操作使同一時間內硬盤讀寫速度提升了3倍。但由於總線帶寬等多種因素的影響,實際的提升速率肯定會低於理論值。但是,大量數據並行傳輸與串行傳輸比較,提速效果顯著顯然毋庸置疑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏需要注意一點的是,單純從IO能力,從多塊盤的IO能力相加的角度來看,RAID0並不比多目錄方案的IO能力強。RAID0的優點在於,單塊虛擬盤的IO能力比單塊物理硬盤的強,單塊虛擬盤容量比單塊物理硬盤容量大。這兩個特性是可以解決數據冷熱明顯、數據傾斜的情況的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"直接用RAID0可以嗎?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"細心的同學可能會發現這麼一個問題?假設我們有1個分區2個副本的topicB。兩個副本分佈在節點1和節點2。此時當生產一條數據messageA時,messageA會在集羣裏面存儲4份。即節點1和節點2各存兩份數據(RAID1雙副本)。如下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a0\/a097c69d6f61381e2f4dee7b92b99b73.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"既然節點1和節點2都有Kafka的Replication副本了。爲什麼要在硬盤多冗餘一份副本呢?這不是很浪費。是的。從數學的角度來看,是這樣子沒錯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們來假設一個場景,假設我們直接用RAID0。此時一個數據盤壞掉了,Kafka集羣會自動把影響到的Partition分區遷移到其他可用的機器上。如果剛好是leader,則進行leader切換。看起來好像沒問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是假設剛好影響的Parition量比較大,將其切換到其他機器,會導致其他機器的壓力增大,則很可能會影響到其他Partition的使用。這裏是一個影響集羣安全的重大隱患。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在一個成千上萬臺的大集羣內,硬盤損壞是一件常事。這樣就會造成分區遷移、leader切換的過程變得相對頻繁。但這點看起來不是特別大的問題,因爲數據可以正常訪問,也不會丟失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是如果客戶對leader切換比較敏感,就會很快的感知到服務端的波動。作爲服務提供商,還是希望給用戶提供穩定的服務。如果發生上述情況,用戶可能會覺得服務不夠穩定,以至於影響廠商口碑。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於工業發展,機械硬盤的價格持續下降,大容量的機械硬盤價格其實是極低的。所以在思考成本和穩定性的平衡中,可以考慮多付出成本,保證服務的穩定。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果是RAID10,壞了一塊硬盤,此時系統還能正常運行。因爲硬盤都有損壞告警,假設駐場更換的週期是24小時,則只需保證在24小時內,同一塊RAID1的另一塊硬盤不損壞,即可保證系統正常穩定無波動的運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面分享個小技巧,先來看下面這張圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5b\/5b269beefa52bc43abee84c12ec4bb3d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單機不同批次硬盤部署示意圖圖中的硬盤爲什麼是不同顏色的呢?其實在做RAID1的時候,兩塊硬盤最好不要是同一個批次的。因爲同一個批次硬盤具備相同的製作流程,考慮到可能會具備相同的磁盤壽命等因素,在一定程度上可能會增大兩塊硬盤同時損壞的概率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到這裏,似乎挺完美的了。我們再加一個因子:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設開始規劃的單機容量是8TB,業務發展需要單機容量變爲10TB。那此時應該怎麼辦呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲Raid0是不能動態擴展的。此時怎麼辦呢,貌似只有更換整個硬盤了。下面來看一下LVM方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"方案四: LVM邏輯卷條帶化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LVM邏輯卷的條帶化原理和RAID1很像。都是條帶化的進行數據讀寫。都有並行讀寫的能力。在實測過程中,兩種方案的並行讀寫性能是差不多的。LVM相對於RAID10的好處在於,它提供了動態擴容硬盤的能力。LVM條帶化的擴容是依賴以lvmextend命令實現的。擴容有一個條件:條帶化的lvm擴容需要每個硬盤擴容大小一樣的容量。如果每個硬盤容量不一樣,條帶化的lvextend會失敗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子:例如當前有三塊盤容量1T,剩餘空間200GB的盤,加入兩塊空間1T的盤,此時因爲做了條帶化,因爲擴容需要從每塊盤平均分出相等的空間來擴容,所以每塊盤最多隻能分出200GB的空間,五塊盤共1000GB空間來擴容。此時,新加的盤就有800GB的空間沒法利用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上所述,在用物理機掛載物理盤部署Kafka集羣的的場景下,LVM的動態擴容能力看起來沒有實際用處。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們換個場景,隨着雲服務時代到來。當我們在雲上購買虛擬機,購買雲上網盤來搭建集羣的時候,LVM的作用就凸顯出來了,請看下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/28\/287db66a8a1fe2f321a5ca2d451728ba.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖所示,每臺CVM上掛三塊雲硬盤,三塊雲硬盤通過LVM條帶化組建成一塊邏輯硬盤,掛載到\/data目錄下。雲硬盤的特點是底層多副本,可在線擴容。這兩個特點和LVM很搭。爲什麼呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲硬盤底層自帶多副本,就不需要再做Raid1避免數據損壞、系統波動了。在線擴容,是指可以動態的對單塊硬盤進行擴容。此時lvm的動態擴容能力就凸顯出來了。如下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/65\/658774898e435149168a9fd23a07de29.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子:假設集羣一開始的時候規劃每臺機器只需要600GB的容量。此時,我們可以每臺機器購買6塊100GB的雲硬盤,構建LVM條帶化。掛載到\/data目錄下,這樣即可以利用條帶化的並行寫入能力,也可以得到所需的600GB容量。當業務發展一段時間,忽然發現,600GB不夠用了,每臺Broker需要 1.2TB。此時我們通過控制檯在線擴容硬盤容量,將每臺broker的雲硬盤擴容到1.2TB,然後通過lvextend命令擴容\/data的容量,即可。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"騰訊雲上,單塊雲硬盤允許的最大容量爲16TB,單機允許最多掛載20塊硬盤。所以,按照上述思路,一臺機器理論的容量是16*20=320TB。這對於單臺機器來講,基本是一個超大的容量了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然,上述畢竟是理論值。實踐肯定達不到這麼理想的狀態。但是提供了一個思路。另外在提升條帶化IO能力、估算掛盤的數量,都與LVM條帶化時設置的IO SIZE有關。IO SIZE取決於業務單條數據的長度和數據量。這個值並沒有一個推薦的值,需要根據用戶自身業務特點去評估。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上所述,如果是部署在雲上的Kafka,LVM是一種比Raid10更適合的方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文分析了常見的幾種方案的使用場景和優劣。總結髮現其實並沒有一套完美的通用方案。而磁盤陣列也不是一個大力推薦的方案,自建集羣的業務場景簡單,單硬盤方案和多目錄讀寫方案基本可以解決很多問題。而在業務場景複雜、規模大的物理機集羣,RAID0和RAID10都是可以考慮的方案。如果是部署在雲上CVM主機的集羣,LVM方案是一個較好的選項。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"綜上所述,運營一個合適的Apache Kafka集羣,需要根據業務特點、成本、數據可靠性、現有資源、所處環境等因素來考慮和權衡合適的硬盤方案。"}]},{"type":"horizontalrule"},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頭圖:Unsplash"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者:許文強"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文:https:\/\/mp.weixin.qq.com\/s\/m1BzUynhDidXH3iDHt9wBg"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文:揭祕Kafka的硬盤設計方案,快速完成PB級數據擴容需求!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"來源:騰訊雲中間件 - 微信公衆號 [ID:gh_6ea1bc2dd5fd]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"轉載:著作權歸作者所有。商業轉載請聯繫作者獲得授權,非商業轉載請註明出處。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章