How to Design an Efficient HBase Data Model

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These notes collect the HBase fundamentals that a user needs to understand, drawn from my experience learning and using HBase.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Background","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.1 Data Model","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The hardest part of learning HBase/BigTable is understanding its data model: in other words, how is it actually meant to be used? The BigTable paper states it explicitly:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"A Bigtable is a sparse, distributed, persistent multidimensional sorted map.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The paper then elaborates:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Two articles explain this very clearly: ","attrs":{}},{"type":"link","attrs":{"href":"https://dzone.com/articles/understanding-hbase-and-bigtab","title":"","type":null},"content":[{"type":"text","text":"understanding-hbase-and-bigtab","attrs":{}}]},{"type":"text","text":" and 
","attrs":{}},{"type":"link","attrs":{"href":"http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf","title":"","type":null},"content":[{"type":"text","text":"Introduction to HBase Schema Design","attrs":{}}]},{"type":"text","text":". Below is a more concrete example:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"{\n  // ...\n  \"aaaaa\" : {                 // Row Key\n    \"A\" : {                   // Column Families\n      \"foo\" : {               // Column Qualifiers\n        15 : \"y\",             // 15: Timestamp / Version number, \"y\": Cell Value\n         4 : \"m\"\n      },\n      \"bar\" : {\n        15 : \"d\"\n      }\n    },\n    \"B\" : {\n      \"\" : {\n        6 : \"w\",\n        3 : \"o\",\n        1 : \"w\"\n      }\n    }\n  },\n  // ...\n}\n\nBreaking the definition down:\n  map             : stores key/value data.\n  sparse          : only cells that actually hold a value are stored; a column that is absent from a row takes no space.\n  persistent      : data is stored as files, on HDFS (or S3) for HBase and on GFS for BigTable.\n  distributed     : both HBase and BigTable are built on a distributed file system, with compute and storage separated; a cluster consists of a Master and a set of storage servers, and the data is stored in partitions.\n  sorted          : row keys are kept in lexicographic order.\n  multidimensional: as the example above shows, this is a nested map, that is, a map with several dimensions.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Physically the data is stored as flat files, laid out roughly as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"aaaaa, A:bar, 15 ---> d\naaaaa, A:foo, 15 ---> y\naaaaa, A:foo,  4 ---> m\naaaaa, B:,     6 ---> w\naaaaa, B:,     3 ---> o\naaaaa, B:,     1 ---> w\n\nThe logical structure of a key is {RowKey:ColumnFamily:Qualifier:TimeStamp}. Keys are sorted in ascending lexicographic order, and timestamps in descending order.\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2 
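Reads, Writes, and Compaction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase/BigTable storage is implemented as an LSM tree. Much has been written about this elsewhere, so only the main flows you need to know are outlined here:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) Write path: a write is first appended to the log file (WAL) and then applied to the MemStore (the MemTable in BigTable terms); when the MemStore fills up, its contents are flushed to a file (HFile).\n(2) Read path: because both the MemStore and the HFiles hold data, a read works like a multi-way merge. The newest data is in the MemStore,\nslightly older data is in recently written HFiles, and the oldest data is in earlier HFiles; the key is looked up in that order.\n(3) Compaction: as files accumulate they must be compacted, both to reclaim invalid data (reducing storage use) and to reduce the number of files (improving read efficiency).\nCompaction merges the MemStore and HFiles, drops invalid entries (those overwritten by a newer version or deleted), writes new files, and reclaims the old ones.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This makes HBase/BigTable a very good fit for write-heavy applications. Sequential read performance is also good, which suits batch processing (for example MapReduce). Two of the use cases described in the BigTable paper match this profile:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) Storing crawled web pages and processing them with MapReduce;\n(2) Google Analytics, which collects user click data and analyzes it periodically with MapReduce to produce site traffic reports.\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.3 Storage Partitioning","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase partitions data automatically, combining horizontal and vertical partitioning:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Horizontal partitioning: KeyValue data is divided into Regions by RowKey, and each Region is assigned to one Region Server, which gives good scalability. When a Region grows too large, it is split into two Regions. When a Region Server is overloaded, some of its Regions are moved to another Region 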
Server.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Vertical partitioning: within a Region, the data of each ColumnFamily is stored separately, which gives each family better access locality.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"http://hbase.apache.org/book.html#trouble.namenode.hbase.objects","attrs":{}}],"attrs":{}},{"type":"text","text":", HBase lays out its data on HDFS in the following directory structure:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/hbase\n    /data\n        /<Namespace>                    (Namespaces in the cluster)\n            /<Table>                    (Tables in the cluster)\n                /<Region>               (Regions for the table)\n                    /<ColumnFamily>     (ColumnFamilies for the Region for the table)\n                        /<StoreFile>    (StoreFiles for the ColumnFamily for the Regions for the table)\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.5 KeyValue Storage Format","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The KeyValue format inside a byte array 
is:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keylength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"valuelength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"key","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"value","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Newer versions support attaching tags to a Cell, in which case the KeyValue format is: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"{keylength, valuelength, key, value, tags}","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Key is further decomposed 
as:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"rowlength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"row (i.e., the rowkey)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnfamilylength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnfamily","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnqualifier","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"timestamp","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keytype (e.g., Put, Delete, DeleteColumn, DeleteFamily)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The row key, column family, and column qualifier are encoded into every single cell (KeyValue), so all three should be kept as short as possible.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.4 
Access Model and Transactions","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase data is accessed mainly through the Get, Put, Delete, and Scan interfaces, and batched reads and writes are supported:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Version/TimeStamp: HBase/BigTable supports multiple versions of the value for one key (distinguished by timestamp). A query can read the latest version or all versions within a given time range.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Column: each Column Family can hold any number of columns, so you must specify the Column Qualifier to identify the data you want to access.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase guarantees that operations on a single Row are atomic, because:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Row belongs to exactly one Region, so all writes to it go through a single Region Server and there are no conflicting writers;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The WAL ensures that multiple operations on one Row (possibly spanning several ColumnFamilies) either all succeed or all fail.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Regions of a Region Server share one set of WAL files, and a batch operation is written to the WAL as a single record, so its operations are guaranteed to 
all succeed or all fail.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}],"attrs":{}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"The HDFS directory structure of HBase WAL is:\n/hbase\n    /WALs\n        /<RegionServer>    (RegionServers)\n            /<WAL>         (WAL files for the RegionServer)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.5 TTL","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A TTL (in seconds) can be set on tables and column families, and HBase automatically deletes expired rows. In recent versions HBase also supports a TTL (in milliseconds) on each individual Cell.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) Table and column family TTLs are stored in the schema; they are table-level or family-level settings.\n(2) A Cell's TTL is encoded as a tag stored inside the Cell itself, so it can be set separately for each Cell.\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.6 Number of Versions","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase is multi-versioned: a cell (row key plus column) can hold several versions of its value, distinguished by timestamp.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Maximum Number of Versions","attrs":{}},{"type":"text","text":". 
Setting this to a very large value (for example several hundred or more) is generally not recommended unless the old versions are very valuable, because it greatly increases StoreFile size. The default is 1.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Minimum Number of Versions","attrs":{}},{"type":"text","text":". This value is used together with the TTL parameter to implement requirements such as \"keep the last T minutes of data, at most N versions, but always at least M versions\". The default is 0, which disables the feature.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Schema Design Guidelines","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase is essentially a key-value store. It does not offer features common in an RDBMS such as secondary indexes and multi-row transactions; it is designed for high throughput and for scaling to massive data volumes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Applications should design their higher-level data schema around their business requirements.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.1 Map Out the Data Access Patterns","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Access patterns directly determine the design. Map them out carefully, list the main access scenarios, and keep returning to that list during the design: does the design serve every scenario? Is there a better design?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2 Designing the Row Key","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The RowKey is the most important decision in HBase table design. It determines how the application interacts with HBase and directly affects data access performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) Keep the row key, column family, and column qualifier as short as possible to save storage space\n\n(2) 
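Exploit row key ordering: give data that is usually accessed together a common key prefix so that it is stored together, which makes reads more efficient\nFor example, for web page URLs, reverse the order of the domain name components (xxx.demo.com/page1.html becomes com.demo.xxx/page1.html) so that the pages of one domain are stored next to each other.\n\n(3) Make full use of region parallelism and avoid row key designs that create hot regions\nCommon approaches:\n - salt: add varying prefixes so that keys are spread evenly across regions.\n - hash: hash the key, which distributes keys very evenly and yields a fixed length (for example with md5); the drawback is that range scans become inconvenient.\n\n(4) Prefer single-row designs and avoid multi-row transactions\nThink about how to complete an access in a single API call rather than several. HBase has no cross-row transactions, so avoid building that kind of logic in client code.\n\n(5) Use a reversed timestamp when the most recent data is what gets read\nA common problem in databases is finding the latest version of a value quickly. One way to solve it is to take advantage of HBase's sorted keys and append a reversed timestamp to the end of the key.\nKey format:\n [key][reverse_timestamp], where reverse_timestamp = Long.MAX_VALUE - timestamp\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3 Designing Column Families and Column Qualifiers","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) A typical schema has 1 to 3 column families per table.\n(2) Put columns with similar access patterns in the same column family.\n(3) Keep column family names as short as possible, ideally a single character.\n(4) Use short column qualifiers (for example \"via\" rather than a verbose qualifier such as \"myVeryImportantAttribute\").\n(5) Column qualifiers can themselves be used to store data, just like cell values.\n(6) Keep cells under 10MB, or under 50MB when using mob; otherwise consider storing the cell data in HDFS and keeping only a pointer to it in HBase.\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.4 Planning Regions","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) Keep region sizes between 10 and 50GB.\n(2) For a table with 1 or 2 column families, around 50-100 regions is a good number.\n(3) For time series data whose RowKey has the form `device ID + timestamp`, more regions are acceptable, because regions holding historical data are usually inactive.\n(4) Pre-splitting\n HBase supports automatic splitting and load balancing, but if you know your data very well, pre-splitting the table (manually) is usually a best practice.\n Test carefully against your data, and make sure keys are spread evenly across the regions to avoid load skew.\n Note: when splitting with Bytes.split (which is the split strategy used when creating regions in\n Admin.createTable(byte[] startKey, byte[] endKey, numRegions)), be aware that it splits over the\n byte range (which includes non-printable characters), so some Regions may never receive any keys.\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Case Study: OpenTSDB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OpenTSDB stores its data in HBase. Its design rests on two core ideas:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"(1) 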
Store all data for one time interval in a single row","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OpenTSDB stores the events of one hour as a single row. All events go into column family t, the timestamp in the key is rounded down to the hour, and each column name is the offset from the start of that hour in seconds (an hour has 3600 seconds, so at most 3600 events) or in milliseconds (which allows far more events per hour).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the hour has passed, the data for the whole hour can be compacted into a single column (given the KeyValue structure, this removes a great deal of redundant data).","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"(2) Encode strings to shorten the key","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OpenTSDB maps strings to numbers to shorten the key. Metric names, tag keys, and tag values are each mapped to a short integer UID (3 bytes wide by default); the mapping table tsdb-uid stores the mapping between text and integers. This keeps the key very short.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"RowKey structure: ``.\n\nNotes:\n (1) Timestamp: identifies the point in time\n (2) Metric name: a summary of what the data represents, that is, what is being monitored, such as temperature, humidity, or size\n (3) Tags: identify the monitored object, such as a city, a CPU, or a region\n (4) Value: the value of the metric\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3ef8ea13aafb652a97a1150c1688e4f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1: Structure of the OpenTSDB tsdb table","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"References:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 
","attrs":{}},{"type":"link","attrs":{"href":"https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf","title":"","type":null},"content":[{"type":"text","text":"Bigtable: A Distributed Storage System for Structured Data","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) ","attrs":{}},{"type":"link","attrs":{"href":"http://hbase.apache.org/book.html","title":"","type":null},"content":[{"type":"text","text":"Apache HBase Reference Guide","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) ","attrs":{}},{"type":"link","attrs":{"href":"https://dzone.com/articles/understanding-hbase-and-bigtab","title":"","type":null},"content":[{"type":"text","text":"Understanding HBase and BigTable","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4) ","attrs":{}},{"type":"link","attrs":{"href":"http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf","title":"","type":null},"content":[{"type":"text","text":"Introduction to HBase Schema Design","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(5) ","attrs":{}},{"type":"link","attrs":{"href":"http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html","title":"","type":null},"content":[{"type":"text","text":"OpenTSDB HBase 
Schema","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(6) HBase best practices series","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/155473761979842560","title":"","type":null},"content":[{"type":"text","text":"HBase最佳實踐之集羣規劃","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/168426776398073856","title":"","type":null},"content":[{"type":"text","text":"HBase最佳實踐之列族設計優化","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170967745293946880","title":"","type":null},"content":[{"type":"text","text":"HBase最佳實踐之讀性能優化策略","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170966951870259200","title":"","type":null},"content":[{"type":"text","text":"HBase最佳實踐之寫性能優化策略","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170968702513811456","title":"","type":null},"content":[{"type":"text","text":"HBase最佳實踐-管好你的操作系統","attrs":{}}]}]
}]}],"attrs":{}}]}
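The row key techniques from section 2.2 (reversed domain names, salting, and reversed timestamps) can be sketched in a few lines. This is a minimal illustration that only builds the key bytes and does not call any HBase client API. The function names, the one-byte md5-derived salt, and the 8-byte big-endian timestamp encoding are assumptions made for the example, not something the article prescribes.

```python
import hashlib
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def reverse_domain(url: str) -> str:
    """Reverse the host labels so pages of one domain sort next to each other."""
    host, sep, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + sep + path


def salted_key(key: bytes, buckets: int) -> bytes:
    """Prefix the key with a one-byte bucket derived from its md5 hash (assumed scheme)."""
    bucket = hashlib.md5(key).digest()[0] % buckets
    return bytes([bucket]) + key


def newest_first_key(key: bytes, timestamp_ms: int) -> bytes:
    """Append Long.MAX_VALUE - timestamp as 8 big-endian bytes, so newer rows sort first."""
    return key + struct.pack(">q", LONG_MAX - timestamp_ms)


if __name__ == "__main__":
    print(reverse_domain("xxx.demo.com/page1.html"))
    rows = [newest_first_key(b"dev1", ts) for ts in (1000, 3000, 2000)]
    # Byte-wise sorting, as HBase does for row keys, puts timestamp 3000 first.
    print([LONG_MAX - struct.unpack(">q", r[-8:])[0] for r in sorted(rows)])
    print(salted_key(b"dev1", 16)[0])
```

Because the salt is derived from the key itself, a point lookup can recompute the prefix, but a range scan has to be issued once per bucket. That is the same trade-off the article notes for hashed keys.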