如何设计高效的HBase数据模型

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从学习和使用HBase的经历中,整理出对使用者而言,需要了解的HBase基础知识,Mark一下。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.背景知识","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.1 数据模型","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"学习HBase/BigTable最困难的部分,是理解它的数据模型,换句话说它究竟是咋用的?在BigTable论文中明确说明:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"A Bigtable is a sparse, distributed, persistent multidimensional sorted map.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"论文做了进一步解释:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"推荐两篇文章,对此解释的非常清楚:","attrs":{}},{"type":"link","attrs":{"href":"https://dzone.com/articles/understanding-hbase-and-bigtab","title":"","type":null},"content":[{"type":"text","text":"understanding-hbase-and-bigtab","attrs":{}}]},{"type":"text","text":" 和 ","attrs":{}},{"type":"link","attrs":{"href":"http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf","title":"","type":null},"content":[{"type":"text","text":"Introduction to HBase Schema Design","attrs":{}}]},{"type":"text","text":",下面是一个更形象的示例:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"{\n // ...\n \"aaaaa\" : { // Row Key\n \"A\" : { // Column Families\n \"foo\" : { // Column Qualifiers\n 15 : \"y\", // 15: Timestamp / Version number, \"y\": Cell Value\n 4 : \"m\"\n },\n \"bar\" : {\n 15 : \"d\",\n }\n },\n \"B\" : {\n \"\" : {\n 6 : \"w\"\n 3 : \"o\"\n 1 : \"w\"\n }\n }\n },\n // ...\n}\n\n拆解说明:\n map : 存储KeyValue数据。\n persistent : 数据以文件的形式在HDFS(或S3)/GFS存储。\n distributed : HBase和BigTable都建立在分布式文件系统之上,计算/存储分离架构;由Master和一组Storage Server组成,数据分区存储,典型的分布式系统。\n sorted : RowKey按字典序排序的。\n multidimensional: 如上面的示例看到的,这是一个嵌套Map,或者说一个多维Map。\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"真实的存储是平面文件结构,存储模型是类似下面的结构:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"aaaaa, A:foo,15 ---> y\naaaaa, A:foo, 4 ---> m\naaaaa, A:bar, 15 ---> d\naaaaa, B:, 6 ---> w\naaaaa, B:, 3 ---> w\naaaaa, B:, 1 ---> w\n\nKey的逻辑结构是: {RowKey:ColumnFamily:Qualifier:TimeStamp},Key按字母升序排序,TimeStamp按降序排序。\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2 读/写/压缩","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase/BigTable存储使用LSM方式实现,这个网上文章很多,这里只简单介绍需要了解的主要流程:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 写流程:先写Log文件(WAL),然后写MemTable,当MemTable写满后,把数据落地到文件当中。\n(2) 读流程:由于MemStore、HFiles中都包含数据,读取操作其实类似一个多路归并排序操作,最近的数据在MemStore\n中,次新数据在近期生成的HFile中,老数据在更早生成的HFile中,按照这个顺序遍历要查找的key。\n(3) 压缩流程:当文件越来越多时,就需要进行压缩,回收无效效数据(减少存储占用),减少文件数量(提高读效率)。\n压缩操作的思路是合并MemStore和HFiles文件,删除无效的key值(被新版本覆盖或删除),生成新的文件,回收旧文件。\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看出,HBase/BigTable非常适合有大量写操作的应用,顺序读性能也不错,适合批量数据处理(例如MapReduce)。BigTable论文提到的其中两个使用场景都符合这个特征:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 存储抓取回来的网页,使用MapReduce进行处理;\n(2) Google Analytics项目,收集用户点击数据,使用MapReduce定期分析,生成网站访问报告。\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.3 存储分区","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase支持数据自动分区,分区方式: 水平分割+垂直分割,","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"水平分割:KeyValue数据依据RowKey划分到不同的Region,每一个Region被分配给一个Region Server,具备良好的扩展性。当一个Region过大时,会被分割成两个Region。当一个Region Server负载过重时,把其中的部分Region迁移到其他Region Server。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"垂直分割:在一个Region内部,每个ColumnFamily的数据是单独存储的,这使他们有更好的访问局部性。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"参考","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"http://hbase.apache.org/book.html#trouble.namenode.hbase.objects","attrs":{}}],"attrs":{}},{"type":"text","text":",HBase在HDFS存储数据的目录结构如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/hbase\n /data\n / (Namespaces in the cluster)\n / (Tables in the cluster)\n / (Regions for the table)\n / (ColumnFamilies for the Region for the table)\n / (StoreFiles for the ColumnFamily for the Regions for the table)\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.5 KeyValue存储格式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The KeyValue format inside a byte array is:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keylength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"valuelength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"key","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"value","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在新版本中,支持为Cell附加tags,KeyValue的格式为: ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"{keylength,valuelength, key, value, tags}","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Key is further decomposed as:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"rowlength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"row (i.e., the rowkey)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnfamilylength","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnfamily","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"columnqualifier","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"timestamp","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"keytype (e.g., Put, Delete, DeleteColumn, DeleteFamily)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"行键和列族/列属性,会编码到每一个行当中,所以行键、列族和列属性应该尽可能短一些。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.4 访问模型和事务","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase主要使用Put/Delete/Scan这几个接口访问数据,支持批量读写:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Version/TimeStamp:HBase/BigTable支持一个Key的多个Value版本(依靠TimeStamp区分),查询的时候可以访问最新版本,也可以访问指定时间区间的所有版本。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Column:每一个Column Family都可以有任意多个Column,所以必须指定要访问的Column Qualifiers,才能确认要访问的数据。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase可以保证对同一个Row的操作是原子操作,这是因为:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Row只属于一个Region,只能通过一个Region Server进行写操作,不存在写冲突;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过WAL,可以保证对一个Row的多个操作(可以是多个ColumnFamily)要么都成功,要么都失败。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一个Region Server的多个Region共享一组WAL日志文件,一个Batch操作做为一个Record写入WAL,所以可以保证要么 都成功,要么都失败。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}],"attrs":{}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"The HDFS directory structure of HBase WAL is..\n /hbase\n /WALs\n /\t\t(RegionServers)\n / (WAL files for the RegionServer)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.5 TTL","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以对表和列族设置TTL(秒),HBase会自动删除过期的Rows;在新近的版本中,HBase支持对每个Cell单独设置TTL(毫秒)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 标和列族的TTL保存在Schema中,属于表/列族级别的配置。\n(2) Cell的TTL是作为tag,编码保存在Cell里面的,因此每个Cell都可以单独设置。\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.6 版本数量","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase支持多版本,一个RowKey可以有多个版本的Value,使用时间戳区分版本。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Maximum Number of Versions","attrs":{}},{"type":"text","text":"。 一般不建议设置成很大的值保(例如几百或更多),除非这些旧版本的数据非常有价值,因为这会使StoreFile大小剧增。缺省值是1。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Minimum Number of Versions","attrs":{}},{"type":"text","text":"。这个值和ttl参数一起使用,实现类似“保留最后T分钟的数据,最多N个版本,但至少保留M个版本”的需求。缺省值是0,也就是不启用这个feature。","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 模式设计指南","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HBase本质上是一个kv数据库,不支持RDBMS中常见的索引、事务等特性,它的设计目标是应对高吞吐量、海量数据扩展性的需求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"应用程序应该根据业务需求,来设计高层数据模式。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.1 梳理数据访问模式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"访问模式直接决定我们的设计方案,仔细梳理数据访问模式,列出主要访问场景,在后面设计中时时回看,我们的设计能否满足这些访问场景?还有没有更高的设计方案?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2 设计行键","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RowKey是HBase表设计中最重要的事情,它决定了应用与HBase交互的方式,直接影响数据访问的性能。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 行键,列族,列属性尽可能短,以节省存储空间\n\n(2) 利用行键排序的特性,使用相同key前缀聚合经常一起访问的数据,提高读效率\n例如,对于网页URL,可以对域名做字符串反转,得到com.demo.xxx/page1.html,使一个域名下的页面在存储上相邻。\n\n(3) 充分发挥分区的并发性能,避免行键设计导致出现热门分区\n常见的方法: \n - salt:通过增加不同的前缀,使Key均匀分布在各个region。\n - hash:对key值做hash,这可以使key的分布非常均匀,并且可以得到固定长度(例如md5),缺点是不方便做区间遍历。\n\n(4) 尽量使用单行设计,避免多行事务\n考虑如何在单个API调用中完成访问而不是多个API调用,HBase没有跨行事务,避免在客户端代码中构建这种逻辑。\n\n(5) 对访问最近数据的场景,使用逆序时间戳\n数据库的一个场景问题是快速查找数据的最新版本,解决这个问题的一个方法是利用Hadoop的key值有序这个特征,把时间戳逆序后append到key值尾部。\nkey格式:\n [key][reverse_timestamp]`,其中`reverse_timestamp = Long.MAX_VALUE - timestamp\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3 设计列族和列属性","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 一个典型的模式每个表有1到3个列族。\n(2) 把访问模式相似的列放到一个列族中。\n(3) 使列族名称尽可能短,最好是一个字符。\n(4) 使用更短的列属性(例如\"via\", 而不是\"myVeryImportantAttribute\"这样冗长的列属性)。\n(5) 列属性也可以用来存储数据,就像Cell一样。\n(6) 使cell小于10MB;在使用mob时,不要超过50MB;否则,考虑把cell存到HDFS中,在hbase中仅保存一个指针。\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.4 规划分区","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"(1) 使region大小保持在10~50GB\n(2) 对于包含1或2个列族的表来说,大约50-100个区域是一个很好的数字。\n(3) 对于RowKey为:`设备ID+时间戳`格式的时序数据,可以允许更多的分区,因为历史数据分区通常是不活动的。\n(4) 预分区\n Hadoop支持自动分区和负载均衡,但如果你非常了解你的数据,对table预先(人工)分区通常是一种最佳实践。\n 但要小心测试你的数据,务必把key均匀分布在各个region中,避免负载倾斜。\n 注意:使用Bytes.split (which is the split strategy used when creating regions in \n Admin.createTable(byte[] startKey, byte[] endKey,numRegions)分区时,要注意它是按照\n Byte范围来分割的(包括了不可见字符),可能会导致有些Region根本不会有key存进来。\n","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.案例解析-OpenTSDB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OpenTSDB使用HBase存储数据,它的两个核心设计思想:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"(1) 使用一行存储一个时间区间的所有数据","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OpenTSDB把1小时内的事件作为一行存储,所有事件存储在列族t中,Key中的时间戳精确到小时,列名称为以本小时起始的秒数(1小时3600秒,最多也就3600个事件),或毫秒数(1小时可以容纳的事件数就多了)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在当前时间段过去以后,可以把1个小时内的数据压缩到一个列中存储(对照KeyValue结构,这能节省太多的冗余数据)。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"(2) 对字符串编码,减小key的长度","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"TSDB把字符串映射成数值,来减小Key的长度。指标名称,标签Key,标签Value,都映射成一个变长整数(使用映射表:tsdb-uid 存储文本和整数的映射关系),这就使得Key值非常短。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"RowKey结构:``。\n\n注:\n (1) 时间戳:指明时间点\n (2) 指标名称: 这个数据的抽象概括,指明监控内容,如温度,湿气,大小\n (3) 标签: 对象,指明监控对象 ,如某个城市,某个CPU,某块区域\n (4) 值: 指标数值\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f3ef8ea13aafb652a97a1150c1688e4f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图1 OpenTSDB tsdb表结构","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"参考:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) ","attrs":{}},{"type":"link","attrs":{"href":"https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf","title":"","type":null},"content":[{"type":"text","text":"Bigtable: A Distributed Storage System for Structured Data","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) ","attrs":{}},{"type":"link","attrs":{"href":"http://hbase.apache.org/book.html","title":"","type":null},"content":[{"type":"text","text":"Apache HBase Reference Guide","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) ","attrs":{}},{"type":"link","attrs":{"href":"https://dzone.com/articles/understanding-hbase-and-bigtab","title":"","type":null},"content":[{"type":"text","text":"understanding hbase andbigtab","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4) ","attrs":{}},{"type":"link","attrs":{"href":"http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf","title":"","type":null},"content":[{"type":"text","text":"Introduction to HBase Schema Design","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(5) ","attrs":{}},{"type":"link","attrs":{"href":"http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html","title":"","type":null},"content":[{"type":"text","text":"OpenTSDB HBase Schema","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(6) Hbase最佳实践系列","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/155473761979842560","title":"","type":null},"content":[{"type":"text","text":"HBase最佳实践之集群规划","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/168426776398073856","title":"","type":null},"content":[{"type":"text","text":"HBase最佳实践之列族设计优化","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170967745293946880","title":"","type":null},"content":[{"type":"text","text":"HBase最佳实践之读性能优化策略","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170966951870259200","title":"","type":null},"content":[{"type":"text","text":"HBase最佳实践之写性能优化策略","attrs":{}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://sq.163yun.com/blog/article/170968702513811456","title":"","type":null},"content":[{"type":"text","text":"HBase最佳实践-管好你的操作系统","attrs":{}}]}]}]}],"attrs":{}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章