Storm + HBase Development

Original

2020-02-20 21:19

1. Maven dependencies

    <!-- storm-hbase: needed to write data from Storm into HBase -->
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-hbase</artifactId>
      <version>1.1.1</version>
      <type>jar</type>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.3</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.3</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-common</artifactId>
      <version>1.2.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.0</version>
    </dependency>

3. RowKey design

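1) Length: a RowKey is a binary byte stream and can be any string. HBase caps it at 32,767 bytes (Short.MAX_VALUE, HConstants.MAX_ROW_LENGTH; it is often quoted as 64 KB), but in practice it is usually 10 to 100 bytes. It is stored as a byte[] array and is usually designed to be fixed-length. Shorter is better; the recommendation is no more than 16 bytes, for three reasons. First, HFile, the persistent storage file, stores data as KeyValues, and every KeyValue carries the full RowKey: with a 100-byte RowKey, 10 million cells spend 100 × 10,000,000 = 1 billion bytes, close to 1 GB, on RowKeys alone, which badly hurts HFile storage efficiency. Second, the MemStore caches part of the data in memory; long RowKeys lower the effective use of that memory, so less data can be cached and lookups slow down. Third, most operating systems today are 64-bit and align memory on 8 bytes; keeping the RowKey at 16 bytes, a multiple of 8, makes the best use of this.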
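As a rough illustration of a short, fixed-length key, the sketch below packs two 8-byte `long` values into a 16-byte row key with `java.nio.ByteBuffer`. The class name `FixedLengthRowKey` and the choice of a user ID plus a millisecond timestamp as the two fields are assumptions for illustration, not part of the original design.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of HBase): a fixed-length 16-byte row key built from two longs
final class FixedLengthRowKey {
    static final int LENGTH = 16; // 2 x 8-byte longs, a multiple of 8

    // Writes userId then timestamp, both big-endian (ByteBuffer's default byte order)
    static byte[] of(long userId, long timestampMillis) {
        return ByteBuffer.allocate(LENGTH).putLong(userId).putLong(timestampMillis).array();
    }

    // Reads the first 8 bytes back as the user ID
    static long userId(byte[] key) {
        return ByteBuffer.wrap(key).getLong(0);
    }

    // Reads bytes 8..15 back as the timestamp
    static long timestamp(byte[] key) {
        return ByteBuffer.wrap(key).getLong(8);
    }
}
```

Because both fields are big-endian, the byte order HBase sorts by matches numeric order as long as the values are non-negative. The equivalent string key `user_12345_1582204740000` would already take 24 bytes.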

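2) Hashing (salting): if RowKeys increase with a timestamp, do not put the time at the front of the key. Instead, make the high-order part of the RowKey a hash (salt) field that the program generates in rotation, and put the time field in the low-order part. This makes it more likely that data is spread evenly across the RegionServers, balancing the load. Without a salt field, a key that starts directly with the time sends all new data to one RegionServer, creating a hotspot; reads then also concentrate on a few RegionServers, which lowers query efficiency.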
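The rotating salt described above can be sketched as follows, assuming a two-digit numeric prefix, an underscore separator and a millisecond timestamp suffix. `RoundRobinSaltedKey` and this key layout are hypothetical, not taken from the original post.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of HBase): round-robin salt prefix in front of a timestamp.
// Assumes buckets <= 100 so the "%02d" prefix stays two digits wide.
final class RoundRobinSaltedKey {
    private final int buckets;                          // number of distinct salt values
    private final AtomicLong counter = new AtomicLong(); // thread-safe rotation counter

    RoundRobinSaltedKey(int buckets) {
        if (buckets <= 0 || buckets > 100) throw new IllegalArgumentException("buckets must be 1..100");
        this.buckets = buckets;
    }

    // Cycles the prefix 00, 01, ..., buckets-1, 00, ... so consecutive writes
    // fall into different key ranges instead of all landing at the end of the table
    String next(long timestampMillis) {
        int salt = (int) (counter.getAndIncrement() % buckets);
        return String.format("%02d_%d", salt, timestampMillis);
    }
}
```

Since a round-robin prefix cannot be recomputed from the data, reading a time range means running one scan per salt value and merging the results. Deriving the prefix from a hash of a business ID instead (a different technique) would let point lookups recompute it.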

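3) Uniqueness: the design must guarantee that every RowKey is unique, because a Put with an existing RowKey writes into that existing row instead of creating a new one.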

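RowKeys are stored sorted in lexicographic (byte) order, so a RowKey design should exploit this ordering: store data that is often read together next to each other, and keep data likely to be accessed soon in one place.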

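For example, if the data most recently written to an HBase table is the most likely to be read, consider making the timestamp part of the RowKey. Because keys are sorted lexicographically, using Long.MAX_VALUE - timestamp as the RowKey places newer rows first, so newly written data can be found quickly when reading.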
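A minimal sketch of the `Long.MAX_VALUE - timestamp` idea, assuming string row keys. The value is zero-padded to 19 digits (the width of `Long.MAX_VALUE`) so that lexicographic order matches numeric order, and a hypothetical `eventId` suffix keeps two events with the same timestamp from sharing a RowKey. `ReverseTimestampKey` is an illustrative name, not an HBase API.

```java
// Hypothetical helper (not part of HBase): newest-first row keys
final class ReverseTimestampKey {
    // Assumes timestampMillis >= 0, so Long.MAX_VALUE - timestampMillis is non-negative.
    // %019d pads to a fixed 19 digits; "_" + eventId makes the key unique per event.
    static String of(long timestampMillis, String eventId) {
        return String.format("%019d_%s", Long.MAX_VALUE - timestampMillis, eventId);
    }
}
```

A later timestamp yields a smaller number and therefore a key that sorts earlier, so a plain forward scan returns the newest rows first.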
References:

Design principles for HBase RowKeys (HBase RowKey的設計原則)

HBase rowkey design, with examples (HBase的rowkey設計（含實例）)

4. Java API

Initializing the connection:

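import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

private Connection connection; // HBase connection: heavyweight and thread-safe, create once and share
private Table table;           // lightweight handle to a single table
public void initHbase() {
    // Local debugging only: on Windows, point hadoop.home.dir at a Hadoop binary package (winutils)
    System.setProperty("hadoop.home.dir", "D:\\Program Files\\hadoop-common-2.2.0-bin-master");
    // Local debugging only: run as a Windows/Linux user that HBase allows to access the table
    System.setProperty("HADOOP_USER_NAME", "hbase");
    // HBase configuration; reads hbase-site.xml (ZooKeeper quorum etc.) from the classpath
    Configuration conf = HBaseConfiguration.create();

    try {
        this.connection = ConnectionFactory.createConnection(conf);
        String tableName = "TABLE_NAME";
        this.table = this.connection.getTable(TableName.valueOf(tableName));
    } catch (IOException e) {
        e.printStackTrace();
    }
}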

Checking for and creating a table:

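import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Admin;

// Create myTableName with the column families in colFamily, unless the table already exists
public void createTable(String myTableName, String[] colFamily) throws IOException {
    // Admin is Closeable; try-with-resources releases it when done
    try (Admin admin = connection.getAdmin()) {
        TableName tableName = TableName.valueOf(myTableName);
        if (admin.tableExists(tableName)) {
            System.out.println("table exists!");
        } else {
            HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
            for (String str : colFamily) {
                hTableDescriptor.addFamily(new HColumnDescriptor(str));
            }
            admin.createTable(hTableDescriptor);
        }
    }
}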
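If the RowKeys start with a salt prefix, the table can be pre-split so that each salt value begins in its own region. `Admin` has a `createTable(HTableDescriptor, byte[][] splitKeys)` overload for this; the hypothetical helper below only computes the split keys, assuming (for illustration) a two-digit numeric prefix `00`, `01`, and so on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of HBase): split keys for a two-digit salt prefix.
// N buckets need N - 1 split points: "01", "02", ..., so region i holds keys starting with salt i.
final class SaltSplits {
    static byte[][] splitKeys(int buckets) {
        if (buckets < 1 || buckets > 100) throw new IllegalArgumentException("buckets must be 1..100");
        byte[][] keys = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            keys[i - 1] = String.format("%02d", i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }
}
```

Used as `admin.createTable(hTableDescriptor, SaltSplits.splitKeys(4));`, the first region covers keys below `01` (salt `00`), the next covers `01` up to `02`, and so on.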

Batch writes:

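import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

private final List<Put> puts = new ArrayList<>(); // Puts waiting to be written

// Queue one cell: rowKey = row key, family = column family, column = column qualifier, value = cell value
public void addPut(String rowKey, String family, String column, String value) {
    Put put = new Put(Bytes.toBytes(rowKey));
    put.addColumn(Bytes.toBytes(family), Bytes.toBytes(column), Bytes.toBytes(value));
    puts.add(put);
}

// Batch write: send all queued Puts in a single table.put(List<Put>) call
public void flushPuts() throws IOException {
    table.put(puts);
    puts.clear();
}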
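For very large write lists, one option is to send the Puts in bounded batches rather than in one huge `table.put(puts)` call. The generic helper below (a hypothetical `Batches.chunk`, not part of the HBase API) only does the splitting; HBase's own `BufferedMutator`, obtained from `connection.getBufferedMutator(tableName)`, is an alternative that buffers and flushes writes itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HBase): split a list (e.g. a List<Put>) into batches of at most batchSize
final class Batches {
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the subList view so later changes to 'items' do not affect the batch
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Usage would look like `for (List<Put> batch : Batches.chunk(allPuts, 1000)) { table.put(batch); }`; the batch size of 1000 is an arbitrary example value to tune against row size and cluster capacity.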

Closing the connection:

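// Close the table first, then the connection (log: an slf4j Logger field of the enclosing class)
public void closeHbase() {
    try {
        if (this.table != null) {
            this.table.close();
        }
    } catch (IOException e) {
        log.error("Failed to close HBase table", e);
    } finally {
        // Close the connection in finally so it is released even if closing the table failed
        try {
            if (this.connection != null) {
                this.connection.close();
            }
        } catch (IOException e) {
            log.error("Failed to close HBase connection", e);
        }
    }
}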

Scan queries:

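import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// Scan query (inside a method that throws IOException)
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("")); // inclusive start row; empty = first row of the table
scan.setStopRow(Bytes.toBytes(""));  // exclusive stop row; empty = scan to the last row

// try-with-resources closes the scanner and releases its server-side resources
try (ResultScanner resultScanner = table.getScanner(scan)) {
    for (Result result : resultScanner) {
        String rowKey = Bytes.toString(result.getRow());

        for (Cell cell : result.rawCells()) {
            long timestamp = cell.getTimestamp();
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println(rowKey + " " + qualifier + "@" + timestamp + " = " + value);
        }
    }
}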
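To scan only the rows that share a prefix (for example one salt value), the start row can be the prefix itself and the exclusive stop row the smallest key greater than every key with that prefix. The hypothetical `PrefixScan.stopRowForPrefix` below computes that bound by incrementing the last byte that is not `0xFF`, matching HBase's unsigned byte ordering; HBase client versions that provide `Scan#setRowPrefixFilter(byte[])` do the equivalent internally.

```java
import java.util.Arrays;

// Hypothetical helper (not part of HBase): exclusive stop row for a prefix scan
final class PrefixScan {
    // Increments the last byte that is not 0xFF and drops everything after it.
    // If every byte is 0xFF, returns an empty array, which HBase treats as "scan to the end".
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // 0x7F -> 0x80 is still "greater" under unsigned comparison
                return stop;
            }
        }
        return new byte[0];
    }
}
```

With a prefix `p`, the scan would be set up as `scan.setStartRow(p); scan.setStopRow(PrefixScan.stopRowForPrefix(p));`.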

Reference: HBase Java API programming examples (HBase Java API編程實例)