Hive Streaming
Hive traditionally loads data in batches, which cannot satisfy real-time requirements. Hive Streaming provides an API for streaming writes, so external data can be written into Hive continuously.
Prerequisites
- Hive Streaming must be used with Hive transactional tables, and the table's storage format must be ORC.
- Set the following parameters in hive-site.xml to enable Hive transactions:
  - hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  - hive.compactor.initiator.on = true
  - hive.compactor.worker.threads > 0
- Mark the table as transactional when creating it: tblproperties("transactional"="true")
- The Hive table must be both partitioned and bucketed.
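The transaction settings listed above go into hive-site.xml; a minimal sketch (the worker-thread count of 1 is an arbitrary example value, tune it for your cluster):

```
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```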
Example
Hadoop version: 2.6.5
Hive version: 1.2.2
1. Create a table test.t3 in Hive:
```
CREATE TABLE t3 (id INT, name STRING, address STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```
2. Code
```
import java.util.ArrayList;
import java.util.List;

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StreamingException;
import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

// JSONObject here is Alibaba fastjson's com.alibaba.fastjson.JSONObject;
// note that this dependency is not declared in the pom.xml below.
import com.alibaba.fastjson.JSONObject;

public class HiveStreamingDemo {

    /**
     * Writes comma-delimited records with DelimitedInputWriter.
     */
    public static void delimitedInputWriterDemo()
            throws InterruptedException, StreamingException, ClassNotFoundException {
        String dbName = "test";
        String tblName = "t3";
        List<String> partitionVals = new ArrayList<String>(1);
        partitionVals.add("china");
        HiveEndPoint hiveEP = new HiveEndPoint("thrift://192.168.61.146:9083",
                dbName, tblName, partitionVals);
        // Field order must match the order of fields in each delimited record.
        String[] fieldNames = {"id", "name", "address"};
        StreamingConnection connection = hiveEP.newConnection(true);
        DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", hiveEP);
        // Fetch a batch of 10 transactions from the connection.
        TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
        txnBatch.beginNextTransaction();
        for (int i = 0; i < 100; ++i) {
            txnBatch.write((i + ",zhangsan,beijing").getBytes());
        }
        txnBatch.commit();
        txnBatch.close();
        connection.close();
    }

    /**
     * Writes JSON records with StrictJsonWriter; JSON keys must match column names.
     */
    public static void strictJsonWriterDemo()
            throws StreamingException, InterruptedException {
        String dbName = "test";
        String tblName = "t3";
        List<String> partitionVals = new ArrayList<String>(1);
        partitionVals.add("china");
        HiveEndPoint hiveEP = new HiveEndPoint("thrift://192.168.61.146:9083",
                dbName, tblName, partitionVals);
        StreamingConnection connection = hiveEP.newConnection(true);
        StrictJsonWriter writer = new StrictJsonWriter(hiveEP);
        TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
        txnBatch.beginNextTransaction();
        for (int i = 0; i < 10; ++i) {
            JSONObject jsonObject = new JSONObject();
            jsonObject.put("id", i);
            jsonObject.put("name", "chenli" + i);
            jsonObject.put("address", "beijing");
            txnBatch.write(jsonObject.toJSONString().getBytes());
        }
        txnBatch.commit();
        txnBatch.close();
        connection.close();
    }

    public static void main(String[] args)
            throws InterruptedException, StreamingException, ClassNotFoundException {
        strictJsonWriterDemo();
    }
}
```
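Both writers accept raw byte arrays, so the record format is the caller's responsibility: DelimitedInputWriter expects the fields in declared order joined by the chosen delimiter, while StrictJsonWriter expects a JSON object whose keys match the column names. A minimal sketch of building both payloads without any Hive connection (RecordFormatDemo is a hypothetical helper class, and String.format stands in for fastjson here):

```java
public class RecordFormatDemo {
    // Build a comma-delimited record for fieldNames {"id", "name", "address"}.
    static byte[] delimitedRecord(int id, String name, String address) {
        return (id + "," + name + "," + address).getBytes();
    }

    // Build a JSON record; keys must match the Hive column names.
    static byte[] jsonRecord(int id, String name, String address) {
        return String.format("{\"id\":%d,\"name\":\"%s\",\"address\":\"%s\"}",
                id, name, address).getBytes();
    }

    public static void main(String[] args) {
        System.out.println(new String(delimitedRecord(0, "zhangsan", "beijing")));
        System.out.println(new String(jsonRecord(0, "chenli0", "beijing")));
    }
}
```

These byte arrays are exactly what the demos above pass to txnBatch.write().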
pom.xml
```
<dependency>
    <groupId>org.apache.hive.hcatalog</groupId>
    <artifactId>hive-hcatalog-streaming</artifactId>
    <version>1.2.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hive.hcatalog</groupId>
    <artifactId>hive-hcatalog-core</artifactId>
    <version>1.2.2</version>
</dependency>
```
3. Add hive-site.xml, hdfs-site.xml, and core-site.xml to the resources directory.
4. Run the program.
5. Query the data in the Hive table:
```
hive> select * from t3;
OK
0 chenli0 beijing china
1 chenli1 beijing china
2 chenli2 beijing china
3 chenli3 beijing china
4 chenli4 beijing china
5 chenli5 beijing china
6 chenli6 beijing china
7 chenli7 beijing china
8 chenli8 beijing china
9 chenli9 beijing china
Time taken: 0.666 seconds, Fetched: 10 row(s)
```
References
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest#StreamingDataIngest-Limitations