Hive UPDATE and DELETE

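Hive has supported transactions, and therefore UPDATE and DELETE, since 0.14. Transactional operations have strict requirements. Version 1.1.0, which I was using when I wrote this post, has the following limitations (quoted from the Hive Transactions page on the Apache wiki):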

  1. BEGIN, COMMIT, and ROLLBACK are not yet supported. All language operations are auto-commit. The plan is to support these in a future release.
  2. Only ORC file format is supported in this first release. The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC.
  3. By default transactions are configured to be off. See the Configuration section below for a discussion of which values need to be set to configure it.
  4. Tables must be bucketed to make use of these features. Tables in the same system not using transactions and ACID do not need to be bucketed.
  5. At this time only snapshot level isolation is supported. When a given query starts it will be provided with a consistent snapshot of the data. There is no support for dirty read, read committed, repeatable read, or serializable. With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query. Other isolation levels may be added depending on user requests.
  6. The existing ZooKeeper and in-memory lock managers are not compatible with transactions. There is no intention to address this issue. See Basic Design below for a discussion of how locks are stored for transactions.
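The table-level rules above (ORC only, bucketing required), plus the "transactional"="true" property used in step 2, can be expressed as a quick pre-flight check. This is a minimal Python sketch, not part of Hive: `TableSpec` and `acid_problems` are hypothetical names. The exact string comparison on the property value simply follows the note in step 2 that these properties are case-sensitive in 1.1.0. Session-level limitations, such as transactions being off by default, are not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class TableSpec:
    """Hypothetical description of a Hive table, used only for this check."""
    name: str
    file_format: str                                  # e.g. "ORC", "TEXTFILE"
    bucket_columns: list = field(default_factory=list)
    num_buckets: int = 0
    tblproperties: dict = field(default_factory=dict)


def acid_problems(spec):
    """Return the reasons UPDATE/DELETE would not work on this table
    under the Hive 1.1.0 limitations; an empty list means none were found."""
    problems = []
    # Limitation 2: only the ORC file format is supported.
    if spec.file_format.upper() != "ORC":
        problems.append(f"{spec.name}: stored as {spec.file_format}, must be ORC")
    # Limitation 4: the table must be bucketed.
    if not spec.bucket_columns or spec.num_buckets < 1:
        problems.append(f"{spec.name}: must be bucketed (CLUSTERED BY ... INTO n BUCKETS)")
    # Step 2: the table must be declared transactional; exact comparison
    # because the author notes these properties are case-sensitive.
    if spec.tblproperties.get("transactional") != "true":
        problems.append(f'{spec.name}: needs TBLPROPERTIES ("transactional"="true")')
    return problems


# The table from step 2 passes the check.
cuixz = TableSpec("cuixz", "ORC", ["log_id"], 1,
                  {"transactional": "true", "NO_AUTO_COMPACTION": "true"})
print(acid_problems(cuixz))  # []
```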

After several rounds of trial and error I finally got it working. The steps are:
1. Set the relevant parameters

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
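Because `set` only lasts for the current session, every CLI session or `hive -e` script that runs UPDATE or DELETE needs these lines again. Below is a minimal Python sketch of that, assuming you drive Hive from Python. `with_acid_settings` is a hypothetical helper that just prepends the six settings above to a statement. Per the Hive Transactions wiki, the two `hive.compactor.*` options are read by the metastore service, so they really belong in the metastore's hive-site.xml. The sketch keeps them only to mirror step 1.

```python
# Session settings from step 1; SET only lasts for the current Hive session.
ACID_SESSION_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}


def with_acid_settings(statement, settings=ACID_SESSION_SETTINGS):
    """Hypothetical helper: prefix a HiveQL statement with the SET lines it needs."""
    lines = [f"set {key}={value};" for key, value in settings.items()]
    body = statement.strip()
    if not body.endswith(";"):
        body += ";"  # terminate the statement so the CLI executes it
    lines.append(body)
    return "\n".join(lines)


script = with_acid_settings("update cuixz set agg_status = 1 where log_id = 3480224")
print(script)
```

The resulting string can then be handed to the CLI, for example with `subprocess.run(["hive", "-e", script], check=True)`.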

2. Create the table

use tmp;
drop table cuixz;
create table cuixz (
log_id int,
agg_status tinyint
)
CLUSTERED BY (log_id) INTO 1 BUCKETS
stored as orc
-- these table properties are currently case-sensitive
TBLPROPERTIES ("transactional"="true","NO_AUTO_COMPACTION"="true");

3. Test

hive> insert into cuixz select log_id, agg_status from stg_dp.s_log_file;

hive> select * from cuixz where log_id = 3480224;
OK
3480224 NULL
Time taken: 0.213 seconds, Fetched: 1 row(s)

hive> update cuixz set agg_status = 1 where log_id = 3480224;

hive> select * from cuixz where log_id = 3480224;
OK
3480224 1
Time taken: 0.144 seconds, Fetched: 1 row(s)
hive> delete from cuixz where log_id = 3480224;

hive> select * from cuixz where log_id = 3480224;
OK
Time taken: 0.351 seconds

Note: the WHERE clause of UPDATE/DELETE does not support subqueries.
References
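One way around this restriction is to run the subquery on its own first, then inline its results into the WHERE clause as a literal `IN` list. A minimal Python sketch, assuming integer keys (so no quoting or escaping is needed); `delete_by_keys` and its `batch_size` parameter are hypothetical, and the same approach works for UPDATE.

```python
def delete_by_keys(table, key_column, keys, batch_size=1000):
    """Hypothetical helper: build DELETE statements that stand in for
    'WHERE key IN (subquery)'. The subquery is assumed to have been run
    separately, with its integer results passed in as `keys`."""
    unique = sorted({int(k) for k in keys})  # de-duplicate; ints need no quoting
    statements = []
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        in_list = ", ".join(str(k) for k in batch)
        statements.append(f"delete from {table} where {key_column} in ({in_list});")
    return statements


for stmt in delete_by_keys("cuixz", "log_id", [3480224, 3480225, 3480224]):
    print(stmt)  # delete from cuixz where log_id in (3480224, 3480225);
```

Since every statement auto-commits (limitation 1 above), a multi-batch delete is not atomic: if a later batch fails, the rows removed by earlier batches stay deleted.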
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
