When the data volume is large, use the officially recommended tool, DistCp.
1. Create the target database
CREATE DATABASE IF NOT EXISTS xxxxxx LOCATION '/xxx/xxx/xxxx/xxxx.db';
2. Create the target table, keeping its definition identical to the source table
CREATE [EXTERNAL] TABLE `xxxx`(
`uid` string,
`channel` string)
PARTITIONED BY (
`date` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/xxx/xxxx/xxx.db/xxxx';
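Rather than hand-writing the DDL, you can dump it from the source cluster: Hive's SHOW CREATE TABLE emits the complete statement, including the SerDe, storage format, and LOCATION (the table name below is the same placeholder used above):

```sql
-- Run on the SOURCE cluster; copy the output and adjust LOCATION
-- before running it on the target cluster.
SHOW CREATE TABLE xxxx;
```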
3. Migrate the data with DistCp
./hadoop distcp hdfs://source/table/dir hdfs://target/table/dir
For detailed usage, see the official documentation: https://hadoop.apache.org/docs/r3.1.1/hadoop-distcp/DistCp.html
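A fuller invocation often used in practice is sketched below; the NameNode hosts and paths are placeholders, and the flags (`-update`, `-p`, `-m`) are standard DistCp options documented at the link above:

```shell
# -update  copies only files missing or changed on the target (safe to re-run)
# -p       preserves file attributes such as permissions and ownership
# -m 20    caps the copy at 20 map tasks to limit cluster load
hadoop distcp -update -p -m 20 \
    hdfs://source-nn:8020/xxx/xxxx/xxx.db/xxxx \
    hdfs://target-nn:8020/xxx/xxxx/xxx.db/xxxx
```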
4. Restore the data
(1) External tables are simple: run msck repair table xxxxx; and the partitions are repaired automatically.
(2) Managed (internal) tables:
LOAD DATA INPATH '/xxx/xxx/xxx' OVERWRITE INTO TABLE xxx;
LOAD DATA INPATH '/xxx/xxxx/xxx' OVERWRITE INTO TABLE xxx PARTITION (xxxx);
ALTER TABLE xxxx ADD PARTITION (xxxx) location '/xxx/xxx/xxxx/xxxx';
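Whichever repair path you use, it is worth verifying the result before pointing jobs at the new table (the table name is a placeholder, as above):

```sql
-- Confirm all partitions were recovered, then spot-check the data volume
-- against the source table.
SHOW PARTITIONS xxxx;
SELECT COUNT(*) FROM xxxx;
```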
If you have questions, join QQ group 877769335.