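1. Loading data (LOAD)
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)]
When data is loaded into a table, Hive performs no transformation. A load is purely a copy or move operation that places the data files in the location that corresponds to the Hive table.
The load target can be a table or a partition. If the table is partitioned, you must identify the target partition by specifying values for all of its partition columns.
filepath can be a file or a directory. In either case, filepath is treated as a set of files.
- LOCAL: the input files are on the local file system (Linux). Without LOCAL, Hive looks for the files on HDFS.
- OVERWRITE: replaces the existing contents of the target table or partition. Without it, the files are added to the existing data.
- PARTITION: if the table is partitioned, the data can be loaded into a specific partition.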
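As a rough illustration of how these keywords combine, the statements below are a sketch only. The table name logs, its partition column dt, and both file paths are hypothetical and do not come from the examples in this article.

```sql
-- Hypothetical table and paths, for illustration only.
-- No LOCAL: the path is resolved on HDFS and the file is moved
-- into the directory of the target partition.
LOAD DATA INPATH '/user/hadoop/input/logs.txt'
INTO TABLE logs PARTITION (dt='2018-09-09');

-- LOCAL + OVERWRITE: the file is copied from the local file system
-- and replaces whatever the target partition already contains.
LOAD DATA LOCAL INPATH '/home/hadoop/logs.txt'
OVERWRITE INTO TABLE logs PARTITION (dt='2018-09-09');
```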
Load examples
Create the table
hive> CREATE TABLE emp (
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > salary double,
    > comm double,
    > deptno int
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.54 seconds
Load a local text file
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.938 seconds, Fetched: 14 row(s)
Load into a partition
hive> load data local inpath '/home/hadoop/dept.txt' into table dept partition (dt='2018-09-09');
Loading data to table default.dept partition (dt=2018-09-09)
Partition default.dept{dt=2018-09-09} stats: [numFiles=1, totalSize=84]
OK
Time taken: 10.631 seconds
hive> select * form dept;
FAILED: ParseException line 1:9 missing EOF at 'form' near '*'
hive> select * from dept;
OK
10 ACCOUNTING NEW YORK 2018-08-08
20 RESEARCH DALLAS 2018-08-08
30 SALES CHICAGO 2018-08-08
40 OPERATIONS BOSTON 2018-08-08
10 ACCOUNTING NEW YORK 2018-09-09
20 RESEARCH DALLAS 2018-09-09
30 SALES CHICAGO 2018-09-09
40 OPERATIONS BOSTON 2018-09-09
Time taken: 1.385 seconds, Fetched: 8 row(s)
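The dept table used above is not defined in this article. A definition along the following lines would be consistent with the output shown (three data columns plus a dt partition column), but the column names and the tab delimiter are assumptions.

```sql
-- Assumed definition of dept; the column names and the delimiter are
-- guesses based on the query output, not taken from the original.
CREATE TABLE dept (
  deptno int,
  dname  string,
  loc    string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- List the partitions that the LOAD statements have created.
SHOW PARTITIONS dept;
```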
2. Inserting data (INSERT)
Standard insert
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …)] select_statement1 FROM from_statement;
The PARTITION keyword can be used to insert into a specific partition.
OVERWRITE determines whether existing data is replaced: INSERT OVERWRITE replaces the existing data in the table or partition, while INSERT INTO appends to it.
An insert with this syntax runs a MapReduce job. For example, to insert the contents of emp into emp1:
hive> insert overwrite table emp1 select * from emp;
Query ID = hadoop_20180109081212_d62e58f3-946c-465e-999d-2ddf0d76d807
Total jobs = 3
Launching Job 1 out of 3
hive> select * from emp1;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.211 seconds, Fetched: 14 row(s)
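The PARTITION clause of the insert syntax is not demonstrated above. A minimal sketch follows, assuming a hypothetical table emp_part that holds the emp columns and is partitioned by deptno. Neither the table nor its layout comes from the original examples.

```sql
-- Hypothetical partitioned copy of emp, for illustration only.
CREATE TABLE emp_part (
  empno int, ename string, job string, mgr int,
  hiredate string, salary double, comm double
)
PARTITIONED BY (deptno int);

-- Static partition insert: the partition value is fixed in the statement,
-- so deptno is left out of the SELECT list.
INSERT OVERWRITE TABLE emp_part PARTITION (deptno=10)
SELECT empno, ename, job, mgr, hiredate, salary, comm
FROM emp
WHERE deptno = 10;
```

Using INSERT INTO in place of INSERT OVERWRITE here would append to the deptno=10 partition rather than replace it.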
When inserting specific columns, make sure they are listed in the correct order. Hive matches the selected columns to the target table's columns by position, not by name, so a wrong order does not raise an error on insert, but later queries on those columns will not find the data.
Demonstration:
Insert with job and ename in the wrong order:
hive> insert overwrite table emp2 select empno,job,ename,mgr,hiredate,salary,comm,deptno from emp;
Then insert emp again, this time in the correct order:
hive> insert into table emp2 select * from emp;
hive> select * from emp2;
OK
7369 CLERK SMITH 7902 1980/12/17 800.0 NULL 20
7499 SALESMAN ALLEN 7698 1981/2/20 1600.0 300.0 30
7521 SALESMAN WARD 7698 1981/2/22 1250.0 500.0 30
7566 MANAGER JONES 7839 1981/4/2 2975.0 NULL 20
7654 SALESMAN MARTIN 7698 1981/9/28 1250.0 1400.0 30
7698 MANAGER BLAKE 7839 1981/5/1 2850.0 NULL 30
7782 MANAGER CLARK 7839 1981/6/9 2450.0 NULL 10
7788 ANALYST SCOTT 7566 1987/4/19 3000.0 NULL 20
7839 PRESIDENT KING NULL 1981/11/17 5000.0 NULL 10
7844 SALESMAN TURNER 7698 1981/9/8 1500.0 0.0 30
7876 CLERK ADAMS 7788 1987/5/23 1100.0 NULL 20
7900 CLERK JAMES 7698 1981/12/3 950.0 NULL 30
7902 ANALYST FORD 7566 1981/12/3 3000.0 NULL 20
7934 CLERK MILLER 7782 1982/1/23 1300.0 NULL 10
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 2.363 seconds, Fetched: 28 row(s)
The first 14 rows of the result have job and ename swapped. The insert succeeds, but queries that filter on these columns will not return the expected rows.
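One way to see the effect is to filter emp2 on the affected columns. The queries below are a sketch against the emp2 contents shown above; they were not part of the original session, so no output is reproduced here.

```sql
-- Matches only the row inserted in the correct order; the swapped
-- row for the same employee has 'CLERK' in the ename column.
SELECT * FROM emp2 WHERE ename = 'SMITH';

-- Finds the swapped rows: a job title stored in the ename column.
SELECT * FROM emp2 WHERE ename = 'CLERK';
```

Listing the target columns in the same order as the table definition, rather than relying on memory, is a simple guard against this mistake.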
Multi-insert
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION …] select_statement2] …;
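A multi-insert moves the FROM clause to the front of the statement. In essence it is a shorthand for several standard inserts that read from the same source.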
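A minimal sketch of the multi-insert form follows. The target tables emp_managers and emp_clerks are hypothetical and are assumed to have the same columns as emp.

```sql
-- emp_managers and emp_clerks are hypothetical tables with the same
-- columns as emp. The source table is named once, at the top.
FROM emp
INSERT OVERWRITE TABLE emp_managers
  SELECT * WHERE job = 'MANAGER'
INSERT INTO TABLE emp_clerks
  SELECT * WHERE job = 'CLERK';
```

Because both inserts share one FROM clause, Hive can fill both targets from a single scan of emp instead of reading it once per insert.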
Inserting values manually
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] …)] VALUES values_row [, values_row …]
hive> create table stu(
    > id int,
    > name string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.405 seconds
hive> select * from stu;
OK
hive> insert into table stu values(1,'zhangsan'),(2,'lisi');
hive> select * from stu;
OK
1 zhangsan
2 lisi
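The syntax above also allows a PARTITION clause, which the stu example does not use. The sketch below assumes a hypothetical partitioned table stu_part. The table, the partition column class, and the inserted rows are illustrative only. INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- stu_part is a hypothetical partitioned variant of stu.
CREATE TABLE stu_part (
  id int,
  name string
)
PARTITIONED BY (class string);

-- Every row in the VALUES list goes into the partition named in the statement.
INSERT INTO TABLE stu_part PARTITION (class='a')
VALUES (3, 'wangwu'), (4, 'zhaoliu');
```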
3. Exporting data
Standard export
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT … FROM …
LOCAL: with the LOCAL keyword, the data is exported to the local file system; without it, the data is exported to HDFS by default.
STORED AS: specifies the storage format.
hive> insert overwrite local directory '/home/hadoop/data' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from stu;
[hadoop@zydatahadoop001 data]$ pwd
/home/hadoop/data
[hadoop@zydatahadoop001 data]$ cat 000000_0
1 zhangsan
2 lisi
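The example above writes to the local file system. For comparison, here is a sketch of an export to HDFS that also uses STORED AS. The target directory is hypothetical, and the comma delimiter is chosen only for illustration.

```sql
-- No LOCAL keyword, so the directory is on HDFS. The path is hypothetical.
-- Existing contents of the directory are replaced by the query result.
INSERT OVERWRITE DIRECTORY '/user/hadoop/export/stu'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
SELECT * FROM stu;
```

The exported files could then be inspected with a command such as hdfs dfs -cat /user/hadoop/export/stu/*.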
Multiple exports
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] …
hive> from emp
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select empno, ename
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select ename;
[hadoop@zydatahadoop001 tmp]$ pwd
/home/hadoop/tmp
[hadoop@zydatahadoop001 tmp]$ cat hivetmp1/000000_0
7369 SMITH
7499 ALLEN
7521 WARD
7566 JONES
7654 MARTIN
7698 BLAKE
7782 CLARK
7788 SCOTT
7839 KING
7844 TURNER
7876 ADAMS
7900 JAMES
7902 FORD
7934 MILLER
[hadoop@zydatahadoop001 tmp]$ cat hivetmp2/000000_0
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
4. SELECT
WHERE clause
Query the employees with deptno=10:
hive> select * from emp where deptno=10;
OK
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 1.144 seconds, Fetched: 3 row(s)
Query the employees whose empno is less than or equal to 7800:
hive> select * from emp where empno <= 7800;
Query the employees whose salary is between 1000 and 1500 (BETWEEN includes both bounds):
hive> select * from emp where salary between 1000 and 1500;
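Since BETWEEN is inclusive, a strictly "greater than 1000 and less than 1500" filter needs explicit comparison operators. The two forms can be sketched as follows. These queries were not part of the original session.

```sql
-- Strict bounds: salary greater than 1000 and less than 1500.
SELECT * FROM emp WHERE salary > 1000 AND salary < 1500;

-- Same meaning as BETWEEN 1000 AND 1500: both bounds are included.
SELECT * FROM emp WHERE salary >= 1000 AND salary <= 1500;
```

With the sample data loaded earlier, TURNER (salary 1500.0) would be expected to appear only in the inclusive form.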
Query the first 5 records:
hive> select * from emp limit 5;
Query the employees whose empno is 7566 or 7499:
hive> select * from emp where empno in(7566,7499);
Query the employees whose commission (comm) is not null:
hive> select * from emp where comm is not null;
Source: @若澤大數據