hive> create table studenta(
    > id int,
    > name string)
    > row format delimited
    > fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 0.138 seconds
hive> create table studentb(
    > id int,
    > age int)
    > row format delimited
    > fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 0.057 seconds
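Create two data files (the two fields on each line are separated by a tab, matching fields terminated by '\t')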
vim studenta.txt
vim studentb.txt
studenta.txt
10001 shiny
10002 mark
10003 angel
10005 ella
10009 jack
10014 eva
10018 judy
10020 cendy
studentb.txt
10001 23
10004 22
10007 24
10008 21
10009 25
10012 25
10015 20
10018 19
10020 26
Load the data
hive> load data local inpath '/home/zjt/data/studenta.txt' overwrite into table studenta;
Loading data to table ducl_test.studenta
Table ducl_test.studenta stats: [numFiles=1, numRows=0, totalSize=90, rawDataSize=0]
OK
Time taken: 0.248 seconds
hive> load data local inpath '/home/zjt/data/studentb.txt' overwrite into table studentb;
Loading data to table ducl_test.studentb
Table ducl_test.studentb stats: [numFiles=1, numRows=0, totalSize=81, rawDataSize=0]
OK
Time taken: 0.199 seconds
hive> select * from studena;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'studena'
hive> select * from studenta;
OK
10001 shiny
10002 mark
10003 angel
10005 ella
10009 jack
10014 eva
10018 judy
10020 cendy
Time taken: 0.045 seconds, Fetched: 8 row(s)
hive> select * from studentb;
OK
10001 23
10004 22
10007 24
10008 21
10009 25
10012 25
10015 20
10018 19
10020 26
Time taken: 0.047 seconds, Fetched: 9 row(s)
1.2 Join operations
1.2.1 Inner join (JOIN)
Syntax and example
// Syntax: ... join ... on ...
// Example
hive> select * from studenta a join studentb b on a.id=b.id;
...
OK
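10001 shiny 10001 23
10009 jack 10009 25
10018 judy 10018 19
10020 cendy 10020 26
Time taken: 34.066 seconds, Fetched: 4 row(s)
Effect
Returns only the rows that satisfy the join condition on both sides, that is, the ids present in both tables.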
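The matching rule can be modelled outside Hive. The Python sketch below only illustrates the semantics and says nothing about how Hive executes the query. The function name inner_join and the in-memory lists are assumptions made for this example; the rows are copied from studenta.txt and studentb.txt.

```python
# Rows copied from studenta.txt (id, name) and studentb.txt (id, age).
studenta = [(10001, "shiny"), (10002, "mark"), (10003, "angel"), (10005, "ella"),
            (10009, "jack"), (10014, "eva"), (10018, "judy"), (10020, "cendy")]
studentb = [(10001, 23), (10004, 22), (10007, 24), (10008, 21), (10009, 25),
            (10012, 25), (10015, 20), (10018, 19), (10020, 26)]


def inner_join(left, right):
    """Keep only pairs of rows whose first column (id) is equal."""
    right_by_id = {}
    for row in right:
        right_by_id.setdefault(row[0], []).append(row)
    # A left row with no partner contributes nothing to the result.
    return [l + r for l in left for r in right_by_id.get(l[0], [])]


inner = inner_join(studenta, studentb)
for row in inner:
    print(*row)
```

Only the four ids that occur in both lists survive, which agrees with the 4 rows fetched by the Hive query.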
1.2.2 Outer joins
Left outer join
Syntax and example
// Syntax: ... left join ... on ...
// Example
hive> select * from studenta a left join studentb b on a.id=b.id;
OK
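10001 shiny 10001 23
10002 mark NULL NULL
10003 angel NULL NULL
10005 ella NULL NULL
10009 jack 10009 25
10014 eva NULL NULL
10018 judy 10018 19
10020 cendy 10020 26
Time taken: 28.853 seconds, Fetched: 8 row(s)
Effect
The left table drives the match: every left-table row is kept, and right-table rows appear only where they match.
Columns from the right table are NULL where there is no match.
The number of rows returned equals the number of rows in the left table (this holds here because each id appears at most once in studentb).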
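The row-count rule above depends on the right table having unique keys. The following Python sketch models the left outer join under that reading; left_join and the extra row (10001, 99) are hypothetical and exist only to show what happens when a key is duplicated on the right.

```python
# Rows copied from studenta.txt (id, name) and studentb.txt (id, age).
studenta = [(10001, "shiny"), (10002, "mark"), (10003, "angel"), (10005, "ella"),
            (10009, "jack"), (10014, "eva"), (10018, "judy"), (10020, "cendy")]
studentb = [(10001, 23), (10004, 22), (10007, 24), (10008, 21), (10009, 25),
            (10012, 25), (10015, 20), (10018, 19), (10020, 26)]


def left_join(left, right):
    """Keep every left row; pad with None (Hive prints NULL) when unmatched."""
    right_by_id = {}
    for row in right:
        right_by_id.setdefault(row[0], []).append(row)
    out = []
    for l in left:
        matches = right_by_id.get(l[0])
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None, None))
    return out


rows = left_join(studenta, studentb)
for row in rows:
    print(*("NULL" if v is None else v for v in row))

# Hypothetical: a second studentb row for id 10001 duplicates that left row.
rows_dup = left_join(studenta, studentb + [(10001, 99)])
print(len(rows), len(rows_dup))
```

With the original data the result has as many rows as studenta. Once a key repeats on the right, the matching left row is emitted once per partner, so the result grows beyond the left table's size.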
1.2.3 Right outer join
Syntax and example
// Syntax: ... right join ... on ...
// Example
hive> select * from studenta a right join studentb b on a.id=b.id;
OK
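10001 shiny 10001 23
NULL NULL 10004 22
NULL NULL 10007 24
NULL NULL 10008 21
10009 jack 10009 25
NULL NULL 10012 25
NULL NULL 10015 20
10018 judy 10018 19
10020 cendy 10020 26
Time taken: 28.703 seconds, Fetched: 9 row(s)
Effect
The right table drives the match: every right-table row is kept, and left-table rows appear only where they match.
Columns from the left table are NULL where there is no match.
The number of rows returned equals the number of rows in the right table (this holds here because each id appears at most once in studenta).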
1.2.4 Full outer join
Example
// Syntax: ... full join ... on ...
// Example
hive> select * from studenta a full join studentb b on a.id=b.id;
OK
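10001 shiny 10001 23
10002 mark NULL NULL
10003 angel NULL NULL
NULL NULL 10004 22
10005 ella NULL NULL
NULL NULL 10007 24
NULL NULL 10008 21
10009 jack 10009 25
NULL NULL 10012 25
10014 eva NULL NULL
NULL NULL 10015 20
10018 judy 10018 19
10020 cendy 10020 26
Time taken: 34.453 seconds, Fetched: 13 row(s)
Effect
Rows from both tables are kept.
Columns from the side that has no match are NULL.
The number of rows returned equals the number of distinct ids across the two tables (8 + 9 - 4 = 13 here, since ids are unique within each table).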
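The count of 13 can be checked with a small model of the full outer join. This Python sketch is an illustration under the assumption that ids are unique within each table; full_join is a hypothetical helper, and the sort only imitates the key order seen in the Hive output.

```python
# Rows copied from studenta.txt (id, name) and studentb.txt (id, age).
studenta = [(10001, "shiny"), (10002, "mark"), (10003, "angel"), (10005, "ella"),
            (10009, "jack"), (10014, "eva"), (10018, "judy"), (10020, "cendy")]
studentb = [(10001, 23), (10004, 22), (10007, 24), (10008, 21), (10009, 25),
            (10012, 25), (10015, 20), (10018, 19), (10020, 26)]


def full_join(left, right):
    """Left outer join plus the right rows that found no partner."""
    right_by_id = {}
    for row in right:
        right_by_id.setdefault(row[0], []).append(row)
    left_ids = {l[0] for l in left}
    out = []
    for l in left:
        matches = right_by_id.get(l[0])
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None, None))
    # Right rows whose id never occurs on the left are padded on the left side.
    out.extend((None, None) + r for r in right if r[0] not in left_ids)
    return out


# Order by whichever id is present, similar to the Hive output above.
full = sorted(full_join(studenta, studentb),
              key=lambda row: row[0] if row[0] is not None else row[2])
distinct_ids = {r[0] for r in studenta} | {r[0] for r in studentb}
# Rows with a right side present are exactly what the right outer join returns.
right_rows = [row for row in full if row[2] is not None]

for row in full:
    print(*("NULL" if v is None else v for v in row))
print(len(full), len(distinct_ids), len(right_rows))
```

Filtering the full result down to rows with a right side gives the 9 rows of the right outer join, and filtering on the left side gives the 8 rows of the left outer join.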
1.3 Left semi join
Syntax and example
// Syntax: ... left semi join ... on ...
// Example
hive> select * from studenta a left semi join studentb b on a.id=b.id;
OK
10001 shiny
10009 jack
10018 judy
10020 cendy
Time taken: 32.075 seconds, Fetched: 4 row(s)
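The output above contains only the columns of studenta, which is what distinguishes a left semi join from an inner join: the right table acts purely as a filter. A minimal Python sketch of that reading follows; left_semi_join and the duplicated row (10001, 99) are hypothetical and used only for illustration.

```python
# Rows copied from studenta.txt (id, name) and studentb.txt (id, age).
studenta = [(10001, "shiny"), (10002, "mark"), (10003, "angel"), (10005, "ella"),
            (10009, "jack"), (10014, "eva"), (10018, "judy"), (10020, "cendy")]
studentb = [(10001, 23), (10004, 22), (10007, 24), (10008, 21), (10009, 25),
            (10012, 25), (10015, 20), (10018, 19), (10020, 26)]


def left_semi_join(left, right):
    """Keep a left row if at least one right row has the same id."""
    right_ids = {r[0] for r in right}
    return [l for l in left if l[0] in right_ids]


semi = left_semi_join(studenta, studentb)
for row in semi:
    print(*row)

# Hypothetical duplicate key on the right: the semi join is unaffected,
# while an inner join would emit the matching left row twice.
right_dup = studentb + [(10001, 99)]
semi_dup = left_semi_join(studenta, right_dup)
inner_dup_count = sum(1 for l in studenta for r in right_dup if l[0] == r[0])
print(len(semi_dup), inner_dup_count)
```

Each left row appears at most once regardless of how many partners it has, much like filtering with an IN or EXISTS condition.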
// collection items terminated by ',' sets the delimiter between array elements
hive> create table employee(
    > name string,
    > age int,
    > work_location array<string>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > stored as textfile;
OK
Time taken: 0.096 seconds
Prepare the data
employee.txt
shiny 23 beijing,tianjin,qingdao
jack 34 shanghai,guangzhou
mark 26 beijing,xian
ella 21 beijing
judy 30 shanghai,hangzhou,chongqing
cendy 28 beijing,shanghai,dalian,chengdu
Import the data
[zjt@master data]$ vim employee.txt
hive> load data local inpath '/home/zjt/data/employee.txt' overwrite into table employee;
Loading data to table ducl_test.employee
Table ducl_test.employee stats: [numFiles=1, numRows=0, totalSize=174, rawDataSize=0]
OK
Time taken: 0.698 seconds
Query the data
// Query all rows
hive> select * from employee;
OK
shiny 23 ["beijing","tianjin","qingdao"]
jack 34 ["shanghai","guangzhou"]
mark 26 ["beijing","xian"]
ella 21 ["beijing"]
judy 30 ["shanghai","hangzhou","chongqing"]
cendy 28 ["beijing","shanghai","dalian","chengdu"]
Time taken: 0.277 seconds, Fetched: 6 row(s)
// Query the element at a given position in the array
hive> select name,age,work_location[0] from employee;
OK
shiny 23 beijing
jack 34 shanghai
mark 26 beijing
ella 21 beijing
judy 30 shanghai
cendy 28 beijing
Time taken: 0.12 seconds, Fetched: 6 row(s)
// Positions beyond the end of the array return NULL
hive> select name,age,work_location[2] from employee;
OK
shiny 23 qingdao
jack 34 NULL
mark 26 NULL
ella 21 NULL
judy 30 chongqing
cendy 28 dalian
Time taken: 0.084 seconds, Fetched: 6 row(s)
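How one text line turns into (name, age, array) and why work_location[2] comes back as NULL for short arrays can be sketched in Python. This is an illustration of the delimiters declared in the table definition, not Hive's actual reader; parse_employee and element_at are hypothetical helper names.

```python
def parse_employee(line):
    """Split on '\t' for fields, then on ',' for the array elements."""
    name, age, locations = line.rstrip("\n").split("\t")
    return name, int(age), locations.split(",")


def element_at(array, index):
    """Mimic work_location[index]: None stands in for Hive's NULL."""
    return array[index] if 0 <= index < len(array) else None


# Two lines copied from the data file, with the tab written explicitly.
lines = ["shiny\t23\tbeijing,tianjin,qingdao", "jack\t34\tshanghai,guangzhou"]
employees = [parse_employee(line) for line in lines]

for name, age, work_location in employees:
    print(name, age, element_at(work_location, 0), element_at(work_location, 2))
```

The helper returns None instead of raising IndexError, which mirrors the NULL that Hive prints for jack, mark and ella in the query above.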
2.3.2 Map
Create the table
hive> create table scores(
    > name string,
    > score map<string,int>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > stored as textfile;
OK
Time taken: 0.09 seconds
Prepare the data
scores.txt
shiny chinese:90,math:100,english:99
mark chinese:89,math:56,english:87
judy chinese:94,math:78,english:81
ella chinese:54,math:23,english:48
jack chinese:100,math:95,english:69
cendy chinese:67,math:83,english:45
Import the data
[zjt@master data]$ vim scores.txt
hive> load data local inpath '/home/zjt/data/scores.txt' into table scores;
Loading data to table ducl_test.scores
Table ducl_test.scores stats: [numFiles=1, totalSize=214]
OK
Time taken: 0.251 seconds
Query the data
// Query all rows
hive> select * from scores;
OK
shiny {"chinese":90,"math":100,"english":99}
mark {"chinese":89,"math":56,"english":87}
judy {"chinese":94,"math":78,"english":81}
ella {"chinese":54,"math":23,"english":48}
jack {"chinese":100,"math":95,"english":69}
cendy {"chinese":67,"math":83,"english":45}
Time taken: 0.069 seconds, Fetched: 6 row(s)
// Look up one key in the map and add a constant column to the result
hive> select name,'chinese',score["chinese"] from scores;
OK
shiny chinese 90
mark chinese 89
judy chinese 94
ella chinese 54
jack chinese 100
cendy chinese 67
Time taken: 1.341 seconds, Fetched: 6 row(s)
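The three delimiters in the scores table ('\t' between fields, ',' between map entries, ':' between key and value) can be traced with a short Python sketch. It is only a model of the declared format; parse_scores is a hypothetical helper, and the sample line is copied from scores.txt.

```python
def parse_scores(line):
    """Split fields on '\t', map entries on ',', and key/value on ':'."""
    name, raw = line.rstrip("\n").split("\t")
    score = {}
    for entry in raw.split(","):
        key, value = entry.split(":")
        score[key] = int(value)
    return name, score


name, score = parse_scores("shiny\tchinese:90,math:100,english:99")

# Equivalent of: select name, 'chinese', score["chinese"] from scores;
print(name, "chinese", score.get("chinese"))
# dict.get returns None for an absent key, standing in for Hive's NULL.
print(name, "physics", score.get("physics"))
```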
2.3.3 Struct
Create the table
hive> create table coursescore(
    > id int,
    > course struct<name:string,score:int>)
    > row format delimited
    > fields terminated by '\t'
    > collection items terminated by ','
    > stored as textfile;
OK
Time taken: 0.323 seconds
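With this definition, the members of the struct are separated by the collection items delimiter, so a data line would carry the id, a tab, and then name and score joined by a comma. The Python sketch below models that layout. The sample line "1\tenglish,80" is hypothetical (the data file is not shown here), and Course and parse_coursescore are names invented for the illustration.

```python
from typing import NamedTuple


class Course(NamedTuple):
    """Stand-in for struct<name:string,score:int>."""
    name: str
    score: int


def parse_coursescore(line):
    """Split fields on '\t', then struct members on ','."""
    row_id, raw = line.rstrip("\n").split("\t")
    name, score = raw.split(",")
    return int(row_id), Course(name, int(score))


# Hypothetical data line; the real file contents are an assumption.
row_id, course = parse_coursescore("1\tenglish,80")

# Members are read by name, as with course.name and course.score in a Hive query.
print(row_id, course.name, course.score)
```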
desc function trim
hive> desc function trim;
OK
trim(str) - Removes the leading and trailing space characters from str
Time taken: 0.008 seconds, Fetched: 1 row(s)
4.1.3 Show extended information for a function
desc function extended trim
hive> desc function extended trim;
OK
trim(str) - Removes the leading and trailing space characters from str
Example:
  > SELECT trim(' facebook ') FROM src LIMIT 1;
  'facebook'
Time taken: 0.005 seconds, Fetched: 4 row(s)
4.2 Parsing JSON data: built-in functions
4.2.1 Create the table
hive> create table rat_json(
    > line string)
    > row format delimited;
OK
// Package the UDF as a jar and add it to Hive's classpath
hive> add jar /home/zjt/json-udf.jar;
Added [/home/zjt/json-udf.jar] to class path
Added resources: [/home/zjt/json-udf.jar]
hive> list jar;
/home/zjt/json-udf.jar
Create a temporary function
// Create a temporary function and bind it to the compiled class
hive> create temporary function jsontostring as 'org.zjt.hive.udf.JsonUDF';
OK
Time taken: 0.014 seconds