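Installation guide used as reference: https://blog.csdn.net/pengjunlee/article/details/81607890
Versions actually installed: Hadoop 2.9.2, Hive 2.3.6; operating system: CentOS 7 (kernel 3.10.0-957.1.3.el7.x86_64).
The installation steps are the same as in the referenced article, so it can be followed directly.
Notes from the hands-on session: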
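1. After a database is created in Hive, a matching folder named <database name>.db appears in HDFS
In the article, the table is created in Hive with a CREATE statement. Running SHOW CREATE TABLE afterwards displays the table definition: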
hive> show create table user_sample;
OK
CREATE TABLE `user_sample`(
`user_num` bigint,
`user_name` string,
`user_gender` string,
`user_age` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'file:/user/hive/warehouse/user_test.db/user_sample'
TBLPROPERTIES (
'transient_lastDdlTime'='1569339780')
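As shown, the table user_sample maps to the directory /user/hive/warehouse/user_test.db/user_sample. (The file: scheme in the LOCATION above indicates that in this installation the warehouse path resolved to the local file system; when the warehouse is on HDFS the URI starts with hdfs://.)
The purpose of creating a table is to map it onto data files. Once a table is created (in the default database unless another database is selected), Hive creates a matching folder under the warehouse directory, and the folder name is the table name.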
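The original CREATE statement is not reproduced in these notes. The following HiveQL is a minimal sketch that would be expected to yield a definition like the one shown above, assuming a comma-delimited text table; the exact statement used in the referenced article may differ.

```sql
-- Hypothetical reconstruction; not copied from the referenced article.
CREATE DATABASE IF NOT EXISTS user_test;
USE user_test;

-- Comma-delimited text table; Hive stores this as LazySimpleSerDe with
-- field.delim=',' in the SerDe properties.
CREATE TABLE IF NOT EXISTS user_sample (
  user_num    BIGINT,
  user_name   STRING,
  user_gender STRING,
  user_age    INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Print the stored definition, including the LOCATION of the table directory.
SHOW CREATE TABLE user_sample;
```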
The metadata is stored in MySQL (MariaDB here), which is part of how Hive works internally. Database records:
MariaDB [hive]> select * from DBS;
+-------+-----------------------+----------------------------------------+-----------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                        | NAME      | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+----------------------------------------+-----------+------------+------------+
|     1 | Default Hive database | file:/user/hive/warehouse              | default   | public     | ROLE       |
|     2 | NULL                  | file:/user/hive/warehouse/user_test.db | user_test | root       | USER       |
+-------+-----------------------+----------------------------------------+-----------+------------+------------+
Column records:
MariaDB [hive]> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | user_age    | int       |           3 |
|     1 | NULL    | user_gender | string    |           2 |
|     1 | NULL    | user_name   | string    |           1 |
|     1 | NULL    | user_num    | bigint    |           0 |
+-------+---------+-------------+-----------+-------------+
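The two queries above can be tied together with a join. The sketch below assumes the standard Hive 2.x metastore schema, in which TBLS links a table to its database (DB_ID) and storage descriptor (SD_ID), and SDS holds the LOCATION and the CD_ID used by COLUMNS_V2. TBLS and SDS are not shown in these notes, so treat their use here as an assumption to verify against your own metastore.

```sql
-- Run against the metastore database (named hive in this setup), not in the Hive CLI.
SELECT d.NAME        AS db_name,
       t.TBL_NAME    AS table_name,
       s.LOCATION    AS table_location,
       c.COLUMN_NAME AS column_name,
       c.TYPE_NAME   AS column_type,
       c.INTEGER_IDX AS column_position
FROM TBLS t
JOIN DBS d        ON d.DB_ID = t.DB_ID   -- database the table belongs to
JOIN SDS s        ON s.SD_ID = t.SD_ID   -- storage descriptor: location, formats
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID   -- column list of that descriptor
WHERE d.NAME = 'user_test'
  AND t.TBL_NAME = 'user_sample'
ORDER BY c.INTEGER_IDX;
```

The metastore should only be read this way for inspection; changes to table definitions are best made through Hive itself.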
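From this we can see that creating a table in Hive actually involves the following:
(1) the table definition is recorded in MySQL;
(2) a directory is created in HDFS;
(3) once data files are placed in that directory, they can be queried from Hive;
(4) therefore, different Hive instances that use the same MySQL metastore and the same HDFS cluster see the same data.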
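Point (3) can be tried out with a short HiveQL session. This is a sketch only: the file /tmp/user_sample.txt is a hypothetical local file whose lines are assumed to follow the table's comma-delimited layout (for example 1,alice,F,30).

```sql
USE user_test;

-- Copy the (hypothetical) local file into the table's directory.
-- LOCAL means the path is on the machine running the Hive client.
LOAD DATA LOCAL INPATH '/tmp/user_sample.txt' INTO TABLE user_sample;

-- The rows are parsed on read using the table's field delimiter.
SELECT user_num, user_name, user_gender, user_age
FROM user_sample
LIMIT 10;
```

Copying a file into the table directory directly with the file system tools has the same effect for a plain text table, since Hive only applies the schema when the data is read.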
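2. Hive 2 supports Spark as an execution engine
The engine is configured in hive-site.xml through the property named hive.execution.engine. The default value is mr (MapReduce); it can be changed to spark to replace MapReduce and improve execution efficiency.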
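Besides hive-site.xml, the same property can be inspected and overridden for a single session from the Hive prompt. The sketch below assumes that Spark is already installed and that Hive on Spark has been set up for this cluster; without that, queries will fail after switching the engine.

```sql
-- Print the engine currently in effect for this session.
SET hive.execution.engine;

-- Switch this session to Spark (assumes a working Hive on Spark setup).
SET hive.execution.engine=spark;

-- Subsequent queries that need a job are submitted to Spark instead of MapReduce.
SELECT user_gender, COUNT(*) AS user_count
FROM user_test.user_sample
GROUP BY user_gender;
```

A session-level SET does not change hive-site.xml, so new sessions fall back to the configured default.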