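This post shows how to query, update, and insert Elasticsearch (ES) data from Hive, and how to load data from Hive into ES. The same approach applies to any data store that Hive can mount as an external table.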

-----------------------------------------------Working with ES from Hive----------------------------------------------

Official documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html

Download: https://www.elastic.co/cn/downloads/past-releases#es-hadoop

Configuration reference: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

Environment

CDH-6.3.1

hive-2.1.1

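ElasticSearch-6.6.2  (ES 7.7 works the same way. My cluster is still on ES 6, and changing the configuration requires restarting Hive, so I do not demonstrate with ES 7 here.)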

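Dependencies

To query and operate on ES from Hive, the ES-Hadoop JAR must first be available on Hive's classpath. There are several ways to do this. First, download elasticsearch-hadoop-hive-6.6.2.jar (matching your ES version) from the download link above. Then locate commons-httpclient-3.1.jar; on CDH it is at /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar. Why it is needed is explained further down.

Hive can load external JARs in several ways:

1. add jar in the Hive shell

Session-level setting: it takes effect only in the current Hive command-line session and does not require a service restart.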

-- In the Hive shell:
-- local JARs
add jar /home/tools/wyk/elasticsearch-hadoop-hive-6.6.2.jar;
add jar /home/tools/wyk/commons-httpclient-3.1.jar;
-- or JARs stored on HDFS
add jar hdfs://nameservice1/tmp/hive/elasticsearch-hadoop-hive-6.6.2.jar;
add jar hdfs://nameservice1/tmp/hive/commons-httpclient-3.1.jar;
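
A quick way to confirm that the add jar commands worked is to list the resources registered in the session. This is only a minimal check, assuming you are still in the same Hive CLI session in which the JARs were added:

```sql
-- Show every JAR registered in the current session;
-- both the elasticsearch-hadoop and commons-httpclient JARs should be listed.
list jars;
```

If either JAR is missing from the list, repeat the corresponding add jar command before creating any ES-backed table.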

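2. Pass the setting when starting the Hive shell

Session-level setting: it takes effect only in the current Hive command-line session and does not require a service restart. The value is a comma-separated list, so the httpclient JAR can be passed along with the ES-Hadoop JAR.

hive -hiveconf hive.aux.jars.path=/home/tools/wyk/elasticsearch-hadoop-hive-6.6.2.jar,/home/tools/wyk/commons-httpclient-3.1.jar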

3. Add the setting to hive-site.xml

Service-level setting: it takes effect in both the Hive shell and HiveServer2, and requires a Hive restart.

vim hive-site.xml

<property>
  <name>hive.aux.jars.path</name>
  <value>/path/elasticsearch-hadoop.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>

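4. Place the JARs in the hive.aux.jars.path directory (recommended)

Service-level setting: it takes effect in both the Hive shell and HiveServer2, and requires a Hive restart.

Put elasticsearch-hadoop-hive-6.6.2.jar and commons-httpclient-3.1.jar in the /home/public/java_project/udf/ directory on the HiveServer2 node. This directory can be configured in Cloudera Manager (CM) through the hive.aux.jars.path parameter.

Why add the httpclient JAR?

Without it you are likely to hit the error below. Adding /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar to the dependencies with any of the methods above resolves it: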

java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ProtocolSocketFactory

Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
	at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:102)
	at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:85)
	at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:61)
	at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:94)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:90)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:581)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
	at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
	... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ProtocolSocketFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 25 more

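Verification

After placing the JARs under the hive.aux.jars.path directory on the HS2 node (method 4 above) and restarting Hive, the change takes effect, and ES can also be operated on through Hive directly from HUE.

Hive table properties

The properties that can be set in TBLPROPERTIES are listed in the official configuration reference linked at the top of this post.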

...
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ( 
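'es.index.auto.create'='false', -- whether to auto-create the index (disabled here)
'es.index.read.missing.as.empty'='true', -- return an empty result instead of failing when the index is missing
'es.mapping.names'='id:id,name:name,age:age,pt:pt', -- Hive column to ES field mapping; set manually or leave to automatic mapping
'es.mapping.id'='id', -- Hive column whose value becomes the _id of the ES document
'es.resource'='wyk_csdn/_doc', -- index (and type) in ES
'es.net.http.auth.pass'='your_es_password', -- ES password
'es.net.http.auth.user'='your_es_username', -- ES user name
'es.nodes'='localhost',  -- ES address
'es.port'='9200', 
'es.nodes.wan.only'='true', 
'es.nodes.discovery' = 'false',
'es.read.metadata'='true'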
)
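
As a minimal sketch of how these properties fit into a complete statement, the hypothetical table below reads from the wyk_csdn index. The table name es_mapping_demo, the ES field names user_name and @timestamp, and the query string are assumptions made for illustration only. es.mapping.names takes comma-separated hive_column:es_field pairs, and es.query restricts which documents are read.

```sql
-- Hypothetical table: two Hive column names differ from the ES field names.
CREATE EXTERNAL TABLE IF NOT EXISTS default.es_mapping_demo (
  id   bigint,
  name string,
  ts   timestamp
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'wyk_csdn/_doc',
  'es.nodes' = 'localhost',
  'es.port' = '9200',
  'es.index.auto.create' = 'false',
  'es.index.read.missing.as.empty' = 'true',
  -- Hive column "name" maps to ES field "user_name", "ts" maps to "@timestamp"
  'es.mapping.names' = 'name:user_name,ts:@timestamp',
  -- URI-style query; only matching documents are returned to Hive
  'es.query' = '?q=user_name:wang*'
);
```

Columns not listed in es.mapping.names (here id) are matched to ES fields by name.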

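Type mapping rules

When creating the table in Hive, the field type mapping can be specified manually or left out. If it is left out, field types are mapped automatically according to the rules below:

Hive to ES type mapping rules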
Hive type	Elasticsearch type
void	null
boolean	boolean
tinyint	byte
smallint	short
int	int
bigint	long
double	double
float	float
string	string
binary	binary
timestamp	date
struct	map
map	map
array	array
union	not supported (yet)
decimal	string
date	date
varchar	string
char	string

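Verifying the interaction

Create an ES external table in Hive, create the index in ES, and insert data:

-- Create an ES external table in Hive, create the index in ES, and insert data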
DROP TABLE IF exists default.es_bigdata_csdn01;
CREATE EXTERNAL TABLE if not exists default.es_bigdata_csdn01 (
 name string,
 age bigint,
 email string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wyk_hive_csdn01/_doc',
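'es.index.auto.create' = 'true', -- auto-create the index
'es.index.read.missing.as.empty'='true',
'es.nodes' = '10.1.174.10',
'es.port'='9200'
);
-- At this point the Hive table exists, but there is no index in ES yet
-- The index shows up in ES only after a row has been inserted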
insert into default.es_bigdata_csdn01 values('王義凱',28,'[email protected]');

Create a Hive external table that reads an existing ES index:

-- Create a Hive external table that reads an existing index
CREATE EXTERNAL TABLE if not exists default.es_bigdata_csdn02(
 name string,
 age bigint,
 email string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wyk_hive_csdn01/_doc',
'es.index.auto.create' = 'false', -- the index already exists, so auto-create is not needed; even with it enabled, existing data can still be read
'es.index.read.missing.as.empty'='true',
'es.nodes' = '10.1.174.10',
'es.port'='9200'
);
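
Once the external table exists, the index can be read with ordinary SELECT statements. The queries below are a minimal sketch against the es_bigdata_csdn02 table defined above; the age filter is an arbitrary value chosen for illustration.

```sql
-- Read documents from the wyk_hive_csdn01 index through the Hive external table.
SELECT name, age, email
FROM default.es_bigdata_csdn02
WHERE age >= 18;

-- Count the documents visible to Hive.
SELECT count(*) FROM default.es_bigdata_csdn02;
```

With es.index.read.missing.as.empty set to true, these queries should return an empty result rather than an error if the index does not exist yet.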

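Map the ES document _id to a Hive column when creating the table:

Kibana shows that the document inserted through Hive above has an automatically generated _id. If we instead map _id to the name column of the Hive table, ES data can be updated from Hive, because ES replaces a document when another one with the same _id is indexed.

-- Create an ES external table in Hive and specify the column mapped to the _id key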
CREATE EXTERNAL TABLE if not exists default.es_bigdata_csdn03 (
 name string,
 age bigint,
 email string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wyk_hive_csdn03/_doc',
'es.index.auto.create' = 'true', -- create the wyk_hive_csdn03 index if it does not exist in ES (takes effect only once data is inserted from Hive)
'es.mapping.id'='name', -- map _id to the name column of the Hive table
'es.index.read.missing.as.empty'='true',
'es.nodes' = '10.1.174.10',
'es.port'='9200'
);

insert into default.es_bigdata_csdn03 values('王義凱',28,'[email protected]');

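Now update this record from Hive and see how it changes in ES. Note that the update and delete commands cannot be used on ES tables in Hive, so we still use insert: a row whose key already exists automatically updates the document.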

insert into default.es_bigdata_csdn03 values('王義凱',29,'[email protected]');
insert into default.es_bigdata_csdn03 values('Rick.Wang',30,'[email protected]');
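
To check the effect from the Hive side, the table can simply be queried again. This is a minimal sketch that assumes the three inserts above have completed and the index has refreshed. If the overwrite-by-_id behaviour described above holds, it should return one row per distinct name rather than one row per insert.

```sql
-- One document per _id (= name) is expected, carrying the most recently inserted values.
SELECT name, age, email
FROM default.es_bigdata_csdn03
ORDER BY name;
```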

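Full overwrite: insert overwrite

It behaves the same as an append (insert into). With es.mapping.id set as above, rows whose key already exists are updated and all other rows are inserted; existing documents are not cleared.

-- insert overwrite behaves the same as insert into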
insert overwrite table default.es_bigdata_csdn03 select 'Rick.Wang',33,'[email protected]';

Multi-type test

Create a test table in Hive with a variety of column types to see how they are mapped automatically in ES:

drop table if exists default.es_bigdata_csdn04;
CREATE  TABLE if not exists default.es_bigdata_csdn04(
 id bigint,
 str_type string,
 decimal_type decimal(5,2),
 double_type double,
 float_type float,
 ins_ts timestamp,
 arr array<string>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'wyk_hive_csdn04/_doc',
'es.index.auto.create' = 'true',
'es.mapping.id'='id',
'es.index.read.missing.as.empty'='true',
'es.nodes' = '10.1.174.10',
'es.port'='9200'
);
-- insert a record
insert into default.es_bigdata_csdn04 select 1,'str',3.14,4.13333,1.3333444,current_timestamp(),array('str1','str2','str3');

Note that insert ... values() cannot be used here because the row contains a collection type. Only the insert ... select ... form works; otherwise the following error is raised:

FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values

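Importing data from Hive into ES

This part is simple, and by now many readers will have guessed it: create a Hive external table connected to ES with the same structure as the Hive source table, then run the following command in Hive: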

insert into hive_es select * from hive_src_tbl;
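
The whole import can be sketched end to end as follows. The table names hive_src_tbl and hive_es come from the command above; the column list, the index name wyk_hive_import, and the choice of id as the document key are hypothetical and should be replaced with your own schema.

```sql
-- Hypothetical source table; substitute your real schema.
CREATE TABLE IF NOT EXISTS hive_src_tbl (
  id   bigint,
  name string,
  age  bigint
);

-- ES-backed external table with the same columns as the source table.
CREATE EXTERNAL TABLE IF NOT EXISTS hive_es (
  id   bigint,
  name string,
  age  bigint
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'wyk_hive_import/_doc',
  'es.index.auto.create' = 'true',
  'es.mapping.id' = 'id',
  'es.nodes' = '10.1.174.10',
  'es.port' = '9200'
);

-- Copy the rows; listing the columns keeps the mapping explicit.
INSERT INTO TABLE hive_es
SELECT id, name, age FROM hive_src_tbl;
```

Mapping es.mapping.id to a stable key means that rerunning the import updates existing documents instead of creating duplicates with generated _id values.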

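Other notes on connecting Hive to ES indices

The update, delete, and truncate commands are not supported;
truncate on a Hive external table fails (non-managed tables cannot be truncated);
truncate on a Hive internal table fails as well (non-native tables cannot be truncated);
dropping the Hive table (whether external or internal) removes only the Hive table and leaves the ES index untouched;
insert overwrite behaves the same as an append (insert into): rows whose key already exists are updated, all others are inserted.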

I hope this post helps. If it did, please give it a like to encourage the author. Thanks!

ELK series (13): Working with ES index data in Hive (create / query / update / insert)