Atlas 2.1.0 in Practice (3): Integrating Atlas with Hive

Original

2021-01-26 13:13

Integrating Atlas with Hive

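Once Atlas is installed, it still has to be connected to other components before it becomes useful.

The most common of these is Hive.

In the Atlas architecture, once the Hive hook is configured, each Hive operation it captures is written to Kafka as a notification and then consumed by Atlas.

Atlas then displays the resulting metadata and lineage as a graph.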

Hive Model

Which Hive information gets recorded? Atlas defines this in its Hive model.

The model contains the following:

1. Entity types:

hive_db

Supertype: Asset

Attributes: qualifiedName, name, description, owner, clusterName, location, parameters, ownerName

hive_table

Supertype: DataSet

Attributes: qualifiedName, name, description, owner, db, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary

hive_column

Supertype: DataSet

Attributes: qualifiedName, name, description, owner, type, comment, table

hive_storagedesc

Supertype: Referenceable

Attributes: qualifiedName, table, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories

hive_process

Supertype: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, startTime, endTime, userName, operationType, queryText, queryPlan, queryId, clusterName

hive_column_lineage

Supertype: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, query, depenendencyType, expression

2. Enum types:

hive_principal_type values: USER, ROLE, GROUP

3. Struct types:

hive_order attributes: col, order

hive_serde attributes: name, serializationLib, parameters

Hive entities are identified by the following unique attributes:

hive_db.qualifiedName:     <dbName>@<clusterName>
hive_table.qualifiedName:  <dbName>.<tableName>@<clusterName>
hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>
hive_process.queryString:  trimmed query string in lower case
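To make the naming rules concrete, here is a minimal Python sketch that composes these unique attributes. The helper names are hypothetical (they are not Atlas APIs), and the sketch assumes names are lowercased, since the Hive metastore stores database, table, and column names in lowercase. "primary" is the atlas.cluster.name value used in the hook configuration later in this post.

```python
# Hypothetical helpers that follow the qualifiedName patterns listed above.
# Assumption: db/table/column names are lowercased, as the Hive metastore stores them.


def db_qualified_name(db: str, cluster: str) -> str:
    """hive_db.qualifiedName: <dbName>@<clusterName>"""
    return f"{db.lower()}@{cluster}"


def table_qualified_name(db: str, table: str, cluster: str) -> str:
    """hive_table.qualifiedName: <dbName>.<tableName>@<clusterName>"""
    return f"{db.lower()}.{table.lower()}@{cluster}"


def column_qualified_name(db: str, table: str, column: str, cluster: str) -> str:
    """hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>"""
    return f"{db.lower()}.{table.lower()}.{column.lower()}@{cluster}"


def process_query_string(query: str) -> str:
    """hive_process.queryString: trimmed query string in lower case"""
    return query.strip().lower()
```

For example, table_qualified_name("Sales", "Orders", "primary") returns "sales.orders@primary", which is the key to look for when searching for that table in the Atlas UI.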

Configuring the Hive hook

The Hive hook listens for Hive create/update/delete operations. The configuration steps are:

1. Edit hive-env.sh (point it at the hook jars)

export HIVE_AUX_JARS_PATH=/opt/apps/apache-atlas-2.1.0/hook/hive

2. Edit hive-site.xml (restart Hive after making this change)

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
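Because a missing or mistyped hook class only shows up when queries run, it can help to check hive-site.xml before restarting Hive. The following is a sketch using Python's standard xml.etree.ElementTree. The helper names are hypothetical, and the sketch only assumes the usual configuration/property/name/value layout. hive.exec.post.hooks takes a comma-separated list of classes, so the check splits on commas.

```python
import xml.etree.ElementTree as ET

HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook"


def get_property(hive_site_xml: str, name: str):
    """Return the <value> of the <property> whose <name> matches, or None."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def atlas_hook_registered(hive_site_xml: str) -> bool:
    """True if HiveHook appears in the comma-separated hive.exec.post.hooks list."""
    value = get_property(hive_site_xml, "hive.exec.post.hooks") or ""
    hooks = [h.strip() for h in value.split(",") if h.strip()]
    return HOOK_CLASS in hooks  # exact, case-sensitive comparison


# Sample document matching the snippet above
SAMPLE = """<configuration>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>"""
```

To check a real installation, pass the text of your own hive-site.xml to atlas_hook_registered instead of SAMPLE.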

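Note that this registers a post-execution hook. Hive can also run hooks at other stages, such as before execution (hive.exec.pre.hooks).

3. Sync the configuration
Copy the Atlas configuration file atlas-application.properties into the Hive configuration directory.
Add the following settings: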

atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
atlas.rest.address=http://doit33:21000
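Copying the file and appending lines by hand can leave duplicate keys if the copied file already defines some of them. A minimal Python sketch that overwrites existing keys and appends the missing ones follows. It assumes simple key=value lines without line continuations, and the directory arguments to the hypothetical sync_config helper are placeholders for your own Atlas and Hive conf directories.

```python
from pathlib import Path

# The hook settings from this step
HOOK_SETTINGS = {
    "atlas.hook.hive.synchronous": "false",
    "atlas.hook.hive.numRetries": "3",
    "atlas.hook.hive.queueSize": "10000",
    "atlas.cluster.name": "primary",
    "atlas.rest.address": "http://doit33:21000",
}


def merge_properties(text: str, settings: dict) -> str:
    """Overwrite keys already present in .properties text and append the rest.

    Simplified parser (assumption): one key=value per line, no continuations.
    """
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments ('#' or '!'); only look at key=value lines
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key}={remaining.pop(key)}")
                continue
        out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


def sync_config(atlas_conf_dir: str, hive_conf_dir: str) -> Path:
    """Copy atlas-application.properties into the Hive conf dir with the hook settings merged in."""
    src = Path(atlas_conf_dir) / "atlas-application.properties"
    dst = Path(hive_conf_dir) / "atlas-application.properties"
    dst.write_text(merge_properties(src.read_text(), HOOK_SETTINGS))
    return dst
```

Running it twice produces the same file, which is convenient when the Atlas address changes and the Hive copy needs refreshing.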

Importing Hive metadata into Atlas

bin/import-hive.sh

Using Hive configuration directory [/opt/module/hive/conf]

Log file for import is /opt/module/atlas/logs/import-hive.log

log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.

log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.

Enter the username admin and the password admin:

Enter username for atlas :- admin

Enter password for atlas :-

Hive Meta Data import was successful!!!
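After the import reports success, one way to confirm that entities actually arrived is to query the Atlas v2 basic search REST endpoint (GET /api/atlas/v2/search/basic) for a type such as hive_db. The sketch below uses only the Python standard library. The host comes from atlas.rest.address above, admin/admin are the credentials used during the import, and it assumes each returned entity header carries a qualifiedName in its attributes. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

ATLAS_URL = "http://doit33:21000"  # atlas.rest.address from the hook configuration


def basic_search_url(base_url: str, type_name: str, limit: int = 25) -> str:
    """Build a GET URL for Atlas basic search filtered by entity type."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"


def qualified_names(search_result: dict) -> list:
    """Extract qualifiedName from each entity header (missing 'entities' means no hits)."""
    return [
        e.get("attributes", {}).get("qualifiedName")
        for e in search_result.get("entities", [])
    ]


def search(base_url: str, type_name: str, user: str, password: str) -> dict:
    """Run the search with HTTP basic auth and return the decoded JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        basic_search_url(base_url, type_name),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, print(qualified_names(search(ATLAS_URL, "hive_db", "admin", "admin"))) should list database names in the <dbName>@<clusterName> form described earlier.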

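Pitfalls encountered

I. Class not found: org.apache.atlas.hive.hook.hivehook

Cause: the third-party jar was not added to Hive.

Tip: in the Hive shell, run set to print the list of configuration variables overridden by the user or by Hive, and check whether the jar has been added.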
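When Hive reports a missing hook class, it helps to confirm which jar, if any, on the auxiliary path actually contains it. Below is a sketch using Python's zipfile module that scans a directory such as /opt/apps/apache-atlas-2.1.0/hook/hive (the path configured earlier). Java class names are case-sensitive, so a lookup for org.apache.atlas.hive.hook.hivehook will not match a jar that provides org.apache.atlas.hive.hook.HiveHook, the class configured in hive-site.xml. The helper name find_class is hypothetical.

```python
import zipfile
from pathlib import Path


def find_class(jar_dir: str, class_name: str) -> list:
    """Return every jar under jar_dir (searched recursively) that contains class_name."""
    entry = class_name.replace(".", "/") + ".class"  # e.g. org/apache/atlas/hive/hook/HiveHook.class
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():  # exact, case-sensitive match
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip files named *.jar that are not valid archives
    return hits
```

An empty result for the class named in the error usually means either the directory is not the one Hive loads or the class name in the configuration is misspelled.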

Taking elasticsearch-hadoop-2.1.2.jar as an example, here are several ways to add a third-party jar to Hive.

Method 1: add it in the Hive shell

hive> add jar /home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar;

Connection	Takes effect?
Hive Shell	Yes, without restarting the Hive service
Hive Server	No

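Method 2: put the jar in the ${HIVE_HOME}/auxlib directory

Create a folder named auxlib under ${HIVE_HOME} and put the custom jar files in it.
This method is convenient, and for the Hive shell it does not require restarting Hive.

Connection	Takes effect?
Hive Shell	Yes, without restarting the Hive service
Hive Server	Only after restarting the Hive service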
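Method 2 amounts to creating a directory and copying the jar into it. A small Python sketch of that step with shutil follows; hive_home and the jar path are placeholders for your own ${HIVE_HOME} and jar, and install_to_auxlib is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def install_to_auxlib(jar: str, hive_home: str) -> Path:
    """Create ${HIVE_HOME}/auxlib if needed and copy the jar into it."""
    auxlib = Path(hive_home) / "auxlib"
    auxlib.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file timestamps; it returns the destination path
    return Path(shutil.copy2(jar, auxlib))
```

Per the table above, Hive Server still needs a restart before it sees the copied jar.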

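Method 3: HIVE_AUX_JARS_PATH and hive.aux.jars.path

HIVE_AUX_JARS_PATH in hive-env.sh and hive.aux.jars.path in hive-site.xml do not take effect for the server; they apply only to the current Hive shell. Hive shells do not affect one another, so each one has to be configured separately, and the value can be a folder.
Both HIVE_AUX_JARS_PATH and hive.aux.jars.path support local files only, and each can point to either a single file or a folder.

Connection	Takes effect?
Hive Shell	Only after restarting the Hive service
Hive Server	Only after restarting the Hive service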

II. Hive error: Failing because I am unlikely to write too

Cause: HIVE_AUX_JARS_PATH is configured incorrectly.

The hive-env.sh script contains the following block:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog
fi

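If HIVE_AUX_JARS_PATH is already set, /usr/hdp/current/hive-webhcat/share/hcatalog is ignored.

Hive reads only one HIVE_AUX_JARS_PATH value, so pointing it at the Atlas hook directory means the HCatalog jars are no longer loaded.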
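The precedence in the hive-env.sh block can be restated as a small Python function, which makes it easy to see why an explicitly set HIVE_AUX_JARS_PATH hides the HCatalog directory. This is a hypothetical illustration, not code that Hive runs; the hcatalog_exists flag stands in for the shell's -d test.

```python
HCATALOG_DIR = "/usr/hdp/current/hive-webhcat/share/hcatalog"


def resolve_aux_jars_path(env: dict, hcatalog_exists: bool) -> str:
    """Restate the if/elif from hive-env.sh (hypothetical illustration)."""
    explicit = env.get("HIVE_AUX_JARS_PATH", "")
    if explicit != "":
        # First branch: an explicit value is kept and the HCatalog folder is never consulted
        return explicit
    if hcatalog_exists:
        # elif branch: fall back to the HCatalog folder only when nothing was set
        return HCATALOG_DIR
    return ""  # neither branch matched, so the variable stays empty
```

With HIVE_AUX_JARS_PATH set to /opt/apps/apache-atlas-2.1.0/hook/hive, the function returns the Atlas path even when the HCatalog directory exists.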

To fix this, keep the shared jars in one central location and create corresponding symbolic links under /usr/hdp/current/hive-webhcat/share/hcatalog:

sudo -u hive ln -s /usr/lib/share-lib/elasticsearch-hadoop-2.1.0.Beta4.jar /usr/hdp/current/hive-webhcat/share/hcatalog/elasticsearch-hadoop-2.1.0.Beta4.jar
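If several shared jars need links, the ln -s command above has to be repeated for each one. A sketch of that loop in Python follows. It assumes it runs as the hive user (as sudo -u hive does in the command) and that the shared directory is the one you chose, such as /usr/lib/share-lib; plan_links and apply_links are hypothetical helper names. Existing link paths are skipped rather than overwritten.

```python
import os
from pathlib import Path


def plan_links(shared_dir: str, target_dir: str) -> list:
    """Pair every *.jar in shared_dir with the link path to create in target_dir."""
    # resolve() makes the link targets absolute, so the links work from any directory
    return [
        (jar, Path(target_dir) / jar.name)
        for jar in sorted(Path(shared_dir).resolve().glob("*.jar"))
    ]


def apply_links(pairs) -> None:
    """Create each symlink, skipping link paths that already exist (even dangling ones)."""
    for src, link in pairs:
        if link.exists() or link.is_symlink():
            continue
        os.symlink(src, link)
```

Keeping planning separate from linking lets you print plan_links(...) first and review it before anything is created.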
