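Atlas Overview
Apache Atlas provides open metadata management and governance capabilities that let an organization build a catalog of its data assets, classify and govern those assets, and give data analysts and data governance teams a way to collaborate around them.
Atlas Architecture
Installing and Using Atlas
1) Atlas website: https://atlas.apache.org/
2) Documentation: https://atlas.apache.org/0.8.4/index.html
3) Download: https://www.apache.org/dyn/closer.cgi/atlas/0.8.4/apache-atlas-0.8.4-sources.tar.gz
In enterprise projects, Atlas is usually integrated with an external HBase and Solr, which makes it easier to integrate with the rest of the project.
Installing Solr 5.2.1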
1) Solr must be version 5.2.1; see the official site
2) Download Solr: http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz
4) Extract solr-5.2.1.tgz into /opt/module/
[kris@hadoop101 module]$ tar -zxvf solr-5.2.1.tgz -C /opt/module/
5) Rename solr-5.2.1 to solr
[kris@hadoop101 module]$ mv solr-5.2.1/ solr
6) Go to the solr/bin directory and edit solr.in.sh
[kris@hadoop101 solr]$ vim bin/solr.in.sh
# Add the following settings
ZK_HOST="hadoop101:2181,hadoop102:2181,hadoop103:2181"
SOLR_HOST="hadoop101"
# Sets the port Solr binds to, default is 8983
# The port number can be changed here
SOLR_PORT=8983
7) Distribute Solr to deploy it in Cloud mode
[kris@hadoop101 module]$ xsync solr
Tip: after distribution, edit solr.in.sh under /opt/module/solr/bin on hadoop102 and hadoop103 and set SOLR_HOST to the respective host name.
8) Start Solr on each of the three nodes; this is Cloud mode
[kris@hadoop101 solr]$ bin/solr start
[kris@hadoop102 solr]$ bin/solr start
[kris@hadoop103 solr]$ bin/solr start
Tip: the ZooKeeper service must be running before Solr is started.
9) Open port 8983 in a browser, using any of the three nodes, e.g. http://hadoop101:8983/solr/#/
Tip: the Cloud-mode deployment has succeeded only when the Cloud menu appears in the UI.
10) Write a start/stop script for Solr
(1) Create the script in /home/kris/bin on hadoop101
[kris@hadoop101 bin]$ vim s.sh
Put the following content in the script
#!/bin/bash
case $1 in
"start"){
    for i in hadoop101 hadoop102 hadoop103
    do
        ssh $i "/opt/module/solr/bin/solr start"
    done
};;
"stop"){
    for i in hadoop101 hadoop102 hadoop103
    do
        ssh $i "/opt/module/solr/bin/solr stop"
    done
};;
esac
(2) Make the script executable
[kris@hadoop101 bin]$ chmod +x s.sh
(3) Start the Solr cluster with the script
[kris@hadoop101 module]$ s.sh start
(4) Stop the Solr cluster with the script
[kris@hadoop101 module]$ s.sh stop
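Step 7 above asks for SOLR_HOST to be edited by hand on every node after distribution. A minimal sketch of scripting that edit follows; the function name set_solr_host is hypothetical, and it assumes solr.in.sh carries at most one line starting with SOLR_HOST=.

```shell
#!/bin/bash
# Hypothetical helper: set SOLR_HOST in a solr.in.sh file.
# Usage: set_solr_host <path-to-solr.in.sh> <hostname>
set_solr_host() {
    local file="$1" host="$2" tmp
    if grep -q '^SOLR_HOST=' "$file"; then
        tmp="$(mktemp)" || return 1
        # Rewrite only the SOLR_HOST line; every other line is copied unchanged.
        sed "s|^SOLR_HOST=.*|SOLR_HOST=\"${host}\"|" "$file" > "$tmp" || return 1
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No SOLR_HOST line yet: append one.
        printf 'SOLR_HOST="%s"\n' "$host" >> "$file"
    fi
}
```

On each node it could be called as set_solr_host /opt/module/solr/bin/solr.in.sh "$(hostname)", assuming hostname returns the same names used in ZK_HOST.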
Installing Atlas 0.8.4
1) Extract apache-atlas-0.8.4-bin.tar.gz into /opt/module/
[kris@hadoop101 module]$ tar -zxvf apache-atlas-0.8.4-bin.tar.gz -C /opt/module/
2) Rename apache-atlas-0.8.4 to atlas
[kris@hadoop101 module]$ mv apache-atlas-0.8.4/ atlas
Integrating Atlas with External Frameworks
Integrating Atlas with HBase
1) Go to /opt/module/atlas/conf/ and edit the configuration file
[kris@hadoop101 conf]$ vim atlas-application.properties
# Set the hosts where Atlas stores its data
atlas.graph.storage.hostname=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) Go to /opt/module/atlas/conf/hbase and link the HBase cluster's configuration files into ${Atlas_Home}
[kris@hadoop101 hbase]$ ln -s /opt/module/hbase/conf/ /opt/module/atlas/conf/hbase/
3) Add HBASE_CONF_DIR to /opt/module/atlas/conf/atlas-env.sh
[kris@hadoop101 conf]$ vim atlas-env.sh
# Add the path of the HBase configuration files
export HBASE_CONF_DIR=/opt/module/atlas/conf/hbase/conf
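A quick way to confirm that the symlink and HBASE_CONF_DIR agree is to check that the directory really exposes the HBase client configuration. The sketch below assumes the standard file name hbase-site.xml; the function name check_hbase_conf_dir is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given directory (or a symlink to one)
# contains hbase-site.xml, the standard HBase client configuration file.
check_hbase_conf_dir() {
    local dir="$1"
    if [ -f "${dir}/hbase-site.xml" ]; then
        echo "OK: ${dir}/hbase-site.xml found"
    else
        echo "MISSING: ${dir}/hbase-site.xml" >&2
        return 1
    fi
}
```

With the paths used here, the call would be check_hbase_conf_dir /opt/module/atlas/conf/hbase/conf.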
Integrating Atlas with Solr
1) Go to /opt/module/atlas/conf and edit the configuration file
[kris@hadoop101 conf]$ vim atlas-application.properties
# Change the following setting
atlas.graph.index.search.solr.zookeeper-url=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) Copy the Solr folder bundled with Atlas to every node of the external Solr cluster.
[kris@hadoop101 conf]$ cp -r /opt/module/atlas/conf/solr /opt/module/solr/
3) Go to /opt/module/solr and rename the copied folder to atlas_conf
[kris@hadoop101 solr]$ mv solr atlas_conf
4) With Solr running in Cloud mode (the ZooKeeper cluster must be started first), create the collections
[kris@hadoop101 solr]$ bin/solr create -c vertex_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
[kris@hadoop101 solr]$ bin/solr create -c edge_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
[kris@hadoop101 solr]$ bin/solr create -c fulltext_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
-shards 3: the collection is split into 3 shards
-replicationFactor 2: each shard is kept in 2 copies
vertex_index, edge_index, fulltext_index: the collection names
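As a sanity check on those numbers: 3 shards with 2 copies each gives 6 cores per collection, which comes to 2 per node on a three-node cluster. The sketch below only does that arithmetic; the function name is hypothetical and the even spread across nodes is an assumption of the sketch.

```shell
#!/bin/bash
# Hypothetical helper: number of cores each node has to host for one collection,
# assuming the copies are spread as evenly as possible (result rounded up).
# Usage: cores_per_node <shards> <replicationFactor> <nodes>
cores_per_node() {
    local shards="$1" replicas="$2" nodes="$3"
    local total=$(( shards * replicas ))
    echo $(( (total + nodes - 1) / nodes ))
}
```

For this cluster, cores_per_node 3 2 3 prints 2, which matches the maxShardsPerNode=2 value that appears in the collection creation output.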
Note: to delete a collection such as vertex_index, edge_index or fulltext_index, run the following command.
[kris@hadoop101 solr]$ bin/solr delete -c ${collection_name}
The full output of creating the collections is as follows:
[kris@hadoop101 solr]$ bin/solr create -c vertex_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
Connecting to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Uploading /opt/module/solr/atlas_conf for config vertex_index to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Creating new collection 'vertex_index' using command:
http://hadoop103:8983/solr/admin/collections?action=CREATE&name=vertex_index&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=vertex_index
{
  "responseHeader":{
    "status":0,
    "QTime":5435},
  "success":{"":{
      "responseHeader":{
        "status":0,
        "QTime":5094},
      "core":"vertex_index_shard1_replica1"}}}
[kris@hadoop101 solr]$ bin/solr create -c edge_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
Connecting to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Uploading /opt/module/solr/atlas_conf for config edge_index to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Creating new collection 'edge_index' using command:
http://hadoop103:8983/solr/admin/collections?action=CREATE&name=edge_index&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=edge_index
{
  "responseHeader":{
    "status":0,
    "QTime":3280},
  "success":{"":{
      "responseHeader":{
        "status":0,
        "QTime":3116},
      "core":"edge_index_shard3_replica2"}}}
[kris@hadoop101 solr]$ bin/solr create -c fulltext_index -d /opt/module/solr/atlas_conf -shards 3 -replicationFactor 2
Connecting to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Uploading /opt/module/solr/atlas_conf for config fulltext_index to ZooKeeper at hadoop101:2181,hadoop102:2181,hadoop103:2181
Creating new collection 'fulltext_index' using command:
http://hadoop103:8983/solr/admin/collections?action=CREATE&name=fulltext_index&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=fulltext_index
{
  "responseHeader":{
    "status":0,
    "QTime":3455},
  "success":{"":{
      "responseHeader":{
        "status":0,
        "QTime":3115},
      "core":"fulltext_index_shard3_replica1"}}}
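5) Verify that the collections were created
Log in to the Solr web console at http://hadoop101:8983/solr/#/~cloud and check that the collections are displayed there.
Integrating Atlas with Kafka
1) Go to /opt/module/atlas/conf/ and edit the configuration file atlas-application.properties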
[kris@hadoop101 conf]$ vim atlas-application.properties
######### Notification Configs #########
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181
atlas.kafka.bootstrap.servers=hadoop101:9092,hadoop102:9092,hadoop103:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.enable.auto.commit=true
2) Start the Kafka cluster and create the topics
[kris@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
[kris@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
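Because the topic commands differ only in the topic name, they can be generated in a loop. The sketch below prints the commands instead of running them so they can be reviewed first; the function name is hypothetical, and the quorum, replication factor and partition count are copied from the commands above. ATLAS_HOOK and ATLAS_ENTITIES are the notification topic names Atlas uses by default.

```shell
#!/bin/bash
# ZooKeeper quorum as used in the commands above.
ZK_QUORUM="hadoop101:2181,hadoop102:2181,hadoop103:2181"

# Hypothetical helper: print (not run) one kafka-topics.sh create command per topic name.
print_topic_commands() {
    local topic
    for topic in "$@"; do
        echo "bin/kafka-topics.sh --zookeeper ${ZK_QUORUM} --create --replication-factor 3 --partitions 3 --topic ${topic}"
    done
}
```

Running print_topic_commands ATLAS_HOOK ATLAS_ENTITIES from the Kafka installation directory prints the two commands; piping that output to bash would execute them.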
Other Atlas Configuration
1) Go to /opt/module/atlas/conf/ and edit the configuration file atlas-application.properties
[kris@hadoop101 conf]$ vim atlas-application.properties
######### Server Properties #########
atlas.rest.address=http://hadoop101:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.zookeeper.quorum=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) To record performance metrics, go to /opt/module/atlas/conf/ and edit atlas-log4j.xml in that directory
[kris@hadoop101 conf]$ vim atlas-log4j.xml
# Uncomment the following block
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
Integrating Atlas with Hive
1) Go to /opt/module/atlas/conf/ and edit the configuration file atlas-application.properties
[kris@hadoop101 conf]$ vim atlas-application.properties
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
2) Add the atlas-application.properties configuration file to atlas-plugin-classloader-0.8.4.jar
[kris@hadoop101 hive]$ zip -u /opt/module/atlas/hook/hive/atlas-plugin-classloader-0.8.4.jar /opt/module/atlas/conf/atlas-application.properties
[kris@hadoop101 hive]$ cp /opt/module/atlas/conf/atlas-application.properties /opt/module/hive/conf/
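Reason: the approach in the official documentation, copying the configuration file into Hive's conf directory, did not work here; atlas-application.properties was never picked up that way.
Reading the source code shows that this file is loaded from the classpath, so it is packed into the jar.
3) Set the Atlas hook in /opt/module/hive/conf/hive-site.xml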
[kris@hadoop101 conf]$ vim hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
[kris@hadoop101 conf]$ vim hive-env.sh
# Append the Hive hook jars after the jars that the Tez engine depends on
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS,/opt/module/atlas/hook/hive/atlas-plugin-classloader-0.8.4.jar,/opt/module/atlas/hook/hive/hive-bridge-shim-0.8.4.jar
Importing Hive Metadata into Atlas
1) Configure the Hive environment variables
[kris@hadoop101 hive]$ sudo vim /etc/profile
# Hive environment variables
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin/
[kris@hadoop101 hive]$ source /etc/profile
2) Start Hive. If it starts normally, the environment is fine and you can exit the Hive client
[kris@hadoop101 hive]$ hive
hive (default)> show databases;
hive (default)> use gmall;
3) From /opt/module/atlas/, import the Hive metadata into Atlas
[kris@hadoop101 atlas]$ bin/import-hive.sh
Using Hive configuration directory [/opt/module/hive/conf]
Log file for import is /opt/module/atlas/logs/import-hive.log
log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
Enter the username admin and the password admin
Enter username for atlas :- admin
Enter password for atlas :-
Hive Meta Data import was successful!!!
Create a test database and a total_amount_result table in Hive; after refreshing the Atlas page they show up immediately:
hive (default)> create database test;
OK
Time taken: 0.034 seconds
hive (default)> use test;
OK
Time taken: 0.019 seconds
hive (test)> CREATE EXTERNAL TABLE total_amount_result(
> `order_id` int,
> `update_time` string,
> `deal_total_amount` double
> )row format delimited fields terminated by ','
> ;
OK
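A script job has to be run before Atlas knows the dependencies between tables; it has no knowledge of historical data.
Run the ads_gmv.sh script:
Lineage graph between tables:
Lineage graph, plus Audits with details such as modification times
Column-level lineage graph:
Using the REST API (for custom development)
For custom development on top of Atlas, see the official API reference at https://atlas.apache.org/api/v2/index.html and build against the official API.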
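As a starting point, a v2 endpoint can be queried with curl. The sketch below is a minimal example, not a complete client: the base address comes from atlas.rest.address configured earlier, the admin/admin credentials are the ones entered during the Hive import, and the function names atlas_v2_url and atlas_get are hypothetical.

```shell
#!/bin/bash
# Base address, as set in atlas.rest.address in atlas-application.properties.
ATLAS_URL="http://hadoop101:21000"

# Build the URL of an Atlas v2 REST resource, e.g. "types/typedefs".
atlas_v2_url() {
    echo "${ATLAS_URL}/api/atlas/v2/$1"
}

# Fetch a v2 resource as JSON using HTTP basic authentication.
atlas_get() {
    curl -s -u admin:admin -H "Accept: application/json" "$(atlas_v2_url "$1")"
}
```

For example, atlas_get types/typedefs requests the type definitions registered in Atlas; the call needs a running Atlas server to return anything.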