Deploying JanusGraph 0.5.2 and Atlas 2.0.0 on CDH 6.3.2

Part One: deploying JanusGraph 0.5.2 + HBase 2.1.0-cdh6.3.2 + Elasticsearch 6.6.0 on CDH 6.3.2 (X-Pack and the Platinum features were enabled through Kibana 6.6.0 so the ES graph algorithms can be used; do not pick too new an ES version). Reference one: 圖數據庫JanusGraph實戰[6]: JanusGraph+HBase+ElasticSearch的環境搭建. Two small points to watch:
1. The SSL way of connecting to ES described on the JanusGraph site (Figure 1) kept failing with an error that the Kibana certificate could not be loaded (Figure 2), but ES can be used normally over plain HTTP (Figure 3).

For the details, see the comment section of the tutorial referenced above.
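For context, this is roughly what my JanusGraph graph properties end up looking like when ES is reached over plain HTTP with X-Pack basic auth (a sketch only: the file path and password are placeholders, the hostnames are from my cluster, and the http.auth options assume JanusGraph 0.5.x):

# run from the JanusGraph install directory; adjust hosts and credentials to your cluster
cat > conf/janusgraph-hbase-es.properties <<'EOF'
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=cdh632-master01,cdh632-worker02,cdh632-worker03
index.search.backend=elasticsearch
index.search.hostname=cdh632-worker03
# plain HTTP to ES; the SSL route kept failing with the Kibana certificate error above
index.search.elasticsearch.http.auth.type=basic
index.search.elasticsearch.http.auth.basic.username=elastic
index.search.elasticsearch.http.auth.basic.password=<your-elastic-password>
EOF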

2. Only a MixedIndex is automatically synchronized to ES (a short sketch of the difference follows).
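To make that concrete, here is a small Groovy script fed to the Gremlin console (a sketch against the config above; the 'name' key and the index names are invented for the example): the composite index is served by HBase alone, while the mixed index built against the 'search' backend is what gets pushed to ES.

cat > /tmp/index-demo.groovy <<'EOF'
graph = JanusGraphFactory.open('conf/janusgraph-hbase-es.properties')
mgmt  = graph.openManagement()
name  = mgmt.makePropertyKey('name').dataType(String.class).make()
// composite index: answered from HBase, never synchronized to Elasticsearch
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
// mixed index on the 'search' index backend: this is what shows up in Elasticsearch
mgmt.buildIndex('byNameMixed', Vertex.class).addKey(name).buildMixedIndex('search')
mgmt.commit()
EOF
bin/gremlin.sh -e /tmp/index-demo.groovy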

Part Two: deploying Atlas 2.0.0 + HBase 2.1.0-cdh6.3.2 + Hive 2.1.1-cdh6.3.2 + Kafka 2.2.1-cdh6.3.2 + Elasticsearch 6.6.0 (with X-Pack enabled through Kibana 6.6.0) on CDH 6.3.2. Reference two: CDH整合APACHE ATLAS管理元數據,以及遇到的一些坑和解決辦法.

1. On choosing a version, see reference three: CDH6.x對應的Apache Atlas版本選擇.

2. On why Atlas is worth using, see the first two paragraphs of references four and five: Apache atlas集成CDH管理元數據 and CDH6配置 Atlas,及 Hive Hook.

3. The steps in reference two need a few small adjustments:

0) After configuring Hive as described in reference two, restarting Hive fails with an error (the Hive-side settings involved are sketched at the end of this step).

At that point, once atlas-application.properties had been configured as described further down, I copied the Atlas package deployed standalone on cdh632-worker03 over by running:

[root@cdh632-worker03 module]# scp -r ./apache-atlas-sources-2.0.0 root@cdh632-master01:/opt/module/
[root@cdh632-master01 module]# cd /opt/
[root@cdh632-master01 opt]# ln -s /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0 atlas

Keep in mind:

(1) Finish all of the configuration described below before restarting Hive, because after scp-ing the source tree you need its contents to be identical on every host.
(2) Going by the error message, I only needed to scp from cdh632-worker03 to cdh632-master01.
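For reference, the Hive-side settings that reference two has you add in Cloudera Manager boil down to roughly the following (reconstructed from the Apache Atlas Hive hook documentation rather than copied from reference two; the hook path assumes the /opt/atlas symlink above exists on the Hive host):

# Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
#   hive.exec.post.hooks = org.apache.atlas.hive.hook.HiveHook
# Hive environment safety valve (service and gateway):
#   HIVE_AUX_JARS_PATH=/opt/atlas/hook/hive
# plus atlas-application.properties readable from the Hive conf dir (see the import-hive fixes further down)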

1) Since this deployment puts the Atlas index on my own ES cluster, the second build command is the following (a possible full build sequence is sketched after these commands):

mvn clean -DskipTests package -Pdist
and not either of the following two:
To build against an external HBase and Solr:
mvn clean package -DskipTests -Pdist,external-hbase-solr
To build with the HBase and Solr embedded in Atlas:
mvn clean package -DskipTests -Pdist,embedded-hbase-solr
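For completeness, a possible full build sequence (based on the Apache Atlas build documentation rather than on reference two; the MAVEN_OPTS sizing is only a common suggestion):

export MAVEN_OPTS="-Xms2g -Xmx2g"
cd /opt/module/apache-atlas-sources-2.0.0
# first pass: compile and install the modules
mvn clean -DskipTests install
# second pass: package the server distribution for an external HBase + ES setup
mvn clean -DskipTests package -Pdist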

2) Compared with the atlas-application.properties in reference two, mine differs because my ES has X-Pack enabled and because of an error hit when running the import-hive script; my final version of the file (four lines more than reference two) is:

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=cdh632-master01,cdh632-worker02,cdh632-worker03
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=elasticsearch

#Solr
#Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=
#atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
#atlas.graph.index.search.solr.zookeeper-session-timeout=60000
#atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: http://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true
atlas.graph.index.search.hostname=cdh632-worker03
atlas.graph.index.search.elasticsearch.client-only=true
atlas.graph.index.search.elasticsearch.http.auth.type=basic
atlas.graph.index.search.elasticsearch.http.auth.basic.username=elastic
atlas.graph.index.search.elasticsearch.http.auth.basic.password=potato

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Notification Configs  #########
#atlas.notification.embedded=true
#atlas.kafka.data=${sys:atlas.home}/data/kafka
#atlas.kafka.zookeeper.connect=localhost:9026
#atlas.kafka.bootstrap.servers=localhost:9027
atlas.notification.embedded=false
#atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.data=/var/local/kafka/data
atlas.kafka.zookeeper.connect=cdh632-worker03:2181
atlas.kafka.bootstrap.servers=cdh632-worker03:9092
atlas.kafka.zookeeper.session.timeout.ms=400
#atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
#atlas.kafka.session.timeout.ms=30000
atlas.kafka.session.timeout.ms=60000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/[email protected]
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/[email protected]

#########  Server Properties  #########
#atlas.rest.address=http://localhost:21000
atlas.rest.address=http://cdh632-worker03:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
#atlas.audit.hbase.zookeeper.quorum=localhost:2181
atlas.audit.hbase.zookeeper.quorum=cdh632-master01:2181,

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>

# The following settings are not present in the stock configuration file and have to be added by hand
# See: http://atlas.apache.org/0.8.1/Bridge-Hive.html
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
# This one matters: it fixes the read-timeout error thrown while running import-hive
atlas.hook.hive.keepAliveTime=1000000
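Before starting Atlas it is worth a quick check that the X-Pack credentials in this file actually work against ES (substitute your own password; a 401 here means Atlas will not be able to create its indexes either):

# should return the cluster banner JSON, not an authentication error
curl -u elastic:<your-elastic-password> http://cdh632-worker03:9200/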

Since I start Atlas only on cdh632-worker03, here is how the startup error was dealt with:

[root@cdh632-worker03 module]# cd /opt/atlas/
[root@cdh632-worker03 atlas]# ./bin/atlas-start.py
-bash: ./bin/atlas-start.py: No such file or directory
[root@cdh632-worker03 atlas]# /opt/atlas/bin/atlas_start.py
Exception: ('Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir', '/opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/hbase/conf')
Traceback (most recent call last):
  File "/opt/atlas/bin/atlas_start.py", line 163, in <module>
    returncode = main()
  File "/opt/atlas/bin/atlas_start.py", line 92, in main
    raise Exception("Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir", hbase_conf_dir)
Exception: ('Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir', '/opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/hbase/conf')
[root@cdh632-worker03 atlas]# ln -s /etc/hbase/conf /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf
[root@cdh632-worker03 atlas]# pwd
/opt/atlas
[root@cdh632-worker03 atlas]# vi conf/atlas-env.sh
# fixes the atlas_start.py startup error: Could not find hbase-site.xml
export HBASE_CONF_DIR=/opt/atlas/conf/hbase/conf
[root@cdh632-worker03 atlas]# /opt/atlas/bin/atlas_start.py
starting atlas on host localhost
starting atlas on port 21000
.......................................................................................................
Apache Atlas Server started!!!
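The startup banner alone does not prove the web service is healthy, so a simple follow-up check (admin/admin is the stock account from users-credentials.properties; adjust if you changed it):

curl -u admin:admin http://cdh632-worker03:21000/api/atlas/admin/version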

Another very important point: the preparation needed before import-hive even gets as far as the timeout error above is as follows.

Steps 1 and 2 need to be run "only" on the host carrying the Hive Metastore Server role of the Hive instance, and after step 3 finishes, step 2 must be run again:
1. Error:
[root@cdh632-worker03 atlas]# hook-bin/import-hive.sh
Using Hive configuration directory [/etc/hive/conf]
Please set HIVE_HOME to the root of Hive installation
Fix:
[root@cdh632-master01 opt]# vi ~/.bashrc
# needed so that Atlas's hook-bin/import-hive.sh can import the Hive metadata
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive
export PATH=:$HIVE_HOME/bin:$PATH
[root@cdh632-master01 opt]# source ~/.bashrc
2. Error: ERROR org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Import failed
org.apache.atlas.AtlasException: Failed to load application properties
Fix:
[root@cdh632-worker03 atlas]# scp /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/atlas-application.properties root@cdh632-master01:/etc/hive/conf/
Note the hostnames: the file is copied from cdh632-worker03 into /etc/hive/conf on cdh632-master01.
3. Timeout error:
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show Log4j2 internal initialization logging.
Enter username for atlas :- admin
Enter password for atlas :-
18:15:01.813 [main] ERROR org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Import failed
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
The fix is simply what the official docs describe (http://atlas.apache.org/0.8.1/Bridge-Hive.html): change atlas.hook.hive.keepAliveTime from its default of 10 ms to 1000000.
Tips:
1) To check the logs:
[root@cdh632-worker03 atlas]# vi ./logs/application.log
2) Atlas 2.0.0 moved import-hive.sh out of the directory it lived in under Atlas 1.2.0:
[root@cdh632-worker03 atlas]# find /opt/atlas/ -name import-hive.sh
/opt/atlas/hook-bin/import-hive.sh
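Once import-hive.sh finally reports success, a quick sanity check that the Hive metadata really landed in Atlas (a basic-search call against the REST API; host and credentials as assumed above):

curl -u admin:admin 'http://cdh632-worker03:21000/api/atlas/v2/search/basic?typeName=hive_table&limit=5'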

All of the fixes above are simply the official documentation linked above put into practice.

Finally, you can use Kibana to look at the indexes Atlas automatically created in ES; the same check from the command line is sketched below.
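With the default index name, the JanusGraph-backed indexes normally show up as janusgraph_vertex_index, janusgraph_edge_index and janusgraph_fulltext_index (the exact names depend on the Atlas/JanusGraph defaults, the credentials on your X-Pack setup):

curl -u elastic:<your-elastic-password> 'http://cdh632-worker03:9200/_cat/indices?v' | grep janusgraph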

Other reference documents for deploying Atlas:

Atlas的深度實踐
CDH hadoop生態的所有組件路徑
Symlinking the built Atlas to /opt/atlas follows the usual CDH layout conventions: CDH系列(一)—CDH組件目錄,主機資源分配和端口說明

 

A final note: I wrote this post from memory after the deployment was already done, so if anything is left out, please point it out. Thank you!
