JanusGraph component version compatibility matrix

JanusGraph

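JanusGraph supports several storage backends and index backends, which makes its deployment flexible. This chapter describes a few possible deployment scenarios to help manage the complexity that comes with that flexibility.

Before discussing deployment scenarios, it is important to understand the role of JanusGraph itself and the role of the backends. Applications mostly interact with JanusGraph by issuing Gremlin traversals; JanusGraph then turns each traversal into requests against the configured backends (storage and index), which do the actual work. When JanusGraph runs as a server there is no master JanusGraph Server: an application can connect to any instance, so a load balancer can spread requests across instances. The instances do not communicate with each other directly, which makes it easy to scale out as the traversal load grows.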
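
Since no instance acts as a master, client-side balancing can be as simple as rotating through the known servers. The following is a minimal sketch of that idea, not part of JanusGraph; the host names and the RoundRobinBalancer class are hypothetical and only illustrate the principle.

```python
from itertools import cycle

# Hypothetical JanusGraph Server endpoints; replace with your own hosts.
ENDPOINTS = [
    "ws://janus-1.example.com:8182/gremlin",
    "ws://janus-2.example.com:8182/gremlin",
    "ws://janus-3.example.com:8182/gremlin",
]


class RoundRobinBalancer:
    """Hands out endpoints in rotation; no instance is treated as a master."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = cycle(list(endpoints))

    def next_endpoint(self):
        # Every call returns the next server in the rotation.
        return next(self._endpoints)


balancer = RoundRobinBalancer(ENDPOINTS)
picked = [balancer.next_endpoint() for _ in range(4)]
```

In practice an external load balancer usually does this job; the sketch only shows that any instance is an equally valid target.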

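Benefits of JanusGraph with Apache Cassandra

Continuously available, with no single point of failure.
No read/write bottleneck on the graph, since there is no master/slave architecture.
Elastic scalability allows machines to be added and removed.
A caching layer keeps continuously accessed data available in memory.
The cache grows as more machines are added to the cluster.
Integration with Apache Hadoop.

Strengths of Cassandra itself:

Well suited to data analytics and data-warehouse workloads that need fast lookups over large volumes of data
Richer storage structure than key-value stores such as Redis
Cassandra's data model is a four- or five-dimensional model built on column families (aggregate queries over a column run faster)

Weaknesses of Cassandra itself:

Growth in request volume cannot be handled simply by adding servers; it needs careful planning by a data architect
Writes are buffered in the Memtable first and then flushed to disk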

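Benefits of JanusGraph with HBase

Tight integration with the Apache Hadoop ecosystem.
Native support for strong consistency.
Linear scalability by adding more machines.
Strictly consistent reads and writes.
Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
Metrics export via JMX.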

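JanusGraph and the CAP theorem

When using a database, the CAP theorem should be taken into account (C = consistency, A = availability, P = partition tolerance).

HBase prioritizes consistency at the cost of yield, that is, the probability that a request completes.
Cassandra prioritizes availability at the cost of harvest, that is, the completeness of the answer to a query (data available / complete data).

CAP for JanusGraph

The CAP theorem states that a distributed computer system cannot satisfy all three of the following at once (definitions taken from Wikipedia):

Consistency: every node sees the same, most recent copy of the data.
Availability: every request receives a non-error response, with no guarantee that it contains the most recent data.
Partition tolerance: in practical terms, a partition amounts to a time bound on communication. If the system cannot reach consistency within that bound, a partition has occurred and a choice between C and A must be made for the current operation.

Where JanusGraph falls in CAP terms depends on the underlying storage. With Cassandra it leans AP (Cassandra is eventually consistent); with HBase it leans CP (strong consistency); with single-machine BerkeleyDB the question does not arise.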

Dependency compatibility for JanusGraph 0.3.1 (release date: October 2, 2018)

<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-core</artifactId>
    <version>0.3.1</version>
</dependency>

Tested Compatibility:

Apache Cassandra 2.1.20, 2.2.10, 3.0.14, 3.11.0
Apache HBase 1.2.6, 1.3.1, 1.4.4
Google Bigtable 1.0.0, 1.1.2, 1.2.0, 1.3.0, 1.4.0
Oracle BerkeleyJE 7.4.5
Elasticsearch 1.7.6, 2.4.6, 5.6.5, 6.0.1
Apache Lucene 7.0.0
Apache Solr 5.5.4, 6.6.1, 7.0.0
Apache TinkerPop 3.3.3
Java 1.8
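
The list above can be used as a lookup before choosing component versions. Below is a rough sketch of such a check; the version numbers are copied from the list, while the function names and the "same minor line" heuristic are my own assumptions rather than anything JanusGraph defines.

```python
# Versions copied from the "Tested Compatibility" list for JanusGraph 0.3.1.
TESTED = {
    "cassandra": ["2.1.20", "2.2.10", "3.0.14", "3.11.0"],
    "hbase": ["1.2.6", "1.3.1", "1.4.4"],
    "bigtable": ["1.0.0", "1.1.2", "1.2.0", "1.3.0", "1.4.0"],
    "berkeleyje": ["7.4.5"],
    "elasticsearch": ["1.7.6", "2.4.6", "5.6.5", "6.0.1"],
    "lucene": ["7.0.0"],
    "solr": ["5.5.4", "6.6.1", "7.0.0"],
    "tinkerpop": ["3.3.3"],
}


def minor_line(version):
    """'1.4.9' -> '1.4' (major.minor only)."""
    return ".".join(version.split(".")[:2])


def check(component, version):
    """Classify a version against the tested list (heuristic, not official)."""
    tested = TESTED[component]
    if version in tested:
        return "tested"
    if minor_line(version) in {minor_line(v) for v in tested}:
        return "same minor line as a tested version"
    return "untested"


hbase_results = {v: check("hbase", v) for v in ("1.4.4", "1.4.9", "2.1.2")}
```

With this heuristic, the HBase 2.1.2 tarball downloaded later in these notes comes out as "untested" for 0.3.1, which may be worth keeping in mind when reading the HBase-related errors further down.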

For more information on the features and bug fixes in 0.3.1, see the GitHub milestone:

https://github.com/JanusGraph/janusgraph/milestone/7?closed=1

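JanusGraph installation and configuration

JanusGraph 0.3.1 OLAP development environment setup

https://blog.csdn.net/qq_37286005/article/details/85071050

Installing ZooKeeper

This is a standalone installation, using zookeeper-3.4.9.tar.gz. It was already installed, so the steps are omitted (see my blog post on cluster installation).

Installing HBase in standalone mode

Configuring HBase

1. Download: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.1.2/hbase-2.1.2-bin.tar.gz

2. ~$ gedit .bashrc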

# hbase
export HBASE_HOME=/home/raini/app/hbase
export PATH=${HBASE_HOME}/bin:$PATH

3. ~$ source .bashrc

hbase-env.sh

## Append:
export JAVA_HOME=/home/raini/app/jdk
export HBASE_CLASSPATH=/home/raini/app/hbase/conf/
export HBASE_PID_DIR=/home/raini/app/tmp/pids

# Do not use the ZooKeeper bundled with HBase
export HBASE_MANAGES_ZK=false

zoo.cfg

We are not using the ZooKeeper bundled with HBase but the one installed earlier, so copy that ZooKeeper's zoo.cfg into HBase's conf directory.

hbase-site.xml

# Add the following:
<configuration>
  <property>
    <name>hbase.rootdir</name>   
    <value>hdfs://biyuzhe:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>   
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>   
    <value>127.0.0.1</value>
  </property>
<!--  <property>
		<name>hbase.zookeeper.property.clientPort</name>
		<value>2181</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>   
    <value>/home/raini/app/tmp/hbase/</value>
  </property>
  <property>
    <name>hbase.master</name>   
    <value>biyuzhe:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/raini/app/tmp/hbase_zoo_dataDir</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>file://home/raini/tmp/hbase-wal</value>
  </property>-->
  <property>
           <name>dfs.replication</name>
           <value>1</value>
  </property>
  <property>
        <name>hbase.master.maxclockskew</name>
        <value>150000</value>
  </property>
</configuration>

Some points to note:

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>file:///usr/local/hbase-1.4.0/data-tmp</value>
	</property>
  	<property>
		<name>hbase.zookeeper.quorum</name>  
		<value>localhost</value>  
	</property>

	<property>
		<name>hbase.zookeeper.property.clientPort</name>
		<value>2181</value>
	</property>

	<property>
		<name>hbase.zookeeper.property.dataDir</name>  
		<value>/tmp/zookeeper</value>  
	</property>

	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>

	<property>
		<name>zookeeper.znode.parent</name>
		<value>/hbase</value>
	</property>
</configuration>

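Things to watch:

Be sure to include the hbase.cluster.distributed property (pseudo-distributed mode). Even though this is a single machine, the ZooKeeper bundled with HBase is not being used, so in principle the pseudo-distributed setup is still the right one.
In my environment hbase.rootdir is file:///usr/local/hbase-1.4.0/data-tmp, so HDFS is not used for storage and does not need to be started first. If the directory is changed to an HDFS path, HDFS must be started before HBase.
hbase.zookeeper.quorum is the address of the ZooKeeper server. This is a single-machine setup, so localhost is enough. Some blog posts suggest using the host's IP address rather than its hostname.
hbase.zookeeper.property.clientPort is the ZooKeeper port; unless it has been changed, the default is 2181.
zookeeper.znode.parent is the root znode for HBase in ZooKeeper. All of HBase's ZooKeeper paths that are configured as relative paths are placed under this node (it also holds the location of the root region). By default every HBase ZooKeeper path is relative, so they all end up under this node. Default: /hbase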
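
The checks described in these notes can be run mechanically against an hbase-site.xml. The sketch below uses only Python's standard library; the helper names load_properties and review are hypothetical, and the sample is a shortened copy of the reference configuration above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the reference hbase-site.xml shown above.
SAMPLE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase-1.4.0/data-tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in an hbase-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def review(props):
    """Apply the notes above and return human-readable remarks."""
    notes = []
    if props.get("hbase.cluster.distributed") != "true":
        notes.append("set hbase.cluster.distributed=true when using an external ZooKeeper")
    if props.get("hbase.rootdir", "").startswith("hdfs://"):
        notes.append("hbase.rootdir is on HDFS: start HDFS before HBase")
    # Defaults follow the notes above: port 2181, znode parent /hbase.
    notes.append("ZooKeeper at %s:%s, znode parent %s" % (
        props.get("hbase.zookeeper.quorum", "localhost"),
        props.get("hbase.zookeeper.property.clientPort", "2181"),
        props.get("zookeeper.znode.parent", "/hbase"),
    ))
    return notes


props = load_properties(SAMPLE)
notes = review(props)
```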

---------------------

regionservers

# Change this to the host name <---- the host's IP address is suggested instead of the hostname

Starting HBase

[raini@biyuzhe ~]# start-all.sh    # start Hadoop

[raini@biyuzhe ~]# zkServer.sh start    # start ZooKeeper

[raini@biyuzhe ~]# zkServer.sh status    # check ZooKeeper's status and role

[raini@biyuzhe ~]# start-hbase.sh    # start HBase

Startup error: Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder

Fix:

cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar $HBASE_HOME/lib/

[raini@biyuzhe ~]# jps    # list Java processes to confirm HBase is running

Basic HBase operations

raini@biyuzhe:~$ hbase shell

hbase(main):001:0> status    # show HBase's running status

1 active master, 0 backup masters, 1 servers, 0 dead, 0.0000 average load

Took 0.3634 seconds

hbase(main):002:0> exit    # quit

Problems encountered

HRegionServer on [some] nodes of the HBase cluster shuts down by itself shortly after starting

Commenting out this part of hbase-site.xml fixed it:

<!--  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/raini/app/tmp/hbase/</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>biyuzhe:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/raini/app/tmp/hbase_zoo_dataDir</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>file://home/raini/tmp/hbase-wal</value>
  </property>-->

The cause was probably stale data; deleting those temporary files should also work.

JG+HBase+Caching+ES config

Configure JanusGraph to use a remote HBase storage engine and, for better performance, enable JanusGraph's caching layer as well.

janusgraph.properties:

storage.backend=hbase
storage.hostname=100.100.101.1
storage.port=2181
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=elasticsearch
index.search.hostname=100.100.101.1, 100.100.101.2
index.search.elasticsearch.client-only=true

A problem with this configuration:

JanusGraph (acting as an HBase client) could not connect to the HBase server. The ZooKeeper znode parent had been set to /hbase-jg in hbase-site.xml when HBase was installed, so it has to be specified explicitly here, with the same value:

storage.hbase.ext.hbase.zookeeper.property.clientPort=2181
storage.hbase.ext.zookeeper.znode.parent=/hbase-jg
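
Because a mismatch between the two znode parents fails only at connection time, it can help to compare them up front. The sketch below is an illustration with hypothetical helper names; it assumes a simple key=value properties file and that the HBase side falls back to /hbase when nothing is set, as noted earlier.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def znode_parent_matches(janus_props, hbase_znode_parent):
    """True when JanusGraph will look under the znode that HBase actually uses."""
    key = "storage.hbase.ext.zookeeper.znode.parent"
    # Assumption: without an explicit setting the client uses the /hbase default.
    return janus_props.get(key, "/hbase") == hbase_znode_parent


JANUS = """
storage.backend=hbase
storage.hostname=100.100.101.1
storage.hbase.ext.hbase.zookeeper.property.clientPort=2181
storage.hbase.ext.zookeeper.znode.parent=/hbase-jg
"""

janus_props = parse_properties(JANUS)
matches_custom = znode_parent_matches(janus_props, "/hbase-jg")
matches_default = znode_parent_matches(janus_props, "/hbase")
```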

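JanusGraph single-node deployment: method 2

Notes:

[1] Elasticsearch is installed from an archive and can only be started by a non-root user, so install it as a regular user.

[2] On Linux, JanusGraph ships with a JanusGraph Server configuration and scripts, so JanusGraph Server can be started directly.

Installing JanusGraph on Linux

Note: the user name is assumed to be raini; root cannot be used, as explained above.

[2] Fix permissions

Change the ownership of the installation so that user raini can access the janusgraph package under /opt

raini@biyuzhe:~/app$ sudo chown -R raini:raini janusgraph-0.3.1-hadoop2

Starting JanusGraph

This article uses the JanusGraph + Berkeley + ES deployment: BerkeleyDB as the storage backend and Elasticsearch as the external index. BerkeleyDB is embedded and needs no separate startup, but Elasticsearch must be started before JanusGraph.

Starting Elasticsearch

JanusGraph bundles an Elasticsearch distribution. Change into the JanusGraph directory and append & to run it in the background.

raini@biyuzhe:~/app/janusgraph$ elasticsearch/bin/elasticsearch &

Basic usage of JanusGraph

JanusGraph is typically used in one of these ways:

[1] Embedded in an application (Java);

[2] Through the Gremlin Console;

[3] Through JanusGraph Server.

Only the Gremlin Console is covered here; the other ways are introduced later.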

JanusGraph Gremlin Console

[1] Start the Gremlin Console

[raini@biyuzhe: janusgraph-0.3.1-hadoop2]$ bin/gremlin.sh

[2] Open a graph database instance

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')

==>standardjanusgraph[berkeleyje:/opt/janusgraph-0.3.1-hadoop2/conf/../db/berkeley]

JanusGraph ships with many ready-made configurations; this one matches the deployment described earlier.

[3] Obtain a graph traversal source

gremlin> g = graph.traversal()

==>graphtraversalsource[standardjanusgraph[berkeleyje:/opt/janusgraph-0.3.1-hadoop2/conf/../db/berkeley], standard]

[4] Use the traversal source for graph operations

Add a vertex:

gremlin> g.addV('person').property('name','Dennis')

==>v[4104]

Query the vertex just created:

gremlin> g.V().has('name', 'Dennis').values()

References:

[1] http://janusgraph.org/

Starting JanusGraph (gremlin-server)

Note: do not launch bin/gremlin-server.sh directly; the initialization of Elasticsearch, Cassandra and related configuration would be skipped.

From /path_to/janusgraph-0.3.1-hadoop2, use:

bin/janusgraph.sh start

Starting Elasticsearch

JanusGraph bundles an Elasticsearch distribution. Change into the JanusGraph directory and append & to run it in the background.

raini@biyuzhe:~/app/janusgraph$ elasticsearch/bin/elasticsearch &

raini@biyuzhe:~/app/janusgraph$ bin/janusgraph.sh stop

Start JanusGraph Server:

bin/gremlin-server.sh ./conf/gremlin-server/byz-gremlin-server.yaml

JanusGraph Server

JanusGraph serves requests through gremlin-server in two modes, WebSocket and HTTP. The two modes cannot coexist in a single instance, but running two instances gives you both (from 0.3 onward they can apparently coexist; to be tested next). The official description is rather long, so here is a shorter summary. The backend used here is HBase + Elasticsearch.

The steps are:

1. Download janusgraph-{VERSION}-hadoop2.zip from the GitHub release page and unpack it

2. Prepare the .properties file

cp conf/janusgraph-hbase-es.properties conf/gremlin-server/janusgraph-hbase-es-server.properties

and add this line at the top of the new file:

gremlin.graph=org.janusgraph.core.JanusGraphFactory

3. Prepare gremlin-server.yaml; two instance configurations are written here

cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/socket-gremlin-server.yaml

cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/http-gremlin-server.yaml

3.1 Edit socket-gremlin-server.yaml

# host and port can also be changed if you like; the default port is 8182
graphs: {
  graph: conf/gremlin-server/janusgraph-hbase-es-server.properties
}
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer

3.2 Edit http-gremlin-server.yaml

# the port must not clash with the WebSocket instance; I used 8183
graphs: {
  graph: conf/gremlin-server/janusgraph-hbase-es-server.properties
}
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer

4. Start the servers

bin/gremlin-server.sh ./conf/gremlin-server/socket-gremlin-server.yaml

bin/gremlin-server.sh ./conf/gremlin-server/http-gremlin-server.yaml

On success a log line is printed to the screen:

[gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port XXXX.

PS: because I configured the host as [0.0.0.0], the machine on which the service starts may vary; in my case it was node3.

conf/remote.yaml therefore also has to point at node3 for the connection to succeed.

5.1 Testing WebSocket

Test with Gremlin by opening bin/gremlin.sh

~/Setups/janusgraph-0.3.1-hadoop2/bin> bin/gremlin.sh

\,,,/

(o o)

-----oOOo-(3)-oOOo-----

plugin activated: janusgraph.imports

plugin activated: tinkerpop.server

plugin activated: tinkerpop.utilities

plugin activated: tinkerpop.hadoop

plugin activated: tinkerpop.spark

plugin activated: tinkerpop.tinkergraph

gremlin> :remote connect tinkerpop.server conf/byz-remote.yaml

==>Configured localhost/127.0.0.1:8182

gremlin> :remote console    (without this step, every subsequent command must be prefixed with :>, e.g. :> g.V().values('name'))

gremlin>

==>yiz96

The :> prefix submits the line to the server immediately. If you changed the port, update conf/remote.yaml as well.

5.2 Testing HTTP

curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"g.V().values(\"name\")"}' http://localhost:8183

Note: do not use single quotes inside the query (use escaped double quotes as above); single quotes cause an error.

{"requestId":"6542a2b5-15bb-4b8e-82cd-50ea1d12e586","status":{"message":"","code":200,"attributes":{}},"result":{"data":["yiz96"],"meta":{}}}
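
The quoting trouble with curl goes away when the JSON body is built programmatically. A minimal sketch using only Python's standard library follows; it assumes the HTTP-mode server from step 3.2 is listening on localhost:8183, and build_request and extract_data are hypothetical helper names.

```python
import json
import urllib.request


def build_request(gremlin, url="http://localhost:8183"):
    """Wrap a Gremlin script in the JSON body the HTTP endpoint expects."""
    body = json.dumps({"gremlin": gremlin}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_data(response_text):
    """Pull result.data out of a response shaped like the one shown above."""
    return json.loads(response_text)["result"]["data"]


# json.dumps takes care of escaping the double quotes inside the query.
req = build_request('g.V().values("name")')

# Sending it requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(extract_data(resp.read().decode("utf-8")))
```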

That failed:

Problem (error when starting the remote session):

java.lang.IllegalStateException: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1

Steps used:

gremlin> :remote connect tinkerpop.server conf/byz-remote.yaml

==>Configured localhost/127.0.0.1:8182

gremlin> :remote console

==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode

gremlin> map = new HashMap<String, Object>();

gremlin> map

No such property: map for class: Script4

Type ':help' or ':h' for help.

Display stack trace? [yN]

gremlin>

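Fix:

(An incorrect fix:)

scripts/empty-sample.groovy defines a binding on the default graph "graph", which is not available.

Update scripts/empty-sample.groovy or remove it from the configuration. Or comment out:
// define the default TraversalSource to bind queries to - this one will be named "g".
// globals << [g : graph.traversal()]

The correct fix:

Do not remove scripts/empty-sample.groovy from the configuration, i.e. keep {files: [scripts/empty-sample.groovy]

Adding the correct dependencies, as follows, is enough:

Problem: remote connect gremlin-server No such property: for class: Script

Fix:

Simple: it was a usage mistake. Tutorials, blog posts and the official site generally do not mention it; I found the answer on a foreign site.

Just add session. Without a session each request is evaluated on its own, so a variable defined in one request is gone in the next.

:remote connect tinkerpop.server conf/byz-remote.yaml session

:remote console

Problem: GraphFactory could not instantiate this Graph implementation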

1042 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [conf/gremlin-server/ws-janusgraph-hbase-es.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.lambda$new$0(DefaultGraphManager.java:57)
at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
at org.apache.tinkerpop.gremlin.server.util.DefaultGraphManager.<init>(DefaultGraphManager.java:55)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:80)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:120)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:84)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:343)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
... 13 more
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.hbase.HBaseStoreManager
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:64)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:476)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:408)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1254)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:111)
... 18 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
... 24 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/protobuf/generated/MasterProtos$MasterService$BlockingInterface
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:228)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at org.janusgraph.diskstorage.hbase.HBaseCompat1_0.createConnection(HBaseCompat1_0.java:43)
at org.janusgraph.diskstorage.hbase.HBaseStoreManager.<init>(HBaseStoreManager.java:334)
... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingInterface
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 36 more

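Fix:

Adding hbase-protocol-1.4.9.jar and protobuf-java-2.5.0.jar is enough.

(If needed, also add hbase-hadoop2-compat-1.4.9.jar, the HBase/Hadoop compatibility jar.)

Integrating JanusGraph with TinkerPop's Hadoop-Gremlin

This chapter describes how to configure JanusGraph for distributed graph processing with Apache Hadoop and Apache Spark. The steps outline how to get started with these projects; refer to the respective project communities to become more familiar with them.

JanusGraph-Hadoop works with TinkerPop's hadoop-gremlin package for general-purpose OLAP.

For the examples below, Apache Spark is the computing framework and Apache Cassandra is the storage backend. Other packages can be used as directed, with minor changes to the configuration properties.

Note: the examples in this chapter run Spark in local mode or standalone cluster mode. Additional configuration is needed when running Spark on YARN or Mesos.

Configuring Hadoop to run OLAP

Running OLAP queries from the Gremlin Console has a few prerequisites. The Hadoop configuration directory must be added to the CLASSPATH, and it must point to a live Hadoop cluster.

Hadoop provides a distributed, access-controlled file system. Spark workers running on different machines use the Hadoop file system as a common source for file-based operations. Intermediate computations of OLAP queries can be persisted on it.

For setting up a single-node Hadoop cluster, see the official Apache Hadoop documentation.

Once the Hadoop cluster is up and running, the Hadoop configuration files need to be specified in the CLASSPATH. The documentation below expects these files under /etc/hadoop/conf.

Once that is verified, follow the steps below to add the Hadoop configuration to the CLASSPATH and start the Gremlin Console, which plays the role of the Spark driver program.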

Main configuration (bin/gremlin.sh)

Add near the top of the script:

export HADOOP_HOME=/home/raini/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.library.path=$HADOOP_HOME/lib/native"
export CLASSPATH=$HADOOP_CONF_DIR

Once the Hadoop configuration path is on the CLASSPATH, a quick check confirms whether the Gremlin Console can reach the Hadoop cluster.

Start the console from the JanusGraph directory:

$ bin/gremlin.sh

In the console, enter:

gremlin> hdfs

==>storage[org.apache.hadoop.fs.LocalFileSystem@65bb9029] // BAD (before the configuration)

gremlin> hdfs

==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1229457199_1, ugi=user (auth:SIMPLE)]]] // GOOD (after the configuration; HDFS is now used for storage)

OLAP traversals (with Spark)

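JanusGraph-Hadoop uses TinkerPop's hadoop-gremlin package for general-purpose OLAP traversals of the graph and parallelizes queries with Apache Spark.

Configuring Spark as the OLAP engine with an HBase backend

Additional configuration specific to the storage backend is required. It is given by the gremlin.hadoop.graphReader property, which names the class that reads data from the storage backend.

JanusGraph's graphReader classes for HBase:

HBaseInputFormat and HBaseSnapshotInputFormat, for use with HBase

The following properties files can be used to connect to a JanusGraph instance in HBase so that it can be used with HadoopGraph to run OLAP queries.

GitHub link:

https://github.com/JanusGraph/janusgraph/blob/d12adfbf083f575fa48860daa37bfbd0e6095369/janusgraph-dist/src/assembly/static/conf/hadoop-graph/read-hbase-snapshot.properties

conf/hadoop-graph/read-hbase-snapshot.properties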

# Hadoop Graph Configuration

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseSnapshotInputFormat

gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true

gremlin.hadoop.inputLocation=none

gremlin.hadoop.outputLocation=output

# JanusGraph HBaseSnapshotInputFormat configuration

janusgraphmr.ioformat.conf.storage.backend=hbase

janusgraphmr.ioformat.conf.storage.hostname=localhost

janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

janusgraphmr.ioformat.conf.storage.hbase.snapshot-name=janusgraph-snapshot

janusgraphmr.ioformat.conf.storage.hbase.snapshot-restore-dir=/tmp

janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.rootdir=/hbase

# SparkGraphComputer Configuration

spark.master=local[4]

spark.serializer=org.apache.spark.serializer.KryoSerializer

conf/hadoop-graph/read-hbase.properties

# Hadoop Graph Configuration

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat

gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true

gremlin.hadoop.inputLocation=none

gremlin.hadoop.outputLocation=output

# JanusGraph HBase InputFormat configuration

janusgraphmr.ioformat.conf.storage.backend=hbase

janusgraphmr.ioformat.conf.storage.hostname=localhost

janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

# SparkGraphComputer Configuration

spark.master=local[4]

spark.serializer=org.apache.spark.serializer.KryoSerializer

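... For more, see the documentation: https://github.com/JanusGraph/janusgraph/blob/dee1400a3ab953ed5f4bd43eec8a38f2d7b6ff3c/docs/hadoop.adoc

OLAP traversals with a Spark standalone cluster

The steps from the previous section also work with a Spark standalone cluster, with a few changes:

Update the spark.master property to point to the Spark master URL instead of local

Update spark.executor.extraClassPath so that Spark executors can find the JanusGraph dependency jars

Copy the JanusGraph dependency jars to the location given in the previous step on every Spark executor machine

Note: we copied all jars under janusgraph-distribution/lib to /opt/lib/janusgraph/, created the same directory structure on every worker, and copied the jars to each worker by hand.

The final properties file for OLAP traversal is as follows:

...

Reference:

https://github.com/JanusGraph/janusgraph/blob/dee1400a3ab953ed5f4bd43eec8a38f2d7b6ff3c/docs/hadoop.adoc

A small example:

(For Gremlin and other reference configurations, see janusgraph/janusgraph-dist/src/assembly/static/conf/hadoop-graph/)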

g.V().hasLabel('NewsPaper').has('identifier', 'xyz').inE('belongsTo').outV().hasLabel('NewsDocument')

.has('publishedDate', between(begin.getTime, end.getTime))

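Data format

(First step of an import: remove blank lines, missing vertices, duplicate vertices and so on, then convert the data into this GraphSON format.)

Multiple records, one vertex per line: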

{"id":2000,"label":"message","inE":{"link":[{"id":5,"outV":2000}]},"outE":{"link":[{"id":4,"inV":2001},{"id":5,"inV":2000}]},"properties":{"name":[{"id":2,"value":"a"}]}}
{"id":2001,"label":"message","inE":{"link":[{"id":4,"outV":2000}]},"properties":{"name":[{"id":3,"value":"b"}]}}
{"id":1000,"label":"loops","inE":{"self":[{"id":1,"outV":1000}]},"outE":{"self":[{"id":1,"inV":1000}]},"properties":{"name":[{"id":0,"value":"loop"}]}}

A single record, pretty-printed:

{
  "id": 2000,
  "label": "message",
  "inE": {
    "link": [
      {
        "id": 5,
        "outV": 2000
      }
    ]
  },
  "outE": {
    "link": [
      {
        "id": 4,
        "inV": 2001
      },
      {
        "id": 5,
        "inV": 2000
      }
    ]
  },
  "properties": {
    "name": [
      {
        "id": 2,
        "value": "a"
      }
    ]
  }
}
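
Since each line stores a vertex together with its incoming and outgoing edges, a missing vertex shows up as an outgoing edge whose target never lists it as incoming. The following sketch checks exactly that, and only in the out-to-in direction; the function names are hypothetical and the sample lines are the three records shown above.

```python
import json

# The three adjacency-list records from the example above, one vertex per line.
LINES = [
    '{"id":2000,"label":"message","inE":{"link":[{"id":5,"outV":2000}]},"outE":{"link":[{"id":4,"inV":2001},{"id":5,"inV":2000}]},"properties":{"name":[{"id":2,"value":"a"}]}}',
    '{"id":2001,"label":"message","inE":{"link":[{"id":4,"outV":2000}]},"properties":{"name":[{"id":3,"value":"b"}]}}',
    '{"id":1000,"label":"loops","inE":{"self":[{"id":1,"outV":1000}]},"outE":{"self":[{"id":1,"inV":1000}]},"properties":{"name":[{"id":0,"value":"loop"}]}}',
]


def load_vertices(lines):
    """Parse one JSON vertex per non-empty line into {vertex_id: vertex}."""
    vertices = {}
    for line in lines:
        line = line.strip()
        if line:
            vertex = json.loads(line)
            vertices[vertex["id"]] = vertex
    return vertices


def edges(vertex, direction, other_key):
    """Set of (edge_id, label, other_vertex_id) for 'inE' or 'outE'."""
    found = set()
    for label, items in vertex.get(direction, {}).items():
        for edge in items:
            found.add((edge["id"], label, edge[other_key]))
    return found


def dangling_edges(vertices):
    """Outgoing edges whose target is missing or does not list them as incoming."""
    problems = []
    for vid, vertex in vertices.items():
        for eid, label, target in edges(vertex, "outE", "inV"):
            other = vertices.get(target)
            if other is None or (eid, label, vid) not in edges(other, "inE", "outV"):
                problems.append((eid, vid, target))
    return problems


complete = dangling_edges(load_vertices(LINES))
# Dropping vertex 2001 leaves edge 4 from vertex 2000 pointing at nothing.
broken = dangling_edges(load_vertices([LINES[0], LINES[2]]))
```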

Graph configuration

janusgraph-hbase.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=localhost
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5
index.search.backend=elasticsearch
index.search.hostname=localhost
#storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
storage.hbase.table=Medical-POC
index.search.index-name=Medical-POC

hadoop-graphson.properties

#gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
#gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
#gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
#gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/byz/test-modern.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

#####################################
# GiraphGraphComputer Configuration
#####################################
giraph.minWorkers=2
giraph.maxWorkers=2
giraph.useOutOfCoreGraph=true
giraph.useOutOfCoreMessages=true
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m
giraph.numInputThreads=4
giraph.numComputeThreads=4
giraph.maxMessagesInMemory=100000

#
# SparkGraphComputer Configuration
#
#spark.master=local[4]
spark.master=spark://localhost:7077
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
spark.driver.memory=1g
# So that executors can find the JanusGraph jars
spark.executor.extraClassPath=/home/raini/app/janusgraph/lib/*

Writing the data schema:

janusgraph-schema.groovy

def defineGratefulDeadSchema(janusGraph) {
    m = janusGraph.openManagement()
    // vertex label for person records
    person = m.makeVertexLabel("person").make()
    // properties
    // uncomment the next line when importing with IncrementBulkLoader
    //blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
    birth = m.makePropertyKey("birth").dataType(Date.class).make()
    age = m.makePropertyKey("age").dataType(Integer.class).make()
    name = m.makePropertyKey("name").dataType(String.class).make()
    // index
    index = m.buildIndex("nameCompositeIndex", Vertex.class).addKey(name).unique().buildCompositeIndex()
    // uncomment the next line when importing with IncrementBulkLoader
    //bidIndex = m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildCompositeIndex()
    m.commit()
}

Running the Gremlin data import statements:

raini@biyuzhe:~/app/janusgraph$ bin/gremlin.sh

(1)
:load /home/raini/pro/GraphDatabase/test/src/main/scala/janusgraph-load/test-janusgraph-schema.groovy
graph = JanusGraphFactory.open('/home/raini/pro/GraphDatabase/test/src/main/scala/janusgraph-load/janusgraph-test.properties')
defineGratefulDeadSchema(graph)

(2)
graph = GraphFactory.open('data/zl/hadoop-graphson.properties')
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph('data/zl/janusgraph-test.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
Error:
java.lang.InstantiationException

(3)
graph = JanusGraphFactory.open('data/zl/janusgraph-test.properties')
g = graph.traversal()
g.V().valueMap()

Configuring JanusGraph Server for ConfiguredGraphFactory

(Configuring the default graph configuration for JanusGraph)

Configuration file: /home/raini/app/janusgraph/conf/gremlin-server/gremlin-server-configuration.yaml

Documentation: https://docs.janusgraph.org/latest/configuredgraphfactory.html

To use ConfiguredGraphFactory, the server must be configured to use the ConfigurationManagementGraph API. To do so, inject a graph variable named "ConfigurationManagementGraph" into the graphs map of the server's YAML. For example:

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
# Point ConfigurationManagementGraph at the file holding your default configuration, e.g. with the backend changed to HBase
graphs: {
  ConfigurationManagementGraph: conf/JanusGraph-configurationmanagement.properties
}

In this example, the ConfigurationManagementGraph graph is configured from the properties stored in conf/JanusGraph-configurationmanagement.properties, which might look like this:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
graph.graphname=ConfigurationManagementGraph
storage.hostname=127.0.0.1

PS: the parameters above are all required.

Three ways to use JanusGraph

[1] Embedded in an application (Java);

[2] Through the Gremlin Console;

[3] Through JanusGraph Server.

Small test examples

1. Using JanusGraph from the Gremlin Console

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')

gremlin> graph.io(IoCore.graphson()).readGraph('/home/raini/app/janusgraph/data/tinkerpop-sink-v2d01.json')

gremlin> dennis = graph.addVertex(T.label, "person", "name", "Dennis","city", "Chengdu")

jady = graph.addVertex(T.label, "person", "name", "Jady","city", "Beijing")

dennis.addEdge("knows", jady, "date", "20121201")

Or: gremlin> g.addV('person').property('name','Dennis')

Run graph traversals:

gremlin> g = graph.traversal()

With Spark: gremlin> g = graph.traversal().withComputer(SparkGraphComputer)

gremlin> g.V().has('name', 'Dennis').values()

// ==>Dennis

// ==>Chengdu

gremlin> g.V().count()

// ==>2

gremlin> g.V().hasLabel('person')

// ==>v[4296]

// ==>v[4232] <---- the number in brackets is the vertex ID, which is unique; the label 'person' is shared by both vertices

2. Using GraphFactory from the Gremlin Console

graph = GraphFactory.open(...)

g = graph.traversal()

jupiter = g.addV("god").property("name", "jupiter").property("age", 5000).next()

sky = g.addV("location").property("name", "sky").next()

g.V(jupiter).as("a").V(sky).addE("lives").property("reason", "loves fresh breezes").from("a").next()

g.tx().commit()

g.V().has("name", "jupiter").valueMap(true).tryNext()

3. Loading GraphSON data with Gremlin io

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase-spark.properties')

gremlin> graph.io(IoCore.graphson()).readGraph('/home/raini/app/janusgraph/data/tinkerpop-sink-v2d01.json')

gremlin> dennis = graph.addVertex(T.label, "person", "name", "Dennis","city", "Chengdu")

jady = graph.addVertex(T.label, "person", "name", "Jady","city", "Beijing")

dennis.addEdge("knows", jady, "date", "20121201")

// g = graph.traversal() would require Spark to be configured in the .properties file
g=graph.traversal().withComputer(SparkGraphComputer)

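Problem:

When running :load ../schema.groovy in a remote session, pay attention to which class Cardinality resolves to in .cardinality(Cardinality.SINGLE).

Locally it resolves, correctly, to the JanusGraph class: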

gremlin> Cardinality.SINGLE

==>SINGLE

gremlin> Cardinality

==>class org.janusgraph.core.Cardinality

In the remote session it resolves to the TinkerPop class instead:

gremlin> :remote connect tinkerpop.server conf/byz-remote.yaml session
==>Configured biyuzhe/127.0.0.1:8182-[2ed42d86-882c-42b6-af31-6ca4fb5ee712]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [biyuzhe/127.0.0.1:8182]-[2ed42d86-882c-42b6-af31-6ca4fb5ee712] - type ':remote console' to return to local mode
gremlin> Cardinality.SINGLE
No such property: SINGLE for class: org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> Cardinality
==>class org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality

Fix:

name = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make()

Change it to the fully qualified class:

.cardinality(org.janusgraph.core.Cardinality.SINGLE)

The same check applies to BulkLoader:

gremlin> BulkLoaderVertexProgram

==>class org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram
