JanusGraph -- Indexes in Detail (janusgraph index)

Overview

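JanusGraph index --> graph index && vertex-centric index

graph index --> composite index && mixed index (both are global, graph-wide indexes)

composite index: used only when every indexed key appears in the query with an equality condition; needs no index backend; supports uniqueness; ordering is done in memory and is expensive
mixed index: any combination of the indexed keys can trigger it; supports range, full-text, and geo queries; requires an index backend; does not support uniqueness; ordering is efficient when the sort key is indexed and otherwise also falls back to memory

vertex-centric index --> JanusGraph builds one by default for each property key (per edge label); a multi-key index is used only when the query follows the leftmost-prefix rule; it speeds up retrieving a vertex's edges when the vertex has very many of them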

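I: Extending JanusGraph Server

JanusGraph supports two kinds of indexes: graph indexes and vertex-centric indexes. Graph indexes serve queries that look up vertices or edges by their properties. Vertex-centric indexes make traversals efficient, especially through vertices that have many incident edges.

II: Graph Index

A graph index is a global index structure over the entire graph that lets you retrieve vertices or edges efficiently by their properties. For example: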

g.V().has('name','hercules')
g.E().has('reason', textContains('loves'))

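Both queries look up vertices or edges by property. Without an index, each of them forces a full scan of the graph, which is unacceptable for large graphs.

JanusGraph supports two kinds of graph index: composite indexes and mixed indexes. Composite indexes are very fast and efficient, but they only serve equality lookups on one particular, previously defined combination of property keys. Mixed indexes can serve lookups on any combination of the indexed keys and support condition predicates beyond equality; which predicates are available depends on the index backend.

Both kinds of index are created through JanusGraph's management API:

JanusGraphManagement.buildIndex(String, Class)

// This call only returns an IndexBuilder; the index itself is created by calling addKey() and then buildCompositeIndex() or buildMixedIndex() on it. Vertex-centric indexes are created with mgmt.buildEdgeIndex() instead.

The first argument is the index name, which must be unique; the second is the element class to index (for example Vertex.class). An index built over property keys that were defined in the same management transaction is available immediately. Otherwise a reindex procedure has to run to bring the index in sync with the existing data, and the index cannot be used until that procedure completes. Defining indexes together with the initial schema is recommended.

Note: without an index, a query falls back to a full graph scan, which performs very poorly. Setting the force-index option (query.force-index in the configuration) prohibits full scans.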

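1: Composite Index

A composite index retrieves vertices or edges by one key or by a fixed combination of several keys; the set of keys a query must constrain is fixed when the index is defined.

// Never create an index while a transaction is active on the graph (doing so may lead to a deadlock)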

graph.tx().rollback()

mgmt = graph.openManagement()

name = mgmt.getPropertyKey('name')

age = mgmt.getPropertyKey('age')

// Composite index for looking up vertices by name

mgmt.buildIndex('byNameComposite',Vertex.class).addKey(name).buildCompositeIndex()

// Composite index for looking up vertices by name and age

mgmt.buildIndex('byNameAndAgeComposite',Vertex.class).addKey(name).addKey(age).buildCompositeIndex()

mgmt.commit()

//Wait for the indexes to become available

mgmt.awaitGraphIndexStatus(graph,'byNameComposite').call()

mgmt.awaitGraphIndexStatus(graph,'byNameAndAgeComposite').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"),SchemaAction.REINDEX).get()

mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"),SchemaAction.REINDEX).get()

mgmt.commit()

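Note that a composite index is used only when the query constrains every key of that index with an equality condition. With the two indexes above, g.V().has('name', 'hercules') is answered by byNameComposite and g.V().has('age',30).has('name','hercules') by byNameAndAgeComposite. g.V().has('age',30) cannot use either index, because no index is defined on age alone. g.V().has('name','hercules').has('age',inside(20,50)) cannot use byNameAndAgeComposite either, since composite indexes support only exact matches and not range conditions; at best byNameComposite serves the name condition and the age range is then filtered in memory.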
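
The matching rule can be made concrete with a small Python model. This is only a sketch of the rule as described above, not JanusGraph code: the query representation (a dict of key -> (operator, value)) and the function names are assumptions made for illustration.

```python
# Toy model of composite-index eligibility; not part of JanusGraph.
# A query is a dict mapping property key -> (operator, value).

def composite_index_applies(index_keys, query):
    """True when every key of the index has an equality condition in the query."""
    return all(key in query and query[key][0] == "eq" for key in index_keys)


def usable_indexes(indexes, query):
    """Sorted names of all composite indexes the query could use."""
    return sorted(
        name for name, keys in indexes.items()
        if composite_index_applies(keys, query)
    )


# The two composite indexes defined in the example above.
INDEXES = {
    "byNameComposite": ["name"],
    "byNameAndAgeComposite": ["name", "age"],
}

by_name = {"name": ("eq", "hercules")}
by_age_and_name = {"age": ("eq", 30), "name": ("eq", "hercules")}
by_age = {"age": ("eq", 30)}
by_name_and_age_range = {"name": ("eq", "hercules"), "age": ("inside", (20, 50))}

for query in (by_name, by_age_and_name, by_age, by_name_and_age_range):
    print(usable_indexes(INDEXES, query))
```

In this model the range query still qualifies for the single-key index, because its name condition is an equality; the two-key index is ruled out by the range condition on age.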

Index Uniqueness

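A composite index can also enforce property uniqueness in the graph. If a composite graph index is defined as unique(), at most one vertex or edge can exist for any combination of values of the indexed keys.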

graph.tx().rollback()//Never create new indexes while a transaction is active

mgmt = graph.openManagement()

name = mgmt.getPropertyKey('name')

mgmt.buildIndex('byNameUnique',Vertex.class).addKey(name).unique().buildCompositeIndex()

mgmt.commit()

//Wait for the index to become available

mgmt.awaitGraphIndexStatus(graph,'byNameUnique').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getGraphIndex("byNameUnique"),SchemaAction.REINDEX).get()

mgmt.commit()

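Note: on a storage backend configured for eventual consistency, the consistency of the index must be explicitly set to enable locking for uniqueness to be enforced.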
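
To illustrate what "at most one element per value combination" means, here is a minimal in-memory sketch in Python. It is not how JanusGraph implements unique indexes (it ignores storage, transactions, and locking entirely); the class name and the rule that elements missing an indexed key are simply skipped are assumptions of this toy model.

```python
# Toy stand-in for a composite index built with unique(); not JanusGraph code.

class UniqueCompositeIndex:
    def __init__(self, keys):
        self.keys = tuple(keys)
        self.entries = {}  # tuple of indexed values -> element id

    def add(self, element_id, properties):
        """Register an element; reject a second element with the same values."""
        # Assumption of this model: elements lacking an indexed key are not indexed.
        if any(key not in properties for key in self.keys):
            return
        combo = tuple(properties[key] for key in self.keys)
        owner = self.entries.get(combo)
        if owner is not None and owner != element_id:
            raise ValueError(
                f"uniqueness violated: {combo} already belongs to {owner}"
            )
        self.entries[combo] = element_id


index = UniqueCompositeIndex(["name"])
index.add("v1", {"name": "hercules"})
try:
    index.add("v2", {"name": "hercules"})  # same name on a different vertex
except ValueError as error:
    print("rejected:", error)
```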

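2: Mixed Index

A mixed index retrieves vertices or edges by any combination of its keys. Mixed indexes are more flexible to use and support condition predicates beyond equality, such as range queries. On the other hand, a mixed index is slower than a composite index for most equality lookups.

Unlike composite indexes, mixed indexes require an index backend to be configured. JanusGraph can support multiple index backends in a single installation, and each one must be identified by a unique name in the JanusGraph configuration, called the indexing backend name.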

graph.tx().rollback()//Never create new indexes while a transaction is active

mgmt = graph.openManagement()

name = mgmt.getPropertyKey('name')

age = mgmt.getPropertyKey('age')

mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name).addKey(age).buildMixedIndex("search")

mgmt.commit()

//Wait for the index to become available

mgmt.awaitGraphIndexStatus(graph,'nameAndAge').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"),SchemaAction.REINDEX).get()

mgmt.commit()

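The code above defines an index named nameAndAge over the name and age properties and assigns it to the index backend "search". The name corresponds to the configuration prefix index.search.backend; if the backend were named solrsearch, the setting would be index.solrsearch.backend.

The following definition makes text search the default search behavior for the name key:

mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name,Mapping.TEXT.asParameter()).addKey(age).buildMixedIndex("search")

For details, see Chapter 21, Index Parameters and Full-Text Search.

In queries, a mixed index supports range conditions and any combination of the indexed keys (any subset of the keys can trigger the index), not just equality: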

g.V().has('name', textContains('hercules')).has('age', inside(20,50))

g.V().has('name', textContains('hercules'))

g.V().has('age', lt(50))

Mixed indexes support full-text search, range search, geo search, and other predicates; see Chapter 20, Search Predicates and Data Types.

Note: unlike composite indexes, mixed indexes do not support uniqueness.
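
The contrast with composite indexes can be sketched in Python: a mixed index can answer whichever conditions fall on its keys, whatever the predicate, and anything else is left for in-memory filtering. This is a toy model under that assumption, not JanusGraph's query planner; the function name and the height key are hypothetical.

```python
# Toy model only: splits a query's conditions into those a mixed index
# could answer and those left for in-memory filtering. Not JanusGraph code.

def split_by_mixed_index(index_keys, query):
    """Return (answered_by_index, filtered_in_memory) for a {key: (op, value)} query."""
    indexed = {key: cond for key, cond in query.items() if key in index_keys}
    leftover = {key: cond for key, cond in query.items() if key not in index_keys}
    return indexed, leftover


NAME_AND_AGE = {"name", "age"}  # keys of the nameAndAge index above

queries = [
    {"name": ("textContains", "hercules"), "age": ("inside", (20, 50))},
    {"name": ("textContains", "hercules")},
    {"age": ("lt", 50)},
    {"age": ("lt", 50), "height": ("gt", 180)},  # 'height' is a hypothetical unindexed key
]

for query in queries:
    indexed, leftover = split_by_mixed_index(NAME_AND_AGE, query)
    print("index:", sorted(indexed), "in memory:", sorted(leftover))
```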

Adding Property Keys

Property keys can be added to an existing mixed index, after which they can be used in query conditions.

graph.tx().rollback()//Never create new indexes while a transaction is active

mgmt = graph.openManagement()

//Create a new property key

location = mgmt.makePropertyKey('location').dataType(Geoshape.class).make()

nameAndAge = mgmt.getGraphIndex('nameAndAge')

//Add the new key to the existing index

mgmt.addIndexKey(nameAndAge, location)

mgmt.commit()

//Wait for the index to become available

mgmt.awaitGraphIndexStatus(graph,'nameAndAge').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"),SchemaAction.REINDEX).get()

mgmt.commit()

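If the added key was defined in the same management transaction, it is available for querying immediately. If the property key is already in use, a reindex procedure has to run so that the index contains all previously added elements, and the key cannot be used in index queries until that procedure completes.

Mapping Parameters

When a property key is added to a mixed index, whether through the index builder or through addIndexKey, a list of parameters can be specified to control how the property value is mapped into the index backend. See the mapping parameters overview.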

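3: Ordering

The order in which a graph query returns its results can be set with order().by(), which takes two arguments:

the name of the property key to order by
the sort direction, incr (ascending) or decr (descending); newer TinkerPop releases name these asc and desc

For example:

g.V().has('name', textContains('hercules')).order().by('age', decr).limit(10)

This returns the first 10 vertices whose name contains 'hercules', ordered by age in descending order.

Keep the following in mind when ordering results:

Composite graph indexes do not natively support ordering. All results are loaded into memory and sorted there, which is very expensive for large result sets.
Mixed graph indexes support ordering natively, but the property key used for ordering has to be added to the mixed index beforehand. If the ordering key is not part of the index, the whole result set is loaded into memory for sorting.

4: Label Constraint

Often it is desirable to index only the vertices or edges that carry a particular label rather than every element in the graph. For example, to index only vertices with the label god, use the indexOnly method of the index builder, which restricts the index to vertices or edges with that label:

graph.tx().rollback()//Never create new indexes while a transaction is active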

mgmt = graph.openManagement()

name = mgmt.getPropertyKey('name')

god = mgmt.getVertexLabel('god')

//Index only vertices with the label god

mgmt.buildIndex('byNameAndLabel',Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex()

mgmt.commit()

//Wait for the index to become available

mgmt.awaitGraphIndexStatus(graph,'byNameAndLabel').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getGraphIndex("byNameAndLabel"),SchemaAction.REINDEX).get()

mgmt.commit()

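Label constraints apply to mixed indexes in the same way. When a composite index with a label constraint is defined as unique, the uniqueness constraint applies only to properties on vertices or edges with that label.

5: Composite Index vs. Mixed Index

1. Use a composite index for exact-match lookups. Composite indexes need no external indexing system and usually perform better.

As an exception, use a mixed index for exact matches when the number of distinct values is small (such as the 12 months of the year) or when a single value is associated with many elements in the graph, that is, when selectivity is low.

2. Use a mixed index for range, full-text, and geo queries. A mixed index can also speed up order().by() queries.

III: Vertex-centric Indexes

Vertex-centric indexes are local index structures built individually per vertex. In large graphs a vertex can have thousands of incident edges, and traversing through such a vertex is slow because all of its edges have to be retrieved and then filtered in memory. A vertex-centric index speeds up these traversals by using the local index structure to retrieve only the edges that match. A multi-key vertex-centric index follows the leftmost-prefix rule.

For example: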

h = g.V().has('name','hercules').next()

g.V(h).outE('battled').has('time', inside(10,20)).inV()

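Without a vertex-centric index, this query has to retrieve all battled edges and scan them for matching records, which is very slow when there are many edges.

Building a vertex-centric index speeds up the query:

graph.tx().rollback()//Never create new indexes while a transaction is active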

mgmt = graph.openManagement()

//Look up the property key

time = mgmt.getPropertyKey('time')

//Look up the edge label

battled = mgmt.getEdgeLabel('battled')

//Create the vertex-centric index

mgmt.buildEdgeIndex(battled,'battlesByTime',Direction.BOTH,Order.decr, time)

mgmt.commit()

//Wait for the index to become available

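mgmt.awaitRelationIndexStatus(graph,'battlesByTime','battled').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getRelationIndex(battled,"battlesByTime"),SchemaAction.REINDEX).get()

mgmt.commit()

The code above builds an index on battled edges in both directions, sorted by time in descending order. The first argument to buildEdgeIndex() is the label of the edges to index, the second is the index name, and the third is the edge direction in which the index is built. BOTH means the index can be used for traversals in either the IN or the OUT direction; building it for only one direction halves the storage and maintenance cost. The last two arguments are the sort order of the index and the property keys to index by. More than one property key can be given, and the sort order defaults to ascending.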

graph.tx().rollback()//Never create new indexes while a transaction is active

mgmt = graph.openManagement()

time = mgmt.getPropertyKey('time')

rating = mgmt.makePropertyKey('rating').dataType(Double.class).make()

battled = mgmt.getEdgeLabel('battled')

mgmt.buildEdgeIndex(battled,'battlesByRatingAndTime',Direction.OUT,Order.decr, rating, time)

mgmt.commit()

//Wait for the index to become available

mgmt.awaitRelationIndexStatus(graph,'battlesByRatingAndTime','battled').call()

//Reindex the existing data

mgmt = graph.openManagement()

mgmt.updateIndex(mgmt.getRelationIndex(battled,'battlesByRatingAndTime'),SchemaAction.REINDEX).get()

mgmt.commit()

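The code above builds the index battlesByRatingAndTime, which consists of rating and time. The order of the property keys in the index matters: a query can use the index only for constraints that follow the key order of its definition (the leftmost-prefix rule).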

h = g.V().has('name','hercules').next()

g.V(h).outE('battled').property('rating',5.0)//Add some rating properties

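1: g.V(h).outE('battled').has('rating', gt(3.0)).inV()

2: g.V(h).outE('battled').has('rating',5.0).has('time', inside(10,50)).inV()

3: g.V(h).outE('battled').has('time', inside(10,50)).inV()

Of the queries above, only 1 and 2 can use the battlesByRatingAndTime index. Query 3 constrains only time, which does not match the index's key order of rating first and then time. Several different indexes can be built on the same label to support different traversals, and JanusGraph automatically picks the most efficient one. Vertex-centric indexes support only equality and range/interval constraints.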
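
The leftmost-prefix rule for the three queries can be checked with a short Python model. It is a sketch of the rule as stated in this section (equality on leading keys, optionally followed by one range condition), not JanusGraph's index selection logic; the function name, operator names, and query encoding are assumptions.

```python
# Toy model of the leftmost-prefix rule for a multi-key vertex-centric index.
# Not JanusGraph code; queries are dicts of key -> (operator, value).

RANGE_OPS = {"gt", "gte", "lt", "lte", "inside"}


def usable_prefix(index_keys, query):
    """Leading index keys that the query's constraints can make use of."""
    prefix = []
    for key in index_keys:
        if key not in query:
            break  # a gap in the key order ends the usable prefix
        operator = query[key][0]
        if operator == "eq":
            prefix.append(key)  # equality lets the next key be considered
            continue
        if operator in RANGE_OPS:
            prefix.append(key)  # a range can be used, but nothing after it
        break
    return prefix


BATTLES_BY_RATING_AND_TIME = ["rating", "time"]

query_1 = {"rating": ("gt", 3.0)}
query_2 = {"rating": ("eq", 5.0), "time": ("inside", (10, 50))}
query_3 = {"time": ("inside", (10, 50))}

for query in (query_1, query_2, query_3):
    print(usable_prefix(BATTLES_BY_RATING_AND_TIME, query))
```

An empty prefix, as for query 3, means the index cannot help; that query would need an index whose first key is time, such as battlesByTime.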

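Note: a property key used in a vertex-centric index must have an explicitly defined data type (that is, not Object.class) that supports a native sort order. The original note adds that floating-point values must use JanusGraph's Decimal or Precision data types; the example above, however, declares rating as Double.class, so check this requirement against the documentation of the release you run.

An index built on an edge label that was defined in the same management transaction is available immediately. If the edge label is already in use, a reindex procedure has to run, and the index cannot be used until it completes.

Note: JanusGraph automatically builds a vertex-centric index for each property key of each edge label (whether it also builds multi-key vertex-centric indexes automatically is an open question here), so such queries stay efficient even across thousands of edges.

Vertex-centric indexes cannot speed up unconstrained traversals, which visit all incident edges. Such traversals get slower as the number of edges grows. They can often be rewritten as constrained traversals to improve performance.

IV: Ordering Traversals

The queries below use local() and limit() to retrieve an ordered subset of the edges at each step of the traversal. local() applies the enclosed steps to each incoming element separately, so an order() inside it sorts the edges of each vertex on their own, not the edges of all vertices together.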

h = g.V().has('name','hercules').next()

g.V(h).local(outE('battled').order().by('time', decr).limit(10)).inV().values('name')

g.V(h).local(outE('battled').has('rating',5.0).order().by('time', decr).limit(10)).values('place')

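These queries are very efficient when the ordering key and sort direction match those of a vertex-centric index.

The first query asks for the names of the 10 monsters Hercules battled most recently. The second asks for the places of his 10 most recent battles rated 5 stars. In both cases the number of results is limited.

Vertex-centric indexes also serve such queries when the ordering key matches the key of the index and the requested order matches the order the index was defined with. The battlesByTime index answers the first query and battlesByRatingAndTime answers the second. Note that battlesByRatingAndTime cannot answer the first query, because an equality constraint on rating must be present for the second key of the index to take effect.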
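
The difference between ordering inside local() and ordering globally is easy to miss, so here is a plain-Python sketch of the two behaviors. It only mimics the semantics on in-memory lists and does not use Gremlin; the vertex names, time values, and function names are made up for illustration.

```python
# Plain-Python illustration of per-vertex (local) versus global order + limit.
# The data and function names are hypothetical; this is not Gremlin code.

def top_edges_per_vertex(edges_by_vertex, key, limit):
    """Sort and truncate each vertex's edges separately, as local() scopes it."""
    return {
        vertex: sorted(edges, key=lambda edge: edge[key], reverse=True)[:limit]
        for vertex, edges in edges_by_vertex.items()
    }


def top_edges_overall(edges_by_vertex, key, limit):
    """Sort and truncate the edges of all vertices together."""
    all_edges = [edge for edges in edges_by_vertex.values() for edge in edges]
    return sorted(all_edges, key=lambda edge: edge[key], reverse=True)[:limit]


battled = {
    "a": [{"time": 1}, {"time": 2}, {"time": 12}],
    "b": [{"time": 5}, {"time": 20}],
}

print(top_edges_per_vertex(battled, "time", 2))
print(top_edges_overall(battled, "time", 2))
```

With a limit of 2, the per-vertex version keeps two edges for each vertex, while the global version keeps only the two latest edges across both vertices.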

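Note: ordered vertex queries are a JanusGraph extension to Gremlin, which is why the syntax is verbose and why a _() step is needed to convert the JanusGraph result back into a Gremlin pipeline.

If you repost this article, please include a link to the original. Thank you: https://blog.csdn.net/csdn___lyy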
