Trying a few simple queries
1. Return all airports in the "airports" collection:
FOR airport IN airports
RETURN airport
2. Return only the airports in California:
FOR airport IN airports
FILTER airport.state == "CA"
RETURN airport
3. Return the number of airports per state:
FOR airport IN airports
COLLECT state = airport.state
WITH COUNT INTO counter
RETURN {state, counter}
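The grouping that COLLECT … WITH COUNT performs can be sketched in plain Python; the airport documents below are made-up sample data, not the real dataset:

```python
from collections import Counter

# Toy stand-in for the "airports" collection (hypothetical sample documents).
airports = [
    {"name": "LAX", "state": "CA"},
    {"name": "SFO", "state": "CA"},
    {"name": "JFK", "state": "NY"},
]

# Equivalent of: COLLECT state = airport.state WITH COUNT INTO counter
counts = Counter(a["state"] for a in airports)
result = [{"state": s, "counter": c} for s, c in sorted(counts.items())]
print(result)  # [{'state': 'CA', 'counter': 2}, {'state': 'NY', 'counter': 1}]
```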
In the code examples above, all keywords such as COLLECT, WITH and RETURN are written in uppercase, but that is only a convention. You can also write the keywords in lowercase or mixed case. Variable names, attribute names and collection names, however, are case-sensitive.
Graph queries
1. Return all airports directly reachable from Los Angeles International Airport (LAX):
FOR airport IN OUTBOUND 'airports/LAX' flights
RETURN DISTINCT airport
2. Return 10 flights departing from LAX together with their destination airports:
FOR airport, flight IN OUTBOUND 'airports/LAX' flights
LIMIT 10
RETURN {airport, flight}
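As a rough illustration, emitting a (vertex, edge) pair per outgoing edge with a LIMIT can be simulated in Python; the flight edges below are invented sample data:

```python
from itertools import islice

# Hypothetical "flights" edges out of LAX; each edge carries its own attributes.
flights = [
    {"_from": "airports/LAX", "_to": "airports/SFO", "FlightNum": 101},
    {"_from": "airports/LAX", "_to": "airports/JFK", "FlightNum": 202},
    {"_from": "airports/LAX", "_to": "airports/SEA", "FlightNum": 303},
]

# Equivalent of: FOR airport, flight IN OUTBOUND 'airports/LAX' flights LIMIT 10 ...
def outbound(start, edges, limit):
    pairs = ({"airport": e["_to"], "flight": e["FlightNum"]}
             for e in edges if e["_from"] == start)
    return list(islice(pairs, limit))

result = outbound("airports/LAX", flights, 10)
print(result)  # three pairs, since the sample has only three outgoing edges
```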
Graph traversal
For traversals with a minimum depth greater than 2, there are two strategies to choose from:
- Depth-first (default): follow the edges from the start vertex along one path until the last vertex on that path is reached, or until the maximum traversal depth is reached, then walk down the other paths.
- Breadth-first (optional): follow all edges from the start vertex to the next level, then follow all edges of those neighbours for another level, and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.
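The two strategies can be sketched as follows; the graph is made-up sample data, and the functions return the visit order of vertices up to a maximum depth:

```python
from collections import deque

# Hypothetical edge lists, mimicking a tiny "flights" graph.
graph = {
    "LAX": ["SFO", "JFK"],
    "SFO": ["SEA"],
    "JFK": ["BOS"],
    "SEA": [],
    "BOS": [],
}

def dfs(start, max_depth):
    """Depth-first: follow one path to its end (or max depth) before backtracking."""
    order = []
    def walk(node, depth):
        if depth > max_depth:
            return
        order.append(node)
        for nxt in graph[node]:
            walk(nxt, depth + 1)
    for nxt in graph[start]:
        walk(nxt, 1)
    return order

def bfs(start, max_depth):
    """Breadth-first: visit all level-1 neighbours, then level 2, and so on."""
    order = []
    queue = deque((nxt, 1) for nxt in graph[start])
    while queue:
        node, depth = queue.popleft()
        if depth > max_depth:
            continue
        order.append(node)
        queue.extend((nxt, depth + 1) for nxt in graph[node])
    return order

print(dfs("LAX", 2))  # ['SFO', 'SEA', 'JFK', 'BOS']
print(bfs("LAX", 2))  # ['SFO', 'JFK', 'SEA', 'BOS']
```

Note how depth-first exhausts the path through SFO before touching JFK, while breadth-first visits both level-1 neighbours before descending.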
Return all airports directly reachable from LAX:
FOR airport IN OUTBOUND 'airports/LAX' flights
OPTIONS {bfs: true, uniqueVertices: 'global'}
RETURN airport
Compare the execution time with the previous query, which returns the same airports:
FOR airport IN OUTBOUND 'airports/LAX' flights
RETURN DISTINCT airport
FOR airport IN OUTBOUND 'airports/LAX' flights
OPTIONS {bfs: true, uniqueVertices: 'global'}
RETURN DISTINCT airport
Comparing the two runs, you will see a significant performance improvement. In other words, in scenarios like this one, breadth-first traversal speeds up the query.
The LET keyword in AQL:
The result of a simple expression, or of an entire subquery, can be stored in a variable. To declare a variable, use the LET keyword followed by the variable name, an equals sign, and the expression. If the expression is a subquery, it must be enclosed in parentheses.
In the example below, the hour and minute of the departure time are computed up front and stored in the variables h and m.
FOR f IN flights
FILTER f._from == 'airports/BIS'
LIMIT 100
LET h = FLOOR(f.DepTime / 100)
LET m = f.DepTime % 100
RETURN {
year: f.Year,
month: f.Month,
day: f.DayofMonth,
time: f.DepTime,
iso: DATE_ISO8601(f.Year, f.Month, f.DayofMonth, h, m)
}
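The hour/minute arithmetic and the ISO timestamp construction above can be checked in Python; the DepTime value is assumed to be an integer in HHMM form, and the sample date is invented:

```python
from datetime import datetime

# DepTime stored as an integer HHMM, e.g. 1436 means 14:36 (assumed encoding).
dep_time = 1436

# Equivalent of: LET h = FLOOR(f.DepTime / 100)  and  LET m = f.DepTime % 100
h = dep_time // 100
m = dep_time % 100

# Rough equivalent of DATE_ISO8601(year, month, day, h, m)
iso = datetime(1987, 10, 1, h, m).isoformat()
print(h, m, iso)  # 14 36 1987-10-01T14:36:00
```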
Shortest path (SHORTEST_PATH)
A shortest-path query finds the connection between two given documents that has the smallest number of edges.
Find the shortest path between the airports BIS and JFK:
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
Running the query through the explainer shows how it is executed: instead of a plain FOR-loop traversal there is a ShortestPathNode, along with information such as the startnode, the indexes used, and the optimization rules applied:
Query String (81 chars, cacheable : true):
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 ShortestPathNode 0 - FOR v /* vertex */ IN OUTBOUND SHORTEST_PATH 'airports/BIS' /* startnode */ TO 'airports/JFK' /* targetnode */ flights
3 ReturnNode 0 - RETURN v
Indexes used:
none
Shortest paths on graphs:
Id Vertex collections Edge collections
2 flights
Optimization rules applied:
none
Return the minimum number of flights from BIS to JFK:
LET airports = (
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
)
RETURN LENGTH(airports) - 1
The LENGTH function returns the number of records in a result set; here it gives the number of vertices on the path, so subtracting 1 yields the shortest-path depth, i.e. the number of flights.
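The shortest-path logic can be sketched as a plain breadth-first search in Python; the graph below is made-up sample data standing in for the flights edges:

```python
from collections import deque

# Hypothetical flight connections; the real query runs over the "flights" edges.
graph = {
    "BIS": ["MSP", "DEN"],
    "MSP": ["JFK"],
    "DEN": ["ORD"],
    "ORD": ["JFK"],
    "JFK": [],
}

def shortest_path(start, target):
    """BFS returns the vertex list of a path with the fewest edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path("BIS", "JFK")
print(path)           # ['BIS', 'MSP', 'JFK']
print(len(path) - 1)  # 2: the path has one more vertex than edges
```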
Pattern matching (Pattern Matching)
Goal: find the path between BIS and JFK with the shortest travel time.
STEP 1
Filter all paths from BIS to JFK. Since the shortest path found above has a depth of 2, we can use "IN 2 OUTBOUND" directly:
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
LIMIT 5
RETURN p
STEP 2
Restrict to paths within a single day, using January 1st as an example:
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
LIMIT 5
RETURN p
STEP 3
Use the DATE_DIFF() function to compute the difference between departure and arrival time, then sort the results in ascending order. A DATE_ADD() filter additionally requires at least a 20-minute connection between the two legs:
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
FILTER DATE_ADD(p.edges[0].ArrTimeUTC, 20, 'minutes') < p.edges[1].DepTimeUTC
LET flightTime = DATE_DIFF(p.edges[0].DepTimeUTC, p.edges[1].ArrTimeUTC, 'i')
SORT flightTime ASC
LIMIT 5
RETURN { flight: p, time: flightTime }
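The three filters and the travel-time computation can be mirrored in Python; the two-leg itineraries below are invented sample data, with one connection that is too short and one that is feasible:

```python
from datetime import datetime, timedelta

# Hypothetical two-leg itineraries BIS -> X -> JFK; each leg mirrors a "flights" edge.
paths = [
    {"edges": [  # itinerary 1: only a 10-minute connection
        {"Month": 1, "DayofMonth": 1,
         "DepTimeUTC": datetime(2008, 1, 1, 13, 0), "ArrTimeUTC": datetime(2008, 1, 1, 15, 0)},
        {"Month": 1, "DayofMonth": 1,
         "DepTimeUTC": datetime(2008, 1, 1, 15, 10), "ArrTimeUTC": datetime(2008, 1, 1, 18, 0)},
    ]},
    {"edges": [  # itinerary 2: a 30-minute connection
        {"Month": 1, "DayofMonth": 1,
         "DepTimeUTC": datetime(2008, 1, 1, 8, 0), "ArrTimeUTC": datetime(2008, 1, 1, 10, 0)},
        {"Month": 1, "DayofMonth": 1,
         "DepTimeUTC": datetime(2008, 1, 1, 10, 30), "ArrTimeUTC": datetime(2008, 1, 1, 13, 0)},
    ]},
]

results = []
for p in paths:
    legs = p["edges"]
    # FILTER p.edges[*].Month ALL == 1  and  p.edges[*].DayofMonth ALL == 1
    if not all(e["Month"] == 1 and e["DayofMonth"] == 1 for e in legs):
        continue
    # FILTER DATE_ADD(first arrival, 20, 'minutes') < second departure
    if legs[0]["ArrTimeUTC"] + timedelta(minutes=20) >= legs[1]["DepTimeUTC"]:
        continue
    # LET flightTime = DATE_DIFF(first departure, last arrival, 'i')  -> minutes
    minutes = int((legs[1]["ArrTimeUTC"] - legs[0]["DepTimeUTC"]).total_seconds() // 60)
    results.append(minutes)

results.sort()  # SORT flightTime ASC
print(results)  # [300]: only the second itinerary survives the 20-minute rule
```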
Let's look at the nodes in this AQL query's execution plan:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 TraversalNode 1002001 - FOR v /* vertex */, p /* paths */ IN 2..2 /* min..maxPathDepth */ OUTBOUND 'airports/BIS' /* startnode */ flights
3 CalculationNode 1002001 - LET #10 = ((v.`_id` == "airports/JFK") && (p.`edges`[1].`DepTimeUTC` > DATE_ADD(p.`edges`[0].`ArrTimeUTC`, 20, "minutes"))) /* simple expression */
4 FilterNode 1002001 - FILTER #10
11 CalculationNode 1002001 - LET flightTime = DATE_DIFF(p.`edges`[0].`DepTimeUTC`, p.`edges`[1].`ArrTimeUTC`, "i") /* simple expression */
12 SortNode 1002001 - SORT flightTime ASC /* sorting strategy: constrained heap */
13 LimitNode 5 - LIMIT 0, 5
14 CalculationNode 5 - LET #18 = { "flight" : p, "time" : flightTime } /* simple expression */
15 ReturnNode 5 - RETURN #18
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
2 edge edge flights false false 0.10 % [ `_from` ] base OUTBOUND
Functions used:
Name Deterministic Cacheable Uses V8
DATE_ADD true true false
DATE_DIFF true true false
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
2 2..2 flights uniqueVertices: none, uniqueEdges: path FILTER ((p.`edges`[*].`Month` all == 1) && (p.`edges`[*].`DayofMonth` all == 1))
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-filters-up
3 move-calculations-up-2
4 move-filters-up-2
5 optimize-traversals
6 remove-filter-covered-by-traversal
7 remove-unnecessary-calculations-2
8 fuse-filters
9 sort-limit
Optimization
In this example the query has to traverse a very large number of edges, some of which do not need to be visited at all. We will optimize this with a vertex-centric index.
Try adding a hash index on these three fields:
_from, Month, DayofMonth
In the web UI, it turns out that hash and skiplist indexes cannot be added. The official explanation is as follows:
The hash index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type hash is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.
The skiplist index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type skiplist is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.
The official documentation on persistent indexes adds:
The persistent index type is deprecated for the MMFiles storage engine. Use the RocksDB storage engine instead, where all indexes are persistent.
The index types hash, skiplist and persistent are equivalent when using the RocksDB storage engine. The types hash and skiplist are still allowed for backward compatibility in the APIs, but the web interface does not offer these types anymore.
In the end, after adding a persistent index on these three fields through the web UI and re-running the STEP 3 query, performance improved noticeably, by close to 30%.
Why it works:
Without a vertex-centric index, the traversal has to follow all outgoing edges of the departure airport and then check whether each of them satisfies our conditions (on the right day, arriving at the desired destination, with a feasible connection).
The new index allows a fast lookup of the outgoing edges of an airport (the _from attribute) on a given day (the Month and DayofMonth attributes), which removes the need to fetch and filter all the edges on other days. It reduces the number of edges that have to be checked compared with the original edge index, and saves a considerable amount of time.
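The difference can be sketched as a full scan versus a keyed lookup; the edges below are invented sample data, and the dict keyed by (_from, Month, DayofMonth) is only a rough model of the composite index:

```python
# Hypothetical edges of the "flights" collection (only the indexed fields shown).
edges = [
    {"_from": "airports/BIS", "Month": 1, "DayofMonth": 1, "flight": "BIS->MSP"},
    {"_from": "airports/BIS", "Month": 1, "DayofMonth": 2, "flight": "BIS->DEN"},
    {"_from": "airports/BIS", "Month": 2, "DayofMonth": 1, "flight": "BIS->ORD"},
    {"_from": "airports/LAX", "Month": 1, "DayofMonth": 1, "flight": "LAX->SFO"},
]

# Without the index: scan every edge, then filter by airport and date.
scan = [e for e in edges
        if e["_from"] == "airports/BIS" and e["Month"] == 1 and e["DayofMonth"] == 1]

# With an index on (_from, Month, DayofMonth): one keyed lookup,
# modelled here as a dict from the composite key to the matching edges.
index = {}
for e in edges:
    index.setdefault((e["_from"], e["Month"], e["DayofMonth"]), []).append(e)

lookup = index.get(("airports/BIS", 1, 1), [])
print([e["flight"] for e in scan])  # ['BIS->MSP']
print(scan == lookup)               # True: same result, without the full scan
```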
Here is the official documentation on how hash indexes are used:
A hash index can be used to quickly find documents with specific attribute values. The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
A hash index can be created on one or multiple document attributes. A hash index will only be used by a query if all index attributes are present in the search condition, and if all attributes are compared using the equality (==) operator. Hash indexes are used from within AQL and several query functions, e.g. byExample, firstExample etc.
Hash indexes can optionally be declared unique, then disallowing saving the same value(s) in the indexed attribute(s). Hash indexes can optionally be sparse.
The different types of hash indexes have the following characteristics:
- unique hash index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.
  This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes. The unique option can also be used to ensure that no duplicate edges are created, by adding a combined index for the fields _from and _to to an edge collection.
- unique, sparse hash index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.
- non-unique hash index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. Duplicate key values can occur and do not lead to unique constraint violations.
- non-unique, sparse hash index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.
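The insert behaviour of the four variants can be modelled with a small Python sketch (a simplified model, not ArangoDB's actual implementation; a missing attribute is treated like null, as the documentation describes):

```python
def try_insert(index, doc, attr, unique, sparse):
    """Return True if doc may be added under the given index settings."""
    value = doc.get(attr)      # a missing attribute behaves like null (None)
    if sparse and value is None:
        return True            # sparse indexes skip null / missing values entirely
    if unique and value in index:
        return False           # unique constraint violation
    index.setdefault(value, []).append(doc)
    return True

# unique, non-sparse: null is indexed too, so only one null value is allowed
idx = {}
assert try_insert(idx, {"key": "a"}, "key", unique=True, sparse=False)
assert not try_insert(idx, {"key": "a"}, "key", unique=True, sparse=False)  # duplicate
assert try_insert(idx, {}, "key", unique=True, sparse=False)       # first null ok
assert not try_insert(idx, {}, "key", unique=True, sparse=False)   # second null rejected

# unique, sparse: documents without the attribute are not indexed at all
idx = {}
assert try_insert(idx, {}, "key", unique=True, sparse=True)
assert try_insert(idx, {}, "key", unique=True, sparse=True)        # still fine
print("all checks passed")
```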
References
https://www.arangodb.com/docs/stable/indexing-index-basics.html
https://www.cnblogs.com/minglex/p/9383849.html