ArangoDB AQL: Simple Operations

Let's try a few simple queries.

1. Return all airports in the "airports" collection:

FOR airport IN airports
    RETURN airport 

2. Return only the airports in California:

FOR airport IN airports
    FILTER airport.state == "CA"
    RETURN airport

3. Return the number of airports per state:

FOR airport IN airports
    COLLECT state = airport.state
    WITH COUNT INTO counter
    RETURN {state, counter}
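COLLECT ... WITH COUNT INTO is essentially a group-by count. The same aggregation can be sketched in plain Python (the sample documents below are invented):

```python
from collections import Counter

# Hypothetical airport documents with a "state" attribute.
airports = [{"state": "CA"}, {"state": "CA"}, {"state": "NY"}]

# Group by state and count, mirroring COLLECT ... WITH COUNT INTO.
counts = Counter(a["state"] for a in airports)
result = [{"state": s, "counter": c} for s, c in sorted(counts.items())]
print(result)  # [{'state': 'CA', 'counter': 2}, {'state': 'NY', 'counter': 1}]
```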

In the code examples above, all keywords such as COLLECT, WITH, and RETURN are written in uppercase, but this is only a convention. You can also write keywords in lowercase or mixed case. Variable names, attribute names, and collection names, however, are case-sensitive.

Graph Queries

1. Return all airports reachable by a direct flight from Los Angeles International Airport (LAX):

FOR airport IN OUTBOUND 'airports/LAX' flights
    RETURN DISTINCT airport

2. Return 10 flights departing from LAX together with their destination airports:

FOR airport, flight IN OUTBOUND 'airports/LAX' flights
    LIMIT 10
    RETURN {airport, flight}

Graph Traversal

For traversals that go deeper than a single level, there are two traversal orders to choose from:

  • Depth-first (default): follow the edges from the start vertex down one path until the last vertex on that path is reached, or until the maximum traversal depth is hit, then backtrack and walk the other paths.

  • Breadth-first (optional): follow all edges from the start vertex to the next level, then follow all edges of those neighbors one level further, and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.
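The difference between the two orders can be sketched outside of AQL. A minimal Python illustration on a toy edge list (the graph is made up):

```python
from collections import deque

# Toy directed graph: start vertex "A" with two branches.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def dfs_order(start):
    """Depth-first: follow one path to its end before trying siblings."""
    order, stack = [], [start]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(edges[v]))  # keep left-to-right neighbor order
    return order

def bfs_order(start):
    """Breadth-first: visit all neighbors level by level."""
    order, queue = [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(edges[v])
    return order

print(dfs_order("A"))  # ['A', 'B', 'D', 'C', 'E']
print(bfs_order("A"))  # ['A', 'B', 'C', 'D', 'E']
```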

Return all airports directly reachable from LAX:

FOR airport IN OUTBOUND 'airports/LAX' flights
    OPTIONS {bfs: true, uniqueVertices: 'global'}
    RETURN airport

Return the same airports, and compare the execution time with the previous query:

FOR airport IN OUTBOUND 'airports/LAX' flights
    RETURN DISTINCT airport
FOR airport IN OUTBOUND 'airports/LAX' flights
    OPTIONS {bfs: true, uniqueVertices: 'global'}
    RETURN DISTINCT airport

Comparing the two runs, you will see a significant performance improvement. In other words, breadth-first traversal can speed things up considerably in certain scenarios.

The LET keyword in AQL:

The results of simple expressions, as well as of entire subqueries, can be stored in variables. To declare a variable, use the LET keyword followed by the variable name, an equals sign, and the expression. If the expression is a subquery, it must be enclosed in parentheses.

In the example below, the hour and minute of the departure time are computed up front and stored in the variables h and m.

FOR f IN flights
    FILTER f._from == 'airports/BIS'
    LIMIT 100
    LET h = FLOOR(f.DepTime / 100)
    LET m = f.DepTime % 100
    RETURN {
        year: f.Year,
        month: f.Month,
        day: f.DayofMonth,
        time: f.DepTime,
        iso: DATE_ISO8601(f.Year, f.Month, f.DayofMonth, h, m)
    }
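The h/m decomposition above is plain integer arithmetic: DepTime stores clock times as numbers like 1437 for 14:37. A quick Python check of the same FLOOR/modulo logic (the sample values are made up):

```python
import math

def split_dep_time(dep_time):
    """Mirror the AQL expressions: h = FLOOR(DepTime / 100), m = DepTime % 100."""
    h = math.floor(dep_time / 100)
    m = dep_time % 100
    return h, m

print(split_dep_time(1437))  # (14, 37)
print(split_dep_time(905))   # (9, 5)
```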

Shortest Path (SHORTEST_PATH)

A shortest-path query finds the connection between two given documents that uses the fewest edges.

Find the shortest path between the airports BIS and JFK:

FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v

Running this through the query explainer, we can see a plain shortest-path traversal: the startnode appears only as a comment, no indexes are hit, and no optimizer rules are applied:

Query String (81 chars, cacheable : true):
 FOR v IN OUTBOUND
 SHORTEST_PATH 'airports/BIS'
 TO 'airports/JFK' flights
 RETURN v

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  2   ShortestPathNode      0     - FOR v  /* vertex */ IN OUTBOUND SHORTEST_PATH 'airports/BIS' /* startnode */ TO 'airports/JFK' /* targetnode */ flights
  3   ReturnNode            0     - RETURN v

Indexes used:
 none

Shortest paths on graphs:
 Id   Vertex collections   Edge collections
  2                        flights            

Optimization rules applied:
 none


Return the minimum number of flights from BIS to JFK:

LET airports = (
    FOR v IN OUTBOUND
    SHORTEST_PATH 'airports/BIS'
    TO 'airports/JFK' flights
    RETURN v
)
RETURN LENGTH(airports) - 1

The LENGTH function returns the number of records in a result set; here it is used to express the shortest-path depth.
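The subtraction works because the shortest-path traversal emits one vertex per hop, start vertex included, so a path over N vertices uses N - 1 edges. A one-line sanity check (the airport codes are illustrative):

```python
# Vertices returned by a hypothetical shortest-path traversal, start included.
path_vertices = ["BIS", "MSP", "JFK"]

# Number of flights = number of edges = number of vertices minus one.
min_flights = len(path_vertices) - 1
print(min_flights)  # 2
```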

Pattern Matching

Goal: find the path between BIS and JFK with the shortest total travel time.

STEP1

Filter all paths from BIS to JFK. Since the shortest path found above has a depth of 2, we can use "IN 2 OUTBOUND" directly:

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
LIMIT 5
RETURN p

STEP2

Filter for paths that take place within a single day, using January 1st as an example:

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
LIMIT 5
RETURN p

STEP3

Use the DATE_DIFF() function to compute the difference between the departure time and the arrival time, then sort the results in ascending order:

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
FILTER DATE_ADD(p.edges[0].ArrTimeUTC, 20, 'minutes') < p.edges[1].DepTimeUTC
LET flightTime = DATE_DIFF(p.edges[0].DepTimeUTC, p.edges[1].ArrTimeUTC, 'i')
SORT flightTime ASC
LIMIT 5
RETURN { flight: p, time: flightTime }
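The two date functions map onto ordinary datetime arithmetic: DATE_ADD(t, 20, 'minutes') shifts a timestamp forward, and DATE_DIFF(a, b, 'i') returns the difference in minutes. A Python sketch of the same layover-and-total-time logic (the timestamps below are invented):

```python
from datetime import datetime, timedelta

# Hypothetical UTC timestamps for a two-leg BIS -> ... -> JFK itinerary.
leg1_dep = datetime(2008, 1, 1, 12, 0)
leg1_arr = datetime(2008, 1, 1, 13, 10)
leg2_dep = datetime(2008, 1, 1, 13, 45)
leg2_arr = datetime(2008, 1, 1, 16, 30)

# FILTER DATE_ADD(p.edges[0].ArrTimeUTC, 20, 'minutes') < p.edges[1].DepTimeUTC
feasible = leg1_arr + timedelta(minutes=20) < leg2_dep

# LET flightTime = DATE_DIFF(p.edges[0].DepTimeUTC, p.edges[1].ArrTimeUTC, 'i')
flight_time_minutes = int((leg2_arr - leg1_dep).total_seconds() // 60)

print(feasible, flight_time_minutes)  # True 270
```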

Let's look at the nodes this AQL query produces in its execution plan:

Execution plan:
 Id   NodeType             Est.   Comment
  1   SingletonNode           1   * ROOT
  2   TraversalNode     1002001     - FOR v  /* vertex */, p  /* paths */ IN 2..2  /* min..maxPathDepth */ OUTBOUND 'airports/BIS' /* startnode */  flights
  3   CalculationNode   1002001       - LET #10 = ((v.`_id` == "airports/JFK") && (p.`edges`[1].`DepTimeUTC` > DATE_ADD(p.`edges`[0].`ArrTimeUTC`, 20, "minutes")))   /* simple expression */
  4   FilterNode        1002001       - FILTER #10
 11   CalculationNode   1002001       - LET flightTime = DATE_DIFF(p.`edges`[0].`DepTimeUTC`, p.`edges`[1].`ArrTimeUTC`, "i")   /* simple expression */
 12   SortNode          1002001       - SORT flightTime ASC   /* sorting strategy: constrained heap */
 13   LimitNode               5       - LIMIT 0, 5
 14   CalculationNode         5       - LET #18 = { "flight" : p, "time" : flightTime }   /* simple expression */
 15   ReturnNode              5       - RETURN #18

Indexes used:
 By   Name   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  2   edge   edge   flights      false    false         0.10 %   [ `_from` ]   base OUTBOUND

Functions used:
 Name        Deterministic   Cacheable   Uses V8
 DATE_ADD    true            true        false  
 DATE_DIFF   true            true        false  

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions                                                       
 2   2..2                       flights           uniqueVertices: none, uniqueEdges: path  FILTER ((p.`edges`[*].`Month` all == 1) && (p.`edges`[*].`DayofMonth` all == 1))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   optimize-traversals
  6   remove-filter-covered-by-traversal
  7   remove-unnecessary-calculations-2
  8   fuse-filters
  9   sort-limit


Optimization

In this example, the query has to traverse a very large number of edges, some of which do not need to be visited at all. Here we optimize with a vertex-centric index.

Try adding a hash index on these three fields:

_from, Month, DayofMonth 

In the web UI, we find that hash and skiplist indexes can no longer be added. The official explanation is as follows:

The hash index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type hash is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.

The skiplist index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type skiplist is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.

And here is the official note on the persistent index:

The persistent index type is deprecated for the MMFiles storage engine. Use the RocksDB storage engine instead, where all indexes are persistent.

The index types hash, skiplist and persistent are equivalent when using the RocksDB storage engine. The types hash and skiplist are still allowed for backward compatibility in the APIs, but the web interface does not offer these types anymore.

最終,使用WebUI是給這3個字段加了persistent類型索引後,再執行 STEP3的查詢,性能明顯提升了將近30%。
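For reference, the same vertex-centric index can also be created from the arangosh shell instead of the web UI. A sketch, assuming the edge collection is named flights as above:

```js
// arangosh: create a persistent index on the edge collection,
// covering the edge's _from plus the two date attributes.
db.flights.ensureIndex({
  type: "persistent",
  fields: ["_from", "Month", "DayofMonth"]
});
```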

Why this works:

Without a vertex-centric index, the traversal has to follow all outgoing edges of the departure airport and then check whether each of them satisfies our conditions (flying on the given day, arriving at the desired destination, with a feasible layover).

The new index allows the outgoing edges of an airport (the _from attribute) on a given day (the Month and DayofMonth attributes) to be looked up quickly, which eliminates the need to fetch and filter out all the edges from other days. It reduces the number of edges that have to be checked compared with the original edge index, and saves a considerable amount of time.

Below is some of the official documentation on hash index usage:

A hash index can be used to quickly find documents with specific attribute values. The hash index is unsorted, so it supports equality lookups but no range queries or sorting.

A hash index can be created on one or multiple document attributes. A hash index will only be used by a query if all index attributes are present in the search condition, and if all attributes are compared using the equality (==) operator. Hash indexes are used from within AQL and several query functions, e.g. byExample, firstExample etc.

Hash indexes can optionally be declared unique, then disallowing saving the same value(s) in the indexed attribute(s). Hash indexes can optionally be sparse.

The different types of hash indexes have the following characteristics:

  • unique hash index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.

    This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes.

    The unique option can also be used to ensure that no duplicate edges are created, by adding a combined index for the fields _from and _to to an edge collection.

  • unique, sparse hash index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.

  • non-unique hash index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. Duplicate key values can occur and do not lead to unique constraint violations.

  • non-unique, sparse hash index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.

References

https://www.arangodb.com/docs/stable/indexing-index-basics.html

https://www.cnblogs.com/minglex/p/9383849.html
