ArangoDB AQL: Simple Operations

Let's try a few simple queries.

1. Return all airports in the "airports" collection:

FOR airport IN airports
    RETURN airport 

2. Return only the airports in California:

FOR airport IN airports
    FILTER airport.state == "CA"
    RETURN airport

3. Return the number of airports in each state:

FOR airport IN airports
    COLLECT state = airport.state
    WITH COUNT INTO counter
    RETURN {state, counter}
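For comparison, the grouping performed by COLLECT ... WITH COUNT INTO can be sketched in Python with collections.Counter. This only illustrates the semantics: the airport documents below are made up, and the groups are sorted by state purely for readability.

```python
from collections import Counter

# Hypothetical sample documents shaped like the "airports" collection
airports = [
    {"_key": "LAX", "state": "CA"},
    {"_key": "SFO", "state": "CA"},
    {"_key": "JFK", "state": "NY"},
    {"_key": "BIS", "state": "ND"},
]

def count_by_state(docs):
    """Mimic: COLLECT state = airport.state WITH COUNT INTO counter."""
    counts = Counter(doc["state"] for doc in docs)
    # One {state, counter} object per group, like RETURN {state, counter}
    return [{"state": s, "counter": c} for s, c in sorted(counts.items())]

result = count_by_state(airports)
```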

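In the examples above, keywords such as COLLECT, WITH and RETURN are written in uppercase, but this is only a convention. Keywords can also be written in lowercase or in mixed case. Variable names, attribute names and collection names, however, are case-sensitive.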

Graph queries

1. Return all airports that can be reached directly from Los Angeles International Airport (LAX):

FOR airport IN OUTBOUND 'airports/LAX' flights
    RETURN DISTINCT airport
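Because every flight is stored as its own edge, a one-step OUTBOUND traversal yields the same destination airport once per flight, which is why the query uses RETURN DISTINCT. A minimal Python sketch of that behaviour over a few hypothetical edges (AQL does not guarantee the order of DISTINCT results; the sketch keeps first-seen order only to stay deterministic):

```python
# Hypothetical edges in the "flights" edge collection: several flights may
# share the same _from/_to pair (different dates and times).
flights = [
    {"_from": "airports/LAX", "_to": "airports/JFK"},
    {"_from": "airports/LAX", "_to": "airports/JFK"},
    {"_from": "airports/LAX", "_to": "airports/SFO"},
    {"_from": "airports/JFK", "_to": "airports/LAX"},
]

def outbound_neighbors(start, edges):
    """One-step OUTBOUND traversal: the _to vertex of every edge leaving start."""
    return [e["_to"] for e in edges if e["_from"] == start]

def distinct(values):
    """Like RETURN DISTINCT: drop duplicates (first-seen order kept here)."""
    return list(dict.fromkeys(values))

all_hits = outbound_neighbors("airports/LAX", flights)
unique_hits = distinct(all_hits)
```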

2. Return 10 flights departing from LAX together with their destinations:

FOR airport, flight IN OUTBOUND 'airports/LAX' flights
    LIMIT 10
    RETURN {airport, flight}

Graph traversal

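For traversals with a minimum depth greater than 2, there are two options to choose from:

  • Depth-first (default): continue down the edges from the start vertex to the last vertex on that path, or until the maximum traversal depth is reached, then walk down the other paths.

  • Breadth-first (optional): follow all edges from the start vertex to the next level, then follow all edges of their neighbors by another level, and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.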
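To make the difference in visiting order concrete, here is a small Python sketch of both strategies on a hypothetical five-vertex graph. It only models the order in which vertices at depth 1 to max_depth are reported; it ignores ArangoDB's uniqueness options and is not the engine's actual implementation.

```python
from collections import deque

# Hypothetical graph: adjacency list of outbound edges
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

def dfs_order(start, max_depth):
    """Depth-first: follow one path to its end (or max_depth) before backtracking."""
    order = []

    def visit(v, depth):
        if depth > 0:                # the start vertex (depth 0) is not reported
            order.append(v)
        if depth == max_depth:
            return
        for nxt in graph[v]:
            visit(nxt, depth + 1)

    visit(start, 0)
    return order

def bfs_order(start, max_depth):
    """Breadth-first: report every vertex of one depth level before the next level."""
    order = []
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if depth > 0:
            order.append(v)
        if depth < max_depth:
            for nxt in graph[v]:
                queue.append((nxt, depth + 1))
    return order
```

With max_depth = 2, depth-first reports B, D, C, E, while breadth-first reports B, C, D, E.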

Return all airports reachable directly from LAX:

FOR airport IN OUTBOUND 'airports/LAX' flights
    OPTIONS {bfs: true, uniqueVertices: 'global'}
    RETURN airport

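Run the following two queries, which return the same airports, and compare their execution times:

// Query 1: default depth-first traversal, duplicates removed with DISTINCT
FOR airport IN OUTBOUND 'airports/LAX' flights
    RETURN DISTINCT airport

// Query 2: breadth-first traversal, each vertex visited at most once
FOR airport IN OUTBOUND 'airports/LAX' flights
    OPTIONS {bfs: true, uniqueVertices: 'global'}
    RETURN DISTINCT airport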

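Comparing the two runs, you should see a significant performance improvement: in certain scenarios a breadth-first traversal speeds up the query. Here the gain comes from combining bfs: true with uniqueVertices: 'global' (an option that requires breadth-first search), so each airport is visited only once instead of once per flight.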
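The effect of uniqueVertices: 'global' can be sketched in Python as a breadth-first search with a single visited set, compared with a breadth-first search that emits a vertex once per path (each parallel edge counts as a separate path). The graph is hypothetical: A has two flights to B, and D is reachable through both B and C.

```python
from collections import deque

# Hypothetical graph: two parallel edges A -> B, and D reachable via B and via C
graph = {"A": ["B", "B", "C"], "B": ["D"], "C": ["D"], "D": []}

def bfs_all_paths(start, max_depth):
    """BFS without global uniqueness: a vertex is emitted once per path reaching it."""
    emitted = []
    queue = deque([(start, 0, (start,))])
    while queue:
        v, depth, path = queue.popleft()
        if depth > 0:
            emitted.append(v)
        if depth < max_depth:
            for nxt in graph[v]:
                if nxt not in path:          # avoid cycles within a single path
                    queue.append((nxt, depth + 1, path + (nxt,)))
    return emitted

def bfs_global_unique(start, max_depth):
    """BFS with one visited set shared by all paths (like uniqueVertices: 'global')."""
    emitted = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        v, depth = queue.popleft()
        if depth > 0:
            emitted.append(v)
        if depth < max_depth:
            for nxt in graph[v]:
                if nxt not in seen:          # skip vertices reached by any earlier path
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return emitted
```

At depth 1 this mirrors the LAX query: the path-based version returns B twice (one per flight), while the globally unique version returns it once, so there is less work and nothing left for DISTINCT to remove.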

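The LET keyword in AQL:

The results of simple expressions, as well as of entire subqueries, can be stored in variables. To declare a variable, use the LET keyword followed by the variable name, an equals sign and the expression. If the expression is a subquery, it must be enclosed in parentheses.

In the following example, the hour and the minute of the departure time are precomputed and stored in the variables h and m.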

FOR f IN flights
    FILTER f._from == 'airports/BIS'
    LIMIT 100
    LET h = FLOOR(f.DepTime / 100)
    LET m = f.DepTime % 100
    RETURN {
        year: f.Year,
        month: f.Month,
        day: f.DayofMonth,
        time: f.DepTime,
        iso: DATE_ISO8601(f.Year, f.Month, f.DayofMonth, h, m)
    }
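The hour/minute split can be checked in Python. This sketch assumes DepTime is an integer in hhmm form between 0 and 2359 (for example 1345 for 13:45) and formats the result to resemble DATE_ISO8601 output; the helper name dep_time_to_iso is hypothetical.

```python
from datetime import datetime

def dep_time_to_iso(year, month, day, dep_time):
    """Split an hhmm integer into hour and minute, like
    h = FLOOR(DepTime / 100) and m = DepTime % 100 in the query."""
    h = dep_time // 100   # integer division equals FLOOR for non-negative values
    m = dep_time % 100
    # Millisecond precision plus a trailing 'Z' to resemble DATE_ISO8601 output
    return datetime(year, month, day, h, m).isoformat(timespec="milliseconds") + "Z"

print(dep_time_to_iso(2008, 1, 1, 1345))
```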

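Shortest path (SHORTEST_PATH)

A shortest path query finds a connection between two given documents that uses the smallest number of edges.

Find the shortest path between the airports BIS and JFK: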

FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
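For an unweighted graph, breadth-first search is one standard way to find a path with the fewest edges. The Python sketch below uses a hypothetical route map and does not describe the algorithm ArangoDB uses internally for SHORTEST_PATH.

```python
from collections import deque

# Hypothetical route map: outbound adjacency list keyed by airport id
routes = {
    "airports/BIS": ["airports/DEN", "airports/MSP"],
    "airports/DEN": ["airports/JFK", "airports/LAX"],
    "airports/MSP": ["airports/ORD"],
    "airports/ORD": ["airports/JFK"],
    "airports/LAX": [],
    "airports/JFK": [],
}

def shortest_path(start, target):
    """Unweighted shortest path via BFS: the vertex list, or [] if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent links back to the start, then reverse
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for nxt in routes.get(v, []):
            if nxt not in parent:        # first visit is via a fewest-edge route
                parent[nxt] = v
                queue.append(nxt)
    return []
```

The number of flights is len(path) - 1, because a path with n vertices has n - 1 edges.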

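Running the query through the query explainer shows the execution plan of the FOR loop (including the start node and the target node), which indexes are used, and which optimizer rules were applied: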

Query String (81 chars, cacheable : true):
 FOR v IN OUTBOUND
 SHORTEST_PATH 'airports/BIS'
 TO 'airports/JFK' flights
 RETURN v

Execution plan:
 Id   NodeType           Est.   Comment
  1   SingletonNode         1   * ROOT
  2   ShortestPathNode      0     - FOR v  /* vertex */ IN OUTBOUND SHORTEST_PATH 'airports/BIS' /* startnode */ TO 'airports/JFK' /* targetnode */ flights
  3   ReturnNode            0     - RETURN v

Indexes used:
 none

Shortest paths on graphs:
 Id   Vertex collections   Edge collections
  2                        flights            

Optimization rules applied:
 none


Return the minimum number of flights needed to get from BIS to JFK:

LET airports = (
    FOR v IN OUTBOUND
    SHORTEST_PATH 'airports/BIS'
    TO 'airports/JFK' flights
    RETURN v
)
RETURN LENGTH(airports) - 1

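The LENGTH function returns the number of elements in an array, here the vertices on the shortest path. Since n vertices are connected by n - 1 edges, subtracting 1 gives the depth of the shortest path, i.e. the number of flights.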

Pattern matching

Goal: find the path between BIS and JFK with the shortest total travel time.

STEP1

Find all paths from BIS to JFK. Since the shortest path query showed that the minimum depth between the two airports is 2, we use "IN 2 OUTBOUND" directly:

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
LIMIT 5
RETURN p
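What IN 2 OUTBOUND plus the FILTER on v._id does can be sketched as a nested loop over a hypothetical list of flight edges: take every edge leaving BIS, then every edge leaving its destination, and keep the two-edge paths that end at JFK. Unlike the query, the sketch applies the destination check inline rather than after the traversal.

```python
# Hypothetical flight edges; each edge is one scheduled flight
flights = [
    {"_from": "airports/BIS", "_to": "airports/DEN", "id": "f1"},
    {"_from": "airports/BIS", "_to": "airports/MSP", "id": "f2"},
    {"_from": "airports/DEN", "_to": "airports/JFK", "id": "f3"},
    {"_from": "airports/MSP", "_to": "airports/JFK", "id": "f4"},
    {"_from": "airports/MSP", "_to": "airports/ORD", "id": "f5"},
]

def two_hop_paths(start, target, edges):
    """Enumerate paths of exactly 2 edges (IN 2 OUTBOUND) that end at target."""
    paths = []
    for first in edges:
        if first["_from"] != start:
            continue
        for second in edges:
            # Second leg must leave where the first leg landed and reach the target
            if second["_from"] == first["_to"] and second["_to"] == target:
                paths.append([first, second])
    return paths

found = two_hop_paths("airports/BIS", "airports/JFK", flights)
```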

STEP2

Restrict the paths to a single day, using January 1 as an example:

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
LIMIT 5
RETURN p
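The ALL == array comparison is true only when every element of the left-hand array equals the right-hand value. A Python analogue using all(), with hypothetical path edges (same_day is an illustrative name, not an AQL function):

```python
# Hypothetical two-leg paths; each edge carries the flight's date fields
path_edges = [
    {"Month": 1, "DayofMonth": 1, "id": "f1"},
    {"Month": 1, "DayofMonth": 1, "id": "f3"},
]
other_edges = [
    {"Month": 1, "DayofMonth": 1},
    {"Month": 1, "DayofMonth": 2},   # second leg flies a day later
]

def same_day(edges, month, day):
    """Mimic p.edges[*].Month ALL == month AND p.edges[*].DayofMonth ALL == day."""
    months = [e["Month"] for e in edges]        # p.edges[*].Month
    days = [e["DayofMonth"] for e in edges]     # p.edges[*].DayofMonth
    return all(m == month for m in months) and all(d == day for d in days)
```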

STEP3

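Use DATE_ADD() to keep only connections in which the second flight departs more than 20 minutes after the first flight arrives, use DATE_DIFF() to compute the time in minutes between the first departure and the final arrival, and sort the results in ascending order: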

FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
FILTER DATE_ADD(p.edges[0].ArrTimeUTC, 20, 'minutes') < p.edges[1].DepTimeUTC
LET flightTime = DATE_DIFF(p.edges[0].DepTimeUTC, p.edges[1].ArrTimeUTC, 'i')
SORT flightTime ASC
LIMIT 5
RETURN { flight: p, time: flightTime }
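The layover filter and the flight-time calculation can be reproduced in Python with datetime. The sketch assumes ArrTimeUTC and DepTimeUTC are strings such as '2008-01-01T13:30:00.000Z'; the path data and the function names are hypothetical. As with DATE_DIFF, the difference is the second date minus the first, reported in whole minutes.

```python
from datetime import datetime, timedelta

def parse_utc(ts):
    """Parse timestamps like '2008-01-01T13:30:00.000Z' (assumed format)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def rank_connections(paths, min_layover_minutes=20, limit=5):
    """Keep two-leg paths whose layover exceeds min_layover_minutes, compute the
    minutes from first departure to final arrival, sort ascending, apply limit."""
    ranked = []
    for first, second in paths:
        # DATE_ADD(first.ArrTimeUTC, 20, 'minutes') < second.DepTimeUTC
        earliest_dep = parse_utc(first["ArrTimeUTC"]) + timedelta(minutes=min_layover_minutes)
        if earliest_dep < parse_utc(second["DepTimeUTC"]):
            # DATE_DIFF(first.DepTimeUTC, second.ArrTimeUTC, 'i'), 'i' = minutes
            delta = parse_utc(second["ArrTimeUTC"]) - parse_utc(first["DepTimeUTC"])
            ranked.append({"flight": (first["id"], second["id"]),
                           "time": int(delta.total_seconds() // 60)})
    ranked.sort(key=lambda r: r["time"])   # SORT flightTime ASC
    return ranked[:limit]                   # LIMIT 5

# Hypothetical two-leg paths on January 1
paths = [
    ({"id": "f1", "DepTimeUTC": "2008-01-01T12:00:00.000Z", "ArrTimeUTC": "2008-01-01T13:30:00.000Z"},
     {"id": "f3", "DepTimeUTC": "2008-01-01T14:00:00.000Z", "ArrTimeUTC": "2008-01-01T17:30:00.000Z"}),
    ({"id": "f2", "DepTimeUTC": "2008-01-01T11:00:00.000Z", "ArrTimeUTC": "2008-01-01T12:30:00.000Z"},
     {"id": "f4", "DepTimeUTC": "2008-01-01T12:45:00.000Z", "ArrTimeUTC": "2008-01-01T15:00:00.000Z"}),
    ({"id": "f5", "DepTimeUTC": "2008-01-01T09:00:00.000Z", "ArrTimeUTC": "2008-01-01T10:00:00.000Z"},
     {"id": "f6", "DepTimeUTC": "2008-01-01T11:00:00.000Z", "ArrTimeUTC": "2008-01-01T14:00:00.000Z"}),
]
```

Here the f2/f4 connection is dropped because its 15-minute layover is too short, and f5/f6 (300 minutes) ranks ahead of f1/f3 (330 minutes).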

Let's look at the nodes in the execution plan of this query:

Execution plan:
 Id   NodeType             Est.   Comment
  1   SingletonNode           1   * ROOT
  2   TraversalNode     1002001     - FOR v  /* vertex */, p  /* paths */ IN 2..2  /* min..maxPathDepth */ OUTBOUND 'airports/BIS' /* startnode */  flights
  3   CalculationNode   1002001       - LET #10 = ((v.`_id` == "airports/JFK") && (p.`edges`[1].`DepTimeUTC` > DATE_ADD(p.`edges`[0].`ArrTimeUTC`, 20, "minutes")))   /* simple expression */
  4   FilterNode        1002001       - FILTER #10
 11   CalculationNode   1002001       - LET flightTime = DATE_DIFF(p.`edges`[0].`DepTimeUTC`, p.`edges`[1].`ArrTimeUTC`, "i")   /* simple expression */
 12   SortNode          1002001       - SORT flightTime ASC   /* sorting strategy: constrained heap */
 13   LimitNode               5       - LIMIT 0, 5
 14   CalculationNode         5       - LET #18 = { "flight" : p, "time" : flightTime }   /* simple expression */
 15   ReturnNode              5       - RETURN #18

Indexes used:
 By   Name   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  2   edge   edge   flights      false    false         0.10 %   [ `_from` ]   base OUTBOUND

Functions used:
 Name        Deterministic   Cacheable   Uses V8
 DATE_ADD    true            true        false  
 DATE_DIFF   true            true        false  

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions                                                       
 2   2..2                       flights           uniqueVertices: none, uniqueEdges: path  FILTER ((p.`edges`[*].`Month` all == 1) && (p.`edges`[*].`DayofMonth` all == 1))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   optimize-traversals
  6   remove-filter-covered-by-traversal
  7   remove-unnecessary-calculations-2
  8   fuse-filters
  9   sort-limit


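Optimization

In this example, the query has to traverse a very large number of edges, many of which do not need to be followed at all. We use a vertex-centric index to optimize it.

Try adding a hash index on these three attributes of the flights edge collection: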

_from, Month, DayofMonth 

In the web UI, however, hash and skiplist indexes can no longer be created. The official documentation explains why:

The hash index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type hash is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.

The skiplist index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type skiplist is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.

Here is the official description of the persistent index as well:

The persistent index type is deprecated for the MMFiles storage engine. Use the RocksDB storage engine instead, where all indexes are persistent.

The index types hash, skiplist and persistent are equivalent when using the RocksDB storage engine. The types hash and skiplist are still allowed for backward compatibility in the APIs, but the web interface does not offer these types anymore.

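In the end, a persistent index on these three attributes was added through the web UI. Re-running the STEP3 query afterwards showed a clear performance gain of nearly 30%.

How it works:

Without a vertex-centric index, all outgoing edges of the departure airport have to be followed and then checked against our conditions (on the given day, reaching the desired destination, with a feasible transfer).

The new index allows a fast lookup of the outgoing edges (_from attribute) of an airport on a given day (Month and DayofMonth attributes), which removes the need to fetch and filter the edges of all other days. It reduces the number of edges that have to be checked compared with the default edge index, and saves a considerable amount of time.

Below is some of the official documentation on using hash indexes: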
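The idea behind the vertex-centric index can be modelled in Python with two dictionaries: one keyed by _from only (roughly what the default edge index offers) and one keyed by (_from, Month, DayofMonth). This is a conceptual sketch with hypothetical data; a RocksDB persistent index is a sorted structure, not a Python dict.

```python
from collections import defaultdict

# Hypothetical edges: one BIS -> DEN flight every day in January, plus one BIS -> MSP
flights = [
    {"_from": "airports/BIS", "_to": "airports/DEN", "Month": 1, "DayofMonth": d,
     "id": f"bis-den-{d}"}
    for d in range(1, 32)
] + [
    {"_from": "airports/BIS", "_to": "airports/MSP", "Month": 1, "DayofMonth": 1,
     "id": "bis-msp-1"},
]

edge_index = defaultdict(list)   # _from -> edges
vc_index = defaultdict(list)     # (_from, Month, DayofMonth) -> edges
for e in flights:
    edge_index[e["_from"]].append(e)
    vc_index[(e["_from"], e["Month"], e["DayofMonth"])].append(e)

def outbound_on_day_scan(vertex, month, day):
    """Fetch every outgoing edge, then filter by date; returns (hits, edges examined)."""
    candidates = edge_index[vertex]
    hits = [e for e in candidates if e["Month"] == month and e["DayofMonth"] == day]
    return hits, len(candidates)

def outbound_on_day_vc(vertex, month, day):
    """One lookup on the combined key: only matching edges are touched."""
    candidates = vc_index[(vertex, month, day)]
    return list(candidates), len(candidates)

scan_hits, scan_checked = outbound_on_day_scan("airports/BIS", 1, 1)
vc_hits, vc_checked = outbound_on_day_vc("airports/BIS", 1, 1)
```

Both lookups return the same two edges, but the scan examines 32 edges while the combined-key lookup touches only 2.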

A hash index can be used to quickly find documents with specific attribute values. The hash index is unsorted, so it supports equality lookups but no range queries or sorting.

A hash index can be created on one or multiple document attributes. A hash index will only be used by a query if all index attributes are present in the search condition, and if all attributes are compared using the equality (==) operator. Hash indexes are used from within AQL and several query functions, e.g. byExample, firstExample etc.

Hash indexes can optionally be declared unique, then disallowing saving the same value(s) in the indexed attribute(s). Hash indexes can optionally be sparse.

The different types of hash indexes have the following characteristics:

  • unique hash index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.

    This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes.

    The unique option can also be used to ensure that no duplicate edges are created, by adding a combined index for the fields _from and _to to an edge collection.

  • unique, sparse hash index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.

  • non-unique hash index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. Duplicate key values can occur and do not lead to unique constraint violations.

  • non-unique, sparse hash index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.
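A toy Python model of how the unique and sparse options interact. It is a conceptual sketch only (the class SketchIndex is hypothetical and unrelated to ArangoDB's implementation): a missing attribute is treated like null, sparse indexes skip documents with any null index attribute, and unique indexes reject a repeated key, including a repeated null key.

```python
class SketchIndex:
    """Toy model of unique/sparse index semantics on the given fields."""

    def __init__(self, fields, unique=False, sparse=False):
        self.fields = fields
        self.unique = unique
        self.sparse = sparse
        self.entries = {}   # key tuple -> list of document _key values

    def insert(self, doc):
        key = tuple(doc.get(f) for f in self.fields)   # missing attribute -> None (null)
        if self.sparse and any(v is None for v in key):
            return False                               # sparse: document not indexed
        if self.unique and key in self.entries:
            raise ValueError(f"unique constraint violated for {key}")
        self.entries.setdefault(key, []).append(doc["_key"])
        return True

    def lookup(self, **values):
        """Equality lookup only, and only when every indexed attribute is given."""
        if set(values) != set(self.fields):
            raise KeyError("all index attributes must be specified")
        return self.entries.get(tuple(values[f] for f in self.fields), [])

# A unique index on (_from, _to) prevents duplicate edges between the same vertices
edge_pairs = SketchIndex(["_from", "_to"], unique=True)
edge_pairs.insert({"_key": "e1", "_from": "airports/BIS", "_to": "airports/DEN"})
```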

References

https://www.arangodb.com/docs/stable/indexing-index-basics.html

https://www.cnblogs.com/minglex/p/9383849.html
