Elasticsearch: Elasticsearch SQL Introduction and Examples (Part 2)

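In the previous article, "Elasticsearch: Elasticsearch SQL Introduction and Examples", we briefly introduced the new Elasticsearch SQL feature and the _translate API. This article continues the series by exploring more complex features. If you have not prepared your data yet, please read the article mentioned above first.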

 

Complex Examples and the Strengths of Elasticsearch

Grouping

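Elasticsearch's aggregation framework, capable of summarizing billions of data points, is one of the most powerful and popular features of the stack. Functionally, it has a natural equivalence to the GROUP BY operator in SQL. In addition to showing some examples of GROUP BY, we will again use the translate API to show the equivalent aggregations.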

Find the average flight time for each origin country, ordered alphabetically by country.

sql> SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY OriginCountry LIMIT 5;
 Avg_Flight_Time  | OriginCountry 
------------------+---------------
9.342180244924574 |AE             
13.49582274385201 |AR             
4.704097126921018 |AT             
15.081367354940724|AU             
7.998943401875511 |CA  

Inspecting the DSL for this query shows the use of the composite aggregation.

GET flights/_search
{
 "size": 0,
  "_source": false,
  "stored_fields": "_none_",
  "aggs": {
    "groupby": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "3471": {
              "terms": {
                "field": "OriginCountry.keyword",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggs": {
        "3485": {
          "avg": {
            "field": "FlightTimeHour"
          }
        }
      }
    }
  }
}

This uses a composite aggregation, which lets us page through aggregation results much as scroll does for documents. If this is unfamiliar, see my other article "Composite Aggregation in Elasticsearch". The query above returns:

{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "groupby" : {
      "after_key" : {
        "3471" : "ZA"
      },
      "buckets" : [
        {
          "key" : {
            "3471" : "AE"
          },
          "doc_count" : 385,
          "3485" : {
            "value" : 9.342180244924574
          }
        },
        {
          "key" : {
            "3471" : "AR"
          },
          "doc_count" : 258,
          "3485" : {
            "value" : 13.49582274385201
          }
        },
        {
          "key" : {
            "3471" : "AT"
          },
          "doc_count" : 120,
          "3485" : {
            "value" : 4.704097126921018
          }
        },
        {
          "key" : {
            "3471" : "AU"
          },
          "doc_count" : 518,
          "3485" : {
            "value" : 15.081367354940724
          }
        },
...
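The response above carries an after_key, which is what makes paging through the buckets possible. As a minimal sketch (not what Elasticsearch SQL does internally), the following Python helper builds the request body for the next page by copying after_key into the composite aggregation's after parameter. The function name next_page_body is hypothetical, and the response is a trimmed copy of the one shown above.

```python
import copy


def next_page_body(body, response, agg_name="groupby"):
    """Return a copy of `body` that asks for the next page of composite buckets.

    Returns None when the response carries no after_key, meaning there are
    no further pages. Assumes the request uses the short "aggs" key.
    """
    after_key = response["aggregations"][agg_name].get("after_key")
    if after_key is None:
        return None
    next_body = copy.deepcopy(body)  # leave the caller's request untouched
    next_body["aggs"][agg_name]["composite"]["after"] = after_key
    return next_body


# The search body from the example above.
body = {
    "size": 0,
    "aggs": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {"3471": {"terms": {"field": "OriginCountry.keyword", "order": "asc"}}}
                ],
            },
            "aggs": {"3485": {"avg": {"field": "FlightTimeHour"}}},
        }
    },
}

# Trimmed version of the response shown above (buckets omitted).
response = {"aggregations": {"groupby": {"after_key": {"3471": "ZA"}, "buckets": []}}}

page2 = next_page_body(body, response)
print(page2["aggs"]["groupby"]["composite"]["after"])  # {'3471': 'ZA'}
```

Sending page2 to GET flights/_search would return the buckets that sort after "ZA". Repeating this until no after_key comes back walks through every bucket.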

We can also group by an alias defined in the SELECT clause using a function.

Find the number of flights and the average flight time per month.

POST /_sql?format=txt
{
  "query":"SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, AVG(FlightTimeHour) AS Avg_Flight_Time FROM flights GROUP BY month_of_year"
}

The result of the query above is:

   COUNT(*)    | month_of_year | Avg_Flight_Time 
---------------+---------------+-----------------
5687           |4              |8.578573065474027
7372           |5              |8.472684454688286
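The same request can be prepared from a script instead of the Kibana console. The sketch below uses only the Python standard library and builds the POST request without sending it. The helper name build_sql_request and the base URL http://localhost:9200 are assumptions for illustration. Adjust them, and add authentication if your cluster requires it.

```python
import json
import urllib.request


def build_sql_request(query, fmt="txt", base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the Elasticsearch SQL endpoint.

    `base_url` is an assumed local cluster address, not taken from the article.
    """
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql?format={fmt}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(
    "SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, "
    "AVG(FlightTimeHour) AS Avg_Flight_Time FROM flights GROUP BY month_of_year"
)
print(request.get_method(), request.full_url)

# To actually run the query against a live cluster:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```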

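The use of the composite aggregation has one major advantage: it ensures the GROUP BY implementation scales even for high-cardinality fields, and it provides a mechanism to stream all buckets of a specific aggregation, similar to what scroll does for documents. It also ensures the implementation does not suffer from the same memory limitations as it would with the terms aggregation. We can obtain the corresponding composite aggregation with the following translate request: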

POST /_sql/translate
{
  "query":"SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY Avg_Flight_Time"
}

The translated result is:

{
  "size" : 0,
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "bee1e422" : {
              "terms" : {
                "field" : "OriginCountry.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      },
      "aggregations" : {
        "803ccc93" : {
          "avg" : {
            "field" : "FlightTimeHour"
          }
        }
      }
    }
  }
}

Filtering Groups

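To filter groups we can use the HAVING operator, which can also use an alias specified in the SELECT clause. This may be unusual to some SQL experts, since it is normally not possible in RDBMS-based implementations, where SELECT is executed after HAVING. Here the HAVING clause uses an alias that is declared in a later execution phase. The analyzer, however, is smart enough to look ahead and pick up the declaration for use in HAVING.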

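Find the number of flights, the average distance flown and the 95th percentile of distance for each origin city, where the average distance is between 3000 and 4000 kilometers.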

sql> SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance BETWEEN 3000 AND 4000;
OriginCityName | avg_distance  |       c       |percentile_distance
---------------+---------------+---------------+-------------------
Verona         |3078.0         |120            |7927.0             
Vienna         |3596.0         |120            |7436.0             
Xi'an          |3842.0         |114            |7964.0     

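To implement HAVING, Elasticsearch SQL uses a bucket_selector pipeline aggregation, filtering the values with a parameterized Painless script. Note below that the keyword variant of the OriginCityName field is automatically selected for the aggregation, rather than the standard text variant, which would likely fail because fielddata is not enabled. The avg and percentiles metric aggregations provide the equivalents of the SQL functions.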

POST /_sql/translate
{
 "query": """
   SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance BETWEEN 3000 AND 4000
 """
}

The translated result is:

{
  "size" : 0,
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "ff6ca116" : {
              "terms" : {
                "field" : "OriginCityName.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      },
      "aggregations" : {
        "b54e054" : {
          "avg" : {
            "field" : "DistanceKilometers"
          }
        },
        "7171c519" : {
          "percentiles" : {
            "field" : "DistanceKilometers",
            "percents" : [
              95.0
            ],
            "keyed" : true,
            "tdigest" : {
              "compression" : 100.0
            }
          }
        },
        "having.8bcff206" : {
          "bucket_selector" : {
            "buckets_path" : {
              "a0" : "b54e054",
              "a1" : "b54e054"
            },
            "script" : {
              "source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.and(InternalSqlScriptUtils.gte(InternalSqlScriptUtils.round(params.a0,params.v0), params.v1), InternalSqlScriptUtils.lte(InternalSqlScriptUtils.round(params.a1,params.v2), params.v3)))",
              "lang" : "painless",
              "params" : {
                "v0" : null,
                "v1" : 3000,
                "v2" : null,
                "v3" : 4000
              }
            },
            "gap_policy" : "skip"
          }
        }
      }
    }
  }
}
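To make the bucket_selector script above easier to read, here is a rough Python emulation of its logic: round the average, check it against the lower and upper bounds, and treat a missing value as not matching. This is only an illustration of the filtering step, not the Painless code Elasticsearch runs. The function name having_filter is hypothetical and the bucket values below are made up.

```python
def having_filter(buckets, low=3000, high=4000):
    """Keep buckets whose rounded average lies in [low, high].

    Mirrors gte(round(a0), v1) AND lte(round(a1), v3) from the script above.
    A missing (None) average never passes, like the null-safe filter.
    Note: Python's round() uses banker's rounding, which may differ from
    SQL ROUND for values ending in exactly .5.
    """
    kept = []
    for bucket in buckets:
        avg = bucket["avg_distance"]
        if avg is None:
            continue
        if low <= round(avg) <= high:
            kept.append(bucket)
    return kept


# Made-up bucket values for illustration only.
buckets = [
    {"key": "Verona", "avg_distance": 3078.4},
    {"key": "Vienna", "avg_distance": 3596.2},
    {"key": "Example City", "avg_distance": 2950.0},
    {"key": "No Data", "avg_distance": None},
]

print([b["key"] for b in having_filter(buckets)])  # ['Verona', 'Vienna']
```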

Text Operators and Relevance

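One of the unique capabilities of Elasticsearch as a search engine, compared with a traditional RDBMS, is its ability to score matches beyond a simple yes/no by taking the properties of textual data into account through a relevance calculation. Extending the SQL syntax lets us expose this capability and go beyond what a traditional RDBMS might provide.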

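We therefore introduce two new operators: QUERY and MATCH. For those familiar with Elasticsearch, MATCH corresponds to the underlying multi_match query and QUERY to query_string. Kibana users will be familiar with the behavior of query_string, since it powers the default search bar. It provides intelligent parsing capabilities and allows natural-language style queries. The details of these two operators are beyond the scope of this post, but the corresponding entries in the Definitive Guide give a good introduction to the concepts.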

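For example, consider the following:

Find all delayed flights to and from Edmonton with a timestamp after 2018-06-20 and before 2020-06-27, sorted by date.

Edmonton is served by an international airport whose full name is "Edmonton International Airport", serving the city of Edmonton, Alberta, Canada and the surrounding area. Using the QUERY operator, we simply search for Edmonton.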

sql> SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp;
       timestamp        |   FlightNum   |OriginCityName | DestCityName  
------------------------+---------------+---------------+---------------
2020-04-14T22:19:48.000Z|1C0ZWE9        |Cologne        |Edmonton       
2020-04-16T04:55:07.000Z|48DVRFT        |Edmonton       |Torino         
2020-04-16T19:17:14.000Z|14KTFQB        |Edmonton       |Oslo           
2020-04-19T06:25:17.000Z|EN9FHUD        |Detroit        |Edmonton       
2020-04-21T20:35:16.000Z|H5Y0MJK        |Edmonton       |Palermo        
2020-04-23T02:03:18.000Z|KCNMKVI        |Edmonton       |Erie           
2020-04-23T09:34:02.000Z|XH9H5H3        |Paris          |Edmonton       
2020-04-25T04:22:28.000Z|GJTJ47T        |Edmonton       |Bangalore      
2020-04-26T13:23:09.000Z|PPZN0Y7        |Edmonton       |Indianapolis   
2020-04-27T00:20:57.000Z|IKFEGFL        |Edmonton       |Warsaw         
2020-04-27T22:11:51.000Z|300JHDQ        |Green Bay      |Edmonton       
2020-04-30T15:02:33.000Z|PK1ETRA        |Rome           |Edmonton       
2020-05-01T17:52:50.000Z|A2NRDPQ        |Edmonton       |Manchester     
2020-05-01T22:19:38.000Z|S9AY152        |Edmonton       |Buenos Aires   
2020-05-03T15:52:05.000Z|PJXXO9P        |Edmonton       |Buenos Aires   
2020-05-05T09:00:47.000Z|QTPABGR        |Edmonton       |Jeju City      
2020-05-05T18:49:49.000Z|YVEUZNO        |Edmonton       |Ottawa         
2020-05-06T12:46:16.000Z|TCPDEBY        |Edmonton       |Bergamo        
2020-05-07T00:00:00.000Z|SW1HB5M        |Abu Dhabi      |Edmonton       
2020-05-07T12:47:25.000Z|0HZ3PHM        |Cape Town      |Edmonton       
2020-05-08T15:26:39.000Z|T5YFSWW        |Paris          |Edmonton       
2020-05-08T16:35:16.000Z|E92FNK2        |Edmonton       |Vienna         
2020-05-09T02:34:40.000Z|PB8BSSH        |Edmonton       |Tokyo          
2020-05-10T14:06:58.000Z|ADWMNQL        |Edmonton       |Zurich         
2020-05-11T15:21:31.000Z|YB4FNOI        |Edmonton       |Vienna         
2020-05-12T22:16:10.000Z|TCE99LO        |Copenhagen     |Edmonton       
2020-05-14T00:19:45.000Z|RBJT1ZG        |Edmonton       |Palermo        
2020-05-15T12:35:39.000Z|M1NHZTB        |Edmonton       |Guangzhou      
2020-05-17T15:23:49.000Z|WC862JS        |Dublin         |Edmonton       
2020-05-18T19:39:08.000Z|99R1VXK        |Edmonton       |Naples         
2020-05-21T05:30:11.000Z|PJP5R9L        |Edmonton       |Portland       
2020-05-21T07:59:04.000Z|PK7R8IF        |Edmonton       |Winnipeg       
2020-05-22T00:00:00.000Z|RLMOSMO        |Edmonton       |Rome           
2020-05-22T17:10:22.000Z|K0SUJFG        |Tokoname       |Edmonton       
2020-05-22T19:06:34.000Z|ECEIAND        |Edmonton       |Treviso        
2020-05-23T01:20:52.000Z|VG2K3M9        |Amsterdam      |Edmonton       
2020-05-23T22:34:45.000Z|8FXIRFY        |Edmonton       |Miami   

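Notice that there was no requirement to specify a field here. Simply searching for "Edmonton" with the QUERY operator was sufficient. Also note that the results include delayed flights both to and from Edmonton. The Elasticsearch query is shown here: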

POST /_sql/translate
{
  "query": """
    SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp
   """
}

The translated result is:

{
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "bool" : {
            "must" : [
              {
                "query_string" : {
                  "query" : "Edmonton",
                  "fields" : [ ],
                  "type" : "best_fields",
                  "default_operator" : "or",
                  "max_determinized_states" : 10000,
                  "enable_position_increments" : true,
                  "fuzziness" : "AUTO",
                  "fuzzy_prefix_length" : 0,
                  "fuzzy_max_expansions" : 50,
                  "phrase_slop" : 0,
                  "escape" : false,
                  "auto_generate_synonyms_phrase_query" : true,
                  "fuzzy_transpositions" : true,
                  "boost" : 1.0
                }
              },
              {
                "term" : {
                  "FlightDelay" : {
                    "value" : true,
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        },
        {
          "range" : {
            "timestamp" : {
              "from" : "2018-06-20",
              "to" : "2020-06-27",
              "include_lower" : false,
              "include_upper" : false,
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "FlightNum",
      "OriginCityName",
      "DestCityName"
    ],
    "excludes" : [ ]
  },
  "docvalue_fields" : [
    {
      "field" : "timestamp",
      "format" : "epoch_millis"
    }
  ],
  "sort" : [
    {
      "timestamp" : {
        "order" : "asc",
        "missing" : "_last",
        "unmapped_type" : "date"
      }
    }
  ]
}

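For a new Elasticsearch user, this is a relatively complex query: a bool query with nested range, term and query_string clauses. For users migrating an application from SQL, writing this by hand has traditionally been a fairly daunting task, even before worrying about whether the final query is functionally correct and optimal. Since we sort by date, relevance is not needed for this query, so the query_string clause could run in a filter context, which makes it possible to use the filter cache, skip scoring and improve response time. Note that the translation shown above places the clauses under bool must.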

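The parameters of these operators are also exposed in SQL. The final example shows how a MATCH query can be used with multiple search terms across several fields to restrict the results.

"Find flights to or from Barcelona where the weather included lightning"

For the purposes of the example, we also sort by and display the relevance score via the Score() function.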

sql> SELECT Score(), timestamp, FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM flights WHERE MATCH('*Weather,*City*', 'Lightning Barcelona', 'type=cross_fields;operator=AND') ORDER BY Score() DESC LIMIT 5;
    Score()    |       timestamp        |   FlightNum   |OriginCityName | DestCityName  |  DestWeather  |   OriginWeather   
---------------+------------------------+---------------+---------------+---------------+---------------+-------------------
6.917009       |2020-04-16T06:00:41.000Z|L637ISB        |Barcelona      |Santiago       |Rain           |Thunder & Lightning
6.917009       |2020-04-16T01:58:51.000Z|ZTOD7RQ        |Barcelona      |Dubai          |Sunny          |Thunder & Lightning
6.917009       |2020-04-22T14:02:34.000Z|QSQA5CT        |Barcelona      |Naples         |Rain           |Thunder & Lightning
6.917009       |2020-04-29T12:23:44.000Z|0GIHB62        |Barcelona      |Buenos Aires   |Clear          |Thunder & Lightning
6.917009       |2020-04-30T07:42:21.000Z|L09W9TV        |Barcelona      |Dubai          |Cloudy         |Thunder & Lightning

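We use wildcard patterns to specify the fields to match and request that the match be a boolean AND. The cross_fields type does not require the terms to all appear in one field, but allows them to be present in different fields, provided both terms are present. Given the structure of the data, this is essential for a match.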
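To show how the three MATCH arguments relate to query DSL, the sketch below converts them into a multi_match clause: the first argument becomes the field list, the second the query text, and the semicolon-separated options become parameters. This is an approximation for illustration. The function name match_to_multi_match is hypothetical, and the output of the _translate API is expected to contain additional default parameters that are not reproduced here.

```python
def match_to_multi_match(fields, text, options=""):
    """Approximate the multi_match clause behind MATCH(fields, text, options).

    `fields` is a comma-separated field list (wildcards allowed) and
    `options` is a semicolon-separated list of key=value pairs.
    """
    clause = {
        "query": text,
        "fields": [name.strip() for name in fields.split(",")],
    }
    for pair in filter(None, options.split(";")):
        key, _, value = pair.partition("=")
        clause[key.strip()] = value.strip()
    return {"multi_match": clause}


dsl = match_to_multi_match(
    "*Weather,*City*", "Lightning Barcelona", "type=cross_fields;operator=AND"
)
print(dsl)
```

With type set to cross_fields and operator set to AND, "Lightning" can match a weather field while "Barcelona" matches a city field, which is how the rows above qualify.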

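The examples here return rows rather than groups. The QUERY and MATCH operators can, however, also be used with GROUP BY, which effectively produces filtered aggregations in Elasticsearch.

Cross Index Searches and Aliases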

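So far our queries have targeted only a single table/index. If we duplicate the flights index, copying the documents to a new named version through a reindex request, we can query both at the same time, provided the two indices have identical mappings. Any difference in mappings could cause a query to fail at analysis time. To query multiple indices together, users can either add them to an Elasticsearch alias or use wildcards in the FROM clause. As you may recall, in the previous article "Elasticsearch: Elasticsearch SQL Introduction and Examples" we imported the index "kibana_sample_data_flights" into the flight1 index through a reindex. We can now copy that index into a new index, flight2, in the same way.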

POST _reindex
{
  "source": {
    "index": "flight1"
  },
  "dest": {
    "index": "flight2"
  }
}

We can then give flight1 and flight2 the alias f_alias as follows:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "flight1",
        "alias": "f_alias"
      }
    },
    {
      "add": {
        "index": "flight2",
        "alias": "f_alias"
      }
    }
  ]
}

We can then query the alias as follows. Because both indices contain the same documents, each flight appears twice:

sql> SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM f_alias ORDER BY timestamp DESC LIMIT 2;
   FlightNum   |OriginCityName | DestCityName  |  DestWeather  | OriginWeather 
---------------+---------------+---------------+---------------+---------------
GDZWNB0        |London         |Shanghai       |Rain           |Clear          
GDZWNB0        |London         |Shanghai       |Rain           |Clear  

JOINs

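JOINs in a traditional RDBMS SQL implementation allow separate tables to be combined through related columns into a single tabular response. This enables relational modelling of data and is a significant topic compared with the options natively available in Elasticsearch. Although Elasticsearch SQL currently does not support the JOIN operator, it does allow users to exploit nested documents, which provide simple one-to-many relational modelling. Querying nested documents is transparent to the user. To demonstrate this, we need an index containing such data. The documents of this index represent orders from an e-commerce site and contain fields such as order_date, billing_city and customer_last_name. In addition, a products field contains a nested sub-document for each product in the order. To load this data, we follow the steps described in the earlier article "Elasticsearch: Elasticsearch SQL Introduction and Examples", except that this time we load the eCommerce sample data.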

Once the data is loaded, we can find an index named kibana_sample_data_ecommerce in Kibana. Here is an example of one of its documents:

{
  "category" : [
    "Men's Clothing"
  ],
  "currency" : "EUR",
  "customer_first_name" : "Eddie",
  "customer_full_name" : "Eddie Underwood",
  "customer_gender" : "MALE",
  "customer_id" : 38,
  "customer_last_name" : "Underwood",
  "customer_phone" : "",
  "day_of_week" : "Monday",
  "day_of_week_i" : 0,
  "email" : "[email protected]",
  "manufacturer" : [
    "Elitelligence",
    "Oceanavigations"
  ],
  "order_date" : "2020-05-04T09:28:48+00:00",
  "order_id" : 584677,
  "products" : [
    {
      "base_price" : 11.99,
      "discount_percentage" : 0,
      "quantity" : 1,
      "manufacturer" : "Elitelligence",
      "tax_amount" : 0,
      "product_id" : 6283,
      "category" : "Men's Clothing",
      "sku" : "ZO0549605496",
      "taxless_price" : 11.99,
      "unit_discount_amount" : 0,
      "min_price" : 6.35,
      "_id" : "sold_product_584677_6283",
      "discount_amount" : 0,
      "created_on" : "2016-12-26T09:28:48+00:00",
      "product_name" : "Basic T-shirt - dark blue/white",
      "price" : 11.99,
      "taxful_price" : 11.99,
      "base_unit_price" : 11.99
    },
    {
      "base_price" : 24.99,
      "discount_percentage" : 0,
      "quantity" : 1,
      "manufacturer" : "Oceanavigations",
      "tax_amount" : 0,
      "product_id" : 19400,
      "category" : "Men's Clothing",
      "sku" : "ZO0299602996",
      "taxless_price" : 24.99,
      "unit_discount_amount" : 0,
      "min_price" : 11.75,
      "_id" : "sold_product_584677_19400",
      "discount_amount" : 0,
      "created_on" : "2016-12-26T09:28:48+00:00",
      "product_name" : "Sweatshirt - grey multicolor",
      "price" : 24.99,
      "taxful_price" : 24.99,
      "base_unit_price" : 24.99
    }
  ],
  "sku" : [
    "ZO0549605496",
    "ZO0299602996"
  ],
  "taxful_total_price" : 36.98,
  "taxless_total_price" : 36.98,
  "total_quantity" : 2,
  "total_unique_products" : 2,
  "type" : "order",
  "user" : "eddie",
  "geoip" : {
    "country_iso_code" : "EG",
    "location" : {
      "lon" : 31.3,
      "lat" : 30.1
    },
    "region_name" : "Cairo Governorate",
    "continent_name" : "Africa",
    "city_name" : "Cairo"
  }
}

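Normally, querying these documents would require the user to understand why we use the nested data type for the products field, and also to understand the nested query syntax. With Elasticsearch SQL, however, we can query these nested documents as if each one were a separate row carrying the fields of its parent (that is, the structure is effectively flattened for presentation). Consider the order above, which has two products. When fields from the product sub-documents are requested, it is presented as two rows at query time. Each row can also include fields of the parent order if required. For example:

Find the billing name used and the products purchased for order 584677.

If we look at kibana_sample_data_ecommerce, we find that the products field of this index is not of the nested type we might expect. We therefore need to define a new mapping for it: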

PUT orders
{
  "mappings": {
    "properties": {
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "currency": {
        "type": "keyword"
      },
      "customer_birth_date": {
        "type": "date"
      },
      "customer_first_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_full_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_gender": {
        "type": "keyword"
      },
      "customer_id": {
        "type": "keyword"
      },
      "customer_last_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_phone": {
        "type": "keyword"
      },
      "day_of_week": {
        "type": "keyword"
      },
      "day_of_week_i": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"
      },
      "geoip": {
        "properties": {
          "city_name": {
            "type": "keyword"
          },
          "continent_name": {
            "type": "keyword"
          },
          "country_iso_code": {
            "type": "keyword"
          },
          "location": {
            "type": "geo_point"
          },
          "region_name": {
            "type": "keyword"
          }
        }
      },
      "manufacturer": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "order_date": {
        "type": "date"
      },
      "order_id": {
        "type": "keyword"
      },
      "products": {
        "type": "nested",
        "properties": {
          "_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "base_price": {
            "type": "half_float"
          },
          "base_unit_price": {
            "type": "half_float"
          },
          "category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "created_on": {
            "type": "date"
          },
          "discount_amount": {
            "type": "half_float"
          },
          "discount_percentage": {
            "type": "half_float"
          },
          "manufacturer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "min_price": {
            "type": "half_float"
          },
          "price": {
            "type": "half_float"
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            },
            "analyzer": "english"
          },
          "quantity": {
            "type": "integer"
          },
          "sku": {
            "type": "keyword"
          },
          "tax_amount": {
            "type": "half_float"
          },
          "taxful_price": {
            "type": "half_float"
          },
          "taxless_price": {
            "type": "half_float"
          },
          "unit_discount_amount": {
            "type": "half_float"
          }
        }
      },
      "sku": {
        "type": "keyword"
      },
      "taxful_total_price": {
        "type": "half_float"
      },
      "taxless_total_price": {
        "type": "half_float"
      },
      "total_quantity": {
        "type": "integer"
      },
      "total_unique_products": {
        "type": "integer"
      },
      "type": {
        "type": "keyword"
      },
      "user": {
        "type": "keyword"
      }
    }
  }
}  

In the mapping above, the part we changed relative to the original mapping is the products field:

      "products": {
        "type": "nested",
        "properties": {
          "_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "base_price": {
            "type": "half_float"
          },
          "base_unit_price": {
            "type": "half_float"
          },
          "category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "created_on": {
            "type": "date"
          },
          "discount_amount": {
            "type": "half_float"
          },
          "discount_percentage": {
            "type": "half_float"
          },
          "manufacturer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "min_price": {
            "type": "half_float"
          },
          "price": {
            "type": "half_float"
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            },
            "analyzer": "english"
          },
          "quantity": {
            "type": "integer"
          },
          "sku": {
            "type": "keyword"
          },
          "tax_amount": {
            "type": "half_float"
          },
          "taxful_price": {
            "type": "half_float"
          },
          "taxless_price": {
            "type": "half_float"
          },
          "unit_discount_amount": {
            "type": "half_float"
          }
        }
      }

Specifically, I added the following line:

     "type": "nested",

This sets the products field to the nested data type. If the nested data type is still unclear, please see my earlier article "Elasticsearch: nested objects". We use the following command to reindex:

POST  _reindex
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest": {
    "index": "orders"
  }
}

We can now run the query:

sql> SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677;
customer_last_name|customer_first_name|  products.price  |products.product_id
------------------+-------------------+------------------+-------------------
Underwood         |Eddie              |11.989999771118164|6283               
Underwood         |Eddie              |24.989999771118164|19400 
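The flattening that produces these two rows can be pictured with a small Python sketch: each nested product becomes its own row, and the requested parent fields are repeated on every row. This only illustrates the presentation and is not how Elasticsearch SQL is implemented. The function name flatten_order is hypothetical, and the document is a trimmed copy of the sample order shown earlier, with values taken as they appear in the document source.

```python
def flatten_order(order, parent_fields, nested_path, nested_fields):
    """Present one parent document as rows, one per nested sub-document.

    If no nested fields are requested, a single row with only the parent
    fields is returned.
    """
    parent = {name: order.get(name) for name in parent_fields}
    if not nested_fields:
        return [parent]
    rows = []
    for child in order.get(nested_path) or []:
        row = dict(parent)  # repeat the parent fields on every row
        for name in nested_fields:
            row[f"{nested_path}.{name}"] = child.get(name)
        rows.append(row)
    return rows


# Trimmed copy of the sample order document shown earlier.
order = {
    "order_id": 584677,
    "customer_first_name": "Eddie",
    "customer_last_name": "Underwood",
    "products": [
        {"product_id": 6283, "price": 11.99},
        {"product_id": 19400, "price": 24.99},
    ],
}

rows = flatten_order(
    order,
    parent_fields=["customer_last_name", "customer_first_name"],
    nested_path="products",
    nested_fields=["price", "product_id"],
)
for row in rows:
    print(row)
```

Calling flatten_order with an empty nested_fields list returns a single row containing only the customer names.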

The _translate API shows how this query is constructed using a nested query:

POST /_sql/translate
{
  "query": """
     SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677
  """
}

The result is:

{
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "order_id" : {
              "value" : 584677,
              "boost" : 1.0
            }
          }
        },
        {
          "nested" : {
            "query" : {
              "match_all" : {
                "boost" : 1.0
              }
            },
            "path" : "products",
            "ignore_unmapped" : false,
            "score_mode" : "none",
            "boost" : 1.0,
            "inner_hits" : {
              "name" : "products_1",
              "ignore_unmapped" : false,
              "from" : 0,
              "size" : 99,
              "version" : false,
              "seq_no_primary_term" : false,
              "explain" : false,
              "track_scores" : false,
              "_source" : {
                "includes" : [
                  "products.product_id",
                  "products.price"
                ],
                "excludes" : [ ]
              }
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "customer_last_name",
      "customer_first_name"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}

Conversely, if only parent fields are queried, only one row is presented:

Find the billing name used for order 584677.

sql> SELECT customer_last_name, customer_first_name FROM orders WHERE order_id=584677;
customer_last_name|customer_first_name
------------------+-------------------
Underwood         |Eddie     

 
