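A brief introduction to Elasticsearch mappings

I have been playing with ELK lately and keep running into things I don't understand. There is plenty of material online, but most of it is messy and scattered, which makes anything written in plain language especially valuable.

I came across the article below and liked it: it explains the concept of a mapping in terms that are easy to follow.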

Default mapping

Elasticsearch (ES for short) is a schema-less system, but that does not mean it has no schema. Suppose we run the following command:

curl -XPUT http://localhost:9200/test/item/1 -d '{"name":"zach", "description": "A Pretty cool guy."}'

ES is smart enough to recognize that the "name" and "description" fields hold strings, and by default it creates the mapping shown below.

curl -XGET 'http://localhost:9200/test/_mapping'

"mappings": {
    "item": {
        "properties": {
            "description": {
                "type": "text"
            },
            "name": {
                "type": "text"
            }
        }
    }
}
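To make the type-guessing step concrete, here is a minimal Python sketch of how dynamic mapping could be modelled. It is not ES code: `guess_type` and `infer_mapping` are hypothetical helper names, and the rules (strings become text, integers become long, floats become float, booleans become boolean) are a simplified assumption about the defaults. Real ES applies further rules, such as date detection, that are left out here.

```python
import json

def guess_type(value):
    """Guess a field type from a JSON value (simplified, assumed rules)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unsupported value in this sketch: {value!r}")

def infer_mapping(type_name, doc):
    """Build a mapping-shaped dict for one document (hypothetical helper)."""
    properties = {field: {"type": guess_type(value)} for field, value in doc.items()}
    return {"mappings": {type_name: {"properties": properties}}}

doc = {"name": "zach", "description": "A Pretty cool guy."}
mapping = infer_mapping("item", doc)
print(json.dumps(mapping, indent=2))
```

Both fields in the example document come out as text, which is the same shape as the mapping shown above.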

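What is a mapping

An ES mapping is much like a data type in a statically typed language: once a variable is declared as int, it can only hold int values from then on. In the same way, a field mapped as a numeric type can only store numeric data.

Compared with a language's data types, though, a mapping carries some extra meaning. It tells ES not only what type of value a field holds, but also how to index the data and whether the data can be searched at all.

When a query does not return the data you expect, there is a good chance your mapping is the problem. When in doubt, check your mapping first.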

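Anatomy of a mapping

A mapping is made up of one or more analyzers, and an analyzer in turn is made up of one or more filters. When ES indexes a document, it hands the content of each field to the corresponding analyzer, and the analyzer passes it on to its filters.

A filter is easy to understand: it is a function that transforms data. It takes a string as input and returns another string. A function that converts a string to lowercase is a good example of a filter.

An analyzer is an ordered sequence of filters. Running the analysis means calling those filters one after another, in order; ES then stores and indexes the final result.

In short, a mapping is a series of instructions that turn input data into searchable index terms.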
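The filter chain described above can be sketched in a few lines of Python. This is a toy model, not the ES implementation: `tokenize`, `lowercase_filter`, `make_stop_filter` and `analyze` are hypothetical names, the tokenizer is a plain regular expression, and the filters work on lists of tokens rather than on a single string.

```python
import re

def tokenize(text):
    """Split text into runs of ASCII letters and digits, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase_filter(tokens):
    """Return the tokens converted to lowercase."""
    return [token.lower() for token in tokens]

def make_stop_filter(stop_words):
    """Build a filter that drops every token found in stop_words."""
    stop = set(stop_words)

    def stop_filter(tokens):
        return [token for token in tokens if token not in stop]

    return stop_filter

def analyze(text, filters):
    """Tokenize the text, then apply each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

# A chain with an empty stop list keeps every word.
keep_all = [lowercase_filter, make_stop_filter([])]
print(analyze("A Pretty cool guy.", keep_all))    # ['a', 'pretty', 'cool', 'guy']

# The same chain with "a" as a stop word drops it.
drop_a = [lowercase_filter, make_stop_filter(["a"])]
print(analyze("A Pretty cool guy.", drop_a))      # ['pretty', 'cool', 'guy']
```

The order of the filters matters in this sketch: the stop list is lowercase, so it only matches because the lowercase filter runs first.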

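Default analyzer

Back to our example. ES guesses that the description field is a string, so by default it creates a string mapping that uses the default global analyzer. The default is the standard analyzer, which has three filters: a token filter, a lowercase filter, and a stop token filter.

We can call the _analyze API to watch the analysis happen. The following command shows how the value of the description field is transformed: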

curl -X GET "http://localhost:9200/test/_analyze?analyzer=standard&pretty=true" -d "A Pretty cool guy."  

{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "pretty",
      "start_offset" : 2,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "cool",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "guy",
      "start_offset" : 14,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

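As you can see, the value of our description field was turned into [a], [pretty], [cool], [guy]. The punctuation was stripped out during analysis, and A and Pretty were lowercased to a and pretty. The important point is that although ES still stores the complete original data, only these four words can be used as keywords to find this document; everything else has been discarded.

Here is the result of searching for the word a: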

curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{  
    "query" : {  
        "term" : { "description": "a" }  
    }  
}'  


{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "item",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "zach",
          "description" : "A Pretty cool guy."
        }
      }
    ]
  }
}
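Why the lowercase word a finds the document can be modelled with a toy inverted index. The sketch below is an assumption-laden illustration, not how ES is implemented: `ToyIndex` and its methods are hypothetical, the analyzer is a regex tokenizer plus lowercasing, `term` looks up the query value exactly as given, and `match` analyzes the query text first (loosely mirroring the ES match query).

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

class ToyIndex:
    """Hypothetical in-memory index for a single text field."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of documents containing it
        self.sources = {}                 # id -> original, unmodified text

    def index(self, doc_id, text):
        self.sources[doc_id] = text       # the full text is stored as-is
        for token in analyze(text):       # only analyzed tokens are searchable
            self.postings[token].add(doc_id)

    def term(self, value):
        """Exact lookup: the query value is NOT analyzed."""
        return sorted(self.postings.get(value, set()))

    def match(self, text):
        """Analyzed lookup: a document matches if it contains any query token."""
        hits = set()
        for token in analyze(text):
            hits |= self.postings.get(token, set())
        return sorted(hits)

idx = ToyIndex()
idx.index("1", "A Pretty cool guy.")

print(idx.term("a"))                   # ['1']
print(idx.term("A"))                   # []  (only the lowercase token was indexed)
print(idx.term("A Pretty cool guy."))  # []  (the full sentence is not an index term)
print(idx.match("A Pretty"))           # ['1']
print(idx.sources["1"])                # A Pretty cool guy.
```

In this model the stored source is untouched, but the only keys that lead back to it are the four analyzed tokens.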

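This is admittedly a simple example, but it shows how ES works. Don't think of a mapping as a data type; think of it as a set of instructions for searching your data.

Reposted from http://blog.csdn.net/lvhong84/article/details/23936697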

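The original appears to be based on ES 1.x or 2.x, so I have changed the content slightly:

1. The original says that A is filtered out and cannot be found by a query.

2. The original query uses text rather than term.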
