Elasticsearch Lessons Learned (continuously updated)

Important
Starting today, I am gradually migrating my blog to
http://vearne.cc
Please follow it there.
New address of this article:
http://vearne.cc/archives/65

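Background:

Elasticsearch (ES) has been in use at my company for more than three years, and our clusters have grown to over a hundred machines. We have picked up a good deal of experience along the way, which I summarize here to share. My technical knowledge is limited, so please point out any mistakes.

Notes:

I list these notes as a series of questions and will keep adding to them.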

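1. Choosing the shard size

The number of primary shards of an index is fixed when the index is created, so choosing it well matters a great deal. In our experience, a shard size of 10GB ~ 20GB works well, for the following reasons:
1) ES balances load by moving shards between nodes; if a shard is too large, moving it is very slow.
2) Each shard is a Lucene instance, and each Lucene instance brings its own set of Java threads, so the number of shards should not be too large either.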
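
The sizing rule above can be turned into a rough calculation. The following is only a minimal sketch: the function name `estimate_primary_shards` and the 15GB default (the midpoint of the 10GB ~ 20GB range) are my own assumptions, and the expected index size has to come from your own capacity estimate.

```python
import math


def estimate_primary_shards(expected_index_gb, target_shard_gb=15.0):
    """Return a primary shard count that keeps each shard near target_shard_gb.

    Hypothetical helper: target_shard_gb defaults to the midpoint of the
    10GB ~ 20GB range suggested in the text.
    """
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target; always use at least one shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


if __name__ == "__main__":
    # An index expected to reach about 300GB of primary data.
    print(estimate_primary_shards(300))
```

Because the shard count cannot simply be raised later, it is usually safer to feed in the size the index is expected to reach, not its size on the first day.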

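2. Designing index names

If the data grows over time, you can split it into one index per month or per day.
The indices can be named
index_201701, index_201702, index_201703
or
index_20170301, index_20170302, index_20170303
You can then give them a shared alias such as index_2017 and query all of these indices through that alias.
In addition, an ES index can be closed. A closed index takes up no memory, only disk space.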
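
As an illustration of this naming scheme, the sketch below generates the monthly and daily index names and builds a request body for the `_aliases` endpoint that attaches them to one alias. The helper names are hypothetical, and the code only builds the names and the JSON body; it does not talk to a cluster.

```python
from datetime import date, timedelta


def monthly_index_names(prefix, year, first_month, count):
    """Names like index_201701 for `count` consecutive months."""
    names = []
    y, m = year, first_month
    for _ in range(count):
        names.append(f"{prefix}_{y:04d}{m:02d}")
        m += 1
        if m > 12:  # roll over into the next year
            y, m = y + 1, 1
    return names


def daily_index_names(prefix, first_day, count):
    """Names like index_20170301 for `count` consecutive days."""
    return [
        f"{prefix}_{(first_day + timedelta(days=i)).strftime('%Y%m%d')}"
        for i in range(count)
    ]


def alias_actions(index_names, alias):
    """Body for POST /_aliases that adds every index to one alias."""
    return {"actions": [{"add": {"index": n, "alias": alias}} for n in index_names]}


if __name__ == "__main__":
    months = monthly_index_names("index", 2017, 1, 3)
    print(months)
    print(daily_index_names("index", date(2017, 3, 1), 3))
    print(alias_actions(months, "index_2017"))
```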

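3. SSD or spinning disk?

ES's speed depends on its indices, and most of that index data lives in files on disk. If your data volume is large and individual queries or aggregations touch a lot of data, you should use SSDs. In our tests, when a query touches a large amount of data, ES on SSD is about 10 times as fast as ES on spinning disks. The official documentation also reports that, on SSDs, simply using the correct I/O scheduler has improved write throughput 500-fold (see the quotation below).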

For reference:
Record size: 1 kB per record
Operating system: CentOS 6.7
Memory: 64GB
ES version: 2.3, heap size 31GB
Throughput of a single ES data node:

Spinning disk	SSD
10,000/min	100,000/min

See reference [1]:

If you are using SSDs, make sure your OS I/O scheduler is configured correctly. When you write data to disk, the I/O scheduler decides when that data is actually sent to the disk. The default under most *nix distributions is a scheduler called cfq (Completely Fair Queuing).

This scheduler allocates time slices to each process, and then optimizes the delivery of these various queues to the disk. It is optimized for spinning media: the nature of rotating platters means it is more efficient to write data to disk based on physical layout.

This is inefficient for SSD, however, since there are no spinning platters involved. Instead, deadline or noop should be used instead. The deadline scheduler optimizes based on how long writes have been pending, while noop is just a simple FIFO queue.

This simple change can have dramatic impacts. We’ve seen a 500-fold improvement to write throughput just by using the correct scheduler.
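
To see which scheduler a disk is currently using, Linux exposes a file such as /sys/block/sda/queue/scheduler, whose content lists the available schedulers with the active one in square brackets (for example `noop deadline [cfq]`). The sketch below parses that format; the device name `sda` and the function names are assumptions for illustration.

```python
import re


def active_scheduler(scheduler_text):
    """Return the scheduler marked with [brackets] in the sysfs file content."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    if match:
        return match.group(1)
    # Some devices list a single value without brackets.
    return scheduler_text.strip()


def read_active_scheduler(device="sda"):
    """Read the active scheduler for a block device (device name is an assumption)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())


if __name__ == "__main__":
    print(active_scheduler("noop deadline [cfq]\n"))
```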

4. Versions

Make sure you run Java 1.8 or later. ES 5.x performs considerably better than earlier versions.

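5. Setting the heap size of an ES instance

The official recommendation is to give ES half of the machine's memory as heap, and not to let the heap exceed 32GB (in practice only about 31GB is usable).
With a heap below 32GB the JVM can use compressed 32-bit object pointers; with a larger heap it needs longer pointers. According to the official documentation, a heap of about 31GB is roughly as effective as a 40GB heap.
On machines with a lot of memory, you can deploy several ES instances.

In our experience, on a machine with 64GB of memory the ES heap can be set to about 31GB, and on a machine with 96GB of memory it can be set to 64GB.

To check whether compressed pointers are still enabled at a given heap size: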

java -Xmx32766m -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops

The command above shows whether the JVM enables compressed pointers when the maximum heap is set to 32766MB.

See reference [2] for details.
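
The official rule of thumb (half of the memory, capped just below 32GB) can be written as a small calculation. This is only a sketch of that rule: the function name and the 31GB cap are assumptions taken from the text above, and it deliberately does not reproduce the 64GB heap we use on our 96GB machines.

```python
def recommended_heap_gb(ram_gb, compressed_oops_cap_gb=31):
    """Half of the machine's memory, capped so compressed pointers stay enabled.

    Hypothetical helper; the 31GB cap follows the "about 31GB" figure in the text.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(ram_gb // 2, compressed_oops_cap_gb)


if __name__ == "__main__":
    for ram in (16, 32, 64, 96):
        print(ram, recommended_heap_gb(ram))
```

The result still needs to be confirmed with the UseCompressedOops check shown earlier, because the exact threshold depends on the JVM.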

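6. Do not configure too many master-eligible machines

1) In ES, only master-eligible nodes can be elected master, and a node must receive votes from more than half of the voters to be elected.
In our experience, a cluster with too many master-eligible nodes becomes very unstable.
It is like representative government in human society: if every decision had to be made by the entire population, decision-making would become very inefficient.
2) In addition, master-eligible ES instances should not hold data or act as client nodes.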

By default a node is a master-eligible node and a data node, plus it can pre-process documents through ingest pipelines. This is very convenient for small clusters but, as the cluster grows, it becomes important to consider separating dedicated master-eligible nodes from dedicated data nodes.

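In our experience, choosing 3 instances in the cluster as master-eligible is enough, with a heap of 16GB each. They can share machines with other ES instances.
See reference [3].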
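
The "more than half of the voters" rule is simple arithmetic, sketched below. The function names are hypothetical. As context that is not in the original text: in ES 2.x this majority is the value normally given to the `discovery.zen.minimum_master_nodes` setting.

```python
def quorum(master_eligible_nodes):
    """Smallest number of votes that is more than half of the voters."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while a master can still be elected."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With 3 master-eligible nodes the quorum is 2, so one node can fail. Going from 3 to 4 raises the quorum to 3 without tolerating any additional failure, which is one reason an odd, small number is a common choice.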

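7. Problems caused by HugePage

Our cluster runs on CentOS 6. For a while we were bulk-importing a batch of data and noticed that the load on some nodes stood out sharply from the rest of the cluster, dragging down overall throughput. The cause turned out to be that CentOS enables transparent huge pages (THP) by default, which drove cpu_sys too high.
THP can be disabled with the following commands: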

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

Note: this setting is lost after a reboot.
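
Since the setting does not survive a reboot, it can be worth checking it on each node. The sketch below reads the same two files the commands above write to and reports the value shown in square brackets. The function names are hypothetical, and the default directory is the CentOS 6 path used above; other distributions use a different directory.

```python
from pathlib import Path

# Default is the CentOS 6 path from the commands above (an assumption elsewhere).
DEFAULT_THP_DIR = "/sys/kernel/mm/redhat_transparent_hugepage"


def thp_status(thp_dir=DEFAULT_THP_DIR):
    """Return the active value of the `enabled` and `defrag` files."""
    status = {}
    for name in ("enabled", "defrag"):
        text = (Path(thp_dir) / name).read_text()
        # The active choice is the one wrapped in square brackets.
        start, end = text.find("["), text.find("]")
        status[name] = text[start + 1:end] if 0 <= start < end else text.strip()
    return status


def thp_disabled(thp_dir=DEFAULT_THP_DIR):
    """True only if both files report `never`."""
    return all(value == "never" for value in thp_status(thp_dir).values())


if __name__ == "__main__":
    if Path(DEFAULT_THP_DIR).is_dir():
        print(thp_status(), thp_disabled())
    else:
        print("THP directory not found at", DEFAULT_THP_DIR)
```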
