ElasticSearch Configuration Explained

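The config folder of elasticsearch contains two configuration files: elasticsearch.yml and logging.yml. The first is the basic ES configuration file and the second is the logging configuration file. ES uses log4j for logging, so the settings in logging.yml follow the conventions of an ordinary log4j configuration file. The rest of this post walks through what can be configured in elasticsearch.yml.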

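Sets the cluster name; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, this property is what tells them apart.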

cluster.name: elasticsearch

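Node name. By default a name is picked at random from a list of names. The list lives in the name.txt file in the config folder inside the ES jar, and it contains many amusing names added by the authors.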

node.name: "Franz Kafka"

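Specifies whether this node is eligible to be elected master; the default is true. By default ES treats the first machine in the cluster as the master, and if that machine goes down a new master is elected.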

node.master: true

Specifies whether this node stores index data; the default is true.

node.data: true

Sets the default number of shards per index; the default is 5.

index.number_of_shards: 5

Sets the default number of replicas per index; the default is 1.

index.number_of_replicas: 1
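
To make the interaction between the two settings above concrete, here is a minimal Python sketch that counts how many shard copies (primaries plus replicas) one index produces. The function name `total_shard_copies` is made up for this illustration and is not part of ES.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Return primaries plus all replica copies for a single index."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    # Each primary shard has `number_of_replicas` additional copies.
    return number_of_shards * (1 + number_of_replicas)


# With the defaults described above: 5 primaries, 1 replica each.
print(total_shard_copies(5, 1))  # 10
```

So with the default values, a single index is spread over 10 shard copies across the cluster.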

Sets the path where configuration files are stored; the default is the config folder under the ES root directory.

path.conf: /path/to/conf

Sets the path where index data is stored; the default is the data folder under the ES root directory.

path.data: /path/to/data

Multiple storage paths can be set, separated by commas, for example:

path.data: /path/to/data1,/path/to/data2

Sets the path where temporary files are stored; the default is the work folder under the ES root directory.

path.work: /path/to/work

Sets the path where log files are stored; the default is the logs folder under the ES root directory.

path.logs: /path/to/logs

Sets the path where plugins are stored; the default is the plugins folder under the ES root directory.

path.plugins: /path/to/plugins

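Forces all memory to be locked so that swapping cannot hurt performance.
Set this to true to lock the memory. ES slows down once the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value, and make sure the machine has enough memory to give to ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with the `ulimit -l unlimited` command.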

bootstrap.mlockall: true

Sets the IP address to bind to, either IPv4 or IPv6; the default is 0.0.0.0.

network.bind_host: 192.168.0.1

Sets the IP address that other nodes use to talk to this node. If it is not set, ES works it out automatically; the value must be a real IP address.

network.publish_host: 192.168.0.1

This parameter sets both of the parameters above, bind_host and publish_host, at once.

network.host: 192.168.0.1

Sets the TCP port for node-to-node communication; the default is 9300.

transport.tcp.port: 9300

Sets whether data is compressed during TCP transport; the default is false (no compression).

transport.tcp.compress: true

Sets the HTTP port for external services; the default is 9200.

http.port: 9200

Sets the maximum size of HTTP content; the default is 100mb.

http.max_content_length: 100mb

Whether to serve external requests over HTTP; the default is true (enabled).

http.enabled: false

Network settings

#network.tcp.keep_alive : true
#network.tcp.send_buffer_size : 8192
#network.tcp.receive_buffer_size : 8192

Settings related to automatic discovery

#discovery.zen.fd.connect_on_network_disconnect : true
#discovery.zen.initial_ping_timeout : 10s
#discovery.zen.fd.ping_interval : 2s
#discovery.zen.fd.ping_retries : 10

The gateway snapshot interval (only applies to shared gateways).

#index.gateway.snapshot_interval : 1s

Interval between asynchronous shard refreshes

#index.refresh_interval : -1

Set to an actual value (like 0-all) or false to disable it.

index.auto_expand_replicas

Set to true to have the index read only. false to allow writes and metadata changes.

index.blocks.read_only

Set to true to disable read operations against the index.

index.blocks.read

Set to true to disable write operations against the index.

index.blocks.write

Set to true to disable metadata operations against the index.

index.blocks.metadata

Lucene index term interval; only applies to newly created docs.

index.term_index_interval

Lucene reader term index divisor

index.term_index_divisor

When to flush based on operations.

index.translog.flush_threshold_ops

When to flush based on translog (bytes) size.

index.translog.flush_threshold_size

When to flush based on a period of not flushing.

index.translog.flush_threshold_period

Disables flushing. Note, should be set for a short interval and then enabled.

index.translog.disable_flush

The maximum size of filter cache (per segment in shard). Set to -1 to disable.

index.cache.filter.max_size

The expire after access time for filter cache. Set to -1 to disable.

index.cache.filter.expire

merge policy
All the settings for the merge policy currently configured. A different merge policy can’t be set.

A node matching any rule will be allowed to host shards from the index.

index.routing.allocation.include.*

A node matching any rule will NOT be allowed to host shards from the index.

index.routing.allocation.exclude.*

Only nodes matching all rules will be allowed to host shards from the index.

index.routing.allocation.require.*
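
The three rule types above (include matches any, exclude rejects on any, require needs all) can be modelled in a few lines of Python. This is a simplified sketch of the semantics as described in the text, not ES's actual allocation code. The attribute names `rack` and `tag` are hypothetical, and features such as wildcards or comma-separated values are left out.

```python
def node_allowed(node_attrs, include=None, exclude=None, require=None):
    """Return True if a node may host shards of the index.

    node_attrs: dict mapping attribute name -> value for the node.
    include / exclude / require: dicts mapping attribute name -> set of values.
    """
    def matches(attr, values):
        return node_attrs.get(attr) in values

    # include: the node must match at least one rule.
    if include and not any(matches(a, v) for a, v in include.items()):
        return False
    # exclude: matching any single rule rejects the node.
    if exclude and any(matches(a, v) for a, v in exclude.items()):
        return False
    # require: the node must match every rule.
    if require and not all(matches(a, v) for a, v in require.items()):
        return False
    return True


node = {"rack": "r1", "tag": "hot"}  # hypothetical node attributes
print(node_allowed(node, include={"rack": {"r1", "r2"}}))             # True
print(node_allowed(node, exclude={"tag": {"hot"}}))                   # False
print(node_allowed(node, require={"rack": {"r1"}, "tag": {"cold"}}))  # False
```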

Controls the total number of shards allowed to be allocated on a single node. Defaults to unbounded (-1).

index.routing.allocation.total_shards_per_node

When using local gateway a particular shard is recovered only if there can be allocated quorum shards in the cluster. It can be set to quorum (default), quorum-1 (or half), full and full-1. Number values are also supported, e.g. 1.

index.recovery.initial_shards

Disables temporarily the purge of expired docs.

index.ttl.disable_purge

Default index merge factor

#index.merge.policy.merge_factor : 100
#index.merge.policy.min_merge_docs : 1000
#index.merge.policy.use_compound_file : true
#indices.memory.index_buffer_size : 5%

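Gateway settings
When the cluster does not reach its expected number of nodes, it stays blocked and cannot index or search normally. This indicates that some node in the cluster failed to start properly, and blocking is exactly the behavior we want because it avoids data inconsistency.
The gateway type; the default is local, meaning the local file system. It can be set to the local file system, a distributed file system, Hadoop HDFS, or Amazon S3. How to configure the other file systems will be covered in detail another time.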

gateway.type: local

Sets data recovery to begin once N nodes in the cluster have started; the default is 1.

gateway.recover_after_nodes: 1

Sets the timeout for the initial data recovery process; the default is 5 minutes.

gateway.recover_after_time: 5m

Sets the number of nodes in this cluster; the default is 2. As soon as these N nodes have started, data recovery begins immediately.

gateway.expected_nodes: 2
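
The three gateway settings above work together, and a small Python sketch may help to see how. It encodes one reading of the descriptions: recovery starts immediately once expected_nodes are up; otherwise it waits until at least recover_after_nodes are present and recover_after_time has passed. The function and parameter names are invented for this illustration, and the real ES logic may differ in detail.

```python
def should_start_recovery(nodes_up, seconds_waited,
                          recover_after_nodes=1,
                          recover_after_seconds=300,  # 5m, as in the text
                          expected_nodes=2):
    """Simplified model of when gateway recovery may begin."""
    # All expected nodes are present: recover right away.
    if nodes_up >= expected_nodes:
        return True
    # Fewer nodes than the minimum: stay blocked.
    if nodes_up < recover_after_nodes:
        return False
    # Enough nodes to proceed, but only after the timeout has elapsed.
    return seconds_waited >= recover_after_seconds


print(should_start_recovery(nodes_up=2, seconds_waited=0))    # True
print(should_start_recovery(nodes_up=1, seconds_waited=10))   # False
print(should_start_recovery(nodes_up=1, seconds_waited=300))  # True
```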

Number of concurrent recovery threads during initial data recovery; the default is 4.

cluster.routing.allocation.node_initial_primaries_recoveries: 4

Number of concurrent recovery threads when nodes are added or removed, or when shards are rebalanced; the default is 2.

cluster.routing.allocation.node_concurrent_recoveries: 2

Sets the bandwidth limit during data recovery, for example 100mb; the default is 0, meaning unlimited.

indices.recovery.max_size_per_sec: 0

Limits the maximum number of concurrent streams opened when recovering data from other shards; the default is 5.

indices.recovery.concurrent_streams: 5

Set this parameter to make sure a node in the cluster can see N other master-eligible nodes. The default is 1; for large clusters a larger value (2-4) can be used.

discovery.zen.minimum_master_nodes: 1
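
The text above only gives a rough range for larger clusters. A commonly recommended rule of thumb, which is not stated in this post, is to use a majority of the master-eligible nodes so that two halves of a partitioned cluster cannot both elect a master. A minimal Python sketch of that calculation, with a made-up helper name:

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 3, 4, 5):
    print(n, majority_quorum(n))
# 1 1
# 3 2
# 4 3
# 5 3
```

For example, a cluster with 3 master-eligible nodes would use a value of 2 under this rule.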

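Sets the ping timeout used when automatically discovering other nodes in the cluster; the default is 3 seconds. On poor networks a higher value helps prevent discovery errors.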

discovery.zen.ping.timeout: 3s

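Sets whether multicast discovery of nodes is enabled; the default is true.

discovery.zen.ping.multicast.enabled: false

When multicast is disabled, the IPs of the cluster nodes can be set manually. The following setting gives the initial list of master nodes in the cluster, through which nodes newly joining the cluster are discovered automatically.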

discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
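
The host list above uses three entry shapes: a bare host, host:port, and host[portX-portY]. The following Python sketch expands each shape into explicit (host, port) pairs to show what they mean. It is an illustration only, not ES's parser. It assumes a bare host falls back to the default transport port 9300 mentioned earlier, and it does not handle IPv6 literals.

```python
import re

DEFAULT_PORT = 9300  # assumption: bare hosts use the default transport port

_ENTRY_RE = re.compile(
    r"^(?P<host>[^:\[\]]+)"
    r"(?::(?P<port>\d+)|\[(?P<lo>\d+)-(?P<hi>\d+)\])?$"
)


def expand_host(entry):
    """Expand one unicast host entry into a list of (host, port) pairs."""
    m = _ENTRY_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized host entry: {entry!r}")
    host = m.group("host")
    if m.group("port"):
        ports = [int(m.group("port"))]
    elif m.group("lo"):
        lo, hi = int(m.group("lo")), int(m.group("hi"))
        if lo > hi:
            raise ValueError(f"empty port range in {entry!r}")
        ports = list(range(lo, hi + 1))
    else:
        ports = [DEFAULT_PORT]
    return [(host, p) for p in ports]


print(expand_host("host1"))             # [('host1', 9300)]
print(expand_host("host2:9301"))        # [('host2', 9301)]
print(expand_host("host3[9300-9302]"))  # [('host3', 9300), ('host3', 9301), ('host3', 9302)]
```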

Below are some slow-log settings for queries.

index.search.slowlog.level: TRACE
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
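
To show how the thresholds above translate into log levels, here is a small Python sketch that classifies a query by how long it took. It uses the query thresholds from the example and assumes that a query is logged at the most severe level whose threshold it meets or exceeds; the helper names are invented for this illustration.

```python
import re

_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000}

# Query thresholds copied from the settings above.
QUERY_THRESHOLDS = {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"}


def to_millis(value):
    """Convert a duration such as '500ms', '2s' or '5m' to milliseconds."""
    m = re.fullmatch(r"(\d+)(ms|s|m)", value.strip())
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(m.group(1)) * _UNIT_MS[m.group(2)]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS):
    """Return the slow-log level for a query, or None if it is fast enough."""
    # Check the most severe level first.
    for level in ("warn", "info", "debug", "trace"):
        if level in thresholds and took_ms >= to_millis(thresholds[level]):
            return level
    return None


print(slowlog_level(12_000))  # warn
print(slowlog_level(2_500))   # debug
print(slowlog_level(100))     # None
```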

1. Set the cache size and expiry time.

index.cache.field.max_size
index.cache.field.expire

For example:
// maximum number of entries each segment in the index may hold

index.cache.field.max_size: 50000

// expire after 10 minutes

index.cache.field.expire: 10m

2. Change the cache type.

index.cache.field.type: soft

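The default type is resident, which literally means permanently resident: the cache keeps growing until memory is exhausted. With soft, when memory runs short the entries already held are cleared first and new ones are then loaded. Setting the type to soft is therefore like setting a relative memory size. Use resident only if there is plenty of memory.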

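3. Process the data.
The article mentions reducing the length of field values, for example by converting uppercase to lowercase.
In practice this may mean trimming the data down. It is also possible to convert the fields used for faceting and represent them as int instead.
For converting string to int, see the plugin by M: https://github.com/medcl/elasticsearch-analysis-string2int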
