Installing, configuring, and verifying ElastAlert with Docker
created by fangchangtan | 2020/2/24
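1. Use cases for ElastAlert
ElastAlert serves as the alerting component for log keywords in an ELK stack. The basic flow: from the logs collected by ELK, it picks up a program's continuous heartbeat messages and error-log keywords such as ERROR, which provides monitoring and alerting on the program's health and stability.
2. Installing ElastAlert
2.1 Download the git repository
## Clone the repository with git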
git clone https://github.com/bitsensor/elastalert.git
## Change into the directory
cd elastalert
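2.2 Testing the ElastAlert Docker installation locally:
Run the command from inside the elastalert directory (this is the installation method the project recommends). Because the container uses --net="host", Docker discards the -p 3030:3030 port mapping; the server is reachable directly on the host's port 3030.
# Start the ElastAlert container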
sudo docker run --rm -p 3030:3030 \
-v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v `pwd`/config/elastalert-test.yaml:/opt/elastalert/config-test.yaml \
-v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
-v `pwd`/rules:/opt/elastalert/rules \
-v `pwd`/rule_templates:/opt/elastalert/rule_templates \
--net="host" \
--name elastalert-fct2 bitsensor/elastalert:2.0.0
Alternatively, the setup for a production environment (the recommended approach):
# Production environment: start ElastAlert
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
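After the container starts, it can help to confirm that the server answers on port 3030 before going further. The following is a minimal Python sketch, not part of the original setup. It assumes the server is reachable on localhost and that the bitsensor elastalert-server exposes a JSON /status endpoint; build_url and fetch_json are hypothetical helper names.

```python
import json
import urllib.request


def build_url(host: str, port: int, path: str) -> str:
    """Build an HTTP URL for the elastalert-server API."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (assumes localhost:3030 and a /status endpoint):
# print(fetch_json(build_url("localhost", 3030, "/status")))
```

If the call raises a connection error, the container is probably not running or is listening on a different port than the one set in config.json.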
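2.3 Configuring ElastAlert
The config.json file mainly sets the Elasticsearch address to connect to, the paths of the rules and rule_templates directories, and the name of the Elasticsearch index to write back to: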
{
"appName": "elastalert-server",
"port": 3030,
"wsport": 3333,
"elastalertPath": "/opt/elastalert",
"verbose": false,
"es_debug": false,
"debug": false,
"rulesPath": {
"relative": true,
"path": "/rules"
},
"templatesPath": {
"relative": true,
"path": "/rule_templates"
},
"es_host": "172.19.32.106",
"es_port": 9202,
"writeback_index": "elastalert_status"
}
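A malformed or incomplete config.json is easy to miss until the container fails to start. The sketch below is one possible pre-flight check in Python. The list of expected keys is an assumption drawn from the example above, not an official schema, and missing_keys is a hypothetical helper name.

```python
import json

# Keys this walkthrough relies on; an assumption, not an official schema.
EXPECTED_KEYS = ("port", "elastalertPath", "rulesPath", "templatesPath",
                 "es_host", "es_port", "writeback_index")


def missing_keys(config_text: str) -> list:
    """Parse config.json text and list the expected keys that are absent."""
    config = json.loads(config_text)  # raises ValueError on malformed JSON
    return [key for key in EXPECTED_KEYS if key not in config]


# Example call (path taken from the docker command above):
# with open("config/config.json") as handle:
#     print(missing_keys(handle.read()))
```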
The elastalert.yaml configuration is as follows:
# The elasticsearch hostname for metadata writeback
# Note that every rule can have its own elasticsearch host
es_host: 172.19.32.106
# The elasticsearch port
es_port: 9202
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: rules
# How often ElastAlert will query elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  seconds: 5
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 1
# Optional URL prefix for elasticsearch
#es_url_prefix: elasticsearch
# Connect with TLS to elasticsearch
#use_ssl: True
use_ssl: False
# Verify TLS certificates
#verify_certs: True
verify_certs: False
# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET
# Option basic-auth username and password for elasticsearch
#es_username: someusername
#es_password: somepassword
# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
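The time settings above (run_every, buffer_time, alert_time_limit) are unit-to-value mappings. As a rough illustration of how they relate, the Python sketch below converts them to datetime.timedelta values and computes how many polling intervals fit in one buffer window. The helper name to_timedelta is hypothetical, and the values are copied from this configuration.

```python
from datetime import timedelta


def to_timedelta(spec: dict) -> timedelta:
    """Convert a mapping such as {"minutes": 1} into a timedelta."""
    return timedelta(**spec)


run_every = to_timedelta({"seconds": 5})
buffer_time = to_timedelta({"minutes": 1})
alert_time_limit = to_timedelta({"days": 2})

# Dividing one timedelta by another with // yields an integer count.
intervals_per_buffer = buffer_time // run_every
print(intervals_per_buffer)  # 12
```

With these values the 1-minute buffer spans 12 polling intervals of 5 seconds each.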
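There is also an elastalert-test.yaml file. It is used only when you test rules through the API, and it lets you write to a different writeback index when testing different rules for different cases.
The smtp_auth.yaml file, referenced by smtp_auth_file in the rule file, holds the SMTP credentials: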
user: [email protected]
password: sdwtyx234
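Next, configure the alert rule. ElastAlert scans the specified Elasticsearch index over the most recent 1 minute and, once the number of log messages matching the query filter reaches the num_events threshold (set to 1 in the rule below), sends an alert email to [email protected].
The following is the ElastAlert rule file /rules/tank-rules.yaml.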
es_host: 172.19.32.106
es_port: 9202
# The rule name must be unique, otherwise an error is raised; once defined, it becomes the subject of the alert email
## (Required)
## Rule name, must be unique
name: fct-test-rule-name
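# Choose a rule type: any, blacklist, whitelist, change, frequency, spike, flatline, new_term, cardinality
# any: alert on every match;
# blacklist: the value of the compare_key field matches any entry in the blacklist array;
# whitelist: the value of the compare_key field matches none of the entries in the whitelist array;
# change: for the same query_key, the value of the compare_key field changes within timeframe;
# frequency: for the same query_key, num_events matching events occur within timeframe;
# spike: for the same query_key, the event counts of two consecutive timeframe windows differ by a ratio greater than spike_height. spike_type sets the direction: up, down, or both. threshold_ref sets a minimum event count for the previous window and threshold_cur a minimum for the current window; if the count is below the minimum, no alert fires;
# flatline: the event count within timeframe falls below threshold;
# new_term: a value appears in fields that is not among the top terms_size (default 50) values seen during the preceding terms_window_size (default 30 days);
# cardinality: for the same query_key, the number of distinct values of cardinality_field within timeframe exceeds max_cardinality or falls below min_cardinality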
## (Required)
## Type of alert.
## the frequency rule type alerts when num_events events occur with timeframe time
## This rule uses frequency, which needs two conditions to hold: for the same query_key, num_events matching events occur within timeframe
type: frequency
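# This is the index as it appears in Kibana. Wildcards are supported, so one pattern can cover several indices; a bare * also works if you do not want to be specific.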
## (Required)
## Index to search, wildcard supported
index: fct-logstash*
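# If at least one event matches within the most recent 1 minute, the rule is satisfied and an alert fires
num_events: 1
timeframe:
  minutes: 1
# This is the crucial part: which keyword in the program's message should trigger an alert. It is an Elasticsearch query_string query and supports operators such as AND and OR.
filter:
- query:
    query_string:
      query: "UNKNOWN"
# The alert_text defined here appears in the email body
alert_text: "Hello, please reply to this email. Fang Changtan"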
# Setup report smtp config
smtp_host: smtp.163.com
smtp_port: 25
smtp_ssl: False
#SMTP auth
from_addr: [email protected]
email_reply_to: [email protected]
smtp_auth_file: /opt/elastalert/config/smtp_auth.yaml
# (Required)
# The alert to use when a match is found
alert:
- "email"
# (required, email specific)
# # a list of email addresses to send alerts to
email:
- "[email protected]"
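To make the frequency condition concrete (at least num_events matching events within timeframe), here is a simplified Python sketch of a sliding-window check over event timestamps. It illustrates the idea only and is not ElastAlert's actual implementation; frequency_matches is a hypothetical name, and query_key grouping is left out.

```python
from datetime import datetime, timedelta


def frequency_matches(timestamps, num_events, timeframe):
    """Return True if some window of length `timeframe` holds at least
    `num_events` of the given timestamps."""
    events = sorted(timestamps)
    start = 0
    for end in range(len(events)):
        # Advance the left edge until the window spans at most `timeframe`.
        while events[end] - events[start] > timeframe:
            start += 1
        if end - start + 1 >= num_events:
            return True
    return False


base = datetime(2020, 2, 24, 15, 0, 0)
sample = [base, base + timedelta(seconds=30), base + timedelta(seconds=200)]
print(frequency_matches(sample, 1, timedelta(minutes=1)))  # True
print(frequency_matches(sample, 3, timedelta(minutes=1)))  # False
```

With num_events: 1, as in the rule above, a single matching event is enough to fire the alert, which is convenient for testing but noisy in production.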
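Note: you need to register a 163 mailbox and enable the SMTP service:
Mailbox account: [email protected]
Mailbox password: 221123.com
SMTP service password: swtx234
Enabling the SMTP service lets third-party clients log in to the mailbox and send mail through it. Turn it on in the 163 mailbox settings.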
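2.4 Restart ElastAlert to apply the configuration
Finally, restart ElastAlert so that the new configuration takes effect.
On the local test host 106, the command to run ElastAlert is: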
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
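3. Verifying email delivery (local test)
3.1 Start Logstash to send test data
To verify ElastAlert's alerting, start Logstash so that it sends test data into Elasticsearch.
On 172.19.32.67, start Logstash locally for the test.
It receives log data from Kafka, filters it, and sends it to the fct-logstash_* indices in Elasticsearch: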
docker run \
--rm \
--name fct-alert-logstash \
-p 5047:5044 \
-v /root/fct/logstash-test/logstash_kafka.conf:/logstash/logstash_kafka.conf \
-v /root/fct/logstash-test/logstash.yml:/usr/share/logstash/config/logstash.yml \
registry.marathon.l4lb.thisdcos.directory:5000/logstash:6.6.1 \
logstash -f /logstash/logstash_kafka.conf
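If the Kafka and Logstash pipeline is not available, one alternative for a quick test is to index a single matching document directly into Elasticsearch. The Python sketch below is an assumption-laden shortcut, not the method used in this walkthrough: it assumes Elasticsearch 6.2 or later (so that the `_doc` type name is accepted), a hypothetical index name matching fct-logstash*, and documents with `@timestamp` and `message` fields. build_test_event and index_event are hypothetical helper names.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_event(message: str, when: datetime) -> dict:
    """Build a log document carrying the @timestamp field the rule queries on."""
    return {
        "@timestamp": when.isoformat(timespec="milliseconds"),
        "message": message,
    }


def index_event(es_url: str, index: str, event: dict, timeout: float = 5.0) -> int:
    """POST one document to Elasticsearch and return the HTTP status code."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_doc",  # "_doc" type name is an assumption (ES 6.2+)
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (host and port from this walkthrough; index name is hypothetical):
# event = build_test_event("UNKNOWN state reported", datetime.now(timezone.utc))
# print(index_event("http://172.19.32.106:9202", "fct-logstash_manual-test", event))
```

The message contains the keyword UNKNOWN so that it matches the query_string filter in the rule.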
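3.2 What success looks like
If the result shown above appears, the email was sent successfully.
3.3 Common errors
The following errors may appear in the log of the running ElastAlert service.
3.3.1 Error 1: cannot connect to the 163 mail service.
The log reports the following while running, which indicates that the mail settings are wrong; the mail connection must be configured correctly: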
15:43:43.085Z INFO elastalert-server: Router: Listening for GET request on /mapping/:index.
15:43:43.085Z INFO elastalert-server: Router: Listening for POST request on /search/:index.
15:43:43.090Z INFO elastalert-server: ProcessController: Starting ElastAlert
15:43:43.090Z INFO elastalert-server: ProcessController: Creating index
15:43:43.980Z INFO elastalert-server:
ProcessController: Elastic Version:6
Mapping used for string:{'type': 'keyword'}
Index elastalert_status already exists. Skipping index creation.
15:43:43.980Z INFO elastalert-server: ProcessController: Index create exited with code 0
15:43:43.981Z INFO elastalert-server: ProcessController: Starting elastalert with arguments [none]
15:43:43.991Z INFO elastalert-server: ProcessController: Started Elastalert (PID: 50)
15:43:43.992Z INFO elastalert-server: Server: Server listening on port 3030
15:43:43.993Z INFO elastalert-server: Server: Websocket listening on port 3333
15:43:43.994Z INFO elastalert-server: Server: Server started
15:44:04.860Z ERROR elastalert-server:
ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
15:48:06.886Z ERROR elastalert-server:
ProcessController: WARNING:elasticsearch:GET http://172.19.32.106:9202/elastalert_status/elastalert/_search?size=10000 [status:400 request:0.012s]
15:48:06.886Z ERROR elastalert-server:
ProcessController: ERROR:root:Error fetching aggregated matches: RequestError(400, u'search_phase_execution_exception', u'parse_exception: Encountered " "-" "- "" at line 1, column 13.\nWas expecting one of:\n <BAREOPER> ...\n "(" ...\n "*" ...\n <QUOTED> ...\n <TERM> ...\n <PREFIXTERM> ...\n <WILDTERM> ...\n <REGEXPTERM> ...\n "[" ...\n "{" ...\n <NUMBER> ...\n ')
15:48:26.972Z ERROR elastalert-server:
ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
This error means the connection to the mail server could not be established; check that the configuration files are correct.
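To separate mail-server problems from ElastAlert problems, one option is to try the same host, port, and credentials outside ElastAlert. "Connection unexpectedly closed" is the text Python's smtplib uses when it raises SMTPServerDisconnected, so a short smtplib script can often reproduce the failure. This is a minimal sketch: check_smtp_login is a hypothetical helper, the smtp_factory parameter exists only to make it testable, and whether the server expects plain SMTP or SSL on a given port is something to verify with the mail provider.

```python
import smtplib


def check_smtp_login(host, port, user, password, use_ssl=False,
                     timeout=10, smtp_factory=None):
    """Try to connect and log in; return (ok, detail)."""
    factory = smtp_factory or (smtplib.SMTP_SSL if use_ssl else smtplib.SMTP)
    try:
        # The context manager sends QUIT and closes the socket on exit.
        with factory(host, port, timeout=timeout) as client:
            client.login(user, password)
    except (smtplib.SMTPException, OSError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


# Example call (values from the rule file; fill in the real credentials):
# print(check_smtp_login("smtp.163.com", 25, "<user>", "<password>"))
```

If this script fails in the same way, the cause lies in the SMTP settings (host, port, SSL, or the SMTP service password) rather than in the rule itself.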
3.3.2 Error/warning 2: the 163 mail service judges the message to contain illegitimate content and blocks it, so sending fails.
SMTPDataError: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
07:01:11.026Z ERROR elastalert-server:
ProcessController: ERROR:root:Uncaught exception running rule fct-Example-rule-name: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
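Here, 554 DT:SPM means that the message content contains disallowed information or was identified by the system as spam. Check whether any user is sending viruses or spam.
In this setup, the alerting program uses a NetEase 163 mailbox to send alerts to a recipient group made up of the two mailboxes [email protected] and [email protected].
Solution:
1. In the 163 webmail home page, go to "Settings" -> "General settings" -> "Anti-spam / Blacklist and whitelist". On the right-hand page, open "Whitelist" (the tab for adding whitelist entries) and add the address "[email protected]" to the whitelist.
Note: so far this only walks through the complete ELK alerting flow end to end. The various ElastAlert rule types have not been explored in depth, in particular a catalogue of alerting scenarios; that is the next step.
Appendix:
The ElastAlert filter rules are as follows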