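ELK Ecosystem: Installing and Configuring the Logstash Data Import Tool on Linux

Introduction

The previous post covered installing and configuring Kibana, the management tool for the Elasticsearch search engine. This post walks through installing and configuring Logstash, the data import tool.

Logstash is a powerful data processing tool. It can transfer data from a wide range of sources, either in full or incrementally, normalize the data, and emit it in a chosen format; it is most often used for log processing. Its workflow has three stages: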

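input: the data input stage, which can read from Oracle, MySQL, PostgreSQL, files, and many other sources;
filter: the normalization and filtering stage, which can filter and reformat data, for example timestamps and strings;
output: the data output stage, which can write to Elasticsearch, MongoDB, Kafka, and other receivers.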
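
As a rough illustration of the three stages, the sketch below writes a minimal pipeline file with one plugin per stage. The file name minimal.conf and the CONF variable are assumptions made for this example; stdin, mutate, and stdout are stock Logstash plugins.

```shell
#!/bin/sh
# Hypothetical file name for this illustration; override with CONF=/path/to/file.
CONF="${CONF:-./minimal.conf}"

# One plugin per stage: stdin (input), mutate (filter), stdout (output).
cat > "$CONF" <<'EOF'
input {
    stdin { }
}
filter {
    mutate { add_field => { "stage" => "filtered" } }
}
output {
    stdout { codec => rubydebug }
}
EOF

# To try it, run from the Logstash bin directory and type a line of text:
#   ./logstash -f /path/to/minimal.conf
```

Each line typed on standard input becomes one event, passes through the filter, and is printed by the output stage.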

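Practice

Download the logstash-5.6.1 package (download link: logstash-5.6.1), then extract it into the same parent directory as ES for easier management;
Edit logstash.yml in the config directory as follows (add other settings as needed):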

# Settings file in YAML
#
# Settings can be specified either in hierarchical form, e.g.:
#
#   pipeline:
#     batch:
#       size: 125
#       delay: 5
#
# Or as flat keys:
#
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
#
# ------------  Node identity ------------
#
# Use a descriptive name for the node:
# Set the node name
# node.name: test
#
# If omitted the node name will default to the machine's host name
#
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
# Directory for persistent data (the instance UUID file is stored here)
path.data: /data/es/logstash-5.6.1
#
# ------------ Pipeline Settings --------------
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
# Number of pipeline worker threads; matching the number of CPU cores is recommended
pipeline.workers: 10
#
# How many workers should be used per output plugin instance
# Number of threads used for the actual output; matching the number of CPU cores is recommended
pipeline.output.workers: 10
#
# How many events to retrieve from inputs before sending to filters+workers
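# Number of events sent per batch; changed from the default to avoid overloading the ES cluster's network I/O
# Default is 125. Larger values process data more efficiently but use more memory; tune as needed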
pipeline.batch.size: 3000
#
# How long to wait before dispatching an undersized batch to filters+workers
# Value is in milliseconds.
# Dispatch delay, i.e. the pause between transfers, in milliseconds; default is 5
pipeline.batch.delay: 100
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# Path to the pipeline configuration file
path.config: /data/es/logstash-5.6.1/config/logstash.conf
#
# Pipeline configuration string for the main pipeline
#
# config.string:
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
#
# config.reload.automatic: false
#
# How often to check if the pipeline configuration has changed (in seconds)
#
# config.reload.interval: 3
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false
#
# ------------ Module Settings ---------------
# Define modules here.  Modules definitions must be defined as an array.
# The simple way to see this is to prepend each `name` with a `-`, and keep
# all associated variables under the `name` they are associated with, and 
# above the next, like this:
#
# modules:
#   - name: MODULE_NAME
#     var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
#     var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE
#
# Module variable names must be in the format of 
#
# var.PLUGIN_TYPE.PLUGIN_NAME.KEY
#
# modules:
#
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Defaults is memory
#
# queue.type: memory
#
# If using queue.type: persisted, the directory path where the data files will be stored.
# Default is path.data/queue
#
path.queue: /data/es/logstash-5.6.1/data/queue
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 250mb
#
# queue.page_capacity: 250mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
#
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
#
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
# Default is 1000, 0 for no periodic checkpoint.
#
# queue.checkpoint.interval: 1000
#
# ------------ Dead-Letter Queue Settings --------------
# Flag to turn on dead-letter queue.
#
# dead_letter_queue.enable: false

# If using dead_letter_queue.enable: true, the maximum size of each dead letter queue. Entries
# will be dropped if they would increase the size of the dead letter queue beyond this setting.
# Default is 1024mb
# dead_letter_queue.max_bytes: 1024mb

# If using dead_letter_queue.enable: true, the directory path where the data files will be stored.
# Default is path.data/dead_letter_queue
#
path.dead_letter_queue: /data/es/logstash-5.6.1/data/dead_letter_queue
#
# ------------ Metrics Settings --------------
#
# Bind address for the metrics REST endpoint
#
# http.host: "127.0.0.1"
#
# Bind port for the metrics REST endpoint, this option also accept a range
# (9600-9700) and logstash will pick up the first available ports.
#
# http.port: 9600-9700
#
# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
#
# log.level: info
# path.logs expects a directory for log files, not the log4j2.properties file
path.logs: /data/es/logstash-5.6.1/logs
#
# ------------ Other Settings --------------
#
# Where to find custom plugins
# path.plugins: []

Configure logstash.conf (the main file that drives data extraction):

(1) Logstash connection settings for Greenplum (PostgreSQL):

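# 1. Connect to the database: data input stage
input {
    # Remove the stdin {} block when running in the background
    stdin {
    }
    jdbc {
        # JDBC driver jar; the path is up to you, and the jar must be downloaded separately
        jdbc_driver_library => "../lib/greenplum-1.0.jar"
        # JDBC driver class
        jdbc_driver_class => "com.pivotal.jdbc.GreenplumDriver"
        # JDBC connection URL
        jdbc_connection_string => "jdbc:pivotal:greenplum://127.0.0.1:2345;DatabaseName=testIndex"
        # JDBC user name
        jdbc_user => "root"
        # JDBC password
        jdbc_password => "1234"
        # Whether to import page by page; set to false for a full import
        jdbc_paging_enabled => "false"
        # Page size, i.e. the number of rows fetched per page
        jdbc_page_size => "1000"
        # Whether to convert all column names to lowercase
        lowercase_column_names => "false"
        # Path to the SQL file that selects the data to import; the path is up to you
        statement_filepath => "/data/es/logstash-5.6.1/config/jdbc.sql"
        # Cron-style schedule; "* * * * *" runs once a minute
        # Fields: minute, hour, day of month, month, day of week
        schedule => "* * * * *"
        # Whether to discard the record stored in last_run_metadata_path; if true, every run imports the data again from scratch
        clean_run => false
        # Whether to record the result of the last run; if true, the last tracking_column value is written to the file
        # named by last_run_metadata_path. Incremental imports need this. The path is up to you
        record_last_run => true
        last_run_metadata_path => "/data/es/logstash-5.6.1/data/jdbc.lastrun"
        # Whether to track a column value. If true, the tracking_column value of the last imported record is used as
        # the incremental marker; if false, Logstash records the time of the last run instead
        use_column_value => true
        # Column that drives the incremental import; rows are selected relative to the last recorded value
        tracking_column => "inputtime"
    }
}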

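# 2. Filter and format stage
filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
    # code is an option of the ruby filter, so it must sit inside a ruby block;
    # this adds a timestamp field set 8 hours ahead of @timestamp
    ruby {
        code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
    }
}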

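# 3. Output stage: write the data to ES
output {
    elasticsearch {
        # ES address
        hosts => "127.0.0.1:9200"
        # Index name (ES index names must be lowercase)
        index => "testindex"
        # Document type name
        document_type => "test"
        # Document ID (the id field corresponds to the database primary key; change as needed). Generated automatically by default
        #document_id => "%{id}"
        # Flush once the number of buffered events reaches flush_size
        flush_size => 1000
        # Also flush once idle_flush_time seconds have passed since the last flush
        idle_flush_time => 15
        # Whether to overwrite the existing template; if you do not need a template, delete or comment out the next two settings
        template_overwrite => true
        # Template path; generic template download: https://download.csdn.net/download/alan_liuyue/11241484
        template => "/data/es/logstash-5.6.1/template/logstash.json"
    }
    stdout {
        codec => json_lines
    }
}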

(2) Logstash connection settings for Oracle:
input {
    stdin {
    }
    jdbc {
        jdbc_driver_library => "../lib/ojdbc14-10.2.0.3.0.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:root/1234@//127.0.0.1:2345/orcl"
        jdbc_user => "root"
        jdbc_password => "1234"
        jdbc_paging_enabled => "false"
        jdbc_page_size => "1000"
        statement_filepath => "/data/es/logstash-5.6.1/config/sql/test.sql"
        schedule => "* * * * *"
        clean_run => false
        record_last_run => true
        last_run_metadata_path => "/data/es/logstash-5.6.1/data/test.lastrun"
        use_column_value => true
        tracking_column => "inputtime"
        type => "test"
    }
}

(3) Logstash connection settings for SQL Server:
input {
    stdin {
    }
    jdbc {
        jdbc_driver_library => "../lib/sqljdbc4.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://127.0.0.1:2345;databaseName=TESTDB"
        jdbc_user => "root"
        jdbc_password => "1234"
        jdbc_paging_enabled => "false"
        jdbc_page_size => "1000"
        statement_filepath => "/data/es/logstash-5.6.1/config/sql/test.sql"
        schedule => "* * * * *"
        clean_run => false
        record_last_run => true
        last_run_metadata_path => "/data/es/logstash-5.6.1/data/test.lastrun"
        use_column_value => true
        tracking_column => "inputtime"
        type => "test"
    }
}

(4) Logstash connection settings for MySQL:
input {
    stdin {
    }
    jdbc {
        jdbc_driver_library => "../lib/mysql-connector-java-6.0.5.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://127.0.0.1:2345/TESTDB"
        jdbc_user => "root"
        jdbc_password => "1234"
        jdbc_paging_enabled => "false"
        jdbc_page_size => "1000"
        statement_filepath => "/data/es/logstash-5.6.1/config/sql/test.sql"
        schedule => "* * * * *"
        clean_run => false
        record_last_run => true
        last_run_metadata_path => "/data/es/logstash-5.6.1/data/test.lastrun"
        use_column_value => true
        tracking_column => "inputtime"
        type => "test"
    }
}
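
Before starting Logstash with any of the configurations above, it can help to check that the file parses. The sketch below wraps the --config.test_and_exit flag, the command-line form of the config.test_and_exit setting that appears in logstash.yml. The function name validate_pipeline and the LOGSTASH_BIN variable are assumptions for this example.

```shell
#!/bin/sh
# LOGSTASH_BIN is a hypothetical override; by default the function expects
# to be called from the Logstash bin directory.
validate_pipeline() {
    # -f selects the pipeline file; --config.test_and_exit parses it and
    # exits without starting the pipeline.
    "${LOGSTASH_BIN:-./logstash}" -f "$1" --config.test_and_exit
}

# Usage:
#   validate_pipeline /data/es/logstash-5.6.1/config/logstash.conf
```

A non-zero exit status indicates the configuration was rejected, which is cheaper to find out here than after a background start.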

Start Logstash:

Change into the bin directory;
Run nohup ./logstash & to start it in the background;
To stop it, run ps aux|grep logstash to find the process, then kill it.
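
The start and stop steps can be sketched as two small shell functions that remember the process ID instead of searching for it with ps. The names start_logstash, stop_logstash, LOGSTASH_BIN, PID_FILE, and OUT_FILE are assumptions for this example and are not part of Logstash.

```shell
#!/bin/sh
# Hypothetical defaults; run from the Logstash bin directory or override them.
LOGSTASH_BIN="${LOGSTASH_BIN:-./logstash}"
PID_FILE="${PID_FILE:-./logstash.pid}"
OUT_FILE="${OUT_FILE:-./logstash.out}"

start_logstash() {
    # nohup keeps the process alive after the terminal closes
    nohup "$LOGSTASH_BIN" "$@" > "$OUT_FILE" 2>&1 &
    # $! is the PID of the background job just started
    echo $! > "$PID_FILE"
}

stop_logstash() {
    [ -f "$PID_FILE" ] || return 1
    # SIGTERM requests a normal shutdown rather than killing the process outright
    kill -TERM "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
}

# Usage:
#   start_logstash
#   stop_logstash
```

Storing the PID avoids killing an unrelated process whose command line happens to contain the word logstash.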

Notes

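1. JDK version incompatibility: Logstash needs JDK 1.8 or later. Workaround:

To leave the current JDK environment variables untouched, add the following near the top of the logstash script in the bin directory:
    export JAVA_HOME=/usr/local/jdk1.8.0_121
    export PATH=$JAVA_HOME/bin:$PATH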
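
Before editing the script, it can be worth confirming which JDK is actually on the PATH. The sketch below is one way to do that. current_java_version and java_major are hypothetical helper names, and the parsing assumes the usual java -version output with a quoted version string such as "1.8.0_121".

```shell
#!/bin/sh
# Print the quoted version string from `java -version` (which writes to stderr).
current_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Reduce a version string to its major number: "1.8.0_121" -> 8, "11.0.2" -> 11.
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}"; echo "${v%%.*}" ;;
        *)   echo "${v%%[.-]*}" ;;
    esac
}

# Usage:
#   [ "$(java_major "$(current_java_version)")" -ge 8 ] || echo "JDK too old for Logstash" >&2
```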

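2. For incremental imports, the last-updated value sql_last_value defaults to a timestamp. To track a custom field instead, set the stored sql_last_value yourself (this only needs to be done once) and name the field to track (use_column_value => true; record_last_run => true; tracking_column => stringField). Logstash then rewrites the stored value with that field's value from the last imported record and uses it for the next incremental run;

3. The template used in the output stage is there mainly to set the field analyzer to the ik analyzer; if you do not need it, simply remove the template settings;

4. PostgreSQL, Oracle, SQL Server, MySQL, and other databases each connect differently, so switch to the connection settings that match your data source. The driver jars used above, which you can download if needed, are: greenplum-1.0.jar; mysql-connector-java-6.0.5.jar; ojdbc14-10.2.0.3.0.jar; sqljdbc4.jar;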

Additional Notes

1. How to write test.sql:

Select * from tableName where inputtime>:sql_last_value
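Note: the SQL syntax is the same as ordinary SQL. For an incremental import, use the incremental field in the condition;
:sql_last_value is the standard placeholder, and Logstash fills it in automatically with the time read from test.lastrun;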

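2. How to write test.lastrun:

--- 2017-12-26 00:00:00
Note: if the incremental field is a time type, write the time of the first import in the format above. If the incremental field is a string type,
for example "20171226000000", the line must instead be written as: --- '20171226000000'; otherwise the incremental import will not work.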
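
A small helper can write the initial test.lastrun file. This is only a sketch: write_lastrun is a hypothetical name, and it assumes the file consists of a single line made of the marker --- followed by the starting value, with single quotes around the value when the incremental field is a string.

```shell
#!/bin/sh
# write_lastrun FILE VALUE [string]
# Writes "--- VALUE"; pass "string" as the third argument to wrap the value
# in single quotes for a string-typed incremental field.
write_lastrun() {
    if [ "${3:-}" = "string" ]; then
        printf '%s %s\n' '---' "'$2'" > "$1"
    else
        printf '%s %s\n' '---' "$2" > "$1"
    fi
}

# Usage:
#   write_lastrun /data/es/logstash-5.6.1/data/test.lastrun "2017-12-26 00:00:00"
#   write_lastrun /data/es/logstash-5.6.1/data/test.lastrun 20171226000000 string
```

The file only needs to be seeded once; with record_last_run => true, Logstash rewrites it after each run.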

Summary

Practice is the sole criterion for testing truth, and doing the work yourself is how you get fed~
