Fluentd Study Notes: High Availability (Multi-tier Fluentd Configuration)

High Availability (Multi-tier Fluentd Configuration)

http://docs.fluentd.org/articles/high-availability

Fluentd High Availability Configuration

For high-traffic websites, we recommend using a high availability configuration of Fluentd.

Message Delivery Semantics

Fluentd is designed primarily for event-log delivery systems.

In such systems, several delivery guarantees are possible:

  • At most once: Messages are immediately transferred. If the transfer succeeds, the message is never sent out again. However, many failure scenarios can cause lost messages (ex: no more write capacity)
  • At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.
  • Exactly once: Each message is delivered once and only once. This is what people want.

If the system “can’t lose a single event”, and must also transfer “exactly once”, then the system must stop ingesting events when it runs out of write capacity. The proper approach would be to use synchronous logging and return errors when the event cannot be accepted.

(Note: this is why there is an error.log file.)

That’s why Fluentd guarantees ‘At most once’ transfers. In order to collect massive amounts of data without impacting application performance, a data logger must transfer data asynchronously. This improves performance at the cost of potential delivery failures.

(Note: this explains why multiple IPs can appear in <server></server> blocks; when one fails, events can be forwarded to another IP. That is the failover capability.)

However, most failure scenarios are preventable. The following sections describe how to set up Fluentd’s topology for high availability.

Network Topology

To configure Fluentd for high availability, we assume that your network consists of ‘log forwarders’ and ‘log aggregators’.

Log forwarders are typically installed on every node to receive local events. Once an event is received, they forward it to the 'log aggregators' through the network.

'Log aggregators' are daemons that continuously receive events from the log forwarders. They buffer the events and periodically upload the data into the cloud.

Fluentd can act as either a log forwarder or a log aggregator, depending on its configuration. The next sections describe the respective setups. We assume that the active log aggregator has IP '192.168.0.1' and that the backup has IP '192.168.0.2'.

(In other words, the sending side can use Fluentd to collect logs, and the receiving side can also use Fluentd to collect them.)

Log Forwarder Configuration (Note: the forwarder is only responsible for uploading logs; the local log files themselves are collected by an input plugin, namely the tail input plugin; see the sketch after the configuration below.)

Please add the following lines to your config file for log forwarders. This will configure your log forwarders to transfer logs to log aggregators.

Forwarder-side (collection side) configuration file:
# TCP input
<source>
  type forward
  port 24224
</source>

# HTTP input
<source>
  type http
  port 8888
</source>

# Log Forwarding
<match mytag.**>
  type forward

  # primary host
  <server>
    host 192.168.0.1
    port 24224
  </server>
  # use secondary host (multiple servers/IPs can be listed)
  <server>
    host 192.168.0.2
    port 24224
    standby
  </server>

  # use longer flush_interval to reduce CPU usage.
  # note that this is a trade-off against latency.
  flush_interval 60s
</match>
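As mentioned in the note above, the forwarder itself only forwards events; collecting local log files is usually done with an input plugin such as in_tail. A minimal sketch of such a source (the path, pos_file, format, and tag below are placeholders I made up, not values from the original article):

# Collect a local log file with the tail input plugin (illustrative only)
<source>
  type tail
  path /var/log/myapp/access.log          # hypothetical application log
  pos_file /var/log/td-agent/access.pos   # remembers how far the file has been read
  format none                             # or a real parser such as apache2 / json
  tag mytag.access                        # must fall under <match mytag.**> above
</source>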

When the active aggregator (192.168.0.1) dies, the logs will instead be sent to the backup aggregator (192.168.0.2). If both servers die, the logs are buffered on-disk at the corresponding forwarder nodes.

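If you also want the buffered data to survive a forwarder process restart, the forward output can be given a file buffer (the configuration shown earlier uses the default memory buffer). A sketch, assuming the classic v0.12-style buffer options; the buffer_path is a placeholder:

# Forwarder match with a persistent file buffer (illustrative only)
<match mytag.**>
  type forward
  buffer_type file                                  # keep the buffer on disk instead of in memory
  buffer_path /var/log/td-agent/buffer/forward.*    # hypothetical buffer location
  flush_interval 60s
  <server>
    host 192.168.0.1
    port 24224
  </server>
  <server>
    host 192.168.0.2
    port 24224
    standby
  </server>
</match>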

Log Aggregator Configuration

Aggregator-side configuration (note: the server-side configuration is just these few lines; read them carefully)

Please add the following lines to the config file for log aggregators. The input source for the log transfer is TCP.

# Input
<source>
  type forward
  port 24224
</source>

# Output. This match is important: it must correspond to the <match mytag.**> on the forwarder side above. (This confused me for a long time.)
<match mytag.**>  
  ...
</match>
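The '...' is left open above; what goes there depends on where the aggregator ships the data. As one hedged illustration (not necessarily the destination the article has in mind), the built-in file output could be used:

# Illustrative output only: write aggregated events to local, time-sliced files
<match mytag.**>
  type file
  path /var/log/fluent/aggregated     # hypothetical output path
  time_slice_format %Y%m%d            # one file per day
  time_slice_wait 10m                 # wait before flushing a finished time slice
</match>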

The incoming logs are buffered, then periodically uploaded into the cloud. If upload fails, the logs are stored on the local disk until the retransmission succeeds.

Failure Case Scenarios

Forwarder Failure

When a log forwarder receives events from applications, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is forwarded to aggregators.

This process is inherently robust against data loss. If a log forwarder’s fluentd process dies, the buffered data is properly transferred to its aggregator after it restarts. If the network between forwarders and aggregators breaks, the data transfer is automatically retried.

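The automatic retry mentioned here can be tuned on the buffered forward output. A sketch assuming the classic v0.12 buffered-output parameters (the values are just examples, not recommendations):

# Retry tuning on the forwarder (illustrative values)
<match mytag.**>
  type forward
  retry_wait 1s          # wait before the first retry, then back off exponentially
  max_retry_wait 60s     # cap on the backoff interval
  retry_limit 17         # give up on a chunk after this many attempts
  <server>
    host 192.168.0.1
    port 24224
  </server>
</match>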

However, possible message loss scenarios do exist:

  • The process dies immediately after receiving the events, but before writing them into the buffer.
  • The forwarder’s disk is broken, and the file buffer is lost.

Aggregator Failure

When log aggregators receive events from log forwarders, the events are first written into a disk buffer (specified by buffer_path). After every flush_interval, the buffered data is uploaded into the cloud.

This process is inherently robust against data loss. If a log aggregator's fluentd process dies, the data from the log forwarder is properly retransferred after it restarts. If the network between aggregators and the cloud breaks, the data transfer is automatically retried.

However, possible message loss scenarios do exist:

  • The process dies immediately after receiving the events, but before writing them into the buffer.
  • The aggregator’s disk is broken, and the file buffer is lost.

Troubleshooting

“no nodes are available”

Please make sure that you can communicate with port 24224 using not only TCP, but also UDP. These commands will be useful for checking the network configuration.

$ telnet host 24224
$ nmap -p 24224 -sU host
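The reason UDP matters here is that the forward output's node health check uses UDP heartbeats by default. If UDP happens to be blocked between forwarders and aggregators, the forward output can (as far as I know; verify against your Fluentd version) be told to heartbeat over TCP instead:

# Switch heartbeats to TCP when UDP is filtered (illustrative)
<match mytag.**>
  type forward
  heartbeat_type tcp
  <server>
    host 192.168.0.1
    port 24224
  </server>
</match>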
