How to specify the Kafka partition in Filebeat


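What is Filebeat

Filebeat is a log shipper for local files, written in Go. It watches log directories or specific log files (tail file) and forwards the data to Elasticsearch or Logstash for indexing, or to other outputs such as Kafka. It ships with built-in modules (auditd, Apache, Nginx, System, and MySQL) that simplify collecting, parsing, and visualizing common log formats with a single command.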

Installing Filebeat

Here are two ways to install Filebeat 7.1.1.
1. Install with the rpm package manager

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.1.1-x86_64.rpm 
sudo rpm -vi filebeat-7.1.1-x86_64.rpm

2. Install from the tarball (the archive below is the macOS build)

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.1.1-darwin-x86_64.tar.gz 
tar xzvf filebeat-7.1.1-darwin-x86_64.tar.gz 
cd filebeat-7.1.1-darwin-x86_64/

After an rpm install, the configuration file filebeat.yml is in /etc/filebeat/. With the tarball, it is in the extracted directory.

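Configuration file walkthrough

The Filebeat configuration file is divided into the following sections.

inputs
The input section. This is where the log files to read are configured, along with any special handling.
For example, in the configuration below, each - type: log entry starts a log input. The fields option attaches extra fields that we can use later; see the comments in the configuration for the other options.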

filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
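- type: log # fixed value for log file inputs
  enabled: true # whether this input is enabled
  paths: # log file paths
    - /data/test.log
  tags: ["test","mylog",'1'] # tags, optional
  tail_files: true # start reading from the end of the file
  encoding: UTF-8 # file encoding
  fields:  # extra fields
    partition: "1"
    log_topic: "testLogPlatform"
  fields_under_root: true  # put the extra fields at the top level of the event
  # Multiline settings, e.g. to merge a stack trace into a single event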
  multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'
  multiline.negate: false
  multiline.match: after

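modules
Module configuration, not covered here.
Elasticsearch template setting
Template settings for the Elasticsearch output.
General
General settings.
Dashboards
Dashboard settings.
Kibana
Kibana-related settings.
Elastic Cloud
Settings for Elastic Cloud, the hosted Elasticsearch service.
Outputs
Output settings.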

Sending logs to Kafka

To send data to Kafka, we only need to add a Kafka section to the output configuration in Filebeat.

#----------------------------- Kafka output --------------------------------
output.kafka:
  hosts: ["172.16.161.51:9002","172.16.161.51:9003","172.16.161.51:9004"]
  topic: 'testTopic'
  partition.hash:
    reachable_only: false

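Sending to different topics

The configuration above fixes the topic to testTopic, so nothing can be sent to any other topic. We want different kinds of data to go to different topics, and this is where the extra fields defined earlier become useful. The input configuration defines the extra field log_topic: "testLogPlatform", so here we can use the expression '%{[log_topic]}' to choose the topic dynamically for each event.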

#----------------------------- Kafka output --------------------------------
output.kafka:
  hosts: ["172.16.161.51:9002","172.16.161.51:9003","172.16.161.51:9004"]
  topic: '%{[log_topic]}'
  partition.hash:
    reachable_only: false

Sending to a specific partition

Filebeat has three partitioning strategies:
1. random
2. round robin
3. hash
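We have seen how to send events to a chosen topic, but sometimes we also need to send them to a chosen partition. Filebeat has no setting for writing to an explicit partition number. It does, however, offer hash-based partitioning, and it lets us choose which fields are hashed.
So the extra fields come in handy again. The configuration below hashes the extra field partition and distributes events by that hash: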

#----------------------------- Kafka output --------------------------------
output.kafka:
  hosts: ["172.16.161.51:9002","172.16.161.51:9003","172.16.161.51:9004"]
  topic: '%{[log_topic]}'
  partition.hash:
    reachable_only: false
    hash: ['partition']

How the values of the extra fields log_topic and partition are chosen will be explained in the write-up on using the log platform backend.

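Finding the actual partition

With hash partitioning we still do not know which partition a given field value ends up in after hashing, so we had to open up the Filebeat source code.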


func cfgHashPartitioner(log *logp.Logger, config *common.Config) (func() partitioner, error) {
	cfg := struct {
		Hash   []string `config:"hash"`
		Random bool     `config:"random"`
	}{
		Random: true,
	}
	if err := config.Unpack(&cfg); err != nil {
		return nil, err
	}

	if len(cfg.Hash) == 0 {
		return makeHashPartitioner, nil
	}

	return func() partitioner {
		// 1. Partition by hashing the configured fields
		return makeFieldsHashPartitioner(log, cfg.Hash, !cfg.Random)
	}, nil
}

func makeHashPartitioner() partitioner {
	generator := rand.New(rand.NewSource(rand.Int63()))
	hasher := fnv.New32a()

	return func(msg *message, numPartitions int32) (int32, error) {
		if msg.key == nil {
			return int32(generator.Intn(int(numPartitions))), nil
		}

		hash := msg.hash
		if hash == 0 {
			hasher.Reset()
			if _, err := hasher.Write(msg.key); err != nil {
				return -1, err
			}
			msg.hash = hasher.Sum32()
			hash = msg.hash
		}

		// create positive hash value
		return hash2Partition(hash, numPartitions)
	}
}

func makeFieldsHashPartitioner(log *logp.Logger, fields []string, dropFail bool) partitioner {
	generator := rand.New(rand.NewSource(rand.Int63()))
	hasher := fnv.New32a()

	return func(msg *message, numPartitions int32) (int32, error) {
		hash := msg.hash
		if hash == 0 {
			hasher.Reset()

			var err error
			for _, field := range fields {
				// 2. Feed this field's value into the hasher; an error (for example a missing field) aborts the loop
				err = hashFieldValue(hasher, msg.data.Content.Fields, field)
				if err != nil {
					break
				}
			}

			if err != nil {
				if dropFail {
					log.Errorf("Hashing partition key failed: %+v", err)
					return -1, err
				}

				msg.hash = generator.Uint32()
			} else {
				// 3. All fields were written successfully, so take the 32-bit hash sum
				msg.hash = hasher.Sum32()
			}
			hash = msg.hash
		}
		// 4. Take the hash modulo the number of partitions to get the final partition number
		return hash2Partition(hash, numPartitions)
	}
}

func hash2Partition(hash uint32, numPartitions int32) (int32, error) {
	p := int32(hash)
	if p < 0 {
		p = -p
	}
	return p % numPartitions, nil
}

func hashFieldValue(h hash.Hash32, event common.MapStr, field string) error {
	type stringer interface {
		String() string
	}

	type hashable interface {
		Hash32(h hash.Hash32) error
	}

	v, err := event.GetValue(field)
	if err != nil {
		return err
	}

	switch s := v.(type) {
	case hashable:
		err = s.Hash32(h)
	case string:
		_, err = h.Write([]byte(s))
	case []byte:
		_, err = h.Write(s)
	case stringer:
		_, err = h.Write([]byte(s.String()))
	case int8, int16, int32, int64, int,
		uint8, uint16, uint32, uint64, uint:
		err = binary.Write(h, binary.LittleEndian, v)
	case float32:
		tmp := strconv.FormatFloat(float64(s), 'g', -1, 32)
		_, err = h.Write([]byte(tmp))
	case float64:
		tmp := strconv.FormatFloat(s, 'g', -1, 32)
		_, err = h.Write([]byte(tmp))
	default:
		// try to hash using reflection:
		err = binary.Write(h, binary.LittleEndian, v)
		if err != nil {
			err = fmt.Errorf("can not hash key '%v' of unknown type", field)
		}
	}
	return err
}

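The key function above is func makeFieldsHashPartitioner(log *logp.Logger, fields []string, dropFail bool) partitioner. Reading it, the whole hashing process for a single string field can be reduced to the function below, which gives us the partition that corresponds to the value of the hash field configured in Filebeat.

import (
	"fmt"
	"hash/fnv"
)

func testHash(val string, partitionNum int32) int32 {
	// Same FNV-1a 32-bit hash that Filebeat uses for the hash fields.
	hasher := fnv.New32a()
	hasher.Write([]byte(val))
	hash := hasher.Sum32()
	// Same sign handling and modulo as hash2Partition.
	p := int32(hash)
	if p < 0 {
		p = -p
	}
	i := p % partitionNum
	fmt.Println(i)
	return i
}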

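Handling hash collisions

Hash collisions can occur, meaning that different values produce the same hash and therefore the same partition. To work around this, we can use numbers as hash keys: compute the hash for a key, check whether the result is already in use, and if it is, move on to the next key, until one turns up that is still unused.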
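The search described above might look like the following Go sketch. It tries the keys "0", "1", "2", ... in order and keeps the first key that lands on each partition. The function names and the maxTries bound are assumptions made for this example, and it relies on the keys being hashed as strings, as in the simplified hash function from the previous section.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// partitionOf reproduces the simplified hash-to-partition mapping.
func partitionOf(key string, numPartitions int32) int32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	p := int32(h.Sum32())
	if p < 0 {
		p = -p
	}
	return p % numPartitions
}

// keysForPartitions returns one key per partition, skipping keys whose
// partition is already taken. maxTries bounds the search so it terminates.
func keysForPartitions(numPartitions int32, maxTries int) map[int32]string {
	keys := make(map[int32]string, numPartitions)
	for i := 0; i < maxTries && int32(len(keys)) < numPartitions; i++ {
		key := strconv.Itoa(i)
		p := partitionOf(key, numPartitions)
		if p < 0 {
			// int32 overflow corner case (hash == 1<<31); skip this key.
			continue
		}
		if _, used := keys[p]; !used {
			keys[p] = key
		}
	}
	return keys
}

func main() {
	const n = 6
	keys := keysForPartitions(n, 1000)
	for p := int32(0); p < n; p++ {
		fmt.Printf("partition %d <- partition: %q\n", p, keys[p])
	}
}
```

Each key found this way can then be used as the value of the extra field partition for the input that should write to that partition. The mapping depends on the number of partitions, so it has to be recomputed if the topic's partition count changes.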
