Managing a cluster and exchanging cluster data over gossip with the memberlist library

Overview

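The basic usage of the memberlist library is shown below. Note that list.Join is called inside a for loop: when the cluster first starts, none of the nodes is running yet, so a single direct call to Join would fail with a connection-refused error.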

package main

import (
	"fmt"
	"github.com/hashicorp/memberlist"
	"time"
)

func main() {
	/* Create the initial memberlist from a safe configuration.
	   Please reference the godoc for other default config types.
	   http://godoc.org/github.com/hashicorp/memberlist#Config
	*/
	list, err := memberlist.Create(memberlist.DefaultLocalConfig())
	if err != nil {
		panic("Failed to create memberlist: " + err.Error())
	}

	t := time.NewTicker(time.Second * 5)
	for {
		select {
		case <-t.C:
			// Join an existing cluster by specifying at least one known member.
			n, err := list.Join([]string{"192.168.80.129"})
			if err != nil {
				fmt.Println("Failed to join cluster: " + err.Error())
				continue
			}
			fmt.Println("member number is:", n)
			goto END
		}
	}
END:
	for {
		select {
		case <-t.C:
			// Ask for members of the cluster
			for _, member := range list.Members() {
				fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
			}
		}
	}

	// Continue doing whatever you need, memberlist will maintain membership
	// information in the background. Delegates can be used for receiving
	// events when members join or leave.
}

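The two main entry points of memberlist are:

  1. Create: creates a Memberlist from the given configuration. During initialization the Memberlist contains only the state of the local node. No connection to other nodes is made at this point; once Create succeeds, other nodes can join this memberlist.

  2. Join: uses an existing Memberlist to try to contact the given hosts and synchronize state with them, thereby joining a cluster. This is how other nodes learn that the local node exists. It returns the number of nodes successfully contacted and an error; the error is non-nil if no node could be contacted.

    Note that when joining a cluster you must specify at least one known member of the cluster; the membership of the whole cluster is then synchronized through gossip.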

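memberlist provides two main groups of features: maintaining member state (gossip) and synchronizing data (broadcast, SendReliable). The relevant interfaces are described below.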

Interfaces

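memberlist.Create takes a configuration as its argument. DefaultLocalConfig() returns a ready-to-use default configuration, but you still need to implement the relevant interfaces in order to track member state and to send and receive user data. Some of the interfaces below are required and others are optional: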

type Config struct {
	// ...
	// Delegate and Events are delegates for receiving and providing
	// data to memberlist via callback mechanisms. For Delegate, see
	// the Delegate interface. For Events, see the EventDelegate interface.
	//
	// The DelegateProtocolMin/Max are used to guarantee protocol-compatibility
	// for any custom messages that the delegate might do (broadcasts,
	// local/remote state, etc.). If you don't set these, then the protocol
	// versions will just be zero, and version compliance won't be done.
	Delegate                Delegate
	Events                  EventDelegate
	Conflict                ConflictDelegate
	Merge                   MergeDelegate
	Ping                    PingDelegate
	Alive                   AliveDelegate
	//...
}

memberlist uses the following message types to synchronize cluster state and to handle user messages:

const (
	pingMsg messageType = iota
	indirectPingMsg
	ackRespMsg
	suspectMsg
	aliveMsg
	deadMsg
	pushPullMsg
	compoundMsg
	userMsg // User mesg, not handled by us
	compressMsg
	encryptMsg
	nackRespMsg
	hasCrcMsg
	errMsg
)

Delegate

To exchange user data over memberlist's gossip protocol, you must implement this interface. All of its methods must be thread-safe.

type Delegate interface {
	// NodeMeta is used to retrieve meta-data about the current node
	// when broadcasting an alive message. It's length is limited to
	// the given byte size. This metadata is available in the Node structure.
	NodeMeta(limit int) []byte

	// NotifyMsg is called when a user-data message is received.
	// Care should be taken that this method does not block, since doing
	// so would block the entire UDP packet receive loop. Additionally, the byte
	// slice may be modified after the call returns, so it should be copied if needed
	NotifyMsg([]byte)

	// GetBroadcasts is called when user data messages can be broadcast.
	// It can return a list of buffers to send. Each buffer should assume an
	// overhead as provided with a limit on the total byte size allowed.
	// The total byte size of the resulting data to send must not exceed
	// the limit. Care should be taken that this method does not block,
	// since doing so would block the entire UDP packet receive loop.
	GetBroadcasts(overhead, limit int) [][]byte

	// LocalState is used for a TCP Push/Pull. This is sent to
	// the remote side in addition to the membership information. Any
	// data can be sent here. See MergeRemoteState as well. The `join`
	// boolean indicates this is for a join instead of a push/pull.
	LocalState(join bool) []byte

	// MergeRemoteState is invoked after a TCP Push/Pull. This is the
	// state received from the remote side and is the result of the
	// remote side's LocalState call. The 'join'
	// boolean indicates this is for a join instead of a push/pull.
	MergeRemoteState(buf []byte, join bool)
}

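The main methods are:

  • NotifyMsg: receives user messages (userMsg) sent over UDP or TCP. This method must not block, otherwise it blocks the whole UDP/TCP packet receive loop. In addition, the byte slice may be modified after the call returns, so copy the data if you need to keep it.

    Note that user messages returned by GetBroadcasts are not sent immediately over UDP: they are carried along with the periodic gossip, or sent while messages such as pingMsg are being handled.

    The following methods send a user message directly to a node: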

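    // Sends a user message to the given node over UDP. The message size is limited by memberlist's UDPBufferSize setting. Gossip is not used.
    func (m *Memberlist) SendBestEffort(to *Node, msg []byte) error
    // Same mechanism as SendBestEffort, except that it takes a node address instead of a Node.
    func (m *Memberlist) SendToAddress(a Address, msg []byte) error
    // Sends a user message to the given node over TCP. There is no message size limit. Gossip is not used.
    func (m *Memberlist) SendReliable(to *Node, msg []byte) error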
    
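  • GetBroadcasts: called to attach user messages when gossip is scheduled periodically or when messages such as pingMsg are handled, so delivery is not immediate. Typically, messages to send are stored with TransmitLimitedQueue.QueueBroadcast and then retrieved at send time with TransmitLimitedQueue.GetBroadcasts. See the description of TransmitLimitedQueue below.

  • LocalState: used for TCP push/pull. It sends information other than membership to the remote side (any data can be sent) and is used to synchronize state periodically. The join parameter indicates that the call is part of a join rather than a regular push/pull.

  • MergeRemoteState: called after a TCP push/pull with the state received from the remote side (that is, the result of the remote side's LocalState call). The join parameter indicates that the call is part of a join rather than a regular push/pull.

pushPull is invoked periodically (every PushPullInterval) to perform a complete state exchange with one randomly chosen node. Because pushPull synchronizes all of the local node's state with the other node, it is relatively expensive.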
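
To make the thread-safety and copy-before-keeping requirements concrete, here is a minimal sketch of a type with the same method set as Delegate. It uses only the standard library; stateDelegate and its fields are hypothetical, and LocalState and MergeRemoteState are left as no-ops.

```go
package main

import (
	"fmt"
	"sync"
)

// stateDelegate has the same method set as memberlist.Delegate.
// The type and its fields are hypothetical and exist only for illustration.
type stateDelegate struct {
	mu       sync.Mutex
	received [][]byte // copies of user messages received so far
	pending  [][]byte // user messages waiting to be gossiped
}

func (d *stateDelegate) NodeMeta(limit int) []byte { return nil }

// NotifyMsg copies the buffer (the caller may reuse it after the call
// returns) and does nothing slow, so the receive loop is not blocked.
func (d *stateDelegate) NotifyMsg(msg []byte) {
	cp := make([]byte, len(msg))
	copy(cp, msg)
	d.mu.Lock()
	d.received = append(d.received, cp)
	d.mu.Unlock()
}

// GetBroadcasts returns queued messages in order while the total size,
// counting overhead bytes per message, stays within limit. A message
// larger than limit would stay queued forever in this simplified version.
func (d *stateDelegate) GetBroadcasts(overhead, limit int) [][]byte {
	d.mu.Lock()
	defer d.mu.Unlock()
	var out [][]byte
	used := 0
	for len(d.pending) > 0 {
		next := d.pending[0]
		if used+overhead+len(next) > limit {
			break
		}
		used += overhead + len(next)
		out = append(out, next)
		d.pending = d.pending[1:]
	}
	return out
}

func (d *stateDelegate) LocalState(join bool) []byte            { return nil }
func (d *stateDelegate) MergeRemoteState(buf []byte, join bool) {}

func main() {
	d := &stateDelegate{}
	buf := []byte("hello")
	d.NotifyMsg(buf)
	buf[0] = 'X' // simulate the caller reusing its buffer
	fmt.Println(string(d.received[0]))
}
```

Because NotifyMsg stored a copy, the later write to buf does not affect the saved message.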

EventDelegate

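Used only to receive notifications about members joining and leaving (and being updated). It can be used to keep local member state up to date.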

type EventDelegate interface {
	// NotifyJoin is invoked when a node is detected to have joined.
	// The Node argument must not be modified.
	NotifyJoin(*Node)

	// NotifyLeave is invoked when a node is detected to have left.
	// The Node argument must not be modified.
	NotifyLeave(*Node)

	// NotifyUpdate is invoked when a node is detected to have
	// updated, usually involving the meta data. The Node argument
	// must not be modified.
	NotifyUpdate(*Node)
}

ChannelEventDelegate is a simple EventDelegate implementation that forwards events to a channel:

type ChannelEventDelegate struct {
  Ch chan<- NodeEvent
}
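
As an illustration of keeping local member state up to date from these notifications, the sketch below tracks member names in a map. It is standard-library only: Node here is a stand-in for memberlist.Node with just the Name field, and memberTracker is a hypothetical type with the same method set as EventDelegate.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Node is a stand-in for memberlist.Node with only the field this sketch needs.
type Node struct{ Name string }

// memberTracker (hypothetical) has the same method set as EventDelegate
// and maintains the set of currently known member names.
type memberTracker struct {
	mu      sync.RWMutex
	members map[string]struct{}
}

func newMemberTracker() *memberTracker {
	return &memberTracker{members: make(map[string]struct{})}
}

func (t *memberTracker) NotifyJoin(n *Node) {
	t.mu.Lock()
	t.members[n.Name] = struct{}{}
	t.mu.Unlock()
}

func (t *memberTracker) NotifyLeave(n *Node) {
	t.mu.Lock()
	delete(t.members, n.Name)
	t.mu.Unlock()
}

// NotifyUpdate does not change membership, so there is nothing to do here.
func (t *memberTracker) NotifyUpdate(n *Node) {}

// Names returns the known member names in sorted order.
func (t *memberTracker) Names() []string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	names := make([]string, 0, len(t.members))
	for name := range t.members {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	t := newMemberTracker()
	t.NotifyJoin(&Node{Name: "node-a"})
	t.NotifyJoin(&Node{Name: "node-b"})
	t.NotifyLeave(&Node{Name: "node-a"})
	fmt.Println(t.Names())
}
```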

ConflictDelegate

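Notifies that a name conflict occurred while a client was joining. This usually happens because two clients are configured with the same name but different addresses. It can be used to collect error statistics.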

type ConflictDelegate interface {
	// NotifyConflict is invoked when a name conflict is detected
	NotifyConflict(existing, other *Node)
}

MergeDelegate

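Called when the cluster performs a merge. The peers argument of NotifyMerge lists the members known to the remote side. Implementing this interface is optional.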

type MergeDelegate interface {
	// NotifyMerge is invoked when a merge could take place.
	// Provides a list of the nodes known by the peer. If
	// the return value is non-nil, the merge is canceled.
	NotifyMerge(peers []*Node) error
}

PingDelegate

Notifies an observer of how long a ping (pingMsg) took to complete. NotifyPingComplete is a natural place to record ping round-trip times, for example in a histogram.

type PingDelegate interface {
	// AckPayload is invoked when an ack is being sent; the returned bytes will be appended to the ack
	AckPayload() []byte
	// NotifyPing is invoked when an ack for a ping is received
	NotifyPingComplete(other *Node, rtt time.Duration, payload []byte)
}

AliveDelegate

Called when an aliveMsg is received. It can be used to add logging, metrics, and similar instrumentation.

type AliveDelegate interface {
	// NotifyAlive is invoked when a message about a live
	// node is received from the network.  Returning a non-nil
	// error prevents the node from being considered a peer.
	NotifyAlive(peer *Node) error
}

Broadcast

Data that can be broadcast to the memberlist cluster along with gossip.

// Broadcast is something that can be broadcasted via gossip to
// the memberlist cluster.
type Broadcast interface {
	// Invalidates checks if enqueuing the current broadcast
	// invalidates a previous broadcast
	Invalidates(b Broadcast) bool

	// Returns a byte form of the message
	Message() []byte

	// Finished is invoked when the message will no longer
	// be broadcast, either due to invalidation or to the
	// transmit limit being reached
	Finished()
}
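
The role of Invalidates is easiest to see with a broadcast that carries a key, where a newer value for the same key makes an older queued one obsolete. The sketch below is standard-library only: it repeats the Broadcast interface so that it compiles on its own, keyedBroadcast is hypothetical, and enqueue is a deliberately simplified stand-in for what a broadcast queue does, not the library's implementation.

```go
package main

import "fmt"

// Broadcast repeats the interface shown above so the sketch compiles alone.
type Broadcast interface {
	Invalidates(b Broadcast) bool
	Message() []byte
	Finished()
}

// keyedBroadcast (hypothetical) replaces older broadcasts with the same key.
type keyedBroadcast struct {
	key  string
	data []byte
	done func() // optional callback run from Finished
}

func (k *keyedBroadcast) Invalidates(other Broadcast) bool {
	o, ok := other.(*keyedBroadcast)
	return ok && o.key == k.key
}

func (k *keyedBroadcast) Message() []byte { return k.data }

func (k *keyedBroadcast) Finished() {
	if k.done != nil {
		k.done()
	}
}

// enqueue drops every queued broadcast that b invalidates, calling its
// Finished method, and then appends b. (Simplified stand-in.)
func enqueue(queue []Broadcast, b Broadcast) []Broadcast {
	kept := make([]Broadcast, 0, len(queue)+1)
	for _, old := range queue {
		if b.Invalidates(old) {
			old.Finished() // the old message will not be sent again
			continue
		}
		kept = append(kept, old)
	}
	return append(kept, b)
}

func main() {
	var queue []Broadcast
	queue = enqueue(queue, &keyedBroadcast{key: "a", data: []byte("a=1")})
	queue = enqueue(queue, &keyedBroadcast{key: "b", data: []byte("b=1")})
	queue = enqueue(queue, &keyedBroadcast{key: "a", data: []byte("a=2")})
	for _, b := range queue {
		fmt.Println(string(b.Message()))
	}
}
```

A broadcast whose Invalidates always returns false never replaces anything, so every queued message is eventually sent.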

A Broadcast is usually passed as the argument to TransmitLimitedQueue.QueueBroadcast:

func (q *TransmitLimitedQueue) QueueBroadcast(b Broadcast) {
	q.queueBroadcast(b, 0)
}

The implementation in alertmanager is:

type simpleBroadcast []byte

func (b simpleBroadcast) Message() []byte                       { return []byte(b) }
func (b simpleBroadcast) Invalidates(memberlist.Broadcast) bool { return false }
func (b simpleBroadcast) Finished()                             {}

TransmitLimitedQueue

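TransmitLimitedQueue is mainly used to handle broadcast messages. It has two main methods, QueueBroadcast and GetBroadcasts: the former stores a broadcast message, and the latter retrieves the messages to broadcast at send time. GetBroadcasts is called when gossip is scheduled periodically or when messages such as pingMsg are handled.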

// TransmitLimitedQueue is used to queue messages to broadcast to
// the cluster (via gossip) but limits the number of transmits per
// message. It also prioritizes messages with lower transmit counts
// (hence newer messages).
type TransmitLimitedQueue struct {
	// NumNodes returns the number of nodes in the cluster. This is
	// used to determine the retransmit count, which is calculated
	// based on the log of this.
	NumNodes func() int

	// RetransmitMult is the multiplier used to determine the maximum
	// number of retransmissions attempted.
	RetransmitMult int

	mu    sync.Mutex
	tq    *btree.BTree // stores *limitedBroadcast as btree.Item
	tm    map[string]*limitedBroadcast
	idGen int64
}
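
The struct comments say the retransmit count is derived from RetransmitMult and the log of the node count. As far as I recall, memberlist computes it as RetransmitMult * ceil(log10(n+1)); treat that exact formula as an assumption and check the library source before relying on it. A small sketch of the calculation:

```go
package main

import (
	"fmt"
	"math"
)

// retransmitLimit returns how many times a message is retransmitted for a
// cluster of n nodes. The formula mult * ceil(log10(n+1)) is an assumption
// about memberlist's behavior, reproduced here for illustration.
func retransmitLimit(retransmitMult, n int) int {
	nodeScale := math.Ceil(math.Log10(float64(n + 1)))
	return retransmitMult * int(nodeScale)
}

func main() {
	for _, n := range []int{3, 50, 500} {
		fmt.Printf("nodes=%d limit=%d\n", n, retransmitLimit(4, n))
	}
}
```

Under this formula the limit grows only logarithmically with cluster size, so a tenfold increase in nodes adds just one more multiple of RetransmitMult.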

Summary

Messages in memberlist fall into two kinds: internal messages used to synchronize cluster state, and user messages.

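Two methods are scheduled periodically for the internal messages:

  • gossip: runs every GossipInterval and propagates aliveMsg, deadMsg, and suspectMsg messages
  • probe: runs every ProbeInterval and uses pingMsg messages to probe node state

The gossip-related fields of Config are documented as follows:
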
	// GossipInterval and GossipNodes are used to configure the gossip
	// behavior of memberlist.
	//
	// GossipInterval is the interval between sending messages that need
	// to be gossiped that haven't been able to piggyback on probing messages.
	// If this is set to zero, non-piggyback gossip is disabled. By lowering
	// this value (more frequent) gossip messages are propagated across
	// the cluster more quickly at the expense of increased bandwidth.
	//
	// GossipNodes is the number of random nodes to send gossip messages to
	// per GossipInterval. Increasing this number causes the gossip messages
	// to propagate across the cluster more quickly at the expense of
	// increased bandwidth.
	//
	// GossipToTheDeadTime is the interval after which a node has died that
	// we will still try to gossip to it. This gives it a chance to refute.
	GossipInterval      time.Duration
	GossipNodes         int
	GossipToTheDeadTime time.Duration

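User messages in turn are delivered in two ways:

  • Periodic synchronization:
    • Every PushPullInterval, user data is synchronized over TCP through Delegate.LocalState and Delegate.MergeRemoteState;
    • User data returned by Delegate.GetBroadcasts is sent along with gossip.
  • Active sending: methods such as SendReliable send a user message on demand.

How alertmanager handles this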

alertmanager sends user messages in two ways, over UDP and over TCP. When the data to send is larger than MaxGossipPacketSize/2, alertmanager uses TCP (the SendReliable method); otherwise it uses UDP (the Broadcast interface).

func (c *Channel) Broadcast(b []byte) {
	b, err := proto.Marshal(&clusterpb.Part{Key: c.key, Data: b})
	if err != nil {
		return
	}

	if OversizedMessage(b) {
		select {
		case c.msgc <- b: // data is later read from c.msgc and sent with SendReliable
		default:
			level.Debug(c.logger).Log("msg", "oversized gossip channel full")
			c.oversizeGossipMessageDroppedTotal.Inc()
		}
	} else {
		c.send(b)
	}
}

func OversizedMessage(b []byte) bool {
	return len(b) > MaxGossipPacketSize/2
}
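
The size threshold can be tried out in isolation. In the sketch below, maxGossipPacketSize is set to 1400, which I believe matches alertmanager's constant but should be treated as an assumption; transport is a hypothetical helper that only names the path a message would take.

```go
package main

import "fmt"

// maxGossipPacketSize is assumed to be 1400 bytes for this illustration.
const maxGossipPacketSize = 1400

// oversized applies the same rule as OversizedMessage above.
func oversized(b []byte) bool {
	return len(b) > maxGossipPacketSize/2
}

// transport (hypothetical) reports which path a message would take:
// "tcp" for SendReliable, "udp" for a gossip broadcast.
func transport(b []byte) string {
	if oversized(b) {
		return "tcp"
	}
	return "udp"
}

func main() {
	for _, n := range []int{100, 700, 701, 5000} {
		fmt.Printf("%d bytes -> %s\n", n, transport(make([]byte, n)))
	}
}
```

With the assumed constant, a message of exactly 700 bytes still goes out over gossip, and 701 bytes is the first size that is sent over TCP.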

Demo

This is a simple example that manages cluster membership with gossip and sends messages to cluster members over TCP.
