Membase Cluster Manager

Client applications access the cluster's services via the admin port (8091) and the data ports (11211 or 11210). The cluster nodes also communicate with one another internally on separate ports.

To keep throughput high and latency low, Membase Server will always keep metadata about all items in memory.

When configuring a Membase Server, a memory quota is set. Membase Server automatically migrates items from memory to disk when the configured memory quota is reached. If those items are later accessed, they are moved back into system memory. For efficiency, these operations are performed on a regular basis in the background. Membase, however, treats disk space as effectively unlimited and leaves handling of out-of-disk-space conditions to the operating system.
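For illustration, the memory quota can be changed through the admin port (8091). The sketch below assumes the standard /pools/default REST endpoint with a memoryQuota parameter (in MB) and default Administrator credentials; treat the endpoint, parameter name, and credentials as assumptions to verify against your deployment.

    # Minimal sketch: set the per-node memory quota (in MB) via the admin REST port.
    # The /pools/default endpoint, the memoryQuota parameter, and the credentials
    # are assumptions -- adjust them to match your cluster.
    import base64
    import urllib.parse
    import urllib.request

    def set_memory_quota(host, quota_mb, user="Administrator", password="password"):
        url = f"http://{host}:8091/pools/default"
        data = urllib.parse.urlencode({"memoryQuota": quota_mb}).encode()
        req = urllib.request.Request(url, data=data, method="POST")
        auth = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {auth}")
        req.add_header("Content-Type", "application/x-www-form-urlencoded")
        with urllib.request.urlopen(req) as resp:
            return resp.status

    if __name__ == "__main__":
        # Hypothetical values: a 1024 MB quota on a local node.
        print(set_memory_quota("127.0.0.1", 1024))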

Storage paths do not need to be uniform across all server nodes in a cluster. If a server that was a standalone cluster joins another cluster, the storage path for that server remains unchanged.

Asynchronous persistence is a core feature of Membase.

Design

Each instance of ep-engine (the "eventually persistent" engine) on a given node has a certain memory quota associated with it. That amount of memory will always store the index to the entire working set. By doing so, we ensure that most items are fetched quickly and that checks for the existence of items are always fast.

In addition to the quota, there are two watermarks the engine uses to determine when it is necessary to start freeing up memory: mem_low_wat and mem_high_wat.

As the system is loaded with data, it eventually passes mem_low_wat. At that point, a background job is scheduled to reclaim the cached values of replica items from RAM, freeing up memory for further data growth. This is called "ejection" and can only take place on items that have already been written to disk, which marks them as "clean". As data continues to load, memory use eventually reaches or passes mem_high_wat. The job then continues to run, now ejecting active items as well, until memory use falls below mem_low_wat. If items are arriving faster than they can be written to disk, the system may return errors indicating there is not enough space; this will continue until memory becomes available.
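The following is a simplified sketch of that decision flow. The names, data structures, and return values are illustrative only and do not reflect the actual ep-engine implementation.

    # Illustrative sketch of the watermark/ejection behaviour described above.
    # This is NOT the real ep-engine code; names and structures are invented
    # to show the decision flow only.
    from dataclasses import dataclass

    @dataclass
    class Item:
        key: str
        clean: bool    # value has already been persisted to disk
        replica: bool  # replica copy (as opposed to an active item)

    def ejection_pass(mem_used, mem_low_wat, mem_high_wat, items):
        """Decide what the background ejection job would do at a given memory level."""
        if mem_used <= mem_low_wat:
            return "below mem_low_wat: nothing to eject"
        # Only "clean" items (already written to disk) may be ejected;
        # dirty items must be persisted first.
        clean_replicas = [i for i in items if i.clean and i.replica]
        clean_active = [i for i in items if i.clean and not i.replica]
        if mem_used <= mem_high_wat:
            # Between the watermarks: reclaim cached values of clean replica items.
            return f"eject values of {len(clean_replicas)} clean replica items"
        # Above mem_high_wat: eject clean active items too, until usage
        # drops back below mem_low_wat.
        return f"eject values of {len(clean_replicas) + len(clean_active)} clean items"

    def store(mem_used, quota, anything_ejectable):
        """If mutations outrun disk writes, the server answers with a temporary error."""
        if mem_used >= quota and not anything_ejectable:
            return "SERVER_ERROR temporary failure (retry later)"
        return "STORED"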

Consequences of Memory Being Faster than Disk

Obviously, migrating data to disk is generally much slower and has much lower throughput than setting items in memory. When an application sets or otherwise mutates data faster than it can be migrated out of memory to make space for incoming data, the server may behave differently than a memcached client expects. With memcached, items are simply evicted from memory and the newly mutated item is stored. With Membase, however, the expectation is that items will be migrated to disk.

As described above, the current Membase release (1.7.1 as of this writing) will return a SERVER_ERROR when there is not enough space, similar to memcached. This is not sufficient for a Membase deployment, though, as some existing processes, like bulk loading, expect to go beyond the memory quota, and to do so quickly. There are effectively three possible approaches to this problem:

When a set/mutation must be accepted but space is not available because the memory quota has been hit:

  1. Block that request until space becomes available.
  2. Return a SERVER_ERROR with a standard message indicating that a retry later is likely to succeed.
  3. Slow client requests for sets/mutations after mem_high_wat is reached, reducing the pressure, and return a SERVER_ERROR with a standard message indicating that a retry is likely to succeed.

With option 1, the challenge is that many clients expect memcached operations to either succeed or fail in a relatively short time, usually milliseconds. If Membase were to slow down to the latency and throughput associated with disk migrations, some clients might "give up", and application developers could end up implementing retries anyway. This would cause both saturation on the backend and churn on the front end.

With option 2, the challenge is that the client must be able to handle the error correctly. Most existing applications probably will handle it correctly out of the box. End-user applications performing things like bulk loading will need to be enhanced to understand the details of the SERVER_ERROR response and back off appropriately. This is a common pattern in other distributed systems; HTTP, for example, uses a 503 response to indicate that a server is temporarily overloaded.

Option 3 provides a good balance of client and server responsibilities. As the server becomes busier, it appears to slow down from the client's perspective. If client throughput goes beyond what the server can practically handle, the server returns a SERVER_ERROR indicating that it is temporarily unable to service the request.

Currently, option 2 is implemented. Clients are required to handle the SERVER_ERROR response.
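For a bulk loader, that handling typically means retrying with backoff. The sketch below assumes a hypothetical memcached-style client whose set() raises an error containing the SERVER_ERROR text when the server is temporarily out of memory; the client API and error text are illustrative, not a specific library.

    # Illustrative retry-with-backoff loop for bulk loading.
    # client.set() and the exact error text are assumptions; substitute the
    # memcached client library you actually use.
    import time

    def set_with_backoff(client, key, value, max_retries=10, base_delay=0.05):
        """Retry a set when the server reports a temporary out-of-memory condition."""
        delay = base_delay
        for _attempt in range(max_retries):
            try:
                client.set(key, value)
                return True
            except Exception as exc:
                # A SERVER_ERROR here means "try again later", much like HTTP 503.
                if "SERVER_ERROR" not in str(exc):
                    raise
                time.sleep(delay)
                delay = min(delay * 2, 2.0)  # exponential backoff, capped at 2 seconds
        return False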

Utilities

Some aspects of this behavior can be changed with flushctl; the mem_high_wat and mem_low_wat values can be tuned per bucket.
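For illustration only, a small wrapper that shells out to flushctl is sketched below. The "<host:port> set <parameter> <value>" argument order, and whether the values are bytes or percentages, are assumptions based on the text above; verify the exact flushctl syntax for your Membase version.

    # Hedged sketch: tune the watermarks by invoking flushctl as a subprocess.
    # The argument order and value units are assumptions -- check your installation.
    import subprocess

    def set_watermark(host_port, param, value):
        assert param in ("mem_low_wat", "mem_high_wat")
        subprocess.run(["flushctl", host_port, "set", param, str(value)], check=True)

    # Example invocations (hypothetical values):
    # set_watermark("127.0.0.1:11210", "mem_low_wat", 600000)
    # set_watermark("127.0.0.1:11210", "mem_high_wat", 800000)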

