The ultimate MySQL high availability solution(壓縮版)

The ultimate MySQL high availability solution

hingo's picture

clustering suites(使用集羣的工具包來實現HA) is just categorically the wrong approach to database high-availability. 集羣本身還是有用的

Clustering suites = FAIL

The following picture shows how a database high-availability setup would be implemented with any of these clustering solutions:

  

                                   一種框架

So what these clustering frameworks do, is to probe at MySQL from the outside to determine if MySQL still seems to be healthy. In practice you always have a master-slave kind of setup where you are mostly interested in probing the master, but of course the framework is also doing some checks to verify that the slave / standby node is available, in case we should need it later.

If the clustering framework notices that the master MySQL instance is not responding, it initiates failover procedures to the standby node (the slave):

What exactly happens here depends on the setup: for MySQL replication you typically just move thevirtual ip(浮動IP必須能被路由識別,只是一臺機子同時使用兩個IP,一個固定IP,一個隨具體要求而變的IP) to point to the other node, where MySQL is already running. 

On a picture it seems simple enough. What could possibly go wrong?

Commonly the agents are written as shell scripts similar to your traditional Unix init script. And I just don't think a complex thing like a high-availability solution should be implemented in bash.簡單說,生成的日誌多,無用的信息多,配置麻煩

And despite all those log messages, I still managed to make this thing silently fail on startup without any error message. And so forth...And high-availability is a complex topic - this is not just another text editor someone is developing. 提供的信息不充分。So issues like those generally don't give me the confidence that this thing will actually work in production and do the right thing when the network is down.

 If your MySQL instance crashes,it doesn't help if your choice of clustering software just starts a new instance somewhere else, or randomly moves a vip somewhere else. You actuallyneed to replicate the data to begin with

clustering solutions漏洞:

  • At a specified interval - the heartbeat - the clustering solution will see if your MySQL instance is still running.Problem: If you have 5000 trx/sec, you could have fifty-thousand failuresbefore an attempt to fix the error is made(在間隙發送的數據很可能丟失).
  • The clustering solution takes small peeks into MySQL from the outside,but other than that MySQL remains a black box. (監控和MySQL的本身無關,錯誤信息),監控並不能保證各MySQL-Server的數據的一致性。
  • Typically the clustering software itself will have some communication going on, which has the benefit that it verifies that the network connection between nodes is ok. This is in itself useful, sure. Problem: But just like above, if the clustering software detects that network has failed, it's still mostly unaware of the state of MySQL. The failover decision is done blindly(監控認爲網絡不好,其實網絡好的很,況且Master也沒有FAIL,沒有必要進行failover)
  • In a typical setup like above, the clustering software is actually just checking whether there is a network connection between the two MySQL nodes.(只檢查網絡, 並不檢查MySQL-Server是否連到對方) Kind of like the guy in the joke who was searching for his keys under the lamp where there was light, not where he actually lost the keys.

    In this architecture you typically have one writeable master. When something is wrong with that master, you need to do failover. The failover needs to happen atomically: if you have many application servers you cannot have a situation where some writes are still headed for the old master while others are already happening on the new one. Typically this is handled by using a virtual ip that is assigned to one node at a time. Sounds like a simple solution, but in practice it's entirely possible (for poorly implemented clustering solutions)to simply assign the virtual ip to both nodes at the same time! Robert Hodges has a nice blog posts about problems with virtual ip's.

   Often failovers are also expensive,This brings in the problem of false positives: Often these clustering solutions react to small hiccups(打嗝 ) in the network and start a failover, even when the situation is then quickly restored(沒有必要的故障轉移).

The committing of transactions, the replication of data and the detection of failuresneed to be done together by the same piece of software.

Why Galera has none of these problems

The above novel gives a background to why I have been so interested in Galera lately. Let's look at how you'd implement a MySQL high-availability cluster with Galera:

Galera implements synchronous multi-master replication. So the first benefit of this is: there is no virtual ip, andI don't need to choose which node is the master and which are the slaves.

It is just needed for the cluster to elect a quorum in error situations - you always need to have at least 3 nodes in quorum based clustering.

So what happens when one node fails?

Notice the contrast to the previous list of problems:

  • Thanks to synchronous replication and Galera's quorum mechanism, no commits are lost anywhere. When the failure happens, it will be detected as part of the replication process. The Galera nodes themselves figure out which node is thrown out of the cluster, and the remaining ones - who "have quorum" - proceed to commit the transaction. Application nodes that connected to the failing node will of course receive errors as their commits fail.
  • There is no need for maintaining master-slave state, virtual ip or to do any failover(無Master-Slave之分,數據完全同步).The application can connect to any of the Galera nodes and if the transaction succeeds, then it succeeds. If it fails, retry with another node.(實線表式可連接,但app 一次只連一個Master,Master之間的備份有數據庫自己來同步)
  • since the replication is synchronous there is no slave lag //沒有延遲,

But wait, this is not yet the whole story.


用戶端的解決方案:

但是,users are not happy, because when a node has failed, the application will continue to try to connect to it and you'll get a lot of connection errors at your application server. (因爲他們一根筋,沒有人通知他可以連接別的Server).So in practice people still use either virtual ip or some kind of tcp/ip load balancer in front of Galera.(但這是不必要的,還有可能導致單節點問題)

JDBC提供的方案:

It's no wonder, because the needed JDBC parameter is still not really explained anywhere in the MySQL manual. 

when using a Galera cluster,you should use mysql:loadbalance: in front of your JDBC connection string. This allows you to then give a list of MySQL nodes,which are all writeable masters. The JDBC driver will connect to any one of them and commit the transaction. If a node is not available, it will just try another one. (If a transaction was already in progress, it will fail with an exception, you can then retry it and it will just connect to a new node.) 

You should read the blog post for details, in particular on how to use the blacklist timeout argument.

mysql:loadbalance能起到多大作用?

讀寫分離如何實現?(手工編碼?),在Galera方案中不存在讀寫分離,因爲數據庫是完全同步的,即使寫一個數據庫,實際上寫的也是多個數據庫。

This again proves the principle that the best solutions also tend to be simple solutions. If your solution is complex, you're not doing it right.

下面的評論也很有意思,還沒壓縮乾淨,:-)

 
 
 
EllisGL's picture

Galera MySQL with a load

Galera MySQL with a load balancer (HAProxy) has been working well for me.

hingo's picture

I bet it has! And thank you

I bet it has! And thank you for sharing - I haven't actually run Galera in production myself, yet.

Even if my post argues you don't need the loadbalancer (if you use JDBC, for PHP and others this doesn't really exist yet) if you are using a load balancer with Galera, and until now everyone did that, you are still way ahead what everyone else is doing with MySQL replication and other really suboptimal solutions.

ellisgl's picture

Yeah - using a load balancer

Yeah - using a load balancer makes it a bit easier to make it plug-n-play so to say. No need to redo DB connection support in the app. You could augment it further if you needed to with read/write splitting and have a couple nodes used for writing only and just put that in your load balancer as another IP or another port (3307 or something).

ellisgl's picture

I use HAProxy so it's it's

I use HAProxy so it's it's easy to just change one config setting in an app, instead of having to add in the rest of the stuff. Also, HAProxy keeps an eye on each server, so if one is down, you don't have to worry. Also it creates a central location for adding/removing servers from the LB. You solution works, but if you have 10+ apps using the DBs, it can be a pain to update all the configs. Only draw back is that you can't read/write split natively with HAProxy, and would have to be done at the app level.

hingo's picture

So the MySQL JDBC driver can

So the MySQL JDBC driver can do all that. Except you are right, if you want to change the configuration - like add more nodes - then of course you have to do it for each app server. On the other hand, centralized configuration is convenient, but there is a risk you create a new single point of failure. I don't know about HAProxy specifically, just that this has been my experience in general.

ellisgl's picture

I have my HAProxy

I have my HAProxy implementation backed with Corosync/Pacemaker with a simple 2 way rsync on the configuration. So if the main box goes down, the 2nd comes up pretty quickly.

hingo's picture

That's the right way to do

That's the right way to do it. (Except, I'm not a fan of corosync/pacemaker either, but at least in theory this is the right way.)

The ultimate MySQL high availability solution | MySQL | Syng's picture

Pingback

[...] comments to that blog I'm indeed convinced that Baron is right. For one thing, it includes the... Read more... Categories: MySQL     Share | Related [...]

Baron Schwartz's picture

Nice article -- gently disagree

Henrik,

I support your views except, that there is a large set of people who use traditional replication due to its benefits, and want to manage promoting a replica for faster recovery times (not really HA). For their business needs, all things considered, it can be a Very Good Solution. (I mentioned some of these use cases in my blog post, which you linked to.)

All of the failure modes you mentioned are possible, but it's also possible for the clustering software to verify whether it's safe to do a failover, and refuse to do so otherwise. Or to do it manually. This is the approach we're taking with the Percona Replication Manager project, which is still in early development. In practice, I expect this to result in the tool refusing to do anything in a great many cases, which is FINE. When that happens, they should call us (assuming they are Percona customers) and a human can safely resolve the situation in a matter of minutes.

But for real High Availability with capital letters, you will hear absolutely no argument from me. Replication isn't it, and these external clustering systems aren't the way to go.

hingo's picture

MySQL replication basically

MySQL replication basically runs the web as we know it. So yes, it's not doing that bad. I'm familiar with your argument here.

And yes, part of the problem is that the clustering solutions available don't impress with their design or implementation. I'm sure a solution by you or Yoshinori-san can seriously improve over current state of the art. But even so, your solution will be based on MySQL replication and thus suboptimal.

And of course, until now we didn't have a better solution. Well, MySQL Cluster, but not for InnoDB. Even now, we have something great like Galera but you basically have to patch and compile your own MySQL server. So yeah, no points for ease of use.

At the same time, I've tried Galera, I think it is quite easy to use when you get it ready packaged. There's no binlog positions you have to keep track of. So it's not only better for data durability, I argue people will find it having better usability too. And not only is the replication itself easier to use, it enables you to simplify your architecture. This is a state I want to reach.

As long as someone who ships binaries of a MySQL fork integrates it first - hint, hint :-) - so we don't need to patch MySQL ourselves.

Alex's picture

That's the way to do it.

Baron, the approach you're taking with Percona Replication Manager is nothing short of laudable. Indeed, they don't trust computers to dispatch air traffic or supervise children in the kindergarten, yet insist on fully automatic failover or it is not "highly available". Task automation is indispensable of course, but in many cases only a human can make the decision.

There is a catch there though - in the case of some massive failure (like the recent EC2 zone outage), human operators may not be able to cope with all requests on a timely basis. However there's little doubt that they would still do a better job than a fully automated tool.

As for the High Availability with capital letters - that term should be ditched in the gutter as it embodies something as unrealistic and unattainable as "eternal bliss". Better Availability would be more appropriate.

Baron Schwartz's picture

HighER Availability

Right, High Availability is always a matter of degree, not an absolute.

This discussion reminds me of something insightful that Josh Berkus once wrote. He asked and answered the question, "what kind of cluster?" He posits that we should focus more on the users, and less on the technology, which I like. I also agree with his three types of cluster users. They match closely the customers I've worked with.

http://it.toolbox.com/blogs/database-soup/the-three-database-clustering-...

Alex's picture

Bad users?...

Thanks for pointing to a great article. Very insightful indeed. Unfortunately it highlights the same old "mathematical correctness vs. use case" cognitive gap. Take just one requirement from Transactional User:

To do this with zero data loss and 99.999% or better uptime.

According to CAP theorem this is a pretty tough call. Because 99.999% uptime pretty much means A+P, whereas zero data loss is C. So Transactional User is forced to compromise and that makes him unhappy, and he thinks it is because developers are not trying hard enough to see to his needs. I guess developers tried for decades, but you can't jump the laws of logic, not in programming at least.

Online User is just patently insane. Damn, I wish, I _really_ wish I could make a Solution that would work well for him! But that's impossible. No wonder they are unhappy.

I think that it is high time that the _users_ mend their ways and make few steps towards developers (i.e. distributed computing realities). It maybe in the form of changing their software, or changing their expectations, but it has to be done - or their expectations will never be met. (And usually, when you're in a dead end, you're on a wrong way anyways). And this is exactly why I applauded Percona's human approach which I see as a change of expectations (in this context).

Demetrius Albuquerque's picture

MySQL-MHA: "MySQL Master High Availability manager and tools"

[...] In fact, sysbench makes this very easy as it allows me to give a list of nodes separated by commas, such as --mysql-host=host1,host2,host3. This is in the same spirit I outlined in my recent post The Ultimate MySQL High-Availability Solution. [...]

Upcoming conferences: Highload++ Moscow and Percona Live Lon's picture

Pingback

[...] / High Availability Solution which is based on my thinking developed in my recent blog post The ultimate MySQL high availability solution and many benchmarks and functional tests I've done while evaluating these [...]

MySQL HA shootout at Percona Live UK | OpenLife.cc's picture

Pingback

[...] truly will be presenting Choosing a MySQL Replication / High Availability Solution. If you've been reading my blog, you already know that I will advice the audience to choose a solution that is based on synchronous [...]

Henrik Ingo: Upcoming conferences: Highload++ Moscow and Per's picture

Pingback

[...] / High Availability Solution which is based on my thinking developed in my recent blog post The ultimate MySQL high availability solution and many benchmarks and functional tests I’ve done while evaluating these [...]

MySQL/Galera 1.0 Released « IT Primer's picture

Pingback

[...] Ingo described Galera as the ultimate MySQL HA solution. GA_googleAddAttr("AdOpt", "1"); GA_googleAddAttr("Origin", "other"); [...]

發佈了29 篇原創文章 · 獲贊 4 · 訪問量 9萬+
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章