Why David Mytton Migrated from MySQL to MongoDB

Preface:

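    Having quit my job and with time on my hands at home, I started studying non-relational databases. I happened to come across robbin's article "A Discussion of NoSQL Databases" (NOSQL數據庫探討) and was eager to learn more. After picking up some MongoDB basics, I went to look at one of the MongoDB migration cases his article mentions: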

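"Because Mongo supports complex data structures and has powerful query capabilities, it has become very popular, and many projects are considering MongoDB as a replacement for MySQL in web applications that are not especially complex. For example, 'why we migrated from MySQL to MongoDB' is a real case of moving from MySQL to MongoDB: the data volume was simply too large, so they moved to Mongo, and query speed improved very significantly."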

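    Reading it, I could sense both the author's delight and his worries. If anything is mistranslated or misunderstood, corrections are welcome :)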

1. Why did David migrate?

The original text:

David wrote:
The problem we encountered was administrative. We wanted to scale using replication but found that MySQL had a hard time keeping up, especially with the initial sync. As such, backups became an issue, but we solved that. However, scaling MySQL onto multiple clustered servers as we plan to do in the future is difficult. You either do this through replication but that is only really suited to read-heavy applications; or using MySQL cluster. The cluster looks very good but I have read about some problems with it and was unsure of its suitability for our needs.

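Roughly, this says: the problem was administrative. MySQL replication had a hard time keeping up, especially during the initial sync, which also made backups an issue (since solved). Scaling MySQL across multiple clustered servers, as they planned to do, is difficult. Replication is really only suited to read-heavy applications. MySQL Cluster looks good, but David had read about problems with it and was unsure whether it suited their needs.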

So David decided to move away from MySQL and chose MongoDB.

2. Why choose MongoDB?

David wrote:
Very easy to install.
PHP module available.
Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.
Automated sharding being developed.
Good documentation.

I think the most important point is this one: Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.

In other words, replication is easy to set up, catches up quickly, and stays consistent.

3. Problems after migrating to MongoDB

Schema-less:

David wrote:
Schema-less

This means things are much more flexible for future structure changes but it also means that every row records the field names. We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.


 

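Because documents are schema-less, every record stores its own field names in the binary BSON format. At scale, long names therefore waste a lot of space, so David shortened his field names to 2-3 characters.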
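To see where the bytes go, consider how BSON stores each field: a type byte, then the key as a null-terminated string, then the value. Each key therefore costs its length plus one byte in every document. The sketch below measures only that key overhead and applies a short-name mapping like `timeAdded -> tA`. The `vC` abbreviation and all helper names are hypothetical, not taken from David's code.

```python
# Sketch: field-name overhead per document, plus a short-name mapping.
# Assumption: BSON stores each key as a null-terminated string, so a key
# costs len(key) + 1 bytes per document. "vC" and the helpers are hypothetical.

LONG_TO_SHORT = {"timeAdded": "tA", "valueCached": "vC"}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}


def key_bytes(doc: dict) -> int:
    """Bytes spent on field names alone (key bytes + terminating NUL)."""
    return sum(len(key.encode("utf-8")) + 1 for key in doc)


def shorten(doc: dict) -> dict:
    """Rename known long fields before writing a document."""
    return {LONG_TO_SHORT.get(k, k): v for k, v in doc.items()}


def expand(doc: dict) -> dict:
    """Restore readable field names after reading a document."""
    return {SHORT_TO_LONG.get(k, k): v for k, v in doc.items()}


def total_mib(bytes_per_row: int, rows: int) -> float:
    """Total field-name overhead in MiB across `rows` documents."""
    return bytes_per_row * rows / (1024 * 1024)


if __name__ == "__main__":
    row = {"timeAdded": 1262304000, "valueCached": 42.0}
    print(key_bytes(row), key_bytes(shorten(row)))  # 22 6
    print(round(total_mib(100, 10_000_000)), round(total_mib(15, 10_000_000)))  # 954 143
```

The 15-bytes-per-row case comes to about 143 MiB, in line with the ~140MB quoted above. Keeping the mapping in one place lets application code work with readable names while the stored documents use short ones.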

 

The database-per-customer method doesn’t work

David wrote:
The database-per-customer method doesn’t work

MongoDB stores data in flat files using their own binary storage objects. This means that data storage is very compact and efficient, perfect for high data volumes. However, it allocates a set of files per database and pre-allocates those files on the filesystem for speed:

This was a problem because MongoDB was frequently pre-allocating in advance when the data would almost never need to “flow” into another file, or only a tiny amount of another file. This is particularly the case with free accounts where we clear out data after a month. Such pre-allocation caused large amounts of disk space to be used up.

We therefore changed our data structure so that we had a single DB, thus making the most efficient use of the available storage. There is no performance hit for doing this because the files are split out, unlike MySQL which uses a single file per table.

 

 

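MongoDB allocates its data files per database, unlike MySQL, which uses a separate file per table. MongoDB also preallocates disk space in advance (for speed and to avoid fragmentation), and each new file is larger than the last, so allocation grows exponentially. When data is spread over many small databases, the space actually used inside the files can be far smaller than the disk space they occupy. David therefore restructured his data into a single database to make more efficient use of the available storage.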
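The cost of one database per customer can be estimated with a toy model. The constants below are assumptions loosely modeled on the old MongoDB file layout: a 16 MiB namespace file, a first data file of 64 MiB, each later file double the previous one up to 2 GiB, and one extra file preallocated ahead of need. They are illustrative only, are not taken from the article, and vary by version and settings.

```python
# Toy model of per-database file preallocation (all sizes are assumptions).
MIB = 1024 * 1024
NS_FILE = 16 * MIB      # assumed namespace file per database
FIRST_FILE = 64 * MIB   # assumed size of the first data file
MAX_FILE = 2048 * MIB   # assumed cap on a single data file


def data_file_sizes(data_bytes: int) -> list:
    """Data files needed to hold data_bytes, plus one preallocated spare."""
    sizes = []
    size, capacity = FIRST_FILE, 0
    # Every database gets at least one data file, even when nearly empty.
    while not sizes or capacity < data_bytes:
        sizes.append(size)
        capacity += size
        size = min(size * 2, MAX_FILE)  # each new file doubles, up to the cap
    sizes.append(size)  # the next file is allocated before it is needed
    return sizes


def disk_footprint(data_bytes: int) -> int:
    """Bytes on disk for one database holding data_bytes of records."""
    return NS_FILE + sum(data_file_sizes(data_bytes))


if __name__ == "__main__":
    customers, per_customer = 100, 1 * MIB
    separate = customers * disk_footprint(per_customer)  # one DB per customer
    shared = disk_footprint(customers * per_customer)    # one DB for everyone
    print(separate // MIB, shared // MIB)  # 20800 464
```

Under these assumptions, 100 nearly empty customer databases occupy roughly 45 times the disk space of one shared database holding the same data. This is the kind of waste that led David to switch to a single database.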

 

Unexpected locking and blocking

David wrote:
Unexpected locking and blocking

In MongoDB, removing rows locks and blocks the entire database. Adding indexes also does the same. When we imported our data, this was causing problems because large data sets were causing the locks to exist for some time until the indexing had completed. This is not a problem when you first create the “collection” (tables in MySQL) because there are only a few (or no) rows, but creating indexes later will cause problems.

Previously in MySQL we would delete rows by using a wide ranging WHERE clause, for example to delete rows by date range or server ID. Now in MongoDB we have to loop through all the rows and delete them individually. This is slower, but it prevents the locking issue.

  

 

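In MongoDB, deleting rows locks and blocks the entire database, and so does adding an index. On a large data set the lock can last until indexing completes. In MySQL, David deleted rows with a wide-ranging WHERE clause (by date range or server ID). In MongoDB he now loops through the rows and deletes them one at a time. This is slower, but it avoids the locking problem.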
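The workaround David describes, replacing one wide range delete with many single-document deletes, might look roughly like this in Python. `delete_individually` is a hypothetical helper. It relies only on `find(filter, projection)` and `delete_one(filter)`, both of which exist on a PyMongo `Collection`. Here it runs against a small in-memory stand-in, so no server is needed. The `serverId` field and the optional pause are assumptions. This only addresses the database-level locking of the MongoDB version the article describes; later storage engines lock at a finer granularity.

```python
import time
from typing import Any, Optional


def delete_individually(coll: Any, query: dict, pause_every: int = 1000,
                        pause_seconds: float = 0.0) -> int:
    """Delete documents matching `query` one at a time.

    `coll` needs PyMongo-style find(filter, projection) and delete_one(filter).
    Matching _ids are collected first so no cursor is iterated during deletes.
    """
    ids = [doc["_id"] for doc in coll.find(query, {"_id": 1})]
    for n, doc_id in enumerate(ids, start=1):
        coll.delete_one({"_id": doc_id})
        # Hypothetical tuning: pause periodically so other operations get a turn.
        if pause_seconds and n % pause_every == 0:
            time.sleep(pause_seconds)
    return len(ids)


class InMemoryCollection:
    """Tiny stand-in for a collection: equality filters only, for local testing."""

    def __init__(self, docs: list) -> None:
        self.docs = [dict(d) for d in docs]

    @staticmethod
    def _matches(doc: dict, flt: dict) -> bool:
        return all(doc.get(k) == v for k, v in flt.items())

    def find(self, flt: dict, projection: Optional[dict] = None) -> list:
        found = [d for d in self.docs if self._matches(d, flt)]
        if projection:
            keep = [k for k, on in projection.items() if on]
            found = [{k: d[k] for k in keep if k in d} for d in found]
        return found

    def delete_one(self, flt: dict) -> None:
        for i, d in enumerate(self.docs):
            if self._matches(d, flt):
                del self.docs[i]
                return


if __name__ == "__main__":
    coll = InMemoryCollection([{"_id": i, "serverId": i % 3} for i in range(9)])
    removed = delete_individually(coll, {"serverId": 1})
    print(removed, len(coll.docs))  # 3 6
```

Collecting the IDs up front keeps memory proportional to the number of matches. In exchange, no cursor stays open while documents are removed underneath it.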

 

Corruption

David wrote:
Corruption

In MySQL if a database (more likely a few tables) become corrupt, you can repair them individually. In MongoDB, you have to repair on a database level. There is a command to do this but it reads all the data and re-writes it to a new set of files. This means all data is checked and means you will probably have some disk space freed up as files are compacted but it also means the entire database is locked and blocked during the time it takes. With our database being around 60GB, this operation takes several hours.

 

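In MySQL, individual tables can be repaired independently, while MongoDB repairs at the database level. The repair reads and checks all the data and rewrites it to a new set of files, which may also free some disk space. The whole database is locked for the duration, which takes several hours for David's roughly 60GB database.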

David wrote:
Performance

Our reasons for moving to MongoDB were not performance, however it has turned out that in many cases, query times are significantly faster than with MySQL. This is because MongoDB stores as much data in RAM as possible and so it becomes as fast as using something like memcached for the cached data. Even non-cached data is very fast.


 

 

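Performance was not the reason for choosing MongoDB. In many cases, however, its queries turned out to be significantly faster than MySQL's, because MongoDB keeps as much data in RAM as possible. Cached data is served about as fast as from memcached, and even non-cached data is very fast.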

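In addition:

    MongoDB does not support transactions.

    It is well suited to reading data immediately after writing it.

    Deleting records does not reclaim space. The records are only marked as deleted, and the space can be reused later.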
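The last point can be pictured with a toy record store: deleted space is marked and reused rather than returned to the filesystem. This is a simplified free-list model for illustration only, and `RecordStore` is a hypothetical name. MongoDB's real allocator is more involved.

```python
class RecordStore:
    """Toy store: deletes free a slot for reuse but never shrink the file."""

    def __init__(self) -> None:
        self.slots = []  # allocated space: grows, never shrinks
        self.free = []   # indexes of slots marked as deleted

    def insert(self, record) -> int:
        if self.free:                # reuse a deleted slot first
            slot = self.free.pop()
            self.slots[slot] = record
            return slot
        self.slots.append(record)    # otherwise grow the file
        return len(self.slots) - 1

    def delete(self, slot: int) -> None:
        if self.slots[slot] is None:  # already marked as deleted
            return
        self.slots[slot] = None       # mark only; space stays allocated
        self.free.append(slot)

    def allocated(self) -> int:
        return len(self.slots)

    def live(self) -> int:
        return sum(r is not None for r in self.slots)


if __name__ == "__main__":
    store = RecordStore()
    for name in ("a", "b", "c"):
        store.insert(name)
    store.delete(1)
    print(store.allocated(), store.live())  # 3 2
    print(store.insert("d"))                # 1
```

The allocated size stays at three slots after the delete, and the next insert lands in the freed slot instead of growing the store.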

   

 

After reading all this, my feeling is that MongoDB and MySQL each have their own strengths and weaknesses.

Comments (a selection of questions I personally found valuable):

Q: Why not CouchDB?

A: MongoDB's queries are quite similar to SQL, whereas CouchDB's key/value-style queries are comparatively complex. MongoDB also provides a PHP module.

Q: Why not memcache & Hadoop?

A: Map/reduce queries are not what we need.

Q: Why not SenSage or Vertica?

A: For a startup, commercial products are too expensive.

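Q: What kind of replication do you need, and how many nodes have to be replicated? Would Keyspace suit you?

A: Both are new products. We felt MongoDB was more mature, and its PHP module is a big advantage.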

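Q: Have you considered the impact of blocking on the application?

A: Yes. Blocking makes the application wait until it eventually times out.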

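Q: Why not TC/TT (Tokyo Cabinet/Tokyo Tyrant)?

A: At the time we couldn't find working libraries. Also, TC/TT is not designed for complex queries; it is purely a key/value store.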

Q: Hi, did you try other MySQL engines besides MyISAM before moving to MongoDB?

A: MyISAM was the most suitable for the type of usage we were experiencing – many reads and few writes. We used InnoDB (and still do) for the billing and customer systems where we need transactions.

I hope this helps anyone using MongoDB :)
