SolrCloud Wiki Translation (6): Near Real Time Searching, Index Replication, and Disaster Recovery

Original source: http://my.oschina.net/zengjie/blog/203876

SolrCloud and Replication

Replication ensures redundancy for your data, and enables you to send an update request to any node in the shard.  If that node is a replica, it will forward the request to the leader, which then forwards it to all existing replicas, using versioning to make sure every replica has the most up-to-date version.  This architecture enables you to be certain that your data can be recovered in the event of a disaster, even if you are using Near Real Time searching.

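For illustration, here is a minimal SolrJ sketch of sending an update to an arbitrary node. The host, collection, and field names are placeholders, and the HttpSolrClient builder API shown is from recent SolrJ releases (the original article predates it), so adjust the client code to the version you actually run. Whichever node receives the document, the update is routed to the shard leader and from there to every replica.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AnyNodeUpdate {
        public static void main(String[] args) throws Exception {
            // Any node that hosts the collection will do; it need not be the shard leader.
            // Host and collection name are placeholders.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://solr-node1:8983/solr/mycollection").build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-1");
                doc.addField("title_s", "hello solrcloud");
                // If this node is a replica, it forwards the update to the leader,
                // which versions it and distributes it to all replicas.
                client.add(doc);
                client.commit();
            }
        }
    }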

Near Real Time Searching

If you want to use Near Real Time (NRT) search support, enable auto soft commits in your solrconfig.xml file before storing it into ZooKeeper. Otherwise you can send explicit soft commits to the cluster as needed.

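As a sketch only: auto soft commits are configured in solrconfig.xml with the autoSoftCommit element (for example, a maxTime of a few seconds), and an explicit soft commit can be issued from a client whenever you need one. The SolrJ example below uses the commit(waitFlush, waitSearcher, softCommit) overload; the node URL and collection name are placeholders, and the builder API assumes a recent SolrJ version.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ExplicitSoftCommit {
        public static void main(String[] args) throws Exception {
            // Placeholder node URL and collection name.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://solr-node1:8983/solr/mycollection").build()) {
                // The third argument requests a soft commit: recent updates become
                // searchable without the cost of a full hard commit.
                client.commit(true, true, true);
            }
        }
    }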

SolrCloud doesn't work very well with separated data clusters connected by an expensive pipe. The root problem is that SolrCloud's architecture sends documents to all the nodes in the cluster (on a per-shard basis), and that architecture is really dictated by the NRT functionality.

Imagine that you have a set of servers in China and one in the US that are aware of each other. Assuming 5 replicas, a single update to a shard may make multiple trips over the expensive pipe before it's all done, probably slowing indexing speed unacceptably.

So the SolrCloud recommendation for this situation is to maintain these clusters separately; nodes in China don't even know that nodes exist in the US and vice-versa.  When indexing, you send the update request to one node in the US and one in China and all the node-routing after that is local to the separate clusters. Requests can go to any node in either country and maintain a consistent view of the data.

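On the indexing side, "send the update request to one node in the US and one in China" can look like the sketch below: two independent clusters behind hypothetical endpoints, with every update delivered to both and all further routing kept local to each cluster. This is illustrative SolrJ, not a prescribed client design.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DualClusterIndexer {
        // Hypothetical entry points, one per independent cluster.
        private final SolrClient usCluster = new HttpSolrClient.Builder(
                "http://us-solr:8983/solr/mycollection").build();
        private final SolrClient cnCluster = new HttpSolrClient.Builder(
                "http://cn-solr:8983/solr/mycollection").build();

        public void index(SolrInputDocument doc) throws Exception {
            // Deliver the same update to both clusters; leader forwarding and
            // replica distribution then happen inside each cluster separately.
            usCluster.add(doc);
            cnCluster.add(doc);
        }
    }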

However, if your US cluster goes down, you have to re-synchronize the down cluster with up-to-date information from China. The process requires you to replicate the index from China to the repaired US installation and then get everything back up and working.

Disaster Recovery for an NRT system

Use of Near Real Time (NRT) searching affects the way that systems using SolrCloud behave during disaster recovery.

The procedure outlined below assumes that you are maintaining separate clusters, as described above. Consider, for example, an event in which the US cluster goes down (say, because of a hurricane), but the China cluster is intact. Disaster recovery consists of creating the new system and letting the intact cluster create a replica for each shard on it, then promoting those replicas to be leaders of the newly created US cluster.

Here are the steps to take:

  1. Take the downed system offline to all end users.
  2. Take the indexing process offline.
  3. Repair the system.
  4. Bring up one machine per shard in the repaired system as part of the ZooKeeper cluster on the good system, and wait for replication to happen, creating a replica on that machine.  (SoftCommits will not be repeated, but data will be pulled from the transaction logs if necessary.)
    Note: SolrCloud will automatically use old-style replication for the bulk load. By temporarily having only one replica, you'll minimize data transfer across a slow connection.

  5. Bring the machines of the repaired cluster down, and reconfigure them to be a separate Zookeeper cluster again, optionally adding more replicas for each shard.
  6. Make the repaired system visible to end users again.
  7. Start the indexing program again, delivering updates to both systems.