Shard Architecture

Architectural Overview

 

Shards

In a production deployment, each shard will consist of multiple servers to ensure availability and automated failover. (The three servers in each shard are identical replicas, kept for high availability.)

Shard Keys

To partition a collection, we specify a shard key pattern. It names one or more fields to define the key upon which we distribute data. Some example shard key patterns include the following:

{ state : 1 }
{ name : 1 }
{ _id : 1 }
{ lastname : 1, firstname : 1 }
{ tag : 1, timestamp : -1 }

MongoDB's sharding is order-preserving; adjacent data by shard key tend to be on the same server. The config database stores all the metadata indicating the location of data by range.
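As a rough illustration of range-based metadata, the sketch below models a chunk table as a sorted list of key ranges and looks up the shard that owns a given key. The table layout, the shard names, and the findShard helper are assumptions made for this example; they are not the actual schema of the config database.

```javascript
// Hypothetical chunk table for a collection sharded on { name : 1 }.
// Each entry covers the half-open key range [min, max); null means unbounded.
const chunks = [
  { min: null, max: "g", shard: "shard0" },
  { min: "g", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

// Return the shard whose range contains the given shard key value.
function findShard(table, key) {
  for (const c of table) {
    const aboveMin = c.min === null || key >= c.min;
    const belowMax = c.max === null || key < c.max;
    if (aboveMin && belowMax) return c.shard;
  }
  throw new Error("no chunk covers key " + key);
}

console.log(findShard(chunks, "alice")); // shard0
console.log(findShard(chunks, "harry")); // shard1
console.log(findShard(chunks, "henry")); // shard1 (adjacent keys share a server)
console.log(findShard(chunks, "zoe"));   // shard2
```

Because the ranges are ordered, keys that sort next to each other usually resolve to the same entry, which is the order-preserving property described above.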



Chunks

Chunks grow to a maximum size, usually 64MB. Once a chunk has reached that approximate size, the chunk splits into two new chunks. When a particular shard has excess data, chunks will then migrate to other shards in the system. The addition of a new shard will also influence the migration of chunks.
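The split-then-migrate behavior can be sketched in miniature. The example below is a toy model under stated assumptions: chunk size is measured as a document count rather than bytes, a split happens at the median key, and the balancer simply moves one chunk at a time from the fullest shard to the emptiest. The names MAX_CHUNK_DOCS, splitChunk, insertKey, and balanceOnce are all invented for this sketch.

```javascript
// Assumption for the sketch: chunk "size" is a document count, not bytes.
const MAX_CHUNK_DOCS = 4;

// Split a chunk (sorted array of key values) at its median key.
function splitChunk(keys) {
  const mid = Math.floor(keys.length / 2);
  return [keys.slice(0, mid), keys.slice(mid)];
}

// Insert a key into a shard's chunk list, splitting a chunk that grows too large.
function insertKey(chunkList, key) {
  // Pick the last chunk whose first key is <= key, else the first chunk.
  let i = 0;
  for (let j = 0; j < chunkList.length; j++) {
    if (chunkList[j].length > 0 && chunkList[j][0] <= key) i = j;
  }
  chunkList[i].push(key);
  chunkList[i].sort((a, b) => a - b);
  if (chunkList[i].length > MAX_CHUNK_DOCS) {
    chunkList.splice(i, 1, ...splitChunk(chunkList[i]));
  }
}

// Move one chunk from the shard with the most chunks to the one with the fewest.
// Returns false once the shards are within one chunk of each other.
function balanceOnce(shards) {
  const names = Object.keys(shards);
  names.sort((a, b) => shards[a].length - shards[b].length);
  const least = names[0];
  const most = names[names.length - 1];
  if (shards[most].length - shards[least].length < 2) return false;
  shards[least].push(shards[most].pop());
  return true;
}

// shardA starts with one empty chunk; shardB is a newly added, empty shard.
const shards = { shardA: [[]], shardB: [] };
for (let k = 1; k <= 10; k++) insertKey(shards.shardA, k);
while (balanceOnce(shards)) {}
console.log(JSON.stringify(shards)); // both shards end up holding two chunks
```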

When choosing a shard key, the values must be of high enough cardinality (granular enough) that data can be broken into many chunks, and thus be distributable. (This is only a recommendation.)

If it is possible that a single value within the shard key range might grow exceptionally large, it is best to use a compound shard key instead so that further discrimination of the values will be possible.
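To see why a compound key helps, consider a chunk in which every document shares the same value for a single-field key: there is no key at which it can be divided. The sketch below is a simplified model, assuming shard key values are represented as arrays of field values; compareKeys and findSplitPoint are hypothetical helpers, and the sample names are made up.

```javascript
// Compare two shard key values given as arrays of field values, field by field.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return a key at which the chunk could be split, or null if every document
// has the same key value (such a chunk cannot be divided any further).
function findSplitPoint(keys) {
  const sorted = [...keys].sort(compareKeys);
  const mid = Math.floor(sorted.length / 2);
  // Starting at the middle, look for the first key that differs from the lowest.
  for (let i = mid; i < sorted.length; i++) {
    if (compareKeys(sorted[i], sorted[0]) !== 0) return sorted[i];
  }
  return null;
}

const people = [
  { lastname: "Smith", firstname: "Ann" },
  { lastname: "Smith", firstname: "Bob" },
  { lastname: "Smith", firstname: "Cy" },
  { lastname: "Smith", firstname: "Dee" },
];

// Key pattern { lastname : 1 }: all four keys are identical.
const byLastname = people.map((p) => [p.lastname]);
// Key pattern { lastname : 1, firstname : 1 }: the keys can be told apart.
const byBoth = people.map((p) => [p.lastname, p.firstname]);

console.log(findSplitPoint(byLastname)); // null, the chunk cannot be split
console.log(findSplitPoint(byBoth));     // the key for Smith, Cy
```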

Config DB Processes

The config servers store the cluster's metadata.

Note that config servers use their own replication model; they are not run as a replica set.

If any of the config servers is down, the cluster's metadata goes read-only. However, even in such a failure state, the MongoDB cluster can still be read from and written to; only operations that change metadata, such as chunk splits and migrations, are unavailable.

Routing Processes (mongos)

The mongos process can be thought of as a routing and coordination process that makes the various components of the cluster look like a single system.  When receiving client requests, the mongos process routes the request to the appropriate server(s) and merges any results to be sent back to the client.
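The merge step can be pictured with a small sketch. Assuming each shard returns its results already sorted on the requested field, a router only needs to interleave the lists. The mergeSorted function and the sample documents are made up for illustration and say nothing about how mongos is implemented internally.

```javascript
// Merge result lists that each shard has already sorted by `field`.
function mergeSorted(shardResults, field) {
  const cursors = shardResults.map(() => 0); // next unread position per shard
  const merged = [];
  for (;;) {
    // Find the shard whose next document has the smallest value of `field`.
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      if (cursors[s] >= shardResults[s].length) continue; // shard exhausted
      if (
        best === -1 ||
        shardResults[s][cursors[s]][field] < shardResults[best][cursors[best]][field]
      ) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][cursors[best]++]);
  }
}

const fromShard0 = [{ age: 21 }, { age: 40 }];
const fromShard1 = [{ age: 18 }, { age: 33 }, { age: 65 }];
const ages = mergeSorted([fromShard0, fromShard1], "age").map((d) => d.age);
console.log(ages.join(",")); // 18,21,33,40,65
```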

mongos processes have no persistent state; rather, they pull their state from the config server on startup. Any changes that occur on the config servers are propagated to each mongos process.

mongos processes can run on any server desired. They may be run on the shard servers themselves, but are lightweight enough to exist on each application server. There are no limits on the number of mongos processes that can be run simultaneously since these processes do not coordinate between one another.

Operation Types

For targeted operations, mongos communicates with a very small number of shards -- often a single shard. Such targeted operations (lookups aimed at specific shards) are quite efficient.

Global operations involve the mongos process reaching out to all (or most) shards in the system.

The following table shows various operations and their type.  For the examples below, assume a shard key of { x : 1 }.

Operation                                  Type      Comments
db.foo.find( { x : 300 } )                 Targeted  Queries a single shard.
db.foo.find( { x : 300, age : 40 } )       Targeted  Queries a single shard.
db.foo.find( { age : 40 } )                Global    Queries all shards.
db.foo.find()                              Global    sequential
db.foo.find(...).count()                   Variable  Same as the corresponding find() operation
db.foo.find(...).sort( { age : 1 } )       Global    parallel
db.foo.find(...).sort( { x : 1 } )         Global    sequential
db.foo.count()                             Global    parallel
db.foo.insert( <object> )                  Targeted
db.foo.update( { x : 100 }, <object> )     Targeted
db.foo.remove( { x : 100 } )               Targeted
db.foo.update( { age : 40 }, <object> )    Global
db.foo.remove( { age : 40 } )              Global
db.getLastError()
db.foo.ensureIndex(...)                    Global
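The classification in the table can be approximated in a few lines. The sketch below assumes the simplest possible rule, namely that an operation is targeted only when its query document names every field of the shard key. classifyQuery is a hypothetical helper; the real routing logic in mongos handles more cases than this.

```javascript
// Classify a query as "targeted" or "global" for a given shard key pattern.
// Simplifying assumption: targeted means the query names every shard key field.
function classifyQuery(shardKey, query) {
  const keyFields = Object.keys(shardKey);
  const namesAllFields = keyFields.every((f) =>
    Object.prototype.hasOwnProperty.call(query, f)
  );
  return namesAllFields ? "targeted" : "global";
}

const shardKey = { x: 1 };
console.log(classifyQuery(shardKey, { x: 300 }));          // targeted
console.log(classifyQuery(shardKey, { x: 300, age: 40 })); // targeted
console.log(classifyQuery(shardKey, { age: 40 }));         // global
console.log(classifyQuery(shardKey, {}));                  // global
```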

Server Layout

The load is almost certainly low on the config servers, so they rarely need dedicated machines. Here is an example where some sharing of physical machines is used to lay out a cluster. The outer boxes are machines (or VMs) and the inner boxes are processes.

In the picture above, a given connection to the database simply connects to a random mongos. mongos is generally very fast, so perfect balancing of those connections is not essential. Additionally, the implementation of a driver could be intelligent about balancing these connections (but most are not at the time of this writing).

Yet more configurations are imaginable, especially when it comes to mongos. Alternatively, as suggested earlier, the mongos processes can exist on each application server. There is some potential benefit to this configuration, as the communications between app server and mongos can then occur over the localhost interface.

Exactly three config server processes are used in almost all sharded mongo clusters. This provides sufficient data safety; more instances would increase coordination cost among the config servers.

Configuration

Sharding Components

First, start the individual shards (mongod's), config servers, and mongos processes.

Shard Servers

To get started with a simple test, we recommend running a single mongod process per shard. (For a simple configuration, set up just one to begin with.)

Config Servers

Run mongod on the config server(s) with the --configsvr command line parameter. 

 --configsvr           declare this is a config db of a cluster; default port 
                        27019; default dir /data/configdb

Note: Replicating data to each config server is managed by the router (mongos); they have a synchronous replication protocol optimized for three machines, if you were wondering why that number. (That is, one to three config servers.)

mongos Router

Run mongos on the servers of your choice. Specify the --configdb parameter to indicate the location of the config database(s). Note: use DNS names, not IP addresses, for the --configdb parameter's value. Otherwise moving config servers later is difficult.
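One way to guard against the IP-address pitfall is to check the host list before it ever reaches the command line. The following is a minimal sketch: configdbValue is a hypothetical helper, the host names are placeholders, and the IPv4 pattern is a deliberately loose check rather than a full address validator.

```javascript
// Build the value for the mongos --configdb parameter from config server
// addresses, rejecting raw IPv4 addresses in favor of DNS names.
function configdbValue(hosts) {
  const ipv4 = /^\d{1,3}(\.\d{1,3}){3}(:\d+)?$/;
  for (const h of hosts) {
    if (ipv4.test(h)) {
      throw new Error("use a DNS name instead of an IP address: " + h);
    }
  }
  return hosts.join(",");
}

// Hypothetical host names; 27019 is the default --configsvr port.
const value = configdbValue([
  "cfg1.example.com:27019",
  "cfg2.example.com:27019",
  "cfg3.example.com:27019",
]);
console.log("mongos --configdb " + value);
```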

Configuring the Shard Cluster

Start by connecting to one of the mongos processes, and then switch to the admin database before issuing any commands.

The mongos will route commands to the right machine(s) in the cluster and, if commands change metadata, the mongos will update that on the config servers. So, regardless of the number of mongos processes you've launched, you'll only need to run these commands on one of those processes.

You can connect to the admin database via mongos like so:

./mongo <mongos-hostname>:<mongos-port>/admin
> db // db prints the name of the current database
admin

Adding shards

You must explicitly add each shard to the cluster's configuration using the addshard command:

> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : ...}

Run this command once for each shard in the cluster.

If the individual shards consist of replica sets, they can be added by specifying replicaSetName/<serverhostname>[:port][,serverhostname2[:port],...], where at least one server in the replica set is given.

> db.runCommand( { addshard : "foo/<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : "foo"}
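If shards are added from a script, the host string can be assembled programmatically. The sketch below builds the command document in both forms shown above; addShardCommand is a hypothetical helper and the host names are placeholders.

```javascript
// Build the document passed to db.runCommand() for the addshard command.
// `hosts` is a list of "hostname[:port]" strings; `replicaSetName` is optional.
function addShardCommand(hosts, replicaSetName) {
  if (hosts.length === 0) throw new Error("at least one host is required");
  const seed = hosts.join(",");
  return { addshard: replicaSetName ? replicaSetName + "/" + seed : seed };
}

// A single mongod as a shard.
console.log(JSON.stringify(addShardCommand(["db1.example.com:27018"])));

// A replica set named "foo", seeded with two of its members.
console.log(
  JSON.stringify(
    addShardCommand(["a.example.com:27018", "b.example.com:27018"], "foo")
  )
);
```

In the mongo shell, the returned document could be passed directly to db.runCommand() while connected to the admin database.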

Any databases and collections that already existed in the mongod/replica set will be incorporated into the cluster. That mongod/replica set becomes the "primary" host for those databases, and the collections will not be sharded (but you can shard them later by issuing a shardCollection command). In other words, the existing contents of a newly added shard are left unchanged; they are only partitioned once the shardCollection command is run.

Optional Parameters

name
Each shard has a name, which can be specified using the name option. If no name is given, one will be assigned automatically.

maxSize
The addshard command accepts an optional maxSize parameter.  This parameter lets you tell the system a maximum amount of disk space in megabytes to use on the specified shard. 

As an example:

> db.runCommand( { addshard : "sf103", maxSize:100000/*MB*/ } );

Listing shards

To see current set of configured shards, run the listshards command:

> db.runCommand( { listshards : 1 } );

This way, you can verify that all the shards have been committed to the system.

Removing a shard

See the removeshard command.

Enabling Sharding on a Database

Once you've added one or more shards, you can enable sharding on a database. Unless enabled, all data in the database will be stored on the same shard. After enabling you then need to run shardCollection on the relevant collections (i.e., the big ones).

> db.runCommand( { enablesharding : "<dbname>" } );

Once enabled, mongos will place new collections on the primary shard for that database, that is, the shard that holds the database's unsharded collections. Existing collections within the database will stay on the original shard. To enable partitioning of data, we have to shard an individual collection.

Sharding a Collection

When sharding a collection, "pre-splitting", that is, setting a seed set of key ranges, is recommended. Without a seed set of ranges, sharding still works; however, the system must learn the key distribution, and this will take some time, during which performance is not as high. The presplits do not have to be particularly accurate; the system will adapt to the actual key distribution of the data regardless.
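A seed set of ranges can be generated rather than typed by hand. The sketch below assumes a numeric shard key with a roughly known range and produces one document per boundary in the shape of the split command with a middle point. presplitCommands is a hypothetical helper, the namespace and bounds are placeholders, and evenly spaced boundaries are only a guess at the real key distribution.

```javascript
// Build one `split` command document per seed boundary, for a numeric shard key.
// The boundaries divide [min, max) into `chunkCount` roughly equal ranges.
function presplitCommands(namespace, field, min, max, chunkCount) {
  const commands = [];
  const step = (max - min) / chunkCount;
  for (let i = 1; i < chunkCount; i++) {
    const middle = {};
    middle[field] = Math.round(min + i * step);
    commands.push({ split: namespace, middle: middle });
  }
  return commands;
}

// Four seed ranges over x in [0, 1000) need three split points.
const cmds = presplitCommands("test.foo", "x", 0, 1000, 4);
cmds.forEach((c) => console.log(JSON.stringify(c)));
```

In the mongo shell, each document could then be passed to db.runCommand() on the admin database once the collection has been sharded.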

Use the shardcollection command to shard a collection. When you shard a collection, you must specify the shard key. If there is data in the collection, MongoDB will require an index on the shard key to be created up front (it speeds up the chunking process); otherwise, an index will be automatically created for you.

 
> db.runCommand( { shardcollection : "<namespace>",
                   key : <shardkeypatternobject> });

Running the "shardcollection" command will mark the collection as sharded with a specific key. Once called, there is currently no way to disable sharding or change the shard key, even if all the data is still contained within the same shard. It is assumed that the data may already be spread around the shards. If you need to "unshard" a collection, drop it (of course making a backup of data if needed), and recreate the collection (loading the backup data).

For example, let's assume we want to shard a GridFS chunks collection stored in the test database. We'd want to shard on the files_id key, so we'd invoke the shardcollection command like so:

> db.runCommand( { shardcollection : "test.fs.chunks", key : { files_id : 1 } } )
{ "collectionsharded" : "test.fs.chunks", "ok" : 1 }

You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key. (Note: prior to version 2.0 this worked only if the collection was empty.)

db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );

If the "unique: true" option is not used, the shard key does not have to be unique.

db.runCommand( { shardcollection : "test.products" , key : { category : 1, _id : 1 } } );

You can shard on multiple fields if you are using a compound index.
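Putting the configuration steps together, the sequence of admin commands for a new sharded collection can be generated as data. The sketch below only builds the command documents described in this section, in order; shardingCommands is a hypothetical helper and the shard host names are placeholders.

```javascript
// Return, in order, the admin command documents needed to shard a collection:
// one addshard per shard, then enablesharding, then shardcollection.
function shardingCommands(shardHosts, namespace, key, unique) {
  const dbname = namespace.split(".")[0]; // "test.users" -> "test"
  const commands = shardHosts.map((h) => ({ addshard: h }));
  commands.push({ enablesharding: dbname });
  const shardCmd = { shardcollection: namespace, key: key };
  if (unique) shardCmd.unique = true;
  commands.push(shardCmd);
  return commands;
}

const plan = shardingCommands(
  ["shard1.example.com:27018", "shard2.example.com:27018"],
  "test.products",
  { category: 1, _id: 1 },
  false
);
plan.forEach((c) => console.log(JSON.stringify(c)));
```

Each document corresponds to one db.runCommand() call issued against the admin database through a single mongos.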

In the end, picking the right shard key for your needs is extremely important for successful sharding. See "Choosing a Shard Key".

