Sharding MongoDB

MongoDB is one of the most widely used NoSQL databases in the industry today.

Not only does it scale and handle huge amounts of data, it also offers some great out-of-the-box features that are a real help when you are starting out.

One such feature is out-of-the-box database sharding.

What is database sharding?

[Figure: a sharded collection]

Database sharding refers to the practice of breaking up your database into smaller pieces so that no single server has to store or process all of the data.

You break up your data set into smaller subsets, and each subset lives on a separate server. On the face of it there are two advantages:

  1. The amount of processing needed to fetch a given piece of data is much less than when all the data lived on a single server.
  2. You can make do with several smaller servers instead of one large one.
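To make the first point concrete, here is a minimal Python sketch. It assumes four hypothetical "servers" modelled as plain lists, with records placed by a simple modulo rule. That rule is only an illustration and is not how MongoDB decides placement.

```python
# Hypothetical illustration: 1,000 records split across 4 "servers" (plain lists).
NUM_SERVERS = 4
records = [{"id": i, "value": f"row-{i}"} for i in range(1000)]

# Place each record on a server chosen by id modulo the server count.
servers = [[] for _ in range(NUM_SERVERS)]
for record in records:
    servers[record["id"] % NUM_SERVERS].append(record)


def find(record_id):
    """Scan only the one server that can hold record_id."""
    candidates = servers[record_id % NUM_SERVERS]
    scanned = 0
    for candidate in candidates:
        scanned += 1
        if candidate["id"] == record_id:
            return candidate, scanned
    return None, scanned


record, scanned = find(999)
print(record["value"], "found after scanning", scanned, "of", len(records), "records")
```

Even in the worst case for this lookup, only the 250 records on one server are scanned instead of all 1,000.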

For cash-strapped startups and growing companies that generate huge amounts of data, these things can mean the difference between surviving and not.

Sharding: NoSQL vs SQL

Sharding a SQL database is much more difficult than sharding a NoSQL one, for the simple reason that a SQL database has relationships and foreign keys to maintain across shards, whereas NoSQL databases generally have no such constraints.

That said, it is very much possible to shard SQL databases and maintain them once sharded. It's just more painful than with NoSQL databases.

So for the sake of simplicity we shall only go through how to run MongoDB in sharded cluster mode.

MongoDB sharded architecture

[Figure: a sharded cluster]

A MongoDB sharded cluster consists of the following parts:

  1. Query routers: These are the query routing servers (mongos). They work out which shard holds which data from the metadata kept on the config servers, which they cache. Depending on the data a query needs, they pass the query on to the relevant shards and return the results.
  2. Config servers: These are configuration databases which store metadata about the data stored in the different shards.
  3. Shards: The database servers which hold the actual data.

All of the above can be multiple in number and can live on different machines. Thus, even though the load on your cluster as a whole might be heavy, the load on any single server never becomes very large.
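The division of labour between the three parts can be sketched as a toy model in Python. The ConfigServer and Router classes below are hypothetical names made up for this illustration, not MongoDB APIs, and each shard is just a dict.

```python
class ConfigServer:
    """Holds metadata only: which shard owns which key range [low, high)."""

    def __init__(self, ranges):
        self.ranges = ranges  # list of (low, high, shard_name)

    def shard_for(self, key):
        for low, high, shard_name in self.ranges:
            if low <= key < high:
                return shard_name
        raise KeyError(key)


class Router:
    """Asks the config server where a key lives, then talks to that shard."""

    def __init__(self, config, shards):
        self.config = config
        self.shards = shards  # shard_name -> dict acting as that shard's storage

    def insert(self, key, doc):
        self.shards[self.config.shard_for(key)][key] = doc

    def find(self, key):
        return self.shards[self.config.shard_for(key)].get(key)


# Two toy shards: negative keys on shard1, everything else on shard2.
shards = {"shard1": {}, "shard2": {}}
config = ConfigServer([
    (float("-inf"), 0, "shard1"),
    (0, float("inf"), "shard2"),
])
router = Router(config, shards)

router.insert(-5, {"name": "a"})
router.insert(42, {"name": "b"})
print(router.find(42))
```

The client only ever talks to the router. The config server never stores documents, and each shard sees only the keys in its own range.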

Key-based sharding

You can shard your data on a particular key in MongoDB.

Diagram of the shard key value space segmented into smaller ranges or chunks.

You choose a shard key, and the data is divided among the shards according to ranges of that key's value space.

So in the above figure you have a key x and, based on its value, the data is divided into 4 chunks. In this simplified picture there is one chunk per shard (a real shard usually holds many chunks), and the config servers essentially store which range of keys lives on which shard.

So, in very general terms, a config server for the figure above would store the following data:

Chunk Name            Key Range
Chunk 1 (Shard1)      minkey to -75
Chunk 2 (Shard2)      -75 to 25
Chunk 3 (Shard3)      25 to 175
Chunk 4 (Shard4)      175 to maxkey
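Looking up a key in a table like the one above is a search over the chunk boundaries. Here is a small Python sketch, assuming each range includes its lower bound and excludes its upper bound (which is how MongoDB treats chunk ranges). The shard_for helper is a made-up name for illustration.

```python
from bisect import bisect_right

# Boundaries between neighbouring chunks, taken from the table above.
BOUNDARIES = [-75, 25, 175]
SHARDS = ["Shard1", "Shard2", "Shard3", "Shard4"]


def shard_for(x):
    """Return the shard whose range [low, high) contains the key value x."""
    # bisect_right counts how many boundaries are <= x, which is the chunk index.
    return SHARDS[bisect_right(BOUNDARIES, x)]


for value in (-100, -75, 0, 25, 174, 175):
    print(value, "->", shard_for(value))
```

A boundary value such as -75 or 25 lands in the chunk that starts at it, not the one that ends at it.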

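Set up a MongoDB cluster on your local machine

Note: This assumes you have installed MongoDB on your system. The commands below follow the older standalone style of setup. MongoDB 3.4 and later require config servers to run as a replica set, and 3.6 and later require the same of shards, so on recent versions you will need to adapt them.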

1. Start the config server

           sudo mongod --configsvr --dbpath /var/lib/clusterdir/c0/ --port 27019

This runs the config server, which stores all the chunk metadata.

2. Start the query router

          mongos --configdb localhost:27019 --port 27010

This starts a query router connected to the config server. After running this command you should be able to see some output in the config server console. It just says your query router has connected to your config server.

3. Run the shards

For this, make three directories in your home folder and name them d1, d2 and d3. After that you can start the shards as follows

          sudo mongod --port 27012 --dbpath d1
          sudo mongod --port 27013 --dbpath d2
          sudo mongod --port 27014 --dbpath d3

The three commands above should start all three shards on your local system. Now all that remains is connecting them together.
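If you want more than three shards, typing these commands by hand gets tedious. Here is a rough Python sketch that only builds the command strings. It does not launch anything, and shard_commands is a made-up helper name. It assumes the same port numbering and directory names as above; prefix the output with sudo if your setup needs it.

```python
import shlex


def shard_commands(first_port=27012, count=3):
    """Build one mongod command line per shard: ports count up, dirs are d1, d2, ..."""
    commands = []
    for i in range(count):
        argv = ["mongod", "--port", str(first_port + i), "--dbpath", f"d{i + 1}"]
        commands.append(shlex.join(argv))  # shlex.join needs Python 3.8+
    return commands


for command in shard_commands():
    print(command)
```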

4. Set up the query router

Log in to the query router using the following command

         mongo --port 27010 --host localhost

and add the shards created in the previous steps to the cluster. You can do this with the following commands:

         sh.addShard("localhost:27012")
         sh.addShard("localhost:27013")
         sh.addShard("localhost:27014")

and create a database with the following command:

        use sharding_test

This switches the shell to the sharding_test database. MongoDB creates it lazily, the first time you store something in it.

5. Enable sharding for the database

        sh.enableSharding('sharding_test')

This command tells MongoDB that collections in this database may be distributed across multiple shards.

6. Create a collection and add a hashed index to it

       db.createCollection('test_collection')

Now let's create a hashed index on the key called 'sharding_key'. This is the key on which the database will actually split the collection across the shards.

       db.test_collection.createIndex({sharding_key: "hashed"})

This creates a hashed index on the key sharding_key, and now we are ready to actually shard our collection on it.

       sh.shardCollection("sharding_test.test_collection", {"sharding_key": "hashed"})

Your collection and database are now both sharded. In theory you can add as many shards as you want and scale to terabytes of data.
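Why a hashed key? Hashing scatters even sequential key values, so no single shard ends up taking all the new writes. The Python sketch below simulates this with SHA-256 and a modulo over three shards. This is a stand-in chosen for illustration only: MongoDB uses its own hash function and assigns ranges of hash values to chunks, so treat the numbers as a rough picture, not as what your cluster will report.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # matches the three shards started earlier


def hashed_shard(key):
    """Stand-in placement rule: hash the key, then take it modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Sequential keys, like an auto-incrementing id or a timestamp.
counts = Counter(hashed_shard(key) for key in range(30000))

for shard, count in sorted(counts.items()):
    print(f"shard {shard}: {count} documents")
```

Each shard should end up with roughly a third of the documents, even though the keys arrive in strictly increasing order.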

You can now start adding data to your collection. Just make sure every document includes a sharding_key field, and watch the data populate across the shards.

To get statistics on how your data is spread out, you can run the following command to see the data distribution across the different shards.

        db.test_collection.getShardDistribution()

Conclusion

MongoDB offers a wide range of services and features right out of the box, and they are definitely worth exploring.

Do leave a comment and tell me what you think!

 

 

I am a developer and tech enthusiast based out of New Delhi, India. I love Python, have a love-hate relationship with JavaScript, and feel that filling out tax forms is easier than Java.
