MongoDB - Replication and Sharding
Scaling your Mongo instance is
frequently needed and there are various methods you can use to improve
scalability on your system which includes sharding and replication. Each
of these allows you to spread your database across different servers but
they work differently.
Replication :
Replication creates multiple
servers, all containing your entire database, and this can be used to
maximize the uptime of your database.
Replication, on the other
hand, is useful for uptime and failover. Having your full database on
multiple systems makes it possible for Mongo to handle system issues by
moving traffic to the other systems.
While Mongo used to have the
master-slave replication you may have used in other systems, the current
recommended model is to create a replica set where a group of nodes are
configured to synchronize their data and failover automatically if a node
goes down. One additional advantage to having replicas is that when you
need to perform expensive operations, like backups or extensive write
operations, you can do this on one of the other nodes knowing that the
data will be sent to all of the other nodes automatically. You can also
use replicas to allow for more database reads by spreading the requests
across all of the nodes.
The replica set chooses a
primary node but if something happens to that node a secondary
node is elected by the remaining nodes to primary and, when the original
primary comes back online, it is set up as a secondary node. This is all
done transparently by the database without you having to handle it at
all. If you only have a couple of replicas or the same number in
multiple locations, you can create an arbiter node which simply
provides an additional vote when determining the new primary
node. Both techniques allow you to make your MongoDB instance more
reliable and performant and I encourage you to learn more about
them as you move forward with your database exploration.
Check the below diagram for
replica set:
Sharding :
Sharding splits the data
across multiple servers so you can combine smaller systems to host a more
extensive database.
It allows you to partition
your database onto multiple servers allowing for more storage and a
greater capacity for read/write operations. Multiple CPUs can handle
an increased load and the read/write operations to your database will
be shared across the systems.
Usage:
For a very large dataset or a
system with a high volume of traffic, sharding can be an excellent
solution particularly in situations where the database is running on
virtualized systems with limited resources.
This can allow you to scale without having to invest in larger
servers. The database interface doesn't change at all. Drivers in the
Mongo shell can interact with the full system just as if it was were a
single server.
Check the below skeleton for sharding: