Unexpected Fail over warning on Couchbase servers

This is the “expected” behavior. Let me explain it, with a cluster of 3 nodes and 1 replica.

So you have started with 1 node, so in this case you have only “active documents” (no replica)

Then you add another node, and do a rebalance. Once it is done you have 50% of the active data on each node, and 50% of the replica on each node.

Let’s add a new node again, just to have a more “realistic” cluster of 3 nodes. So the node is added and cluster is rebalanced. This means now you have, as you can guess 33.33% on each node (Active and Replica)

So what you have notice is that the Rebalance is an expensive operation, since the cluster has to move data between all the nodes. (moving active and replicas).

You have a now a well balanced 3 nodes cluster.

Now you stop one node, or one node crashes… this means that some of the data are not accessible (they are still here not available, you do not lose anything).

Here you have 2 options:
– if you restart the server, nothing to do the cluster is back online entirely. (3 nodes cluster well balances)

you do a failover on the node that is off. Let’s explain this in detail.
So what is happening here: Couchbase will do that as fast as possible to be sure all the data are available(read and write). So the only thing that is happening here is: promote the replicas to active (for the keys that were active on the node that is off now)

So what is the status now?
– all the data are accessible in read/write for the application on 2 nodes, so you have 50% of the active data on each node.
– BUT you do not have all the replicas since:
– the replicas that are on the node that is off are “not present”
– the replicas that have been promoted are not present anymore

This is why you see the message “Fail Over Warning: Rebalance required, some data is not currently replicated!” in your console.

Does it make sense?

So to be able to get back in a status that is “balanced” you need to do a rebalance.

Note: when you failover of node, this node is removed from the cluster, and to add it back you need to add it, and rebalanced. (the data that are on this server are just “ignored”)

Hope this clarify the message.

Some pointers about this:
– http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-tasks-failover5
– http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-admin-tasks-failover-addback4

Leave a Reply

Your email address will not be published. Required fields are marked *