Ceph

Ceph: mon is down and/or can’t rejoin the quorum

Sometimes, we have seen that a Ceph mon down and could not rejoin the ceph mon quorum, even though that specific ceph mon is up and running (along with ceph-mon process is also up and running).

A quick solution as below:

As one ceph mon is down and out of quorum, then its safe to remove the down mon node from the quorum with below steps:

Pre-requisites: Connect to ceph mon node (or controller node, where down ceph mon is installed and check if its running or not using “ps -ef | grep ceph-mon”. [ If its running with nonresponsive, then stop/kill this process]. The output of this “ps” command should be empty.

Remove the ceph mon data directory:
  # rm -f /var/lib/ceph/mon/ceph-node-x

Create new auth key
    # ceph auth get mon. -o key.txt

Get a copy of mon map (like monmap.bin)
   # ceph mon getmap -o monmap.bin

Inject the down and out of quorum ceph mon into ceph monmap

   # ceph-mon -i node-x --mkfs --inject-monmap map.bin --keyring key.txt

Now, start the ceph mon service
    # start ceph-mon id=node-x
        or
    # systemctl start ceph-mon@id

Remove the monmap.bin and key.txt files.

 

Ref: 
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s