Occasionally a Ceph monitor (ceph mon) drops out of the monitor quorum and cannot rejoin it, even though its node is up and the ceph-mon process on it is running.
A quick fix is described below.
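Because this ceph mon is already down and out of quorum, it is safe to wipe its local data and rebuild it from the surviving quorum with the steps below. The remaining monitors must still form a quorum, since the ceph commands used here query the cluster.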
Prerequisites: Connect to the ceph mon node (or the controller node) where the down ceph mon is installed and check whether it is running with "ps -ef | grep ceph-mon". If the process is running but unresponsive, stop or kill it. Afterwards the "ps" command should list no ceph-mon process (apart from the grep itself).
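The process check above can be wrapped in a small helper. This is only a sketch: the function name mon_is_running is made up for illustration, and it assumes pgrep (from procps) is available on the node. It matches on the exact process name, so it does not pick up the grep or the shell itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): succeeds if a process with the
# given exact name is running. Defaults to "ceph-mon".
# Assumes pgrep from procps is installed.
mon_is_running() {
    proc_name="${1:-ceph-mon}"
    pgrep -x "$proc_name" >/dev/null 2>&1
}

# Report the state without changing anything on the node.
if mon_is_running; then
    echo "ceph-mon is still running; stop or kill it before continuing"
else
    echo "no ceph-mon process found"
fi
```

If several monitors could share one host, the exact-name match is too coarse and the monitor id should be checked in the process arguments as well.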
Remove the ceph mon data directory:
# rm -rf /var/lib/ceph/mon/ceph-node-x
Export the mon. keyring to a file:
# ceph auth get mon. -o key.txt
Get a copy of the current monmap (for example monmap.bin):
# ceph mon getmap -o monmap.bin
Re-create the data store of the down, out-of-quorum ceph mon from the monmap and the keyring:
# ceph-mon -i node-x --mkfs --monmap monmap.bin --keyring key.txt
Now start the ceph mon service. With Upstart:
# start ceph-mon id=node-x
or with systemd:
# systemctl start ceph-mon@node-x
Finally, remove the monmap.bin and key.txt files.
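Before running anything destructive, it can help to print the whole sequence for the affected monitor and review it. The sketch below only echoes commands and executes none of them. The function name print_mon_rebuild_plan and the scratch directory argument are hypothetical. It also assumes the default cluster name "ceph", a systemd host, and the "--mkfs --monmap" form of ceph-mon given in the Ceph documentation for adding a monitor.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of Ceph): prints the rebuild
# commands for one monitor id and runs nothing.
# Assumptions: cluster name "ceph", systemd unit ceph-mon@<id>,
# scratch directory given as the second argument (default /tmp).
print_mon_rebuild_plan() {
    mon_id="$1"
    work_dir="${2:-/tmp}"
    cat <<EOF
rm -rf /var/lib/ceph/mon/ceph-${mon_id}
ceph auth get mon. -o ${work_dir}/key.txt
ceph mon getmap -o ${work_dir}/monmap.bin
ceph-mon -i ${mon_id} --mkfs --monmap ${work_dir}/monmap.bin --keyring ${work_dir}/key.txt
systemctl start ceph-mon@${mon_id}
rm -f ${work_dir}/monmap.bin ${work_dir}/key.txt
EOF
}
```

Calling print_mon_rebuild_plan node-x /root/monfix prints six lines that can be checked against the steps above and then run by hand as root, one at a time, once the old ceph-mon process is confirmed stopped.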
Ref: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap