Ceph: mon is down and/or can’t rejoin the quorum

Sometimes a Ceph mon goes down and cannot rejoin the mon quorum, even though the node itself is up and the ceph-mon process appears to be running.

A quick solution is described below.

Since the ceph mon is down and out of quorum, it is safe to wipe and rebuild that mon with the following steps:

Pre-requisites: Connect to the ceph mon node (or the controller node where the down ceph mon is installed) and check whether the ceph-mon process is running using “ps -ef | grep ceph-mon”. [If it is running but unresponsive, stop/kill the process.] After that, the output of this “ps” command should be empty.

Remove the ceph mon data directory:
  # rm -rf /var/lib/ceph/mon/ceph-node-x

Get the mon. auth key:
    # ceph auth get mon. -o key.txt

Get a copy of the current monmap (saved here as monmap.bin):
   # ceph mon getmap -o monmap.bin

Rebuild the down and out-of-quorum ceph mon from the monmap and keyring:

   # ceph-mon -i node-x --mkfs --monmap monmap.bin --keyring key.txt

Now, start the ceph mon service (use the syntax matching your init system):
    # start ceph-mon id=node-x          (Upstart)
    # systemctl start ceph-mon@node-x   (systemd)

Remove the monmap.bin and key.txt files.
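The steps above can be collected into a short script. This is only a sketch to be run on the affected mon node, assuming the down mon's id is the placeholder node-x; adjust the id and paths to your deployment before use.

```shell
#!/bin/bash
# Sketch of the mon rebuild steps above; "node-x" is a placeholder mon id.
set -e
MON_ID="node-x"

# Ensure no stale, unresponsive ceph-mon process is left running
pkill -f "ceph-mon.*${MON_ID}" || true

# Remove the old mon data directory
rm -rf "/var/lib/ceph/mon/ceph-${MON_ID}"

# Fetch the mon keyring and the current monmap from the cluster
ceph auth get mon. -o key.txt
ceph mon getmap -o monmap.bin

# Rebuild the mon store from the monmap and keyring
ceph-mon -i "${MON_ID}" --mkfs --monmap monmap.bin --keyring key.txt

# Start the mon and clean up the temporary files
systemctl start "ceph-mon@${MON_ID}"
rm -f monmap.bin key.txt
```

After the mon starts, verify it rejoined the quorum with “ceph -s”.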



Ceph: RBD fails to map an image to a block device

When I tried to map an image to a block device using the “rbd map” command, I got the error below:

# rbd create image01 --size 1024 -p rbdbench
# rbd map image01 -p rbdbench --name client.admin
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address

This can be fixed either by enabling only the appropriate features when creating the image, or by setting the default feature set in the Ceph configuration (/etc/ceph/ceph.conf).

To fix the above issue:

Option#1: Add the below line to the ceph config file, i.e. /etc/ceph/ceph.conf:

rbd default features = 3

Option#2: Alternatively, we can enable only the “layering” feature by adding the “--image-feature layering” flag when creating the image. For example:

# rbd create image02 --size 1024 --pool rbdbench --image-feature layering
# rbd map image02 -p rbdbench --name client.admin
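As the error message itself suggests, a third option is to disable the unsupported features on an already-created image with “rbd feature disable”. A sketch, assuming the image from the example above; the feature names listed here are the typical defaults beyond layering, so check what “rbd info” actually reports first:

```shell
# See which features are enabled on the image
rbd info rbdbench/image01

# Disable features the kernel client does not support, then retry the map.
# Adjust the feature list to match what "rbd info" reported.
rbd feature disable rbdbench/image01 exclusive-lock object-map fast-diff deep-flatten
rbd map image01 -p rbdbench --name client.admin
```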



Ceph: What's new in each release

Here is a quick list of new features and functionality added in Ceph releases.

More details on each release, including LTS status and retirement dates, can be found on the Ceph releases page.

Luminous (RC) (2017) – Release notes

BlueStore backend for ceph-osd is now stable.
A new daemon, ceph-mgr, is now a required part of any Ceph deployment.
Multiple active MDS daemons are now considered stable.
The ceph status (ceph -s) command has a fresh look.
S3 bucket lifecycle API has been added.

Kraken (2016) – Release notes

AsyncMessenger supported
BlueStore backend declared as stable
RGW : metadata indexing via Elastic search, index resharding, compression
S3 bucket lifecycle API, RGW export of NFS version 3 through Ganesha
RADOS supports overwrites on erasure-coded pools / RBD on erasure-coded pools (experimental)

Jewel (2016)  – Release notes

CephFS declared as stable
RGW multi-site re-architected (Allow active/active configuration)
AWS4 compatibility
RBD mirroring supported
BlueStore (experimental)
Support for NFS version 3 has been added to the RGW NFS gateway.

Infernalis (2015) –  Release notes

Erasure coding declared stable, with support for many new features
New features for Swift API (Object expiration,…)

Hammer (2015) – Release notes

RGW object versioning, bucket sharding
Crush straw2

Giant (2014) – Release notes

LRC erasure code
CephFS journal recovery, diagnostic tools

Firefly (2014) – Release notes

Erasure coding
Cache tiering
Key/value OSD backend
Standalone radosgw (with civetweb)

Emperor (2013) – Release notes

Multi-datacenter replication for the  radosgw
Improved usability
Improved crc32c performance
Validate S3 tokens against Keystone

Dumpling (2013) – Release notes

Multi-site support for radosgw
RESTful API endpoint for Ceph cluster administration
Object namespaces in librados.




Ceph, Rados GW

Ceph: Running radosgw services

Here I will discuss how to start and stop the radosgw service on the node where the radosgw packages are installed.

First, check the current status of the radosgw service:

#systemctl status  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-001.service - Ceph rados gateway
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr
   Active: active (running) since Mon 2017-05-05 17:28:13 IST; 1 weeks 2 days ag
Main PID: 20739 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-001.
         └─20739 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-001

#systemctl stop  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-001.service - Ceph rados gateway 
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr 
   Active: inactive (dead) since Wed 2017-05-14 21:37:40 IST; 3s ago
 Process: 20739 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name clien
Main PID: 20739 (code=exited, status=0/SUCCESS)
May 14 21:37:40 ceph-005 systemd[1]: Stopping Ceph rados gateway...
May 14 21:37:40 ceph-005 radosgw[20739]: 2017-05-14 21:37:40.233133 7fe1bfc0ca00
May 14 21:37:40 ceph-005 systemd[1]: Stopped Ceph rados gateway.

#systemctl start  ceph-radosgw@rgw.ceph-001.service
#systemctl status  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-005.service - Ceph rados gateway  
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr
   Active: active (running) since Wed 2017-05-14 21:37:48 IST; 2s ago
Main PID: 751 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-001.
           └─751 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-001 

Alternatively, you can use the SysV-style service command:

 service ceph-radosgw@rgw.ceph-001 start/stop/status
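To make the gateway start automatically at boot and to inspect its logs, the usual systemd commands apply. A sketch, assuming the same instance name (rgw.ceph-001) as above:

```shell
# Enable the radosgw instance so it starts at boot
systemctl enable ceph-radosgw@rgw.ceph-001.service

# Follow the gateway's logs while troubleshooting
journalctl -u ceph-radosgw@rgw.ceph-001.service -f
```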




Ceph, Openstack

Cloud nodes – nf_conntrack_max recommendations

It’s recommended to set nf_conntrack_max as below across all nodes in the cloud environment (compute, controller, and storage nodes):

Set the nf_conntrack_max to 1048576 (default is 65536)

#sysctl -w net.netfilter.nf_conntrack_max=1048576

#echo 24576 > /sys/module/nf_conntrack/parameters/hashsize

And add the below line to /etc/modprobe.conf:

options nf_conntrack hashsize=24576

Note: hashsize is typically nf_conntrack_max divided by 4 or 8.

NOTE:  Check the defaults as below:

cat /etc/sysctl.conf | grep nf_conntrack_max
cat  /proc/sys/net/netfilter/nf_conntrack_max 
cat /sys/module/nf_conntrack/parameters/hashsize
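The sysctl change above is lost on reboot; a sketch of persisting both settings (the drop-in file name under /etc/modprobe.d is an assumption, any .conf name works):

```shell
# Persist the conntrack table size across reboots
echo 'net.netfilter.nf_conntrack_max = 1048576' >> /etc/sysctl.conf
sysctl -p

# hashsize is a module parameter, so set it at module load time
echo 'options nf_conntrack hashsize=24576' > /etc/modprobe.d/nf_conntrack.conf
```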

Ceph: Reduce the pg number on a pool

Why might the pg number need to be reduced?

  • The default pg number on a pool may be set higher than needed.
  • Higher PG numbers negatively impact Ceph cluster resource usage and recovery/re-balance times.

Warning: This process requires a Ceph cluster maintenance window and can involve a significant amount of downtime, depending on the amount of data in the pool.

In general, Ceph does not allow decreasing the pg/pgp number of a pool, so the workaround is to copy the data into a new pool.


  1. Schedule a maintenance window during which no client uses the pool whose pg number is being decreased. If needed, stop the appropriate client services to halt I/O on this pool.
  2. If the pool contains a significant amount of data, these steps will take a while, because all data for the pool is copied into a new pool and the old pool is then deleted.
  3. In the below script, edit the pool, pool_new, and new_pg parameters as per your requirements.
#!/bin/bash -x
pool='.users'          # pool whose pg number is to be reduced
pool_new='.users.new'  # temporary new pool
new_pg=64              # new (lower) pg number, e.g. 64
ceph osd pool create $pool_new $new_pg
rados cppool $pool $pool_new
ceph osd pool delete $pool $pool --yes-i-really-really-mean-it
ceph osd pool rename $pool_new $pool

NOTE: Make sure the new pool uses the same CRUSH ruleset as the current pool.

Notes: Please keep the below points in mind before running the above steps.

  1. Do not use the above steps on a pool that contains snapshots.
  2. Please test this in a staging environment before doing it in production.
  3. Ensure you have enough storage capacity to hold a copy of the largest pool whose pg number you plan to reduce. Use “ceph df” to check the free/available raw space.
  4. Make sure there are no near-full OSDs in the Ceph cluster; they may reach the “full” state during the pool data copy, which can stop the process and cause a Ceph storage outage.

Ceph: Health WARN – too many PGs per OSD

Ceph health (or status) reports the warning “too many PGs per OSD”. How do we solve this?

too many PGs per OSD (320 > max 300)

What this warning means:

The average number of PGs per OSD exceeds the default maximum (300):

 => Average PGs per OSD = total number of PGs in all pools / total number of OSDs

If this average is more than the default limit (i.e. 300), the ceph monitor reports the warning.
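As a worked example of this formula, using hypothetical numbers that match the warning shown above:

```shell
# Hypothetical cluster: 3200 PGs total across all pools, 10 OSDs
total_pgs=3200
num_osds=10
avg=$(( total_pgs / num_osds ))
echo "average PGs per OSD: ${avg}"   # 3200 / 10 = 320, which exceeds 300
```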

How to solve/suppress this warning message:

Use injectargs to set “mon_pg_warn_max_per_osd” to 0 temporarily (the change lasts until the ceph mons restart):

# ceph tell mon.* injectargs  "--mon_pg_warn_max_per_osd 0" 

To make the above change persistent, add the below line to ceph.conf and
restart the ceph mons:

mon_pg_warn_max_per_osd = 0