Ceph: How to add the SSD journal with dm-crypt enabled

Here I am sharing the steps for adding a journal with dm-crypt enabled.


Create a key file named after the journal partition’s UUID, as below:

dd bs=<key-size> count=1 if=/dev/urandom of=/etc/ceph/dmcrypt-keys/<journal_partition uuid>

For example:

  dd bs=512 count=1 if=/dev/urandom of=/etc/ceph/dmcrypt-keys/uuid

How to find the journal partition UUID:
 #ls -l /dev/disk/by-partuuid/
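
The key creation step can be wrapped in a small helper that refuses to overwrite an existing key and creates the file with owner-only permissions. This is only a sketch: the function name make_dmcrypt_key and its arguments are hypothetical, and the 512-byte default simply mirrors the dd example above.

```shell
#!/bin/sh
# make_dmcrypt_key KEY_DIR PART_UUID [SIZE_BYTES]
# Hypothetical helper: writes SIZE_BYTES of random data to KEY_DIR/PART_UUID.
make_dmcrypt_key() {
    key_dir=$1
    part_uuid=$2
    size=${3:-512}
    key_file="$key_dir/$part_uuid"

    # Refuse to overwrite an existing key: replacing it would make the
    # partition formatted with the old key impossible to open.
    if [ -e "$key_file" ]; then
        echo "key already exists: $key_file" >&2
        return 1
    fi

    # umask 077 in a subshell so the key file is created readable by its owner only.
    ( umask 077 && dd bs="$size" count=1 if=/dev/urandom of="$key_file" 2>/dev/null ) || return 1

    # Print the path so the caller can pass it to cryptsetup --key-file.
    echo "$key_file"
}
```

For example, make_dmcrypt_key /etc/ceph/dmcrypt-keys <journal_partition uuid> would create the same file as the dd command above.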

Now, use the below commands to create the dm-crypt partition:

  1. cryptsetup -v --cipher aes-xts-plain64 --key-size 512 --key-file /etc/ceph/dmcrypt-keys/uuid luksFormat <journal partition>

#cryptsetup -v --cipher aes-xts-plain64 --key-size 512 --key-file /etc/ceph/dmcrypt-keys/uuid luksFormat /dev/sdc5

  2. cryptsetup -v open --type luks --key-file /etc/ceph/dmcrypt-keys/uuid <journal partition> <uuid>

#cryptsetup -v open --type luks --key-file /etc/ceph/dmcrypt-keys/uuid /dev/sdc5 uuid

Now, stop the OSD and flush the current journal.

#systemctl stop ceph-osd@<id>
#ceph-osd --flush-journal -i <id>

Now, go to /var/lib/ceph/osd/ceph-<id>/ and rename the current journal and journal_dmcrypt to .old as below:

# mv journal journal.old
# mv journal_dmcrypt journal_dmcrypt.old

Now create soft links for the newly created journal and journal_dmcrypt files as below:

# ln -s /dev/disk/by-partuuid/<uuid> ./journal_dmcrypt
# ln -s /dev/mapper/<uuid>  ./journal

NOTE: the ownership of the above files may need to be changed to ceph:ceph as below:

 # chown ceph:ceph journal journal_dmcrypt

Now, create the OSD journal as below:

# ceph-osd --mkjournal -i <id>

Now, start the OSD as below:

#systemctl start ceph-osd@<id>

The OSD should be up and in with the new journal.
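
Before running --mkjournal, it can help to confirm that the two soft links point where the steps above expect. The following is a minimal sketch; check_journal_links is a hypothetical helper name, and it only compares link targets, it does not open or test the devices.

```shell
#!/bin/sh
# check_journal_links OSD_DIR PART_UUID
# Hypothetical helper: verifies the journal symlinks in an OSD data directory.
check_journal_links() {
    osd_dir=$1
    part_uuid=$2
    rc=0

    # "journal" must point at the opened dm-crypt mapping.
    if [ "$(readlink "$osd_dir/journal")" != "/dev/mapper/$part_uuid" ]; then
        echo "journal does not point to /dev/mapper/$part_uuid" >&2
        rc=1
    fi

    # "journal_dmcrypt" must point at the raw (encrypted) partition.
    if [ "$(readlink "$osd_dir/journal_dmcrypt")" != "/dev/disk/by-partuuid/$part_uuid" ]; then
        echo "journal_dmcrypt does not point to /dev/disk/by-partuuid/$part_uuid" >&2
        rc=1
    fi

    return $rc
}
```

For example, check_journal_links /var/lib/ceph/osd/ceph-<id> <uuid> returns non-zero and prints which link is wrong if either target is unexpected.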




Ceph: mon is down and/or can’t rejoin the quorum

Sometimes a Ceph mon is marked down and cannot rejoin the Ceph mon quorum, even though the node itself is up and the ceph-mon process is running.

A quick solution is as below:

Since that ceph mon is down and out of quorum, it is safe to remove the down mon's data and rebuild it with the below steps:

Pre-requisites: Connect to the ceph mon node (or the controller node where the down ceph mon is installed) and check whether it is running using “ps -ef | grep ceph-mon”. If it is running but unresponsive, stop/kill the process. Afterwards, the output of this “ps” command should show no ceph-mon process.

Remove the ceph mon data directory (it is a directory, so a recursive remove is needed):
  # rm -rf /var/lib/ceph/mon/ceph-node-x

Export the mon. auth key to a file
    # ceph auth get mon. -o key.txt

Get a copy of mon map (like monmap.bin)
   # ceph mon getmap -o monmap.bin

Rebuild the data store of the down ceph mon from the saved monmap and key

   # ceph-mon -i node-x --mkfs --monmap monmap.bin --keyring key.txt

Now, start the ceph mon service (use the first form on Upstart systems, the second on systemd systems):
    # start ceph-mon id=node-x
    # systemctl start ceph-mon@node-x

Remove the monmap.bin and key.txt files.
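
One caveat with the “ps -ef | grep ceph-mon” pre-check is that the grep process itself usually shows up in the output, so the result is rarely truly empty. A minimal sketch of an alternative check, assuming a Linux /proc filesystem, is below; process_running is a hypothetical helper name.

```shell
#!/bin/sh
# process_running NAME: succeed if any process has exactly this command name.
# Hypothetical helper; reads /proc/<pid>/comm, so it is Linux-specific.
process_running() {
    wanted=$1
    for comm in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it.
        { read -r name < "$comm"; } 2>/dev/null || continue
        if [ "$name" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

if process_running ceph-mon; then
    echo "ceph-mon is still running; stop it before removing the mon data directory"
else
    echo "no ceph-mon process found"
fi
```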



Ceph: RBD fails to map an image to a block device

When I tried to map an image to a block device using the “rbd map” command, I got the below error:

#rbd create image01 --size 1024 -p rbdbench
#rbd map image01 -p rbdbench --name client.admin
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address

This can be fixed by enabling only the features the kernel supports, either manually per image or by setting the default feature set in the Ceph configuration (/etc/ceph/ceph.conf).

To fix the above issue, use one of the options below:

Option#1: Add the below line to the Ceph config, i.e. the /etc/ceph/ceph.conf file:

rbd default features = 3

Option#2: Alternatively, we can enable only the “layering” feature by adding the “--image-feature layering” flag when we create the image. For example:

#rbd create image02 --size 1024 --pool rbdbench --image-feature layering
# rbd map image02 -p rbdbench --name client.admin
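
The value given to “rbd default features” is a bitmask, so 3 means layering (1) plus striping (2). The sketch below decodes such a number into feature names. The bit values are the ones documented for RBD image features in Ceph releases of this era; rbd_features_decode is a hypothetical helper name, and newer releases may define additional bits.

```shell
#!/bin/sh
# rbd_features_decode MASK: print the RBD feature names encoded in a bitmask.
# Hypothetical helper; the bit values are assumed from the Ceph RBD documentation.
rbd_features_decode() {
    mask=$1
    out=""
    for pair in 1:layering 2:striping 4:exclusive-lock 8:object-map \
                16:fast-diff 32:deep-flatten 64:journaling; do
        bit=${pair%%:*}
        name=${pair#*:}
        # Keep the name if its bit is set in the mask.
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

rbd_features_decode 3    # prints: layering striping
```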



Ceph: Increase OSD start timeout

For encrypted OSDs, the OSD start timeout may need to be increased from the default value of “300” to “900” for the OSD to start. This can be done with the below 2 steps:

#cp /lib/systemd/system/ceph-disk@.service /etc/systemd/system/
#sed -i "s/CEPH_DISK_TIMEOUT=300/CEPH_DISK_TIMEOUT=900/" /etc/systemd/system/ceph-disk@.service

Now reload systemd so that it picks up the modified unit file, and restart the OSDs:

# systemctl daemon-reload
# systemctl restart ceph-osd\*.service ceph-osd.target
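
The sed step only works if the unit file really contains CEPH_DISK_TIMEOUT=300, so it is worth verifying the result. The sketch below replaces whatever numeric value is present and then checks that the new value landed. set_disk_timeout is a hypothetical helper name, and it assumes GNU sed (for -i), as the command above does.

```shell
#!/bin/sh
# set_disk_timeout UNIT_FILE NEW_SECONDS
# Hypothetical helper: rewrites CEPH_DISK_TIMEOUT=<n> and verifies the change.
set_disk_timeout() {
    unit_file=$1
    new=$2

    # Replace whatever numeric value is currently assigned.
    sed -i "s/CEPH_DISK_TIMEOUT=[0-9][0-9]*/CEPH_DISK_TIMEOUT=$new/" "$unit_file" || return 1

    # Fail if the new value is not present (e.g. the variable was missing).
    grep -qE "CEPH_DISK_TIMEOUT=$new([^0-9]|\$)" "$unit_file"
}
```

For example, set_disk_timeout /etc/systemd/system/ceph-disk@.service 900 returns non-zero if the copied unit file did not contain the variable at all.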

Ceph: What’s new in this release

Here is a quick list of the new features/functionalities added in each Ceph release.

More details on each release, including LTS status and retirement dates, can be found on the Ceph releases page.

Luminous (RC) (2017) – Release notes

BlueStore backend for ceph-osd is now stable.
There is a new daemon, ceph-mgr, which is a required part of any Ceph deployment.
Multiple active MDS daemons are now considered stable.
The ceph status (ceph -s) command has a fresh look.
S3 bucket lifecycle API has been added.

Kraken (2016) – Release notes

AsyncMessenger supported
BlueStore on-disk format declared stable (backend still flagged as experimental)
RGW : metadata indexing via Elastic search, index resharding, compression
S3 bucket lifecycle API, RGW export over NFS version 3 through Ganesha
RADOS supports overwrites on erasure-coded pools / RBD on erasure-coded pools (experimental)

Jewel (2016)  – Release notes

CephFS declared as stable
RGW multi-site re-architected (Allow active/active configuration)
AWS4 compatibility
RBD mirroring supported
BlueStore (experimental)
Support for NFS version 3 has been added to the RGW NFS gateway.

Infernalis (2015) –  Release notes

Erasure coding declared as stable and supports many new features
New features for Swift API (Object expiration,…)

Hammer (2015) – Release notes

RGW object versioning, bucket sharding
Crush straw2

Giant (2014) – Release notes

LRC erasure code
CephFS journal recovery, diagnostic tools

Firefly (2014) – Release notes

Erasure coding
Cache tiering
Key/value OSD backend
Standalone radosgw (with civetweb)

Emperor (2013) – Release notes

Multi-datacenter replication for the radosgw
Improved usability
Improved crc32c performance
Validate S3 tokens against Keystone

Dumpling (2013) – Release notes

Multi-site support for radosgw
RESTful API endpoint for Ceph cluster administration
Object namespaces in librados.





Ceph: Running radosgw services

Here I will discuss how to check, stop and start the radosgw service on the node where the radosgw packages are installed.

Check the status of radosgw using the below command:

#systemctl status  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-001.service - Ceph rados gateway
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr
   Active: active (running) since Mon 2017-05-05 17:28:13 IST; 1 weeks 2 days ag
Main PID: 20739 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-001.
         └─20739 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-001

#systemctl stop  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-001.service - Ceph rados gateway 
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr 
   Active: inactive (dead) since Wed 2017-05-14 21:37:40 IST; 3s ago
 Process: 20739 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name clien
Main PID: 20739 (code=exited, status=0/SUCCESS)
May 14 21:37:40 ceph-005 systemd[1]: Stopping Ceph rados gateway...
May 14 21:37:40 ceph-005 radosgw[20739]: 2017-05-14 21:37:40.233133 7fe1bfc0ca00
May 14 21:37:40 ceph-005 systemd[1]: Stopped Ceph rados gateway.

#systemctl start  ceph-radosgw@rgw.ceph-001.service
#systemctl status  ceph-radosgw@rgw.ceph-001.service
ceph-radosgw@rgw.ceph-005.service - Ceph rados gateway  
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor pr
   Active: active (running) since Wed 2017-05-14 21:37:48 IST; 2s ago
Main PID: 751 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-001.
           └─751 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-001 

Alternatively, you can also use the below commands:

 service ceph-radosgw@rgw.ceph-001 start/stop/status





Cloud nodes – nf_conntrack_max recommendations

It’s recommended to set nf_conntrack_max as below across all nodes in the cloud environment (all compute, controller and storage nodes):

Set the nf_conntrack_max to 1048576 (default is 65536)

#sysctl -w net.netfilter.nf_conntrack_max=1048576

#echo 24576 > /sys/module/nf_conntrack/parameters/hashsize

And add the below line to /etc/modprobe.conf

options nf_conntrack hashsize=24576

Note: hashsize can be nf_conntrack_max divided by 4 or 8.

NOTE: Check the current values as below:

grep nf_conntrack_max /etc/sysctl.conf
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /sys/module/nf_conntrack/parameters/hashsize
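
The divide-by-4-or-8 guideline for hashsize can be worked out with a one-line calculation. This is only a sketch of the arithmetic; conntrack_hashsize is a hypothetical helper name. Note that the 24576 used above is smaller than either result for a max of 1048576, so treat the rule as a guideline rather than a fixed requirement.

```shell
#!/bin/sh
# conntrack_hashsize MAX [DIVISOR]
# Hypothetical helper: derives a hashsize from nf_conntrack_max (default divisor 8).
conntrack_hashsize() {
    max=$1
    divisor=${2:-8}
    echo $(( max / divisor ))
}

conntrack_hashsize 1048576 8   # prints 131072
conntrack_hashsize 1048576 4   # prints 262144
```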