Here is a quick way to increase TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 128MB for Ceph OSDs. (These steps are Ubuntu-specific; the default is 32MB.)

Step#1: Set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 128MB in the /etc/default/ceph file as below:

# /etc/default/ceph
# Environment file for ceph daemon systemd unit files.

# Increase tcmalloc cache size to 128MB
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

Step#2: Export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in the script section of /etc/init/ceph-osd.conf as below:

script
    test -f /etc/default/ceph && . /etc/default/ceph
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
    exec /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f --setuser ceph --setgroup ceph
end script

Now, restart the Ceph OSDs for the larger tcmalloc thread cache to take effect.


To decide whether this tuning is needed:

1. Check the CPU % usage of tcmalloc functions using the "perf top" command on the Ceph OSD nodes.

2. If tcmalloc's CPU usage is above 20%, then the above thread cache increase will help.
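To confirm that a restarted daemon actually picked up the variable, its environment can be inspected via /proc. A sketch, shown here against a child shell's own PID; on a real OSD node substitute the ceph-osd PID (e.g. from `pidof ceph-osd`):

```shell
# Export the 128MB setting (134217728 bytes = 128*1024*1024)
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

# Inspect a child process's environment via /proc; $$ here is the child
# shell's PID, which stands in for the ceph-osd PID on a real node.
sh -c 'tr "\0" "\n" < /proc/$$/environ' | grep TCMALLOC
```

If the variable does not appear for the ceph-osd process, the export in /etc/init/ceph-osd.conf did not take effect.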



Ceph: Check OSDs version

On a Ceph cluster, how do we check all the OSD versions?

Just use the below command to see each OSD's version:

$ ceph tell osd.* version
osd.0: {
    "version": "ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)"
}
osd.4: {
    "version": "ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)"
}
...
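On a large cluster the per-OSD listing gets long; piping it through sort and uniq gives a per-version count instead. A sketch, using sample output captured in a variable to stand in for the live `ceph tell osd.* version` command:

```shell
# $versions stands in for live `ceph tell osd.* version` output;
# the sample lines reuse the output shown above.
versions='osd.0: { "version": "ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)" }
osd.4: { "version": "ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)" }'

# Count how many OSDs run each version number
echo "$versions" | grep -o 'ceph version [0-9][0-9.]*' | sort | uniq -c
```

A mixed-version cluster (e.g. mid-upgrade) shows up immediately as multiple lines in this summary.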


Ceph: Deep scrub distribution per week

In a Ceph cluster, deep scrubbing (weekly by default) fully reads all data and verifies checksums to ensure data integrity.

Using the "ceph -s" command, we can see how many deep scrubs are currently running.

But there is no simple command to see how the deep scrubs are distributed across the week:

Use the below command to get the deep-scrub distribution per weekday:

$ for date in `ceph pg dump | grep active | awk '{print $20}'`; do date +%A -d $date; done | sort | uniq -c
  dumped all in format plain
    43010 Monday
     2149 Sunday
    16509 Tuesday

The above output shows how many deep scrubs were performed on each day of the week.

Use the below command to get the deep-scrub distribution per hour:

$ for date in `ceph pg dump | grep active | awk '{print $21}'`; do date +%H -d $date; done | sort | uniq -c
  dumped all in format plain
     61668 00

The above output shows how many deep scrubs were performed in each hour (00 through 23).

Note: By default, a deep scrub is done once a week. This can be changed by updating the below Ceph configuration flag:

osd_deep_scrub_interval = 604800   // Once in a week = 60*60*24*7

The value can be changed at runtime using injectargs on all OSDs as below:

$ ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'

The above command sets the deep-scrub interval to once every two weeks.

NOTE: To keep this setting persistent, update the ceph.conf file with the below line
in the [osd] section:
osd_deep_scrub_interval = 1209600
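The interval arithmetic can be sanity-checked directly in the shell:

```shell
# One week and two weeks expressed in seconds
echo $((60 * 60 * 24 * 7))       # 604800  (default: weekly)
echo $((60 * 60 * 24 * 7 * 2))   # 1209600 (every two weeks)
```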

Ceph: How to do PG scrub and deep-scrub

How to run the scrubbing and deep scrubbing operations on a PG?

Use the below commands to do the scrub and deep-scrub operations on a PG:

# ceph pg scrub <pg_id>         // For doing the scrub on a PG
# ceph pg deep-scrub <pg_id>    // For doing the deep-scrub on a PG

For example:
# ceph pg scrub 8.ff7
instructing pg 8.ff7 on osd.1 to scrub

# ceph pg deep-scrub 8.ff7
instructing pg 8.ff7 on osd.1 to deep-scrub
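To scrub many PGs at once, e.g. every PG in one pool, the PG ids can be pulled from `ceph pg dump` and fed to the scrub command in a loop. A sketch, assuming pool id 8 and using sample dump lines in a variable to stand in for the live command:

```shell
# $pg_dump stands in for live `ceph pg dump` output (pool id 8 is assumed
# for illustration; PG ids are "<pool_id>.<hash>" in the first column)
pg_dump='8.ff7 1833 0 0 0 0 active+clean
9.aa1 1521 0 0 0 0 active+clean
8.a01 1790 0 0 0 0 active+clean'

# Select the PG ids belonging to pool 8
pgs=$(echo "$pg_dump" | awk '$1 ~ /^8\./ {print $1}')
echo "$pgs"

# On a live cluster:
#   for pg in $pgs; do ceph pg deep-scrub "$pg"; done
```

Note that kicking off deep scrubs on many PGs at once generates significant disk I/O, so this is best done in small batches.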

Ceph: Reducing OSD scrub IO priority

Note: The below tip will work with ceph version >= 0.80.8.

If the Ceph cluster is busy with scrubbing operations and this impacts client performance, we may want to reduce the scrubbing IO priority.

By default, the disk I/O of a Ceph OSD scrubbing thread has the same priority as all other threads. It can be reduced with the disk thread ioprio settings on all OSDs.


Check the current values of osd_disk_thread_ioprio_class and osd_disk_thread_ioprio_priority:

# ceph daemon osd.1 config get osd_disk_thread_ioprio_class
{ "osd_disk_thread_ioprio_class": ""}
# ceph daemon osd.1 config get osd_disk_thread_ioprio_priority
{ "osd_disk_thread_ioprio_priority": "-1"}

Set the above config values as below for all OSDs (and also update them in the ceph.conf file as "osd_disk_thread_ioprio_priority = 7" and "osd_disk_thread_ioprio_class = idle" to make the changes persistent):

ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
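For persistence across restarts, the corresponding ceph.conf fragment would look like this (a sketch of the [osd] section, per the note above):

```ini
[osd]
# Run scrub disk I/O at the idle class, lowest priority (7)
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
```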

These changes work only with the Linux CFQ I/O scheduler.
Use the below commands to check the current scheduler and change it to cfq:

# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
# echo cfq | tee /sys/block/sda/queue/scheduler

NOTE: The above settings reduce OSD scrubbing priority and are useful to slow down scrubbing on an OSD that is busy with client operations. Once all scrub operations are back to a normal state, it is recommended to revert the changes.



Ceph: OSD benchmark

How do we check the Ceph OSDs' raw IO performance?

Use "ceph tell" to see how well an OSD performs by running a simple throughput benchmark. By default, the test writes 1 GB in total in 4 MB chunks.

Ceph OSD IO results:

$ ceph tell osd.0 bench
osd.0: { "bytes_written": 1073741824,
  "blocksize": 4194304,
  "bytes_per_sec": "545153639"}
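The bytes_per_sec figure is easier to read when converted to MB/s; the conversion for the sample value above:

```shell
# Convert the sample bytes_per_sec value to MB/s (integer division)
bytes_per_sec=545153639
echo "$((bytes_per_sec / 1024 / 1024)) MB/s"   # roughly 519 MB/s
```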

NOTE: To run the osd bench for all osds:
$ ceph tell osd.* bench

To display the osd bench output in plain text format, use the below command:
$ ceph tell osd.0 bench -f plain
osd.0: bench: wrote 1024 MB in blocks of 4096 kB in 2.834567 sec at 374 MB/sec

Note: The above OSD benchmark uses the journal, meaning all writes go through the journal configured on the cluster's OSDs.
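The bench command also accepts the total size and block size as positional arguments (in bytes), which is handy for shorter runs. A sketch computing a 100 MB / 4 MB invocation; the command is echoed here rather than executed, since it needs a live cluster:

```shell
# Compute a smaller benchmark: 100 MB total written in 4 MB blocks.
# `ceph tell osd.N bench` takes total bytes and bytes-per-write as
# positional arguments.
total=$((100 * 1024 * 1024))   # 104857600
block=$((4 * 1024 * 1024))     # 4194304
echo "ceph tell osd.0 bench $total $block"
```

A smaller run finishes quickly and puts far less write load on the journal than the 1 GB default.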