Ceph: Health WARN – too many PGs per OSD

ceph health (or ceph status) reports the warning “too many PGs per OSD”. How can this be solved?

health HEALTH_WARN
too many PGs per OSD (320 > max 300)

What this warning means:

The average number of PGs per OSD is above the warning threshold (the default threshold is 300).

 => average = total number of PGs in all pools / total number of OSDs

If this average is more than the threshold (300 by default), the Ceph monitor reports the warning.
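
As a rough illustration of that check, the sketch below recomputes the average with shell integer arithmetic and prints a message in the same shape as the health output. The function name check_pgs_per_osd and the figures (6400 PGs, 20 OSDs) are made up for the example, and the monitor's own accounting may differ; see the comment at the end of this post about replication.

```shell
# Hypothetical helper, not part of Ceph: mirrors the threshold check described above.
check_pgs_per_osd() {
    total_pgs=$1    # PGs summed over all pools (assumed input)
    num_osds=$2     # number of OSDs (assumed input)
    max=${3:-300}   # warning threshold; 300 is the default named in this post
    avg=$(( total_pgs / num_osds ))   # integer division, fraction is dropped
    # A threshold of 0 is treated here as "check disabled".
    if [ "$max" -gt 0 ] && [ "$avg" -gt "$max" ]; then
        echo "too many PGs per OSD ($avg > max $max)"
    else
        echo "ok ($avg PGs per OSD)"
    fi
}

check_pgs_per_osd 6400 20   # prints: too many PGs per OSD (320 > max 300)
```

Passing 0 or 350 as a third argument makes the same input print the "ok" line, which is the effect of changing the threshold.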

How to solve/suppress this warning message:

Use injectargs to set mon_pg_warn_max_per_osd to 0. This is temporary and lasts
only until the Ceph mon servers restart.

# ceph tell mon.* injectargs "--mon_pg_warn_max_per_osd 0"

To make the change persistent, add the line below to ceph.conf and
restart the Ceph mons:

mon_pg_warn_max_per_osd = 0

One thought on “Ceph: Health WARN – too many PGs per OSD”

  1. Three thoughts:

    1) The ratio is actually

    (number of PGs) / (number of OSDs / replication)

    2) Better than disabling the warning altogether is to raise the threshold. The only way to actually fix the ratio is to delete and recreate the pool, so one generally lives with it. 320 isn’t disastrous; the author of pgcalc below once told me that in practice things don’t get bad until at least 400 or 500.

    So in this case it would be better to inject and configure a value of perhaps 350. One wants to select a value that is above the current ratio, but also above where the ratio will be if an OSD host is down due to failure.

    Disabling it altogether leaves you blind if someone, say, adds a dozen sets of OpenStack pools. I say this from experience.

    3) It’s important to note that adding OSDs will dilute the ratio back into the desirable range without altering the warning threshold.
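
The three points above can be worked through with a little shell arithmetic. This is only a sketch: the helper names pg_ratio and osds_needed are hypothetical, and the cluster (1280 PGs, 3 replicas, 12 OSDs spread over 4 hosts) is invented for illustration.

```shell
# Hypothetical helpers, not part of Ceph. Integer arithmetic only.

# Ratio from point 1: pgs / (osds / replication) == pgs * replication / osds
pg_ratio() {
    # $1 = PGs, $2 = replication, $3 = OSDs
    echo $(( $1 * $2 / $3 ))
}

# Point 3: smallest OSD count that brings the ratio down to at most the target.
osds_needed() {
    # $1 = PGs, $2 = replication, $3 = target ratio
    copies=$(( $1 * $2 ))
    echo $(( (copies + $3 - 1) / $3 ))   # ceiling division
}

pg_ratio 1280 3 12       # all 12 OSDs in       -> 320
pg_ratio 1280 3 9        # one 3-OSD host down  -> 426
osds_needed 1280 3 300   # OSDs for ratio <= 300 -> 13
```

In this made-up cluster a threshold of 350 would clear the warning in normal operation but not with a host down, so under point 2 one would pick something above 426. Adding a single OSD (13 in total) would bring the ratio under 300 with no change to the threshold.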

    ceph.com/pgcalc or https://access.redhat.com/labsinfo/cephpgc are extremely useful tools for calculating numbers of PGs.
