Ceph

Ceph – Difference between erasure-coded and replicated pool types

Here, I have collected information from multiple sources (blogs and the Ceph docs) to make the replicated and erasure-coded pool types easier to understand.

A Ceph pool is associated with a type that determines how it sustains the loss of an OSD (i.e., a disk, since most of the time there is one OSD per disk). The default choice when creating a pool is replicated, meaning every object is copied onto multiple disks. The erasure-coded pool type can be used instead to save space.

We can set how many OSDs are allowed to fail without losing data. The pool type may be either replicated, to recover from lost OSDs by keeping multiple copies of each object, or erasure, to get a kind of generalized RAID5 capability.

Replicated pools

  1. The pool's size setting is the desired number of copies/replicas of an object.
  2. A typical configuration stores an object and one additional copy (i.e., size = 2).
  3. Replicated pools require more raw storage, but they implement all Ceph operations.
  4. The default CRUSH ruleset is the one specified by the osd pool default crush replicated ruleset config variable.
  5. The replicated pool CRUSH ruleset can target faster hardware to provide better response times.

Calculating the usable storage of a replicated pool is easy: just divide the amount of raw space you have by the “size” (number of replicas) of the pool.

For example, let’s work with some rough numbers: 25 OSDs of 4 TB each.

Raw size: 25 * 4  = 100 TB
Size 2:   100 / 2 = 50 TB
Size 3:   100 / 3 = 33.33 TB
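
As a quick sanity check, here is a minimal Python sketch of that calculation (the helper name is purely illustrative, not part of any Ceph API):

    # Usable capacity of a replicated pool: raw capacity divided by
    # the replica count ("size").
    def replicated_usable_tb(num_osds, osd_size_tb, size):
        raw_tb = num_osds * osd_size_tb
        return raw_tb / size

    print(replicated_usable_tb(25, 4, 2))  # 50.0 TB with size = 2
    print(replicated_usable_tb(25, 4, 3))  # 33.33... TB with size = 3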

Replicated pools are expensive in terms of storage.

Erasure-coded pools

Erasure coding (EC) is a method of data protection in which data is broken into fragments, encoded, and then stored in a distributed manner. Erasure coding uses a mathematical equation to achieve data protection. The entire concept revolves around the following equation.

n = k + m, where:
k = the number of chunks the original data is divided into.
m = the number of extra coding chunks added to the original data chunks to provide data protection.
n = the total number of chunks created by the erasure coding process.
  1. The number of coding chunks is set in the erasure-coded profile (e.g., m = 2 in the default profile).
  2. Erasure-coded pools require less raw storage but implement only a subset of the available operations (for instance, partial writes are not supported).
  3. The CRUSH ruleset is named erasure-code if the default erasure-code profile is used, or {pool-name} otherwise.
  4. This ruleset is created implicitly if it does not already exist.
  5. Erasure coding has higher computational requirements.
  6. The erasure-coded pool CRUSH ruleset typically targets hardware designed for cold storage, with high latency and slow access times.
Recovery: to recover, we need any k chunks out of the n chunks, and thus we can tolerate the failure of any m chunks.
Reliability level: we can tolerate the failure of up to any m chunks.
Encoding rate (r): r = k / n, where r < 1.
Storage required: 1 / r times the original data.
Example #1:
A (3,5) erasure code for any data file would look like this:
n = 5, k = 3 and m = 2 (m = n - k)
Erasure coding equation: 5 = 3 + 2
So 2 coding chunks are added to 3 data chunks to form 5 total chunks, which are stored in a distributed manner. In the event of a failure, we need any 3 of these 5 chunks to reconstruct the original file. Hence, in this example we can tolerate the loss of any 2 chunks.
Encoding rate (r) = 3 / 5 = 0.6 < 1
Storage required = 1 / 0.6 ≈ 1.67 times the original file size.

If the original file size is 1 GB, then to store this file in a (3,5) erasure-coded pool in a Ceph cluster, you would need about 1.67 GB of storage space.
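
The bookkeeping above is easy to script. The following Python sketch (a hypothetical helper, not part of Ceph) applies the equations n = k + m, r = k / n and storage = 1 / r, and reproduces the (3,5) example:

    # Erasure-coding bookkeeping: n = k + m, r = k / n,
    # raw storage needed = 1 / r times the original data.
    def ec_profile(k, m):
        n = k + m  # total chunks after encoding
        r = k / n  # encoding rate, always < 1
        return {
            "total_chunks": n,
            "tolerated_failures": m,  # any m of the n chunks may be lost
            "encoding_rate": r,
            "storage_factor": 1 / r,  # raw space per unit of data
        }

    print(ec_profile(3, 2))
    # {'total_chunks': 5, 'tolerated_failures': 2,
    #  'encoding_rate': 0.6, 'storage_factor': 1.666...}
    # A 1 GB file therefore needs about 1.67 GB of raw space in a (3,5) pool.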

Use cases:
1. COLD STORAGE

An erasure-coded pool is created to store a large number of 1 GB objects (imaging, genomics, etc.) and 10% of them are read per month. New objects are added every day and the objects are not modified after being written. On average there is one write for 10,000 reads.

A replicated pool is created and set as a cache tier for the erasure coded pool. An agent demotes objects (i.e. moves them from the replicated pool to the erasure-coded pool) if they have not been accessed in a week.

2. CHEAP MULTIDATACENTER STORAGE

Ten datacenters are connected with dedicated network links. Each datacenter contains the same amount of storage with no power-supply backup and no air-cooling system.

An erasure-coded pool is created with a CRUSH map ruleset that will ensure no data is lost if at most three datacenters fail simultaneously. The overhead is 50%, with the erasure code configured to split data into six chunks (k=6) and create three coding chunks (m=3). With replication, the overhead would be 400% (four replicas).
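
Those overhead figures can be checked with the same arithmetic (again just an illustrative Python snippet, not a Ceph tool; note that the erasure-coding figure counts extra space while the replication figure counts total raw space):

    # With k = 6 data chunks and m = 3 coding chunks, the raw space used
    # is (k + m) / k = 1.5x the data, i.e. a 50% overhead.
    k, m = 6, 3
    print(f"{m / k:.0%} overhead with erasure coding")       # 50%

    # Surviving three datacenter failures with replication needs four
    # full copies of the data: 4x the raw space, i.e. 400%.
    print(f"{4:.0%} raw space vs. data with four replicas")  # 400%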
