CephFS, free space reporting



polezhaevdmi
24-Feb-2017, 19:20
As far as I can see, Ceph reports used and free storage space as 'raw' figures, without taking replication or erasure coding into account.
That is acceptable for object storage, but badly wrong for CephFS.
Please look at this example.
- The cluster has 3 OSDs of ~33 GB each.
- Two pools (data and metadata) for CephFS were created with a replication factor = 3 rule.
- The CephFS was used to hold an NFS export for a VMware datastore.
- A 25 GB .VMDK was placed on that datastore.
- Ceph reports the usage as "78250 MB used, 22822 MB / 101072 MB avail". This is correct for raw storage.
- VMware reports as follows: [attached screenshot]
- Does that mean a ~22 GB VMDK file can still be added to the CephFS datastore in the virtual infrastructure? No!
- Are those figures usable for administering VMware? No!
- Which numbers would make sense to a VM admin? "~25 GB used, ~7 GB / ~33 GB avail".
CephFS uses a particular pool, which is bound to a particular rule, which unambiguously defines the replication factor (or redundancy factor). All the data needed to calculate this on the fly are available.

Does anybody know of a reason not to divide the raw statistics by that factor, which would give a usable result?
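
A minimal sketch of the division in question, using the raw figures quoted in the example and the replication factor of 3. The function and variable names are hypothetical and chosen only for illustration; this is not something Ceph itself provides.

```python
# Sketch only: converts the raw figures from the example into usable ones
# by dividing by the replication factor. Names are illustrative.

def usable_from_raw(raw_mb: float, replication_factor: int) -> float:
    """Return the usable amount for a replicated pool, given a raw amount."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_mb / replication_factor


REPLICATION = 3  # replication factor of the CephFS pools in the example

# Raw figures as reported by Ceph in the example, in MB.
raw_mb = {"used": 78250, "avail": 22822, "total": 101072}

# Usable figures, converted to GB (1 GB taken as 1024 MB here) and rounded.
usable_gb = {
    name: round(usable_from_raw(value, REPLICATION) / 1024)
    for name, value in raw_mb.items()
}

for name, value in usable_gb.items():
    print(f"{name}: ~{value} GB")
```

With these inputs the rounded figures come out at roughly 25 GB used and 7 GB available out of 33 GB, which are the numbers a VM admin would expect to see.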

ab
27-Feb-2017, 14:48
My only, admittedly-wild, guess, other than this being a bug, is that
perhaps it is assuming clients will have access to the replication
information and assume they will do the arithmetic themselves as you did,
but that's not normal, at least to me, for a filesystem. When using RAID
(6, 10, whatever) the filesystem usually has no clue about how many disks
are there, their state, if/when they may be changed or need a rebalance;
the filesystem is just the filesystem and only knows about the partition
size, so then tools that query the filesystem can get a reliable read
directly from them without understanding the underlying layers.

As a result, I'd guess this is a bug, and probably an upstream (Ceph) bug.
With that in mind, does this seem to match what you are seeing, which
would indicate some kind of reason for it?

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-June/010546.html

It may help us to see the exact commands you are running. If your
screenshot has those, then I'm sorry it is not pulling up for me...
probably a problem on my end, but text-based output is generally preferred
anyway when possible.

--
Good luck.

polezhaevdmi
10-Mar-2017, 16:57
Yes, SES inherited that issue from upstream Ceph. However, I suspect that SUSE's vote would carry weight and that the situation might improve.
From my point of view, CephFS is not a completely separate thing, as it is a part of Ceph. It therefore has every right to query the list of pools, the pool parameters, and the 'ceph df' data.

From the ceph-users thread linked above (archived at lists.opennebula.org):

"Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do."

That is the easiest approach, yes, but it is neither consistent nor convenient for anyone else.
The problem will surface again at the VI administrator level (who must check 'ceph df' before any decision) or at the orchestration level (which might be completely overwhelmed), and it will require the same calculations.

As for multiple pools, I see no problem with calculating free space on request:
- Get the list of pools used by the filesystem.
- Sum ( pool_raw_free / pool_redundancy_factor ) over all pools used.
Still not a complex task.


"... help us to see the exact commands you are running."
I tore down the previous Ceph cluster, so the numbers will be new :)
Current configuration: 3 OSDs, 64G of space per OSD. CephFS pools replication factor = 2. Written file size = 25G.


ceph02admin:~ # ceph osd dump | grep 'replicated size'
...
pool 13 'cephfs_data' replicated size 2 min_size 2 ...
pool 14 'cephfs_metadata' replicated size 2 min_size 2 ...


Ceph reports the following for the pools:

ceph02admin:~ # ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    176G     126G      51589M       28.48
POOLS:
    NAME                ID     USED       %USED     MAX AVAIL     OBJECTS
    ...
    cephfs_data         13     25600M     28.90     62987M        6405
    cephfs_metadata     14     42196k     0.07      62987M        39


The OS reports the following for CephFS:

ceph02admin:~ # df -k
Filesystem 1K-blocks Used Available Use% Mounted on
...
172.18.66.61:6789,172.18.66.62,172.18.66.63:/ 185503744 52809728 132694016 29% /mnt/cephfs
...
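
For context, `df` takes its numbers from the filesystem's own statvfs answer, so whatever CephFS returns there is what every downstream tool sees. A small sketch of reading the same fields from Python with `os.statvfs` follows; the helper name `df_like` is hypothetical, and the `/mnt/cephfs` path is simply the mount point from the output above.

```python
import os


def df_like(path: str) -> dict:
    """Return total/used/available space in 1K blocks, the way `df -k` derives them."""
    st = os.statvfs(path)
    block = st.f_frsize                  # fundamental block size in bytes
    total = st.f_blocks * block          # size of the filesystem
    free = st.f_bfree * block            # free space, including reserved blocks
    avail = st.f_bavail * block          # free space available to unprivileged users
    return {
        "total_kib": total // 1024,
        "used_kib": (total - free) // 1024,
        "avail_kib": avail // 1024,
    }


if __name__ == "__main__":
    # Fall back to "/" when the CephFS mount from the post is not present.
    target = "/mnt/cephfs" if os.path.ismount("/mnt/cephfs") else "/"
    print(target, df_like(target))
```

Nothing in this path knows about pools or replication, which is why the raw figures pass straight through to the NFS export and on to the datastore.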

VMware reports the following for the datastore:
Capacity = 176,91 GB
Used = 50,30 GB
Free = 126,61 GB

In my opinion, the acceptable answers would be either of:
Capacity = 177 GB, Used = 25 GB, Free = 63 GB
Capacity = 88 GB, Used = 25 GB, Free = 63 GB
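
As a rough check, both candidate answers can be derived from the `df -k` figures above and the replication factor of 2. This is only a sketch of the arithmetic; the variable names are illustrative, and GB is taken here as 1024 * 1024 KB.

```python
KIB_PER_GB = 1024 * 1024
REPLICATION = 2  # 'replicated size 2' for both CephFS pools

# Raw figures from the `df -k` output for /mnt/cephfs, in 1K blocks.
raw_total_kib = 185503744
raw_used_kib = 52809728
raw_avail_kib = 132694016


def gb(kib: float) -> int:
    """Convert 1K blocks to whole GB, rounded."""
    return round(kib / KIB_PER_GB)


usable_used = gb(raw_used_kib / REPLICATION)
usable_free = gb(raw_avail_kib / REPLICATION)

# First answer: raw capacity, but usable "used" and "free".
answer_raw_capacity = (gb(raw_total_kib), usable_used, usable_free)
# Second answer: everything divided by the replication factor.
answer_usable_capacity = (gb(raw_total_kib / REPLICATION), usable_used, usable_free)

for label, (cap, used, free) in (
    ("raw capacity", answer_raw_capacity),
    ("usable capacity", answer_usable_capacity),
):
    print(f"{label}: Capacity = {cap} GB, Used = {used} GB, Free = {free} GB")
```

Only the second variant keeps "Used + Free == Capacity" intact; the first keeps the raw capacity visible at the cost of that identity.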

polezhaevdmi
13-Mar-2017, 14:43
Some thoughts after the weekend:

"- Add the ( pool_raw_free / pool_redundancy_factor ) over all pools used."
Not simply "add", as all the pools still share the same unified object space.
Instead:
- Take the maximum of ( pool_raw_free / pool_redundancy_factor ) across all pools included in the CephFS.
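
A short sketch of why the sum overcounts and what the maximum gives instead. The `Pool` class and both helper functions are hypothetical, and the example assumes that each pool sees the cluster's full raw free space (126G, taken here as 126 * 1024 MB), since both pools sit on the same OSDs.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    raw_free_mb: float       # raw free space the pool can draw on
    redundancy_factor: int   # replica count of the pool's rule


def free_by_sum(pools: list[Pool]) -> float:
    """Earlier idea: add up the per-pool usable free space."""
    return sum(p.raw_free_mb / p.redundancy_factor for p in pools)


def free_by_max(pools: list[Pool]) -> float:
    """Revised idea: take the largest per-pool usable free space."""
    return max(p.raw_free_mb / p.redundancy_factor for p in pools)


RAW_FREE_MB = 126 * 1024  # assumption: both pools share the same raw free space

pools = [
    Pool("cephfs_data", RAW_FREE_MB, 2),
    Pool("cephfs_metadata", RAW_FREE_MB, 2),
]

print("sum:", free_by_sum(pools) / 1024, "GB")  # counts the shared space twice
print("max:", free_by_max(pools) / 1024, "GB")
```

Under that assumption the sum returns the full raw free space again, while the maximum returns half of it, which is the figure that matches a replication factor of 2.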

Generally speaking, such an approach breaks the rule "Free + Allocated == Total", but I suspect that an accurate "free" figure is preferable to informal rules.

Also, I don't clearly understand the reason for using multiple data pools in one CephFS. Why?
To set different pool parameters (http://docs.ceph.com/docs/hammer/rados/operations/pools/) (such as quotas / caching / replicas) within a single filesystem? Why? To add more placement groups? Add them to the first pool instead... I don't know...