
Thread: CephFS, free space reporting


  1. #1

    CephFS, free space reporting

    As far as I can see, Ceph reports the amount of used/free storage space as 'raw' data, without taking replication or erasure coding into account.
    This is acceptable for object storage, but a badly misleading approach for CephFS.
    Please look at the following example:
    - The cluster has 3 OSDs, ~33 GB each.
    - Two pools (data and metadata) for CephFS were created with a replication factor = 3 rule.
    - The CephFS was used to store an NFS export serving as a VMWare datastore.
    - A 25 GB .VMDK was placed on that datastore.
    - Ceph reports the usage as "78250 MB used, 22822 MB / 101072 MB avail". This is correct for raw storage.
    - VMWare reports as follows: vmw01.png
    - Does that mean another ~22 GB VMDK file might be added to the CephFS-backed datastore? No!
    - Is that correct data for administering VMWare? No!
    - Which numbers would be clear to a VM admin? "~25 GB used, ~7 GB / ~33 GB avail".
    CephFS uses a particular pool, which is coupled to a particular rule, which defines the replication (or redundancy) factor unambiguously. All the data needed to calculate this on the fly are present.

    Does anybody know a reason not to divide the raw statistics by that factor, which would give the usable result?
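    The division asked for here is trivial arithmetic. A minimal sketch, with the numbers taken from the example above and a hypothetical helper name:

```python
# Hypothetical helper: convert Ceph's raw figures (in MB) into usable
# figures for a pool replicated 3 ways, as in the example above.
REPLICATION_FACTOR = 3

def usable_gb(raw_mb: int, factor: int = REPLICATION_FACTOR) -> float:
    """Raw MB divided by the replication factor, expressed in GB."""
    return raw_mb / factor / 1024

# Raw values as reported by 'ceph df' in the example above (MB).
raw_used, raw_avail, raw_total = 78250, 22822, 101072

print(f"~{usable_gb(raw_used):.0f} GB used, "
      f"~{usable_gb(raw_avail):.0f} GB / ~{usable_gb(raw_total):.0f} GB avail")
# ~25 GB used, ~7 GB / ~33 GB avail
```

    These are exactly the numbers the post argues a VM admin would expect to see.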

  2. #2

    Re: CephFS, free space reporting

    My only, admittedly wild, guess, other than this being a bug, is that
    perhaps it assumes clients have access to the replication information
    and will do the arithmetic themselves, as you did. But that is not
    normal for a filesystem, at least in my experience. With RAID
    (6, 10, whatever) the filesystem usually has no clue how many disks
    there are, their state, or if/when they may be changed or need a rebalance;
    the filesystem is just the filesystem and only knows the partition
    size, so tools that query the filesystem get a reliable reading
    directly from it without understanding the underlying layers.

    As a result, I'd guess this is a bug, and probably an upstream (Ceph) bug.
    With that in mind, does this match what you are seeing, or is there
    some indication of an intentional reason for it?

    It may help us to see the exact commands you are running. If your
    screenshot shows those, then I'm sorry, it is not loading for me...
    probably a problem on my end, but text-based output is generally preferred
    anyway when possible.

    Good luck.


  3. #3

    Re: CephFS, free space reporting

    Yes, SES inherited that issue from upstream Ceph. However, I suspect a vote from SUSE would be valuable and the state might change for the better.
    From my point of view CephFS is not a completely separate thing; it is part of Ceph. Thus, it has every right to obtain the list of pools, the pool parameters, and the 'ceph df' data.

    At OpenNebula:
    "Because the replica count is a per-pool thing, and a filesystem can use multiple pools with different replica counts (via files having different layouts), giving the raw free space is the most consistent thing we can do."
    Easiest, yes, but neither consistent nor convenient for anyone else.
    The problem will surface again at the VI administrator level (who must consult 'ceph df' before any decision) or at the orchestration level (which might be completely overwhelmed) and will require the same calculations.

    As for multiple pools, I see no problem calculating the free space on request:
    - Get the list of pools used by the filesystem.
    - Sum ( pool_raw_free / pool_redundancy_factor ) over all pools used.
    Still not a complex task.
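    The two steps above amount to a one-liner. An illustrative sketch, with made-up per-pool numbers (raw free space in MB and redundancy factor), not real 'ceph df' output:

```python
# Illustrative sketch of the proposed per-pool summation.
# Pool names match the example cluster; numbers are invented.
pools_used_by_fs = [
    {"name": "cephfs_data",     "raw_free_mb": 129024, "redundancy": 2},
    {"name": "cephfs_metadata", "raw_free_mb": 129024, "redundancy": 2},
]

usable_free_mb = sum(p["raw_free_mb"] / p["redundancy"] for p in pools_used_by_fs)
print(f"usable free: {usable_free_mb / 1024:.0f} GB")
```

    Note that pools backed by the same OSDs get double-counted by a plain sum; that is exactly the refinement post #4 makes.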

    "It may help us to see the exact commands you are running."
    I killed the previous Ceph cluster, so the numbers below are new.
    Actual configuration: 3x OSDs, 64 GB of space per OSD, CephFS pool replication factor = 2, written file size = 25 GB.

    ceph02admin:~ # ceph osd dump | grep 'replicated size'
    pool 13 'cephfs_data' replicated size 2 min_size 2 ...
    pool 14 'cephfs_metadata' replicated size 2 min_size 2 ...
    Ceph reports on the pools:
    ceph02admin:~ # ceph df
        SIZE     AVAIL     RAW USED     %RAW USED
        176G      126G       51589M         28.48
        NAME                        ID     USED       %USED     MAX AVAIL     OBJECTS
        cephfs_data                 13     25600M     28.90        62987M        6405
        cephfs_metadata             14     42196k      0.07        62987M          39
    The OS reports on the mounted CephFS:
    ceph02admin:~ # df -k
    Filesystem                                    1K-blocks     Used Available Use% Mounted on
    ...           185503744 52809728 132694016  29% /mnt/cephfs
    VMWare reports on the datastore:
    Capacity = 176.91 GB
    Used = 50.30 GB
    Free = 126.61 GB

    To me, the acceptable answers would be either:
    Capacity = 177 GB, Used = 25 GB, Free = 63 GB
    Capacity = 88 GB, Used = 25 GB, Free = 63 GB
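    A quick cross-check of the figures above (an assumption worth verifying against your own output): the per-pool MAX AVAIL column appears to be the raw free space already divided by the replication factor.

```python
# Rough cross-check: raw AVAIL (126G, rounded) divided by the
# replication factor (2) should land near the per-pool MAX AVAIL
# (62987M) shown by 'ceph df' above.
raw_avail_mb = 126 * 1024  # 'AVAIL 126G', rounded
replication = 2

max_avail_mb = raw_avail_mb / replication
print(f"{max_avail_mb:.0f} MB")  # 64512 MB, close to the reported 62987M
```

    The remaining gap comes from the rounded 126G figure and Ceph's internal reservations.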

  4. #4

    Re: CephFS, free space reporting

    Some thoughts after the weekend:
    - Add ( pool_raw_free / pool_redundancy_factor ) over all pools used.
    Not simply "add", since all the pools still share the same, unified object space. Rather:
    - Take the maximum of ( pool_raw_free / pool_redundancy_factor ) across all pools included in the CephFS.
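    The refined rule is as small a change as the original one. A sketch, again with illustrative (raw_free_mb, redundancy_factor) numbers:

```python
# Refined rule: since all CephFS data pools draw on the same raw
# space, take the maximum usable figure instead of the sum.
pools = {
    "cephfs_data":     (129024, 2),  # (raw_free_mb, redundancy_factor)
    "cephfs_metadata": (129024, 2),
}

usable_free_mb = max(raw / factor for raw, factor in pools.values())
print(f"usable free: {usable_free_mb / 1024:.0f} GB")  # 63 GB here
```

    With pools of differing replication factors, this reports the most optimistic usable figure, which is what makes it break the "Free + Allocated == Total" invariant discussed next.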

    Generally speaking, such an approach breaks the rule "Free + Allocated == Total", but I suspect an accurate "free" figure is preferable to informal invariants.

    Also, I don't clearly understand the reason for using multiple data pools in a particular CephFS. Why?
    To set different pool parameters (such as quotas / caching / replicas) within a single filesystem? Why? To add more placement groups? Add them to the first pool instead... I don't know...
    Last edited by polezhaevdmi; 13-Mar-2017 at 14:45.

