View Full Version : SES 4 issues

09-Feb-2017, 14:04
OK, let's investigate and play with a 3-node cluster plus a monitoring node. Documentation is here (https://www.suse.com/documentation/ses-4/book_storage_admin/data/book_storage_admin.html).
CephFS is officially production-ready.
SES 4 is still based on the Ceph 10.x.x core, not the current "Kraken" (11.x.x) release, which claims stress-tested stability for the BlueStore 'engine' (a detailed presentation is here (http://www.slideshare.net/sageweil1/bluestore-a-new-faster-storage-backend-for-ceph)).
The Calamari management project looks dead, as it is stuck on the 2014.7 version of Salt, which is completely outdated. The OpenATTIC package has replaced it. It installs easily, operates fine and looks nice - great choice, SUSE! I hope it will last for a long time.

09-Feb-2017, 14:19
The first issue I found is related to CephFS (the preliminary steps are simple and work like a charm). I tried to mount CephFS with the kernel client during boot by putting the corresponding line into /etc/fstab.
I got the emergency mode prompt because of a "libceph -101 connect error" (errno 101 is ENETUNREACH, "network is unreachable"). The ifconfig command shows no external interfaces at this stage. I commented the line out, booted the system, uncommented the line and mounted - success.
Thus, the possibilities are:
- The network mount via libceph is improperly marked as vital for boot, so its failure interrupts the boot process.
- The system improperly attempts the mount before the NIC drivers are loaded, which causes the fault.
- The system improperly fails to ignore the failed network mount. Instead, it should skip it and retry the mount after all hardware and the network layer are completely initialized.
It seems something is wrong with libceph or the SLES12SP2 kernel.

09-Feb-2017, 16:13
I found a workaround. The following modified line in /etc/fstab works fine:
,, /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/secret.key,auto,x-systemd.automount,noatime 0 0
Possibly, the SES 4 documentation should be revised.
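
For anyone who wants to check their own entries, here is a minimal Python sketch that splits an fstab line into its fields and tests whether it carries an option that keeps the mount from blocking early boot. The device field `MON_ADDRS:/` is a placeholder of mine, since the monitor addresses are not shown in the post. `_netdev` and `noauto` are standard mount options that I added to the check as assumptions; only `x-systemd.automount` was actually tried in this thread.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    keys = ("device", "mountpoint", "fstype", "options", "dump", "passno")
    entry = dict(zip(keys, fields))
    entry["options"] = entry["options"].split(",")
    return entry


def has_boot_safe_option(entry):
    """True if the entry has an option that defers or skips the boot-time mount.

    Only x-systemd.automount comes from the thread; _netdev and noauto are
    assumptions added for illustration and were not tested here.
    """
    deferring = {"x-systemd.automount", "_netdev", "noauto"}
    return bool(deferring & set(entry["options"]))


# MON_ADDRS is a hypothetical placeholder for the monitor address list.
workaround = parse_fstab_line(
    "MON_ADDRS:/ /mnt/cephfs ceph "
    "name=admin,secretfile=/etc/ceph/secret.key,auto,x-systemd.automount,noatime 0 0"
)
plain = parse_fstab_line(
    "MON_ADDRS:/ /mnt/cephfs ceph "
    "name=admin,secretfile=/etc/ceph/secret.key,noatime 0 0"
)
print(has_boot_safe_option(workaround), has_boot_safe_option(plain))
```

This only inspects the text of the entry; it says nothing about whether systemd orders the resulting mount unit after the network.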

09-Feb-2017, 17:23

It seems that systemd (systemd-fstab-generator) is not yet able to recognize CephFS as a network file system. So instead of creating an automount (which I believe may cause errors if *accessed* during the early boot stage, before networking is up), you may want to add the mount option


See "man systemd.mount", sections on automatic dependencies and fstab entries. YMMV, I currently have no test bed to try before posting.


10-Feb-2017, 17:08

Tried. No, the "x-systemd.requires=network-online.target" option causes the OS to boot without external network interfaces; only the loopback interface is visible.
However, "requires-mounts-for=..." looks helpful for clusters with multi-protocol access to Ceph data (CIFS + NFS).

Have a nice weekend!

15-Feb-2017, 12:03
Another issue found. The steps I took:
- Created a ~176 GiB filesystem on the SES cluster;
- Mounted it at the /mnt/cephfs mount point;
- Created the /mnt/cephfs/NFS directory;
- Exported /mnt/cephfs/NFS via the SES 4 NFS server functionality;
- Added the export as a VMware datastore;
- Deployed a VM with a virtual drive (/dev/sdb) on the NFS-mounted datastore;
- Loaded /dev/sdb with an FIO-generated workload.
- After a night of load, the cluster health went to ERROR because the OSDs were overfilled (>90%), and the filesystem got stuck on write operations. OK.

ceph02admin:/mnt/cephfs/NFS # ceph -s
cluster bd8aa69c-f316-4aa6-9128-6225c80024f6
30 pgs backfill_toofull
30 pgs stuck unclean
recovery 862/49712 objects degraded (1.734%)
recovery 13382/49712 objects misplaced (26.919%)
1 full osd(s)
2 near full osd(s)
nearfull,full,sortbitwise,require_jewel_osds flag(s) set
monmap e1: 3 mons at {ceph02node01=,ceph02node02=,ceph02node03=}
election epoch 10, quorum 0,1,2 ceph02node01,ceph02node02,ceph02node03
fsmap e11: 1/1/1 up {0=fsgw01=up:active}
osdmap e124: 3 osds: 3 up, 3 in; 30 remapped pgs
flags nearfull,full,sortbitwise,require_jewel_osds
pgmap v169762: 250 pgs, 13 pools, 84846 MB data, 21423 objects
165 GB used, 11207 MB / 176 GB avail
862/49712 objects degraded (1.734%)
13382/49712 objects misplaced (26.919%)
220 active+clean
30 active+remapped+backfill_toofull
client io 1125 B/s rd, 1 op/s rd, 0 op/s wr
- Stopped the VM and deleted the virtual drive from the datastore. The OSDs remained overfilled because the filesystem space was not released. Not OK.

ceph02admin:/mnt/cephfs/NFS # ceph df detail
176G 11207M 165G 93.81 21423
rbd 0 - N/A N/A 0 0 3010M 0 0 0 0 0
iscsipool 3 - N/A N/A 5174 0 4516M 4 4 229 9k 25 10348
.rgw.root 4 - N/A N/A 1588 0 3010M 4 4 2 04 4 4764
default.rgw.control 5 - N/A N/A 0 0 3010M 8 8 0 0 0
default.rgw.data.root 6 - N/A N/A 0 0 3010M 0 0 0 0 0
default.rgw.gc 7 - N/A N/A 0 0 3010M 32 32 477 10 31808 0
default.rgw.log 8 - N/A N/A 0 0 3010M 127 127 110 5k 736k 0
default.rgw.users.uid 9 - N/A N/A 551 0 3010M 1 1 1 5 1653
default.rgw.users.email 10 - N/A N/A 14 0 3010M 1 1 0 1 42
default.rgw.users.keys 11 - N/A N/A 14 0 3010M 1 1 0 1 42
default.rgw.users.swift 12 - N/A N/A 14 0 3010M 1 1 0 1 42
cephfs_data 13 - N/A N/A 84804M 94.86 4516M 21205 21205 6 51 821k 162G
cephfs_metadata 14 - N/A N/A 42965k 0.92 4516M 39 39 835 13 275k 85930k
- Found a lot of strays in the statistics. Not OK.

ceph02node01:~ # ceph daemon mds.fsgw01 perf dump | grep stray
"num_strays": 12913,
"num_strays_purging": 0,
"num_strays_delayed": 0,
"strays_created": 19913,
"strays_purged": 7000,
"strays_reintegrated": 0,
"strays_migrated": 0
- Tried to clean the filesystem data.

ceph02node01:~ # ceph daemon mds.fsgw01 flush journal
"message": "",
"return_code": 0
- The Ceph cluster returned to the OK state.

ceph02node01:~ # ceph -s
cluster bd8aa69c-f316-4aa6-9128-6225c80024f6
health HEALTH_OK
monmap e1: 3 mons at {ceph02node01=,ceph02node02=,ceph02node03=}
election epoch 10, quorum 0,1,2 ceph02node01,ceph02node02,ceph02node03
fsmap e11: 1/1/1 up {0=fsgw01=up:active}
osdmap e158: 3 osds: 3 up, 3 in
flags sortbitwise,require_jewel_osds
pgmap v170084: 250 pgs, 13 pools, 2039 kB data, 210 objects
199 MB used, 176 GB / 176 GB avail
250 active+clean
client io 1015 B/s rd, 2 op/s rd, 0 op/s wr
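
The two "used / avail" lines from the `ceph -s` outputs above can be compared with a small Python sketch. The 0.85 and 0.95 thresholds are what I understand to be the default nearfull and full ratios for this Ceph generation; treat them as an assumption, since a cluster can override them. Also note that Ceph applies these ratios per OSD, while the figure computed here is only the cluster-wide average.

```python
import re

# Conversion factors to MB for the units that appear in the status line.
TO_MB = {"kB": 1.0 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024}

USAGE = re.compile(r"(\d+) (\w+) used, (\d+) (\w+) / (\d+) (\w+) avail")


def used_fraction(line):
    """Return used/total for a 'X used, Y / Z avail' status line."""
    match = USAGE.search(line)
    if match is None:
        raise ValueError("unrecognized usage line: %r" % line)
    used = int(match.group(1)) * TO_MB[match.group(2)]
    total = int(match.group(5)) * TO_MB[match.group(6)]
    return used / total


def classify(fraction, nearfull=0.85, full=0.95):
    """Classify a usage fraction; thresholds are assumed defaults."""
    if fraction >= full:
        return "full"
    if fraction >= nearfull:
        return "nearfull"
    return "ok"


before = used_fraction("165 GB used, 11207 MB / 176 GB avail")
after = used_fraction("199 MB used, 176 GB / 176 GB avail")
print("before flush: %.2f%% (%s)" % (before * 100, classify(before)))
print("after flush:  %.2f%% (%s)" % (after * 100, classify(after)))
```

The "before" value comes out close to the 93.81 %RAW USED reported by `ceph df detail`; the small difference is presumably due to rounding in the printed GB figures.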

The question remains:
- What should be tweaked, and how, to make the SES filesystem purge all released data immediately?
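
As a sanity check on the stray counters shown earlier, here is a minimal Python sketch that pulls every "stray" counter out of `perf dump` JSON and compares the backlog with `num_strays`. The section name `mds_cache` in the sample is my assumption about where these counters live; the code does not depend on it because it walks all sections. The identity "created minus purged, reintegrated and migrated equals num_strays" is simply what the numbers in this thread show, not something I can point to as a documented guarantee.

```python
import json


def collect_stray_counters(node, found=None):
    """Walk nested perf-dump sections and collect counters named *stray*."""
    found = {} if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict):
                collect_stray_counters(value, found)
            elif "stray" in key:
                found[key] = value
    return found


def stray_backlog(counters):
    """Strays created so far that were not purged, reintegrated or migrated."""
    return (counters["strays_created"]
            - counters["strays_purged"]
            - counters["strays_reintegrated"]
            - counters["strays_migrated"])


# Values copied from the grep output above; "mds_cache" is an assumed section name.
sample = json.loads("""
{"mds_cache": {"num_strays": 12913, "num_strays_purging": 0,
               "num_strays_delayed": 0, "strays_created": 19913,
               "strays_purged": 7000, "strays_reintegrated": 0,
               "strays_migrated": 0}}
""")

counters = collect_stray_counters(sample)
backlog = stray_backlog(counters)
print("backlog:", backlog, "num_strays:", counters["num_strays"])
```

With `num_strays_purging` at 0, the output suggests that the remaining strays were sitting in the backlog without being purged at that moment.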

24-Feb-2017, 11:45
There is more.
I just loaded the chain "VMDK -> VMware Datastore -> NFS -> CephFS" using FIO on a single VM for 3 days.
What is THAT, how does IT happen and how to prevent THAT in the future? :cool:
The SES cluster is still in OK health; it still has 3 nodes with 3 monitors and 3 OSDs of ~64 GB each.

24-Feb-2017, 12:27
The SUSE-customized OpenATTIC does not allow changing the CRUSH map and rulesets. Yes, the SES 4 online manual shows it in view-only mode (https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_oa_webui_ceph.html), but the original software is much more functional.

24-Feb-2017, 15:45
Two more issues, which are possibly related:
- The SUSE-customized OpenATTIC does not display "Nodes" information ("No matching records found").
- The OpenATTIC log file has a lot of error messages about the RADOS gateway keyring, Salt and a performance data file.

2017-02-24 15:21:24,421 - INFO - openattic_systemd#loggedfunc - Calling /ceph_deployment::invoke_salt_key(dbus.Array([dbus.String(u'-L')], signature=dbus.Signature('s')))
2017-02-24 15:21:25,119 - ERROR - ceph.models#set_performance_data_options - Set performance_data_options failed: XML file '/var/lib/pnp4nagios/perfdata/ceph02admin/Check_CephRbd_bd8aa69c-f316-4aa6-9128-6225c80024f6_rbd_iscsi.xml' could not be found.
2017-02-24 15:22:30,870 - ERROR - ceph.librados#__init__ - No usable keyring
2017-02-24 15:22:31,086 - ERROR - ceph.librados#__init__ - No usable keyring
2017-02-24 15:22:37,875 - ERROR - ceph.librados#__init__ - No usable keyring
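
To see which component produces most of the errors, the log can be tallied with a short Python sketch. The regular expression is inferred only from the five lines quoted above ("timestamp - LEVEL - module#function - message"), so it is an assumption about the log format rather than anything taken from the OpenATTIC sources.

```python
import re
from collections import Counter

# Layout inferred from the excerpt: "timestamp - LEVEL - source - message".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+) - (?P<source>\S+) - (?P<msg>.*)$"
)


def count_errors(lines):
    """Count ERROR entries per 'module#function' source."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[match.group("source")] += 1
    return counts


sample = [
    "2017-02-24 15:21:24,421 - INFO - openattic_systemd#loggedfunc - "
    "Calling /ceph_deployment::invoke_salt_key(dbus.Array([dbus.String(u'-L')], "
    "signature=dbus.Signature('s')))",
    "2017-02-24 15:21:25,119 - ERROR - ceph.models#set_performance_data_options - "
    "Set performance_data_options failed: XML file "
    "'/var/lib/pnp4nagios/perfdata/ceph02admin/"
    "Check_CephRbd_bd8aa69c-f316-4aa6-9128-6225c80024f6_rbd_iscsi.xml' "
    "could not be found.",
    "2017-02-24 15:22:30,870 - ERROR - ceph.librados#__init__ - No usable keyring",
    "2017-02-24 15:22:31,086 - ERROR - ceph.librados#__init__ - No usable keyring",
    "2017-02-24 15:22:37,875 - ERROR - ceph.librados#__init__ - No usable keyring",
]

errors = count_errors(sample)
for source, count in errors.most_common():
    print(count, source)
```

On a real system the `sample` list would be replaced by the lines of the log file itself.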
Reinstalling and reinitializing OpenATTIC did not help.

Database openattic exists, owned by openattic
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 70 object(s) from 2 fixture(s)
We have an admin already, not creating default user.
Found lo
Found eth0
Found eth1
The authentication token for 'openattic' does already exist.
checking disk /dev/sda
serial not found
Checking Ceph cluster ceph (bd8aa69c-f316-4aa6-9128-6225c80024f6)... known
Checking Ceph OSD 0... known
Checking Ceph OSD 1... known
Checking Ceph OSD 2... known
Checking Ceph pool rbd... known
Checking Ceph pool iscsipool... known
Checking Ceph pool .rgw.root... known
Checking Ceph pool default.rgw.control... known
Checking Ceph pool default.rgw.data.root... known
Checking Ceph pool default.rgw.gc... known
Checking Ceph pool default.rgw.log... known
Checking Ceph pool default.rgw.users.uid... known
Checking Ceph pool default.rgw.users.email... known
Checking Ceph pool default.rgw.users.keys... known
Checking Ceph pool default.rgw.users.swift... known
Checking Ceph pool cephfs_data... known
Checking Ceph pool cephfs_metadata... known
Checking Ceph mds fsgw01... skipped
Checking Ceph mon ceph02node01... skipped
Checking Ceph mon ceph02node02... skipped
Checking Ceph mon ceph02node03... skipped
Checking Ceph auth entity mds.fsgw01... found
Checking Ceph auth entity osd.0... found
Checking Ceph auth entity osd.1... found
Checking Ceph auth entity osd.2... found
Checking Ceph auth entity client.admin... found
Checking Ceph auth entity client.bootstrap-mds... found
Checking Ceph auth entity client.bootstrap-osd... found
Checking Ceph auth entity client.bootstrap-rgw... found
Checking Ceph auth entity client.rgw.rgw01... found
Checking Ceph auth entity client.rgw.rgw02... found
Checking Ceph auth entity client.rgw.rgw03... found
Updating Nagios configs: adding detected Ceph clusters
/etc/openattic/cli.conf already exists
Completed successfully.

I tried to set the group and access permissions for the keyring files (as mentioned in the OpenATTIC documentation). In vain as well.

ceph02admin:/etc/ceph # ls -la
total 256
drwxr-xr-x 1 root root 360 Feb 6 18:02 .
drwxr-xr-x 1 root root 5576 Feb 24 14:12 ..
-rw-r--r-- 1 cephadm root 1077 Feb 3 18:13 .cephdeploy.conf
-rw-rw-rw- 1 cephadm users 221326 Feb 24 13:59 ceph-deploy-ceph.log
-rw-r----- 1 cephadm openattic 71 Feb 3 18:43 ceph.bootstrap-mds.keyring
-rw-r----- 1 cephadm openattic 71 Feb 3 18:43 ceph.bootstrap-osd.keyring
-rw-r----- 1 cephadm openattic 71 Feb 3 18:43 ceph.bootstrap-rgw.keyring
-rw-r----- 1 cephadm openattic 63 Feb 3 18:43 ceph.client.admin.keyring
-rw-r----- 1 cephadm openattic 632 Feb 6 13:08 ceph.conf
-rw-r----- 1 cephadm openattic 73 Feb 3 18:20 ceph.mon.keyring
-rwxr-xr-x 1 root root 92 Dec 9 16:28 rbdmap
-rw-r----- 1 root root 42 Feb 6 18:02 secret.key
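
The listing above can be checked mechanically with a small Python sketch that reads `ls -l` lines and reports whether a given group has read access through the group bits. The helper names are mine. It looks at the file mode only: it does not verify that the OpenATTIC processes actually run with the `openattic` group, and it ignores setuid/setgid/sticky markers and ACLs.

```python
import stat


def ls_mode_to_bits(perm):
    """Convert an `ls -l` string such as '-rw-r-----' to permission bits.

    Simplification: any character other than '-' counts as a set bit, so
    special markers (s, S, t, T) are not handled correctly.
    """
    bits = 0
    for index, char in enumerate(perm[1:10]):
        if char != "-":
            bits |= 1 << (8 - index)
    return bits


def group_readable_by(ls_line, group):
    """True if the file belongs to `group` and has the group-read bit set."""
    fields = ls_line.split()
    perm, file_group = fields[0], fields[3]
    mode = ls_mode_to_bits(perm)
    return file_group == group and bool(mode & stat.S_IRGRP)


listing = [
    "-rw-r----- 1 cephadm openattic 63 Feb 3 18:43 ceph.client.admin.keyring",
    "-rw-r----- 1 cephadm openattic 632 Feb 6 13:08 ceph.conf",
    "-rw-r----- 1 root root 42 Feb 6 18:02 secret.key",
]

for line in listing:
    print(line.split()[-1], group_readable_by(line, "openattic"))
```

By this check the admin keyring and ceph.conf are readable by the `openattic` group, so the file modes shown here do not by themselves explain the "No usable keyring" errors.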

However, the cluster is fresh and simple: 3x MON/OSD/RGW nodes + an admin node with OpenATTIC. The ceph.conf is also simple, and the cluster ID is the same as shown above:

fsid = bd8aa69c-f316-4aa6-9128-6225c80024f6
public_network =
cluster_network =
mon_initial_members = ceph02node01, ceph02node02, ceph02node03
mon_host =,,
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

host = ceph02node01
rgw_dns_name = ceph02node01
rgw_frontends = civetweb port=7480

host = ceph02node02
rgw_dns_name = ceph02node02
rgw_frontends = civetweb port=7480

host = ceph02node03
rgw_dns_name = ceph02node03
rgw_frontends = civetweb port=7480
The keyring file has the default user registered:

ceph02admin:/etc/ceph # ceph-authtool -l ./ceph.client.admin.keyring
key = AQBar5RYl3mcARAA+axYUM0Y7nJazb2HEbsfIA==

24-Feb-2017, 16:38
Additional information: list of entities

ceph02admin:/etc/ceph # ceph auth list
installed auth entries:

key: AQAkhJhYld6SBRAAYh2z8BsseBNlIC7qCStInQ==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
key: AQD0s5RYOrMoKxAAvPgCsrM++8K3eGZNvczdkw==
caps: [mon] allow profile osd
caps: [osd] allow *
caps: [mon] allow profile osd
caps: [osd] allow *
key: AQAYtJRYAxdNHxAA4x77lNo8dKbZ6oH9MFYWdg==
caps: [mon] allow profile osd
caps: [osd] allow *
key: AQBar5RYl3mcARAA+axYUM0Y7nJazb2HEbsfIA==
caps: [mds] allow *
caps: [mon] allow *
caps: [osd] allow *
key: AQBar5RYuOGhGhAAR6n05tlzpuBwkb2Ik4oO8A==
caps: [mon] allow profile bootstrap-mds
key: AQBar5RYwwHjMhAAzJ2GE+6oVhYxlrzPsp/qEg==
caps: [mon] allow profile bootstrap-osd
key: AQBbr5RYsfGEERAA5i3pu79Kl6w9CtCjZRsFdA==
caps: [mon] allow profile bootstrap-rgw
caps: [mon] allow rw
caps: [osd] allow rwx
key: AQATWZhYIT0JGhAAjIRx5VfvmtdgZLsUrEyJCg==
caps: [mon] allow rw
caps: [osd] allow rwx
key: AQAVWZhY/7mjLhAAueAmh9bh3U3L4tD+62glbA==
caps: [mon] allow rw
caps: [osd] allow rwx