state.orch ceph.stage.discovery does not collect HDD info



vazaari
13-Mar-2018, 15:47
Hi,

I have four nodes (1x admin, 3x OSD/MON/...) prepared for the Ceph cluster.
For some reason the discovery stage does not create profile templates for the OSD nodes.

stage.0 completes successfully.
stage.1 completes successfully, but the OSD templates are missing:



# ls -la profile-default/stack/default/ceph/minions/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 14 Mar 13 16:03 ..



# ls -la profile-default/cluster/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 38 Mar 13 16:03 ..

DeepSea monitor output for stage.1:


Starting stage: ceph.stage.1
Parsing ceph.stage.1 steps... ✓


Stage initialization output:
salt-api : valid
deepsea_minions : valid
master_minion : valid
ceph_version : valid

[1/4] minions.ready(timeout=300)........................ ......... ✓ (0.4s)

[2/4] ceph.refresh on
tw-ceph-admin............................................. . ✓ (0.3s)

[3/4] populate.proposals................................ ......... ✓ (5s)

[4/4] proposal.populate................................. ......... ✓ (1s)

Ended stage: ceph.stage.1 succeeded=4/4 time=28.3s

Master can see all minions:

# salt-key -L
Accepted Keys:
tw-ceph-admin
tw-ceph-node1
tw-ceph-node2
tw-ceph-node3
Denied Keys:
Unaccepted Keys:
Rejected Keys:

All of the intended OSD nodes have three unformatted HDDs each:

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 136.6G 0 disk
├─sda1 8:1 1 7M 0 part
├─sda2 8:2 1 2G 0 part [SWAP]
└─sda3 8:3 1 134.6G 0 part /
sdb 8:16 1 136.6G 0 disk
sdc 8:32 1 136.6G 0 disk
sr0 11:0 1 1024M 0 rom

Please help me find out what I am doing wrong.

Additional info:

# lsb_release -a
LSB Version: n/a
Distributor ID: SUSE
Description: SUSE Linux Enterprise Server 12 SP3
Release: 12.3
Codename: n/a



# zypper se -s --installed-only | grep ses-release
i+ | ses-release | package | 5-1.54 | x86_64 | SES5
i+ | ses-release | package | 5-1.54 | x86_64 | SUSE-Enterprise-Storage-5-Pool
i | ses-release-cd | package | 5-1.54 | x86_64 | SES5

Weird warning:

salt-master[28198]: [WARNING ] Although 'dmidecode' was found in path, the current user cannot execute it. Grains output might not be accurate.

The firewall is stopped and AppArmor is disabled.
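
A sketch of how the disk detection itself can be checked from the admin node (cephdisks.list is DeepSea's disk-listing call, which also comes up later in this thread; the target is just this cluster's minion names):

# salt 'tw-ceph-node*' cephdisks.list    # should list the unused disks on each node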


Best regards,
Serhiy.

vazaari
14-Mar-2018, 09:20
Stage 0

Starting stage: ceph.stage.0
Parsing ceph.stage.0 steps... ✓


Stage initialization output:
deepsea_minions : valid
master_minion : valid
ceph_version : valid

[1/14] ceph.salt-api on
tw-ceph-admin............................................. . ✓ (5s)

[2/14] ceph.sync on
tw-ceph-admin............................................. . ✓ (1s)

[3/14] ceph.repo on
tw-ceph-admin............................................. . ✓ (0.8s)

[4/14] ceph.updates on
tw-ceph-admin............................................. . ✓ (10s)

[5/14] filequeue.remove(item=lock)....................... ......... ✓ (0.0s)

[6/14] ceph.updates.restart on
tw-ceph-admin............................................. . ✓ (2s)

[7/14] filequeue.add(item=complete)...................... ......... ✓ (0.0s)

[8/14] minions.ready(timeout=300)........................ ......... ✓ (0.4s)

[9/14] ceph.repo on
tw-ceph-node2............................................. . ✓ (0.3s)
tw-ceph-node3............................................. . ✓ (0.3s)
tw-ceph-node1............................................. . ✓ (0.3s)
tw-ceph-admin............................................. . ✓ (0.3s)

[10/14] ceph.packages.common on
tw-ceph-node2............................................. . ✓ (2s)
tw-ceph-node3............................................. . ✓ (2s)
tw-ceph-node1............................................. . ✓ (2s)
tw-ceph-admin............................................. . ✓ (3s)

[11/14] ceph.sync on
tw-ceph-node2............................................. . ✓ (1.0s)
tw-ceph-node3............................................. . ✓ (1s)
tw-ceph-node1............................................. . ✓ (1s)
tw-ceph-admin............................................. . ✓ (1s)

[12/14] ceph.mines on
tw-ceph-node2............................................. . ✓ (2s)
tw-ceph-node3............................................. . ✓ (2s)
tw-ceph-node1............................................. . ✓ (2s)
tw-ceph-admin............................................. . ✓ (2s)

[13/14] ceph.updates on
tw-ceph-node2............................................. . ✓ (19s)
tw-ceph-node3............................................. . ✓ (16s)
tw-ceph-node1............................................. . ✓ (21s)
tw-ceph-admin............................................. . ✓ (11s)

[14/14] ceph.updates.restart on
tw-ceph-node2............................................. . ✓ (3s)
tw-ceph-node3............................................. . ✓ (3s)
tw-ceph-node1............................................. . ✓ (3s)
tw-ceph-admin............................................. . ✓ (3s)

Ended stage: ceph.stage.0 succeeded=14/14 time=92.7s

vazaari
14-Mar-2018, 14:45
It seems the problem is that no storage nodes or OSDs were detected.

So I can generate templates with the following command:

salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'

but it generates an empty osds list:

# cat profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
ceph:
  storage:
    osds: {}

Could you provide a sample .yml for the storage role?

thsundel
14-Mar-2018, 15:11
Here is a sample:

ceph:
  storage:
    osds:
      /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
        db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        db_size: 500m
        format: bluestore
        wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        wal_size: 500m
...and so on for the next OSDs.
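
If I remember correctly, the profile only takes effect after the pillar is rebuilt, so after creating or editing the .yml the configuration stage should be re-run, roughly:

# salt-run state.orch ceph.stage.2    # rebuild the pillar from the profiles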

Thomas

thsundel
14-Mar-2018, 15:13
ceph:
  storage:
    osds:
      /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
        db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        db_size: 500m
        format: bluestore
        wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        wal_size: 500m

Reposting with better "layout"

Thomas

vazaari
14-Mar-2018, 16:11
Thomas, thanks a lot.

I've created the .yml with the following content:

ceph:
  storage:
    osds:
      /dev/sdb:
        format: bluestore
        standalone: true
      /dev/sdc:
        format: bluestore
        standalone: true
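
(A quick sketch of how to check that the profile really landed in the pillar; pillar.get with a colon-separated path is standard Salt:)

# salt 'tw-ceph-node1' pillar.get ceph:storage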


stage.2 passed successfully.
stage.3 ends with errors:

Module function osd.deploy threw an exception. Exception: Mine on tw-ceph-node1 for cephdisks.list

The HDDs are still not recognized:

# salt 'tw-ceph-node*' cephdisks.list
tw-ceph-node2:
tw-ceph-node3:
tw-ceph-node1:


The nodes themselves see the disks, e.g.:


# hwinfo --disk | egrep 'sdb|sdc'
SysFS ID: /class/block/sdb
Device File: /dev/sdb (/dev/sg1)
Device Files: /dev/sdb, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_4022A45B, /dev/disk/by-id/scsi-25ba4224000d00000, /dev/disk/by-id/scsi-SServeRA_disk1_4022A45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:1:0
SysFS ID: /class/block/sdc
Device File: /dev/sdc (/dev/sg2)
Device Files: /dev/sdc, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_354EA45B, /dev/disk/by-id/scsi-25ba44e3500d00000, /dev/disk/by-id/scsi-SServeRA_disk2_354EA45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:2:0
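
Since the stage.3 failure complains about the mine, a rough way to refresh and inspect it with standard Salt mine functions (the targets are just this cluster's minion names):

# salt 'tw-ceph-node*' mine.update                                # re-run the mine functions on the minions
# salt 'tw-ceph-admin' mine.get 'tw-ceph-node*' cephdisks.list    # what the mine currently holds for each node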

vazaari
15-Mar-2018, 13:04
It seems that DeepSea does not like HDDs behind RAID controllers, or at least behind some of them.
In my case the RAID controller does not support JBOD mode, so I have to create single-drive volumes, which the host recognizes as ordinary HDDs.

I've tried to deploy Ceph with 'ceph-deploy' and it also failed to create OSDs...
But after I created one partition on each HDD, the OSDs were created successfully with:


# ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX1

After that I played a little bit with destroy, purge, etc. and succeeded with an unpartitioned disk:

- on the OSD node side:
zap the disk and reboot the host (zap command sketched below)
ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

- on the ceph-deploy node:
ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX
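
A sketch of the zap step itself (ceph-volume ships with Luminous, sgdisk comes from the gdisk package; pick whichever is available):

# ceph-volume lvm zap /dev/sdX    # wipe LVM/partition metadata left on the disk
# sgdisk --zap-all /dev/sdX       # or: destroy the GPT/MBR structures directly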

Now I'm trying to 'migrate' from ceph-deploy to DeepSea (as in the SES[3,4] to SES5 case).
I have successfully passed stages 0, 1 and 2, edited /srv/modules/runners/validate.py to bypass the '4 nodes' requirement, and got stuck on stage 3:


[13/44] ceph.osd.auth on
tw-ceph-admin............................................. . ❌ (2s)

Ended stage: ceph.stage.3 succeeded=12/44 failed=1/44 time=85.1s

Failures summary:

ceph.osd.auth (/srv/salt/ceph/osd/auth):
tw-ceph-admin:
auth /srv/salt/ceph/osd/cache/bootstrap.keyring: Command "ceph auth add client.bootstrap-osd -i /srv/salt/ceph/osd/cache/bootstrap.keyring" run
stdout:
stderr: Error EINVAL: entity client.bootstrap-osd exists but caps do not match

vazaari
15-Mar-2018, 13:29
The following trick helps to solve the 'caps do not match' error and pass stage 3:

# ceph auth caps client.bootstrap-osd mgr \
"allow r" mon "allow profile bootstrap-osd"

vazaari
15-Mar-2018, 13:38
Closer and closer...
Now I've hit an igw issue in stage 4:

[6/16] ceph.igw on
tw-ceph-node1............................................. . ❌ (4s)
...
Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s
...
Failures summary:

ceph.igw (/srv/salt/ceph/igw):
tw-ceph-node1:
reload lrbd: Module function service.restart executed
enable lrbd: Service lrbd has been enabled, and is dead

Not yet solved.
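
(A sketch of the basic checks on the node; plain systemd, since lrbd is the unit name in the failure above:)

# systemctl status lrbd
# journalctl -u lrbd --no-pager -n 50    # recent log lines of the unit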

thsundel
15-Mar-2018, 15:30
Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

Thomas

vazaari
15-Mar-2018, 16:10
I've recreated the RBD and iSCSI configuration from openATTIC.