
Thread: state.orch ceph.stage.discovery does not collect HDD info


  1. #1

    state.orch ceph.stage.discovery does not collect HDD info

    Hi,

    I have four nodes (1x admin, 3x OSD/MON, ...) prepared for a Ceph cluster.
    For some reason the discovery stage does not create templates for the OSD nodes.

    stage.0 completes successfully.
    stage.1 completes successfully but osd templates are missing:

    Code:
    # ls -la profile-default/stack/default/ceph/minions/
    total 0
    drwxr-xr-x 1 salt salt  0 Mar 13 16:03 .
    drwxr-xr-x 1 salt salt 14 Mar 13 16:03 ..
    Code:
    # ls -la profile-default/cluster/
    total 0
    drwxr-xr-x 1 salt salt  0 Mar 13 16:03 .
    drwxr-xr-x 1 salt salt 38 Mar 13 16:03 ..
    Deepsea monitor output for stage.1:
    Code:
    Starting stage: ceph.stage.1
    Parsing ceph.stage.1 steps... ✓
    
    
    Stage initialization output:
    salt-api                 : valid
    deepsea_minions          : valid
    master_minion            : valid
    ceph_version             : valid
    
    [1/4]     minions.ready(timeout=300)................................. ✓ (0.4s)
    
    [2/4]     ceph.refresh on
              tw-ceph-admin.............................................. ✓ (0.3s)
    
    [3/4]     populate.proposals......................................... ✓ (5s)
    
    [4/4]     proposal.populate.......................................... ✓ (1s)
    
    Ended stage: ceph.stage.1 succeeded=4/4 time=28.3s
    Master can see all minions:
    Code:
    # salt-key -L
    Accepted Keys:
    tw-ceph-admin
    tw-ceph-node1
    tw-ceph-node2
    tw-ceph-node3
    Denied Keys:
    Unaccepted Keys:
    Rejected Keys:
    Each of the would-be OSD nodes has three unformatted HDDs:
    Code:
    # lsblk
    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda      8:0    1 136.6G  0 disk 
    ├─sda1   8:1    1     7M  0 part 
    ├─sda2   8:2    1     2G  0 part [SWAP]
    └─sda3   8:3    1 134.6G  0 part /
    sdb      8:16   1 136.6G  0 disk 
    sdc      8:32   1 136.6G  0 disk 
    sr0     11:0    1  1024M  0 rom
    Please help me figure out what I'm doing wrong.
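    As far as I understand it, DeepSea's discovery is only expected to propose disks that carry no partitions (a hedged reading of the behavior, not official documentation). A minimal shell sketch of that filter, run against the `lsblk` output above inlined as sample data:

```shell
# Sample data mirroring `lsblk -rno NAME,TYPE` on the OSD nodes above:
# sda is partitioned (the OS disk), sdb/sdc are empty, sr0 is a CD-ROM.
lsblk_out='sda disk
sda1 part
sda2 part
sda3 part
sdb disk
sdc disk
sr0 rom'

# A disk qualifies as an OSD candidate if no "part" line names one of its
# partitions. (Heuristic for illustration only, not DeepSea's actual code.)
candidates=$(echo "$lsblk_out" | awk '
  $2 == "disk" { order[++n] = $1 }
  $2 == "part" { part[$1] = 1 }
  END {
    for (i = 1; i <= n; i++) {
      d = order[i]; used = 0
      for (p in part) if (index(p, d) == 1) used = 1
      if (!used) print d
    }
  }' | xargs)
echo "$candidates"   # prints: sdb sdc
```

    By this criterion both sdb and sdc should qualify, which suggests the problem lies in detection rather than in the state of the disks themselves.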

    Additional info:
    Code:
    # lsb_release -a
    LSB Version:	n/a
    Distributor ID:	SUSE
    Description:	SUSE Linux Enterprise Server 12 SP3
    Release:	12.3
    Codename:	n/a
    Code:
    # zypper se -s --installed-only | grep ses-release
    i+ | ses-release                                | package     | 5-1.54                                  | x86_64 | SES5                             
    i+ | ses-release                                | package     | 5-1.54                                  | x86_64 | SUSE-Enterprise-Storage-5-Pool   
    i  | ses-release-cd                             | package     | 5-1.54                                  | x86_64 | SES5
    Weird warning:
    Code:
    salt-master[28198]: [WARNING ] Although 'dmidecode' was found in path, the current user cannot execute it. Grains output might not be accurate.
    The firewall is stopped and AppArmor is disabled.


    brg,
    Serhiy.
    Last edited by vazaari; 13-Mar-2018 at 15:54.

  2. #2

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Stage 0
    Code:
    Starting stage: ceph.stage.0
    Parsing ceph.stage.0 steps... ✓
    
    
    Stage initialization output:
    deepsea_minions          : valid
    master_minion            : valid
    ceph_version             : valid
    
    [1/14]    ceph.salt-api on
              tw-ceph-admin.............................................. ✓ (5s)
    
    [2/14]    ceph.sync on
              tw-ceph-admin.............................................. ✓ (1s)
    
    [3/14]    ceph.repo on
              tw-ceph-admin.............................................. ✓ (0.8s)
    
    [4/14]    ceph.updates on
              tw-ceph-admin.............................................. ✓ (10s)
    
    [5/14]    filequeue.remove(item=lock)................................ ✓ (0.0s)
    
    [6/14]    ceph.updates.restart on
              tw-ceph-admin.............................................. ✓ (2s)
    
    [7/14]    filequeue.add(item=complete)............................... ✓ (0.0s)
    
    [8/14]    minions.ready(timeout=300)................................. ✓ (0.4s)
    
    [9/14]    ceph.repo on
              tw-ceph-node2.............................................. ✓ (0.3s)
              tw-ceph-node3.............................................. ✓ (0.3s)
              tw-ceph-node1.............................................. ✓ (0.3s)
              tw-ceph-admin.............................................. ✓ (0.3s)
    
    [10/14]   ceph.packages.common on
              tw-ceph-node2.............................................. ✓ (2s)
              tw-ceph-node3.............................................. ✓ (2s)
              tw-ceph-node1.............................................. ✓ (2s)
              tw-ceph-admin.............................................. ✓ (3s)
    
    [11/14]   ceph.sync on
              tw-ceph-node2.............................................. ✓ (1.0s)
              tw-ceph-node3.............................................. ✓ (1s)
              tw-ceph-node1.............................................. ✓ (1s)
              tw-ceph-admin.............................................. ✓ (1s)
    
    [12/14]   ceph.mines on
              tw-ceph-node2.............................................. ✓ (2s)
              tw-ceph-node3.............................................. ✓ (2s)
              tw-ceph-node1.............................................. ✓ (2s)
              tw-ceph-admin.............................................. ✓ (2s)
    
    [13/14]   ceph.updates on
              tw-ceph-node2.............................................. ✓ (19s)
              tw-ceph-node3.............................................. ✓ (16s)
              tw-ceph-node1.............................................. ✓ (21s)
              tw-ceph-admin.............................................. ✓ (11s)
    
    [14/14]   ceph.updates.restart on
              tw-ceph-node2.............................................. ✓ (3s)
              tw-ceph-node3.............................................. ✓ (3s)
              tw-ceph-node1.............................................. ✓ (3s)
              tw-ceph-admin.............................................. ✓ (3s)
    
    Ended stage: ceph.stage.0 succeeded=14/14 time=92.7s

  3. #3

    Re: state.orch ceph.stage.discovery does not collect HDD info

    It seems the problem is the lack of storage nodes and OSDs.

    I can generate templates with the following command:
    Code:
    salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'
    but it generates an empty osds list:
    Code:
    # cat  profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
    ceph:
      storage:
        osds: {}
    Could you provide a sample .yml for the storage role?

  4. #4

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Quote Originally Posted by vazaari View Post
    It seems the problem is the lack of storage nodes and OSDs.

    So I can generate templates with the following command:
    Code:
    salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'
    but it generates an empty osds list:
    Code:
    # cat  profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
    ceph:
      storage:
        osds: {}
    Could you provide a sample .yml for the storage role?
    Here is a sample:

    Code:
    ceph:
      storage:
        osds:
          /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
            db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
            db_size: 500m
            format: bluestore
            wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
            wal_size: 500m
    ...and so on for the next OSDs.

    Thomas

  5. #5

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Code:
    ceph:
      storage:
        osds:
          /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
            db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
            db_size: 500m
            format: bluestore
            wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
            wal_size: 500m
    Reposting with better "layout"

    Thomas
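    A side note on the device paths in the sample: profiles typically reference the stable /dev/disk/by-id links rather than /dev/sdX names, since the latter can change across reboots. A small sketch (the /dev/sdb here is just an example; adjust for your node) that resolves a kernel device name to its persistent aliases:

```shell
# Resolve a kernel device name to its /dev/disk/by-id aliases by comparing
# each symlink's canonical target with the device path.
dev=/dev/sdb
for link in /dev/disk/by-id/*; do
  [ -e "$link" ] || continue                 # skip if the glob did not match
  [ "$(readlink -f "$link")" = "$dev" ] && echo "$link"
done
```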

  6. #6

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Thomas, thanks a lot.

    I've created the YAML file with the following content:
    Code:
    ceph:
      storage:
        osds:
          /dev/sdb:
            format: bluestore
            standalone: true
          /dev/sdc:
            format: bluestore
            standalone: true
    stage.2 passed successfully.
    stage.3 ended with errors:
    Code:
    Module function osd.deploy threw an exception. Exception: Mine on tw-ceph-node1 for cephdisks.list
    The HDDs are still not recognized:
    Code:
    # salt 'tw-ceph-node*' cephdisks.list
    tw-ceph-node2:
    tw-ceph-node3:
    tw-ceph-node1:
    The nodes themselves see the disks, e.g.:
    Code:
    hwinfo --disk | egrep 'sdb|sdc'
      SysFS ID: /class/block/sdb
      Device File: /dev/sdb (/dev/sg1)
      Device Files: /dev/sdb, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_4022A45B, /dev/disk/by-id/scsi-25ba4224000d00000, /dev/disk/by-id/scsi-SServeRA_disk1_4022A45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:1:0
      SysFS ID: /class/block/sdc
      Device File: /dev/sdc (/dev/sg2)
      Device Files: /dev/sdc, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_354EA45B, /dev/disk/by-id/scsi-25ba44e3500d00000, /dev/disk/by-id/scsi-SServeRA_disk2_354EA45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:2:0
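    The osd.deploy exception above complains about the Salt mine rather than the disks directly: the cephdisks.list results reach the master through the mine. A hedged sketch (assuming a working Salt master; the commands are wrapped in a guard so the block is harmless where the salt CLI is absent) of forcing a mine refresh and inspecting what the master sees:

```shell
# Refresh the Salt mine on the OSD nodes, then dump the mine data the
# master holds for cephdisks.list. The target glob matches the node
# names used in this thread.
refresh_and_show_mine() {
  salt 'tw-ceph-node*' mine.update
  salt 'tw-ceph-node*' mine.get 'tw-ceph-node*' cephdisks.list
}

# Only run where the salt CLI actually exists.
if command -v salt >/dev/null 2>&1; then
  refresh_and_show_mine
else
  echo "salt CLI not found; nothing to do"
fi
```

    If the mine stays empty even after an update, the underlying cephdisks module is returning nothing for those devices, which matches the empty `salt 'tw-ceph-node*' cephdisks.list` output above.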

  7. #7

    Re: state.orch ceph.stage.discovery does not collect HDD info

    It seems that DeepSea does not like HDDs behind RAID controllers, or at least some of them.
    In my case the RAID controller does not support JBOD mode, so I had to create single-disk volumes, which the host recognizes as ordinary HDDs.

    I also tried to deploy Ceph with 'ceph-deploy', and it likewise failed to create OSDs...
    But after creating one partition per HDD, the OSDs were created successfully with:

    Code:
    # ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX1
    After that I played around with destroy, purge, etc., and got it working with an unpartitioned disk:

    - on the osd node side
    zap disk and reboot host
    ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

    - on the ceph-deploy node
    ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX

    Now I'm trying to 'migrate' from ceph-deploy to DeepSea (as in the SES 3/4 to SES 5 case).
    I successfully passed stages 0, 1, and 2, edited /srv/modules/runners/validate.py to bypass the '4 nodes' requirement, and am now stuck on stage 3:

    Code:
    [13/44]   ceph.osd.auth on
              tw-ceph-admin.............................................. ❌ (2s)
    
    Ended stage: ceph.stage.3 succeeded=12/44 failed=1/44 time=85.1s
    
    Failures summary:
    
    ceph.osd.auth (/srv/salt/ceph/osd/auth):
      tw-ceph-admin:
        auth /srv/salt/ceph/osd/cache/bootstrap.keyring: Command "ceph auth add client.bootstrap-osd -i /srv/salt/ceph/osd/cache/bootstrap.keyring" run
            stdout: 
            stderr: Error EINVAL: entity client.bootstrap-osd exists but caps do not match
    Last edited by vazaari; 15-Mar-2018 at 13:06.

  8. #8

    Re: state.orch ceph.stage.discovery does not collect HDD info

    The following trick solves the 'caps do not match' error and lets stage 3 pass:
    Code:
    # ceph auth caps client.bootstrap-osd mgr \
     "allow r" mon "allow profile bootstrap-osd"
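    Worth noting: `ceph auth caps` replaces the entity's entire capability set, so every cap the entity needs (here both mgr and mon) has to appear in the one command. A guarded sketch (it only runs where a ceph CLI and a reachable cluster exist) for verifying the result afterwards:

```shell
# Show the caps now attached to the bootstrap-osd key so they can be
# compared with what stage 3 expects.
show_bootstrap_caps() {
  ceph auth get client.bootstrap-osd
}

# Only run where the ceph CLI actually exists.
if command -v ceph >/dev/null 2>&1; then
  show_bootstrap_caps
else
  echo "ceph CLI not found; skipping"
fi
```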

  9. #9

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Closer and closer...
    Hit an iSCSI gateway (igw) issue in stage 4:
    Code:
    [6/16]    ceph.igw on
              tw-ceph-node1.............................................. ❌ (4s)
    ...
    Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s
    ...
    Failures summary:
    
    ceph.igw (/srv/salt/ceph/igw):
      tw-ceph-node1:
        reload lrbd: Module function service.restart executed
        enable lrbd: Service lrbd has been enabled, and is dead
    Not yet solved.

  10. #10

    Re: state.orch ceph.stage.discovery does not collect HDD info

    Quote Originally Posted by vazaari View Post
    Closer and closer...
    Hit an iSCSI gateway (igw) issue in stage 4:
    Code:
    [6/16]    ceph.igw on
              tw-ceph-node1.............................................. ❌ (4s)
    ...
    Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s
    ...
    Failures summary:
    
    ceph.igw (/srv/salt/ceph/igw):
      tw-ceph-node1:
        reload lrbd: Module function service.restart executed
        enable lrbd: Service lrbd has been enabled, and is dead
    Not yet solved.
    Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

    Thomas
