ceph - wrong osd id with lvm+filestore

Posted on Wed 04 July 2018 in Ceph • 2 min read

Not sure why, but I've found some strange ceph-volume behavior with LVM and filestore.

ceph-volume lvm list shows the wrong OSD ID, while the affected OSD is actually online under a different ID:

$ mount | grep ceph-2
/dev/mapper/vg00-datalv1 on /var/lib/ceph/osd/ceph-2 type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
$ cat /var/lib/ceph/osd/ceph-2/whoami 
2
$ sudo ceph osd metadata osd.2 | egrep "id|objectstore"
    "id": 2,
    "osd_objectstore": "filestore",
$ sudo ceph-volume lvm list
[...]
====== osd.8 =======

  [data]    /dev/vg00/datalv1

      type                      data
      journal uuid              XqM6CP-embw-gIfs-UN2Q-gRDR-TVWP-y1q5Te
      osd id                    8
      cluster fsid              ed62dbfb-f0f7-4b13-ace0-4ccea0c4a6bf
      cluster name              ceph
      osd fsid                  38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
      encrypted                 0
      data uuid                 W3h12f-xg3y-ij1Z-F70h-yx2n-SyD9-ioNEC7
      cephx lockbox secret
      crush device class        None
      data device               /dev/vg00/datalv1
      vdo                       0
      journal device            /dev/vg00/journallv1

  [journal]    /dev/vg00/journallv1

      type                      journal
      journal uuid              XqM6CP-embw-gIfs-UN2Q-gRDR-TVWP-y1q5Te
      osd id                    8
      cluster fsid              ed62dbfb-f0f7-4b13-ace0-4ccea0c4a6bf
      cluster name              ceph
      osd fsid                  38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
      encrypted                 0
      data uuid                 W3h12f-xg3y-ij1Z-F70h-yx2n-SyD9-ioNEC7
      cephx lockbox secret
      crush device class        None
      data device               /dev/vg00/datalv1
      vdo                       0
      journal device            /dev/vg00/journallv1

And if you try to start the OSD via ceph-volume lvm trigger with the "wrong" ID 8, it will...

$ sudo ceph-volume lvm trigger 8-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
Running command: mount -t xfs -o rw,noatime,inode64 /dev/vg00/datalv1 /var/lib/ceph/osd/ceph-8
Running command: ln -snf /dev/vg00/journallv1 /var/lib/ceph/osd/ceph-8/journal
Running command: chown -R ceph:ceph /dev/dm-2
Running command: systemctl enable ceph-volume@lvm-8-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
Running command: systemctl start ceph-osd@8
--> ceph-volume lvm activate successful for osd ID: 8

$ sudo cat /var/log/ceph/ceph-osd.8.log
2018-07-04 19:28:34.754576 7f346e67fd80  0 set uid:gid to 167:167 (ceph:ceph)
2018-07-04 19:28:34.754598 7f346e67fd80  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 3755
2018-07-04 19:28:34.754872 7f346e67fd80 -1 OSD id 2 != my id 8

FAIL! The activation itself looks successful, but the OSD daemon reads its real ID (2) from the whoami file on the mounted filesystem and refuses to start as osd.8. And the correct ID 2 doesn't work either...

$ sudo ceph-volume lvm trigger 2-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
-->  RuntimeError: could not find osd.2 with fsid 38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
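
That makes sense once you know how trigger resolves its argument: ceph-volume matches the <osd id>-<osd fsid> pair against the ceph.osd_id and ceph.osd_fsid LVM tags, and as long as the tag still says 8, there simply is no osd.2 for it to find. You can see what it would match by pointing lvm list at the device directly (a quick check; lvm list also accepts a device path):

$ sudo ceph-volume lvm list /dev/vg00/datalv1
====== osd.8 =======
[...]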

So to fix the problem, we need to correct the ceph.osd_id tag on both logical volumes.

$ sudo lvs -o lv_tags vg00/datalv1
  LV Tags
  ceph.cephx_lockbox_secret=,ceph.cluster_fsid=ed62dbfb-f0f7-4b13-ace0-4ccea0c4a6bf,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.data_device=/dev/vg00/datalv1,ceph.data_uuid=W3h12f-xg3y-ij1Z-F70h-yx2n-SyD9-ioNEC7,ceph.encrypted=0,ceph.journal_device=/dev/vg00/journallv1,ceph.journal_uuid=XqM6CP-embw-gIfs-UN2Q-gRDR-TVWP-y1q5Te,ceph.osd_fsid=38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5,ceph.osd_id=8,ceph.type=data,ceph.vdo=0
$ sudo lvs -o lv_tags vg00/journallv1
  LV Tags
  ceph.cephx_lockbox_secret=,ceph.cluster_fsid=ed62dbfb-f0f7-4b13-ace0-4ccea0c4a6bf,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.data_device=/dev/vg00/datalv1,ceph.data_uuid=W3h12f-xg3y-ij1Z-F70h-yx2n-SyD9-ioNEC7,ceph.encrypted=0,ceph.journal_device=/dev/vg00/journallv1,ceph.journal_uuid=XqM6CP-embw-gIfs-UN2Q-gRDR-TVWP-y1q5Te,ceph.osd_fsid=38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5,ceph.osd_id=8,ceph.type=journal,ceph.vdo=0
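
That wall of tags is hard to scan. To pull out just the OSD ID, you can split the tag list on commas with standard shell tools (nothing ceph-specific here, just lvs and grep):

$ # show only the ceph.osd_id tag of the data LV
$ sudo lvs --noheadings -o lv_tags vg00/datalv1 | tr ',' '\n' | grep ceph.osd_id
ceph.osd_id=8

Both LVs carry ceph.osd_id=8, which is exactly where ceph-volume gets its wrong answer from. The fix is two lvchange calls per LV: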
1. Remove the old tag:

    sudo lvchange --deltag ceph.osd_id=8 vg00/datalv1
    sudo lvchange --deltag ceph.osd_id=8 vg00/journallv1

2. Add the correct tag:

    sudo lvchange --addtag ceph.osd_id=2 vg00/datalv1
    sudo lvchange --addtag ceph.osd_id=2 vg00/journallv1
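
Before re-running the trigger, it doesn't hurt to verify that both LVs now carry the new tag (same filter as above, run against both LVs at once):

$ sudo lvs --noheadings -o lv_tags vg00/datalv1 vg00/journallv1 | tr ',' '\n' | grep ceph.osd_id
ceph.osd_id=2
ceph.osd_id=2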

Et voilà:

$ sudo ceph-volume lvm trigger 2-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
Running command: mount -t xfs -o rw,noatime,inode64 /dev/vg00/datalv1 /var/lib/ceph/osd/ceph-2
Running command: ln -snf /dev/vg00/journallv1 /var/lib/ceph/osd/ceph-2/journal
Running command: chown -R ceph:ceph /dev/dm-2
Running command: systemctl enable ceph-volume@lvm-2-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-38e7bfb3-ad57-4979-b8a9-3f875e6cb6f5.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
$ sudo cat /var/log/ceph/ceph-osd.2.log
2018-07-04 19:40:04.075588 7fa9cbf6bd80  0 set uid:gid to 167:167 (ceph:ceph)
2018-07-04 19:40:04.075608 7fa9cbf6bd80  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 4165
2018-07-04 19:40:04.080821 7fa9cbf6bd80  0 pidfile_write: ignore empty --pid-file
2018-07-04 19:40:04.109636 7fa9cbf6bd80  0 load: jerasure load: lrc load: isa
2018-07-04 19:40:04.110273 7fa9cbf6bd80  0 filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2018-07-04 19:40:04.121305 7fa9cbf6bd80  0 filestore(/var/lib/ceph/osd/ceph-2) start omap initiation
[...]
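
The OSD is back under its real ID. As a final sanity check, the same metadata query from the beginning should still report ID 2 and filestore:

$ sudo ceph osd metadata osd.2 | egrep "id|objectstore"
    "id": 2,
    "osd_objectstore": "filestore",

And ceph-volume lvm list should now show the device under osd.2, as it should have all along.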