ceph – setting up rbd-mirror between two ceph clusters

Environment
2x ceph cluster (aio) running centos 7.2 /w ceph jewel. Added a 2nd crush rule to both clusters:

rule rep_osd {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}

(ceph crush map)

Setup

Install the rbd-mirror package in both sides. Technically they can run on any host even when they are not part of the cluster.

[root@ceph01 ~]# yum install -y rbd-mirror
[root@ceph04 ~]# yum install -y rbd-mirror
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: disabled
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: disabled

Check that the cluster name is set. All systemd unit files are including that file during the startup.

[root@ceph01 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=primary
[root@ceph04 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=secondary

Create a key on both clusters which is able to access (rwx) the pool. (ceph authorization (caps))

[root@ceph01 ~]# ceph --cluster primary auth get-or-create client.primary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/primary.client.primary.keyring
[root@ceph04 ~]# ceph --cluster secondary auth get-or-create client.secondary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/secondary.client.secondary.keyring

Enable pool mirroring and verify that it is active.

[root@ceph01 ~]# rbd --cluster primary mirror pool enable rbd pool
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: pool
Peers: none
[root@ceph04 ~]# rbd --cluster secondary mirror pool enable rbd pool
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: pool
Peers: none

Copy the keys and configs between the clusters. The rbd-mirror in the primary cluster requires the key from the secondary and vice versa.

[root@ceph01 ~]# scp /etc/ceph/primary.client.primary.keyring /etc/ceph/primary.conf root@ceph04:/etc/ceph/
primary.client.primary.keyring
primary.conf
[root@ceph04 ~]# scp /etc/ceph/secondary.client.secondary.keyring /etc/ceph/secondary.conf root@ceph01:/etc/ceph/
secondary.client.secondary.keyring  
secondary.conf

Enable/start the ceph-rbd-mirror – extend the unit name with the local cluster name.

[root@ceph01 ceph]# systemctl start ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl start ceph-rbd-mirror@secondary

Add the remote cluster as a peer. Example: client.secondary represent the key name and @secondary the cluster name. That mean rbd-mirror is looking for a key like /etc/ceph/secondary.client.secondary.keyring.

[root@ceph01 ceph]# rbd --cluster primary mirror pool peer add rbd client.secondary@secondary 
49c28a78-ef7d-4f12-b003-7ce69f091b85
[root@ceph04 ceph]# rbd --cluster secondary mirror pool peer add rbd client.primary@primary
02053868-7dd7-4029-b287-53a205fdd668

Thats it! Now create a rbd image and activate the exclusive-lock and journaling feature. (man 8 rbd)

[root@ceph01 ceph]# rbd --cluster primary create test-1 --size 5M --image-feature exclusive-lock,journaling
[root@ceph01 ceph]# rbd --cluster primary create test-2 --size 5M --image-feature exclusive-lock,journaling

The test-1 image is active on the primary cluster, test-2 is active on the secondary cluster.

[root@ceph04 ceph]# rbd --cluster secondary mirror image demote rbd/test-1
[root@ceph01 ceph]# rbd --cluster primary mirror image promote rbd/test-1

[root@ceph01 ceph]# rbd --cluster primary mirror image demote rbd/test-2
[root@ceph04 ceph]# rbd --cluster secondary mirror image promote rbd/test-2
[root@ceph01 ceph]# rbd --cluster primary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:07

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=4, entry_tid=3], mirror_position=[object_number=3, tag_tid=4, entry_tid=3], entries_behind_master=0
  last_update: 2016-10-14 14:49:09

[root@ceph01 ceph]# rbd --cluster primary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2           
test-2 5120k          2      excl 
[root@ceph04 ceph]# rbd --cluster secondary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+replaying
  description: replaying, master_position=[object_number=0, tag_tid=3, entry_tid=0], mirror_position=[object_number=0, tag_tid=3, entry_tid=0], entries_behind_master=0
  last_update: 2016-10-14 14:49:21

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:21

[root@ceph04 ceph]# rbd --cluster secondary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2      excl 
test-2 5120k          2