ceph metasearch – elasticsearch backend – part 2


  • ceph cluster (kraken release)
  • elasticsearch

The rgw syncer is only used/triggered in multisite configurations – so we need to setup a second zone for the metasearch.

environment / settings

export rgwhost=""
export elastichost=""
export realm="demo"
export zonegrp="zone-1"
export 1zone="zone1-a"
export 2zone="zone1-b" # used for metasearch
export sync_akey="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 | head -n 1 )"
export sync_skey="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 | head -n 1 )"
export user_akey="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 | head -n 1 )"
export user_skey="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 | head -n 1 )"

setup (see also part1)

create first zone
# radosgw-admin realm create --rgw-realm=${realm} --default
# radosgw-admin zonegroup create --rgw-realm=${realm} --rgw-zonegroup=${zonegrp} --endpoints=http://${rgwhost}:80 --master --default
# radosgw-admin zone create --rgw-realm=${realm} --rgw-zonegroup=${zonegrp} --rgw-zone=${1zone} --endpoints=http://${rgwhost}:80 --access-key=${sync_akey} --secret=${sync_skey} --master --default
# radosgw-admin user create --uid=sync --display-name="zone sync" --access-key=${sync_akey} --secret=${sync_skey} --system
# radosgw-admin period update --commit
# systemctl restart ceph-radosgw@rgw.${rgwhost}
create second zone
# radosgw-admin zone create --rgw-realm=${realm} --rgw-zonegroup=${zonegrp} --rgw-zone=${2zone} --access-key=${sync_akey} --secret=${sync_skey} --endpoints=http://${rgwhost}:81
# radosgw-admin zone modify --rgw-realm=${realm} --rgw-zonegroup=${zonegrp} --rgw-zone=${2zone} --tier-type=elasticsearch --tier-config=endpoint=http://${elastichost}:9200,num_replicas=1,num_shards=10
# radosgw-admin period update --commit

Restart the first radosgw and the start the second radosgw. For example:

# screen -dmS rgw2zone radosgw --keyring /etc/ceph/ceph.client.admin.keyring -f --rgw-zone=${2zone} --rgw-frontends="civetweb port=81"

Check elasticsearch for the new index:

# curl http://${elastichost}:9200/_cat/indices | grep rgw-${realm}
yellow open rgw-demo    z0UiKOOFQl682yILobYbMw 5 1 1 0 11.7kb 11.7kb

modify header/metadata

create a user

radosgw-admin user create --uid=rmichel --display-name="rmichel" --access-key=${user_akey} --secret=${user_skey}

upload some test data….

s3cmd is configured with the ${user_akey} + ${user_skey} and the ${rgwhost}:80 as the endpoint.

# s3cmd modify --add-header x-amz-meta-color:green s3://bucket1/admin.key
modify: 's3://bucket1/admin.key'
# s3cmd info s3://bucket1/admin.key
s3://bucket1/admin.key (object):
   File size: 63
   Last mod:  Thu, 27 Apr 2017 21:14:55 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   ee40e385a45c4855bd360cfbdbd48711
   SSE:       none
   policy:    <?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>bucket1</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>admin.key</Key><LastModified>2017-04-27T21:14:55.494Z</LastModified><ETag>&quot;ee40e385a45c4855bd360cfbdbd48711&quot;</ETag><Size>63</Size><StorageClass>STANDARD</StorageClass><Owner><ID>rmichel</ID><DisplayName>rmichel</DisplayName></Owner></Contents></ListBucketResult>
   cors:      none
   ACL:       rmichel: FULL_CONTROL
   x-amz-meta-color: green
   x-amz-meta-s3cmd-attrs: uid:0/gname:root/uname:root/gid:0/mode:33152/mtime:1493326171/atime:1493326171/md5:ee40e385a45c4855bd360cfbdbd48711/ctime:1493326171

query elasticsearch

The radosgw creates a index with the name rgw-${realm} (ref ceph.git)

In my case the url is http://${elastichost}:9200/rgw-${realm}/

# curl | python -m json.tool
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    "hits": {
        "hits": [
                "_id": "d9b0c7a5-f9e5-4c6e-a0c2-48642840c98b.14125.1:admin.key:",
                "_index": "rgw-demo",
                "_score": 0.23691465,
                "_source": {
                    "bucket": "bucket1",
                    "instance": "",
                    "meta": {
                        "content_type": "text/plain",
                        "custom": {
                            "color": "green",
                            "s3cmd-attrs": "uid:0/gname:root/uname:root/gid:0/mode:33152/mtime:1493326171/atime:1493326171/md5:ee40e385a45c4855bd360cfbdbd48711/ctime:1493326171"
                        "etag": "ee40e385a45c4855bd360cfbdbd48711",
                        "mtime": "2017-04-27T21:14:55.483Z",
                        "size": 63,
                        "x-amz-copy-source": "/bucket1/admin.key",
                        "x-amz-date": "Thu, 27 Apr 2017 21:14:55 +0000",
                        "x-amz-metadata-directive": "REPLACE"
                    "name": "admin.key",
                    "owner": {
                        "display_name": "rmichel",
                        "id": "rmichel"
                    "permissions": [
                "_type": "object"
        "max_score": 0.23691465,
        "total": 1
    "timed_out": false,
    "took": 102

ceph radosgw (set)lifecycle – AWS v4 is broken

First – s3cmd config
Setting signature_v2 = true is not enough! You have to set --signature-v2 as a parameter.

Second – ‘Prefix’ tag
You have specify a Prefix tag – and yes with a captial P!

without prefix tag
[root@kraken ~]# s3cmd setlifecycle lc.xml s3://bucket1 --signature-v2
ERROR: S3 error: 403 (AccessDenied)
with closing prefix tag
[root@kraken ~]# s3cmd setlifecycle lc.xml s3://bucket1 --signature-v2
ERROR: S3 error: 403 (AccessDenied)
with prefix tag – working!
[root@kraken ~]# s3cmd setlifecycle lc.xml s3://bucket1 --signature-v2
s3://bucket1/: Lifecycle Policy updated

ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
s3cmd version 1.6.1

ceph – setting up rbd-mirror between two ceph clusters

2x ceph cluster (aio) running centos 7.2 /w ceph jewel. Added a 2nd crush rule to both clusters:

rule rep_osd {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit

(ceph crush map)


Install the rbd-mirror package in both sides. Technically they can run on any host even when they are not part of the cluster.

[root@ceph01 ~]# yum install -y rbd-mirror
[root@ceph04 ~]# yum install -y rbd-mirror
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: disabled
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: disabled

Check that the cluster name is set. All systemd unit files are including that file during the startup.

[root@ceph01 ~]# grep -i cluster /etc/sysconfig/ceph 
[root@ceph04 ~]# grep -i cluster /etc/sysconfig/ceph 

Create a key on both clusters which is able to access (rwx) the pool. (ceph authorization (caps))

[root@ceph01 ~]# ceph --cluster primary auth get-or-create client.primary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/primary.client.primary.keyring
[root@ceph04 ~]# ceph --cluster secondary auth get-or-create client.secondary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/secondary.client.secondary.keyring

Enable pool mirroring and verify that it is active.

[root@ceph01 ~]# rbd --cluster primary mirror pool enable rbd pool
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: pool
Peers: none
[root@ceph04 ~]# rbd --cluster secondary mirror pool enable rbd pool
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: pool
Peers: none

Copy the keys and configs between the clusters. The rbd-mirror in the primary cluster requires the key from the secondary and vice versa.

[root@ceph01 ~]# scp /etc/ceph/primary.client.primary.keyring /etc/ceph/primary.conf root@ceph04:/etc/ceph/
[root@ceph04 ~]# scp /etc/ceph/secondary.client.secondary.keyring /etc/ceph/secondary.conf root@ceph01:/etc/ceph/

Enable/start the ceph-rbd-mirror – extend the unit name with the local cluster name.

[root@ceph01 ceph]# systemctl start ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl start ceph-rbd-mirror@secondary

Add the remote cluster as a peer. Example: client.secondary represent the key name and @secondary the cluster name. That mean rbd-mirror is looking for a key like /etc/ceph/secondary.client.secondary.keyring.

[root@ceph01 ceph]# rbd --cluster primary mirror pool peer add rbd client.secondary@secondary 
[root@ceph04 ceph]# rbd --cluster secondary mirror pool peer add rbd client.primary@primary

Thats it! Now create a rbd image and activate the exclusive-lock and journaling feature. (man 8 rbd)

[root@ceph01 ceph]# rbd --cluster primary create test-1 --size 5M --image-feature exclusive-lock,journaling
[root@ceph01 ceph]# rbd --cluster primary create test-2 --size 5M --image-feature exclusive-lock,journaling

The test-1 image is active on the primary cluster, test-2 is active on the secondary cluster.

[root@ceph04 ceph]# rbd --cluster secondary mirror image demote rbd/test-1
[root@ceph01 ceph]# rbd --cluster primary mirror image promote rbd/test-1

[root@ceph01 ceph]# rbd --cluster primary mirror image demote rbd/test-2
[root@ceph04 ceph]# rbd --cluster secondary mirror image promote rbd/test-2
[root@ceph01 ceph]# rbd --cluster primary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:07

  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=4, entry_tid=3], mirror_position=[object_number=3, tag_tid=4, entry_tid=3], entries_behind_master=0
  last_update: 2016-10-14 14:49:09

[root@ceph01 ceph]# rbd --cluster primary ls -l
test-1 5120k          2           
test-2 5120k          2      excl 
[root@ceph04 ceph]# rbd --cluster secondary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+replaying
  description: replaying, master_position=[object_number=0, tag_tid=3, entry_tid=0], mirror_position=[object_number=0, tag_tid=3, entry_tid=0], entries_behind_master=0
  last_update: 2016-10-14 14:49:21

  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:21

[root@ceph04 ceph]# rbd --cluster secondary ls -l
test-1 5120k          2      excl 
test-2 5120k          2   

Google Software Updater fuckups

To disable the ksfetch (ks = keystone) daemon (which comes with google products) there are several ways to do this.

  1. Uninstall the Google Software Update Agent
        $ /Library/Google/GoogleSoftwareUpdate/GoogleSoftwareUpdate.bundle/Contents/Resources/GoogleSoftwareUpdateAgent.app/Contents/Resources/ksinstall [--nuke]

    The --nuke parameter will also remove ksfetch related stuff.

  2. Set the checkInterval to the maximum (24h). Default is 5h (18000)
        $ defaults read com.google.Keystone.Agent checkInterval
        $ defaults write com.google.Keystone.Agent checkInterval 86400

entropy inside a virtual machine

Sometimes my ceph-(test!)deployments inside a VM failed.

The Problem is that the kernel/cpu can not provide enough entropy (random numbers) for the ceph-create-keys command – so it stuck/hang. It is not a ceph problem! This can also happen with ssl commands.

But first things first – we need to check the available entropy on a system:

cat /proc/sys/kernel/random/entropy_avail

The read-only file entropy_avail gives the available entropy.
Normally, this will be 4096 (bits), a full entropy pool (see man 4 random)

Values less than 100-200, means you have a problem!

For a virtual machine we can create a new device – virtio-rng. Here is a xml example for libvirt.

<rng model='virtio'>
  <backend model='random'>/dev/random</backend>

That is ok for ONE virtual machine on the hypervisor. Usually we find more than one virtual machine. Therefore we need to install the rng-tools package on the virtual machines.

$pkgmgr install rng-tools
systemctl enable rngd
systemctl start rngd

That’s it! That solved a lot of my problems πŸ˜‰

openvswitch and OpenFlow


Layer 1

ovs-ofctl del-flow BRIDGE
ovs-ofctl add-flow BRIDGE priority=500,in_port=1,actions=output:2
ovs-ofctl add-flow BRIDGE priority=500,in_port=2,actions=output:1
ovs-ofctl dump-flows BRIDGE

Layer 2

ovs-ofctl del-flow BRIDGE
ovs-ofctl add-flow BRIDGE dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,actions=output:2
ovs-ofctl add-flow BRIDGE dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:01,actions=output:1
ovs-ofctl add-flow BRIDGE dl_type=0x806,nw_proto=1,actions=flood
ovs-ofctl dump-flows BRIDGE 

Layer 3

ovs-ofctl del-flow BRIDGE
ovs-ofctl add-flow BRIDGE priority=500,dl_type=0x800,nw_src=,nw_dst=,actions=normal
ovs-ofctl add-flow BRIDGE priority=800,ip,nw_src=,actions=mod_nw_tos=184,normal
ovs-ofctl add-flow BRIDGE arp,nw_dst=,actions=output:1
ovs-ofctl add-flow BRIDGE arp,nw_dst=,actions=output:2
ovs-ofctl add-flow BRIDGE arp,nw_dst=,actions=output:3
ovs-ofctl dump-flows BRIDGE 

Layer 4

ovs-ofctl del-flow BRIDGE 
ovs-ofctl add-flow BRIDGE arp,actions=normal
ovs-ofctl add-flow BRIDGE priority=500,dl_type=0x800,nw_proto=6,tp_dst=80,actions=output:3
ovs-ofctl add-flow BRIDGE priority=800,ip,nw_src=,actions=normal
ovs-ofctl dump-flows BRIDGE 



Priority rules

When no priority is set is the default – 32768! Allowed values are from 0 to 65536. A higher priority will match at first.


dl_type and nw_proto

dl_type and nw_proto are filters to match a specific network packet. Generally dl_type is for L2 (matches ethertype) and nw_proto (matches IP protocol type) for L3 actions. For example:

dl_type=0x800 – for ipv4 packets

dl_type=0x86dd – for ipv6 packets

dl_type=0x806 and nw_proto=1 – match only arp requests (ARP opcode, see layer 2)

dl_type=0x800 or ip (as keyword, see layer 3) has the same meaning

ip and nw_proto=17 – udp packets

ip and nw_proto=6 – tcp packets

Parameters for actions can be (excerpt)

  • normal – Default mode, OVS acts like a normal L2 switch
  • drop – drops all packets
  • output – defineΒ the output port for a packet/rule
  • resubmit – useful for multiple tables, resend a packet to a port or table
  • flood – forword all packets on all port except the port on which it was received
  • strip_vlan – remove a vlan tag from a packet
  • set_tunnel – set a tunnel id (gre & vxlan)
  • mod_vlan_vid – add a vlan tag for a packet
  • learn – complex foo πŸ˜‰

ovs-ofctl man page

Example from a openstack node (w/ GRE, see table 22) – ovs flows from the br-tun device

[root@node1 ~]# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1221.218s, table=0, n_packets=0, n_bytes=0, idle_age=1221, priority=0 actions=drop
cookie=0x0, duration=1221.323s, table=0, n_packets=747, n_bytes=54800, idle_age=0, priority=1,in_port=1 actions=resubmit(,2)
cookie=0x0, duration=1220.226s, table=0, n_packets=0, n_bytes=0, idle_age=1220, priority=1,in_port=2 actions=resubmit(,3)
cookie=0x0, duration=1221.126s, table=2, n_packets=0, n_bytes=0, idle_age=1221, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
cookie=0x0, duration=1221.051s, table=2, n_packets=747, n_bytes=54800, idle_age=0, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
cookie=0x0, duration=1220.974s, table=3, n_packets=0, n_bytes=0, idle_age=1220, priority=0 actions=drop
cookie=0x0, duration=1218.706s, table=3, n_packets=0, n_bytes=0, idle_age=1218, priority=1,tun_id=0x3f7 actions=mod_vlan_vid:1,resubmit(,10)
cookie=0x0, duration=1217.462s, table=3, n_packets=0, n_bytes=0, idle_age=1217, priority=1,tun_id=0x442 actions=mod_vlan_vid:2,resubmit(,10)
cookie=0x0, duration=1220.898s, table=4, n_packets=0, n_bytes=0, idle_age=1220, priority=0 actions=drop
cookie=0x0, duration=1220.821s, table=10, n_packets=0, n_bytes=0, idle_age=1220, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0-&gt;NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]-&gt;NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
cookie=0x0, duration=1220.742s, table=20, n_packets=0, n_bytes=0, idle_age=1220, priority=0 actions=resubmit(,22)
cookie=0x0, duration=1220.666s, table=22, n_packets=137, n_bytes=21860, idle_age=13, priority=0 actions=drop
cookie=0x0, duration=1220.093s, table=22, n_packets=610, n_bytes=32940, idle_age=0, hard_age=1217, dl_vlan=2 actions=strip_vlan,set_tunnel:0x442,output:2
cookie=0x0, duration=1219.970s, table=22, n_packets=0, n_bytes=0, idle_age=1219, hard_age=1218, dl_vlan=1 actions=strip_vlan,set_tunnel:0x3f7,output:2

Gentoo – initramfs with busybox, lvm and some more…


mkdir -p /usr/src/initramfs/{bin,lib/modules,dev,etc,mnt/root,proc,root,sbin,sys}
cp -a /dev/{null,console,tty,sda*} /usr/src/initramfs/dev/


USE="static make-symlinks -pam -savedconfig" emerge --root=/usr/src/initramfs/ -av busybox

LVM provides already a static binary πŸ™‚

cp /sbin/lvm.static /usr/src/initramfs/lvm

ldap initial configuration

A more or less initial configuration for openldap (>2.4)

# to import run:
# ldapmodify -Y EXTERNAL -H ldapi:/// -f $filename
# to verfiy run:
# ldapsearch -Y EXTERNAL -H ldapi:/// -b "olcDatabase={1}hdb,cn=config"
# to create a password:
# slappasswd -h {SSHA} -s admin

dn: olcDatabase={1}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=de
replace: olcAccess
olcAccess: {0}to attrs=userPassword,shadowLastChange by dn="cn=admin,dc=example,dc=de" write by anonymous auth by self write by * none
olcAccess: {1}to dn.base="" by * read
olcAccess: {2}to * by self write by dn="cn=admin,dc=example,dc=de" write by * read
replace: olcRootDN
olcRootDN: cn=admin,dc=example,dc=de
replace: olcRootPW
olcRootPW: {SSHA}4RHgrU6ghLqA21CNI8biQblHtEodToyd

TLS config

dn: cn=config
changetype: modify
add: olcTLSCipherSuite
olcTLSCipherSuite: AES128+EECDH:AES128+EDH
add: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/ssl/ca.crt
add: olcTLSCertificateFile
olcTLSCertificateFile: /etc/ssl/cert.crt
add: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/ssl/keyfile.key
add: olcTLSVerifyClient
# never - allow - try - demand
olcTLSVerifyClient: demand

openldap – tls config
openldap – access

systemd – abstract

Rescue Mode


Analyzing the boot process

* systemd-analyze
* systemd-analyze blame
* systemd-analyze plot > /tmp/plot.svg

Start/Stop/Disable services

* systemctl start/stop/restart/mask [service]
* systemctl daemon-reload
* systemctl list-units –type=[timer,service,target,mounts,…]


* journalctl -u ssh
* _PID=1
* -b

Custom unit