ceph metasearch – elasticsearch backend

Fetch zonegroup configuration (json struct)
# radosgw-admin zonegroup get > /tmp/zonegroup.json

Change the tier_type to elasticsearch for the zone that should serve the metadata search.
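A minimal sketch of the relevant part of /tmp/zonegroup.json (the zone name is only an example; edit the entry of the zone that should feed elasticsearch):

    "zones": [
        {
            "id": "...",
            "name": "metadata-search-zone",
            "tier_type": "elasticsearch"
        }
    ],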

Import the configuration
# radosgw-admin zonegroup set --infile /tmp/zonegroup.json
Fetch zone configuration (json struct)
# radosgw-admin zone get > /tmp/zone.json

Add the endpoint parameter (the elasticsearch {url}) to the tier_config section:

    "tier_config": [
        {
            "key": "endpoint",
            "val": "http:\/\/192.168.122.71:9200"
        }
    ],
Import the configuration
# radosgw-admin zone set --infile /tmp/zone.json

OR

# radosgw-admin zone modify --rgw-zonegroup={zonegroup-name} --rgw-zone={zone-name} --tier-config=endpoint={url}
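For example, with the default zonegroup and a hypothetical zone named metadata-search-zone:

# radosgw-admin zone modify --rgw-zonegroup=default --rgw-zone=metadata-search-zone --tier-config=endpoint=http://192.168.122.71:9200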
Update & Commit
# radosgw-admin period update --commit

To be continued… part 2

Fixing the ceph partition uuid – or: the OSD data dir is not mounted

OSD_UUID

4fbd7e29-9d25-41b8-afd0-062c0ceff05d

JOURNAL_UUID

45b0969e-9b03-4f30-b4c6-b4b80ceff106

To fix the partition type uuid (typecode):

sgdisk --info=##partnr## -t ##partnr##:##part-uuid## /dev/##disk##

e.g.
sgdisk --info=1 -t 1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sda
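To verify the change and re-trigger the udev rules, something along these lines should work (device names are examples):

sgdisk --info=1 /dev/sda | grep "Partition GUID code"
partprobe /dev/sda          # or: udevadm trigger --action=add /dev/sda1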

Ref: /lib/udev/rules.d/95-ceph-osd.rules

[notepad] ceph journal size/ssd speed

ceph journal size (doc)

osd journal size = {2 * (expected throughput * filestore max sync interval)}

The default for filestore max sync interval is 5 seconds, therefore for a 10Gbit network (~1280 MB/s) the “perfect” size would be

osd journal size = { 2 * ( 1280 * 5 ) } = 12800 MB ≈ 12.5 GB
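In ceph.conf the value is given in MB, so a minimal sketch would be:

[osd]
osd journal size = 12800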

ceph ssd speed (journal)

The optimum would be the sum of all disk sequential write speeds – 11 disks with ~110 MB/s = ~1210 MB/s – an Intel P3520 might fit.

How many journals per ssd?

Oh, that's easy.

Journals = (ssd seq write speed) / (hdd seq write speed)

Journals = 1350 / 115 = ~11

(For the Intel P3520 with 11 hdds)

ceph – setting up rbd-mirror between two ceph clusters

Environment
2x ceph cluster (all-in-one) running CentOS 7.2 with ceph jewel. Added a 2nd crush rule to both clusters:

rule rep_osd {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}

(ceph crush map)
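One way to inject such a rule and point the rbd pool at it (a sketch; on jewel the pool option is still called crush_ruleset):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt     # decompile, add the rule above
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new
# ceph osd pool set rbd crush_ruleset 1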

Setup

Install the rbd-mirror package on both sides. Technically it can run on any host, even one that is not part of the cluster.

[root@ceph01 ~]# yum install -y rbd-mirror
[root@ceph04 ~]# yum install -y rbd-mirror
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: disabled
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: disabled

Check that the cluster name is set. All systemd unit files include that file during startup.

[root@ceph01 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=primary
[root@ceph04 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=secondary

Create a key on both clusters that is able to access (rwx) the rbd pool. (ceph authorization (caps))

[root@ceph01 ~]# ceph --cluster primary auth get-or-create client.primary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/primary.client.primary.keyring
[root@ceph04 ~]# ceph --cluster secondary auth get-or-create client.secondary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/secondary.client.secondary.keyring

Enable pool mirroring and verify that it is active.

[root@ceph01 ~]# rbd --cluster primary mirror pool enable rbd pool
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: pool
Peers: none
[root@ceph04 ~]# rbd --cluster secondary mirror pool enable rbd pool
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: pool
Peers: none

Copy the keys and configs between the clusters. The rbd-mirror daemon in the primary cluster requires the key of the secondary cluster and vice versa.

[root@ceph01 ~]# scp /etc/ceph/primary.client.primary.keyring /etc/ceph/primary.conf root@ceph04:/etc/ceph/
primary.client.primary.keyring
primary.conf
[root@ceph04 ~]# scp /etc/ceph/secondary.client.secondary.keyring /etc/ceph/secondary.conf root@ceph01:/etc/ceph/
secondary.client.secondary.keyring  
secondary.conf

Enable/start the ceph-rbd-mirror service – the unit name is extended with the local cluster name.

[root@ceph01 ceph]# systemctl start ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl start ceph-rbd-mirror@secondary
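To have the daemons come back after a reboot, the same units can also be enabled:

[root@ceph01 ceph]# systemctl enable ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl enable ceph-rbd-mirror@secondary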

Add the remote cluster as a peer. Example: client.secondary represents the key name and @secondary the cluster name. That means rbd-mirror is looking for a key like /etc/ceph/secondary.client.secondary.keyring.

[root@ceph01 ceph]# rbd --cluster primary mirror pool peer add rbd client.secondary@secondary 
49c28a78-ef7d-4f12-b003-7ce69f091b85
[root@ceph04 ceph]# rbd --cluster secondary mirror pool peer add rbd client.primary@primary
02053868-7dd7-4029-b287-53a205fdd668
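Afterwards mirror pool info should list the remote cluster as a peer; the output below is only a sketch:

[root@ceph01 ceph]# rbd --cluster primary mirror pool info
Mode: pool
Peers: 
  UUID                                 NAME      CLIENT           
  49c28a78-ef7d-4f12-b003-7ce69f091b85 secondary client.secondary 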

That's it! Now create an rbd image and activate the exclusive-lock and journaling features. (man 8 rbd)

[root@ceph01 ceph]# rbd --cluster primary create test-1 --size 5M --image-feature exclusive-lock,journaling
[root@ceph01 ceph]# rbd --cluster primary create test-2 --size 5M --image-feature exclusive-lock,journaling

The goal: test-1 should be active (primary) on the primary cluster, test-2 on the secondary cluster.

[root@ceph04 ceph]# rbd --cluster secondary mirror image demote rbd/test-1
[root@ceph01 ceph]# rbd --cluster primary mirror image promote rbd/test-1

[root@ceph01 ceph]# rbd --cluster primary mirror image demote rbd/test-2
[root@ceph04 ceph]# rbd --cluster secondary mirror image promote rbd/test-2
[root@ceph01 ceph]# rbd --cluster primary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:07

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=4, entry_tid=3], mirror_position=[object_number=3, tag_tid=4, entry_tid=3], entries_behind_master=0
  last_update: 2016-10-14 14:49:09

[root@ceph01 ceph]# rbd --cluster primary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2           
test-2 5120k          2      excl 
[root@ceph04 ceph]# rbd --cluster secondary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+replaying
  description: replaying, master_position=[object_number=0, tag_tid=3, entry_tid=0], mirror_position=[object_number=0, tag_tid=3, entry_tid=0], entries_behind_master=0
  last_update: 2016-10-14 14:49:21

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:21

[root@ceph04 ceph]# rbd --cluster secondary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2      excl 
test-2 5120k          2   

Google Software Updater fuckups

There are several ways to disable the ksfetch daemon (ks = Keystone) that ships with Google products.

  1. Uninstall the Google Software Update Agent
        $ /Library/Google/GoogleSoftwareUpdate/GoogleSoftwareUpdate.bundle/Contents/Resources/GoogleSoftwareUpdateAgent.app/Contents/Resources/ksinstall [--nuke]
        

    The --nuke parameter will also remove the ksfetch-related files.

  2. Set the checkInterval to the maximum (24h = 86400s). The default is 5h (18000s)
        $ defaults read com.google.Keystone.Agent checkInterval
        $ defaults write com.google.Keystone.Agent checkInterval 86400
        

ejabberd + letsencrypt (ssl config)

[...]
listen: 
  - 
    port: 5222
    module: ejabberd_c2s
    certfile: "/etc/ejabberd/ejabberd.pem"
    starttls: true
    starttls_required: true
    protocol_options:
      - "no_sslv2"
      - "no_sslv3"
      - "no_tlsv1"
      - "no_tlsv1_1"
    ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
    dhfile: "/etc/ejabberd/dh2048.pem"
    [...]
  - 
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    protocol_options:
      - "no_sslv2"
      - "no_sslv3"
      - "no_tlsv1"
      - "no_tlsv1_1"

[...]
s2s_use_starttls: required
s2s_certfile: "/etc/ejabberd/ejabberd.pem"
s2s_dhfile: "/etc/ejabberd/dh2048.pem"
s2s_ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"

s2s_protocol_options:
  - "no_sslv2"
  - "no_sslv3"
  - "no_tlsv1"
  - "no_tlsv1_1"

Links: https://docs.ejabberd.im/admin/guide/configuration/

RHEV/ovirt – can’t switch SPM role – async_tasks are stuck

On the host with the SPM role

$ vdsClient -s 0 getAllTasksStatuses
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'feb3aaa5-ec1c-42a6-8f17-f7c94891b43f': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '631fd441-0955-49da-9376-1cba24764aa7', 'taskResult': 'success', 'taskState': 'finished'}, 'b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '67e1a2e8-3747-43fa-b0dd-fc469a6f6a02', 'taskResult': 'success',
'taskState': 'finished'}}}

On the RHEV/ovirt manager

$ for i in b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f feb3aaa5-ec1c-42a6-8f17-f7c94891b43f; do psql --dbname=engine --command="DELETE FROM async_tasks WHERE vdsm_task_id='${i}'"; done
$ for j in b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f feb3aaa5-ec1c-42a6-8f17-f7c94891b43f; do vdsClient -s 0 clearTask ${j}; done

RHEV/ovirt – find stuck / zombie tasks

Random notes

$ vdsClient -s 0 getAllTasksStatuses
$ vdsClient stopTask <taskid>
$ vdsClient clearTask <taskid>
$ su - postgres
$ psql -d engine -U postgres
> select * from job order by start_time desc;
> select DeleteJob('702e9f6a-e2a3-4113-bd7d-3757ba6bc4ef');

or

/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select * from job;"