[notepad] ceph journal size/ssd speed

ceph journal size (doc)

osd journal size = {2 * (expected throughput * filestore max sync interval)}

The default for filestore max sync interval is 5 (seconds), therefore for a 10Gbit network (~1280 MB/s) the “perfect” size would be

osd journal size = { 2 * ( 1280 MB/s * 5 s ) } = 12800 MB ≈ 12.5 GB
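
A quick shell sketch of the same calculation (values as assumed above):

net_mbs=1280        # ~10 Gbit network expressed in MB/s (assumed)
sync_interval=5     # filestore max sync interval (default), in seconds
echo "osd journal size = $((2 * net_mbs * sync_interval)) MB"    # -> 12800 MB ~ 12.5 GB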

ceph ssd speed (journal)

The optimum would be the sum of all disk sequential write speeds – 11 disks with ~110 MB/s = ~1210 MB/s – an Intel P3520 might fit.

How many journals per ssd?

Oh, that's easy.

Journals = (ssd seq write speed) / (hdd seq write speed)

Journals = 1350 / 115 = ~11

(For the Intel P3520 with 11 hdds)
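
The same as a shell sketch (datasheet numbers as assumed above):

ssd_write=1350      # MB/s sequential write of the SSD (Intel P3520, assumed)
hdd_write=115       # MB/s sequential write per HDD (assumed)
echo "journals per ssd = $((ssd_write / hdd_write))"    # -> 11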

ceph – setting up rbd-mirror between two ceph clusters

Environment
2x ceph cluster (AIO) running CentOS 7.2 with ceph jewel. Added a 2nd crush rule to both clusters:

rule rep_osd {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}

(ceph crush map)

Setup

Install the rbd-mirror package on both sides. Technically it can run on any host, even one that is not part of the cluster.

[root@ceph01 ~]# yum install -y rbd-mirror
[root@ceph04 ~]# yum install -y rbd-mirror
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: disabled
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: disabled

Check that the cluster name is set. All systemd unit files include that file during startup.

[root@ceph01 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=primary
[root@ceph04 ~]# grep -i cluster /etc/sysconfig/ceph 
CLUSTER=secondary

Create a key on both clusters which is able to access (rwx) the pool. (ceph authorization (caps))

[root@ceph01 ~]# ceph --cluster primary auth get-or-create client.primary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/primary.client.primary.keyring
[root@ceph04 ~]# ceph --cluster secondary auth get-or-create client.secondary mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' -o /etc/ceph/secondary.client.secondary.keyring

Enable pool mirroring and verify that it is active.

[root@ceph01 ~]# rbd --cluster primary mirror pool enable rbd pool
[root@ceph01 ~]# rbd --cluster primary mirror pool info
Mode: pool
Peers: none
[root@ceph04 ~]# rbd --cluster secondary mirror pool enable rbd pool
[root@ceph04 ~]# rbd --cluster secondary mirror pool info
Mode: pool
Peers: none

Copy the keys and configs between the clusters. The rbd-mirror in the primary cluster requires the key from the secondary and vice versa.

[root@ceph01 ~]# scp /etc/ceph/primary.client.primary.keyring /etc/ceph/primary.conf root@ceph04:/etc/ceph/
primary.client.primary.keyring
primary.conf
[root@ceph04 ~]# scp /etc/ceph/secondary.client.secondary.keyring /etc/ceph/secondary.conf root@ceph01:/etc/ceph/
secondary.client.secondary.keyring  
secondary.conf

Enable/start the ceph-rbd-mirror – extend the unit name with the local cluster name.

[root@ceph01 ceph]# systemctl start ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl start ceph-rbd-mirror@secondary
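
To have the daemons come back after a reboot, the units can also be enabled (assuming the same instance names as above):

[root@ceph01 ceph]# systemctl enable ceph-rbd-mirror@primary
[root@ceph04 ceph]# systemctl enable ceph-rbd-mirror@secondary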

Add the remote cluster as a peer. Example: client.secondary represents the key name and @secondary the cluster name. That means rbd-mirror is looking for a key like /etc/ceph/secondary.client.secondary.keyring.

[root@ceph01 ceph]# rbd --cluster primary mirror pool peer add rbd client.secondary@secondary 
49c28a78-ef7d-4f12-b003-7ce69f091b85
[root@ceph04 ceph]# rbd --cluster secondary mirror pool peer add rbd client.primary@primary
02053868-7dd7-4029-b287-53a205fdd668

That's it! Now create an rbd image and activate the exclusive-lock and journaling features. (man 8 rbd)

[root@ceph01 ceph]# rbd --cluster primary create test-1 --size 5M --image-feature exclusive-lock,journaling
[root@ceph01 ceph]# rbd --cluster primary create test-2 --size 5M --image-feature exclusive-lock,journaling

The goal: test-1 should be active (primary) on the primary cluster, test-2 on the secondary cluster. Demote/promote accordingly:

[root@ceph04 ceph]# rbd --cluster secondary mirror image demote rbd/test-1
[root@ceph01 ceph]# rbd --cluster primary mirror image promote rbd/test-1

[root@ceph01 ceph]# rbd --cluster primary mirror image demote rbd/test-2
[root@ceph04 ceph]# rbd --cluster secondary mirror image promote rbd/test-2
[root@ceph01 ceph]# rbd --cluster primary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:07

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=4, entry_tid=3], mirror_position=[object_number=3, tag_tid=4, entry_tid=3], entries_behind_master=0
  last_update: 2016-10-14 14:49:09

[root@ceph01 ceph]# rbd --cluster primary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2           
test-2 5120k          2      excl 
[root@ceph04 ceph]# rbd --cluster secondary mirror pool status --verbose
health: OK
images: 2 total
    1 replaying
    1 stopped

test-1:
  global_id:   ed021ec4-2a44-4b9f-9efa-10590ffcb916
  state:       up+replaying
  description: replaying, master_position=[object_number=0, tag_tid=3, entry_tid=0], mirror_position=[object_number=0, tag_tid=3, entry_tid=0], entries_behind_master=0
  last_update: 2016-10-14 14:49:21

test-2:
  global_id:   d99bbff5-14fb-4e07-a596-69e55608f14a
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2016-10-14 14:49:21

[root@ceph04 ceph]# rbd --cluster secondary ls -l
NAME    SIZE PARENT FMT PROT LOCK 
test-1 5120k          2      excl 
test-2 5120k          2   

ejabberd + letsencrypt (ssl config)
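
The ejabberd.pem is a single file that contains the private key plus the full certificate chain. With letsencrypt it can be assembled roughly like this (live path and domain are assumptions, adjust to your setup); the dhfile is created with openssl:

cat /etc/letsencrypt/live/example.com/privkey.pem /etc/letsencrypt/live/example.com/fullchain.pem > /etc/ejabberd/ejabberd.pem
openssl dhparam -out /etc/ejabberd/dh2048.pem 2048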

[...]
listen: 
  - 
    port: 5222
    module: ejabberd_c2s
    certfile: "/etc/ejabberd/ejabberd.pem"
    starttls: true
    starttls_required: true
    protocol_options:
      - "no_sslv2"
      - "no_sslv3"
      - "no_tlsv1"
      - "no_tlsv1_1"
    ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
    dhfile: "/etc/ejabberd/dh2048.pem"
    [...]
  - 
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    protocol_options:
      - "no_sslv2"
      - "no_sslv3"
      - "no_tlsv1"
      - "no_tlsv1_1"

[...]
s2s_use_starttls: required
s2s_certfile: "/etc/ejabberd/ejabberd.pem"
s2s_dhfile: "/etc/ejabberd/dh2048.pem"
s2s_ciphers: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"

s2s_protocol_options:
  - "no_sslv2"
  - "no_sslv3"
  - "no_tlsv1"
  - "no_tlsv1_1"

Links: https://docs.ejabberd.im/admin/guide/configuration/

RHEV/ovirt – can’t switch SPM role – async_tasks are stuck

On the host with the SPM role

$ vdsClient -s 0 getAllTasksStatuses
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'feb3aaa5-ec1c-42a6-8f17-f7c94891b43f': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '631fd441-0955-49da-9376-1cba24764aa7', 'taskResult': 'success', 'taskState': 'finished'}, 'b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '67e1a2e8-3747-43fa-b0dd-fc469a6f6a02', 'taskResult': 'success',
'taskState': 'finished'}}}

On the RHEV/ovirt manager

$ for i in b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f feb3aaa5-ec1c-42a6-8f17-f7c94891b43f; do psql --dbname=engine --command="DELETE FROM async_tasks WHERE vdsm_task_id='${i}'"; done
$ for j in b4fe0c6d-d458-4ed2-a9e2-2c0d41914b8f feb3aaa5-ec1c-42a6-8f17-f7c94891b43f; do vdsClient -s 0 clearTask ${j}; done
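
Afterwards both sides should be clean again; a quick check (sketch, the table layout may differ between versions):

$ vdsClient -s 0 getAllTasksStatuses
$ psql --dbname=engine --command="SELECT * FROM async_tasks;"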

RHEV/ovirt – find stuck / zombie tasks

Random notes

$ vdsClient -s 0 getAllTasksStatuses
$ vdsClient stopTask <taskid>
$ vdsClient clearTask <taskid>
$ su - postgres
$ psql -d engine -U postgres
> select * from job order by start_time desc;
> select DeleteJob('702e9f6a-e2a3-4113-bd7d-3757ba6bc4ef');

or

/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select * from job;"

entropy inside a virtual machine

Sometimes my ceph-(test!)deployments inside a VM failed.

The problem is that the kernel/CPU cannot provide enough entropy (random numbers) for the ceph-create-keys command, so it stalls/hangs. It is not a ceph problem! The same can happen with ssl commands.

But first things first – we need to check the available entropy on a system:

cat /proc/sys/kernel/random/entropy_avail

The read-only file entropy_avail gives the available entropy.
Normally this will be 4096 (bits), a full entropy pool (see man 4 random).

Values below 100-200 mean you have a problem!

For a virtual machine we can add a new device – virtio-rng. Here is an XML example for libvirt.

<rng model='virtio'>
  <backend model='random'>/dev/random</backend>
</rng>
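
The snippet belongs into the <devices> section of the domain XML; alternatively it can be attached with virsh (domain name and file name are just examples):

virsh attach-device testvm virtio-rng.xml --config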

That is ok for ONE virtual machine on the hypervisor. Usually there is more than one virtual machine, therefore we also need to install the rng-tools package in the virtual machines (rngd feeds the guest entropy pool from the virtio-rng device).

$pkgmgr install rng-tools
systemctl enable rngd
systemctl start rngd
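
Afterwards the entropy pool should fill up again; same check as above:

cat /proc/sys/kernel/random/entropy_avail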

That’s it! That solved a lot of my problems 😉

Openstack Horizon – leapyear bug

Switching the language in the dashboard ends with an error.

day is out of range for month

e.g. https://bugs.launchpad.net/horizon/+bug/1551099

[Mon Feb 29 09:20:05 2016] [error] Internal Server Error: /settings/
[Mon Feb 29 09:20:05 2016] [error] Traceback (most recent call last):
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/django/core/handlers/base.py", line 112, in get_response
[Mon Feb 29 09:20:05 2016] [error]     response = wrapped_callback(request, *callback_args, **callback_kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/horizon/decorators.py", line 36, in dec
[Mon Feb 29 09:20:05 2016] [error]     return view_func(request, *args, **kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/horizon/decorators.py", line 52, in dec
[Mon Feb 29 09:20:05 2016] [error]     return view_func(request, *args, **kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/horizon/decorators.py", line 36, in dec
[Mon Feb 29 09:20:05 2016] [error]     return view_func(request, *args, **kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/django/views/generic/base.py", line 69, in view
[Mon Feb 29 09:20:05 2016] [error]     return self.dispatch(request, *args, **kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/django/views/generic/base.py", line 87, in dispatch
[Mon Feb 29 09:20:05 2016] [error]     return handler(request, *args, **kwargs)
[Mon Feb 29 09:20:05 2016] [error]   File "/usr/lib64/python2.6/site-packages/django/views/generic/edit.py", line 171, in post
[Mon Feb 29 09:20:05 2016] [error]     return self.form_valid(form)
[Mon Feb 29 09:20:05 2016] [error]   File "/srv/www/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/dashboards/settings/user/views.py", line 38, in form_valid
[Mon Feb 29 09:20:05 2016] [error]     return form.handle(self.request, form.cleaned_data)
[Mon Feb 29 09:20:05 2016] [error]   File "/srv/www/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/dashboards/settings/user/forms.py", line 89, in handle
[Mon Feb 29 09:20:05 2016] [error]     expires=_one_year())
[Mon Feb 29 09:20:05 2016] [error]   File "/srv/www/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/dashboards/settings/user/forms.py", line 32, in _one_year
[Mon Feb 29 09:20:05 2016] [error]     now.minute, now.second, now.microsecond, now.tzinfo)
[Mon Feb 29 09:20:05 2016] [error] ValueError: day is out of range for month
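
The failing helper basically builds a timestamp one year ahead, which does not exist for February 29th. A rough reproduction (not the exact Horizon code):

$ python -c 'from datetime import datetime; now = datetime(2016, 2, 29, 9, 20, 5); datetime(now.year + 1, now.month, now.day)'
...
ValueError: day is out of range for month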

SUSE Openstack Cloud – sleshammer – pre/post scripts – pxe trigger

Enable root login for the sleshammer image

(it is used by the suse cloud as a hardware discovery image)

The sleshammer image mounts “/updates” over NFS from the admin node and executes control.sh. This script checks whether there are pre/post hooks and executes them.

root@admin:/updates # cat /updates/discovered-pre/set-root-passwd.hook
#!/bin/bash
echo "root" | passwd --stdin root

echo
echo
echo "ROOT LOGIN IS NOW ENABLED!"
echo
echo
sleep 10

Make sure that the hook is set executable!
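
For example:

root@admin:/updates # chmod +x /updates/discovered-pre/set-root-passwd.hook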

SUSE Openstack Cloud supports only pre and post scripts. discovered in the directory name is the node state; discovery or hardware-installed should also work.

BTW: You can also create custom control.sh-script (and also hooks) for a node!

mkdir /updates/d52-54-00-9e-a6-90.cloud.default.net/
cp /updates/control.sh /updates/d52-54-00-9e-a6-90.cloud.default.net/

Some random notes – discovery/install

default pxelinux configuration
(see http://admin-node:8091/discovery/pxelinux.cfg/)

DEFAULT discovery
PROMPT 0
TIMEOUT 10
LABEL discovery
  KERNEL vmlinuz0
  append initrd=initrd0.img crowbar.install.key=machine-install:34e4b23a970dbb05df9c91e0c1cf4b512ecaa7b839c942b95d86db1962178ead69774a9dc8630b13da171bcca0ea204c07575997822b3ec1de984da97fca5b84 crowbar.hostname=d52-54-00-8b-c2-17.cloud.default.net crowbar.state=discovery
  IPAPPEND 2

allocated node

The sleshammer-image will wait for this entry (.*_install) on the admin-node once you allocate a node.

DEFAULT suse-11.3_install
PROMPT 0
TIMEOUT 10
LABEL suse-11.3_install
  KERNEL ../suse-11.3/install/boot/x86_64/loader/linux
  append initrd=../suse-11.3/install/boot/x86_64/loader/initrd   crowbar.install.key=machine-install:34e4b23a970dbb05df9c91e0c1cf4b512ecaa7b839c942b95d86db1962178ead69774a9dc8630b13da171bcca0ea204c07575997822b3ec1de984da97fca5b84 install=http://192.168.124.10:8091/suse-11.3/install autoyast=http://192.168.124.10:8091/nodes/d52-54-00-8b-c2-17.cloud.default.net/autoyast.xml ifcfg=dhcp4 netwait=60
  IPAPPEND 2

Gentoo – initramfs with busybox, lvm and some more…

Preparations

mkdir -p /usr/src/initramfs/{bin,lib/modules,dev,etc,mnt/root,proc,root,sbin,sys}
cp -a /dev/{null,console,tty,sda*} /usr/src/initramfs/dev/

busybox

USE="static make-symlinks -pam -savedconfig" emerge --root=/usr/src/initramfs/ -av busybox

LVM
LVM already provides a static binary 🙂

cp /sbin/lvm.static /usr/src/initramfs/lvm
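
What is still missing is the /init script and packing the archive. A minimal sketch (volume group vg0 with a logical volume root is an assumption, adjust to your layout):

cat > /usr/src/initramfs/init << "EOF"
#!/bin/busybox sh
# mount the pseudo filesystems, activate lvm, switch to the real root
mount -t proc none /proc
mount -t sysfs none /sys
/lvm vgscan --mknodes
/lvm vgchange -a y
mount -o ro /dev/mapper/vg0-root /mnt/root
umount /sys /proc
exec switch_root /mnt/root /sbin/init
EOF
chmod +x /usr/src/initramfs/init

cd /usr/src/initramfs
find . -print0 | cpio --null -o -H newc | gzip -9 > /boot/initramfs.gz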

ldap initial configuration

A more or less initial configuration for openldap (>2.4)

##
# to import run:
# ldapmodify -Y EXTERNAL -H ldapi:/// -f $filename
#
# to verify run:
# ldapsearch -Y EXTERNAL -H ldapi:/// -b "olcDatabase={1}hdb,cn=config"
#
# to create a password:
# slappasswd -h {SSHA} -s admin
##

dn: olcDatabase={1}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=de
-
replace: olcAccess
olcAccess: {0}to attrs=userPassword,shadowLastChange by dn="cn=admin,dc=example,dc=de" write by anonymous auth by self write by * none
olcAccess: {1}to dn.base="" by * read
olcAccess: {2}to * by self write by dn="cn=admin,dc=example,dc=de" write by * read
-
replace: olcRootDN
olcRootDN: cn=admin,dc=example,dc=de
-
replace: olcRootPW
olcRootPW: {SSHA}4RHgrU6ghLqA21CNI8biQblHtEodToyd

TLS config

dn: cn=config
changetype: modify
add: olcTLSCipherSuite
olcTLSCipherSuite: AES128+EECDH:AES128+EDH
-
add: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/ssl/ca.crt
-
add: olcTLSCertificateFile
olcTLSCertificateFile: /etc/ssl/cert.crt
-
add: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/ssl/keyfile.key
-
add: olcTLSVerifyClient
# never - allow - try - demand
olcTLSVerifyClient: demand
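
To check that the attributes were applied, the same verify pattern as above works against the cn=config entry itself:

ldapsearch -Y EXTERNAL -H ldapi:/// -b "cn=config" -s base olcTLSCertificateFile olcTLSVerifyClient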

Refs
openldap – tls config
openldap – access