Eugene de Beste

A Blog About My Personal Experiences in Tech.

Testing Multi-Machine Ansible Roles with Travis-CI and Docker Compose

A little while ago I spent some time writing various Ansible roles and playbooks for the infrastructure at my place of work. My Ansible skills are intermediate and by no means refined, so many of the roles were not developed according to best practices.

I took some time to improve my roles and test them properly before using them, taking advantage of the free continuous integration service that Travis-CI offers. I quickly ran into an issue while working on my role for deploying the Slurm scheduler: how do I test a multi-machine deployment if I only have one VM to work on?

The answer? Docker and Docker Compose!
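As a rough illustration of the idea, the Python sketch below generates a Compose file describing one Slurm controller and a configurable number of compute containers on a shared network, plus a matching Ansible inventory that reaches each container through Ansible's docker connection plugin. The service names (slurm-ctl, slurm-nodeN), the network name and the image are hypothetical placeholders rather than anything taken from the actual role, and the image would need Python installed for most Ansible modules to work.

```python
import json

# Hypothetical placeholder image; it must contain Python for most Ansible modules to run.
IMAGE = "example/ansible-test-node:latest"


def build_compose(num_compute: int, image: str = IMAGE) -> dict:
    """Return a docker-compose (v3) structure: one controller plus num_compute nodes."""
    names = ["slurm-ctl"] + [f"slurm-node{i}" for i in range(1, num_compute + 1)]
    services = {}
    for name in names:
        services[name] = {
            "image": image,
            "container_name": name,  # fixed name so the inventory can address it
            "hostname": name,  # Slurm daemons identify nodes by hostname
            "command": ["sleep", "infinity"],  # keep the container running for Ansible
            "networks": ["cluster"],  # shared network so the nodes can reach each other
        }
    return {"version": "3", "services": services, "networks": {"cluster": {}}}


def build_inventory(compose: dict) -> str:
    """Return an INI inventory that reaches each container via the docker connection plugin."""
    names = list(compose["services"])
    lines = ["[controller]", f"{names[0]} ansible_connection=docker", "", "[compute]"]
    lines += [f"{name} ansible_connection=docker" for name in names[1:]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    compose = build_compose(num_compute=2)
    # JSON is valid YAML, so this output can be saved as docker-compose.yml.
    print(json.dumps(compose, indent=2))
```

Saving the printed JSON as docker-compose.yml and running docker-compose up -d would start the containers; writing build_inventory(compose) to a file such as inventory.ini then lets ansible-playbook treat each container as a separate machine on a single VM.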


OpenStack backed with Ceph Erasure Coded Pools

If you’re impatient, skip to the solution section.

Over the last few months I’ve been working with the University of Cape Town on the Ilifu research cloud project. The focus for the initial release of the cloud is mainly to provide compute and storage to astronomy and bioinformatics use cases.

The technology powering this cloud is the increasingly popular combination of OpenStack (Queens release) as the virtualisation platform and Ceph (Luminous release) as the storage backend. We’re using the Kolla and Kolla-ansible projects to deploy the OpenStack side of things. I am the lead on the Ceph deployment and opted to deploy it with Ceph-ansible.

We ran into some issues getting the OpenStack services to work on the Ceph cluster when using erasure coded pools…
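Without reproducing the full post, one commonly documented way to put RBD-backed OpenStack services on erasure coded pools in Luminous is to keep image data in an EC pool with overwrites enabled, keep image metadata in a replicated pool, and point the client at the data pool with the rbd default data pool option. The Python sketch below only assembles the relevant ceph CLI commands and a ceph.conf client snippet as strings (a dry run; nothing is executed). The pool, profile and client names, the k/m values and the PG count are hypothetical, and this is not necessarily the fix the full post arrives at.

```python
import shlex


def rbd_on_ec_commands(ec_pool: str, meta_pool: str, profile: str,
                       k: int, m: int, pg_num: int) -> list:
    """Build (but do not run) ceph CLI commands for RBD data on an EC pool."""
    cmds = [
        # Erasure code profile: k data chunks + m coding chunks, spread across hosts.
        ["ceph", "osd", "erasure-code-profile", "set", profile,
         f"k={k}", f"m={m}", "crush-failure-domain=host"],
        ["ceph", "osd", "pool", "create", ec_pool, str(pg_num), str(pg_num),
         "erasure", profile],
        # RBD overwrites data in place; EC pools only permit that with this flag (BlueStore OSDs).
        ["ceph", "osd", "pool", "set", ec_pool, "allow_ec_overwrites", "true"],
        # RBD image metadata (headers, omap) cannot live in an EC pool, so use a replicated one.
        ["ceph", "osd", "pool", "create", meta_pool, str(pg_num), str(pg_num),
         "replicated"],
        ["ceph", "osd", "pool", "application", "enable", ec_pool, "rbd"],
        ["ceph", "osd", "pool", "application", "enable", meta_pool, "rbd"],
    ]
    return [shlex.join(cmd) for cmd in cmds]


def client_conf_snippet(client: str, ec_pool: str) -> str:
    """ceph.conf section telling a client to place new RBD image data in ec_pool."""
    return f"[client.{client}]\nrbd default data pool = {ec_pool}\n"


if __name__ == "__main__":
    # Hypothetical names: "volumes" holds Cinder image metadata, "volumes-data" the EC data.
    for line in rbd_on_ec_commands("volumes-data", "volumes", "ec-4-2", 4, 2, 128):
        print(line)
    print(client_conf_snippet("cinder", "volumes-data"))
```

Splitting data and metadata this way keeps the services pointed at their usual replicated pool names, while the bulk of the bytes lands in the more space-efficient EC pool.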


Using CWL and Toil to Wrap an Ad-hoc Astronomy Data Processing Pipeline

Blasting Students with Science

I was recently invited to give a workshop on reproducible scientific workflows to students as part of the Inter-university Institute for Data Intensive Astronomy’s (IDIA) “JEDI” programme. The overall purpose of the workshop was to introduce students from across the African continent to various topics in the data science space, with a large focus on machine learning.

This post details some of my experiences preparing the original pipeline, wrapping it with CWL, and teaching others how to do the same.


DNS Woes With NGINX Reverse Proxy

While configuring some of the internal services that we host for external access through our NGINX proxy VM, I started noticing some strange behaviour. Every once in a while, when requesting a page that was being passed through the proxy, the proxy server would respond with a 502 Bad Gateway. It turned out that there were some issues with NGINX’s resolver. I’ll detail how I fixed it below.


Generating Let's Encrypt Wildcard Certificates

With the recent release of Let’s Encrypt’s ACMEv2 protocol implementation, they’ve gained the ability to supply SSL certificates not only for single domains but also for all subdomains, via wildcard certificates. I’d been keen to switch from our previous CA to Let’s Encrypt once their wildcard support dropped, because the platform’s automation capabilities make certificate renewal significantly easier. This blog post describes how to generate a wildcard certificate using Certbot.
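Wildcard certificates from Let’s Encrypt can only be validated with the DNS-01 challenge, where you publish a TXT record named _acme-challenge under the base domain. Certbot computes the record value for you; as a sketch of what that value actually is under the ACME specification (RFC 8555), the snippet below derives it from a key authorization, which is the challenge token joined by a dot to the base64url-encoded JWK thumbprint of the account key. The token and thumbprint strings are made-up placeholders.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url encoding without '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(key_authorization: str) -> str:
    """TXT record value: base64url(SHA-256(key authorization))."""
    return b64url(hashlib.sha256(key_authorization.encode("ascii")).digest())


def challenge_record_name(domain: str) -> str:
    """Name of the TXT record; a wildcard is validated on its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


if __name__ == "__main__":
    # Placeholder token and account-key thumbprint, for illustration only.
    key_auth = "example-token" + "." + "example-thumbprint"
    print(challenge_record_name("*.example.com"), "TXT", dns01_txt_value(key_auth))
```

Because the wildcard and the bare domain share the same record name, requesting both *.example.com and example.com in one certificate means publishing two TXT values under _acme-challenge.example.com at the same time.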


Installing Ubuntu Server 16.04 on Hardware Backed Software RAID1 (FakeRAID)

I encountered an issue booting Ubuntu Server 16.04 after installing it on a SuperMicro server that had been configured with software RAID 1. The installer would detect that a RAID environment was active and the installation would complete successfully, but when booting into the newly installed system I was presented with a blank screen and a blinking cursor. GRUB wasn’t even loading.

To solve this issue I booted into a live CD of Ubuntu 16.04 and did the following from the terminal:


Understanding Ceph Placement Groups (TOO_MANY_PGS)

The Issue

My first foray into Ceph was at the end of last year. We had a small 72TB cluster that was split across 2 OSD nodes. I was tasked with upgrading the cluster’s Ceph release from Jewel to Luminous, so that we could try out the new BlueStore storage backend, and with adding two more OSD nodes, which brought us up to a humble 183TB.

After the upgrade was complete, I noticed that the Ceph dashboard and the ceph -s command were reporting the following warning:

Health check update: too many PGs per OSD (232 > max 200) (TOO_MANY_PGS) 
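The number in that warning is, roughly, the total count of PG replicas divided by the number of OSDs, and Luminous compares it against the mon_max_pg_per_osd option (200 by default in Luminous, hence "max 200"). The Python sketch below reproduces that arithmetic along with the classic sizing rule of thumb from the Ceph documentation (about 100 PGs per OSD, divided by the replica count and rounded up to a power of two). The pool layout in the example is hypothetical and is not the cluster described in this post.

```python
import math


def pgs_per_osd(pools, num_osds):
    """Average PG replicas per OSD. pools holds (pg_num, size) pairs;
    for an erasure coded pool, size is k + m."""
    return sum(pg_num * size for pg_num, size in pools) / num_osds


def suggested_total_pg_num(num_osds, size, target_per_osd=100):
    """Rule of thumb: (OSDs * target) / size, rounded up to a power of two.
    The result is a budget shared by all pools, not a per-pool value."""
    raw = num_osds * target_per_osd / size
    return 2 ** math.ceil(math.log2(raw))


if __name__ == "__main__":
    # Hypothetical cluster: 20 OSDs, three replicated pools of 512 PGs each (size 3).
    pools = [(512, 3)] * 3
    print(pgs_per_osd(pools, 20))  # 230.4, above a limit of 200
    print(suggested_total_pg_num(20, 3))  # 1024
```

The key point the arithmetic makes is that every pool draws from the same per-OSD budget, so several pools that each look reasonably sized on their own can still push the cluster over the threshold together.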


Removing CephFS from a Ceph Cluster (Luminous)

While upgrading the packages for the Ceph cluster at SANBI, I encountered an issue where the Ceph MDS daemon was causing the CephFS filesystem to become unresponsive and stuck in the active(laggy) state. I decided to strip down the CephFS deployment and reinstall it, since the existing one was for testing (set up before my time) and I wanted to go through the process of setting it up from scratch.


Installing FreeIPA 4 on an Ubuntu Environment

At SANBI we’ve been using an old combination of OpenLDAP, Kerberos and nsswitch to provide LDAP-backed user accounts with NFS directories on our virtual machines and HPC cluster. This was originally put in place to make authentication into machines easier and to allow users to access and use the cluster without manually setting up directories and user accounts. Over time this setup has grown messy and become more effort to maintain than it is worth.


Hello World!

Welcome to my blog

I understand that that’s an incredibly unoriginal start to a blog, but hey, I’m not the most creative kid on the block.

So, welcome to my blog. This will be used for my technology-related musings and may occasionally include some non-tech topics as well.
