Migrating my private infrastructure to Nomad



During my off time I like to tinker with new technologies, and one that wasn't on my radar until the beginning of the year was HashiCorp Nomad.

It's a workload orchestrator: a tool that gets my applications and websites deployed on servers and makes that process as simple as possible. You might already know Kubernetes, which does essentially the same thing, just with more bells and whistles than Nomad. I've used Kubernetes extensively in my day job and decided it wasn't a good fit for my small-scale deployments and workloads.

Most of my websites, including this one, are approached very pragmatically. I have one Ansible script for updates and for installing basic tools on the target servers, and then a single Bash script to install and update the applications. This didn't "scale" well, however, since there was always a cognitive load associated with switching from one project to another. On top of that, I wasn't using the servers very efficiently, since I essentially had one website/application per server. Those costs add up, and there surely must be a better way: orchestration. Reducing cognitive load was, however, the most important problem to tackle.

Project building blocks

Usually when starting a new project - as an IT guy - there are a few "housekeeping" tasks that need to be done if it is to be considered a long-term project. Below are some of the steps, in no particular order:

  • Deciding how to name the project
  • Buying a domain
  • Configuring the domain's DNS
  • Getting an SSL certificate (up for debate, but I like to use TLS everywhere)
  • Configuring a load balancer and node
  • Setting up a Git repository
  • Deploying a first version (manually)
  • Creating a deployment automation job (GitHub Action)
  • Making a first PR/code change and updating the app/website
  • Running a backup job (backup & replay)

I usually keep track of these things in a TODO.md file in the repository so I know what to do when I come back to the project the next weekend. It is crucial, however, to get the tasks above done as fast as possible, since they provide zero validation of the MVP/end product and are just there to make life easier.

Many times I did all of the above and then considered the project "done". That was often a mistake. The most important metrics are the time to first deployment and the time to update, and I try to minimize both with this approach.

With a growing number of projects, things get hairy, especially when updating the code or the servers themselves. I have to document and remember the file structure of every server, where the configs are, and so on... It quickly gets frustrating and is not a good use of my time.

I needed a way to automate most of it (except the inherently manual parts, like buying the domain).

Everything that follows was done on Hetzner Cloud.

Getting started with Nomad

Getting up and running with Nomad is quite simple. The steps can be summed up as: download the nomad binary, edit a configuration file, and start a systemd service. That's it.

There is, however, a little bit of thought that needs to go into designing a Nomad cluster.

A Nomad cluster is made of Servers and Clients. The Nomad Servers are "control" units that handle scheduling and other tasks within the cluster. The Nomad Clients are "worker" nodes on which the workloads are scheduled and executed. HashiCorp recommends having a minimum of 3 Nomad Servers (or, more generally, an odd number of them, because of the Raft consensus algorithm the cluster uses).
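To give an idea of how little configuration this takes, here is a minimal sketch of the agent configuration for both roles (the paths and datacenter name are illustrative assumptions, not my exact setup):

# /etc/nomad.d/server.hcl (on the 3 server nodes)
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3 # matches the recommended number of servers
}

# /etc/nomad.d/client.hcl (on the worker nodes)
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

client {
  enabled = true
}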

In order to make deploying and updating the infrastructure simple, I created two Packer templates: one for generating a nomad-server image and one for the nomad-client image. Both images ship with packages like the CNI plugins, Falco (cluster security), Nomad, Consul, Vault, and others (more on those later). To update a node, I just create a new node from the new image and add it to the cluster, while draining and deleting the old ones. This has been practical for updating Nomad versions and for installing security patches.
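As a rough sketch, such a Packer template with the Hetzner Cloud plugin can look like this; the base image, server type, location and provisioning script below are assumptions for illustration:

packer {
  required_plugins {
    hcloud = {
      source  = "github.com/hetznercloud/hcloud"
      version = ">= 1.0.0"
    }
  }
}

source "hcloud" "nomad-server" {
  image         = "ubuntu-22.04" # assumed base image
  server_type   = "cx22"         # assumed instance type
  location      = "nbg1"
  snapshot_name = "nomad-server"
  ssh_username  = "root"
}

build {
  sources = ["source.hcloud.nomad-server"]

  # Hypothetical script that installs Nomad, Consul, Vault,
  # the CNI plugins and Falco on the image
  provisioner "shell" {
    script = "provision-nomad-server.sh"
  }
}

Draining a node before deleting it is then a single command: nomad node drain -enable <node-id>.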

Once this configuration is done, I create 3 servers using the nomad-server image and 1 client using the nomad-client image and see what happens. This could have been done with Terraform, but I haven't bothered with that yet.
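If I ever do, a minimal sketch with the hcloud Terraform provider would look roughly like this (names, sizes and the snapshot references are placeholders; in practice a snapshot is looked up by ID or via a data source):

terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

resource "hcloud_server" "nomad_server" {
  count       = 3
  name        = "nomad-server-${count.index}"
  image       = "nomad-server" # the snapshot built by Packer
  server_type = "cx22"
  location    = "nbg1"
}

resource "hcloud_server" "nomad_client" {
  name        = "nomad-client-0"
  image       = "nomad-client"
  server_type = "cx22"
  location    = "nbg1"
}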

Why go through all this trouble though?

Creating this cluster now gives me the ability to run commands like:

nomad job run new-ghost-blog.hcl

nomad job run new-wordpress-blog.hcl

and have the DNS, SSL, load balancing and more automatically configured.

Standardizing deployments

In order to run workloads on the Nomad cluster I need to write Nomad jobs. A Nomad job contains all the information required to get something running in the cluster.

Below is an example of a Nomad job that runs code-server in the cluster, which is essentially VS Code in the browser:

job "code-server" {
  region      = "global"
  datacenters = ["dc1"]
  type        = "service"

  group "code-server" {
    count = 1

    network {
      port "http" {
        to = 8443
      }

    }

    service {
      name = "code-server"
      port = "http"

      check {
        type     = "http"
        path     = "/"
        interval = "2s"
        timeout  = "30s"
      }
    }

    task "code-server-runner" {
      driver = "docker"

      config {
        image        = "linuxserver/code-server"
        ports = ["http"]
      }

      env {
        PGID=1000 
        PUID=1000
      }


      resources {
        cpu    = 1000
        memory = 1000
      }
    }
  }
}

It contains all the information that I need:

  • Which Docker image is being used
  • Which ports are open
  • Which health checks are performed
  • Which environment variables are set
  • and more...

And all of this at a glance. This is a very simple job example, but it already sold me over docker-compose.

To get this job running, the only command needed is nomad job run code-server.hcl, and I'm in business. If I want to update it, I work on the Docker image locally, push a new one, and redeploy. That's it.

This of course only scratches the surface and might not be the ideal pitch for Nomad, but picture this:

The code-server app (or WordPress or Ghost blog) deployment needs to be exposed to the internet through a URL, which means we need:

  • A load balancer (HAProxy or Nginx, itself a Nomad job)
  • SSL certs (certbot, also a Nomad job)
  • Some way to connect the app to all of the above (Nomad configuration contained in the deployment)
  • Volumes (also Nomad configuration)

That was a bit fast, but the point is this: by writing a single configuration for the code-server app, it gets wired up to all the other services in the cluster automatically. The load balancer picks up the new service, gets a Let's Encrypt certificate for it, and updates the Hetzner DNS; I'm in business, and I can reuse this configuration for another app/job.
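To give an idea of how the load balancer side can pick up services automatically, here is a minimal sketch of a template stanza inside an Nginx load balancer job that renders an upstream from the Consul catalog; the domain and file paths are assumptions:

task "nginx" {
  driver = "docker"

  config {
    image   = "nginx:stable"
    ports   = ["http"]
    volumes = ["local/code-server.conf:/etc/nginx/conf.d/code-server.conf"]
  }

  # Re-renders whenever the code-server service changes in Consul,
  # then reloads Nginx with a signal
  template {
    destination   = "local/code-server.conf"
    change_mode   = "signal"
    change_signal = "SIGHUP"

    data = <<EOF
upstream code-server {
{{- range service "code-server" }}
  server {{ .Address }}:{{ .Port }};
{{- end }}
}

server {
  listen      80;
  server_name code.example.com; # assumed domain

  location / {
    proxy_pass http://code-server;
  }
}
EOF
  }
}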

This was a huge productivity win, since I can now focus on the application itself. Backups are also just that: Nomad (periodic/cron) jobs.
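A backup job is then just a batch job with a periodic stanza. A minimal sketch (the schedule and the actual backup command are placeholders):

job "backup" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron             = "0 3 * * *" # every night at 03:00
    prohibit_overlap = true
  }

  group "backup" {
    task "dump" {
      driver = "docker"

      config {
        image   = "alpine:3"
        command = "/bin/sh"
        args    = ["-c", "echo 'run the actual backup command here'"]
      }
    }
  }
}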

The added benefit is that I can now host multiple applications on the same server, thus cutting costs by a significant factor.

Advanced example

(tbd)

GitHub Actions

I have a self-hosted GitHub runner that is connected to the cluster and deploys the jobs. The GitHub Actions workflows are extremely readable and essentially consist of one command: nomad job run job.hcl.

So far this setup has been a breath of fresh air and is quite simple for a one-man operation.

Further Notes

That's it for now. I've barely scratched the surface of what Nomad can offer, but I'll hopefully find more time to elaborate on some of the cool things (the Vault & Consul integrations) another time.