No-ops linux part 2: Hosting a simple container on a lean mean systemd machine

Published 2025-05-13
Tags: cloud, linux, ops, cdn, ansible, duckdb

This post is part of the series on no-ops linux deployment. The previous post covered local development of the linux server configuration and the essential base setup. This installment covers a janky podman installation and configures a reverse proxy to send traffic to a simple container deployment. The final post covers a more challenging deployment with jobs and rolling restarts, and discusses the strengths and weaknesses of this approach to hosting.

At the completion of the previous post, we had automatic installation of a functional Ubuntu server with the bare essentials installed. We did this by writing a base-install ansible role. There's still a missing ingredient before we can start deploying containers, though!

It's time to introduce ✨podman✨

podman is a tool for running containers. It's CLI-compatible with docker, and has deep and useful integrations with systemd, the init system on most modern Linux installations. I want systemd to manage my containers for me through podman's systemd units. That buys us nice features like auto-updating images, getting the logs into the journal where I can read them, restarting failed containers, and so on. podman will also run docker-compose files for you.

Ubuntu 24.04 (codename Noble) ships with podman 4.9, which is a year old and missing some features I want:

  • systemd template support for quadlet files (we'll get to this, don't worry)
  • Some limited support for using k8s YAML with podman (I know I said I wanted to avoid YAML, but this may come in handy)
  • Many quality of life improvements to quadlets

Ubuntu 25.04 (codename Plucky) has podman 5.4, which has everything I want, but 25.04 isn't a Long Term Support release and isn't available at every cloud provider.

It's a little dirty, but what I'll do is add the podman package from 25.04. This is not without risk; it could break things in the future. We'll have to hope that grizzled ops veterans avert their eyes and that the gods of fortune and luck are with us.

We'll add the 25.04 sources to apt, using the new-fangled and cool .sources format. I need to use a template for this because my laptop runs on arm, but I'm probably going to end up provisioning an x86 machine, and the package URIs differ between the two architectures. Let's create roles/podman/templates/plucky.sources.j2 with this content:

Types: deb
{% if ansible_architecture == 'aarch64' or ansible_architecture == 'arm64' %}
URIs: http://ports.ubuntu.com/ubuntu-ports
{% else %}
URIs: http://archive.ubuntu.com/ubuntu
{% endif %}
Suites: plucky plucky-updates
Components: main universe restricted
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

The {% if ... %} is jinja2 syntax, and ansible knows which architecture the machine has. But by just adding this source, we risk upgrading tons of packages to the 25.04 distribution, and we don't want that. We can limit the blast radius with a .pref file that pins only the packages we think we need to 25.04. I found a list at this issue that looks good. We'll add this to roles/podman/tasks/main.yml, which should hopefully fix it:

---
- name: Prefer plucky for podman
  copy:
    dest: /etc/apt/preferences.d/podman-plucky.pref
    content: |
      Package: podman buildah golang-github-containers-common crun libgpgme11t64 libgpg-error0 golang-github-containers-image catatonit conmon containers-storage
      Pin: release n=plucky
      Pin-Priority: 991

      Package: libsubid4 netavark passt aardvark-dns containernetworking-plugins libslirp0 slirp4netns
      Pin: release n=plucky
      Pin-Priority: 991

      Package: *
      Pin: release n=plucky
      Pin-Priority: 400

- name: Add plucky as a source
  template:
    dest: /etc/apt/sources.list.d/plucky.sources
    src: plucky.sources.j2

- name: Install podman
  apt:
    update_cache: true
    name:
      - podman
    state: present

Next, we'll need to add this to the roles list in initialize.yml. It should now look like this:

---
- name: Initialize Ubuntu host
  hosts: all
  become: true
  roles:
    - name: base-install
      vars:
        authorized_keys:
          - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE9K1p8B8FRCWJ0Ax4obDu+UsLzGgXDIdTYkCZ8FF54b
    - podman

Let's test it with vagrant provision, then check:

vagrant ssh -- podman --version
podman version 5.4.1

Fantastic. This role was much less work than the previous one!
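If you're curious whether the pin is doing what we hope, apt can also tell us where the package would come from. This is just a sanity check, and the exact output will depend on the package versions of the day:

vagrant ssh -- apt-cache policy podman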

We need to talk about caddy

I need my server to respond to HTTP and HTTPS, so we need something listening on ports 80 and 443 to route HTTP traffic to the applications running in containers. I've had lots of experience using nginx and HAProxy, which are both excellent products. But I want this setup to be easily reproducible and fire-and-forget, and I haven't found a great way to automate letsencrypt TLS certificates with these proxies. I have done it before; I just want something that's friendlier to me.

Caddy promises to do all the heavy lifting with almost no setup and seems to have modern defaults, with very little configuration required in general. I like that a lot, so I want to give it a go. A proxy is a part of the stack that we can easily replace later if we want. I hear traefik integrates with podman and letsencrypt too.

We could choose to run caddy in a container to make the host operating system even leaner. For now, I want to let unattended-upgrades deal with patching it, so we'll set it up with apt. This probably gives us older, more stable releases with fewer features. I think it should be easy to change our minds later if we discover something really cool in a release we don't have.

There are many ways to configure caddy, but it looks like using /etc/caddy/Caddyfile will be the quickest way to get started with the apt package. I'm going to need to proxy to several backends, and I don't want to centralize the configuration to this file, but thankfully it has an import directive. So this configuration here should do what I want:

{
	email robin@example.com
	servers {
		timeouts {
			read_body 5s
			read_header 5s
			write 5s
			idle 3m
		}
	}
}

import /etc/caddy/proxies.d/*

If you want to use this, please enter your actual email, or letsencrypt won't be able to reach you. Also check whether you want to adjust those timeouts; they are global. We set the read_body and read_header timeouts to low-ish values to make it a bit harder for mean clients to drain all our sockets. We'll put the configuration in roles/caddy/files/Caddyfile.
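Caddy can also check a configuration for errors without reloading anything. Since the apt package puts the caddy binary on the host, something like this should catch most mistakes before they reach the running service; it's a convenience, not part of the role:

caddy validate --config /etc/caddy/Caddyfile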

Next up, we need some more ✨tasks✨ in roles/caddy/tasks/main.yml:

---
- name: Install caddy
  apt:
    state: present
    update_cache: true
    name: caddy

- name: Ensure proxies.d
  file:
    dest: /etc/caddy/proxies.d/
    state: directory

- name: Set up global caddyfile
  copy:
    dest: /etc/caddy/Caddyfile
    src: Caddyfile
  notify: reload caddy

- name: Enable caddy
  systemd:
    name: caddy
    enabled: yes

You may have noticed that there's a notify: reload caddy instruction here, so we'll also need a handler in roles/caddy/handlers/main.yml:

---
- name: reload caddy
  systemd:
    name: caddy
    state: reloaded

Let's keep expanding our initialize.yml playbook and add the caddy role:

---
- name: Initialize Ubuntu host
  hosts: all
  become: true
  roles:
    - name: base-install
      vars:
        authorized_keys:
          - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE9K1p8B8FRCWJ0Ax4obDu+UsLzGgXDIdTYkCZ8FF54b
    - name: podman
    - name: caddy

vagrant provision is happy, so I am happy too. Later when we set up applications, we'll make them write their proxy configuration to /etc/caddy/proxies.d/appname, so we'll revisit caddy once we've got an application running.
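If you'd like to see what we actually ended up with before moving on, checking the service and the listening sockets from inside the box is a cheap sanity check (ss ships with Ubuntu as part of iproute2):

vagrant ssh -- systemctl status caddy --no-pager
vagrant ssh -- sudo ss -tlnp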

Behold, an empty server that could run something

We have all the pieces we need to run some sort of application now:

  • orchestration with systemd
  • containerization with podman
  • load balancing and HTTP routing with caddy

We did all this work, and all we have to show for it is that we've got a server listening on port 80 and 443, telling us nothing is found anywhere. That won't do at all.

Currently, I need to run two applications on here:

  • eugene-web, source code available here. This is written in Rust and checks PostgreSQL migration scripts for issues behind a simple API.
  • kollektivkart, source code available here. This is written in Python and DuckDB, and visualizes where delays occur in Norwegian public transit. This backend has a "data pond": it relies on around 40GB of data in an S3 bucket and runs jobs to keep it updated.

It makes the most sense to start deploying eugene-web, since it is a very basic stateless backend.

I want to isolate these services from one another at the linux-user level, so we'll need a user that can have systemd units. Since I foresee this being required for both the apps, we'll fill in the app-user role now.

To be an app-user or not to be

Since we do not want to install our applications as root-level systemd units, we have to jump through some hoops here. In particular, we need to enable linger for the users, so their systemd units can come up on reboots. Otherwise, a user must be logged in for their units to run. We also need to create the actual users, and we'll want to set up some authorized keys for them.

💡Linger is critical for this setup to work. The Arch Linux wiki has a technical explanation, look under section 2.2 Automatic start-up of systemd user instances.
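Once the role below has created a user, it should be possible to confirm the linger bit with loginctl. Assuming a user named eugene, which we'll create later in this post, it ought to print Linger=yes:

loginctl show-user eugene --property=Linger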

We'll also create the folders for where the systemd units and the quadlet definitions go. In total, we get this nice and cute roles/app-user/tasks/main.yml, but we'll probably find a reason to revisit and make it terrible later on:

---
- name: "Create {{ user }}"
  user:
    name: "{{ user }}"
    state: present
    shell: /bin/bash
    createhome: yes

- name: "Enable linger for {{ user }}"
  command: loginctl enable-linger {{ user }}
  
- name: "Configure authorized keys for {{ user }}"
  with_items: "{{ authorized_keys }}"
  authorized_key:
    user: "{{ user }}"
    key: "{{ item }}"
    state: present

- name: "Create quadlet home for {{ user }}"
  file:
    path: "/home/{{ user }}/.config/containers/systemd"
    state: directory
    mode: "0700"
    owner: "{{ user }}"
    group: "{{ user }}"

- name: "Create systemd units home for {{ user }}"
  file:
    path: "/home/{{ user }}/.config/systemd/user"
    state: directory
    mode: "0700"
    owner: "{{ user }}"
    group: "{{ user }}"


- name: "Create systemd wants home for {{ user }}"
  file:
    path: "/home/{{ user }}/.config/systemd/user/default.target.wants"
    state: directory
    mode: "0700"
    group: "{{ user }}"
    owner: "{{ user }}"

These definitions will be shared between the kollektivkart and the eugene backends, but we're not adding them to initialize.yml. Instead, we'll use the dependencies: key in roles/eugene/meta/main.yml to ensure that it includes an app-user role with a concrete variable for user.

Careful With That Lock, Eugene

Let's start by making the directory tree we'll need:

mkdir -p roles/eugene/{tasks,meta,defaults,templates,files}

Did you notice that we introduced two new subfolders in the role all at once? With no warning up front? Don't worry, we'll put something in those right away so we can discuss what they are for.

In roles/eugene/defaults/main.yml, we'll put this snippet:

user: eugene

This defines a value for the user parameter required by the app-user role, and allows whoever calls us to override it. Aren't we being nice to our future selves? 🙌

In roles/eugene/meta/main.yml, we'll put this:

dependencies:
  - role: app-user

This says that in order for the eugene role to work, the app-user role has to run first, with the same user variable. It means that when people ask for eugene, they also get app-user; they don't have to worry about remembering it.

I'll put this into roles/app-user/meta/main.yml too:

allow_duplicates: yes

Otherwise, when we make the next application role and make it depend on app-user, it'll think app-user has already done its thing, even if the {{ user }} variable is different.

Now we can try to modify initialize.yml:

---
- name: Initialize Ubuntu host
  hosts: all
  become: true
  roles:
    - name: base-install
      vars:
        authorized_keys:
          - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE9K1p8B8FRCWJ0Ax4obDu+UsLzGgXDIdTYkCZ8FF54b
    - name: podman
    - name: caddy
    - name: eugene

Oh no, vagrant provision tells us we forgot about the authorized keys:

TASK [app-user : Configure authorized keys for eugene] *************************
fatal: [default]: FAILED! => {"msg": "'authorized_keys' is undefined"}

We don't want to duplicate those, so we no longer want to set them directly on the base-install role. For now, we'll stick to inlining them into the playbook and think about this issue later:

---
- name: Initialize Ubuntu host
  hosts: all
  become: true
  vars:
    authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE9K1p8B8FRCWJ0Ax4obDu+UsLzGgXDIdTYkCZ8FF54b
  roles:
    - name: base-install
    - name: podman
    - name: caddy
    - name: eugene

💡We could change all the roles from objects back into strings now, but I would like to explicitly pass vars: to them at some point, so I won't.

Notice how we just rudely moved it up two levels, so that it became a global variable that all the roles can use. I would really prefer to pass variables like this explicitly to the roles. I think I'll still be able to sleep at night, though, since this is a basic setup for only myself.

Note that this means that the same set of keys will be used for both the admin and the app users. Probably not what we'd want if we were doing anything important! We could pass different sets of keys to base-install and eugene, so that someone could deploy eugene without being allowed sudo. But for now, vagrant provision is happy again, and it's time to make the eugene-specific tasks.
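Before we do that, here's roughly what the key split would look like if we ever want it. ssh-keygen does all the work, and the file name here is just an example:

ssh-keygen -t ed25519 -C "deploy key for eugene" -f ~/.ssh/eugene_deploy

The public half would then go into the authorized_keys we pass to the eugene role, while the admin key stays with base-install.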

What is this mythical quadlet anyway?

Finally getting to the fun part. By my count, that took around 260 lines of YAML. Feel free to take a break, you've earned it!

💡I don't have a better way of explaining what a quadlet is than saying it's a nice systemd wrapping around podman concepts like containers and networks. Under the hood, a systemd generator turns the quadlet files into regular systemd units, and it's handy to know about when things don't work as expected: run it by hand and it will either complain about errors or print the systemd units corresponding to your quadlets. To check all the quadlets in ~/.config/containers/systemd:
/usr/lib/systemd/system-generators/podman-system-generator --user --dryrun

We'll make a eugene.container quadlet now. Let's put it in roles/eugene/files/eugene.container:

[Unit]
Description=Eugene API for SQL migration validation
After=network.target

[Container]
Image=ghcr.io/kaaveland/eugene-web:latest
PublishPort=127.0.0.1:3000:3000
StopSignal=SIGKILL
AutoUpdate=registry

[Service]
SyslogIdentifier=eugene-web
CPUQuota=100%
MemoryMax=128M

[Install]
WantedBy=default.target

Here we're describing what we're running, then specifying in the [Container] section which image to run and where to publish the port. We're publishing it on the loopback address 127.0.0.1, port 3000, where caddy can find it. We're also telling podman which signal to use when stopping the container, and that we should auto-update it from the registry.

💡This [Container] section uses just a tiny subset of what we can set on it. Check the quadlet documentation to discover more fun settings, and the systemd documentation for more [Service] settings. You can read about AutoUpdate too; by default, the auto-update timer runs daily at midnight.
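One assumption worth checking: auto-updates happen via the podman-auto-update timer in the user's systemd instance, and I don't think it's enabled for new users out of the box. Something like this should enable it and show what would be updated, without actually touching anything:

systemctl --user enable --now podman-auto-update.timer
podman auto-update --dry-run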

I feel like the networking here is suboptimal, but I'll work on that some other time.

In the [Service] section, we set some limits for how many resources it can use. We'll allow eugene to use one whole CPU core and 128MB RAM, which is significantly more than what it really needs.
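Once the unit is loaded (we'll get to that shortly), systemd can report the limits it actually applied, which doubles as a check that the quadlet was picked up at all. Run this as the application user:

systemctl --user show eugene --property=MemoryMax --property=CPUQuotaPerSecUSec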

In the [Install] section, we're telling systemd that we want this thing to run on boot. Next, we're going to need a caddy configuration. Let's put that in roles/eugene/templates/eugene.caddy.j2:

api{{ env_suffix }}.kaveland.no {
    encode
    handle /eugene/app/* {
        reverse_proxy localhost:3000
    }
    log
}

There's a potential problem here that will require us to refactor later. If we want to publish other APIs on this hostname, we'll probably need to modify this file, which currently belongs to the eugene role. We'll accept this technical debt and move on with our lives for now. I decided to introduce an env_suffix variable here so that I can create more environments if I ever feel like it's too exciting to have only a single deployment. If env_suffix isn't defined, we'll get an error, so we need to pass it to this role. It seems... prudent for me to set up a machine on api-test.kaveland.no to check that everything works before I take over api.kaveland.no, so let's pass it from initialize.yml:

---
- name: Initialize Ubuntu host
  hosts: all
  become: true
  vars:
    authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE9K1p8B8FRCWJ0Ax4obDu+UsLzGgXDIdTYkCZ8FF54b
  roles:
    - name: base-install
    - name: podman
    - name: caddy
    - name: eugene
      vars:
        env_suffix: -test 

Let's tie it together in roles/eugene/tasks/main.yml:

---
- name: Set up eugene quadlet unit
  copy:
    dest: "/home/{{ user }}/.config/containers/systemd/eugene.container"
    owner: "{{ user }}"
    group: "{{ user }}"
    mode: "0600"
    src: eugene.container

- name: Reload systemd
  command: machinectl shell {{ user }}@ /bin/systemctl --user daemon-reload

- name: Enable eugene
  command: machinectl shell {{ user }}@ /bin/systemctl --user enable eugene

- name: Start eugene
  command: machinectl shell {{ user }}@ /bin/systemctl --user start eugene

- name: Configure reverse proxy
  template:
    dest: "/etc/caddy/proxies.d/eugene.caddy"
    src: eugene.caddy.j2
    owner: root
    group: root
    mode: "0644"
  notify: reload caddy

Here we're using machinectl to do systemd shenanigans because of DBus issues if we try to use become_user: "{{ user }}" with ansible. Ideally, we would like to run these commands with the ansible modules, but this is good enough for me. In here, we issue systemctl --user daemon-reload to make systemd discover our quadlet, then we enable and start it. Let's check if it's running:

ssh -p2222 eugene@localhost systemctl --user status eugene
× eugene.service - Eugene API for SQL migration validation
     Loaded: loaded (/home/eugene/.config/containers/systemd/eugene.container; generated)
     Active: failed (Result: exit-code) since Tue 2025-05-13 22:45:34 CEST; 3min 13s ago
   Duration: 57ms
    Process: 79280 ExecStart=/usr/bin/podman run --name systemd-eugene --cidfile=/run/user/1002/eugene.cid --replace --rm --cgroups=split --stop-signal SIGKILL --sdnotify=conmon -d --label io.containers.autoupdate=registry --publish 127.0.0.1:3000:3000  ghcr.io/kaaveland/eugene-web:latest (code=exited, status=1/FAILURE)
    Process: 79303 ExecStopPost=/usr/bin/podman rm -v -f -i --cidfile=/run/user/1002/eugene.cid (code=exited, status=0/SUCCESS)
   Main PID: 79280 (code=exited, status=1/FAILURE)
        CPU: 117ms

Bummer. The developer (me) hasn't built the eugene-web image with arm64 support (although eugene-cli ships arm64, x86 and even a .exe). That was dumb of me. But otherwise, this is working as intended:

 ssh -p2222 admin@localhost sudo reboot 0
 ssh -p2222 admin@localhost whoami
 admin
 ssh -p2222 eugene@localhost systemctl --user status eugene
× eugene.service - Eugene API for SQL migration validation
     Loaded: loaded (/home/eugene/.config/containers/systemd/eugene.container; generated)
     Active: failed (Result: exit-code) since Tue 2025-05-13 22:51:09 CEST; 6s ago
   Duration: 62ms
    Process: 1848 ExecStart=/usr/bin/podman run --name systemd-eugene --cidfile=/run/user/1002/eugene.cid --replace --rm --cgroups=split --stop-signal SIGKILL --sdnotify=conmon -d --label io.containers.autoupdate=registry --publish 127.0.0.1:3000:3000 ghcr.io/kaaveland/eugene-web:latest (code=exited, status=1/FAILURE)
    Process: 1949 ExecStopPost=/usr/bin/podman rm -v -f -i --cidfile=/run/user/1002/eugene.cid (code=exited, status=0/SUCCESS)
   Main PID: 1848 (code=exited, status=1/FAILURE)
        CPU: 209ms

So, once we get this on an x86 machine, it'll be fire-and-forget, with systemd and podman taking good care of eugene. Perfect!
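Since the quadlet lives in the user's systemd instance, the container logs end up in that user's journal, which is where I'll be looking the next time something fails:

ssh -p2222 eugene@localhost journalctl --user -u eugene -n 50 --no-pager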

eugene-web is almost the best case for something we'll host. It starts in milliseconds and can handle a few hundred requests a second on a single CPU core. We can write many useful applications that are like this! If we can really go the no-ops route here, the CI/CD side of things will just be a pipeline that pushes a docker image. That seems like something we should be able to manage!
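As an aside, the arm64 snag from earlier could also be fixed at build time. podman can build a multi-arch image and push the whole manifest, roughly like this, assuming qemu-user-static is available for cross-building and that the pipeline pushes to the same image name as above:

podman build --platform linux/amd64,linux/arm64 --manifest ghcr.io/kaaveland/eugene-web:latest .
podman manifest push --all ghcr.io/kaaveland/eugene-web:latest docker://ghcr.io/kaaveland/eugene-web:latest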

Still, it's a little unsatisfying to only be able to host the simplest possible application. The next post takes a look at something that's a little gnarlier to host; in particular, its restarts are not nearly instant, and it has a job 😧
