homepage/content/papers/pid1.md

5.5 KiB

title section
Futzing with PID 1 computing

I've been working with somebody who, I think, is the lead person behind a Linux Distribution. We've been discussing how to change PID 1, and I've begun to realize I know a lot about this.

I'll be discussing Arch Linux because that's what I use, but most distributions follow a very similar pattern.

What PID 1 Needs To Do

In Arch Linux, there's an early userspace PID 1 which does some preliminaries such as mounting and pivoting /, enabling the keyboard and graphics card, and a few other things.

When the main PID 1 starts, it needs to do the following at a minimum:

  • Mount /tmp, /proc, /sys, /run, /dev
  • Create some temporary directories
  • Set the system clock
  • Populate some of /dev
  • Load modules
  • Set the hostname
  • fsck /
  • never exit

You might be thinking to yourself that this could all be done in a shell script. As a matter of fact, that is exactly how I do it on my computer. My /sbin/init is a Bourne shell script. Yours could be, too.

Never Exit

That last step is kind of interesting. If PID 1 ever exits, the kernel panics and basically halts. So you want your PID 1 to stay running forever, even after something has powered down or rebooted the computer.

Because of this requirement, it's typical to have PID 1 manage keeping important programs (daemons) running. There are all sorts of approaches to this, ranging from systemd at the heavy end, doing all sorts of things like managing hardware and communicating over dbus; to runit at the light end, managing only the starting and stopping of supervisors, which themselves manage the daemons.

Incidentally, the threat of kernel panic and immediate halting is why some people (myself included) feel PID1 should be very simple and easy to check for bugs.

How Runit Manages Daemons

I use runit as my daemon manager. Specifically, the runit from busybox, but Gerrit Pape's runit is almost identical as far as this article is concerned.

Runit starts off as a program called runsvdir, which is what my /sbin/init hands off to with exec runsvdir /var/service. runsvdir has a fairly simple job: start a new runsv process for each subdirectory of /var/service. If a runsv process dies, restart it.

runsv

runsv, in turn, runs the run script in the subdirectory. When run exits, it runs finish, waits a few seconds, and runs run again, until the end of time.

If there is a log directory, its run and finish scripts are handled the same way, except that stdout from the parent's run is piped to stdin on the log's run.

This simple approach makes it pretty easy to keep services alive, provided they can stay in the foreground. For example, here's the run script I use for sshd:

#! /bin/sh
exec 2>&1
exec /usr/bin/sshd -D -e

That redirects stderr to stdout, for the logger. Then it runs sshd in the foreground (the "no daemon" mode), and logs to stderr (now stdout).

There are a few wrinkles to what runsv does. If the file down exists, it doesn't try to start run. And there's an sv program for communicating with runsv.

sv

The sv program communicates with an instance of runsv through some magic pipes in the supervise directory. sv has a few common commands, and a few obscure ones. I'll go over the common ones.

sv status foo asks runsv what the current status of the foo service is. It will tell you what state it's trying to maintain, what state it's actually in, and how long it's been in that state. It also reports back about the log service for that directory, if there is one.

sv up foo tells runsv to strive to have the foo service up. That means it will run the run script as detailed above.

sv -v up foo is just like sv up, except the -v causes sv to wait until the service is confirmed up. It will wait up to 7 seconds (you can set the time with -w) for the service to be in the running state, and will also run the check script in the service directory, if there is one, to perform any additional checks on the service actually working. It returns 0 if the service is up and check passes, and non-0 in any other case, so this is the command you want to use in a run script to make sure a dependency has started.

sv down foo tells runsv to strive to have the foo service down. (runsv will try to kill it.)

sv check foo will check if the desired state is the actual state. This means if you asked for foo to be up, it will return 0 if and only if it's up. But it also means that if you asked for foo to be down, it will return 0 if and only if it's down. There's a good chance you actually want sv -v up foo instead. I never use sv check, personally, but I'm listing it here because it seems to confuse people.

There are more sv commands, but these are the ones I use most frequently.

Important Services

The init steps above will get your machine booted, but it might not be very useful. For instance, you might like to be able to log in. You'll want to run a getty for that, and maybe something like xdm or gdm to log in to X11.

Kernel Uevents

The Linux kernel sends out something called a "uevent" whenever the hardware configuration changes. For instance, when a new USB device is plugged in. The usual program to handle these is called udev, which is now part of systemd. Busybox comes with one called mdev that does a lot of what udev provides.

I'll detail that here at some point.