Title: Futzing with PID 1

I've been working with somebody who,
I think,
is the lead person behind a Linux distribution.
We've been discussing how to change PID 1,
and I've begun to realize I know a lot about this.

I'll be discussing Arch Linux because that's what I use,
but most distributions follow a very similar pattern.


What PID 1 Needs To Do
====================

In Arch Linux,
there's an early userspace PID 1 which does some preliminaries such as
mounting and pivoting /,
enabling the keyboard and graphics card,
and a few other things.

When the main PID 1 starts,
it needs to do the following at a minimum:

* Mount /tmp, /proc, /sys, /run, /dev
* Create some temporary directories
* Set the system clock
* Populate some of /dev
* Load modules
* Set the hostname
* fsck /
* never exit
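
The steps above can be sketched as a shell script.
This is only an illustration under stated assumptions, not a tested init:
the helper names `run` and `boot`, the `DRY_RUN` switch, the `HOSTNAME_FILE` variable,
the tmpfs mounts, the placeholder `loop` module, and the use of `mdev -s` and `hwclock --hctosys` are all choices made for the example.
It also assumes `/` is still mounted read-only when `fsck` runs,
and on some setups the early userspace has already mounted a few of these filesystems.

```shell
#!/bin/sh
# Sketch of a shell-script PID 1.
# Dry-run by default: each command is printed instead of executed.
# To use it for real, set DRY_RUN to the empty string (and read every line first).
DRY_RUN=${DRY_RUN-1}

# ASSUMPTION: where the hostname is stored.
HOSTNAME_FILE=${HOSTNAME_FILE:-/etc/hostname}

# Run a command, or just print it while DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

boot() {
    # Mount /tmp, /proc, /sys, /run, /dev
    run mount -t proc proc /proc
    run mount -t sysfs sys /sys
    run mount -t devtmpfs dev /dev
    run mount -t tmpfs run /run
    run mount -t tmpfs tmp /tmp

    # Create some temporary directories (ASSUMPTION: which ones you need)
    run mkdir -p /run/lock /dev/pts /dev/shm

    # Set the system clock from the hardware clock
    run hwclock --hctosys

    # Populate some of /dev (ASSUMPTION: Busybox mdev is installed)
    run mdev -s

    # Load modules (ASSUMPTION: "loop" is only a placeholder)
    run modprobe loop

    # Set the hostname, if we know it
    if [ -r "$HOSTNAME_FILE" ]; then
        run hostname "$(cat "$HOSTNAME_FILE")"
    fi

    # fsck / (-a: repair automatically, without asking questions)
    run fsck -a /

    # Never exit: replace this shell with a supervisor that runs forever
    run exec runsvdir /var/service
}

boot
```

Because the last step uses `exec`, the shell never returns from it:
PID 1 simply becomes `runsvdir`.
If that `exec` fails, a non-interactive shell exits, which is exactly the kernel panic you are trying to avoid.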

You might be thinking to yourself that this could all be done in a shell script.
As a matter of fact,
that is exactly how I do it on my computer.
My `/sbin/init` is a Bourne shell script.
Yours could be, too.


Never Exit
--------------------

That last step is kind of interesting.
If PID 1 ever exits,
the kernel panics and basically halts.
So you want your PID 1 to stay running forever,
right up until something powers down or reboots the computer.

Because of this requirement,
it's typical to make PID 1 responsible for
keeping important programs (daemons) running.
There are all sorts of approaches to this,
ranging from systemd at the heavy end,
doing all sorts of things like managing hardware and communicating over D-Bus;
to runit at the light end,
managing only the starting and stopping of supervisors,
which themselves manage the daemons.

Incidentally,
the threat of kernel panic and immediate halting
is why some people
(myself included)
feel PID 1 should be very simple and easy to check for bugs.


How Runit Manages Daemons
==================

I use runit as my daemon manager.
Specifically, I use the runit from Busybox,
but Gerrit Pape's runit is almost identical as far as this article is concerned.

Runit starts off as a program called `runsvdir`,
which is what my `/sbin/init` hands off to with
`exec runsvdir /var/service`.
`runsvdir` has a fairly simple job:
start a new `runsv` process for each subdirectory of `/var/service`.
If a `runsv` process dies, restart it.

runsv
-------

`runsv`, in turn, runs the `run` script in the subdirectory.
When `run` exits, it runs `finish`, waits about a second,
and runs `run` again, until the end of time.
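
To make that cycle concrete, here is a toy model of it in shell.
It is not how `runsv` is implemented
(the real program is written in C and also deals with signals, the `supervise` directory, and the logger);
the function name `toy_runsv`, the `TOY_DELAY` variable, and the iteration cap are invented for the example.
The real loop has no cap.

```shell
#!/bin/sh
# Toy model of the run -> finish -> wait -> run cycle. Not the real runsv.

# ASSUMPTION: pause between restarts, in seconds.
TOY_DELAY=${TOY_DELAY:-1}

# toy_runsv DIR [MAX]: go through DIR's scripts MAX times (default 3).
# The cap exists only so the demonstration can end.
toy_runsv() {
    dir=$1
    max=${2:-3}
    count=0
    while [ "$count" -lt "$max" ]; do
        # Run the service in the foreground and wait for it to exit.
        ( cd "$dir" && ./run )
        # Clean up after it, if the service provides a finish script.
        if [ -x "$dir/finish" ]; then
            ( cd "$dir" && ./finish )
        fi
        # Pause so a service that dies instantly can't spin the CPU.
        sleep "$TOY_DELAY"
        count=$((count + 1))
    done
}
```

Pointing `toy_runsv` at a scratch directory that holds an executable `run` (and optionally `finish`)
shows the ordering: `run`, `finish`, pause, `run`, `finish`, pause, and so on.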

If there is a `log` directory,
its `run` and `finish` scripts are handled the same way,
except that stdout from the parent's `run` is piped to
stdin on the log's `run`.

This simple approach makes it pretty easy to keep services alive,
provided they can stay in the foreground.
For example, here's the `run` script I use for `sshd`:

    #! /bin/sh
    exec 2>&1
    exec /usr/bin/sshd -D -e

That redirects stderr to stdout, for the logger.
Then it runs sshd in the foreground (the "no daemon" mode),
and logs to stderr (now stdout).
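
A matching logger can be just as short.
As a sketch, the commands below lay out a complete `sshd` service directory, including `log/run`,
under a scratch directory rather than the real `/var/service`.
The `DEMO_DIR` variable, the scratch location, and the log destination `/var/log/sshd` are assumptions;
`svlogd` is the logger that comes with runit and Busybox.

```shell
#!/bin/sh
# Build an sshd service directory with a logger.
# ASSUMPTION: DEMO_DIR defaults to a scratch directory so nothing live is touched.
DEMO_DIR=${DEMO_DIR:-$(mktemp -d)}
mkdir -p "$DEMO_DIR/sshd/log"

# The service itself: the same run script as shown above.
cat > "$DEMO_DIR/sshd/run" <<'EOF'
#! /bin/sh
exec 2>&1
exec /usr/bin/sshd -D -e
EOF

# The logger: svlogd reads the service's output on stdin,
# prefixes each line with a human-readable timestamp (-tt),
# and writes rotated log files into the given directory.
# ASSUMPTION: /var/log/sshd is the log directory; it must already exist.
cat > "$DEMO_DIR/sshd/log/run" <<'EOF'
#! /bin/sh
exec svlogd -tt /var/log/sshd
EOF

chmod +x "$DEMO_DIR/sshd/run" "$DEMO_DIR/sshd/log/run"
echo "service directory created in $DEMO_DIR/sshd"
```

To try it for real, you would create the log directory
and then copy or symlink the `sshd` directory into `/var/service`.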

There are a few wrinkles to what `runsv` does.
If a file named `down` exists in the service directory,
it doesn't try to start `run` until you tell it to.
And there's an `sv` program for communicating with `runsv`.

sv
----

The `sv` program communicates with an instance of `runsv`
through named pipes in the service's `supervise` directory.
`sv` has a few common commands,
and a few obscure ones.
I'll go over the common ones.

`sv status foo` asks `runsv` what the current status of the `foo` service is.
It will tell you what state it's trying to maintain,
what state it's actually in,
and how long it's been in that state.
It also reports back about the log service for that directory,
if there is one.

`sv up foo` tells `runsv` to strive to have the `foo` service up.
That means it will run the `run` script as detailed above.

`sv -v up foo` is just like `sv up`,
except the `-v` causes `sv` to wait until the service is confirmed up.
It will wait up to 7 seconds (you can set the time with `-w`)
for the service to be in the `running` state,
and will also run the `check` script in the service directory,
if there is one,
to check that the service is actually working.
It returns 0 if the service is up and `check` passes,
and non-0 in any other case,
so this is the command you want to use in a `run` script
to make sure a dependency has started.
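
As a sketch of that pattern,
here is how a `run` script for a service that depends on `foo` might look.
The service name `bar`, the daemon `bar-daemon`, and its `--foreground` flag are hypothetical,
and the `DEMO_DIR` variable points at a scratch directory by default so nothing live is touched.

```shell
#!/bin/sh
# ASSUMPTION: scratch directory by default; "bar" and "bar-daemon" are made up.
DEMO_DIR=${DEMO_DIR:-$(mktemp -d)}
mkdir -p "$DEMO_DIR/bar"

cat > "$DEMO_DIR/bar/run" <<'EOF'
#! /bin/sh
exec 2>&1
# Ask for foo to be up, and wait for it (and its check script, if any).
# If it isn't up in time, give up; runsv will run this script again shortly.
sv -v up foo || exit 1
# foo is up, so become the (hypothetical) daemon, staying in the foreground.
exec bar-daemon --foreground
EOF

chmod +x "$DEMO_DIR/bar/run"
echo "wrote $DEMO_DIR/bar/run"
```

Exiting on failure is deliberate:
rather than looping inside the script, it leans on `runsv` restarting `run`,
so the dependency gets another chance on the next attempt.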

`sv down foo` tells `runsv` to strive to have the `foo` service down.
(`runsv` will try to kill it.)

`sv check foo` will check if the desired state is the actual state.
This means if you asked for `foo` to be up,
it will return 0 if and only if it's up.
But it also means that if you asked for `foo` to be down,
it will return 0 if and only if it's down.
There's a good chance you actually want `sv -v up foo` instead.
I never use `sv check`, personally,
but I'm listing it here because it seems to confuse people.

There are more `sv` commands,
but these are the ones I use most frequently.

Important Services
===============

The init steps above will get your machine booted,
but the machine might not be very useful yet.
For instance,
you might like to be able to log in.
You'll want to run a `getty` for that,
and maybe something like `xdm` or `gdm` to log in to X11.
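
A getty fits the `run` script mold nicely, since it already stays in the foreground,
and when a login session ends the script exits and gets run again, which produces a fresh login prompt.
The sketch below writes one for the first virtual console into a scratch directory.
The service name `getty-tty1`, the `DEMO_DIR` variable,
and the choice of util-linux's `agetty` with these particular arguments are assumptions,
so adjust them for whichever getty you have.

```shell
#!/bin/sh
# ASSUMPTION: scratch directory by default; "getty-tty1" is a made-up service name.
DEMO_DIR=${DEMO_DIR:-$(mktemp -d)}
mkdir -p "$DEMO_DIR/getty-tty1"

cat > "$DEMO_DIR/getty-tty1/run" <<'EOF'
#! /bin/sh
# setsid starts the getty in its own session,
# so that tty1 can become its controlling terminal.
# --noclear leaves the boot messages on screen above the login prompt.
# ASSUMPTION: util-linux agetty, 38400 baud, terminal type "linux".
exec setsid agetty --noclear tty1 38400 linux
EOF

chmod +x "$DEMO_DIR/getty-tty1/run"
echo "wrote $DEMO_DIR/getty-tty1/run"
```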

Kernel Uevents
-------------------

The Linux kernel sends out something called a "uevent"
whenever the hardware configuration changes,
for instance when a new USB device is plugged in.
The usual program to handle these is called `udev`,
which is now part of `systemd`.
Busybox comes with one called `mdev` that does a lot of what `udev` provides.
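
As a rough sketch of the usual way to wire up `mdev`,
going by how Busybox documents it rather than by anything tested here:
register it as the kernel's hotplug helper, then scan once for devices that already exist.
The function name `setup_mdev`, the `DRY_RUN` switch, and the `/sbin/mdev` path are assumptions,
and the hotplug helper file only exists on kernels built with uevent helper support.

```shell
#!/bin/sh
# Dry-run by default: prints what it would do.
# Set DRY_RUN to the empty string (as root) to act for real.
DRY_RUN=${DRY_RUN-1}

# ASSUMPTION: where the mdev binary (or Busybox symlink) lives.
MDEV=${MDEV:-/sbin/mdev}

# Kernel file naming the program to run for each uevent.
HOTPLUG=/proc/sys/kernel/hotplug

setup_mdev() {
    if [ -n "$DRY_RUN" ]; then
        echo "would write $MDEV to $HOTPLUG"
        echo "would run: $MDEV -s"
        return 0
    fi
    # Future events: have the kernel run mdev for every uevent.
    echo "$MDEV" > "$HOTPLUG"
    # Existing devices: scan /sys and create the matching nodes in /dev.
    "$MDEV" -s
}

setup_mdev
```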

I'll detail that here at some point.