I’ve been running a public web server since 1999, when my employer
registered schmonz.com for me as a gag gift. Last week, I
learned from Twitterbrausen
that in German, “Schmonz” means something akin to “bullshit”. That’s not
what my employer had meant by it; I consider nonetheless that my
incessant blogging has acquired a fine new patina of significance.
As I recall, when I was first looking for web server software, there was
not a wide variety to choose from.
Apache
was popular and featureful, a safe default choice. As a novice
programmer, I was very much taken with the idea of building dynamic
sites, and Apache offered many ways to go about that. Done deal.
In the intervening years, my server machine has changed several
times, from Macintosh IIci to Mini-ITX box to Mac Mini to Xen
virtual private server.
(I’m particularly fond of the present arrangement wherein hardware is
someone else’s problem and I continue to have root access.) No
matter the system architecture, the OS has always been
NetBSD,
which remains
unobtrusively thrilling,
and the web server has always been Apache, which has gradually become
more noisome.
Between my own sites and those of friends I’ve hosted, I’ve needed many
times to adapt my Apache configuration to accommodate changes in
external modules (such as mod_php), to interfaces (such as PHP via
FastCGI instead), and within Apache itself (such as basic access
control).
Each time I forcibly revisited my config, I found myself revisiting my
discomfort with its complexity. I never felt sure that I understood
exactly, in its entirety, what my Apache installation would and wouldn’t
do. And as a result of years of entanglement and unclarity, I never saw
a way to give my users full administrative control over their own sites.
I’ve
been imagining moving off Apache
for a while. But it always seemed like a project, so I never did
anything about it. I can’t usually afford to start on something unless I
know I’m going to be able to stop soon, and I won’t usually want to stop
unless I know how I can easily start next time. That leaves me needing a
sequence of small-enough steps in my desired direction. Or, more
precisely, two expectations: that at least one such sequence exists, and
that I’ll be able to discover one as I go.
Conveniently, I’ve had plenty of professional practice at
incremental problem-solving,
enough to identify my first few steps and start making progress.
Here’s the rest of the sequence, naming the
refactorings
I’ve found along the way.
Step 1: Extract Virtual Host
I wanted to see what I’d learn by persuading one site to become its own
self-contained thing running its own Apache instance. I picked a
relatively basic site, told the system Apache to reverse-proxy that
virtual host, added just enough configuration to start a site-specific
Apache on localhost, verified that as far as I could discern the site
worked equally well, and cut over to the new configuration.
Inserting a proxy usually means, at the very least, server logs start
reporting requests coming from the proxy’s IP rather than the browser’s.
For this to be a refactoring, the system Apache needed to send an
X-Forwarded-For header (it automatically does), and the site-specific Apache needed
to know to look for it (by enabling the bundled
mod_remoteip).
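To make the log problem concrete, here is a minimal Python sketch of the general idea behind trusting X-Forwarded-For only when a request arrives from a known proxy. It is an illustration, not mod_remoteip's actual implementation; the function name `client_address` and its parameters are hypothetical.

```python
import ipaddress


def client_address(peer_addr, forwarded_for, trusted_proxies):
    """Return the address worth logging for one request.

    peer_addr: address of the TCP peer (the proxy, once one is inserted).
    forwarded_for: raw X-Forwarded-For header value, or None if absent.
    trusted_proxies: addresses whose X-Forwarded-For claims we believe.
    """
    trusted = {ipaddress.ip_address(p) for p in trusted_proxies}
    current = ipaddress.ip_address(peer_addr)
    # Each proxy appends on the right, so the rightmost entry is the newest.
    hops = [h.strip() for h in (forwarded_for or "").split(",") if h.strip()]
    # Step back one hop at a time, but only while the hop we are standing
    # on is a proxy we trust; anyone else could have forged the header.
    while current in trusted and hops:
        candidate = hops.pop()
        try:
            current = ipaddress.ip_address(candidate)
        except ValueError:
            break  # unparseable entry: keep the last address we could vouch for
    return str(current)
```

With the system proxy on localhost, `client_address("127.0.0.1", "203.0.113.7", ["127.0.0.1"])` yields the browser's address, while the same header arriving from an untrusted peer is ignored and the peer's own address is logged.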
Manually starting an instance of a service usually means the system
won’t automatically know how to do the same next time it boots up. For
this to be a refactoring, I needed to
add an entry to the site owner’s crontab.
To validate that the site would continue to be served by its
own Apache as well as it’d been served the old way, I rebooted the
system. The site stayed up.
Step 2: Extract More Virtual Hosts
Good, because there were 17 more sites to go. Each of them would also be
listening on its own non-standard port on localhost. To identify them
at a glance in netstat, I added the port to /etc/services. Now I had
a pattern worth repeating.
Some sites were more complex than others (PHP, language negotiation,
other wrinkles), but I didn’t need to invent their configurations from
scratch, merely uncover the tiny portions of the existing giant config
that were relevant and copy them over.
Near the end, I couldn’t start new Apache instances without increasing
some kernel IPC parameters (kern.ipc.msgmni from 40 to 80,
kern.ipc.semmni from 10 to 20). This felt like a small backward step.
I hoped to be able to undo it later.
It also might have felt like a small step backward to suddenly have lots
more instances of Apache. But it was a large step forward in my
understanding.
Step 3: Remove Dependency (on Apache Modules)
En route to that understanding, I was fairly sure I’d reduced the system
Apache to a single responsibility: being a reverse HTTP proxy. To
validate that it was no longer serving any other purpose, I turned off
most LoadModule directives — even the typical and enabled-by-default
ones — leaving only those that prevented Apache from running when I
tried turning them off.
Step 4: Substitute Apache with Bozohttpd
I’d been hoping to replace Apache with bozohttpd. Now that I had
small, explicit per-site configurations, I could try converting one. The
site worked, but the logs were missing lots of basic information. I
still think this is where I want to go, but since it’s not a
refactoring, I can’t go there yet.
Step 5: Substitute Apache with Lighttpd
I tried converting the same site from Apache to Lighttpd, which is a
little more featureful than bozohttpd. The site worked, and with
mod_extforward enabled, its server logs were indistinguishable from
Apache’s. I gzipped the now-retired Apache config to prevent it from
being used by mistake while keeping it for reference, updated the
site’s crontab entry to start Lighttpd instead of Apache, and
rebooted. Bingo!
Step 6: Substitute More Apaches with Lighttpd
I converted a bunch more sites. After doing a few, I figured out how to
extract shared configuration. Simpler sites have extremely short config
files (just a few lines). More complex sites only define what’s unusual
about them.
Step 7: Remove Dependency (on Apache PHP FastCGI)
With a few Apache-powered sites left to convert, I was pretty sure none
of them was using PHP. To test this hypothesis, I stopped the php-fpm
service. After a week, with nothing broken, I uninstalled it.
With only a few Apache-powered sites remaining, could I return kernel
IPC parameters to their default values? Yes, all the Lighttpd and Apache
sites ran just fine that way.
Step 8: Get Married
Getting married is the opposite of a refactoring. There’s no internal
change, but many callers have new expectations.
Step 9: Substitute Remaining Apaches with Lighttpd
I expected three sites to be relatively tricky to convert:
- theschleiers.com
needed language negotiation to provide English or German content. I
didn’t want to futz with it until there was clearly no longer any
urgent need for information about the wedding.
- agilein3minut.es
needed SSL, which I wasn’t sure whether to proxy at all. It turned out
to be easy to proxy because it’s the only HTTPS site I host at
present, and it looks like it might stay easy if and when I host more.
- schmonz.com
needed fancy URL rewriting for compatibility with the site’s previous
incarnation. I assumed it was going to, anyway. I wound up being able
to translate most of its Apache mod_rewrite config to Lighttpd’s
expressive conditional redirects, and needed hardly any
special-snowflake cleverness.
Once they were converted, there were zero remaining Apache-powered sites.
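For readers unfamiliar with the language negotiation mentioned for theschleiers.com, here is a rough Python sketch of how a server might choose between English and German from a browser's Accept-Language header. It is a simplification, not what Apache or Lighttpd actually does: it ignores the `*` wildcard and matches only on the primary language tag. The name `pick_language` and its defaults are hypothetical.

```python
def pick_language(accept_language, available=("en", "de"), default="en"):
    """Pick the best available language for an Accept-Language header value."""
    best_lang, best_q = default, 0.0
    for part in (accept_language or "").split(","):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0].lower()
        if not tag:
            continue
        q = 1.0  # a language listed without a q-value is fully preferred
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # malformed weight: treat as not acceptable
        primary = tag.split("-")[0]  # so "de-AT" matches "de"
        # Strict comparison means the earliest entry wins a tie.
        if primary in available and q > best_q:
            best_lang, best_q = primary, q
    return best_lang
```

A browser sending `de-DE,de;q=0.9,en;q=0.8` would get German; one sending only `fr` would fall back to the default.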
Step 10: Substitute Apache with Pound
A single Apache instance remained: the system one that was nothing but a
reverse proxy to a bunch of Lighttpd instances.
Had I known that’d be its only job, I’d have chosen software designed
for the purpose. I knew that now, and chose Pound. On a non-standard
port, I figured out how to express a few sites’ worth of reverse
proxying in Pound’s configuration language, continued
until I’d translated everything
in the Apache config, stopped Apache, and started Pound.
Step 11: Remove Dependency (on Apache)
Not a single Apache instance remained. To my knowledge, all sites were
operating as normal. After a week, I uninstalled Apache, deleted its
corresponding Unix user and group, and gzipped all its config files for
reference.
Summary
Apache had been serving multiple roles. I brought the number down to zero,
then got rid of it. To do that, I…
- Decoupled Apache (the virtual-host multiplexer) from Apache (the web server)
- Gave each site its own Apache web server instance
- Found a suitable replacement web server and converted all instances
- Found a suitable replacement virtual-host multiplexer and switched to it
- Turned software off, and left it off for a while, before uninstalling
For human site visitors, all of these steps were genuine refactorings.
(Atypical and automated visitors might notice the HTTP header reporting
different server software.) For site owners, most of these steps were
also genuine refactorings. (In a couple of cases, using the shared
Lighttpd config required changing the names of log files by a small
nonzero amount.)
I replaced one big application with two small ones. Better. Still, could
be more better.
Room for improvement
The replacement virtual-host multiplexer (Pound) feels simple, good, and
necessary, in the sense that nothing like it is included with the OS.
The replacement web server (Lighttpd) feels simpler and better, by far
— I understand what it’s doing, my users finally have full
administrative control over their own sites, and unlike Apache, this
configuration doesn’t require extra system resources — but NetBSD
does include a web server, the one I experimented with in Step 4. If
bozohttpd did a few more things, then “Replace Lighttpd with
Bozohttpd” would be a refactoring, one that could be followed
immediately by “Remove Dependency (on Lighttpd)”.
Next steps
I’ve been practicing C.
In some kind of cosmic coincidence, next week I’ll be joining a project
that’s being developed primarily in C. Hacking on bozohttpd will be
good practice. Here’s the incremental sequence of features awaiting my
next increment of time and attention, perhaps on tomorrow’s
transatlantic flight:
- Optionally log to a file (instead of
syslog or stderr)
- Optionally log more information (say, in Apache’s “combined” format)
- Optionally specify a proxy or proxies that can pass an
X-Forwarded-For header whose contents we’ll use as the true client
source address (for logs, access control decisions, etc.)
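As a reference point for the second feature in the list above, here is a short Python sketch of what one line in Apache's "combined" log format looks like when assembled by hand. It is written in Python for brevity; bozohttpd itself would need this in C. The function name `combined_log_line` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def combined_log_line(host, user, when, request_line, status, size,
                      referer, user_agent):
    """Format one request the way Apache's "combined" LogFormat does.

    `when` must be a timezone-aware datetime. Month abbreviations come
    from strftime, so this assumes the default C locale.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    user_field = user or "-"
    size_field = str(size) if size else "-"  # zero bytes are logged as "-"
    referer_field = referer or "-"
    agent_field = user_agent or "-"
    # The second field (remote logname) is almost never available: "-".
    return (f'{host} - {user_field} [{timestamp}] "{request_line}" '
            f'{status} {size_field} "{referer_field}" "{agent_field}"')


if __name__ == "__main__":
    when = datetime(2000, 10, 10, 13, 55, 36,
                    tzinfo=timezone(timedelta(hours=-7)))
    print(combined_log_line("127.0.0.1", "frank", when,
                            "GET /apache_pb.gif HTTP/1.0", 200, 2326,
                            "http://www.example.com/start.html",
                            "Mozilla/4.08 [en] (Win98; I ;Nav)"))
```

The host field is where the third feature would plug in: behind a proxy, it should carry the address recovered from X-Forwarded-For rather than the proxy's own.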
Since I believe I’ll be able to stop, I’ll be able to start. It might
not be terribly long before I have more progress to share.