Zero-downtime restarts have landed

I'm thrilled to announce that zero-downtime restarts, which I've been hacking on for the past week or two, have just landed in pump.io master!

Zero-downtime restarts require at least two cluster workers and MongoDB as a Databank driver (we'll eventually relax the latter requirement as we continue to test the feature). Here's how it works:

  1. An administrator sends SIGUSR2 to the master pump.io process (note that SIGUSR1 is reserved by Node.js)
  2. The master process builds a queue of worker processes that need to be restarted
  3. The master process picks a random worker from the queue and sends it a signal asking it to gracefully shut down
  4. The worker process shuts down its HTTP server, which causes it to stop accepting new connections - it will do the same for the bounce server, if applicable
  5. The worker shuts down its database connection once the HTTP server is completely shut down, meaning that it's done servicing in-flight requests
  6. The worker closes its connection with the master process and Node.js automatically terminates because there's nothing left on the event loop
  7. The master recognizes the death of the worker process, replaces it, waits for the new worker to signal that it's listening for connections, and repeats from step 3 until the queue is empty
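For the curious, here's roughly what steps 4 through 6 look like from the worker's side. This is a minimal sketch, not the actual pump.io code. It assumes server is a Node.js http.Server and that the database handle exposes a callback-style close(cb), and the function name gracefulShutdown is hypothetical. The bounce server is left out for brevity.

```javascript
"use strict";

const cluster = require("cluster");

// Hypothetical worker-side shutdown covering steps 4-6. `server` is assumed to
// be a Node.js http.Server; `db` is assumed to expose a callback-style close(cb).
function gracefulShutdown(server, db, done) {
  // Step 4: stop accepting new connections. Node calls this callback only once
  // every existing connection has ended, i.e. in-flight requests are finished.
  server.close((err) => {
    if (err) return done(err);
    // Step 5: the HTTP server is fully drained, so nothing needs the database.
    db.close((dbErr) => {
      if (dbErr) return done(dbErr);
      // Step 6: drop the IPC channel to the master. With no servers, sockets or
      // timers left, the event loop empties and the process exits on its own.
      if (cluster.isWorker) cluster.worker.disconnect();
      done(null);
    });
  });
}
```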

This works because only one worker is shut down at a time, allowing the other workers to continue servicing requests while the one worker is restarted. We wait until the new worker actually signals it's ready to process requests before beginning the process for another worker.
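The master's side of the loop (steps 2, 3 and 7) can be sketched the same way. To keep the sketch testable without forking real processes, the two process-management operations are passed in as functions: stopWorker(worker) should resolve once that worker has exited, and spawnReady() should resolve with a new worker once it reports that it's listening. Both hooks and the name rollingRestart are hypothetical. This illustrates the one-at-a-time idea and is not pump.io's implementation.

```javascript
"use strict";

// Hypothetical rolling restart. stopWorker(w) resolves when worker w has
// exited; spawnReady() resolves with a replacement once it is listening.
async function rollingRestart(workers, stopWorker, spawnReady) {
  // Step 2: queue every worker that was running when the signal arrived.
  const queue = workers.slice();
  const replacements = [];
  while (queue.length > 0) {
    // Step 3: pick a random worker and remove it from the queue.
    const i = Math.floor(Math.random() * queue.length);
    const [victim] = queue.splice(i, 1);
    // Steps 4-6 happen inside the worker; the master just waits for the exit.
    await stopWorker(victim);
    // Step 7: replace it and wait until the new worker is listening before
    // touching the next one, so at most one worker is ever out of service.
    replacements.push(await spawnReady());
  }
  return replacements;
}
```

In a real master process those hooks would wrap Node's cluster module, for example cluster.fork() plus the worker's "listening" event for spawnReady(), and the worker's "exit" event for stopWorker(). Awaiting each replacement before picking the next victim is what keeps the remaining workers serving traffic throughout.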

Such a feature requires careful error handling, so there are a lot of built-in checks to prevent administrators from shooting themselves in the foot:

  • If there's a restart already in progress, SIGUSR2 is ignored
  • If there's only one cluster worker, the restart request is refused (there would be downtime anyway, so you should just restart the master)
  • The master process loads a magic number from the new code and compares it with the old magic number loaded when the master process started - if they don't match, SIGUSR2 is refused. This number is incremented whenever a change would make a zero-downtime restart unsafe, for example:

    • The logic in the master process itself changing
    • Cross-process logic changing, such that a new worker communicating with old workers would cause problems
    • Database changes
  • If a worker process doesn't shut itself down within 30 seconds, it will be killed
  • If a zero-downtime restart fails for any reason, the master process will refuse SIGUSR2 and will not respawn any more cluster workers, even if they crash - this is because something must have gone seriously wrong, either with the master, the workers, or the new code, and it's better to just restart everything. Currently this condition occurs when:

    • A new worker died directly after being spawned (e.g. from invalid JSON in pump.io.json)
    • A new worker signaled that it couldn't bind to the appropriate ports
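The guard checks above boil down to a small decision function plus a kill timer. Again, this is only a sketch: the state field names (restartFailed, restartInProgress, workerCount, loadedMagic) are made up for illustration, and the worker object is assumed to behave like a Node.js cluster Worker, with a kill() method and an "exit" event. The "stop respawning crashed workers" behavior after a failure isn't shown.

```javascript
"use strict";

// Illustrative constant matching the 30-second limit described above.
const SHUTDOWN_TIMEOUT_MS = 30 * 1000;

// Hypothetical decision function for an incoming SIGUSR2. `state` holds
// made-up fields describing the master; `newMagic` is the magic number
// read from the freshly deployed code.
function checkRestartRequest(state, newMagic) {
  if (state.restartFailed) {
    // A previous restart went wrong: refuse until the master is restarted.
    return { ok: false, reason: "previous zero-downtime restart failed" };
  }
  if (state.restartInProgress) {
    return { ok: false, reason: "restart already in progress" };
  }
  if (state.workerCount < 2) {
    return { ok: false, reason: "need at least two cluster workers" };
  }
  if (newMagic !== state.loadedMagic) {
    return { ok: false, reason: "magic number changed; restart the master" };
  }
  return { ok: true };
}

// Kill a worker that hasn't exited within the timeout. `worker` is assumed
// to behave like a cluster Worker: a kill() method and an "exit" event.
function armShutdownTimer(worker, timeoutMs = SHUTDOWN_TIMEOUT_MS) {
  const timer = setTimeout(() => worker.kill(), timeoutMs);
  // If the worker exits on its own first, cancel the pending kill.
  worker.once("exit", () => clearTimeout(timer));
  return timer;
}
```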

While these checks do a lot to catch problems, they're not a silver bullet, and we strongly recommend that administrators watch their logs as they trigger restarts. However, this is still a huge win for the admin experience - the most exciting part of this for me is that it's the first step we need to take towards having fully automatic updates, which has been a dream of mine for a long while now.

Admins running from git master can start experimenting with this feature today, and it will be released during the next release cycle - i.e. with the 5.1 beta and stable, not the current 5.0 beta. Since this is highly experimental, we want this to have as much time for testing as possible. You can also check out the official documentation on this feature.

I hope people enjoy this! And as always, feel free to report any bugs.

Express 4.x in pump.io core

So I thought I'd take a moment to announce that the upgrade from Express 2.x to Express 4.x is finally complete! I fixed up the last couple of test failures last Wednesday, and the branch got merged on Thursday.

A long time coming

Believe it or not, the work to do this upgrade started almost an entire year ago. Express 2.x has been outdated and unmaintained for a long time now, so upgrading has been a high priority. However, it wasn't as simple as adjusting a version number - there were a staggering number of changes that needed to be made due to Express deprecating, removing, and changing things around. One of the most significant problems was the fact that the old template system that we used, utml, was not compatible with Express 3.x and above. That meant that we had to rewrite every single template into a modern language - an effort that resulted in over a thousand lines changed!

However, the time for Express 4.x has finally arrived. With that and some other trivial version bumps, I'm proud to announce that pump.io's dependencies are fully up-to-date, with only three non-critical exceptions. Whooooo!

Immediate benefits

There are a lot of reasons this is immediately awesome:

  1. Express 4.x fixes significant performance problems that existed in Express 3.x
  2. Relatedly, Express 4.x fixes some security problems present in 3.x
  3. The fact that our dependencies are finally up-to-date means that we can (and do!) now make use of Greenkeeper and the Node Security Platform to automatically track our dependencies, making sure they stay up-to-date and don't introduce security vulnerabilities

That last one is particularly significant. Greenkeeper and NSP will continuously monitor the project's dependencies and automate away a lot of the pain that's associated with keeping pump.io up-to-date. Everyone will get a more secure and stable codebase because of this setup.

Looking forward

The Express 4.x upgrade is a big change, and it's definitely possible that stuff has broken. We want to make sure that breakage doesn't make it into production. This change went into pump.io 4.0, which will go through our normal release cycle. That means it'll be in beta for a month before being released. As a part of that, Jason Self - who's kind enough to administer Datamost - has agreed to hold a test day where Datamost upgrades to the beta for a day and then downgrades again. This test day will give the beta much wider exposure than it would've gotten otherwise, and the resulting feedback will be incredibly valuable in identifying and fixing regressions. We haven't set a date yet, but if you'd like to join Jason in helping us find bugs, please get in touch with the community. We'd love your help.

Beyond the immediate release, though, there are still things to look forward to. Express 4.x gives us a better way to structure routing code, and a refactor to use this structure is planned. There's a lot of room for improvement. But really, the most important benefit is this: technical debt is a far less pressing issue than before. That means that we can shift focus and spend more time fixing user-facing bugs, adding useful features, and generally improving the experience for our users. I couldn't be more excited.

Pump.io 1.0.0 is now available!

Pump.io 1.0.0 is officially available! Whoooo!

I just wanted to write up an announcement real quick to celebrate. Here's a sample of what's gone into this release:

  • Node 4.x support
  • Lots of security improvements including a better cross-site scripting scrubber and security-related headers that help protect the web UI (most notably, the web UI now declares a Content Security Policy)
  • Minor improvements to the API to make it (slightly) smarter
  • LibreJS support
  • Numerous dependency upgrades, most notably Connect
  • And of course, tons of minor bugfixes and improvements
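To give a rough idea of what "security-related headers" means in practice, here's a minimal sketch using only Node's built-in http module. The specific header values, including the one-line Content Security Policy, are illustrative assumptions and not the policy pump.io actually ships. A policy like default-src 'self' is the kind of thing that stops injected inline scripts from running even if something slips past the XSS scrubber.

```javascript
"use strict";

const http = require("http");

// Illustrative values only - not pump.io's actual headers or policy.
const SECURITY_HEADERS = {
  // Load scripts, styles, images etc. only from our own origin; with no
  // 'unsafe-inline', injected inline <script> tags are not executed.
  "Content-Security-Policy": "default-src 'self'",
  // Refuse to be framed by other sites (clickjacking protection).
  "X-Frame-Options": "DENY",
  // Stop browsers from MIME-sniffing responses into other content types.
  "X-Content-Type-Options": "nosniff",
};

// Wrap a request handler so every response carries the headers above.
function withSecurityHeaders(handler) {
  return (req, res) => {
    for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
      res.setHeader(name, value);
    }
    handler(req, res);
  };
}

const server = http.createServer(
  withSecurityHeaders((req, res) => res.end("hello"))
);
```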

For more details, see the brand-new change log.

And of course, since we're now past 0.x.x releases, we're officially committing to the community to make only API-compatible changes going forward (or at least until 2.0.0!).
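For administrators, the practical upshot of semantic versioning is a simple rule of thumb: an upgrade within the same major version should be API-compatible. Here's a hypothetical helper, isCompatibleUpgrade, that encodes that rule. It only understands plain MAJOR.MINOR.PATCH versions (no pre-release tags), and since semver makes no compatibility promise for 0.x.x releases, it conservatively returns false for them. That's a property of the rule, not a statement about upgrading from 0.3.0.

```javascript
"use strict";

// Parse a plain "MAJOR.MINOR.PATCH" string (optionally prefixed with "v").
function parseVersion(v) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${v}`);
  return m.slice(1).map(Number);
}

// Hypothetical check: is moving from `from` to `to` a semver-compatible upgrade?
function isCompatibleUpgrade(from, to) {
  const [fMaj, fMin, fPat] = parseVersion(from);
  const [tMaj, tMin, tPat] = parseVersion(to);
  // 0.x.x makes no compatibility promise, and a new major may break the API.
  if (fMaj === 0 || tMaj !== fMaj) return false;
  // Same major version: anything that isn't older is compatible.
  if (tMin !== fMin) return tMin > fMin;
  return tPat >= fPat;
}
```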

As this release does improve security and fixes a lot of bugs, node administrators are encouraged to upgrade as soon as possible. If you have a global, npm-based install, you can upgrade with:

sudo npm install -g pump.io

And with a source-based install:

git pull
git checkout v1.0.0
npm install --production

If you're upgrading from 0.3.0, everything should Just Work(tm). Don't forget to restart your daemon!

One final note - the rumors are true. While we're not doing so yet, we are, in fact, planning to deprecate running under Node.js 0.10 and 0.12 very soon. Also, if you upgrade to Node.js 4.x early, the new, better XSS scrubber will be enabled - however, be aware that pump.io is far less tested under Node.js 4.x and you are likely to run into more bugs than you would under 0.10 or 0.12. This is an unfortunate situation, but sadly there's really nothing to be done about it. :(

Special thanks to Menno Vossen, Laura Arjona, Evan Prodromou, Jan Kusanagi and all the other volunteers who did so many different things to make this release happen. It truly wouldn't have happened without you.

Enjoy the release!

With <3,


Pump.io: call for testers, call for feedback

So in my last post, I mentioned that I'd left a couple of things for a second blog post. This is that post, and instead of being about all the cool stuff going on in the pumpiverse, it's about stuff that you - yes, you! - can do! It's super easy, too.

Call for testers

So as I mentioned, the utml-to-jade branch is basically finished (see PR #1170). Since switching templating languages is a huge, huge change, by definition touching every single part of the Web UI, we want to make sure it's well-tested. This is especially critical given the fact that the Web UI unfortunately has very little test coverage.

That's where pump.io system administrators come in. If you're a sysadmin and you're willing to test this change on your node, we'd very much appreciate it. There should be very little risk, since I think I've squashed all the regressions, but you should be willing to report any bugs you do run into. In particular, you should look out for:

  1. Links that seem to encompass too much text
  2. Missing spaces - e.g. AJ Jordanat [date] instead of AJ Jordan at [date]
  3. HTML code showing up on the page - e.g. Test note<br /> instead of Test note followed by a line break

Note that the utml-to-jade branch incorporates all changes in the master branch, so you may want to check out the advice in Running from Git master.

Sound interesting? Want to take part in the development of pump.io? Installing is super easy:

$ [sudo] npm install -g e14n/pump.io#utml-to-jade

This will work even if you already have a (non-source) install of pump.io - just make sure to restart the server afterwards.

Note that installing this way runs some semi-terrible logic to build the Jade templates at install time (a workaround for a deficiency in npm). If you get a scary warning message from npm, please file an issue, making sure to include the full log.

Call for design feedback

The other big thing that's happening is the pump.io 1.0.0 t-shirt we're designing! I've spent quite a bit of time working on a variety of candidate designs, which can be viewed in this ownCloud share. Obviously we want the coolest t-shirt possible, so we're looking for any design feedback that people have. Anyone with some spare time can glance through the designs, and we'd be thrilled to get everyone's opinions.

If this sounds interesting, I'd welcome you to check out the drafts. As always, get in touch with the community through our chatroom or, if you'd prefer, email me directly at alex@strugee.net.

Thanks so much! :)

New stuff in pump.io

So I promised a (long) while ago that I'd blog about all the stuff going on in pump.io. And there is a lot going on. Where to even begin?

LFNW talk

I think the first thing I should mention is the talk I gave at LinuxFest Northwest this year. It went really, really well (even though I finished half the slides the night before), and people seemed to be really engaged, especially during questions. The talk starts off by covering the history behind pump.io: that includes not only the relevant protocols, like OStatus and ActivityStreams, but also the motivation behind abandoning StatusNet (now GNU Social) in favor of a brand-new network. Next I move on to the way that pump.io itself works, namely, its function as a generic ActivityStreams distribution engine. To put this another way, I explain why this quote from the README:

I post something and my followers see it. That's the rough idea behind the pump.

is a pretty accurate way of describing what pump.io actually does. (I quote that quite a few times in the slides themselves.) I end with a discussion of the recent developments in the community, which are of course wonderful, and a short call-to-action for people to contribute - either directly to the software, or by joining the network and spreading pump.io.

Oh, and by the way - the talk was recorded! So you can watch it on YouTube or, even better, on my personal MediaGoblin. Fitting, since (as I mention in the talk) MediaGoblin federation will soon be released, and it's based on (and fully interoperable with) the pump.io protocol!

Conservancy application

Pump.io is applying to the Software Freedom Conservancy! This is super fantastic for us for a number of reasons. One of the biggest advantages is that inside Conservancy (assuming our application is accepted), we'll be able to take donations much, much more easily. This is primarily important because nodes in the pump.io network are volunteer-run, but they still require funds to operate. We're thinking about models wherein people will be able to donate to "pump.io", and then some (most?) of those funds will be used to pay for the network. (In particular, they'll be used to pay for the existing E14N nodes that Evan currently runs, which will be especially important once we transition those nodes to community owners.) Conservancy also provides useful miscellaneous services, like owning our logo and making sure that if we encounter license violations, the license is properly enforced. But perhaps most importantly, becoming part of Conservancy cements pump.io even more firmly in the community - both the group of people working on the software and network, and the larger free software world.

Documentation

We've started a dedicated space for project documentation, hosted on ReadTheDocs. We're planning to move a bunch of content from the GitHub wiki into this project, and hopefully it'll become a thorough and central place for pump.io documentation - both for users and for deployers.

Issue triage

We've spent quite a bit of time going through open issues and prioritizing them. A lot of issues have a release target now, and it's really nice to have an issue tracker that's a bit more organized.

Special thanks to Laura Arjona for driving this work.

Various minor code improvements

There have been a bunch of small bugfixes and improvements that have gone into the master branch - some of them user-facing, and some of them making the development experience better. Notable changes include:

  • Migrating from Connect 1.x to Connect 2.x (this is just a start - Connect 2.x is still deprecated, but it gets us closer than we were to relying on a fully non-deprecated stack)
  • JSHint is now automatically run against bin/ and routes/ when npm test is run. This makes it super obvious when there are regressions in code quality, especially in Pull Requests (since Travis CI will fail if JSHint doesn't succeed.)
  • JSCS is now used to enforce code style. It's automatically run against the entire codebase (whoo!) when you run npm test, and it's awesome for the same reason - much of the style-related feedback that would've previously ended up in a Pull Request can now be dealt with directly on a local development machine, reducing PR review time for both the reviewer and the contributor.
  • LibreJS is now supported
  • Tests now pass! Whooooooooo! (Thanks to Menno Vossen for sending the enormous Pull Request that made this happen.)

Those are just the bigger ones, of course - there are a bunch of even smaller problems that got squashed as well. I'd also like to point out that quite a few of these were long-standing PRs which finally made it into core, which is awesome for everyone.

Express 3.x migration

I've been putting in a lot of work to migrate pump.io to Express 3.x. It's a huge amount of work, but when complete, it will bring us very, very close to being able to migrate onto Express 4.x, which is modern and fully-supported by upstream. Basically what I've been doing is just running the app, seeing where it crashes, going to the exception site, and fixing the problem. Rinse, repeat. You can check out this work on the express-3.x branch - currently, this branch can successfully start up the app, but will crash pretty soon after you try to do almost anything else.

This work, unfortunately, is on hold while another important project is completed: converting all the templates from utml to Jade.

utml to Jade transition

This is basically what it sounds like. Previously, the templates in pump.io were based on utml, which is essentially a thin wrapper around Underscore.js's _.template() function. However, utml doesn't work with Express 3.x (and it's not really worth making it work), plus it's not the prettiest to work with. Jade is an extremely popular templating language in Node-land nowadays, so a couple of months ago I spent somewhere between 14 and 18 hours going through and rewriting all the utml in Jade, which was absolutely brutal - but necessary. Then, of course, I had to fix the client-side templating logic to handle Jade instead of Underscore templates, which took quite a while, as did cleaning up the very large number of minor (largely cosmetic) errors I'd made in my conversions.

As I said above, this was kind of awful work (especially the beginning), but it's necessary and great, as it paves the way for Express 3.x and massively improves the contributor experience.

You can check out the gory details of this work in PR #1170, and the original reasoning behind why we're doing this in issue #1167. This work is actually done, but I'm going to write a separate blog post about it, calling for testers.

Upcoming 1.0.0 release

Last but certainly not least, we're gearing up for our 1.0.0 release! From a codebase standpoint, this is really just a small bugfix release (although it will make a lot of things less broken and - if I recall correctly - fix the actual installation process), but more importantly, it means that we're now committing to semantic versioning, which is a win for everyone (but especially administrators). The main thing that needs to be fixed before this goes out the door is the behavior of the XSS scrubber, which was accidentally made a little too aggressive. This is being tracked in issue #1169.

As a bonus, I'm also designing a t-shirt that (if there's sufficient interest) we may print as a celebration of this release - but more on this in my next post.