One of the failures here is that they weren't able to keep deployed software up to date for security fixes, even when those security fixes were publicly known.
They have acknowledged this in their section "Keeping patched".
However, there is one thing I think they have omitted to consider: the more they rely on third-party software that doesn't come from the server distribution they're running, the more disparate and unreliable their sources of security fixes become.
Careful choice of production software dependencies is therefore a factor. Going outside the distribution is usually unavoidable for some small number of dependencies that are central to the mission, but in general I wonder if they have any kind of policy to favour distribution-supplied dependencies over any other kind.
Another way of looking at this: we already have a community that comes together to provide integrated security updates that can be automatically installed, and you already have access to it. Not using this source compromises that ability. If some software isn't available through Debian, it is usually because there is some practical difficulty in packaging it, and I argue that security maintenance difficulty arises from the same root cause.
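To be concrete: on Debian, opting in to those automatic installs is just the stock unattended-upgrades package. A minimal sketch:

```sh
apt-get install unattended-upgrades
# answer "Yes" at the prompt; this writes /etc/apt/apt.conf.d/20auto-upgrades,
# which turns on the daily apt update + unattended-upgrade run
dpkg-reconfigure -plow unattended-upgrades
```

That's the whole cost of staying inside the distribution's security pipeline.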
On a similar note, I'm curious about their choice to switch from cgit to GitLab. Both are packaged in Debian, but I believe that even Debian doesn't use the packaged GitLab for Debian's own GitLab instance. Assuming that Debian GitLab package's version is therefore not practical, wouldn't cgit be better from a "receives timely security updates through the distribution" perspective?
In the (distant) past, we tended to prefer rolling our own builds of critical services (e.g. Apache, the Linux kernel) rather than using distribution-maintained packages. The reason was pretty much one of being control freaks: wanting to be able to patch and tweak things precisely as they came from the developers, rather than having to work out how to coerce Debian's apache package into increasing the hardcoded accept backlog limit, or whatever today's drama happened to be.
However, this clearly comes at the expense of ease of keeping things patched and up to date, and one of the things we got right (albeit probably not for the right reasons at the time) when we did the initial rushed build-out of the legacy infrastructure in 2017 was to switch to using Debian packages for the majority of things.
Interestingly, cgit was one of the things we didn't get from Debian (because we customised it a bunch), and so it definitely was a security liability.
Gitlab is a different beast altogether, given it's effectively a distro in its own right, so we treat it like an OS which needs to be kept patched just like we do Debian.
For what it's worth, I think by far the hardest thing to do here is to maintain the discipline to go around keeping everything patched on a regular basis - especially for small teams who lack dedicated ops people. I don't know of a good solution here other than trying to instil the fear of God into everyone when it comes to keeping patched, and throwing more $ and people at it.
Better than getting compromised! I used to have a very conservative approach to changing anything, including great caution with security updates: I wanted to avoid automatic security updates, with a plan to carefully gate everything instead.
In practice though, security fixes are cherry-picked and therefore limited in scope, and outages caused by other factors are orders of magnitude more common than outages caused by security updates. Better to remain patched, in my opinion, and risk a non-security outage, than to get compromised by not applying them immediately.
A better way to mitigate the risk is to apply the CI philosophy to deployments. Every deployment component should come with a test to make sure it works in production. Add CI for that. Then automate security updates in production gated on CI having passed. If your security update fails, then it's your test that needs fixing.
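A minimal sketch of what I mean, with made-up names (`smoke-test.sh` stands in for whatever end-to-end check your deployment already ships with):

```sh
#!/bin/sh
# gate-updates.sh -- hypothetical cron job: only apply pending security
# updates if the service's own smoke test is green, and re-check afterwards
set -e

# if the test is already red, fix the test (or the service) first
./smoke-test.sh || { echo "smoke test failing; not touching anything" >&2; exit 1; }

# apply whatever unattended-upgrades has been configured to allow
unattended-upgrade -v

# a failure here means the update broke something -- or the test needs fixing
./smoke-test.sh
```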
But there are still a few custom things running around which aren't covered by that (e.g. custom Python builds with go-faster-stripe decals; security upgrades which require restarts; etc.), hence needing the manual discipline for checking too. And given we need manual discipline for running & checking vuln scans anyway, not to mention hunting security advisories for deps in Synapse, Riot, etc., I maintain that one of the hardest things here is having the discipline to keep doing that, especially if you're in a small team and you're stressing about writing software rather than doing sysadmin.
Why don't you use https://github.com/liske/needrestart to automatically restart services that need restarting after security upgrades, and unattended-upgrades plus a cron job to reboot the whole machine after kernel upgrades (or periodically)?
Shouldn't ansible do all this for you? I heard it's the recommended way for automatic updates and service restarts.
Please let me know about this as I'm interested myself.
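For reference, the kind of setup I mean is roughly this (untested sketch; the conf.d file names are my own choice):

```sh
apt-get install needrestart

# tell needrestart to restart affected services automatically rather than ask
echo "\$nrconf{restart} = 'a';" > /etc/needrestart/conf.d/50-autorestart.conf

# let unattended-upgrades reboot the machine when a kernel update requires it
cat > /etc/apt/apt.conf.d/52auto-reboot <<'EOF'
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "04:00";
EOF
```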
I wonder what's missing from Debian to automate such things since my automation experience is mainly with RHEL. (I realize it may be partly a question of effort for automation, but it sounds as if that's not the root of it.)
Debian can restart processes that depend on updated packages and alert you when restarts are needed, and you can automate checking for new upstream releases of things you've backported packages for. That doesn't finesse reboots for kernel updates and whatever systemd forces on you now, but I assume you can at least have live kernel patching, as on the RHEL systems I used to deal with.
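(For the release-checking part, I mean the standard debian/watch machinery; a rough sketch with a made-up upstream URL:)

```sh
apt-get install devscripts
# debian/watch tells uscan where to look for new upstream releases
cat > debian/watch <<'EOF'
version=4
https://github.com/example/project/tags .*/v?(\d\S+)\.tar\.gz
EOF
uscan --no-download   # just report whether upstream is newer than your backport
```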
The Nix package manager can help keep packages that aren't available from your distribution updated and customised (https://nixos.org/nix/).
In the past I used to install newer or customised versions of software (e.g. `git`) than were available on my Ubuntu into my home directory, using e.g. `./configure --prefix=$HOME/opt`. That got me the features I wanted, but of course it made me miss out on security updates, and I had to remember every piece of software I'd installed this way.
With nix, I can update them all in one go with `nix-env --upgrade`.
Nix also lets you declaratively apply custom patches to "whatever the latest version is".
That way I can have the things you mentioned (e.g. a hardcoded accept backlog for Apache, or hardening compile flags) without the mentioned "expense of ease of keeping things patched and up-to-date". I found that very hard to do with .deb packages.
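For example, a sketch of carrying a local patch on top of whatever nixpkgs ships (the patch name is made up; the file lives next to config.nix):

```sh
mkdir -p ~/.config/nixpkgs
cat > ~/.config/nixpkgs/config.nix <<'EOF'
{
  packageOverrides = pkgs: {
    # rebuild git with a local patch applied on top of the current version;
    # ./my-local-tweak.patch resolves relative to this config.nix
    git = pkgs.git.overrideAttrs (old: {
      patches = (old.patches or []) ++ [ ./my-local-tweak.patch ];
    });
  };
}
EOF
nix-env --upgrade   # future upgrades keep re-applying the same patch
```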
It's not as good as just using unattended-upgrades from your main distro, because you still have to run the one `nix-env --upgrade` command every now and then, but that can be easily automated.
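E.g. a crontab entry for whichever user owns the profile (schedule and log path are arbitrary, and this assumes nix is on cron's PATH):

```sh
# refresh the channel and upgrade everything installed via nix-env, nightly
0 4 * * * nix-channel --update && nix-env --upgrade >> ~/nix-upgrade.log 2>&1
```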
I only know Guix, not Nix, but I found it mostly harder to write package definitions for it than to backport rpms and debs, at least for requirements that aren't radically different from the base system. (That's nothing to do with Scheme, by the way.)
And if you're bothered about security, it's not clear that having to keep track of two different packaging systems, and the possible interactions between them, is a win.
Hey! Yes, although one of the reasons for going with Buildkite was that we had it working well for Synapse before we got our GitLab up and running, and so rather than doing a Jenkins->Travis->Circle->Buildkite->GitLab journey, we decided to give Buildkite a fair go. The team also has more experience with it. GitLab CI could work too, though, and we've heard good things. It would be yet another function to worry about self-hosting, though (if we used gitlab.matrix.org for it).