One of the failures here is that they weren't able to keep deployed software up to date for security fixes, even when those security fixes were publicly known.
They have acknowledged this in their section "Keeping patched".
However, there is one thing I think they have omitted to consider: the more they rely on third-party software that doesn't come from the server distribution they're running, the more disparate and unreliable their sources of security fixes become.
Careful choice of production software dependencies is therefore a factor. Going outside the distribution is usually unavoidable for some small number of dependencies that are central to the mission, but in general I wonder if they have any kind of policy to favour distribution-supplied dependencies over any other kind.
Another way of looking at this: we already have a community that comes together to provide integrated security updates that can be automatically installed, and you already have access to it. Not using this source compromises that ability. If some software isn't available through Debian, it is usually because there is some practical difficulty in packaging it, and I argue that security maintenance difficulty arises from the same root cause.
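To be concrete: on Debian, opting in to those automatic installs is just the stock unattended-upgrades package. A minimal sketch:

```sh
apt-get install unattended-upgrades
# answer "Yes" at the prompt; this writes /etc/apt/apt.conf.d/20auto-upgrades,
# which turns on the daily apt update + unattended-upgrade run
dpkg-reconfigure -plow unattended-upgrades
```

That's the whole cost of staying inside the distribution's security pipeline.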
On a similar note, I'm curious about their choice to switch from cgit to GitLab. Both are packaged in Debian, but I believe that even Debian doesn't use the packaged GitLab for Debian's own GitLab instance. Assuming that Debian GitLab package's version is therefore not practical, wouldn't cgit be better from a "receives timely security updates through the distribution" perspective?
In the (distant) past, we tended to prefer rolling our own builds of critical services (e.g. Apache, the Linux kernel) rather than using distribution-maintained packages. The reason was pretty much one of being control freaks: wanting to be able to patch and tweak things precisely as they came from the developers, rather than having to work out how to coerce Debian's apache package into increasing the hardcoded accept backlog limit, or whatever today's drama happened to be.
However, this clearly comes at the expense of ease of keeping things patched and up to date, and one of the things we got right (albeit probably not for the right reasons at the time) when we did the initial rushed build-out of the legacy infrastructure in 2017 was to switch to using Debian packages for the majority of things.
Interestingly, cgit was one of the things we didn't get from Debian (because we customised it a bunch), and so it definitely was a security liability.
Gitlab is a different beast altogether, given it's effectively a distro in its own right, so we treat it like an OS which needs to be kept patched just like we do Debian.
For what it's worth, I think by far the hardest thing to do here is to maintain the discipline to go around keeping everything patched on a regular basis - especially for small teams who lack dedicated ops people. I don't know of a good solution here other than trying to instil the fear of God into everyone when it comes to keeping patched, and throwing more $ and people at it.
Better than getting compromised! I used to have a very conservative approach to changing anything, including great caution with security updates: I wanted to avoid automatic security updates, with a plan to carefully gate everything instead.
In practice though, security fixes are cherry-picked and therefore limited in scope, and outages caused by other factors are orders of magnitude more common than outages caused by security updates. Better to remain patched, in my opinion, and risk a non-security outage, than to get compromised by not applying them immediately.
A better way to mitigate the risk is to apply the CI philosophy to deployments. Every deployment component should come with a test to make sure it works in production. Add CI for that. Then automate security updates in production gated on CI having passed. If your security update fails, then it's your test that needs fixing.
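A minimal sketch of what I mean, with made-up names (`smoke-test.sh` stands in for whatever end-to-end check your deployment already ships with):

```sh
#!/bin/sh
# gate-updates.sh -- hypothetical cron job: only apply pending security
# updates if the service's own smoke test is green, and re-check afterwards
set -e

# if the test is already red, fix the test (or the service) first
./smoke-test.sh || { echo "smoke test failing; not touching anything" >&2; exit 1; }

# apply whatever unattended-upgrades has been configured to allow
unattended-upgrade -v

# a failure here means the update broke something -- or the test needs fixing
./smoke-test.sh
```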
But there are still a few custom things running around which aren't covered by that (e.g. custom Python builds with go-faster-stripe decals; security upgrades which require restarts; etc.), hence needing the manual discipline for checking too. And given we need manual discipline for running & checking vuln scans anyway, not to mention hunting security advisories for deps in Synapse, Riot, etc., I maintain that one of the hardest things here is having the discipline to keep doing that, especially if you're in a small team and you're stressing about writing software rather than doing sysadmin.
Why don't you use https://github.com/liske/needrestart to automatically restart services that need restarting after security upgrades, and unattended-upgrades plus a cron job to reboot the whole machine after kernel upgrades (or periodically)?
Shouldn't ansible do all this for you? I heard it's the recommended way for automatic updates and service restarts.
Please let me know about this as I'm interested myself.
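For reference, the kind of setup I mean is roughly this (untested sketch; the conf.d file names are my own choice):

```sh
apt-get install needrestart

# tell needrestart to restart affected services automatically rather than ask
echo "\$nrconf{restart} = 'a';" > /etc/needrestart/conf.d/50-autorestart.conf

# let unattended-upgrades reboot the machine when a kernel update requires it
cat > /etc/apt/apt.conf.d/52auto-reboot <<'EOF'
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "04:00";
EOF
```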
I wonder what's missing from Debian to automate such things since my automation experience is mainly with RHEL. (I realize it may be partly a question of effort for automation, but it sounds as if that's not the root of it.)
Debian can restart processes that depend on updated packages and alert you when restarts are needed, and you can automate checking for new upstream releases of things you've backported packages for. That doesn't finesse reboots for kernel updates and whatever systemd forces on you now, but I assume you can at least have live kernel patching, as on the RHEL systems I used to deal with.
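(For the release-checking part, I mean the standard debian/watch machinery; a rough sketch with a made-up upstream URL:)

```sh
apt-get install devscripts
# debian/watch tells uscan where to look for new upstream releases
cat > debian/watch <<'EOF'
version=4
https://github.com/example/project/tags .*/v?(\d\S+)\.tar\.gz
EOF
uscan --no-download   # just report whether upstream is newer than your backport
```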
The Nix package manager can help keep packages that aren't available from your distribution updated and customised (https://nixos.org/nix/).
In the past I used to install newer or customised versions of software (e.g. `git`) than were available on my Ubuntu into my home directory, using e.g. `./configure --prefix=$HOME/opt`. That got me the features I wanted, but of course it made me miss out on security updates, and I had to remember every piece of software I'd installed this way.
With nix, I can update them all in one go with `nix-env --upgrade`.
Nix also lets you declaratively apply custom patches to "whatever the latest version is".
That way I can have the things you mentioned (e.g. a hardcoded accept backlog for Apache, or hardening compile flags) without the mentioned "expense of ease of keeping things patched and up-to-date". I found that very hard to do with .deb packages.
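For example, a sketch of carrying a local patch on top of whatever nixpkgs ships (the patch name is made up; the file lives next to config.nix):

```sh
mkdir -p ~/.config/nixpkgs
cat > ~/.config/nixpkgs/config.nix <<'EOF'
{
  packageOverrides = pkgs: {
    # rebuild git with a local patch applied on top of the current version;
    # ./my-local-tweak.patch resolves relative to this config.nix
    git = pkgs.git.overrideAttrs (old: {
      patches = (old.patches or []) ++ [ ./my-local-tweak.patch ];
    });
  };
}
EOF
nix-env --upgrade   # future upgrades keep re-applying the same patch
```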
It's not as good as just using unattended-upgrades from your main distro, because you still have to run the one `nix-env --upgrade` command every now and then, but that can be easily automated.
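E.g. a crontab entry for whichever user owns the profile (schedule and log path are arbitrary, and this assumes nix is on cron's PATH):

```sh
# refresh the channel and upgrade everything installed via nix-env, nightly
0 4 * * * nix-channel --update && nix-env --upgrade >> ~/nix-upgrade.log 2>&1
```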
I only know Guix, not Nix, but I found it mostly harder to write package definitions for it than to backport rpms and debs, at least for requirements that aren't radically different from the base system. (That's nothing to do with Scheme, by the way.)
And if you're bothered about security, it's not clear that having to keep track of two different packaging systems, and the possible interactions between them, is a win.
Hey! Yes, although one of the reasons for going with Buildkite was that we had it working well for Synapse before we got our GitLab up and running, and so rather than doing a Jenkins->Travis->Circle->Buildkite->GitLab journey, we decided to give Buildkite a fair go. The team also has more experience with it. GitLab CI could work too, though, and we've heard good things. It would be yet another function to worry about self-hosting, though (if we used gitlab.matrix.org for it).