Fortunately, the attackers didn't, or weren't able to, use their access to slip backdoor code into the OpenSSL software, which websites around the world use to provide HTTPS encryption for the pages they serve. That assurance is possible because the code is maintained and distributed through Git, a source-code management system that lets developers and users keep independent copies all over the Internet. Since the cryptographic hashes of the code hosted on openssl.org matched those of copies elsewhere, there is a high degree of confidence the code hasn't been altered.
A few days ago I posed the question of whether Git's crypto is an example of dangerous amateur cryptography, since Linus isn't (AFAIK) a crypto expert: https://news.ycombinator.com/item?id=6961683
The general answer I got was that Git isn't really crypto, because it isn't using the hash to guarantee integrity, but simply as a checksum to detect corruption.
I didn't find this argument very convincing at the time, and I would now offer the above quotation as evidence that people do in fact treat Git's hashes as a security mechanism that can withstand an adversarial attack.
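For concreteness, Git's object IDs are just a SHA-1 over a short type/size header plus the content, so the object name and the integrity check are one and the same, whether you call that "crypto" or "a checksum". A minimal sketch in Python of what `git hash-object` computes:

    import hashlib

    def git_blob_id(content: bytes) -> str:
        # Git hashes "blob <size>\0" followed by the raw bytes;
        # the digest becomes the object's name in the repository.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches `echo hello | git hash-object --stdin`
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a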
You'll see plenty of operations that do not check the SHA-1 and will happily hand you bad data, including checkout and clone, depending on the circumstances.
I haven't checked the latest version, but at least previously, you'd have to run git fsck specifically to get it to notice some SHA-1 mismatches.
So at least trivially, the idea that "it uses SHA-1 so it's safe" is silly even before considering the implementation, because it doesn't always check the SHA-1 :)
(I'm not even going to argue it should, just pointing out it didn't)
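For what it's worth, checking an object by hand is easy, and is essentially what git fsck does per object: a loose object is zlib-deflated "<type> <size>\0<body>", and the SHA-1 of the decompressed bytes must equal the object's name. A rough sketch (loose objects only; packfiles are stored differently):

    import hashlib, pathlib, zlib

    def loose_object_ok(git_dir: str, oid: str) -> bool:
        # Loose objects live at .git/objects/aa/bbbb...; recompute
        # the digest and compare it against the filename.
        path = pathlib.Path(git_dir, "objects", oid[:2], oid[2:])
        raw = zlib.decompress(path.read_bytes())
        return hashlib.sha1(raw).hexdigest() == oid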
Indeed. Even assuming the rest of the code is implemented without any vulnerabilities (and that's one hell of an assumption about any code), SHA-1, which Git uses as its hashing algorithm, was completely broken as early as 2005[1].
That said, I'm not sure how you'd go about crafting source code in such a way as to collide with a known hash without it being amazingly obvious. You could rewrite history to make it appear as if your crooked code had always been there, or had been for a long time, but then you don't have to forge just one hash; you have to forge every single hash between that commit and the current tip. Depending on the number of commits, that's a hell of a lot of work, even with an algorithm that breaks SHA-1.
And even if you did manage to change the hashes, since Git is distributed, the developers would quickly figure out that what's in one repo doesn't match all the others when patches start breaking.
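To illustrate the chaining: each commit object embeds its parent's hash, so rewriting an old commit changes its ID, which changes every descendant commit's body, and so on up to the tip. A toy sketch of that structure (real commit objects also carry tree, author, and committer lines, but the chaining works the same way):

    import hashlib

    def object_id(kind: bytes, body: bytes) -> str:
        # Same header scheme Git uses: "<type> <size>\0<body>".
        return hashlib.sha1(b"%s %d\x00%s" % (kind, len(body), body)).hexdigest()

    # Each commit names its parent's hash, so tampering with "initial"
    # forces the attacker to forge a collision for every later commit too.
    parent = None
    for msg in (b"initial", b"fix bug", b"add feature"):
        body = b"" if parent is None else b"parent %s\n" % parent.encode()
        parent = object_id(b"commit", body + msg)
        print(parent[:12], msg.decode())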
Your definition of "completely broken" is misleading.
While it makes complete sense to move to SHA-256 and beyond, a collision between two carefully constructed plaintexts is not indicative of a general break in SHA-1. Finding a collision against a fixed plaintext and hash (a second-preimage attack) is much more difficult, and it is ridiculously more difficult still for that collision to produce non-gibberish, working code that compiles and creates a vulnerability.
Reading your list of complications, though, I wonder if the last one is really a problem. Hiding a vulnerability that doesn't look suspicious at first sight? Agreed, that's hard. But hiding a vulnerability at all, if the code gets compiled without anyone reading the source? Isn't that 'just' a matter of appending /* garbage */ in a comment, or dead code behind #ifdefs? In other words: is the last point really a separate issue, or is it 'just' a collision against a fixed plaintext/hash?
You're right. Git really isn't crypto, and this news blurb does sound a bit odd. Hopefully this is just a misunderstanding in the reporting. Matching hashes is probably a pretty good sanity check, but it isn't the kind of solid guarantee that a project like OpenSSL needs.
Fortunately, one great thing about decentralized source control systems is that everyone has a copy of everything. Hopefully someone has simply located a repository which hasn't been updated since before the breach, and done some direct comparisons.
Detecting corruption is pretty much the same thing as detecting changes though, right? Git uses SHA-1 hashes (I think). If someone managed to change the code in a way that still compiles properly AND doesn't change the checksum hash, well... pretty unlikely.
It's pretty much the same thing as detecting unintentional changes - very important distinction!
Something like CRC32 might catch some 99.999% of the cases of unintentional corruption. But you could probably figure out how to generate a collision using pen and paper, without being a cryptography expert of any sort, in which case CRC32 will only catch the cases where you forget to carry a one!
EDIT: SHA1's naturally a bit tougher to beat than CRC32, but there's plenty of whitespace and comments and identifiers to rename without affecting the execution of a program. Your typical binary format will have plenty of uninitialized padding bytes to tweak as well. Plenty of room to change unimportant data to generate a collision.
> SHA1's naturally a bit tougher to beat than CRC32...
That's like saying that bank vaults are tougher to break into than cardboard boxes. Bank vaults are actively guarded and designed to prevent unauthorized access, cardboard boxes are just for organizing your junk and can't even keep a small child out.
Making a CRC32 collision is easy if you try. You just have to solve a mathematical equation in a small, well-understood finite ring.
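To put numbers on "easy": because CRC32 is linear, you can append four bytes to any message to force any CRC value you like, with no searching at all. Here's a sketch of the classic table-walking trick (Python, using zlib's real CRC32):

    import zlib

    # Standard reflected CRC-32 table (polynomial 0xEDB88320).
    TABLE = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        TABLE.append(c)
    # Every table entry has a distinct top byte, which is what
    # makes the byte-step reversible.
    TOP = {t >> 24: i for i, t in enumerate(TABLE)}

    def forge(data: bytes, target: int) -> bytes:
        """Return 4 bytes p such that zlib.crc32(data + p) == target."""
        # Walk backwards from the desired final state to recover the
        # four table indices used, then forwards to pick the bytes.
        state = target ^ 0xFFFFFFFF
        idx = []
        for _ in range(4):
            i = TOP[state >> 24]
            idx.append(i)
            state = ((state ^ TABLE[i]) << 8) & 0xFFFFFFFF
        state = zlib.crc32(data) ^ 0xFFFFFFFF
        patch = bytearray()
        for i in reversed(idx):
            patch.append((state ^ i) & 0xFF)
            state = (state >> 8) ^ TABLE[i]
        return bytes(patch)

    evil = b"malicious payload" + forge(b"malicious payload",
                                        zlib.crc32(b"innocent data"))
    assert zlib.crc32(evil) == zlib.crc32(b"innocent data")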
By comparison, SHA1 does not have a simple equation. It was purposefully designed to be difficult to solve. NOBODY has yet found two different pieces of data with the same SHA1 hash, or if they have, they've kept it secret.
OK, so I think "a bit tougher" is an understatement to say the least. If they're really worried, compare file sizes too. Then compile the code and verify that the checksum on the executable is the same as before. I just think it would be really hard to pull this off, maybe the NSA could do it but they wouldn't have tipped their hand by defacing the site too.
> OK, so I think "a bit tougher" is an understatement to say the least.
Sure. But the point is that detecting corruption is a far different beast than detecting intentional collision attempts.
> If they're really worried, compare file sizes too.
This adds nearly no security.
> Then compile the code and verify that the checksum on the executable is the same as before.
Compare the whole binary! If you don't trust a SHA1 on the source, trusting a SHA1 on the binary is merely moving the goalposts around. This also requires deterministic builds - not the default or necessarily easy, but it's possible, and it's the approach the Tor project takes at least: https://blog.torproject.org/category/tags/deterministic-buil...
Alternatively, even without deterministic builds, compare the whole source tree! The whole file must be read to compute the hash in the first place, so why not compare all those bytes?
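A byte-for-byte tree comparison needs no cryptography at all; a minimal sketch with Python's stdlib (the directory names here are hypothetical):

    import filecmp, os

    def trees_identical(a: str, b: str) -> bool:
        # Recursively compare two directory trees, byte for byte.
        cmp = filecmp.dircmp(a, b)
        if cmp.left_only or cmp.right_only or cmp.funny_files:
            return False
        # shallow=False forces a full content comparison rather
        # than just a size/mtime check.
        _, mismatch, errors = filecmp.cmpfiles(
            a, b, cmp.common_files, shallow=False)
        if mismatch or errors:
            return False
        return all(trees_identical(os.path.join(a, d), os.path.join(b, d))
                   for d in cmp.common_dirs)

    # e.g. a mirror cloned before the breach vs. the copy served today
    print(trees_identical("openssl-pre-breach", "openssl-current"))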
> I just think it would be really hard to pull this off
And while I would agree with you on that point, the point I'm trying to make is simply that this has absolutely nothing to do with the fact that git attempts to detect accidental corruption -- the specifics are paramount! While SHA1 is not the weakest cryptographic hash, it is not the strongest either. Git was not designed with cryptographic security as its focus, and it was written by a programmer who is not a cryptographic security expert.
I would also stress that "really hard" is quite different from "impossible" (and, in the general case, quite different from even "unlikely").
(That said, sneaking in a benign-looking backdoor through standard channels sounds way easier.)
Not at all unlikely. There are plenty of places to put adjustment bits to steer toward a hash collision, like in comments, firmware blobs, or unused files. One of the MD5 collision examples from earlier was two PostScript files that both parsed fine but had different messages.
You might have a point regarding binary blobs but, IMO, if they can alter text (comments, source, etc.) in such a way that it's not obvious to a human reviewer, then they deserve to succeed. I don't know what `unused files` means. Git used `sha1` last time I checked, so I'm not sure why you mention `md5`.
Unused files: there are always plenty of files in a source distribution that aren't actually used during a compile, for example a README file. As far as I know, a commit id is a SHA-1 over a tree of files, so you could probably put any "adjusting bits" in an unimportant file.
MD5: just an example of how hash collisions can work in real life (I'm not aware of any published SHA-1 examples).
Have there been recent public disclosures of vulnerabilities in hypervisors?
Breaking out of virtual machines is a really interesting process, but it's important to remember that a hypervisor can be attacked with pretty much the same techniques as any other program. Virtual machines aren't a magic contain-all-the-hackers solution. There was an interesting talk at DEFCON 19 about breaking out of KVM: http://www.youtube.com/watch?v=tVSVdudfF8Q
Technically that was breaking out of QEMU. It was not KVM specific.
If you break out into QEMU, you should end up as a non-privileged user. If you are using libvirt, you are in a cgroup-based jail (basically a container) with SELinux being enforced too. So after breaking into QEMU, you would still need to break out of the container before you could attack anything.
But Nelson's exploit was pretty cool. I initially thought remote code execution wasn't possible and he turned around pretty quickly with the exploit. It's quite impressive.
There may be some alarmism getting started here. "The attack was made via hypervisor through the hosting provider" can be interpreted in several ways and (to me) doesn't necessarily indicate a hypervisor exploit. It sounds like it could be similar to the Linode admin access hack.
That was my impression, too. It sounds more likely that a hosting company admin had malware installed or re-used passwords, which would naturally give attackers full access to all VMs hosted by the provider.
Considering the amateurish defacement, I think that's far more likely than some sort of a Xen/KVM breakout exploit. If there was such a zero-day, some group would probably be going after numerous providers in secret, not doing something like this.
I may be misunderstanding (there seem to have been at least two famous hacks on Linode), but why would you use the word "hypervisor" to explain that someone had hacked your host's admin account? Isn't the presence of that word (the h word) what makes this (potentially) scary?
Also, will HTP ever come back from exile? (Not that it seems too unhappy where it is.)
There have been pretty much no hypervisor exploits seen in the wild but plenty of admin hacks. (This is basically the same as Andrew Hay's argument.) I want to give the OpenSSL people the benefit of the doubt when they say they saw Bigfoot, but... are they experts in virtualization? Or are they using loose terminology (e.g. considering dom0 or vCenter to be the "hypervisor")?
(HTP is stuck in the scarcity trap, so I should probably block off a weekend to work on it.)
It didn't get a huge press release or much media coverage, but it was a pretty big deal. Probably because there weren't any 0-day exploits floating around at the time, it flew under the radar of the mainstream.
Simple logic: the defacement was amateur at best. If the group had a 0-day in a hypervisor, they would have gone after multiple hosting companies and multiple attacks would have taken place; there are many more targets worth much more than OpenSSL.
Most likely, the administration panel of the hosting company was compromised through malware/phishing. Seriously, if a group like this had a 0-day in a hypervisor, they would be doing much, much more damage.
Defacing OpenSSL's website might be low-value, but backdooring OpenSSL code (trusting-trust style!) would be about as high-value a target as I could imagine.
This doesn't surprise me one bit if it's a hypervisor hack. You have to design this stuff in from day one rather than tack it on as an afterthought. To quote Theo de Raadt (who I agree with) on virtualization:
"x86 virtualization is about basically placing another nearly full kernel, full of new bugs, on top of a nasty x86 architecture which barely has correct page protection. Then running your operating system on the other side of this brand new pile of shit.
You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes."
May as well link directly to OpenSSL's post on it (which says the same thing) [0]. Also, assuming the traceroute on www.openssl.org is correct, this [1] is their webhost.