Duo finds SAML vulnerabilities affecting multiple implementations

lvh · on Feb 27, 2018

This is a great example of a hairy canonicalization bug!

SAML has a process called canonicalization. Why? There are a bunch of XML documents that "mean the same thing", especially if you extend the meaning of "mean the same thing" to "how parsing libraries expose the tree and how consuming applications use it" instead of just the narrow definition of "exactly what XML says it means". But signature algorithms don't sign semantic XML trees: they sign bytes. So it's important that for an XML document you have exactly 1 byte representation. (Well, I say that, and I don't really believe it is important for SAML. But W3C believed that, and that's why XML canonicalization is in the spec and we get to deal with that now so :shrug:)

So, let's say I (legitimately!) own attacker.com, so I (legitimately!) own victim.com.attacker.com, and I (legitimately!) get someone to sign me@victim.com.attacker.com. All is well: the signature is only valid for that boring domain.

Here's the problem: the canonicalization algorithm and the parsing algorithm disagree on how to deal with a particular case. Specifically: the parsing algorithm doesn't throw away comments, and the canonicalization algorithm does. When you parse:

    me@victim.com<!-- what -->.attacker.com

The canonicalization algorithm sees "me@victim.com.attacker.com" because it stripped comments, and the parsing algorithm sees some text (lvh@lvh.com) followed by a comment followed by ".attacker.com". If you tell your parser to give you the first bit of text, you end up with "me@victim.com". The attacker adds the comment, but because the signature validation uses the a canonicalization method that strips the comment first, it gives the SAML token the all-clear: the comment does not affect the validity of the signature, but it does affect the final parse tree.

One of the (other) ways you can prevent this (which I heartily recommend to clients who can afford to do it) is "cryptographic audience restrictions". Traditional audience restrictions exist in a variety of standards including SAML and JWT and basically just say "this token is for xyzzy.com". This isn't great because tons of applications just ignore those. The SAML messages in question in this bug don't even contain them. They also wouldn't have helped much here, because the attacker potentially produces a valid SAML document for the right RP anyway.

Cryptographic audience validation is about having a key pair per (IdP, RP) pair, making it mandatory! If you as the victim.com IdP have a unique key between you and the RP, a so-forged SAML message wouldn't work: not because the message is wrong (it isn't), but because it'll be signed with the wrong key. Even if you can get the IdP to sign something for me@victim.com.attacker.com, and even if the RP's parser would parse the forged message as being for me@victim.com, it wouldn't matter because the signature wouldn't validate because it's signed with attacker.com. That's the case even if you do the sorta-skeevy thing where you get the parser to extract the alleged principal the message is for (in order to select the key to validate with).

(If you go through my comments you'll find recent examples of me arguing with people that they probably don't want public key crypto for these sorts of tokens anyway; the counterargument is "but it simplifies key distribution", and my counterargument is: "it really doesn't because if you can you want a key-per-pair, not a key-per-principal".)

ak217 · on Feb 28, 2018

This is actually a violation of the "see what is signed" principle.

When verifying an XML signature, you need to only look at the data that was signed, and disregard everything else. The XML Signature standard defines a number of "transforms", one of which is canonicalization, which may or may not strip comments. Only data that has been transformed is signed. This means an application relying on signature verification should never look at untransformed data! Comments are not the only attack source - the XML Signature standard allows "custom transforms" which can do crazy stuff like evaluation of xpath-based filters.

The "see what is signed" principle is simple and has little to do with SAML. Canonicalized data that passed signature verification is the only data that the relying application should look at. The signature verification library should not allow access to anything else.

If I understand everything correctly, my implementation https://github.com/XML-Security/signxml is not vulnerable because it obeys the rule above.