Is there a reason a complete IP address rainbow-table wouldn't defeat this?

ddevault · on Nov 21, 2013

bcrypt is designed to thwart rainbow-table attacks. It salts the hashes and it takes a while (1/3 of a second on my machine) to compute a single hash.

https://en.wikipedia.org/wiki/Bcrypt

ddevault · on Nov 21, 2013

I'm going to respond to all of you at once by saying this: bcrypt is the best possible solution that we are aware of. It's infeasible for anyone but the most resourceful adversaries to brute force your hashed IP, and even then it's still expensive.

However, that's part of why we're open source. You can't trust us when we say that we aren't storing your IP. We could be doing it and you'd have no way of being able to tell. If you're concerned about this, run a private instance of MediaCrush. There are instructions in the README, and it's pretty easy to set up.

birken · on Nov 21, 2013

I don't know exactly what situation you are trying to avoid, but with standard bcrypt, if somebody has the IP hash and a specific candidate IP, they can positively match the two (something you specifically mention on your privacy page).

One possible tweak is to continue using bcrypt and a salt, but shorten the hash output to something like 24 bits. This way it still cannot be easily reversed or rainbow-tabled, and collisions still shouldn't be an active problem. However, it won't be possible to positively match a given IP to a hash, since multiple IPs will likely hash to a given output. Granted, if you have a candidate IP and it matches the output hash, there is a very high probability that it was the source IP, but it wouldn't be 100%.
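
A minimal sketch of the truncation idea, not MediaCrush's actual code. It assumes PBKDF2 from Python's standard library as a stand-in for bcrypt so it runs without extra packages; the function name `truncated_ip_hash`, the iteration count, and the sample address are all made up for illustration.

```python
import hashlib
import os

def truncated_ip_hash(ip: str, salt: bytes, bits: int = 24) -> int:
    """Slow salted hash of an IP, keeping only the top `bits` bits."""
    # Assumption: PBKDF2-HMAC-SHA256 stands in for bcrypt here.
    digest = hashlib.pbkdf2_hmac("sha256", ip.encode(), salt, 10_000)
    # Shift away everything except the leading `bits` bits of the digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)

salt = os.urandom(16)                     # stored alongside the truncated hash
h = truncated_ip_hash("203.0.113.7", salt)

# With 2**32 IPv4 addresses and 2**24 possible outputs, each output is
# shared by 256 addresses on average.
ips_per_bucket = 2 ** 32 // 2 ** 24
print(h, ips_per_bucket)
```

So a match narrows the source down to a bucket of addresses rather than proving a single one.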

wgd · on Nov 21, 2013

At 1/3 of a second to compute a single hash, brute forcing the entire space of possible addresses takes around 45 CPU-years. But computing the hash of every single IP address is ridiculously parallel, so it's trivial to spin up 2k machines on EC2 and brute force the entire thing in a week. Total cost, somewhere under $8k if you don't want to bother owning real machines, less for any organization that happens to need to do similar things on a regular or semi-regular basis.
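
The arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch: it takes the quoted 1/3 second per hash and 2k machines at face value and assumes each machine computes one hash at a time.

```python
# Figures taken from the comment above; nothing here is measured.
SECONDS_PER_HASH = 1 / 3       # quoted bcrypt cost per hash
IPV4_SPACE = 2 ** 32           # every possible IPv4 address
MACHINES = 2000                # the "2k machines"

SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

total_seconds = IPV4_SPACE * SECONDS_PER_HASH
cpu_years = total_seconds / SECONDS_PER_YEAR
wall_clock_days = total_seconds / MACHINES / SECONDS_PER_DAY

print(f"{cpu_years:.1f} CPU-years")                           # about 45.4
print(f"{wall_clock_days:.1f} days on {MACHINES} machines")   # about 8.3
```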

It's not a trivial investment since that effort only gets you a single IP address, but it's easily within the reach of a vast number of organizations if they have real motivation (read: not a fishing expedition) to reverse it.

bmelton · on Nov 22, 2013

And of course, they wouldn't have to test every IP address in the world, they'd only have to test the IP addresses that appeared in the webserver logs at some point, substantially reducing the time requirement.

ddevault · on Nov 23, 2013

For what it's worth, we don't keep IPs in the http log.

gunn · on Nov 21, 2013

Good answer... It looks like it would take on the order of 7 CPU years to create a table of every used address, and much less time to target an area or individual. I don't think this is an issue at all, though; I was just curious.

ddevault · on Nov 21, 2013

Actually, the salting is an important detail. Those 7 CPU years would only create a table for one hash. That's how long it takes to brute force a single hash, not the entire space.
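
A small sketch of why the salt matters, again assuming PBKDF2 from the standard library in place of bcrypt. The helper `slow_hash`, the iteration count, and the sample addresses are hypothetical.

```python
import hashlib
import os

def slow_hash(ip: str, salt: bytes) -> bytes:
    # Assumption: PBKDF2-HMAC-SHA256 stands in for bcrypt here.
    return hashlib.pbkdf2_hmac("sha256", ip.encode(), salt, 10_000)

ip = "198.51.100.23"
salt_a, salt_b = os.urandom(16), os.urandom(16)

# A lookup table precomputed under salt_a (two candidates for brevity).
candidates = [ip, "198.51.100.24"]
table_a = {slow_hash(c, salt_a): c for c in candidates}

# The same IP stored under a different salt is not found in that table,
# so the table has to be rebuilt for every salted hash.
stored_b = slow_hash(ip, salt_b)
print(stored_b in table_a)                 # False
print(table_a[slow_hash(ip, salt_a)])      # 198.51.100.23
```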