Geoff: some of the photographs have blur spots to conceal something you don't want to be public. Just an FYI that malicious attackers could bypass this blur in some cases by generating candidate versions of the 'before' content, blurring them the same way, and iterating over inputs until one matches your blurred photo. This matters for things like a document with an account number or some other secret. It probably matters less for your network adapters (MAC addresses, maybe?). Just a heads up.
Thanks wyldfire; the photos that are blurred show the identifiers of my RIPE Atlas nodes. It's not a big issue if they become public knowledge. I'm fully aware of the attack vectors related to image blurring.
You can't reverse a typical blur algorithmically, since it destroys information. You can only guess at the information that was there before the blur, and work from there.
The stronger the blur, the more possible starting states could have produced the result; the count grows exponentially with blur intensity. If you blur enough, every pixel is the same color, which obviously has destroyed all information.
But if I know that the blurred content is a social security number written in 12pt Times New Roman - I can perform the same blur operation on a million SSNs and see which one matches the mess on my screen most closely. It's even easier with bar codes.
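That guess-and-blur attack fits in a few lines. Here's a toy sketch in Python - the 4-digit PIN, the fake bit-pattern "font", and the blur kernel are all made up for illustration, but the loop is the real attack: render every candidate, blur it identically, and keep the closest match.

```python
import numpy as np

def render(code: str) -> np.ndarray:
    """Toy 'font': each digit becomes 4 black/white pixels (its bit pattern)."""
    bits = []
    for ch in code:
        bits += [int(b) * 255 for b in f"{int(ch):04b}"]
    return np.array(bits, dtype=float)

def blur(img: np.ndarray) -> np.ndarray:
    """1-D Gaussian-ish blur: convolve with [1, 2, 1] / 4."""
    return np.convolve(img, [1, 2, 1], mode="same") / 4

# The 'victim' blurs a 4-digit PIN...
secret = "4271"
leaked = blur(render(secret))

# ...and the attacker, knowing the font and the blur, tries every candidate.
best = min((f"{n:04d}" for n in range(10000)),
           key=lambda c: np.sum((blur(render(c)) - leaked) ** 2))
print(best)  # recovers "4271"
```

The blur here is mild enough that it never maps two different PINs to the same output, so the exact match is always found; crank the kernel up and candidates start colliding, which is the information loss discussed below.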
> If you blur enough, every pixel is the same color, which obviously has destroyed all information.
Not true. The ASCII art people have sorted characters by their “shade”; if you know the foreground and background colour of a letter, and you know the single colour of pixel it blurs to, you can still work out what the letter was.
Blurring discards some information and obfuscates other information, but really, given how easy it is to reverse-engineer such things, we should measure the number of bits of information remaining.
If information in image + information about image > bits of information in sensitive data, then in theory you can recover the data.
So simply deleting the data from the image (e.g. blacking it out, removing reflections) is preferable. Saves you a lot of effort!
> if you know the foreground and background colour of a letter, and you know the single colour of pixel it blurs to, you can still work out what the letter was.
Totally agreed, and I like what you said about information remaining - you start to get into fun information theory stuff.
For example, let's say you're given one of the 'shades' you mentioned. I give you #a9a9a9. There's 24 bits of information there. However, if it's a blurred letter of black text on a white background, it's always going to be grayscale, and there are only 256 possible grayscale values - only 8 bits. Luckily, since there are only 26 lowercase English characters, we can easily fit them into our 256 values. Information is not destroyed! This is the shade->char map you were talking about.
But now what if we blur enough to turn two letters into one shade. That one shade will still be between 0 and 255. But now there's 26*26=676 possibilities that could've created that shade. More inputs than we have outputs for. There's no way to fit more than 256 possible inputs into 8 bits of information. Shade '98' could be 3 or 4 different inputs. However, we can be very clever...
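You can see that pigeonhole collapse directly. The per-letter shade values below are invented (real ones would come from averaging each glyph's pixels), but any 26 values show the effect:

```python
from itertools import product
from string import ascii_lowercase

# Hypothetical per-letter shades (0 = black, 255 = white).
shade = {c: 30 + 7 * i for i, c in enumerate(ascii_lowercase)}

# Blur two letters into one pixel: the average of their shades.
pair_shade = {a + b: (shade[a] + shade[b]) // 2
              for a, b in product(ascii_lowercase, repeat=2)}

distinct = len(set(pair_shade.values()))
print(f"{len(pair_shade)} pairs -> {distinct} distinct shades")
# Far fewer distinct shades than pairs, so most blurred
# pairs are ambiguous without outside information.
```

Single letters keep distinct shades here, but the 676 pairs land on only a few dozen values - exactly the "more inputs than outputs" problem.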
We know that the letter 'u' almost always comes after 'q'. We know that 'x' never appears next to 'j'. We know lots of things, and can supplement the destroyed information with outside information.

We can actually change the whole context of this conversation now. We're not talking about blurring any more; we're talking about fitting English text into fewer bits than ought to be possible. We're talking about compression. This is exactly what compression algorithms for English text do. There's a lot of redundant information in plaintext, just like there's a lot of redundant information in our images of text. An 8x8 pixel character glyph can easily reduce to a single pixel.

However, there are limits. You can compress English text by 10x if it's simple enough, but you can't compress War and Peace into 5 bytes. Likewise, you can blur text by a few pixels and not destroy information, but you can't blur a paragraph of 12px font by 500px and get your original information back.
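That redundancy is trivial to measure with any off-the-shelf compressor. A quick zlib demo (the sample sentence is arbitrary, and repetition exaggerates the effect, just as repeated glyph shapes do in an image of text):

```python
import zlib

# Redundant English text: the same information stated over and over,
# like the repeated glyph shapes in a screenshot of a paragraph.
text = ("the quick brown fox jumps over the lazy dog and "
        "the lazy dog ignores the quick brown fox entirely ") * 20
raw = text.encode()
packed = zlib.compress(raw, level=9)
print(len(raw), "->", len(packed))  # compresses far below the raw size
```

The compressed size is a rough upper bound on the information actually present, which is why blurring redundant text destroys less than the pixel count suggests.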
It’s difficult or impossible to derive the original information from the output, but if you have a small enough keyspace you can generate all of the blurred (“hashed”) versions and compare the results.
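For a small keyspace that's entirely practical. A minimal sketch in Python, assuming a 4-digit PIN and SHA-256 (both choices are arbitrary stand-ins):

```python
import hashlib

# Recover a 4-digit PIN from its hash: enumerate the whole keyspace,
# hash each candidate, and compare -- same idea as re-blurring guesses.
target = hashlib.sha256(b"4271").hexdigest()
found = next(f"{n:04d}" for n in range(10000)
             if hashlib.sha256(f"{n:04d}".encode()).hexdigest() == target)
print(found)  # "4271"
```

Ten thousand hashes is nothing; the defence against this, as with blurring, is making the space of plausible inputs large, not making the transform fancier.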
To some degree, yes, but that metaphor breaks down because (often/typically) hash functions are designed so that small changes to the input result in large changes to the output.
Anything opaque would work. But aesthetically that's not always preferable. Maybe an opaque central region with blended borders would be a good balance.
I black out all sensitive regions in magenta, blur any bits I suspect might have reflections (blacking them out if I can see that there's actually something there), then copy from elsewhere in the image to get the aesthetics right, then blur to hide the fact I copied stuff.
Yes - and making sure that the file format you picked doesn't support layers (or at least that those layers are flattened). That mistake bit the New York Times a while ago.
I think so. If you don’t like the way that looks, another option I’ve used is overlaying the text with a white background and some unimportant text, then blurring that - you get the same blurred effect, but you’re blurring nothing sensitive.