How Google’s CDN prevents your site from loading in China (edjiang.com)
50 points by anubiann00b on Sept 12, 2014 | 47 comments


It's partially related, but something I would really like to have is a cross-website cache for public scripts. Around 80% of the size of the scripts is made up of public libraries used by almost everyone (jQuery, Bootstrap, moment.js, various jQuery plugins, Angular...), and each of them is downloaded thousands of times.

One simple solution could be something like this:

<script type="text/javascript" src="/js/jquery.min.js" public="sha1:356a192b7913b04c54574d18c28d46e6395428ab">

This way the browser can look at the hash and skip requesting the file entirely. This shouldn't lead to security issues, since the hash the browser stores is not the one written in the attribute but the one it computes from the actual file contents (and obviously you would only use the public attribute for scripts that are meant to be public).

With this technique, the most popular libraries could be cached and not downloaded by users.


Subresource Integrity (SRI) addresses the problem only indirectly, but it does let you add checksums to resources. There are various security considerations around caching, and I don't think the spec touches on all of them: http://www.w3.org/TR/SRI/#caching-optional-1

Anyhow, that might be a good place to contribute.
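
For reference, an SRI-style include looks roughly like this; the integrity value below is just a placeholder for the base64-encoded digest of the exact file, and crossorigin is needed so the browser can check a cross-origin fetch:

     <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"
             integrity="sha384-BASE64_DIGEST_OF_THE_FILE_GOES_HERE"
             crossorigin="anonymous"></script>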


In addition to the (significant) bandwidth savings, this is an important idea for privacy/tracking reasons as well. I may be fine with websites A, B, and C logging that I made a request for one of their pages, but I'd rather not give Google[1] the browsing path A->B->C just because they host jQuery.

While browsers having an internal copy of various common scripts is a great idea, I was briefly working on a Firefox addon that would simply hard-cache any URL that matched some sort of criteria (e.g. a regexp for "//ajax.googleapis.com/ajax/libs/.*\.js").

Unfortunately, the project is on hold for now. While it was easy to match HTTP requests with an observer for 'http-on-modify-request', the nsIHttpChannel[2] object you get from that only seems to let you redirect the request. I considered trying to redirect to a "chrome:" or "file:" URL, but that seems like a horrible solution. The real way to mess with HTTP loading and caching, unfortunately, is buried somewhere I have yet to find. :/
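
For the curious, the matching part looked roughly like this (a sketch of the legacy XPCOM approach, written from memory; the regexp is just an example):

     // Chrome-privileged add-on code.
     Components.utils.import("resource://gre/modules/Services.jsm");

     var observer = {
       observe: function (subject, topic, data) {
         if (topic !== "http-on-modify-request") return;
         var channel = subject.QueryInterface(Components.interfaces.nsIHttpChannel);
         if (/\/\/ajax\.googleapis\.com\/ajax\/libs\/.+\.js$/.test(channel.URI.spec)) {
           // The request can be inspected or redirected here, but there is no
           // obvious hook for serving a locally stored copy as the response.
         }
       }
     };
     Services.obs.addObserver(observer, "http-on-modify-request", false);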

[1] or any other shared CDN, such as CloudFlare and their horrible hashed domain names

[2] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/...


That would be really neat. It could also solve a security issue with public CDNs.

Right now there's nothing to stop a malicious CDN from changing the content of an included script on your site without you knowing it.

With a hash attribute like this, the browser could refuse to load the file, or warn the user, if it didn't match.


You could have a small JS snippet on the page (served from your own domain) that checks the hash of the JS loaded from a CDN before running it.
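
A rough sketch of that idea, assuming the CDN sends CORS headers so the response body is readable, and that fetch and the Web Crypto API are available (the digest constant and URLs are placeholders):

     // Hex SHA-256 digest of the exact library version you expect (placeholder).
     var EXPECTED_SHA256 = "PUT_THE_KNOWN_DIGEST_HERE";

     fetch("https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js")
       .then(function (res) { return res.text(); })
       .then(function (source) {
         var bytes = new TextEncoder().encode(source);
         return crypto.subtle.digest("SHA-256", bytes).then(function (digest) {
           var hex = Array.prototype.map.call(new Uint8Array(digest), function (b) {
             return ("0" + b.toString(16)).slice(-2);
           }).join("");
           if (hex !== EXPECTED_SHA256) {
             throw new Error("CDN script failed the hash check; refusing to run it");
           }
           // Hash matches: inject the verified source so it actually executes.
           var s = document.createElement("script");
           s.textContent = source;
           document.head.appendChild(s);
         });
       });

It's clunky, though, and you lose the parser's normal preloading of the script, which is part of why having the browser do this natively would be nicer.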


That would work well for most cases, until /jquery-latest.min.js or whatever is updated to a newer release. But that would also be a problem with the browser-based solution.

The question then is - how do you distribute the trusted hash?

Maybe there should be an independent organization or website that serves trusted hashes for common or registered libraries and files.


Right, you can't verify hashes for resources that change. You'd have to link to a specific version that everyone can agree on. As for trusting the hash itself - I guess someone you trust (probably the author) would have to sign the hash, then you could verify the signature.


As long as the author isn't serving the signed hash via the same CDN as the files. Then there's the logistics problem of having to look in different hash locations for each file.

I'm just thinking of libraries that could be security sensitive, where picking up the latest release on day one matters most. I surmise those would also be the same libraries you'd most want to use this kind of authentication on.


If an attacker changes the signed copy on the CDN, the signature check will fail.


Maybe browsers should ship with these libraries so nobody's relying on every single random website to be impenetrable.


Then you end up with the question of "how do you decide what libraries to include"


Exactly.

Then you create an entirely new, fragmented ecosystem, like the current HTML and CSS web standards, adding more complexity and layers to front-end web development.

Best that the browsers stay agnostic in that regard.


http://trends.builtwith.com/javascript

All the browser companies are in a particularly good position to collect this information too.


Or they could all link to the same copy, like Google's hosted libraries. https://developers.google.com/speed/libraries/devguide#Libra...
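
i.e. everyone including something like this (the version number is just an example):

     <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>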


Did you read the post? ;)

It needs to be built into the browser because of issues like the one he was having.


This solves the problem on the server side, using existing standards, without building any new tech into the browser.


The problem is that the current "solution" is to cache based on the URL, which breaks if the URL is not accessible, as in this instance.

The suggestion solves that issue by keying on hashes of the files: it doesn't matter whether they are loaded from a remote/CDN URL or from the same server; once the hash matches, the browser considers them cached and loads them from cache.
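
In the meantime, the usual workaround for a CDN URL that might be unreachable is a same-origin fallback, roughly like this (the local path is illustrative):

     <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
     <script>
       // If the CDN copy never arrived (blocked, offline, ...), load a local copy.
       window.jQuery || document.write('<script src="/js/jquery-1.11.1.min.js"><\/script>');
     </script>

Note that if the blocked request hangs rather than failing fast, the page still stalls until it times out, which is exactly the problem described in the article; a hash-based cache wouldn't have that issue.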


Did you read the article?

> It turns out that many websites are loading content from Google’s CDN, or Facebook/Twitter APIs, which are blocked in China.


I'm not commenting on the article, I'm replying to realusername's "partially related" idea.


It doesn't solve the problem, precisely because of the issues the OP described.

Using a hash would allow you to load them from any URL, including the blocked ones.


No, it doesn't.


Realusername would like "a cross-website cache for public scripts". That's what this does. Every site gets to load the version from your browser's cache without downloading it. The problem given is that "each of them is downloaded thousands of times", and this fixes that.


But you're presuming that the shared URL is available to the browser. The whole point of this story is that that presumption is absolutely false for internet users in China. I'm betting that you'd find the same to be true for users in Iran, North Korea and any other embargoed nation. Realusername's solution was an attempt to solve the problem for everyone without writing off the billion or so users unfortunate enough to live in repressive countries.

But, you know, you live in the US and your solution works for everyone in the US, so F everyone who doesn't.


Great - use the hash of an obscure, site-specific script, then detect how quickly the script loads, and you know whether your victim has visited the site because they have it in their cache. Looks like a surefire route to a cache information leak to me.


You can already do that.

     good.com:
        <script src="/js/site.js">

     evil.com:
        <img src="https://www.good.com/js/site.js">
Then use the Navigation Timing API to figure out whether the JS was already in cache.


(Actually, you could use the onload event; you don't actually need navigation timings.)
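
Something like this on evil.com, for instance (a sketch; a non-image resource will usually fire onerror rather than onload, but the timing leaks either way):

     <script>
       var start = Date.now();
       var probe = new Image();
       probe.onload = probe.onerror = function () {
         // A very small elapsed time suggests the resource was already in the
         // visitor's cache, i.e. they have probably been to good.com.
         console.log("probe finished in " + (Date.now() - start) + " ms");
       };
       probe.src = "https://www.good.com/js/site.js";
     </script>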


Cache information leakage for common JS libraries is a non-issue compared to a compromised CDN mass-MITMing users via JavaScript libraries.

Even things like NoScript don't stop that vector if you whitelist common CDNs like Google's.


The main goal of a technology like this is to help the caching of common scripts, not to cache your entire website.

The only information you could get from this is that the browser has already downloaded jQuery from another website, which is not going to help that much.


Just make sure that you use a properly secure hash...


[deleted]


Not cooperating?

The blocking is done at a decentralized level, as imposed by the central government. You can read the policy documents in English if you look around.

This is why sometimes sites work in one part of China and not in another. And sometimes the firewall will go down for periods of time.

The telecoms cooperate with the Chinese government as much as they need to, in much the same way Google and others cooperate with the NSA.


jhancock was referring to cooperation between telecoms, not cooperation between telecoms and the government.


Linkbait title aside, this is a pretty good argument in favour of graceful degradation for scriptless pages. If the JS libraries or whatever else you deal with are on a CDN that is down for whatever reason (be it blocked, temporarily offline, or "I forgot to pay them"), it's important for your site not to block on those requests and to display the text content your users want in a readable format.

We're not talking "web 3.0 apps" here, we're talking documents - news articles, "Contact Us" pages on a company site, etc.

It's also one of the scarier downsides of centralized CDNs. It's too easy for a single site to get blocked or go down temporarily and, suddenly, thousands of websites become inaccessible. And this is not a situation we can keep brushing off for long; there is a real need for decentralized solutions.
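
Concretely, the minimum is not letting a third-party script block parsing: mark it async/defer (or move it to the end of the body) so the document text still renders even if the CDN never answers. For example:

     <!-- If the CDN is unreachable, the page text still renders; only the enhancement is lost. -->
     <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js" defer></script>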


Every time I've talked to a business that wants a web site or server-backed software product for Chinese users, they've said the server has to be in China. This is why. Even when you do manage to get a request out, it is often laggy and worthless from a UX perspective. Linking out of the country for resources needed to load the page is just ignorant.


The problem is that most things hosted by Google resolve to ghs.google.com.

Given that China blocks sites it doesn't like via simple DNS, all Google-hosted content ends up blocked.

And of course Google blocks sites it hosts from being seen in places like North Korea, Iran, Cuba, Syria, etc., due to the way Google enforces U.S. sanctions. Google is not alone in this.


Or maybe just use standard fonts and don't use a CDN. 99.99999% of the sites on the internet gain very little from a CDN. Yeah, it's a cool technology, and it's nice to pretend you're important, but in the end a CDN is an expensive (in complexity and risk) toy.
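
And if you really need a particular typeface, self-host it rather than linking a font CDN; roughly like this (the font name and paths are illustrative):

     <style>
       @font-face {
         font-family: "Open Sans";
         src: url("/fonts/opensans-regular.woff") format("woff");
       }
       body { font-family: "Open Sans", Helvetica, Arial, sans-serif; }
     </style>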


It's not like Google is blocking something. Google is blocked by China.


Google Fonts alternative: https://github.com/alfredxing/brick


Thanks for that! I have not seen this one come up.


Why does the "fix" have to be the cdn versus China itself? Why is that being ignored as the real issue?


Pragmatism? If you want your web site to be viewable in China you can

a) change a URL

b) campaign for open internet access in China

which one is likely to be more effective?


Shouldn't we design systems to not be susceptible to this kind of problem rather than just asking everyone to be nice?

A similar issue would be for one of the CDNs to go down, this isn't just a problem with censorship.


And a web developer is going to change China how?


> I’m not sure of a good alternative to Google Fonts though.

Is Adobe's Typekit blocked?


Simple solution: host JS libs on an HTTPS domain that China cannot afford to block, e.g. GitHub. See https://greatfire.org for a practical take on this approach.


I don't understand. If China can afford to block Google, why wouldn't they be able to afford blocking GitHub?


Because in OP's world GitHub is very important, and he believes Chinese bureaucrats are like him.

I currently do consulting work for a Fortune 20 corporation, and their firewall blocks cloning of GitHub repositories. They have over a thousand developers on site ... I'm thinking of writing a scraper that clones from the web pages, which do open.


There is no such thing as a site which China cannot afford to block. GitHub has been blocked from time to time.



