It's partially related but something I would really like to have is a cross-website cache for public scripts. Around 80% of the size of the scripts is public libraries used by almost everyone (jquery, bootstrap, moment.js, various jquery plugins, angular...) and each of them is downloaded thousands of times.
One simple solution could be something like this:

    <script type="text/javascript" src="/js/jquery.min.js" public="sha1:356a192b7913b04c54574d18c28d46e6395428ab">

This way the browser can look at the hash and skip the request for the file entirely. It could not lead to security issues, since the hash the browser saves is not the one displayed in the attribute but the one computed from the actual file (and obviously you would only use the public attribute for scripts that are meant to be public).

With this technique, the most popular libraries could be cached once and not downloaded again by users.
Subresource Integrity (SRI) only addresses the problem indirectly, but it does let you add checksums to resources. There are various security considerations around caching, and I don't think the spec touches on them all: http://www.w3.org/TR/SRI/#caching-optional-1
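For reference, today's SRI markup looks something like this. The integrity and crossorigin attributes are the real ones from the spec; the hash value below is just a placeholder, you'd compute the actual digest from the exact bytes you serve:

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"
            integrity="sha384-PLACEHOLDERdigestOfTheExactFileYouServeGoesHere"
            crossorigin="anonymous"></script>

If the fetched file doesn't hash to the value in the attribute, the browser refuses to execute it, which is the building block that makes hash-keyed cross-site caching thinkable at all.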
In addition to the (significant) bandwidth savings, this is an important idea for privacy/tracking reasons as well. I may be fine with websites A, B, and C logging that I made a request for one of their pages, but I'd rather not give Google[1] the browsing path A->B->C just because they host jQuery.
While browsers shipping an internal copy of various common scripts would be a great idea, I was briefly working on a Firefox addon that would simply hard-cache any URL matching some sort of criterion (e.g. a regexp like "//ajax.googleapis.com/ajax/libs/.*\.js").
Unfortunately, the project is on hold for now. While it was easy to match HTTP requests with an observer for 'http-on-modify-request' (rough sketch below), the nsIHttpChannel[2] object you get from that only seems to let you redirect the request. I considered trying to redirect to a "chrome:" or "file:" url, but that seems like a horrible solution. The real way to mess with HTTP loading and caching, unfortunately, is buried somewhere I have yet to find. :/
[1] or any other shared CDN, such as CloudFlare and their horrible hashed domain names
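For the curious, the matching part would look roughly like this in the old XUL add-on style (a sketch, not the actual add-on code; the pattern and names are mine, and the commented-out redirect is the "horrible solution" mentioned above):

    const { interfaces: Ci, utils: Cu } = Components;
    Cu.import("resource://gre/modules/Services.jsm");

    // Anything served from Google Hosted Libraries.
    const CDN_PATTERN = /\/\/ajax\.googleapis\.com\/ajax\/libs\/.*\.js/;

    const observer = {
      observe: function (subject, topic) {
        if (topic !== "http-on-modify-request") return;
        const channel = subject.QueryInterface(Ci.nsIHttpChannel);
        if (CDN_PATTERN.test(channel.URI.spec)) {
          // The only obvious lever nsIHttpChannel gives you here is a redirect,
          // e.g. to a copy bundled with the add-on:
          // channel.redirectTo(Services.io.newURI("chrome://cdncache/content/jquery.min.js", null, null));
        }
      }
    };

    Services.obs.addObserver(observer, "http-on-modify-request", false);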
That would work well for most, until /jquery-latest.min.js (or whatever) is updated to a newer release. But that would also be a problem with the browser-based solution.
The question then is - how do you distribute the trusted hash?
Maybe there should be an independent organization or website that serves trusted hashes for common or registered libraries and files.
Right, you can't verify hashes for resources that change. You'd have to link to a specific version that everyone can agree on. As for trusting the hash itself - I guess someone you trust (probably the author) would have to sign the hash, then you could verify the signature.
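On the verifying side that could look something like this (a Node.js sketch; the .pem and .sig files are hypothetical artifacts the author would have to publish alongside the release):

    // Compute the library's digest, then check the author's detached signature over it.
    var crypto = require("crypto");
    var fs = require("fs");

    var file = fs.readFileSync("jquery-2.1.1.min.js");           // the pinned version everyone agrees on
    var digest = crypto.createHash("sha256").update(file).digest("hex");

    var publicKey = fs.readFileSync("author-public.pem");        // hypothetical: the author's published key
    var signature = fs.readFileSync("jquery-2.1.1.min.js.sig");  // hypothetical: signature over the digest

    var verifier = crypto.createVerify("RSA-SHA256");
    verifier.update(digest);
    console.log("hash is signed by the author:", verifier.verify(publicKey, signature));

Of course this just moves the problem to distributing the author's key and the signed hash, which is the logistics issue raised below.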
As long as the author isn't serving the signed hash via the same CDN as the files. Then there's the logistics problem of having to look in a different hash location for each file.
I'm just thinking of some libraries that could be security-sensitive, where picking up the latest release on day 1 matters most. I surmise these would also be the same libraries you would want to use this type of authentication on.
Then you create an entirely new, fragmented ecosystem, like the current HTML and CSS web standards, adding more complexity and layers to front-end web development.
Best that the browsers stay agnostic in that regard.
The problem is that the current "solution" is to cache based on the URL, which breaks if the URL is not accessible, as in this instance.
The suggestion solves that issue by using hashes of the files, so it doesn't matter whether they are loaded from a remote/CDN URL or from the same server; once the hash matches, the browser considers them cached and loads them from cache.
Realusername would like "a cross-website cache for public scripts". That's what this does. Every site gets to load the version from your browser's cache without downloading it. The problem given is that "each of them is downloaded thousands of times", and this fixes that.
But you're presuming that the shared URL is available to the browser. The whole point of this story is that that presumption is absolutely false for internet users in China. I'm betting that you'd find the same to be true for users in Iran, North Korea and any other embargoed nation. Realusername's solution was an attempt to solve the problem for everyone without writing off the billion or so users unfortunate enough to live in repressive countries.
But, you know, you live in the US and your solution works for everyone in the US, so F everyone who doesn't.
Great - use the hash of an obscure, site-specific script, then detect how quickly the script loads and you know whether your victim has visited the site, because they have it in their cache. Looks like a surefire route to a cache information leak to me.
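To spell the probe out (the public attribute here is the hypothetical one from the suggestion above, not something browsers support, and the host name is made up):

    // Point at a copy hosted on the attacker's own server, but attach the hash of
    // the victim site's obscure, site-specific script. A near-instant load means the
    // browser pulled it from the shared hash-keyed cache, i.e. the victim has been
    // to that site; a normal network round-trip means they probably have not.
    var probe = document.createElement("script");
    var start;
    probe.src = "https://attacker.example/copy-of-target-script.js";
    probe.setAttribute("public", "sha1:<hash of the target site's script>");
    probe.onload = function () {
      var ms = performance.now() - start;
      console.log(ms < 5 ? "served from cache: victim has visited the site"
                         : "fetched over the network: probably not");
    };
    start = performance.now();
    document.head.appendChild(probe);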
Linkbait title aside, this is a pretty good argument in favour of graceful degradation for scriptless pages. If the JS libraries you depend on sit on a CDN which is down for whatever reason (be it blocked, temporarily offline or "I forgot to pay them"), it's important for your site not to block on those requests and to display the text content your users want in a readable format.
We're not talking "web 3.0 apps" here, we're talking documents - news articles, "Contact Us" pages on a company site, etc.
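The old local-fallback snippet goes some of the way here (the well-worn jQuery version; swap in whatever library and local path you actually serve):

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
    <script>
      // If the CDN copy didn't load (blocked, offline, unpaid...), fall back to a
      // copy served from our own origin.
      window.jQuery || document.write('<script src="/js/jquery-1.11.1.min.js"><\/script>');
    </script>

It doesn't help when a blocked request hangs for a long time instead of failing fast, which is exactly the China case, so it's a mitigation rather than a fix.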
It's also one of the scarier downsides of centralized CDNs. It's too easy for a single site to get blocked or go down temporarily and suddenly thousands of websites become inaccessible. And this is not a situation we can keep brushing off for long; there is a real need for decentralized solutions.
Every time I've talked to a business that wants a website or server-backed software product for Chinese users, they've said the server has to be in China. This is why. Even when you do manage to get a request out, it is often so laggy as to be worthless from a UX perspective. Linking out of the country for resources needed to load the page is just ignorant.
The problem is that most things hosted by Google resolve to ghs.google.com.
Given that China blocks sites it doesn't like with simple DNS blocking, all Google-hosted content ends up blocked.
And of course Google blocks the sites it hosts from being seen in places like North Korea, Iran, Cuba, Syria, etc., due to the way Google enforces U.S. sanctions. Google is not alone in this.
Or maybe just use the standard fonts and don't use a CDN. 99.99999% of the sites on the internet gain very little from a CDN. Yeah, it's a cool technology and it's nice to pretend you're important, but in the end a CDN is an expensive (in complexity and risk) toy.
Simple solution: Host JS libs on a HTTPS domain that China cannot afford to block, e.g. GitHub. See https://greatfire.org for a practical take on this approach.
Because in OP's world GitHub is very important, and he believes Chinese bureaucrats are like him.
I currently do consulting work for a Fortune 20 corporation and their firewall blocks cloning of GitHub repositories. They have over a thousand developers on site ... I'm thinking of writing a scraper that clones from the web pages, which do open.