
It's a system design oversight. Permanence has been sought for a long time. I'm not claiming I've got all the solutions, but it's yet another example of how the web is incomplete.

If you read the contemporaneous history from the early 1990s, when the concrete of the web was still wet, it becomes obvious that the fundamentals are worth revisiting.

For instance, DNS could include archival records, or the URL schema could have an optional versioning parameter. Static snapshots could be built into webservers, and archival services would be as standard as AWS or a CDN; something all respectable organizations would have, like HTTPS.
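To hand-wave at the DNS idea: here's a rough Python sketch that looks for an archive pointer published in a domain's TXT records. The "archive=" convention is entirely made up for illustration (it's not an existing standard), and it leans on the third-party dnspython package:

    # Sketch of the "archival record in DNS" idea, assuming a hypothetical
    # convention where a domain advertises its archive endpoint in a TXT
    # record, e.g. "archive=https://snapshots.example.org/example.com/".
    # The key name is invented for illustration, not an existing standard.
    import dns.resolver  # pip install dnspython

    def find_archive_endpoint(domain: str) -> str | None:
        """Return the archive URL advertised by `domain`, if any."""
        try:
            answers = dns.resolver.resolve(domain, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None
        for rdata in answers:
            # TXT record data arrives as a tuple of byte strings; join and decode.
            text = b"".join(rdata.strings).decode("utf-8", errors="replace")
            if text.startswith("archive="):
                return text[len("archive="):]
        return None

    if __name__ == "__main__":
        print(find_archive_endpoint("example.com"))

A browser or crawler that understood a convention like this could fall back to the advertised archive whenever the live site 404s.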

These only sound nutty because it's not 1993 anymore and we've presumed these topics are closed.

We shouldn't presume that. There are lots of problems that can be fixed with the right effort and intentions, especially because we no longer live in an era where having 1 GB of storage means you have an impressively expensive array of disks.

Many things that were unreasonable then are practically free now.



I am constantly pondering what an intuitive and "cool" web without broken URLs could look like, and whether mechanisms for it are or are not embedded in the original standards.

The farthest I got is that we should probably see two addresses where we currently see one in our URL bar: a Locator and an Identifier, and the whole web-related technology stack should revolve around this distinction with immutability in mind.

- On the server side, Locators should always respond with locations of Identifiers or other Locators. So, redirects. Caching headers make sense here, denoting e.g. "don't even ask for the next five minutes."

- Content served under an Identifier should be immutable, so an HTTP 200 response always contains the same data. Caching headers make no sense here, since the response will always be the same (or an error).

In practice, navigating to https://news.ycombinator.com/ (a locator) should always result in something like an HTTP 302 to https://news.ycombinator.com/?<timestamp>, or any other identifier denoting the unique state of the response page. Any dynamic page engine should first provide a mechanism for producing identifiers for any resulting response.
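A toy sketch of what I mean, using only Python's standard library. The in-memory snapshot store and the handler names are made up for illustration; a real engine would persist identifiers rather than mint one per request:

    # Locator/identifier split: the bare path answers with a short-lived 302
    # to "/?<timestamp>", and identifier responses are immutable and
    # cacheable forever.
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse

    PAGES = {}  # identifier (timestamp string) -> frozen page bytes

    def make_snapshot() -> str:
        """Freeze the current page state under a new identifier."""
        ident = str(int(time.time()))
        PAGES[ident] = f"<html><body>State at {ident}</body></html>".encode()
        return ident

    class LocatorIdentifierHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = urlparse(self.path).query
            if not query:
                # Locator: redirect to the identifier of the current state.
                self.send_response(302)
                self.send_header("Location", f"/?{make_snapshot()}")
                self.send_header("Cache-Control", "max-age=300")  # "don't ask again for 5 minutes"
                self.end_headers()
            elif query in PAGES:
                # Identifier: always the same bytes, cacheable forever.
                body = PAGES[query]
                self.send_response(200)
                self.send_header("Content-Type", "text/html; charset=utf-8")
                self.send_header("Cache-Control", "public, max-age=31536000, immutable")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404, "Unknown identifier")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), LocatorIdentifierHandler).serve_forever()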

I feel there are some fundamental problems in this utopian concept (especially around non-anonymous access), but I would nevertheless like to know if it could be viable in at least some subset of the current/past web.


PURLs have been a thing for a while. And rel="canonical", while a frighteningly recent invention relatively speaking, does exist.


I think this is being handled the right way now by the Internet Archive. It's sort of like a library that keeps old copies of newspapers around on microfiche, or a museum that has samples of bugs collected in Indonesia 100 years ago. They have a dedicated mission to preserve, around which supportive people can organize effort and money.

I don’t think this can be solved by decentralized protocols. A lot of folks just won’t put in the effort. Quite a few companies already actively delete old content; there’s no way they are going to opt into web server software that prevents that.


That's just a function of expectations.

Expectations are set, not interrogated. Let me give you an example.

Companies and organizations with domains are expected to also be running mail on that domain.

Why? I can sit around and make up a bunch of reasons, but none of them are given when that mail service is being set up; it's done out of expectation, just like how someone might pay $295,000.00 for the .com they want but wouldn't pay even $2.95 for the .me or .us.

Are the .com keys closer together? Easier to type? Supported by more browsers? No.

They're mostly arbitrary social norms that get institutionalized.

They can go away. Having an FTP service or a fax line, for instance, used to be one of them. Those weren't thrown in the trash for cost-cutting reasons; the norms changed.

The question is where do we want these norms to go and what are we doing to encourage it?

Here's how this could materialize - say there's an optional archival fee when registering a domain. Search engines could then prioritize domains that pay this fee, under the logic that by paying it, the website owners are standing behind what they publish.

These types of schemes are pretty easy to come up with - the point is that the solutions are plentiful; it's all a matter of focus, effort, and intentions.



