When Rust separated its fora, discuss.rust-lang.org became internals.rust-lang.org and a new forum, users.rust-lang.org, was created. The discuss one had a nice redirect to internals, but for some freaking reason it was deleted, and now all old links are permanently broken (you can fix them manually in the URL though).
If someone here is running any part of the Rust infra, please consider getting this redirect back.
For words like forum and datum, both plurals are possible, but there's also a clearly more commonly used option that doesn't confuse everyone. In Dutch, fora is probably about as common as forums; in English I'm not sure I've ever seen it used. It's like using datum to mean a single data point: "look at me being clever about word technicalities and making everyone do a double take to understand this sentence". At least, that's how I feel when seeing a native speaker use datum instead of data point in English, or, in Dutch, using data to mean multiple calendar dates (the common word being datums).
I don’t know when I internalized that “data” is plural and its singular is “datum”, but it wasn’t when I had a similar reaction to “fora”. The former is just how my brain handles the word’s count, and the latter is very much like your reaction.
Personally, I'd go for "forums." (I don't actually think I've ever actually heard "fora," so I'm skeptical that it's even used enough to qualify as a "valid" English word...)
Hah, I fought a battle for months at a Nordic automotive OEM to try to stop them from overusing "fora", to no avail. They were so convinced that it was the proper word to use that they also used it in the singular, as in "We will bring that up in another fora", often referring to a single meeting to be held the next day. This is only one tale out of many about how corrupted corporate Scandinavian English can be, and sadly, having been essentially marinated in this parlance, I can probably only detect and reflect on a fraction of the linguistic felonies committed around me, and worse: by myself.
You are technically correct, the best kind of correct. But in reality correctness in language is dictated by usage. After all, their r hella many formerly “correct” parts of language no1 uses ne more or sth.
Otherwise we’d still be speaking proto-Indo European arguing about words that came from whatever language came before that.
There's always new things to discuss, even if they're the same things. Because the people doing the discussing are new. For example, I've never seen those old discussions and I never in a million years would have searched for and found them.
I've found the key, when you're on a forum and see something discussed more than once, is to change your thinking. Instead of getting annoyed, think great, a new cohort of people will get to talk about this important topic.
This kind of reframing works in many different areas of my life and the older I get, the more I have to apply it to avoid becoming a grumpy old curmudgeon.
(Links to the old discussions are great though, they just don't need to be followed up by complaining every time.)
So much this. Many (most?) old-style forums suffered from a pretty strict no-duplicate rule even in "free talk" sections. This led to a few old-time gatekeepers dominating these places, new-post anxiety, and general stagnation, because new users had no way to casually learn or talk about things. When I started visiting HN, it was such a relief to see that well-separated duplicates don't get moderated away or even discouraged. It simply means that you can eventually learn what you don't know yet or missed for some reason, can participate in or discuss it from today's point of view, and renew your understanding.
Repetition may be seen as a cost to visitors, but appreciating it creates a huge net profit over time, because it helps a community stay fit and alive.
All the tut-tutting makes searching harder for other users too! You end up with relevant and active looking threads that are actually just regulars dunking on the poster for not searching.
My personal favorite is doing a Google search and landing on an ancient forum where the discussion is some variation of "look at the stickies" or "use the search feature", but the stickies are 404s and the search returns only the aforementioned thread.
> I'm not sure there's anything left to comment that hasn't already been commented
I would have agreed with you a year ago. But now, with AI, the entire web model may become less of a thing.
I thought the web would endure forever, despite the attacks from platforms and browser monopolizers.
Now it looks like social will soon be overrun by robots (maybe creepy ID verification will save it?), and the need to publish information will diminish as you can generate it and probably share the results on some future AI platform.
Maybe it's the opposite. "Social" spaces will become so flooded with worthless AI posts trying to grab people's attention for advertising purposes that real websites created by real people will flourish once again. That's what I want to believe.
What does that even mean? Because it could credibly mean like 20 different things, some of which would clearly be nonsense, but you haven’t left us with a specific enough of an idea to engage with either way.
AI applications function with the web model. The web will to some extent endure forever, I doubt it'd ever go fully extinct. We may use AI tools more, but not for everything.
It may be an old article now, but the point still stands all these years later. Like, I get it. If the site shuts down, then the URLs are obviously going to change there.
But the sheer amount of times where sites get redesigned and somehow every link to the site breaks at once is utterly ridiculous. There's zero reason to change your URLs every time you change CMS, and there's even less of a reason not to redirect the old URL format to the new one.
Yet somehow it happens constantly, especially on news sites which seem to love changing their site structure every few months or something. Sigh.
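For what it's worth, keeping the old URL shape alive usually amounts to a handful of rules. A minimal sketch in Python/Flask, assuming a hypothetical old /articles.php?id=N scheme migrating to /posts/<id> (the same mapping is a couple of lines of rewrite config in most web servers):

    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Hypothetical example: the old CMS used /articles.php?id=42,
    # the new one uses /posts/42. Permanent redirects keep old links working.
    @app.route("/articles.php")
    def legacy_article():
        return redirect(f"/posts/{request.args.get('id', '')}", code=301)

    # Old date-less blog paths forwarded to the new section.
    @app.route("/old-blog/<slug>")
    def legacy_blog(slug):
        return redirect(f"/blog/{slug}", code=301)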
1. Using URL schemes that come with the framework. Change of framework breaks everything.
2. Simply not caring. If the site is commercial, preserving rarely accessed parts for the sake of consistency is not important to them. A PR carousel with big stock images and 1-3 sentence empty statements is the norm. You are not supposed to "use the site"; you are supposed to come through channels and campaigns that are temporary.
The obvious reason is every single document you publish is a liability and potentially a burden. At minimum, it is a cognitive burden and additional responsibility. If it weren't a liability, why bother killing the links? If it weren't a cognitive burden, how could they have forgotten to keep the old URL structure in place?
Everything is temporary. Archive what you care about if you really need it to last.
>There's zero reason to change your URLs every time you change CMS, and there's even less of a reason not to redirect the old URL format to the new one.
Existing links to your site should be fairly valuable. Search engines know what links are there, and users following links from other pages think your site is more useful than the current page.
Last time we redesigned a major site, we carefully analyzed every URL for organic traffic value. Most were worthless: barely indexed by Google and receiving at most a couple visits a month. We killed most of them and saw improvements in overall traffic from search.
I think a lot of folks originally thought of websites like reference books. And some should still be thought of that way (Wikipedia, open source documentation, etc). I used to try very hard to keep all URLs functional during site redesigns. We accumulated thousands and thousands of redirects… most of which were never used.
I’ve since come to consider that a lot of sites are more like magazines: useful for a limited time span but not something that needs to live on your shelf forever.
Yeah. When a site says "the old xyz.oursite.com pages no longer exist. use the search bar to blah blah", there's a 90% chance that I'm just going to close the tab and try a different (even if non-official) site.
Thanks for proving my point: There's no value in pursuing people who aren't your customers, who aren't providing you value.
If we as end-users want URLs to not rot away, we need to put value on working URLs that convinces webmasters to put in the effort to maintain working URLs.
If it’s a business, it certainly makes me less interested in using their product or service. Honeywell’s site has this problem and now I associate their brand with annoyance at trying to navigate dead links to technical documentation. I end up thinking stuff like: “I wish this was a Siemens part” which I can’t imagine is their goal.
Weird question. I'm as willing to pay for working links as I am to pay for the content, so the variable becomes irrelevant. A broken link is no different from broken content.
It's a system design oversight. Permanence was long sought after. I'm not claiming I've got all the solutions, but it's yet another example of how the web is incomplete.
If you read the contemporaneous history from the early 1990s, when the concrete of the web was still wet, it should become obvious that the fundamentals are worth a revisit.
For instance, DNS could include archival records, or the URL schema could have an optional versioning parameter. Static snapshots could be built into web servers, and archival services would be as standard as AWS or a CDN; something all respectable organizations would have, like HTTPS.
These only sound nutty because it's not 1993 anymore and we've presumed these topics are closed.
We shouldn't presume that. There are lots of problems that can be fixed with the right effort and intentions, especially because we no longer live in the era where having 1GB of storage means you have an impressively expensive array of disks.
Many things that were unreasonable then are practically free now.
I am constantly pondering what an intuitive and "cool" web without broken URLs could look like, and whether the mechanisms for it are or are not embedded in the original standards.
The farthest I've gotten is that we should probably see two addresses where we currently see one in our URL bar, a Locator and an Identifier, and the whole web technology stack should revolve around this distinction with immutability in mind.
- On the server side, Locators should always respond with the locations of Identifiers or of other Locators. So, redirects. Caching headers make sense here, denoting e.g. "don't even ask for the next five minutes".
- Content served under an Identifier should be immutable, so an "HTTP 200" response always contains the same data. Caching headers make no sense here at all, since the response will always be the same (or an error).
In practice, navigating to https://news.ycombinator.com/ (a locator) should always result in something like an HTTP 302 to https://news.ycombinator.com/?<timestamp> or any other identifier denoting the unique state of the response page. Any dynamic page engine should first provide a mechanism for producing identifiers for any resulting response.
I feel there are some fundamental problems in this utopian concept (especially around non-anonymous access), but I would nevertheless like to know whether it could be viable in at least some subset of the current/past web.
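A minimal sketch of that split, assuming Python/Flask and an in-memory snapshot store (a real system would persist the identifiers): the bare path acts as the locator and only ever redirects, while the ?snapshot= form acts as the identifier and always returns the same bytes.

    import time
    from flask import Flask, abort, redirect, request

    app = Flask(__name__)
    snapshots = {}   # identifier -> frozen response body (a database in real life)

    @app.route("/")
    def front_page():
        ident = request.args.get("snapshot")
        if ident is None:
            # Locator behaviour: mint/point at the current identifier.
            ident = str(int(time.time()))
            snapshots.setdefault(ident, f"front page as rendered at {ident}")
            resp = redirect(f"/?snapshot={ident}", code=302)
            resp.headers["Cache-Control"] = "max-age=300"   # "don't ask again for 5 minutes"
            return resp
        # Identifier behaviour: immutable, byte-for-byte identical forever (or 404).
        if ident not in snapshots:
            abort(404)
        return snapshots[ident]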
PURLs have been a thing for a while. And while rel="canonical", on the other hand, is a frighteningly recent invention relatively speaking, it does exist.
I think this is being handled the right way now by the Internet Archive. Sort of like a library that keeps old copies of newspapers around on microfiche, or a museum that has samples of bugs in Indonesia 100 years ago. They have a dedicated mission to preserve, around which supportive people can organize effort and money.
I don’t think this can be solved by decentralized protocols. A lot of folks just won’t put in the effort. Quite a few companies already actively delete old content; there’s no way they are going to opt into web server software that prevents that.
Expectations are set, not interrogated. Let me give you an example:
Companies and organizations with domains are expected to also be running mail on that domain.
Why? I can sit around and make up a bunch of reasons, but none of them are given when that mail service is being set up. It's done out of expectation, just like how someone might pay $295,000.00 for the .com they want and wouldn't even pay $2.95 for the .me or .us.
Are the .com keys closer together? Easier to type? Supported by more browsers? No.
These are mostly arbitrary social norms that get institutionalized.
They can go away. Having ftp service or a fax line, for instance, used to be one of them. Those weren't thrown into the trash for cost cutting reasons, the norms changed.
The question is where do we want these norms to go and what are we doing to encourage it?
This is how it could materialize: say there's an optional archival fee when registering a domain. Then search engines could prioritize domains that pay this fee, under the logic that by paying it, the website owners are standing behind what they publish.
These types of schemes are pretty easy to fabricate; the point is that the solutions are plentiful, and it's all a matter of focus, effort, and intentions.
Only tangentially related, but reminds me of something I ran across recently. I've been maintaining a couple of old blogs my wife used to write for years; I don't want to get rid of them, but it was a bit of a pain having to regularly keep wordpress updated and fix any related issues. Recently found a WP plugin, Simply Static, that will generate a static site from a wordpress blog, maintaining all the URLs and such. So it's perfect for my use case of archiving something that won't be added to in the future. But you can also use it for a live site, just exporting a new static version after each post.
Used it for those couple of blogs with zero issues, and now I can keep the originals around but inaccessible from the internet, so I don't have to worry about keeping them updated, aside from when I update php itself.
> So it's perfect for my use case of *archiving* something
> I can keep the originals around but *inaccessible* from the internet
So it's only tangentially related to cool URIs not changing, since in this case the actual URIs did (apparently) go kaboom, this plugin just helps maintain the internal links within their local archive.
Technically it's not archived; it's just that the live WordPress install is no longer accessible. From a web user's perspective, nothing changed (aside from the pages maybe loading a touch faster).
I like the flowers in your front garden. Please be cool and never change them. I like them there when I drive by from time to time.
No I won't just take a photo of them, or plant my own. I need you to maintain them, never landscape your property or be uncool. It doesn't bother me that it costs you money to maintain them, you should have planned for that before I had a chance to see your lovely flowers.
Please hand this note to the next home owner. Since I want the flowers to live forever and humans don't.
The homeowner is the one that posted about a garden in front of their home in their own newsletter. I came there to find out the garden had been moved, but with no notice of where the new address is.
I have to go into the archives to look at the flowers, which means the archive person has to give me images from whenever they took the photo, which might not be from the same year.
Like, are they? "Their own newsletter" makes it sound like internal links remain broken, something that I will immediately cede is rude to a visitor. But – even if I had publicized something and said "hey everybody come look", it's only social context that determines whether you infer from that "hey everybody come look (forever)". And I think that social context is sort of nonsense. For other people who've created links – that's something they should take care of.
I try to clear out my blogroll so people don't see broken links – but no one on that list owed me their permanent web presence.
Right. But if I move stuff around and fix any preserved internal link, the fact that someone else somewhere had some other link that is now broken: not morally on me, IMO!
100% in agreement. It might seem cool to imagine an idealistic internet, but in reality it's maintenance, and cost. I feel like the owner is free to change whatever whenever, and it's up to them to weigh the consequences of pissing off potential visitors.
> Except insolvency, nothing prevents the domain name owner from keeping the name
I maintained 100+ domains over the years, and had to stop renewing most of them because my income got drastically reduced and I couldn't afford to renew them 'forever'. I was careful about which ones got nuked. They were typically sites that received very little traffic and had their heyday and fun in the sun, and there was no point in having them live on in perpetuity.
The few remaining ones I renew (sometimes 10 years in advance because ICANN) still get a lot of traffic, and I regularly check the hosting setup to see if they're operating properly, and I check for 404s and downtime, or missing assets like images, JS, etc.
A domain is something you commit to. If the project atrophies, you have to be willing to nuke it. But there will always be domains which are too good to nuke.
I too used to get domains. About 5 years ago I got the domain byDav.in, and now most of my domains are subdomains like hukamnama.bydav.in and apps.bydav.in, basically foo.bydav.in.
Later I got tired of adding TXT records, so now my simple apps live at spa.bydav.in/weather, spa.bydav.in/radio, or spa.bydav.in/otp.html.
I wish someone at Microsoft would read this. Every 6 months it seems like everything around their documentation site changes and half the links permanently 404. The amount of link rot is just staggering.
I'm surprised no one has created a service similar to archive.org but for redirects. It could work as a browser extension and would simply be a library of old broken links and their new locations. Whenever you hit a 404 page it would query the service for an alternative. People can submit links themselves.
I've brought up broken links in projects I contribute to on a few occasions and it seems like people basically don't care if there's even a tiny bit of extra work involved to fix it. Complaining about this and expecting people to maintain links themselves won't work.
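Something like this on the client side would be the gist of the lookup service described above, assuming a hypothetical registry endpoint; the extension's job reduces to one query per 404:

    import requests

    REGISTRY = "https://redirect-registry.example/lookup"   # hypothetical service

    def find_alternative(dead_url: str):
        """Ask the shared registry whether anyone has recorded a new home for this URL."""
        resp = requests.get(REGISTRY, params={"url": dead_url}, timeout=5)
        if resp.ok:
            return resp.json().get("redirect")
        return None

    page = requests.get("https://example.com/some/old/page")
    if page.status_code == 404:
        print("Recorded replacement:", find_alternative(page.url))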
I have been known to use Hypothes.is to annotate 404 pages with a signpost to the correct place, especially if getting to the correct one isn't as easy as putting the original URL into the Wayback Machine. I always tag them with the "uncool URI" tag. (Funnily enough, some of the pages I've had to use this tag include ones on w3.org or ones associated with projects like runmycode.org or operated by folks at the Internet Archive.)
URL-Rule 2: permanent (they do not change, no dependencies to anything)
URL-Rule 3: manageable (equals measurable, 1 logic per site section, no complicated exceptions, no exceptions)
URL-Rule 4: easily scalable logic
URL-Rule 5: short
URL-Rule 6: with a (partial) variation of the target phrase the page wants to be found for
URL-Rule 1 is more important than 2 to 6 combined, URL-Rule 2 is more important than 3 to 6 combined, … URL-Rule 5 and 6 are a trade-off. 6 is the least important. A truly search-optimized URL must fulfill all URL-Rules.
%short-namespace% – one or two letters that identify the page type, no dependency on any site hierarchy
%unique-slug% – only use a-z, 0-9, and "-" in the slug; no double "-", and no "-" at the start or the end.
Only use “speaking slugs” if you have them under your total editorial control.
e.g.:
https://www.example.com/a/artikel-name
https://www.example.com/c/cool-list
https://www.example.com/p/12345 (does not fulfill the least important URL-Rule 6)
https://www.example.com/p/12345-product-name
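As a rough illustration of the slug rules above (a hypothetical helper, not anyone's production code):

    import re

    def make_slug(title: str) -> str:
        # a-z, 0-9 and "-" only; everything else collapses to a single "-",
        # and leading/trailing "-" are stripped.
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
        return slug.strip("-")

    print(make_slug("Cool URIs  don't change!"))   # -> cool-uris-don-t-change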
You’re not thinking very long-term. As the article explains, for long-term resources, you should put some form of date in the URL, such that you do not continually deplete the namespace.
For example, suppose that you once wrote an article on AI, in, say, 2008, with the URL
When the subject, in this case AI, suddenly changes in later years, and you want to write an update, you’ll be forced to use this monstrosity of a URL:
Then if you ever see the need to shuffle around your site, you can do redirects based on unique ID without needing to keep track of slugs or other metadata.
I'd rather occasionally have an awkward url than need to put the date in every time.
I've written ~2k posts and 1% ended up with suffixes like this (ex: https://www.jefftk.com/p/nomic-report-ii). And most of those were published in the same year as the original post anyway.
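A minimal sketch of that ID-first routing, assuming Python/Flask and a stand-in dict for the database: only the numeric ID is looked up, and a stale or missing slug gets a 301 to the canonical form.

    from flask import Flask, abort, redirect

    app = Flask(__name__)
    posts = {12345: {"slug": "product-name", "body": "…"}}   # stand-in for a DB

    @app.route("/p/<int:post_id>")
    @app.route("/p/<int:post_id>-<slug>")
    def show_post(post_id, slug=None):
        post = posts.get(post_id)
        if post is None:
            abort(404)
        if slug != post["slug"]:
            # Old or mistyped slug: the ID still resolves, so redirect.
            return redirect(f"/p/{post_id}-{post['slug']}", code=301)
        return post["body"]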
If a date is something that helps categorise something in a sane, indexable way, and SEO experts don't like them, they can go and [obscenity] an [animal-noun] with a rusty [ambiguous-from-context] and [illegal-act] and put it in their pipe and smoke it.
I have no idea why your text is grey because this is all very good advice. The character class limitations might sound arbitrary but no one who takes this seriously is handling Unicode in URLs because there are too many risks. (Not that I’m an authority on the subject but my most regrettable StackOverflow contribution is the regular expression for IRIs which has some flaws and I’ll never revise it because it’s not a good idea.)
Ah the good ol' days of cgi-bin and perl scripts on Apache.
When I first learned Apache I thought that all URIs served by a web server needed to have the same name as the file they pointed to. Then I learned (embarrassingly) much later that in HTTP the URI can be anything at all, and the content it pointed to didn't even need to be a static file; it could be dynamic.
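A toy illustration of that point, using only Python's standard library: the server answers any path at all, and nothing on disk corresponds to it.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class AnyPath(BaseHTTPRequestHandler):
        def do_GET(self):
            # No file lookup: the "resource" is generated on the fly for any URI.
            body = f"You asked for {self.path}, generated just now.".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8000), AnyPath).serve_forever()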
Interestingly, I have found out that there is a lifetime threshold, especially with blogs. I often bookmark relatively new content only to find out six months later that it has 404ed. But then you bookmark some old article and find out ten years later that it is still there, regardless of whether the blog is still alive or not.
Shameless self promotion: this is the problem my project sets out to solve, because while this advice is obvious and well known, URIs keep changing and sites keep breaking.
So I'm developing a service that crawls your website every day and tests every single resource: regular links, images, CSS, fonts, and also external links over which one has little control. And one of the features coming soon post-launch is getting notified if you change a URL and forget to set up a redirect from the old one, breaking SEO, bookmarks, and this very advice.
I'm starting a closed alpha this month and launching next month, so if you're interested in trying this out, send me an email.
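Not the actual service, but the core loop is roughly this kind of thing (hypothetical starting URL, standard library plus requests): fetch a page, collect every linked resource, and report anything that no longer resolves.

    import requests
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)

    def check(page_url):
        collector = LinkCollector()
        collector.feed(requests.get(page_url, timeout=10).text)
        for link in collector.links:
            url = urljoin(page_url, link)
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
            if status >= 400:
                print(status, url)   # broken link, image, stylesheet, font, ...

    check("https://example.com/")   # hypothetical starting point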
I always struggle with designing URLs in the first place with SEO in mind.
Do I put in an ID that bad actors can crawl in ascending order? Do I hash the ID? Do I have a unique slug that requires thought to avoid clashes in the future? Does that add another DB lookup, or are you looking up the ID anyway? Do you do /id-slug or /id/slug? And if you move to WordPress or off it, you're stuck with /slug or something with no ID.
Maybe none of it matters, but it seems like it does, especially if you consider it a change you expect to last the life of the domain. Every tech solution you might use instead of rolling your own has its own unique way of handling it.
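One partial answer to the "crawl in ascending order" worry, sketched with the standard library: mint short, random, URL-safe IDs instead of sequential ones, and treat the slug as purely decorative.

    import secrets

    def new_public_id(length: int = 8) -> str:
        # Random and URL-safe, not guessable from the previous one
        # (may contain "-" or "_"; pick a custom alphabet if that matters).
        return secrets.token_urlsafe(length)[:length]

    print(new_public_id())   # e.g. "q3Zk8vYd", used as /p/q3Zk8vYd-some-slug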
It seems bizarre to me that URLs should matter for SEO. We have hypermedia for that. URLs should technically be opaque and make sense only to the server.
I learned this the hard way when Wikipedia was updated to include some (iirc) dumb password advice (or maybe it was about hashing) and I nearly included that in a customer report because my snippet from a previous report used the normal en.wikipedia.org/whatever form instead of en.wikipedia.org/whatever?oldid=123.
Wish they wouldn't call it old ID: the latest revision is also referred to as such, and this just looks silly, like why is my consultant linking to a stale ID? Call it page revision or just ID or something... this discourages use of them but it's a core feature.
Also, GitHub links that die all. the. damn. time. Please use the permalink option when copying a file link: this will include the commit hash in the URL, so the link will also work if the file was moved, the branch renamed, the contents changed, etc.
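For illustration (placeholder owner, repo, and path), the difference is just which revision the URL pins to:

    A branch link, which breaks when the file moves or the branch changes:
        https://github.com/<owner>/<repo>/blob/main/src/lib.rs
    The permalink form, pinned to a commit, which keeps working:
        https://github.com/<owner>/<repo>/blob/<commit-sha>/src/lib.rs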
I guess I don't find "just use a permalink" convincing without weighing the pros and cons.
I guess I weirdly find descriptive links valuable. I could easily do both, but I think just recording old links is probably an option in the long run. I just think people should be able to click on a URL and know where they're probably going.
This still doesn't explain what to do when DB entries are destroyed.
Permalinks don't have to be obfuscated. The downside I would rather mention is that the link doesn't update to newer revisions if improvements are made. It's not like an LTS version where fixes are still applied but no changes (I guess because every change is supposed to be an improvement to the topic, but yeah, not always).
Huh... yeah, I haven't really thought about doing URLs via argument. I guess I just much prefer a single path. You've really given me something to think about though.
If you genuinely want to make content unavailable, feel free to do that. The article isn’t really about that.
Wikis typically keep a history, so they can show a “this topic has been removed” page that still lets you access the history. Or if the topic was merely moved, trigger a redirect. A redirect is perfectly fine, it means that links using the old URL continue to work.
I mean, that's all fine and good, but how does one have a descriptive URI while also keeping it unchanged?
Wikipedia has descriptive URIs, and presumably they change over time. I know some folks would prefer no breakage over anything else, but it seems like you could have a permanent URI with a UUID and a descriptive URI that could change over time, if, say, a building changes its name:
I feel like, yeah, it's cool if your URIs never change, but as someone building a wiki, it's really always seemed like a problem without a right answer. I'm honestly asking, because the only way I know how to build my site with this feature is to just save the previous URI every time the descriptive one changes.
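A minimal sketch of that approach, assuming Python/Flask and in-memory dicts standing in for the wiki's database: pages get a permanent internal ID, every slug a page has ever had keeps resolving, and old slugs 301 to the current one.

    from flask import Flask, abort, redirect

    app = Flask(__name__)
    pages = {1: {"slug": "old-building-name", "body": "…"}}
    slug_history = {"old-building-name": 1}          # every slug ever used -> page ID

    def rename(page_id: int, new_slug: str) -> None:
        pages[page_id]["slug"] = new_slug
        slug_history[new_slug] = page_id             # old entries stay, nothing is deleted

    @app.route("/wiki/<slug>")
    def page(slug):
        page_id = slug_history.get(slug)
        if page_id is None:
            abort(404)
        current = pages[page_id]["slug"]
        if slug != current:
            return redirect(f"/wiki/{current}", code=301)
        return pages[page_id]["body"]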
Redirects are fine. "Cool URIs don't change" refers to the fact that the resource should be reachable at that URI forever, which a redirect solves; changing URIs without keeping the old URI working is what many people do, and that's not so nice.
Ah, so what you have is a problem that's more special than you can briefly describe for readers here. :-)
That's interesting to me, as a topic.
Some don't mind having that kind of problem, because it opens a door through which they can gesture at a veritable treasury of details. Details are interesting and very important to them. Perhaps they are even somehow the master of those particular details, for having considered them! Ah ha, great.
Others hate it, because they feel like they are springing an unfair trap on innocent passers-by who suggest a simple solution to what seems like a simple problem.
I can't say I can solve your problem, and for one it seems different now than it did before, in significant ways.
But having walked into it, I can tell you I know this kind of situation! I hope you enjoy puzzling it out, whether through others' ideas or your own well-calibrated wiki-developer brain.
> Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness. In the rush to stake our DNS territory involved the choice of domain name and URI path were sometimes directed more toward apparent "coolness" than toward usefulness or longevity. This note is an attempt to redirect the energy behind the quest for coolness.